Reliability of ADDIS for diagnoses of substance use disorders according to ICD-10, DSM-IV and DSM-5: test-retest and inter-item consistency

Background: This study investigates test-retest and inter-item consistency of Alcohol Drog Diagnos InStrument (ADDIS), a structured interview to diagnose substance use disorders (SUDs) according to ICD-10, DSM-IV and DSM-5. ADDIS, the Swedish version of SUDDS, is the only instrument in Swedish that produces diagnostic proposals specific to all drug categories, and for all three diagnostic systems. Screening of stressful life events, anxiety, and depression is also included. Methods: Thirty patients at addiction treatment facilities were interviewed for diagnostic assessment and re-interviewed after one week. Results: ADDIS has excellent internal consistency. There is also very high test-retest correlation on number of fulfilled criteria for all diagnostic systems. Agreement of diagnostic proposals is substantial, mean absolute agreement is excellent, and mean systematic correlation is almost perfect. Conclusion: ADDIS is a reliable tool for specific diagnostic assessment of SUDs.

In Sweden, substance use disorders (SUDs) are diagnosed according to ICD-10 [1]. In research, and internationally, they may also be diagnosed according to DSM-IV [2] or the new DSM-5 [3]. As of today (2015), there are three instruments for diagnostic assessment of SUDs with manuals translated into Swedish: SCID-I (Structured Clinical Interview for the DSM-IV [axis I disorders]) [4], MINI (The Mini-International Neuropsychiatric Interview) [5,6] and ADDIS [7]. Of these three, ADDIS is the only one that provides detailed information on all substances, and the only one that produces diagnostic proposals for all three diagnostic systems. Such ambitions call for high quality standards.
ADDIS (Alcohol Drog Diagnos InStrument) is the Swedish version of SUDDS (Substance Use Disorder Diagnostic Schedule) [8], a tool to diagnose SUDs. SUDDS was constructed as an improvement for NIMH-DIS (The National Institute of Mental Health Diagnostic Interview Schedule-Version II), a structured interview for the assessment of psychiatric disorders with high validity and reliability [9]. SUDDS is an event-oriented structured diagnostic interview that yields information for lifetime and current diagnosis (the past 12 months) of alcohol and other drug dependencies and abuse according to DSM. The latest version is SUDDS-5 [10].
ADDIS was translated and introduced to Sweden, with cultural adaptations, in 1987 by Wickström, who was also responsible for revising ADDIS to make it compatible with DSM-III-R, DSM-IV and ICD-10. ADDIS consists of 75 questions, of which 47 are specific, behaviourally oriented questions on alcohol and other drugs based on the DSM and ICD criteria for abuse/harmful use and dependence. Replies to specific questions are transferred to checklists for the diagnostic system used, i.e. ICD-10, DSM-IV or DSM-5, with columns for each drug category, resulting in specified current and lifetime diagnostic proposals. In addition, ADDIS includes smaller sections of screening for stressful experiences and problems concerning depression and anxiety.
Validity and reliability of SUDDS, studied in patients when assessed for inpatient alcohol and drug addiction treatment, were presented by Davis et al. [11]. SUDDS has good agreement with diagnostic assessment of experienced clinicians (κ = .71-.87). Test-retests show high correlation (R = .81-.90), indicating high global consistency. When diagnoses based on SUDDS were compared to assessments by clinicians, the false negatives were less than one per cent and none were false positive [12]. Dependence and abuse appear as distinctive categories and reliability is similar in various ethnic groups (Afro-Americans, Hispanics, Native Americans and Caucasians), with internal consistency (α) for dependence varying between .93 and .97, and for abuse between .84 and .90 [13].
The Swedish ADDIS shows good construct validity concerning alcohol in two populations: a clinical population and a DWI population [14]. Principal Component Analysis found the two DSM-IV constructs, dependence and abuse, to be homogeneous, with all items having factor loadings above .40 and acceptable explained variances. Separate analyses for the two populations and for women provided similar results. Discriminant validity was assessed by means of Discriminant Analysis. ADDIS could correctly classify 93.8 per cent of the two samples. Cronbach's alpha is either satisfactory or excellent in all analyses, both on criterion level and on item level, and for the different groups (clinical, DWI, women).
Sensitivity and specificity of ADDIS concerning alcohol and drug use disorders, in comparison with the SCID and with a LEAD gold standard (Longitudinal, Expert, All Data), were studied by Gerdner et al. [15]. There is satisfactory agreement between ADDIS and SCID, although ADDIS is more sensitive than SCID. ADDIS demonstrates substantial to perfect agreement with the LEAD gold standard concerning alcohol as well as drugs, both lifetime and last year, and shows excellent to perfect overall sensitivity and specificity. Severity ratings (number of criteria) are almost perfectly in agreement with the LEAD standard.
Although internal consistency of ADDIS has been reported for alcohol and DSM-IV, there is no previous study on the reliability of ADDIS concerning all drug categories and all three diagnostic systems. The present study is designed to measure the reliability of ADDIS, both as global consistency and as internal consistency, concerning the various drug categories. Internal consistency, or inter-item consistency, concerns how well the various items of a test correlate with each other, while global consistency concerns how well the total test agrees with itself when used on different occasions under the same or similar circumstances. Global consistency can be studied either as inter-rater reliability, i.e. how well different raters agree with each other using the same instrument, or as test-retest reliability, i.e. the ability of a test to produce the same measure on different occasions under similar conditions. Test-retest is therefore the repeatability of a test, and is explored by repeating the test on the same individuals. Test-retest is sometimes found to be more critical, showing lower agreement, than inter-rater reliability in psychiatric assessment [16,17].
Test-retest reliability is desired for constructs that are not expected to change within a short time. Behaviours, e.g. drinking or using drugs, may differ from one day to another. An alcohol dependent person may one day be "on the wagon" and another day, after a relapse, be involved in binge drinking. Diagnoses, however, are supposed to be relatively stable. Instruments used for diagnostic assessment should therefore produce the same diagnostic proposal for the same person when assessed on a new occasion shortly after. These diagnoses should be repeatable within a time-span suitable for the condition assessed. Persons may recover, so that they no longer meet the criteria. SUD diagnoses according to ICD-10 and DSM-IV are assessed as a current diagnosis of dependence, substance abuse or harmful use, and in DSM-5 as a current substance use disorder, if the person has met the criteria during the past 12 months. Test-retest should have a time-span between tests long enough for the test person not to remember previous replies when retested, but not so long that real change is likely to have occurred.
The aim here is to explore the reliability of ADDIS, in terms of test-retest as well as inter-item consistency, for the criteria and diagnostic proposals on SUDs according to ICD-10, DSM-IV and DSM-5, specific to various drug categories, and for its three screening tests concerning stress, anxiety and depression.

Methods
The study was done as a quality assurance study in cooperation with alcohol and drug treatment facilities, where patients agreed to participate in a test-retest investigation of ADDIS. Trained interviewers among the treatment staff carried out the interviews. In this study, one week is regarded as a time-span during which recovery would not affect diagnoses and long enough for most persons not to remember previous replies. The first and second interviews were separated by approximately a week (mean 7.6 days, s.d. = 1.89). The two interview protocols, attached to each other, were sent to the researchers (i.e. the authors) without names or personal IDs of the interviewed patients. Thus, patients were totally anonymous to the researchers, and only known to the treatment staff. This procedure was reviewed by the Research Ethics Committee of Mid Sweden University without objections.
Thirty patients (27 men and 3 women, mean age 37 years [s.d. = 12.0]) agreed to participate. Nineteen were patients in three municipal residential treatment settings, 8 were in ambulatory community care organized by a probation office, and 3 were patients in opioid substitution treatment at a psychiatric clinic. Participation was voluntary, and patients were assured that there would be no negative consequences from the ADDIS assessment for participation in their respective treatment programme.
The participants had 9-16 years of education (median = 12). Twelve were married, 4 divorced and 14 never married. All reported having misused alcohol. In addition, 15 had misused (i.e. taken other than as prescribed by a physician) tranquillizers or sleeping pills, and 10 had misused analgesics. Sixteen had misused cannabis, 11 amphetamine, 8 cocaine, 8 inhalants, 7 hallucinogens and 6 heroin.
All were interviewed, for all substances used, in order to examine SUD criteria met in the three diagnostic systems: ICD-10 (research version), DSM-IV and DSM-5. When tabulated, fulfilled drug criteria and diagnoses are reported according to the drug categorisations stipulated in the three diagnostic systems. In all these, analgesics and heroin are collapsed to opioids, while tranquillizers and sleeping pills are categorised as "sedatives, hypnotics and anxiolytics". Inhalants are categorised as "volatile solvents", and amphetamines are categorised as "other stimulants", besides cocaine, while in DSM-5, amphetamines and cocaine are collapsed into "stimulants".
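The collapsing of reported substances into tabulation categories can be sketched as a simple lookup. This is an illustration of one reading of the paragraph above, not part of ADDIS: the table names, the function name `categorise` and the pass-through behaviour for substances not mentioned in the paragraph are all assumptions.

```python
# Hypothetical lookup tables, based only on the categorisation described
# in the Methods text. Names are illustrative, not taken from ADDIS.
COMMON = {
    "analgesics": "opioids",
    "heroin": "opioids",
    "tranquillizers": "sedatives, hypnotics and anxiolytics",
    "sleeping pills": "sedatives, hypnotics and anxiolytics",
    "inhalants": "volatile solvents",
}

# Rules that differ between systems: DSM-5 collapses amphetamines and
# cocaine into "stimulants"; elsewhere cocaine stays a separate category.
SYSTEM_SPECIFIC = {
    "ICD-10": {"amphetamines": "other stimulants"},
    "DSM-IV": {"amphetamines": "other stimulants"},
    "DSM-5": {"amphetamines": "stimulants", "cocaine": "stimulants"},
}


def categorise(substance: str, system: str) -> str:
    """Map a reported substance to its tabulation category (assumed rules)."""
    if system not in SYSTEM_SPECIFIC:
        raise ValueError(f"unknown diagnostic system: {system}")
    key = substance.lower()
    if key in SYSTEM_SPECIFIC[system]:
        return SYSTEM_SPECIFIC[system][key]
    # Substances not mentioned in the text (e.g. alcohol) pass through unchanged.
    return COMMON.get(key, key)
```

Keeping the system-specific rules apart from the shared ones makes the single DSM-5 difference (cocaine and amphetamines merged) explicit.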

Results
Although analyses were conducted for the assessment of both lifetime and current diagnoses with similar results, only the current will be presented here in order to save space.

Internal consistency on item and criteria level
The basis of ADDIS diagnostic assessment of SUDs is 47 items, of which two are 6-point Likert-scale questions on quantity and frequency of alcohol use (but not other drugs) and 45 are questions on symptoms, asked for all substance categories, which can be answered as follows: 0 = Not used, 1 = No, 2 = Yes, lifetime, 3 = Yes, last year. Internal consistency using all relevant items is extremely high for each of these categories (alcohol: α = .95, all other drugs: α > .98). Since items are not used to create scales, but are organized according to the criteria of the diagnostic systems ICD-10, DSM-IV and DSM-5, the internal consistency on criteria level is more relevant. It is shown in Table 1 for the drug categories of each diagnostic system, respectively. The criteria of the diagnostic categories harmful use in ICD-10 and substance abuse in DSM-IV are added to the criteria of substance dependence in the table.
All analyses show satisfactory or excellent internal consistency. For instance, the mean internal consistency across diagnostic systems ranges from an alpha of .86 for alcohol (lowest) to .96 for sedatives, hypnotics and anxiolytics (highest). The overall mean alpha was .93. Thus, there is strong evidence of inter-item reliability for all types of drugs in all three diagnostic systems.
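For readers who want to see how an alpha coefficient of this kind is obtained, the following is a minimal sketch of the standard Cronbach's alpha formula, α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name and the data layout (one row of item scores per respondent) are assumptions for illustration; the study's own computations were presumably done in a statistics package.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: list of lists, one inner list of k item scores per respondent.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for four respondents on two items (not study data).
example_alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3], [4, 4]])
```

Alpha approaches 1 when items rise and fall together across respondents, which is the pattern reported for the ADDIS criteria.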

Test-retest correlations of number of fulfilled criteria
Systematic test-retest correlations on number of criteria met for various drugs according to the diagnostic systems are presented in Table 2. For ICD-10, agreement concerning the dichotomous variable harmful use was investigated with absolute as well as systematic agreement.
Systematic agreement on harmful use in ICD-10 ranges from moderate to almost perfect (κ = .46-.93), with the mean bordering on almost perfect (.79). Systematic correlations of dependence criteria in ICD-10 range from substantial to perfect (γ = .76-1.00), with the mean being almost perfect (.92). Systematic correlations concerning number of dependence criteria met in DSM-IV were almost perfect or perfect (γ = .84-1.00), and for substance abuse criteria they range from approaching almost perfect to perfect (γ = .80-1.00). Systematic correlations on number of fulfilled DSM-5 criteria of substance use disorder range from substantial to almost perfect (γ = .78-.99). Consequently, systematic correlations on number of fulfilled criteria are very high for all three diagnostic systems. The mean systematic correlation ranges from .90 to .94, with a summary mean of .92.
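As an illustration of the statistics used in this section, the sketch below computes a gamma coefficient from paired test and retest criterion counts and attaches a verbal label to a coefficient. Two assumptions are made: that the systematic correlation reported here is Goodman and Kruskal's gamma, and that the verbal labels follow the Landis and Koch bands (with 1.00 called "perfect", as in the text). Function names are illustrative.

```python
from itertools import combinations


def goodman_kruskal_gamma(test, retest):
    """Gamma = (C - D) / (C + D) over all pairs of persons, ignoring ties."""
    if len(test) != len(retest):
        raise ValueError("test and retest must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(test, retest), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both occasions rank the pair the same way
        elif product < 0:
            discordant += 1  # the two occasions rank the pair oppositely
    if concordant + discordant == 0:
        raise ValueError("all pairs are tied; gamma is undefined")
    return (concordant - discordant) / (concordant + discordant)


def agreement_label(value):
    """Verbal label for a coefficient, assuming Landis and Koch bands."""
    v = round(value, 2)
    if abs(v) > 1:
        raise ValueError("coefficient must lie between -1 and 1")
    if v < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if v <= upper:
            return label
    return "perfect" if v == 1.0 else "almost perfect"
```

Because tied pairs are left out of both numerator and denominator, gamma can be high even when many persons meet the same number of criteria on both occasions.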

Test-retest agreement on diagnoses of SUDs
Based on number of various criteria met, test-retest reliability of diagnostic proposals according to ICD-10 is presented in Table 3 for each drug category.
For DSM-IV diagnoses, too, systematic agreement varied from fair to almost perfect. Differences, compared to ICD-10, were lower agreement on alcohol (moderate) and cannabis (substantial), while agreement on cocaine and amphetamines increased (almost perfect). Mean systematic agreement was substantial (κ = .69). Absolute agreements were all satisfactory or excellent (range: .87-.97; mean .93), and all systematic correlations were almost perfect or indeed perfect (γ range: .89-1.00, mean .97). Table 5 presents the agreement concerning DSM-5 diagnoses.
As for the other two diagnostic systems, systematic agreement varied from fair to almost perfect. In comparison to DSM-IV, alcohol improved from moderate to substantial; cannabis was, as in ICD-10, almost perfect; and stimulants (cocaine as well as amphetamines/other stimulants) were substantially in agreement. Opioids improved to almost perfect. For sedatives/hypnotics/anxiolytics, hallucinogens and volatile solvents, findings were stable across diagnostic systems. As with ICD-10 and DSM-IV, mean systematic agreement of DSM-5 diagnoses was substantial (κ = .65). Absolute agreement ranged from satisfactory to excellent (87 to 93%, mean = 90), and all systematic correlations were almost perfect or perfect (γ range: .88-1.00, mean .95).

Summing up drug categories
Findings concerning inter-item consistency, and test-retest consistency on criteria as well as diagnostic proposals according to ICD-10, DSM-IV and DSM-5 are summed up for each drug category.
Table 1 Internal consistency (α) on criteria level of substance use disorders for the drug categories of ICD-10, DSM-IV and DSM-5, respectively (n = 30). a) Cannabis in DSM-IV; b) Cocaine, amphetamines and other stimulants are collapsed to stimulants in DSM-5; c) In ICD-10 this includes amphetamine and amphetamine-like substances (not caffeine), while in DSM-5, stimulants also include cocaine; d) Includes opiates as well as synthetic opioids, e.g. analgesics.

Alcohol
Internal consistency is satisfactory (α = .84 to .87). Systematic correlation between test and retest on number of fulfilled criteria ranges from substantial to almost perfect (γ = .78 to .84). Absolute agreement on diagnostic level is satisfactory to excellent (87 to 93%), while systematic agreement ranges from moderate to substantial (κ = .44 to .72), and systematic correlation is almost perfect or indeed perfect (γ = .88 to 1.00).

Sedatives, hypnotics, anxiolytics
Internal consistency is excellent (α = .95 to .97). Systematic correlation on criteria level between test and retest is almost perfect (γ = .84 to .93). On diagnostic level absolute agreement between test and retest is satisfactory to excellent (90 to 97%), while systematic agreement is almost perfect (κ = .81 to .93), and systematic correlation is almost perfect or perfect (γ = .98 to 1.00).

Cannabis/cannabinoids
Internal consistency is excellent (α = .94 to .97). Systematic correlation on criteria level between test and retest is almost perfect (γ = .94 to .98). On diagnostic level absolute agreement between test and retest is excellent (90 to 93%), while systematic agreement is substantial or almost perfect (κ = .79 to .87), and systematic correlation is almost perfect or perfect (γ = .97 to 1.00).

Cocaine, amphetamine and other stimulants
Internal consistency is excellent (α = .91 to .94). Systematic correlation on criteria level between test and retest is almost perfect or perfect (γ = .90 to 1.00). On diagnostic level absolute agreement between test and retest is satisfactory to excellent (80 to 97%), while systematic agreement ranges from moderate to almost perfect (κ = .50 to .89), and systematic correlation is almost perfect or perfect (γ = .92 to 1.00).

Opioids
Internal consistency is excellent (α = .94 to .95). Systematic correlation on criteria level between test and retest is substantial to almost perfect (γ = .76 to .97). On diagnostic level absolute agreement between test and retest is excellent (90 to 93%), while systematic agreement is substantial or almost perfect (κ = .77 to .85), and systematic correlation is almost perfect or perfect (γ = .97 to 1.00).
Table 3 Agreement between test and retest for ICD-10

Hallucinogens
Internal consistency is excellent (α = .90 to .98). Systematic correlation on criteria level between test and retest is almost perfect or indeed perfect (γ = .88 to 1.00). On diagnostic level absolute agreement between test and retest is excellent (90 to 93%), while systematic agreement is only fair (κ = .21 to .31), and the systematic correlation is almost perfect or perfect (γ = .93 to 1.00).

Volatile solvents
Internal consistency is satisfactory to excellent (α = .89 to .98). Systematic correlation on criteria level between test and retest is almost perfect to perfect (γ = .93 to 1.00). On diagnostic level absolute agreement between test and retest is excellent (all 93%), while systematic agreement is moderate (all κ = .47), and systematic correlation is almost perfect (all γ = .93).

Reliability of scales concerning stress, anxiety and depression
The scales for screening stressful life events, anxiety and depression were analysed for inter-item as well as for test-retest consistency (n = 27). Internal consistency was satisfactory for stress (α = .72), and excellent for anxiety and depression (α = .90 and .96, respectively). Test-retest showed almost perfect systematic correlation for all scales (stressful life events: γ = .85; anxiety: γ = .96, and depression: γ = .92).

Discussion
The study has some obvious limitations. It was conducted without external funding. A convenience sample was used. The cooperating treatment facilities conducted interviews on their own budgets. The sample is therefore smaller than would have been preferred, but about the same as in similar studies, e.g. [20-22]. Due to the variety of patients enrolled, and despite the relatively small sample, it was possible to include all types of drugs that might be assessed using ADDIS with more than five cases for each condition, a cut-off used in e.g. Zanarini et al. [16]. These drugs were, as expected, used by different numbers of patients, resulting in some skewness in the number of cases vs. non-cases. Systematic agreement (Cohen's Kappa) and systematic correlation (Gamma) handle such skewness to some degree. While Kappa tends to be lower than Gamma on skewed categories, Gamma tends to be more conservative in testing statistical significance (p-values). We therefore present both of these. The relatively lower Kappa values for hallucinogens, volatile solvents and alcohol may be related to skewness of these variables, with relatively few who used hallucinogens and volatile solvents and with almost everyone having some problem concerning alcohol. The problem was not, however, indicated by absolute agreement or systematic correlation.
Taking all drug categories together, ADDIS has an excellent overall mean internal consistency of .93. Test-retest correlation on number of fulfilled criteria is very high for all three diagnostic systems. The mean systematic correlations on criteria level for various drugs and diagnostic systems are almost perfect (.92-.94). At the level of diagnostic proposals across all three diagnostic systems, mean systematic agreement is substantial (κ = .65-.69), while mean absolute agreement is excellent (90-93%), and mean systematic correlation is almost perfect (γ = .95-.97).
In addition, screening tests on stressful life events, anxiety and depression showed satisfactory to excellent internal consistency (α = .72, .90 and .96, respectively), and almost perfect global consistency for all scales (γ = .84-.96). Despite the small sample, the reliability findings are sufficiently striking to indicate that ADDIS consistently provides substance-specific diagnostic documentation. ADDIS is the only currently used instrument in Swedish that is capable of providing substance-specific diagnostic information in an interview brief enough to be practical in routine clinical practice. Further research with a larger and more diverse sample is indicated to extend the findings of the current study.
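The effect of skewness on Kappa can be illustrated with a small worked example. The sketch below computes absolute agreement, Cohen's Kappa and Gamma (which for a 2x2 table equals Yule's Q) from the four cells of a test-retest table. The cell counts are hypothetical and chosen only to show the pattern for n = 30; they are not the study's data, and the function name is an assumption.

```python
def two_by_two_stats(both_pos, test_only, retest_only, both_neg):
    """Absolute agreement, Cohen's kappa and gamma for a 2x2 test-retest table."""
    n = both_pos + test_only + retest_only + both_neg
    observed = (both_pos + both_neg) / n  # absolute agreement
    p_test = (both_pos + test_only) / n  # proportion of cases at test
    p_retest = (both_pos + retest_only) / n  # proportion of cases at retest
    # Agreement expected by chance from the marginal proportions.
    expected = p_test * p_retest + (1 - p_test) * (1 - p_retest)
    if expected == 1:
        raise ValueError("no variation in diagnoses; kappa is undefined")
    kappa = (observed - expected) / (1 - expected)
    concordant = both_pos * both_neg
    discordant = test_only * retest_only
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined for this table")
    gamma = (concordant - discordant) / (concordant + discordant)
    return observed, kappa, gamma


# Hypothetical skewed table: very few cases, two disagreements out of 30.
skewed = two_by_two_stats(both_pos=1, test_only=1, retest_only=1, both_neg=27)
# Hypothetical balanced table with the same two disagreements out of 30.
balanced = two_by_two_stats(both_pos=14, test_only=1, retest_only=1, both_neg=14)
```

Both tables have the same absolute agreement (28 of 30), yet Kappa is about .46 in the skewed table and about .87 in the balanced one, while Gamma stays above .90 in both. This is the pattern described above for rarely used substances.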