Evaluation of the psychometric properties of self-reported measures of alcohol consumption: a COSMIN systematic review

McKenna, Hannah; Treanor, Charlene; O’Reilly, Dermot; Donnelly, Michael

doi:10.1186/s13011-018-0143-8

Substance Abuse Treatment, Prevention, and Policy

Table 2 Summary of characteristics and psychometric properties for included studies

Author (country)	Study Population	Methods used	Studies and measures	Psychometric properties reported by studies	COSMIN quality ratings
Bonevski et al. (2010) Australia	Group 1 was 30% male and 70% female, Group 2 37% male and 63% female, Group 3 44% male and 56% female and Group 4 41% male and 59% female. Group 1 mean age 25 years. Group 2 mean age 27 years. Group 3 mean age 25 years. Group 4 mean age 25 years.	Participants were asked to recall alcohol intake using either a computer or paper administered measure. 4–7 days later both modes of measures were administered again.	Weekly quantity-frequency measure.	Test-retest reliability-kappa coefficient range (0.90–0.96). Test-retest reliability was good.	Test-retest reliability (poor)
Chaikelson et al. (1994) Canada	Random sampling was used. The sample was 100% male with mean age 69 years. Wives were also asked same questions via written questionnaire to assess concordance.	Results compared to alcohol test the MAST (Michigan Alcoholism Screening Test [55]) for reliability and validity.	Short-term recall measure (drinking occasions in the previous month recall).	Test-retest reliability-kappa coefficients (0.76) total lifetime drinking, (0.84) last reported month and (0.77) monthly alcohol consumption indicating good test-retest reliability. Concurrent validity-correlations between self-reports (0.87) husband alcohol intake and (0.85) wife alcohol intake indicating good criterion validity. Construct validity-correlations with the MAST self-report test in 1987 (0.60) with total lifetime drinking (0.05) with current drinking. Correlations with 1990 data (0.53) with total lifetime drinking (−0.14) with current drinking. Construct validity shows moderate reported correlation.	Test-retest reliability (fair) Criterion validity (poor) Construct validity (poor)
Crum et al. (2002) USA	Random sampling was used. The sample was 58% female and 42% male with mean age 76.2 years. Data was obtained from the 1993–1994 follow-up of the Washington County cohort of men and women 65 years and older.	Participants completed a measure of their usual alcohol consumption in two ways: (1) a quantity-frequency measure; (2) same questions asked in an interview about drinking habits.	Weekly quantity-frequency measure. Short-term recall measure (past week recall).	Hypothesis validity-past week recall of alcohol intake 15–20% lower than the quantity-frequency measure. Hypothesis validity was good. Inter-rater reliability-kappa statistic value 0.76 indicating good inter-rater reliability.	Hypothesis validity (good) Inter-rater reliability (poor)
Cutler et al. (1988) UK	Random sampling was used. 63.4% of the sample were male and 36.6% female. No median or mean age was reported but participants were aged 18 and older.	CAGE responses and the quantity-frequency questions taken from Health Survey Questionnaire were compared.	Weekly quantity-frequency measure.	Criterion validity-sensitivity (42.9) specificity (97.1) positive predictive value (65.8) negative predictive value (92.8) for males and sensitivity (46.6) specificity (98.6) positive predictive value (50.3) negative predictive value (98.4) for females indicating good criterion validity.	Criterion validity (excellent)
Dollinger et al. (2009) USA	The sample was composed of volunteers and was 61% female and 39% male with a mean age 22 years.	Responses to quantity-frequency measures at both time points compared. Nightly log of alcohol consumption compared to hours spent studying, socialising and religious behaviours.	Daily graduated-frequency measure. Short-term recall measure (daily alcohol intake recall).	Test-retest reliability-alcohol quantity coefficient of 0.85 and an alcohol frequency coefficient of 0.84 indicating good test-retest reliability. Divergent validity-religion-by-alcohol correlations were negative with values from −0.14 to −0.37. Convergent validity-positive correlations with alcohol with values of 0.40 and 0.41 respectively. Good divergent and convergent validity were reported.	Test-retest reliability (fair) Divergent validity (fair) Convergent validity (fair)
Greenfield et al. (2014) USA	Random sampling was used. Respondents were 48.1% male and 53.2% female and aged over 18 years.	Participants completed questionnaires and a follow-up survey by phone or mail.	Short-term recall measure (occasions of ≥5 drinks during specific life decades).	Test-retest reliability-kappa values for gender (0.64–0.80), age groups (0.59–0.83), ethnicity (0.70–0.73), interview mode (0.72–0.73) and childhood victimisation (0.75) (0.73) indicating moderate to good test-retest reliability. Predictive validity-disclosure of prior heavy drinking increased risk for alcohol dependence by 18%, increased risk of consequences by 21% (by 15% when age of onset was controlled), increased risk for alcohol-use disorder by 18% indicating good predictive validity.	Test-retest reliability (fair) Predictive validity (fair)
Gruenewald et al. (1995) USA	Random sampling was used. Respondents were 43.5% male and 56.5% female and aged 18 years or older.	Responses to graduated-frequency measures at two time points compared.	Gruenewald et al. (1995) Monthly graduated-frequency measure	Test-retest reliability-coefficients for average drinking quantity r = 0.76 and for variance in drinking quantities r = 0.78, indicating good test-retest reliability.	Test-retest reliability (fair)
Hansell et al. (2008) Australia	Random sampling was used. Respondents were 40% male and 60% female and aged between 19 and 90 years old.	The measures examined were a dependence score, based on DSM-IIIR (Diagnostic and Statistical Manual of Mental Disorders [56]) and DSM-IV criteria for substance dependence, and a quantity × frequency of alcohol consumed taken from the quantity-frequency measure.	Annual quantity-frequency measure	Test-retest reliability-continuous data quantity x frequency of alcohol (0.61) between phase 1 and phase 3, and (0.55) between phase 2 and phase 3. Categorical data quantity x frequency of alcohol (0.64) between phase 1 and phase 3, and (0.59) between phase 2 and phase 3, indicating moderate test-retest reliability.	Test-retest reliability (poor)
Hilton (1989) USA	Volunteer sample. Respondents were 50% male and 50% female and had a mean age of 30 years. The volunteer participants were recruited from the San Francisco Bay Area newspaper.	Participants completed 2 retrospective recall measures-graduated-frequency and beverage-specific quantity-frequency measures post diary completion. Responses compared.	Short-term recall measure (10 week recall). Graduated-frequency measure (30 day recall). Beverage specific Quantity-frequency measure (2 week recall).	Convergent validity-correlations 0.88 for volume of drinks consumed, 0.85 for days of beer consumed, 0.89 for days of beer usually consumed, 0.80 for days of wine consumed, 0.66 for days of wine usually consumed, 0.81 for days of liquor consumed and 0.65 for days of liquor usually consumed, indicating moderate to good convergent validity.	Convergent validity (fair)
Koppes et al. (2002) Netherlands	Random sampling was used. Respondents were 46% male and 54% female with mean age 36 years. Data was collected from 1 time point, the 2000 follow-up measurement of 171 male and 197 female participants from the Amsterdam Growth and Health Longitudinal Study.	Subjects visited study premises for 1 day. The quantity-frequency measure and dietary history interview were based on alcohol consumption over the previous month and were completed in no particular order.	Quantity-frequency measure (ranging from never drinking to daily alcohol intake). Short-term recall measure (dietary history interview).	Concurrent validity-correlation between (0.77) for men and (0.87) for women, which indicates good concurrent validity.	Criterion validity (poor)
LaBrie et al. (2004) USA	The sample was composed of volunteers and was 100% male with a mean age of 20.6 years. 211 male college students participated.	Drinking variables assessed were drinking days, average drinks, and total drinks during a 30-day period.	Short-term recall measure (monthly TimeLine follow back method).	Convergent validity-correlation coefficients between 0.52–0.69 showing moderate convergent validity.	Convergent validity (fair)
Lennox et al. (1996) USA	Analysis was conducted of a sample of a household survey aged 18–64 years. Gender proportions were not reported. Responses were analysed from 1 time point (the 1991 follow-up) from 8755 participants in the 1988 National Household Survey of Drug Abuse.	Used a latent variable approach. In this model covariation among multiple indicators was used as an estimate of the latent construct.	Quantity-frequency measure of alcohol consumption over past 30 days.	Structural validity-correlations at 0.36, alcohol abuse and consequences between constructs correlates at 0.28 showing poor structural validity.	Structural validity (fair)
McGinley et al. (2014) USA	A sample of 18–20 year olds were selected from respondents to the National Survey on Drug Use and Health. Gender proportions were not reported.	Quantity and frequency of alcohol consumption estimates derived from graduated-frequency measure. Estimates compared to the quantity-frequency measure.	Graduated-frequency measure of alcohol consumption over past 30 days.	Construct validity-mid values for quantity of alcohol consumed were (3.5) and (14.5) for frequency indicating poor construct validity.	Construct validity (fair)
Northcote and Livingston (2011) Australia	Respondents were 47.3% male and 53.3% female and aged 18–25 years.	Participants reported number of alcoholic drinks consumed 1–2 days after drinking occasion which was compared to reported alcohol intake observed by peer-based researchers on the occasion.	Short-term recall measure (last occasion self-report of drinks consumed).	Criterion validity-significant associations with p values of 0.6, 0.31, 0.04 and < 0.01 for: up to 4 drinks, 5–8 drinks, 9–12 drinks and more than 12 drinks respectively indicating good criterion validity for respondents consuming ≥9 drinks. Convergent validity-significant at 0.74, with gender specific correlations for men as 0.79 and women 0.60. Moderate to good convergent validity was reported.	Criterion validity (poor)
O’Hare et al. (1991) USA	Respondents were 41.6% female 58.4% male and with mean age 20.6 years.	Participants were asked to complete mailed questionnaire with both measures of alcohol consumption included.	Weekly graduated-frequency measure. Short-term recall measure (retrospective recall of past 7 day alcohol intake).	Convergent validity-correlations were significant at 0.74, with gender specific correlations for men as 0.79 and women 0.60, indicating moderate to good convergent validity.	Convergent validity (good)
O’Hare et al. (1997) USA	Random sample of an undergraduate university population. Gender proportions were reported as ‘representative of sex’. Respondents had a mean age of 18.7 years.	All students completed quantity-frequency questions, MmMAST and 7 day recall. The MmMAST was used as a criterion variable.	Weekly graduated-frequency measure. Short-term recall measure (retrospective recall of past 7 day alcohol intake).	Criterion validity-association was significant at p < 0.01 indicating good criterion validity. Predictive validity-sensitivity and specificity values were 76 and 59.8 for the recall measure. Using MAST cut off score ≥ 2 sensitivity and specificity values were 59.7 and 70.9 indicating moderate to good predictive validity.	Criterion validity (fair) Predictive validity (fair)
Parker et al. (1996) USA	Random sampling was used. Respondents were 39% male and 61% female and aged 18–64. Data was taken from surveys 1987–1989, 1989–1990 and 1992–1993 of the Pawtucket Health Program conducted among home dwelling adults.	Alcohol intake assessed with food frequency question as a component of the general health survey was compared against alcohol intake assessed with a graduated-frequency measure as part of a survey.	Short-term recall measure (beverage specific past 24 h recall). Annual graduated-frequency measure	Concurrent validity-kappa statistics reported between measures ranged from 0.08 (p < 0.001), 0.38 (p < 0.001) and 0.81 (p < 0.001), indicating good concurrent validity for high consumers of alcohol only. Inter-rater reliability-kappa values for both measures were (0.28–0.47). Inter-rater reliability was poor (below 0.70).	Criterion validity (poor) Inter-rater Reliability (fair)
Poikolainen et al. (2002) Finland	Volunteer sample recruited from their workplace. Respondents were 83% female and 17% male with a mean age of 42 years.	Quantity-frequency and graduated-frequency obtained before and after 1-month daily recall on alcohol intake. Blood sample obtained at outset.	Annual quantity-frequency questionnaire. Daily graduated-frequency measure. Short-term recall measure (past month recall of intake).	Convergent validity-coefficients were 0.95 between the short-term recall measure and quantity-frequency 1, 0.95 between the short-term recall measure and quantity-frequency 2, 0.90 between the short-term recall measure and graduated-frequency 1 and 0.93 between the short-term recall measure and graduated-frequency 2. Convergent validity was reported as good.	Convergent validity (good)
Read et al. (2006) USA	College students who reported drinking different amounts of alcohol were selected for the sample to be representative of variation in drinking levels. Respondents were 52% female and 48% male with a mean age 19 years.	College students completed self-report questionnaire on demographic characteristics, drinking behaviours and drinking consequences. Drinking consequences assessed with composite measure based on Drinker Inventory of Consequences and Young Adult Alcohol Problem Screening Test developed by researchers.	Short-term recall measure (past 90 day intake).	Concurrent validity-correlation values of 0.36, p < 0.001 and with quantities of alcohol consumed with an r value of 0.31, p < 0.001, indicating poor concurrent validity.	Criterion validity (excellent)
Rehm et al. (1999) Canada	The sample was chosen to be representative of the wider drinking population. Respondents were 48% male and 52% female, and chosen to be representative of age ≥ 18 years.	Population samples from 4 surveys conducted for Alcohol Research Group. Surveys used computer-assisted telephone interviews with random digit dialling sampling techniques.	Quantity-frequency measure for drinking occasion. Annual Graduated-frequency measure. Short-term recall measure (past week recall).	Convergent validity-correlations moderate at both approximately 0.40. Predictive validity-estimates by graduated-frequency measure 22% higher than short-term recall estimate. Quantity-frequency estimate of alcohol-related mortality 13% than short-term recall estimate, indicating poor predictive validity.	Convergent validity (fair) Predictive validity (excellent)
Reid et al. (2003) USA	Random sampling was used. The veteran primary care sample was 3% female 97% male and the community dwelling sample was 60% female 40% male. Mean ages were 73.1 for the veteran primary care sample and 75.9 for the community dwelling sample.	Telephone call allowed self-report of quantity-frequency measure, binge and heavy drinking questions, and the AUDIT (Alcohol Use Disorders Identification Test [44]) and CAGE (Cut down, Annoyed, Guilty, Eye-opener [45]) tests.	Weekly quantity-frequency measure.	Inter-rater reliability-kappa values were 0.44 and 0.33. For population sample 2 kappa values were 0.21 and 0.46 indicating moderate to poor inter-rater reliability.	Inter-rater Reliability (fair)
Russell et al. (1991) USA	Random sampling was used. Respondents were 50.5% male and 49.5% female and aged over 18 years. Data was taken from 1 time point of the survey.	Quantity-frequency questions were asked about the amount and frequency of particular alcoholic beverages consumed via telephone interview using a random-digit-dial technique and supplemented by samples of homeless people, college students and those without telephones.	Typical annual beverage-specific Quantity-frequency measure	Criterion validity-correlations between 0.73 and 0.77 for subtypes of alcohol reported showing good criterion validity.	Criterion validity (poor)
Sander et al. (1997) USA	175 patients with traumatic brain injury were recruited from a medical rehabilitation centre along with their relatives. Respondents were 65% male and 35% female. Mean age 39.2 years for patients and 45.9 years for relatives.	Alcohol use examined 1 year after injury through quantity-frequency measure and brief MAST test. Patients and their relatives both completed measures and concordance between reports were examined.	Annual quantity-frequency measure	Concurrent validity-concordance showed 95.4% agreement indicating good criterion validity.	Criterion validity (fair)
Searles et al. (1995) USA	The sample was chosen to be representative of male drinking population in Vermont enrolled in the Alcohol Research Centre. Respondents had a median age of 28 years (ranging from 21 to 56 years) and were 100% male.	Subjects self-reported daily alcohol intake via telephone. At 90 days subjects completed an interview using DSM criteria to assess alcohol abuse or dependence.	Short-term recall measure (Daily self-report of alcohol intake). Short-term recall measure (annual retrospective recall).	Predictive validity-correlations 0.86 and with alcohol related problems level as 0.69. Predictive validity is moderate between daily self-report and retrospective recall and alcohol related problems, and good between daily self-report and retrospective recall and alcohol intoxication level.	Predictive validity (poor)
Searles et al. (2000) USA	Volunteer sample of those enrolled in the Vermont Alcohol Research Centre. Respondents were 100% male and had a mean age of 36.2 years for those without alcohol problems tested at outset and 30.4 years for those with alcohol problems.	Participants recorded alcohol intake on interactive voice response system using telephones. In person interviews were conducted every 13 weeks during which they completed timeline follow back. Results were compared.	Short-term recall measure (Timeline Follow back over 366 days). Short-term recall measure (Daily self-report of alcohol intake).	Convergent validity-correlations 0.60 at 180 days of administration, 0.57 at 270 days of administration and 0.57 at 366 days of administration, indicating moderate convergent validity.	Convergent validity (fair)
Tuunanen et al. (2013) Finland	The sample included 45 year olds resident in Finnish city of Tampere. The sample was 100% male.	Participants completed a mailed health questionnaire which invited previous week recall of alcohol intake, a quantity-frequency measure and structured quantity-frequency questions based on the AUDIT.	Quantity-frequency measure (typical drinks consumed per occasion). Short-term recall measure (past week recall).	Hypothesis validity-the past week recall measure reported mean alcohol consumption lower than the quantity-frequency measure indicating good hypothesis validity.	Hypothesis validity (fair)
Weingardt et al. (1998) USA	Random sampling was used. Respondents were 58% female and 42% male and aged 18–20 years. Data was taken from 1990 and 1994 cohorts of college undergraduate students.	Peak consumption, typical weekend quantity and typical daily quantity measures used to derive binge drinking data to analyse validity. Binge drinking defined as 5–6 drinks per occasion for men and 3–4 drinks per occasion for women.	Graduated-frequency measure (peak monthly alcohol consumption). Graduated-frequency measure (typical weekend quantity). Short-term recall measure (typical daily quantity).	Concurrent validity-r value 0.57 and Alcohol Dependence Scale with r value 0.54. Predictive validity-daily quantity measure classified 6.2% of drinkers as chronic and 7.4% indicating poor predictive validity.	Criterion validity (good) Predictive validity (good)
Whitfield et al. (2004) Australia	Voluntary sample. Respondents were 36% male and 64% female with a mean age of 33.7 years. Data was taken from 3 waves (1980, 1989 and 1993) using adult male and female participants of the Australian Twin Registry.	Test-retest reliability was calculated as correlations between occasions and between measures. Relationships between alcohol use and lifetime DSMIIIR alcohol dependence examined.	Annual quantity-frequency measure. Short-term recall measure (past week recall of alcohol intake).	Test-retest reliability-correlations between (0.54–0.70) indicating moderate to good test-retest reliability.	Test-retest reliability (fair)

Table Legend: Table summarising the characteristics, findings and COSMIN quality ratings of included studies grouped by study author, study population, methods used, studies and measures, psychometric properties reported by study authors and COSMIN quality ratings

ISSN: 1747-597X