Evaluation of the psychometric properties of self-reported measures of alcohol consumption: a COSMIN systematic review
Substance Abuse Treatment, Prevention, and Policy volume 13, Article number: 6 (2018)
To review studies about the reliability and validity of self-reported alcohol consumption measures among adults, an area which needs updating to reflect current research.
Databases (PUBMED (1966-present), MEDLINE (1946-present), EMBASE (1947-present), Cumulative Index of Nursing and Allied Health Literature (CINAHL) (1937-present), PsycINFO (1887-present) and Social Science Citation Index (1976-present)) were searched systematically for studies from inception to 11th August 2017. Pairs of independent reviewers screened study titles, abstracts and full texts with high agreement and a third author resolved disagreements. A comprehensive quality assessment was conducted of the reported psychometric properties of measures of alcohol consumption using the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) to derive ratings of poor, fair, good or excellent for each checklist item relating to each psychometric property.
Twenty-eight studies met inclusion criteria and, collectively, they investigated twenty-one short-term recall measures, fourteen quantity-frequency measures and eleven graduated-frequency measures. All measures demonstrated adequate/good test-retest reliability and convergent validity. Quantity-frequency measures demonstrated adequate/good criterion validity; graduated-frequency and short-term recall measures demonstrated adequate/good divergent validity. Quantity-frequency measures and short-term recall measures demonstrated adequate/good hypothesis validity; short-term recall measures demonstrated adequate construct validity. Methodological quality varied within and between studies.
It was difficult to discern conclusively which measure was the most reliable and valid given that no study assessed all psychometric properties and the included studies varied in the psychometric properties that they selected to assess. However, when the results from the range of studies were considered and summed, they tended to indicate that the quantity-frequency measure compared to the other two measures performed best in psychometric terms and, therefore, it is likely to produce the most reliable and valid assessment of alcohol consumption in population surveys.
Alcohol use and associated consequences are a major public health problem, described as the third leading risk factor for poor health globally. Recently, revised guidelines from the UK (United Kingdom) Chief Medical Officers advised adults about the likely harmful health effects of drinking more than 14 units/week, which is approximately six 175 ml glasses of (13%) wine, six 568 ml pints of (4%) lager or ale or (4.5%) cider or fourteen 25 ml measures of (40%) spirits (1 unit is 10 ml or 8 g of pure alcohol) in the UK. The Global Burden of Disease Survey identified alcohol as a top five risk factor for non-communicable disease in the UK. It is important that reliable and valid measures are used to monitor and assess alcohol misuse and related problems and, in turn, to inform public health strategies.
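The unit arithmetic behind the 14 units/week guideline can be checked with a short calculation. The sketch below is illustrative only: it assumes the UK definition given in the text (1 unit = 10 ml of pure alcohol), and the function names `uk_units` and `weekly_units` are hypothetical helpers, not part of any guideline or survey instrument.

```python
def uk_units(volume_ml: float, abv_percent: float) -> float:
    """UK units in one serving, assuming 1 unit = 10 ml of pure alcohol."""
    # Pure alcohol in ml is volume * ABV / 100; dividing by 10 converts to units.
    return volume_ml * abv_percent / 1000


def weekly_units(servings: int, volume_ml: float, abv_percent: float) -> float:
    """Total units for a number of identical servings in one week."""
    return servings * uk_units(volume_ml, abv_percent)


if __name__ == "__main__":
    # Serving sizes and strengths are the ones quoted in the text.
    examples = {
        "six 175 ml glasses of 13% wine": (6, 175, 13),
        "six 568 ml pints of 4% lager": (6, 568, 4),
        "six 568 ml pints of 4.5% cider": (6, 568, 4.5),
        "fourteen 25 ml measures of 40% spirits": (14, 25, 40),
    }
    for label, (n, ml, abv) in examples.items():
        print(f"{label}: {weekly_units(n, ml, abv):.2f} units")
```

Under this definition the wine, lager and spirits examples come to roughly 13.7, 13.6 and 14.0 units, while six pints of 4.5% cider comes to roughly 15.3 units, so the equivalences quoted are approximate.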
Our initial scoping exercise indicated that data about alcohol intake tends to be collected in surveys using one or more of the following three types of self-report questionnaires: Quantity-frequency measures ask questions about ‘usual’ alcohol drinking to estimate the frequency (e.g. number of days per week) and volume of alcohol consumed (e.g. ‘how many (cans/bottles/ glasses) were consumed on a typical drinking day’ [5,6,7]). Graduated-frequency questionnaires measure the volume of consumed alcohol by grouping the number of drinks per occasion into graduated categories, beginning typically with the highest amount consumed by a respondent and decreasing in pre-set categories (e.g. ‘During the last 12-months, how often did you have 12 or more drinks of any kind of alcoholic beverage in a single day?’ ‘During the last 12 months, how often did you have at least 8 but less than 12 drinks of any kind of alcoholic beverage in a single day?’ [8, 9]). Short-term recall measures ask respondents to recall the alcohol that they consumed within a predetermined timeframe such as during the previous week or the last 24-h (e.g. the ‘Yesterday’ method) or using a diary to record all alcohol consumption over a period of time [10, 11].
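How each of the three measure types turns questionnaire answers into a consumption estimate can be sketched in a few lines. This is a minimal illustration rather than the scoring rule of any included study: the function names are hypothetical, and the graduated-frequency calculation assumes that each band is represented by its midpoint and that the open-ended top band is given an arbitrary upper limit.

```python
from typing import Iterable, Sequence, Tuple


def qf_weekly_drinks(drinking_days_per_week: float, drinks_per_drinking_day: float) -> float:
    """Quantity-frequency: usual frequency multiplied by usual quantity."""
    return drinking_days_per_week * drinks_per_drinking_day


def gf_annual_drinks(bands: Iterable[Tuple[float, float, float]]) -> float:
    """Graduated-frequency: sum over bands of (band midpoint * days per year).

    Each band is (lower_drinks, upper_drinks, days_per_year).
    Using the midpoint is an assumption made for this sketch.
    """
    return sum((low + high) / 2 * days for low, high, days in bands)


def recall_weekly_drinks(daily_drinks: Sequence[float]) -> float:
    """Short-term recall: add up the drinks reported for each recalled day."""
    return sum(daily_drinks)


if __name__ == "__main__":
    # Hypothetical respondent; the upper limit of 15 for the top band is assumed.
    bands = [(12, 15, 2), (8, 12, 10), (5, 8, 20), (1, 5, 100)]
    print(qf_weekly_drinks(3, 4))
    print(gf_annual_drinks(bands))
    print(recall_weekly_drinks([0, 2, 0, 0, 5, 6, 1]))
```

The contrast is that the quantity-frequency estimate rests on a single 'usual' pattern, the graduated-frequency estimate keeps heavier and lighter drinking days separate, and the recall estimate reflects only the specific days asked about.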
There is a need to ensure that survey instruments measure alcohol consumption accurately in order to identify the population of drinkers who consume over 14 units of alcohol per week, or misuse alcohol. In this review alcohol misuse is defined as ‘drinking excessively – more than the lower-risk limits of alcohol consumption’. Gmel conducted a literature review of self-report measures (the quantity-frequency, graduated-frequency and short-term recall measures) compared to biological tests (i.e. blood alcohol concentration) using studies published in this field since 2004; and Feunekes conducted a systematic review of studies published 1984–1999 on the capacity of the quantity-frequency, extended quantity-frequency, retrospective diary, prospective diary, and 24-h recall measures, respectively, to classify individuals according to their alcohol intake. These previous reviews are outdated and not in keeping with advances in survey methodology and design concerning alcohol research or with public health guideline changes (such as the reduction in alcohol guidelines in the UK). This paper presents the results of a systematic review of all relevant research evidence regarding the reliability and validity of different types of survey measures of self-reported alcohol consumption in the adult population. Reliability and validity in this review are defined by the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) methodology. COSMIN provided an iterative way of assessing the psychometric properties of included measures. The review adds to previous research by providing the first COSMIN-type review of alcohol intake measures as well as providing an updated review of the alcohol consumption measures. This review addressed the following questions:
Are self-reporting measures (the quantity-frequency, graduated-frequency and short-term recall measures) reliable and valid in their assessment of alcohol consumption for the general population? If so, which of the self-reporting measures are most reliable and valid? Which measure most accurately identifies levels of alcohol consumption? The use of a reliable and valid measure in alcohol survey research will enhance the rigour and comparability of studies.
The review was reported in accordance with PRISMA guidelines (see checklist attached as Additional file 1). No protocol exists for this review. Study authors searched PUBMED (1966-present), MEDLINE (1946-present), EMBASE (1947-present), CINAHL (1937-present), PsycINFO (1887-present) and SSCI (1976-present) from their inception to 11th August 2017 for peer-reviewed articles. Search terms were based on a COSMIN search filter to identify studies of psychometric properties, combined with terms relevant to alcohol intake measures (Fig. 1).
Papers were included if they were English language peer-reviewed studies that evaluated the reliability or validity of survey measures of alcohol consumption that were ‘self-completed’ by adults aged ≥18 years via telephone, paper, computer or interview. Studies were included if they assessed the reliability or validity of self-report alcohol consumption measures (the quantity-frequency, graduated-frequency or short-term recall measures or any variation of these measures). Studies were excluded if they did not focus on reliability or validity, if they were reviews of the literature, or if study participants had a mental or alcohol disorder diagnosis, were in receipt of treatment for alcohol misuse or were being cared for in a care institution. The review focused upon evaluating the psychometric properties of alcohol consumption measurement for the general drinking population; previous research indicates that people with an alcohol use disorder diagnosis tend to self-report differently from other drinkers (see discussion). Studies were also excluded if they measured alcohol consumption using other methods only (biological testing or self-reporting alcohol tests).
Titles were exported to Refworks, duplicates were removed, and titles and then suitable abstracts were screened and examined by HMcK, CT and MD independently. Cases of disagreement over study inclusion were resolved via review and discussion. Data collection from eligible studies involved extracting information about population characteristics, measures, results and COSMIN quality ratings onto an Excel spreadsheet (see Table 2). This was completed by HMcK and checked by other reviewers. Reference lists of literature reviews and citation lists of included studies were searched for relevant papers. The search strategy identified 806 studies after duplicate removal; 478 remained following examination of abstracts and 28 papers were included following full-text review (Fig. 2).
Pairs of independent reviewers applied the well-validated COSMIN checklist to assess the methodological quality of included studies. Definitions of the psychometric properties are provided by COSMIN (see Table 1). Information (e.g. coefficients) on psychometric properties reported on each measure by included studies was assessed using the quality criteria checklist created by Terwee, which generated ratings of good, moderate or poor. An additional methodological quality score was calculated for each psychometric property checklist using the ‘worst score counts’ method, where the lowest rating of any of the items in an individual psychometric property checklist is taken as the overall score for that property. Risk of bias (where evidence reported by studies may not be trustworthy) was accounted for by assessing methodological quality of studies. It is important to note that the review reported the properties that were recorded in the original articles and that most articles did not assess or report the full range of properties recommended by COSMIN.
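The ‘worst score counts’ rule is simple enough to express directly. The following is a minimal sketch, assuming the four-point methodological quality scale named in this review (poor, fair, good, excellent); the function name and the example item ratings are hypothetical and are not taken from any included study.

```python
# Ratings ordered from lowest to highest methodological quality.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the overall rating for one psychometric property checklist.

    The overall rating is the lowest rating given to any single item.
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    unknown = [r for r in item_ratings if r not in RATING_ORDER]
    if unknown:
        raise ValueError(f"unrecognised ratings: {unknown}")
    return min(item_ratings, key=RATING_ORDER.index)


if __name__ == "__main__":
    # Hypothetical checklist: mostly strong items, one fair item.
    items = ["excellent", "good", "fair", "excellent"]
    print(worst_score_counts(items))
```

A single weak item therefore sets the overall rating regardless of how many items score well.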
Table 2 presents the characteristics and results from the 28 papers that met inclusion criteria. It acts as a summary of the content from Additional file 2: Tables S1 and S2 which are included as Additional files 2 and 3. Included studies reported drinks/alcohol measures in standard sizes for the country of publication (see Additional file 2: Table S1). Some studies included beverage specific measures. Studies were conducted in the USA (n = 18), Australia (n = 4), Canada (n = 2), Finland (n = 2), UK (n = 1) and the Netherlands (n = 1). Most studies included short-term recall measures (n = 21), quantity-frequency measures (n = 14) and graduated-frequency measures (n = 11). Convergent validity (n = 15), criterion validity (n = 14), test-retest reliability (n = 10), predictive validity (n = 9), inter-rater reliability (n = 5), hypothesis validity (n = 4), construct validity (n = 2), divergent validity (n = 2), and structural validity (n = 1) were assessed across the studies. Some studies assessed the psychometric properties of more than one measure and measure type but not one study assessed all COSMIN psychometric properties.
Methodological quality assessment
There was wide variation in methodological quality ratings for each psychometric property (as presented and discussed below).
Quantity-frequency measures achieved criterion validity ratings of excellent (n = 1), fair (n = 1) and poor (n = 2). Test-retest reliability quality ratings were good (n = 1), fair (n = 1) and poor (n = 2), with inter-rater reliability rated fair (n = 1) and poor (n = 1). Convergent validity ratings were good (n = 1) and fair (n = 2). Hypothesis validity was rated good (n = 1) and fair (n = 1). Predictive validity was rated excellent (n = 1) and structural validity fair (n = 1).
The graduated-frequency measures achieved convergent validity ratings of good (n = 2) and fair (n = 3). Test-retest reliability ratings were rated fair (n = 2) and good (n = 1) and inter-rater reliability was also rated fair (n = 1). Criterion validity was rated good (n = 1), fair (n = 1) and poor (n = 1). Predictive validity was rated excellent (n = 1), good (n = 1) and fair (n = 1). Divergent validity was rated fair (n = 1). Construct validity was rated fair (n = 1).
The criterion validity ratings for the short-term recall measures were excellent (n = 1), good (n = 1), fair (n = 1) and poor (n = 4). Convergent validity was rated good (n = 2) and fair (n = 5). Predictive validity was rated excellent (n = 1), good (n = 1), fair (n = 2) and poor (n = 1). Test-retest reliability scores were rated fair (n = 3), with inter-rater reliability also rated fair (n = 1). Hypothesis validity was rated good (n = 1) and fair (n = 1). Divergent validity was rated fair (n = 1) and construct validity was rated poor (n = 1).
Quantity-frequency and graduated-frequency measures completed by a Finnish population sample and a computer and paper administered quantity-frequency measure demonstrated good test-retest reliabilities. Moderate test-retest reliabilities were reported for a quantity-frequency measure administered to a general population sample and for quantity-frequency and short-term recall measures in an Australian general sample of twins. Good test-retest reliability was reported in an undergraduate student population sample for a graduated-frequency measure and in a general population. Test-retest reliability of a daily intake short-term recall measure was good for an older adult sample. Moderate test-retest reliability was reported for a short-term recall measure of ≥5 drinks consumed per drinking occasion. In an older population sample, inter-rater reliability was good for quantity-frequency and short-term recall measures, though poor inter-rater reliability was reported in a study administering a weekly quantity-frequency measure to over 65-year olds and for the graduated-frequency and short-term recall measures in a general population (for detailed results see Table 2).
Studies of quantity-frequency measures administered to the general population sample [28,29,30] and a quantity-frequency and short-term recall measure demonstrated good criterion validity. An annual graduated-frequency measure and previous 24 h short-term recall measure administered in a general population sample indicated good criterion validity for ‘heavy drinkers’. Poor validity was reported for moderate drinkers in this study (due perhaps to the fact that consumers of lower levels of alcohol may drink irregularly and not within the 24-h before administration of the short-term recall measure). An undergraduate student sample completed two graduated-frequency measures and a short-term recall measure with moderate criterion validity. Short-term recall spousal reports, used as a criterion standard to validate alcohol intake in an older sample, showed good criterion validity. A short-term recall measure administered to an undergraduate student sample had poor criterion validity, though other studies of the short-term recall measure and the short-term recall and graduated-frequency measures reported good criterion validity (see Table 2).
Poor construct validity was found for a 30-day graduated-frequency measure completed in an undergraduate sample (age range 18–20 years). A short-term recall measure compared with the Michigan Alcoholism Screening Test (MAST) on two separate occasions in a sample of older adults showed poor to moderate construct validity (see Table 2).
Good hypothesis validity was reported for a quantity-frequency measure compared to a short-term recall measure in an older adult population sample and for a quantity-frequency measure compared to a short-term recall measure in a general population sample (see Table 2).
One study of a graduated-frequency and short-term recall measure that was completed by an undergraduate student sample demonstrated adequate to good predictive validity, whilst another (albeit small sample size) study of the same measures in an undergraduate student sample (age range 18–20 years) recorded poor predictive validity. A general population study found poor predictive validity for the three measures, though these were measured against unstandardized indicators of alcohol-related mortality, morbidity and harm. A short-term recall measure achieved good or adequate prediction properties regarding heavy drinking (≥5 drinks per occasion) for samples aged 18–39 and for a general population (see Table 2).
Moderate to good convergent validity was found in a general population sample for a two-week beverage-specific quantity-frequency measure, a graduated-frequency and short-term recall measure. Similarly, adequate or good convergent validity was recorded for the three types of measures of alcohol intake in a cohort of 20 to 63-year olds and in a general population. A graduated-frequency and short-term recall measure demonstrated good convergent validity in undergraduate student samples [8, 10]. A short-term recall measure completed by undergraduate student samples reported adequate to good convergent validity. Also, adequate convergent validity was found for short-term recall measures in a male population sample (see Table 2). Only one study referred to divergent validity of the graduated-frequency and short-term recall measures and only in terms of a negative correlation in an undergraduate student sample between religiosity and alcohol consumption (see Table 2). Similarly, only one study referred explicitly to structural validity: a 30-day quantity-frequency measure that was used to collect data on alcohol consumption in a general population reported poor validity (see Table 2).
Overall, the review found that only a relatively small number of studies investigated the COSMIN psychometric domains of each type of measure. Furthermore, the hypothesis validity or structural validity of the graduated-frequency measure was not investigated at all nor was the structural validity of the short-term recall measure. Divergent validity or construct validity were not assessed for the quantity-frequency measure.
Psychometric property ratings for measure types
Each type of measure appeared to have good criterion validity according to COSMIN methodology. Several different reference standards or criteria were used in the included studies to measure alcohol consumption (e.g. [9, 29]). The appropriateness of using peers, spousal reports and short-term recall measures as criterion standards is questionable and perhaps it is unsurprising that these studies reported a low quality rating (despite reporting good content validity). Currently, there is no gold standard for the measurement of alcohol consumption. Most countries use some standard unit of measurement (e.g. one drink, one unit) but there is a lack of consensus and no internationally accepted definition, thereby posing difficulties for the conduct of comparative analyses. Biological markers of alcohol consumption should be used more frequently to support and validate findings from self-reporting measures, as these methods are not subject to sampling errors or researcher or participant bias. However, these measures are also not without risk of error. Alcohol abstinence in the 24 h prior to breath-, blood- or urine-ethanol measurement has been shown to produce low results even for heavy drinkers. More research is needed to find a gold standard for alcohol consumption measurement.
Construct validity was poor for graduated-frequency and short-term recall measures, and not assessed for quantity-frequency measures. The structural validity of the quantity-frequency measure only was assessed and this construct validity-related property was deemed to be poor. Only one study investigated the predictive validity of the quantity-frequency measure and it found that the validity was poor. Poor predictive validity results suggest the measure may not be valid in predicting the measurement of future alcohol intake among the general population or in predicting the measurement of drinking trajectories and alcohol-related consequences. The study was conducted with good methodological quality and received a good COSMIN score.
In contrast, the graduated-frequency and short-term recall measures achieved mixed results including predicting with variable accuracy the outcomes of alcohol-related morbidity and mortality and alcohol dependence. There were several studies of the convergent validity of each measure and generally this property was deemed to be moderate to good.
Test-retest results tended to indicate that similar outcome-assessments of alcohol consumption were found when the quantity-frequency measure, graduated-frequency measure and the short-term recall measure were re-administered. Mixed results were reported for inter-rater reliability of quantity-frequency and short-term recall measures, with poor inter-rater reliability found when the graduated-frequency measure was applied. In particular, there appeared to be difficulty obtaining good agreement between raters regarding the measurement of consumed beer, wine and liquor respectively, and between self-report tests (AUDIT (Alcohol Use Disorders Identification Test) and CAGE (Cut down, Annoyed, Guilty, Eye-opener)) and a quantity-frequency measure when research assistants interviewed participants using a face-to-face predetermined appointment schedule. It is important to note that these studies achieved only fair or poor COSMIN ratings. Indeed, many of the reported poor psychometric properties may be due to poorly conducted studies as indicated by poor COSMIN ratings [6, 21, 31]. Variation between types of psychometric properties for the same measure (e.g. high validity for one property and low for another property) may be due to differences in study design and methodological quality.
Discrepancies between COSMIN ratings and psychometric properties
In some studies there were discrepancies between COSMIN ratings of the quality of a psychometric property and the performance of a measure. For example, one study reported good test-retest reliability for a typical weekly quantity-frequency measure but the methodological quality of a particular aspect of the study was rated poor because the method of administering the (computer or paper) measure of consumption was not consistent across time-points. Reasons for poor methodological quality ratings using the COSMIN checklist included inappropriate time intervals between measure administrations, ambiguity over management of missing responses, lack of assurance that patients remained stable between measure administrations, inadequate sample size and choice of inappropriate statistical methods (e.g. reporting Spearman’s correlation coefficients rather than kappa values for test-retest reliability).
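Why a correlation coefficient can be a poor substitute for kappa in test-retest work is easiest to see with a small example. The sketch below computes unweighted Cohen's kappa in plain Python for hypothetical categorical answers; the data are invented for illustration. In the example, every retest answer is exactly one category higher than the first answer, so any rank correlation between the two administrations would be perfect even though no respondent gave the same answer twice.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa for two equal-length sequences of categories."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    n = len(first)
    # Proportion of respondents who gave the same category both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal category frequencies.
    counts_first, counts_second = Counter(first), Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical drinking categories at test and at retest.
    test = ["none", "low", "medium", "none", "low", "medium"]
    retest = ["low", "medium", "high", "low", "medium", "high"]
    print(round(cohens_kappa(test, retest), 3))
    print(cohens_kappa(test, test))
```

Here kappa is negative for the shifted answers and 1.0 for identical answers, which is the distinction between association and agreement that the checklist item is concerned with.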
Issues with self-reporting alcohol consumption
Self-reported alcohol consumption is difficult to measure accurately due to the influence of social desirability and memory issues and these factors were alluded to in many included studies (e.g. [25, 27, 32, 35]). Possible solutions to these challenges include using more anonymised interview types, randomised response techniques, checking responses using more than one alcohol measure and using memory aids (interviewer prompts, calendars or diaries). Also, population-based survey research about alcohol consumption and drinking habits is particularly problematic when the sample includes alcoholics because of uncertainty about whether or not participants are sober when interviewed, difficulty recalling consumption due to the effect of alcohol on memory and increased alcohol tolerance in frequently heavy drinkers. These issues pose challenges for the reliable and valid assessment of alcohol consumption in surveys. Potential solutions include factoring in more complex survey questions requiring greater reflection on alcohol intake (if respondents are asked to consider the timing, type of beverage drunk and episodic heavy drinking their responses should be more considered), use of a breathalyser before measure administration to ensure participants are alcohol-free and creating an environment that is conducive to confidentiality and honest disclosure of alcohol consumption [48, 50]. These potential solutions may be incorporated into population-based survey collection of alcohol consumption data in order to afford greater confidence in the drinking status of participants and significant assurance that responses reflect consumption accurately.
Comparison with previous reviews
Generally, the measures did not appear to vary significantly across population age and sex groupings. The assessment of the amount of alcohol consumed appeared to exert some influence on the psychometric performance of self-report measures. Parker reported good concurrent validity using a short-term recall measure though for heavy drinkers only. Gmel found the graduated-frequency measure over-reported alcohol intake, whereas the beverage-specific quantity-frequency measure provided a more accurate measure of consumption. The Feunekes review recommended that the quantity and frequency of alcohol consumption should be prioritised and assessed separately for specific types of alcoholic beverages, and beverage-specific quantity-frequency measures performed accurately and reliably though only in relation to the consumption of lower levels of alcohol [26, 28]. The use of a ‘diary’ format with a predetermined timeframe (that afforded individuals an opportunity to record all alcohol consumption in a format of their choice; and usually in the format of a short-term recall measure) had good psychometric properties [24, 29]. This finding may suggest that the use of an ‘actual’ time period instead of the ‘usual’ timeframes in quantity-frequency and graduated-frequency measures may add to the reliability and validity of assessments of alcohol consumption. However, both reviews found that the quantity-frequency measure performed with most reliability and validity and was the measure with the highest concordance with the short-term recall ‘diary’ measure [22, 29, 33, 38].
Recommendations for improved reliability and validity
The review findings suggest that the reliability and validity of self-reporting alcohol consumption measures may be improved in various ways. For example, computerised or automated modes of administration rather than an interviewer-based mode might facilitate greater privacy and assure more candid reporting. Longer timeframes (e.g. weekly, monthly or annual recall) may be more desirable as they tend to capture less frequent drinkers, and questions which involve specified timeframes (e.g. last week, last year) rather than ‘usual’ reference frames require respondents to focus their recall. Beverage-specific questions and questions that ask respondents to group responses into graduated categories may encourage a more thorough consideration of their alcohol consumption and, in turn, produce more accurate reporting. It is worth considering that the self-report measures themselves are outdated as they focus only upon frequency and volume of alcohol. It may be worthwhile instead to assess alcohol consumption using self-report tests that also take into account symptoms of alcohol addiction/dependence. Using review findings, the advantages and disadvantages of each measure type are summarised (Table 3).
Limitations and strengths
The review found wide variation in the structure, content and format of quantity-frequency, graduated-frequency and short-term recall measures. For example, time-period referents ranged from 24-h recall to alcohol intake over the previous year and alcohol consumption was assessed in terms of units (standardised to the country of each sample of respondents), grams of alcohol, typical sizes of sold drinks and beverage-specific drinks. The included studies from various multidisciplinary databases covered a range of locations, cultures and populations and these factors were taken into account in the analytical comparisons of measures of alcohol consumption. It is important to note that a proportion of the review studies focused on undergraduate student populations (e.g. [8, 10, 34, 40]). Arguably, students may be atypical with respect to the general population and their alcohol consumption patterns may have limited read-across to the general population, particularly the population of older people. Some psychometric properties were not assessed including measurement error, cross-cultural validity, internal consistency and responsiveness. All studies were in the English language (in keeping with COSMIN manual guidelines) and it is possible that important studies in other languages may have been missed. The review adhered to the COSMIN manual and whilst the COSMIN method adds rigour to the exercise of psychometric assessment, arguably, a limitation is the use of the ‘worst score counts’ method, which means that despite attaining higher quality scores on some items, the lowest score of an item list is taken as the overall quality rating (e.g. [28, 31]). Furthermore, studies of poor design quality were included in the review due to the overall lack of studies that met initial eligibility criteria.
Nevertheless, the review was completed in a methodologically robust fashion as per the COSMIN approach which has transparent, tested and validated resources such as a manual, search filters and a quality appraisal tool. Particular strengths include the use of extensive search terms and having two reviewers search the literature.
The studies of quantity-frequency measures indicated good/adequate psychometric properties for test-retest reliability, criterion validity, convergent validity and hypothesis validity; predictive and structural validity were rated as poor and inter-rater reliability reported mixed results. Regarding graduated-frequency measures, good/adequate psychometric properties were reported for test-retest reliability, convergent validity and divergent validity; criterion validity and predictive validity reported mixed results and construct validity and inter-rater reliability were reported as poor. Short-term recall measures achieved good/adequate psychometric properties for test-retest reliability, convergent validity, hypothesis validity, construct validity and divergent validity; criterion validity, predictive validity and inter-rater reliability reported mixed results. The review findings add to previously published alcohol self-report literature by providing an updated appraisal of measures of alcohol consumption research and indicate that a combination of aspects of the various measures may enhance the reliable and valid assessment of patterns of drinking.
It is difficult to discern which one of the existing measures is the most reliable and valid given the absence of any assessment of certain psychometric properties and the mixed results of studies included in the review. Arguably, when the results from the range of studies are considered and summed, they indicate that the quantity-frequency measure compared to the other two measures appeared to perform best in psychometric terms and, therefore, it is likely to produce the most reliable and valid assessment of alcohol consumption in population surveys. The results indicated that the features of alcohol consumption measures which performed with good reliability and validity were those that assessed beverage-specific alcohol consumption, used actual timeframes and asked about episodes of binge drinking; and that the quantity-frequency measures appeared to be the ‘best’ questionnaire-type currently available to measure self-reported alcohol consumption. Clearly, there is a need for more focused psychometric studies of measures of alcohol consumption including head-to-head comparative population-based and community surveys. Comparability of review results with previous reviews [13, 14] is difficult because they did not employ a COSMIN methodology to appraise studies. Overall, findings appeared to be in keeping with the results of the Gmel review, which found a beverage-specific, quantity-frequency measure recorded alcohol consumption more reliably, and with the Feunekes review, which reported that the most accurate alcohol intake measurement was provided by quantity-frequency and short-term recall measures.
AUDIT: Alcohol Use Disorders Identification Test
CAGE: Cut down, Annoyed, Guilty, Eye-opener (test for problem alcohol use)
COSMIN: COnsensus-based Standards for the selection of health Measurement Instruments
DSM: Diagnostic and Statistical Manual of Mental Disorders
MAST: Michigan Alcoholism Screening Test
DSM-III-R: Diagnostic and Statistical Manual of Mental Disorders, revised 3rd edition
DSM-IV: Diagnostic and Statistical Manual of Mental Disorders, 4th edition
World Health Organisation, “Global strategy to reduce the harmful use of alcohol,” World Health Organisation, 1st May 2010. Available: http://www.who.int/substance_abuse/activities/gsrhua/en/. [Accessed 18 July 2017].
Department of Health, “Health risks from alcohol: new guidelines,” gov.uk, 8th January 2016. Available: https://www.gov.uk/government/consultations/health-risks-from-alcohol-new-guidelines. [Accessed 1 Aug 2017].
DrinkAware, “What is an alcohol unit?,” DrinkAware, 16 January 2016. Available: https://www.drinkaware.co.uk/alcohol-facts/alcoholic-drinks-units/what-is-an-alcohol-unit/. [Accessed 21 Dec 2017].
Murray C, Richards M, Newton JN, Fenton KA, Anderson HR, Atkinson C, Bennett D, Bernabe E, Blencowe H, Bourne R, Braithwaite T, Brayne C, Bruge T, Brugha TS, Burney P, Dherani M, Dolk H, Edmond K, Ezzati M, Fleming ND, Fleming ND, Freedman G, Gunnell D, Hay RJ, Hutchings SJ, LOhno S, Lozano R, Lyons RA, Marcenes W, Magnavi M, Newton CR, Pearce N, Pope D, Rushton L, Salomon JA, Shibuya K, Wang T, Wang T, Williams HC, Woolf AD, Lopez AD, Davis A. UK health performance: findings of the global burden of disease study 2010. Lancet. 2013;381(9871):997–1020.
Dawson D. Methodological issues in measuring alcohol use. Alcohol Res Health. 2003;27(1):18–28.
Bonevski B, Campbell E, Sanson-Fisher R. The validity and reliability of an interactive computer tobacco and alcohol use survey in general practice. Addict Behav. 2010;35(1):492–8.
Reid M, Tinetti M, O'Connor P, Kosten T, Concato J. Measuring alcohol consumption among older adults: a comparison of available methods. Am J Addictions. 2003;12(3):211–9.
O'Hare T. Measuring alcohol consumption: a comparison of the retrospective diary and the quantity-frequency methods in a college drinking survey. J Stud Alcohol. 1991;52(5):500–2.
O'Hare T. Comparing the QFI, the retrospective diary and binge drinking in college first offenders. J Alcohol Drug Educ. 1997;42(3):40–53.
Dollinger S, Malmquist D. Reliability and validity of single-item self-reports: with special relevance to college students’ alcohol use, religiosity, study and social life. J Gen Psychol. 2009;136(3):231–41.
Poikolainen K, Podkletnova I, Alho H. Accuracy of quantity-frequency and graduated frequency questionnaires in measuring alcohol intake: comparison with daily diary and commonly used laboratory markers. Alcohol Alcoholism. 2002;37(6):573–6.
National Health Service, “Alcohol Misuse,” National Health Service, 28 November 2015. Available: https://www.nhs.uk/conditions/alcohol-misuse/. [Accessed 21 Dec 2017].
Gmel G, Rehm J. Measuring alcohol consumption. Contemp Drug Probl. 2004;31(3):467–540.
Feunekes G, van ‘t Veer P, van Staveren WA, Kok FJ. Alcohol intake assessment: the sober facts. Am J Epidemiol. 1999;150(1):105–12.
Mokkink L, Terwee C, Patrick D, Alonso J, Stratford P, Knol D, Bouter L, de Vet HC. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.
Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
Toneatto T, Sobell M, Sobell L. Predictors of alcohol abusers’ inconsistent self-reports of their drinking and life events. Alcoholism Clin Exp Res. 1992;16:542–6.
Terwee C, Bot S, de Boer M, van der Windt D, Knol D, Dekker J, Bouter L, de Vet H. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.
Mokkink L, Terwee C, Knol D, Stratford P, Alonso J, Patrick D, Bouter L, de Vet HC. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content. BMC Med Res Methodol. 2010;10:22.
Mokkink L, de Vet H, Prinsen C, Patrick D, Alonso J, Bouter L, Terwee C, “COSMIN risk of bias checklist for systematic reviews of patient reported outcome measures,” 12th December 2017. Available: https://doi.org/10.1007/s11136-017-1765-4. [Accessed 21 Dec 2017].
Hansell N, Agrawal A, Whitfield J, Morley K, Zhu G. Long-term stability and heritability of telephone interview measures of alcohol consumption and dependence. Twin Res Hum Genet. 2008;11(3):287–305.
Whitfield J, Madden P, Neale M, Heath A, Martin N. The genetics of alcohol intake and of alcohol dependence. Alcoholism Clin Exp Res. 2004;28(8):1153–60.
Gruenewald P, Johnson F. The stability and reliability of self-reported drinking measures. J Stud Alcohol. 2006;67(1):738–45.
Chaikelson J, Arbuckle T, Lapidus S, Pushkar Gold D. Measurement of lifetime alcohol consumption. J Stud Alcohol. 1994;55(1):133–40.
Greenfield T, Nayak M, Bond J, Kerr W, Ye Y. Test-retest reliability and validity of life-course alcohol consumption measures: the 2005 National Alcohol Survey Follow up. Alcoholism Clin Exp Res. 2014;38(9):2479–87.
Crum R, Puddley I, Gee G, Fried L. Reproducibility of two approaches for assessing alcohol consumption among older adults. Addict Res Theory. 2002;10(4):373–85.
Parker D, Derby C, Usner D, Gonzalez S, Lapane K, Carleton R. Self-reported alcohol intake using two different question formats in southeastern New England. Int J Epidemiol. 1996;25(4):770–4.
Russell M, Welte J, Barnes G. Quantity-frequency measures of alcohol consumption: beverage-specific vs global questions. Br J Addict. 1991;86(1):409–17.
Sander A, Witol A, Kreutzer J. Alcohol use after traumatic brain injury: concordance of patients’ and relatives’ reports. Alcohol Trauma Brain Inj. 1997;78(1):138–41.
Cutler S, Wallace P, Haines A. Assessing alcohol consumption in general practice patients- a comparison between questionnaire and interview. Alcohol Alcoholism. 1988;23(6):441–50.
Koppes L, Twisk J, Snel J, Kemper H. Concurrent validity of alcohol consumption measurement in a ‘healthy’ population; quantity-frequency questionnaire v. dietary history interview. Br J Nutr. 2002;88(1):427–34.
Weingardt K, Baer J, Kivlahan D. Episodic heavy drinking among college students: methodological issues and longitudinal perspectives. Psychol Addict Behav. 1998;12(3):155–67.
Read J, Kahler C, Strong D, Colder C. Development and preliminary validation of the young adult alcohol consequences questionnaire. J Stud Alcohol. 2006;67(1):169–77.
Northcote J, Livingston M. Accuracy of self-reported drinking: observational verification of ‘last occasion’ drink estimates of young adults. Alcohol Alcoholism. 2011;46(6):709–13.
McGinley J, Curran P. Validity counts with multiplying ordinal items defined by binned counts: an application to a quantity-frequency measure of alcohol use. Methodol (Gott). 2014;10(3):108–16.
Tuunanen M, Aalto M, Seppa K. Mean-weekly alcohol questions are not recommended for clinical work. Alcohol Alcoholism. 2013;48(3):308–11.
Rehm J, Greenfield T, Walsh G, Xic X, Robson L, Single E. Assessment methods for alcohol consumption, prevalence of high risk drinking and harm: a sensitivity analysis. Int J Epidemiol. 1999;28(1):219–24.
Searles J, Perrine M, Mundt J, Helzer J. Self-report of drinking using touch-tone telephone: extending the limits of reliable daily contact. J Stud Alcohol. 1995;56(4):375–82.
Hilton M. A comparison of a prospective diary and two summary recall techniques for recording alcohol consumption. Br J Addict. 1989;84(1):1085–92.
LaBrie J, Penderson E, Earleywine M. A group-administered timeline Followback assessment of alcohol use. J Stud Alcohol. 2004;66(5):693–7.
Searles J, Helzer J, Walter D. Comparison of drinking patterns measured by daily reports and timeline Followback. Psychol Addict Behav. 2000;14(3):277–86.
Lennox R, Zarkin G, Bray J. Latent variable models of alcohol-related constructs. J Subst Abus. 1996;8(2):241–50.
Sharpe P. Biochemical detection and monitoring of alcohol abuse and abstinence. Ann Clin Biochem. 2001;38:652–64.
World Health Organisation. The alcohol use disorders identification test. Geneva: Department of Mental Health and Substance Dependence; 2001.
Ewing J. Detecting alcoholism. The CAGE questionnaire. J Am Med Assoc. 1984;252(14):1905–7.
Daniel WW. Applied nonparametric statistics. London: Houghton Mifflin; 1978.
Bowling A. Mode of questionnaire administration can have serious effects on data quality. J Public Health. 2005;27(3):281–91.
Sobell L, Sobell M, “Alcohol consumption measures,” 1st August 2004. Available: https://pubs.niaaa.nih.gov/publications/assessingalcohol/measures.htm. [Accessed 07 June 2017].
Sobell L, Toneatto T, Sobell M. Behavioral assessment and treatment planning for alcohol, tobacco, and other drug problems: current status with an emphasis on clinical applications. Behav Ther. 1994;25:533–80.
Midanik L. The validity of self-reported alcohol consumption and alcohol problems: a literature review. Addiction. 1982;77(4):357–82.
Werch C. Quantity-frequency and diary measures of alcohol consumption for elderly drinkers. Int J Addict. 1989;24(9):859–65.
Lucas R, Mullin P, Luna C, McInroy D. Psychiatrists and a computer as interrogators of patients with alcohol-related illnesses: a comparison. Br J Psychiatry. 1977;131:160–7.
Slutske WS, Hunt-Carter EE, Nabors-Oberg RE, Sher KJ, Bucholz KK, Madden PAF, Anokhin A, Heath AC. Do college students drink more than their non-college-attending peers? Evidence from a population-based longitudinal female twin study. J Abnorm Psychol. 2004;113(4):530–40.
Streiner DL, Norman GR, Cairney J. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press; 2015.
Selzer M. The Michigan alcoholism screening test: the quest for a new diagnostic instrument. Am J Psychiat. 1971;127(12):1653–8.
American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 5th ed. Arlington: American Psychiatric Association; 2013.
This review was completed as part of a PhD which was funded by the Department of Employment and Learning Northern Ireland (DEL NI).
Availability of data and materials
The study was conducted at the Centre for Public Health, Queen’s University Belfast.
Ethics approval and consent to participate
All included studies involving the use of human participants were conducted with ethical approval and consent.
Consent for publication
The authors declare that they have no competing interests.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: the PRISMA statement checklist. Checklist for the minimum required items to be reported as part of a systematic review. (DOC 62 kb)
Characteristics of included studies. A full description of the characteristics of each study which met the review inclusion criteria (n = 28). (DOCX 25 kb)
Psychometric properties of included studies grouped into results reported by study authors and COSMIN quality ratings assigned by review authors (n = 28). (DOCX 41 kb)
Cite this article
McKenna, H., Treanor, C., O’Reilly, D. et al. Evaluation of the psychometric properties of self-reported measures of alcohol consumption: a COSMIN systematic review. Subst Abuse Treat Prev Policy 13, 6 (2018). https://doi.org/10.1186/s13011-018-0143-8