Outcome-based evaluations of interventions, which play a central role in public health prevention, need to show the effect a policy or intervention makes at the population level. A plethora of literature evaluates social marketing campaigns that tackle public health and safety issues such as drug use, health-compromising lifestyle choices, unprotected or risky sexual behaviour, or unsafe driving practices; these evaluations tend to rely on self-reports, regardless of whether they were conducted in laboratory or field settings [1–3]. The issues that may hinder an evaluation of any health promotion are further complicated by the influence of social desirability, which may cast doubt over the validity of self-reported information when the study topic relates to socially sensitive behaviour. Beyond public health, where obtaining accurate information on drug use is vital for establishing the need for, and evaluating, preventive measures or intervention strategies, policy makers in public service utilities and law enforcement agencies also require the most accurate estimates possible of problematic behavioural choices in order to make informed decisions.
The need to obtain the maximum intelligence on health-related behaviours stems from the necessity to develop and deploy optimal intervention measures to counteract consistent failures to attain acceptable levels of behaviour across a wide range of health practices. These range from adherence to medication, resistance to addiction and avoidance of experimentation with social drug use, through to uptake of illegal and health-damaging performance-enhancing agents. The immense health, financial and social consequences of these health-related behaviours have led to decades of investigation into improved approaches to obtaining accurate data on sensitive personal behaviours.
Investigating the epidemiology of socially sensitive or transgressive behaviours, such as illicit drug use, unhealthy weight management practices, risky behaviour, cheating, doping or non-adherence to prescribed medication or treatment, is hindered by respondents evasively answering questions about sensitive behaviours. A recent research programme provides further evidence for self-protective strategic responding, even under anonymous answering conditions [7–9]. Consequently, much effort has been made to develop reliable methods for collecting valid epidemiological data in these sensitive behavioural domains.
Approaches range from techniques such as the Bogus Pipeline (BPL) to providing incentives for honest answers, such as the Bayesian Truth Serum (BTS). Whilst the BPL has been used for decades and has accumulated reasonable evidence that it shifts self-reports toward veracity, the BTS approach is relatively new and in need of further refinement [13, 14]. Based on empirical evidence, Barrage and Lee also suggest that, to be effective, respondents may need to have a positive experience with and trust in the BTS method; however, this can lead to respondents learning how to maximise their incentives, biasing their answers towards maximum income at the expense of telling the truth. Although these methods have the potential to overcome, to an extent, self-protective response bias by either evoking fear of exposure of lying or providing financial gain for truthfulness, their feasibility in self-administered epidemiological-scale studies appears to be compromised. An alternative approach has made notable progress in collecting data on sensitive behaviours through the development of indirect methods using randomisation or deliberate uncertainty to provide respondent protection over and above ensuring anonymity.
The concept behind randomised response models (collectively termed RRT) rests on introducing a randomising element to the survey question by using some device (e.g. rolling a die, flipping a coin or picking a card) which determines how the respondent should answer. Since the researcher has no control over this randomising device, answers cannot be directly traced back to any particular individual, which in turn heightens respondents' sense of protection. A common characteristic of RRTs is that, to obtain useful data, the technique requires respondents to answer the sensitive question directly in some form. By contrast, non-random models (NRM) do not require a direct answer, as they rely on implicit uncertainty to render impossible the link between an individual and the sensitive behaviour. Whilst NRMs build on combining the sensitive question with unrelated innocuous questions, some RRTs also incorporate innocuous questions for which the population prevalence may or may not be known. When the population prevalence needs to be established, an independent sample randomly selected from the same population is required.
Randomised response models
The RRT aims to elicit sensitive, embarrassing or compromising information that may portray respondents unfavourably. The common characteristic of the RRT is that the sensitive behaviour can only be estimated at the aggregated population level. The method is based on the principle introduced by Warner, who used a spinner as a randomising device to gauge the proportion of a sample with a compromising behaviour. The method assumes that any person in the sample is either characterised by the behaviour (group A) or is not (group B). The respondents, hidden from the interviewer, were asked to use the spinner, which landed on either group A or group B, and to answer with a simple 'yes' or 'no' depending on whether the spinner pointed to the group to which they belonged. Whilst the outcome of the spinner exercise for each individual was not known to the interviewer, hence protecting the individual, the chance that the spinner points to group A or B was known (p and 1-p). Thus, from the observed pattern of 'yes' and 'no' answers, Warner was able to estimate the proportion of respondents in the sample admitting the sensitive behaviour.
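Warner's logic can be sketched in a short simulation. The probability of a 'yes' is p·π + (1-p)(1-π), which rearranges to π = (λ - (1-p)) / (2p - 1), where λ is the observed 'yes' proportion. The prevalence and spinner probability below are illustrative values, not figures from any study.

```python
import random

def warner_estimate(answers, p):
    """Estimate prevalence pi from the observed 'yes' proportion under
    Warner's model; p is the known chance the spinner points to group A
    (p must differ from 0.5)."""
    lam = sum(answers) / len(answers)  # observed 'yes' proportion
    # P(yes) = p*pi + (1 - p)*(1 - pi)  =>  pi = (lam - (1 - p)) / (2p - 1)
    return (lam - (1 - p)) / (2 * p - 1)

# Simulate: assumed true prevalence 0.20, spinner points to group A with p = 0.7
random.seed(1)
true_pi, p = 0.20, 0.7
answers = []
for _ in range(100_000):
    has_trait = random.random() < true_pi
    points_to_a = random.random() < p
    # 'yes' iff the spinner points to the group the respondent belongs to
    answers.append(points_to_a == has_trait)
print(round(warner_estimate(answers, p), 3))  # close to the true 0.20
```

No individual answer reveals group membership, yet the aggregate recovers the prevalence.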
Subsequent adaptations of the RRT have covered a wide range of sensitive issues, along with numerous attempts to refine the approach. Among the wide array of models, the Forced Alternative/Response model, in which only the sensitive question is presented, has been found to be one of the most efficient variants of Warner's original conception. Recently the RRT method has been expanded to multi-item scales and tested with male date rape attitudes and alcohol abuse. The extension of the RRT to multi-item scales allows its application to psychological measures such as attitudes toward sensitive issues. This approach can be expanded to areas where honest responding might be compromised by self-protective lying, for example illegal substance dependence, domestic violence, disordered eating, or cheating and doping in sport.
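In the Forced Response variant, the randomising device tells a respondent either to answer the sensitive question truthfully or to give a predetermined 'yes' or 'no'. A minimal sketch, with assumed (not source-derived) probabilities, shows how the prevalence is recovered:

```python
import random

def forced_response_estimate(answers, p_truth, p_yes):
    """Forced Response model: P(yes) = p_truth*pi + p_yes, so
    pi = (lambda - p_yes) / p_truth, with lambda the observed 'yes' rate."""
    lam = sum(answers) / len(answers)
    return (lam - p_yes) / p_truth

# Assumed design: 75% answer truthfully, 15% forced 'yes', 10% forced 'no'
random.seed(2)
true_pi, p_truth, p_yes = 0.15, 0.75, 0.15
answers = []
for _ in range(100_000):
    r = random.random()
    if r < p_truth:                    # device says: answer truthfully
        answers.append(random.random() < true_pi)
    elif r < p_truth + p_yes:          # device forces a 'yes'
        answers.append(True)
    else:                              # device forces a 'no'
        answers.append(False)
print(round(forced_response_estimate(answers, p_truth, p_yes), 3))
```

The forced 'yes' answers provide cover for genuine admissions, which is precisely the feature some respondents find uncomfortable, as noted below.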
Research has shown that whilst respondents understand the reason behind the use of the RRT approach in surveys, they generally find it obtrusive and favour simpler approaches. In contrast to the RRT, non-random models present a more straightforward approach that provides protection by asking about the number or combination of behaviours respondents are engaged in, rather than asking about each behaviour in turn.
The non-randomised model (NRM) has received increased attention lately. A recent review showed that NRMs appear to successfully address many of the limitations typically associated with RRTs, such as the need for a randomisation device, which often requires interviewers; forcing participants to say 'yes' to an embarrassing question when their honest answer would be the opposite; or requiring a direct answer to the sensitive question. Contrary to the RRT, in the NRM every participant is required to answer the research question, albeit in an indirect way. The fact that a response is required to the research question can help participants to feel that they have made a contribution by volunteering to take part in the research, whereas with many RRT variations a significant proportion of respondents are simply instructed to ignore the research question and just say 'yes' or 'no'. Owing to this characteristic, NRMs can also be more efficient, with comparable or even increased privacy protection levels.
Alternative approaches have been progressively developed which preclude the need for a randomising device. These include an item count method, later termed the 'item count technique' and, later still, the 'unmatched count technique'. Similar in concept to the unrelated question (UQ) method [16, 24], item counts (IC) utilise a simple response task whilst embedding the sensitive question in a list of innocuous questions. In place of the randomising device, respondents are randomly assigned to one of two groups (experimental and control): the experimental group receives all questions and is asked to indicate only the number of affirmative answers, whilst the control group receives the identical list of questions minus the sensitive question, which allows the population prevalence of the innocuous questions to be established. The mean numbers of 'yes' responses are then compared between the two groups. Assuming that the innocuous behaviours are equally manifest in both groups, the difference between the observed means must be due to the presence of the sensitive question in one group and not the other.
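The item count estimator is simply the difference between the two group means. The sketch below simulates this with four innocuous items whose prevalences are illustrative assumptions:

```python
import random

def item_count_estimate(counts_with, counts_without):
    """Item count / unmatched count: the sensitive item's prevalence is the
    difference between the mean list counts of the two randomised groups."""
    mean_with = sum(counts_with) / len(counts_with)
    mean_without = sum(counts_without) / len(counts_without)
    return mean_with - mean_without

# Assumed prevalences for 4 innocuous items, plus a sensitive item at 0.10
random.seed(3)
innocuous = [0.5, 0.3, 0.6, 0.2]
true_pi = 0.10

def simulate_count(prevalences):
    # A respondent reports only how many items apply to them
    return sum(random.random() < q for q in prevalences)

counts_with = [simulate_count(innocuous + [true_pi]) for _ in range(50_000)]
counts_without = [simulate_count(innocuous) for _ in range(50_000)]
print(round(item_count_estimate(counts_with, counts_without), 3))
```

Because respondents report only a total count, no single answer can be linked to the sensitive item, yet half the sample contributes nothing to the sensitive estimate beyond calibration, which is the efficiency cost discussed later in this section.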
The use of prior knowledge of the population prevalence for an innocuous question has led to the development of a number of competing techniques over the past five years. In these models, the innocuous question is outside the researcher's control and independent of the research question, but its population prevalence is already established, such as birth month or season, or the geographical location of the person or a family member. The Triangular Model (TM) and the Crosswise Model (CWM) use a combination of a sensitive question and an innocuous question with known population prevalence. The questions and answer options are then placed in a 2 × 2 contingency table where two 'quadrants' relate to the innocuous question with known population prevalence (e.g. 3/12 and 9/12 if someone's birth month is used as the innocuous question). The other two quadrants represent the binomial response options to the sensitive question. In the TM, respondents are asked to indicate whether they belong to the No-No quadrant or to any of the other three quadrants (Yes-No, Yes-Yes or No-Yes). The CWM asks people to indicate whether they belong to either of the mixed categories (Yes-No or No-Yes), which only reveals that one of the two statements is true, but which one remains hidden. Similarly, the Hidden Sensitivity (HS) model addresses two sensitive questions with binary outcomes, using an innocuous grouping question such as season of birth or geographic location (e.g. South/West/North/East, East/West side of a river, or any criterion that creates meaningful and useable groups). In this technique two response pathways are provided. Respondents are required either to answer the innocuous question truthfully or to select a forced option for the non-sensitive question (e.g. about birth date or place of living), based on their answers to the two sensitive behaviours.
The drawback of this technique is that only those who have neither sensitive behaviour (0,0) are asked to answer the innocuous question honestly, whereas the others (0,1; 1,0; 1,1) are forced to select an answer for the innocuous question based on their sensitive behaviours. Therefore, people admitting to one sensitive behaviour (or both) are protected by the true answers of those who have no sensitive behaviour to declare. The advantage of the HS model over the Triangular or Crosswise models is that the HS allows two sensitive questions to be investigated simultaneously.
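The Crosswise Model described above can also be expressed as a one-line estimator. Stating whether the two answers match (Yes-Yes or No-No) carries the same information as stating membership of the mixed categories; with q the known innocuous prevalence, P(match) = πq + (1-π)(1-q). The values below are illustrative assumptions:

```python
import random

def crosswise_estimate(same_answers, q):
    """Crosswise model: respondents say only whether their answers to the
    sensitive and innocuous questions match (q = known innocuous prevalence,
    q must differ from 0.5).
    P(match) = pi*q + (1-pi)*(1-q)  =>  pi = (lam + q - 1) / (2q - 1)."""
    lam = sum(same_answers) / len(same_answers)
    return (lam + q - 1) / (2 * q - 1)

# Assumed: innocuous question 'born January-March?' with q = 3/12,
# sensitive behaviour with true prevalence 0.12
random.seed(4)
true_pi, q = 0.12, 3 / 12
same = []
for _ in range(100_000):
    sensitive = random.random() < true_pi
    innocuous = random.random() < q
    same.append(sensitive == innocuous)   # respondent reports only the match
print(round(crosswise_estimate(same, q), 3))
```

A 'match' answer never reveals which of the two statements is true, so no response category is self-incriminating.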
Other models, such as the Unmatched Count Technique (UCT) or the Cross-Based Method (CBM) and the Double Cross-Based Method (DCBM), work with unknown population prevalence. The common characteristic of these models is that an independent sample, randomly drawn from the same population, is required to establish the prevalence rate of the innocuous questions in order to estimate the prevalence rate of the sensitive question. The UCT contains two parallel questionnaires with several innocuous questions, but only one version features the sensitive question. The total number of endorsed answers is calculated for each version independently, and the two are then compared. The difference between the two sample means indicates the proportion of respondents who endorsed the sensitive question.
Currently, studies comparing the performance of the item count method to other NRM or RRT models, or to direct self-reports, are inconclusive. Coutts and Jann found that the UCT outperformed its RRT counterparts in assessing many sensitive behavioural domains. By contrast, Tsuchiya et al., using a web-based survey, compared item counts to direct self-reports and concluded that the item count technique yielded lower numbers of endorsed behaviours. However, Tsuchiya's list of behaviours contained items for which over-reporting can reasonably be expected (e.g. donating blood), which might have skewed upwards the total numbers of reported behaviours in direct self-reports. Where differences were found between self-reports and item counts (using the CBM and shoplifting), the differences were explained by sample demographics. The largest difference was found among middle-aged, highly educated (e.g. in or having completed tertiary education) female respondents domiciled in urban areas.
The constraints of each approach are associated with whether or not the population prevalence of the non-sensitive questions is known. When this information is not available, the research requires an independent sample of significant size to establish it, in parallel with collecting the sample used to answer the research question about the sensitive issue. Furthermore, the chosen probability that requires respondents to answer truthfully determines the proportion of the sample that is directly useable to answer the research question. Finally, the actual prevalence rate of the target behaviour also has an effect on the minimum required sample size.
Investigating the efficiency of the RRT, Lensvelt-Mulders et al. compared five RRT methods and found the Forced Response method and a special form of the Unrelated Question design the most efficient, requiring about 2.2 times the sample size of a direct self-report method. Sample sizes for the Crosswise model have been estimated for a number of combinations of power and population prevalence, with estimates of the minimum required sample size ranging between 2.5 and 19.3 times that required for direct questioning surveys. Based on these simulations, the Crosswise model's efficiency compared favourably to Warner's model.
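The origin of such multipliers can be illustrated with Warner's model, whose estimator variance has the standard decomposition π(1-π)/n + p(1-p)/(n(2p-1)²). Dividing by the direct-question variance π(1-π)/n gives the sample-size multiplier for equal precision (assuming honest answering in both designs; the parameter values below are illustrative):

```python
def warner_design_effect(pi, p):
    """Ratio of Warner-model to direct-question estimator variance at the
    same n, i.e. the sample-size multiplier needed to match the precision
    of a direct survey (standard result, assuming honest answering)."""
    direct = pi * (1 - pi)
    warner = pi * (1 - pi) + p * (1 - p) / (2 * p - 1) ** 2
    return warner / direct

# The privacy/efficiency trade-off: p nearer 0.5 protects more but costs more
for p in (0.6, 0.7, 0.8):
    print(p, round(warner_design_effect(0.10, p), 1))
```

This makes the trade-off concrete: the same randomisation that shields respondents inflates the variance, so protection is purchased with sample size.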
An alternative way to think about efficiency is to consider the proportion of the population sample used solely to estimate the population prevalence of the non-sensitive questions. This 'waste', which accompanies most models, is the acceptable efficiency cost of the added anonymity. The sample inefficiency ranges between 25% and 75%, depending on the research design. Consequently, achieving a sample size with sufficient statistical power for meaningful analysis requires more extensive data collection than in a typical survey.
The recent change in the legal status (in the UK) of the drug Mephedrone provided an opportunity to explore a novel approach to data collection on a sensitive issue. Mephedrone is a central nervous system stimulant that produces effects similar to amphetamines. It produces a euphoric effect and has been reported to increase empathy, stimulation and mental clarity, but can lead to adverse effects such as nasal irritation, tachycardia and restlessness. Although limited in scope (i.e. we asked about the use of one specific drug), Mephedrone was a topical choice at the time of the study's conception, as it had been reclassified as a Schedule 1 Class B drug on April 16th 2010, making it unlawful to possess, produce and/or distribute without licence, and carrying a five-year prison sentence for possession and up to 14 years for producing, selling or distributing. The ban generated considerable debate, with some expressing discontent about the hastened reaction and the generic ban, along with a concern that the ban may not stop Mephedrone use but could make the demand and supply clandestine, leading to unintended consequences from the addition of toxic excipients (through "cutting" or chemical by-products) and thus presenting an even greater danger to health. In spite of the new legislation, internet retailers appear to have continued to sell products under different brand names that contain, albeit unlabelled, Mephedrone-like substances. This case is a good illustration of a situation in which the change in regulation could (and should) have been supported by at least an estimate of what proportion of the population uses Mephedrone and is at risk.
Recent inter-disciplinary approaches to estimating doping prevalence in sporting sub-populations have led to advances in estimation through improved efficiencies. The current study aimed to develop and test a new research tool for use at the epidemiological scale. To achieve this aim, a fuzzy response model, the Single Sample Count (SSC), was proposed.