
Application of data mining techniques and logistic regression to model drug use transition to injection: a case study in drug use treatment centers in Kermanshah Province, Iran



Drug injection has been increasing over the past decades all over the world. Hepatitis B and C viruses (HBV and HCV) are two common infections among people who inject drugs (PWID), and in Iran more than 60% of new human immunodeficiency virus (HIV) cases are PWID. Thus, investigating risk factors associated with drug use transition to injection is essential and was the aim of this research.


We used a database from drug use treatment centers in Kermanshah Province (Iran) in 2013 that included 2098 records of people who use drugs (PWUD). Information on 29 potential risk factors that are commonly used in the literature on drug use was selected. We employed four classification methods (decision tree, neural network, support vector machine, and logistic regression) to determine factors affecting the decision of PWUD to transition to injection.


The average specificity of all models was over 84%. Support vector machine produced the highest specificity (0.9). Also, this model showed the highest total accuracy (0.91), sensitivity (0.94), positive likelihood ratio [1] and Kappa (0.94) and the smallest negative likelihood ratio (0). Therefore, important factors according to the support vector machine model were used for further interpretation.


Based on the support vector machine model, the use of heroin, cocaine, and hallucinogens was identified as the three most important factors associated with drug use transition to injection. The results further indicated that PWUD with a history of prison, or who started using drugs because of curiosity or unemployment, are at higher risk. Unemployment and unreliable sources of income were other suggested factors of transition in this research.


Drug injection has been increasing worldwide over the past decades [2]. Compared with smoking, inhaling, snorting and swallowing, injecting drugs increases the chance of health consequences such as viral infections, for reasons that include non-compliance with health guidance. Sharing needles and syringes spreads infectious diseases among people who inject drugs (PWID). The high prevalence of HBV and HCV among PWID reflects the vulnerability of this population [3]; the chance of HCV infection is 53 times higher among PWID than in the general population [4]. According to a meta-analysis of the time to HCV infection (measured from the onset of injection), the one-year cumulative incidence of HCV infection was 28% (95% CI: 17–42%) [5].

Recently, a systematic review of HIV among people who use drugs (PWUD) showed that the prevalence of HIV among PWID is 4.4 times higher than among others [6]. A third of all HIV cases outside of sub-Saharan Africa occur among PWID [7]. This infection can also spread to other groups of society via sexual relationships with PWID. In seven out of ten areas covered by the Joint United Nations Programme on HIV and AIDS (UNAIDS), drug injection was identified as the first (or second) cause of HIV transmission [8, 9].

It is estimated that there are approximately 260,000 PWID in Iran [10], and more than 60% of new HIV cases are PWID.

Iran has adopted large-scale harm reduction policies targeting PWID, such as provision of methadone maintenance treatment (MMT) and needle and syringe programs, since 2002. Although these policies are the most important preventive measures against drug injection and the risks experienced by PWID [11], it is believed that preventing injection initiation takes precedence over reducing the range of risks that these individuals encounter after they start injecting [12,13,14]. Experiences in Amsterdam, Netherlands, and New York, USA [15, 16], showed that preventing the transition to drug injection is quite feasible. However, little attention has been paid in Iran to preventing PWUD from transitioning from other routes of drug administration (smoking, inhaling, snorting and swallowing) to injection. A better understanding of the risk factors associated with drug use transition to injection in Iran can help authorities design more effective preventive strategies and identify PWUD at risk of transition. This research aimed to determine these factors using classification models.

It should be noted that the performance of classification models may vary across datasets; no model works well in all situations. Therefore, we employed four widely used classifiers (neural network, support vector machine, decision tree and logistic regression) whose prediction accuracy has been confirmed by several studies [17,18,19]. Finally, the model with the best performance was used to interpret the findings.



This research used a dataset of 2098 records. The data were collected with a researcher-made checklist of information about people who were referred to drug use treatment centers. The checklist was completed by the PWUD, a therapist, or the experts and consultants of the treatment centers. By agreement with the treatment centers, checklists were identified by a specific code for each individual, and personal information (such as name, family name and national code) was not included. Informed consent to use the data was obtained from the PWUD, and permission to conduct this research was registered with the Ethics Committee of Kermanshah University of Medical Sciences under code KUMS.RES.1394.480. Our methodology for the modeling process is shown in Fig. 1.

Fig. 1

Classification model building process.

We used information on 29 risk factors that are believed to be associated with the transition of PWUD to injection. These risk factors were age, gender, marital status, housing status, education, occupational status, age at the first drug use experience, the first used drug, number of years of drug use, family history of drug use, history of suicide, history of overdose, history of mental disorder of the individual and the family, history of taking opium, hallucinogens, hashish, heroin, sap (the milky latex sap of opium), crystal, cocaine, amphetamine, sedative, methadone, cigarette and alcohol, history of prison, number of referrals to drug use treatment centers, and motivation for starting drug use. History of drug injection was the dependent variable, with two classes: people who inject drugs (PWID) and people who do not inject drugs (PWNID; people who smoke, inhale, snort or swallow drugs). Among cases with a history of injection, only those for whom injection was the most recent route of drug use were enrolled.

Data pre-processing and dealing with missing values

Before the models were applied, the data were checked for missing values and outliers. The proportion of missing data per variable ranged from 0 to 11.83%. The variables with the most missing data were history of suicide (11.83%) and history of overdose (1.24%). Missing values for these two variables were imputed using CART regression trees, a popular imputation method proposed by Breiman et al. in 1984 [20]. The other variables with missing data, each with less than 0.057% missing (history of mental disorder of the individual and the family, history of prison, marital status, housing status, history of drug injection, number of referrals to drug treatment centers, and motivation for starting drug use), were imputed with their mode. Anomaly detection, which provides critical information for identifying outliers in various applications [21], was used to find outlier records. Fifteen records with an anomaly index greater than 2 [22] were excluded from further analysis. All of the excluded records belonged to PWNID, the majority class, so deleting them did not affect the results.
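Mode imputation, as used for the sparsely missing categorical variables, can be illustrated with a minimal sketch. The column values below are hypothetical and are not taken from the study data; the original analysis was run in IBM SPSS Modeler, not in Python.

```python
from collections import Counter

def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # no observed value to learn the mode from
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

# Hypothetical "history of prison" column; None marks a missing answer.
prison = ["no", "yes", None, "no", "no", None, "yes"]
imputed = impute_mode(prison)
```

Here the mode of the observed answers is "no", so both missing entries are filled with "no".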

The housing status variable had four categories: home ownership, rental, homelessness, and others; homelessness and others were merged into one group. Marital status was defined as married, divorced or widowed, and single. Since in more than 80% of cases the first used drug belonged to the opium family (opium and sap), the first used drug variable was divided into opioids and other drugs. To facilitate interpretation of the results, the associate, bachelor, and master university degrees were combined into a single “College education” group for the education variable (there were no records in the PhD group). Occupational status was reduced to four groups: unemployed, self-employed, employed and housewife. For the motivation for first drug use variable, categories such as sex enhancement, drug availability and others were merged into one single group. The demographic and summary statistics of the variables included in the analysis for the full dataset are shown in Tables 1 and 2.

Table 1 Summary of discrete variables
Table 2 Summary of continuous variables

Classification models

Decision tree, neural network, support vector machine and logistic regression were employed to identify factors affecting PWUD’s decisions to shift to injection among the people who were referred to the treatment centers for drug use in Kermanshah in 2013.

Decision trees (DTs) fit piecewise constant models by recursively partitioning the predictor space [23]. They help identify individuals with or without a history of injection through easily interpreted grouping rules. A rule is induced by a binary split on a covariate, with questions such as “Does the subject have a history of taking heroin?” or “Is the subject male or female?” According to some criterion, the algorithm searches for the best split among all possible splits, and the data are partitioned accordingly. The procedure is repeated until the dataset is split into a number of mutually exclusive groups. A decision tree is simple to understand and interpret, even with hard data. However, it is unstable: a small change in the data can change the optimal tree substantially.
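The search for the best binary split can be sketched as follows. The text does not name the splitting criterion at this point, so information gain (entropy reduction) is used here as an assumption; the four toy records and the feature names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on a binary (True/False) feature."""
    left = [y for x, y in zip(rows, labels) if x[feature]]
    right = [y for x, y in zip(rows, labels) if not x[feature]]
    if not left or not right:
        return 0.0  # the split does not separate the data at all
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

def best_split(rows, labels):
    """Return the feature whose split yields the largest information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy records, not study data.
rows = [
    {"heroin": True, "male": True},
    {"heroin": True, "male": False},
    {"heroin": False, "male": True},
    {"heroin": False, "male": False},
]
labels = ["PWID", "PWID", "PWNID", "PWNID"]
chosen = best_split(rows, labels)
```

In this toy example the heroin feature separates the two classes perfectly, so it is chosen over gender. A full tree would apply the same search recursively to each resulting group.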

The field of neural networks (NNs) was originally kindled by psychologists and neurobiologists who sought to develop and test computational analogues of neurons [24]. Roughly speaking, an NN is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples. NNs involve long training times and are, therefore, more suitable for applications where long training time is feasible. They require a number of parameters that are typically best determined empirically, such as the network topology or “structure”. Several topologies of NNs can be used in binary classification problems. Two of the most commonly used NNs are the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network. The main differences between these two NNs lie in the activation functions of the hidden layers. An NN can model a dataset with a large number of input variables and highly complex nonlinear relationships. The disadvantage of an NN is that it is a “black box” whose output cannot be explicitly interpreted [25,26,27].

Support vector machine (SVM) is based on the fact that, with an appropriate nonlinear mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane [28]. SVM separates a given set of binary-labeled training data with the hyperplane that is maximally distant from them (known as the maximal margin hyperplane). Data are then classified according to which side of the hyperplane they lie on. The SVM model provides efficient solutions to classification problems without any assumption about the distribution of the data, and it models nonlinearity in the variables based on structural risk minimization [18]. The main disadvantage of SVM is that several key parameters, such as the kernel function, must be set correctly to attain the best results for a particular problem.

Logistic regression (LR) is a standard statistical generalized linear model (GLM) approach for modeling binary outcomes [29]. In this approach, the logit of the conditional probability of the dependent variable (history of drug injection) is formulated as a linear function of the independent variables. The slope parameters in a logistic model can be interpreted as log odds ratios. A simple linear structure, widely available fitting software and some flexibility in dealing with categorical variables are the main advantages of LR. However, the LR method is sensitive to the choice of independent variables, and the researcher must choose them correctly before using it.
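The link between a slope parameter and an odds ratio can be checked numerically with a short sketch. The intercept and coefficients below are hypothetical values chosen for illustration, not estimates from this study.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Inverse logit of the linear predictor intercept + sum(coef * x)."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical model with two binary predictors.
intercept, coefs = -2.0, [0.4, 1.1]

# Change only the first predictor from 0 to 1, holding the second fixed.
p_without = predicted_probability(intercept, coefs, [0, 1])
p_with = predicted_probability(intercept, coefs, [1, 1])
ratio = odds(p_with) / odds(p_without)  # equals exp(0.4) up to rounding
```

The ratio of the two odds equals the exponential of the first coefficient, which is why a slope can be read as a log odds ratio regardless of the values of the other predictors.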

All the models were fitted with the variables introduced in Tables 1 and 2. Seventy percent of the data was used as training data and 30% as testing data. IBM SPSS Modeler 14.2 was used for data analysis.

Imbalanced dataset

Our dataset was imbalanced: there were 1824 PWNID records and 259 PWID records. An imbalanced dataset is a challenge for data mining models because standard classification algorithms usually assume a balanced training set, which biases them towards the majority class. A number of solutions to the class-imbalance problem have been proposed at both the sampling and algorithmic levels [30]. At the sampling level, these solutions include many different forms of re-sampling, such as random oversampling, random under-sampling, and combinations of the two. Random under-sampling seeks to balance the two classes by reducing the size of the majority class. This is accomplished by randomly removing instances from the majority class until the desired class ratio has been achieved. Alternatively, random oversampling seeks to improve the class balance by increasing the size of the minority class. The increase is performed by randomly duplicating instances from the minority class until the desired class ratio has been achieved [31]. At the algorithmic level, solutions include adjusting the costs of the various classes to counter the class imbalance, and adjusting the probabilistic estimate at the tree leaf (when working with decision trees). In this research, a combination of oversampling and under-sampling was used for NN and LR. For the DT method, a combination of oversampling and under-sampling as well as the cost method were used. Since the result for the SVM without any correction for class imbalance was acceptable, we did not address the imbalance for the SVM model.
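Combined random under-sampling and oversampling can be sketched as below. The class sizes (1824 and 259) come from the text, but the target of 1000 records per class and the fixed seed are assumptions made for illustration only; the records are placeholders rather than real data.

```python
import random

def undersample(records, target_size, rng):
    """Randomly keep target_size records, without replacement."""
    return rng.sample(records, target_size)

def oversample(records, target_size, rng):
    """Keep every record and add random duplicates until target_size is reached."""
    extra = rng.choices(records, k=target_size - len(records))
    return list(records) + extra

rng = random.Random(0)  # fixed seed so the sketch is reproducible
majority = [("PWNID", i) for i in range(1824)]  # placeholder records
minority = [("PWID", i) for i in range(259)]

# Hypothetical target: 1000 records per class.
balanced = undersample(majority, 1000, rng) + oversample(minority, 1000, rng)
```

Under-sampling discards majority-class information, while oversampling repeats minority-class records, so combining the two limits how far either class has to be distorted.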

Implementation and performance criteria

To compare the models, we used 10-fold cross-validation: in each fold, 90% of the subjects were used for training and the remaining 10% for validation, and the process was repeated 10 times. Sensitivity, specificity, total accuracy, positive likelihood ratio, negative likelihood ratio and Kappa were then used to compare the models; they were calculated with the following formulas:
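The fold construction in 10-fold cross-validation can be sketched in a few lines. The record count of 2083 is the 2098 records minus the 15 excluded outliers; the shuffling seed and the way records are dealt into folds are assumptions, since the software's internal procedure is not described.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(2083, 10))
```

Each record appears in exactly one validation fold, so every record is used once for validation and nine times for training.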

$$ \text{Sensitivity}=\frac{TP}{TP+FN},\quad \text{Specificity}=\frac{TN}{TN+FP},\quad \text{Total accuracy}=\frac{TP+TN}{TP+FP+TN+FN} $$
$$ \text{Positive likelihood ratio}=\frac{\text{Sensitivity}}{1-\text{Specificity}} $$
$$ \text{Negative likelihood ratio}=\frac{1-\text{Sensitivity}}{\text{Specificity}} $$
$$ \text{Kappa}=\frac{P_o-P_e}{1-P_e},\quad P_o=\frac{TP+TN}{TP+FP+TN+FN},\quad P_e=\frac{(TP+FP)(TP+FN)+(FN+TN)(FP+TN)}{(TP+FN+TN+FP)^2} $$

Here, TP, FP, TN, and FN represent the numbers of true positives, false positives, true negatives, and false negatives, respectively. Classification models indicate the importance of a variable by the percentage increase in prediction error when that variable is removed: the variable whose removal creates the most error is the most important. After the importance of each variable is scored, the variables are ranked accordingly.
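As a worked illustration, the six criteria can be computed from a single confusion matrix. The counts below are hypothetical and are not results from this study; the chance-agreement term uses the standard Cohen's kappa form, (TP+FP)(TP+FN) + (FN+TN)(FP+TN) divided by the squared total.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the comparison criteria from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # The positive likelihood ratio is unbounded when specificity is exactly 1.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity
    p_o = accuracy  # observed agreement
    # Chance agreement in the standard Cohen's kappa form.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "kappa": kappa,
    }

# Hypothetical counts: 50 true PWID and 100 true PWNID.
metrics = classification_metrics(tp=40, fp=10, tn=90, fn=10)
```

With these counts, sensitivity is 0.8, specificity is 0.9, the positive likelihood ratio is about 8, and Kappa is about 0.7.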


Data mining models

Decision tree

The number of variables in this research was large. Therefore, we used the C5.0 decision tree, which can automatically winnow the variables before a classifier is constructed, discarding those that appear to be only marginally relevant. This algorithm generates smaller classifiers with higher predictive accuracy and can often reduce the time required to generate rule sets. The decision tree (DT) was built with three different methods: a) a combination of oversampling and undersampling, b) the cost method, and c) a combination of the first and second methods. Different parameter settings were tested, and the best result was obtained with the first method. For the training samples, the PWNID and PWID samples were multiplied by 0.6 and 4, respectively. Expected noise was set to zero. In the software, the mode was set to simple and the favor option to accuracy. The most informative variables, according to the variable importance values estimated by the DT model, are shown in Fig. 2.

Fig. 2

Importance of variables estimated by the decision tree

Neural network

In this research, the multilayer perceptron was trained with 30 inputs (one for each predictor) in the input layer and two hidden layers with 30 and 18 neurons. The number of neurons in the hidden layers was iteratively adjusted by the software to minimize classification errors in the training dataset. Maximum training time and overfit prevention were set to 15 min and 30%, respectively. Figure 3 shows the importance of the variables associated with drug injection according to the NN model.

Fig. 3

Importance of variables estimated by the neural network

Support vector machine

A polynomial function was used as the kernel for the SVM model because it gave better results than other kernel functions on our dataset. The regularization (C) and degree parameters were optimized by trying different values; the best values obtained were 15 and 3, respectively. We used expert mode, and the stopping criterion was set to 0.001. The SVM model ranked all of the variables associated with drug injection, and the final results are shown in Fig. 4.
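For readers unfamiliar with polynomial kernels, a minimal sketch of the kernel function follows. Only the degree of 3 comes from the text; the common form (gamma * x·y + coef0)^degree and the default values of gamma and coef0 are assumptions, since the exact parameterization used by the software is not reported.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree for two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

# Hypothetical binary feature vectors for two records.
a = (1, 0, 1)
b = (1, 1, 0)
similarity = polynomial_kernel(a, b)
```

The two vectors share one active feature, so their dot product is 1 and the kernel value is (1 + 1) cubed, which is 8. The kernel replaces the explicit mapping to a high-dimensional space: the SVM only ever needs these pairwise values.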

Fig. 4

Importance of variables estimated by the support vector machine

Logistic regression

Based on p < 0.05, the backward stepwise logistic regression (LR) model identified occupational status, education, the first used drug, number of years of drug use, motivation for starting drug use, number of referrals to drug treatment centers, family history of drug use, history of taking heroin, history of taking hashish, history of taking cocaine, history of taking hallucinogens, history of taking crystal, history of taking methadone, history of suicide, and history of prison as statistically significant factors associated with drug injection (Table 3). The reference category was “having no history of injection”.

Table 3 Logistic regression model

Model comparison

Table 4 shows the total accuracy, sensitivity, specificity, positive likelihood ratio, negative likelihood ratio (mean and standard deviation) and Kappa estimated by cross-validation on the testing set for each model. The results indicated that the reliability indices of the SVM model were higher than those of the other three models.

Table 4 Mean and standard deviation of total accuracy, sensitivity, specificity, positive likelihood ratio, negative likelihood ratio and Kappa statistic for DT, NN, SVM and LR

Applying logistic regression to important variables of the SVM model

The SVM model identifies the important variables but does not show which subset of these variables is significant. For this reason, we fitted a logistic regression with the six major variables that had importance greater than 0.05 (history of taking heroin, history of taking cocaine, history of taking hallucinogens, history of prison, motivation for starting drug use, and occupational status) as independent variables and history of drug injection as the dependent variable. The reference category was “having no history of injection”. The results are shown in Table 5.

Table 5 Logistic regression model based on the six important variables of the SVM model

Table 5 shows that the odds of transition to drug injection for unemployed people were 1.495 times those for housewives. The odds ratios for self-employed and employed people relative to housewives were 0.782 and 0.362, respectively, indicating lower odds of transition to drug injection. The results also revealed that a history of prison and a history of taking heroin, hallucinogens, and cocaine are other important factors. Our findings indicated that the odds of transition to injection for people who started using drugs because of curiosity were 1.478 times those for people who started because of unemployment. The odds ratios for people who started using drugs because of pleasure, drug use of friends, curiosity, emotional and mental distress, use as a pain reliever, and other reasons, relative to unemployment, were lower than 1.


This research aimed to determine risk factors associated with transition to injection among the PWUD referred to drug use treatment centers in Kermanshah Province in 2013, using logistic regression, decision tree, neural network and support vector machine. Based on the reliability indices, the SVM model outperformed the other models. Therefore, this model was used for further interpretation.

Our findings indicated unemployment as a risk factor associated with drug use transition to injection. This result is consistent with the findings of Abelson et al. (2006) [32], who reported that an unreliable source of income was a determining factor in transition to injection. Results of the SVM further showed that histories of taking heroin, hallucinogens, and cocaine are other important factors. Notably, the decision tree model also ranked histories of taking heroin and cocaine as the most important variables. Harocopos et al. (2009) and Neaigus et al. (2006) reported that many PWNID used heroin and cocaine before injection [16, 33]. Rahimi-Movaghar et al. (2015) reported that heroin and opium were the predominant drugs used before the first injection [34]. Also, Cheng et al. (2006) stated that the rate of transition to injection use in Iran and other countries in the Middle and South Asia, which have higher rates of heroin use among PWNID, was higher than in areas with higher use of stimulants [35].

Hallucinogens are new addictive substances that, like heroin and cocaine, produce different sensations in PWUD compared with traditional substances (opium and sap). Hallucinogenic substances were not examined in previous research; therefore, they were added to our research.

In the present research, a history of prison was another factor identified as contributing to the transition to injection. Since injection is smokeless and odorless, imprisoned PWUD prefer it in prison. Low availability, poor quality, and high cost of drugs are the main factors that facilitate the transition to injection in prison [1]. This finding is in line with the results of studies conducted in other developing countries [1, 35,36,37]. Carles (2005) found that imprisonment increased the probability of transition to injection [37]. Between 6 and 48% of prisoners had injected drugs at some point in their lives [38].

The motivation for starting drug use has not been considered in previous research; therefore, it was added to our research. Our results showed that people who start to use drugs because of curiosity are at higher risk of transition to injection.


There were some limitations in this research. First, this was a cross-sectional study, so the temporal relationship between exposure and outcome cannot be properly established; however, because only cases for whom injection was the most recent route of drug use were enrolled, the findings can be considered largely valid. Second, we selected potential risk factors associated with drug use transition to injection from the drug use literature. There may be other factors not mentioned in the literature that could have been identified by interviewing experts.


The aim of this research was to identify risk factors associated with drug use transition to injection, employing four classification methods (decision tree, neural network, support vector machine, and logistic regression).

According to the findings, heroin, cocaine and hallucinogenic substances can play an important role in the transition of PWUD to injection. Efforts to reduce the use of these substances in society should be increased. Those who use them should also receive more support and monitoring, as they are more susceptible to transition to injection. PWUD with a history of imprisonment are another group at risk. The entrance and exit channels of prisons should be further scrutinized to prevent the entry of drugs. Policymakers should also provide treatment services for PWUD in prisons.

With respect to drug use, since unemployment and unreliable sources of income are important factors, creating jobs for PWUD is essential.

Availability of data and materials

The data is available to authors.



CART: Classification and Regression Trees
DT: Decision tree
HBV: Hepatitis B virus
HCV: Hepatitis C virus
HIV: Human immunodeficiency virus
LR: Logistic regression
MMT: Methadone maintenance treatment
NN: Neural network
PWID: People who inject drugs
PWNID: People who do not inject drugs (people who smoke, inhale, snort or swallow drugs)
PWUD: People who use drugs
SVM: Support vector machine


1. EMCDDA (2018), European drug report 2018: health and social responses to drug problems in prisons (available at
2. Zibbell JE, Asher AK, Patel RC, Kupronis B, Iqbal K, Ward JW, Holtzman D. Increases in acute hepatitis C virus infection related to a growing opioid epidemic and associated injection drug use, United States, 2004 to 2014. Am J Public Health. 2018;108(2):175–81.
3. Kim WR. Global epidemiology and burden of hepatitis C. Microbes Infect. 2002;4(12):1219–25.
4. Alavian SM, Gholami B, Masarrat S. Hepatitis C risk factors in Iranian volunteer blood donors: a case–control study. J Gastroenterol Hepatol. 2002;17(10):1092–7.
5. Hagan H, Pouget E, Des Jarlais D, Lelutiu-Weinberger C. Meta-regression of hepatitis C virus infection in relation to time since onset of illicit drug injection: the influence of time and place. Am J Epidemiol. 2008;168(10):1099–109.
6. Amin-Esmaeili M, Rahimi-Movaghar A, Haghdoost AA, Mohraz M. Evidence of HIV epidemics among non-injecting drug users in Iran: a systematic review. Addiction. 2012;107(11):1929–38.
7. UNODC, 2016. World drug report. United Nations Office on drugs and crime. (Available at:
8. Vlahov D, Junge B. The role of needle exchange programs in HIV prevention. Public Health Rep. 1998;113(Suppl 1):75.
9. Islam MM, Wodak A, Conigrave KM. The effectiveness and safety of syringe vending machines as a component of needle syringe programmes in community settings. Int J Drug Policy. 2008;19(6):436–41.
10. Rahimi-Movaghar A, Amin-Esmaeili M, Haghdoost A-A, Sadeghirad B, Mohraz M. HIV prevalence amongst injecting drug users in Iran: a systematic review of studies conducted during the decade 1998–2007. Int J Drug Policy. 2012;23(4):271–8.
11. Bridge J. Route transition interventions: potential public health gains from reducing or preventing injecting. Int J Drug Policy. 2010;21(2):125–8.
12. Bluthenthal RN, Kral AH. Next steps in research on injection initiation incidence and prevention. Addiction. 2015;110:1258–9.
13. Vlahov D, Fuller CM, Ompad DC, Galea S, Des Jarlais DC. Updating the infection risk reduction hierarchy: preventing transition into injection. J Urban Health. 2004;81:14–9.
14. Werb D, Garfein R, Kerr T, Davidson P, Roux P, Jauffret-Roustide M, Auriacombe M, Small W, Strathdee SA. A socio-structural approach to preventing injection drug use initiation: rationale for the PRIMER study. Harm Reduct J. 2016;13(1):25.
15. Van Ameijden E, Coutinho R. Large decline in injecting drug use in Amsterdam, 1986–1998: explanatory mechanisms and determinants of injecting transitions. J Epidemiol Community Health. 2001;55(5):356–63.
16. Neaigus A, Gyarmathy VA, Miller M, Frajzyngier VM, Friedman SR, Des Jarlais DC. Transitions to injecting drug use among noninjecting heroin users: social network influence and individual susceptibility. JAIDS J Acquir Immune Defic Syndr. 2006;41(4):493–503.
17. Amini P, Ahmadinia H, Poorolajal J, Amiri MM. Evaluating the high risk groups for suicide: a comparison of logistic regression, support vector machine, decision tree and artificial neural network. Iran J Public Health. 2016;45(9):1179.
18. Tapak L, Mahjub H, Hamidi O, Poorolajal J. Real-data comparison of data mining methods in prediction of diabetes in Iran. Healthcare Inform Res. 2013;19(3):177–85.
19. Chen WH, Hsu SH, Shen HP. Application of SVM and ANN for intrusion detection. Comput Oper Res. 2005;32(10):2617–34.
20. Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. Boca Raton: CRC Press; 1984.
21. Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv (CSUR). 2009;41(3):15.
22. IBM. IBM Knowledge Center [Available from:
23. Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.
24. Bishop CM. Neural networks for pattern recognition. Oxford: Oxford University Press; 1995.
25. Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol. 1996;49(11):1225–31.
26. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform. 2002;35(5–6):352–9.
27. Nielsen MA. Neural networks and deep learning. USA: Determination Press; 2015.
28. Duda RO, Hart PE, Stork DG. Pattern classification. Hoboken: John Wiley & Sons; 2012.
29. Hosmer DW, Lemeshow S. Special topics. Hoboken: Wiley online library; 2000.
30. Chawla NV, Japkowicz N, Kotcz A. Special issue on learning from imbalanced data sets. ACM Sigkdd Explor Newslett. 2004;6(1):1–6.
31. Ganganwar V. An overview of classification algorithms for imbalanced datasets. Int J Emerg Technol Adv Eng. 2012;2(4):42–7.
32. Abelson J, Treloar C, Crawford J, Kippax S, Van Beek I, Howard J. Some characteristics of early-onset injection drug users prior to and at the time of their first injection. Addiction. 2006;101(4):548–55.
33. Harocopos A, Goldsamt LA, Kobrak P, Jost JJ, Clatts MC. New injectors and the social context of injection initiation. Int J Drug Policy. 2009;20(4):317–23.
34. Rahimi-Movaghar A, Amin-Esmaeili M, Shadloo B, Noroozi A, Malekinejad M. Transition to injecting drug use in Iran: a systematic review of qualitative and quantitative evidence. Int J Drug Policy. 2015;26(9):808–19.
35. Cheng Y, Sherman SG, Srirat N, Vongchak T, Kawichai S, Jittiwutikarn J, et al. Risk factors associated with injection initiation among drug users in northern Thailand. Harm Reduct J. 2006;3(1):10.
36. Mehta SH, Sudarshi D, Srikrishnan AK, Celentano DD, Vasudevan CK, Anand S, et al. Factors associated with injection cessation, relapse and initiation in a community-based cohort of injection drug users in Chennai, India. Addiction. 2012;107(2):349–58.
37. Carles March J, Oviedo-Joekes E, Romero M. Injection and non-injection drug use related to social exclusion indicators in two Andalusian cities. Drugs: Educ Prev Policy. 2005;12(6):437–47.
38. EMCDDA. European drug report 2016: trends and developments. Luxembourg: Publications Office of the European Union; 2016. available at


We would like to thank the Vice-Chancellor for Research and Technology of Kermanshah University of Medical Sciences for technical support and the Vice-Chancellor of Research and Technology of Kermanshah University of Technology for their approval and support of this work.


This study was partially funded by Kermanshah University of Medical Sciences (Ethics Committee code No. KUMS.RES.1394.480). Kermanshah University of Medical Sciences provided technical support for the present study.

Author information




Somayeh Najafi-Ghobadi, Khadijeh Najafi-Ghobadi and Leili Tapak conceived the research topic, explored that idea, performed the statistical analysis and drafted the manuscript. Abass Aghaei provided the data. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Somayeh Najafi-Ghobadi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not Applicable.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Cite this article

Najafi-Ghobadi, S., Najafi-Ghobadi, K., Tapak, L. et al. Application of data mining techniques and logistic regression to model drug use transition to injection: a case study in drug use treatment centers in Kermanshah Province, Iran. Subst Abuse Treat Prev Policy 14, 55 (2019).


  • Drug injection
  • Neural network
  • Decision tree
  • Support vector machine
  • Logistic regression