Recruitment
The BioSUD initiative aims to provide a genomic resource for studying phenotypic traits associated with SUDs. We aim to collect and analyze samples from 3,000 individuals, of whom 1,500 have a diagnosis of SUD. As of 1 February 2024, the cohort included 1,806 participants: 1,508 controls (1,046 males and 462 females), recruited exclusively at a single blood donor center (Centro Trasfusionale of the University General Hospital, Bari, Italy), and 298 cases (278 males and 20 females), recruited from several private therapeutic centers and public facilities (detailed below).
The control samples were collected between March and October 2021 during routine blood donation, after written informed consent had been obtained. Before sample collection, donors were informed about the aims and rationale of the project and were given an information sheet.
All cases met the standardized diagnostic criteria for SUDs according to the International Classification of Diseases, 11th Revision (ICD-11) [36], or the DSM-5 TR [1]. Eligible participants meeting these criteria were recruited from private (N = 71) and public (N = 227) healthcare facilities in Apulia, a region of Southern Italy. Specifically, the private center samples were collected from the Therapeutic Community Emmanuel Onlus - Sector Dependencies (Lecce) and the Therapeutic Community "Fratello Sole" - Social Cooperative (Gioia del Colle, BA). The samples from public institutions were gathered at the SerD of Bari (BA), Bitonto (BA), Brindisi (BR), Campi Salentina (LE), Castellaneta (TA), Casarano (LE), Foggia (FG), Francavilla (TA), Galliano del Capo (LE), Gallipoli (LE), Giugliano (LE), Grumo Appula (BA), Lecce (LE), Maglie (LE), Manduria (TA), Martina Franca (TA), Nardò (LE), Ostuni (BR), Poggiardo (LE), San Cesario di Lecce (LE), San Pietro Vernotico (BR), Taranto (TA), and Ugento (LE), and at the SerD in the Brindisi Prison (BR).
Different engagement strategies were employed to encourage volunteer participation among cases. In all facilities, BioSUD members presented the project separately to staff and participants, using multimedia tools such as presentations and short demonstrative videos. After the presentation, the volunteers' willingness to participate in the study was recorded. Written consent and blood samples were collected during scheduled routine examinations in subsequent weeks to reduce participant burden. A dedicated medical professional or psychologist was on-site to oversee the process, including obtaining written consent and assisting with the questionnaire (detailed below). Healthcare professionals on the research team, including doctors and specialized nurses, collected the blood samples. After transport to the BioSUD lab facilities, all blood samples were processed within 72 hours, as described in the Sampling section. The questionnaire data from both cases and controls were subsequently entered into Excel and processed with R Studio, version 4.5.0 [37].
Sampling
Blood samples from controls were collected by a specialized nurse and from cases by healthcare professionals, including physicians and specialized nurses affiliated with the research team. Specifically, after written consent was obtained, venous blood (8 mL) was drawn into a Vacutainer K2 EDTA tube and kept refrigerated until arrival at the processing laboratory at the University of Bari. Within 72 hours of collection, all samples were centrifuged at 0.8 x g for 15 minutes at a 45-degree angle. After separation into layers, 1 mL of plasma was stored at -80°C, while the remaining sample was preserved at -20°C for subsequent DNA extraction and analyses.
DNA extraction and genotyping
DNA was extracted from 250 µL of the layer of nucleated blood cells obtained after centrifugation during the initial processing. The Qiagen DNA Blood Mini Kit was used according to the manufacturer's protocol. The quality and concentration of the extracted DNA were evaluated using a NanoDrop spectrophotometer. Samples with a concentration of at least 30 ng/µL and a 260/280 ratio higher than 1.6 were selected for genotyping, which was performed at the Institute of Genomics of Tartu, Estonia, using the Illumina Global Screening Array (GSA, Illumina Inc.). All samples with a call rate lower than 97% were discarded. To date, 1,279 samples have been genotyped.
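The selection thresholds above can be written as a small filter. The following Python sketch is illustrative only and is not the authors' pipeline: the `Sample` record, its field names, and both function names are hypothetical, and it assumes the call rate is stored as a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text.
MIN_CONC_NG_PER_UL = 30.0  # concentration of at least 30 ng/µL
MIN_A260_280 = 1.6         # 260/280 ratio strictly higher than 1.6
MIN_CALL_RATE = 0.97       # call rates below 97% are discarded


@dataclass
class Sample:  # hypothetical record, for illustration only
    sample_id: str
    conc_ng_per_ul: float
    a260_280: float
    call_rate: Optional[float] = None  # unknown until the sample is genotyped


def eligible_for_genotyping(s: Sample) -> bool:
    """Pre-genotyping check on DNA concentration and purity."""
    return s.conc_ng_per_ul >= MIN_CONC_NG_PER_UL and s.a260_280 > MIN_A260_280


def passes_call_rate(s: Sample) -> bool:
    """Post-genotyping check; samples without a call rate do not pass."""
    return s.call_rate is not None and s.call_rate >= MIN_CALL_RATE
```

Keeping the two checks separate mirrors the order in the text: concentration and purity are assessed before genotyping, and call rate only afterwards.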
Dataset
The newly generated genotypes were combined with publicly available datasets comprising 4,551 individuals from 140 different populations, 107 of which are Eurasian [38–49], using the --bmerge option of PLINK version 1.9 [50]. Before merging, we removed from each dataset all markers and individuals with more than 5% missingness. The resulting dataset includes 3188,788 autosomal variants from 5,830 individuals. Further filtering steps are detailed in the following sections.
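The filtering and merging steps can be sketched as PLINK 1.9 command lines assembled in Python. This is an assumption about how such a step might be scripted, not the authors' actual commands: the function names and file prefixes are hypothetical, and only the 5% threshold and the use of `--bmerge` come from the text.

```python
from typing import List

MAX_MISSING = 0.05  # 5% missingness threshold from the text


def missingness_filter_cmd(prefix: str, out: str) -> List[str]:
    """Drop variants (--geno) and individuals (--mind) above the threshold."""
    return [
        "plink", "--bfile", prefix,
        "--geno", str(MAX_MISSING),
        "--mind", str(MAX_MISSING),
        "--make-bed", "--out", out,
    ]


def merge_cmd(base: str, other: str, out: str) -> List[str]:
    """Merge a second binary fileset into the base fileset with --bmerge."""
    return [
        "plink", "--bfile", base,
        "--bmerge", f"{other}.bed", f"{other}.bim", f"{other}.fam",
        "--make-bed", "--out", out,
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `plink` executable is on the PATH.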
Principal Component Analysis (PCA)
To explore the genetic variation of the BioSUD cohort, we performed a Principal Component Analysis. We kept only the Eurasian populations from the complete dataset; loci and individuals with missingness rates higher than 5% were discarded. After pruning SNPs in high linkage disequilibrium (--indep-pairwise 200 50 0.4), 69,359 SNPs and 3,530 samples remained, and the PLINK files were converted to EIGENSTRAT format using convertf (version 5722).
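LD pruning in PLINK 1.9 is a two-step procedure: `--indep-pairwise` writes a list of variants to keep, and a second call extracts them. A minimal Python sketch follows, assuming hypothetical file prefixes and function name; only the `200 50 0.4` parameters (window of 200 variants, step of 50 variants, r² threshold of 0.4) are taken from the text.

```python
from typing import List


def ld_prune_cmds(prefix: str, out: str) -> List[List[str]]:
    """Return the two PLINK calls needed to produce an LD-pruned fileset."""
    # Step 1: compute the pruned variant list; PLINK writes <out>.prune.in.
    prune = [
        "plink", "--bfile", prefix,
        "--indep-pairwise", "200", "50", "0.4",
        "--out", out,
    ]
    # Step 2: keep only the variants listed in <out>.prune.in.
    extract = [
        "plink", "--bfile", prefix,
        "--extract", f"{out}.prune.in",
        "--make-bed", "--out", f"{out}_pruned",
    ]
    return [prune, extract]
```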
PCA was performed using SmartPCA (version 16000) from the EIGENSOFT package [51]. Specifically, we projected the BioSUD samples into the principal component space inferred from all other Eurasian individuals (using the poplistname option). Outliers were automatically removed with the numoutlieriter, numoutlierevec, and outliersigmathresh options set to their default values. This step removed 101 samples, reducing the sample size to 3,429 individuals. After visual inspection of the PCA plot, we identified and removed three additional BioSUD participants falling outside the genetic variability of the cohort, likely due to ancestral discrepancies (Fig. S1 – Supplementary Materials). Thus, the final dataset was composed of 3,426 individuals.
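The projection setup can be illustrated with a SmartPCA parameter file generated in Python. This sketch is an assumption about the configuration rather than the authors' file: the function names, file prefix, and population-list filename are hypothetical. The outlier options are deliberately left out so that SmartPCA falls back to its defaults, as the text states. A small helper reproduces the sample accounting reported above.

```python
def smartpca_parfile(prefix: str, poplist: str) -> str:
    """Build a smartpca parameter file for EIGENSTRAT-format input.

    PCs are inferred only from the populations listed in `poplist`;
    all other individuals (here, BioSUD) are projected onto them.
    numoutlieriter, numoutlierevec and outliersigmathresh are omitted,
    so the program defaults apply.
    """
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {prefix}.evec",
        f"evaloutname: {prefix}.eval",
        f"poplistname: {poplist}",
    ]
    return "\n".join(lines) + "\n"


def remaining_samples(start: int, auto_removed: int, manual_removed: int) -> int:
    """Samples left after automatic and manual outlier removal."""
    return start - auto_removed - manual_removed
```

With the counts given in the text, `remaining_samples(3530, 101, 3)` returns 3426, matching the final dataset size.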
ROH estimation
The --homozyg function in PLINK was used to detect Runs Of Homozygosity (ROHs) containing at least 50 SNPs. The minimum ROH length was set at 1,500 kb to exclude short ROHs due to Linkage Disequilibrium (LD). ROHs were detected by scanning along the genotypes of each individual in the BioSUD cohort and of all other individuals sharing the same set of SNPs.
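The two thresholds map onto PLINK's `--homozyg-snp` and `--homozyg-kb` options. The Python sketch below assembles such a call; the function name and file prefixes are hypothetical, and it assumes that all other `--homozyg` settings (such as the scanning-window parameters) are left at PLINK's defaults, which the text does not specify.

```python
from typing import List


def roh_cmd(prefix: str, out: str, min_snps: int = 50, min_kb: int = 1500) -> List[str]:
    """PLINK call reporting ROHs with at least min_snps SNPs and min_kb kb."""
    return [
        "plink", "--bfile", prefix,
        "--homozyg",
        "--homozyg-snp", str(min_snps),
        "--homozyg-kb", str(min_kb),
        "--out", out,
    ]
```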
Admixture analysis
To increase the number of SNPs analyzed while maintaining a proper sample size for the population, the analysis was confined to data from the 1000 Genomes Project [38] and Raveane et al. [52]. The resulting dataset comprised 2,697 individuals: 1,408 from European (Finnish: FIN; Central Europeans: CEU; British from England and Scotland: GBR; Iberians from Spain: IBS; Italians: ITA; Tuscans: TSI), African (Luhya from Webuye, Kenya: LWK; Yoruba from Nigeria: YRI), and Asian (Gujarati Indians from Houston: GIH; Dai Chinese: CDX; Japanese from Tokyo: JPT) populations, and 1,279 from the BioSUD sample.
We carried out an ADMIXTURE analysis to further explore the genetic similarity of the BioSUD samples with other populations.
We performed ten independent repetitions (using time as a starting point for randomization with the --seed option) for each K value ranging from two to ten using the ADMIXTURE software tool (version 1.3) [53]. The "optimal" number of K was inferred using the cross-validation (CV) procedure with the --cv option, and the lowest CV error was observed at K = 7 and 8 (Fig. S2 – Supplementary Materials). We first conducted an ADMIXTURE analysis excluding the BioSUD data. After obtaining the initial results, we projected the BioSUD data onto the resulting ADMIXTURE profiles (-P flag) to integrate and analyze their genetic structure within the established framework.
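The run design (nine K values, ten repetitions each, followed by projection) can be sketched as a list of ADMIXTURE command lines built in Python. This is an illustrative assumption, not the authors' script: the function names and `.bed` filenames are hypothetical. Repeated runs for the same K write to the same output filenames, so in practice each repetition would need its own working directory, and projection with `-P` expects the reference allele frequencies in a `.P.in` file next to the study data.

```python
from typing import Dict, Iterable, List


def admixture_cv_cmds(bed: str, k_values: Iterable[int] = range(2, 11),
                      repeats: int = 10) -> List[List[str]]:
    """One cross-validated, time-seeded run per (K, repetition) pair."""
    return [
        ["admixture", "--cv", "-s", "time", bed, str(k)]
        for k in k_values
        for _ in range(repeats)
    ]


def projection_cmd(bed: str, k: int) -> List[str]:
    """Project new samples onto previously learned ancestral frequencies."""
    return ["admixture", "-P", bed, str(k)]


def lowest_cv_k(cv_errors: Dict[int, float]) -> int:
    """Return the K with the smallest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)
```

With the default arguments, `admixture_cv_cmds` yields 90 commands, one for each of the ten repetitions at each K from two to ten.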
Questionnaire
Each participant filled out a tailored questionnaire to assess various aspects of substance use, including nicotine, alcohol, cocaine, heroin, cannabis, and other substances (Tables S1 and S2 - Supplementary Materials). The questionnaire covers the frequency, amounts, and patterns of substance use and uses the DSM-5 TR checklist to assess potential SUDs [1]. Additionally, it investigates behaviors associated with substance use, emphasizing psychosocial elements such as family dynamics, peer groups, substance accessibility, and social environments experienced in adolescence.
To accommodate the specific challenges and time constraints often encountered by individuals struggling with addiction outside of structured rehabilitation settings, the questionnaire given to the other patients was shorter (165 items) than the one administered to the 'Emmanuel' group and control participants (233 items). This adjustment aimed to optimize data collection efficiency without compromising essential information. The survey encompasses three main sections: sociodemographic, psychosocial, and substance use (Fig. 1).
The sociodemographic section collects a broad spectrum of information about participants, considering the potential influence of diverse life aspects on the study's outcomes. Critical demographic details, such as gender, age, education, marital status (including the number of children), place of residence, place of birth, income, employment status, self-reported health, and family background (parents, siblings, and other caregivers) were included.
The survey's psychosocial section focuses on life events affecting psychological and social well-being. It explores experiences such as early separations from parents, parental divorce, and relocations, scrutinizing potential impacts on individuals (Tables S1 and S2 - Supplementary Materials). Closely tied to the substance use section, our exploration extended to the exposure to substances in the social context, encompassing both family and peers. In addition, we investigated substance accessibility in the cities where participants lived and the perceived level of safety. Furthermore, this section explored adverse events across life stages, including grief, accidents, illness, violent crimes, sexual abuse, and other painful events. Participants reported occurrences in age categories (< 14, 14–18, 18–25, > 25 years), and the overall incidence was quantified by calculating cumulative events across categories.
In the relationship quality subsection, participants rated the quality of their relationships with fathers, mothers, siblings, and peers on a scale ranging from "Very Poor" to "Very Good". These categorical ratings were converted into a 5-point numerical scale (1 to 5). For analytical purposes, scores were summed across family members and then reclassified into categories: 1–3 as "Very Poor", 4–6 as "Poor", 7–9 as "Average", 10–12 as "Good", and 13–15 as "Very Good".
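The scoring can be illustrated with a short Python sketch. Several details are assumptions rather than statements from the text: the three intermediate rating labels are taken to match the reclassified category names, the family score is taken to sum three ratings (father, mother, siblings), which gives the 3 to 15 range implied by the bins, and all names are hypothetical.

```python
from typing import Iterable

# Assumed 5-point coding; only the two end labels are given in the text.
RATING_TO_SCORE = {
    "Very Poor": 1, "Poor": 2, "Average": 3, "Good": 4, "Very Good": 5,
}
CATEGORIES = ["Very Poor", "Poor", "Average", "Good", "Very Good"]


def family_relationship_score(ratings: Iterable[str]) -> int:
    """Sum the numeric scores of the individual family ratings."""
    return sum(RATING_TO_SCORE[r] for r in ratings)


def relationship_category(total: int) -> str:
    """Map a cumulative score to its bin: 1-3, 4-6, 7-9, 10-12, 13-15."""
    if not 1 <= total <= 15:
        raise ValueError("cumulative score must be between 1 and 15")
    return CATEGORIES[(total - 1) // 3]
```

For example, ratings of "Good", "Average", and "Very Good" sum to 12, which falls in the "Good" bin.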
The substance use section explores substance consumption, encompassing nicotine, alcohol, cannabis, cocaine, heroin, and other substances (e.g., amphetamines, MDMA, ecstasy, and hallucinogens). We tailored questions for each substance to identify potential use disorders. We used the DSM-5 TR checklist [1] to examine substance use across the various categories, in line with our research goals. The exposure subsection explores family, peer, and accessibility influences on substance use, considering social settings and craving behaviors. Subjective feelings, including relief, reward, and obsession, are measured. A Visual Analogue Scale (VAS) was also used for the craving assessment. Positive responses to questionnaire items on substance consumption were assigned a value of 1 and summed to create a 'family substance consumption' score. This score was then categorized into five ordinal levels based on its average and standard deviation: 'None' (0 positive responses), 'Low' (1–2 positive responses), 'Average' (3–4 positive responses), 'High' (5–6 positive responses), and 'Very High' (seven or more positive responses).
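The 'family substance consumption' score can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, item responses are represented as booleans, and a score of seven or more is assumed to fall in the top level.

```python
from typing import Iterable


def family_consumption_score(responses: Iterable[bool]) -> int:
    """Count positive responses; each positive item contributes 1."""
    return sum(1 for r in responses if r)


def family_consumption_level(score: int) -> str:
    """Map the summed score to one of the five ordinal levels."""
    if score < 0:
        raise ValueError("score cannot be negative")
    if score == 0:
        return "None"
    if score <= 2:
        return "Low"
    if score <= 4:
        return "Average"
    if score <= 6:
        return "High"
    return "Very High"  # assumption: seven or more positive responses
```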
Figure 1: Overview of Questionnaire Sections: Sociodemographic, Psychosocial, and Substance Use Assessment Variables.
*Areas investigated in Controls and Emmanuel questionnaire only.