FinnGen study data release 10 and ethics
Launched in 2017, the FinnGen study (https://www.finngen.fi/en) is a public-private research project that will, over time, combine the genome and digital health care data of 500,000 Finns. The goal of the project is to provide novel, medically and therapeutically useful insights into human diseases. The FinnGen study is a pre-competitive collaboration between Finnish biobanks and their supporting organizations (universities and university hospitals), international pharmaceutical industry partners, and a Finnish biobank cooperative (FINBB). In the current study, we included data that were released in autumn 2022 (Release 10). Release 10 comprised 430,897 pre-QC samples, 412,181 post-QC samples, 429,207 individuals with endpoints, and 429,861 individuals with detailed longitudinal data. Information about disease diagnoses and the procedures performed was collected from the Care Register for Health Care (THL) and the Causes of Death Register (Statistics Finland). As the population of northern and eastern Finland has expanded dramatically in isolation, following a series of bottlenecks, it harbors numerous deleterious alleles at a high frequency [42].
Ethics statement and materials & methods:
Patients and control subjects in the FinnGen study provided informed written consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected before the Finnish Biobank Act came into force (September 2013) and before the start of the FinnGen study (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by the Finnish Medicines Agency (Fimea), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement number for the FinnGen study is Nr HUS/990/2017.
The FinnGen study is approved by the Finnish Institute for Health and Welfare (permit numbers: THL/2031/6.02.00/2017, THL/1101/5.05.00/2017, THL/341/6.02.00/2018, THL/2222/6.02.00/2018, THL/283/6.02.00/2019, THL/1721/5.05.00/2019 and THL/1524/5.05.00/2020), Digital and population data service agency (permit numbers: VRK43431/2017-3, VRK/6909/2018-3, VRK/4415/2019-3), the Social Insurance Institution (permit numbers: KELA 58/522/2017, KELA 131/522/2018, KELA 70/522/2019, KELA 98/522/2019, KELA 134/522/2019, KELA 138/522/2019, KELA 2/522/2020, KELA 16/522/2020), Findata permit numbers THL/2364/14.02/2020, THL/4055/14.06.00/2020, THL/3433/14.06.00/2020, THL/4432/14.06/2020, THL/5189/14.06/2020, THL/5894/14.06.00/2020, THL/6619/14.06.00/2020, THL/209/14.06.00/2021, THL/688/14.06.00/2021, THL/1284/14.06.00/2021, THL/1965/14.06.00/2021, THL/5546/14.02.00/2020, THL/2658/14.06.00/2021, THL/4235/14.06.00/2021, Statistics Finland (permit numbers: TK-53-1041-17 and TK/143/07.03.00/2020 (earlier TK-53-90-20) TK/1735/07.03.00/2021, TK/3112/07.03.00/2021), and the Finnish Registry for Kidney Diseases permission/extract from the meeting minutes on 4th July 2019.
The Biobank Access Decisions for FinnGen samples and the data utilized in FinnGen Data Freeze 10 include: THL Biobank BB2017_55, BB2017_111, BB2018_19, BB_2018_34, BB_2018_67, BB2018_71, BB2019_7, BB2019_8, BB2019_26, BB2020_1, BB2021_65, Finnish Red Cross Blood Service Biobank 7.12.2017, Helsinki Biobank HUS/359/2017, HUS/248/2020, HUS/150/2022 § 12, §13, §14, §15, §16, §17, §18, and §23, Auria Biobank AB17-5154 and amendment #1 (August 17 2020) and amendments BB_2021-0140, BB_2021-0156 (August 26 2021, Feb 2 2022), BB_2021-0169, BB_2021-0179, BB_2021-0161, AB20-5926 and amendment #1 (April 23 2020) and its modification (Sep 22 2021), Biobank Borealis of Northern Finland_2017_1013, 2021_5010, 2021_5018, 2021_5015, 2021_5023, 2021_5017, 2022_6001, Biobank of Eastern Finland 1186/2018 and amendments 22 § /2020, 53§/2021, 13§/2022, 14§/2022, 15§/2022, Finnish Clinical Biobank Tampere MH0004 and amendments (21.02.2020 & 06.10.2020), §8/2021, §9/2022, §10/2022, §12/2022, §20/2022, §21/2022, §22/2022, §23/2022, Central Finland Biobank 1-2017, and Terveystalo Biobank STB 2018001 and amendment 25th Aug 2020, the Finnish Hematological Registry and Clinical Biobank decision 18th June 2021, Arctic biobank P0844: ARC_2021_1001.
Phenotype descriptions and analysis
We began by performing a genome-wide association study (GWAS) of all instances of chronic non-suppurative otitis media. Inclusion in the case cohort required at least one entry of the International Classification of Diseases, Tenth Revision (ICD-10) codes H65.2 or H65.3 or the International Classification of Diseases, Ninth Revision (ICD-9) codes 3812A or 3811A. Individuals with no record of chronic secretory otitis media (ICD-10 code H65.2 or ICD-9 code 3811A), chronic mucous otitis media (ICD-10 code H65.3 or ICD-9 code 3812A), nonspecific otitis codes (ICD-10 code H65.9, ICD-10 code H65.4, or ICD-9 code 3813X) or non-suppurative otitis (ICD-10 code H65.1 or ICD-9 code 3810A) were included in the control cohort. The GWAS of overall non-suppurative otitis media was performed with a case cohort of individuals with either secretory or mucous otitis media. The GWAS of chronic secretory otitis media was performed with a case cohort of individuals with chronic secretory otitis media (ICD-10 code H65.2). All cases with reported chronic secretory otitis media were included without any exclusion criteria. Finally, a case cohort of individuals with chronic mucous otitis media (ICD-10 code H65.3) was used for a GWAS of chronic mucous otitis media, again without any exclusion criteria. For all three GWAS, the control cohort remained unchanged. Chronic non-suppurative otitis media can be secretory or mucous. In clinical practice, however, the boundary between the two can be hazy, and the diagnoses are known to overlap. Recognizing a risk of bias, we identified 608 cases in our data with both ICD-10 codes. After exclusion of the overlapping cases, 1,081 chronic secretory otitis media cases remained. The number of cases with only chronic mucous otitis media (excluding overlapping cases) was 5,345.
However, we did not exclude the 608 overlapping cases from our primary analyses, as this was a considerable number of cases to remove from a case cohort of 1,689 individuals (ICD-10 code H65.2). Therefore, according to the "less strict" definition, there were 1,689 participants in the case cohort for secretory otitis media and 5,953 participants in the case cohort for mucous otitis media. The overall number of cases of chronic secretory or mucous otitis media was 7,034, which included cases with either one or both diagnoses. The control cohort included 417,745 individuals.
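The case and control definitions above amount to set logic over each person's diagnosis codes. The following Python sketch is illustrative only and is not the FinnGen endpoint pipeline: the function names, the dot-free code spelling, the literal matching of 3813X, and the choice to count the ICD-9 equivalents towards the subtype cohorts are all assumptions. The last lines re-check the reported cohort sizes.

```python
# Illustrative sketch only; not the FinnGen endpoint-definition code.
# Assumption: each person is represented by a collection of ICD-9/ICD-10 code strings.

SECRETORY = {"H652", "3811A"}   # chronic secretory otitis media
MUCOUS = {"H653", "3812A"}      # chronic mucous otitis media
# Codes that keep a person out of the control cohort without making them a case.
CONTROL_EXCLUSIONS = {"H659", "H654", "3813X", "H651", "3810A"}


def normalise(code: str) -> str:
    """Upper-case a code and drop the dot, so 'h65.2' becomes 'H652'."""
    return code.replace(".", "").upper()


def classify(codes) -> dict:
    """Return case and control membership for the three GWAS."""
    seen = {normalise(c) for c in codes}
    secretory = bool(seen & SECRETORY)
    mucous = bool(seen & MUCOUS)
    excluded = bool(seen & CONTROL_EXCLUSIONS)
    return {
        "secretory_case": secretory,
        "mucous_case": mucous,
        "overall_case": secretory or mucous,
        "control": not (secretory or mucous or excluded),
    }


# Consistency check of the counts reported in the text.
secretory_only, mucous_only, both = 1_081, 5_345, 608
secretory_total = secretory_only + both               # 1,689
mucous_total = mucous_only + both                     # 5,953
overall_total = secretory_only + mucous_only + both   # 7,034
```

Under these assumptions, a person whose only otitis code is H65.9 is neither a case nor a control, which matches the control definition given in the text.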
Genotyping, imputation, and quality control
The FinnGen study uses biobank samples consisting of 1) “legacy samples” and 2) prospective “new samples” that are collected when participants give voluntary biobank consent and donate a biobank sample (usually blood). Hospital Biobank samples and Terveystalo Biobank samples are typically collected during diagnostic sampling in hospital laboratories or on hospital wards. Blood Service Biobank consent and sample collection take place in connection with blood donation. Consent and sampling by the THL Biobank are typically done in connection with the collection of research samples. In contrast, legacy samples are older sample cohorts that were collected for a specific research project before the Finnish Biobank Act came into force (September 2013).
These older study cohorts were later transferred to a biobank according to the Finnish Biobank Act 13 §. FinnGen samples were genotyped with ThermoFisher, Illumina, and Affymetrix arrays.
Chip genotype data processing and QC
Samples were genotyped with Illumina (Illumina Inc., San Diego, CA, USA) and Affymetrix arrays (ThermoFisher Scientific, Santa Clara, CA, USA). Chip genotyping data produced with earlier chip platforms and reference genome builds were lifted over to build version 38 (GRCh38/hg38) following the protocol described at dx.doi.org/10.17504/protocols.io.xbhfij6. The ‘legacy samples’ were genotyped over the years using various generations of the Illumina and Affymetrix GWAS arrays.
Genotype calls were made with the GenCall and zCall algorithms for the Illumina data and the AxiomGT1 algorithm for the Affymetrix data. The ‘new samples’ were genotyped with the FinnGen ThermoFisher Axiom custom array at the ThermoFisher genotyping service in San Diego, CA, USA. The current array (v2) contains 723,376 probesets for 664,510 markers. In addition to the core GWAS markers (about 500,000), it contains about 116,000 coding variants enriched in Finland, >10,000 specific markers for the HLA/KIR region, almost 15,000 ClinVar variants, about 4,600 pharmacogenomic variants, and 57,000 selected markers that were of special interest to partners.
All GWAS data were imputed against a Finnish population-specific whole genome sequence (WGS) backbone. In sample-wise quality control steps, individuals with ambiguous sex, high genotype missingness (>5%), excess heterozygosity (±4 SD), or non-Finnish ancestry were excluded. After these steps, samples from 412,181 individuals were included in the association analysis. In variant-wise quality control steps, variants with high missingness (>2%), a low Hardy-Weinberg equilibrium (HWE) P-value (<1×10⁻⁶), or a low minor allele count (MAC <3) were excluded. After these steps, 21,311,942 variants were included in the association analysis.
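The sample-wise and variant-wise thresholds above can be written as two simple predicates. The Python sketch below only illustrates the stated cut-offs: the record layout, the field names, and the use of a cohort mean and standard deviation for the heterozygosity rule are assumptions, and it does not reproduce the FinnGen QC pipeline.

```python
# Illustrative sketch only; field names and record layout are hypothetical.
from dataclasses import dataclass


@dataclass
class Sample:
    sex_ambiguous: bool
    missingness: float      # fraction of missing genotype calls, 0..1
    heterozygosity: float   # per-sample heterozygosity rate
    finnish_ancestry: bool


@dataclass
class Variant:
    missingness: float      # fraction of samples without a call, 0..1
    hwe_p: float            # Hardy-Weinberg equilibrium P-value
    mac: int                # minor allele count


def sample_passes(s: Sample, het_mean: float, het_sd: float) -> bool:
    """Keep a sample unless it fails any sample-wise filter from the text."""
    het_outlier = abs(s.heterozygosity - het_mean) > 4 * het_sd  # beyond +/- 4 SD
    return (
        not s.sex_ambiguous
        and s.missingness <= 0.05   # exclude missingness > 5%
        and not het_outlier
        and s.finnish_ancestry
    )


def variant_passes(v: Variant) -> bool:
    """Keep a variant unless it fails any variant-wise filter from the text."""
    return (
        v.missingness <= 0.02       # exclude missingness > 2%
        and v.hwe_p >= 1e-6         # exclude HWE P < 1e-6
        and v.mac >= 3              # exclude MAC < 3
    )
```

Applying `sample_passes` before `variant_passes` mirrors the order in which the two steps are described, although the text does not state that the order matters.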
Genome-wide associations
To correct for the population substructure, the outcome associations were tested using an additive model and the standard covariates: sex, age, and the first ten principal components from the genetic data. We performed a chi-square test of heterogeneity to explore whether the effect estimates differed significantly between the subtypes of OM.
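The text does not specify which chi-square heterogeneity statistic was used. As one common choice, the sketch below computes Cochran's Q for two effect estimates (for example, the same variant in the secretory and mucous GWAS) with inverse-variance weights. The function name and inputs are hypothetical, and the test assumes independent estimates, which shared controls and overlapping cases may not satisfy exactly.

```python
# Illustrative sketch only; Cochran's Q is an assumed choice of heterogeneity test.
import math


def cochran_q_two(beta_a: float, se_a: float, beta_b: float, se_b: float):
    """Return (Q, P) for the heterogeneity of two effect estimates.

    beta_* are per-allele effect sizes (e.g. log odds ratios) and se_* their
    standard errors. With two estimates, Q follows a chi-square distribution
    with 1 degree of freedom under the null hypothesis of equal effects.
    """
    w_a, w_b = 1.0 / se_a**2, 1.0 / se_b**2          # inverse-variance weights
    pooled = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    q = w_a * (beta_a - pooled) ** 2 + w_b * (beta_b - pooled) ** 2
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p


# Hypothetical effect sizes, not results from the study.
q, p = cochran_q_two(0.10, 0.05, 0.30, 0.05)
```

Comparing more than two subtypes would need a chi-square distribution with k - 1 degrees of freedom, which this two-estimate shortcut does not cover.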
Characterization of the associated loci
The genomic region within a ±1 Mb window around the lead variant was defined as the associated locus. Each locus therefore contained at least one genome-wide significant variant (P < 5×10⁻⁸), and distinct loci were separated by at least 1 Mb. The novelty of each locus was assessed against the NHGRI-EBI Catalog of human genome-wide association studies. Candidate genes in each novel locus were prioritized based on physical proximity to the index variant and on the prior literature on the biological role and clinical relevance of the genes. Variants with an allele frequency (AF) of less than 1% were excluded.
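One way to turn this locus definition into a procedure is a greedy pass over the significant variants, taking the most significant one first and discarding everything within 1 Mb of an already chosen lead variant. The Python sketch below assumes that approach; the greedy strategy, the `Hit` record, and the function name are assumptions, since the text does not state how lead variants were selected.

```python
# Illustrative sketch only; the greedy selection strategy is an assumption.
from typing import NamedTuple

GENOME_WIDE_P = 5e-8     # genome-wide significance threshold
WINDOW_BP = 1_000_000    # +/- 1 Mb around each lead variant
MIN_AF = 0.01            # variants with AF below 1% are excluded


class Hit(NamedTuple):
    chrom: str
    pos: int     # base-pair position
    p: float     # association P-value
    af: float    # allele frequency


def lead_variants(hits):
    """Pick one lead variant per locus, most significant first."""
    candidates = sorted(
        (h for h in hits if h.p < GENOME_WIDE_P and h.af >= MIN_AF),
        key=lambda h: h.p,
    )
    leads = []
    for h in candidates:
        # Start a new locus only if h lies outside the window of every lead so far.
        if all(h.chrom != lead.chrom or abs(h.pos - lead.pos) > WINDOW_BP
               for lead in leads):
            leads.append(h)
    return leads


# Hypothetical variants, not results from the study.
example = [
    Hit("1", 1_000_000, 1e-10, 0.20),
    Hit("1", 1_500_000, 1e-9, 0.20),   # within 1 Mb of the first hit
    Hit("1", 3_000_000, 1e-8, 0.10),   # separate locus on the same chromosome
    Hit("2", 1_000_000, 1e-9, 0.005),  # dropped: AF below 1%
    Hit("2", 5_000_000, 1e-7, 0.30),   # dropped: not genome-wide significant
]
loci = lead_variants(example)
```

In this example the two retained lead variants are on chromosome 1, at positions 1,000,000 and 3,000,000.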
Conservation of the residue (position) was analyzed using ConSurf with default settings. PolyPhen-2 was used to predict the pathogenicity of the variant [35]. The structural context of the target residue was analyzed using the Protein Data Bank and the AlphaFold database. The occurrence of the mutation among somatic cancer mutations was assessed using ProteinPaint (see Table S1).
Immunohistochemistry
The immunohistochemistry (IHC) protocol was modified as needed, and several antibody dilutions were tested. Murine endogenous immunoglobulins were blocked with Rodent Block solution (Biocare Medical, Concord, CA, USA). Using negative controls, we confirmed that the blocking did not leak and that the mouse’s own immunoglobulins were specifically blocked. Thus, the technical optimization of the protocol was successful.
We used immunohistochemistry of murine middle ear samples to evaluate the expression of annexin A13 in the Eustachian tube and the middle ear. Samples from murine colon were used as positive controls. According to the Human Protein Atlas, annexin A13 is abundant in the intestinal columnar epithelium (https://www.proteinatlas.org/ENSG00000104537-ANXA13/tissue). Ethical permission for the use of mice for research purposes was obtained (permit number ESAVI/23659/2018). Four adult C57BL/6 wild-type mice were euthanized by CO2 inhalation according to guidelines for the euthanasia of rodents.
The rodents were then decapitated, and their brains were removed using blunt dissection. Middle ear specimens and temporal bones were then dissected as previously reported (http://goodrich.med.harvard.edu/uploads/3/7/7/1/37718659/cochlearwholemountstainprotocol_120420_lw.pdf). After dissection, the tissue was fixed with 4% paraformaldehyde (PFA). The specimens were decalcified with 120 mM EDTA solution for 3 days at 4 °C and then embedded in paraffin. Sectioning was performed with a Leica SM2010 microtome using a section thickness of 4-5 µm and 1-2 sections per slide. Paraffin sections were placed on Superfrost Plus microscope slides. Fully automated immunostaining was performed with a LabVision Autostainer instrument. Xylene steps were used for deparaffinization and an ethanol series for rehydration of the sections. Antigen retrieval was performed at 98 °C for 30 minutes (PT-Module) with 10 mM Tris-1 mM EDTA buffer (+0.05% Tween 20, pH 9.0). TBS-0.05% Tween washing buffer was used for the washing steps. Slides were then incubated in the LabVision Autostainer at room temperature for 30 minutes with an unconjugated polyclonal primary antibody for annexin A13 at a dilution of 1:600. The primary antibody specific to ANXA13 was purchased from antibodies-online (antibodies-online.com; Catalog No. ABIN7006070). Endogenous peroxidase was blocked with 3% H2O2. Histofine SimpleStain Mouse Max Po Rabbit 414341F (Nichirei, Tokyo, Japan) was used to detect the primary antibody, with diaminobenzidine (DAB) as the chromogen. As negative controls, we used both temporal bone and colon samples. The same protocol was followed, except that the primary antibody for annexin A13 was omitted. Finally, the slides were counterstained with hematoxylin, dehydrated, cleared, and mounted with xylene-based cover-slipping. Thereafter, the slides were scanned with a NanoZoomer scanner (Hamamatsu Photonics). The slides were prepared by an experienced laboratory technician.
The sections were then examined with a light microscope by a pathologist and a medical cell biologist.