Ethics statement
The study protocol was approved by the Ethics Committees of the University of Fukui, Japan (Assurance no. 20200028), Yamaguchi University School of Medicine, Japan (Assurance no. 2020-202), and Sugita Genpaku Memorial Obama Municipal Hospital (Assurance no. 2–7). The study was carried out in accordance with the Declaration of Helsinki and the Ethical Guidelines for Clinical Studies of the Ministry of Health, Labour and Welfare of Japan. All participants provided either written informed consent or both informed consent and assent.
Participants and sample collection
Twenty subjects undergoing neurosurgery for clinical reasons, including intractable epilepsy, meningioma, and cerebrovascular disease, were recruited at the University of Fukui Hospital, Yamaguchi University Hospital, and Sugita Genpaku Memorial Obama Municipal Hospital (Table 1). Subjects with other serious concurrent genetic or physical diseases were excluded. However, patient no. 4 was included because the metastatic tumor was recognized only after enrollment and the resected brain tissue was classified as normal tissue (Supplementary Table S1). A portion of each resected brain tissue was immediately cut into several pieces smaller than 5 mm³ and preserved in RNAlater® (Thermo Fisher Scientific, Inc., MA, US) to stabilize RNA and DNA for long-term storage. For each case, the primary surgeon recorded the Montreal Neurological Institute (MNI) coordinates corresponding to the resected region by clicking on the standard online brain image (https://neurosynth.org/locations/) (Table 1 and Fig. 1). We also used an online tool for DNA methylation-based classification of central nervous system tumors (https://www.molecularneuropathology.org/mnp/) [7] to confirm the classification of the brain tissues (Supplementary Table S1); this tool classified 15 brain tissues as "Control tissues," that is, normal tissues. During surgery, whole blood samples were collected in EDTA tubes and immediately preserved in RNAlater® (whole blood : RNAlater® = 5 : 13). They were then stored overnight at 4 °C and subsequently at −20 °C for long-term storage. Saliva samples were collected using the Oragene DISCOVER™ kit (DNA Genotek Inc., Ottawa, CA, OGR-500) and stored at room temperature (RT) until DNA extraction. Buccal epithelial tissues (buccal) were collected using individually packaged commercial cotton swabs (four swabs/subject) and used for DNA extraction after air-drying at RT for a few days.
On average, saliva samples and buccal swabs were collected 11.0 and 12.2 days after surgery, respectively. For each sample, the collection date relative to the date of surgery was recorded.
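As a consistency check, the average collection delays can be recomputed from the per-subject values in Table 1, which lists 19 subjects (subject 5 is not listed). The minimal Python sketch below uses values transcribed by hand from the table; the variable and function names are illustrative only.

```python
# Days after surgery at which saliva and buccal samples were collected,
# transcribed from Table 1 in ID order (subjects 1-4 and 6-20).
saliva_days = [7, 3, 11, 7, 16, 9, 15, 7, 6, 15, 27, 15, 12, 11, 13, 14, 10, 11, 0]
buccal_days = [7, 3, 11, 7, 8, 9, 15, 7, 6, 9, 27, 15, 12, 42, 13, 14, 10, 11, 6]


def mean_days(days):
    """Arithmetic mean of the collection delays, rounded to one decimal place."""
    return round(sum(days) / len(days), 1)


print(mean_days(saliva_days), mean_days(buccal_days))  # 11.0 12.2
```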
Table 1
Subject and brain sample characteristics.
| ID | Sex | Age (y) | Height (cm) | Weight (kg) | BMI (kg/m²) | Diagnosis | Brain region | Hemisphere | MNI X | MNI Y | MNI Z | Saliva (days after surgery) | Buccal (days after surgery) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | male | 69 | 160 | 37.5 | 14.6 | primary central nervous system vasculitis | dlPFC | right | 38 | 34 | 44 | 7 | 7 |
| 2 | female | 59 | 150 | 45.5 | 20.2 | falx meningioma | frontal pole | right | 42 | 41 | 18 | 3 | 3 |
| 3 | male | 60 | 164.5 | 66.8 | 24.7 | internal carotid artery-posterior communicating artery aneurysm | temporal pole | left | -50 | 14 | -18 | 11 | 11 |
| 4 | female | 73 | 157 | 40 | 16.2 | metastatic brain tumor* (breast cancer) | visual cortex | left | -45 | -85 | 18 | 7 | 7 |
| 6 | male | 61 | 164 | 69.3 | 25.8 | anterior communicating artery aneurysm | straight gyrus | left | -8 | 24 | -19 | 16 | 8 |
| 7 | male | 69 | 169 | 57 | 20.0 | internal carotid artery stenosis | temporal pole | right | 38 | 13 | -29 | 9 | 9 |
| 8 | male | 67 | 168 | 65 | 23.0 | frontal lobe tumor | dlPFC | right | 29 | 51 | 34 | 15 | 15 |
| 9 | female | 45 | 175.5 | 83 | 26.9 | internal carotid artery stenosis | temporal pole | left | -38 | 16 | -26 | 7 | 7 |
| 10 | female | 70 | 155 | 70 | 29.1 | anterior communicating artery aneurysm | orbitofrontal cortex | right | 9 | 16 | -22 | 6 | 6 |
| 11 | female | 73 | 152 | 44.1 | 19.1 | internal carotid artery-posterior communicating artery aneurysm | orbitofrontal cortex | left | -18 | 28 | -22 | 15 | 9 |
| 12 | male | 60 | 172 | 63.9 | 21.6 | tentorial meningioma | middle temporal gyrus | right | 63 | -32 | -10 | 27 | 27 |
| 13 | male | 50 | 170 | 61 | 21.1 | internal carotid artery-posterior communicating artery aneurysm | insula | right | 28 | 14 | -17 | 15 | 15 |
| 14 | female | 60 | 160.5 | 49.7 | 19.3 | middle cerebral artery aneurysm | dorsal entorhinal cortex | right | 34 | 4 | -17 | 12 | 12 |
| 15 | female | 13 | 160 | 50 | 19.5 | temporal lobe epilepsy, glioma | temporal pole | left | -60 | 2 | -26 | 11 | 42 |
| 16 | male | 62 | 185 | 93.4 | 27.3 | frontal lobe meningioma, epilepsy | frontal eye field | right | 14 | 32 | 47 | 13 | 13 |
| 17 | male | 28 | 171 | 60 | 20.5 | temporal lobe epilepsy | middle temporal gyrus | right | 63 | -9 | -24 | 14 | 14 |
| 18 | male | 67 | 170 | 53.3 | 18.4 | internal carotid artery stenosis | straight gyrus | right | 7 | 18 | -16 | 10 | 10 |
| 19 | female | 51 | 162 | 55 | 21.0 | subarachnoid hemorrhage, moyamoya disease | angular gyrus | right | 36 | -80 | 45 | 11 | 11 |
| 20 | female | 73 | 152.5 | 45.5 | 19.6 | internal carotid artery-posterior communicating artery aneurysm | temporal pole | right | 33 | 8 | -33 | 0 | 6 |
*The brain tumor was later found to be metastatic; this subject was nevertheless included on the basis of the DNA methylation-based classification shown in Supplementary Table S1.
Subject 5 (not listed) was removed from downstream analyses because of a mixed result in the multidimensional scaling (MDS) plot.
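The BMI values in Table 1 are consistent with the standard definition, weight in kilograms divided by the square of height in metres. The short Python sketch below recomputes three of them as a check; the subject selection and the function name are illustrative choices, not part of the original analysis.

```python
# Height (cm) and weight (kg) for three subjects, transcribed from Table 1.
subjects = {
    1: (160.0, 37.5),
    7: (169.0, 57.0),
    16: (185.0, 93.4),
}


def bmi(height_cm, weight_kg):
    """Body mass index in kg/m^2: weight divided by height (in metres) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


for subject_id, (height, weight) in subjects.items():
    # Formatted to one decimal place, as reported in Table 1.
    print(subject_id, f"{bmi(height, weight):.1f}")
```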
DNA extraction
Brain DNA was extracted using the AllPrep DNA/RNA/miRNA Universal Kit (QIAGEN, Hilden, Germany) after 10 mg of RNAlater®-preserved brain tissue was disrupted and homogenized with a rotor–stator (speed: 6.5 m/s; running time: 45 s; FastPrep FP120J-100, Savant Instruments, Inc.) and tissue homogenization beads (Lysing Matrix D, 2 mL, MP Biomedicals, LLC, CA, US). For RNAlater®-preserved blood samples, we followed the sample preprocessing instructions of the RiboPure™ RNA Purification Kit (Thermo Fisher Scientific, Inc., MA, US) before DNA extraction, although this preprocessing was originally intended for RNA extraction. In brief, RNAlater®-preserved blood (720 µL) was centrifuged for 1 min at 16 000 × g, and the supernatant was removed. Then, 200 µL of PBS was added and mixed by pipetting, and the centrifugation and supernatant removal were repeated. Finally, the pellet was resuspended in 200 µL of PBS by pipetting and used for DNA extraction. DNA was extracted from blood and buccal samples using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany), following the protocol for each sample type, and from saliva using the prepIT®•L2P reagent (DNA Genotek Inc., Ottawa, CA). DNA yield was determined using the Qubit™ dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific, Inc., MA, US) [8].
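Combining the 5 : 13 blood-to-RNAlater® ratio used at collection with the 720 µL aliquot gives the volume of whole blood carried into extraction: 720 × 5/18 = 200 µL, the same volume as the final PBS resuspension. The Python sketch below performs this arithmetic with exact fractions, assuming the preserved mixture is homogeneous; the function name is hypothetical.

```python
from fractions import Fraction

# Whole blood was mixed with RNAlater(R) at a 5 : 13 volume ratio at collection.
BLOOD_PARTS, RNALATER_PARTS = 5, 13


def whole_blood_volume(preserved_ul):
    """Volume of whole blood (uL) in a given volume of the blood/RNAlater(R)
    mixture, assuming the mixture is homogeneous."""
    fraction_blood = Fraction(BLOOD_PARTS, BLOOD_PARTS + RNALATER_PARTS)
    return preserved_ul * fraction_blood


print(whole_blood_volume(720))  # 200
```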
DNA methylation array and pre-processing
For each sample, 500 ng of DNA was bisulfite-converted with the EZ DNA Methylation™ Kit (Zymo Research, D5002), and genome-wide DNA methylation was assessed with the Infinium HumanMethylationEPIC BeadChip Kit (Illumina, WG-317-1002). Samples were grouped by individual and randomized onto the chips, and the arrays were scanned on the Illumina iScan platform.
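One way to read "grouped by individual and randomized onto the chips" is sketched below: each subject's four tissue samples stay on the same 8-sample EPIC BeadChip, subjects are shuffled across chips, and array positions are shuffled within each chip. This layout and the function name are assumptions for illustration, not the documented plate map.

```python
import random

SAMPLES_PER_CHIP = 8  # an Infinium MethylationEPIC BeadChip holds 8 samples
TISSUES = ("brain", "blood", "saliva", "buccal")


def assign_to_chips(subject_ids, seed=0):
    """Hypothetical layout: keep each subject's four tissue samples on the
    same chip, shuffle subjects across chips, then shuffle array positions
    within each chip."""
    rng = random.Random(seed)
    order = list(subject_ids)
    rng.shuffle(order)  # random order of subjects
    subjects_per_chip = SAMPLES_PER_CHIP // len(TISSUES)  # 2 subjects per chip
    chips = []
    for start in range(0, len(order), subjects_per_chip):
        group = order[start:start + subjects_per_chip]
        layout = [(sid, tissue) for sid in group for tissue in TISSUES]
        rng.shuffle(layout)  # random positions within the chip
        chips.append(layout)
    return chips


chips = assign_to_chips(range(1, 21), seed=42)
print(len(chips), [len(chip) for chip in chips])  # 10 chips of 8 samples each
```

Keeping a subject's tissues on one chip means chip effects cannot masquerade as between-tissue differences within that subject, at the cost of confounding chip with individual.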
To allow a fair comparison with the previous study [4], the DNA methylation dataset was pre-processed using the R packages Minfi [16, 17] and RnBeads [18]. Background correction was performed with the Noob method in Minfi. Using RnBeads, probes were removed if they (1) lay within 5 bp of an SNP (21 361 probes); (2) had a detection P-value > 0.01 or were deemed unreliable by RnBeads's greedy-cut algorithm (16 367 probes); or (3) were removed by the context filter, that is, probes targeting non-CpG sites (2 873 probes). SNP overlaps were determined by RnBeads using the dbSNP version based on Genome Reference Consortium Human Build 37 patch release 10 (GRCh37.p10). After filtering, 825 637 probes remained in the dataset. Samples were normalized using beta mixture quantile dilation (BMIQ).
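The three exclusion criteria can be illustrated with the simplified Python sketch below, which counts each probe once under the first criterion it fails, in the order listed; under that bookkeeping the reported counts (21 361 + 16 367 + 2 873 removed, 825 637 retained) sum to 866 238 probes. The `Probe` record and its fields are hypothetical, and the detection step only applies the P > 0.01 cutoff; RnBeads's greedy-cut algorithm is not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Probe:
    """Hypothetical per-probe summary; field names are illustrative only."""
    probe_id: str
    snp_distance_bp: Optional[int]  # distance to the nearest SNP; None if none nearby
    max_detection_p: float          # largest detection P-value across samples
    is_cpg_context: bool            # False for probes targeting non-CpG sites


def filter_probes(probes: List[Probe], snp_window_bp: int = 5,
                  detection_cutoff: float = 0.01) -> Tuple[List[Probe], Dict[str, int]]:
    """Apply the three criteria in order, counting each removed probe once."""
    removed = {"snp": 0, "detection": 0, "context": 0}
    kept = []
    for probe in probes:
        if probe.snp_distance_bp is not None and probe.snp_distance_bp <= snp_window_bp:
            removed["snp"] += 1
        elif probe.max_detection_p > detection_cutoff:
            # Simplification: greedy-cut sample/probe removal is not modeled.
            removed["detection"] += 1
        elif not probe.is_cpg_context:
            removed["context"] += 1
        else:
            kept.append(probe)
    return kept, removed


probes = [
    Probe("cg_a", None, 0.001, True),
    Probe("cg_b", 3, 0.001, True),      # SNP 3 bp away -> removed
    Probe("cg_c", None, 0.05, True),    # fails detection cutoff -> removed
    Probe("ch_d", None, 0.001, False),  # non-CpG context -> removed
]
kept, removed = filter_probes(probes)
print([p.probe_id for p in kept], removed)
```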
IMAGE-CpG dataset (GSE111165) pre-processing
The GSE111165 dataset was pre-processed in the same way. Background correction was performed with Minfi's Noob method. Using RnBeads, probes were removed if they (1) lay within 5 bp of an SNP (21 358 probes); (2) had a detection P-value > 0.01 or were deemed unreliable by RnBeads's greedy-cut algorithm (10 053 probes); or (3) were removed by the context filter, that is, probes targeting non-CpG sites (2 894 probes). After filtering, 831 786 probes remained in the dataset. Samples were normalized using BMIQ.
Pre-processing for GSE59685 and GSE95049 datasets
Pre-processing was conducted as closely as possible to that of the original studies [10, 11]. In total, GSE59685 comprised 437 649 probes and 67 samples of brain tissue (PFC, EC, STG, and CER) and blood, and GSE95049 comprised 444 283 probes and 15 samples of brain tissue (BA10, BA20, and BA7) and blood. The R code is available as Supplementary Material.
Estimation of ancestry data
To confirm differences in genetic ancestry between the datasets, we generated ancestry principal components (PCs) from blood DNA methylation using the method described by Barfield et al. [19].
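Ancestry PCs are principal-component scores computed from a samples × CpGs methylation matrix. The Python/NumPy sketch below shows only that PCA step, assuming the input has already been restricted to the CpGs selected by the Barfield et al. procedure, which is not reproduced here; the synthetic data and function name are hypothetical.

```python
import numpy as np


def ancestry_pcs(beta: np.ndarray, n_pcs: int = 2) -> np.ndarray:
    """Return principal-component scores (samples x n_pcs) for a
    samples x CpGs methylation matrix.

    `beta` is assumed to contain only the CpGs chosen for ancestry
    inference; that selection step is outside this sketch.
    """
    centered = beta - beta.mean(axis=0, keepdims=True)  # center each CpG
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]  # scores = U * singular values


# Synthetic example: two groups of five samples that differ at 10 of 50 CpGs.
rng = np.random.default_rng(0)
beta = rng.normal(0.5, 0.01, size=(10, 50))
beta[5:, :10] += 0.3
pcs = ancestry_pcs(beta)
print(np.round(pcs[:, 0], 2))  # PC1 separates the two synthetic groups
```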
Cellular composition adjustment
Because cellular heterogeneity affects DNA methylation, we followed Edgar et al. [11] and, in addition to the unadjusted datasets (Raw), generated cell-composition-adjusted versions (Adj) of the AMAZE-CpG and IMAGE-CpG datasets in parallel. Cell-type proportions were estimated using CETS [20] for brain and EpiDISH [13] for the peripheral tissues (Supplementary Table S2). Brain methylation was adjusted for the proportion of neurons. Blood methylation was adjusted using only five of the six estimated cell-type proportions (B, NK, CD4T, CD8T, and Mono; neutrophils [Neutro] omitted) [21]. Because the six proportions lie in the [0, 1] range and sum to 1 within each sample, any one of them is determined by the other five, so including all six as covariates would induce multicollinearity [22]. Saliva and buccal methylation were adjusted for the proportion of epithelial cells. Both Raw and Adj datasets were carried through each analysis so that their performance could be compared.
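The text does not spell out the adjustment model. A common approach, assumed here, is to regress each CpG on the cell-type proportions by ordinary least squares and keep the residuals, optionally shifted back by the CpG's mean. The Python/NumPy sketch below does this and also shows why one blood cell type is dropped: six proportions that sum to 1 make the design matrix rank-deficient once an intercept is included. Function and variable names are hypothetical.

```python
import numpy as np


def adjust_for_cell_composition(beta: np.ndarray, proportions: np.ndarray,
                                add_mean: bool = True) -> np.ndarray:
    """Regress every CpG (columns of `beta`, samples x CpGs) on cell-type
    proportions (samples x k) plus an intercept by ordinary least squares
    and return the residuals, optionally shifted back by each CpG's mean."""
    design = np.column_stack([np.ones(len(proportions)), proportions])
    coef, *_ = np.linalg.lstsq(design, beta, rcond=None)
    adjusted = beta - design @ coef
    if add_mean:  # assumed convention: keep values on the original scale
        adjusted = adjusted + beta.mean(axis=0, keepdims=True)
    return adjusted


rng = np.random.default_rng(1)

# Six proportions summing to 1 are collinear with the intercept column.
props6 = rng.dirichlet(np.ones(6), size=20)               # rows sum to 1
full = np.column_stack([np.ones(20), props6])             # 7 columns
reduced = np.column_stack([np.ones(20), props6[:, :5]])   # 6 columns
print(np.linalg.matrix_rank(full), np.linalg.matrix_rank(reduced))  # 6 6

# Brain-style adjustment with a single covariate (neuronal proportion):
# a CpG fully explained by neuron proportion becomes constant after adjustment.
neuron = rng.uniform(0.2, 0.8, size=(12, 1))
beta = np.column_stack([0.2 + 0.5 * neuron[:, 0], rng.uniform(0, 1, 12)])
adjusted = adjust_for_cell_composition(beta, neuron)
```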
Statistical analysis
All statistical analyses were performed in R [23]. Two approaches to cross-tissue correlation were used. First, to assess overall correlation, DNA methylation at each CpG was averaged across subjects within each tissue, and Pearson's correlation between tissues was calculated from these averages using all 825 637 (AMAZE-CpG) and 831 786 (IMAGE-CpG) CpGs. Second, a within-subject approach was used. Because of the small sample size and the possible undue influence of outliers on the correlation coefficient, the correlation coefficient (rho) and its P-value were calculated for each CpG using the non-parametric Spearman's rank correlation test. Variable CpGs were defined as described previously by Hannon et al. [10]: for each CpG, DNA methylation values in the upper and lower 10th percentiles were excluded, and CpGs whose remaining range was at least 5% were classified as variable. Because the number of variable CpGs differs among tissues, the cross-tissue correlation analyses were limited to CpGs that were variable in all four tissues (AMAZE-CpG: 287 033 CpGs [Raw] and 189 704 CpGs [Adj]; IMAGE-CpG: 280 302 CpGs [Raw] and 194 310 CpGs [Adj]).

Furthermore, cross-database correlation analyses were conducted to examine similarities and differences in the correlation coefficients between the two datasets. These analyses included 815 541 CpGs from the entire datasets and, among CpGs variable in both datasets, 233 904 (Raw) and 136 929 (Adj) CpGs. CpG sites with an absolute difference in rho between AMAZE-CpG and IMAGE-CpG of less than 0.2 were defined as less dependent on ethnicity, and those with a difference greater than 0.2 as more dependent on ethnicity.
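The steps above can be sketched in plain Python (the analyses themselves were run in R): the overall correlation of tissue means, Spearman's rho per CpG (Pearson's r on average ranks), the variability filter, and the |Δrho| classification. The percentile convention in `is_variable` (linear interpolation via `statistics.quantiles(..., method="inclusive")`) is an assumed reading of the Hannon et al. rule, and all function names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def overall_correlation(tissue_a, tissue_b):
    """Average each CpG across subjects (rows = subjects, columns = CpGs)
    within a tissue, then correlate the two vectors of means across CpGs."""
    means_a = [statistics.fmean(col) for col in zip(*tissue_a)]
    means_b = [statistics.fmean(col) for col in zip(*tissue_b)]
    return pearson(means_a, means_b)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho for one CpG across subjects (tissue x vs tissue y)."""
    return pearson(average_ranks(x), average_ranks(y))


def is_variable(values, min_range=0.05):
    """Drop values below the 10th and above the 90th percentile, then
    require the remaining range to be at least `min_range` (5%)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    low, high = deciles[0], deciles[-1]
    kept = [v for v in values if low <= v <= high]
    return bool(kept) and max(kept) - min(kept) >= min_range


def ethnicity_dependence(rho_amaze, rho_image, threshold=0.2):
    """Classify a CpG by |rho_AMAZE - rho_IMAGE|; a difference of exactly
    `threshold` is not covered by the definition and returns None."""
    diff = abs(rho_amaze - rho_image)
    if diff < threshold:
        return "less dependent"
    if diff > threshold:
        return "more dependent"
    return None


print(spearman_rho([0.1, 0.2, 0.3, 0.4, 0.5], [0.3, 0.1, 0.4, 0.6, 0.9]))
print(is_variable([0.5] * 18 + [0.0, 1.0]))  # False: extremes are trimmed
```

Ranking before correlating is what makes rho robust to a single extreme subject, which matters with fewer than 20 subjects per CpG.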
Assessment of potential SNP confounding effect
As described in Supplementary Methods, we developed filtering parameters based on our dataset to identify probes that may be affected by SNPs.
mQTL classification
We used a list of DNA methylation quantitative trait loci (mQTLs; http://www.mqtldb.org/) filtered with Gaunt et al.'s original P-value cutoff of P < 1 × 10⁻¹⁴ [24]. Of the CpGs under genetic influence in this list, 27 623 and 27 748 overlapped with the AMAZE-CpG and IMAGE-CpG datasets, respectively.
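The overlap count amounts to thresholding the mQTL associations and intersecting the resulting CpG set with the probes retained in each dataset. The Python sketch below assumes a hypothetical input of (CpG ID, P-value) pairs; the actual mQTLdb file layout is not reproduced.

```python
def genetically_influenced_cpgs(mqtl_records, dataset_cpgs, p_cutoff=1e-14):
    """Return dataset CpGs with at least one mQTL association below `p_cutoff`.

    `mqtl_records` is a hypothetical iterable of (cpg_id, p_value) pairs.
    """
    significant = {cpg for cpg, p in mqtl_records if p < p_cutoff}
    return significant & set(dataset_cpgs)


records = [("cg01", 1e-20), ("cg02", 1e-10), ("cg03", 5e-15), ("cg01", 1e-16)]
print(sorted(genetically_influenced_cpgs(records, ["cg01", "cg02", "cg04"])))  # ['cg01']
```

A CpG with several mQTL associations is counted once, because the significant CpGs are collected into a set before the intersection.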