Sample preparation
Seven male Y-STR profiles were searched against the China National Y-STR Database to identify males whose haplotypes differed at 0-3 loci (a tolerance of no more than 3 loci). These similar Y-STR haplotypes were compiled into sample cases 1-7, respectively; men with a tolerance of 7 were assigned to sample case 8 (details of the cases are given in Table 1). After obtaining informed consent from all members of the eight cases and approval from the biomedical ethical committee of Southern Medical University, we collected oral swabs from the subjects with disposable sterile cotton swabs and immediately stored the samples at -80 °C for subsequent uniform DNA extraction and sequencing.
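The tolerance rule above can be illustrated with a minimal sketch. It assumes that a mismatch is any locus typed in both profiles whose allele calls differ; the database's actual matching procedure is not described here, and the function names and allele values below are hypothetical.

```python
from typing import Dict

# A haplotype maps a Y-STR locus name to its allele call (kept as a string
# so that multi-copy loci such as "11,14" can be compared verbatim).
Haplotype = Dict[str, str]


def count_mismatches(query: Haplotype, candidate: Haplotype) -> int:
    """Count loci typed in both profiles whose allele calls differ."""
    shared_loci = query.keys() & candidate.keys()
    return sum(1 for locus in shared_loci if query[locus] != candidate[locus])


def within_tolerance(query: Haplotype, candidate: Haplotype, tolerance: int = 3) -> bool:
    """True if the candidate differs from the query at no more than `tolerance` loci."""
    return count_mismatches(query, candidate) <= tolerance


# Illustrative allele values only (not data from this study).
query = {"DYS19": "15", "DYS389I": "13", "DYS390": "24", "DYS391": "10", "DYS393": "12"}
candidate = {"DYS19": "15", "DYS389I": "13", "DYS390": "25", "DYS391": "11", "DYS393": "12"}

print(count_mismatches(query, candidate), within_tolerance(query, candidate))
```

Under this assumption, loci missing from either profile are ignored rather than counted as mismatches; a stricter rule would treat them as differences.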
DNA extraction
The DNA was extracted using the Eltbio Mag Saliva & Swab DNA Extraction Kit (Enlighten Biotech, Shanghai, China) according to the manufacturer’s protocol.
DNA library preparation
DNA libraries were prepared using the DeepReads DNA Library Prep Kit for Fiseq (DeepReads Genomics Technology, Guangzhou, China) according to the manufacturer’s protocol.
Y-chromosome bait design and Y-chromosome capture
The Y-chromosome baits were designed by DeepReads Genomics Technology. Capture reactions were performed according to the DeepReads WHO-Y Kit protocol (DeepReads Genomics Technology, Guangzhou, China). A series of bait libraries was designed to capture a region of ~16 Mb on the Y chromosome, extending the ~11 Mb region targeted previously [15].
Fiseq sequencing
The libraries were sequenced on a Fiseq-2000 platform (DeepReads Genomics Technology, Guangzhou, China), generating 2 × 100 bp paired-end reads.
An appropriate amount of DNA was used to construct the whole-genome library, which then served as the template for constructing the liquid-phase probe capture library with the liquid-phase probe capture library construction kit of Deepreads Biotech Co., Ltd. Instead of an imbricated (overlapping) probe design, we adopted a staggered design. For regions with high GC content, highly repetitive sequence, or palindromic structure, we added extra probes to improve capture efficiency (the schematic design is shown in Fig. 2). In liquid-phase hybridization, the biotin-labeled probes are in solution. Once a probe hybridized with its target region, the probe was bound by streptavidin-modified magnetic beads. Target-region fragments were thereby captured, and uncaptured fragments were discarded. The probes and target regions were then separated by re-denaturation, and all excess probe material was discarded. Finally, magnetic beads were used to recover the DNA and obtain the target-region library.
The target-region library was size-selected with AMPure XP Beads so that fragment lengths were concentrated in the 300-400 bp range, completing the target fragment capture [7]. The DNA concentration of the library was measured with a Qubit 4.0 Fluorometer, and only libraries with concentrations greater than 1.0 ng/µL passed quality control. The fragment length distribution of the library was assessed on an Agilent 2100. Qualifying libraries showed a single peak concentrated at about 400 bp, with no obvious adapter peak and no large-fragment peak. Subsequently, the library was circularized to prepare single-stranded circular DNA. This was amplified by 2-3 orders of magnitude by rolling circle amplification to prepare DNA nanoballs (DNBs) that could be loaded onto the sequencer.
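The acceptance criteria above can be written as a simple check. The sketch below is illustrative only: the record fields, the function name, and in particular the ±50 bp window used to interpret "about 400 bp" are assumptions, not values taken from the kit documentation.

```python
from dataclasses import dataclass


@dataclass
class LibraryQC:
    """Hypothetical summary of one library's QC measurements."""
    concentration_ng_per_ul: float   # fluorometer reading
    main_peak_bp: int                # position of the main fragment peak
    n_main_peaks: int                # number of distinct main peaks
    has_adapter_peak: bool           # obvious adapter (linker) peak present
    has_large_fragment_peak: bool    # large-fragment peak present


def passes_qc(lib: LibraryQC, min_conc: float = 1.0,
              target_bp: int = 400, window_bp: int = 50) -> bool:
    """Apply the stated criteria; `window_bp` is an assumed tolerance."""
    return (
        lib.concentration_ng_per_ul > min_conc
        and lib.n_main_peaks == 1
        and abs(lib.main_peak_bp - target_bp) <= window_bp
        and not lib.has_adapter_peak
        and not lib.has_large_fragment_peak
    )


good = LibraryQC(2.4, 395, 1, False, False)
low_conc = LibraryQC(0.8, 400, 1, False, False)
print(passes_qc(good), passes_qc(low_conc))
```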
Finally, high-throughput parallel sequencing was performed on the Fiseq sequencing platform of Beijing Genomics Institute (BGI) and Deepreads Biotech Co., Ltd. Detailed procedures followed the manufacturer's operating instructions.
Data analysis
We followed the DeepReads Genomics Technology standard procedure to analyze the next-generation sequencing data [16]. Low-quality sequences and very short reads were discarded so that poor-quality regions would not affect data quality or subsequent analysis. Variant detection requires alignment to, and correction against, a reference genome. Clean data were aligned to the hg38 reference genome; positions where reads did not match the reference exactly, such as mismatches or gaps, served as the basis for subsequent variant detection.
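The read-filtering step can be sketched as follows. This is not the DeepReads pipeline: the thresholds (`min_len`, `min_mean_q`) are hypothetical placeholders, and the sketch assumes standard Phred+33 quality encoding.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(seq: str, quality: str, min_len: int = 50, min_mean_q: float = 20.0) -> bool:
    """Keep a read only if it is long enough and of sufficient mean quality.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    if len(seq) < min_len:
        return False
    return mean_phred(quality) >= min_mean_q


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = [
    ("A" * 60, "I" * 60),   # long, high quality
    ("A" * 60, "#" * 60),   # long, low quality
    ("A" * 10, "I" * 10),   # too short
]
clean = [(s, q) for s, q in reads if keep_read(s, q)]
print(len(clean))
```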
Split times in the phylogenetic tree were calculated with Bayesian Evolutionary Analysis Sampling Trees (BEAST, v2.4.3) software, as in our previous study [17]. The number of SNP sites that differed between male individuals was analyzed with reference to the Y-chromosome topology, haplogroups, and pedigree tree updated annually by the ISOGG [18]. The age of each pedigree tree node was then inferred from the mutation rates of the different Y-SNP sites in the population.
Coalescence dating
Recent descent clusters (DCs) are characterised by a high-frequency Y-microsatellite haplotype and a set of close mutational neighbors, which signal continued transmission success over generations. The time to the most recent common ancestor (TMRCA) of each DC was determined using the average squared distance (ASD) estimator, as described previously [19]. A generation time of 25 years was used to convert the estimates into years [20].
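For orientation, one common form of the ASD estimator can be sketched as below. It assumes a stepwise mutation model, takes the modal haplotype of the cluster as the founder, and uses the relation ASD = μ·t with a single average per-locus mutation rate μ; the exact variant used in [19] may differ. The function names, the toy haplotypes, and the mutation rate are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def modal_haplotype(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Most frequent repeat count at each locus, used here as the assumed founder."""
    n_loci = len(haplotypes[0])
    return [Counter(h[i] for h in haplotypes).most_common(1)[0][0] for i in range(n_loci)]


def asd_to_founder(haplotypes: Sequence[Sequence[int]], founder: Sequence[int]) -> float:
    """Squared repeat-count difference from the founder, averaged over loci and individuals."""
    n_loci = len(founder)
    total = sum((h[i] - founder[i]) ** 2 for h in haplotypes for i in range(n_loci))
    return total / (len(haplotypes) * n_loci)


def tmrca_years(haplotypes: Sequence[Sequence[int]], mutation_rate: float,
                generation_years: float = 25.0) -> float:
    """TMRCA in years, assuming ASD = mutation_rate * generations."""
    founder = modal_haplotype(haplotypes)
    generations = asd_to_founder(haplotypes, founder) / mutation_rate
    return generations * generation_years


# Toy cluster: five males typed at four loci (repeat counts are invented).
cluster = [
    [14, 13, 24, 10],
    [14, 13, 24, 10],
    [14, 13, 25, 10],
    [14, 12, 24, 10],
    [15, 13, 24, 10],
]
# 0.003 mutations per locus per generation is a placeholder, not a study value.
print(tmrca_years(cluster, mutation_rate=0.003))
```

In this toy cluster three single-step differences across 20 locus observations give an ASD of 0.15, which corresponds to roughly 50 generations, or about 1250 years at 25 years per generation, under the placeholder rate.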