Characteristics of the genome datasets
We performed an initial in-depth characterization of the genomes of the 100 YFCs from 10 different breeds and 10 Huaibei (HB) partridge chickens (used for comparisons) sequenced in this study (Figure 1a; Additional file 1). An average of 86,155,900 clean reads per genome are obtained after quality control protocols, which were then aligned to the reference genome, yielding a mean mapping rate at 87.12% (Additional files 2 and 3, Additional file 4: Fig. S1). The total average base coverage across the genome is 96.35% at a sequencing depth target of 1X, 86.81% at 4X, 41.87% at 10X, and down to 0.36% at 30X (Additional file 4: Fig. S2). The average number of nucleotides in each genome is 11,916,810,290 after filtration, with an average GC content at 44.53% (Additional file 5).
For comparative analyses, we merged the 100 YFC and 10 Huaibei partridge chicken genomes with 104, 10, and 1 previously published Chinese chicken, red junglefowl (Gallus gallus; RJF), and green junglefowl (Gallus varius; GVF) genomes, respectively, retaining a total of 3,065,814 common autosomal single nucleotide polymorphisms (SNPs).
Genome variants in the Yellow-feathered chickens
After filtration, 16,817,111 single nucleotide polymorphisms (SNPs) and 1,289,024 InDels (insertion or deletion of bases) (≤ 50 bp) were retained. The structural variations (SVs) and the increase or decrease of the copy number of large (>1 kb) genomic fragments were analyzed. All these genomic variants in the newly generated dataset are summarized in Additional file 4: Fig. S3. Briefly, most of the SNPs are located in intergenic followed by intronic genomic regions (Additional file 4: Fig. S4a). Those located within coding sequences are mainly associated with synonymous or nonsynonymous coding attributes (Additional file 4: Fig. S4b). There are more transitions (11,943,736; 71.02%) than transversions (4,873,375; 28.98%) in the dataset. G->A and C->T substitutions are the common transitions at 28% while A->G and T->C substitutions are around 21% (Additional file 4: Fig. S5). Different transversions show a low but relatively uniform distribution rate in the dataset. The total average ratio of transitions to transversions is 2.53 (Additional file 6). Analysis of the heterogeneity of clean SNPs shows that about 2,033,275 and 2,259,628 SNPs per genome are homogenous and heterozygous hybrids, respectively (Additional file 7).
Among the high quality InDels, there are more deletions than insertions (785,806 (60.96%) versus 503,218 (39.04%)). The genomic locations of these InDels are summarized in Additional file 4: Fig. S6a, where most InDels are located in non-coding (i.e. intergenic and intronic) regions. Additionally, the four most common genomic consequences of the InDels include frameshift or non-frameshift insertions and deletions (Additional file 4: Fig. S6b).
Besides SNPs and InDels, SVs, which represent a large range of chromosomal variations encompassing large genomic regions have been characterized. These include large fragment deletions (DEL), insertions (INS), inversions (INV), translocations, and duplications [13, 14]. Intrachromosomal translocations (65%) and deletions (26%) are predominant in the dataset, while inversions and interchromosomal translocations are present in lower proportions (Additional file 4: Fig. S7a, Additional file 8). Analysis of copy number variations (CNVs; 95,918 in total), divided into deletions and duplications, reveals an overall higher proportion of deletions (56.5%) than duplications (43.5%) (Additional file 4: Fig. S7b, Additional file 9).
Population structure
Principal component Analysis (PCA) was performed for all the 100 YFC genomes, revealing a general separation of YFCs from Henan (Zhengyang, ZY) and Hubei (Jianghan, JH) (Fig. 1b) into a northern cluster. The YFCs from Guangxi (Guangxi Yellow, GX), Guangdong (Huaixiang, HX, Huiyang bearded, HY, and Wuhua Yellow, WH), and Hainan (Wenchang, WC) form a southern cluster, while those from Hunan (Huanglang, HL), Jiangxi (Ningdu Yellow, ND), and Fujian (Hetian, HT) group into a central cluster. This finding is supported by ADMIXTURE analysis (Fig. 1c and d). At the lowest cross validation error value, corresponding to K=2, the northern (blue) and southern (red) clusters show a complete separation, whereas the central cluster exhibits a signal of admixture with the northern and southern clusters. These three clusters were verified when K=3, with HL and WC showing admixed ancestries. When K=4, both HT and HY harbor the same ancestry component, which also contribute to WC.
We implemented comparative population genetic analyses the YFCs against other indigenous chickens from China, RJF, and GVF. In the PCA (Fig. 2a), YFCs tend to cluster together and appear to be in close proximity to HB partridge chickens and a few indigenous chickens from Sichuan and Tibet. These patterns imply a close congruity in the total genomic architecture of the YFCs. Yuanbao bantams form a distant cluster from the YFCs and other chickens, underscoring the genomic effects of differential breeding trajectories [15, 16]. Neighbor joining (NJ) phylogeny (Fig. 2b) and ADMIXTURE (Fig. 2c and d) corroborates the findings of the PCA and further clarifies the northern, central, and southern YFC clustering pattern inferred from figures 1b-d.
Detection of selective sweeps
Genome-wide scans for signals of selection attributable to the YFCs phenotype identified 268 analytical windows within the top 1% of the Locus-specific branch length (LSBL) test, and 370 windows in the π-ratio test (Additional file 10 and 11). These correspond to 366 and 504 positively selected genes (PSGs), respectively. A total of 28 PSGs were concurrently identified in the top 1% by the two selection tests (Fig. 3a). This is a relatively small overlap, possibly owing to the differences in the selection tests. Among the 28 genes are genes that are associated with pigmentation including: RALY heterogeneous nuclear ribonucleoprotein (RALY), leucine rich repeat containing G protein-coupled receptor 4 (LGR4), ryanodine receptor 2 (RYR2), RYR3, solute carrier family 23 member 2 (SLC23A2), and SLC2A14. Functional enrichment assessment showed significant gene ontology (GO) terms including vitamin transport activity (GO:0090482; Fig. 3b), intersecting with SLC23A2 and SLC2A14, which play roles in pigmentation. There are additional genes above the top 1% significance threshold in either of the selection tests. These genes are important for understanding the color trait and other properties of interest like meat quality of the YFCs. They include BCDO2, IL-18, FBXO5, COL1A2, COL4A2, COL6A1, COL6A2 in LSBL; and GDF8, HSPA5, SHISA9, COL4A1, and COL23A1 in the π-ratio test (Fig. 4).
BCDO2 haplotype differentiation
BCDO2 gene is a classical yellow color gene in chicken. We investigated its haplotype structure, also encompassing the proximal flanking genes. BCDO2 showed a homogenous haplotype pattern across the 10 YFC breeds (Fig. 5). Interestingly, Yuanbao breed also bears the same pattern as the YFCs. On the other hand, HB partridge chickens, which initially showed a close genomic proximity to the northern cluster YFCs (Fig. 1), clearly exhibits a synonymous BCDO2 haplotype pattern to the other Chinese indigenous chickens rather than to the YFCs (Fig. 5). Overall, the haplotype differentiation pattern of BCDO2 and its flanking genes (IL18 and PTS) is consistent with the selection of these genes as candidate PSGs for the yellow pigmentation phenotype.