Genotyping and phenotyping of C. carpio samples
Based on previous work [1, 19, 20], 199 samples were randomly collected from a cultivated population of C. carpio. Raw genotyping data comprising 184,978 SNPs were obtained for the 199 samples. After quality control, 95,400 polymorphic SNPs from 195 samples were retained. As shown in Figure 1A and Table S1, glutamic acid (GLU), aspartic acid (ASP), leucine (LEU) and lysine (LYS) accounted for a relatively large proportion of the total amino acid content, while the other 16 amino acids were relatively evenly distributed. Sex information was recorded for each sample, and no significant differences between males and females were observed for any trait. To explore potential relationships among the traits, the contents of 19 amino acids were compared in a correlation heatmap (Figure 1B). GLU, ASP, THR, LYS and PHE contents were strongly correlated with one another, possibly indicating similar roles in the mechanism regulating amino acid content.
Genome-wide association analysis
In total, 36, 6 and 1 SNPs were identified for glycine (GLY), proline (PRO) and tyrosine (TYR), respectively, at a threshold of P < 5.241 × 10^-7 (Table S2). The Manhattan plot showed the strongest association signals for GLY content, and the Q-Q plot supported the reliability of the analysis (Figure 2A and B). Similar results for PRO and TYR content are shown in Figure S1. The number of SNPs on each chromosome is shown in Figure 2C, which indicates a relatively even distribution across all chromosomes. Genes were annotated using the new common carp genome [21], and 54, 10 and 1 genes were identified for the GLY, PRO and TYR traits, respectively.
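The significance threshold above is numerically consistent with a Bonferroni correction at a family-wise error rate of 0.05 over the 95,400 SNPs that passed quality control. The text does not state how the threshold was derived, so the sketch below is an assumption that only checks the arithmetic; the function name and the value of alpha are illustrative.

```python
# Assumption: the threshold is a Bonferroni correction at alpha = 0.05
# over the 95,400 SNPs retained after quality control. The source text
# reports the threshold but does not name the correction method.
ALPHA = 0.05
N_SNPS = 95_400


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test P-value threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{threshold:.3e}")  # 5.241e-07
```

Under this assumption, a SNP is reported only if its association P-value falls below 0.05 / 95,400, which matches the stated cutoff to four significant figures.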
Transcriptomic analysis of divergent amino acid content
We examined the amino acid contents of 20 newly collected fish samples and selected six individuals with relatively extreme amino acid contents. The EAA, BCAA and FLA contents of the 20 samples are presented in Table S3. The quality of the RNA sequencing data, taken from our previously published work, is shown in Tables S4 and S5. As EAA and BCAA contents were highly correlated, these two categories were treated as one group in the following analyses.
Numerous differentially expressed genes were found for the EAA (or BCAA) trait, with 236, 81 and 960 genes in brain, liver and muscle tissues, respectively (Figure 3A-C and Tables S6, S8 and S10). The Venn diagram for the EAA (or BCAA) trait showed that two, thirteen and eight genes were shared in the pairwise comparisons among the three tissues (Figure 3D). The cluster analyses of the differentially expressed genes for EAA (or BCAA) in the three tissues are shown in Figures 3E and S2.
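The pairwise sharing summarized in the Venn diagram can be illustrated with plain set intersections. The following is a minimal sketch on toy data, not the pipeline used in this study: the gene identifiers and the helper name pairwise_shared are hypothetical, and the text does not specify which reported count belongs to which tissue pair.

```python
from itertools import combinations


def pairwise_shared(deg_sets):
    """Map each unordered tissue pair to the genes present in both DEG sets."""
    return {
        (a, b): deg_sets[a] & deg_sets[b]
        for a, b in combinations(sorted(deg_sets), 2)
    }


# Toy example with hypothetical gene identifiers (not data from this study).
toy_degs = {
    "brain": {"g1", "g2", "g3"},
    "liver": {"g2", "g4"},
    "muscle": {"g3", "g4", "g5"},
}

shared = pairwise_shared(toy_degs)
for pair, genes in shared.items():
    print(pair, len(genes), sorted(genes))
```

Sorting the tissue names before pairing keeps the dictionary keys in a stable order, so the same pair is always addressed as, for example, ("brain", "liver").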
For the FLA trait, 818, 74 and 3 differentially expressed genes were found in the three tissues, respectively (Figure 4; Tables S6, S8 and S10). Very few genes were shared among the three tissues, which could be due to the diverse gene functions in different tissues (Figure 4D). The cluster analyses of the differentially expressed genes for FLA in the three tissues are shown in Figures 4E and S2.
The network analyses of the differentially expressed genes for the three amino acid categories in multiple tissues are shown in Figures S3-7 and Tables S7, S9 and S11. The networks of GO enrichment pathways are illustrated in Figures 5, S8 and S9 and Tables S12 and S13, while the KEGG enrichment analysis is shown in Table S14.
Differential methylation analysis
To assess epigenetic differences between the high and low amino acid content groups, DNA was isolated from the muscle tissues of six samples for WGBS. The quality of the WGBS data, taken from our previously reported work, is shown in Table S15. A volcano plot for the EAA (or BCAA) trait shows the distribution of methylation differences across all DMRs (Figure 6A). DMRs were classified into several genomic regions, including exons, introns, promoters and intergenic repeat regions (Figure 6B). Figure 6C shows the length distribution of the DMRs; most regions were shorter than 1000 bp. Similar results were observed in the identification of DMRs for the FLA trait (Figure 6D-F). The DMRs within promoter regions (differentially methylated promoters, DMPs) were chosen for further functional enrichment analysis because of the importance of DMPs in the regulation of transcription [22]. The GO and KEGG enrichment analyses are shown in Tables S17 and S18.
Multi-omics analysis of DGE and DMP results
The DGE and DMP analyses identified related genes and pathways, but the relationship between methylation changes and altered expression remained unclear. We therefore carried out a multi-omics analysis integrating the DGE and DMP results. Because very few genes overlapped between the liver DGE and DMP results, we concentrated on the relationship between the DGE and DMP results in brain and muscle tissues (Table S19). Pearson correlation analysis revealed significant linear correlations (Figure 7) for the EAA (or BCAA) trait in muscle tissue and the FLA trait in brain tissue. The genes were divided into two types, positively related and negatively related. For the EAA (or BCAA) trait in muscle tissue, 57 genes were included in the correlation analysis, of which 24 were positively related and 33 were negatively related (Figure 7A). For the FLA trait in brain tissue, 36 genes were included, of which 12 were positively related and 24 were negatively related (Figure 7B).
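The correlation step can be sketched as follows, assuming that each gene carries one promoter methylation difference and one expression log2 fold change, and that "positively related" means the two values share a sign. Both the pairing of values and this definition are assumptions made for illustration; the text does not specify them, and the gene names and numbers are toy data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def split_by_direction(genes):
    """Split genes by whether methylation and expression change in the same direction.

    genes maps a gene name to (methylation_difference, log2_fold_change).
    Genes with a zero in either value fall into neither group.
    """
    positive = {g: v for g, v in genes.items() if v[0] * v[1] > 0}
    negative = {g: v for g, v in genes.items() if v[0] * v[1] < 0}
    return positive, negative


# Toy values with hypothetical gene names (not data from this study).
toy = {
    "geneA": (0.30, 1.2),
    "geneB": (0.10, 0.5),
    "geneC": (-0.20, -0.9),
    "geneD": (0.25, -1.1),
    "geneE": (-0.15, 0.7),
    "geneF": (0.05, -0.2),
}

positive, negative = split_by_direction(toy)
r_pos = pearson_r([v[0] for v in positive.values()], [v[1] for v in positive.values()])
r_neg = pearson_r([v[0] for v in negative.values()], [v[1] for v in negative.values()])
print(len(positive), len(negative), round(r_pos, 3), round(r_neg, 3))
```

Computing the coefficient separately for each group mirrors the two-panel presentation in Figure 7, although the grouping criterion used here remains an assumption.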