Enrichment of PIPs using 5’-biotinylated DNA probes. To identify the PIPs of each fibroin gene in M4 and L5D5 PSGs (Fig. 1a), we first examined the promoter activities of fibH, fibL, and P25 in cultured BmE cells. The luciferase activities driven by the promoters of fibH, fibL, and P25 were approximately 133-, 13-, and 20-fold higher, respectively, than that of the negative control, demonstrating that the promoter region of each fibroin gene contains regulatory sequences related to its expression (Fig. 1b, 1c). We then generated 5’-biotinylated DNA probes based on the fibH, fibL, and P25 promoter sequences shown in Fig. 1c, together with two control probes (Supplementary Fig. S1), and performed DNA pull-down assays. As shown in Fig. 1d, multiple proteins from either the M4 or L5D5 PSG showed clear binding to each of the biotin-labeled DNA probes specific for the fibH, fibL, and P25 promoters relative to the control groups, indicating that the pull-down assays were effective.
Summary of PIPs determined by HPLC-MS. HPLC-MS was employed to determine the candidate PIPs of each fibroin gene, which are summarized in Table 1 (for details, see Supplementary Table S1). After removing the proteins captured by either of the two negative control DNA probes, we identified 198 and 330 PIPs of fibH, 292 and 305 PIPs of fibL, and 247 and 460 PIPs of P25 from the M4 and L5D5 PSGs, respectively. Notably, for all three promoters, more PIPs were identified from the L5D5 PSG than from the M4 PSG, implying that more proteins are recruited to directly regulate the expression of fibroin genes in L5D5, a key period for the efficient synthesis of massive amounts of fibroin protein in the PSG. Moreover, the number of P25 PIPs identified from the L5D5 PSG was much greater than the numbers of fibH and fibL PIPs, a surprising finding that deserves further study. Preliminary Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation revealed that the specific PIPs of each group showed some degree of similarity in functional classification (Supplementary Figs. S2 and S3), suggesting that these PIPs may play important roles in the cooperative regulation of fibroin gene expression. Next, we focus on the common PIPs to explore their similarities and differences.
Table 1
Summary of candidate PIPs identified by HPLC-MS
Gene promoter | Developmental stage | Control group (GFP probe) | Control group (unbiotinylated probe) | Experimental group (biotinylated probe) | Specific PIPs
fibH | M4 | 34 | 18 | 232 | 198
fibH | L5D5 | 39 | 23 | 373 | 330
fibL | M4 | 34 | 23 | 325 | 292
fibL | L5D5 | 39 | 77 | 387 | 305
P25 | M4 | 34 | 40 | 291 | 247
P25 | L5D5 | 39 | 28 | 504 | 460
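The filtering behind the last column of Table 1 can be read as a set difference: proteins recovered with the biotinylated probe, minus anything also recovered with a control probe. The Python sketch below illustrates that reading only. The function name and the protein identifiers are hypothetical placeholders, not data or code from this study.

```python
def specific_pips(experimental, *controls):
    """Return proteins bound by the experimental probe but by no control probe."""
    # Pool every protein seen with any control probe into one background set.
    background = set().union(*controls)
    return set(experimental) - background


# Hypothetical identifiers, for illustration only.
experimental = {"prot_A", "prot_B", "prot_C", "prot_D"}
gfp_control = {"prot_B", "prot_X"}
unbiotinylated_control = {"prot_C"}

print(sorted(specific_pips(experimental, gfp_control, unbiotinylated_control)))
# ['prot_A', 'prot_D']
```

Under this reading, the number of proteins discarded for a given probe is the experimental count minus the specific count, for example 232 - 198 = 34 for fibH in the M4 PSG.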
Comparison of PIPs between the M4 and L5D5 PSGs. To better understand the characteristics of these PIPs, we first analyzed the common and unique PIPs identified in the M4 and L5D5 PSGs corresponding to each fibroin gene promoter. The results showed that the numbers of common fibH, fibL, and P25 PIPs in M4 and L5D5 were 109, 156, and 158, respectively (Fig. 2a and Supplementary Table S2). More than 90% of these PIPs could be annotated using the BlastKOALA web tool of the KEGG database. Intriguingly, a large proportion of the common PIPs were found to be enriched in three primary pathway categories (i.e., metabolism, particularly metabolic pathways; genetic information processing, especially ribosome and protein processing in the endoplasmic reticulum; and human disease-related pathways, such as those related to prion disease, Huntington’s disease, and Parkinson’s disease) (Fig. 2b).
Among the unique PIPs (Supplementary Table S3), there were some obvious differences between the fibroin gene promoters and between the two developmental stages. For example, in the M4 PSG, PIPs of fibH and fibL were enriched in human disease and genetic information processing pathways, especially those related to the ribosome and RNA transport. PIPs of fibL were also abundant in metabolism pathways, particularly metabolic pathways, while PIPs of P25 were abundant only in pathways involved in transcription; translation; folding, sorting and degradation; and replication and repair (Fig. 3a). In the L5D5 PSG, the PIPs of the three fibroin genes were most abundant in disease-related pathways, particularly those related to neurodegenerative disease, followed by metabolism-related categories, such as metabolic pathways and the biosynthesis of secondary metabolites. Some PIPs of fibH and fibL were also enriched in the ribosome and protein processing in the endoplasmic reticulum categories, but no PIPs of P25 were enriched in these two pathways (Fig. 3b). In addition, more PIPs were identified in the L5D5 PSG than in the M4 PSG, which suggests that more proteins are recruited to ensure the efficient transcription of fibroin genes in the L5 stage, leading to the large-scale synthesis of fibroin proteins.
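The number of stage-unique PIPs per promoter is not stated explicitly, but it follows from the specific PIP counts in Table 1 and the common counts given above, assuming that "unique" means the specific PIPs of one stage that are not shared with the other stage. A short Python sketch of that subtraction (the variable and function names are illustrative):

```python
# Specific PIP counts (Table 1) and M4/L5D5 common counts (text, Fig. 2a).
SPECIFIC = {
    "fibH": {"M4": 198, "L5D5": 330},
    "fibL": {"M4": 292, "L5D5": 305},
    "P25": {"M4": 247, "L5D5": 460},
}
COMMON = {"fibH": 109, "fibL": 156, "P25": 158}


def stage_unique_counts(specific, common):
    """Subtract the shared count from each stage's total for every promoter."""
    return {
        gene: {stage: total - common[gene] for stage, total in stages.items()}
        for gene, stages in specific.items()
    }


UNIQUE = stage_unique_counts(SPECIFIC, COMMON)
for gene, stages in UNIQUE.items():
    print(gene, stages)
# fibH {'M4': 89, 'L5D5': 221}
# fibL {'M4': 136, 'L5D5': 149}
# P25 {'M4': 89, 'L5D5': 302}
```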
Identification of common PIPs shared by the three fibroin genes. The expression of fibroin genes is regulated mainly at the transcriptional level in a concerted manner10,11. To explore the proteins that may be involved in such coregulation, we analyzed the PIPs that interact with the promoters of all three fibroin genes. A large number of PIPs (135 in the M4 PSG and 212 in the L5D5 PSG) were shared by fibH, fibL, and P25 (Fig. 4a and Supplementary Table S4). Functional prediction with the BlastKOALA web tool of the KEGG database annotated 90.4% (122/135) and 90.1% (191/212) of the common PIPs in the M4 and L5D5 PSGs, respectively. These PIPs were annotated to all six KEGG systems, especially genetic information processing, human diseases, and metabolism, and some of them were found in both the M4 and L5D5 PSGs. With the exception of organismal systems, each KEGG system contained many more common PIPs in the L5D5 PSG than in the M4 PSG (Fig. 4b). Further analysis revealed the detailed pathway annotations of the common PIPs identified in M4 and L5D5 (Fig. 5a). The most highly enriched system was human diseases, particularly neurodegeneration-related pathways, including the multiple diseases, Alzheimer’s disease, and Parkinson’s disease categories, among others. The next most enriched pathways were the ribosome, protein processing in the endoplasmic reticulum, and metabolic pathways (Fig. 5b).
Functional interaction network of the common PIPs shared by the three fibroin genes. To better understand the possible functional relationships among the common PIPs identified in the M4 and L5D5 PSGs, we constructed protein-protein interaction (PPI) networks using the online Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). The initial network of the common PIPs in M4 consisted of 132 nodes and 254 edges (Fig. 6a). The vast majority of the nodes were related to 35 PIPs and were mainly involved in the ribosome (22 PIPs), RNA transport (5 PIPs), and protein processing in the endoplasmic reticulum (4 PIPs) pathways. The second most significant module (6 PIPs) was involved in the ribosome biogenesis in eukaryotes pathway, and the third most significant module (5 PIPs) was involved in the proteasome pathway. The other modules, which consisted of two or three PIPs, were involved in the mismatch repair, aminoacyl-tRNA biosynthesis, DNA replication, nucleotide excision repair, spliceosome, and protein processing in the endoplasmic reticulum pathways.
In terms of the common PIPs identified in L5D5, 209 nodes and 444 interactions were found in the initial network (Fig. 6b). The vast majority of the nodes were related to 44 PIPs and were mainly involved in ribosome (25 PIPs) and aminoacyl-tRNA biosynthesis (12 PIPs) pathways, followed by protein export (2 PIPs) and mRNA surveillance (2 PIPs) pathways. Two significant modules consisting of 14 and 9 PIPs were involved in the proteasome and protein processing in the endoplasmic reticulum, respectively. Another significant module consisted of 6 PIPs that were annotated as subunits of the coatomer complex, a cytosolic protein complex that binds to dilysine motifs and is essential for the retrograde Golgi-to-ER transport of dilysine-tagged proteins. The other modules, consisting of two, three, or four PIPs, were mainly involved in pathways such as protein processing in the endoplasmic reticulum, protein export, and ribosome biogenesis in eukaryotes. Taken together, these results indicate that the common PIPs identified in M4 and L5D5 PSGs show many interactions among themselves and are at least partially biologically connected as a group. Further studies could lead to a better understanding of their roles in the direct regulation of fibroin gene expression during larval molting and feeding stages.
Description of TFs among the PIPs of fibroin genes. TFs are indispensable for the regulation of fibroin gene transcription and protein synthesis12–19. However, only a few TFs from the PSG have been isolated and validated using in vivo or in vitro methods thus far. Therefore, we further identified the TFs among the PIPs of fibroin genes. As summarized in Table 2, 31 potential TFs were identified, which were distributed across 18 chromosomes. The functions of most of these TFs in regulating fibroin gene expression have not been reported. Notably, 5 TFs that could interact with all three fibroin gene promoters were identified in both M4 and L5D5, while the numbers of TFs that could interact with all three fibroin gene promoters in M4 and in L5D5 were 11 and 10, respectively. Five TFs were found only in M4, while 9 TFs were found only in L5D5. Surprisingly, 20, 23, and 29 TFs could interact with the promoters of fibH, fibL, and P25, respectively, in either M4 or L5D5. Without considering other factors, these results may reflect the similarities and differences among the TFs regulating the expression of the three fibroin genes and are worthy of further study.
Table 2
Potential TFs identified from the PIPs of fibroin genes
Protein ID | Chromosome | Family/domain | Interacting promoters in M4 | Interacting promoters in L5D5
P_KWMTBOMO00339 | Chr1 | CSD | fibH, P25 | fibH, P25
P_KWMTBOMO00366 | Chr1 | TF_others | fibH, fibL, P25 | fibL, P25
P_KWMTBOMO00497 | Chr1 | THAP | fibH, fibL, P25 | fibH, P25
P_KWMTBOMO01796 | Chr4 | TF_others | fibL | fibH, fibL, P25
P_KWMTBOMO03626 | Chr6 | NCU-G1 | fibH, fibL, P25 | fibH, fibL, P25
P_KWMTBOMO04742 | Chr8 | ETS | fibH, P25 | -
P_KWMTBOMO04674 | Chr8 | HMG | - | fibL, P25
P_KWMTBOMO04757 | Chr8 | MBD | - | P25
P_KWMTBOMO04877 | Chr9 | MBD | - | P25
P_KWMTBOMO04966 | Chr9 | zf-C2H2 | - | fibL
P_KWMTBOMO05228 | Chr9 | zf-C2H2 | - | P25
P_KWMTBOMO05675 | Chr10 | zf-C2H2 | fibH, fibL, P25 | -
P_KWMTBOMO05699 | Chr10 | zf-C2H2 | - | fibL
P_KWMTBOMO05810 | Chr10 | Nrf1 | - | fibH, fibL, P25
P_KWMTBOMO07323 | Chr12 | NCU-G1 | fibL, P25 | fibH, fibL, P25
P_KWMTBOMO08022 | Chr13 | HMG | fibL, P25 | -
P_KWMTBOMO08820 | Chr15 | TF_others | fibH, fibL, P25 | fibH, fibL, P25
P_KWMTBOMO09050 | Chr15 | zf-C2H2 | fibH, fibL | fibH, P25
P_KWMTBOMO09808 | Chr16 | NCU-G1 | fibH, fibL, P25 | fibH, fibL, P25
P_KWMTBOMO10147 | Chr17 | TF_others | - | P25
P_KWMTBOMO11061 | Chr18 | TF_others | fibH, fibL, P25 | fibH, fibL, P25
P_KWMTBOMO11731 | Chr19 | HMG | fibH, fibL, P25 | -
P_KWMTBOMO12659 | Chr21 | NCU-G1 | fibH, fibL, P25 | fibH, P25
P_KWMTBOMO14887 | Chr24 | NCU-G1 | fibL, P25 | fibH, fibL, P25
P_KWMTBOMO15035 | Chr25 | THAP | - | P25
P_KWMTBOMO15757 | Chr26 | Homeobox | - | fibH, P25
P_KWMTBOMO15758 | Chr26 | Homeobox | fibH, fibL, P25 | fibH, fibL, P25
P_KWMTBOMO15781 | Chr26 | THAP | fibL | fibH, fibL, P25
P_KWMTBOMO15928 | Chr27 | bHLH | fibL | P25
P_KWMTBOMO16040 | Chr27 | TF_others | fibL, P25 | P25
P_KWMTBOMO16524 | Scaf007 | TF_others | fibH, fibL, P25 | fibL
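Several of the tallies quoted for Table 2 can be recomputed from the binding calls themselves. The Python sketch below is one way to do so; the single-letter encoding (H = fibH, L = fibL, P = P25) and the function name are conventions introduced here for illustration, and the transcription assumes that each row of the table carries three positional cells per stage. It covers only the counts of TFs binding all three promoters and the per-promoter counts.

```python
# Binding calls transcribed from Table 2:
# (protein ID, promoters bound in M4, promoters bound in L5D5).
# H = fibH, L = fibL, P = P25; "" means no binding was reported at that stage.
TABLE2 = [
    ("P_KWMTBOMO00339", "HP", "HP"),
    ("P_KWMTBOMO00366", "HLP", "LP"),
    ("P_KWMTBOMO00497", "HLP", "HP"),
    ("P_KWMTBOMO01796", "L", "HLP"),
    ("P_KWMTBOMO03626", "HLP", "HLP"),
    ("P_KWMTBOMO04742", "HP", ""),
    ("P_KWMTBOMO04674", "", "LP"),
    ("P_KWMTBOMO04757", "", "P"),
    ("P_KWMTBOMO04877", "", "P"),
    ("P_KWMTBOMO04966", "", "L"),
    ("P_KWMTBOMO05228", "", "P"),
    ("P_KWMTBOMO05675", "HLP", ""),
    ("P_KWMTBOMO05699", "", "L"),
    ("P_KWMTBOMO05810", "", "HLP"),
    ("P_KWMTBOMO07323", "LP", "HLP"),
    ("P_KWMTBOMO08022", "LP", ""),
    ("P_KWMTBOMO08820", "HLP", "HLP"),
    ("P_KWMTBOMO09050", "HL", "HP"),
    ("P_KWMTBOMO09808", "HLP", "HLP"),
    ("P_KWMTBOMO10147", "", "P"),
    ("P_KWMTBOMO11061", "HLP", "HLP"),
    ("P_KWMTBOMO11731", "HLP", ""),
    ("P_KWMTBOMO12659", "HLP", "HP"),
    ("P_KWMTBOMO14887", "LP", "HLP"),
    ("P_KWMTBOMO15035", "", "P"),
    ("P_KWMTBOMO15757", "", "HP"),
    ("P_KWMTBOMO15758", "HLP", "HLP"),
    ("P_KWMTBOMO15781", "L", "HLP"),
    ("P_KWMTBOMO15928", "L", "P"),
    ("P_KWMTBOMO16040", "LP", "P"),
    ("P_KWMTBOMO16524", "HLP", "L"),
]

ALL_THREE = set("HLP")


def tally(rows):
    """Recompute the all-three-promoter and per-promoter TF counts."""
    all_m4 = sum(set(m4) == ALL_THREE for _, m4, _ in rows)
    all_l5d5 = sum(set(l5) == ALL_THREE for _, _, l5 in rows)
    all_both = sum(
        set(m4) == ALL_THREE and set(l5) == ALL_THREE for _, m4, l5 in rows
    )
    # A TF counts for a promoter if it bound that promoter in either stage.
    per_promoter = {p: sum(p in m4 + l5 for _, m4, l5 in rows) for p in "HLP"}
    return {
        "all_m4": all_m4,
        "all_l5d5": all_l5d5,
        "all_both": all_both,
        "per_promoter": per_promoter,
    }


COUNTS = tally(TABLE2)
print(COUNTS)
# {'all_m4': 11, 'all_l5d5': 10, 'all_both': 5, 'per_promoter': {'H': 20, 'L': 23, 'P': 29}}
```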
Combinatorial TF interactions are critical for gene regulation and are important determinants of different cellular functions. To determine possible interactions among the 31 TFs, we constructed corresponding PPI network models using the STRING database. As shown in Supplementary Fig. S4, two interaction networks were identified, which consisted of 28 nodes and 16 edges. One module was composed of 2 TFs annotated as prohibitin, which is a highly conserved protein that can inhibit the proliferation and apoptosis of tumor cells by regulating gene transcription and maintaining the stability of mitochondrial proteins. The other module was composed of 9 TFs. Interestingly, two ribosomal proteins (P_KWMTBOMO04742 and P_KWMTBOMO05675) and one translation elongation factor (P_KWMTBOMO11731) that form a subnetwork and interact with each other were found only in the M4 PSG, while adenylate kinase 9 (P_KWMTBOMO05810) was found only in the L5D5 PSG. In addition, another subnetwork was composed of 5 TFs (P_KWMTBOMO07323, P_KWMTBOMO03626, P_KWMTBOMO14887, P_KWMTBOMO09808, and P_KWMTBOMO12659) that interact with each other and are all molecular chaperones with similar functions assisting the folding of proteins upon ATP hydrolysis. Taken together, these results suggest that these TFs are crucial for the correct transcription and folding of silk fibroin proteins and deserve further study.