Overview of Transcriptome and Proteome
The four midgut sections of adult N. viridula individuals were dissected, and each of these tissues was sequenced together with the corresponding carcass in four biological replicates, yielding a total of 1,426,685,586 reads. These were assembled de novo into 314,260 transcripts (Table 1A), and running TransDecoder on this transcript set predicted a total of 73,752 peptides. This peptide set was used as the theoretical database to identify proteins from gel-free proteomics in each of the four midgut compartments, which resulted in a total of 3,472 unique proteins in our samples (Table 1B). No differences in the enrichment of membrane proteins were observed between the supernatant and pellet fractions of the proteomic analysis (Table S2). Lastly, we tested whether the presence or absence of a protein in the proteomics set was associated with its expression in the transcriptome and found that proteins identified in the proteome showed, on average, far higher expression values than non-detected proteins (Figure S1). Full tables showing the expression levels in transcripts per million (TPM) and presence or absence in proteomics are reported in Tables S3 and S4, respectively.
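For readers less familiar with the TPM unit used throughout, the conventional calculation can be sketched as follows. This is a minimal illustration, not the quantification tool used in this study: the function name, read counts, and transcript lengths are hypothetical, and dedicated quantifiers typically use effective rather than raw transcript lengths.

```python
def tpm(counts, lengths_bp):
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    # Step 1: normalise each count by transcript length in kilobases.
    reads_per_kb = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that all values in the sample sum to one million.
    scale = sum(reads_per_kb) / 1e6
    return [r / scale for r in reads_per_kb]


# Hypothetical toy sample with three transcripts.
toy_counts = [100, 200, 300]
toy_lengths = [1000, 2000, 500]
toy_tpm = tpm(toy_counts, toy_lengths)
```

Because every sample sums to the same total, TPM values can be compared between compartments more directly than raw read counts.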
To perform a phylogenomic analysis, the N. viridula protein set was filtered to 28,402 unigenes by grouping transcripts at the gene level using the Trinity accession numbers, which yielded superior BUSCO scores (Figure S2). This gene set was compared to publicly available genomes and transcriptomes from stink bugs and other insects (Table S5). More specifically, we used the standalone version of the orthology database OrthoDB v9 [23] to obtain a list of 221 single-copy genes present in all species, which we subsequently used for the phylogenomic analysis. This analysis showed that all stink bugs clustered together and formed a monophyletic clade, as they all belong to the Pentatomidae family of Hemiptera (Figure 1A). The phylogeny was complemented by an orthology analysis to compare gene copy number across various insect lineages (Figure 1B). Interestingly, the unigene set for N. viridula contained a large number (n = 8,510) of unigenes that have no ortholog in other arthropod species. This number is elevated in N. viridula even when compared to the pentatomid stink bug P. stali, which was analyzed using the same Trinity-based pipeline. The majority of these genes (n = 5,927) have a BLAST match (e-value <1e–05) in the Uniref50 database, with almost half of them (n = 2,378) being similar to an arthropod protein (Figure S3). Of the 2,583 genes that do not have a BLAST match in Uniref50, 1,757 are transcribed with a TPM value >1 in at least one of the four midgut compartments, indicating that the corresponding genes should be further studied to determine whether they are functional.
It should be noted that a considerable fraction of the N. viridula unigene set is similar to bacterial proteins (n = 2,512). These genes most probably originate from the bacterial symbionts associated with N. viridula. For 871 of them, the mean transcriptional level differed significantly among the gut regions (one-way ANOVA tests using the log-transformed TPMs), with the vast majority being up-regulated in the M4 gut region, which harbors the bacterial symbionts in pentatomid stink bugs [9, 24]. Most of these M4-specific genes originate from γ-proteobacteria, which is in agreement with previous studies [9, 25]. Another set of genes appears to be expressed in the M1 and M2 regions only. Interestingly, their taxonomic profile differs from that of the M4-specific genes, as most of them originate from the Bacteroidetes/Chlorobi clade. As this study was aimed at analyzing the midgut of N. viridula, these 2,512 bacterial-like genes were filtered out of the unigene set, and all subsequent analyses were done on the set of 25,890 remaining eukaryote-like unigenes.
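The per-gene test described above (one-way ANOVA on log-transformed TPMs across gut regions) can be illustrated with a minimal sketch. The log base, the pseudocount of 1, the function names, and the replicate values are all assumptions made for illustration; the source does not specify them. Only the F statistic is computed here; a statistics library such as SciPy (`scipy.stats.f_oneway`) would also return the p-value.

```python
import math


def log_tpm(values, pseudocount=1.0):
    """Log2-transform TPM values (pseudocount is an assumption, avoids log(0))."""
    return [math.log2(v + pseudocount) for v in values]


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical TPMs of one bacterial-like gene, four replicates per region.
gene_tpm = {
    "M1": [2.1, 1.8, 2.5, 2.0],
    "M2": [2.4, 2.2, 1.9, 2.6],
    "M3": [3.0, 2.7, 3.3, 2.9],
    "M4": [41.0, 36.5, 44.2, 39.8],
}
f_stat = one_way_anova_f([log_tpm(v) for v in gene_tpm.values()])
```

A large F statistic, as for this toy M4-enriched gene, indicates that differences between regions are large relative to the variation among replicates.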
Analysis of functions in each gut compartment
To obtain an overview of the expression profile along the midgut, transcripts expressed at >1 TPM and proteins detected with gel-free proteomics along the N. viridula midgut were compared visually with Venn diagrams (Figure 2 A, B). Despite the obvious morphological differences between these segments, the majority of transcripts (68%; n = 7,898) and a substantial proportion of proteins (43%; n = 1,302) were present in all compartments. In both analyses, the M1 and M4 regions had the highest number of genes or proteins detected in only one compartment. To further explore the broad differences between midgut compartments, we also performed a principal component analysis (PCA). The first two dimensions of the PCA explained 45.7% and 34.9% of the variation, respectively (Figure 3). All biological replicates of a sample clustered together, which indicates relatively high reliability of the tissue sampling. Also of note is the relative clustering of the M2, M3, and M4 sections, especially along the first principal component, suggesting that these samples show similar transcriptome profiles. Unsurprisingly, the carcass sample clustered independently, but so did the M1 section of the midgut, suggesting that its transcriptional profile is distinct from that of the other midgut sections. Collectively, these data suggest that while most genes detected in the analysis were shared among all compartments, the M4 and especially the M1 appear the most distinct.
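The counts underlying a Venn comparison of this kind can be sketched as follows. The >1 TPM threshold is taken from the text; the gene identifiers, TPM values, and function names are hypothetical and serve only to show how genes are assigned to the regions of the diagram.

```python
from collections import Counter


def expressed(tpm_by_gene, threshold=1.0):
    """Return the set of genes expressed above the TPM threshold."""
    return {gene for gene, value in tpm_by_gene.items() if value > threshold}


def venn_regions(sets_by_compartment):
    """Count genes by the exact combination of compartments containing them."""
    membership = {}
    for compartment, genes in sets_by_compartment.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(compartment)
    return Counter(frozenset(m) for m in membership.values())


# Hypothetical mean TPMs for four genes in the four midgut compartments.
toy_expression = {
    "M1": {"g1": 5.0, "g2": 0.2, "g3": 12.0, "g4": 3.0},
    "M2": {"g1": 4.0, "g2": 0.1, "g3": 0.5, "g4": 2.0},
    "M3": {"g1": 6.0, "g2": 0.0, "g3": 0.3, "g4": 0.9},
    "M4": {"g1": 9.0, "g2": 7.0, "g3": 0.0, "g4": 0.4},
}
regions = venn_regions({c: expressed(v) for c, v in toy_expression.items()})
all_four = frozenset(["M1", "M2", "M3", "M4"])
shared_fraction = regions[all_four] / sum(regions.values())
```

In this toy example one of four expressed genes is shared by all compartments, while M1 and M4 each have one compartment-specific gene.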
A more detailed understanding of each midgut compartment was obtained by identifying groups of transcripts and analyzing them for enrichment in family membership (Pfam) or gene ontology (GO) terms. Fuzzy C-means clustering yielded eight groups of genes that displayed differing expression patterns along the midgut (Figure 4). Four of the eight clusters reflected transcripts specific to a single compartment. The remaining four clusters showed more complex patterns of expression along the gut. For example, one cluster contained transcripts that gradually increased in expression level from anterior to posterior (M1<M2<M3<M4). The 500 most highly expressed genes from each compartment were also grouped in order to estimate the predominant function of each section. These analyses yielded 12 groups of genes (8 clusters and 4 Top500 groups), which were analyzed in bulk by looking for enriched gene families and GO terms.
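The idea behind fuzzy C-means clustering of expression profiles can be sketched in a few lines. This is a generic textbook version of the algorithm under stated assumptions, not the software or settings used in this study: the fuzzifier m = 2, the random initialisation, the fixed iteration count, and the toy standardised profiles (M1 to M4) are all hypothetical.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Return (centres, memberships) from a basic fuzzy C-means run."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])
    centres = []
    for _ in range(n_iter):
        # Centres are means of the points weighted by membership ** m.
        centres = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # Memberships are updated from relative distances to each centre.
        for i, p in enumerate(points):
            dist = [max(sum((p[d] - c[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                    for c in centres]
            for k in range(n_clusters):
                u[i][k] = 1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                                    for j in range(n_clusters))
    return centres, u


# Hypothetical standardised profiles: three M1-specific, three M4-specific genes.
profiles = [
    [1.5, -0.5, -0.5, -0.5], [1.4, -0.4, -0.5, -0.5], [1.5, -0.6, -0.4, -0.5],
    [-0.5, -0.5, -0.5, 1.5], [-0.4, -0.5, -0.6, 1.5], [-0.5, -0.4, -0.5, 1.4],
]
centres, memberships = fuzzy_c_means(profiles, n_clusters=2)
labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
```

Unlike hard clustering, each gene receives a graded membership in every cluster, so genes with intermediate profiles are not forced into a single group.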
The M1-M3 regions tended to display similar arrays of enriched protein families and GO terms with regard to both specificity and overall expression level. In the M1-M3 compartments, families such as cysteine proteases and GO terms related to proteolysis were significantly enriched in both the top 500 most highly expressed genes and the compartment-specific clusters (Table 2; Table S6). Likewise, families associated with xenobiotic metabolism (P450s, carboxylesterases) and GO terms associated with these reactions (oxidation-reduction process) were frequently found in the anterior sections. In contrast, the M4 displayed GO terms relating to transmembrane transporter proteins and an enrichment in proteins from the sugar porter family (PF00083; Table 2; Table S6). Of the other clusters containing genes with more complex expression patterns, only one (M1<M2<M3<M4) showed a significant enrichment in any GO term or family: the zinc finger C2H2 family was overrepresented in this fuzzy cluster. From the GO term and Pfam enrichment analysis, it can be inferred that the anterior portion of the midgut (M1-M3) has a predominant role in the metabolism of xenobiotics and nutrients, while the posterior portion has a role in the transport of nutrients.
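Enrichment of a gene family or GO term in a gene group is commonly assessed with a one-sided hypergeometric test, sketched below. The source does not state which test was applied here, so this is an assumed, generic approach. The background size (25,890 unigenes) and the group size (500) come from the text, whereas the number of annotated genes and the number of hits are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_hits):
    """P(X >= n_hits) when n_selected genes are drawn without replacement."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probabilities of observing n_hits or more annotated genes.
    favourable = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return favourable / total


# Hypothetical family: 120 members in the background, 15 among a Top500 group.
p_value = hypergeom_enrichment_p(
    n_background=25890, n_annotated=120, n_selected=500, n_hits=15
)
```

With many families and GO terms tested across 12 gene groups, the resulting p-values would normally be corrected for multiple testing before being called significant.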
Identification and analysis of detoxification enzymes and nutrient transporters
The enrichment of P450s in the anterior region of the midgut led us to annotate individual members of this gene family using a pipeline centered around homology searches and. Testing our pipeline on several well-annotated proteomes suggested that our method predicted a number of P450 genes close to those previously reported in the literature for other insects (Table S7). After manually combining overlapping P450 fragments, a total of 111 P450s were identified in our N. viridula unigene protein set (Figure 5; File S1; Table S8). The expression profile of these P450s was then analyzed by family to detect any compartmentalization of functions. Of particular interest was the CYP6 family, which has a known role in insecticide metabolism [18] and showed high expression across all midgut compartments in our dataset, with a clear enrichment in the anterior portion of the midgut (M1-M3) compared to both the M4 region and the carcass. Also of note were five CYP4G genes, a subfamily commonly implicated in cuticular hydrocarbon biosynthesis [26]. Four of these five genes in N. viridula showed high levels of expression only in the carcass sample (Table S8). When the expression of all P450s was averaged, expression in the anterior portions of the midgut was roughly twice that in the posterior section.
The enrichment of transporter proteins in the M4 region of the midgut was explored further by identifying individual members of several families of sugar and amino acid transporters using an in-house pipeline (see Materials and Methods). Sugar transporters belonging to the SP, SSS, and SWEET families were identified and analyzed for their expression pattern along the midgut (File S2; Table S9). The 11 SSS transporters that were identified were expressed at very low levels in all midgut compartments. Only two SWEET transporters were detected, one of which showed high expression and 2–4 fold enrichment in all midgut compartments compared to the carcass. By far the largest group of sugar transporters was the SP family, with 84 detected transporters. This group was highly diverse in its expression pattern; different SPs showed specificity or enrichment in different midgut compartments. However, in accordance with the Pfam enrichment of sugar transporters in the M4 region (Table 3), the highest total expression and the largest number of highly expressed genes (>50 TPM) were found in the M4 region of the midgut (Table 3).
Amino acid transporters belonging to the NSS, APC, POT, and AAAP families were all represented by at least four members in N. viridula (File S2; Table S9). The ten NSS family members generally showed low expression, and only one NSS showed expression values of >10 TPM. The five POT family members showed similarly low expression, apart from DN111091_c2_g2, which showed very high (>200 TPM) expression in the M2 and M3 regions of the midgut. The APC and AAAP families were larger, with 18 and 15 members, respectively. Furthermore, expression of transporters in these families tended to be concentrated in the M4 region of the midgut. The number of transcripts from both APC and AAAP showing very high (>50 TPM) expression was elevated in the M4 tissue (Table 3): three of the 15 AAAPs and six of the 18 APCs were highly expressed in the M4 region. Lastly, the expression of these families in the M4 (APC: 48.00 ± 15.4, AAAP: 85.9 ± 26.8) was higher than the average anterior midgut expression (APC: 16.2 ± 7.0, AAAP: 34.1 ± 17.8; Figure 6).
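The family-level summaries reported above (mean expression with a ± term, and the number of members above 50 TPM) can be sketched as follows. The >50 TPM threshold is taken from the text; interpreting the ± term as the standard error of the mean is an assumption, as the source does not define it, and the TPM values and function name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarise_family(tpms, high_threshold=50.0):
    """Summarise the expression of one transporter family in one compartment."""
    n = len(tpms)
    # Standard error of the mean (assumed meaning of the +/- term).
    sem = stdev(tpms) / sqrt(n) if n > 1 else 0.0
    return {
        "n": n,
        "mean": mean(tpms),
        "sem": sem,
        "n_high": sum(1 for v in tpms if v > high_threshold),
    }


# Hypothetical TPMs of four family members in the M4 region.
toy_family_m4 = [10.0, 20.0, 60.0, 110.0]
summary_m4 = summarise_family(toy_family_m4)
```

Running the same summary on each compartment allows the M4 values to be set against the anterior midgut average, as in the comparison above.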