Physicochemical and GC-MS analysis
The spatial heterogeneity in physicochemical parameters was revealed in the different sampling points (Table 1). The heavy metal concentration such as iron varied significantly (p=0.05) between the sampling points. Iron concentration was higher (1.42 mgkg-1) in the collection tank (TP01) as compared to the discharge of final effluent (TP06) (0.324 mgkg-1). The other heavy (Cr, Ni and Zn) decreased from 1.09 mgkg-1 to 0.195, 0.10 to 0.07 mgkg-1 and 0.056 to 0.041mgkg-1 respectively. Analysis of variance results show that there was significant difference in chromium concentration and the mean value of chromium was significantly different in all sampling point except TP06 and TP05 (p<0.05). As effective as chemical, physical and biological treatment reduces the load. Whereas, the concentration of copper was slightly higher in the final effluent; TP06 (1.53 mgkg-1) than the collection tank; TP01 (0.23 mgkg-1). However, the permissive limit of the heavy metals was in the range as prescribed by the central pollution control board for the treated effluent standards (CPCB,2012). Variation in the TOC level was observed in different stages of the treatment process. The TOC level was significantly higher in TP07 (414.35 ppm) and TP03 (178.8 ppm) and reduced dramatically in TP08 and TP06 after treatment. The total organic carbon showed the difference between the different sampling points and suggested their contribution to the diversity of the microbial community in the CETP.
The GC-MS analysis was carried out for the characterization of the compounds present in the CETP, vatva (Table S1). On comparison of the mass spectra of the constituents with the NIST library, different chemicals compounds were identified in effluent treatment plant. The compounds identified in the CETP consisted of a complex mixture of phthalate, phenolic compounds, aromatic and aliphatic carboxylic acids, amines and alkanes. In the TP01 most abundant compounds were identified as benzensulonamide (28.56%), 1-hexyl-2-nitrocyclohexane (11.17%), oxalic acid (9.71%), cyclopentane (5.85%), phthalic acid (5.47%) and bis(2-ethylhexyl) (2.81%) followed by 2-methylhexacosane (2.01%), 3-ethyl-3-methyl (2.05%) and 1,2 benzedicaroxylic acid (2.00%). Compounds like bis(2-ethylhexyl) phthalate, cyclopentane, (2-hexyloctyl), 2-methylhexacosane, decane, 3-ethyl-3-methyl, octatriacontane, 1,3-dibromo, benzenesulfonamide and 4-amino-n-ethyl were not detected in the TP06 sample. Moreover, in TP01 and TP06 samples showed the elevated peak area of 1,2 benenedicaroxylic acid (3.72%) and 1-hexyl-2-nitrocyclohexane (8.34%) levels over the other TP02, TP03, TP04 & TP05 samples. Among all sampling points, TP02 and TP07 shows the presence of the amines such as dimethoxyamine, 1-butanamine, 1-propanamine and 1,1'-biphenyl]-4,4'-diamine, 2,2'-dichloro. The maximum percentage of the oxalic acid with peak area 28.59% was shown by the TP08 followed by the TP04 (21.09%). Moreover, in sludge samples (TP07&TP08) different compounds such as benzeneacetic acid, 2-undecene, dichloroacteic acid and cyclohexane were detected (Table S1).
Microbial community structure and taxonomic profile
The characteristic features and annotation of metagenome datasets were summarized in Table 2. The rarefaction curve was obtained to determine whether the level of sequencing carried out in a sample is sufficient to represent its true diversity. The rarefaction curve was plotted between species richness and the number of expected OTUs obtained per sample (Fig. S1). Rarefaction curves almost reached an asymptote state indicating that full community coverage was achieved which were reflected by a higher Chao1 index which ranged from 570.80 to 1018 (Table 4). The alpha diversity was estimated to determine the species richness and diversity of the given system. The different diversity indices were estimated which showed that the microbial diversity varies slightly with the treatment stages and has not reduced the microbial abundance within the stages as revealed by Simpson diversity indices (Table 4). Furthermore, Simpson's index of diversity (1-D) ranges from 0.77 to 0.90 and Shannon's diversity index ranges from 3.05 to 3.95 was relatively higher, which reflects the richness in diversity. At CETP, the community sampled from TP06 was less diverse as compared to other samplings points. Whereas, in TP07 the highest level of richness and diversity was observed. Other diversity indices and data from individual samples were elevated. TP02 showed relatively higher Shannon's Diversity Index (3.95) which disperse the increase in the species richness and evenness and hence the diversity in TP02 also increases as compared to other sampling points.
The taxonomic diversity of the microbial community was carried out by using the RDP classifier based on 16S rRNA gene sequences in the shotgun metagenome. The majority of reads were assigned to bacteria at the domain level with an abundance between 97.78%-98.97%. Archaea, and eukaryote contributed less than 1% abundance (Table 3). In the phylum level, classification an overall dominance of Proteobacteria, fusobacteria, bacteroidetes, Firmicutes, and unclassified bacteria was observed in the metagenome datasets (Fig. 2a & b). Among these, the most abundant were Proteobacteria TP07 (42%) followed by TP06 (41%), TP04 (37%), TP03 (32%), TP01(26%) TP08 (25%) and TP02 (12%) samples respectively. Whereas, in TP02 the abundance of Firmicutes (28%) was higher as compared to other datasets that range from (3 % to 11%). At the class level, taxonomical changes were observed within the stages of CETP metagenome datasets. Within the Proteobacteria phylum, Alpha-Proteobacteria were found to be most prevalent in TP08 (sludge sample) while Beta-Proteobacteria were dominant in TP04 (Fig. 2c). This suggested that the Alpha-Proteobacteria class might tolerate well in the CETP treatment process and can be used further for wastewater treatment. At the genus level, the difference between the datasets was also observed, with Pseudomonas (7%), Fusobacterium (6%), Bacteroides (4%), Leptotrichia (4%) Arcobacter (3.1%), Methylophage (3%), Streptococcus (2.2%) and Desulfuromonas (1.1%) were the most prevalent in the TP01 metagenome dataset. Whereas, in TP02 the dominant genera were Fusobacterium (15%) followed by Clostridium (12%), Bacteroides (11%), Streptococcus (8%) and Desulfovibrio (4.4 %). The dominance of Pseudomonas genus was higher in TP07 (25.1%) dataset followed by TP04 (12.1%) and TP03(8.2%) respectively. However, in TP02 the relative abundance of Pseudomonas genus (1.2%) was lower.
Furthermore, the number of shared and unique genera between different metagenomics dataset were represented in the Venn diagram (Fig. 3). The samples were divided into two groups the first group includes CETP inlet (TP01) and sludge samples (TP07 and TP08) and the second group includes inlet going to collection tank (TP01) and the final discharge of the effluent (TP06) (Fig. 3a & b). While, Fig. 3a demonstrated that a total of 69 bacterial genera were common among these metagenomics data sets whereas 126, 336 and 171 bacterial genera are unique in TP01, TP07, and TP08 that resulted in a very diverse community was shared between these metagenomics data sets. Whereas, in the second group 85 genera were found to be common in inlet and outlet of the CETP effluent with 262 bacterial genera were unique to the inlet (TP01) and 136 genera were unique to TP06 i.e. outlet of CETP (Fig. 3b).
At the species level, the CETP Vatva treatment plant was predominantly occupied by Arcobacter nitofigilis, Wolinella succinogenes, Desulfovibris vulgaris, Desulfuromonas thiophila, Clostridium sticklandii, Streptococcus suis, Thurea sp, Enterococcus sp, Klebsiella pneumonia, Riemerella sp., Shewanella sp., Alcaligenes sp. and others. The microbial community was also dominated by different species of Pseudomonas aeruginosa, Pseudomonas filiscindens, Pseudomonas fluorescens and Pseudomonas stutzeri. The Shewanella decolorationis species was only found in the TP07 whereas, other rare bacterial species across the sampling set were Acinetobacter sp., Aeromonas sp., Azospira sp., Nitrosomonas sp., Rhizobium sp., and many more.
Functional profiling of the microorganism
The whole metabolic pathways were analysed by using two metabolic databases (KEGG and COG). The results showed that the relative abundance of functional features belonging to COG categories were cellular processes and signalling, information storage and processing, and metabolism (Fig. S2). Metabolism was the most predominant function, ranged from (47.4 to 43.01%) in the CETP samples collected from different sampling points. At sub-level 2, the top six abundant functions were general function prediction, amino acid transport and metabolism, replication recombination and repair, energy production and conservation, translation, ribosomal structure and biogenesis, and cell wall membrane envelope biogenesis in TP01 sample. Whereas, in TP06 dataset top six abundant functions belong to signal transduction, cell cycle control cell division, general function prediction, transcription, cytoskeleton and carbohydrate transport and metabolism, carbohydrate transport and metabolism (Fig. 4a). PCA was performed to investigate the association among the COG functions of the CETP samples. The functional variations were covered in two components PC1 and PC2 with a total variation of 75.8%. In the PCA plot, the samples were clustered into two major groups TP03, TP04, and TP07 and the second major cluster was formed between TP05, TP06, and TP08 while TP01 and TP02 were separated independently (Fig. 4b). The result also supported in the dendrogram and heatmap (Fig. 4c & Fig. S3).
In KEGG annotation, the most abundant function belongs to metabolism, as predicted in COG analysis. KEGG mapping classified the metagenome dataset in metabolism (55.57 to 59.39 %), genetic information processing (18.61 to 21.19%), environmental information processing (11.58 to 16.12%), cellular processes (4.95 to 5.34%), human diseases (0.86 to 1.83%) and organismal systems (0.36 to 0.49%) (Fig. S4). At KO level 2, carbohydrate metabolism was the most abundant function represented by 19.27 to 22.71% in TP01, TP02, TP03, TP04, TP06, and TP07 samples. Whereas, amino acid metabolism was the most abundant function represented by TP06 (19.93%) and TP05 (19.79%). The other KO categories were energy metabolism (12.9 to 14.83%), metabolism of cofactors and vitamins (10.3 to 11.15%), nucleotide metabolism (9.96 to 11.27%), lipid metabolism (5.41 to 6.58%), metabolism of other amino acids (4.85 to 5.62%), xenobiotics biodegradation and metabolism (2.57 to 4.66%), glycan biosynthesis and metabolism (3.25 to 4.93%), metabolism of terpenoids and polyketides (3.02 to 3.24%) and biosynthesis of other secondary metabolites (1.92 to 2.69%) (Fig 5a). The PCA analysis for the relative abundance of an annotated sequence of functional systems from CETP samples was presented in Fig. 5. In the PCA plot, high variance indicated a statistically significant difference in the abundance of functional profiles among the different sampling points in the CETP samples. The samples were clustered in groups, similarly as analysed in the COG annotations (Fig. 5b). Moreover, results were also supported by the dendrogram and in the heatmap (Fig 5c & Fig. 5S).
KEGG pathway assignment
Pathway prediction was performed based on the KEGG database. Further, each of the enzymes involved in the degradation was filtered and the gene abundance was calculated from each metagenome dataset and was represented in Table S1, S2, S3. It was observed that genes related to the metabolism of the aromatic compound were more abundant in the TP05 metagenome dataset as compared to other sampling datasets.
Benzoate biodegradation pathway
In the metagenomics dataset, a maximum number of genes were annotated from the benzoate degradation pathway. A total of 10,226 genes with 81 KEGG hits were identified and mapped in KEGG mapper. The key enzymes involved in the benzoate degradation pathways via hydroxylation were catechol 1,2-dioxygenase (EC 1.13.11.1), and protocatechuate 3,4-dioxygenase (EC 1.13.11.3), were identified in the CETP metagenomes dataset (Fig. 6; Table S2). These enzymes were involved in the aerobic benzoate degradation using dioxygenase to form catechol and monooxygenase to form protocatechuate. Almost all enzyme involved in the aerobic benzoate degradation was covered by metagenome datasets (Fig. 6). Moreover, the cluster of the ben gene (benK, benE, benABC, and benD) was also found in the dataset. The benABCD and benD were involved in converting benzoate to catechol whereas, benK and benE help in the transportation of benzoate inside the cell and cat genes (catA and catE) is involved in degradation of catechol.
A second mechanism to degrade benzoate is via protocatechuate formation, in which gene CYP53A1 and PobA were involved. The abundance of the CYP53A1 gene was found only in TP05 and TP06, although the PobA gene was present in all samples. After the formation of protocatechuate, PcaGH and LigAB genes were further involved in ortho and meta cleavage. Moreover, the gene abundance to degrade benzoate via box pathway (aerobic hybrid pathway) was also found in CETP metagenome datasets.
Phenylalanine Metabolism Pathway
Gene mining was done to predict the enzymes for the phenylalanine metabolism pathway in CETP metagenome dataset. The different genes involved in the phenylalanine was found in all the metagenome dataset with their abundance (Table S3). In metagenome dataset, the conserved gene cluster i.e. (paa gene cluster) was found in all sampling points which include 14 genes and two catabolic operon paaXY. The Fig.7, represents the different paa gene cluster involved in the degradation of phenylalanine. Initially, the catalase-peroxidase enzyme (Kat G gene) acts on phenylalanine, and converts it to 2-phenyl acetamide and then phenyl acetamide in the presence of AmiE (amidase) enzyme. Further, paaK (phenylacetate-CoA ligase EC:6.2.1.30) gene catalyze the reaction by converting phenyl acetyl to phenyl Acetyl-CoA. The abundance of the paaK gene was maximum in TP07 (189) followed by TP04 (131), TP03(119), TP01(97), TP02(90), TP05(81), TP08(77) and TP06 (43). The presence of other genes which were involved in the phenylalanine degradation pathway include; paaG, paaJ, paaFand paaH genes. The paaABCD gene was also found in metagenome dataset and proposed to be responsible for the introduction of oxygen into the aromatic ring of phenylacetyl-CoA. Further, the paaJ gene catalysed the reaction of 3-oxo-5,6–di-dehydro suberyl-CoA to Acetyl-CoA or further involved in 2,3-didehydro-adipyl-Co to metabolise via succinyl-CoA, and Acetyl-CoA in the presence of CoA, NAD+. Gene paaF catalysed the reversible reaction of 2,3-dehydroadipyl-CoA to 3-hydroxyadipyl-CoA and paaH gene was involved in last steps of phenylalanine metabolism, which catalysed 3-dehydroadipyl-CoA to 3-oxo-adipyl-CoA, subsequent formation of succinyl-CoA.
Degradation pathways of 1,2-dichloroethane
Enzymes encoded by bacteria enable them to degrade the synthetic chlorinated compound. The presence of a such gene in the effluent provides clear evidence of the utilization of chlorinated compounds in a huge amount by the industries. In the first step of the 1,2-dichloroethane (DCA) degradation, the conversion of 1,2-dichloroethane to 2-chloroethanol occurs in the presence of gene DhlA as shown in Fig. 8. The abundance of this gene was present in all samples except TP02. In alternative pathways for the degradation of DCA alkane-1, monooxygenase play the role in conversion via the oxidation process, because this enzyme has wide substrate specificity and performs both epoxidation and hydroxylation reaction in the industrial effluent. The second step includes a further breakdown of 2-chloroethanol to chloroacetaldehyde through enzyme methanol dehydrogenase (mdh1), this gene was also found in all datasets but absent in the TP02 sample (Table S3). Furthermore, chloroacetaldehyde gets converted to chloroacetate, this reaction was catalyzed by aldehyde dehydrogenase (ALDH) enzymes. In the last step of pathway, the conversion of chloro-acetate to glycolate with the presence of haloacetate dehalogenase and 2-haloacid dehalogenase which further enter into glyoxylate and dicarboxylate metabolism (Fig. 8).
The dominance of the various genes involved in the aromatic compound degradation pathway was also identified in the whole genome sequencing of pure culture isolate. In Pseudomonas aeruginosa and Bacillus licheniformis genomic data identifies different genes probably involved in the degradation of benzene, phenol, biphenyl and 1-2 dichloromethane pathways (Table S4, S5, S6). The data also revealed that catechol 2,3 dioxygenase genes were present which are involved in the degradation of catechol in the ortho and meta-cleavage pathways respectively (Fig.6). As reported in the supplementary Table S9 and Fig. 6, the strains P. aeruginosa genome contains maximum number genes implicated in the degradation of benzoate however in B. licheniformis genome contains some of the genes involved in the degradation of the benzoate pathways (Fig. 6,7,8; Table S4). Similarly, both the strains contain the enzyme-coding genes which are involved in the degradation of phenylalanine, and 1,2-dichloroethane.