Isolation and identification of strain SWX-4
Diverse strains of the genus Gordonia have been extensively studied and applied in industrial desulfurization processes. Some Gordonia strains can use sulfides as electron donors for energy metabolism, even under acidic or neutral conditions (Alves et al. 2007a; Xu et al. 2022). In this process, they oxidize sulfides to sulfates and release them into the environment, which helps reduce or eliminate sulfide waste emissions and thereby mitigates environmental pollution (Feng et al. 2016; Delegan et al. 2021). Consequently, bacteria of the genus Gordonia are considered promising resources for biodesulfurization.
In our previous research on the bioremediation of petroleum contamination, we screened a sulfur-utilizing strain, SWX-4, that degrades oil and uses dibenzothiophene (DBT) as a sulfur source (Xu et al. 2022). After 2 d of cultivation at 37°C on LB agar plates, SWX-4 colonies were pink and opaque, with a smooth surface, regular edges, a convex center, and a diameter of approximately 0.8 mm (Fig. 1A). Scanning electron microscopy showed rod-shaped cells approximately 2–3 µm in length (Fig. 1B). Based on 16S rRNA gene sequence analysis (Fig. 1C), the isolate was identified as Gordonia alkanivorans.
General properties and functional analysis of SWX-4 genome
The genome of strain SWX-4 consists of a single circular chromosome of 5,303,410 base pairs (bp) with a GC content of 67.44%, plus two plasmids of 70,053 bp and 28,840 bp. Figure 2 illustrates key structural features of the genome, including GC content, GC skew, and coding sequences.
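For readers unfamiliar with the two sequence statistics plotted in such genome maps, the following is a minimal sketch of how GC content and windowed GC skew are commonly computed. The function names, the window size, and the toy sequence are illustrative assumptions, not the pipeline used to produce Fig. 2.

```python
from typing import List, Optional


def gc_content(seq: str) -> float:
    """Return the GC percentage over unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_skew(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """Return (G - C) / (G + C) for each full window along the sequence."""
    seq = seq.upper()
    step = step or window  # non-overlapping windows unless a step is given
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy sequence for illustration only (not SWX-4 data).
toy = "GGGCATCCCGAT"
print(round(gc_content(toy), 2))  # 66.67
print(gc_skew(toy, window=6))     # [0.5, -0.5]
```

On a real chromosome the window would typically span thousands of bases, and a sign change in the skew is often used as a hint for the replication origin or terminus.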
A total of 4,786 genes were predicted in the genome, of which 4,683 are candidate protein-coding sequences (CDSs). Of these, 4,197 (87.69% of all predicted genes) were annotated with predicted functions based on databases such as COG, Pfam, and InterPro. The genome also harbors 12 rRNA genes (5S, 16S, and 23S rRNA) and 51 tRNA genes (Table 1). The whole-genome sequence of strain SWX-4 has been deposited in GenBank under accession number CP128197.
Table 1
Genomic characteristics of strain SWX-4 chromosome
Feature | Number or size
Genome size (bp) | 5,303,410
GC content (%) | 67.44
Total genes | 4,786
Total number of CDSs | 4,683
rRNA genes | 12
tRNA genes | 21
Other ncRNA | 28
Genes with predicted function | 2,099
Genes with unknown function | 2,687
Genomic islands | 11
CDSs assigned to COGs | 4,197
GenBank accession no. | CP128197.1
Plasmid A accession no. | CP128199.1
Plasmid B accession no. | CP128198.1
The functional analysis of the predicted 4,786 genes in the SWX-4 genome was performed using the GO, COG, and KEGG databases, as depicted in Fig. 3. Based on GO annotation, 3,665 genes were assigned to specific functions, with 1,373 involved in cellular components, 1,361 in molecular functions, and 3,022 in biological processes. According to COG annotation, 4,197 genes were classified into 20 specific categories, accounting for 87.69% of the total number of genes. The most frequently represented functional category was "transcription" (7.89%), followed by "amino acid transport and metabolism" (6.86%), "lipid transport and metabolism" (6.39%), and "energy production and conversion" (5.74%).
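The coverage figure above depends on which denominator is used. The short check below, a sketch using only counts reported in this section, shows that 87.69% corresponds to the 4,786 predicted genes, whereas the same 4,197 genes would be a slightly larger share of the 4,683 CDSs. The helper name `percent` is an illustrative assumption.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


TOTAL_GENES = 4786    # all predicted genes reported for SWX-4
TOTAL_CDS = 4683      # candidate protein-coding sequences
COG_ANNOTATED = 4197  # genes classified by COG

cog_share_of_genes = percent(COG_ANNOTATED, TOTAL_GENES)
cog_share_of_cds = percent(COG_ANNOTATED, TOTAL_CDS)

print(cog_share_of_genes)  # 87.69
print(cog_share_of_cds)    # 89.62
```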
Furthermore, 2,009 genes of SWX-4 were classified by KEGG according to their involvement in metabolic pathways. The most highly represented category was "metabolism," which includes genes related to amino acid metabolism (296 genes), carbohydrate metabolism (235 genes), cofactor and vitamin metabolism (181 genes), energy metabolism (173 genes), lipid metabolism (167 genes), and xenobiotics biodegradation and metabolism (147 genes). These results suggest that strain SWX-4 has considerable potential for environmental bioremediation. In addition, 11 genomic islands (GIs) were identified in the genome of strain SWX-4.
Genes related to degradation of DBT and hydrocarbon metabolism
Some studies have reported the presence of dszC in the annotated genomes of the studied strains, where it is likely to play a role in DBT catabolism (Di Gregorio et al. 2004). However, although strain SWX-4 can use DBT as the sole sulfur source, no genes of the dsz operon were annotated in its genome. The InterPro database (https://www.ebi.ac.uk/interpro) suggests that the acyl-CoA dehydrogenase gene is closely related to the SfnB family genes responsible for sulfur assimilation, while sfnB is related to the dszC gene responsible for DBT desulfurization. Investigation of the genetic structures responsible for biodesulfurization identified 35 genes related to sulfur metabolism in the SWX-4 genome (Table 2). All detected genes were examined using SyntTax, and analysis of the sequence containing sfnB revealed a conserved region in SWX-4 (Fig. 4). Specifically, this region encodes a TetR family regulator, an LLM class flavin-dependent oxidoreductase, the dimethyl sulfone monooxygenase SfnG, an SfnB family sulfur acquisition oxidoreductase, and a convergently positioned TerD (also designated as a stress protein). The TetR family is one of the largest and best-studied groups of prokaryotic one-component signal transduction systems and regulates genes involved in various catabolic and anabolic processes (Cuthbertson and Nodwell 2013). The DszGR protein, a member of the TetR family, is involved in the regulation of the dsz operon in Gordonia sp. strain IITR100, and its regulatory mechanism has been investigated (Ahmad et al. 2014; Adlakha et al. 2016). Another study reported that DszGR binds specifically to upstream sequences, inducing a bend required for the activity of the dsz promoter (Keshav et al. 2022).
The UniProt database (https://www.uniprot.org/) links LLM class flavin-dependent oxidoreductase to DszA and lists alternative names such as alkanesulfonate monooxygenase (SsuD) and FMNH2-dependent monooxygenase. Although the sequence similarity between this enzyme and the dszA gene product is relatively low, both catalyze reactions that require FMNH2 and O2 and release FMN. SfnG has been reported to convert dimethyl sulfone to methanesulfinate. An analogous conversion occurs in the second reaction of the 4S pathway, where DBT sulfone is converted to 2'-hydroxybiphenyl-2-sulfinate (Wicht 2016). On the assumption that the region encoding TetR, LLM class flavin-dependent oxidoreductase, SfnG, SfnB, and TerD may determine DBT degradation in bacteria lacking dsz genes, a pathway of DBT catabolism by strain SWX-4 was proposed.
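The clustering of these five genes can be illustrated from the locus tags listed in Table 2. The sketch below groups tags whose numeric suffixes are consecutive; it assumes that consecutive locus numbers reflect neighboring positions on the chromosome, which is a common convention but is not confirmed by the source. The function names are hypothetical.

```python
import re
from typing import Iterable, List


def locus_number(tag: str) -> int:
    """Extract the trailing integer from a locus tag such as 'gene1481'."""
    match = re.search(r"(\d+)$", tag)
    if match is None:
        raise ValueError(f"no numeric suffix in locus tag: {tag!r}")
    return int(match.group(1))


def consecutive_runs(tags: Iterable[str], min_len: int = 2) -> List[List[int]]:
    """Group locus numbers into runs of consecutive integers."""
    numbers = sorted({locus_number(t) for t in tags})
    runs: List[List[int]] = []
    current: List[int] = []
    for n in numbers:
        if current and n == current[-1] + 1:
            current.append(n)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [n]
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Locus tags of the 35 sulfur-metabolism genes listed in Table 2.
table2_tags = [
    "gene1481", "gene1482", "gene1483", "gene1480", "gene1479", "gene0787",
    "gene4736", "gene1703", "gene2242", "gene2359", "gene3695", "gene4607",
    "gene1711", "gene2744", "gene3963", "gene4293", "gene0819", "gene0820",
    "gene1349", "gene1783", "gene3259", "gene1630", "gene1656", "gene1928",
    "gene4051", "gene2607", "gene3260", "gene3261", "gene3267", "gene3268",
    "gene3269", "gene3760", "gene4188", "gene4353", "gene3119",
]

# Only the tetR-ssuD-sfnG-sfnB-terD region forms a run of five.
print(consecutive_runs(table2_tags, min_len=5))
# [[1479, 1480, 1481, 1482, 1483]]
```

With the default `min_len=2`, the same call also reports the shorter cys gene runs (819–820, 3259–3261, and 3267–3269).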
Table 2
Genes related to sulfur metabolism in strain SWX-4
Locus tag | Gene name | Size (bp) | Predicted function
gene1481 | sfnG | 1112 | dimethyl sulfone monooxygenase SfnG
gene1482 | sfnB | 1247 | SfnB family sulfur acquisition oxidoreductase
gene1483 | terD | 1218 | TerD family protein
gene1480 | ssuD | 1179 | LLM class flavin-dependent oxidoreductase
gene1479 | tetR | 681 | Bacterial regulatory proteins, tetR family
gene0787 | fdx | 317 | 2Fe-2S iron-sulfur cluster binding domain-containing protein
gene4736 | trxB | 1007 | thioredoxin-disulfide reductase
gene1703 | yagT | 515 | 2Fe-2S iron-sulfur cluster binding domain-containing protein
gene2242 | fdx | 320 | 2Fe-2S iron-sulfur cluster binding domain-containing protein
gene2359 | fdx | 320 | 2Fe-2S iron-sulfur cluster binding domain-containing protein
gene3695 | prmB | 1040 | 2Fe-2S iron-sulfur cluster binding domain-containing protein
gene4607 | fdx | 320 | 2Fe-2S iron-sulfur cluster binding domain-containing protein
gene1711 | sseA | 896 | sulfurtransferase
gene2744 | sseA | 842 | sulfurtransferase
gene3963 | sseA | 839 | sulfurtransferase
gene4293 | sseA | 881 | sulfurtransferase
gene0819 | cysD | 971 | sulfate adenylyltransferase subunit CysD
gene0820 | cysNC | 1862 | adenylyl-sulfate kinase
gene1349 | cysJ | 4148 | bifunctional nitrate reductase/sulfite reductase flavoprotein subunit alpha
gene1783 | cysJ | 1223 | sulfite reductase subunit alpha
gene3259 | sir | 1778 | nitrite/sulfite reductase
gene1630 | metY | 1322 | bifunctional o-acetylhomoserine/o-acetylserine sulfhydrylase
gene1656 | sdhB | 788 | succinate dehydrogenase iron-sulfur subunit
gene1928 | sdhB | 746 | succinate dehydrogenase/fumarate reductase iron-sulfur subunit
gene4051 | metZ | 1250 | O-succinylhomoserine sulfhydrylase
gene2607 | sufS | 1286 | cysteine desulfurase
gene3260 | cysH | 752 | phosphoadenylyl-sulfate reductase
gene3261 | cysD | 941 | sulfate adenylyltransferase subunit 2
gene3267 | cysW | 938 | sulfate ABC transporter permease subunit CysW
gene3268 | cysU | 881 | sulfate ABC transporter permease subunit CysT
gene3269 | sbp | 1115 | sulfate ABC transporter substrate-binding protein
gene3760 | fdhD | 860 | formate dehydrogenase accessory sulfurtransferase FdhD
gene4188 | sseA | 590 | putative 3-mercaptopyruvate sulfurtransferase
gene4353 | thiS | 200 | sulfur carrier protein ThiS
gene3119 | qcrA | 1157 | ubiquinol-cytochrome c reductase iron-sulfur subunit
Genes involved in hydrocarbon metabolism in strain SWX-4 were also analyzed. These include the catechol metabolism gene catA (catechol 1,2-dioxygenase) and genes responsible for the degradation of other common polyaromatic hydrocarbons (Table 3). The functional annotation suggests that strain SWX-4 is adapted to using organosulfur compounds as sulfur sources and can degrade petroleum hydrocarbons. This makes it a potential candidate for environmental biotechnologies aimed at the bioremediation of oil-contaminated environments and for microbial enhanced oil recovery (Frantsuzova et al. 2022).
Table 3
Genes involved in hydrocarbon metabolism
Locus tag | Gene name | Size (bp) | Predicted function
gene1444 | catA1 | 879 | catechol 1,2-dioxygenase
gene4192 | catA2 | 867 | catechol 1,2-dioxygenase
gene2204 | pcaG | 570 | protocatechuate 3,4-dioxygenase, alpha subunit
gene2205 | pcaH | 768 | protocatechuate 3,4-dioxygenase, beta subunit
gene2222 | dbfA2 | 504 | dibenzofuran dioxygenase subunit beta
gene4195 | benA-xylX | 1395 | benzoate/toluate 1,2-dioxygenase subunit alpha
gene4196 | benB-xylY | 540 | benzoate/toluate 1,2-dioxygenase subunit beta
gene4197 | benC-xylZ | 2814 | benzoate/toluate 1,2-dioxygenase reductase component
gene0419 | alkB | 834 | catechol 1,2-dioxygenase
gene0786 | p450 | 1251 | Cytochrome p450
gene3694 | prmA | 1638 | propane 2-monooxygenase large subunit
gene3695 | prmB | 1041 | propane monooxygenase reductase component
gene3696 | prmC | 1119 | propane 2-monooxygenase small subunit
gene3697 | prmD | 348 | propane monooxygenase coupling protein
gene1169 | adh | 1113 | alcohol dehydrogenase
gene1805 | yiaY | 1176 | alcohol dehydrogenase
gene1470 | eno | 1284 | enolase
gene0166 | aldB | 1524 | aldehyde dehydrogenase
gene0796 | mhpF | 906 | acetaldehyde dehydrogenase
gene2715 | adhE | 2628 | acetaldehyde dehydrogenase / alcohol dehydrogenase
gene4660 | hmgA | 1110 | homogentisate 1,2-dioxygenase
gene4451 | fadJ | 2190 | 3-hydroxyacyl-CoA dehydrogenase / enoyl-CoA hydratase / 3-hydroxybutyryl-CoA epimerase
gene3621 | prpE | 1881 | propionyl-CoA synthetase
gene3669 | accB | 216 | acetyl-CoA carboxylase biotin carboxyl carrier protein
Pan-genome analysis and core genes
Pan-genome analysis provides insights into the core and accessory genes of bacteria. It highlights differences in genomic signatures and reflects the genomic diversity of strains, their capacity to adapt to environmental change, and their acquisition of new traits through the transfer of genetic material under environmental selection (Li et al. 2021). Across 43 strains of Gordonia sp., the pan-genome comprised 774 core genes, which are crucial for the fundamental bacterial lifestyle (supplementary Table S1, Fig. S2, and Fig. S3), together with 135,870 accessory genes and 15,726 unique genes. These findings suggest that accessory genes play a vital role in bacterial survival and form the basis of genomic diversity and adaptation to various environments (Wan et al. 2023). Consequently, studying accessory genes in Gordonia sp. may provide genetic explanations for how these organisms have changed to adapt to different environments.
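The core, accessory, and unique categories can be made concrete with a small sketch. It assumes the usual presence/absence definitions (core: present in every genome; unique: present in exactly one; accessory: everything in between); the pan-genome tool and thresholds actually used in this study are not stated here, and the strain names and gene families below are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory, and unique sets.

    `genomes` maps a strain name to the set of gene families it carries.
    Intended for two or more genomes; with a single genome the core and
    unique sets would coincide.
    """
    n = len(genomes)
    counts = Counter(fam for fams in genomes.values() for fam in fams)
    core = {fam for fam, c in counts.items() if c == n}
    unique = {fam for fam, c in counts.items() if c == 1}
    accessory = {fam for fam, c in counts.items() if 1 < c < n}
    return {"core": core, "accessory": accessory, "unique": unique}


# Hypothetical three-strain example (not data from this study).
toy_genomes = {
    "strain_A": {"rpoB", "gyrB", "sfnB", "famX"},
    "strain_B": {"rpoB", "gyrB", "sfnB"},
    "strain_C": {"rpoB", "gyrB", "famY"},
}
parts = partition_pangenome(toy_genomes)
print(sorted(parts["core"]))       # ['gyrB', 'rpoB']
print(sorted(parts["accessory"]))  # ['sfnB']
print(sorted(parts["unique"]))     # ['famX', 'famY']
```

Some tools relax the core definition (for example, presence in nearly all genomes), so counts from different pipelines are not always directly comparable.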
Evolutionary analysis and functional annotation of Gordonia sp.
A phylogenetic tree was constructed from single-copy core genes to resolve the relationships among the 43 Gordonia sp. isolates. The bar graph to the right of the tree (Fig. 5) shows the numbers of core, accessory, and unique genes in each of the 43 strains. Each strain possesses a certain number of unique genes, probably because these bacteria can acquire new genes from the environment through horizontal gene transfer, which increases genome diversity and enhances the adaptability of strains to environmental change.
To further investigate the functional properties of these strains, COG and KEGG analyses were conducted on the core and accessory genomes (supplementary Fig. S4). The COG categories of the core genes relate primarily to translation, ribosomal structure and biogenesis; transcription; and amino acid transport and metabolism, all of which are essential for cell growth. These capabilities provide a survival advantage in an ever-changing environment (Ying et al. 2019).
In the KEGG analysis, the core genes of the 43 isolates were predominantly associated with genetic information processing, whereas the accessory genes were concentrated in environmental adaptation. Furthermore, comparison of the protein sequences of the 43 strains of Gordonia sp. showed that 11 of them possess desulfurization-related operons; these strains are marked in light blue on the phylogenetic tree (Fig. 5). The genomic sequences of these 11 type strains were then obtained from NCBI and subjected to all-against-all ANI analysis together with the SWX-4 genome (Fig. 6). The 11 desulfurizing strains were distributed across different branches of the tree, with no clear correlation between genome relatedness and desulfurization metabolism. This suggests that desulfurization capability in Gordonia sp. does not follow overall genome composition. Moreover, the proportion of accessory genes in the 11 desulfurizing strains did not differ significantly from that in the other 32 strains, indicating that accessory gene content in Gordonia sp. is unrelated to desulfurization.
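The statistical test behind the comparison of the 11 desulfurizing strains with the other 32 is not specified in this section. One way such a two-group comparison could be checked is a permutation test on the difference in group means, sketched below; the function name, the number of permutations, and the example values are assumptions for illustration and are not results from this study.

```python
import random
from typing import Sequence


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float],
                        n_perm: int = 10000, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(group_a)

    def mean(values: Sequence[float]) -> float:
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the strains
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-strain accessory-gene proportions (illustration only).
desulfurizing = [0.61, 0.58, 0.63, 0.60]
others = [0.59, 0.62, 0.60, 0.61, 0.57]
p = permutation_p_value(desulfurizing, others)
print(0.0 < p <= 1.0)  # True
```

A large p-value from such a test would be consistent with the statement that the two groups do not differ, although it would not by itself demonstrate that desulfurization and accessory gene content are unrelated.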