Genome preparation, assembly and annotation
Initial sequencing and assembly, and final nanopore sequencing
The genome of R. rhodochrous ATCC BAA-870 was originally sequenced in 2009 on the Solexa Illumina platform (average read length 36 bp), resulting in a coverage of 74%, with an apparent raw coverage depth of 36x. An initial assembly of this 36-cycle, single-ended Illumina library, together with a mate-pair library, yielded a 6 Mbp genome of 257 scaffolds. A more recent paired-end Illumina library, combined with the mate-pair library, reduced this to only 6 scaffolds (5.88 Mbp). Even after several rounds of linking the mate-pair reads, we were still left with 3 separate contiguous sequences (contigs). The limitation was caused by repeats in the genome, one of which was a 5.2 kb contig containing 16S-like genes that, based on sequence coverage, must exist in four copies. Applying third-generation sequencing (Oxford Nanopore Technologies) enabled the full assembly of the genome, while the second-generation (Illumina) reads provided the necessary proofreading. This resulted in a total genome size of 5.9 Mbp, consisting of a 5.37 Mbp circular chromosome and a 0.53 Mbp linear plasmid. The presence of the plasmid was confirmed by Pulsed-Field Gel Electrophoresis of non-digested DNA [41].
Annotation
The assembled genome sequence of R. rhodochrous ATCC BAA-870 was submitted to the Bacterial Annotation System web server, BASys, for automated, in-depth annotation [42]. The BASys annotation was performed using raw sequence data for both the chromosome and plasmid of R. rhodochrous ATCC BAA-870 with a total genome length of 5.9 Mbp, in which 7548 genes were identified and annotated (Figure 1). The plasmid and chromosome encode a predicted 677 and 6871 genes, respectively. The same sequence run through RAST (Rapid Annotation using Subsystem Technology) predicted 5535 protein coding sequences (Figure 2). The predicted gene count therefore depends strongly on the bioinformatics tool used, which makes comparison to other genomes more difficult. Confirmation of annotation was performed manually for selected sequences. In BASys annotation, COGs (Clusters of Orthologous Groups) were automatically delineated by comparing protein sequences encoded in complete genomes representing major phylogenetic lineages [43]. As each COG consists of individual proteins or groups of paralogs from at least 3 lineages, it corresponds to an ancient conserved domain [44, 45]. A total of 3387 genes annotated in BASys were assigned a COG function (44.9% of annotated genes), while 55% and 59% of annotated genes on the chromosome and plasmid, respectively, have unknown function. Based on counts of total genes annotated in RAST (5535), only 26% are classified as belonging to subsystems with known functional roles, while 74% of genes do not belong to known functional roles. Overall, 38% of annotated genes were annotated as hypothetical, irrespective of whether they were included in subsystems or not. The complete genome sequence of R. rhodochrous ATCC BAA-870 is deposited at NCBI GenBank, with BioProject accession number PRJNA487734 and BioSample accession number SAMN09909133.
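The gene counts quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only figures stated in this section; the variable and function names are illustrative and are not part of BASys or RAST.

```python
# Gene counts quoted in the text (BASys and RAST annotations).
basys_chromosome_genes = 6871
basys_plasmid_genes = 677
basys_total_genes = 7548
basys_cog_assigned = 3387
rast_cds = 5535


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Chromosome and plasmid counts should add up to the BASys total.
assert basys_chromosome_genes + basys_plasmid_genes == basys_total_genes

# Share of BASys genes that received a COG function.
cog_share = percent(basys_cog_assigned, basys_total_genes)

# RAST coding sequences expressed relative to the BASys gene count.
rast_vs_basys = percent(rast_cds, basys_total_genes)

print(f"COG-assigned share of BASys genes: {cog_share}%")
print(f"RAST CDS relative to BASys genes: {rast_vs_basys}%")
```

Expressing the RAST count relative to the BASys count gives a quick sense of how far the two pipelines diverge on the same sequence.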
Taxonomy and lineage of R. rhodochrous ATCC BAA-870
The R. rhodochrous ATCC BAA-870 genome encodes four 16S rRNA genes, consistent with the average 16S gene count of Rhodococcus genomes. From a search of The Ribosomal RNA Database, the 28 Rhodococcus genome records deposited in the NCBI database have 16S rRNA gene counts ranging from 3 to 5 copies, with an average of 4 [46]. Of the four 16S rRNA genes found in R. rhodochrous ATCC BAA-870, two pairs are identical (i.e. there are two copies of two different 16S rRNA genes). One gene from each identical pair was used in nucleotide-nucleotide BLAST for highly similar sequences [47]. BLAST results were used for comparison of R. rhodochrous ATCC BAA-870 to other similar species using 16S rRNA multiple sequence alignment and phylogeny in ClustalO and ClustalW, respectively [48-50] (Figure 3). Nucleotide BLAST results of the two different R. rhodochrous ATCC BAA-870 16S rRNA genes show closest sequence identities to Rhodococcus sp. 2G and R. pyridinovorans SB3094, with either 100% or 99.74% identity to both strains, depending on the 16S rRNA copy.
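Identity values such as those reported above are the fraction of matching columns in a pairwise alignment. The sketch below illustrates that calculation on a toy alignment; the function name and the sequences are hypothetical, and BLAST derives its own identity figure from the local alignment it reports.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns containing a gap count as mismatches, which is one common
    convention; other tools may normalise differently.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy 20-column alignment with one substitution and one gap (not real 16S data).
copy_1 = "ACGTTGCAAGTCGAACGGTA"
copy_2 = "ACGTTGCAAGTCGATCG-TA"
identity = percent_identity(copy_1, copy_2)
print(f"{identity:.1f}% identity")
```

Over a full-length 16S rRNA gene, a handful of mismatched columns is enough to move the identity from 100% to a value just below it.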
We used the in silico DNA-DNA hybridisation tool, the Genome-to-Genome Distance Calculator (GGDC) version 2.1 [51-53], to assess the genome similarity of R. rhodochrous ATCC BAA-870 to its closest matched strains based on 16S rRNA alignment (R. pyridinovorans SB3094 and Rhodococcus sp. 2G). The results of genome-based species and subspecies delineation, and the differences in GC content, are summarised in Supp. Info. Table S3, with R. jostii RHA1 additionally shown for comparison. GC differences of below 1% would indicate the same species, and therefore R. rhodochrous ATCC BAA-870 cannot be distinguished from the other strains based on GC content. Digital DNA-DNA hybridisation (dDDH) values of more than 70% and 79% are the thresholds for delineating type strains and subspecies, respectively. While 16S rRNA sequence alignment and GC content suggest that R. rhodochrous ATCC BAA-870, R. pyridinovorans SB3094 and Rhodococcus sp. 2G are closely related strains, the GGDC supports their delineation at the subspecies level.
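The thresholds described above can be written as a small decision rule. This is a sketch under stated assumptions: it reads the 70% and 79% dDDH cut-offs as "same species" and "same subspecies" boundaries, the names are hypothetical, and the example inputs are invented for illustration (the measured values are in Table S3).

```python
from dataclasses import dataclass

# Thresholds as quoted in the text.
SPECIES_DDH_THRESHOLD = 70.0     # dDDH (%) above which two genomes are treated as one species
SUBSPECIES_DDH_THRESHOLD = 79.0  # dDDH (%) above which two genomes are treated as one subspecies
GC_DIFF_THRESHOLD = 1.0          # GC difference (%) below which GC content cannot separate species


@dataclass
class Delineation:
    same_species: bool
    same_subspecies: bool
    gc_consistent_with_same_species: bool


def delineate(ddh_percent: float, gc_diff_percent: float) -> Delineation:
    """Apply the dDDH and GC-difference thresholds to one genome pair."""
    return Delineation(
        same_species=ddh_percent > SPECIES_DDH_THRESHOLD,
        same_subspecies=ddh_percent > SUBSPECIES_DDH_THRESHOLD,
        gc_consistent_with_same_species=gc_diff_percent < GC_DIFF_THRESHOLD,
    )


# Invented inputs for illustration only; they are not values from Table S3.
example = delineate(ddh_percent=75.0, gc_diff_percent=0.2)
print(example)
```

A pair like the invented example would fall within one species but in different subspecies, which is the kind of outcome the GC difference alone cannot resolve.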
Protein coding sequences
The genomic content of R. rhodochrous ATCC BAA-870 was outlined and compared to other rhodococcal genomes (Supp. Info. Table S1). BAA-870 contains 7548 predicted protein-coding sequences (CDS) according to BASys annotation. Of these, 56.9% encode previously identified proteins of unknown function, including 305 conserved hypothetical proteins. A large proportion of genes are labelled ‘hypothetical’ based on sequence similarity and/or the presence of known signature sequences of protein families (Figure 4). Out of 7548 BASys annotated genes, 1481 (20%) are annotated enzymes that could be assigned an EC number. BASys annotation may overpredict gene numbers, because the sensitive GLIMMER ab initio gene prediction method can give false positives for sequences with high GC content [54]. The RAST subsystem annotations are assigned from the manually curated SEED database, in which hypothetical proteins are annotated based only on related genomes. RAST annotations are grouped into two sets (genes that are either in a subsystem, or not in a subsystem) based on predicted roles of protein families with common functions. Genes belonging to recognised subsystems can be considered reliable and conservative gene predictions. Genes that do not belong to curated protein functional families (i.e. those not in a subsystem), however, may be underpredicted by RAST, since annotations belonging to subsystems are based only on related neighbours.
Sequences of other Rhodococcus genomes were obtained from the Genome database at NCBI [55] and show a large variation in genome size between 4 and 10 Mbp (Supp. Info. Table S1), with an average of 6.1 ± 1.6 Mbp. The apparent total genome size of R. rhodochrous ATCC BAA-870, 5.9 Mbp (consisting of a 5.37 Mbp chromosome and a 0.53 Mbp plasmid), is close to the average. Among the well-described rhodococci (Table 1), the genome of R. jostii RHA1 is the largest rhodococcal genome sequenced to date (9.7 Mbp), but only 7.8 Mbp is chromosomal, while the pathogenic R. hoagii genomes are the smallest at ~5 Mbp. All rhodococcal genomes have a high GC content, ranging from 62 to 71%. The average GC content of the R. rhodochrous ATCC BAA-870 chromosome and plasmid is 68.2% and 63.8%, respectively. R. jostii RHA1 has the lowest percentage coding DNA (87%), which is predictable given its large overall genome size, while R. rhodochrous ATCC BAA-870 has a 90.6% coding ratio, with an average of ~782 bp per gene. Interestingly, the distribution of protein lengths on the chromosome is bell-shaped with a peak at 350 bp per gene, while the genes on the plasmid show two size peaks, one at 100 bp and one at 350 bp. Together with the lower GC content of the plasmid, this suggests that the plasmid content was probably acquired on different occasions [56].
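The two summary statistics used above, GC content and bases per gene, are simple to compute. The sketch below uses a toy sequence and hypothetical function names; the ~782 bp figure is reproduced here by dividing the quoted total genome length by the BASys gene count, which is an assumption about how that figure was derived.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def bases_per_gene(genome_length_bp: int, gene_count: int) -> float:
    """Genome length divided by the number of annotated genes."""
    return genome_length_bp / gene_count


# Toy fragment for illustration only; it is not taken from the BAA-870 genome.
toy_fragment = "GCGCCGATGCCGGTACCGCA"
print(f"GC content of toy fragment: {gc_content(toy_fragment):.1f}%")

# 5.9 Mbp and 7548 BASys genes are the figures quoted in the text.
print(f"Bases per gene: {bases_per_gene(5_900_000, 7548):.0f}")
```

Applied separately to the chromosome and plasmid sequences, the same GC function would yield the per-replicon values compared in the text.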
Transcriptional control
Transcriptional regulatory elements in R. rhodochrous ATCC BAA-870 include 18 sigma factors, at least 8 regulators of sigma factors, 118 other genes involved in signal transduction mechanisms (COG T), 261 genes encoding transcriptional regulators and 47 genes encoding two-component signal transduction systems. There are 129 proteins in R. rhodochrous ATCC BAA-870 associated with translation, ribosomal structure and biogenesis (protein biosynthesis). The genome encodes all ribosomal proteins, with the exception of S21, as occurs in other actinomycetes. RAST annotation predicts 66 RNAs. The 56 tRNAs correspond to all 20 natural amino acids and include two tRNAfMet. Additional analysis of the genome sequence using the tRNA finding tool tRNAscan-SE v. 2.0 [57, 58] confirms the presence of 56 tRNA genes in the R. rhodochrous ATCC BAA-870 genome, made up of 52 tRNA genes decoding natural amino acids, 2 pseudogenes, one tRNA with a mismatched isotype and one +9 selenocysteine (TCA) tRNA.
Protein location in the cell
It is often critical to know where proteins are located in the cell in order to understand their function [59], and prediction of protein localization is important for both drug targeting and protein annotation. In this study, prediction was done using the BASys SignalP signal prediction service [42]. The majority of annotated proteins are soluble and located in the cytoplasm (83%), while proteins located at the cellular membrane make up 16% of the total. Cell membrane proteins include proteins that form part of lipid anchors, peripheral and integral cell membrane components, as well as proteins with single or multiple pass functions. Of the membrane proteins in R. rhodochrous ATCC BAA-870, 47% are single-pass, inner or peripheral membrane proteins, while 41% are multi-pass membrane proteins. Most of the remaining proteins will be transported across the membrane. The periplasm contains proteins distinct from those in the cytoplasm, which have various functions in cellular processes, including transport, degradation, and motility. Periplasmic proteins would mostly include hydrolytic enzymes such as proteases and nucleases, proteins involved in binding of ions, vitamins and sugar molecules, and those involved in chemotactic responses. Detoxifying proteins, such as penicillin binding proteins, are also presumed to be located mostly in the periplasm.
Transport and Metabolism
A total of 1504 genes are implicated in transport. Numerous components of the ubiquitous transporter families, the ATP-Binding Cassette (ABC) superfamily and the Major Facilitator Superfamily (MFS), are present in Rhodococcus strain BAA-870. MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients [60, 61]. R. rhodochrous ATCC BAA-870 has 81 members of the MFS, mostly from the phthalate permease and sugar transporter families. There are dozens of families within the ABC superfamily, and each family generally correlates with substrate specificity. Transporters of R. rhodochrous ATCC BAA-870 include at least 122 members of the ABC superfamily, which includes both uptake and efflux transport systems. Of the 3387 genes assigned a COG function, 1486 (44%) are associated with transport and metabolism. These include 206 carbohydrate, 271 amino acid, 121 coenzyme, 236 inorganic ion, 411 lipid and 67 nucleotide transport and metabolism gene functions, and 174 secondary metabolite biosynthesis, transport and catabolism genes. These multiple transport systems highlight the metabolic versatility of this Rhodococcus species, which facilitates the use of whole cells in biotechnological applications.
The complete biosynthetic pathways for all nucleotides, nucleosides and natural amino acids are also contained in the genome of R. rhodochrous ATCC BAA-870. The central metabolism of strain BAA-870 includes glycolysis, gluconeogenesis, the pentose phosphate pathway, and the tricarboxylic acid (TCA) cycle, pathways typical of an aerobic organism. McLeod et al. reported that R. jostii RHA1 contains genes for the Entner-Doudoroff pathway (which requires 6-phosphogluconate dehydratase and 2-keto-3-deoxyphosphogluconate aldolase to create pyruvate from glucose) [10]. The Entner-Doudoroff pathway is, however, rare in Gram-positive organisms, which preferentially use glycolysis for a richer ATP yield. There is no evidence of this pathway in R. rhodochrous ATCC BAA-870, suggesting that the RHA1 strain acquired it rather recently. Enzymes found in other rhodococci, such as lipases and esterases [62, 63], are also present in strain BAA-870.
Aromatic Catabolism and oxidoreductases
As deduced from the better characterized pseudomonads [64], a large number of ‘peripheral aromatic’ pathways funnel a broad range of natural and xenobiotic compounds into a restricted number of ‘central aromatic’ pathways. Analysis of the R. rhodochrous ATCC BAA-870 genome suggests that at least four major pathways exist for the catabolism of central aromatic intermediates, comparable to the well-defined aromatic metabolism of Pseudomonas putida KT2440 strain [65].
Catabolism typically involves oxidative enzymes. The presence of multiple homologs of catabolic genes in Rhodococcus species suggests that they may provide a comprehensive biocatalytic profile [1]. In R. rhodochrous ATCC BAA-870 the dominant portion of annotated enzymes are involved in oxidation and reduction. There are about 500 oxidoreductase-related genes, including oxidases, hydrogenases, reductases, oxygenases, dioxygenases, cytochrome P450s, catalases and peroxiredoxins. These numbers are quite high compared to other bacteria of the same genome size, but in line with most other (sequenced) rhodococci [66]. In R. rhodochrous ATCC BAA-870 there are 71 monooxygenase genes, 11 of which are on the plasmid. Rhodococcus genomes usually encode large numbers of oxygenases [1]. Some of these are flavoproteins with diverse useful activities [67], which include monooxygenases capable of catalysing Baeyer–Villiger oxidations, wherein a ketone is converted to an ester [68, 69].
In R. rhodochrous ATCC BAA-870 there are 14 cytochrome P450 genes and their prevalence reflects a fundamental aspect of rhodococcal physiology. Similarly, the number of cytochrome P450 genes in R. jostii RHA1 is 25 (proportionate to the larger genome) and is typical of actinomycetes. It is unclear which oxygenases in R. rhodochrous ATCC BAA-870 are catabolic and which are involved in secondary metabolism, but their abundance is consistent with a potential ability to degrade an exceptional range of aromatic compounds (oxygenases catalyse the hydroxylation and cleavage of these compounds). Rhodococci are well known to have the capacity to catabolise hydrophobic compounds, including hydrocarbons and polychlorinated biphenyls (PCBs), mediated by a cytochrome P450 system [70-73]. Cytochrome P450 oxygenase is often found fused with a reductase, as in Rhodococcus sp. NCIMB 9784 [74]. Genes associated with biphenyl and PCB degradation are found in multiple sites on the R. jostii RHA1 genome, both on the chromosome as well as on linear plasmids [1]. R. jostii RHA1 was also found to show lignin-degrading activity, possibly based on the same oxidative capacity as that used to degrade biphenyl compounds [75].
The oxygenases found in rhodococci include multiple alkane monooxygenases (genes alkB1–alkB4) [76], steroid monooxygenase [77], styrene monooxygenase [78], peroxidase [79] and alkane hydroxylase homologs [80]. R. rhodochrous ATCC BAA-870 has 87 oxygenase genes while the PCB degrading R. jostii RHA1 has 203 oxygenases, including 19 cyclohexanone monooxygenases (EC 1.14.13.22), implying that of the two, BAA-870 is less adept at oxidative catabolism. Rhodococcal cyclohexanone monooxygenases can be used in the synthesis of industrially interesting compounds from cyclohexanol and cyclohexanone. These include adipic acid, caprolactone (for polyol polymers) and 6-hydroxyhexanoic acid (for coating applications) [81]. Chiral lactones can also be used as intermediates in the production of prostaglandins [82]. The same oxidative pathway can be used to biotransform cyclododecanone to lauryl lactone or 12-hydroxydodecanoic acid [83, 84]. Cyclododecanone monooxygenase of Rhodococcus SC1 was used in the kinetic resolution of 2-substituted cycloketones for the synthesis of aroma lactones in good yields and high enantiomeric excess [85]. Similar to R. jostii RHA1, R. rhodochrous ATCC BAA-870 also encodes monooxygenases including three cyclopentanone monooxygenases (EC 1.14.13.16) and a phenol monooxygenase (EC 1.14.13.7) on the plasmid, a methane monooxygenase (EC 1.14.13.25), two alkane 1-monooxygenases (EC 1.14.15.3) and five phenylacetone monooxygenases (EC 1.14.13.92), one of which is on the plasmid. These enzymes could be interesting for synthetic purposes in industrial biotechnological applications.
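Because the two genomes differ considerably in size, the raw gene counts above are easier to compare once normalised to genome length. The following sketch uses only the counts and total genome sizes quoted in this section; the dictionary layout and function name are illustrative.

```python
# Counts and total genome sizes as quoted in the text.
genomes = {
    "R. rhodochrous ATCC BAA-870": {"size_mbp": 5.9, "oxygenases": 87, "p450": 14},
    "R. jostii RHA1": {"size_mbp": 9.7, "oxygenases": 203, "p450": 25},
}


def per_mbp(count: int, size_mbp: float) -> float:
    """Gene count normalised to genome size (genes per Mbp)."""
    return count / size_mbp


density = {
    name: {
        "oxygenases_per_mbp": round(per_mbp(g["oxygenases"], g["size_mbp"]), 1),
        "p450_per_mbp": round(per_mbp(g["p450"], g["size_mbp"]), 1),
    }
    for name, g in genomes.items()
}

for name, values in density.items():
    print(name, values)
```

On this normalised basis the cytochrome P450 densities of the two strains are similar, whereas the oxygenase density remains higher in R. jostii RHA1.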
Nitrile biocatalysis
Rhodococci are well known for their application in the commercial manufacture of amides and acids through hydrolysis of the corresponding nitriles. R. rhodochrous J1 can convert acrylonitrile to the commodity chemical acrylamide [86], and both Mitsubishi Rayon Co., Ltd (Japan) and Senmin (South Africa) are applying this biocatalytic reaction at the multi-kiloton scale. Lonza Guangzhou Fine Chemicals use the same biocatalyst for large-scale commercial synthesis of nicotinamide from 3-cyanopyridine [87]. Both processes rely on rhodococcal nitrile hydratase activity [81]. The locations and numbers of nitrile converting enzymes in the available genomes of Rhodococcus were identified (Table 2). As expected from previous studies, strain BAA-870 contains several nitrile converting enzymes [32]. A low molecular weight cobalt-containing nitrile hydratase and a nitrilase are present, along with two amidases. The low molecular weight nitrile hydratase gene and amidase form a cluster, along with their associated regulatory elements (Table 2), including cobalt transport genes necessary for uptake of cobalt for inclusion in the nitrile hydratase active site. This is all in line with previous activity assays using this Rhodococcus strain [33, 34]. However, in most R. rhodochrous strains these enzymes are on the chromosome, while in R. rhodochrous ATCC BAA-870, they are found on a plasmid. In R. rhodochrous ATCC BAA-870 the nitrile hydratase is expressed constitutively, explaining why this strain is an exceptional nitrile biocatalyst [36]. Environmental pressure through chemical challenge by nitriles may have caused this deregulation of the nitrile biocatalyst by transferring it to a plasmid.
Many rhodococci can hydrolyse a wide range of nitriles [88-91]. The R. jostii RHA1 (Table 2) 16S rRNA sequence indicates that it is closely related to R. opacus [10] according to the taxonomy of Gürtler et al. (2004) [92]. R. jostii RHA1 expresses a nitrile hydratase (an acetonitrile hydratase) and utilises nitriles such as acetonitrile, acrylonitrile, propionitrile and butyronitrile [93], while R. opacus expresses nitrile hydrolysis activity [88]. R. erythropolis PR4 (Table 2) expresses an Fe-type nitrile hydratase [94], and R. erythropolis strains are well known for expressing this enzyme [88, 95, 96] as part of a nitrile metabolism gene cluster [92]. This enzyme has been identified repeatedly in isolates of this species from diverse locations [97], with broad substrate profiles that include acetonitrile, propionitrile, acrylonitrile, butyronitrile, succinonitrile, valeronitrile, isovaleronitrile and benzonitrile [88].
The nitrile hydratase enzymes of R. rhodochrous have to date been shown to be of the Co-type [6, 96, 98], which are usually more stable than the Fe-type nitrile hydratases. They have activity against a broad range of nitriles, including phenylacetonitrile, 2-phenylpropionitrile, 2-phenylglycinonitrile, mandelonitrile, 2-phenylbutyronitrile, 3-phenylpropionitrile, N-phenylglycinonitrile, p-toluonitrile and 3-hydroxy-3-phenylpropionitrile [32]. R. ruber CGMCC3090 and other strains express nitrile hydratases [88, 99], while the nitrile hydrolysis activity of R. equi [88] is also attributed to a nitrile hydratase [100].
The alternative nitrile hydrolysis enzyme, nitrilase, is also common in rhodococci, including R. erythropolis [101], R. rhodochrous [102-105], R. opacus B4 [106] and R. ruber [107, 108] (Table 2). The nitrilase from R. ruber can hydrolyse acetonitrile, acrylonitrile, succinonitrile, fumaronitrile, adiponitrile, 2-cyanopyridine, 3-cyanopyridine, indole-3-acetonitrile and mandelonitrile [108]. The nitrilases from multiple R. erythropolis strains were active towards phenylacetonitrile [109]. R. rhodochrous nitrilase substrates include (among many others) benzonitrile for R. rhodochrous J1 [110] and crotononitrile and acrylonitrile for R. rhodochrous K22 [111]. R. rhodochrous ATCC BAA-870 expresses an enantioselective aliphatic nitrilase encoded on the plasmid, which is induced by dimethylformamide [36]. Another nitrilase/cyanide hydratase family protein is also annotated on the plasmid (this study) but has not been characterised. The diverse, yet sometimes very specific and enantioselective, substrate specificities of all these rhodococci give rise to an almost plug-and-play system for many different synthetic applications. Combined with their high solvent tolerance, this makes rhodococci very well suited as biocatalysts to produce amides for both bulk chemicals and pharmaceutical ingredients.
Secondary metabolism and metabolite biosynthesis clusters
The ongoing search for new siderophores, antibiotics and antifungals has led to a recent explosion of interest in mining bacterial genomes [112], and the secondary metabolism of diverse soil-dwelling microbes remains relatively underexplored despite their huge biosynthetic potential [113]. An extensive secondary metabolism in R. rhodochrous ATCC BAA-870 is supported by the presence of at least 227 genes linked to secondary metabolite biosynthesis, transport and catabolism. The chromosome contains 15 biosynthetic gene clusters associated with secondary metabolites or antibiotics, identified by antiSMASH (antibiotics and Secondary Metabolite Analysis Shell pipeline, version 5.0.0) [114, 115]. Biosynthetic gene clusters identified in R. rhodochrous ATCC BAA-870 include ectoine (1,4,5,6-tetrahydro-2-methyl-4-pyrimidinecarboxylic acid), butyrolactone, betalactone, and type I polyketide synthase (PKS) clusters, as well as three terpene and seven nonribosomal peptide synthetase (NRPS) clusters. An additional six putative biosynthetic clusters were identified on the R. rhodochrous ATCC BAA-870 plasmid, four of an unknown type, and the other two with low similarity to enterobactin and lipopolysaccharide biosynthetic clusters. The presence of an ectoine biosynthesis cluster suggests that R. rhodochrous ATCC BAA-870 has effective osmoregulation and enzyme protection capabilities. R. rhodochrous ATCC BAA-870, like other Rhodococcus strains, can survive in diverse environments and tolerates harsh reaction conditions when used as a whole-cell biocatalyst, and it is likely that ectoine biosynthesis plays a role in this. Regulation of cytoplasmic solute concentration through modulation of compounds such as inorganic ions, sugars, amino acids and polyols provides a versatile and effective osmo-adaptation strategy for bacteria in general.
Ectoine and hydroxyectoine are common alternate osmoregulation solutes found especially in halophilic and halotolerant microorganisms [116, 117], and hydroxyectoine has been shown to confer heat stress protection in vivo [118]. Ectoines provide a variety of useful biotechnological and biomedical applications [119], and strains engineered for improved ectoine synthesis have been used for the industrial production of hydroxyectoine as a solute and enzyme stabiliser [120, 121].
AntiSMASH analysis reveals 3 terpene biosynthetic clusters in the genome of R. rhodochrous ATCC BAA-870. Terpenes and isoprenoids are implicated in diverse structural and functional roles in nature, providing a rich pool of natural compounds with applications in synthetic chemistry, pharmaceutical, flavour, and even biofuel industries. The structures, functions and chemistries employed by the enzymes involved in terpene biosynthesis are well known, especially for plants and fungi [122, 123]. However, it is only recently that bacterial terpenoids have been considered as a possible source of new natural product wealth [124, 125], largely facilitated by the explosion of available bacterial genome sequences. Interestingly, bacterial terpene synthases have low sequence similarities, and show no significant overall amino acid identities compared to their plant and fungal counterparts. Yamada et al. used a genome mining strategy to identify 262 bacterial synthases, and subsequent isolation and expression of genes in a Streptomyces host confirmed the activities of these predicted genes and led to the identification of 13 previously unknown terpene structures [124].
Soil-dwelling rhodococci are potentially rich sources for terpene and isoprenoid discovery. Examples of annotated R. rhodochrous ATCC BAA-870 genes related to terpene and isoprenoid biosynthesis include phytoene saturase and several phytoene synthases, dehydrogenases and related proteins, as well as numerous diphosphate synthases, isomerases and epimerases. The genome also contains, for example, lycopene cyclase, a novel non-redox flavoprotein [126], and farnesyl diphosphate synthase, farnesyl transferase, geranylgeranyl pyrophosphate synthetases and digeranylgeranylglycerophospholipid reductase. Farnesyl diphosphate synthase and geranylgeranyl pyrophosphate synthases are potential anticancer and anti-infective drug targets [122]. In addition, the R. rhodochrous ATCC BAA-870 plasmid encodes a lactone ring-opening enzyme, monoterpene epsilon-lactone hydrolase.
The abundance of PKS and NRPS clusters suggests that R. rhodochrous ATCC BAA-870 may be a significant potential source of molecules with immunosuppressing, antifungal, antibiotic and siderophore activities [127]. The R. rhodochrous ATCC BAA-870 genome has two PKS genes, one regulator of PKS expression, one exporter of polyketide antibiotics, as well as three genes encoding polyketide cyclases/dehydrases involved in polyketide biosynthesis. In addition, there are two actinorhodin polyketide dimerases. A total of five NRPS genes for secondary metabolite synthesis can be found on the chromosome, while in comparison R. jostii RHA1 contains 24 NRPS and seven PKS genes [10]. R. jostii RHA1 was also found to possess a pathway for the synthesis of a siderophore [128]. R. rhodochrous ATCC BAA-870 contains four probable siderophore-binding lipoproteins, three probable siderophore transport system permeases, and two probable siderophore transport system ATP-binding proteins. Other secondary metabolite genes found in R. rhodochrous ATCC BAA-870 include a dihydroxybenzoic acid-activating enzyme (2,3-dihydroxybenzoate-AMP ligase bacillibactin siderophore), phthiocerol/phenolphthiocerol synthesis polyketide synthase type I, two copies of linear gramicidin synthase subunits C and D genes, and tyrocidine synthase 2 and 3.
CRISPR
One putative clustered regularly interspaced short palindromic repeat (CRISPR) is contained in the R. rhodochrous ATCC BAA-870 genome, according to analysis by CRISPRCasFinder [129]. Associated CRISPR genes are not automatically detected by the CRISPRCasFinder tool, but manual searches of the annotated genome for Cas proteins reveal possible Cas9 candidate genes within the R. rhodochrous ATCC BAA-870 genome, including a ruvC gene, and HNH endonuclease and nuclease genes. CRISPRs are unusual, or under-reported, finds in Rhodococcus genomes. Based on literature searches to date, only two other sequenced Rhodococcus strains were reported to contain potential CRISPRs. R. opacus strain M213, isolated from fuel-oil contaminated soil, has one confirmed and 14 potential CRISPRs [130], identified using the CRISPRFinder tool [131]. Pathak et al. also surveyed several other Rhodococcus sequences and found no other CRISPRs. Zhao and co-workers state that Rhodococcus strain sp. DSSKP-R-001, interesting for its beta-estradiol-degrading potential, contains 8 CRISPRs [132]. However, the authors do not state how these were identified. Pathak et al. highlight the possibility that the CRISPR in R. opacus strain M213 may have been recruited from R. opacus R7 (isolated from polycyclic aromatic hydrocarbon contaminated soil [133]), based on matching BLASTs of the flanking regions.
The regions upstream and downstream of the R. rhodochrous ATCC BAA-870 CRISPR (based on BLAST searches with 270- and 718-nucleotide sequences, respectively) showed significant, but not matching, alignment with several other Rhodococcus strains. The region upstream of the BAA-870 CRISPR showed a maximum of 95% identity with that from R. rhodochrous strains EP4 and NCTC10210, while the downstream region showed 97% identity to R. pyridinovorans strains GF3 and SB3094, R. biphenylivorans strain TG9, and Rhodococcus sp. P52 and 2G. Analysis with the PHAST phage search tool [134] identified 6 potential, but incomplete, prophage regions on the chromosome, and one prophage region on the plasmid, suggesting that the CRISPR acquisition in R. rhodochrous ATCC BAA-870 could also have arisen from bacteriophage infection during its evolutionary history.
Horizontal gene transfer
Organisms acquire diverse metabolic capacity through gene duplications and acquisitions, typically mediated by transposases. Analysis using IslandViewer (for computational identification of genomic islands) [135] identifies 10 possible large genomic island regions in R. rhodochrous ATCC BAA-870 which may have been obtained through horizontal mobility. Half of these genomic islands are located on the plasmid and make up 90% of the plasmid coding sequence. The low molecular weight cobalt-containing nitrile hydratase operon is located on an 82.5 kbp genomic island that includes 57 predicted genes in total. Other genes of interest located on this same genomic island include crotonase and enoyl-CoA hydratase, 10 dehydrogenases including four acyl-CoA dehydrogenases and two aldehyde dehydrogenases, four hydrolases including 5-valerolactone hydrolase and amidohydrolase, beta-mannosidase, haloacid dehalogenase and five oxidoreductases. The R. rhodochrous ATCC BAA-870 genome contains 31 transposase genes in the genomic regions identified by IslandViewer, one of which is from the IS30 family, a ubiquitous mobile insertion element in prokaryotic genomes [136]. Other transposase genes belonging to at least 10 different families of insertion sequences were identified in R. rhodochrous ATCC BAA-870, including ISL3, IS5, IS701, two IS1634, three IS110, three IS3, three IS256, five IS21, and six IS630 family transposases. The majority of these transposase genes (27 of the 31 identified by IslandViewer) are located on the plasmid. The large proportion of the plasmid made up of possible mobile genomic regions, together with the high number of transposase genes and the fact that the plasmid contains the machinery for nitrile degradation, strongly supports our theory that R. rhodochrous ATCC BAA-870 has adapted its genome recently in response to the selective pressure of routine culturing in nitrile media in the laboratory.
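The transposase inventory above can be tallied to show how the named families are distributed. This is a minimal sketch using only the per-family counts listed in the text; the text does not state that the named families account for all 31 IslandViewer-associated transposase genes, so the two totals are kept separate.

```python
from collections import Counter

# Transposase genes per insertion-sequence family, as listed in the text.
transposase_families = Counter({
    "IS30": 1, "ISL3": 1, "IS5": 1, "IS701": 1, "IS1634": 2,
    "IS110": 3, "IS3": 3, "IS256": 3, "IS21": 5, "IS630": 6,
})

n_families = len(transposase_families)
n_named_genes = sum(transposase_families.values())
most_abundant_family, most_abundant_count = transposase_families.most_common(1)[0]

# 27 of the 31 IslandViewer-associated transposase genes are on the plasmid.
plasmid_share_percent = round(100 * 27 / 31)

print(f"{n_families} named families covering {n_named_genes} transposase genes")
print(f"Most abundant family: {most_abundant_family} ({most_abundant_count} genes)")
print(f"Plasmid share of island-associated transposase genes: {plasmid_share_percent}%")
```

The plasmid share makes the asymmetry explicit: a replicon carrying roughly a tenth of the genome holds most of the island-associated transposase genes.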
Even though it was isolated from contaminated soil, the much larger chromosome of R. jostii RHA1 has undergone relatively little recent genetic flux, as supported by the presence of only two intact insertion sequences, relatively few transposase genes, and only one identified pseudogene [10]. The smaller R. rhodochrous ATCC BAA-870 genome still has the genetic space and tools to adapt relatively easily in response to environmental selection.