P. aeruginosa Sequence Retrieval
The study retrieved complete genome sequences of P. aeruginosa strains isolated from various ecological niches and analyzed them with several comparative genomics tools to characterize biofilm formation genes across strains of the pathogen. A total of 185 complete genome sequences of P. aeruginosa strains were retrieved from the NCBI and IPCD databases (Table 1). The strains were classified into 14 categories according to the ecological niche from which each was isolated, using the metadata available for each strain. Table 1 lists the niche-specific categories of P. aeruginosa isolates used for downstream analyses.
Table 1 |
Statistics of Pseudomonas aeruginosa sequences and their respective ecological niches |
Ecological niche | Analyzed sequences |
Abscess | 2 |
Blood | 16 |
Bronchial | 6 |
Cell culture | 4 |
Clinical | 10 |
Dental | 1 |
Environment | 8 |
Eye | 2 |
Lung | 1 |
Sputum | 26 |
Trachea aspirates | 5 |
Urine | 7 |
Wound | 9 |
Unclassified | 98 |
Total | 185 |
Figure 1 shows the niche associations of the strains selected from NCBI. The selection does not represent the real prevalence of P. aeruginosa in nature; it is biased by how strains are chosen for sequencing.
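A niche tally like the one in Table 1 could be produced from strain metadata along the following lines. This is only a minimal sketch: the records, the keyword rules and the function names `classify_niche` and `tally_niches` are hypothetical, since the study does not describe how its metadata were parsed.

```python
from collections import Counter

# Hypothetical metadata records: (strain label, free-text isolation source).
records = [
    ("strain_A", "sputum"),
    ("strain_B", "Blood"),
    ("strain_C", "blood culture"),
    ("strain_D", ""),
    ("strain_E", "river water"),
]

# Hypothetical keyword-to-niche rules; the study's actual rules are not given.
NICHE_KEYWORDS = {
    "sputum": "Sputum",
    "blood": "Blood",
    "water": "Environment",
    "soil": "Environment",
}


def classify_niche(source: str) -> str:
    """Map a free-text isolation source to a niche, or 'Unclassified'."""
    text = source.strip().lower()
    for keyword, niche in NICHE_KEYWORDS.items():
        if keyword in text:
            return niche
    return "Unclassified"


def tally_niches(rows):
    """Count strains per niche from (strain, source) pairs."""
    return Counter(classify_niche(source) for _, source in rows)


niche_counts = tally_niches(records)
for niche, count in sorted(niche_counts.items()):
    print(f"{niche} | {count} |")
```

Strains whose metadata match no rule fall into the Unclassified category, which is how a large Unclassified row can arise when isolation sources are missing or inconsistently recorded.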
Identification of Biofilm Formation Genes
Using genome annotation data and sequence alignments, 13 biofilm formation genes common to most P. aeruginosa strains were identified and grouped into corresponding COGs, each represented by an individual FASTA file (Table 2). Each file contained 44 protein sequences of the respective gene, one from each P. aeruginosa reference genome. Table 3 lists the names, GenBank accession numbers, sizes, numbers of annotated genes, GC content and ecological niches of the 44 genomes.
Table 2 |
Biofilm formation genes retrieved using custom Python scripts |
ID | Gene name | Accession Number |
882251 | pslJ | NC_002516.2 |
879704 | pslG | NC_002516.2 |
878490 | pslE | NC_002516.2 |
882052 | fliC | NC_002516.2 |
882125 | algU | NC_002516.2 |
879406 | algC | NC_002516.2 |
881782 | rsaL | NC_002516.2 |
879474 | gshA | NC_002516.2 |
879004 | algD | NC_002516.2 |
882372 | ppyR | NC_002516.2 |
879143 | arnB | NC_002516.2 |
8819333 | htpG | NC_002516.2 |
878223 | gshB | NC_002516.2 |
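The grouping of per-genome protein sequences into one FASTA file per gene can be sketched as follows. This is an illustration under stated assumptions, not the study's custom scripts: the helper names `to_fasta`, `group_by_gene` and `write_groups` are hypothetical, and the short sequences are placeholders rather than real protein data.

```python
from collections import defaultdict
from pathlib import Path


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{header}\n" + "\n".join(lines) + "\n"


def group_by_gene(records):
    """Group (genome_accession, gene_name, protein_sequence) tuples by gene.

    Returns a dict mapping each gene name to the FASTA text of all its copies.
    """
    groups = defaultdict(list)
    for accession, gene, protein in records:
        groups[gene].append(to_fasta(f"{accession}|{gene}", protein))
    return {gene: "".join(entries) for gene, entries in groups.items()}


def write_groups(groups, out_dir):
    """Write one <gene>.fasta file per group into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for gene, text in groups.items():
        (out / f"{gene}.fasta").write_text(text)


# Placeholder sequences for illustration only (not real protein data).
toy_records = [
    ("NC_002516.2", "pslG", "MKTLLA"),
    ("NZ_CP029605", "pslG", "MKTLLV"),
    ("NC_002516.2", "algD", "MRISIF"),
]
groups = group_by_gene(toy_records)
print(groups["pslG"])
```

With the full data set, each of the 13 groups would hold 44 records, one per genome, and `write_groups(groups, "cogs")` would produce the 13 FASTA files.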
Table 3 |
The names, GenBank accession numbers, size, number of annotated genes and ecological niches of the 44 Pseudomonas aeruginosa genomes |
Scientific Names | GenBank Accession Number | Size (kbps) | Number of annotated genes | GC (%) | Ecological niche |
P. a PAO1 | NC_002516 | 6264.404 | 5700 | 66.56 | Unclassified |
P. a strain 24Pae112 | NZ_CP029605 | 7097.241 | 6596 | 65.99 | |
P. a strain 268 | NZ_CP032761 | 7030.474 | 6604 | 65.91 | |
P. a strain B17932 | NZ_CP034436 | 6744.658 | 5943 | 65.94 | |
P. a strain BA15561 | NZ_CP033432 | 6793.961 | 5813 | 65.84 | |
P. a strain NCTC 12903 | NZ_LR134309 | 6839.985 | 6431 | 66.09 | Blood |
P. a strain PA1207 | NZ_CP022001 | 7411.863 | 6813 | 65.70 | |
P. a strain PA1242 | NZ_CP022002 | 7050.510 | 6303 | 65.80 | |
P. a strain PABL012 | NZ_CP031659 | 6546.467 | 6089 | 66.29 | |
P. a strain PABL017 | NZ_CP031660 | 6503.460 | 6019 | 66.31 | |
P. a strain Pa58 | NZ_CP021775 | 7241.575 | 6673 | 65.80 | |
P. a strain Pa84 | NZ_CP021999 | 6566.724 | 6058 | 66.23 | |
P. a strain Pa124 | NZ_CP021774 | 7008.516 | 6479 | 65.84 | |
P. a strain Pa127 | NZ_CP022000 | 7148.302 | 6565 | 65.74 | Bronchial |
P. a strain GIMC5015:PAKB6 | NZ_CP034429 | 6258.491 | 5772 | 66.53 | |
P. a strain H26023 | NZ_CP033685 | 6729.216 | 6260 | 66.21 | |
P. a strain NCTC11445 | NZ_LR134308 | 6766.292 | 6378 | 66.06 | |
P. a paerg002 | NZ_LR130527 | 6451.470 | 5935 | 66.40 | |
P. a paerg003 | NZ_LR130530 | 6433.962 | 5945 | 66.40 | |
P. a paerg004 | NZ_LR130531 | 6452.809 | 5936 | 66.40 | |
P. a paerg005 | NZ_LR130534 | 6931.425 | 6427 | 66.00 | Clinical |
P. a paerg009 | NZ_LR130533 | 6941.287 | 6352 | 65.98 | |
P. a paerg010 | NZ_LR130536 | 6433.960 | 5950 | 66.40 | |
P. a paerg011 | NZ_LR130535 | 6434.133 | 5946 | 66.40 | |
P. a paerg012 | NZ_LR130537 | 6434.020 | 5948 | 66.40 | |
P. a strain L10 | NZ_CP019338 | 6661.962 | 6119 | 66.13 | Environment |
P. a strain PA34 | NZ_CP032552 | 6810.079 | 6314 | 66.07 | Eye |
P. a C-NN2 isolate | NZ_LT883143 | 6902.967 | 6412 | 66.12 | Lung |
P. a strain H25883 | NZ_CP033686 | 6706.800 | 6236 | 66.15 | |
P. a strain H26027 | NZ_CP033684 | 7079.598 | 6650 | 66.07 | |
P. a strain MRSN12280 | NZ_CP028162 | 7070.928 | 6597 | 66.02 | Wound |
P. a PAO1161 | NZ_CP032126 | 6383.803 | 5918 | 66.42 | |
P. a strain NCTC13715 | NZ_LR134330 | 6765.311 | 6288 | 66.12 | Urine |
P. a strain FDAARGOS_505 | NZ_CP033832 | 7029.824 | 6520 | 65.87 | Trachea |
P. a strain AES1M | NZ_CP037925 | 6373.139 | 5848 | 66.48 | |
P. a strain AES1R | NZ_CP037926 | 6373.893 | 5833 | 66.48 | |
P. a strain CCUG 70744 | NZ_CP023255 | 6859.232 | 6422 | 66.04 | |
P. a strain LW | NZ_CP022478 | 6824.837 | 6271 | 65.97 | |
P. a strain PASGNDM345 | NZ_CP020703 | 6893.164 | 6432 | 66.07 | Sputum |
P. a strain PASGNDM699 | NZ_CP020704 | 6985.102 | 6545 | 66.00 | |
P. a strain SP2230 | NZ_CP034434 | 6976.603 | 6067 | 65.74 | |
P. a strain SP4527 | NZ_CP034409 | 7005.215 | 6123 | 65.79 | |
P. a strain SP4528 | NZ_CP033439 | 6877.287 | 6082 | 65.85 | |
P. a strain Y31 | NZ_CP030910 | 6831.076 | 6322 | 66.15 | |
The retrieved sequences were further classified into four categories based on the role they play in the biofilm formation process. An additional category was created to cater for genes whose functions are not clearly understood yet. Specific roles were identified using related publications and metadata provided for each gene. Based on this, the genes were assigned to different classes as shown in Table 4.
Table 4 |
Classes of biofilm formation genes retrieved using the custom Python scripts |
Classes | No. of sequences | Genes |
Adhesins | 1 | ppyR (psl) |
Repressors | 1 | gshB |
Regulatory | 5 | algC, algD, algU, arnB, rsaL |
Motility | 3 | fliC, gshA, htpG |
Unclassified | 3 | pslJ, pslE, pslG |
Total | 13 | |
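The class assignments in Table 4 can be held in a small lookup structure so that the per-class counts and the total are derived rather than typed by hand. The sketch below encodes the table as given; the variable names are illustrative and not taken from the study's scripts.

```python
# Class assignments as listed in Table 4.
GENE_CLASSES = {
    "Adhesins": ["ppyR"],
    "Repressors": ["gshB"],
    "Regulatory": ["algC", "algD", "algU", "arnB", "rsaL"],
    "Motility": ["fliC", "gshA", "htpG"],
    "Unclassified": ["pslJ", "pslE", "pslG"],
}

# Reverse lookup: gene name -> class.
gene_to_class = {
    gene: cls for cls, genes in GENE_CLASSES.items() for gene in genes
}

# Number of genes per class and the overall total.
class_counts = {cls: len(genes) for cls, genes in GENE_CLASSES.items()}
total_genes = sum(class_counts.values())

for cls, count in class_counts.items():
    print(f"{cls} | {count} | {', '.join(GENE_CLASSES[cls])} |")
print(f"Total | {total_genes} | |")
```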
Comparative Genomic Analyses
By comparing the 13 COG-based ML phylogenetic trees, the study built a tree of relationships between the biofilm-related genes from the treedist distance matrix (Figure 2). The phylogenetic analyses revealed four clusters. Of the 13 biofilm formation genes analyzed, 10 fell into a single cluster. The algD and algU genes diverged the most from the other biofilm formation genes, while fliC was only partially divergent from the remaining genes, which appear to have co-evolved. The study assumed that all the biofilm formation genes co-evolved because they belong to a group of functionally related genes, and this was generally confirmed by the evident co-evolution of most of them. The divergence of the three genes may therefore result from horizontal gene transfer. Alternatively, these genes may have evolved faster than the others because they are more important for a proper response of biofilm formation to specific environmental stimuli in different habitats, which would make them potential targets for antibiofilm therapies. Both algD and algU were classified as regulatory genes, responsible for the regulatory stage of the biofilm formation process.
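To illustrate how a matrix of pairwise tree distances can be turned into a tree of relationships, the sketch below applies UPGMA-style average-linkage clustering. The clustering method is an assumption made for illustration, since the text does not state which algorithm was applied to the treedist output, and the labels and distances are invented toy values, not results from the study.

```python
def average_linkage(names, dist):
    """Agglomerative clustering with average linkage (UPGMA-style).

    names: list of labels.
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b.
    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    """
    clusters = [(name,) for name in names]

    def between(c1, c2):
        # Mean distance over all cross-cluster pairs of labels.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        height, i, j = min(
            (between(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Invented toy distances between four gene trees (not data from the study).
toy_names = ["tree_A", "tree_B", "tree_C", "tree_D"]
toy_dist = {
    frozenset(("tree_A", "tree_B")): 2,
    frozenset(("tree_A", "tree_C")): 8,
    frozenset(("tree_A", "tree_D")): 10,
    frozenset(("tree_B", "tree_C")): 8,
    frozenset(("tree_B", "tree_D")): 10,
    frozenset(("tree_C", "tree_D")): 4,
}

merges = average_linkage(toy_names, toy_dist)
for left, right, height in merges:
    print(left, right, height)
```

Gene trees that merge at low heights have similar topologies, which is the pattern the text interprets as co-evolution; trees that join only at the last, highest merges correspond to divergent genes.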
Genome Mapping
To further investigate the role of horizontal gene transfer in the divergence observed in algU, algD and fliC, the study conducted a BRIG analysis of different strains of P. aeruginosa. This analysis helped to determine the variable and conserved regions within the genome as well as the distribution of the biofilm formation genes. It also shed further light on the significance of these mutations in the different ecological niches occupied by this ubiquitous pathogen. From the BRIG analysis the study created 14 gene maps, each representing strains of the pathogen occupying different ecological niches (Figures 3A to 3N). Figure 3A shows the distribution of the biofilm formation genes among the sequences of strains isolated from the 13 ecological niches identified by this study. Biofilm formation genes retrieved by the custom Python scripts are highlighted in green. This analysis revealed that the pslE, pslG and pslJ genes are all located around the same locus (2450 kbps) in a variable region of the P. aeruginosa genome, which lends credence to the co-evolution of this set of genes. The three genes are among the eleven psl genes required for Psl production and surface attachment, which is necessary for biofilm formation in nonmucoid strains of Pseudomonas aeruginosa that do not depend on alginate as the main biofilm polysaccharide (17). These genes are co-transcribed, and our evolutionary analysis indicated that they co-evolve. The other ten genes were located at different loci, all within conserved regions of the genome. Figures 3B to 3N show the distribution of the genes among sequences of strains isolated from each individual ecological niche. Besides the distribution of biofilm formation genes, the BRIG analysis was also used to identify the conserved and variable regions of the genome of the pathogen.
This was done to further elucidate the differences in the distribution of genes in strains from different ecological niches.
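The observation that several genes sit around the same locus can be checked programmatically by grouping genes whose start coordinates lie close together. The following is a minimal sketch: the function name `group_by_locus`, the 10 kbp gap threshold, and the gene labels and coordinates are all hypothetical, chosen only to mimic one tight cluster near 2450 kbps among otherwise scattered genes.

```python
def group_by_locus(positions, max_gap_kbp=10.0):
    """Group genes whose start positions lie within max_gap_kbp of the
    previous gene when sorted along the genome.

    positions: dict mapping gene label -> start coordinate in kbp.
    Returns a list of groups, each a list of gene labels in genome order.
    """
    ordered = sorted(positions.items(), key=lambda item: item[1])
    groups = []
    for gene, pos in ordered:
        if groups and pos - groups[-1][-1][1] <= max_gap_kbp:
            groups[-1].append((gene, pos))  # extend the current locus
        else:
            groups.append([(gene, pos)])    # start a new locus
    return [[gene for gene, _ in group] for group in groups]


# Hypothetical coordinates in kbp (illustrative, not measured values).
toy_positions = {
    "gene_1": 2450.0,
    "gene_2": 2452.5,
    "gene_3": 2455.0,
    "gene_4": 610.0,
    "gene_5": 5900.0,
}

loci = group_by_locus(toy_positions)
print(loci)
```

The sketch treats the chromosome as linear, so genes on either side of the origin of a circular genome would not be grouped; a fuller version would also compare the last and first genes.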