We present draft Prochlorococcus genomes from enrichment cultures P1344, P1361, and P1363 (Dataset 1) [21–23]. All three come from a single sample collected in the North Pacific at 150 m depth at Station ALOHA (22.75°N, 158°W) in June 2013 [24, 25]. Isolation protocols [16] were tuned to enrich for low-light-adapted (LL) Prochlorococcus. Data file 1 [26] provides detailed methods; Table 1 lists datasets. Previously described strains from the same project [17–19] are summarized in Data file 2 [26]. After 1.5 years of subculturing, enrichments P1344, P1361, and P1363 each stabilized to a single internal transcribed spacer rRNA (ITS) sequence [27], an indicator of unialgal Prochlorococcus cultures [16, 28]. Because the time from sea to genome was shorter than for previously sequenced enrichment cultures (e.g., 5–20 years [28]), we followed a naming convention for Prochlorococcus enrichments [29]. The three ITS sequences, which fall in the LLIV clade, were identical to each other and to those of strains MIT1312 and MIT1327 (additional co-isolates from the same sample [17]) and MIT1227 (from Station ALOHA one year earlier [30]).
Genomic libraries were prepared as in [31] from bulk enrichment DNA and sequenced with Illumina MiSeq V3 at the MIT BioMicroCenter [32], producing paired-end 300-base reads. Raw reads are available in the NCBI Sequence Read Archive (Dataset 2) [33–35]. Quality-trimmed reads were assembled de novo with SPAdes v.3.1.1 [36]. Enrichment contigs were screened with blastn [37] against the NCBI nt database to identify Prochlorococcus sequences (Data file 3) [26]. Contigs with at least 500 bases, top BLAST hits to Prochlorococcus (Data file 4) [26], and at least 2× k-mer coverage were selected to produce the genomes. For P1344, P1361, and P1363, respectively, the genomes consist of 106, 45, and 66 contigs, with average read coverage depths of 82×, 57×, and 67× [38] and genome sizes of 2.47 Mb, 2.51 Mb, and 2.56 Mb, similar to other LLIV genomes (Data file 2) [26].
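The three contig selection criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used for these genomes (see Data file 1 [26] for the actual methods). The `Contig` record, its field names, and the example values are hypothetical; the sketch assumes that each contig's length, k-mer coverage, and top BLAST hit name have already been collected into one table.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str        # contig identifier (hypothetical naming)
    length: int      # contig length in bases
    kmer_cov: float  # k-mer coverage reported by the assembler
    top_hit: str     # organism name of the top BLAST hit ("" if none)


def keep_contig(c, min_len=500, min_cov=2.0, genus="Prochlorococcus"):
    """Apply the three criteria: length, k-mer coverage, top-hit genus."""
    return c.length >= min_len and c.kmer_cov >= min_cov and genus in c.top_hit


def select_contigs(contigs):
    """Return only the contigs that pass all three criteria."""
    return [c for c in contigs if keep_contig(c)]


# Toy records for illustration only; these are not real data.
example = [
    Contig("contig_1", 12000, 35.2, "Prochlorococcus marinus str. MIT 9313"),
    Contig("contig_2", 450, 40.0, "Prochlorococcus marinus str. MIT 9313"),
    Contig("contig_3", 8000, 6.1, "Alteromonas macleodii"),
    Contig("contig_4", 900, 1.4, "Prochlorococcus sp. MIT 1312"),
]

selected = select_contigs(example)
print([c.name for c in selected])
```

In this toy input only `contig_1` passes: `contig_2` is too short, `contig_3` has a non-Prochlorococcus top hit, and `contig_4` falls below the coverage threshold.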
For initial assessments, we annotated the genomes with Prokka [39] (Dataset 3, Data file 5) [26]. GenBank annotations come from the NCBI automated annotation pipeline (Dataset 1) [21–23]. BLAST and metagenomic binning results for the enrichment assemblies (Dataset 4 [26]; Data files 1, 3, 6, 7 [26]) show the presence of copiotrophic marine bacteria at lower coverage than Prochlorococcus, with partially recovered genomes [40–43]. These include Alteromonas and Marinobacter, groups previously studied in co-culture with Prochlorococcus that can enhance its growth or survival in culture [18, 19, 44–46]. Comparisons among P1344, P1361, P1363, and the three other genomes with the same ITS (Data files 1, 8, 9) [26] support the idea that they represent similar but distinct strains, with average nucleotide identity ranging from 99.9% ± 0.7% to 100.0% ± 0.1% (mean ± s.d.) [47], 103–3,854 SNPs in pairwise alignments [48, 49], and 32–132 distinct genes in pairwise ortholog group comparisons [50]. Although most of these variable genes lack predicted functions, they include a pilus-related protein and a member of the cytochrome c family, located near contig ends (Data files 1, 5) [26]. This scale of variation will support the study of recent Prochlorococcus evolution.
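As a rough illustration of a pairwise ortholog group comparison, the sketch below counts the ortholog groups found in only one genome of each pair. How "distinct genes" were counted for the figures above is described in Data file 1 [26]; treating the count as the symmetric difference of ortholog group sets is an assumption made here. The function name and group identifiers are hypothetical.

```python
from itertools import combinations


def pairwise_distinct_counts(presence):
    """Count ortholog groups present in exactly one genome of each pair.

    presence: dict mapping genome name -> set of ortholog group IDs.
    Returns a dict mapping (genome_a, genome_b) -> count, with names sorted.
    """
    counts = {}
    for a, b in combinations(sorted(presence), 2):
        # Symmetric difference: groups found in a or b but not in both.
        counts[(a, b)] = len(presence[a] ^ presence[b])
    return counts


# Toy presence/absence data for illustration only; not real ortholog groups.
toy = {
    "P1344": {"og1", "og2", "og3"},
    "P1361": {"og1", "og2", "og4"},
    "P1363": {"og1", "og2", "og3", "og5"},
}

for pair, n in pairwise_distinct_counts(toy).items():
    print(pair, n)
```

A presence/absence comparison of this kind is sensitive to assembly fragmentation: genes split across contig ends can appear as missing, so variable genes near contig ends warrant cautious interpretation.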
Table 1: Overview of data files/data sets.
| Label | Name of data file/data set | File types (file extension) | Data repository and identifier (DOI or accession number) |
|---|---|---|---|
| Dataset 1 | Draft genomes for Prochlorococcus P1344, P1361, and P1363 | Fasta sequence files (.fsa) and GenBank flatfile annotations (.gbff) | NCBI GenBank: JABBYR000000000.1 (P1344) [21], JABBYP000000000.1 (P1361) [22], JABBYQ000000000.1 (P1363) [23] |
| Dataset 2 | Raw sequencing reads for P1344, P1361, and P1363 enrichments | Fastq sequence files (.fastq) | NCBI Sequence Read Archive: SRR11497176 (P1344) [33], SRR11497178 (P1361) [34], SRR11497177 (P1363) [35] |
| Dataset 3 | Prokka genome annotations for Prochlorococcus P1344, P1361, P1363, MIT1227, MIT1312, and MIT1327 | GenBank flatfile annotations (.gbf) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Dataset 4 | Enrichment assemblies for P1344, P1361, and P1363 | Fasta sequence files (.fasta) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 1 | Detailed methods | Document (.pdf) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 2 | Genome summary information | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 3 | Enrichment contig characteristics | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 4 | Genome contig characteristics | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 5 | Annotation information | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 6 | Overview of other organisms in enrichment BLAST results | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 7 | Genome and enrichment bin completeness and taxonomy | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 8 | Protein ortholog clusters for identical ITS group | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |
| Data file 9 | Genome comparisons for identical ITS group | Microsoft Excel file (.xlsx) | figshare: https://doi.org/10.6084/m9.figshare.12675410 [26] |