Sample collection
The sampling site (Jangmok Bay Time-series Monitoring Site, JBTMS: 34º 59´ 37´´ N and 128º 40´ 27´´ E) is a semi-closed bay on the southern coast of South Korea (Supplementary Fig. 7). The JBTMS is a eutrophic system that is subjected to strong mixing between the surface and bottom layers. Its maximum tidal range is approximately 2.2 m, and the mean water depth at the sampling station is approximately 8.5 m. A total of 90 sub-samples during June 2016 and June 2017 were obtained from surface water (sampling depth: 1 m under sea surface). In particular, when A. sanguinea bloomed (November 11th to December 13th 2016), sampling was performed daily. We also explored the differences between A. sanguinea bloom and no-bloom condition daily between November 14th to December 13th 2017. We drew 20-L samples of seawater from the surface layer and stored them in a cooler until arrival (5 min) at the laboratory of the South Sea Institute of Korea Institute of Ocean Science & Technology (KIOST, Geoje, South Korea), where the seawater was prepared immediately.
Monitoring of environmental factors
Temperature, salinity, pH, and dissolved oxygen were evaluated using a portable YSI environmental multi-parameter (YSI 6920 Inc., Yellow Springs, OH, USA). A 100-mL aliquot of each sub-sample was filtered through a 47-mm glass fibre filter (GF/F, Whatman, Clifton, NJ, USA), and the filtered seawater was stored in an acid-cleaned polyethylene bottle in a deep freezer (at -80 ºC). Subsequently, concentrations of dissolved inorganic nutrients, such as inorganic nitrogen (DIN; NO2− + NO3− + NH4+), inorganic phosphorus (DIP), and silica (DSi) were determined in each sample using an automatic nutrient analyser (Quaatro39; Seal analytical instrument, UK). To analyse the dissolved organic carbon (DOC) concentration, a 10-mL aliquot of each water sample was filtered through the GF/F filter (pre-combusted at 450 ℃ overnight) under gravity pressure, and the DOC concentration was determined using a high-temperature catalytic combustion instrument (TOC-VCPH; Shimadzu, Kyoto, Japan). To determine chlorophyll a concentration, 500 mL of each sample was filtered through the GF/F filter under low vacuum pressure. Each filter was soaked in 15 mL of cold 90 % acetone-distilled water (v/v) and sonicated to break the cell walls. Then, chlorophyll a was extracted for 24 h at 4 ºC in the dark, and its concentration was measured using a fluorometer (10-AU; Turner Designs, Inc., San Jose, CA, USA).
Microscopic observation
To count total heterotrophic bacteria, a 10-mL aliquot was collected from each sub-sample in a 15-mL sterilised polyethylene bottle and fixed immediately with glutaraldehyde (at a final concentration of 2 %). The sample was stored in the dark at 4 ºC prior to analysis. The fixed bacterial cells were filtered through a black isopore membrane filter (GTBP 02500; Millipore, Bedford, MA, USA) and stained with 1 μg mL−1 of 4′,6-diamidino-2-phenylinodole solution[48]. At least 600 stained bacterial cells per sample were counted using an epifluorescence microscope (Axioplan, Zeiss, Oberkochen, Germany) at a magnification of 1,000×. To count and identify phytoplanktonic communities, at least 500 phytoplankton cells per sub-sample were identified and counted using a phytoplankton (or Sedgwick–Rafter) counting chamber under a light microscope (Axioplan) at a magnification of 400−1,000×.
Preparation for DNA extraction of microbial communities
The microbial communities are multi-phylotype communities, ranging from the numerically dominant viruses to the phylogenetically diverse eukaryotic plankton. For mNGS, Flaviani et al.[22] concluded that 250 mL of seawater is sufficient to analyse microbial diversity (from double-stranded DNA virome to phytoplankton). Therefore, we analysed NCLDVs, bacteria, and eukaryotic planktonic organisms, including the endoparasitic dinoflagellate Amoebophrya spp., from 1 L surface seawater. Moreover, for analyses of various microbial communities, we harvested the microbes in three steps according to their size fraction; 1), a 10 μm polycarbonate filter (TCTP04700, Millipore) was used, which focused on eukaryotic plankton and dinospores of Amoebophrya sp. in A. sanguinea cells at cell size of > 10 μm. To remove organisms < 10 μm in size and particles attached to A. sanguinea cell surfaces, cells collected on the 10 μm filter were washed three times with approximately 50 mL distilled water at approximately 50-60 oC. 2), a 3 μm polycarbonate filter (TSTP04700) was used, which focused on free-living Amoebophrya spp. and nano-sized phytoplankton at cell size of 10-3 μm, and 3), a 0.22 μm polycarbonate filter (GTTP04700) was used, which focused on bacteria and NCLDVs at cell size of 3-0.22 μm. The filters were stored at -80 ºC until DNA extraction.
mNGS analyses of bacteria and eukaryotic plankton
The filters at each size fraction were cut into several pieces for genomic DNA (gDNA) extraction. To analyse the bacterial community in the water samples, gDNA was extracted using the DNeasy Powersoil Kit (Qiagen, Valencia, CA, USA) from the 3−0.22 μm size fraction; the DNA was diluted to a final concentration of 20 ng μL−1. The quantity and quality of the total gDNA was determined using Nano-drop (Nano-MD-NS, SCINCO Ltd., South Korea). The V3-V4 hyper variable regions of bacterial 16S rDNA genes were amplified using the universal Illumina tagged forward (341F) and reverse (800R) primers (Supplementary Table 10). To analyse eukaryotic plankton, gDNA was extracted from the > 10 μm and 10−3 μm size fractions using a DNeasy PowerSoil kit; DNA was diluted to each final concentration of 30 ng μL−1 and 20 ng μL−1, respectively. The V4-V5 region of the 18S rDNA gene was targeted using the Illumina tagged forward (TAReuk454FWD1) and reverse (TAReukREV3) primers (Supplementary Table 11). Although we did not perform replicate experiments, we attempted to overcome the experimental bias and obtain more accurate results as follows; 1), intensive daily continuous monitoring in JBTMS, 2), in-door microcosm experiments to verify changes in endoparasitic dinoflagellates and NCLDVs in the field, and 3) three PCRs performed in distinct tubes, and then mixing of the PCR reaction products to obtain more accurate mNGS results[49]. The amplified products from the first PCR were individually purified using a QIAquick PCR purification kit (Qiagen, Hilden, Germany). The second PCR lasted 10 cycles using tags of Nextera XT 96 index kit v2 (Illumina). DNA concentration was measured in a Bio-analyzer 2100 (Agilent Technologies, Palo Alto, CA, USA). Equal concentrations of the PCR products for each sample were pooled, and the merged samples were analysed using a Mi-Seq platform (Illumina, San Diego, CA, USA).
After each sequencing procedure was completed, the data were pre-processed using Mi-Seq Control Software (MCS) v2.4.1. Raw sequences were first analysed using FastQC[50] to check basic statistics, such as GC %. Furthermore, quality score distribution per base, and poor-quality sequences were flagged. Additionally, ambiguous and chimeric reads were removed, and the noised sequences (denoising), which involved OTUs with 1, 2, and 3 reads, were removed at a cut-off of 97%. The processed pair-end reads were then merged using the fast length adjustment of short reads (FLASH) software tool[51]. After each sequencing procedure, a quality check was performed to remove short sequence reads (<150 bp), low-quality sequences (score < 25 in analysis of 16s rDNA; score < 33 in analysis of 18s rDNA), singletons, and non-target sequences. Using Basic Local Alignment Search Tool (BLAST)[52], all the sequence reads were compared with those from the National Center for Biotechnology Information (NCBI) database. Sequence reads with an E-value smaller than 0.01 were considered for further analysis. A pairwise global alignment was performed on the selected candidate hits to identify the most similar sequences. The taxonomy of the sequence with the highest similarity was assigned to the sequence read (species or genus levels with >97 % or >94% similarity, respectively). To analyse operational taxonomic units (OTUs), the CD-HIT-OTU software[53] was used for clustering and metagenomic functional information. To calculate alpha-diversity, including Shannon-Weaver diversity, Chao richness, and Simpson evenness, we used the closed-reference protocol published by Mothur[54] and QIIME[55] based on the OTU table.
mNGS analysis of Nucleo Cytoplasmic Large DNA Viruses
To analyse NCLDVs, gDNA was used of extracted it for analysis of bacterial metagenomics, and sequencing library of NCLDVs was generated using NEBNext Ultra DNA Library Prep Kit (Illumina, San Diego, USA) following the manufacturer’s instructions. The library was prepared by random fragmentation of the DNA sample, followed by 5´ and 3´ adapter ligation. “Tagmentation”, which combines the fragmentation and ligation reactions into a single step and greatly increases the efficiency of library preparation, was used. Adapter-ligated fragments were then amplified using 12 PCR cycles and purified by gel electrophoresis and a gel extraction kit (Qiagen, Hilden, Germany). Libraries were analysed for size distribution by using a Bio-analyser 2100 model (Agilent Technologies, Palo Alto, CA, USA), which indicated that the final library contained inserts between 35 and 1,000 base pairs[56]. Clustering of the index-coded samples was performed on a cBot Cluster Generation System according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced on an Illumina Hi-Seq 2500 platform (Illumina, San Diego, CA, USA).
FASTAQ files were imported into the CLC Genomics Workbench v. 11.0.1 (Qiagen, Hilden, Germany). Reads below 0.05 quality score cut-off and adapter trimming were removed from subsequent analyses. The remaining reads were trimmed of any ambiguous and low quality 5′ bases, and only reads at the full length were retained for assembly. Quality controlled reads were then de novo assembled using the assemble function (minimum >300 bp) of the CLC Genomics Workbench for ambiguous nucleotide trimming and bad reads filtered at the limit of Q20. To identify viruses, the assembled contigs were subjected to a BLAST search against viral reference genome sequences, using the NCBI virus genome database (http://www.ncbi.nlm.nih.gov/genome/viruses/) for taxonomic assignment using BLASTX (E-value < 10−5).
Indoor microcosm experiment
To investigate the changes in environmental factors, and the potential biological control mechanisms of A. sanguinea bloom, we conducted 100-L in-door microcosm experiments (Supplementary Fig. 8)[23]. The experiment included three enclosures (triplicate experiments), each containing 80 L of seawater when natural A. sanguinea bloomed at a mean density of 920 cells mL-1. Each microcosm was a 100 L enclosure within a cylinder of transparent acrylic material. To observe the survival and growth of A. sanguinea cells in terms of changes in the water temperature, the room temperature was maintained at an average of 18 ºC for the first three days and sustained at 15 ºC for the last period of the experiment. Light intensity was 50 E m-2 s-1 under a 16 h : 8 h (light : dark) cycle. Water in each microcosm was mixed by three impellers (each 15 cm long and 6 cm wide) at a speed of 10 rpm on a 15 min : 15 min automatically controlled run : stop cycle. The experiment lasted 22 days. During this time, we collected 17 sub-samples (daily collection for 12 days and bi-daily collection the last 10 days) for analysis of environmental factors and microbial communities.
Statistical interpretation of the data
Pearson’s correlation analysis was performed to examine the relationships between the measured parameters using SPSS v.12 (SAS Institute Inc., Cary, NC, USA). Cluster analysis was performed using group average clustering by the Bray-Curtis similarity method on the most abundant OTUs of bacteria and NCLDVs (over 1 % in at least one sample). Using the ranked similarity matrix, an ordination plot was produced by non-metric multidimensional scaling (nMDS) using PRIMER 6 (version 6.1.13). Hierarchical agglomerative clustering using the group average method was carried out on the most abundant OTUs based on groups selected from nMDS analysis. To test the null hypothesis (no significant difference between the groups discriminated according to the agglomerative clustering analysis), similarities were analysed with ANOSIM in PRIMER[57].
Extended local similarity Aanalysis (eLSA)[58] was used for data of 2016 and 2017 with 33 and 29 days, respectively, (a time interval of two days) to analyse covariation between the most abundant OTUs, each displaying a relative abundance >1 % in at least one sample, resulting in 63 and 76 OTUs of microbial communities and 9 environmental parameters each in 2016 and 2017, respectively. P-value was determined using statistical approximation followed by permutation testing to reduce computing time, and ensuring accuracy and Q-value (false discovery rate) was calculated to estimate the likelihood of false positives[59]. eLSA network of delay-shifted Spearman correlation coefficients (SCC) between variables was visualized using Cytoscape v3.7.2[60] with P<0.01 and Q<0.05. Because the sampling was evenly spaced for two-day intervals, maximum time-lags were considered to be 10 days. The networks were selected by A. sanguinea identification or edge type (or example, correlations between specific OTUs). Random undirected networks of equal size by number of nodes and edges were calculated by the Erdös-Rényi model using the Random Network plugin in Cytoscape. Network statistics were calculated with the network analyzer as undirected networks using the defaults[61].