Plant material and growth conditions. The α-null type NIL used in this study was developed by four generations of backcrossing a line harboring cgy-2 (confirmed α-null) from RiB with DN47, followed by five generations of selfing to generate a BC4F5 NIL population (Fig. S4). We previously used this population to investigate α-null-related transcription-level changes8. Standard farming practices were used to grow the BC4F5 NIL plants in a randomized block design at the Northeast Agricultural University Experimental Station, China. Pod samples were collected during the seed development stage at 20 DAF (Fig. S4C) during the summer of 2018. SDS-PAGE and western blot analyses confirmed that the α-null phenotype was stably inherited in NIL (Fig. S4D). The BC4F5 seeds harvested in 2018 were used for the ChIRP analyses. ‘DongNong 50’ (DN50), a soybean cultivar that shows high transformation efficiency, was used for CRISPR/Cas9 analysis.
Phenotype screening for the α-subunit-null mutation in the NIL using SDS-PAGE analysis. The absence of the α-subunit of β-conglycinin was confirmed in the collected NIL seed samples by analyzing the subunit composition of seed proteins by SDS-PAGE (Supplementary Fig. 1D). Seed proteins were extracted from a small amount of cotyledon tissue using SDS sample buffer (5% [v/v] 2-mercaptoethanol, 2% [w/v] SDS, 5 M urea, 62.5 mM Tris(hydroxymethyl)aminomethane, and 10% [w/w] glycerol). Samples were centrifuged at 15,000 ×g, after which 10 μL of supernatant was loaded onto gels consisting of a 12.5% [w/w] separating gel and a 4.5% [w/w] stacking polyacrylamide gel. The gels were stained with Coomassie Brilliant Blue R 250.
RNA quality testing. Developing seeds harvested from DN47 and NIL plants at 20 days after flowering during the summer of 2018 were used for RNA-Seq. Total RNA was extracted using an enhanced cetyltrimethylammonium bromide (CTAB) method. We checked RNA quality using a K5500® spectrophotometer (Kaiao, Beijing, China). The RNA integrity was assessed and RNA concentrations were calculated using an RNA Nano 6000 Assay Kit for a Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).
Library preparation for lincRNA sequencing. A sample (3 μg) of extracted RNA was used as the initial material. Epicentre Ribo-Zero™ Gold Kits (Human/Mouse/Rat/Other) (Epicentre, Madison, WI, USA) were used to remove ribosomal RNA. Sequencing libraries (with different index labels) were subsequently created using a NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA).
Library checking. A Qubit® RNA Assay Kit was used to measure the RNA concentrations of the prepared libraries, after which samples were diluted to 1 ng/μL. Using an Agilent Bioanalyzer 2100 system (Agilent Technologies), the insert sizes were evaluated, and appropriate inserts were quantified using a TaqMan fluorescence probe and a StepOne Plus Real-Time PCR System (Applied Biosystems) (valid library concentration > 10 nM).
Library clustering and sequencing. A cBot cluster-generation system with a TruSeq PE Cluster Kit (version 4) cBot-HS (Illumina, San Diego, CA, USA) was used to complete the clustering of the index-coded samples. The libraries were sequenced on an Illumina platform after clustering to generate 150-bp paired-end reads.
Data quality control. Perl scripts were used to process the raw data to ensure that data quality was suitable for subsequent analyses. The reference genome and the annotation files were downloaded from the ENSEMBL database (http://www.ensembl.org/index.html). The genome index was built using Bowtie2 (version 2.2.3). Clean sequence data were mapped to the reference genome using TopHat (version 2.0.12). TopHat was also used to identify exon–exon junctions by splitting the mapped reads and remapping them to the reference genome. TopHat uses Bowtie2 for mapping, which improves the accuracy and speed of the analysis.
Quantification of gene expression levels. Read counts for each gene in every sample were determined using HTSeq (version 0.6.0), after which the number of reads per kilobase per million mapped reads (RPKM) was computed with the following equation to approximate the gene expression levels in each sample:
$$\mathrm{RPKM} = \frac{10^{9} \times R}{N \times L}$$
where R represents the number of reads for a particular gene in a specific sample, N denotes the total number of mapped reads in a specific sample, and L is the length of a particular gene in base pairs.
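The RPKM definition can be illustrated with a minimal sketch. It assumes the standard form RPKM = 10^9 × R / (N × L) with L in base pairs. The function name and the example numbers are hypothetical and are not taken from the study's pipeline.

```python
def rpkm(read_count: int, total_mapped_reads: int, gene_length_bp: int) -> float:
    """Reads per kilobase per million mapped reads.

    read_count          -- R, reads assigned to the gene in one sample
    total_mapped_reads  -- N, all mapped reads in that sample
    gene_length_bp      -- L, gene length in base pairs (assumed unit)
    """
    if total_mapped_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("N and L must be positive")
    # 10^9 = 10^3 (bp per kilobase) * 10^6 (reads per million)
    return 1e9 * read_count / (total_mapped_reads * gene_length_bp)


# Hypothetical example: 500 reads on a 2-kb gene in a library of 20 million mapped reads
example_value = rpkm(500, 20_000_000, 2_000)
```

Because both the library size and the gene length appear in the denominator, the value is comparable across genes of different lengths and samples of different sequencing depths.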
Analysis of differentially expressed genes. The DESeq (version 1.16) program was used to identify differentially expressed genes (DEGs) between DN47 and NIL based on a negative binomial distribution model. A P-value was assigned to each gene, and the Benjamini–Hochberg method was used to control the false discovery rate. Genes with |log2 ratio| ≥ 1 and q ≤ 0.05 were identified as DEGs.
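The multiple-testing correction and the DEG thresholds can be sketched as follows. This is a plain-Python illustration of the Benjamini–Hochberg step-up adjustment and of the |log2 ratio| ≥ 1, q ≤ 0.05 filter. It is not DESeq's own implementation, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (q-values) in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic q-values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def is_deg(log2_ratio, q_value, lfc_cutoff=1.0, q_cutoff=0.05):
    """Apply the thresholds stated in the text: |log2 ratio| >= 1 and q <= 0.05."""
    return abs(log2_ratio) >= lfc_cutoff and q_value <= q_cutoff
```

A gene therefore has to pass both the fold-change and the adjusted-significance criterion; a large fold change with q > 0.05 is not called a DEG.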
Quantitative real-time PCR validation. Total RNA was reverse-transcribed into cDNA using SuperScript III Reverse Transcriptase (Invitrogen, Grand Island, NY, USA) following the manufacturer’s instructions. A 2× PCR Master Mix and an Applied Biosystems ViiA 7 Real-Time PCR System were used for qRT-PCR analysis with incubation for 10 min at 95°C, followed by 40 cycles of 60°C for 1 min and 95°C for 10 s. The 2^(−ΔΔCt) method was used to calculate the relative mRNA and lincRNA expression levels, which were normalized to GAPDH as an endogenous reference transcript. The data shown represent the means of three replicates.
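The relative-quantification step can be illustrated with a short sketch of the 2^(−ΔΔCt) calculation. The function name and the Ct values are hypothetical. Only the normalization to a reference transcript (GAPDH in the text) and the averaging of three replicates follow the description above.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript by the 2^(-ddCt) method."""
    # dCt: target Ct normalized to the reference transcript in each condition
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # ddCt: normalized sample relative to normalized control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values: (target, reference) in sample and control
sample_cts = [(24.0, 18.0), (24.1, 18.0), (23.9, 18.0)]
control_cts = [(26.0, 18.0), (26.0, 18.0), (26.0, 18.0)]
fold_changes = [
    relative_expression(ts, rs, tc, rc)
    for (ts, rs), (tc, rc) in zip(sample_cts, control_cts)
]
mean_fold_change = mean(fold_changes)
```

A ΔΔCt of −2 corresponds to a fourfold higher relative expression in the sample than in the control.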
5′ and 3′ RACE of MSTRG128686. The 5′ RACE PCR amplification was performed according to the Invitrogen 5′ RACE system manual. For first-strand cDNA synthesis, a mixture containing 5 μL total RNA and 1 μL random primer was incubated at 70°C for 5 min and then placed in an ice bath for 2 min. Then, 2.0 μL of 5× first-strand buffer, 0.5 μL of 10 mM dNTPs, 0.25 μL RNase inhibitor, and 0.25 μL reverse transcriptase were added. The mixture was made up to a total volume of 10.0 μL and incubated at 42°C for 60 min followed by 72°C for 10 min. For 5′ RACE with a nested PCR reaction system (end C method), reverse transcription used the specific primers RC583-RT1/RC583-RT2 to synthesize the cDNA, and nested PCR was performed after RNase H and TdT treatment (see the following section). For 5′ and 3′ RACE of rare cDNAs, the PCR temperature parameters were 3 min at 95°C followed by 33 cycles of 94°C for 30 s and 68°C for 30 s; after a 7-min final extension at 72°C, the PCR was repeated.
The 3′ RACE amplification was also conducted by nested PCR, with the 3′ adaptor as the reverse primer and cDNA as the template, under the same conditions and cycle parameters as for 5′ RACE, except that annealing was performed at 58°C for 30 s. The PCR products were separated on 1.0% (w/w) agarose/ethidium bromide gels in 1× TBE buffer containing 90 mM Tris-borate and 2 mM EDTA (pH 8.0 at 22°C). A 1 kb DNA ladder was used as the DNA size marker.
RT-PCR. Total RNA was extracted from DN47 and NIL seeds at 20 days after flowering using TRIzol reagent (Invitrogen), followed by treatment with RNase-free DNase I (Invitrogen) to eliminate genomic DNA. The treated RNA was used for RT-PCR. RT-PCR amplification of the convergent transcription readthrough of CG-α-1/Linc-GmSTT1 transcripts was conducted using the primer pairs listed in Supplementary Table 2. The PCR products were cloned directly into pCRII using a TOPO TA cloning kit (Invitrogen) and subsequently sequenced.
CRISPR/Cas9-mediated Linc-GmSTT1 knockout. CRISPR/Cas9 gene-knockout constructs were developed using the pCBSG015(Basta) vector. We designed two sgRNAs targeting Linc-GmSTT1 at two locations: 5′-CTTACAAATGACAAGTGTCTTGG-3′ and 5′-GTTGGCCACAAAATTGTCTGTGG-3′. The two sgRNAs were inserted into the pCBSG015(Basta) vector containing Cas9. The constructs were individually transformed into the DN50 (α-normal) background using soybean embryo cotyledonary node transformation.
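As a quick consistency check on the two target sites listed above, the following sketch splits each 23-nt sequence into a 20-nt protospacer and a 3-nt PAM. It assumes an SpCas9-type NGG PAM at the 3′ end, which the text does not state explicitly. The function name is hypothetical.

```python
import re

# Target sites exactly as given in the text (5' -> 3')
TARGET_SITES = [
    "CTTACAAATGACAAGTGTCTTGG",
    "GTTGGCCACAAAATTGTCTGTGG",
]

# Assumption: 20-nt protospacer followed by an NGG PAM (SpCas9 convention)
SITE_PATTERN = re.compile(r"[ACGT]{20}[ACGT]GG")


def split_target(site: str):
    """Return (protospacer, pam) for a 23-nt target site, or raise ValueError."""
    site = site.upper()
    if not SITE_PATTERN.fullmatch(site):
        raise ValueError(f"not a 20-nt protospacer followed by an NGG PAM: {site}")
    return site[:20], site[20:]


parsed_sites = [split_target(site) for site in TARGET_SITES]
```

Both listed sequences end in TGG, so they are consistent with the assumed PAM convention.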
The Cas9/sgRNA expression vectors in pCBSG015(Basta) were introduced into Agrobacterium tumefaciens strain EHA105 by electroporation. Embryo cotyledonary nodes from DN50 seeds germinated for 5 days were placed in a petri dish containing 50 mL Agrobacterium suspension. About 150 explants were treated for 2 h and were then left at room temperature for 30–60 min for infection. After infection, the Agrobacterium suspension was discarded, and the explants were transferred to the co-cultivation medium and incubated in the dark at 23°C for 3 days. After co-cultivation, the embryos were transferred to the shoot-induction medium, cultured at 25°C for 7 days, and then placed on selection medium containing glufosinate. After culture for 3 weeks, the glufosinate-resistant shoots were transferred to shoot-elongation medium containing glufosinate and cultured in the light for 6–9 weeks. The regenerated elongated seedlings were transferred to rooting medium at 25°C and cultured under light (5000 lux) until rooting.
To validate the CRISPR/Cas9-mediated gene disruption, genomic DNA was extracted from the leaves of each transformed plant using the CTAB method. The target Linc-GmSTT1 gene fragment was amplified by PCR using the primer pair 5′-CTTCAACTGTCTGCTTAGCTAATTT-3′ and 5′-CCTTTGCCTTCCATAAGGAATTGT-3′. The PCR products were then sequenced to verify successful editing of the gene. Only transformed plants in which the target gene was edited successfully were used in the subsequent tests.
Crosslinking and chromatin preparation. One gram of frozen tissue was sliced and resuspended in 1 volume PBS, crosslinked in 1% (v/v) formaldehyde for 10 min, then quenched for 5 min with 0.125 M glycine, and collected by centrifugation at 2000 ×g for 5 min. Nuclei were lysed (100 mg/mL in nuclear lysis buffer: 50 mM Tris [pH 7.0], 1% [w/v] SDS, 10 mM EDTA, with DTT and PMSF added just before use) on ice for 10 min, and sonicated utilizing a Bioruptor until most chromatin was solubilized and the DNA was within the size range of 100–500 bp. Chromatin preparations were snap-frozen in liquid nitrogen and stored at −80°C until use.
Hybridization and washing. Chromatin was diluted in two volumes of hybridization buffer (1% [w/v] SDS, 750 mM NaCl, 1 mM EDTA, 15% [v/v] formamide, 50 mM Tris [pH 7.0], with DTT and PMSF added just before use). Probes (100 pmol) were added to 3 mL diluted chromatin and mixed by end-to-end shaking at 37°C for 4 h. Streptavidin–magnetic C1 beads were rinsed three times in nuclear lysis buffer, then 100 µL of washed beads was added per 100 pmol of probes, and the mixture was incubated with mixing at 37°C for 1 h. Beads:biotin-probes:RNA:chromatin adducts were captured using magnets (Invitrogen) and rinsed five times with 1 mL wash buffer (0.5% [w/v] SDS, 2× SSC, with DTT and PMSF added just before use). At the last wash, the beads were resuspended, and aliquots of 300 μL were removed for isolation of protein, RNA, and DNA. All tubes were placed on a DynaMag-2 magnetic strip and the wash buffer was removed. After brief centrifugation, the tubes were placed on the magnetic strip again and the last remnants of wash buffer were removed using a fine 10 μL pipette tip.
ChIRP protein elution and MS analysis. Beads were resuspended in 3× the original volume of DNase buffer (0.1% NP-40 and 100 mM NaCl). Protein was eluted with a cocktail of 100 µg/mL RNase A (Sigma-Aldrich), 0.1 U/µL RNase H (Epicentre), and 100 U/mL DNase I (Invitrogen) at 37°C for 30 min. The protein eluate was supplemented with 0.2 volume of 5× SDS loading buffer, boiled for 5 min, and separated on a NuPAGE 4%–12% (w/w) Bis-Tris gel, followed by silver staining to identify differential bands. The whole gel lane was excised, trypsinized, reduced, alkylated, and further trypsinized at 37°C overnight. The resulting peptides were extracted, concentrated, and HPLC-purified. The peptides separated by liquid chromatography were ionized through a nanoESI source and then analyzed on an LTQ Orbitrap Velos tandem mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) in data-dependent acquisition (DDA) mode. Proteins were identified by matching the experimental MS/MS data against theoretical MS/MS data from a database. Raw MS data were converted into a peak list, which was searched against the database with strict filtering and quality control to produce candidate protein identifications. The final protein identification list was used for functional annotation analysis using the GO and KEGG databases.
ChIRP DNA elution and high-throughput sequencing. Beads (and the DNA INPUT sample) were resuspended in 3× the original volume of DNA elution buffer (1% [w/v] SDS, 50 mM NaHCO3, and 200 mM NaCl), and DNA was eluted with 100 µg/mL RNase A (Sigma-Aldrich) and 0.1 U/µL RNase H (Epicentre). Elution was performed twice (1 h each) at 37°C with end-to-end shaking, and both eluates were combined. Formaldehyde crosslinks were reversed at 65°C overnight, and the chromatin was then treated with 0.2 U/µL proteinase K at 55°C for 60 min. DNA was then extracted with an equal volume of phenol:chloroform:isoamyl alcohol (Invitrogen) and precipitated with ethanol at −80°C overnight. Eluted DNA was amplified into sequencing libraries using a DNA library preparation protocol according to the manufacturer’s instructions (KAPA). The recovered libraries were sequenced on an Illumina NextSeq 500 platform (ABLife Inc., Wuhan, China) to generate 151-nt paired-end reads. The raw reads were aligned to the Glycine max reference genome using Bowtie2 (version 2.2.9). The uniquely mapped reads were subjected to peak calling with MACS (version 1.4.2) using default parameters.
ChIRP RNA elution and high-throughput sequencing. Beads were resuspended in 95 μL RNA PK buffer (10 mM Tris-Cl [pH 7.0], 100 mM NaCl, 0.5% [w/v] SDS, and 1 mM EDTA), then 5 μL of proteinase K was added and the mixture was incubated at 50°C for 45 min with end-to-end shaking. For RNA INPUT samples (10 μL), 85 μL RNA PK buffer was added. All tubes were centrifuged briefly and heated at 95°C for 10 min, and then RNA was extracted with TRIzol:chloroform. Eluted RNA was amplified into sequencing libraries using an RNA library preparation protocol according to the manufacturer’s instructions (KAPA). The recovered libraries were sequenced on an Illumina NextSeq 500 platform (ABLife Inc., Wuhan, China) to generate 151-nt paired-end reads. The raw reads were aligned to the Glycine max reference genome using Bowtie2 (version 2.2.9). The uniquely mapped reads were subjected to peak calling with MACS (version 1.4.2) using default parameters.