Ethics statement. The study was approved by the Peking University Institutional Animal Care and Use Committee (IACUC). All the animal experiments were conducted following their guidelines.
Cell types and culture conditions. We performed LiMCA on four human cell lines and the developing mouse main olfactory epithelium.
K562 (ATCC), chronic myeloid leukemia cells, were cultured in Iscove’s Modified Dulbecco’s medium (Gibco, Cat. #12440053) supplemented with 10% FBS (Gibco, Thermo Fisher Scientific, Cat. #10099141) and 1% Penicillin/Streptomycin (Pen/Strep) (Gibco, Thermo Fisher Scientific, Cat. #15140148).
GM12878 (Coriell Institute), B lymphoblastoid cells, were grown in Roswell Park Memorial Institute 1640 Medium (Gibco, Thermo Fisher Scientific, Cat. #11875093) media supplemented with 15% FBS and 1% Pen/Strep. GM12878 cells were grown from a single cell clone.
BJ (ATCC), fibroblasts, were grown in ATCC-formulated Eagle's Minimum Essential Medium (ATCC, Cat. 30-2003) with 10% FBS and 1% Pen/Strep.
eHAP (Cellosaurus), engineered haploid chronic myeloid leukemia cell line40, were grown in Iscove’s Modified Dulbecco’s Medium (Gibco, Cat. #12440053) supplemented with 10% FBS and 1% Pen/Strep. Passage every 2 to 3 days with 1:10 to 1:15 dilution.
When used, adherent cells (e.g., K562 and eHAP) were washed twice in 1 x PBS, 0.25% Trypsin-EDTA was added (Thermo Fisher Scientific, Cat. #25200072) and incubated at 37°C for 5 min, then diluted with complete culture medium to stop trypsinization. Cells were collected by centrifuge at 350 x g for 5 min, and resuspended in 1 x PBS. All cell lines were maintained at 37°C with 5% CO2 at recommended density.
Dissociation of single cells from the mouse olfactory epithelium. The main olfactory epithelium was dissected and minced to small pieces, then was dissociated into single cell suspension with the Papain Dissociation System (Worthington Biochemical, Cat. #LK003150) at 37°C for 15 min, during incubation, titration every 5 min with wide-bore pipette tip according to previously described20. Then filtered with 30 µm strainer (MACS), then wash twice with ice-cold 1 x PBS.
Single-cell ATAC-seq—METATAC. Single-cell ATAC-seq datasets were generated with our high-sensitivity METATAC method. METATAC was performed as described in our previous work32. We performed METATAC on mouse main olfactory epithelium at four time points during the first postnatal month, day 3, day 7, day 14 and day 28. Briefly, dissociated single cells were stained with 7-AAD (eBioscience, Cat. #00-6993-50), then used fluorescence-activated cell-sorting (FACS) to sort viable cells. Cells were counted and taken 50,000 cells as input, nuclei were extracted with 50 µL ATAC lysis buffer (10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.01% Digitonin, 0.1% Tween-20, 0.1% IGEPAL CA630) by incubating on ice for 5 min, and then bulk transposed with META transposome (12.5 µL 2 x TD buffer from Illumina Nextera kit, 10 µL 1 x PBS (pH 7.4), 0.25 µL 1% Digitonin, 0.25 µL 10% Tween, 2 µL 1.25 µM META transposome), then the transposed nuclei were sorted to 96-well plates. Sorted nuclei could be stored at -80°C or proceeded to amplification.
Droplet scRNA-seq. Single-cell RNA-seq library were prepared according to the 10x Genomics manufacture guidance using the Single-cell Gene Expression 5’ RNA-seq kit v1.1. Briefly, dissociated single cell suspension were stained with 7-AAD to sort viable cells, and 40,000 cells were taken as input for each reaction. Cells were collected from a male and a female mice at each stage. For P4-P7 sample, cells from mouse at postnatal day 4 and postnatal day 7 were pooled together to load on the same channel.
LiMCA protocol. Our method was based on nucleus-cytoplasm physical separation, like Trio-seq21. Nucleus was submitted to chromosome conformation capture processing, and cytoplasm mRNA was amplified according to the Smart-seq2 procedure24.
Single cell nucleus-cytoplasm separation. Briefly, viable cells were picked into a single tube containing 7 µL soft cell lysis buffer (25 mM Tris pH8.3, 30 mM NaCl, 0.45% IGEPAL-CA630, 1 U/µL SUPERaseIn) supplemented with 0.25 µL magnetic beads, then incubate on ice for 30 min, after incubation, vortex for 1 min. Centrifuge at 500 x g for 5 min at 4°C, then take 5 µL supernatant to a new tube. The supernatant was proceeded to Smart-seq2 reverse transcription and amplification24. Nuclei was proceeded to chromosome conformation capture.
Cytoplasm smart-seq2 procedure. Briefly, add 1.25 µL oligo-dT-dNTP mix (1 µM oligo-dT30VN, 2 mM dNTP mix), incubate at 72°C for 5 min, incubate at 4°C for 5 min. Then add 7 µL reverse transcription mix (1 x SSII first-strand buffer, 1 U/µL RNase inhibitor, 10 U/µL SSII RTase, 1 mM GTP, 5 mM DTT, 1 M Betaine, 6 mM MgCl2, 1 µM Template switch oligo), incubate as [42°C, 90 min, 10 cycles of [50°C, 2 min, 42°C, 2 min], 72°C, 5 min]. After RT, add 14.75 µL amplification mix (14 µL KAPA HiFi Hotstart mix, 0.28 µL 10 µM ISPCR primer, 0.47 µL nuclease-free water) to each tube, incubate as [98°C, 3 min 21 cycles of [98°C, 20 s, 65°C, 30 s, 72°C, 4 min], 72°C, 5 min]. Purify with 0.7 x AMPure XP beads.
Single-nucleus chromosome conformation capture. Add 8 µL 2.5% paraformaldehyde (EMS, 15714-S) to remaining 2 µL pellet (containing the nucleus), vortex to resuspend, incubate at room temperature for 10 min to fix nuclei, then add 10 µL 0.25 M glycine to quench. Centrifuge at 500 x g for 5 min at 4°C, discard 17 µL supernatant. Add 17 µL Hi-C lysis (10 mM Tris pH8.0, 10 mM NaCl, 0.2% IGEPAL-CA630, supplemented with protease inhibitor) without incubation. Centrifuge at 500 x g for 5 min at 4°C, discard 17 µL supernatant, leave 3 µL. Add 2 µL 0.75% SDS to each tube (final 0.3%), vortex to resuspend, incubate at 62°C for 10 min, then add 5 µL 4% Triton X-100, incubate at 37°C for 15 min to quench. Add 10 µL digestion mix (2 x NEBuffer2, 4 U/µL NIAIII), incubate at 37°C for 2 hr with rotation. Centrifuge at 500 x g for 5 min at 4°C, discard 17 µL supernatant. Add 17 µL 1 x T4 buffer. Centrifuge at 500 x g for 5 min at 4°C, discard 17 µL supernatant. Then add 17 µL ligation mix (1 x T4 buffer containing 10 U/µL T4 ligase), incubate at room temperature for 2.5 hr with rotation. Centrifuge at 500 x g for 5 min at 4°C, discard 18 µL supernatant. Add 2 µL cell lysis buffer (20 mM Tris pH8.0, 40 mM NaCl, 30 mM DTT, 2 mM EDTA, 0.2% Triton X-100, 3 mg/mL QP). Incubate as [50°C, 1 hr, 65°C, 1 hr, 70°C, 15 min]. After lysis, cells were stored at -80°C.
Single-cell whole-genome amplification. Single cells were amplified with our transposon-based state-of-the-art whole-genome amplification method—META, as previously described18, 26. In this study, we use 20 META tags26.
Library construction. The cDNA amplicons were quantified, take 1 ng as input for Nextera Tn5 tagmentation and library preparation, cells were pooled for purification and first purified with 0.6 x SPRI beads, then purify with 0.2 x SPRI beads. The sequenced gDNA and RNA data from the same cells are integrated based on experimental label.
Sequencing. METATAC libraries were sequenced with paired-end 2 x 150 bp on Illumina Novaseq, sequenced at 9 Gb per 96-well plate. LiMCA library were sequenced with paired-end 2 x 150 bp on Illumina Hiseq x10 or Novaseq, sequenced at 3–6 Gb for gDNA and 0.2–0.6 Gb for cDNA per cell.
Published data
Phased SNP files were downloaded from the Sanger Institute Mouse Genomes Project. Bulk Hi-C or Micro-C was taken from the 4DN Data Portal (4DNFIXP4QG5B for GM12878, 4DNFIB59T7NN for HFFc6, 4DNFINSKEZND for HAP1, 4DNFI18UHVRO for K562, 4DNFI1TBYKV3 for GBCs, 4DNFICUQ1N7S for INPs, 4DNFIUH9FJR6 for mOSNs, 4DNFI5AFARSZ for mOSNs (Olfr1507), 4DNFIB5G24G6 for mOSNS (Olfr17)).
METATAC data preprocessing
METATAC data were processed as described previously32. In brief, cell barcodes and META sequences were identified for each pair of reads using a custom script. Reads from each cell were split according to their barcodes. Adapters were then trimmed using cutadapt (v4.0) with parameters “-e 0.22 -a CTGTCTCTTATACACATCT” followed by parameters “-e 0.22 -g AGATGTGTATAAGAGACAG”. Cleaned reads were then mapped to mm10 (GRCm38) reference genome using bowtie2 (v2.3.4.3) with parameters “-X 2000 --local --mm --no-discordant --no-mixed”. Duplicated reads were removed using custom script according to both their mapped location on the genome and META tags. Mapped paired reads were transformed into fragments, and a bias of “+4” or “-5” was added to each end of each fragment to center the Tn5 insertion sites.
Fragments from all cells were then integrated together. Fragments that may be from contamination were identified and removed by a custom script as described previously32. The decontaminated fragments were then subjected to R (v4.1.0) package ArchR (v1.0.2) for quality control (QC). TSS enrichment scores and doublet scores of each cell were calculated using the default parameters of ArchR. Cells meeting any of the following conditions were considered to be of low quality and excluded from downstream analyses: number of aligned reads < 5×104 or > 1×106; ratio of aligned reads < 0.85; number of fragments < 3.16×103 or > 3.16×105; ratio of contaminated fragments > 0.6; mitochondrial reads > 5%; TSS enrichment score < 5; promoter fragments < 0.1; doublet score > 10.
METATAC cell embedding and clustering
Processed METATAC fragments after QC were analyzed using ArchR. First, gene activities were calculated using addGeneScoreMatrix function with GENCODE vM25 annotation of mm10 genome. Iterative LSI was performed with clustering parameters “resolution = 0.2, sampleCells = 10000, n.start = 10”. UMAP embeddings was calculated with parameters “nNeighbors = 30, minDist = 0.5”. Then, cells were clustered (addClusters) with parameters “maxCluster = 35, resolution = 0.8”. Cell type of each cluster was annotated manually with the help of Enrichr database [https://doi.org/10.1093/nar/gkw377] according to their marker genes calculated by getMarkerFeatures function with default parameters. Cell type-specific ATAC peaks were identified using addReproduciblePeakSet function of ArchR with macs2 (v2.2.7.1). We identified the marker peaks for cell types of interest, including HBCs, GBCs, Early/Late INPs, and immature/mature OSNs, using getMarkerFeatures function on “PeakMatrix”, and the enriched TF-binding motifs in the corresponding marker peaks were identified using peakAnnoEnrichment function. ArchR addTrajectory function was utilized to build a development trajectory and assign pseudotime values for cells from GBC to mOSN.
Identification and validation of potential regulatory peaks
We utilized Lhx2 and Ebf ChIP-seq data in mOSNs and previously defined Greek Islands from Monahan et al.17. We first interrogated in our peak set if there were peaks in OR clusters following the criteria of Greek Islands (overlap witdh both Lhx2 and Ebf ChIP-seq peaks) but not identified as Greek Islands previously. DNA sequences of the resulting 27 peaks and 63 previously identified Greek Islands were extracted from the mm10 genome and subjected to XSTREME online server (https://meme-suite.org/meme/tools/xstreme) for de novo motif discovery with defaultd parameters. The resulting motif with the most significant E-value corresponded to the desired composite motif. We used fimo (v5.3.3) with a q-value threshold of 10− 3 to further identify the location of the motif within GIs and candidate peaks, and determined the q-values of all matched sites. GIs and identified regulatory peaks are visualized in a circos plot with ChIP-seq and scATAC-seq (grouped by cell types) tracks of OR clusters using R package circlize (v0.4.12) and ggplot2 (v3.3.3).
ATAC footprinting is performed by getFootprints function of ArchR with default parameters on the GIs and candidate peaks centered at the composite motif. Figures are generated by plotFootprints function of ArchR with the Tn5 insertion bias normalized by subtraction.
Single-cell RNA-seq data processing
10X scRNA-seq reads were processed and mapped to mm10 genome using Cell Ranger (v 5.0.1). We used R package Seurat (v4.0.4) for QC and downstream analysis. Barcodes with UMI counts over 1,000 and less than 25,000, number of detected genes over 200, and mitochondrial counts less than 10% are considered as high-quality cells. Cells from three batches (P4/P7, P14, and P28) that passed QC were merged, log-normalized, and integrated using Seurat IntegrateData function with anchors identified by FindIntegrationAnchors function, to correct batch effects. The integrated dataset were scaled and embedded using PCA followed by UMAP, using ScaleData, RunPCA(npcs = 30), RunUMAP functions of Seurat, respectively. K-nearest neighbors of cells were identified using Seurat FindNeighbors function, and Louvain clustering was performed with resolution = 1.0 using FindClusters function. Then, cell types were annotated manually according to their marker genes identified by FindAllMarkers with parameters “min.pct = 0.25, logfc.threshold = 0.25, only.pos = TRUE”. Then, we removed all cells except for GBCs, INPs, iOSNs, and mOSNs. The remaining subset was scaled by SCTransform(vst.flavor=“v2”), followed by RunPCA(npcs = 30), RunUMAP(dims = 1:15), FindNeighbors, and FindClusters(resolution = 0.5). Six subclusters out of the identified 17 subclusters, which mainly consist of mOSNs, cannot be aligned well on the trajectory from GBC to mOSN and therefore be removed. We used slingshot (v2.2.1) to construct a trajectory on the UMAP of the subset after removing the outlier subclusters, and assigned pseudotime values for each cell from GBC to mOSN.
To get the correlation between scATAC-seq and scRNA-seq data, we used the variable genes identified by Seurat, and calculated the gene score matrix and the log-normalized UMI count matrix from ATAC and RNA dataset, respectively. We calculated the mean scores or log-normalized counts for each cell type. The Pearson correlation coefficients between each pair of ATAC and RNA clusters over the variable genes were calculated using numpy (v1.20.3), and visualized by pheatmap (v1.0.12).
RNAC data preprocessing
The RNA data and Hi-C data were preprocessed separately. For RNA data, we followed the Smart-seq2 processing workflow documented in the Human Cell Atlas (HCA) Data Portal (https://data.humancellatlas.org/pipelines/smart-seq2-workflow). Briefly, sequencing reads were mapped to transcriptomic references of hg38 (GRCh38) and mm10 (GRCm38) genome assembly for human and mouse data, respectively, using the hisat2 package. We then used RSEM to quantify RNA reads and generate gene count and FPKM matrix. Hi-C data were processed as previously described []. In brief, contact pairs and maps were generated from raw sequencing reads using the hickit pipeline. Since the human eHAP cell line contains the Philadelphia Chromosome (t(9;22)(q34;q11)) and reconstructions of 3D genome structures are sensitive to chromosomal structural variations and, for eHAP Hi-C data, we extracted the exact breakpoints, generated a custome hg38 genome reference accordingly, and mapped Hi-C reads to it.
RNAC RNA data analysis
The Seurat package was used for QC and downstream analysis of RNAC RNA count matrices. We filtered cells with fewer than 200 genes detected and genes expressed in fewer than 3 cells. The filtered count matrices were then normalized using the NormalizeData (for human data) or SCTransform (for mouse data) function. We then performed PCA and UMAP embeddings and Louvain clustering with the following parameters: RunPCA(dims = 1:20), RunUMAP(dims = 1:15), FindNeighbors(dims = 1:10) and FindClusters(resolution = 0.4) for human data; RunPCA(dims = 1:15), RunUMAP(dims = 1:10), FindNeighbors(dims = 1:10) and FindClusters(resolution = 0.4) for mouse data. For each human cell, we calculated cell cycle phase scores based on known cell cycle markers and annotated cell cycle phase using the CellCycleScoring function.
scA/B values
We calculated scA/B values from single-cell contact maps at 1-Mb resolution with dip-c color2. To ensure consistency across cells, all sex chromosomal bins were excluded. We then rank-normalized scA/B values to 0–1 in each cell with the scipy rankdata function and performed PCA and UMAP embeddings using the python sklearn and umap packages with following parameters: PCA(n_components = 20) and UMAP(n_neighbors = 10).
Mouse olfactory cell type annotation
For mouse olfactory data, we performed cell type annotations in two steps. In the first step, based on RNA counts of known marker genes of mouse olfactory epithelium [], we manually annotated four major cell types, including olfactory progenitors, immature olfactory sensory neurons (iOSNs), mature olfactory sensory neurons (mOSNs) and unknown cells. In the second step, we visualized above cell type assignment on the UMAP plot of scA/B values derived from Hi-C data. Using scA/B value information, the progenitor cluster annotated before was futher segregated into two discrete clusters, progenitor-1 and progenitor-2. These two progenitor clusters did not overlap each other on the UMAP plot of RNA data, further confirming our assignment. After annotating olfactory cell types, we used Monocle3 to construct the developmental trajectory and calculated pseudotime values for each cell.
3D genome structure analysis
Single-cell 3D genome structures were reconstructed using hickit and dip-c packages as previously described []. For K562 or BJ cells, reconstructions were impractical due to gross chromosomal aberrations or lack of phased SNP information. Thus, we generated 3D genome structures for GM12878, eHAP, and mouse olfactory epithelium cells. Note that the reconstruction of individual eHAP cells did not involve homolog imputations of contact pairs, since the human eHAP line is fully-haploid. Three replicate structures were generated for each cell with three different random seeds. Only Cells with root-mean-square RMSD (root-mean-square deviation) less than 1.5 were kept for further analysis.
Spatial analysis of active genes
The radial positions across differenct resolutions were calculated using “dip-c color -C”. To characterize gene clustering, we extracted all particles with active genes and calculated the number of genes within 3 particle radii from each gene using “dip-c color -r 3”.
Single-cell chromatin looping analysis
We first identified cell-type specific chromatin loops at 10-Kb resolution from published bulk Hi-C or Micro-C maps with the diff_mustache.py script from the mustache package. We then iteratively compared one cell type to others and retained calls unique in that cell type as its cell type-specific chromatin loops. The coolpuppy package was used to generated pileups of merged single-cell maps for each set of loops. We calculated the total contact count for each set of chromatin loops in individual cells using dip-c ard and combined the counts into a matrix. We noted that the number of chromatin loops was inconsistent across groups due to varying sequencing depths. To eliminate this impact, for each group of loops, we normalized the contact counts by the number of contacts used for loop calling.
Virtual 4C analysis
We grouped cells into sets with high (above the median) or low (below the median) expression of specific genes such as NFKB1 and performed virtual 4C analysis. For each set of cells, we combine contact maps to generate a pseudobulk contact matrix and balanced it with the KR algorithm using the juicer package. Contacts with the gene locus were extracted and normalized by the total number of contacts with the locus.
Determination of active olfactory receptors in single neurons
To exclude mismapped ORs, ORs truly expressed in individual cells were determined by visual inspection. Similar to [], we discarded all OR genes with fewer than 10 FPKM and visually examined the RNA read coverage of the remaining OR genes using the Intergrative Genomics Viewer (IGV). Those OR genes that were only partially covered were considered artefacts and were excluded from subsequent analyses. In addition, to determine the expressed allele for each OR gene, we performed allele-specific gene expression analysis using the phASER package.
OR and GI interactions
For bulk data, the normalized interchormosomal contact pileups between ORs and between Greek islands were generated using the coolpuppy package with “--trans --flanking 10000000”. We also calculated single-cell contact pileups between ORs and between Greek islands using “dip-c ard -d10000000 -h100000”. We defined the contact strength as the ratio between the mean contact value within 200-Kb of OR pairs (or 100-Kb of Greek island pairs) and the mean value in surrounding regions.
OR and Greek-island aggregates were identified in single cells as previously described []. Briefly, pairwise distances between ORs, or between ORs and Greek islands were calculated using “dip-c pd”. The dip-c network_around.py package was then used to record Greek islands or ORs within 2.5 or 5 particle radii from each OR. In each cell, we extracted the number of Greek islands from ORs that were actively expressed.