Genome-wide detection and analysis of eccDNAs in the matched ESCC tissue
After DNA isolation, removal of linear DNA and mitochondrial circular DNA, rolling circle amplification, and high-throughput sequencing, each sample yielded more than 100 million clean reads after low-quality reads were removed. By mapping these clean reads to the human genome (UCSC hg19), we identified 184557 eccDNAs, annotated across the 23 pairs of chromosomes, in these specimens. Most of these eccDNAs were detected in more than one specimen (Supplementary Table S2). These results indicate that eccDNAs are common in ESCC tissues.
The genomic distribution of eccDNAs showed that they were present on each of the 23 pairs of chromosomes. No mitochondrial eccDNAs were detected because mitochondrial circular DNA had been removed before sequencing. The eccDNA frequency per Mb was similar across chromosomes, except for chromosome Y, which had a much lower frequency (Fig. 1a/b). No correlation was found between the number of coding genes per Mb and the number of eccDNAs per Mb across chromosomes (p = 0.27) (Fig. 1c).
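The per-Mb comparison above can be sketched in Python as follows. This is only an illustration: the text does not state which correlation test produced p = 0.27, so the Pearson coefficient used here is an assumption, and the function names and toy values are hypothetical.

```python
import math


def per_mb(count, length_bp):
    """Return the number of features per megabase for one chromosome."""
    if length_bp <= 0:
        raise ValueError("chromosome length must be positive")
    return count / (length_bp / 1_000_000)


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed statistic, not confirmed by the text)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (genes/Mb and eccDNAs/Mb for four chromosomes).
toy_genes_per_mb = [1.0, 2.0, 3.0, 4.0]
toy_ecc_per_mb = [2.0, 4.0, 6.0, 8.0]
toy_r = pearson_r(toy_genes_per_mb, toy_ecc_per_mb)
```

In practice each chromosome would contribute one pair of values (coding genes/Mb, eccDNAs/Mb), and a significance test on the coefficient would give the p value.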
We mapped all eccDNAs to different classes of genomic regions. Normalized genomic coverage was defined as the percentage of eccDNAs mapped to a given class of genomic region divided by the percentage of the genome covered by that class [10]. The eccDNAs originated mainly from 5’-untranslated regions (5’-UTRs), 3’-untranslated regions (3’-UTRs), regions 2 kb upstream or downstream of genes, and regions from 2 kb upstream to 2 kb downstream of CpG islands, among others. In contrast, they rarely originated from exons, introns, or LINE or Alu repeat regions (Fig. 1d).
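The definition of normalized genomic coverage can be illustrated with a minimal Python sketch. The function name, its parameters, and the example numbers are hypothetical and are not taken from this dataset.

```python
def normalized_coverage(ecc_in_class, ecc_total, class_bp, genome_bp):
    """Share of eccDNAs mapped to a region class divided by the share
    of the genome that the class covers."""
    if ecc_total <= 0 or class_bp <= 0 or genome_bp <= 0:
        raise ValueError("totals and lengths must be positive")
    ecc_fraction = ecc_in_class / ecc_total    # fraction of eccDNAs in the class
    genome_fraction = class_bp / genome_bp     # fraction of the genome in the class
    return ecc_fraction / genome_fraction


# Hypothetical example: 5% of eccDNAs map to a class covering 1% of the genome.
example = normalized_coverage(50, 1000, 1_000_000, 100_000_000)
```

A value above 1 means the class gives rise to more eccDNAs than its genomic share would predict, and a value below 1 means fewer.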
With respect to the genes from which the eccDNAs were derived, 13177 genes in the genome gave rise to eccDNAs. LSAMP had the largest number of annotated eccDNAs (192). Sixteen genes each had more than 100 annotated eccDNAs (Supplementary Table S3), whereas 3208 genes each gave rise to only one eccDNA (Fig. 1e).
The lengths of the eccDNAs ranged from 33 bp to 968842 bp, with a main peak at ~360 bp and two additional peaks at ~555 bp and ~736 bp (Fig. 4A). Overall, 95.0% (175363/184557) of eccDNAs were shorter than 3000 bp and 86.1% (158850/184557) were shorter than 2000 bp (Fig. 1f/g).
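As a quick arithmetic check, the reported percentages follow directly from the counts given above. The constant and function names in this Python sketch are illustrative only.

```python
TOTAL_ECCDNAS = 184557       # all eccDNAs identified
SHORTER_THAN_3000 = 175363   # eccDNAs shorter than 3000 bp
SHORTER_THAN_2000 = 158850   # eccDNAs shorter than 2000 bp


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


pct_3000 = percent(SHORTER_THAN_3000, TOTAL_ECCDNAS)
pct_2000 = percent(SHORTER_THAN_2000, TOTAL_ECCDNAS)
print(pct_3000, pct_2000)
```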
Comparison of the distribution patterns of eccDNAs between ESCC and matched normal epithelium
The set of eccDNAs detected differed between ESCC samples and matched normal esophageal epithelium. As the Venn diagram shows, of the total of 184557 eccDNAs, 65809 were detected only in ESCC samples, 4520 were detected only in normal esophageal epithelium, and 114228 were detected in both (Fig. 2a). However, the chromosomal distribution and the genomic-element annotation of eccDNAs were similar in ESCC and matched normal epithelium (Fig. 2b/c).
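The Venn partition can be sketched with Python sets. Identifying each eccDNA by a coordinate string such as "chr2:50-600" is an assumption made for illustration, since the text does not state how eccDNAs were matched between samples, and the toy identifiers are hypothetical. The reported counts are taken from the text above.

```python
def venn_partition(tumor_ids, normal_ids):
    """Count eccDNAs found only in tumor, only in normal, or in both."""
    tumor, normal = set(tumor_ids), set(normal_ids)
    return {
        "tumor_only": len(tumor - normal),
        "normal_only": len(normal - tumor),
        "shared": len(tumor & normal),
    }


# Hypothetical toy identifiers (chromosome:start-end).
toy = venn_partition(
    ["chr1:100-460", "chr2:50-600", "chr3:10-370"],
    ["chr2:50-600", "chr7:5-740"],
)

# Counts reported in the text; the three groups add up to the total.
reported = {"tumor_only": 65809, "normal_only": 4520, "shared": 114228}
reported_total = sum(reported.values())
```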
Because previous reports showed that the length distribution of eccDNAs could distinguish maternal from fetal eccDNAs in plasma and lung cancer from normal lung tissue [9, 10], we analyzed the length distribution of eccDNAs in ESCC and matched normal esophageal epithelium. The length distributions in ESCC and normal esophageal epithelium had similar features, such as the positions of the peaks and the range of eccDNA lengths (Fig. 2d).
Identification of eccDNAs with differential levels between ESCC and matched normal epithelium
Because most eccDNAs were detected in more than one specimen, we further compared the levels of eccDNAs in ESCC with those in normal esophageal epithelium to explore the potential function of eccDNAs in ESCC. According to the screening criteria (p value < 0.05 and |LogFC| > 1), a total of 16031 eccDNAs were defined as candidate functional eccDNAs, including 10126 up-regulated and 5905 down-regulated eccDNAs. Most of these candidate eccDNAs were detected in only one tissue type, either ESCC or normal esophageal epithelium, while only a small fraction were detected in both (Fig. 3a/b/c). These candidate eccDNAs could distinguish ESCC from normal esophageal epithelium and may participate in the origin and progression of ESCC.
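The screening criteria above can be expressed as a small Python filter. This is a sketch under stated assumptions: the record type, field names, and toy records are hypothetical, and a positive LogFC is assumed to mean a higher level in ESCC than in normal epithelium, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class EccDNAStat:
    ecc_id: str     # hypothetical identifier
    log_fc: float   # assumed: ESCC relative to normal epithelium
    p_value: float


def classify(stat, p_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the criteria p value < 0.05 and |LogFC| > 1."""
    if stat.p_value < p_cutoff and abs(stat.log_fc) > lfc_cutoff:
        return "up" if stat.log_fc > 0 else "down"
    return "not_significant"


# Hypothetical toy records.
toy_stats = [
    EccDNAStat("ecc_a", 2.3, 0.001),   # passes both thresholds, positive
    EccDNAStat("ecc_b", -1.8, 0.020),  # passes both thresholds, negative
    EccDNAStat("ecc_c", 0.4, 0.010),   # fold change too small
    EccDNAStat("ecc_d", 3.0, 0.200),   # p value too large
]
labels = [classify(s) for s in toy_stats]
```

Both thresholds are strict inequalities, so an eccDNA with |LogFC| exactly equal to 1 or a p value exactly equal to 0.05 is not selected.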
The lengths of these candidate eccDNAs ranged from 44 bp to 395264 bp, with a main peak at ~357 bp and two additional peaks at ~549 bp and ~733 bp (Fig. 3d/e). By mapping these candidate eccDNAs to genomic elements, we found that they were derived mainly from 5’-UTRs and 3’-UTRs, and rarely from exons, introns or repeat regions such as LINE and Alu (Fig. 3f). Specifically, among the 10126 up-regulated eccDNAs, 49.4% (5007/10126) were annotated to genic regions, while 50.6% (5119/10126) were annotated to intergenic regions. The genic eccDNAs were derived from 3219 genes. AUTS2 gave rise to the most up-regulated eccDNAs (13). The top 10 genes giving rise to up-regulated eccDNAs are shown in Supplementary Table S4. Among the 5905 down-regulated eccDNAs, 48.4% (2859/5905) and 51.6% (3046/5905) were annotated to genic and intergenic regions, respectively. The genic eccDNAs were derived from 2235 genes. LSAMP, CSMD1 and BICD1 gave rise to the most down-regulated eccDNAs (7 each). The top 10 genes giving rise to down-regulated eccDNAs are also shown in Supplementary Table S4.
GO and KEGG pathway analysis of the genes associated with eccDNAs at differential levels
To understand the functions of the genes associated with eccDNAs at differential levels, GO analysis was performed. GO analysis of the genes associated with down-regulated or up-regulated eccDNAs covered cellular components (Fig. 4a/d), molecular functions (Fig. 4b/e), and biological processes (Fig. 4c/f). Although the dominant biological processes were related to neurons, GTPase-related activity and the cytoskeleton were the main terms in the molecular function and cellular component categories. In addition, the functions of the genes associated with eccDNAs at differential levels were characterized by KEGG pathway analysis (Fig. 4g/h). The genes associated with these eccDNAs were predominantly involved in pathways in cancer, the mitogen-activated protein kinase (MAPK) pathway, focal adhesion, and the Rap1 pathway, among others.