Identification of MCV+ MCC tumors using WES
We interrogated sequencing data from 15 MCC primary tumors and their paired normal samples, previously characterized in the laboratory for MCV insertions (seven MCV+ and eight MCV-) [6]. Our method detected all seven MCV+ MCC tumors, with between 2 and 44 mapped reads per sample. Supplementary Figure 1 shows the aligned reads of the MCV+ MCC samples in the IGV tool. Interestingly, aligned reads tended to concentrate in the first 2,500 bp of the MCV genome. As expected, none of the paired normal samples showed MCV insertions.
Sensitivity and specificity were 100% and 62.5%, respectively. Strikingly, three of the eight MCC samples previously classified as MCV- had at least one read mapping to the MCV genome, whereas the remaining five had no mapped reads (Table 1). However, given the extremely low number of reads in these false positive samples, specificity would improve if a sample were classified as positive only when at least two MCV reads were detected. The three false positive samples were evaluated in detail. Two of them were probably artefacts, since each had only a single 20-bp read, with two mismatches, mapped to the MCV genome. The third sample, however, had two 75-bp reads mapped to the MCV genome, so we cannot exclude the possibility that it is a true positive. It was also interesting to note a correlation between sequencing coverage and the number of mapped reads. This may indicate that the probability of detecting an inserted viral genome increases with sequencing depth (Table 1).
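The effect of the read-count threshold on these figures can be reproduced from the read counts reported in Table 1. The following is a minimal sketch rather than part of the original pipeline: the function name `sensitivity_specificity` and the `min_reads` parameter are illustrative, and the laboratory MCV status is taken as the reference.

```python
from typing import List, Tuple

# (sample, tissue, laboratory MCV status, mapped MCV reads), transcribed from Table 1.
Row = Tuple[str, str, bool, int]
TABLE_1: List[Row] = [
    ("48120427", "FF", True, 44),
    ("48120428", "FF", True, 34),
    ("48121209", "FF", True, 16),
    ("48120987", "FF", True, 15),
    ("48140221", "FF", True, 7),
    ("48090369", "FF", True, 4),
    ("48130247", "FFPE", True, 2),
    ("48130206", "FFPE", False, 2),
    ("48120431", "FF", False, 1),
    ("48141029", "FFPE", False, 1),
    ("48141028", "FFPE", False, 0),
    ("48130208", "FFPE", False, 0),
    ("48130207", "FFPE", False, 0),
    ("48121576", "FF", False, 0),
    ("48120426", "FF", False, 0),
]


def sensitivity_specificity(rows: List[Row], min_reads: int) -> Tuple[float, float]:
    """Call a sample positive when it has at least `min_reads` MCV reads,
    then compare the calls against the laboratory status."""
    tp = sum(1 for _, _, positive, n in rows if positive and n >= min_reads)
    fn = sum(1 for _, _, positive, n in rows if positive and n < min_reads)
    tn = sum(1 for _, _, positive, n in rows if not positive and n < min_reads)
    fp = sum(1 for _, _, positive, n in rows if not positive and n >= min_reads)
    return tp / (tp + fn), tn / (tn + fp)


for threshold in (1, 2):
    sens, spec = sensitivity_specificity(TABLE_1, threshold)
    print(f"min_reads={threshold}: sensitivity={sens:.1%}, specificity={spec:.1%}")
```

With a threshold of two reads, sensitivity is unchanged and only sample 48130206 (two reads) remains above the cutoff among the MCV- tumors, so specificity rises from 5/8 to 7/8.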
Table 1. Alignment results.
| SAMPLE | TISSUE | MCV | Nº MAPPED READS | COVERAGE |
|---|---|---|---|---|
| 48120427 | FF | Positive | 44 | 3.519 |
| 48120428 | FF | Positive | 34 | 2.704 |
| 48121209 | FF | Positive | 16 | 2.543 |
| 48120987 | FF | Positive | 15 | 2.286 |
| 48140221 | FF | Positive | 7 | 2.090 |
| 48090369 | FF | Positive | 4 | 2.467 |
| 48130247 | FFPE | Positive | 2 | 1.794 |
| 48130206 | FFPE | Negative | 2 | 1.304 |
| 48120431 | FF | Negative | 1 | 3.213 |
| 48141029 | FFPE | Negative | 1 | 1.784 |
| 48141028 | FFPE | Negative | 0 | 1.869 |
| 48130208 | FFPE | Negative | 0 | 1.364 |
| 48130207 | FFPE | Negative | 0 | 1.691 |
| 48121576 | FF | Negative | 0 | 2.520 |
| 48120426 | FF | Negative | 0 | 3.192 |
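The relationship between coverage and mapped reads noted above can be checked against Table 1 with a rank correlation. The sketch below is an illustration, not the analysis performed in the study: it uses Spearman's coefficient because the read counts contain many ties and zeros, and because a rank statistic does not depend on the unit in which coverage is expressed (the coverage values are used exactly as printed). The helper names are illustrative.

```python
from math import sqrt
from typing import List, Sequence

# Mapped MCV reads and coverage for the 15 tumors, in Table 1 row order.
READS = [44, 34, 16, 15, 7, 4, 2, 2, 1, 1, 0, 0, 0, 0, 0]
COVERAGE = [3.519, 2.704, 2.543, 2.286, 2.090, 2.467, 1.794, 1.304,
            3.213, 1.784, 1.869, 1.364, 1.691, 2.520, 3.192]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


rho = spearman(READS, COVERAGE)
print(f"Spearman rho (reads vs coverage, n={len(READS)}): {rho:.2f}")
```

With the 15 samples in Table 1 the coefficient comes out at roughly 0.4. Restricting the input to the seven MCV+ samples is a one-line change if only true insertions are of interest.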
It is worth mentioning that a bioinformatic approach to detect MCV insertions was previously described by Knepper et al. [19]. However, they performed de novo assembly and then aligned the assembled sequences against a variety of virus genomes, rather than directly aligning off-target reads to the MCV genome.
Since the interrogated samples were a mixture of fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) tumors, we asked whether our method performed well in both sample types. Figure 2 suggests that our pipeline is more robust when NGS is performed on FF samples. This is not surprising, since sequencing generally performs better on FF tissue. However, only one of the seven MCV+ MCC samples was FFPE. Moreover, this MCV+ FFPE sample had the lowest sequencing depth among the MCV+ samples, so a coverage bias rather than a tissue preservation effect cannot be excluded. Thus, more samples need to be assessed to settle this question.
MCV site of insertion
Finally, paired-end WES data were used to try to infer the virus insertion site in the tumor genome. After applying this strategy to all positive samples, only one of them (48121209) showed soft-clipped reads whose mates mapped to chr19:48,445,990. Interestingly, this position lies in an intronic region of the GRWD1 gene (Supplementary Figure 2A). GRWD1 is a p53 regulator whose loss of function has previously been associated with tumorigenesis [20]. Because this is a highly covered region, our method may fail to detect insertion sites in poorly covered regions. Thus, whole-genome sequencing should be a better technique to identify not only MCV+ tumors but also the virus insertion site. Supplementary Figure 2B shows that the soft-clipped reads also mapped to the MCV genome.
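The idea of using soft-clipped reads and their mates to locate a junction can be sketched directly on SAM records. This is a hedged illustration, not the implementation used in the study. It assumes a single SAM file in which a read aligned to the viral contig records its mate's contig in the RNEXT field (for example, after alignment to a combined human plus MCV reference). The contig name `MCV`, the `min_clip` cutoff and the function names are hypothetical, and the records in the demo are synthetic.

```python
import re
from typing import Iterable, List, NamedTuple

# One CIGAR element: a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


class Candidate(NamedTuple):
    read_name: str
    mate_contig: str
    mate_pos: int
    clipped_bases: int


def soft_clipped_bases(cigar: str) -> int:
    """Total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_ELEMENT.findall(cigar) if op == "S")


def junction_candidates(sam_lines: Iterable[str],
                        viral_contig: str = "MCV",   # hypothetical contig name
                        min_clip: int = 10) -> List[Candidate]:
    """Keep reads aligned to the viral contig that are soft-clipped by at
    least `min_clip` bases and whose mate maps to a different contig."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        qname, rname, cigar = fields[0], fields[2], fields[5]
        rnext, pnext = fields[6], fields[7]
        if rname != viral_contig:
            continue
        # "=" means the mate is on the same contig, "*" means it is unknown.
        if rnext in ("=", "*", viral_contig):
            continue
        clipped = soft_clipped_bases(cigar)
        if clipped >= min_clip:
            hits.append(Candidate(qname, rnext, int(pnext), clipped))
    return hits


if __name__ == "__main__":
    seq, qual = "A" * 75, "I" * 75
    synthetic = [  # made-up records for illustration only
        "@HD\tVN:1.6",
        f"r1\t97\tMCV\t120\t60\t40M35S\tchr1\t1000\t0\t{seq}\t{qual}",
        f"r2\t99\tMCV\t300\t60\t75M\t=\t450\t225\t{seq}\t{qual}",
    ]
    for hit in junction_candidates(synthetic):
        print(hit)
```

Clustering the reported mate positions by contig would then point to candidate insertion sites. Because the mates come from exome capture, regions with little coverage would yield few or no candidates under this scheme.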
MCV insertion in non-small cell lung cancer patients
Apart from MCC, MCV insertions have been reported in non-small cell lung cancer in non-smokers [21]. However, this result is controversial, since another study reported no MCV insertions in this type of cancer [22]. Therefore, the two-step alignment methodology was applied to lung adenocarcinoma data sets from non-smokers: 25 RNA-seq samples [12] and 30 WES samples [13]. No MCV+ tumor was found. Of note, RNA-seq alignments performed with STAR and with Bowtie2 gave the same result. Our result appears consistent with a study reporting that MCV DNA fragments can be detected in the lower respiratory tract when a highly sensitive PCR assay is used, which would generate false-positive results [23].