Table 1 describes the clinicopathological characteristics of the patients included in the discovery cohort. It can be observed that there are no differences in clinical variables between MRD + and MRD- patients, except for risk. However, this was to be expected because MRD + patients are considered intermediate to high risk, whereas MRD- patients may be low to intermediate risk. Interestingly, more than sixty percent of patients had normal karyotype and the most frequent risk group was intermediate. In total, there were 2 relapses and 6 deaths. Deaths in the MRD- group were due to relapse and progression and another due to febrile neutropenia. In the MRD + group, there was also one death due to relapse and progression and 3 deaths during the induction phase (very aggressive disease).
Table 1
CLINICAL CHARACTERISTICS OF THE DISCOVERY COHORT
CLINICAL CHARACTERISTICS | MRD+ (n = 10) | MRD- (n = 9) | |
| n (%) | MEAN (RANGE) | n (%) | MEAN (RANGE) | p - value |
AGE (years) | | 10.3 (3–17) | | 11.11 (3–15) | 0.89 |
SEX | | | | | |
Female | 4 (40) | | 3 (33.33) | | 0.76 |
Male | 6 (60) | | 6 (66.67) | |
WBC (cel/µ) | | 58.36 (2.97–292) | | 53.56 (6.22–214.9) | 0.89 |
RISK | | | | | |
Low | 0 (0) | | 1 (11.1) | | 0.03 |
Intermediate | 5 (50) | | 8 (88.9) | |
High | 5 (50) | | 0 (0) | |
EXTRAMEDULLAR INFILTRATION | | | | | |
Yes | 1 (10) | | 2 (22.2) | | 0.46 |
No | 9 (90) | | 7 (77.8) | |
CORTICOID RESPONSE | | | | | |
Yes | 9 (90) | | 9 (100) | | 0.32 |
No | 1 (10) | | 0 (0) | |
CARIOTYPE/MOLECULAR ALTERATIONS | | | | | |
Normal | 7 (70) | | 6 (66.6) | | 0.49 |
Hyperdiploid | 1 (10) | | 0 (0) | |
t(1;19) | 1 (10) | | 2 (22.2) | |
t(9;22) | 1 (10) | | 0 (0) | |
t(3;14) | 0 (0) | | 1 (11.2) | |
RELAPSE | | | | | |
Yes | 1 (10) | | 1 (11.1) | | 0.93 |
No | 9 (90) | | 8 (88.9) | |
DEATH | | | | | |
Yes | 4 (40) | | 2 (22.2) | | 0.40 |
No | 6 (60) | | 7 (77.8) | |
MRD + indicates minimal residual disease positive. MRD- indicates minimal residual disease negative. |
To identify genes that could differentiate MRD + patients from MRD- patients, we separated with immunomagnetic column the leukemic blasts obtained at the time of diagnosis and obtained the gene expression and DNA methylation profiles of each sample. Subsequently, we collected clinical information from all patients to determine if they were MRD + or MRD-. Then, we compared the gene expression profiles and DNA methylation between MRD- and MRD + patients at day 15 of treatment, MRD- and MRD + patients at the end of induction and between MRD-/- vs MRD+/+ patients. Supervised hierarchical cluster analysis showed 159 DEGs at day 15 of induction treatment of which 127 genes were upregulated and 32 downregulated (Fig. 2a), 200 DEGs at the end of induction treatment of which 165 genes were upregulated and 35 downregulated (Fig. 2b) and 153 DEGs after comparison between MRD-/- and MRD+/+ patients of which 117 were upregulated and 36 downregulated (Fig. 2c). Finally, a Venn diagram was used to identify common DEGs between the three analyses, finding 50 of them (Fig. 2d). In MRD + patients, 46 of 50 common genes were up-regulated, and 4 of 50 were down-regulated (Additional file 1). Uniprot analysis showed that some genes were associated with different functions, including DNA binding (HOXB2, RFX8), transmembrane (SLC18A2, SLC45A3, BOC, ITGA3), lipidation (TNFRSF10C, PTGIR), among others. In addition, enrichment analysis showed that these genes are involved in processes like inflammatory response, transmembrane transport of sugars, cellular response to growth factor stimulus, etc.
To identify GDMCpGs, we used the same groups as in differential expression analysis. A total of 1834 GDMCpGs were obtained when comparing diagnosis BM of patients with MRD + and MRD- at day 15 (Fig. 3a); 2382 at day 33 (Fig. 3b); and 2077 between MRD+/+ and MRD-/- patients (Fig. 3c). We identified 687 common GDMCpGs across all comparison groups (Fig. 3d).
Then, to identify genes with both differential expression and methylation, we matched the previously obtained DEGs and GDMCpGs. We found 38 genes at day 15 of induction treatment, 52 genes at the end of induction, and 40 genes between patients with MRD+/+ and MRD-/-. Finally, a Venn diagram analysis identified 10 common genes (HOXB2, TMEM51, F2RL1, PDZRN3, BOC, TANC1, ASCL2, FBN2, MIR4435-2HG, and SCL18A2) (Fig. 3e). We found a significative inverse correlation between gene overexpression and hypomethylation in only four of them, which were present in all three comparison groups (BOC, MIR4435-2HG, SCL18A2, and ASCL2) (Table 2).
| Table 2. COMMON GENES BETWEEN THREE ANALYSIS GROUPS AND THEIR CORRESPONDS CpGDM |
GENE/CpGs | | | | MRD DAY 15 | MRD DAY 33 | TRUE RESPONSE |
| FC | METHYLATION STATUS | RELATION TO CpGs ISLAND | p-value | Rho | p-value | Rho | p-value | Rho |
HOXB2 | 5,1 | | | | | | | | |
cg22777724 | | Hypermethylation | S_Shore | 0.45 | 0.2 | 0.08 | 0.47 | 0.08 | 0.49 |
cg09313705 | | Hypermethylation | S_Shore | 0.32 | 0.26 | 0.18 | 0.37 | 0.45 | 0.22 |
cg22807449 | | Hypermethylation | S_Shore | | | 0.18 | 0.53 | | |
TMEM51 | 5,4 | | | | | | | | |
cg24835051 | | Hypomethylation | Island | | | 0.004 | -0.73 | | |
cg15319451 | | Hypomethylation | Island | | | 0.009 | -0.67 | | |
cg13210325 | | Hypermethylation | | 0.09 | 0.43 | | | 0.54 | 0.18 |
cg07180646 | | Hypomethylation | S_Shore | 0.08 | -0.45 | | | | |
cg11618537 | | Hypomethylation | | | | | | 0.2 | -0.37 |
F2RL1 | 3,5 | | | | | | | | |
cg12518146 | | Hypermethylation | Island | 0.001 | 0.72 | 0.003 | 0.74 | 0.005 | 0.74 |
PDZRN3 | 5,4 | | | | | | | | |
cg04340156 | | Hypomethylation | | | | 0.27 | -0.31 | | |
cg11461794 | | Hypomethylation | | | | 0.86 | 0.05 | 0.87 | 0.049 |
cg24353443 | | Hypomethylation | Island | 0.66 | -0.11 | | | 0.62 | -0.14 |
cg05502701 | | Hypomethylation | Island | 0.6 | -0.14 | | | 0.45 | -0.22 |
BOC | 4,6 | | | | | | | | |
cg11205552 | | Hypomethylation | | | | 0.68 | -0.11 | | |
cg22413784 | | Hypomethylation | | 0.05 | -0.48 | 0.02 | -0.58 | 0.01 | -0.68 |
TANC1 | 3,5 | | | | | | | | |
cg21390755 | | Hypomethylation | | 0.11 | -0.4 | 0.03 | -0.57 | 0.12 | -0.45 |
ASCL2 | 5,4 | | | | | | | | |
cg22346124 | | Hypomethylation | Island | | | 0.13 | -0.42 | | |
cg10290276 | | Hypomethylation | Island | | | 0.03 | -0.55 | | |
cg13930892 | | Hypomethylation | Island | 0.004 | -0.66 | 0.01 | -0.65 | 0.02 | -0.6 |
cg20912770 | | Hypomethylation | Island | | | 0.05 | -0.52 | | |
cg10526374 | | Hypomethylation | Island | | | 0.05 | -0.51 | | |
cg06263495 | | Hypomethylation | Island | 0.01 | -0.6 | 0.02 | -0.59 | | |
cg26051413 | | Hypomethylation | Island | | | | | 0.52 | -0.19 |
cg11644479 | | Hypomethylation | Island | | | | | 0.02 | -0.62 |
cg13762320 | | Hypomethylation | Island | | | | | 0.03 | -0.58 |
cg19284039 | | Hypomethylation | Island | | | | | 0.09 | -0.48 |
FBN2 | -6 | | | | | | | | |
cg23696863 | | Hypomethylation | | 0.05 | 0.49 | 0.09 | 0.46 | 0.12 | 0.45 |
MIR4435-2HG | 4,4 | | | | | | | | |
cg24783876 | | Hypomethylation | | 0.02 | -0.55 | 0.05 | -0.51 | 0.06 | -0.53 |
SLC18A2 | 8,7 | | | | | | | | |
cg03570973 | | Hypomethylation | N_Shore | 0.001 | -0.72 | 0.001 | -0.77 | 0.0009 | -0.8 |
cg19721867 | | Hypomethylation | Island | | | 0.003 | -0.72 | | |
cg10245915 | | Hypomethylation | Island | | | 0.01 | -0.65 | | |
cg01043119 | | Hypomethylation | Island | | | 0.0005 | -0.8 | | |
cg19617377 | | Hypomethylation | Island | | | 0.003 | -0.72 | | |
cg00512279 | | Hypomethylation | Island | | | 0.001 | -0.76 | 0.03 | -0.58 |
cg08521987 | | Hypomethylation | Island | | | 0.05 | -0.51 | 0.007 | -0.7 |
cg27186877 | | Hypomethylation | | | | | | 0.03 | -0.59 |
Whitespaces indicate that not differentially methylated CpGs were identified in this comparison group. MRD, minimal residual disease. FC, fold change. Bold p-values indicate that CpGs are present in all three comparison groups and are statistically significant. |
We selected six additional DEGs from the analyses between the MRD+/+ vs. MRD-/- comparison because it would allow the identification of specific genes of MRD+/+ patients, who have the worst prognosis; these genes were overexpressed and had GDMCpGs, and there was an inverse correlation between them. The resulting genes were DAPK1, NPDC1, CNKSR3, ITGA6, SCL45A3, and CTHRC1 (Additional file 2). Therefore, the 4 common genes between groups (BOC, MIR4435-2HG, SCL18A2, ASCL2) and the 6 genes derived from the MRD-/- vs MRD+/+ analysis (DAPK1, NPDC1, CNKSR3, ITGA6, SCL45A3, and CTHRC1) were selected for verification by RT-qPCR. All genes showed correlation between techniques except BOC and SLC18A2 (Fig. 4).
Due to the low incidence of the disease(24) and the small number of MRD + patients, samples from the MDR + patients in the discovery cohort, together with those from the validation cohort, were used for RT-qPCR analyses. Consistent with the RNA-seq results, MIR4435-2HG, CNKSR3, ITGA6, NPDC1, and DAPK1 were overexpressed in MRD + patients in all comparison groups and had differential FC values between conditions (Fig. 5). We used logistic regression to test if genes could predict response to induction chemotherapy. MIR4435-2HG was found to be the best predictor of whether a patient will be a MRD-/-, MRD+/+ (Figs. 6a-b), or MRD + at day 15 of induction (Figs. 6c-d). Regarding DAPK1, ASCL2, SCL45A3, NPDC1 and ITGA6, it was observed that they could also predict response at day 15 or whether the patient will be a true responder or refractory (Additional file 3 and 4); however, both MIR4435-2HG and the other genes do not predict whether the patient will respond at the end of induction (Additional file 5). To test whether MIR4435-2HG could also predict the risk of death, we performed logistic regression using our RNA-seq counts. We observed that MIR4435-2HG can predict death with good sensitivity and specificity (Fig. 6e-f). When analyzing the relationship between normalized RNA-seq counts of MIR4435-2HG and the probability of non-response to treatment at day 15 or being refractory, we found that patients with counts > 5.1 had a 66% probability of being refractory to treatment. This probability increased proportionally to gene expression. In addition, for patients with counts > 4.97, the probability of therapeutic failure at day 15 was > 60% and increased progressively as MIR4435-2HG overexpression increased. Similarly, the probability of death increased when counts were > 7.0 (Table 3).
Table 3
ANALYSIS OF PROBABILITY OF DEATH, REFRACTORINESS, FAILURE AT DAY 15 OF CHEMOTHERAPY INDUCTION ACCORDING TO MIR4435-2HG COUNTS
RNAseq COUNTS MIR4435-2HG | DEATH (%) | PROBABILITY OF REFRACTORY (%) | RNAseq COUNTS MIR4435-2HG | DEATH (%) | FAILURE RESPONSE DAY 15 (%) |
4,42 | 32 | 1 | 4,36 | 15 | 25 |
4,45 | 33 | 2 | 4,52 | 17 | 33 |
4,55 | 33 | 3 | 4,54 | 17 | 34 |
4,56 | 33 | 3 | 4,64 | 18 | 40 |
4,74 | 35 | 10 | 4,64 | 18 | 40 |
4,89 | 36 | 24 | 4,78 | 20 | 48 |
4,91 | 37 | 27 | 4,81 | 20 | 50 |
5,15 | 39 | 66 | 4,81 | 20 | 50 |
5,15 | 39 | 66 | 4,85 | 20 | 53 |
5,55 | 43 | 97 | 4,96 | 22 | 59 |
6,05 | 49 | 100 | 4,97 | 22 | 60 |
6,31 | 51 | 100 | 5 | 22 | 62 |
7,2 | 61 | 100 | 5,14 | 24 | 69 |
9,44 | 80 | 100 | 5,21 | 25 | 73 |
| | | 5,24 | 26 | 74 |
| | | 5,35 | 28 | 79 |
| | | 5,46 | 29 | 83 |
| | | 5,59 | 32 | 87 |
| | | 6,03 | 39 | 95 |
| | | 6,06 | 40 | 96 |
| | | 6,1 | 41 | 96 |
| | | 6,23 | 43 | 97 |
| | | 6,31 | 45 | 98 |
| | | 7,18 | 62 | 100 |
| | | 7,56 | 68 | 100 |
| | | 8,09 | 77 | 100 |
| | | 9,4 | 90 | 100 |
To test whether the identified genes could also be useful as prognostic biomarkers of relapse or death, first, we compared FC values between patients with MRD-/- and patients who relapsed using RT-qPCR analysis. Unfortunately, we did not have the diagnostic samples of the patients who relapsed, so we used the MRD-/- samples as the normal expression value because they have the best response to treatment. We found that DAPK1 and CNKSR3 had higher FC in relapse samples than in MRD-/- samples, while MIR4435-2HG showed a tendency (Fig. 7a). No difference was observed with ITGA6, NPDC1, ASCL2, CTHRC1 and SLC45A3 (Additional file 6).
Because in the previous analysis MIR4435-2HG only showed a tendency, we decided to analyze its expression in a different cohort using normalized RNA-seq counts obtained from the BM of 58 children with recurrent ALL from TARGET Pan-Cancer database. The 5-year survival analysis showed that MIR4435-2HG overexpression is a poor prognostic variable for patient survival (Fig. 7a). Interestingly, average counts in this cohort (7.82) correlate with our cutoff points, which would give a 100% probability of being refractory and an 80% probability of dying. Nevertheless, in this database, DAPK1 and CNKSR3 were not correlated with survival.
Four Cox regression models were performed to determine which variable might affect patient survival. The first model employed the clinical variables used currently to define the risk of death; in this model, no variable conferred risk of death. The second model used the same clinical variables and gene overexpression; in this model, no variable conferred risk either. However, in the third model, when using only MRD and the selected genes, MIR4435-2HG overexpression was the only variable that increased the risk of death 29 times. Similarly, in the fourth model, when evaluating the entire gene profile, MIR4435-2HG overexpression affected the risk of death (Table 3).
Table 3
MULTIPLE COX REGRESSION USING CURRENTLY CLINICAL VARIABLES AND GENE PROFILE
| MODEL 1. CURRENTLY USED CLINICAL VARIABLES | MODEL 2. CURRENTLY USED CLINICAL VARIABLES AND GENE PROFILE | MODEL 3. GENE PROFILE AND MRD | MODEL 4. GENE PROFILE |
PARAMETER | | | p-value | OR (95%CI) | p-value | OR (95%CI) | p-value | OR (95%CI) |
Age | 0.76 | NC | 0.46 | NC | | | | |
WBC | 0.62 | NC | > 0.99 | NC | | | | |
Extramedullar infiltration | > 0.99 | NC | > 0.99 | NC | | | | |
Corticoid response | 0.15 | NC | > 0.99 | NC | | | | |
MRD + day 15 | 0.36 | NC | 0.51 | 0.32 (0.002–11.91) | 0.34 | 0.25 (0.006–3.80) | | |
MRD + final induction | 0.22 | NC | 0.36 | 3.71 (0.1786–121.0) | 0.23 | 3.49 (0.357–37.17) | | |
MIR4435-2HG | | | 0.22 | NC | 0.01 | 29.42 (2.52-1 067) | 0.005 | 16.47 (2.49–157.2) |
DAPK1 | | | > 0.99 | NC | 0.41 | 3.12 (0.16–96.95) | 0.32 | 2.87 (0.26–24.31) |
ITGA6 | | | 0.18 | NC | 0.11 | 0.03 (0.00002–1.59) | 0.14 | 0.09 (0.002–2.27) |
NPDC1 | | | > 0.99 | NC | 0.10 | 13.22 (0.68–631) | 0.09 | 9.40 (0.90–221.5) |
CNKSR3 | | | 0.92 | NC | 0.87 | 1.17 (0.11–9.55) | 0.97 | 0.97 (0.11–7.07) |
ASCL2 | | | > 0.99 | NC | 0.73 | 1.58 (0.08–29.20) | 0.92 | 0.88 (0.05–11.05) |
CTHRC1 | | | > 0.99 | NC | 0.52 | 0.43 (0.02–5.49) | 0.24 | 0.26 (0.01–2.56) |
SCL45A3 | | | > 0.99 | NC | 0.51 | 2.55 (0.08–45.41) | 0.54 | 2.11 (0.11–24.78) |
MRD + indicates minimal residual disease positive. NC, No calculable. WBC, white blood counts. Bold p-values indicate statistical significance (p < 0.05). |
Strikingly, survival analysis showed that patients with MIR4435-2HG overexpression have worse survival; however, it is important to validate this result in a larger cohort of patients (Fig. 7b). However, because MRD is the most used variable to define risk of death, for survival analysis, we considered MRD at the end of induction, effectively showing that MRD- patients have better survival than MRD+ (Fig. 7c). Intriguingly, when we compared survival combining MRD + with MIR4435-2HG overexpression vs. MRD- with down-expression of MIR4435-2HG, we observed that these two variables separate the survival curves better, where patients with MRD + and MIR4435-2HG overexpression presented worse survival (Fig. 7d).
Finally, since more than half of our patients had normal karyotype, we wanted to test whether selected genes could improve risk classification in patients with normal karyotype. Surprisingly, we found that simultaneous overexpression of MIR4435-2HG, DAPK1, and ASLC2 was associated with poorer survival in these patients compared to those who did not overexpress them (Fig. 7e).