Figure 1 shows an example of miRNA detection in the urine of a healthy individual. Not all of the miRNAs we detected in urine were present in every subject at every time point. We aimed to discover personalized biomarker panels that would allow anyone to make informed decisions and would alert them to the need for lifestyle changes or medical advice. In this context, we sought to identify meaningful patterns and predictive trends in the miRNA levels present in patient urine and blood specimens.
First, we performed a literature search and identified a panel of 25 high-fidelity lung cancer biomarkers: miRNAs that had each been linked to lung cancer development, progression and drug resistance in multiple (between three and eight) papers analyzing the blood of lung cancer patients and primary lung tumors. Lung cancer is the leading cause of cancer-related deaths and claims more lives each year than all other major cancers combined. Lung cancers are generally diagnosed at an advanced stage because patients lack symptoms in the early stages of the disease. We measured an increase in the levels of 16 of these 25 miRNAs for one of the healthy individuals in our baseline cohort (Fig. 2). The most commonly published lung cancer biomarkers that we found to increase longitudinally are miRNA-21-3p, miRNA-140-3p and miRNA-93-3p. We did not consider miRNAs reported in the literature to be downregulated in disease, because their levels might appear decreased in urine for physiological reasons other than cancer.
After establishing that it was feasible to identify published cancer miRNAs in urine, we used principal component analysis to investigate whether healthy donors retain similar patterns in their miRNA profiles longitudinally (Fig. 3). To this end, we processed samples from 15 healthy individuals collected every two months at three time points, i.e., 45 samples in total.
This analysis showed that, despite a certain level of variability, (i) the three longitudinal samples from the same healthy individual cluster together, at least for two of the three time points, and (ii) the longitudinal sample triplets of different healthy individuals form separate clusters (in Fig. 3, the sample triplet of each individual is shown in a single color: pink for donor #1, red for donor #2, cyan for donor #3, blue for donor #4, brown for donor #5, yellow for donor #6, orange for donor #7, etc.). This demonstrates that, in normal physiology, urinary miRNA panels can identify the same person longitudinally.
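The clustering analysis described above can be illustrated with a minimal principal component analysis sketch. It is not the pipeline used for Fig. 3: the data are synthetic, the number of miRNAs (50) is reduced for brevity, and the names `pca_project`, `donor_profiles` and `donor_ids` are hypothetical. Only the layout of 15 donors with three time points each (45 samples) is taken from the text.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature (miRNA)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()        # fraction of variance per component
    return scores, explained[:n_components]


# Synthetic stand-in for the real measurements (assumption, not study data):
# each donor has a stable profile plus smaller time-point-to-time-point noise.
rng = np.random.default_rng(0)
n_donors, n_timepoints, n_mirnas = 15, 3, 50
donor_profiles = rng.normal(0.0, 3.0, size=(n_donors, n_mirnas))
noise = rng.normal(0.0, 0.5, size=(n_donors * n_timepoints, n_mirnas))
X = np.repeat(donor_profiles, n_timepoints, axis=0) + noise
donor_ids = np.repeat(np.arange(n_donors), n_timepoints)

scores, explained = pca_project(X, n_components=2)
```

Plotting `scores` colored by `donor_ids` would give a figure of the same kind as Fig. 3. In this synthetic setting, samples from the same donor lie close together only because the simulated within-donor noise is small relative to the between-donor differences.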
One of our aims has been to discover personalized disease-specific miRNA biomarker panels based on significant changes in organ or tissue regulation in disease. As a first step, we aimed to discover population-based panels of biomarkers. The small sizes of our preliminary datasets preclude the use of standard methods, e.g., differential expression analysis (Robinson et al., 2010), so we opted for an information-theoretic computation to identify biomarkers. As the size of our datasets increases, so will our ability to make meaningful biomarker identifications. Several feature selection approaches are available for choosing disease-specific panels, for instance, maximization of mutual information (Jiao et al., 2015) or the “maximum of the minimum criterion” (Bennasar M., 2015). Our preliminary methodology computes the relative entropy, i.e., the Kullback-Leibler (KL) divergence, using an adaptive minimax rate-optimal estimator (Han et al., 2016), to quantify the changes in disease from healthy state(s) to cancerous lesion(s) and malignant tumor(s). The KL divergence is defined as:
$$D(P\|Q) \triangleq \begin{cases} \sum_{i=1}^{S} p_i \ln\frac{p_i}{q_i} & \text{if } P \ll Q, \\ +\infty & \text{otherwise,} \end{cases} \tag{1}$$
where two patient cohorts are considered, \(P=\{p_1, \dots, p_S\}\) and \(Q=\{q_1, \dots, q_S\}\), over a common set of S miRNAs (S = 947 for this dataset). To test this approach for feature selection, we applied Eq. (1) to urine samples from a small cohort of 13 stage IV lung cancer patients. We empirically set the threshold at 1.2 bits of divergence, which allowed us to identify 20 biomarkers discriminative of lung cancer (Fig. 4). All 20 biomarkers have previously been reported in the lung cancer literature based on analyses of primary lung tumors and blood from lung cancer patients (Bao et al., 2018; Wan and Zheng, 2021; Wang et al., 2014). This selection demonstrates the suitability of the method for identifying and selecting disease-specific biomarkers. Although this computation is based on small cohorts and is not patient-specific, it indicates that lung cancer may be detectable in urine samples. Our objective is to provide every individual with the option to make data-driven decisions in the context of preventing the progression of degenerative diseases.
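As an illustration of Eq. (1), the following minimal sketch evaluates the divergence with the naive plug-in estimator (normalized counts substituted directly into the formula). This is a simplification, not the adaptive minimax rate-optimal estimator of Han et al. (2016) used in the study. The function names, the toy counts and the ranking by per-miRNA terms are hypothetical; the text does not specify how the 1.2-bit threshold is applied to individual miRNAs, so no thresholding step is shown. The `base` argument is an assumption that sets the unit: base 2 gives bits, base e gives nats as in Eq. (1).

```python
import numpy as np


def kl_terms(p_counts, q_counts, base=2.0):
    """Per-feature terms p_i * log(p_i / q_i) of the plug-in estimate of D(P||Q)."""
    p = np.asarray(p_counts, dtype=float)
    q = np.asarray(q_counts, dtype=float)
    if p.shape != q.shape:
        raise ValueError("P and Q must be defined over the same set of miRNAs")
    p = p / p.sum()                              # normalize counts to probabilities
    q = q / q.sum()
    terms = np.zeros_like(p)
    support = p > 0                              # convention: 0 * log(0 / q) = 0
    with np.errstate(divide="ignore"):
        # Where q_i = 0 but p_i > 0 the term is +inf, matching the
        # "otherwise" branch of Eq. (1) (P not absolutely continuous w.r.t. Q).
        terms[support] = p[support] * np.log(p[support] / q[support]) / np.log(base)
    return terms


def kl_divergence(p_counts, q_counts, base=2.0):
    """Plug-in estimate of D(P||Q), in units set by the logarithm base."""
    return float(kl_terms(p_counts, q_counts, base).sum())


def rank_features(p_counts, q_counts, base=2.0):
    """Indices of features sorted by decreasing contribution to D(P||Q)."""
    return np.argsort(-kl_terms(p_counts, q_counts, base), kind="stable")


# Toy counts for three hypothetical miRNAs (not study data).
disease_counts = [2, 1, 1]
healthy_counts = [1, 1, 2]
divergence_bits = kl_divergence(disease_counts, healthy_counts, base=2.0)
ranking = rank_features(disease_counts, healthy_counts)
```

With small cohorts the plug-in estimate is known to be biased, which is one motivation for estimators such as the one cited above.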