A Comprehensive Index (CIB) with Combination of Consistency in both Case Control and Cohort Study to Determine the Efficacy of a Biomarker

doi:10.21203/rs.3.rs-111055/v1

Research

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Background: This study aimed to evaluate biomarkers using the comprehensive index of biomarker (CIB), which combines the consistency rate in a case-control study (Youden index, Yen) with the consistency rate in a cohort study (Crc), to determine the efficacy of a biomarker.

Methods: The CIB is the geometric mean of Yen and Crc. Simulated data were generated to examine how CIB relates to sensitivity (Sen), specificity (Spe), and receiver operating characteristic (ROC) analysis for biomarkers.

Results: CIB was related to the Crc values and to the ROC analysis. For biomarkers with the same Yen, a higher Spe indicated better diagnostic power and a higher Sen indicated better joint action. Although Yen is the index commonly used to evaluate the effectiveness of a biomarker, the Yen value was substantially larger than the CIB value at moderate Spe, indicating overestimation.

Conclusion: The CIB, which combines consistency in both case-control and cohort studies, could be a more reasonable measure. It could provide a better understanding of the power of a biomarker and would be better suited to evaluating biomarkers from new systems or concepts.

Biomedical Engineering

Keywords: biomarker, cohort study, diagnosis, Youden index

Background

One of the main purposes of seeking biomarkers is the diagnosis of disease. However, most studies that seek to identify biomarkers use a case-control design rather than a cohort design [1-3]. In case-control studies, the potential relationship between a biomarker and the disease is examined by comparing the frequencies of the biomarker in diseased and non-diseased subjects. The efficacy of a biomarker is normally described in terms of consistency, expressed as the Youden index (Yen) [2-4].

In a cohort study, a suspected biomarker is treated as an exposure factor, and exposed and unexposed subjects are followed until they develop the disease. This design is chronologically consistent with diagnosis, in which the disease is inferred from the biomarker; a cohort study is therefore a stronger test of a biomarker [5-7].

In cohort studies, the difference in disease incidence between the exposed and non-exposed groups, which is the consistency rate in a cohort study (Crc), indicates the role of the observed factor in the disease's pathogenesis [8-10]. A definite relationship links the results of a case-control study and a cohort study [11]. In this relationship, Pe and Pn represent the disease incidence in the exposed (biomarker-positive) and non-exposed (biomarker-negative) groups, respectively; Pd and Pc represent the frequencies of the observed factor (biomarker) in the disease group and the control group, respectively, in the case-control study; m represents the incidence in the total population (disease is generally a low-probability event, so m was set to 1% in the present study); and Crc represents the consistency rate in a cohort study.
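As a hedged sketch, one form of this relationship that is consistent with the definitions of Pe, Pn, Pd, Pc, and m follows from applying Bayes' theorem to the case-control frequencies. This form is an assumption made for illustration, not a quotation of [11]. It does reproduce Crc = 0.082 for Pd = 0.90, Pc = 0.10, and m = 0.01, the value quoted in the example that follows.

```latex
P_e = \frac{m\,P_d}{m\,P_d + (1-m)\,P_c}, \qquad
P_n = \frac{m\,(1-P_d)}{m\,(1-P_d) + (1-m)\,(1-P_c)}, \qquad
\mathrm{Crc} = P_e - P_n
```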

The results of a case-control study and a cohort study are not always parallel. For example, if the probability of a positive biomarker is 0.85 in the disease group and 0.05 in the control group, the Yen is 0.80 (0.85 − 0.05) but the Crc is only 0.145. When the cardinal number (the frequency in the control group) is larger (0.90 vs 0.10, Yen = 0.80), the Crc is 0.082. The Yen and Crc therefore differ markedly for a low-probability event (say m = 0.01). Because the occurrence of disease is a low-probability event, the Yen from a case-control design is much larger than the Crc, indicating overestimation. This is a serious problem for determining the efficacy of a biomarker. In the present study, we propose a comprehensive index of biomarker (CIB) that combines consistency in both case-control and cohort studies to determine the efficacy of a biomarker.
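As an illustrative check of this example, the following Python sketch derives Crc from case-control frequencies. It assumes that Pe and Pn are obtained from Pd, Pc, and m by Bayes' theorem; this conversion and the function name `cohort_consistency` are assumptions made for illustration, not the author's code. The first pair uses 0.85 and 0.05, the values in the subtraction shown in the example.

```python
def cohort_consistency(pd, pc, m=0.01):
    """Crc = Pe - Pn, with Pe and Pn derived from case-control
    frequencies by Bayes' theorem (assumed form of the relationship)."""
    # Incidence among biomarker-positive (exposed) subjects
    pe = m * pd / (m * pd + (1 - m) * pc)
    # Incidence among biomarker-negative (non-exposed) subjects
    pn = m * (1 - pd) / (m * (1 - pd) + (1 - m) * (1 - pc))
    return pe - pn


for pd, pc in [(0.85, 0.05), (0.90, 0.10)]:
    yen = pd - pc
    crc = cohort_consistency(pd, pc)
    print(f"Pd={pd:.2f}  Pc={pc:.2f}  Yen={yen:.2f}  Crc={crc:.3f}")
```

Both pairs have the same Yen, yet the pair with the larger control-group frequency has the smaller Crc, which is the discrepancy the example describes.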

Methods

Calculation of CIB

The analysis model considers consistency in both the case-control study and the cohort study to determine the efficacy of a biomarker. The efficacy of a biomarker is normally described in terms of Yen, which is the sum of the positive rate of the biomarker in the disease group and the negative rate of the biomarker in the control group, minus 1:

Yen = Pd + (1 − Pc) − 1 = Pd − Pc

where Pd and Pc represent the observed frequencies of a biomarker in the disease group and the control group, respectively, from the case-control study.

The consistency rate in a cohort study (Crc) is the sum of the incidence in the exposed group (the biomarker-positive group) and the healthy rate in the non-exposed group (the biomarker-negative group), minus 1:

Crc = Pe + (1 − Pn) − 1 = Pe − Pn

where Pe and Pn represent the incidence in the exposed group and non-exposed group, respectively, from the cohort study.

We define the comprehensive index (CIB) as the geometric mean of Yen and Crc:

CIB = √(Yen × Crc)

The geometric mean is used because it is pulled toward the smaller of the two values. The CIB ranges from 0 to 1; a larger CIB implies a more powerful biomarker.
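The calculation can be sketched in Python as follows. The sketch assumes that Crc is derived from the case-control frequencies Pd and Pc and the population incidence m by Bayes' theorem. That conversion, the function name `cib_components`, and the decision to reject markers with Pd ≤ Pc are illustrative assumptions.

```python
import math


def cib_components(pd, pc, m=0.01):
    """Return (Yen, Crc, CIB) for a biomarker with positive rate pd in the
    disease group and pc in the control group, at population incidence m."""
    yen = pd - pc
    if yen <= 0:
        # Assumption: the index is only evaluated for markers that are
        # more frequent in the disease group than in the control group.
        raise ValueError("expected pd > pc")
    pe = m * pd / (m * pd + (1 - m) * pc)                    # incidence if positive
    pn = m * (1 - pd) / (m * (1 - pd) + (1 - m) * (1 - pc))  # incidence if negative
    crc = pe - pn
    return yen, crc, math.sqrt(yen * crc)                    # geometric mean


yen, crc, cib = cib_components(0.95, 0.05)
print(f"Yen={yen:.3f}  Crc={crc:.3f}  CIB={cib:.3f}  arithmetic mean={(yen + crc) / 2:.3f}")
```

For this marker the geometric mean lies well below the arithmetic mean of Yen and Crc, showing how a small Crc limits the index even when Yen is high.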

Evaluation of ROC analysis

Receiver operating characteristic (ROC) analysis is the method commonly used to evaluate the effectiveness of a diagnosis made using a biomarker [1,2,12]. In the present study, we examined whether ROC analysis remains informative when biomarkers are evaluated by CIB.

Four sets of normally distributed random numbers (100 ± 20, 115 ± 20, 125 ± 20, and 140 ± 20; n = 5000 each) were generated using SPSS statistical software (IBM Corp., Armonk, NY, USA) and combined into three models. Model A consisted of the 100 ± 20 and 115 ± 20 datasets; Model B consisted of the 100 ± 20 and 125 ± 20 datasets; and Model C consisted of the 100 ± 20 and 140 ± 20 datasets. ROC analysis was performed for each model, as shown in Figure 1.
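A comparable simulation can be sketched with the Python standard library in place of SPSS. The sketch assumes that "± 20" denotes the standard deviation and that the 100 ± 20 dataset serves as the control group in every model. The seed and the names `auc_rank` and `results` are illustrative. The AUC is estimated with the rank-sum (Mann-Whitney) statistic and compared with the closed-form value for two normal distributions with equal variance.

```python
import random
from statistics import NormalDist


def auc_rank(controls, cases):
    """AUC as the probability that a random case scores above a random
    control, computed from the rank sum of the cases. Ties are ignored,
    which is reasonable for continuous simulated values."""
    labelled = sorted([(v, 0) for v in controls] + [(v, 1) for v in cases])
    rank_sum = sum(rank for rank, (_, is_case) in enumerate(labelled, start=1) if is_case)
    n1, n0 = len(cases), len(controls)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
N, SD = 5000, 20.0
controls = [rng.gauss(100, SD) for _ in range(N)]

results = {}
for name, mean in [("A", 115), ("B", 125), ("C", 140)]:
    cases = [rng.gauss(mean, SD) for _ in range(N)]
    simulated = auc_rank(controls, cases)
    # Closed form for two normals with equal SD: Phi(delta / (SD * sqrt(2)))
    theoretical = NormalDist().cdf((mean - 100) / (SD * 2 ** 0.5))
    results[name] = (simulated, theoretical)
    print(f"Model {name}: simulated AUC={simulated:.3f}  theoretical AUC={theoretical:.3f}")
```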

Evaluation of sensitivity and specificity

Most studies that attempt to identify biomarkers use a case-control design rather than a cohort design. In case-control studies, the potential relationship between a biomarker and the disease is examined by comparing the frequencies of the biomarker in the diseased and non-diseased (control) groups. With this approach, biomarkers are assessed in individuals who already have the disease, and the power of a biomarker is typically expressed as the positive rate of the biomarker in the disease group (sensitivity, Sen) and the negative rate of the biomarker in the control group (specificity, Spe) [4]. However, biomarkers with the same Youden index may differ in diagnostic power, and it is unclear whether Sen or Spe is more relevant to CIB for such biomarkers. If the cardinal number (the value in the control group) is relatively small, and Spe is therefore higher, the CIB could change even though the biomarkers have the same Yen. Sen and Spe in the case-control study were evaluated on the basis of the CIB values shown in Table 1.

Table 1

Evaluation of sensitivity (Sen) and specificity (Spe) in a case-control study based on comprehensive index of biomarker (CIB)
Higher Sen with lower Spe				Higher Spe with lower Sen
Sen	1-Spe	Yen	CIB	Sen	1-Spe	Yen	CIB
0.999	0.500	0.499	0.020	0.500	0.001	0.499	0.643
0.999	0.400	0.599	0.025	0.600	0.001	0.599	0.715
0.999	0.300	0.699	0.033	0.700	0.001	0.699	0.781
0.999	0.200	0.799	0.048	0.800	0.001	0.799	0.842
0.999	0.100	0.899	0.092	0.900	0.001	0.899	0.899
The incidence in the total population is considered as 1% for calculating CIB

Combination of two biomarkers based on CIB

Under ideal conditions, a combination of two biomarkers would be more powerful than a single biomarker. However, it is unclear whether combinations of biomarkers that have the same individual CIB also have similar combined power (CIB). We used simulated data to address this question.

We assumed genetic markers whose frequencies follow those expected under panmixia (Hardy-Weinberg equation) and generated the simulated data (1 and 0 standing for positive and negative, respectively) from random numbers on the SPSS platform. The frequencies in each group were set by design; each group included 5000 cases (n = 5000) and two items (genetic markers). Within a group, the two items had the same allele frequency and their positive results were distributed independently.

Two simulated data groups were selected as the disease group and the control group according to the design, and the CIB was calculated (m = 1%). The joint action of the two markers was evaluated with binary logistic regression [4], and the CIB was then recalculated for the combination.

Relationship between Yen and CIB

Yen is the index commonly used to evaluate the effectiveness of a biomarker or of a diagnosis made using a biomarker, so the relationship between Yen and CIB needs to be understood. A series of Yen values with a moderate cardinal number was generated from simulated data, as shown in Table 2. A scatter diagram was plotted with Yen on the x-axis and CIB on the y-axis.

Table 2

Observation of relationship between Youden index (Yen) and comprehensive index of biomarker (CIB)
Sen	1-Spe	Yen	CIB	Sen	1-Spe	Yen	CIB
0.55	0.45	0.10	0.020	0.80	0.20	0.60	0.148
0.60	0.40	0.20	0.041	0.85	0.15	0.70	0.191
0.65	0.35	0.30	0.062	0.90	0.10	0.80	0.256
0.70	0.30	0.40	0.087	0.95	0.05	0.90	0.380
0.75	0.25	0.50	0.114	1.00	0.00	1.00	1.000
Sen: sensitivity; Spe: specificity; the incidence in the total population is considered as 1% for calculating CIB.
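The CIB column of Table 2 can be recomputed with a short Python sketch. It assumes that Crc is obtained from Sen, 1-Spe, and m = 1% by Bayes' theorem and that CIB is the geometric mean of Yen and Crc. The Bayes conversion and the names `cib` and `table2` are illustrative assumptions.

```python
import math


def cib(sen, fpr, m=0.01):
    """CIB for a marker with sensitivity sen and false-positive rate
    fpr (= 1 - Spe) at population incidence m."""
    yen = sen - fpr
    pe = m * sen / (m * sen + (1 - m) * fpr)                    # incidence if positive
    pn = m * (1 - sen) / (m * (1 - sen) + (1 - m) * (1 - fpr))  # incidence if negative
    return math.sqrt(yen * (pe - pn))


table2 = []
for i in range(10):
    sen = (55 + 5 * i) / 100   # 0.55, 0.60, ..., 1.00
    fpr = 1 - sen              # Table 2 pairs each Sen with 1-Spe = 1 - Sen
    table2.append((sen, fpr, sen - fpr, cib(sen, fpr)))

for sen, fpr, yen, value in table2:
    print(f"Sen={sen:.2f}  1-Spe={fpr:.2f}  Yen={yen:.2f}  CIB={value:.3f}")
```

Plotting the third element of each tuple against the fourth gives the kind of scatter diagram described for Yen and CIB.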

Results

For the ROC analysis simulation, the sample size of each simulated dataset was 5000, and the results for the case-control study are shown in Table 3. CIB increased as the area under the curve (AUC) of the ROC analysis increased. Thus, ROC analysis could be used as a reference for CIB.

Table 3

Relationship between the ROC analysis and comprehensive index of biomarker (CIB)
Model	Cut-off	Sen	1-Spe	Yen	Crc	CIB
Model A (AUC = 0.701)	145.0	0.07	0.01	0.06	0.06	0.058
	132.5	0.19	0.05	0.14	0.03	0.063
	125.5	0.30	0.10	0.20	0.02	0.066
	116.7	0.46	0.20	0.26	0.02	0.064
	110.6	0.58	0.30	0.28	0.01	0.061
	105.5	0.68	0.40	0.28	0.01	0.057
	100.2	0.77	0.50	0.27	0.01	0.054
Model B (AUC = 0.814)	144.2	0.17	0.01	0.16	0.14	0.149
	131.5	0.37	0.05	0.32	0.06	0.142
	124.9	0.49	0.10	0.39	0.04	0.127
	116.0	0.67	0.20	0.47	0.03	0.116
	110.2	0.77	0.30	0.47	0.02	0.102
	104.7	0.84	0.40	0.44	0.02	0.089
	99.7	0.89	0.50	0.39	0.02	0.078
Model C (AUC = 0.922)	148.0	0.35	0.01	0.34	0.25	0.294
	133.0	0.65	0.05	0.60	0.11	0.260
	126.1	0.76	0.10	0.66	0.07	0.213
	116.9	0.89	0.20	0.69	0.04	0.169
	110.8	0.93	0.30	0.63	0.03	0.136
	105.3	0.96	0.40	0.56	0.02	0.113
	99.9	0.98	0.50	0.48	0.02	0.096
AUC: Area Under Curve; Sen: sensitivity; Spe: specificity; Yen: Youden index; Crc: consistency rate in cohort study; the incidence in the total population is considered as 1% for calculating CIB
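The pattern in Table 3 can be explored analytically, without random sampling, using the sketch below. It assumes normal distributions with a standard deviation of 20, cut-offs placed at fixed false-positive rates of the control distribution, and a Bayes' theorem conversion from Sen and 1-Spe to Crc. All names are illustrative. Because the table is based on simulated samples, the analytic values are expected to be close to the tabulated ones but not identical.

```python
import math
from statistics import NormalDist


def cib_parts(sen, fpr, m=0.01):
    """Return (Yen, Crc, CIB) for sensitivity sen and false-positive rate fpr."""
    yen = sen - fpr
    pe = m * sen / (m * sen + (1 - m) * fpr)                    # incidence if positive
    pn = m * (1 - sen) / (m * (1 - sen) + (1 - m) * (1 - fpr))  # incidence if negative
    crc = pe - pn
    return yen, crc, math.sqrt(yen * crc)


control = NormalDist(100, 20)
models = {"A": NormalDist(115, 20), "B": NormalDist(125, 20), "C": NormalDist(140, 20)}
fprs = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50]

curves = {}
for name, case in models.items():
    rows = []
    for fpr in fprs:
        cutoff = control.inv_cdf(1 - fpr)   # value exceeded by a fraction fpr of controls
        sen = 1 - case.cdf(cutoff)          # fraction of cases above the cut-off
        yen, crc, cib = cib_parts(sen, fpr)
        rows.append((fpr, cutoff, sen, yen, crc, cib))
        print(f"Model {name}  cut-off={cutoff:6.1f}  Sen={sen:.2f}  1-Spe={fpr:.2f}  "
              f"Yen={yen:.2f}  Crc={crc:.2f}  CIB={cib:.3f}")
    curves[name] = rows
```

In this sketch, as in Table 3, the cut-off that maximizes Yen for Model C is not the cut-off that maximizes CIB: CIB favors stricter cut-offs with a lower false-positive rate.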

The Sen and Spe of biomarkers in a case-control study were evaluated on the basis of CIB values, as shown in Table 1. The values indicate that, for biomarkers with the same Yen, a higher Spe (a lower false-positive rate) is associated with better diagnostic power (CIB).

Table 4 shows the single-marker and combined CIB values for biomarkers that have the same single-marker CIB but different cardinal numbers. Combining two biomarkers increased the power (the CIB increased when two biomarkers were combined); however, the combined CIBs differed between the marker pairs even though the single-marker CIBs were the same.

Table 4

Combination of two biomarkers with the same CIB but different cardinal numbers
Marker	Group	One marker		Two markers (A₁A₂ or B₁B₂) combined
		Positive	CIB	Positive	CIB
A Lower cardinal number	Disease	0.450	0.176	0.698	0.196
	Control	0.050		0.098
B Higher cardinal number	Disease	0.915	0.176	0.837	0.371
	Control	0.200		0.040
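The values in Table 4 can be approximated with the following sketch. The study evaluated joint action with binary logistic regression. The sketch instead assumes two independent markers with equal positive rates and fixed decision rules: "either marker positive" for pair A and "both markers positive" for pair B. These rules are an assumption, chosen because they reproduce the combined positive rates in the table. The Bayes' theorem conversion to Crc and all names are also illustrative.

```python
import math


def cib(pd, pc, m=0.01):
    """CIB from the positive rate in the disease group (pd) and the
    control group (pc) at population incidence m."""
    yen = pd - pc
    pe = m * pd / (m * pd + (1 - m) * pc)                    # incidence if positive
    pn = m * (1 - pd) / (m * (1 - pd) + (1 - m) * (1 - pc))  # incidence if negative
    return math.sqrt(yen * (pe - pn))


def combine(p, rule):
    """Positive rate of a test built from two independent markers that
    each have positive rate p."""
    if rule == "either":
        return 1 - (1 - p) ** 2
    if rule == "both":
        return p ** 2
    raise ValueError(f"unknown rule: {rule}")


pairs = {
    "A": {"disease": 0.450, "control": 0.050, "rule": "either"},  # lower cardinal number
    "B": {"disease": 0.915, "control": 0.200, "rule": "both"},    # higher cardinal number
}

results = {}
for name, cfg in pairs.items():
    single = cib(cfg["disease"], cfg["control"])
    pd2 = combine(cfg["disease"], cfg["rule"])
    pc2 = combine(cfg["control"], cfg["rule"])
    results[name] = (single, pd2, pc2, cib(pd2, pc2))
    print(f"Pair {name}: single CIB={single:.3f}  combined positive rates="
          f"{pd2:.3f}/{pc2:.3f}  combined CIB={results[name][3]:.3f}")
```

Under these assumptions, the pair with the higher sensitivity gains far more from combination because the "both positive" rule sharply lowers its false-positive rate.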

The relationship between Yen and CIB is shown in Fig. 2. The scatter diagram shows that, if a useful CIB is defined as 0.5 or higher, the Yen must exceed 0.90 for the CIB to reach 0.5 when the incidence in the total population is 0.01.
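The Yen at which CIB reaches 0.5 can be located numerically. The sketch below assumes the symmetric design of Table 2, in which Sen and 1-Spe sum to 1, so that Sen = (1 + Yen)/2. It also assumes a Bayes' theorem conversion to Crc, and it uses bisection as the search method. The function names are illustrative.

```python
import math


def cib(sen, fpr, m=0.01):
    """CIB for sensitivity sen and false-positive rate fpr at incidence m."""
    yen = sen - fpr
    if yen <= 0:
        return 0.0   # assumption: a marker with no discrimination has zero power
    pe = m * sen / (m * sen + (1 - m) * fpr)                    # incidence if positive
    pn = m * (1 - sen) / (m * (1 - sen) + (1 - m) * (1 - fpr))  # incidence if negative
    return math.sqrt(yen * (pe - pn))


def yen_for_target(target=0.5, m=0.01, tol=1e-6):
    """Smallest Yen (symmetric design) whose CIB reaches target, by bisection.
    Relies on CIB increasing with Yen along this design."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        sen, fpr = (1 + mid) / 2, (1 - mid) / 2
        if cib(sen, fpr, m) < target:
            lo = mid
        else:
            hi = mid
    return hi


threshold = yen_for_target()
print(f"Yen needed for CIB = 0.5 at m = 0.01: {threshold:.3f}")
```

Under these assumptions the threshold falls between 0.94 and 0.95, in line with the observation that the Yen must exceed 0.90.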

Discussion

Biomarkers are used for disease diagnosis; therefore, the CIB could be a more reasonable measure. Even so, the Yen remains important. We therefore suggest the CIB, a comprehensive index that combines consistency in both case-control and cohort studies, to describe quantitatively the power of a biomarker and to evaluate the effectiveness of a biomarker or of a diagnosis made using a biomarker. ROC analysis remains applicable and can be used as a reference for CIB, but CIB reveals additional features. The results indicated that, for biomarkers with the same Yen, a higher Spe could indicate better diagnostic power and a higher Sen could indicate better joint action.

Because the CIB ranges from 0 to 1, we propose that a CIB > 0.50 be considered to have clinical value [4]. Accordingly, a Yen over 0.9 is needed to reach clinical value (Fig. 2). Therefore, to obtain a high CIB, a combination of two or more biomarkers is necessary.

More importantly, we found that the Yen value from the case-control design was substantially larger than the CIB value, indicating overestimation. A related example is the analysis of genetic associations (screening of genetic markers), which has been successful in mapping genes but clinically disappointing because of inconsistent findings; this has been partly attributed to overestimation in case-control studies. Statistical differences do not necessarily represent strong clinical effects. Except for Mendelian diseases, significant associations are difficult to detect because few genes have a Yen over 0.9. Hence, it might be misleading to rely only on Yen.

It should be noted that, to simplify the calculation, the CIB values in the present study were computed with an assumed incidence (m = 1%) and therefore do not equal the actual CIB values for a specific disease. When the natural incidence of a disease is known, the definite CIB can readily be calculated from case-control data.

Conclusion

The CIB, which combines consistency in both case-control and cohort studies, could be more reasonable than Yen for determining the efficacy of a biomarker. We propose that the CIB provides a better understanding of the power of a biomarker and would be better suited to evaluating biomarkers from new systems or concepts.

Abbreviations

AUC: area under the curve; CIB: comprehensive index of biomarker; Crc: consistency rate in a cohort study; Pc: frequency of a biomarker in the control group in a case-control study; Pd: frequency of a biomarker in the disease group in a case-control study; Pe: incidence in the exposed group in a cohort study; Pn: incidence in the non-exposed group in a cohort study; ROC: receiver operating characteristic; Sen: sensitivity; Spe: specificity; Yen: Youden index

Declarations

Ethics approval and consent to participate

This article does not contain any studies with human participants or animals performed by the authors.

Consent to publish

Not applicable

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.

Competing interests

None declared

Funding

Technological Innovation of Dalian (2018J12SN084)

Authors' Contributions

L.H. conceived the analysis and wrote the final version of the manuscript. All authors have read and approved the manuscript.

Acknowledgments

I would like to thank the native English-speaking scientists of Elixigen Company (Huntington Beach, California) for editing the manuscript.

References

1. Wan L, Zhu H, Gu Y, Liu H. Diagnostic value of trait antinuclear antibodies and multiple immunoglobulin production in autoimmune diseases. J Clin Lab Anal. 2018;32(4):e22361.
2. Hui L, Rixv L, Xiuying Z. A system for tumor heterogeneity evaluation and diagnosis based on tumor markers measured routinely in the laboratory. Clin 2015;48:1241-5.
3. Wan L, Li S, Liu H. Diagnostic usefulness of trait specific IgE and multiple immunoglobulin production in allergic diseases. Int J Clin Exp Med. 2017;10(9):13577-13587.
4. Hui L, Liping G. Statistical estimation of diagnosis with genetic markers based on decision tree analysis of complex disease. Comput Biol Med. 2009;39(11):989-992.
5. Durr-E-Sadaf. How to apply evidence-based principles in clinical dentistry. J Multidiscip Healthc. 2019;12:131-136.
6. Wallace DK. Evidence-based medicine and levels of evidence. Am Orthopt J. 2010;60:2-5.
7. Burns PB, Rohrich RJ, Chung KC. The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg. 2011;128(1):305-10.
8. Hui L, Qigui L, Sashuang R, Xiliang L, Guihong L. Nonspecific changes in clinical laboratory indicators in unselected terminally ill patients and a model to predict survival time based on a prospective observational study. J Transl Med. 2014;12:78.
9. Palmas W. The CONSORT guidelines for noninferiority trials should be updated to go beyond the absolute risk difference. J Clin Epidemiol. 2017;83:6-7.
10. Wenbo L, Congxia B, Hui L. Genetic and environmental-genetic interaction rules for the myopia based on a family exposed to risk from a myopic environment. Gene. 2017;626:305-8.
11. Liu H. Analysing the relationship between cohort and case-control study results based on model for multiple pathogenic factors. Comput Math Methods Med. 2019;2019:7507043.
12. Guang Y, Jie Z, Feng D, Hui L. Surrogate scale for evaluating respiratory function based on complete blood count parameters. J Clin Lab Anal. 2018;32(5):e22385.
