Detecting Dengue Fever in Children Using A Combined Scheme Compared with Individual Algorithms: An APP Development and Usability Study

doi:10.21203/rs.3.rs-551659/v1

Research

This work is licensed under a CC BY 4.0 License

Version 1

Background

Dengue fever (DF) is an important public health issue in Asia. However, the disease is extremely hard to detect using traditional dichotomous (i.e., absent vs. present) evaluations of symptoms. Convolutional neural networks (CNN) and artificial neural networks (ANN) can improve prediction accuracy because they use a large number of parameters for modeling. We hypothesized that a combined scheme of algorithms, including CNN, ANN, the k-nearest neighbors algorithm (KNN), and logistic regression (LR), would improve the accuracy of DF prediction in children.

Methods

We extracted 19 feature variables of DF-related symptoms from 177 pediatric patients (69 diagnosed with DF). Eleven variables were eligible based on their statistical significance in predicting DF risk. Prediction accuracy was assessed on a training set (80%) and a testing set (20%), with the criteria that the area under the receiver operating characteristic curve (AUC) be at least 0.80 and 0.70, respectively, for discriminating DF+ from DF−. The combined scheme and the individual algorithms were compared in two scenarios by using the training set to predict the testing set.

Results

We observed that (i) the k-nearest neighbors algorithm had a poor AUC (<0.50), (ii) LR had a relatively higher AUC (=0.70), and (iii) the three combined alternatives had almost equal AUCs (=0.68), which were smaller than those of the individual algorithms of NaiveBayes and Logistic regression in raw data and NaiveBayes in normalized data.

Conclusion

An LR-based APP was designed to detect DF in children. The 11-item model is suggested to develop the APP for helping patients, family members, and clinicians discriminate DF from other febrile illnesses at an early stage.

Health Economics & Outcomes Research

dengue fever

receiver operating characteristic curve

artificial neural networks

logistic regression

WEKA

Excel module

Dengue virus infection is one of the most common mosquito-borne human viral diseases worldwide [1,2]. The infection causes a flu-like illness with symptoms ranging from mild dengue fever (DF) to severe DF syndrome [3-7]; occasionally, a potentially lethal complication called severe dengue may develop [8]. Since severe dengue was first recognized in the 1950s during dengue epidemics in the Philippines and Thailand [9,10], the incidence of the disease has dramatically increased throughout the world [11,12]. Thus, an application (APP) for self-assessment of DF is needed to help patients, family members, and clinicians identify the disease at an early stage.

1.1 APP required to assess DF at an early stage

DF is frequently found in tropical and sub-tropical climates, especially in urban and semi-urban areas [8]. The 2014 dengue outbreak in Tokyo was notable because it marked the first time in 70 years that Japan had experienced autochthonous transmission [13]; thus, precautions for emerging infectious threats during the 2020 (or later in 2021) Summer Olympics and Paralympics in Tokyo were proposed [14]. Feasible and efficient approaches to assess DF have been published in the literature [1,4,15-18], but no related study has yet developed a useful APP as an accurate and rapid diagnostic screening tool for DF at an early stage in the clinical setting.

Some studies [5, 6, 12] have used univariate analysis to report a presumptive diagnosis of DF, but the results are usually imprecise. On the other hand, multivariate logistic regression cannot accurately distinguish patients with DF from those with other febrile illnesses [19]. Additionally, the sensitivity (Sens) and specificity (Spec) for detecting DF are lower than 0.80, and the area under the receiver operating characteristic curve (AUC) did not exceed 0.90 [4,20-22]. Moreover, because expensive laboratory tests (e.g., Dengue Duo Immunoglobulin M and Rapid Strips, Panbio, Queensland, Australia) are usually required to confirm DF infection [4,12,20], developing new methods to increase the accuracy(ACC) of predicting DF using only eligible symptoms is urgently required.

1.2 Convolution Neural Network, Artificial Neural Networks, and Logistic Regression

DF is often assessed using dichotomous (i.e., absent vs. present) evaluations of symptoms. The dependent variable (DF+ vs. DF−) is traditionally predicted using independent evaluations with summations [4,12,20-22]. Sens, Spec, and AUC are among the most common indicators of DF prediction accuracy. Sens is generally lower than Spec in DF prediction [1,12,23]. Whether the convolution neural network (CNN) [24-27], artificial neural networks (ANN) [28,29], the k-nearest neighbors algorithm (KNN) [30], and/or logistic regression (LR) [31-35] can improve the prediction accuracy of DF must be verified.

1.3 Combination of Algorithms to Improve Prediction Accuracy

The k-nearest neighbors algorithm (KNN) and LR enable a straightforward interpretation of results [29]. The ANN and KNN are widely used classifiers representing two different machine learning concepts [36]. The ANN uses a global data-based optimization method: it is typically trained using all samples to build a single “global” optimization target function that covers the entire feature domain [37]. The primary advantage of ANN is its ability to approximate any function given a sufficiently complex architecture. However, the drawback of ANN is its tendency to overfit the training data during optimization, potentially resulting in poor testing performance [37].

On the other hand, KNN uses a local instance-based learning method that adaptively builds different local approximations to the target function depending on the “neighborhood” of the test case. KNN has an advantage when the target function is very complex, as the function can generally be described by a collection of less complex local approximations [37]. Nonetheless, the primary disadvantage of KNN is its sensitivity to data noise (in selecting both neighbors and features) [37].

Nearest class mean (NCM) classifiers [38] use the class mean in the algorithm to obtain a better classification. Because KNN is sensitive to the neighborhood size k, the simple majority vote can easily degrade KNN-based classification performance, especially with small training samples [39]. The local mean representation-based k-nearest neighbor classifier (LMRKNN) was proposed to improve the prediction accuracy. We were therefore motivated to develop a combined scheme of algorithms that could improve the prediction accuracy of DF in children.

1.4 A Challenge Encountered in the Current Study

A challenge encountered by many researchers is the lack of an accurate diagnostic screening tool for predicting DF. Thus far, no published study has described the use of the CNN/ANN/KNN/LR algorithms in an APP to assess DF at an early stage. In the present study, we applied a combined scheme to build a DF prediction model and verified whether its Sens and Spec were higher than those of its individual counterparts for predicting DF.

1.5 Study Purposes

One hypothesis was made in this work: a combined scheme of algorithms can improve the prediction accuracy of DF in children. Three tasks were set: (i) extracting feature variables, (ii) comparing the combined scheme with individual algorithms, and (iii) developing an APP to help alert patients, family members, and clinicians to the possibility of DF at an early stage.

2.1 Data source

A sample of 177 pediatric patients (age ≤16 years; DF+: 69; DF−: 108) was extracted from a previous article [4]; see Appendix 1. Feature variables were extracted from the collected 19 DF-related symptoms, including (1) personal history of DF, (2) family history of DF, (3) mosquito bites within the previous two weeks, (4) fever ≥39°C, (5) biphasic fever, (6) erythema, (7) skin rash, (8) petechiae, (9) headache, (10) myalgia, (11) abdominal pain, (12) vomiting, (13) soft (watery) stool, (14) cough, (15) sore throat, (16) anorexia, (17) weak sense, (18) bone pain (arthralgia), and (19) flushed skin.

All data used in this study were downloaded from a previous article [4]. Given its design, this study does not require ethical approval according to the regulations of the Taiwan Ministry of Health and Welfare.

2.2 Combination of Algorithms to Improve DF Classification

Four prediction models (CNN, ANN, KNN, and LR) were proposed, and their combined DF classification was compared with that of the individual algorithms. The CNN and ANN models have been described with Microsoft (MS) Excel modules in previous studies [24-26].

2.2.1 The KNN Model Deposited In MS Excel

A KNN model with an MS Excel module is shown in Figure 1. After extracting feature variables, the KNN algorithm was applied with the following steps:

Step 1: Computing the Distance for Each Paired Case (panel A in Figure 1)

In the n-case training sample, n rows and n columns record the Euclidean distance between each pair of cases. For instance, D2 (=0) is the distance from the first case to itself, and E2 (=9.83) is the distance between the first and the second cases.

Step 2: Sorting the Distances in Columns for Each Case (panel B in Figure 1)

For the case in each row, all distances were sorted across columns in ascending order. The shortest distance (=0) is placed in column D, followed by the next shortest distances in the row (e.g., 6.48 and 6.84 in columns E and F for the first case in row 2).

Step 3: Labeling the Classifications Sorted by Distances in Columns for Each Row (panel C in Figure 1)

All sorted distances in columns were replaced with the corresponding digital labels (e.g., 1 and 0 for classification). For instance, the last four cases are labeled with 1 in the first three columns from D to F, and the first five cases with 0.

Step 4: Determining the k Value

We simulated k values (i.e., the number of columns used to predict the classification) from 1 to 10 and selected the k value with the highest accuracy rate for classification.

Step 5: Using the Mode Function in MS Excel to Classify Each Case From Its k Nearest Labels

An example of k=3 is shown at the bottom of Figure 1. Before classification, the red circle could be assigned to either class A or class B. The labels of the three nearest cases are compared using the mode function in MS Excel. In this case, the circle is assigned to Class A because the mode among its three nearest neighbors is the class of the yellow squares (two of the three). As such, the red circle is classified by the majority vote of its k (=3) neighbors in KNN.
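The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Excel module itself: the function names and the toy data are hypothetical, and the leave-one-out search for k is an assumption, since the source states only that k values from 1 to 10 were simulated.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Step 1: Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by the mode of the labels of its k nearest training cases."""
    # Step 2: sort training cases by distance to the query, shortest first.
    ranked = sorted(zip(train_X, train_y), key=lambda case: euclidean(case[0], query))
    # Step 3: keep only the class labels, in order of distance.
    labels = [label for _, label in ranked[:k]]
    # Step 5: the most frequent label among the k nearest is the prediction.
    return Counter(labels).most_common(1)[0][0]


def choose_k(train_X, train_y, k_values=range(1, 11)):
    """Step 4 (assumed procedure): pick the k with the best leave-one-out accuracy."""
    best_k, best_acc = None, -1.0
    for k in k_values:
        hits = 0
        for i, (x, y) in enumerate(zip(train_X, train_y)):
            rest_X = train_X[:i] + train_X[i + 1:]
            rest_y = train_y[:i] + train_y[i + 1:]
            hits += knn_predict(rest_X, rest_y, x, k) == y
        acc = hits / len(train_X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical toy data: two well-separated clusters labeled 0 and 1.
toy_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
toy_y = [0, 0, 0, 1, 1, 1]
label_near_origin = knn_predict(toy_X, toy_y, [0.5, 0.5], k=3)
label_far_corner = knn_predict(toy_X, toy_y, [5.5, 5.5], k=3)
best_k, best_acc = choose_k(toy_X, toy_y)
```

Unlike panel A of Figure 1, where each case's distance to itself (0) occupies the first column, the leave-one-out search here excludes the case being classified so that it cannot vote for its own label.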

2.2.2 The LR Model Deposited In MS Excel

The LR model with an MS Excel module is shown in Figure 2 with the following steps:

Step 1: Actual labels(in Quadrant III of Figure 2)

In the n-case training sample, there are classes 0 and 1 in green and red, respectively.

Step 2: LR Model Building(in Quadrant IV of Figure 2)

The LR model was built in Quadrant IV of Figure 2. The logit formula (=a+WX) was set for each case.

Step 3: The Probability of Classification (in Quadrant I of Figure 2)

The probability (prob = 1/(1 + exp(−logit)) = exp(logit)/(1 + exp(logit))) was also assigned for each case.

Step 4: The Predicted Labels (in Quadrant II of Figure 2)

The predicted labels were set(i.e., as 0 if prob.<0.5, otherwise as 1).

Step 5: Minimizing the model Residual (in Quadrant III of Figure 2)

The model residual was determined by the MS Excel function SUMXMY2(range1, range2), where range1 was composed of the actual labels for each case in two columns (i.e., (0,1) as DF+ and (1,0) as DF−), and range2 was constructed from the corresponding probabilities of DF+ and DF−.

The MS Excel Solver was applied to estimate parameters a and W in Quadrant IV. That is, the intercept and the variable coefficients were calibrated by iterating Steps 1 to 4 in the model optimization process.
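The same procedure can be sketched outside Excel. The following Python example is a hedged illustration under stated assumptions: plain gradient descent stands in for the Solver add-in (the source does not describe Solver's internal method), and the function names, learning rate, step count, and toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: prob = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_prob(a, w, x):
    """Probability of DF+ for one case, with logit = a + sum(w_j * x_j)."""
    return sigmoid(a + sum(wj * xj for wj, xj in zip(w, x)))


def residual(a, w, X, y):
    """Sum of squared differences between labels and probabilities over both
    columns (DF+ and DF-), analogous to SUMXMY2 on the two ranges."""
    total = 0.0
    for x, label in zip(X, y):
        p = predict_prob(a, w, x)
        total += (label - p) ** 2 + ((1 - label) - (1 - p)) ** 2
    return total


def fit(X, y, rate=0.1, steps=5000):
    """Minimize the residual by gradient descent (an assumed stand-in for Solver)."""
    a, w = 0.0, [0.0] * len(X[0])
    for _ in range(steps):
        grad_a, grad_w = 0.0, [0.0] * len(w)
        for x, label in zip(X, y):
            p = predict_prob(a, w, x)
            # Derivative of 2 * (label - p)^2 with respect to the logit.
            g = -4.0 * (label - p) * p * (1.0 - p)
            grad_a += g
            for j, xj in enumerate(x):
                grad_w[j] += g * xj
        a -= rate * grad_a
        w = [wj - rate * gj for wj, gj in zip(w, grad_w)]
    return a, w


def predict_label(a, w, x):
    """Predicted label: 0 if prob < 0.5, otherwise 1."""
    return 0 if predict_prob(a, w, x) < 0.5 else 1


# Hypothetical one-variable toy data, not the study data.
toy_X = [[0.0], [1.0], [2.0], [3.0]]
toy_y = [0, 0, 1, 1]
a_hat, w_hat = fit(toy_X, toy_y)
toy_pred = [predict_label(a_hat, w_hat, x) for x in toy_X]
```

Because the two label columns are complementary, the residual equals twice the squared error on the DF+ column alone, so minimizing either quantity yields the same parameters.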

After parameters were estimated, the model accuracies in training and testing sets can be obtained through the following equations[27,28]:

The accuracy was determined by observing the higher Sensitivity(SENS), Specificity(SPEC), precision, accuracy, and AUC in both models. The definitions are listed below:

True positive (TP) = the number of true DF cases predicted as DF, (1)

True negative (TN) = the number of true Non-DF cases predicted as Non-DF, (2)

False positive (FP) = the number of Non-DF cases minus TN, (3)

False negative (FN) = the number of DF cases minus TP, (4)

SENS = Sensitivity = true positive rate (TPR) = TP ÷ (TP + FN), (5)

SPEC = Specificity = true negative rate (TNR) = TN ÷ (TN + FP), (6)

Precision = positive predictive value (PPV) = TP ÷ (TP + FP), (7)

ACC = accuracy = (TP + TN) ÷ N, (8)

N = TP + TN + FP + FN, (9)

AUC = (1 − Specificity) × Sensitivity ÷ 2 + (Sensitivity + 1) × Specificity ÷ 2, (10)

SE for AUC = √(AUC × (1 − AUC) ÷ N), (11)

95% CI = AUC ± 1.96 × SE for AUC, (12)
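Equations (1) to (12) can be checked with a short worked example. The sketch below is illustrative only: the function name is hypothetical, and the confusion-matrix counts are made up for a 35-case testing set (the source does not report the underlying counts). Note that Equation (10) simplifies algebraically to (Sensitivity + Specificity) ÷ 2.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute the indicators of Equations (5) to (12) from confusion-matrix counts."""
    n = tp + tn + fp + fn                                 # Eq. (9)
    sens = tp / (tp + fn)                                 # Eq. (5)
    spec = tn / (tn + fp)                                 # Eq. (6)
    prec = tp / (tp + fp)                                 # Eq. (7)
    acc = (tp + tn) / n                                   # Eq. (8)
    auc = (1 - spec) * sens / 2 + (sens + 1) * spec / 2   # Eq. (10)
    se = math.sqrt(auc * (1 - auc) / n)                   # Eq. (11)
    ci95 = (auc - 1.96 * se, auc + 1.96 * se)             # Eq. (12)
    return {"n": n, "sens": sens, "spec": spec, "prec": prec,
            "acc": acc, "auc": auc, "se": se, "ci95": ci95}


# Hypothetical counts for illustration: 10 DF and 25 Non-DF testing cases.
example = classification_metrics(tp=6, tn=20, fp=5, fn=4)
```

With these illustrative counts, the sensitivity is 0.60, the specificity is 0.80, and the AUC is 0.70, with a standard error of about 0.077.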

2.3 Three Tasks Required to Achieve

Three tasks would be achieved:

2.3.1 Extracting Feature variables(Task 1):

From the 19 observed DF variables mentioned in section 2.1, we performed LR to extract feature variables against the DF by the criterion of Type I error <0.05 shown on a forest plot[40-42].

Feature variables were extracted from 19 items mentioned in section 2.1 via the following steps: (i) standardize each variable to the mean (0) and standard deviation (i.e., SD = 1), and (ii) compare the standardized mean difference (SMD) on a forest plot [40-42].

The Chi-square test was conducted to assess the heterogeneity between variables. The forest plots (confidence interval [CI] plots) were drawn to display the effect estimates and their CIs for each variable.
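A minimal sketch of how a standardized mean difference and its confidence interval might be computed for one symptom variable is given below. The source does not specify which SMD estimator was used, so the pooled-SD form (Cohen's d) with its usual large-sample standard error is an assumption here, and the function name and toy responses are hypothetical.

```python
import math
from statistics import mean, variance


def smd_with_ci(df_pos, df_neg, z=1.96):
    """Standardized mean difference (pooled-SD form) with an approximate 95% CI."""
    n1, n0 = len(df_pos), len(df_neg)
    # Pooled variance from the two sample variances.
    pooled_var = ((n1 - 1) * variance(df_pos) + (n0 - 1) * variance(df_neg)) / (n1 + n0 - 2)
    d = (mean(df_pos) - mean(df_neg)) / math.sqrt(pooled_var)
    # Large-sample standard error of d.
    se = math.sqrt((n1 + n0) / (n1 * n0) + d ** 2 / (2 * (n1 + n0)))
    return d, se, (d - z * se, d + z * se)


# Hypothetical present (1) / absent (0) responses for one symptom.
d, se, ci = smd_with_ci([1, 1, 1, 0], [0, 0, 0, 1])
```

On a forest plot, a variable whose interval excludes zero would be read as differing between the DF and Non-DF groups at the 0.05 level; in this toy example the interval includes zero because the groups are very small.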

2.3.2 Comparing the Combined scheme With Individual Algorithms

2.3.2.1 Comparison Between Algorithms

Two scenarios, non-normalized data (i.e., raw data with present and absent responses on DF) and normalized data (i.e., mean = 0, SD = 1), were applied to compare model accuracy and stability among algorithms, including CNN, ANN, KNN, LR, and others available in WEKA software (University of Waikato, Hamilton, New Zealand) [43], such as Support Vector Machines (SVM) [44], LIBSVM [45], BayesNet, Naïve Bayes [45], Random Forest Classification [47], REPTree [48], Logistic regression [48], artificial neural network (ANN) [49], and CNN [24-26]; see Appendix 1. The criteria of AUC ≥ 0.80 in the training set and AUC ≥ 0.70 in the testing set were used to determine acceptable model accuracy and stability in the prediction of DF.
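The normalized scenario can be illustrated with a small z-score sketch. Two details are assumptions rather than statements from the source: the population standard deviation is used, and the mean and SD are estimated on the training column and then reused for the testing column. The function names and values are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Estimate the mean and (population) SD of one variable on the training set."""
    return mean(train_column), pstdev(train_column)


def standardize(column, mu, sd):
    """Rescale values to mean 0 and SD 1; a constant column maps to zeros."""
    return [(v - mu) / sd if sd > 0 else 0.0 for v in column]


# Hypothetical present (1) / absent (0) responses for one symptom.
train_col = [0, 1, 1, 0]
mu, sd = fit_scaler(train_col)
train_z = standardize(train_col, mu, sd)
test_z = standardize([1, 0], mu, sd)
```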

2.3.2.2 Comparison Within the Combined Scheme

The model accuracies and stabilities within the combined scheme(i.e., CNN/ANN/KNN/LR) were compared based on several scenarios(e.g., including KNN and excluding KNN, etc.) using the mode to determine the classification of DF and Non-DF.

To test the hypothesis that a combined scheme of algorithms can improve the prediction accuracy of DF in children, the accuracy and stability of the combined scheme based on AUC were examined. That is, the hypothesis is supported only if the accuracy and stability of the combined scheme exceed those of the individual algorithms.
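The mode-based combination can be sketched as a simple vote over the labels predicted by each model. With four models a 2–2 tie is possible, and the source does not state how ties were resolved, so the tie rule below (default to DF+) is purely an assumption, as is the function name.

```python
from collections import Counter


def combined_label(votes, tie_label=1):
    """Return the most frequent label among model votes (1 = DF+, 0 = DF-).

    `tie_label` is an assumed rule for ties, which the source does not specify.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


# Hypothetical votes in the order CNN, ANN, KNN, LR.
majority_pos = combined_label([1, 1, 0, 1])
majority_neg = combined_label([0, 0, 1, 0])
tied = combined_label([1, 0, 1, 0])
without_knn = combined_label([1, 0, 0])  # three-model variant excluding KNN
```

Dropping one model, as in the scenario excluding KNN, leaves an odd number of voters and removes the need for a tie rule.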

2.3.3 Developing an APP for patients, family members, and clinicians.

An APP for the detection of DF in children was designed and developed. Model parameters were embedded in the computer module. The results of the classification (i.e., DF+ and DF−) instantly appear on smartphones. The visual representation with binary (i.e., DF+ and DF−) categories is shown on a dashboard displayed on Google Maps.

2.4. Statistical Tools and Data Analysis

IBM SPSS Statistics 22.0 for Windows (SPSS Inc., Chicago, US) and MedCalc 9.5.0.0 for Windows (MedCalc Software, Ostend, Belgium) were used to obtain the descriptive statistics and frequency distributions among groups and to compute the model prediction indicators expressed in Equations (1) to (12). The significance level of Type I errors was set at 0.05. The four proposed models (CNN, ANN, KNN, and LR) were run in MS Excel and deposited in Appendix 1. The study flowchart is presented in Figure 3. The abstract video is provided in Appendices 2 to 5.

3.1 Demographic data of the 177 cases

Sixty-nine pediatric patients (40 [58.0%] males; median age: 10 years; age range: 0–16 years) diagnosed with DF were included in this study (Table 1). One hundred and eight pediatric patients (61 [56.5%] males; median age: 5 years; age range: 0–16 years) with no evidence of DF infection in their medical records were used as the non-DF (reference) group. A chi-squared test at the α level of 0.05 showed that the groups were similar in terms of gender but dissimilar in terms of age.

Table 1

Demographic characteristics of the patients suspected of dengue virus infection
Demographic variable	Category	DF(-) n	DF(-) %	DF(+) n	DF(+) %	Total n	Total %	P-value
Gender	Female	47	43.5	29	42	76	42.9	0.845
	Male	61	56.5	40	58	101	57.1
Age (years)	0–4	48	44.4	11	16.2	59	33.5	0.005
	5–9	24	22.2	20	29.4	44	25
	9–16	36	33.3	37	54.4	73	41.5
Note. P values were determined using the Chi-square test.
DF(+) indicates patients with Dengue Duo IgM Rapid Strips test(+); DF(-), patients with Dengue Duo IgM Rapid Strips test(–)

3.1. Task 1: Feature Variables Extracted from the Data

Of the original 19 items, 11 feature variables with significant differences between the DF and Non-DF groups (p < 0.05) were extracted using the forest plot. Figure 4 [50] shows the SMD method used in the meta-analysis. Eight variables were excluded from the study: (3) mosquito bites within the previous two weeks, (9) headache, (10) myalgia, (13) soft (watery) stool, (14) cough, (15) sore throat, (16) anorexia, and (18) bone pain (arthralgia). The Q-index is 63 (p < 0.05), indicating significant differences among variables.

3.2. Task 2: Comparison of Model Accuracy Among Algorithms

3.2.1 Comparison Between Algorithms

The criteria of AUC ≥ 0.80 in the training set and AUC ≥ 0.70 in the testing set were applied to determine acceptable model accuracy and stability in the prediction of DF. Only six algorithm results across the two scenarios of non-normalized and normalized data are marked as having good accuracy and stability in Table 2 (see the last column with the symbol \(\surd\)): LR (in the MS Excel module; see Appendix 1), NaiveBayes, and Logistic regression in raw (non-normalized) data (panel A), and LR (in the MS Excel module; see Appendix 1), NaiveBayes, and REPTree in normalized data (panel B).

Table 2

Comparison of Model Accuracy Among Algorithms
Scenarios		Training set (accuracy criterion: ROC ≥ 0.80)					Testing set (stability criterion: ROC ≥ 0.70)
Study model	n1/n2	SENS	SPEC	PREC	ACC	ROC	SENS	SPEC	PREC	ACC	ROC	Acceptable
A. Raw data
ANN	142/35	0.9	0.87	0.83	0.88	0.88	0.6	0.76	0.50	0.71	0.68
CNN	142/35	0.86	0.81	0.76	0.83	0.84	0.6	0.68	0.43	0.66	0.64
KNN8	142/35	0.71	0.80	0.71	0.76	0.75	0.2	0.08	0.08	0.11	0.14
LR	142/35	0.85	0.88	0.83	0.87	0.86	0.60	0.80	0.55	0.74	0.70	\(\surd\)
BayesNet	142/35	0.70	0.80	0.71	0.75	0.75	0.80	0.64	0.47	0.69	0.72
NaiveBayes	142/35	0.64	0.87	0.78	0.77	0.89	0.50	0.92	0.71	0.80	0.77	\(\surd\)
Decision tree(j48)	142/35	0.76	0.95	0.92	0.87	0.91	0.40	0.76	0.40	0.66	0.62
REPTree	142/35	0.80	0.80	0.73	0.80	0.82	0.80	0.64	0.47	0.69	0.71
SMO	142/35	0.85	0.81	0.76	0.82	0.83	0.60	0.76	0.50	0.71	0.68
Logistic regression	142/35	0.81	0.87	0.81	0.85	0.92	0.60	0.84	0.60	0.71	0.77	\(\surd\)
LibSVM	142/35	0.71	0.90	0.84	0.82	0.81	0.50	0.84	0.56	0.74	0.67
B. Normalized data
ANN	142/35	0.9	0.92	0.88	0.91	0.91	0.50	0.76	0.45	0.69	0.63
CNN	142/35	0.88	0.93	0.90	0.91	0.90	0.50	0.80	0.50	0.71	0.65
KNN4	142/35	0.80	0.70	0.65	0.74	0.75	0.80	0.56	0.421	0.629	0.68
LR	142/35	0.85	0.88	0.83	0.87	0.86	0.60	0.80	0.55	0.74	0.70	\(\surd\)
BayesNet	142/35	0.70	0.80	0.70	0.75	0.75	0.80	0.64	0.47	0.69	0.43
NaiveBayes	142/35	0.64	0.87	0.78	0.77	0.89	0.50	0.92	0.71	0.80	0.70	\(\surd\)
Decision tree(j48)	142/35	0.76	0.95	0.92	0.87	0.91	0.40	0.76	0.40	0.66	0.62
REPTree	142/35	0.80	0.80	0.73	0.8	0.82	0.80	0.64	0.47	0.69	0.70	\(\surd\)
SMO	142/35	0.85	0.80	0.76	0.82	0.83	0.60	0.76	0.50	0.71	0.41
Logistic regression	142/35	0.81	0.87	0.84	0.85	0.92	0.60	0.84	0.60	0.77	0.67
LibSVM	142/35	0.81	0.90	0.86	0.87	0.86	0.60	0.84	0.60	0.77	0.47
C. Combined Mode
1. All four models	142/35	0.90	0.87	0.83	0.88	0.88	0.60	0.76	0.50	0.71	0.68	\(\surd\)
2. All four but KNN8	142/35	0.88	0.87	0.83	0.87	0.87	0.60	0.76	0.50	0.71	0.68
3. All four but KNN8/KNN4	142/35	0.90	0.87	0.83	0.88	0.88	0.60	0.76	0.50	0.71	0.68	\(\surd\)
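The acceptance rule stated for Table 2 (training AUC ≥ 0.80 and testing AUC ≥ 0.70) can be written as a one-line check. The sketch below applies it to the training and testing ROC values of four raw-data rows taken from panel A; the function and variable names are hypothetical.

```python
def acceptable(train_auc, test_auc, train_min=0.80, test_min=0.70):
    """True when both the accuracy (training) and stability (testing) criteria hold."""
    return train_auc >= train_min and test_auc >= test_min


# (training ROC, testing ROC) for four raw-data rows of Table 2, panel A.
panel_a_rows = {
    "ANN": (0.88, 0.68),
    "KNN8": (0.75, 0.14),
    "LR": (0.86, 0.70),
    "NaiveBayes": (0.89, 0.77),
}
flags = {name: acceptable(*aucs) for name, aucs in panel_a_rows.items()}
```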

3.2.2 Comparison Within the Combined Scheme

The model accuracies and stabilities within the combined scheme (i.e., CNN/ANN/KNN/LR) were compared using the mode function in MS Excel to determine the classification of DF and Non-DF. We can see that (i) the k-nearest neighbors algorithm had a poor AUC (< 0.50), (ii) LR had a relatively higher AUC (= 0.70), and (iii) the three combined alternatives had almost equal stability (AUC = 0.68), as shown in panel C of Table 2, which was lower than that of the individual algorithms of NaiveBayes and Logistic regression in raw data and NaiveBayes in normalized data.

The hypothesis that a combined scheme of algorithms can improve the prediction accuracy of DF in children is rejected in this study accordingly.

3.3. Task 3: Developing an APP for patients, family members, and clinicians.

A snapshot obtained from a mobile phone used to respond to questions is shown in Fig. 5, top, and the assessment results are shown at the bottom.

In this example, we can see that the patient has no tendency toward DF+; the odds equal 0.01(= 0.01/0.99) shown at the bottom in Fig. 5.
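The relation between the displayed probability and the odds can be sketched as follows. This is an illustration only: the function names are hypothetical, and the 0.5 cutoff is carried over from the predicted-label rule of the LR model in Section 2.2.2 on the assumption that the APP uses the same threshold.

```python
def odds(prob):
    """Odds corresponding to a probability: prob / (1 - prob)."""
    return prob / (1.0 - prob)


def app_result(prob, cutoff=0.5):
    """Label and rounded odds for one assessment (cutoff is an assumed setting)."""
    label = "DF+" if prob >= cutoff else "DF-"
    return {"label": label, "odds": round(odds(prob), 2)}


# A probability of 0.01 corresponds to odds of 0.01 / 0.99.
result = app_result(0.01)
```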

In the DF season, patients suspected of having DF could click on the link [51] to obtain their DF probabilities and examine whether these 11 symptoms, using the LR algorithm, are useful for predicting their DF risk. Readers are encouraged to view demonstrations of the APP [51] in action in Appendix 2 using an MP4 video player.

3.4. Online Dashboards Shown on Google Maps

Five QR-codes shown in Figures(or links[40, 41]) are provided for readers who can manipulate the dashboards on their own.

4.1 Principal Findings

We observed that (i) the k-nearest neighbors algorithm had a poor AUC (<0.50), (ii) LR had a relatively higher AUC (=0.70), and (iii) the three combined alternatives had almost equal stability (AUC=0.68), as shown in panel C of Table 2, which was lower than that of the individual algorithms of NaiveBayes and Logistic regression in raw data and NaiveBayes in normalized data. An LR-based APP was designed to detect DF in children.

4.2 What This Knowledge Adds to What We Already Know

A diagnosis of DF is usually confirmed by three steps: (1) observing DF-related symptoms, (2) laboratory testing, such as by white blood cell and platelet counting, and (3) applying serological tools to verify DF using dengue immunoglobulin M and G antibodies, polymerase chain reaction, and virus isolation tests [4,12]. The latter two tests are relatively expensive.

Results from offline and online experiments on health utilization predictions suggest that such predictions can be useful to health care providers [52]. Similarly, the approach in the current study may be useful and applicable in healthcare settings in the future.

A self-assessment APP [51] that allows patients to click on the link or scan quick response codes on any pamphlet, respond to related questions, and obtain their DF risk on their smartphone (Figure 5) was developed to (1) help patients assess their symptoms at an earlier stage and (2) prompt medical doctors to test patients for confirmation when their DF result is labeled DF+ during the online assessment.

We performed LR in MS Excel, which is innovative and user-friendly in practice (see Figure 2 and Appendix 1). The LR model parameters are embedded in the APP [51], which helps patients, family members, and clinicians discriminate DF from other febrile illnesses at an early stage.

4.3 The Strength and features in this study

We used the Solver add-in in MS Excel to estimate model parameters after the LR model was built in MS Excel. Readers are invited to view details of the Excel module we provided in Appendix 1 and see the MP4 videos in Appendices 2 and 5. This module has not been previously reported in the literature; see Appendices 2 to 5.

Another feature of this study is its evaluation of the combined scheme of algorithms, which did not increase the accuracy or stability of DF prediction compared with the individual algorithms. The process of designing an APP to assess DF is shown in Appendices 2 to 5, and Figure 3 can help readers better understand the process of APP design and development in this study.

Performing LR in professional statistical software is common. However, to our knowledge, no previous study has provided LR in MS Excel to carry out the two major parts of machine learning: (i) selecting feature variables and (ii) using a training set to predict a testing set. Readers are invited to download the MS Excel module developed in this study and manipulate the data on their own to perform classification and prediction with the LR module.

Furthermore, numerous published articles [53-55] merely compared the difference in predictive accuracy with AUC among ML methods. However, none demonstrated a real app that can be a useful, feasible, efficient, and effective device in clinical settings. In contrast, we provided a prototype with an ML method that readers can manipulate on their own on dashboards, as shown in Figures 4 and 5.

4.4 Limitations and Directions for Future Study

This study presents some limitations that may encourage future research efforts. First, the APP used in this study merely demonstrates the model parameters estimated in cases with 11 symptoms. The incorporation of more eligible feature variables could lead to higher accuracy for predicting DF risk.

Second, performing LR in professional statistical software is common. The LR performed in MS Excel is limited to a small number of training cases (e.g., fewer than 50,000) because of the limited RAM of most personal computers, which could impede the efficiency of CNN calculations with larger amounts of data.

Third, the study sample size (n = 177) is too small to render inferences reliable and supportable. The data of more DF patients are required to enable the application of the proposed LR module(or others in Appendix 1) to the clinical setting.

Fourth, the APP developed in this study is only suitable for patients aged less than 16 years old. An adult version of the DF prediction APP should be developed in the future.

Fifth, we found that LR predicted DF more accurately than the other three proposed models (i.e., CNN, ANN, and KNN). Future studies are encouraged to compare these models in other types of diseases in clinical assessment.

Finally, somewhat different results were found between LR in Appendix 1 and Logistic regression in WEKA [43], as shown in Table 2. Readers are encouraged to compare the two approaches with their own data and examine whether the difference persists.

The 11-item LR model yielded higher accuracy (0.87) and Stability (0.70) than the other three models(i.e., CNN, ANN, and KNN). An LR-based APP was designed to detect DF in children. The proposed LR predictive model has been successfully developed with an APP to help patients, family members, and clinicians identify DF at an early stage. The APP could help assess DF risk and may eliminate the need for a costly and time-consuming dengue confirmation test.

ACC: Accuracy

AUC: area under the receiver operating characteristic curve

CNN: convolution neural network

DF: dengue fever

FDR: false discovery rate

FNR: false-negative rate

FOR: false omission rate

FPR: false positive rate

LR: logistic regression

ML: machine learning

MLPL: multilayer perceptron

MRSA: matching personal response scheme

NPV: negative predictive value

PPV: positive predictive value

ROC: the receiver operating characteristic curve

Sens: sensitivity

Spec: specificity

SNA: social network analysis

SVM: Support Vector Machines

Ethics approval and consent to participate

Not applicable for studies not involving humans. All data used in this study were downloaded from a previous article [4]

Consent to publish

Not applicable.

Availability of data and materials

All data used in this study are available in Appendices.

Competing interests

The authors declare that they have no competing interests.

Funding

There are no sources of funding to be declared.

Authors’ contributions

TWC conceived and designed the study. JC and WC performed the statistical analyses and were in charge of recruiting study participants. JC and WC contributed the idea. WC helped design the study and collected information, and JC interpreted the data. TWC monitored the research. All authors read and approved the final article.

Acknowledgments

We thank AJE (American Journal Experts at https://www.aje.com/) for the English language review of this manuscript. All authors declare no conflicts of interest.

Appendixes

Appendix 1 Dataset, MP4 and Excel module at https://osf.io/d2svm/?view_only=567f0bba08fd425085c82c3620600b01

Appendix 2 MP4: How to conduct LR in MS Excel at https://youtu.be/eqMYYPKpZdM

Appendix 3 MP4: How to perform CNN in Excel and use the APP developed in this study at https://youtu.be/RagA9An_lvc

Appendix 4 Comparison of Machine Learning methods MP4 video at https://youtu.be/WdkE70dTFrY

Appendix 5 MP4: How to design DF app in this study at https://youtu.be/nYnj30fbB5I

Chien TW, Chow JC, Chou W. An App Detecting Dengue Fever in Children: Using Sequencing Symptom Patterns for a Web-Based Assessment. JMIR Mhealth Uhealth. 2019 May 31;7(5):e11461. doi: 10.2196/11461. PMID: 31152525; PMCID: PMC6658251.
Lwin MO, Vijaykumar S, Rathnayake VS, Lim G, Panchapakesan C, Foo S, Wijayamuni R, Wimalaratne P, Fernando ON. A Social Media mHealth Solution to Address the Needs of Dengue Prevention and Management in Sri Lanka. J Med Internet Res. 2016 Jul 1;18(7):e149. doi: 10.2196/jmir.4657. PMID:
Syamsuddin M, Fakhruddin M, Sahetapy-Engel JTM, Soewono E. Causality Analysis of Google Trends and Dengue Incidence in Bandung, Indonesia With Linkage of Digital Data Modeling: Longitudinal Observational Study. J Med Internet Res. 2020 Jul 24;22(7):e17633. doi: 10.2196/17633. PMID: 32706682; PMCID: PMC7414412.
Lai WP, Chien TW, Lin HJ, Su SB, Chang CH. A screening tool for dengue fever in children. Pediatr Infect Dis J. 2013;32(4):320-4.
Lim JK, Carabali M, Camacho E, Velez DC, Trujillo A, Egurrola J, Lee KS, Velez ID, Osorio JE. Epidemiology and genetic diversity of circulating dengue viruses in Medellin, Colombia: a fever surveillance study. BMC Infect Dis. 2020 Jul 2;20(1):466. doi: 10.1186/s12879-020-05172-7. PMID: 32615988; PMCID: PMC7331258.
Heath CJ, Grossi-Soyster EN, Ndenga BA, Mutuku FM, Sahoo MK, Ngugi HN, Mbakaya JO, Siema P, Kitron U, Zahiri N, Hortion J, Waggoner JJ, King CH, Pinsky BA, LaBeaud AD. Evidence of transovarial transmission of Chikungunya and Dengue viruses in field-caught mosquitoes in Kenya. PLoS Negl Trop Dis. 2020 Jun 19;14(6):e0008362. doi: 10.1371/journal.pntd.0008362. PMID: 32559197; PMCID: PMC7329127.
Khan E, Prakoso D, Imtiaz K, Malik F, Farooqi JQ, Long MT, Barr KL. The Clinical Features of Co-circulating Dengue Viruses and the Absence of Dengue Hemorrhagic Fever in Pakistan. Front Public Health. 2020 Jun 17;8:287. doi: 10.3389/fpubh.2020.00287. PMID: 32626679; PMCID: PMC7311566.
World Health Organization. Dengue and severe dengue. 2019/6/23. Available at: https://www.who.int/news-room/fact-sheets/detail/dengue-and-severe-dengue
Sylla M, Bosio C, Urdaneta-Marquez L, Ndiaye M, Black WC 4th. Gene flow, subspecies composition, and dengue virus-2 susceptibility among Aedes aegypti collections in Senegal. PLoS Negl Trop Dis. 2009;3(4):e408.
Rico-Hesse R, Harrison LM, Nisalak A, Vaughn DW, Kalayanarooj S, Green S, Rothman AL, Ennis FA. Molecular evolution of dengue type 2 virus in Thailand. Am J Trop Med Hyg. 1998 Jan;58(1):96-101.
Bhatt S, Gething PW, Brady OJ, Messina JP, Farlow AW, Moyes CL et al. The global distribution and burden of dengue. Nature. 2013;496:504-507.
Brady OJ, Gething PW, Bhatt S, Messina JP, Brownstein JS, Hoen AG et al. Refining the global spatial limits of dengue virus transmission by evidence-based consensus. PLoS Negl Trop Dis. 2012;6:e1760. doi:10.1371/journal.pntd.0001760.
Furuya H. Estimating Vector-borne Viral Infections in the Urban Setting of the 2020 Tokyo Olympics, Japan, Using Mathematical Modeling. Tokai J Exp Clin Med. 2017 Dec 20;42(4):160-164.
Yanagisawa N, Wada K, Spengler JD, Sanchez-Pina R. Health preparedness plan for dengue detection during the 2020 summer Olympic and Paralympic games in Tokyo. PLoS Negl Trop Dis. 2018 Sep 20;12(9):e0006755.
Herbuela VRDM, Karita T, Francisco ME, Watanabe K. An Integrated mHealth App for Dengue Reporting and Mapping, Health Communication, and Behavior Modification: Development and Assessment of Mozzify. JMIR Form Res. 2020 Jan 8;4(1):e16424. doi: 10.2196/16424. PMID: 31913128; PMCID: PMC6996774.
Macedo Hair G, Fonseca Nobre F, Brasil P. Characterization of clinical patterns of dengue patients using an unsupervised machine learning approach. BMC Infect Dis. 2019 Jul 22;19(1):649.
Aguas R, Dorigatti I, Coudeville L, Luxemburger C, Ferguson NM. Cross-serotype interactions and disease outcome prediction of dengue infections in Vietnam. Sci Rep. 2019 Jun 28;9(1):9395.
Davi CCM, Pastor A, Oliveira T, Lima Neto FB, Braga-Neto U, Bigham A, Bamshad M, Marques ETA, Acioli-Santos B. Severe Dengue Prognosis Using Human Genome Data and Machine Learning. IEEE Trans Biomed Eng. 2019 Feb 4.
Potts J, Rothman A. Clinical and laboratory features that distinguish dengue from other febrile illnesses in endemic populations. Trop Med Int Health. 2008 Nov;13(11):1328–40. doi: 10.1111/j.1365-3156.2008.02151.x.
Lai W, Chien T, Lin H, Kan W, Su S, Chou M. An approach for early and appropriate prediction of dengue fever using white blood cells and platelets. HealthMED. 2012;6(3):806–12.
Kittigul L, Suankeow K. Use of a rapid immunochromatographic test for early diagnosis of dengue virus infection. Eur J Clin Microbiol Infect Dis. 2002 Mar;21(3):224–6.
Vaughn D, Nisalak A, Kalayanarooj S, Solomon T, Dung NM, Cuzzubbo A, Devine PL. Evaluation of a rapid immunochromatographic test for diagnosis of dengue virus infection. J Clin Microbiol. 1998 Jan;36(1):234–8.
Tuan NM, Nhan HT, Chau NV, Hung NT, Tuan HM, Tram TV, Ha Nle D, Loi P, Quang HK, Kien DT, Hubbard S, Chau TN, Wills B, Wolbers M, Simmons CP. Sensitivity and specificity of a novel classifier for the early diagnosis of dengue. PLoS Negl Trop Dis. 2015 Apr 2;9(4):e0003638.
Ma SC, Chou W, Chien TW, et al. An App for Detecting Bullying of Nurses Using Convolutional Neural Networks and Web-Based Computerized Adaptive Testing: Development and Usability Study. JMIR Mhealth Uhealth. 2020;8(5):e16747. Published 2020 May 20. doi:10.2196/16747
Lee YL, Chou W, Chien TW, Chou PH, Yeh YT, Lee HF. An App Developed for Detecting Nurse Burnouts Using the Convolutional Neural Networks in Microsoft Excel: Population-Based Questionnaire Study. JMIR Med Inform. 2020 May 7;8(5):e16528. doi: 10.2196/16528. PMID: 32379050; PMCID: PMC7243132.
Yan YH, Chien TW, Yeh YT, Chou W, Hsing SC. An App for Classifying Personal Mental Illness at Workplace Using Fit Statistics and Convolutional Neural Networks: Survey-Based Quantitative Study. JMIR Mhealth Uhealth. 2020;8(7):e17857. Published 2020 Jul 31. doi:10.2196/17857
Rere LM, Fanany MI, Arymurthy AM. Metaheuristic Algorithms for Convolution Neural Network. Comput Intell Neurosci. 2016;2016:1537325.
Chou PH, Chien TW, Yang TY, Yeh YT, Chou W, Yeh CH. Predicting Active NBA Players Most Likely to Be Inducted into the Basketball Hall of Famers Using Artificial Neural Networks in Microsoft Excel: Development and Usability Study. Int J Environ Res Public Health. 2021 Apr 16;18(8):4256. doi: 10.3390/ijerph18084256. PMID: 33923846; PMCID: PMC8072800.
Tey, S.-F.; Liu, C.-F.; Chien, T.-W.; Hsu, C.-W.; Chan, K.-C.; Chen, C.-J.; Cheng, T.-J.; Wu, W.-S. Predicting the 14-Day Hospital Readmission of Patients with Pneumonia Using Artificial Neural Networks (ANN). Int. J. Environ. Res. Public Health 2021, 18, in print.
Viana Dos Santos Santana Í, C M da Silveira A, Sobrinho Á, Chaves E Silva L, Dias da Silva L, Freire de Souza Santos D, Candeia E, Perkusich A. Machine Learning Classification Models for COVID-19 Test Prioritization in Brazil. J Med Internet Res. 2021 Mar 21. doi: 10.2196/27293. PMID: 33750734.
Golpour P, Ghayour-Mobarhan M, Saki A, Esmaily H, Taghipour A, Tajfard M, Ghazizadeh H, Moohebati M, Ferns GA. Comparison of Support Vector Machine, Naïve Bayes and Logistic Regression for Assessing the Necessity for Coronary Angiography. Int J Environ Res Public Health. 2020 Sep 4;17(18):6449. doi: 10.3390/ijerph17186449. PMID: 32899733; PMCID: PMC7558963.
Gholizadeh P, Esmaeili B. Developing a Multi-variate Logistic Regression Model to Analyze Accident Scenarios: Case of Electrical Contractors. Int J Environ Res Public Health. 2020 Jul 6;17(13):4852. doi: 10.3390/ijerph17134852. PMID: 32640549; PMCID: PMC7369826.
Nhu VH, Shirzadi A, Shahabi H, Singh SK, Al-Ansari N, Clague JJ, Jaafari A, Chen W, Miraki S, Dou J, Luu C, Górski K, Thai Pham B, Nguyen HD, Ahmad BB. Shallow Landslide Susceptibility Mapping: A Comparison between Logistic Model Tree, Logistic Regression, Naïve Bayes Tree, Artificial Neural Network, and Support Vector Machine Algorithms. Int J Environ Res Public Health. 2020 Apr 16;17(8):2749. doi: 10.3390/ijerph17082749. PMID: 32316191; PMCID: PMC7215797.
Choi Y, Boo Y. Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality. Int J Environ Res Public Health. 2020 Jan 31;17(3):897. doi: 10.3390/ijerph17030897. PMID: 32023993; PMCID: PMC7037603.
Wu L, Deng F, Xie Z, Hu S, Shen S, Shi J, Liu D. Spatial Analysis of Severe Fever with Thrombocytopenia Syndrome Virus in China Using a Geographically Weighted Logistic Regression Model. Int J Environ Res Public Health. 2016 Nov 11;13(11):1125. doi: 10.3390/ijerph13111125. PMID: 27845737; PMCID: PMC5129335.
Mitchell TM. Machine learning. WCB/McGraw-Hill; Boston, MA: 1997.
Zheng B, Wang X, Lederman D, Tan J, Gur D. Computer-aided detection; the effect of training databases on detection of subtle breast masses. Acad Radiol. 2010 Nov;17(11):1401-8. doi: 10.1016/j.acra.2010.06.009. Epub 2010 Jul 22. PMID: 20650667; PMCID: PMC2952663.
Mensink T, Verbeek J, Perronnin F, Csurka G. Distance-based image classification: generalizing to new classes at near-zero cost. IEEE Trans Pattern Anal Mach Intell. 2013 Nov;35(11):2624-37. doi: 10.1109/TPAMI.2013.83. PMID: 24051724.
Gou J, Qiu W, Yi Z, Xu Y, Mao Q, Zhan Y. A Local Mean Representation-based K-Nearest Neighbor Classifier. ACM Transactions on Intelligent Systems and Technology. April 2019;10(3),29:1-25.
Hamling, J.; Lee, P.; Weitkunat, R.; Ambühl, M. Facilitating meta-analyses by deriving relative effect and precision estimates for alternative comparisons from a set of estimates presented by exposure level or disease category. Stat Med. 2008, 27, 954–970.
Chen, C.J.; Wang, L.C.; Kuo, H.T.; Fang, Y.C.; Lee, H.F. Significant effects of late evening snack on liver functions in patients with liver cirrhosis: A meta-analysis of randomized controlled trials. J. Gastroenterol Hepatol. 2019, 34, 1143–1152.
Lalkhen, A.G. Statistics V: Introduction to clinical trials and systematic reviews. Contin. Educ. Anaesth. Crit. Care Pain 2008, 8, 143–146.
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining Software: An Update. SIGKDD Explor. 2009, 11. Available online: https://www.kdd.org/exploration_files/p2V11n1.pdf (accessed on 20 November 2020).
Al-Yousef A, Samarasinghe S. A Novel Computational Approach for Biomarker Detection for Gene Expression-Based Computer-Aided Diagnostic Systems for Breast Cancer. Methods Mol Biol. 2021;2190:195-208. doi:10.1007/978-1-0716-0826-5_9
Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM transactions on intelligent systems and technology (TIST) 2011;2(3):1–27. doi:10.1145/1961189.1961199
Kalidoss R, Umapathy S, Kothalam R, Sakthivelu U. Adsorption kinetics feature extraction from breathprint obtained by graphene based sensors for diabetes diagnosis. J Breath Res. 2020 Oct 13. doi: 10.1088/1752-7163/abc09b. Epub ahead of print. PMID: 33049727.
Neto MP, Paulovich FV. Explainable Matrix - Visualization for Global and Local Interpretability of Random Forest Classification Ensembles. IEEE Trans Vis Comput Graph. 2020 Oct 13;PP. doi: 10.1109/TVCG.2020.3030354. Epub ahead of print. PMID: 33048689.
Saha S, Saha M, Mukherjee K, Arabameri A, Ngo PTT, Paul GC. Predicting the deforestation probability using the binary logistic regression, random forest, ensemble rotational forest, REPTree: A case study at the Gumani River Basin, India. Sci Total Environ. 2020 Aug 15;730:139197. doi: 10.1016/j.scitotenv.2020.139197. Epub 2020 May 4. PMID: 32402979.
Tarekegn A, Ricceri F, Costa G, Ferracin E, Giacobini M. Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches. JMIR Med Inform. 2020;8(6):e16678. Published 2020 Jun 4. doi:10.2196/16678
Chien TW. Figure 4 in this study. 2021/4/20 retrieved at http://www.healthup.org.tw/gps/DFf0rest177.htm
Chien TW. Dengue fever online assessment for children. 2019/9/20 retrieved at http://www.healthup.org.tw/irs/irsin_e.asp?type1=86
Agarwal V, Zhang L, Zhu J, et al. Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis. J Med Internet Res. 2016;18(9):e251. Published 2016 Sep 21. doi:10.2196/jmir.6240
Zhang PI, Hsu CC, Kao Y, Chen CJ, Kuo YW, Hsu SL, Liu TL, Lin HJ, Wang JJ, Liu CF, Huang CC. Real-time AI prediction for major adverse cardiac events in emergency department patients with chest pain. Scand J Trauma Resusc Emerg Med. 2020 Sep 11;28(1):93. doi: 10.1186/s13049-020-00786-x. PMID: 32917261; PMCID: PMC7488862.
Dao DV, Ly HB, Trinh SH, Le TT, Pham BT. Artificial Intelligence Approaches for Prediction of Compressive Strength of Geopolymer Concrete. Materials (Basel). 2019 Mar 25;12(6):983. doi: 10.3390/ma12060983. PMID: 30934566; PMCID: PMC6471228.
Alaka SA, Menon BK, Brobbey A, Williamson T, Goyal M, Demchuk AM, Hill MD, Sajobi TT. Functional Outcome Prediction in Ischemic Stroke: A Comparison of Machine Learning Algorithms and Regression Models. Front Neurol. 2020 Aug 25;11:889. doi: 10.3389/fneur.2020.00889. PMID: 32982920; PMCID: PMC7479334.
