Identified Events
The TWST algorithm, developed to identify time periods of excess malaria case counts of potential public health interest, found a total of 255 events for P. falciparum and mixed species. The number of events declined from 2013 to 2018 but increased sharply in 2019. In 2019, the average number of cases per event was also the highest since 2012 [Table 2; all events are shown over time in Additional File 1, Supplemental Fig. 1].
Table 2
Number of events and descriptive statistics about malaria cases during those events for P. falciparum and mixed malaria over the study period.
Year | Events | Average # of cases in events | Total # of cases during events | % of yearly cases during events
2012 (W28-W52) | 34 | 6500 | 220983 | 72
2013 | 53 | 2749 | 145686 | 42
2014 | 52 | 1630 | 84761 | 39
2015 | 32 | 2635 | 84304 | 38
2016 | 34 | 1769 | 60162 | 30
2017 | 15 | 1824 | 27353 | 22
2018 | 5 | 2667 | 13334 | 13
2019 | 30 | 2950 | 88498 | 42
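As a quick arithmetic check on Table 2, the following sketch recomputes the overall number of events and compares each year's reported total with the product of the event count and the rounded average. The values are copied from Table 2; the variable and function names are illustrative only, and small gaps are expected because the averages are rounded to whole cases.

```python
# Rows copied from Table 2:
# (year, events, average cases per event, total cases during events)
TABLE_2 = [
    ("2012 (W28-W52)", 34, 6500, 220983),
    ("2013", 53, 2749, 145686),
    ("2014", 52, 1630, 84761),
    ("2015", 32, 2635, 84304),
    ("2016", 34, 1769, 60162),
    ("2017", 15, 1824, 27353),
    ("2018", 5, 2667, 13334),
    ("2019", 30, 2950, 88498),
]

# Sum of the yearly event counts; matches the 255 events reported in the text.
total_events = sum(events for _, events, _, _ in TABLE_2)


def relative_gap(events, average, total):
    """Relative difference between events * rounded average and the reported total."""
    return abs(events * average - total) / total


# Largest relative gap across years; rounding alone should keep this tiny.
max_gap = max(relative_gap(e, a, t) for _, e, a, t in TABLE_2)

print(total_events)  # 255
print(f"{max_gap:.5f}")
```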
The TWST algorithm was specifically designed to account for seasonality, so that not every seasonal peak is identified as an event, in the context of overall declining trends in malaria transmission. However, woredas in the region exhibited varied incidence patterns, including decreasing trends, increases in the middle or at the end of the time period, clear single seasonal peaks, dual seasonal peaks, and combinations of these patterns. The TWST algorithm was flexible enough to appropriately identify events across these patterns [Figure 3].
Figure 3. Selected woredas showing various patterns in seasonal and long-term trends in the incidence of malaria and the TWST events: (a) Mecha, (b) Baso Liben, (c) Jawi. Mecha and Baso Liben both had decreasing incidence and a resurgence in 2019, but Baso Liben maintained seasonal peaks while Mecha did not. Seasonal patterns also ranged from clear single or dual peaks to more jagged patterns, such as in Jawi. Observed incidence is marked in light grey and the smoothed incidence in black. Week and year thresholds from the TWST algorithm are shown as dot-dashed lines in green and blue, respectively. Any identified events are marked with red circles at the appropriate weeks at the top of the graphs.
The algorithm was able to identify peaks that would have been overshadowed by peaks in much earlier years but are important relative to more recent patterns. For example, the woreda Abargelie had high peaks in 2013 and, to a lesser extent, in 2014. During 2015, however, the season was very quiet, with no large peaks. In the fall of 2016 a moderate seasonal peak returned, followed by larger fall peaks in 2017 and 2018. If the thresholds had not weighted the 2015 season more strongly (trend-weighting), the 2017 or 2018 peak would not have been identified as an event [Figure 4]. The time-shift allowance in TWST was also needed to prevent notifications of events where the peak simply declined more slowly than in other years [Figure 4].
Figure 4. Example TWST results for the selected woredas: (a) Abergele, (b) Borena / Debresina, and (c) Artuma Fursi. In Abargelie, the fall 2016 and 2017 peaks would have been below an unadjusted week threshold (orange dotted line) but were identified using the final TWST thresholds (green dot-dashed line) that had been trend-weighted. Borena / Debresina and Artuma Fursi show multiple instances where the non-peak expansion and time-shift allowances prevented events from being inappropriately identified on the edges of incidence peaks or in the seasonal troughs.
Event Detection
A total of 234 event detection algorithms and variants were tested against the 30 TWST-identified events in the 2019 evaluation period [selected entries in Table 3; for all results see Additional File 2, Supplemental Tables S1 and S2]. Of the 234, there were 12 CDC EARS variants, 12 WHO and statistical variants, 205 Farrington variants, and six random alarm generators [Figure 5]. The six random alarm generators, run with per-week probabilities of 0.2, 0.1, 0.05, 0.025, 0.012, and 0.006, yielded 233, 151, 92, 61, 33, and 18 alarms, respectively, a range similar to the number of alarms from the other event detection algorithms.
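A random alarm generator of this kind can be pictured as an independent Bernoulli draw for every woreda-week. The sketch below is a minimal illustration under that assumption, not the study's implementation: the function name, the seed, and the grid size (`N_WOREDAS`, `N_WEEKS`) are hypothetical, since the number of woreda-weeks evaluated is not stated in this section.

```python
import random


def random_alarms(n_woredas, n_weeks, p, seed=0):
    """Return the set of (woreda, week) pairs flagged, each with probability p."""
    rng = random.Random(seed)
    return {
        (woreda, week)
        for woreda in range(n_woredas)
        for week in range(n_weeks)
        if rng.random() < p
    }


# Hypothetical grid size, used only to show how alarm counts scale with p.
N_WOREDAS, N_WEEKS = 40, 30

for p in (0.2, 0.1, 0.05, 0.025, 0.012, 0.006):
    alarms = random_alarms(N_WOREDAS, N_WEEKS, p)
    # The expected number of alarms is p * N_WOREDAS * N_WEEKS.
    print(p, len(alarms), p * N_WOREDAS * N_WEEKS)
```

Because each woreda-week is drawn independently, the expected alarm count is simply the probability times the number of woreda-weeks, which is why halving the probability roughly halves the number of alarms.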
Figure 5. Scatter plot of the percent of events caught versus the percent of true positive alarms for all event detection algorithms and variants. Each category is marked with a different shape and color combination. Marker size shows the percent of events caught timely.
As expected, the random alarm generators performed poorly and had the lowest percentages of true positive alarms across the variants (Table 3, Fig. 5). Variants with higher probabilities generated more alarms and had higher percentages of events caught, because the more alarms there are, the more likely one is to overlap an event by chance.
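The three evaluation metrics can be illustrated with a small scoring function. This is a sketch under stated assumptions rather than the definitions used in the study: here an event counts as caught if any alarm in the same woreda falls within its weeks, an alarm counts as a true positive if it falls inside any event, and an event is caught timely if its earliest alarm falls within the first `timely_weeks` weeks. The `timely_weeks` parameter, the function names, and the toy data are all hypothetical.

```python
def pct(numerator, denominator):
    """Percentage, returning 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def score_alarms(events, alarms, timely_weeks=2):
    """Score alarms against reference events.

    events: list of (woreda, start_week, end_week) tuples, weeks inclusive.
    alarms: set of (woreda, week) tuples.
    Returns (% events caught, % events caught timely, % true positive alarms).
    """
    caught = timely = 0
    for woreda, start, end in events:
        hits = [wk for (w, wk) in alarms if w == woreda and start <= wk <= end]
        if hits:
            caught += 1
            # Assumed timeliness rule: the earliest alarm falls within the
            # first `timely_weeks` weeks of the event.
            if min(hits) < start + timely_weeks:
                timely += 1
    true_positive = sum(
        1
        for (w, wk) in alarms
        if any(w == woreda and start <= wk <= end for woreda, start, end in events)
    )
    return (
        pct(caught, len(events)),
        pct(timely, len(events)),
        pct(true_positive, len(alarms)),
    )


# Toy data (hypothetical woredas and week indices).
events = [("A", 10, 14), ("B", 20, 22)]
alarms = {("A", 10), ("A", 30), ("B", 22), ("C", 5)}
scores = score_alarms(events, alarms)
print(scores)  # (100.0, 50.0, 50.0)
```

In the toy data both events are caught, but half of the alarms fall outside any event, so adding more scattered alarms would raise the events-caught score only at the cost of the true positive percentage.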
The CDC EARS methods generated large numbers of alarms (98 to 152), with an associated high percentage of events caught (80–100%) and variable events caught timely (43–87%), but also had low to midrange percentages (25–40%) of true positive alarms (selected items in Table 3, full listing in Supplemental materials). Of the weekly statistical summaries, the Cullen mean plus two standard deviations variant produced the highest true positive rates (51 to 61%, depending on the number of years of historical data included), but the lowest event caught scores (57–80%) and lowest events caught timely scores (13–37%). The WHO 75th percentile with 5 years of data, a commonly used algorithm, produced 200 alarms with a 97% event caught rate (93% timely) but only a 26% true positive rate (Table 3). The 85th percentile variants produced somewhat fewer alarms with higher true positive rates, and with similar or slightly reduced event caught and timely percentages.
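The weekly statistical thresholds compared here can be sketched as follows. This is a minimal illustration, assuming the threshold for a given calendar week is computed from the counts for that same week in the preceding years. The use of the sample standard deviation, the linear-interpolation percentile method, the function names, and the example counts are assumptions, not details taken from the study.

```python
from statistics import mean, quantiles, stdev


def mean_plus_2sd(history):
    """Cullen-style threshold: mean plus two sample standard deviations."""
    return mean(history) + 2 * stdev(history)


def percentile_threshold(history, q=75):
    """Percentile threshold (e.g. q=75 or q=85) with linear interpolation."""
    return quantiles(history, n=100, method="inclusive")[q - 1]


def is_alarm(current, threshold):
    """An alarm is raised when the current week exceeds its threshold."""
    return current > threshold


# Hypothetical counts for one calendar week in the five preceding years.
history = [10, 20, 30, 40, 50]
current = 45

print(percentile_threshold(history, 75))  # 40.0
print(percentile_threshold(history, 85))  # 44.0
print(round(mean_plus_2sd(history), 2))   # 61.62
print(is_alarm(current, percentile_threshold(history, 75)))  # True
print(is_alarm(current, mean_plus_2sd(history)))             # False
```

With these example counts the same week triggers the 75th percentile rule but not the mean plus two standard deviations rule, which is the mechanism behind the percentile variants producing more alarms and the mean-based variant producing fewer alarms with a higher true positive rate.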
The Farrington results (orange hollow circles in Fig. 5) showed a trade-off between events caught and true positives. The Farrington variants were based on a sensitivity analysis of five parameters: window half size (w), years of historical data included (b), number of periods for seasonality, long-term trend inclusion, and the exclusion period for spin-up time. Not all parameters influenced the outcomes: window half size and the exclusion period did not greatly affect the results, although the 26-week exclusion period seemed slightly preferable. The number of historical years of data, the number of periods for seasonality, and trend inclusion had the greatest impacts on the outcome metrics. Of the 200 variants with population offset, the event caught rate was highest when the trend was included and there were 4 to 12 periods for seasonality (Fig. 6). The event caught rate fell as more years of historical data were included, especially in variants that did not include a trend.
Figure 6. Plot of event caught percentages from the Farrington event detection variants. Scores were higher when a long-term trend was included (filled circles) than when no trend was included (hollow triangles). Event caught rate fell as more years of historical data were included (x-axis), especially in variants that did not include a trend. Within each trend set, scores were higher with 4 to 12 periods for seasonality (blue to green colors), and lowest with one period, i.e. no seasonality (dark purple). The number of alarms generated is indicated by the size of the marker and decreases as more years of historical data are included.
Of the 200 variants that included population offsets, the true positive percentages were highest when no trend was included and two to four periods for seasonality were used, and they increased as more years of historical data were included (Fig. 7). The number of alarms generated decreased with additional years of historical data (marker size in Figs. 6 and 7).
Figure 7. Plot of true positive alarm percentages from the Farrington event detection variants. Scores were higher when the long-term trend was not included (hollow triangles) as compared to variants where trend was included (filled circles). More historical data (x-axis) increased the alarm true positive score and decreased the total number of alarms generated (size of marker). Scores were highest with two to four periods for seasonality (blues), and lowest with no seasonality (one period, dark purple). The number of alarms generated is indicated by the size of the marker and decreases as more years of historical data are included.
The original and improved Farrington methods with default values (both without and with seasonality) and no population offset were compared against the 200 parameter sensitivity runs that used the improved method with population offsets (original A1 and A2, base improved B1 and B2 in Table 3 and Table 4). As seen in Fig. 6 and Fig. 7, there were large trade-offs in the 200-variant set between events caught and true positive rates. Some Farrington runs reached 100% events caught, but the highest true positive rate in that set was only 26% (Farrington C1 in Table 3). Other variants reached a 100% alarm true positive rate, but the highest event caught score in that set was 40% (Farrington C2 in Table 3). Taking a balanced approach, one variant with reasonable trade-offs scored 73% events caught and 74% alarm true positives but only 37% events caught timely (Farrington C3 in Table 3 and Table 4). Another variant, which we selected as our balanced option, had 83% events caught and 53% events caught timely, though only 51% of its alarms were true positives (Farrington C4 in Table 3 and Table 4).
Table 3
Results for selected event detection algorithms for P. falciparum and mixed malaria events in the 2019 evaluation time period. The percent of events caught, percent of event caught timely, percent of true positive alarms, and the total number of alarms generated are reported. Farrington parameter details can be found in Table 4.
Algorithm | % Events caught | % Timely | % True positive alarms | Total number of alarms
Random (p = 0.2) | 90 | 70 | 17 | 233
Random (p = 0.05) | 57 | 27 | 22 | 92
Random (p = 0.012) | 13 | 3 | 4 | 33
EARS C1 (7 weeks) | 90 | 53 | 25 | 146
EARS C2 (7 weeks) | 100 | 73 | 27 | 152
EARS C3 (7 weeks) | 100 | 87 | 30 | 135
WHO mean + 2sd (5 years) | 80 | 11 | 51 | 72
WHO 75th percentile (5 years) | 97 | 93 | 26 | 200
WHO 85th percentile (5 years) | 97 | 73 | 35 | 143
Farrington A1 | 97 | 80 | 19 | 203
Farrington A2 | 100 | 87 | 25 | 163
Farrington B1 | 90 | 63 | 24 | 146
Farrington B2 | 93 | 63 | 29 | 116
Farrington C1 | 100 | 77 | 26 | 156
Farrington C2 | 40 | 7 | 100 | 16
Farrington C3 | 73 | 37 | 74 | 39
Farrington C4 | 83 | 53 | 51 | 63
Table 4
Details of selected Farrington variants, including parameter settings.
Farrington Label | Description | Selected Parameters
Farrington A1 | Original Farrington method, base settings, no seasonality | 5 years, 3 window half size (w), 1 period, trend conditionally included (0.05 threshold), window half size (w) weeks excluded, no population offset
Farrington A2 | Original Farrington method, base settings, four periods for seasonality | 5 years, 3 w, 4 period, trend conditionally included (0.05 threshold), w-weeks excluded, no population offset
Farrington B1 | Improved Farrington method, base settings, no seasonality | 5 years, 3 w, 1 period, trend included, 26 weeks excluded, no population offset
Farrington B2 | Improved Farrington method, base settings, four periods for seasonality | 5 years, 3 w, 4 period, trend included, 26 weeks excluded, no population offset
Farrington C1 | Improved with population offset; highest true positive with highest caught | 4 years, 5 w, 4 periods, trend included, 26 weeks excluded
Farrington C2 | Improved with population offset; highest caught with highest true positive | Maximum years, 3 w, 4 periods, no trend, w-weeks excluded
Farrington C3 | Improved with population offset; balanced trade-off option | 5 years, 3 w, 8 periods, no trend, w-weeks excluded
Farrington C4 | Improved with population offset; selected balanced trade-off | 3 years, 5 w, 4 periods, no trend, w-weeks excluded