Ebola virus disease prediction using association rules
Abstract
Epidemic diseases pose a serious threat to Africa and the world at large, mainly because
outbreaks are abrupt and unforeseen. A number of solutions have been proposed, but most
of them never progress beyond the proposal stage. Mathematics, computing, and data science
provide a number of techniques (including SIR models) for studying trends in epidemic
diseases, yet it remains a challenge to predict an outbreak early enough to introduce
control measures. Within data mining, association rules are a promising technique for
improving epidemic disease prediction. Unfortunately, when they are applied to a data set,
they produce an extremely large number of rules. Most of these rules are irrelevant, and
the time required to find them can be impractical.
A more important issue is that, in general, association rules are mined on the entire data
set without validation on an independent sample. To address these limitations, we introduce
an enhanced association rule algorithm (Enhanced Apriori) that uses search constraints
to reduce the number of rules, searches for association rules on a training set, and finally
validates them on an independent test set. The predictive significance of the discovered
rules is evaluated with support, confidence, and lift. Association rules are applied to
a real data set containing medical records of patients with heart disease. In this study,
association rules relate Ebola symptoms and causes (referred to as attributes) to the
presence of the Ebola virus. Search constraints and test-set validation significantly
reduce the number of association rules and produce a set of rules with high predictive
accuracy. We present important rules with high confidence, high lift, or both, that remain
valid on the test set across several runs. These rules represent valuable and interesting
patterns.
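
The workflow summarized above (constrained rule search on a training set, followed by
validation on an independent test set, with rules scored by support, confidence, and lift)
can be illustrated with a minimal Python sketch. This is not the Enhanced Apriori algorithm
itself: it uses a brute-force search, and the function names, thresholds, attribute names,
and toy records are all hypothetical, chosen only to show the standard definitions
support(A -> C) = P(A and C), confidence = P(A and C) / P(A), and
lift = confidence / P(C).

```python
from itertools import combinations

def support(rows, items):
    """Fraction of records (sets of attributes) that contain every item."""
    items = set(items)
    return sum(items <= row for row in rows) / len(rows)

def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    s_a = support(rows, antecedent)
    s_c = support(rows, consequent)
    s_ac = support(rows, set(antecedent) | set(consequent))
    confidence = s_ac / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return s_ac, confidence, lift

def mine_rules(train, target, max_size=2, min_support=0.3, min_conf=0.7):
    """Search constraints (assumed for illustration): the consequent is fixed
    to `target` and antecedents contain at most `max_size` attributes."""
    attrs = sorted(set().union(*train) - {target})
    rules = []
    for k in range(1, max_size + 1):
        for antecedent in combinations(attrs, k):
            s, c, _ = rule_metrics(train, antecedent, {target})
            if s >= min_support and c >= min_conf:
                rules.append(antecedent)
    return rules

def validate(rules, test, target, min_conf=0.7):
    """Keep only rules whose confidence still meets the threshold on the test set."""
    return [r for r in rules if rule_metrics(test, r, {target})[1] >= min_conf]

# Hypothetical toy records, not real patient data.
train = [
    {"fever", "bleeding", "ebola"},
    {"fever", "bleeding", "ebola"},
    {"fever", "contact", "ebola"},
    {"fever"},
    {"cough"},
]
test = [
    {"fever", "bleeding", "ebola"},
    {"fever"},
    {"bleeding", "contact", "ebola"},
    {"cough"},
]

mined = mine_rules(train, "ebola")
validated = validate(mined, test, "ebola")
print("mined on training set:", mined)
print("valid on test set:   ", validated)
```

In this toy example the rule fever -> ebola passes the thresholds on the training records
but fails on the test records, which is the kind of rule that test-set validation is meant
to filter out.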