This research project was conducted under my supervision at Rensselaer Polytechnic Institute (RPI). The students who worked on this project under my guidelines and mentoring are Lilian Ngweta, Karan Bhanot, Ariane Maharaj, and Ian Bogle. The project was part of the Data Science course that I taught during Fall 2019 at RPI.
The outcome of this project was published as a conference paper at the IEEE BigData Conference (IEEE BigData) in December 2019 in Los Angeles, USA.
Dengue, malaria, and Zika are vector-borne diseases transmitted by mosquitoes that carry the pathogens that cause these illnesses. According to the World Health Organization (WHO), hundreds of thousands of people around the world die every year due to disease-transmitting mosquitoes. Mosquito outbreaks occur most commonly in warm climates, in tropical regions and areas close to the equator. Female mosquitoes lay their eggs in ponds and puddles where water accumulates after rainfall.
In 2016, there was a significant spike in the number of dengue cases in Argentina: 79,455 cases were reported in 2016, compared to 3,250 cases in 2014 and 4,774 in 2015. The number fell back to the hundreds in 2017 and rose to the thousands again in 2018. Since dengue and Zika are spread by the same species of mosquito, Aedes aegypti, the goal of this project is to examine whether the spike in dengue cases in 2016 in Argentina also led to a spike in Zika cases. During this study we also determine whether there is a correlation between the number of Zika cases and precipitation levels.
For the analysis, we use available Zika data for Argentina obtained from the Centers for Disease Control and Prevention (CDC) database. More specifically, we look at the number of cases per month at a county level in Argentina. The precipitation data is obtained from the National Aeronautics and Space Administration (NASA) through the Global Precipitation Measurement (GPM) mission. Like the Zika data, the precipitation data is monthly and available at a county level.
There are existing systems for forecasting outbreaks of vector-borne diseases in various countries, and each system looks at a particular factor. For example, the Dengue forecasting MOdel Satellite-based System (D-MOSS) issues warnings of dengue outbreaks eight months before they are likely to occur in Vietnam.
Another existing early warning system is the Predictive fLUshing Mosquito (PLUM) model. PLUM was developed by Draper scientists in collaboration with scientists from Boston University and the Massachusetts Institute of Technology (MIT) to help predict and decrease outbreaks of dengue fever using observations collected in Singapore, Peru, and Puerto Rico. Our work is inspired by D-MOSS, but instead of including water availability as a component in the prediction and focusing on Vietnam, we are interested in the relationship between Zika outbreaks and precipitation levels in Argentina. The Sustainable Development Goals of the United Nations aim to address global problems of peace, justice, gender equality, good health, and many others. Just as D-MOSS targets these UN Sustainable Development Goals, our project aspires to bring a similar approach to the Zika virus. We will produce a report of our analysis results and visualizations that beneficiaries in Argentina can use in their efforts to combat and control the spread of the Zika virus.
NASA's Global Precipitation Measurement (GPM) mission aims to collect reliable rainfall and snow (precipitation) information, which is recorded every three hours and then combined. The recorded data is made freely available in HDF5 format. We subset the data using a bounding box around Argentina and collect the final run data from January 2016 to December 2017. A total of 24 data files were collected, each corresponding to a specific month and year.
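The bounding-box subsetting described above can be sketched in a few lines. This is only an illustration, not the script used in the project: the function name `subset_grid`, the box coordinates (rough values for Argentina), and the toy grid are all assumptions, and reading the actual HDF5 files is left out.

```python
# Rough bounding box around Argentina in degrees.
# These coordinates are illustrative assumptions, not the values used in the project.
ARGENTINA_BBOX = {"lat_min": -55.1, "lat_max": -21.7, "lon_min": -73.6, "lon_max": -53.6}


def subset_grid(lats, lons, grid, bbox):
    """Keep only the grid cells whose coordinates fall inside bbox.

    grid[i][j] is the precipitation value at (lats[i], lons[j]).
    Returns the retained latitudes, longitudes, and the cropped grid.
    """
    rows = [i for i, lat in enumerate(lats) if bbox["lat_min"] <= lat <= bbox["lat_max"]]
    cols = [j for j, lon in enumerate(lons) if bbox["lon_min"] <= lon <= bbox["lon_max"]]
    cropped = [[grid[i][j] for j in cols] for i in rows]
    return [lats[i] for i in rows], [lons[j] for j in cols], cropped


# Toy 3 x 4 grid with made-up values, only to show the cropping.
lats = [-60.0, -40.0, -10.0]
lons = [-80.0, -70.0, -60.0, -50.0]
grid = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

sub_lats, sub_lons, sub_grid = subset_grid(lats, lons, grid, ARGENTINA_BBOX)
print(sub_lats, sub_lons, sub_grid)
```

In practice the latitude, longitude, and precipitation arrays would come from each monthly HDF5 file, and the same crop would be applied to all 24 files.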
The Zika data is compiled from integrated surveillance bulletins from Argentina's Ministry of Health. The data was provided as one CSV file per bulletin. Bulletins were issued irregularly throughout each month, and each included the number of Zika cases in each province since the last bulletin. Since our precipitation data was grouped by month, we used a Python script to aggregate the Zika data to a monthly granularity.
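A minimal sketch of this kind of monthly aggregation is shown below. It is not the project's actual script: the column names `report_date`, `province`, and `cases` are hypothetical, and the sample rows are made-up placeholder values rather than real surveillance data. It assumes each bulletin row carries the cases reported since the previous bulletin, so the counts can simply be summed per month.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Made-up bulletin rows for illustration only (hypothetical column names and values).
SAMPLE_CSV = """report_date,province,cases
2016-03-04,Province A,3
2016-03-18,Province A,5
2016-03-18,Province B,2
2016-04-01,Province A,1
"""


def aggregate_monthly(csv_file):
    """Sum per-bulletin case counts into (year, month, province) totals."""
    monthly = defaultdict(int)
    for row in csv.DictReader(csv_file):
        day = date.fromisoformat(row["report_date"])
        monthly[(day.year, day.month, row["province"])] += int(row["cases"])
    return dict(monthly)


monthly_cases = aggregate_monthly(io.StringIO(SAMPLE_CSV))
for key in sorted(monthly_cases):
    print(key, monthly_cases[key])
```

This sketch attributes every bulletin to the month of its issue date. A bulletin issued early in a month may cover cases from the previous month, so a real script might need a different attribution rule.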
As shown in Figure 1, the project workflow starts with obtaining the data from the sources, cleaning the data, and storing it in a local repository. We then plan to perform the analysis by creating models and visualizations that will help us determine whether the increase in dengue cases in 2016 also led to an increase in Zika cases, and to determine the correlation between the number of Zika cases and precipitation levels. We will produce a final report that includes our analysis results and visualizations. As a final step, we will archive the project on GitHub.
Figure 2 shows that the highest precipitation levels occur in the upper north-eastern region of Argentina, whereas the rest of the country has relatively low precipitation. However, this region does not correspond to the highest numbers of Zika cases, which are seen in the north-western region of Argentina as well as on the central eastern border of the country.
Figure 3 shows a band of comparatively higher precipitation across the middle of Argentina that fades towards the eastern and western regions of the country, with the western region having the lowest amounts of precipitation. The number of Zika cases is relatively low, and cases exist in only a few provinces. Comparing Figures 2 and 3, the amount of precipitation is lower for March 2016 than for March 2017, but there are more Zika cases in March 2016 than in March 2017. In March 2016 the cases are spread throughout the entire country, whereas in March 2017 they are sparsely spread across the northern and eastern parts of the country.
Delving further into the relationship between the number of Zika cases and the trends in precipitation levels and other climatic factors, as well as social and economic factors, can yield valuable insights for prevention planning. It will also provide context on how climate change can affect mosquito breeding trends and thus vector-borne diseases. The average rainfall and the number of Zika cases in the same month do not align well, so a direct correlation is weak, but a delayed correlation is likely due to the nature of Zika spread.
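One way such a delayed correlation could be examined is to compare precipitation in one month with Zika cases a fixed number of months later. The sketch below is an assumption about how this might be done, not an analysis from the project: the function names are hypothetical, Pearson correlation is one possible choice of measure, and the two series are made-up toy values constructed so that cases follow precipitation by exactly one month.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(precip, cases, lag):
    """Correlate precipitation in month t with cases in month t + lag."""
    if lag < 0 or lag >= len(precip) - 1:
        raise ValueError("lag must leave at least two overlapping months")
    if lag == 0:
        return pearson(precip, cases)
    return pearson(precip[:-lag], cases[lag:])


# Made-up monthly series: each case count echoes the previous month's rainfall.
precip = [10, 50, 20, 80, 30, 60]
cases = [0, 1, 5, 2, 8, 3]

same_month = lagged_correlation(precip, cases, 0)
one_month_later = lagged_correlation(precip, cases, 1)
print(same_month, one_month_later)
```

With real data, the lag that gives the strongest correlation would be only a starting point, since a 24-month series leaves few points per lag and correlation alone does not establish cause.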