This research project was conducted under my supervision at Rensselaer Polytechnic Institute (RPI). The student who worked on this project under my guidance and mentoring is Ariane Maharaj. The project was a collaboration with the NASA Interagency Implementation and Advanced Concepts Team (IMPACT) and was part of the Data Analytics course that I taught during Spring 2020 at RPI.
The outcome of this project was published as a conference paper at the IEEE BigData Conference (IEEE BigData), in the IEEE 2nd Workshop on Machine Learning for Big Data Analytics in Remote Sensing, in December 2020 (virtual).
The effect of dust on precipitation at high latitudes has not been adequately explored, as most of this research has been restricted to mid- and low-latitude regions. Even so, at these lower latitudes the relationship between dust and precipitation still has not been distinctly established. The purpose of this study is to act as a starting point for understanding the role dust plays in precipitation processes at high latitudes. Knowledge of the effects dust can have on precipitation will allow for better prediction and forecasting through an improvement of regional and global climate models. In order to investigate this relationship, we looked at instances of high-latitude dust and precipitation amounts specifically in regions of Alaska, Iceland and Patagonia. The precipitation data were separated in terms of days where dust was present versus absent (binary basis). Most of the precipitation averages were low, with the highest precipitation amounts seen only on days without dust. The data were also separated into four specific days surrounding the dust event (4-day basis): the day before the dust event, the day of, one day after and two days after. The day of and one day after were limited to having lower levels of precipitation, whereas the day before and two days after exhibited a broader range of precipitation levels, which included the highest amounts. These initial observations indicated more of a negative relationship between the high-latitude dust events and precipitation. Two classifications of precipitation intensity were created and tested for the purpose of this study, after which correlation analyses (Chi-Square Test, Fisher’s Exact Test and Rank Correlations) and classification modeling (Decision Tree modeling) were applied.
The p-values from the correlation analyses were less than 0.05, suggesting that there is likely a statistically significant relationship between the high-latitude dust events and precipitation, and this relationship again appears to be slightly negative. The classification models were able to predict the precipitation amounts for the days surrounding the dust events using the two classifications. However, they were not able to distinguish between each of the categories in the classifications, as most of the data were in the lower categories.
The high-latitude dust images were shared with Rensselaer Polytechnic Institute by the NASA Interagency Implementation and Advanced Concepts Team (IMPACT). The dataset included 206 images of high-latitude dust, which were produced as part of NASA’s Global Imagery Browse Service (GIBS). These images were extracted in conjunction with their corresponding dates and bounding box coordinates in Alaska, Iceland and Patagonia. The dates and coordinates were used to manually obtain the precipitation data from NASA’s Global Precipitation Measurement (GPM) Mission. For days where there were multiple dust events in the same general location, the bounding box for acquiring the precipitation data was chosen as the minimum extent containing all of the dust events. Post-processing resulted in 112 unique dust events that were used in obtaining the GPM data.
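The minimum-extent rule for same-day dust events can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the procedure used in the study (where the precipitation data were obtained manually): the `BoundingBox` layout, the `minimum_extent` helper and the coordinates are hypothetical, and the sketch assumes boxes that do not cross the antimeridian.

```python
from typing import Iterable, NamedTuple


class BoundingBox(NamedTuple):
    """Geographic box in decimal degrees (hypothetical layout)."""
    west: float
    south: float
    east: float
    north: float


def minimum_extent(boxes: Iterable[BoundingBox]) -> BoundingBox:
    """Return the smallest box that contains every box in `boxes`.

    Assumes no box crosses the antimeridian (west <= east for all boxes).
    """
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one bounding box is required")
    return BoundingBox(
        west=min(b.west for b in boxes),
        south=min(b.south for b in boxes),
        east=max(b.east for b in boxes),
        north=max(b.north for b in boxes),
    )


# Two made-up events on the same day in the same general area
# (illustrative coordinates, not study data).
same_day_events = [
    BoundingBox(west=-20.0, south=63.0, east=-18.0, north=64.0),
    BoundingBox(west=-19.0, south=63.5, east=-17.0, north=64.5),
]
merged = minimum_extent(same_day_events)
print(merged)
```

The merged box takes the outermost edge on each side, so a single precipitation query covers every dust event recorded for that day and location.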
The overall aim of this project is to add to the investigation into how the presence of dust affects the amounts of precipitation in a region and to gain insights into a timeline for how long the effect, if any, can be observed. This study also looks to predict precipitation amounts based on the days surrounding the dust event, specifically the 4-day separation. As such, the precipitation was first classified based on intensity.
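An intensity classification of this kind amounts to binning each precipitation amount by a set of class limits. The sketch below shows one way to do that; the limits in `UPPER_LIMITS_MM` are placeholder values chosen for illustration and are not the limits used in the study, and `classify_intensity` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import Sequence

# Placeholder class limits in mm/day (assumed values, NOT the study's limits).
# With these limits: [0, 2.5) -> Light, [2.5, 10.0) -> Moderate, >= 10.0 -> Heavy.
UPPER_LIMITS_MM = [2.5, 10.0]
LABELS = ["Light", "Moderate", "Heavy"]


def classify_intensity(
    amount_mm: float,
    upper_limits: Sequence[float] = UPPER_LIMITS_MM,
    labels: Sequence[str] = LABELS,
) -> str:
    """Map a precipitation amount to an intensity class.

    `upper_limits` must be sorted ascending and hold one fewer entry
    than `labels`; an amount equal to a limit falls in the higher class.
    """
    if amount_mm < 0:
        raise ValueError("precipitation amount cannot be negative")
    if len(labels) != len(upper_limits) + 1:
        raise ValueError("need exactly one more label than limits")
    return labels[bisect_right(upper_limits, amount_mm)]


# Made-up amounts for illustration only.
for amount in (0.4, 2.5, 37.0):
    print(amount, classify_intensity(amount))
```

Keeping the limits and labels as parameters makes it straightforward to try alternative classifications by passing different lists.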
In order to optimize the model to achieve the goal of splitting the data into four separate leaf nodes distinctive of each specific day, multiple variations of the Initial Classification were tested. These included changing the limits of each of the classes; removing a class, that is, only having “Light”, “Moderate” and “Heavy”; and adding extra classes such as “Trace” and “Very Light”. The most successful classification in terms of achieving the goal was an expanded classification in which the lower end of the spectrum, where most of the data were concentrated, was broken into two classes and the ranges for the other categories were expanded, as shown in the Table below. The preliminary analysis resulted in the decision tree model shown below, which split the data into 3 groups: “day of” the dust event, the “day before” and the days after (“1 day after” and “2 days after”).
In order to determine what was contributing to the significance in the Chi-Square Test, the residuals, that is, the difference between the observed and the expected frequencies, were considered.
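As a rough illustration of this step, the sketch below computes the expected frequencies for a contingency table under independence, then the raw residuals (observed minus expected) and the Pearson residuals (raw residual divided by the square root of the expected frequency). The table values are made up for illustration and are not results from the study; the function names and the row and column meanings are likewise assumptions.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Table = List[List[float]]


def expected_frequencies(observed: Sequence[Sequence[float]]) -> Table:
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def residuals(observed: Sequence[Sequence[float]]) -> Tuple[Table, Table]:
    """Return (raw, pearson) residual tables for a contingency table."""
    expected = expected_frequencies(observed)
    raw = [
        [o - e for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]
    pearson = [
        [(o - e) / sqrt(e) for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]
    return raw, pearson


# Made-up counts (NOT study data).
# Rows: dust present, dust absent. Columns: lighter, heavier precipitation.
observed = [
    [40, 5],
    [30, 25],
]
raw, pearson = residuals(observed)
chi_square = sum(p * p for row in pearson for p in row)
print("raw residuals:", raw)
print("Pearson residuals:", pearson)
print("chi-square statistic:", chi_square)
```

A negative residual in a cell means fewer observations than independence would predict, so the sign pattern across cells shows which combinations drive a significant test and in which direction.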
The relationship between dust and precipitation in the high-latitude regions of Alaska, Iceland and Patagonia was investigated through multiple correlation analysis techniques and decision tree modeling. Results from this study indicate that when high-latitude dust is absent, observed precipitation can increase. The high-latitude regions exhibited lower levels of precipitation regardless of the presence or absence of dust, and heavier precipitation was only seen (as outliers) in the absence of dust. Additionally, the correlation analyses suggest a statistically significant relationship between high-latitude dust and precipitation, with p-values less than 0.05. The relationship appears to be negative based on the residuals from the Chi-Square Test as well as the correlation coefficients from the rank correlations, which corresponds with initial observations made during the exploratory data analysis.