The National Aquarium is the state coordinator for the Ocean Conservancy’s International Coastal Cleanup (ICC) events in Maryland. The ICC program engages volunteers around the world to clean up marine debris and record the items they collect along with the location of each cleanup. These data are published in the annual Ocean Trash Index, an item-by-item, location-by-location accounting of marine debris picked up by volunteers. The main task of this project was to use this dataset to predict likely high-risk marine debris locations in the Mid-Atlantic region, to better inform and target future reduction efforts.
The cleanup records are not an unbiased dataset, because they exist only where people have done cleanups; no information is available about where there is no debris. The dataset is therefore biased toward urban development and population, and finding the best predictive modeling approach was the major difficulty of this project. Xiao performed data cleaning, feature engineering, and exploratory analysis, and tried multiple modeling approaches to predict high-risk marine debris locations. These included machine learning methods (decision tree, random forest, gradient boosting, and extreme gradient boosting) as well as geostatistical analysis, including kriging interpolation. For the final result, she aggregated the geostatistical regression result (kriging) to the county level for the whole Mid-Atlantic region and created a map of the high-risk areas for marine debris to better inform future cleanup efforts.
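For readers unfamiliar with kriging, the final step (interpolating debris levels between cleanup sites, then averaging per county) can be sketched roughly as follows. This is a minimal illustration only, not the project's actual code: the site coordinates, debris counts, and county labels are invented toy values, the exponential variogram parameters are assumed rather than fitted, and a real analysis would likely use a geostatistics library and true county boundaries.

```python
import math
from collections import defaultdict

def variogram(h, nugget=0.0, sill=1.0, rng=2.0):
    """Exponential semivariogram; parameters are illustrative, not fitted."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target):
    """Predict the value at `target` from observations at (x, y) `points`."""
    n = len(points)
    # Semivariances between observations, plus a Lagrange row and column
    # that force the kriging weights to sum to 1.
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))

def aggregate_by_region(cells, points, values):
    """Average kriged predictions over grid cells labelled with a region name."""
    by_region = defaultdict(list)
    for region, xy in cells:
        by_region[region].append(ordinary_kriging(points, values, xy))
    return {region: sum(v) / len(v) for region, v in by_region.items()}

# Toy cleanup sites (x, y) and debris counts: invented for illustration.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
counts = [120.0, 90.0, 100.0, 10.0, 20.0, 15.0]

# Hypothetical grid cells, each tagged with a made-up county label.
cells = [("county_a", (0.5, 0.5)), ("county_a", (1.0, 1.0)),
         ("county_b", (4.5, 4.5)), ("county_b", (4.0, 4.5))]

risk = aggregate_by_region(cells, sites, counts)
for county, level in sorted(risk.items()):
    print(county, round(level, 1))
```

Note that kriging only interpolates between places where cleanups happened, so a sketch like this inherits the sampling bias described above: areas with no cleanup records are estimated from their neighbors rather than observed.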