Assessment of Sources Affecting and Prediction of Event Mean Concentrations (EMCs) of Nutrients and Sediment in Urban Runoff Using Supervised Machine Learning Approaches

Urbanization results in higher runoff, sediment, and nutrient loadings, which degrade downstream water quality. Event mean concentration (EMC) is a common method, used in most watershed models, for estimating nutrient and sediment loads in urban runoff. Land use and antecedent dry period (ADP) are known factors affecting water quality (WQ) EMCs; however, several studies have shown no significant correlation between WQ EMCs and ADP. The objectives of this study are to (1) determine which parameters, whether climatological information or catchment characteristics, most significantly affect WQ EMCs and (2) estimate WQ EMCs based on the most significant parameters.
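For readers unfamiliar with the EMC method, the following is a minimal sketch of the usual flow-weighted definition: total pollutant mass in an event divided by total runoff volume, with the event load recovered as EMC times runoff volume. The abstract does not give a formula, so the function names, units, and the fixed sampling interval are assumptions made for illustration only.

```python
def event_mean_concentration(concs_mg_per_l, flows_l_per_s, dt_s):
    """Flow-weighted mean concentration (mg/L) over one storm event.

    Assumes paired concentration and flow samples taken at a fixed
    interval dt_s (seconds); these names and units are hypothetical.
    """
    if len(concs_mg_per_l) != len(flows_l_per_s):
        raise ValueError("concentration and flow series must have equal length")
    # Total runoff volume (L) and total pollutant mass (mg) for the event.
    volume_l = sum(q * dt_s for q in flows_l_per_s)
    if volume_l <= 0:
        raise ValueError("total runoff volume must be positive")
    mass_mg = sum(c * q * dt_s for c, q in zip(concs_mg_per_l, flows_l_per_s))
    return mass_mg / volume_l


def event_load_kg(emc_mg_per_l, runoff_volume_l):
    """Pollutant load (kg) implied by an EMC and a runoff volume."""
    return emc_mg_per_l * runoff_volume_l / 1e6  # mg -> kg


# Illustrative numbers only, not taken from any dataset.
emc = event_mean_concentration([10.0, 20.0], [1.0, 3.0], dt_s=60)
load = event_load_kg(emc, runoff_volume_l=1_000_000)
```

Because the average is weighted by flow, samples taken at peak discharge dominate the EMC, which is why it is not simply the arithmetic mean of the sampled concentrations.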

Urban runoff quality data were obtained from a U.S. dataset, the National Stormwater Quality Database (NSQD), which stores monitoring results from over 5000 storm events at 308 catchments that are homogeneous with respect to land use.

A Bayesian Network Structure Learner (BNSL), a supervised machine learning approach, was used to assess the relationships among catchment characteristics, climatological information, and WQ EMCs for each land use. The optimal Bayesian network (BN) structure was then used to determine which parameters affect WQ EMCs the most. Random Forest (RF), another supervised machine learning approach, was applied to the more than 5000 storm events to estimate WQ EMCs from homogeneous catchments.
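The Random Forest step can be sketched as follows, using scikit-learn's RandomForestRegressor. This is not the authors' implementation: the feature names, the hyperparameters, and above all the data are assumptions. The data are synthetic and generated only so the example runs; they are not NSQD records and say nothing about the study's findings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_events = 500

# Hypothetical predictors; the study's actual feature set is not listed here.
feature_names = ["rainfall_depth_mm", "rainfall_duration_h",
                 "surface_slope_pct", "adp_days"]
X = np.column_stack([
    rng.uniform(1, 50, n_events),    # rainfall depth
    rng.uniform(0.5, 24, n_events),  # rainfall duration
    rng.uniform(0.5, 10, n_events),  # surface slope
    rng.uniform(0, 30, n_events),    # antecedent dry period
])

# Synthetic target (stand-in for an EMC in mg/L), constructed by hand.
y = 100 + 2 * X[:, 0] + 5 * X[:, 2] + rng.normal(0, 5, n_events)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Held-out R^2 and impurity-based importances (which sum to 1).
r2 = model.score(X_test, y_test)
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
print(f"held-out R^2: {r2:.3f}")
```

With real data, one model would presumably be fitted per WQ constituent. Impurity-based importances can be biased toward high-cardinality features, so permutation importance on held-out events is a common cross-check.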

The results demonstrated that (1) BNSL and RF are powerful approaches for discovering relationships between various parameters and WQ EMCs and for estimating WQ EMCs from catchments with homogeneous land use, and (2) other factors (such as rainfall depth and duration, and surface slope) exert a more important influence on WQ EMCs than ADP.