Estimation of suspended sediment load using regression trees and model trees approaches (Case study: Hyderabad drainage basin in Iran)

doi:10.1080/09715010.2016.1264894


ISH Journal of Hydraulic Engineering

ISSN: 0971-5010 (Print) 2164-3040 (Online) Journal homepage: http://www.tandfonline.com/loi/tish20


Ali Talebi, Javad Mahjoobi, Mohammad Taghi Dastorani & Vahid Moosavi

To cite this article: Ali Talebi, Javad Mahjoobi, Mohammad Taghi Dastorani & Vahid Moosavi (2016): Estimation of suspended sediment load using regression trees and model trees approaches (Case study: Hyderabad drainage basin in Iran), ISH Journal of Hydraulic Engineering, DOI: 10.1080/09715010.2016.1264894

To link to this article: http://dx.doi.org/10.1080/09715010.2016.1264894

Published online: 30 Dec 2016.



Ali Talebiᵃ, Javad Mahjoobiᵇ, Mohammad Taghi Dastoraniᶜ and Vahid Moosavi

ᵃ Faculty of Natural Resources, Yazd University, Yazd, Iran; ᵇ Ministry of Energy, Yazd Regional Water Authority, Yazd, Iran; ᶜ Faculty of Natural Resources and Environment, Ferdowsi University of Mashhad, Mashhad, Iran

ABSTRACT

Estimation of suspended sediment load is an important topic in river engineering, and different methods are used to estimate the sediment rate. In recent years, artificial intelligence (AI) methods, such as the artificial neural network (ANN), have been used to estimate sediment in rivers. In this research, the suspended sediment load was studied using regression trees (RTs) and model trees (MTs). The study area is the Hyderabad watershed in the west of Iran. The model inputs were the flow discharge, the sum of three days of discharge and the sum of five days of precipitation; the model output was the suspended sediment discharge. The data set comprised 223 sediment discharge records. The results were compared with the ANN method (feed forward back propagation algorithm) and the sediment rating curve (SRC). RT and MT outperformed the ANN method in the study area. The SRC was more accurate than the AI models for daily sediment discharges below 100 ton per day, while the AI models were more accurate for high sediment discharges. Moreover, the combination of AI models was more accurate than each model alone.

1. Introduction

Estimation of the sediment transport rate is one of the basic problems in river engineering. Several empirical methods have been developed to solve this problem. Because these methods were derived under the climatic conditions of other parts of the world, they produce large errors when they are used in the rivers of Iran. One of the common methods for estimating the suspended load in rivers is the rating curve method, in which the relation between flow discharge and sediment discharge is expressed as a power equation. In recent years, methods based on artificial intelligence (AI) and machine learning have been used for the estimation and prediction of different phenomena in river engineering. The artificial neural network (ANN) is one of these methods and has been used by many scientists for estimating sediment rates in rivers (Abrahart and White 2001; Jain 2001; Nagy et al. 2002; Tayfur 2002; Merritt et al. 2003; Yitian and Gu 2003; Cigizoglu 2004; Kisi 2004; Agarwal et al. 2005; Cigizoglu and Alp 2006; Cigizoglu and Kisi 2006; Cigizoglu and Alp 2007; Dogan et al. 2007).
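As an illustration of the rating curve idea mentioned above, the following minimal sketch fits a power equation of the assumed form Qs = a·Q^b by ordinary least squares on log-transformed values. The coefficient names `a` and `b`, the function names, the log-log fitting choice and the sample values are assumptions made for this example; they are not taken from the study.

```python
import math


def fit_rating_curve(discharge, sediment):
    """Fit Qs = a * Q**b by least squares on log-transformed data.

    Both inputs must contain strictly positive values, because the fit
    works on logarithms.
    """
    xs = [math.log(q) for q in discharge]
    ys = [math.log(s) for s in sediment]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space (exponent)
    a = math.exp(mean_y - b * mean_x)  # intercept back-transformed (coefficient)
    return a, b


def predict_sediment(a, b, discharge):
    """Evaluate the fitted power equation for one flow discharge value."""
    return a * discharge ** b


# Synthetic, illustrative values only (not data from the study).
flows = [0.5, 2.0, 10.0, 50.0, 150.0]
loads = [2.0 * q ** 1.5 for q in flows]
a, b = fit_rating_curve(flows, loads)
```

Because the synthetic loads were generated from an exact power law, the fit recovers the generating coefficient and exponent; with real records the scatter around the curve is what motivates the data-driven models discussed next.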

Decision trees (DTs) are another common and powerful tool for prediction and classification. In contrast to ANN, a DT produces rules. This means that a DT bases its prediction on a rule set, while the procedure in ANN is not transparent and behaves like a black box. Recently, applications of regression trees (RTs) and model trees (MTs) have been presented in the water resource engineering field. Mahjoobi and Etemad-Shahidi (2008) predicted the wave height due to wind in Lake Michigan using RT with the classification and regression trees (CART) algorithm. Ayoubloo et al. (2010) investigated regular wave scour around a circular pile using regression trees (CART algorithm). Moreover, Etemad-Shahidi and Mahjoobi (2009) predicted the significant wave height in Lake Superior using an MT model with the M5′ algorithm. MTs have also been applied in rainfall–runoff modeling (Solomatine and Dulal 2003), flood forecasting (Solomatine and Yunpeng 2004), modeling the water-level discharge relationship (Bhattacharya and Solomatine 2005), sediment transport (Bhattacharya et al. 2007), derivation of wave spectrum (Sakhare and Deo 2009), estimation of wind speed from wave measurements (Daga and Deo 2009) and prediction of suspended sediment load in rivers. Reddy and Ghimire (2009) applied the M5 MT and Gene Expression Programming to predict suspended sediment load. They also compared the results with the sediment rating curve (SRC) and multiple linear regressions (MLRs) and concluded that MT performs well compared with the other models used. Etemad-Shahidi and Ghaemi (2011) used the MT method to predict pile group scour due to waves. They presented new equations using the MT method and demonstrated that the proposed equations were as accurate as other soft computing methods, such as ANN and SVM. Bonakdar and Etemad-Shahidi (2011) predicted wave run-up on rubble-mound structures using the M5 MT. They stated that the main advantage of MTs, unlike the other soft computing tools, is their easier use and, more importantly, their understandable mathematical rules. They showed that the predictive accuracy of the MT approach was superior to

that of Van der Meer and Stam’s empirical formula.

KEYWORDS: Suspended sediment; CART; M5′ algorithm; ANN; sediment rating curve

ARTICLE HISTORY: Received 28 May 2016; Accepted 22 November 2016

CONTACT: Ali Talebi, talebisf@yazd.ac.ir

Wolfs and Willems (2014) developed discharge-stage curves using several approaches, i.e., single rating curves, rating curves with dynamic correction, ANNs and M5′ MTs. They showed that all of these methods outperformed the

traditional rating curve. Abolfathi et al. (2016) used the M5′ DT algorithm to predict wave run-up from existing laboratory data. They demonstrated that the M5′ MT algorithm predicted the wave run-up with high precision. They also showed good agreement between the proposed run-up formulae and existing empirical relations. Zounemat-Kermani et al. (2016) used 8-year data series from hydrometric stations located in Arkansas, Delaware and Idaho (USA) to assess the ability of ANN and support vector regression (SVR) models to forecast/estimate daily suspended sediment concentrations and to compare the results with traditional MLR and SRC models. They tested three different ANN model algorithms, along with four different SVR model kernels, and showed that ANN and SVR outperformed the traditional methods. Shamaei and Kaedi (2016) introduced a stacking method to predict suspended sediment. They used linear genetic programming and neuro-fuzzy methods, two successful soft computing methods, to predict the suspended sediment. Then, they increased the accuracy of prediction by combining the two results with a neural network meta-model based on cross validation. The results demonstrated that the stacking method greatly improved the root mean square error (RMSE) and R² statistics compared with using linear genetic programming or neuro-fuzzy alone. Makarynskyy et al. (2015) used

two numerical current and wave models in addition to the AI technique of neural networks (ANNs) to reproduce values of sediment concentrations observed at two sites. They showed that the ANN method provides accurate results. Nourani et al. (2016) used a two-stage modeling strategy to handle the spatio-temporal variation of suspended sediment load (SSL). In the temporal stage, they used a support vector machine (SVM) to find the nonlinear relationship of SSL in the time domain. In the spatial modeling stage, they computed the semivariogram of monthly SSL data and then fitted a theoretical semivariogram model to the empirical variogram. The results showed that the hybrid of SVM and spatial statistics methods could predict and simulate SSL appropriately by combining the strengths of both approaches. Chen and Chau (2016) used a hybrid double feedforward neural network (HDFNN) model for daily SSL estimation, combining fuzzy pattern-recognition and the continuity equation into a structure of double neural networks. They showed that HDFNN is appropriate for modeling the sediment transport process with nonlinear, fuzzy and time-varying characteristics. Shiau and Chen (2015) developed a probabilistic estimation scheme for daily and annual suspended sediment loads using quantile regression. They used daily suspended sediment load and discharge data to construct quantile-dependent SRCs. Their approach was applied to the Laonung station located in southern Taiwan. The results indicated that the approach provided not only a probabilistic description of daily and annual suspended sediment loads, but also single estimates, including the mean, median and mode of the derived probability distribution.

The main purpose of this research is to apply the RT model (CART algorithm) and MTs (M5′ algorithm) to estimate the suspended sediment load in the Hyderabad watershed, western Iran. In addition, the results of these two methods are compared with the SRC method and an ANN model (feed forward back propagation algorithm).

2. Materials and methods

2.1. Study area and data

This research was carried out on the Hyderabad watershed in Kermanshah province in the western part of Iran (Figure 1). The total area of the watershed is 1719 km², the mean height is 1871 m, the maximum height is 3300 m and the minimum height is 1325 m. The watershed lies between longitudes 47° 04′–47° 52′ and latitudes 34° 25′–34° 52′. The main river of the watershed is the permanent Jamishan river. The meteorological station of the watershed is the Hyderabad station, at 47° 27′ longitude and 34° 42′ latitude (Figure 2). The precipitation regime of the study area is rainy-snowy, and the mean annual precipitation is 420 mm, most of which occurs in winter and spring.

Figure 1. The position of the study area in Iran. Source: The authors.

The data used in this research comprise precipitation, flow discharge and sediment discharge. The data period is 21 years long (from 1985 to 2006), with a total of 223 samples. Eighty percent of these data were used for training and 20% for testing and evaluating the models. One of the problems that can occur during the training process is overfitting: the error on the training set is driven to a very small value, but when new data are presented to the network the error is large. The network has memorized the training examples, but it has not learned to generalize to new situations. One of the main ways to avoid overfitting (or to recognize it if it occurs) is to separate the data into training and test data sets. The training subset is usually composed of 60–80% of all the records, and the remaining records are used as the test data set. Gharagheizi (2007) showed that the percentage of the main data set allocated to the test set should be between 5% and 35%. If this percentage is lower than 5%, the accuracy of the model over the training set is much greater than over the test set. Also, if the percentage is greater than 40%, the obtained model cannot predict the test set as well as the training set. Each record should be randomly chosen from the data set and placed in one of the two subsets. Therefore, the data were separated into training and test data sets using a common random method. The ranges and average values of water and sediment discharge for training and testing are shown in Table 1.
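A random 80/20 separation of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_records`, the fixed seed and the use of integer record identifiers are hypothetical, and the text does not specify which random procedure was actually used.

```python
import random


def split_records(records, test_fraction=0.2, seed=42):
    """Randomly place each record in either the training or the test subset.

    `seed` is an assumed, arbitrary value used only to make the sketch
    reproducible.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(records)               # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # random order, reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 223 placeholder record identifiers standing in for the sediment samples.
record_ids = list(range(223))
train_ids, test_ids = split_records(record_ids)
```

Every record ends up in exactly one subset, so any error measured on the test subset comes from records the model has not seen during training.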

2.2. RTs (CART algorithm)

The CART method developed by Breiman et al. (1984) generates binary DTs. CART is a nonparametric statistical methodology developed for analyzing classification issues with either categorical or continuous dependent variables. If the dependent variable is categorical, CART produces a classification tree. When the dependent variable is continuous, it produces an RT. The CART tree is constructed by repeatedly splitting subsets of the data set, using all predictor variables, to create two child nodes, beginning with the entire data set. The best predictor is chosen using a variety of impurity or diversity measures. The goal is to produce subsets of the data that are as homogeneous as possible with respect to the target variable. For each split in the CART algorithm, each predictor is evaluated to find the best cut point (continuous predictors) or groupings of categories (nominal and ordinal predictors) based on the improvement score or reduction in impurity (Breiman et al. 1984). Then, the predictors are compared and the predictor with the best improvement is selected for the split. The process repeats recursively until one of the stopping rules is triggered. RT building centers on three major components: (1) a set of questions of the form "is X ≤ d?", where X is a variable and d is a constant; (2) goodness of split criteria for choosing the best split on a variable; and (3) the generation of summary statistics for terminal nodes. The least-squared deviation (LSD) impurity measure is used for splitting rules and goodness of fit criteria. The LSD measure R(t) is simply the weighted within-node variance for node t, and it is equal to the resubstitution estimate of risk for the node (Breiman et al. 1984). It is defined as:

$$R(t) = \frac{1}{N_W(t)} \sum_{i \in t} \omega_i f_i \left(y_i - \bar{y}(t)\right)^2 \qquad (1)$$

$$\bar{y}(t) = \frac{1}{N_W(t)} \sum_{i \in t} \omega_i f_i y_i \qquad (2)$$

$$N_W(t) = \sum_{i \in t} \omega_i f_i \qquad (3)$$

where $N_W(t)$ is the weighted number of records in node t, $\omega_i$ is the value of the weighting field for record i (if any), $f_i$ is the value of the frequency field (if any), $y_i$ is the value of the target field, and $\bar{y}(t)$ is the mean of the dependent variable (target field) at node t. The LSD criterion function for split s at node t, Q(s, t), is defined in Equation (4).

Figure 2.Drainage network and hydrometry station in Hydrabad watershed. Source: The authors.

Table 1.Ranges and average values of diﬀerent parameters in training and test data sets.

Parameters

Training data set Test data set

Minimum Maximum Average Minimum Maximum Average

Water discharge (m

3

/s) 0.04 285.41 17.95 0.25 188.67 23.35

Suspended sediment discharge (ton/day) 0.0001 215.897 2027.66 0.397 68,107.35 2696.08


calculated by averaging the absolute difference between the predicted value and the actual value for each of the training examples that reach that node. This underestimates the expected error outside the calibration data, so the expected error is multiplied by (n + v)/(n − v), where n is the number of training instances that reach the node and v is the number of parameters in the model that represents the value at that node (Wang and Witten 1997). After pruning, the adjacent linear models will be sharply discontinuous at the leaves of the pruned tree. M5 applies a smoothing process that combines the model at a leaf with the models on the path to the root to form the final model that is placed at the leaf. In the smoothing process, the estimated value of the leaf model is filtered along the path back to the root. At each node, that value is combined with the value predicted by the linear model for that node according to Equation (6), where P′ is the prediction passed up to the next higher node, p is the prediction passed to this node from below, q is the value predicted by the model at this node, n is the number of training instances that reach the node below, and k is a constant (Wang and Witten 1997). The experiments of Wang and Witten (1997) showed that smoothing substantially increases the accuracy of predictions.
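The error adjustment and the smoothing rule described above can be sketched in a few lines. This is a minimal illustration, not the M5′ implementation itself: the function names are hypothetical, and the default `k = 15.0` is an assumed illustrative value, since the text only states that k is a constant.

```python
def adjusted_error(mean_abs_error, n, v):
    """Inflate a node's average absolute training error by (n + v) / (n - v).

    n is the number of training instances reaching the node and v is the
    number of parameters of the model at that node.
    """
    if n <= v:
        raise ValueError("the factor requires more instances than parameters")
    return mean_abs_error * (n + v) / (n - v)


def smooth_prediction(p, q, n, k=15.0):
    """Equation (6): P' = (n*p + k*q) / (n + k). The default k is assumed."""
    return (n * p + k * q) / (n + k)


def smooth_along_path(leaf_prediction, path, k=15.0):
    """Filter a leaf prediction back to the root.

    `path` lists (n_below, q) pairs from the node just above the leaf up to
    the root: n_below is the number of training instances reaching the node
    below, q is the value predicted by the linear model at the current node.
    """
    p = leaf_prediction
    for n_below, q in path:
        p = smooth_prediction(p, q, n_below, k)
    return p


# Illustrative numbers only.
penalised = adjusted_error(2.0, n=10, v=2)
smoothed = smooth_along_path(10.0, [(15, 20.0), (45, 15.0)])
```

With few instances below a node (small n), the prediction is pulled more strongly toward the model of the ancestor node; with many instances, the lower prediction dominates.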

2.4. ANNs and SRC

ANNs are powerful nonlinear modeling approaches based on the function of the human brain. They can identify and learn correlated patterns between input data sets and target values. Neural networks can be described as a network of simple processing nodes or neurons, interconnected in a specific order, performing simple numerical manipulations (See and Openshaw 1999). A three-layered neural network consists of several elements called nodes. These networks are made up of an input layer consisting of nodes representing

$$P' = \frac{np + kq}{n + k} \qquad (6)$$

In Equation (4), $R(t_R)$ is the sum of squares of the right child node and $R(t_L)$ is the sum of squares of the left child node. The split s is chosen to maximize the value of Q(s, t). Stopping rules control how the algorithm decides when to stop splitting nodes in the tree. Tree growth proceeds until every leaf node in the tree triggers at least one stopping rule. Any of the following conditions will prevent a node from being split:

(1) All records in the node have the same value for all predictor fields used by the model.

(2) The number of records in the node is less than the minimum parent node size (user defined).

(3) The number of records in any of the child nodes resulting from the node’s best split is less than the minimum child node size (user defined).

(4) The best split for the node yields a decrease in impurity that is less than the minimum change in impurity (user defined).

In RTs, each terminal node’s predicted value is the weighted mean of the target values for records in the node, $\bar{y}(t)$.
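The split search described in this section can be sketched for a single continuous predictor as follows. This is a hedged illustration rather than the CART implementation used in the study: the function names are hypothetical, each weight `w[i]` stands for the product ω_i·f_i (1.0 when no weighting or frequency field is present), candidate cut points are taken as midpoints between consecutive distinct values, and Q(s, t) is evaluated with weighted sums of squares, following the wording given for Equation (4).

```python
def weighted_mean(y, w):
    """Equation (2): weighted mean of the target values in a node."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def sum_of_squares(y, w):
    """Weighted sum of squared deviations from the node mean."""
    mean = weighted_mean(y, w)
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))


def node_variance(y, w):
    """Equation (1): weighted within-node variance R(t)."""
    return sum_of_squares(y, w) / sum(w)


def best_split(x, y, w=None, min_child=1):
    """Return (d, Q) for the question 'x <= d?' that maximises Q(s, t).

    Returns None when no admissible split exists (stopping rules 1 or 3).
    """
    if w is None:
        w = [1.0] * len(y)
    parent = sum_of_squares(y, w)
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        d = (lo + hi) / 2.0  # assumed candidate cut point: the midpoint
        left = [i for i, xi in enumerate(x) if xi <= d]
        right = [i for i, xi in enumerate(x) if xi > d]
        if len(left) < min_child or len(right) < min_child:
            continue  # minimum child node size not met
        q = (parent
             - sum_of_squares([y[i] for i in left], [w[i] for i in left])
             - sum_of_squares([y[i] for i in right], [w[i] for i in right]))
        if best is None or q > best[1]:
            best = (d, q)
    return best


# Illustrative values only: two clearly separated groups of records.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
cut, improvement = best_split(x, y)
```

A full tree would apply `best_split` to every predictor at every node, keep the predictor with the largest improvement, and recurse until one of the stopping rules listed above is triggered.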

2.3. MTs (M5′ algorithm)

MTs (Quinlan 1992) are an extension of RTs in the sense that they associate leaves with multivariate linear models. MTs are a technique for dealing with continuous class problems that provides a structural representation of the data and a piecewise linear fit of the class. They have a conventional DT structure but use linear functions at the leaves instead of discrete class labels (Figure 3). M5 MTs were first introduced by Quinlan (1992), and the idea was then reconstructed and improved in a system called M5′ by Wang and Witten (1997). An M5′ MT is an effective learning method for predicting real values. The M5′ MT algorithm first constructs an RT by recursively splitting the instance space. The splitting criterion is used to minimize the intrasubset variability in the values down from the root through the branch to the node. The variability is measured by the standard deviation of the values that reach that node from the root through the branch, and the expected reduction in error is calculated as a result of testing each attribute at that node. The attribute that maximizes the expected error reduction is chosen. The splitting stops if the values of all instances that reach a node vary only slightly or only a few instances remain. The standard deviation reduction (SDR) is calculated by Equation (5), where T is the set of examples that reach the node, $T_i$ are the sets that result from splitting the node according to the chosen attribute and sd is the standard deviation (Wang and Witten 1997). After the tree has been grown, M5′ computes a linear multiple regression model for every interior node. The data associated with that node and only the attributes tested in the subtree rooted at that node are used in the regression. Attributes are dropped one by one if doing so lowers the estimated error. Then the tree is pruned from the leaves if that results in a lower expected estimated error. In the implementation of Wang and Witten (1997), the expected error is

$$Q(s, t) = R(t) - R(t_L) - R(t_R) \qquad (4)$$

$$\mathrm{SDR} = \mathrm{sd}(T) - \sum_i \frac{|T_i|}{|T|} \times \mathrm{sd}(T_i) \qquad (5)$$
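Equation (5) can be checked numerically with a short sketch. The function names are hypothetical, and the use of the population standard deviation (dividing by the number of values) is an assumption, since the text does not say which form of the standard deviation is used.

```python
import math


def sd(values):
    """Population standard deviation of the target values (assumed form)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def sdr(parent, subsets):
    """Equation (5): SDR = sd(T) - sum_i |T_i| / |T| * sd(T_i)."""
    total = len(parent)
    return sd(parent) - sum(len(s) / total * sd(s) for s in subsets)


# Illustrative target values only.
targets = [1.0, 1.0, 9.0, 9.0]
good_split = sdr(targets, [[1.0, 1.0], [9.0, 9.0]])  # homogeneous subsets
poor_split = sdr(targets, [[1.0, 9.0], [1.0, 9.0]])  # subsets as mixed as T
```

A split that separates the low and high values removes all of the spread, while a split that leaves each subset as mixed as the parent yields no reduction, which is why the attribute with the largest SDR is preferred.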

Figure 3. MT used to split the input space (X_i: inputs, LM_i: linear models). The tree splits the training data set on the inputs X1, X2 and X3 at thresholds a to e and ends in the linear models LM1 to LM6.


References

Neural Networks: A Comprehensive Foundation

Classification and Regression Trees.


A review of erosion and sediment transport models

Induction of model trees for predicting continuous classes
