Predicting non-deposition sediment transport in sewer pipes using Random Forest.

Designing sediment-carrying sewer systems is a well-known field of research in hydraulic engineering

/var/www/pavco.com.co/public/site/pdftohtml/0a11d602f962b8945d803abdfbf03967/index-html.html
background image

Water

 Research 189 (2021) 116639 

Contents

 

lists

 

available

 

at

 

ScienceDirect

 

Water

 

Research

 

journal

 

homepage:

 

www.elsevier.com/locate/watres

 

Predicting

 

non-deposition

 

sediment

 

transport

 

in

 

sewer

 

pipes

 

using

 

Random

 

forest

 

Carlos

 

Montes

 

a

 

,

 

,

 

Zoran

 

Kapelan

 

b

 

,

 

Juan

 

Saldarriaga

 

a

 

a

 Department of Civil and Environmental Engineering, Universidad de los Andes, Bogotá, Colombia 

b

 Department of Water Management, Delft University of Technology, Delft, Netherlands 

a

 

r

 

t

 

i

 

c

 

l

 

e

 

i

 

n

 

f

 

o

 

Article

 history: 

Received

 15 July 2020 

Revised

 29 October 2020 

Accepted

 12 November 2020 

Available

 online 13 November 2020 

Keywords:

 

Non-deposition

 

Random

 forest 

Sediment

 transport 

Self-cleansing

 

Sewer

 systems 

a

 

b

 

s

 

t

 

r

 

a

 

c

 

t

 

Sediment

 

transport

 

in

 

sewers

 

has

 

been

 

extensively

 

studied

 

in

 

the

 

past.

 

This

 

paper

 

aims

 

to

 

propose

 

a

 

new

 

method

 

for

 

predicting

 

the

 

self-cleansing

 

velocity

 

required

 

to

 

avoid

 

permanent

 

deposition

 

of

 

material

 

in

 

sewer

 

pipes.

 

The

 

new

 

Random

 

Forest

 

(RF)

 

based

 

model

 

was

 

implemented

 

using

 

experimental

 

data

 

col-

 

lected

 

from

 

the

 

literature.

 

The

 

accuracy

 

of

 

the

 

developed

 

model

 

was

 

evaluated

 

and

 

compared

 

with

 

ten

 

promising

 

literature

 

models

 

using

 

multiple

 

observed

 

datasets.

 

The

 

results

 

obtained

 

demonstrate

 

that

 

the

 

RF

 

model

 

is

 

able

 

to

 

make

 

predictions

 

with

 

high

 

accuracy

 

for

 

the

 

whole

 

dataset

 

used.

 

These

 

predictions

 

clearly

 

outperform

 

predictions

 

made

 

by

 

other

 

models,

 

especially

 

for

 

the

 

case

 

of

 

non-deposition

 

with

 

de-

 

posited

 

bed

 

criterion

 

that

 

is

 

used

 

for

 

designing

 

large

 

sewer

 

pipes.

 

The

 

volumetric

 

sediment

 

concentration

 

was

 

identified

 

as

 

the

 

most

 

important

 

parameter

 

for

 

predicting

 

self-cleansing

 

velocity.

 

© 2020

 

Elsevier

 

Ltd.

 

All

 

rights

 

reserved.

 

1.

 

Introduction

 

Designing

 

sediment-carrying

 

sewer

 

systems

 

is

 

a

 

well-known

 

field

 

of

 

research

 

in

 

hydraulic

 

engineering.

 

This

 

interest

 

is

 

explained

 

by

 

the

 

problems

 

related

 

to

 

the

 

presence

 

of

 

material

 

in

 

the

 

systems.

 

Due

 

to

 

the

 

varying

 

environmental

 

conditions

 

(i.e.

 

loading

 

and

 

sedi-

 

ment

 

characteristics

 

and

 

intermittent

 

flow),

 

the

 

risk

 

of

 

building

 

up

 

a

 

permanent

 

sediment

 

deposit

 

increases

 

during

 

dry

 

weather

 

sea-

 

sons.

 

These

 

deposits

 

lead

 

to

 

problems

 

such

 

as

 

reduced

 

pipe

 

capac-

 

ity,

 

increased

 

roughness,

 

and

 

premature

 

overflows.

 

As

 

an

 

example,

 

Ackers

 

et

 

al.

 

(2001)

 

showed

 

that

 

the

 

presence

 

of

 

a

 

permanent

 

de-

 

posit

 

at

 

the

 

bottom

 

of

 

sewer

 

pipes

 

increases

 

hydraulic

 

roughness

 

and

 

reduces

 

discharge

 

capacity

 

by

 

about

 

20%.

 

The

 

most

 

common

 

criterion

 

to

 

avoid

 

permanent

 

deposit

 

of

 

ma-

 

terial

 

in

 

sewer

 

pipes

 

is

 

known

 

as

 

non-deposition.

 

Several

 

authors

 

(

 

Safari

 

et

 

al.,

 

2018

 

;

 

Vongvisessomjai

 

et

 

al.,

 

2010

 

)

 

have

 

classified

 

this

 

criterion

 

into

 

two

 

subgroups:

 

1)

 

Non-deposition

 

without

 

deposited

 

bed

 

and

 

2)

 

Non-deposition

 

with

 

deposited

 

bed.

 

Both

 

groups

 

are

 

based

 

on

 

the

 

presence

 

of

 

sediments

 

at

 

the

 

bottom

 

of

 

the

 

pipe.

 

In

 

the

 

first

 

case,

 

high

 

water

 

velocities

 

produce

 

an

 

individual

 

and

 

sep-

 

arate

 

movement

 

of

 

the

 

particles

 

by

 

slicing

 

or

 

rolling

 

over

 

the

 

pipe

 

invert,

 

i.e.

 

without

 

deposited

 

bed.

 

In

 

contrast,

 

the

 

second

 

case

 

is

 

seen

 

when

 

lower

 

water

 

velocities

 

are

 

presented

 

and

 

the

 

particles

 

are

 

grouped

 

and

 

move

 

as

 

a

 

transitional

 

deposited

 

bed.

 

Corresponding

 author at: Cra 1 Este No. 19A – 40 Bogota, Colombia. 

E-mail

 

addresses:

 

cd.montes1256@uniandes.edu.co

 

(C.

 

Montes),

 

Z.Kapelan@tudelft.nl

 

(Z.

 Kapelan), 

jsaldarr@uniandes.edu.co

 

(J.

 Saldarriaga). 

In

 

the

 

case

 

of

 

‘without

 

deposited

 

bed’,

 

traditional

 

criteria

 

of

 

minimum

 

velocities

 

and

 

shear

 

stress

 

values

 

are

 

commonly

 

found

 

in

 

water

 

utilities

 

standards

 

and

 

industry

 

design

 

codes.

 

Generally,

 

these

 

standards

 

and

 

codes

 

suggest

 

values

 

ranging

 

from

 

0.30

 

m

 

s

 

−1

 

to

 

1.0

 

m

 

s

 

−1

 

for

 

minimum

 

velocity

 

and

 

from

 

1.0

 

Pa

 

to

 

4.0

 

Pa

 

for

 

shear

 

stress

 

(

 

Montes

 

et

 

al.,

 

2019

 

;

 

Nalluri

 

and

 

Ab

 

Ghani,

 

1996

 

;

 

Vongvisessomjai

 

et

 

al.,

 

2010

 

).

 

Several

 

authors

 

(

 

Merritt

 

and

 

Enfin-

 

ger,

 

2019

 

;

 

Nalluri

 

and

 

Ab

 

Ghani,

 

1996

 

)

 

have

 

shown

 

how

 

tradi-

 

tional

 

threshold

 

values

 

lead

 

to

 

over-design

 

of

 

small

 

diameter

 

pipes

 

and

 

under-design

 

of

 

large

 

diameter

 

pipes

 

(as

 

a

 

rule-of-thumb,

 

pipes

 

with

 

diameter

 

greater

 

than

 

500

 

mm).

 

Consequently,

 

large

 

sewers

 

commonly

 

require

 

frequent

 

removal

 

of

 

sediment

 

deposits

 

(

 

Ackers

 

et

 

al.,

 

2001

 

)

 

because

 

of

 

the

 

minimum

 

self-cleansing

 

value

 

adopted

 

during

 

the

 

design

 

stage.

 

A

 

unique

 

design

 

value

 

is

 

inad-

 

equate;

 

hence

 

sediment

 

characteristics

 

and

 

hydraulic

 

conditions

 

must

 

be

 

included

 

in

 

the

 

definition

 

of

 

the

 

self-cleansing

 

design

 

cri-

 

terion.

 

According

 

to

 

Safari

 

and

 

Aksoy

 

(2020)

 

,

 

existing

 

traditional

 

self-

 

cleansing

 

criteria

 

can

 

be

 

up

 

to

 

20%

 

different

 

from

 

laboratory-

 

scale

 

measured

 

values.

 

The

 

channel

 

cross-section

 

is

 

relevant

 

in

 

the

 

choice

 

of

 

the

 

self-cleansing

 

criterion.

 

For

 

example,

 

rectangu-

 

lar

 

cross-sections

 

require

 

lower

 

velocities

 

compared

 

to

 

V-bottom

 

or

 

U-shape

 

channels.

 

Even

 

criteria

 

based

 

on

 

the

 

Shields

 

diagram,

 

such

 

as

 

the

 

Camp

 

criterion,

 

seem

 

to

 

be

 

inadequate

 

to

 

define

 

the

 

self-cleansing

 

value

 

due

 

to

 

the

 

non-inclusion

 

of

 

sediment

 

concen-

 

tration.

 

The

 

above

 

has

 

motivated

 

extensive

 

experimental

 

research

 

(

 

Ab

 

Ghani,

 

1993

 

;

 

El-Zaemey,

 

1991

 

;

 

May,

 

1993

 

;

 

May

 

et

 

al.,

 

1989

 

;

 

https://doi.org/10.1016/j.watres.2020.116639

 

0043-1354/© 2020 Elsevier Ltd. All rights reserved.

 

/var/www/pavco.com.co/public/site/pdftohtml/0a11d602f962b8945d803abdfbf03967/index-html.html
background image

C.

 Montes, Z. Kapelan and J. Saldarriaga 

Water

 Research 189 (2021) 116639 

Mayerle,

 

1988

 

;

 

Montes

 

et

 

al.,

 

2020a

 

,

 

2020b

 

;

 

Ota,

 

1999

 

;

 

Perrusquía,

 

1991

 

;

 

Vongvisessomjai

 

et

 

al.,

 

2010

 

)

 

aiming

 

to

 

collect

 

data

 

and

 

developing

 

models

 

for

 

predicting

 

the

 

self-cleansing

 

velocity

 

as

 

a

 

function

 

of

 

sediment

 

characteristics

 

and

 

sewer

 

hydraulics,

 

based

 

on

 

the

 

concept

 

of

 

non-deposition.

 

These

 

studies

 

have

 

been

 

car-

 

ried

 

out

 

at

 

laboratory

 

scale

 

under

 

well-controlled

 

and

 

steady

 

flow

 

conditions,

 

using

 

non-cohesive

 

sediments.

 

Different

 

authors

 

col-

 

lected

 

data

 

in

 

pipes

 

with

 

different

 

materials

 

(e.g.

 

concrete,

 

acrylic

 

or

 

PVC,

 

among

 

other

 

materials)

 

and

 

internal

 

diameters,

 

ranging

 

from

 

100

 

mm

 

to

 

595

 

mm.

 

In

 

the

 

end,

 

all

 

these

 

studies

 

proposed

 

a

 

model

 

for

 

predicting

 

the

 

self-cleansing

 

conditions

 

in

 

practice

 

that

 

was

 

either

 

developed

 

with

 

their

 

own

 

experimental

 

data

 

or

 

using

 

the

 

benchmark

 

data

 

reported

 

in

 

the

 

literature.

 

Most

 

models

 

devel-

 

oped

 

are

 

regression-based

 

and

 

include

 

the

 

group

 

of

 

input

 

param-

 

eters

 

that

 

most

 

affect

 

the

 

prediction

 

of

 

the

 

self-cleansing

 

veloc-

 

ity

 

(

 

Ackers

 

et

 

al.,

 

2001

 

;

 

Ebtehaj

 

and

 

Bonakdari,

 

2016a

 

;

 

May

 

et

 

al.,

 

1996

 

).

 

Most

 

of

 

these

 

models

 

are

 

in

 

the

 

form

 

of:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

aC

 

b

 

v

 

d

 

R

 

or

 

d

 

D

 

c

 

λ

e

 

D

 

f

 

gr

 

W

 

b

 

Y

 

or

 

y

 

s

 

Y

 

or

 

y

 

s

 

D

 

g

 

P

 

B

 

h

 

(1)

 

where

 

V

 

l

 

is

 

the

 

self-cleansing

 

velocity,

 

the

 

mean

 

particle

 

diam-

 

eter,

 

the

 

gravity

 

acceleration

 

coefficient,

 

S

 

s

 

the

 

specific

 

gravity

 

of

 

sediments,

 

C

 

v

 

the

 

volumetric

 

sediment

 

concentration,

 

R

 

the

 

hy-

 

draulic

 

radius,

 

D

 

the

 

pipe

 

diameter,

 

λ

the

 

channel

 

friction

 

fac-

 

tor,

 

D

 

gr

 

the dimensionless grain

 

size

(

 

=

 

(

 

(

 

S

 

s

 

−1

)

 

g

 

d

 

3

 

ν

2

 

)

 

1

 

3

 

)

 

,

 

ν

the wa-

 

ter

 

kinematic

 

viscosity,

 

W

 

b

 

the

 

sediment

 

deposited

 

width,

 

P

 

the

 

wetted

 

perimeter,

 

y

 

s

 

the sediment

 

deposited

 

thickness,

 

B

 

the

 

wa-

 

ter

 

surface

 

width,

 

Y

 

the

 

water

 

level

 

and

 

a

 

,

 

b,

 

c,

 

e

 

,

 

,

 

and

 

h

 

re-

 

gression

 

coefficients.

 

Other

 

parameters

 

as

 

V

 

t

 

the

 

threshold

 

veloc-

 

ity

 

required

 

to

 

initiate

 

movement

 

(

 

=

 

0

 

.

 

125

 

(

 

gd

(

 

S

 

s

 

− 1

 

)

 

)

 

0

 

.

 

5

 

(

 

Y

/d

 

)

 

0

 

.

 

47

 

)

 

and

 

S

 

o

 

the

 

pipe

 

slope

 

have

 

also

 

been

 

included

 

in

 

regression

 

models

 

(

 

May

 

et

 

al.,

 

1996

 

;

 

Montes

 

et

 

al.,

 

2020a

 

).

 

Most

 

of

 

above

 

studies

 

for

 

both

 

non-deposition

 

criteria,

 

have

 

de-

 

veloped

 

predictive

 

models

 

which

 

tend

 

to

 

be

 

overfitted

 

to

 

their

 

own

 

experimental

 

data.

 

This

 

problem

 

can

 

be

 

seen

 

especially

 

in

 

the

 

ear-

 

lier

 

works,

 

where

 

no

 

advanced

 

techniques

 

were

 

used

 

to

 

develop

 

regression

 

models.

 

For

 

example,

 

several

 

authors

 

(

 

Montes

 

et

 

al.,

 

2020b

 

;

 

Safari

 

et

 

al.,

 

2018

 

)

 

have

 

pointed

 

out

 

that

 

early

 

work

 

of

 

Mayerle’s

 

(1988)

 

has

 

developed

 

a

 

model

 

that

 

shows

 

high

 

accu-

 

racy

 

prediction

 

with

 

its

 

data

 

and

 

poor

 

prediction

 

when

 

other

 

datasets

 

are

 

used.

 

In

 

contrast,

 

recent

 

regression-models,

 

which

 

used

 

novel

 

techniques

 

such

 

as

 

Evolutionary

 

Polynomial

 

Regression

 

– Multi-Objective

 

Genetic

 

Algorithm

 

(EPR-MOGA)

 

and

 

Least

 

Abso-

 

lute

 

Shrinkage

 

and

 

Selection

 

Operator

 

(LASSO)

 

have

 

demonstrated

 

better

 

prediction

 

results

 

(

 

Montes

 

et

 

al.,

 

2020a

 

,

 

2020b

 

).

 

In

 

order

 

to

 

address

 

the

 

above

 

overfitting

 

issue

 

in

 

regres-

 

sion

 

models,

 

new

 

Machine

 

Learning

 

(ML)

 

and

 

Artificial

 

Intelli-

 

gence

 

(AI)

 

techniques

 

have

 

been

 

introduced

 

for

 

predicting

 

the

 

self-

 

cleansing

 

velocity

 

based

 

on

 

the

 

concept

 

of

 

non-deposition

 

sed-

 

iment

 

transport.

 

Examples

 

of

 

models

 

developed

 

for

 

the

 

‘with-

 

out

 

deposited

 

bed’

 

case

 

include

 

using

 

techniques

 

such

 

as

 

Artifi-

 

cial

 

Neural

 

Network

 

(ANN)

 

(

 

Ebtehaj

 

and

 

Bonakdari,

 

2013

 

),

 

Sup-

 

port

 

Vector

 

Regression

 

(SVR)

 

coupled

 

with

 

the

 

Firefly

 

Algorithm

 

(

 

Ebtehaj

 

and

 

Bonakdari,

 

2016b

 

),

 

the

 

Group

 

Method

 

of

 

Data

 

Han-

 

dling

 

(GMDH)

 

(

 

Ebtehaj

 

and

 

Bonakdari,

 

2016a

 

),

 

neuro-fuzzy

 

in-

 

ference

 

system

 

combined

 

with

 

the

 

Particle

 

Swarm

 

Optimisation

 

(ANFIS-PSO)

 

(

 

Ebtehaj

 

et

 

al.,

 

2019

 

),

 

Decision

 

Trees

 

(DT),

 

Generalised

 

Regression

 

Neural

 

Network

 

(GRNN),

 

Multivariate

 

Adaptive

 

Regres-

 

sion

 

Splines

 

(MARS)

 

(

 

Safari,

 

2019

 

)

 

and

 

Extreme

 

Learning

 

Machine

 

(ELM)

 

(

 

Ebtehaj

 

et

 

al.,

 

2020

 

).

 

For

 

the

 

other

 

case

 

of

 

‘non-deposition

 

with

 

deposited

 

bed’,

 

fewer

 

ML/AI

 

type

 

models

 

have

 

been

 

devel-

 

oped.

 

Examples

 

include

 

models

 

based

 

on

 

Particle

 

Swarm

 

Optimisa-

 

tion

 

(PSO)

 

algorithm

 

(

 

Safari

 

et

 

al.,

 

2017

 

),

 

Gene

 

Expression

 

Program-

 

ming

 

(GEP)

 

(

 

Roushangar

 

and

 

Ghasempour,

 

2017

 

)

 

and

 

Multigene

 

Genetic

 

Programming

 

(MGP)

 

(

 

Safari

 

and

 

Danandeh

 

Mehr,

 

2018

 

).

 

The

 

above

 

models,

 

developed

 

using

 

different

 

ML/AI

 

tech-

 

niques

 

(for

 

both

 

non-deposition

 

criteria),

 

have

 

improved

 

the

 

prediction

 

accuracy

 

of

 

self-cleansing

 

velocities

 

and

 

addressed

 

the

 

issues

 

of

 

model

 

overfitting

 

but

 

only

 

partially.

 

As

 

noted

 

by

 

Zendehboudi

 

et

 

al.

 

(2018)

 

,

 

these

 

models

 

still

 

tend

 

to

 

have

 

rather

 

limited

 

extrapolation

 

capabilities

 

meaning

 

that

 

once

 

they

 

are

 

ap-

 

plied

 

to

 

datasets

 

that

 

were

 

not

 

used

 

for

 

their

 

training

 

they

 

tend

 

to

 

underperform.

 

Also,

 

the

 

ML/AI

 

based

 

models

 

developed

 

so

 

far

 

are

 

largely

 

black-box

 

type

 

models

 

(e.g.

 

ANN)

 

meaning

 

that,

 

un-

 

like

 

white-box

 

type

 

regression

 

models,

 

they

 

suffer

 

from

 

low

 

inter-

 

pretability

 

of

 

physical

 

significance

 

of

 

model

 

inputs

 

(i.e.

 

explanatory

 

factors),

 

and

 

interactions

 

with

 

the

 

model

 

output.

 

The

 

aim

 

of

 

this

 

paper

 

is

 

to

 

overcome

 

above

 

deficiencies

 

us-

 

ing

 

the

 

Random

 

Forest

 

(RF)

 

technique

 

for

 

predicting

 

self-cleansing

 

sewer

 

velocities.

 

RF

 

(

 

Breiman,

 

2001

 

)

 

is

 

a

 

flexible

 

and

 

interpretable

 

supervised

 

ML

 

technique

 

that

 

combines

 

the

 

results

 

(outputs)

 

of

 

multiple

 

individual

 

decision

 

trees

 

to

 

make

 

a

 

prediction

 

of

 

interest.

 

Due

 

to

 

its

 

good

 

characteristics

 

and

 

easy

 

application,

 

it

 

has

 

been

 

a

 

widely

 

used

 

for

 

addressing

 

many

 

other

 

problems

 

in

 

water

 

en-

 

gineering.

 

Tyralis

 

et

 

al.

 

(2019)

 

showed

 

a

 

full

 

review

 

of

 

studies

 

in

 

which

 

RF

 

was

 

successfully

 

applied

 

to

 

water

 

resources

 

problems.

 

Using

 

the

 

RF

 

technique,

 

a

 

new

 

predictive

 

self-cleansing

 

model

 

is

 

developed

 

and

 

presented

 

here

 

for

 

both

 

non-deposition

 

criteria

 

(with

 

and

 

without

 

deposited

 

bed).

 

This

 

model

 

aims

 

to

 

increase

 

prediction

 

accuracy

 

whilst

 

avoiding

 

overfitting

 

issues

 

and

 

enabling

 

interpretability

 

of

 

results

 

obtained.

 

The

 

new

 

modelling

 

technique

 

is

 

compared

 

to

 

ten

 

literature

 

models

 

using

 

multiple

 

datasets.

 

2.

 

Data

 

2.1.

 

Non-deposition

 

without

 

deposited

 

bed

 

data

 

Several

 

experimental

 

data

 

were

 

collected

 

from

 

the

 

literature

 

to

 

implement

 

the

 

RF

 

method.

 

Mayerle

 

(1988)

 

studied

 

the

 

sediment

 

transport

 

in

 

a

 

152

 

mm

 

diameter

 

pipe

 

and

 

in

 

two

 

rectangular

 

chan-

 

nels

 

of

 

311.5

 

mm

 

and

 

462.3

 

mm

 

bottom

 

width

 

(

 

W

 

)

 

using

 

granular

 

sands

 

ranging

 

from

 

0.50

 

mm

 

to

 

8.74

 

mm.

 

Ab

 

Ghani

 

(1993)

 

col-

 

lected

 

221

 

data

 

in

 

154

 

mm,

 

305

 

mm

 

and

 

450

 

mm

 

diameter

 

pipes,

 

testing

 

sands

 

between

 

0.46

 

mm

 

and

 

8.40

 

mm.

 

Ota

 

(1999)

 

used

 

a

 

225

 

mm

 

concrete

 

pipe

 

with

 

a

 

constant

 

slope

 

of

 

0.002,

 

vary-

 

ing

 

the

 

volumetric

 

sediment

 

concentration

 

between

 

4.2

 

ppm

 

to

 

59.4

 

ppm.

 

Vongvisessomjai

 

et

 

al.

 

(2010)

 

used

 

two

 

circular

 

PVC

 

pipes

 

of

 

100

 

mm

 

and

 

150

 

mm

 

diameter

 

to

 

study

 

the

 

bedload

 

and

 

suspended

 

load

 

transport.

 

Montes

 

et

 

al.

 

(2020a)

 

collected

 

ex-

 

perimental

 

data

 

in

 

a

 

242

 

mm

 

acrylic

 

pipe

 

using

 

granular

 

mate-

 

rial

 

with

 

a

 

mean

 

particle

 

diameter

 

of

 

0.35

 

mm

 

and

 

1.51

 

mm.

 

Montes

 

et

 

al.

 

(2020b)

 

carried

 

out

 

107

 

experiments

 

in

 

a

 

595

 

mm

 

PVC

 

pipe,

 

using

 

sediments

 

ranging

 

from

 

0.35

 

mm

 

to

 

2.6

 

mm.

 

2.2.

 

Non-deposition

 

with

 

deposited

 

bed

 

data

 

For

 

the

 

non-deposition

 

with

 

deposited

 

bed,

 

El-Zaemey

 

(1991)

 

studied

 

the

 

sediment

 

transport

 

in

 

a

 

305

 

mm

 

diameter

 

pipe,

 

using

 

granular

 

particles

 

ranging

 

from

 

0.53

 

mm

 

to

 

8.40

 

mm.

 

Perrusquía

 

(1991)

 

carried

 

out

 

experiments

 

in

 

a

 

225

 

mm

 

diame-

 

ter

 

pipe,

 

varying

 

the

 

sediment

 

concentration

 

from

 

18.7

 

ppm

 

to

 

408.0

 

ppm.

 

Ab

 

Ghani

 

(1993)

 

collected

 

the

 

deposited

 

bed

 

data

 

only

 

in

 

the

 

450

 

mm

 

concrete

 

pipe

 

and

 

using

 

granular

 

sand

 

with

 

a

 

mean

 

particle

 

diameter

 

of

 

0.72

 

mm.

 

May

 

(1993)

 

extended

 

their

 

previous

 

study

 

(

 

May

 

et

 

al.,

 

1989

 

)

 

and

 

collected

 

experimental

 

data

 

with

 

sediment

 

thickness

 

varying

 

from

 

57.6

 

mm

 

to

 

129.6

 

mm.

 

Finally,

 

Montes

 

et

 

al.

 

(2020b)

 

carried

 

out

 

experiments

 

in

 

a

 

595

 

mm

 

PVC

 

pipe,

 

considering

 

a

 

relative

 

sediment

 

thickness

 

(

 

y

 

s

 

/D

 

)

 

between

 

0.13%

 

and

 

1.11%.

 

Table

 

1

 

outlines

 

the

 

characteristics

 

of

 

the

 

data

 

used

 

for

 

developing

 

the

 

RF

 

algorithm.

 

2

 

/var/www/pavco.com.co/public/site/pdftohtml/0a11d602f962b8945d803abdfbf03967/index-html.html
background image

C.

 Montes, Z. Kapelan and J. Saldarriaga 

Water

 Research 189 (2021) 116639 

Ta

b

le

 1

 

Dat

use

fo

im

plementing

 dat

mining

 and

 r

e

gr

ession

 models.

 

R

e

fer

e

nce

 

N

o

n-deposition

 

crit

erion

 

No

of

 runs

 

Pipe

 diame

te

or

 

bo

tt

om

 width

 (mm)

 

Flo

w

 Velocity

 

(m/s)

 

Pipe

 slope

 (%)

 

Se

diment

 

Concentr

a

tion

 (ppm)

 

Se

diment

 thic

kness

 

bed

 (mm)

 

Ma

y

e

rl

(1988)

 

cir

cular

 c

h

annel

 

Without

 deposit

e

bed

 

106

 

152

 

0.37

 -

 1.10

 

0.13

 -

 0.56

 

20.0

 -

 1275.0

 

Ma

y

e

rl

(1988)

 

re

ct

angular

 c

h

annel

 

Without

 deposit

e

bed

 

105

 

311.5

 and

 462.3

 

0.41

 -’

 1.04

 

0.09

 –0

.6

14.0

 –

1568.0

 

Ab

 Ghani

 (1993)

 

Without

 deposit

e

bed

 

221

 

154,

 305

 and

 405

 

0.24

 -

 1.22

 

0.04

 -

 2.56

 

0.8

 -

 1450.0

 

Ot

(1999)

 

Without

 deposit

e

bed

 

36

 

305

 

0.39

 -

 0.74

 

0.2

 

4.2

 -

 59.4

 

Vongvisessomjai

 et

 al.

 (2010)

 

Without

 deposit

e

bed

 

45

 

100

 and

 150

 

0.24

 -

 0.63

 

0.20

 -

 0.60

 

4.0

 -

 90.0

 

Mont

e

et

 al.

 (2020a)

 

Without

 deposit

e

bed

 

44

 

242

 

0.24

 -

 1.05

 

0.20

 -

 0.80

 

0.3

 -

 875.7

 

Mont

e

et

 al.

 (2020b)

 

Without

 deposit

e

bed

 

107

 

595

 

0.41

 -

 1.41

 

0.04

 -

 3.43

 

1.3

 -

 19,957.0

 

El-Zaeme

(1991)

 

With

 deposit

e

bed

 

290

 

305

 

0.39

 -

 0.96

 

0.05

 -

 0.44

 

7.0

 -

 917.0

 

47.0

 –

120.0

 

Pe

rr

u

sq

u

ía

 (1991)

 

With

 deposit

e

bed

 

38

 

225

 

0.29

 -

 0.67

 

0.20

 -

 0.60

 

18.7

 -

 408.0

 

45.0

 –9

0

.0

 

Ab

 Ghani

 (1993)

 

With

 deposit

e

bed

 

26

 

450

 

0.49

 -

 1.33

 

0.07

 -

 0.47

 

21.0

 -

 1259.0

 

52.0

 –1

0

8

.0

 

Ma

(1993)

 

With

 deposit

e

bed

 

46

 

450

 

0.39

 -

 1.14

 

0.07

 -

 0.97

 

3.5

 -

 823.0

 

57.6

 –

129.6

 

Mont

e

et

 al.

 (2020b)

 

With

 deposit

e

bed

 

54

 

595

 

0.73

 -

 1.53

 

0.46

 -

 5.42

 

389.0

 -

 10,275.0

 

0.8

 –6

.6

 

As

 

shown

 

in

 

Table

 

1

 

,

 

a

 

total

 

of

 

664

 

and

 

454

 

data

 

are

 

available

 

for

 

the

 

development

 

of

 

models

 

without

 

deposited

 

bed

 

and

 

with

 

deposited

 

bed,

 

respectively.

 

3.

 

Mehodology

 

3.1.

 

Random

 

forest

 

model

 

Random

 

Forest

 

model

 

developed

 

here

 

predicts

 

the

 

par-

 

ticle

 

Froude

 

number

 

(

 

F

 

r

 

)

 

as

 

a

 

function

 

of

 

several

 

well-

 

known

 

dimensionless

 

explanatory

 

factors

 

(

 

Kargar

 

et

 

al.,

 

2019

 

;

 

Vongvisessomjai

 

et

 

al.,

 

2010

 

):

 

F

 

r

 

=

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

f

 

C

 

v

 

,

 

D

 

gr

 

,

 

d

 

R

 

,

 

λ

,

 

y

 

s

 

D

 

(2)

 

Random

 

forest

 

(RF)

 

is

 

a

 

bagging

 

algorithm

 

for

 

regression

 

and

 

classification

 

problems

 

proposed

 

by

 

Breiman

 

(2001)

 

.

 

This

 

is

 

a

 

low-

 

variance

 

method,

 

which

 

randomly

 

split

 

the

 

training

 

data

 

and

 

the

 

input

 

variables

 

predictors

 

to

 

build

 

a

 

set

 

of

 

b

 

decision

 

trees

 

(

 

B

 

t

 

).

 

The

 

results

 

of

 

all

 

decision

 

trees

 

generated

 

from

 

bootstrapped

 

train-

 

ing

 

samples

 

(

 

T

 

b

 

(

 

x

 

;

b

 

)

 

)

 

are

 

then

 

averaged,

 

i.e.

 

the

 

final

 

result

 

(

 

ˆ

 

y

 

(

x

 

)

 

)

 

is

 

the

 

average

 

of

 

the

 

output

 

of

 

all

 

decision

 

trees

 

(as

 

shown

 

in

 

Eq.

 

(3)

 

).

 

This

 

procedure

 

ensures

 

the

 

reduction

 

of

 

the

 

model

 

vari-

 

ance

 

and

 

consequently

 

the

 

reduction

 

of

 

the

 

risk

 

of

 

overfitting.

 

A

 

simplified

 

conceptual

 

diagram

 

of

 

the

 

RF

 

method

 

is

 

shown

 

in

 

Fig.

 

1

 

.

 

ˆ

 

y

 

(

 

x

 

)

 

=

 

1

 

B

 

t

 

B

 

t

 

 

b

=1

 

T

 

(

 

x

 

;

b

 

)

 

(3)

 

In

 

this

 

paper,

 

the

 

R

 

package

 

‘RandomForest’

 

(

 

Liaw

 

and

 

Wiener,

 

2002

 

)

 

was

 

used

 

for

 

constructing

 

both

 

non-deposition,

 

without

 

deposited

 

bed

 

and

 

deposited

 

bed,

 

self-cleansing

 

models.

 

The

 

number

 

of

 

predictors

 

considered

 

at

 

each

 

split

 

(

 

mtry

 

)

 

and

 

the

 

number

 

of

 

trees

 

in

 

the

 

forest

 

(

 

B

 

t

 

)

 

are

 

the

 

parameters

 

that

 

define

 

the

 

structure

 

of

 

the

 

RF

 

regression

 

model.

 

The

 

mtry

 

parameter

 

is

 

estimated

 

by

 

using

 

the

 

rfcv()

 

function,

 

which

 

shows

 

the

 

cross-

 

validation

 

performance

 

for

 

each

 

number

 

of

 

predictors.

 

In

 

addition,

 

the

 

optimal

 

number

 

of

 

trees

 

is

 

defined

 

as

 

the

 

value

 

that

 

minimises

 

the

 

Mean

 

Square

 

Error

 

(MSE)

 

value

 

of

 

the

 

training

 

data.

 

These

 

pa-

 

rameters

 

are

 

estimated

 

and

 

the

 

results

 

are

 

shown

 

in

 

Fig.

 

2

 

.

 

Accord-

 

ing

 

to

 

this

 

figure,

 

the

 

optimal

 

number

 

of

 

features

 

(i.e.

 

the

 

random

 

predictors

 

used

 

in

 

each

 

tree)

 

are

 

three

 

and

 

four

 

non-dimensional

 

parameters

 

for

 

the

 

cases

 

of

 

without

 

deposited

 

bed

 

and

 

with

 

de-

 

posited

 

bed,

 

respectively.

 

Similarly,

 

the

 

optimal

 

number

 

of

 

trees

 

is

 

471

 

for

 

without

 

deposited

 

bed

 

and

 

229

 

for

 

with

 

deposited

 

bed.

 

Cross-validation

 

is

 

carried

 

out

 

during

 

the

 

training

 

stage

 

using

 

out-of-bag

 

(OOB)

 

samples.

 

As

 

mentioned

 

above,

 

the

 

method

 

ran-

 

domly

 

bootstraps

 

the

 

training

 

sample,

 

that

 

is,

 

some

 

of

 

the

 

train-

 

ing

 

data

 

are

 

left

 

out

 

to

 

build

 

each

 

decision

 

tree.

 

Only

 

two

 

out

 

of

 

three

 

parts

 

of

 

the

 

total

 

training

 

data

 

are

 

used

 

to

 

build

 

the

 

tree

 

(

 

Breiman,

 

2001

 

).

 

Based

 

on

 

this,

 

data

 

not

 

included

 

in

 

the

 

boot-

 

strapped

 

sample

 

(OOB

 

data)

 

are

 

predicted,

 

and

 

the

 

prediction

 

error

 

is

 

averaged

 

over

 

the

 

trees

 

that

 

do

 

not

 

include

 

these

 

data

 

(OOB

 

Er-

 

ror).

 

3.1.1.

 

Splitting

 

of

 

training

 

and

 

testing

 

data

 

The

 

whole

 

benchmarking

 

data

 

collected

 

from

 

the

 

literature

 

are

 

used

 

for

 

both

 

training

 

and

 

testing

 

stages

 

of

 

the

 

RF

 

model.

 

Usually,

 

75%

 

of

 

the

 

data

 

is

 

used

 

during

 

the

 

training

 

stage

 

of

 

the

 

model

 

and

 

the

 

other

 

25%

 

to

 

validate

 

the

 

results.

 

According

 

to

 

Safari

 

(2020)

 

,

 

the

 

range

 

of

 

variation

 

in

 

the

 

training

 

data

 

has

 

direct

 

implications

 

for

 

model

 

performance

 

(i.e.

 

accuracy).

 

As

 

a

 

result,

 

the

 

model

 

can

 

show

 

overfitting

 

issues

 

and

 

poor

 

extrapolation

 

capabilities

 

when

 

narrow

 

datasets

 

are

 

used

 

in

 

the

 

training

 

stage

 

(i.e.

 

data

 

with

 

a

 

low

 

range

 

of

 

variation).

 

3

 

/var/www/pavco.com.co/public/site/pdftohtml/0a11d602f962b8945d803abdfbf03967/index-html.html
background image

C.

 Montes, Z. Kapelan and J. Saldarriaga 

Water

 Research 189 (2021) 116639 

Fig.

 1. Simplified conceptual diagram of the RF method. 

Fig.

 2. Selection of the optimal Random forest parameters. 

Fig.

 3. Variation of the training and testing error using different combination of percentages between the training and testing dataset. A) Training stage and B) Testing stage. 

Checking

 

the

 

non-overfitting

 

of

 

the

 

RF

 

model

 

is

 

carried

 

out

 

by

 

using

 

several

 

sizes

 

in

 

the

 

training

 

and

 

testing

 

data

 

(i.e.

 

changing

 

the

 

percentage

 

of

 

data

 

used

 

as

 

training

 

and

 

testing)

 

and

 

by

 

ver-

 

ifying

 

the

 

error,

 

defined

 

by

 

the

 

Coefficient

 

of

 

Determination

 

(

 

R

 

2

 

)

 

(as

 

shown

 

in

 

Eq.

 

(14)

 

).

 

For

 

this,

 

ten

 

different

 

combinations

 

of

 

per-

 

centages

 

are

 

defined

 

(i.e.

 

%

 

of

 

the

 

training

 

data

 

:

 

%

 

of

 

the

 

test-

 

ing

 

data

 

=

 

[5:95,

 

15:85,

 

25:75,

 

35:65,

 

45:55,

 

55:45,

 

65:35,

 

75:25,

 

85:15,

 

95:5]),

 

randomly

 

changing

 

the

 

ranges

 

of

 

the

 

training

 

and

 

testing

 

data,

 

and

 

developing

 

100

 

RF

 

models

 

for

 

each

 

combination.

 

As

 

a

 

result,

 

10

 

0

 

0

 

RF

 

models

 

are

 

trained

 

and

 

the

 

error

 

is

 

estimated

 

for

 

both

 

training

 

and

 

testing

 

stage.

 

Using

 

this

 

information,

 

several

 

boxplots

 

are

 

constructed

 

showing

 

the

 

R

 

2

 

variation

 

for

 

each

 

stage.

 

Fig.

 

3

 

shows

 

how

 

the

 

model

 

error

 

decreases

 

as

 

the

 

training

 

sam-

 

ple

 

size

 

increases.

 

For

 

example,

 

when

 

only

 

5%

 

of

 

the

 

whole

 

dataset

 

is

 

used

 

for

 

training

 

the

 

model

 

and

 

the

 

remaining

 

95%

 

for

 

testing

 

it,

 

the

 

error

 

varies

 

between

 

0.84

 

and

 

0.96,

 

for

 

the

 

training

 

stage,

 

and

 

between

 

0.39

 

and

 

0.73

 

for

 

the

 

testing

 

stage.

 

This

 

clearly

 

shows

 

that

 

the

 

model

 

is

 

under-trained;

 

however,

 

when

 

the

 

ratio

 

is

 

greater

 

than

 

50:50

 

the

 

error

 

tends

 

to

 

be

 

constant

 

and

 

slightly

 

variable

 

for

 

both

 

stages.

 

Ratios

 

greater

 

than

 

90:10

 

tend

 

to

 

generate

 

unsatis-

 

factory

 

results

 

for

 

the

 

testing

 

stage,

 

i.e.

 

the

 

model

 

is

 

over-trained

 

and

 

shows

 

high

 

variation

 

in

 

the

 

error,

 

i.e.

 

overfitting,

 

(as

 

shown

 

in

 

Fig.

 

3

 

b).

 

Based

 

on

 

this,

 

a

 

combination

 

of

 

75:25

 

is

 

taken

 

as

 

optimal

 

for

 

implementing

 

the

 

model.

 

The

 

variation

 

of

 

the

 

data

 

used

 

for

 

training

 

and

 

testing

 

dataset

 

is

 

presented

 

in

 

Table

 

2

 

.

 

Using

 

the

 

above

 

considerations,

 

the

 

RF

 

model

 

is

 

implemented

 

with

 

the

 

optimal

 

parameters

 

defined

 

in

 

Fig.

 

2

 

and

 

using

 

the

 

ranges

 

of

 

variation

 

of

 

the

 

training

 

data

 

outlined

 

in

 

Table

 

2

 

.

 

The

 

full

 

data

 

collected

 

from

 

the

 

literature

 

are

 

shown

 

in

 

the

 

Supplementary

 

ma-

 

terial.

 

Table

 

S1

 

and

 

Table

 

S2

 

show

 

the

 

data

 

for

 

non-deposition

 

without

 

and

 

with

 

deposited

 

bed,

 

respectively,

 

and

 

the

 

correspond-

 

4

 

/var/www/pavco.com.co/public/site/pdftohtml/0a11d602f962b8945d803abdfbf03967/index-html.html
background image

C.

 Montes, Z. Kapelan and J. Saldarriaga 

Water

 Research 189 (2021) 116639 

Table

 2 

Variation

 of the data for training and testing the RF model. 

Non-deposition

 

criterion

 

Stage

 

No.

 of runs 

Channel

 geometry 

(mm)

 

Flow

 Velocity (m/s) 

Pipe

 slope (%) 

Sediment

 

Concentration

 

(ppm)

 

Sediment

 

thickness

 bed 

(mm)

 

Without

 deposited bed 

Training

 

498

 

D

 = 100.0 – 595.0 

W

 = 311.5 – 462.3 

0.237

 - 1.41 

0.04

 – 3.43 

0.53

 – 19,957 

Testing

 

166

 

D

 = 100.0 – 595.0 

W

 = 311.5 – 462.3 

0.237

 – 1.24 

0.04

 – 2.74 

1.00

 – 13,840 

With

 deposited bed 

Training

 

340

 

D

 = 225 – 595 

0.294

 – 1.53 

0.05

 – 5.42 

3.50

 - 10,274 

0.78

 – 129.6 

Testing

 

114

 

D

 = 225 – 595 

0.319

 – 1.28 

0.05

 – 2.58 

17.00

 - 9101 

1.78

 – 120.0 

Fig.

 4. Random Forest code to calculate the particle Froude number in sewer pipes. 

.

 

ing

 

RF

 

particle

 

Froude

 

number

 

predictions.

 

The

 

implemented

 

code

 

for

 

the

 

RF

 

method

 

is

 

shown

 

in

 

Fig.

 

4

 

.

 

An

 

example

 

of

 

one

 

of

 

the

 

471

 

decision

 

trees

 

generated

 

by

 

the

 

RF

 

model,

 

for

 

the

 

non-deposition

 

without

 

deposited

 

bed,

 

is

 

shown

 

in

 

Figure

 

S1,

 

in

 

the

 

Supplemen-

 

tary

 

material.

 

3.1.2.

 

Measure

 

of

 

feature

 

importance

 

Note

 

that

 

in

 

this

 

paper,

 

a

 

decrease

 

in

 

model

 

accuracy

 

when

 

the

 

j

 

th

 

variable

 

is

 

permuted

 

(i.e.

 

the

 

percentage

 

of

 

the

 

increase

 

in

 

the

 

MSE,

 

%

 

IncMSE)

 

is

 

considered

 

as

 

a

 

measure

 

of

 

the

 

importance

 

of

 

a

 

model

 

input

 

variable.

 

This

 

index

 

shows

 

the

 

strength

 

of

 

each

 

ex-

 

planatory

 

variable

 

based

 

on

 

the

 

reduction

 

of

 

the

 

MSE.

 

The

 

step-

 

by-step

 

to

 

calculate

 

the

 

%

 

IncMSE is

 

shown

 

as

 

follows

 

(

 

Hastie

 

et

 

al.,

 

2009

 

):

 

(1)

 

Calculate

 

the

 

MSE

 

of

 

the

 

OOB-sample

 

data

 

in

 

each

 

tree

 

of

 

the

 

forest

 

(

 

MS

 

E

 

b

 

).

 

(2)

 

Randomly

 

permute

 

the

 

value

 

of

 

the

 

j

 

th

 

explanatory

 

variable

 

and

 

calculate

 

the

 

MSE

 

(

 

MS

 

E

 

j

 

).

 

(3)

 

Finally,

 

calculate

 

%

 

IncMSE for

 

each

 

explanatory

 

variable

 

as:

 

%

 

IncM

 

SE

 

=

 

100

 

·

M

 

S

 

E

 

j

 

− M

 

S

 

E

 

b

 

M

 

S

 

E

 

b

 

(4)

 

As

 

a

 

result,

 

the

 

more

 

the

 

%

 

IncMSE increases

 

for

 

a

 

variable,

 

the

 

more

 

important

 

it

 

is.

 

3.2.

 

Performance

 

assessment

 

3.2.1.

 

Models

 

used

 

for

 

comparing

 

the

 

RF

 

results

 

In

 

order

 

to

 

evaluate

 

the

 

RF

 

model

 

performance,

 

it

 

is

 

com-

 

pared

 

to

 

several

 

literature

 

models.

 

The

 

models

 

selected

 

for

 

com-

 

parison

 

are

 

the

 

replicable

 

white-box

 

models

 

with

 

high

 

predic-

 

tion

 

accuracy

 

reported

 

in

 

the

 

literature

 

and

 

two

 

black-box

 

mod-

 

els

 

where

 

the

 

implementing

 

code

 

is

 

provided

 

in

 

the

 

original

 

pa-

 

pers.

 

Other

 

black-box

 

models

 

cannot

 

be

 

evaluated

 

due

 

to

 

the

 

lim-

 

ited

 

replicability

 

shown

 

by

 

these

 

models

 

(e.g.

 

ANN).

 

Based

 

on

 

this,

 

in

 

the

 

case

 

of

 

non-deposition

 

without

 

deposited

 

bed,

 

seven

 

mod-

 

els

 

selected

 

are

 

the

 

EPR-MOGA

 

model

 

(

 

Montes

 

et

 

al.,

 

2020a

 

),

 

the

 

GEP

 

model

 

(

 

Kargar

 

et

 

al.,

 

2019

 

),

 

the

 

MARS

 

model

 

(

 

Safari,

 

2019

 

),

 

the

 

May

 

et

 

al.

 

(1996)

 

model,

 

the

 

Safari

 

and

 

Aksoy

 

(2020)

 

model,

 

the

 

ANFIS-PSO

 

model

 

(

 

Ebtehaj

 

et

 

al.,

 

2019

 

)

 

and

 

the

 

ELM

 

model

 

(

 

Ebtehaj

 

et

 

al.,

 

2020

 

).

 

In

 

the

 

case

 

of

 

non-deposition

 

with

 

de-

 

posited

 

bed,

 

three

 

models

 

used

 

for

 

comparison

 

are

 

the

 

PSO

 

model

 

(

 

Safari

 

and

 

Shirzad,

 

2019

 

),

 

the

 

LASSO

 

model

 

(

 

Montes

 

et

 

al.,

 

2020b

 

)

 

and

 

the

 

MGP

 

model

 

(

 

Safari

 

and

 

Danandeh

 

Mehr,

 

2018

 

).

 

The

 

EPR-

 

MOGA,

 

LASSO,

 

May

 

et

 

al.

 

(1996)

 

and

 

Safari

 

and

 

Aksoy

 

(2020)

 

are

 

the

 

regression

 

type

 

models

 

whilst

 

GEP,

 

MARS,

 

ANFIS-PSO,

 

ELM,

 

PSO

 

and

 

MGP

 

models

 

make

 

use

 

of

 

ML/AI

 

techniques.

 

The

 

equations

 

used

 

by

 

above

 

ten

 

models

 

are

 

as

 

follows:

 

EPR-MOGA:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

5

 

.

 

6

 

C

 

0

 

.

 

16

 

v

 

d

 

R

 

−0

 

.

 

58

 

S

 

0

 

.

 

14

 

o

 

D

 

0

 

.

 

02

 

gr

 

(5)

 

GEP:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

3

 

.

 

05

 

C

 

0

 

.

 

16

 

v

 

atan

 

atan

 

 

d

 

R

 

+

 

atan

 

(

 

3

 

.

 

41

 

− ln

 

(

 

D

 

gr

 

)

 

)

 

+

 

atan

 

 

 

tan

 

 

8

 

.

 

37

 

− 7

 

.

 

99

 

λ

+

 

d

 

R

 

λ

2

 

 

2

 

 

 

+

 

ln

 

 

 

 

d

 

R

 

3

 

 

2

 

λ

 

 

(6)

 

MARS:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

7

 

.

 

26

 

− 1

 

.

 

75

 

· max

 

(

 

0

 

,

 

d

/R

 

− 0

 

.

 

12

 

)

 

+

 

2

 

· max

 

(

 

0

 

,

 

0

 

.

 

12

 

− d/R

 

)

 

+15

 

.

 

89

 

· max

 

(

 

0

 

,

 

C

 

v

 

− 0

 

.

 

44

 

)

 

− 16

 

.

 

42

 

· max

 

(

 

0

 

,

 

0

 

.

 

44

 

− C

 

v

 

)

 

+0

 

.

 

47

 

· max

 

(

 

0

 

,

 

D

 

gr

 

− 0

 

.

 

29

 

)

 

− 7

 

.

 

25

 

· max

 

(

 

0

 

,

 

λ

− 0

 

.

 

3

 

)

 

−16

 

.

 

03

 

· max

 

(

 

0

 

,

 

C

 

v

 

− 0

 

.

 

01

 

)

 

+

 

3

 

.

 

7

 

· max

 

(

 

0

 

,

 

D

 

gr

 

− 0

 

.

 

12

 

)

 

−4

 

.

 

33

 

· max

 

(

 

0

 

,

 

D

 

gr

 

− 0

 

.

 

08

 

)

 

+

 

0

 

.

 

43

 

· max

 

(

 

0

 

,

 

λ

− 0

 

.

 

59

 

)

 

+6

 

.

 

75

 

· max

 

(

 

0

 

,

 

λ

− 0

 

.

 

28

 

)

 

+

 

1

 

.

 

67

 

· max

 

(

 

0

 

,

 

d

/R

 

− 0

 

.

 

07

 

)

 

(7)

 

5

 

/var/www/pavco.com.co/public/site/pdftohtml/0a11d602f962b8945d803abdfbf03967/index-html.html
background image

C.

 Montes, Z. Kapelan and J. Saldarriaga 

Water

 Research 189 (2021) 116639 

May

 

et

 

al.

 

(1996)

 

:

 

C

 

v

 

=

 

0

 

.

 

0303

 

D

 

2

 

A

 

d

 

D

 

0

 

.

 

6

 

1

 

V

 

t

 

V

 

l

 

4

 

V

 

l

 

2

 

gD

 

(

 

S

 

s

 

− 1

 

)

 

1

 

.

 

5

 

(8)

 

Safari

 

and

 

Aksoy

 

(2020)

 

:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

4

 

.

 

83

 

C

 

0

 

.

 

09

 

v

 

d

 

R

 

−0

 

.

 

32

 

D

 

−0

 

.

 

14

 

gr

 

P

 

B

 

0

 

.

 

20

 

(9)

 

ANFIS-PSO:

 

No

 

equation.

 

The

 

Matlab

 

code

 

can

 

be

 

found

 

in

 

Ebtehaj

 

et

 

al.

 

(2019)

 

.

 

ELM:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

1

 

(

 

1

 

+

 

exp

 

(

 

InW

 

· InV

 

+

 

BHI

 

)

 

)

 

T

 

· OutW

 

(10)

 

where

 

InW

 

and

 

OutW

 

are

 

the

 

input

 

and

 

output

 

weights,

 

BHI the

 

bias

 

of

 

the

 

hidden

 

neurons

 

and

 

InV

 

the

 

input

 

variables

 

(i.e.

 

C

 

v

 

,

 

d

/R

 

,

 

D

 

2

 

/A

 

,

 

R

/D

 

,

 

D

 

gr

 

,

 

d

/D

 

and

 

λ

).

 

Full

 

details

 

of

 

the

 

values

 

chosen

 

for

 

each

 

parameter

 

are

 

shown

 

in

 

Ebtehaj

 

et

 

al.

 

(2020)

 

.

 

PSO:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

3

 

.

 

66

 

C

 

0

 

.

 

16

 

v

 

d

 

R

 

−0

 

.

 

40

 

y

 

s

 

Y

 

−0

 

.

 

10

 

(11)

 

LASSO:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

5

 

.

 

83

 

C

 

0

 

.

 

144

 

v

 

d

 

R

 

−0

 

.

 

305

 

λ

−0

 

.

 

059

 

D

 

−0

 

.

 

169

 

gr

 

y

 

s

 

D

 

−0

 

.

 

104

 

(12)

 

MGP:

 

V

 

l

 

 

gd

 

(

 

S

 

s

 

− 1

 

)

 

=

 

1

 

.

 

96

 

− 0

 

.

 

61

 

λ

− 0

 

.

 

51

 

C

 

v

 

+

 

1

 

.

 

18

 

D

 

0

 

.

 

50

 

gr

 

λ

1

 

.

 

50

 

+

 

0

 

.

 

61

 

2

 

C

 

v

 

+

 

d

 

R

 

0

 

.

 

50

 

− 2

 

.

 

45

 

d

 

R

 

1

 

/

 

8

 

(13)

 

3.2.2.

 

Performance

 

indices

 

The

 

RF

 

model

 

performance

 

is

 

evaluated

 

and

 

compared

 

to

 

above

 

ten

 

models

 

using

 

three

 

performance

 

indicators.

 

These

 

are

 

the

 

Co-

 

efficient

 

of

 

Determination

 

(

 

R

 

2

 

),

 

the

 

Root

 

Mean

 

Square

 

Error

 

(

 

RMSE

 

)

 

and

 

the

 

Mean

 

Absolute

 

Percentage

 

Error

 

(

 

MAPE

 

),

 

defined

 

as

 

fol-

 

lows:

 

R

 

2

 

=

 

1

 

 

n

 

i

 

=1

 

F

 

r

 

OBS

 

− F

 

r

 

MOD

 

2

 

 

n

 

i

 

=1

 

F

 

r

 

OBS

 

− F

 

r

 

OBS

 

2

 

(14)

 

RMSE

 

=

 

 

1

 

n

 

n

 

 

i

 

=1

 

F

 

r

 

OBS

 

− F

 

r

 

MOD

 

2

 

(15)

 

MAP

 

E

 

=

 

100

 

n

 

n

 

 

i

 

=1

 

F

 

r

 

OBS

 

− F

 

r

 

MOD

 

F

 

r

 

OBS

 

(16)

 

where

 

F

 

r

 

OBS

 

is

 

the

 

particle

 

Froude

 

number

 

observed

 

data,

 

F

 

r

 

MOD

 

the

 

particle

 

Froude

 

number

 

estimated

 

by

 

RF

 

algorithm

 

(or

 

other

 

pre-

 

dictive

 

model),

 

n

 

the

 

number

 

of

 

data

 

and

 

F

 

r

 

OBS

 

the

 

mean

 

of

 

ob-

 

served

 

particle

 

Froude

 

number

 

data.

 

The

 

Coefficient

 

of

 

Determination

 

measures

 

the

 

percentage

 

of

 

the

 

model

 

variance

 

that

 

can

 

be

 

explained.

 

This

 

coefficient

 

varies

 

between

 

0

 

and

 

1,

 

with

 

a

 

value

 

of

 

1

 

denoting

 

a

 

perfect

 

match

 

be-

 

tween

 

observed

 

and

 

modelled

 

data.

 

The

 

Root

 

Mean

 

Square

 

Error

 

measures

 

the

 

standard

 

deviation

 

of

 

the

 

residuals.

 

Note

 

that

 

a

 

value

 

close

 

to

 

0

 

indicates

 

high

 

model

 

prediction

 

accuracy.

 

Finally,

 

the

 

Mean

 

Absolute

 

Percentage

 

Error

 

assesses

 

the

 

model

 

prediction

 

ac-

 

curacy

 

(i.e.

 

bias)

 

as

 

a

 

percentage

 

of

 

the

 

observed

 

value.

 

Value

 

of

 

0

 

indicates

 

the

 

perfect

 

model

 

where

 

there

 

are

 

no

 

differences

 

be-

 

tween

 

predictions

 

and

 

observations.

 

4.

 

Results

 

The

 

results

 

obtained

 

by

 

using

 

the

 

methodology

 

shown

 

in

 

the

 

previous

 

section

 

are

 

presented

 

in

 

Tables

 

3

 

and

 

4

 

,

 

for

 

without

 

de-

 

posited

 

bed

 

and

 

deposited

 

bed

 

criteria,

 

respectively.

 

Graphically,

 

these

 

results

 

are

 

shown

 

in

 

Figs.

 

5

 

and

 

6

 

.

 

As

 

shown

 

in

 

these

 

tables,

 

for

 

the

 

MARS,

 

ANFIS-PSO,

 

ELM

 

and

 

MGP

 

models,

 

the

 

outliers

 

of

 

the

 

particle

 

Froude

 

number

 

(i.e.

 

F

 

r

 

<

 

0.00

 

and

 

F

 

r

 

>

 

20.00)

 

were

 

re-

 

moved.

 

This

 

is

 

because

 

these

 

models

 

can

 

produce

 

extreme

 

values

 

(e.g.

 

F

 

r

 

=

 

−58.67

 

or

 

F

 

r

 

=

 

163.59,

 

among

 

others)

 

that

 

misrepresent

 

the

 

model

 

comparison

 

when

 

evaluating

 

the

 

performance

 

indices.

 

As

 

it

 

can

 

be

 

seen

 

from

 

Table

 

3

 

,

 

Random

 

Forest

 

model

 

shows

 

a

 

better

 

generalisation

 

capacity

 

than

 

other

 

models

 

shown,

 

as

 

demonstrated

 

in

 

high

 

prediction

 

accuracy

 

observed

 

for

 

all

 

avail-

 

able

 

datasets

 

(0.88

 

>

 

R

 

2

 

>

 

0.98,

 

0.24

 

>

 

RMSE

>

 

0.73

 

and

 

4.36%

 

>

 

MAP

 

E

>

 

11.09%).

 

The

 

following

 

observations

 

can

 

be

 

made

 

from

 

the

 

performance

 

of

 

the

 

other

 

models

 

evaluated:

 

EPR-MOGA,

 

similarly

 

to

 

RF,

 

shows

 

good

 

results

 

but

 

has

 

infe-

 

rior

 

accuracy

 

in

 

large

 

sewer

 

pipes

 

(

 

R

 

2

 

=

 

0.86,

 

RMSE

=

 

1.03

 

and

 

MAP

 

E

=

 

11.31%).

 

In

 

addition,

 

EPR-MOGA

 

model

 

shows

 

limita-

 

tions

 

for

 

predicting

 

the

 

particle

 

Froude

 

number

 

in

 

non-circular

 

sections

 

(as

 

shown

 

in

 

the

 

Mayerle

 

(1988)

 

rectangular

 

data).

 

This

 

equation

 

shows

 

good

 

extrapolation

 

capabilities

 

because

 

of

 

the

 

inclusion

 

of

 

the

 

pipe

 

slope

 

as

 

input

 

feature

 

for

 

the

 

self-

 

cleansing

 

prediction.

 

GEP

 

shows

 

acceptable

 

results

 

(0.79

 

>

 

R

 

2

 

>

 

0.87,

 

0.66

 

>

 

RMSE

>

 

0.89

 

and

 

11.45%

 

>

 

MAP

 

E

>

 

22.33%)

 

for

 

the

 

datasets

 

used

 

for

 

its

 

development

 

in

 

circular

 

channels

 

(

 

Ab

 

Ghani,

 

1993

 

;

 

Mayerle,

 

1988

 

;

 

Vongvisessomjai

 

et

 

al.,

 

2010

 

)

 

and

 

poor

 

perfor-

 

mance

 

for

 

other

 

datasets

 

(0.00

 

>

 

R

 

2

 

>

 

0.76,

 

1.00

 

>

 

RMSE

>

 

1.95

 

and

 

14.35%

 

>

 

MAP

 

E

>

 

37.92%).

 

This

 

model

 

presents

 

good

 

performance

 

for

 

large

 

sewer

 

pipes.

 

In

 

contrast,

 

for

 

non-circular

 

channels

 

the

 

model

 

quickly

 

loss

 

accuracy.

 

According

 

to

 

Safari

 

(2019)

 

,

 

MARS

 

model

 

was

 

developed

 

by

 

us-

 

ing

 

the

 

experimental

 

data

 

collected

 

by

 

Mayerle

 

(1988)

 

(in

 

both

 

circular

 

and

 

rectangular

 

channels),

 

May

 

(1993)

 

,

 

Ab

 

Ghani

 

(1993)

 

and

 

Vongvisessomjai

 

et

 

al.

 

(2010)

 

.

 

As

 

a

 

result,

 

this

 

model

 

shows

 

acceptable

 

performance

 

for

 

these

 

datasets

 

(0.49

 

>

 

R

 

2

 

>

 

0.87,

 

0.81

 

>

 

RMSE

>

 

1.15

 

and

 

13.63%

 

>

 

MAP

 

E

>

 

28.08%)

 

but

 

poor

 

performance

 

for

 

the

 

remaining

 

datasets

 

(

 

R

 

2

 

=

 

0.00,

 

1.48

 

>

 

RMSE

>

 

2.88

 

and

 

29.14%

 

>

 

MAP

 

E

>

 

51.28%).

 

Based

 

on

 

the

 

above,

 

and

 

compared

 

to

 

the

 

RF

 

model,

 

limited

 

extrapolation

 

capabilities

 

are

 

identified

 

for

 

the

 

MARS

 

model.

 

May

 

et

 

al.

 

(1996)

 

is

 

the

 

best

 

regression-based

 

equation

 

re-

 

ported

 

in

 

the

 

literature

 

(

 

Ackers

 

et

 

al.,

 

2001

 

;

 

Ebtehaj

 

et

 

al.,

 

2014

 

),

 

as

 

it

 

was

 

developed

 

using

 

several

 

experimental

 

datasets.

 

This

 

is

 

the

 

equation

 

proposed

 

by

 

the

 

Construction

 

Industry

 

Re-

 

search

 

and

 

Information

 

Association

 

(CIRIA)

 

for

 

designing

 

self-

 

cleansing

 

sewer

 

pipes

 

transporting

 

coarser

 

granular

 

material

 

as

 

bedload

 

(

 

Ackers

 

et

 

al.,

 

2001

 

).

 

This

 

model

 

shows

 

good

 

perfor-

 

mance

 

for

 

pipe

 

diameters

 

less

 

than

 

500

 

mm

 

(0.83

 

>

 

R

 

2

 

>

 

0.99,

 

0.13

 

>

 

RMSE

>

 

0.82

 

and

 

2.38%

 

>

 

MAP

 

E

>

 

11.61%).

 

In

 

con-

 

trast,

 

limited

 

extrapolation

 

for

 

large

 

sewer

 

pipes

 

is

 

identified

 

as

 

the

 

low

 

performance

 

indices

 

values

 

obtained

 

(

 

R

 

2

 

=

 

0.00,

 

RMSE

=

 

4.88

 

and

 

MAP

 

E

=

 

48.97%).

 

This

 

equation

 

shows

 

better

 

performance

 

than

 

the

 

RF

 

model

 

when

 

compared

 

to

 

data

 

from

 

Vongvisessomjai

 

et

 

al.

 

(2010)

 

,

 

but

 

lower

 

accuracy

 

when

 

applied

 

to

 

the

 

rest

 

of

 

the

 

datasets.

 

6

 

/var/www/pavco.com.co/public/site/pdftohtml/0a11d602f962b8945d803abdfbf03967/index-html.html
background image

C.

 Montes, Z. Kapelan and J. Saldarriaga 

Water

 Research 189 (2021) 116639 

Table

 3 

Accuracy

 of self-cleansing models for without deposited bed criterion using performance indices for training and testing dataset. Bolded values show best performance 

model.

 

Dataset

 

Performance

 Index 

Model

 

RF

 

EPR-MOGA

 

GEP

 

MARS

 

May

 et al. (1996) a  Safari and Aksoy (2020) 

ANFIS-PSO

 

ELM

 

Training

 

R

 

2

 

0.98

 

0.90

 

0.75

 

0.00

 

0.27

 

0.74

 

0.51

 

0.30

 

RMSE

 

0.33

 

0.76

 

1.22

 

2.55

 

2.17

 

1.25

 

1.69

 

1.95

 

MAPE

 (%) 

4.88

 

11.54

 

23.52

  34.16 

17.49

 

17.21

 

19.32

 

29.76

 

Testing

 

R

 

2

 

0.91

 

0.86

 

0.69

 

0.00

 

0.09

 

0.74

 

0.40

 

0.32

 

RMSE

 

0.73

 

0.88

 

1.33

 

2.55

 

2.27

 

1.21

 

1.84

 

1.92

 

MAPE

 (%) 

11.09

 

12.35

 

26.43

  36.57 

19.15

 

17.24

 

20.95

 

29.82

 

Mayerle

 (1988) 

circular

 

R

 

2

 

0.96

 

0.89

 

0.87

 

0.87

 

0.87

 

0.75

 

0.80

 

0.42

 

RMSE

 

0.45

 

0.75

 

0.81

 

0.81

 

0.82

 

1.12

 

1.00

 

1.71

 

MAPE

 (%) 

5.62

 

8.90

 

14.77

  14.03 

11.49

 

14.91

 

17.92

 

26.75

 

Mayerle

 (1988) 

rectangular

 

R

 

2

 

0.93

 

0.38

 

0.30

 

0.81

 

0.87

 

0.00

 

0.47

 

RMSE

 

0.49

 

1.44

 

1.54

 

0.81

 

0.66

 

2.74

 

1.33

 

MAPE

 (%) 

8.49

 

28.97

 

33.00

  15.51 

13.14

 

45.28

 

20.75

 

Ab

 Ghani (1993) 

R

 

2

 

0.97

 

0.96

 

0.83

 

0.72

 

0.90

 

0.81

 

0.88

 

0.38

 

RMSE

 

0.36

 

0.43

 

0.89

 

1.15

 

0.67

 

0.94

 

0.74

 

1.69

 

MAPE

 (%) 

5.94

 

9.35

 

22.33

  28.08 

10.32

 

15.60

 

10.34

 

23.96

 

Ota

 (1999) 

R

 

2

 

0.97

 

0.98

 

0.44

 

0.00

 

0.96

 

0.97

 

0.97

 

0.55

 

RMSE

 

0.24

 

0.20

 

1.00

 

1.48

 

0.27

 

0.25

 

0.22

 

0.90

 

MAPE

 (%) 

5.55

 

6.90

 

37.92

  51.28 

7.78

 

7.90

 

6.46

 

19.54

 

Vongvisessomjai

 et al. (2010) 

R

 

2

 

0.88

 

0.95

 

0.79

 

0.49

 

0.99

 

0.71

 

0.97

 

0.00

 

RMSE

 

0.49

 

0.33

 

0.66

 

1.03

 

0.13

 

0.78

 

0.24

 

1.59

 

MAPE

 (%) 

6.56

 

5.78

 

11.45

  13.63 

2.38

 

13.34

 

3.62

 

28.50

 

Montes

 et al. (2020a) 

R

 

2

 

0.96

 

0.98

 

0.00

 

0.00

 

0.83

 

0.67

 

0.77

 

0.00

 

RMSE

 

0.31

 

0.25

 

1.64

 

2.37

 

0.67

 

0.94

 

0.75

 

1.85

 

MAPE

 (%) 

4.36

 

4.94

 

28.15

  49.73 

11.61

 

15.39

 

12.39

 

33.96

 

Montes

 et al. (2020b) 

R

 

2

 

0.94

 

0.86

 

0.76

 

0.00

 

0.00

 

0.34

 

0.00

 

0.00

 

RMSE

 

0.70

 

1.03

 

1.37

 

2.88

 

4.88

 

2.26

 

3.01

 

3.10

 

MAPE

 (%) 

7.33

 

11.31

 

14.35

  29.14 

48.97

 

23.44

 

30.56

 

39.30

 

a

 Model not valid for non-circular channels. 

Outliers

 removed. 

Table

 4 

Accuracy

 of self-cleansing models for deposited bed criterion using performance in- 

dices

 for training and testing dataset. Bolded values show best performance model. 

Dataset

 

Performance

 

Index

 

Model

 

RF

 

PSO

 

LASSO

 

MGP

 

Training

 

R

 

2

 

0.98

 

0.75

 

0.82

 

0.51

 

RMSE

 

0.32

 

1.30

 

1.13

 

1.69

 

MAPE

 (%) 

4.70

 

14.36

 

13.07

 

28.78

 

Testing

 

R

 

2

 

0.91

 

0.70

 

0.83

 

0.29