Technical comparisons of simulation-based productivity prediction methodologies by means of estimation tools focusing on conventional earthmoving.
Han, Seungwoo; Hong, TaeHoon; Kim, Gwangho et al.
1. Introduction
Productivity in construction is considered an important criterion
to evaluate operational performance by specific construction activities.
Productivity prediction prior to actual commencement of operations is an
important task that planners or managers in construction have made a top
priority from the viewpoint of management (Capachi 1987; Schaufelberger
1998; Kandil and El-Rayes 2005).
When basic planning is conducted, planners refer to their own
experiences or historical data in order to predict productivity as
accurately as possible prior to commencement of site work. Reference
manuals representing historical data of cost and productivity provide
basic information that allows planners to predict the productivity.
However, the information, which is comprised of average values, provided
by the reference manuals is not easily applied to various site
conditions where numerous unexpected factors are at play (Schaufelberger
1998).
The need for reliable prediction of construction productivity has
long motivated researchers to investigate appropriate methods. However,
many methods created thus far have limitations such as unreliable
prediction and difficult implementation (Han 2005; Han and Halpin 2005;
Han et al. 2006). This study builds on the previous research by Han and
Halpin (2005), Han et al. (2006), and Han et al. (2008) in order to
resolve the problems and limitations of the previously suggested
methodology, which combines simulation and multiple regression (MR)
techniques.
This study suggests new methods for productivity prediction that
use construction simulation as a tool for data generation, and MR
analysis and artificial neural network (ANN) analysis as tools for
easy and reliable prediction. It also compares the characteristics and
technical performance of the two estimation techniques, MR and ANN.
[FIGURE 1 OMITTED]
An earthmoving operation was chosen as the target construction
activity in this study. The reason for selecting an
earthmoving operation is that it is a fundamental operation of civil and
architectural construction projects. In addition, data collection is
simple and easy, since an earthmoving operation is composed of fewer
different activities than other construction operations. Over the past 100
years, earthmoving operations have involved the same basic work
procedures (i.e., surveying, staking, excavating with an excavator or
other equipment, hauling by a hauler, filling, and compacting by a
compactor or other equipment). These work procedures have not changed
over time, although there have been minor updates to specifications of
some equipment. Although similar or even identical procedures have
been used for a lengthy period of time, it remains difficult to predict
the productivity of this simple operation (Han and Halpin 2005; Han et
al. 2006; Han et al. 2008).
This study created and developed a new prediction methodology that
combines several tools: construction simulation and either MR analysis
or an ANN analysis. Several steps are carried out: construction data
collection, data generation, and productivity prediction based on
estimation tools. To generate the data that serve as input for the
estimation tools, a construction simulation was used for
both tools. MR and an ANN were employed as estimation tools
using the generated data. Quantified comparisons of the prediction
accuracy between the MR and the ANN techniques are also presented. A
diagram illustrating the research method employed in this study is
presented in Fig. 1.
2. Method for productivity prediction
Planners have relied upon three types of methods to predict
productivity: 1) methods based on historical data; 2) methods based on
references, such as RS Means cost data by Reed Construction Data, Inc.
and equipment performance handbooks; and 3) analytical methods such as
construction simulation or statistical analysis. Methods
based on historical data or references are typically referred to as
deterministic analysis (Kannan et al. 1997; Kannan 1999).
2.1. Deterministic analysis
Deterministic analysis was developed for simple calculation of the
productivity of earthmoving operations based on equipment
characteristics, equivalent grades, and the haul distance provided by
performance handbooks published by most manufacturers. A deterministic
model primarily focuses on the use of time duration, which is a fixed or
constant value, with the assumption that any variability in the task
duration is ignored (Halpin and Riggs 1992). The same authors described an
example of a simple deterministic model for earthmoving operations,
consisting of a scraper for hauling and a pusher dozer for loading.
Deterministic analysis tends to overestimate actual field productivity.
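A deterministic estimate of this kind can be illustrated with a minimal sketch. The function name, the excavator-truck setting, and all durations below are hypothetical values chosen for illustration; they are not taken from Halpin and Riggs (1992) or from the data of this study. Every duration is treated as a constant, so waiting and bunching are ignored.

```python
def deterministic_productivity(load_min, haul_min, dump_min, return_min, n_trucks):
    """Truck-dumps per hour when every duration is a fixed constant (minutes)."""
    truck_cycle = load_min + haul_min + dump_min + return_min  # one round trip
    # The fleet delivers at most n_trucks loads per truck cycle ...
    fleet_rate = n_trucks * 60.0 / truck_cycle
    # ... but the loading unit can serve at most one truck per load_min.
    loader_rate = 60.0 / load_min
    return min(fleet_rate, loader_rate)


# Hypothetical durations in minutes.
few_trucks = deterministic_productivity(3.0, 8.0, 1.0, 6.0, n_trucks=4)
many_trucks = deterministic_productivity(3.0, 8.0, 1.0, 6.0, n_trucks=8)
print(round(few_trucks, 2), many_trucks)
```

With four trucks the fleet limits output; with eight trucks the loading unit becomes the limiting resource. Neither figure accounts for random variation in durations, which is why such estimates tend to sit above field productivity.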
2.2. Simulation techniques
With rapid advances in computer technologies, researchers have
tried to create simulation models to help construction engineers predict
construction productivity prior to commencing actual activities.
Simulation models have been extensively developed and broadly used as
management tools within manufacturing and business industries. The
CYCLONE (CYCLic Operation Network) system approach was developed in the
early 1970s. This system demonstrated potential for modeling and
simulation of repetitive construction processes. In 1982, Lluch and
Halpin developed a microcomputer version of CYCLONE named MicroCYCLONE.
Many improvements to MicroCYCLONE have been developed in the past two
decades. In general, a construction simulation is conducted in several
steps (i.e., site observation, duration and resource data collection,
modeling using CYCLONE, running simulation, and sensitivity analysis)
(Kannan 1999; Wang and Halpin 2004). Martinez and Ioannou created
STROBOSCOPE (STate and ResOurce Based Simulation of COnstruction
ProcEsses), which adopts CYCLONE modeling elements such as normal,
queue, and combi activities (Martinez and Ioannou 1994; Ioannou and
Martinez 1996).
WebCYCLONE, another variation of the CYCLONE methodology, simplifies the
simulation modeling process and makes it accessible to construction
practitioners with limited simulation experience (Halpin and Riggs
1992).
Simulation techniques continue to be improved through research aimed
at overcoming the practical limitations that hinder their application to
real operations. Simphony, a simulation system developed by Hajjar and
AbouRizk (2002), provides a unified modeling technique under an integrated
development environment (Hajjar and AbouRizk 2002; Mohamed and AbouRizk
2005). Based on this technique, AbouRizk and his colleagues presented
simulation methodologies built on intelligent decision
support for easy use by practitioners in the field (Mohamed and AbouRizk
2005, 2006; van Tol and AbouRizk 2006). Another effort pursuing more
reliable prediction results took the form of situation-based
simulation models built on cause-and-effect relationships (Choy
and Ruwanpura 2006). All of these research accomplishments
focused mainly on improving simulation techniques so that they can be
applied to the construction field more efficiently. The basic elements
used in the CYCLONE method are shown in Table 1.
[TABLE 1 OMITTED]
2.3. Multiple regression analysis
Regression analysis is the most commonly performed statistical
procedure for prediction of certain tendencies based on observed
datasets. The ultimate goal of a regression analysis is not only to find
the values of parameters, but also to determine what type of
mathematical function fits best. Using this tool, researchers have been
able to investigate and understand the relationships between explanatory
variables and a result called a response variable (Devore 2000).
Smith (1999) presented stepwise MR techniques to investigate the
relationships between earthmoving operation conditions and productivity
and to develop a deterministic model allowing earthmoving operations to
be planned for many different situations. This MR model using input data
taken from four different highway construction projects demonstrated
that there is a strong linear relationship between operation conditions
and productivity (Smith 1999; Han et al. 2008).
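The idea of fitting a linear relationship between operating conditions and productivity can be sketched with ordinary least squares. The example below is an assumption-laden illustration, not the stepwise procedure of Smith (1999): the two predictors (number of trucks and haul distance), the data, and the function names are hypothetical, and the data are constructed to follow an exact linear rule so that the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_regression(rows, y):
    """Least-squares fit via the normal equations; returns [intercept, b1, b2, ...]."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(x[0])
    xtx = [[sum(xi[j] * xi[k] for xi in x) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(x, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations: (number of trucks, haul distance in km).
conditions = [(2, 1.0), (3, 2.0), (4, 1.5), (5, 3.0), (6, 2.5)]
# Synthetic response generated from y = 2 + 3 * trucks - 0.5 * distance.
productivity = [2.0 + 3.0 * t - 0.5 * d for t, d in conditions]

coefficients = fit_multiple_regression(conditions, productivity)
print([round(c, 6) for c in coefficients])
```

Solving the normal equations directly is adequate for a small, well-conditioned illustration; a statistical package would normally be used for real data.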
2.4. Artificial neural network technique
An ANN is an extremely powerful tool that provides a computing
environment in the form of a highly interconnected network of many
simple processing units capable of acquiring, representing, and applying
mappings from one space of information as inputs to another space as
outputs. An ANN is composed of simple processing elements, called
artificial neurons, an architecture comprised of connections
between the elements, and weights associated with each connection. The
ANN performs computations by propagating changes in activation between
its processing elements over weighted connections (Tsoukalas and Uhrig
1997).
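The propagation of activations over weighted connections can be shown with a very small feedforward pass. The layer sizes, weights, and biases below are arbitrary illustrative values, and the function name is hypothetical; the sketch only demonstrates the mechanism described above.

```python
import math


def forward(inputs, layers):
    """Propagate activations layer by layer.

    Each layer is a tuple (weights, biases, activation), where weights holds
    one row of connection weights per neuron in that layer.
    """
    activations = inputs
    for weights, biases, activation in layers:
        activations = [
            activation(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def identity(value):
    return value


# Two inputs -> two hidden neurons (tanh) -> one linear output neuron.
hidden_layer = ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1], math.tanh)
output_layer = ([[1.0, -1.0]], [0.2], identity)

result = forward([1.0, 2.0], [hidden_layer, output_layer])
print(result)
```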
Shi (1999) demonstrated the use of an ANN to predict earthmoving
production and presented an easy method for a user who does not have a
background in computer simulation to predict the productivity of
earthmoving operations. However, the results of the neural network
system were not validated through a comparison with actual data
collected from job sites. In addition, there is a lack of information
about the detailed components, including the architecture of the network
(Shi 1999). Schabowicz and Hola (2007) and Hola and Schabowicz (2010)
more recently investigated the productivity of earthworks using an ANN.
Their research demonstrated that the ANN is a feasible tool
for productivity estimation in construction. Unlike the studies
mentioned above, this study additionally proposes generating input data
with a simulation technique when the collected
construction data are in short supply.
2.5. Limitations of the conventional methods for productivity
prediction
Many studies have presented the limitations of existing
productivity prediction methods. A deterministic analysis does not
present actual productivity based on real situations such as idleness
and loss of productivity due to random variation in the system activity
duration (Halpin and Riggs 1992). While simulation methods are able to
overcome these limitations, there are still considerable complexities
involved in making necessary models reflecting actual operational
situations. Mathematical relations between productivity and operating
conditions can be determined through a MR analysis, and such relations
are then more easily applied than other techniques. A reliable
regression model, however, requires a large number of input datasets
covering various actual conditions, and in reality, acquiring a large
number of actual datasets from various construction job sites presents
practical challenges. An ANN has the same limitation
in practical application, caused by insufficient input
datasets (Han et al. 2006; Han et al. 2008). It should be noted that the
limitations of the conventional methods stem mainly from the difficulty
of collecting actual data from jobsites.
3. Data collection and data generation
In response to the need for a new methodology enabling
straightforward prediction of productivity, this study suggests a
methodology that combines a simulation method and an estimation tool,
either a MR analysis or an ANN analysis. The simulation method is used
for generating a large amount of data that is then used as input data in
creating a MR model or an ANN model. Each estimation tool, MR or ANN,
built on a construction simulation provides a means of
predicting productivity as well as establishing the relationship between
operating conditions and productivity.
3.1. Data collection
As the first phase, actual raw datasets were collected from
construction sites where earthmoving was conducted in West Lafayette and
Lafayette, Indiana. Table 2 describes the six construction projects
where data collection was conducted (Han 2005; Han et al. 2008).
From the projects described in Table 2, raw datasets were collected
for four or five hours on two or three consecutive days at each jobsite.
A total of 23 separate hourly datasets, each including a series of
multiple cycles, were collected. The datasets form a representative
sample of earthmoving operations, covering both a two-link system
composed of an excavator and trucks and a three-link system composed of
an excavator, a dozer, and trucks. Video of the earthmoving operations
at the jobsites was recorded, providing consistent observations for the
analysis of the event times of each piece of equipment. The event times
analyzed in the videotapes made it possible to determine the cycle times
of each activity, supported by stopwatch analysis, interviews, and field
measurement (Everett et al. 1998). Sieve analysis using soil samples taken from the
jobsites provided basic information regarding the soil characteristics.
The travel time, loading time, machine break time, and resurveying time
were acquired through observations and analyses. Interviews with site
personnel and field measurements provided the basic conditions of the
jobsite, such as hauling distance, equipment capacity and the number of
pieces of equipment and probabilities of machine break and resurveying
(Han 2005; Han et al. 2008). Table 3 summarizes the data collected from
the selected jobsites.
[FIGURE 2 OMITTED]
3.2. Simulation
WebCYCLONE, a construction simulation tool, was run using the
collected raw datasets. The data obtained from the simulation are used
as preliminary data that are expanded to a large number of datasets to
be utilized as input datasets for implementing a MR or an ANN analysis.
Fig. 2 demonstrates one of the simulation models based on a dataset
collected from the construction site for Project A.
This simulation model was designed to measure the productivity in
terms of truck-dumps per hour. The simulation model reflects the
observation that interruptions by the on-site surveyor occurred at a
rate of 4.55% during the excavation process. These interruptions were
due to restaking knocked-down stakes. This kind of interruption is
generally observed at all sites where earthmoving is conducted. The
result of the simulation model, which reflects actual situations,
indicates that this interruption delays the cycle time and eventually
lowers productivity. The durations associated with the various cycle
times, such as loading earth onto a truck and the trucks' traveling and
returning, were assumed to fit a beta distribution. According to a study
by AbouRizk and Halpin (1992), these distributions could be used in
modeling random input processes of construction duration periods for
simulation studies.
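The mechanics of such a model can be sketched as a small discrete-event loop for a single excavator serving a fleet of trucks. This is not the WebCYCLONE model of Fig. 2: the function names, the beta shape parameters, and every duration range below are hypothetical, and only the 4.55% interruption rate is taken from the text (applied here per loading cycle, which is an assumption).

```python
import heapq
import random


def scaled_beta(rng, low, high, a=2.0, b=3.0):
    """Sample a duration from a beta distribution rescaled to [low, high] minutes."""
    return low + (high - low) * rng.betavariate(a, b)


def simulate_truck_dumps(n_trucks, hours=1.0, p_resurvey=0.0455, seed=0):
    """Return truck-dumps per hour for one excavator and n_trucks trucks."""
    rng = random.Random(seed)
    horizon = hours * 60.0
    arrivals = [0.0] * n_trucks          # times at which trucks reach the excavator
    heapq.heapify(arrivals)
    excavator_free = 0.0
    dumps = 0
    while True:
        arrival = heapq.heappop(arrivals)        # next truck in the queue
        start = max(arrival, excavator_free)     # truck may wait (bunching)
        if start >= horizon:
            break
        if rng.random() < p_resurvey:            # surveyor interrupts excavation
            start += scaled_beta(rng, 2.0, 6.0)  # hypothetical resurveying time
        excavator_free = start + scaled_beta(rng, 2.0, 4.0)       # loading
        dump_time = excavator_free + scaled_beta(rng, 5.0, 10.0)  # haul and dump
        if dump_time <= horizon:
            dumps += 1
        heapq.heappush(arrivals, dump_time + scaled_beta(rng, 4.0, 8.0))  # return
    return dumps / hours


print(simulate_truck_dumps(n_trucks=4))
```

Because trucks queue when the excavator is busy and interruptions lengthen some cycles, the simulated output stays below what fixed durations alone would suggest.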
3.3. Comparison of actual data and simulated data
In order to establish a reliable prediction method, the collected
raw data were replaced by the data obtained from the construction
simulation, since it is difficult for users to collect a sufficient
amount of data by actual measurement and site observation from jobsites.
The reliability and confidence of replacing the actual data with
the simulated data could be verified by statistical analyses. The
Wilcoxon signed rank test is a method for checking the similarity of two
samples. It tests the median difference between pairs of datasets in two
samples where a normal distribution is not assumed. Since the difference
between two samples is calculated, the simulated data can be measured on
an interval scale that corresponds to the degree of difference from
the actual data (Devore 2000).
When the data consist of pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$, the
differences $D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$ are used to test
hypotheses on the expected difference $\mu_D$ by applying the
Wilcoxon signed-rank test to the $D_i$ (Devore 2000).
A Wilcoxon signed-rank test of the difference between the actual
data and the simulated data was conducted using the SAS program. Based
on the test assumptions, the null hypothesis and rejection regions for a
level $\alpha$ test are as follows:
Null hypothesis: $H_0$: $D = 0$, where $D = X_i - Y_i$ is the magnitude
of the difference between $X_i$, the actual measurement, and $Y_i$, the
simulation model result;
Alternative hypothesis: $H_a$: $D = X_i - Y_i \neq 0$.
The UNIVARIATE procedure provided by the SAS program was conducted
to test the statistical values. The P values were used for investigation
of acceptance or rejection of the null hypothesis.
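The test statistic itself is simple to compute by hand, which the following sketch illustrates. It is not the SAS UNIVARIATE procedure used in the study: the function name and the paired values are hypothetical, zero differences are dropped, tied absolute differences receive average ranks, and the z value uses the large-sample normal approximation without a tie correction.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z) for paired samples x and y."""
    d = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return w_plus, w_minus, (w_plus - mean) / sd


# Hypothetical paired productivities (actual, simulated).
actual = [10, 12, 9, 11, 14]
simulated = [9, 14, 6, 15, 9]
w_plus, w_minus, z = wilcoxon_signed_rank(actual, simulated)
print(w_plus, w_minus, round(z, 3))
```

A value of z close to zero gives no evidence against the null hypothesis; with samples as small as this one, exact tables rather than the normal approximation would normally be consulted.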
Halpin and Riggs (1992) illustrated that productivity values vary
with the means by which those values are obtained. According to their
study, the productivity obtained through actual measurement is
approximately 10 percentage points lower than the deterministic
productivity because of bunching caused by random travel times. In
contrast with deterministic productivity, simulated productivity is
estimated with consideration of the bunching effect and variances in
travel times, yet it is generally higher than the productivity
obtained through actual measurement. The simulated productivity in this
study fell between the deterministic productivity and the actual
productivity (Halpin and Riggs 1992).
Based on the study by Halpin and Riggs (1992), simulated
productivity was therefore assumed to be five percentage points higher
than actual productivity, a criterion chosen from within the range of
zero to 10% that separates the actual productivity from the
deterministic productivity. A Wilcoxon signed rank test was conducted to
compare two groups of datasets: the simulated productivity and the
actual productivity increased by 5% (Han 2005; Han et al. 2008). Table 4
shows the results of the Wilcoxon signed rank test for the pairs of data
described above.
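One possible reading of this criterion in symbols is given below. The symbols $P_{\mathrm{det}}$, $P_{\mathrm{sim}}$, and $P_{\mathrm{act}}$ (deterministic, simulated, and actual productivity) are introduced here for illustration only, and treating the 10% and 5% figures as relative differences is an assumption rather than a statement from the original study.

```latex
\[
  P_{\mathrm{act}} \le P_{\mathrm{sim}} \le P_{\mathrm{det}}, \qquad
  P_{\mathrm{act}} \approx 0.90\, P_{\mathrm{det}}, \qquad
  P_{\mathrm{sim}} \approx 1.05\, P_{\mathrm{act}}
\]
% Paired comparison for dataset i: simulated value against the adjusted actual value.
\[
  D_i = P_{\mathrm{sim},i} - 1.05\, P_{\mathrm{act},i}, \qquad
  H_0 : D = 0 \quad \text{versus} \quad H_a : D \neq 0
\]
```

Under this reading, failing to reject $H_0$ supports using the simulated productivity in place of the measured productivity, adjusted by the assumed 5% margin.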
3.4. Data generation
The comparison of simulated data and actual data based on a
statistical methodology presented in the previous section showed that
the simulation data could be used as a substitute for the limited amount
of raw data collected from jobsites. The next step is to generate
datasets using a simulation methodology. The generated datasets by the
simulation serve as input data in estimation tools such as a MR and an
ANN analysis. A guideline must be established prior to input data
generation (Han 2005).
Interviews were conducted with site personnel and site observations
were carried out to identify the main factors, which varied depending on
actual site conditions and influenced productivity significantly. The
following four factors among 17 factors listed in Table 3 were selected:
1) the probability of resurveying, 2) the number of trucks, 3) the
number of excavators, and 4) the resurveying time. All the other factors
were assumed to have been invariable in a single dataset collected
within one hour. Variable durations, such as the loading time and the
travel time, were implemented using duration input modules in the
simulation methodology. The probability of machine breakdown was
excluded from the main variable factors, because the probabilities of
this event were so low that they would not have influenced productivity
(Han et al. 2008).
Several guidelines, listed below, for input data generation based
on the simulation methodology were determined:
--The low and high levels of the numbers of trucks and excavators
in each dataset were determined by analyzing the collected datasets and
through site observations;
--The specific ranges of the low and high levels of the probability
of resurveying/checking and the resurveying/checking time were
determined from the actual values of the collected data and the mean
values of distribution of all datasets in each system; and
--The numbers of generated datasets derived from one actual dataset
must be identical so that all the datasets were evenly reflected.
To determine the low and high levels of the probability of
resurveying and the resurveying time, the best-fit distributions were
investigated to find the mean value, which was assumed to function as
the low or high level for data generation. Figs 3 and 4 show the
best-fit distributions, obtained via the Best Fit program, of the
resurveying time and the probability of resurveying (Han and Halpin
2005; Han 2005; Han et al. 2008).
The mean values, which were derived from the best-fit distributions
shown in Figs 3 and 4, are listed in Table 5. The number of resources
associated with the simulation methodology was determined from the range
of availability of such resources in the jobsites. This information was
determined through interviews with site personnel. The low or high
levels of the number of pieces of equipment were determined from the
minimum or maximum number of pieces of equipment available at the
jobsites. Based on the guidelines described previously, one dataset
collected from the
actual jobsites generated 192 datasets (i.e., combinations of 2 x 2 x 3
x 16 for cases under the two-link system or 2 x 2 x 2 x 3 x 8 for cases
under the three-link system). This process, therefore, generated 4,416
datasets (i.e., 23 actual datasets x 192 simulated datasets / one actual
dataset) (Han 2005; Han and Halpin 2005).
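The dataset counts quoted above can be checked with a few lines of arithmetic. The level counts are copied from the text; the variable names are hypothetical, and no attempt is made to map each count to a specific factor because the text does not state that mapping.

```python
import math

# Level counts per factor as quoted in the text.
two_link_levels = (2, 2, 3, 16)
three_link_levels = (2, 2, 2, 3, 8)

per_dataset_two_link = math.prod(two_link_levels)
per_dataset_three_link = math.prod(three_link_levels)

actual_datasets = 23
total_generated = actual_datasets * per_dataset_two_link

print(per_dataset_two_link, per_dataset_three_link, total_generated)
```

Both systems yield the same number of combinations per actual dataset, which is what the guideline on evenly reflecting all datasets requires.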
[FIGURE 3 OMITTED]
[FIGURE 4 OMITTED]
4. Productivity prediction modeling
4.1. Model configuration
As stated previously, a total of 17 factors presumed
to affect the productivity were identified through interviews and
site observations, as listed in Table 3. During the interviews with site
personnel, it was noted that data for several of the
17 factors can seldom be collected, depending on actual site conditions.
Some factors that cannot be identified before commencing actual
operation were also included in these 17 factors. Owing to these
problems, a methodology built on all 17 factors would not be appropriate
for predicting the productivity, the ultimate goal of this study. Three
model types were therefore considered and investigated in order to
resolve these problems: 1) Model I: a full model with 17 factors, 2)
Model II: a reduced model with 10 factors, and 3) Model III: a reduced
model with 7 factors (Han 2005).
Model I was associated with all 17 factors, which were regarded to
affect the productivity. Accordingly, Model I was expected to yield the
most reliable prediction results. However, the factors that were
included in Model I, such as the probability of resurveying and
resurveying time, the probability of machine break time, machine break
time, and so on, could not be identified before actual operations
started or resumed. Thus, Model I was limited as a prediction tool. In
contrast, the reduced models, Models II and III, were expected to
yield practical prediction results, because they were composed only of
factors that could be identified prior to actual operations. The reduced models
were separated into one model with sufficient information, named Model
II, and one model with insufficient information, named Model III. The
criterion determining sufficient or insufficient information was whether
three specific factors were included in the models or not. These three
variables were excavator operator experience, excavator age, and truck
age, which are considered in Model II. On the other hand, these three
factors are not considered in Model III. These three factors may be
identified or not, depending on different management levels (Han 2005;
Han et al. 2008). The factors used in each model are shown in Table 6.
4.2. Modeling by MR analysis
A MR model provides the prediction of specific results,
demonstrating the relationship between a response variable, i.e., in the
present study, the productivity of each dataset, and the explanatory
variables, which are the factors (i.e., travel times, loading times, and
hauling distance) affecting the productivity. In order to achieve the
best-fitted regression model, three steps were conducted in this study:
1) stepwise regression, 2) transformations, and 3) ridge regression
(Devore 2000; Neter et al. 1996; Han et al. 2008).
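The effect of the ridge step can be illustrated in its simplest form, a single centred predictor, where the ridge estimate of the slope is $S_{xy} / (S_{xx} + \lambda)$. The function name, the data, and the penalty values below are hypothetical; in the study ridge regression was applied to models with many predictors, where its main purpose is to stabilise coefficients under multicollinearity, which a one-predictor example cannot show.

```python
def ridge_slope(x, y, penalty):
    """Ridge slope for one predictor with an unpenalised intercept.

    penalty = 0 reproduces the ordinary least-squares slope; larger
    values shrink the slope towards zero.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / (s_xx + penalty)


# Hypothetical data following y = 2x exactly.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.0, 4.0, 6.0, 8.0, 10.0]

ols_slope = ridge_slope(x_values, y_values, penalty=0.0)
shrunk_slope = ridge_slope(x_values, y_values, penalty=10.0)
print(ols_slope, shrunk_slope)
```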
Table 7 shows the finalized MR models (I, II, and III) obtained
through the three steps mentioned above. They present mathematical
relationships between the explanatory variables, denoted as predictors,
and a response variable. These mathematical relationships allowed the
user to predict the productivity when input data reflecting actual
situations is provided prior to actual commencement of site work (Han et
al. 2008).
4.3. Modeling by ANN analysis
A well-trained ANN with sufficient input data can provide
appropriate estimation results (Tsoukalas and Uhrig 1997). The
research by Schabowicz and Hola (Schabowicz and Hola 2007; Hola and
Schabowicz 2010) introduced the use of an ANN for productivity
prediction based on a conjugate gradient algorithm (BPNN-CGB) with five
inputs: number of excavators, number of trucks, excavator bucket
capacity, truck loading platform capacity, and type of road surface.
As stated previously, the shortage of raw data, one of the problems in
using an ANN, was resolved by data generation based on a simulation.
The architecture of the network used in this study was a multi-layer
"feedforward" network. Through numerous experiments, the ANN model in
this study was designed with two hidden layers of 50 neurons and 20
neurons, respectively. Two "tansig" functions were adopted as the
transfer functions of the two hidden layers, and one "purelin" function
was adopted as the transfer function of the output layer. As a training
algorithm, "resilient backpropagation (trainrp)" was adopted as it
provides useful functionality for multi-layer networks.
"Sigmoid" transfer functions compress an infinite input range
into a finite output range. As a result, most backpropagation algorithms
tend to make only small changes in the weights and biases even when the
weights and biases are far from their optimal values. The purpose of the
resilient backpropagation (Rprop) training algorithm is to eliminate
this limitation (Demuth and Beale 2001). Resilient backpropagation
allows the network to approach the goal, defined in terms of the
difference between a target value and the output, with a steep gradient.
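The core of resilient backpropagation is that only the sign of each partial derivative is used, while a separate step size per weight grows or shrinks depending on whether that sign repeats. The sketch below applies this rule to a simple quadratic error surface rather than to a neural network. The function names and parameter defaults are hypothetical, and the variant shown (skipping the update after a sign change) is one common formulation, not necessarily the one implemented in MATLAB's trainrp.

```python
def rprop_minimize(gradient, weights, steps=200, grow=1.2, shrink=0.5,
                   step_init=0.1, step_min=1e-6, step_max=50.0):
    """Minimise a function with a sign-based Rprop update rule."""
    weights = list(weights)
    step = [step_init] * len(weights)
    previous = [0.0] * len(weights)
    for _ in range(steps):
        grad = list(gradient(weights))
        for i in range(len(weights)):
            if previous[i] * grad[i] > 0:      # same sign: accelerate
                step[i] = min(step[i] * grow, step_max)
            elif previous[i] * grad[i] < 0:    # sign change: overshoot, slow down
                step[i] = max(step[i] * shrink, step_min)
                grad[i] = 0.0                  # skip this update
            if grad[i] > 0:
                weights[i] -= step[i]          # move against the gradient sign
            elif grad[i] < 0:
                weights[i] += step[i]
            previous[i] = grad[i]
    return weights


def quadratic_gradient(w):
    """Gradient of (w0 - 3)^2 + (w1 + 1)^2, minimised at (3, -1)."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


solution = rprop_minimize(quadratic_gradient, [0.0, 0.0])
print([round(v, 4) for v in solution])
```

Because the update ignores the gradient magnitude, progress does not stall where a sigmoid-type transfer function makes the gradient very small.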
In addition, pre-processing and post-processing functions, named
"premnmx" and "postmnmx", were added in this study. These
functions are useful to scale the inputs and targets such that they
always fall within a specified range (Han 2005). Fig. 5 shows a basic
diagram of the network, which was optimally designed for accomplishing
the goal of this study.
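The scaling performed by such pre-processing and post-processing functions can be sketched as a min-max transformation and its inverse. The target range of -1 to 1 follows the documented behaviour of MATLAB's premnmx; the Python function names and the sample values are hypothetical.

```python
def scale_minmax(values):
    """Map values linearly so that the minimum becomes -1 and the maximum +1."""
    low, high = min(values), max(values)
    scaled = [2.0 * (v - low) / (high - low) - 1.0 for v in values]
    return scaled, low, high


def unscale_minmax(scaled, low, high):
    """Invert scale_minmax using the stored minimum and maximum."""
    return [0.5 * (s + 1.0) * (high - low) + low for s in scaled]


# Hypothetical productivity targets in truck-dumps per hour.
targets = [10.0, 20.0, 30.0]
scaled_targets, t_min, t_max = scale_minmax(targets)
restored = unscale_minmax(scaled_targets, t_min, t_max)
print(scaled_targets, restored)
```

The stored minimum and maximum must be kept from training so that network outputs can be converted back to productivity units.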
[FIGURE 5 OMITTED]
[FIGURE 6 OMITTED]
[FIGURE 7 OMITTED]
For selection of datasets for training and validation, one-tenth of
the datasets generated by the simulation models were used for validation
and the remaining were used for training. For instance, a total of 4,416
datasets was divided into 3,975 datasets and 441 datasets for training
and validation, respectively. Model I is reviewed in Figs 6 and 7, which
show the procedures and results based on an ANN as an example. Model I
was trained based on resilient backpropagation with pre-processing and
post-processing with a 0.001 error goal and 20,000 maximum epochs (Han
2005).
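The split sizes quoted above follow from holding out one-tenth of the datasets, rounded down. The sketch below reproduces the counts; the function name is hypothetical, and selecting the validation datasets by random shuffling is an assumption, since the text does not state how they were chosen.

```python
import random


def split_train_validation(n_datasets, validation_fraction=0.1, seed=0):
    """Return (training indices, validation indices) after a seeded shuffle."""
    indices = list(range(n_datasets))
    random.Random(seed).shuffle(indices)
    n_validation = int(n_datasets * validation_fraction)  # rounds down
    return indices[n_validation:], indices[:n_validation]


train_idx, validation_idx = split_train_validation(4416)
print(len(train_idx), len(validation_idx))
```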
Fig. 6 shows the training goal graph, in which the error, that is,
the difference between the target value and the output, reaches the
goal of 0.001.
Fig. 7 shows that the validation results were well-fitted with the
optimal target value. The R value of 0.997, shown in Fig. 7, is close to
1, which also indicates that the trained model reliably estimates the
optimal result.
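An R value of this kind is commonly the linear correlation coefficient between the target values and the network outputs. Assuming that interpretation (the text does not define R explicitly), it can be computed as follows; the function name and the sample values are hypothetical.

```python
import math


def pearson_r(targets, outputs):
    """Pearson correlation coefficient between targets and outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    covariance = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return covariance / math.sqrt(var_t * var_o)


# Hypothetical target and output productivities.
r_perfect = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_inverse = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(r_perfect, r_inverse)
```

A value near 1 means that outputs and targets rise and fall together; it does not by itself show that the outputs are unbiased.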
5. Comparison of results by two prediction models
5.1. Comparison of results by the fitted predictive model A: MR
analysis
The fitted predictive model A, a predictive model using a MR
analysis, employs procedures based on the construction simulation, data
generation, and a MR analysis. A comparison between raw data collected
from jobsites and the results yielded by the fitted predictive model A
is presented in this section. A comparison of these two values provides
an assessment of the fitted predictive model. The comparison rates shown
in Table 8 represent the percentage rates of the predicted productivity
by the fitted predictive model A to the actual productivity measured
directly from jobsites (Han et al. 2008).
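The comparison rate defined above is the predicted productivity expressed as a percentage of the actual productivity. A minimal sketch follows; the function name and the productivity values are hypothetical and are not taken from Table 8.

```python
import statistics


def comparison_rates(predicted, actual):
    """Predicted productivity as a percentage of actual productivity."""
    return [100.0 * p / a for p, a in zip(predicted, actual)]


# Hypothetical productivities in truck-dumps per hour.
actual_productivity = [20.0, 25.0, 30.0]
predicted_productivity = [19.0, 26.0, 30.0]

rates = comparison_rates(predicted_productivity, actual_productivity)
mean_rate = statistics.mean(rates)
sd_rate = statistics.stdev(rates)
print(rates, round(mean_rate, 2), round(sd_rate, 2))
```

An average rate close to 100% with a small standard deviation indicates predictions that are both accurate on average and stable across datasets.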
According to Table 8, the average comparison rates of model I,
model II, and model III were 99.06%, 91.23%, and 90.89%, respectively.
The differences among the average comparison rates of each model also
indicated that the factors that were included in model I and excluded in
models II and III, i.e., the probability of resurveying and resurveying
time, the probability of machine break time, machine break time, and so
on, significantly influenced the predicted results. The factors that
were included in model II and excluded in model III, such as experience
of the excavator's operator, age of excavator, and age of trucks, did
not have a significant influence on the predicted results (Han et al.
2008).
5.2. Comparison of results by the fitted predictive model B: ANN
analysis
The results by the fitted predictive model B using ANN analysis
were compared to actual productivity calculated based on raw collected
data. Table 9 presents a comparison of the actual productivity and the
predicted productivity by the fitted predictive model B.
As listed in Table 9, the average comparison rates of model I,
model II, and model III were 103.06%, 98.80%, and 99.28%, respectively.
Unlike the fitted predictive model A, there were no significant
differences among the average comparison rates of models I, II, and III.
Focusing solely on the average comparison rates of each model, the
average comparison rate of model III was closer to 100% than was that of
model I. The standard deviation of model I, however, is clearly less
than those of models II and III. This observation indicates that the
predictive results of model I, which was composed of all 17 factors,
were more precise and stable than those of the other models.
5.3. Comparison of the predictive results between the fitted
predictive models A and B
As presented in the previous sections, this study provided two
comparisons of the fitted predictive models A and B. According to these
comparisons, the predictive results of predictive model A in model I,
which included all 17 factors, were more reliable than those of
predictive model B. The predictive results in models II and III,
however, showed that predictive model B provides more reliable results
than predictive model A. This analysis indicates that predictive model B
would be more useful in productivity prediction, since models II and
III, composed of factors that can be identified before commencing actual
operation, can be used for productivity prediction under actual site
situations. However, the
standard deviations of the comparison rates in Tables 8 and 9 show that
further improvement of both models is required.
The two estimation tools also differ in performance in
terms of implementation. The MR analysis included in the fitted
predictive model A eventually provided a mathematical relationship
between the factors and the predictive productivity. This model
enables a user to obtain the predictive result by merely inputting the
factors that describe the specific site conditions.
However, implementation of the fitted predictive model B, which includes
an ANN analysis, is difficult compared to predictive model A, since
professional skill in running the MATLAB program (Demuth and Beale
2001) is required for implementation (Han 2005).
6. Conclusions
Productivity prediction is an important issue for construction
managers and planners. A literature review conducted in this study
revealed that many studies have been performed to date with the goal of
improving productivity prediction results. Most methodologies developed
thus far were each based on only one of various methods, and present
several limitations in terms of practical application. This study
accordingly presented a new methodology that combines methods that
function correlatively. The methods used in this study were actual data
collection, data generation using construction simulation, and
estimation tools, that is, MR and ANN analysis. These two estimation
tools, which have been widely used for prediction in engineering, serve
as the last step following data collection and data generation. This
study also presented the differences in basic characteristics and
compared the technical performance of MR and ANN analysis.
The first step in producing the fitted predictive model was data
generation, which was based on actual data collection from jobsites.
This step enables the user to secure a sufficiently large quantity of
input data to run the estimation tools, i.e., MR and ANN analysis. A
construction simulation technique was used to overcome difficulties in
acquiring raw data, and a Wilcoxon signed-rank test was conducted to
justify replacing the actual productivity calculated from raw data with
the simulated productivity. The next step was implementation of the
estimation tools using the generated data as input data. This study
provided fitted predictive models A and B using MR and ANN analysis,
respectively. Each predictive model was composed of models I, II, and
III, which varied according to the factors included or excluded.
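The Wilcoxon signed-rank step can be illustrated with the Project A values from Table 4. The sketch below is a plain enumeration of the exact two-sided test and is not the software used in the study; the function name is hypothetical. It assumes that zero differences are dropped and that tied magnitudes share an average rank.

```python
from itertools import product


def exact_wilcoxon_signed_rank(x, y):
    """Two-sided exact p-value of the Wilcoxon signed-rank test for paired data."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # zero differences are dropped
    magnitudes = sorted(abs(d) for d in diffs)

    def rank(value):
        # Average position among sorted magnitudes, so ties share a rank.
        positions = [i + 1 for i, m in enumerate(magnitudes) if m == value]
        return sum(positions) / len(positions)

    ranks = [rank(abs(d)) for d in diffs]
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return extreme / 2 ** len(ranks)


# Project A, datasets 1 to 5 (Table 4).
actual = [18.53, 13.29, 24.51, 15.97, 19.37]
simulated = [19.10, 14.22, 26.00, 16.13, 19.82]
p_value = exact_wilcoxon_signed_rank(actual, simulated)
print(p_value)
```

For Project A the simulated value exceeds the actual value in all five datasets, so only the two one-signed patterns out of 32 are as extreme as the observation, which gives the p-value of 0.0625 listed in Table 4. Since this exceeds 0.05, the difference is not significant at the 95% confidence level, which appears to be the reading that supports using the simulated productivity in place of the actual productivity.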
Comparison between the actual productivity and the results yielded
by fitted predictive model A showed that the average comparison rates
were 99.06%, 91.23%, and 90.89% for models I, II, and III, respectively.
By contrast, the average comparison rates of fitted predictive model B
were 103.06%, 98.80%, and 99.28% for models I, II, and III,
respectively. These results indicate that predictive model B was better
fitted to the actual data than was model A in models II and III.
Implementation of predictive model B, however, is difficult in that
running the MATLAB program demands specific skill. Implementation of
predictive model A is relatively easy compared with that of model B,
since the user can obtain predictive results by merely inputting the
information for each factor or explanatory variable.
The fitted predictive models suggested in this study enable
planners who are presently faced with insufficient actual datasets to
carry out reliable productivity prediction by combining simulation with
either MR or ANN analysis. This study also contributes to the
research community by providing a new methodology that combines various
methods and produces more reliable prediction results than conventional
predictive methods.
doi: 10.3846/13923730.2011.574381
Acknowledgment
This work was supported by the INHA UNIVERSITY Research Grant.
References
AbouRizk, S. M.; Halpin, D. W. 1992. Statistical properties of
construction duration data, Journal of Construction Engineering and
Management ASCE 118(3): 525-544.
doi:10.1061/(ASCE)0733-9364(1992)118:3(525)
Capachi, N. 1987. Excavation & grading handbook revised.
Craftsman Book Company: Carlsbad, CA. 512 p.
Choy, E.; Ruwanpura, J. Y. 2006. Predicting construction
productivity using situation-based simulation models, Canadian Journal
of Civil Engineering NRC 33(12): 1585-1600. doi:10.1139/L06-088
Demuth, H.; Beale, M. 2001. Neural network toolbox for use with
MATLAB. MATLAB Tool Box menu. The Mathworks Inc. 840 p.
Devore, J. L. 2000. Probability and Statistics for Engineering and
the Sciences. 5th Ed. Duxbury, Pacific Grove, CA. 750 p.
Everett, J. G.; Halkali, H.; Schlaff, T. G. 1998. Time-lapse
applications for construction project management, Journal of
Construction Engineering and Management ASCE 124(3): 204-209.
doi:10.1061/(ASCE)0733-9364(1998)124:3(204)
Hajjar, D.; AbouRizk, S. M. 2002. Unified modeling methodology for
construction simulation, Journal of Construction Engineering and
Management ASCE 128(2): 174-185.
doi:10.1061/(ASCE)0733-9364(2002)128:2(174)
Halpin, D. W.; Riggs, L. S. 1992. Planning and analysis of
construction operations. John Wiley & Sons, Inc., New York, NY. 381
p.
Han, S. 2005. Application modeling of the conventional and the
GPS-based earthmoving systems. PhD Dissertation of Purdue University at
West Lafayette, Indiana, USA. 226 p.
Han, S.; Halpin, D. W. 2005. The use of simulation for productivity
estimation based on multiple regression analysis, in Proc. of the 2005
Winter Simulation Conference, Orlando, Florida, 2005. Ed. by M. E. Kuhl,
N. M. Steiger, F. B. Armstrong, J. A. Jones. IEEE, Piscataway, NJ.
1492-1499.
Han, S.; Hong, T.; Lee, S. 2008. Productivity prediction of
conventional and GPS-based earthmoving systems using simulation and
multiple regression analysis, Canadian Journal of Civil Engineering NRC
35(6): 574-587. doi:10.1139/L08-005
Han, S.; Lee, S.; Hong, T.; Chang, H. 2006. Simulation analysis of
productivity variation by global positioning system (GPS) implementation
in earthmoving operations, Canadian Journal of Civil Engineering NRC
33(9): 1005-1114.
Hola, B.; Schabowicz, K. 2010. Estimation of earthworks execution
time cost by means of artificial neural networks, Automation in
Construction 19(5): 570-579. doi:10.1016/j.autcon.2010.02.004
Ioannou, P. G.; Martinez, J. C. 1996. Comparison of construction
alternatives using matched simulation experiments, Journal of
Construction Engineering and Management ASCE 122(3): 231-241.
doi:10.1061/(ASCE)0733-9364(1996)122:3(231)
Kandil, A.; El-Rayes, K. 2005. Parallel computing framework for
optimizing construction planning in large-scale projects, Journal of
Computing in Civil Engineering ASCE 19(3): 304-312.
doi:10.1061/(ASCE)0887-3801(2005)19:3(304)
Kannan, G. 1999. A methodology for the development of a production
experience database for earthmoving operations using automated data
collection. PhD Dissertation of Civil Engineering in Virginia
Polytechnic Institute and State University at Blacksburg, VA, USA.
Kannan, G.; Martinez, J. C.; Vorster, M. C. 1997. A framework for
incorporating dynamic strategies in earth-moving simulations, in Proc.
of the 1997 Winter Simulation Conference, Atlanta, GA. 1997. Ed. by S.
Andradottir, K. J. Healy, D. H. Withers, B. L. Nelson. IEEE. Piscataway,
NJ, USA. 1119-1126.
Martinez, J. C.; Ioannou, P. G. 1994. General purpose simulation
with STROBOCOPE, in Proc. of the 1994 Winter Simulation Conference, Lake
Buena Vista, FL, 1994. Ed. by J. D. Tew, S. Manivannan, D. Sadowski, A.
F. Seila. IEEE. Piscataway, NJ, USA. 1159-1166.
Mohamed, Y.; AbouRizk, S. M. 2005. Framework for building
intelligent simulation models of construction operations, Journal of
Computing in Civil Engineering ASCE 19(3): 277-291.
doi:10.1061/(ASCE)0887-3801(2005)19:3(277)
Mohamed, Y.; AbouRizk, S. M. 2006. A hybrid approach for developing
special purpose simulation tools, Canadian Journal of Civil Engineering
NRC 33(12): 1505-1515. doi:10.1139/L06-073
Neter, J.; Kutner, M. H.; Nachtsheim, C. J.; Wasserman, W. 1996.
Applied linear statistical models. 4th Ed. WCB/McGraw Hill, Boston, MA.
1408 p.
Schabowicz, K.; Hola, B. 2007. Mathematical-neural model for
assessing productivity of earthmoving machinery, Journal of Civil
Engineering and Management 13(1): 47-54.
Schaufelberger, J. E. 1998. Construction equipment management.
Prentice Hall, Inc. Upper Saddle River, NJ. 357 p.
Shi, J. J. 1999. A neural network based system for predicting
earthmoving production, Construction Management and Economics 17(4):
463-471. doi:10.1080/014461999371385
Smith, S. D. 1999. Earthmoving productivity estimation using linear
regression techniques, Journal of Construction Engineering and
Management ASCE 125(3): 133-141.
doi:10.1061/(ASCE)0733-9364(1999)125:3(133)
Tsoukalas, L. H.; Uhrig, R. E. 1997. Fuzzy and neural approaches in
engineering. John Wiley & Sons, Inc., New York, NY, USA. 587 p.
Van Tol, A. A.; AbouRizk, S. M. 2006. Simulation modeling decision
support through belief networks, Simulation Modelling Practice and
Theory 14(5): 614-640. doi:10.1016/j.simpat.2005.10.010
Wang, S.; Halpin, D. W. 2004. Simulation experiment for improving
construction processes, in Proc. of the 2004 Winter Simulation
Conference, Washington DC. 2004. Ed. by R. G. Ingalls, M. D. Rossetti,
J. S. Smith, B. A. Peters, IEEE. Piscataway, NJ, USA. 1252-1259.
Seungwoo Han (1), TaeHoon Hong (2), Gwangho Kim (3), Sangyoub Lee (4)
(1) Department of Architectural Engineering, Inha University, 253
Younghyun-dong, Nam-gu, Incheon 402-751, Korea
(2) Department of Architectural Engineering, Yonsei University, 50
Yonsei-Ro, Seodaemun-Gu, Seoul 120-749, Korea
(3) Advanced Building Science and Technology Research Center,
Yonsei University, 50 Yonsei-Ro, Seodaemun-Gu, Seoul 120-749, Korea
(4) Department of Real Estate Studies, Konkuk University,
Hwayang-dong, Gwangjin-gu, Seoul 143-701, Korea
E-mails: (1) shan@inha.ac.kr; (2) hong7@yonsei.ac.kr (corresponding
author); (3) allwe@hanmail.net; (4) sangyoub@konkuk.ac.kr
Received 31 Aug. 2009; accepted 10 Sept. 2010
Seungwoo HAN. Associate professor in the Department of
Architectural Engineering, College of Engineering at Inha University,
Incheon, South Korea. He is a member of several institutes:
Architectural Institute of Korea (AIK), Korea Institute of Construction
Engineering and Management (KICEM), and Korea Institute of Building
Construction (KIC). His research interests include creating estimation
models that can be applied to emerging construction technologies on
jobsites and evaluating construction performance based on construction
simulation techniques.
TaeHoon HONG. Assistant professor in the Department of Architectural
Engineering of Yonsei University, Seoul, Korea. He is a corresponding
member of the editorial board of the Journal of Management in
Engineering, ASCE, and is also a member of academic and professional
institutes such as AIK, KSCE, ASCE, KICEM and KCVE. His main research
areas include life cycle cost analysis, life cycle assessment,
infrastructure asset management, facility management, and sustainable
construction.
Gwangho KIM. Dr in the Advanced Building Science and Technology
Research Center, Yonsei University, Seoul, Korea. He is also currently
working as an adjunct professor in the Department of Architectural
Engineering, College of Engineering at Inha University, Incheon, South
Korea. He is a member of Architectural Institute of Korea (AIK) and
Korea Institute of Construction Engineering and Management (KICEM). His
research interests include the analysis of development cost,
feasibility analysis for multi-family housing development plans,
simulation study, and cost planning.
Sangyoub LEE. PhD, associate professor and head of the Department
of Real Estate Studies at Konkuk University, Korea. He is a member of
the Korea Institute of Construction Engineering and Management as well
as a director of the Korea Real Estate Analysts Association. His
research interests include the project management of construction and
development projects with a focus on risk management, particularly the
quantification of risk evaluation, analysis, and forecasting.
Table 2. Descriptions of earthmoving projects
Projects    Fleet organization               Two-way haul distance (miles)
Project A   1 excavator, 7 trucks            3
Project B   1 excavator, 1 dozer, 2 trucks   2.9
Project C   1 excavator, 4 trucks            15.8
Project D   1 excavator, 10 trucks           4.8
Project E   1 excavator, 2 trucks            1.1
Project F   1 excavator, 7 trucks            9.4
Table 3. Summary of data characteristics collected from the jobsites
Methods of data collection              Types of data collected
Site observations (stop watch           Machine break time, Resurveying time,
analysis using videotaping)             Loading duration, Travel duration,
                                        Number of loading
Interviews                              Equipment capacity (bucket of excavator),
                                        Number of equipment, Operators'
                                        experience, Age of equipment
Field measurements                      Soil conditions, Hauling distance,
                                        Probabilities of machine break and
                                        resurveying
Calculations                            Hauling speed, Productivity
Table 4. Results of Wilcoxon signed-rank test of all datasets
                              Productivity                    Confidence
No   Project name   Dataset   Actual   Simulation   P-value   level (%)
1    Project A      1         18.53    19.10        0.0625    95
                    2         13.29    14.22
                    3         24.51    26.00
                    4         15.97    16.13
                    5         19.37    19.82
2    Project B      1         5.01     5.08         0.0625    95
                    2         2.54     2.62
                    3         3.33     3.40
3    Project C      1         4.05     4.17         0.5000    95
                    2         3.50     3.74
4    Project D      1         16.42    17.48        0.0625    95
                    2         8.09     8.38
                    3         15.19    15.52
                    4         18.14    18.56
                    5         16.04    16.12
5    Project E      1         4.39     4.57         0.2500    95
                    2         3.26     3.37
                    3         3.87     3.88
6    Project F      1         15.46    16.14        0.0625    95
                    2         15.60    15.94
                    3         14.31    16.90
                    4         12.20    12.72
                    5         14.75    15.70
Table 5. The best fitted distribution and mean values of two main factors
Factors                      Distribution   Mean
Probability of resurveying   Beta           21.68%
Resurveying time             Gamma          14.43 min
Table 6. Variables used in three models (O: included, X: excluded)
                                                  Models
Descriptions                            Denotes   I    II   III
Haul distance                           A         O    O    O
Hauling speed                           B         O    X    X
Bucket capacity of excavator            C         O    O    O
Number of loading                       D         O    O    O
Probability of machine break            E         O    X    X
Machine break time                      F         O    X    X
Prob. of resurveying                    G         O    X    X
Resurveying time                        H         O    X    X
Soil conditions                         I         O    O    O
Loading duration                        J         O    X    X
Travel duration                         K         O    X    X
Number of trucks                        L         O    O    O
Number of dozers                        M         O    O    O
Number of excavators                    N         O    O    O
Experience of excavator's operator      O         O    X    X
Age of excavator                        P         O    X    X
Age of trucks                           Q         O    O    X
Productivity by simulation models                 O    O    O
Table 7. Variables and coefficients of MR models I, II, and III
Models Regression Models
I Y = 2.0584 + (1.2702 * G) + (0.1018 * L) + (-0.0729 * AI) +
(0.0081 * AL) + (-0.0646 * BG) + (0.0443 * BI) +
(0.0260 * BM) + (-0.0045 * CK) + (0.0185 * CL) +
(-0.5733 * CM) + (-0.0042 * DH) + (-0.0252 * DI) +
(0.1777 * EF) + (0.0028 * FN) + (1.5072 * GG) +
(-0.1593 * GH) + (-0.5425 * GI) + (0.0420 * GK) +
(-0.1206 * GL) + (0.0003 * HH) + (0.0051 * HJ) +
(0.0007 * HK) + (-0.0014 * HL) + (-0.0013 * HO) +
(0.0080 * HQ) + (0.0088 * IL) + (-0.4720 * LM) +
(-0.0087 * JK) + (-0.0247 * JP) + (-0.0075 * LL) +
(0.0087 * LM) + (0.0017 * LN) + (0.0642 * LO) +
(-0.0776 * JM)
II Y = 0.6912 + (0.0179 * L) + (-0.0049 * AC) + (0.0272 * AI) +
(-0.0018 * AL) + (-0.0013 * DL) + (-0.0055 * DP) +
(-0.0195 * IO) + (0.0013 * LL) + (-0.0014 * LM) +
(-0.0003 * LN) + (0.0016 * LO) + (0.0156 * MM) +
(-0.0230 * MO)
III Y = 0.6984 + (-0.0028 * N) + (-0.0004 * AD) +
(0.0314 * AI) + (-0.0018 * AL) + (-0.0137 * AM) +
(-0.0078 * CC) + (-0.0042 * CL) + (-0.0044 * DD) +
(-0.0430 * DI) + (-0.0009 * DL) + (0.0018 * DM) +
(-0.0043 * IL) + (0.0013 * LL) + (-0.0020 * LM)
Table 8. Comparison of the actual productivity and the predicted
productivity by the fitted predictive model A
Model I
Actual Predicted Comparison
Data sets Productivity Productivity Rate (%)
1 18.53 18.60 100.38
2 13.29 13.84 104.15
3 24.51 22.32 91.07
4 15.97 15.84 99.20
5 19.37 17.85 92.14
6 5.01 4.63 92.48
7 2.54 2.51 98.83
8 3.33 3.18 95.59
9 4.05 3.62 89.47
10 3.50 3.63 103.83
11 16.42 16.63 101.30
12 8.09 9.46 116.91
13 15.19 15.01 98.82
14 18.14 17.50 96.45
15 16.04 14.55 90.69
16 4.39 4.59 104.58
17 3.26 3.23 99.15
18 3.87 3.62 93.56
19 15.46 15.77 101.99
20 15.60 15.45 99.02
21 14.31 16.25 113.55
22 12.20 11.82 96.86
23 14.75 14.52 98.41
Average 99.06
Standard deviation 6.72
Model II
Predicted Comparison
Data sets Productivity Rate (%)
1 13.71 73.99
2 13.71 103.16
3 13.71 55.94
4 13.71 85.85
5 13.71 70.78
6 3.85 76.78
7 3.85 151.44
8 3.85 115.51
9 3.27 80.73
10 3.27 93.42
11 12.72 77.45
12 12.72 157.20
13 12.72 83.72
14 12.72 70.11
15 12.72 79.29
16 3.91 89.16
17 3.91 120.06
18 3.91 101.14
19 11.84 76.58
20 11.84 75.90
21 11.84 82.74
22 11.84 97.05
23 11.84 80.27
Average 91.23
Standard deviation 24.74
Model III
Predicted Comparison
Data sets Productivity Rate (%)
1 13.52 72.94
2 13.52 101.70
3 13.52 55.14
4 13.52 84.63
5 13.52 69.78
6 3.81 76.05
7 3.81 150.00
8 3.81 114.41
9 3.22 79.61
10 3.22 92.12
11 12.83 78.15
12 12.83 158.62
13 12.83 84.48
14 12.83 70.74
15 12.83 80.00
16 3.93 89.42
17 3.93 120.42
18 3.93 101.44
19 11.79 76.29
20 11.79 75.60
21 11.79 82.42
22 11.79 96.67
23 11.79 79.96
Average 90.89
Standard deviation 24.84
Table 9. Comparison of the actual productivity and the predicted
productivity by the fitted predictive model B
Model I
Actual Predicted Comparison
Data sets Productivity Productivity Rate (%)
1 18.53 18.95 102.27
2 13.29 14.32 107.75
3 24.51 25.69 104.81
4 15.97 15.75 98.62
5 19.37 19.55 100.93
6 5.01 4.85 96.81
7 2.54 2.63 103.54
8 3.33 3.50 105.11
9 4.05 4.05 100.00
10 3.50 3.83 109.43
11 16.42 17.13 104.32
12 8.09 8.09 100.00
13 15.19 15.39 101.32
14 18.14 17.88 98.57
15 16.04 16.02 99.88
16 4.39 4.36 99.32
17 3.26 3.44 105.52
18 3.87 3.80 98.19
19 15.46 17.29 111.84
20 15.60 16.04 102.82
21 14.31 16.70 116.70
22 12.20 11.91 97.62
23 14.75 15.50 105.08
Average 103.06
Standard deviation 4.90
Model II
Predicted Comparison
Data sets Productivity Rate (%)
1 15.57 84.03
2 15.57 117.16
3 15.57 63.53
4 15.57 97.50
5 15.57 80.38
6 4.01 80.04
7 4.01 157.87
8 4.01 120.42
9 3.97 98.02
10 3.97 113.43
11 12.93 78.75
12 12.93 159.83
13 12.93 85.12
14 12.93 71.28
15 12.93 80.61
16 3.92 89.29
17 3.92 120.25
18 3.92 101.29
19 13.59 87.90
20 13.59 87.12
21 13.59 94.97
22 13.59 111.39
23 13.59 92.14
Average 98.80
Standard deviation 24.40
Model III
Predicted Comparison
Data sets Productivity Rate (%)
1 15.36 82.89
2 15.36 115.58
3 15.36 62.67
4 15.36 96.18
5 15.36 79.30
6 4.09 81.64
7 4.09 161.02
8 4.09 122.82
9 4.11 101.48
10 4.11 117.43
11 12.74 77.59
12 12.74 157.48
13 12.74 83.87
14 12.74 70.23
15 12.74 79.43
16 4.08 92.94
17 4.08 125.15
18 4.08 105.43
19 13.50 87.32
20 13.50 86.54
21 13.50 94.34
22 13.50 110.66
23 13.50 91.53
Average 99.28
Standard deviation 25.13