2 Data Sets for Creating Models
Using the following data, we create models to predict both planning effort (Eff)
and error (Err).
Eff: “The amount of effort” that needs to be predicted.
Err: “The number of errors” in a project.
Vnew: “Volume of newly added”, which denotes the number of steps in the newly
generated functions of the target project.
Vmodify: “Volume of modification”, which denotes the number of steps modified in
or added to existing functions for use in the target project.
Vsurvey: “Volume of original project”, which denotes the original number of steps
in the modified functions, and the number of steps deleted from the functions.
Vreuse: “Volume of reuse”, which denotes the number of steps in functions of
which only an external method has been confirmed and which are applied to
the target project design without confirming the internal contents.
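As a rough illustration of how one project's data could be held in code, the sketch below groups the six quantities into a single record. The class name `ProjectRecord`, the field names, and the `inputs`/`targets` helpers are hypothetical, and treating the four volume metrics as model inputs with Eff and Err as prediction targets is an assumption drawn from the description above, not a stated implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectRecord:
    """One project's measurements (hypothetical container, not from the paper)."""
    eff: float       # Eff: amount of effort
    err: float       # Err: number of errors
    v_new: float     # Vnew: steps in newly generated functions
    v_modify: float  # Vmodify: steps modified in or added to existing functions
    v_survey: float  # Vsurvey: original and deleted steps of modified functions
    v_reuse: float   # Vreuse: steps in functions reused without internal review

    def inputs(self) -> List[float]:
        """Volume metrics, assumed here to be the model inputs."""
        return [self.v_new, self.v_modify, self.v_survey, self.v_reuse]

    def targets(self) -> List[float]:
        """Quantities the models are meant to predict."""
        return [self.eff, self.err]
```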
3 Artificial Neural Network Model
An artificial neural network (ANN) is essentially a simple mathematical model
defining a function
$$f : X \to Y$$
where $X = \{x_i \mid 0 \le x_i \le 1,\ i \ge 1\}$ and $Y = \{y_i \mid 0 \le y_i \le 1,\ i \ge 1\}$.
ANNs are non-linear statistical data modeling tools that can be used to model
complex relationships between inputs and outputs. The basic model is illustrated in
Fig. 1, where the output is calculated as follows.
1. Calculate values for hidden nodes. The value of $\mathrm{HiddenNode}_j$ is calculated
using the following equation:
$$\mathrm{HiddenNode}_j = f\Bigl(\sum_i (w_{i,j} \times \mathrm{Input}_i)\Bigr) \qquad (1)$$
where $f(x) = \frac{1}{1+\exp(-x)}$ and the $w_{i,j}$ are weights calculated by the learning
algorithm.
2. Calculate $\mathrm{Output}$ using the hidden node values $\mathrm{HiddenNode}_k$ as follows:
$$\mathrm{Output} = f\Bigl(\sum_k (w_k \times \mathrm{HiddenNode}_k)\Bigr) \qquad (2)$$
where $f(x) = \frac{1}{1+\exp(-x)}$ and the $w_k$ are the weights calculated by the learning
algorithm.
We can use an ANN to create effort and error prediction models.
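The two calculation steps above can be sketched in a few lines of Python. This is a minimal illustration of Eqs. (1) and (2) only: the function names and the list-based weight layout are assumptions, and the weights are taken as given rather than learned.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Activation f(x) = 1 / (1 + exp(-x)) used in Eqs. (1) and (2)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs: Sequence[float],
            hidden_weights: Sequence[Sequence[float]],
            output_weights: Sequence[float]) -> float:
    """Compute the network output for one input vector.

    hidden_weights[i][j] plays the role of w_{i,j} (input i to hidden node j);
    output_weights[k] plays the role of w_k (hidden node k to the output).
    """
    n_hidden = len(output_weights)
    # Step 1, Eq. (1): value of each hidden node.
    hidden: List[float] = [
        sigmoid(sum(hidden_weights[i][j] * x for i, x in enumerate(inputs)))
        for j in range(n_hidden)
    ]
    # Step 2, Eq. (2): output from the hidden node values.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
```

Because the sigmoid maps every real number into the open interval (0, 1), the output always lies in the range required of Y.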
14 K. Iwata et al.
3.1 Problems in Original ANN Model
In an ANN, the range of input values or output values is usually less than or equal
to 1 and greater than or equal to 0. The values of most selected data, however, are
greater than 1. Thus each data range needs to be converted to the range [0, 1] by
normalization, a process that leads to a large margin for error in some projects.
In our original ANN model, normalized values are calculated using Eq. (3),
where the normalized value for t is expressed as $f_{nl}(t)$ (where t denotes Eff, Err,
Vnew, Vmodify, Vsurvey, and Vreuse).
$$f_{nl}(t) = \frac{t - \min(T)}{\max(T) - \min(T)} \qquad (3)$$
where T denotes the set of t, and max(T) and min(T) denote the maximum and
minimum values of T, respectively.
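A minimal sketch of the normalization in Eq. (3), together with its inverse, might look as follows. The function names are hypothetical, and the guard against a constant data set is an added assumption that the text does not discuss.

```python
from typing import Sequence


def normalize(t: float, values: Sequence[float]) -> float:
    """Min-max normalization of Eq. (3): maps t into [0, 1] relative to values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant data set cannot be normalized this way.
        raise ValueError("max(T) and min(T) must differ")
    return (t - lo) / (hi - lo)


def denormalize(u: float, values: Sequence[float]) -> float:
    """Inverse mapping: converts a normalized value back to the original scale."""
    lo, hi = min(values), max(values)
    return u * (hi - lo) + lo
```

The inverse is needed in practice because the ANN predicts in the normalized range, while effort and error counts are reported in their original units.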
Since the normalization is linear (flat and smooth), a small change in a normalized value
has a greater degree of influence on a small-scale project than on a large-scale
project.
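For example, let $\min(T_S)$ equal 10, $\max(T_S)$ equal 300, $t_{S1}$ equal 15, $t_{S2}$ equal 250,
and the predicted values for $t_{S1}$ and $t_{S2}$ be $\hat{t}_{S1}$ and $\hat{t}_{S2}$, respectively. If the prediction
model has an error of +0.01 in the normalized value, this corresponds to an error of
$0.01 \times (\max(T_S) - \min(T_S)) = 2.90$ on the original scale. The predicted values are given
as $\hat{t}_{S1} = 17.90$ and $\hat{t}_{S2} = 252.90$. In both cases the error is the same, but the absolute
values of the relative error (ARE) are given by:
$$ARE_{S1} = \left|\frac{\hat{t}_{S1} - t_{S1}}{t_{S1}}\right| = \left|\frac{17.90 - 15}{15}\right| = 0.1933$$
$$ARE_{S2} = \left|\frac{\hat{t}_{S2} - t_{S2}}{t_{S2}}\right| = \left|\frac{252.90 - 250}{250}\right| = 0.0116$$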
The results show that the absolute value of the relative error for the small-scale
value $t_{S1}$ is greater than that for the large-scale value $t_{S2}$.
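The arithmetic of this example can be checked with a short script. It is only a sketch of the worked example: the helper name `absolute_relative_error` is hypothetical, and the numbers are those given in the text.

```python
def absolute_relative_error(predicted: float, actual: float) -> float:
    """ARE = |predicted - actual| / actual."""
    return abs(predicted - actual) / actual


t_min, t_max = 10.0, 300.0
# A normalized error of +0.01 rescaled to the original range: 0.01 * 290 = 2.90.
shift = 0.01 * (t_max - t_min)

for actual in (15.0, 250.0):
    predicted = actual + shift
    are = absolute_relative_error(predicted, actual)
    print(f"t = {actual:g}, predicted = {predicted:.2f}, ARE = {are:.4f}")
# prints:
# t = 15, predicted = 17.90, ARE = 0.1933
# t = 250, predicted = 252.90, ARE = 0.0116
```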
The distributions of the amount of effort and the number of errors are shown in
Figs. 2 and 3, respectively. These distributions confirm that, for both the amount of
effort and the number of errors, small-scale projects make up the majority of the data
and are more numerous than large-scale projects. Thus, in order to improve prediction
accuracy, it is important to reconstruct the normalization method.