
Engineering, Technology & Applied Science Research Vol. 9, No. 2, 2019, 3871-3880 3871

www.etasr.com Arhin & Gatiba: Predicting Injury Severity of Angle Crashes Involving Two Vehicles at Unsignalized …

Predicting Injury Severity of Angle Crashes Involving
Two Vehicles at Unsignalized Intersections Using
Artificial Neural Networks

Stephen A. Arhin

Howard University Transportation Research and Data Center
Washington, DC, USA

Adam Gatiba

Howard University Transportation Research and Data Center
Washington, DC, USA

Abstract—In 2015, about 20% of the 52,231 fatal crashes that
occurred in the United States occurred at unsignalized
intersections. The economic cost of these fatalities has been
estimated to be in the millions of dollars. In order to mitigate the
occurrence of these crashes, it is necessary to investigate their
predictability based on the pertinent factors and circumstances
that might have contributed to their occurrence. This study
focuses on the development of models to predict injury severity of
angle crashes at unsignalized intersections using artificial neural
networks (ANNs). The models were developed based on 3,307
crashes that occurred from 2008 to 2015. Twenty-five different
ANN models were developed. The most accurate model predicted
the severity of an injury sustained in a crash with an accuracy of
85.62%. This model has 3 hidden layers with 5, 10, and 5
neurons, respectively. The activation functions in the hidden and
output layers are the rectified linear unit function and sigmoid
function, respectively.

Keywords-crashes; unsignalized intersection; artificial neural
network; injury severity

I. INTRODUCTION

Even though intersections constitute a relatively low
proportion of the facilities of transportation systems, a
significant number of crashes occur at these locations,
especially in urban areas. In California for instance, an annual
average of 1.5 crashes occur at unsignalized intersections in
rural locations, compared to an average of 2.5 crashes per year
in urban locations [1]. Data from the World Health
Organization (WHO) reveal that 1.25 million people die
annually worldwide in road crashes. The economic cost of
these deaths is estimated to be approximately $260 billion per
year [2]. In the United States, there were a total of 37,456
fatalities in road-related crashes reported in 2016 [3]. Though
most of these crashes occurred on road segments, a significant
number occurred at or near intersections. Out of the total of
52,231 fatal crashes in the United States in 2015,
approximately 4.4% (2,298) of the crashes occurred at STOP-
controlled intersections, while 7.5% (3,917) of the crashes
occurred at intersections controlled by traffic signals.
Intersections without any type of traffic control device recorded
the highest number of fatal crashes (4,227) [4].

Several studies have investigated the causes of these
crashes. These causes are either driver-induced, or occur due to
road geometry, road defects, vehicle defects and atmospheric or
weather conditions. Various countermeasures have been
proposed and/or implemented to reduce the occurrence of
crashes at intersections, which in some instances have been
successful. In order to effectively reduce the frequency and
mitigate the severity of intersection-related crashes, it is
necessary to explore the predictability of these crashes based
on the pertinent factors and circumstances that might have
contributed to their occurrence. Several studies
have resulted in the development of mathematical models that
predict crashes on roadways in general and, in a few instances,
at unsignalized intersections in particular. These mathematical
models include linear regression and machine learning
methods. Given the varying characteristics of intersections, it is
necessary to develop models that are focused and specific to a
particular set of conditions. This study therefore focuses on the
development of models to predict the severity of right-angle
crashes involving two vehicles at unsignalized intersections in
urban centers using ANNs.

II. LITERATURE REVIEW

A. Contributory Factors for Intersection-Related Crashes

There are many factors that determine the degree of injury
sustained by people involved in crashes at unsignalized
intersections. However, it is shown that only certain factors are
statistically significant predictors. Authors in [5] assessed the
degree of injury sustained by drivers involved in angle
collisions in relation to the fault status of drivers. The results of
the study showed that drivers who were not at fault tended to
sustain more severe injuries than those who were at fault. It
was further determined that injury severity was affected by
factors including time of year, speed limit, age, gender,
restraint/helmet use, and alcohol/drug use. Authors in [6]
concluded that the road surface condition (wet or dry) was a
significant predictor of injury severity. Additionally, female
drivers were found to be more likely to sustain severe injuries
than male drivers. Crashes in urban areas were determined to
result in less serious injuries than crashes in rural areas [6].
Also, traffic
volume on a major road is a significant predictor of crashes at
unsignalized intersections [7].

Corresponding author: Stephen A. Arhin (saarhin@howard.edu)


The geometric characteristics and features of unsignalized
intersections have also been found to be potential explanatory
variables in crash prediction models. Authors in [8] predicted
the frequency of accidents at unsignalized intersections in
urban areas using negative binomial models. It was concluded
that besides traffic exposure functions such as traffic flow,
which usually significantly predict crashes, intersection
geometrics, absence of street lighting and dedicated left-turning
lanes are positively correlated with accident frequency at
intersections. Typical geometric characteristics included
number of lanes on major road, width of lanes, and presence of
median on intersecting roads. The study further revealed that T-
intersections with Yield control had a much lower accident
potential than those with Stop control.

B. Crash Prediction Models

Several modeling techniques have been employed to predict
crashes at intersections.

1) Linear Regression Models

Linear regression modeling is an approach to establish a
relationship between scalar responses, also called dependent
variables, and other explanatory (or independent) variables.
Model parameters are estimated using a data set of values of
the response and explanatory variables. The model is usually
fitted to the observed data set using the least squares approach.
Linear regression models take the form:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i \tag{1}$$

where $y_i$ is the i-th dependent variable, $\beta_0, \beta_1, \ldots, \beta_p$
are estimated parameters, $x_{i1}, x_{i2}, \ldots, x_{ip}$ are the predictor
variables of the i-th dependent variable, and $\varepsilon_i$ is the error
term. The error term is
an independent and normally distributed random variable with
a mean of zero and a variance greater than zero. Linear
regression modelling has been applied in several studies to
establish various relationships between the frequency of injury
crashes and other traffic characteristics. Authors in [9]
investigated the relationship between the number of injuries or
property damage only (PDO) crashes that occur annually at
intersections and traffic and environmental factors. The crash
records (ranging from 1984 to 1987) of 2,488 intersections in
California were sampled. The linear regression analysis
employed in this study was conducted in two levels. In the first
level, a simple linear regression model was developed with
injury/PDO crashes per year as the response variable and traffic
intensity, expressed in millions of vehicles entering the
intersection per year from all approaches, as the predictor
variable. In the second model, additional information such as
design, traffic control, proportion of cross street traffic, and
environmental features were included as predictor variables.
The results of the analysis showed that the accuracy of the
model improved as more predictor variables were added.
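
As a minimal sketch of how the parameters in (1) can be estimated by least squares, the following NumPy example fits a model to synthetic data. The data, the "true" coefficients and the variable names are illustrative assumptions and are not taken from the study in [9].

```python
import numpy as np

# Synthetic illustration only: two predictors with a known linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([1.5, 0.8, -0.4])  # assumed [beta_0, beta_1, beta_2]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so that beta_0 is estimated as the intercept.
design = np.column_stack([np.ones(len(X)), X])

# Least-squares estimate of the parameters in (1).
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
```

With enough observations and little noise, `beta_hat` should lie close to the assumed coefficients.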

Though linear regression models are easy to use and
interpret, it has been shown that they are not ideal for crash
predictions. Crashes are usually sporadic and random in nature
and hence are not best fitted by linear relationships. Also, the
assumption that the error term is normally distributed is not
accurate for crash predictions which are usually discrete and
non-negative. Further, some factors have been determined to
strongly correlate with each other, thus introducing
multicollinearity and thereby invalidating such linear models [10].
To overcome the shortcomings of linear regression
models, generalized linear models (GLMs) have been used to
model crashes at intersections. GLMs are a flexible
generalization of ordinary linear regression that can
accommodate non-normally distributed error terms. The most
common forms of generalized linear models used in crash
prediction are the negative binomial (NB) model and the ordered
probit model (OPM).

2) Negative Binomial Model

NB models are a generalization of the Poisson regression.
Unlike the Poisson models where the variance of the
distribution of the response variables is equal to its mean, in
NB models, the variance differs from the mean. NB models
have been found to be suitable for crash predictions due to the
nature of the dependent variables in such analysis. Usually the
response required is the number of crashes at a specific
location. Such responses are nonnegative integers and generally
follow the NB distribution. The distribution is given by the
following Poisson-Gamma distribution:

$$\Pr(Y = y_i \mid u_i, \alpha) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + u_i}\right)^{\alpha^{-1}} \left(\frac{u_i}{\alpha^{-1} + u_i}\right)^{y_i} \tag{2}$$

where $u_i$ is the mean of the dependent variable $y_i$, β is a
parameter to be estimated, α is the heterogeneity
parameter, and $x_i$ is the i-th predictor variable. Authors in
[11] investigated the relationship between crash frequencies
and factors such as traffic conditions, geometric and
operational characteristics of roadways, and weather conditions
using data of crashes that occurred from 2004 to 2010 on a
motorway in Auckland. The NB regression model developed
had a goodness of fit, ρ², of 0.119. Additionally, several
individual predictors such as length of road segments, AADT,
number of lanes and shoulder width were found to be
significant predictors in the model.
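
To illustrate the NB distribution in (2), the sketch below evaluates the probability mass function with the Python standard library and checks numerically that the variance exceeds the mean. The mean of 2 crashes and the value α = 0.5 are hypothetical and chosen only for illustration.

```python
from math import exp, lgamma, log

def nb_pmf(y, u, alpha):
    """Probability of observing y crashes under (2), with mean u and heterogeneity alpha."""
    r = 1.0 / alpha  # alpha^{-1}
    log_p = (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
             + r * log(r / (r + u)) + y * log(u / (r + u)))
    return exp(log_p)

# Hypothetical values: mean of 2 crashes and alpha = 0.5.
probs = [nb_pmf(y, u=2.0, alpha=0.5) for y in range(200)]
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

For the NB distribution the variance equals u + αu², so the computed variance is larger than the mean, unlike in the Poisson case.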

3) Ordered Probit Models

The ordered probit model (OPM) is used in developing
models which have an ordered response. This approach in
modeling data employs the use of the probit link function. The
latent continuous metric underlying the ordinal responses
observed are partitioned into a series of regions corresponding
to the ordinal categories. Generally, the probability of obtaining
a particular outcome is given by:

$$\Pr(y_i = j \mid X_i) = \frac{\exp(\tau_j - X_i\beta)}{1 + \exp(\tau_j - X_i\beta)} - \frac{\exp(\tau_{j-1} - X_i\beta)}{1 + \exp(\tau_{j-1} - X_i\beta)} \tag{3}$$

where $y_i$ is an observable ordinal variable, $X_i$ is a vector of
exogenous variables, β is a vector of unknown parameters to be
estimated, and $\tau_j$ is the threshold associated with the j-th
ordinal partition interval; the thresholds are assumed to be in ascending
order. OPM has been applied in the development of several
crash prediction models which seek to predict injury severity
based on several factors. Authors in [12] developed an OPM
that sought to explain the severity of crashes experienced at
freeway exits. Crash data for 326 locations in Florida were
sampled. The results of the study indicated that the factors
which significantly influenced crash severity included mainline


lane number, length of ramp, difference of speed limits
between mainline and ramp, light condition, weather condition,
surrounding land type, alcohol/drug involvement, road surface
condition, and crash type. The model developed had a
goodness of fit of 0.019 and a chi-squared goodness of fit value
of 95.63.
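
The category probabilities in (3) can be sketched as differences of cumulative terms evaluated at successive thresholds. The example below is only an illustration: the threshold values, the linear predictor `xb` (standing for the product of X and β), and the boundary conventions of minus and plus infinity for the lowest and highest categories are assumptions, not values from [12].

```python
from math import exp, inf

def cumulative(tau, xb):
    """exp(tau - xb) / (1 + exp(tau - xb)), written in an algebraically equivalent form."""
    if tau == inf:
        return 1.0
    if tau == -inf:
        return 0.0
    return 1.0 / (1.0 + exp(-(tau - xb)))

def category_probabilities(thresholds, xb):
    """Probability of each ordinal category, given ascending interior thresholds."""
    cuts = [-inf, *thresholds, inf]  # assumed boundary conventions
    return [cumulative(cuts[j], xb) - cumulative(cuts[j - 1], xb)
            for j in range(1, len(cuts))]

# Hypothetical thresholds for four ordered severity levels.
probs = category_probabilities(thresholds=[-1.0, 0.5, 2.0], xb=0.3)
```

Because the thresholds are in ascending order, every category probability is positive and the probabilities sum to one.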

4) Empirical Bayes Refinement of the GLM

Crash estimates made with GLMs are susceptible to
regression-to-the-mean. Regression-to-the-mean occurs when an
unusually large number of accidents during a period, arising by
chance, is followed by a reduced number of accidents during a
similar subsequent period, even if no measure has been implemented.
The GLMs do not account for this effect. Hence, to improve
the accuracy of the predictions made with GLMs, the empirical
Bayes (EB) method is usually applied. The EB method
compensates for the regression-to-the-mean bias by pulling the
crash count towards the mean. Thus, prior data (observed crash
counts) are combined with the predicted crash frequency from
the GLM to calculate a corrected value. The corrected value is
expected to lie somewhere between the observed crash
frequency and the predicted frequency from the GLM. This is
expressed as:

$$E = \text{Weight} \times \mu + (1 - \text{Weight}) \times \text{Observed crash frequency} \tag{4}$$

where E is the corrected value, and µ is the average number of
crashes (determined from the GLM) [13].
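
A short worked example of (4) follows. The weight of 0.4 and the crash counts are hypothetical; the source does not state how the weight is obtained, so it is treated here as a given input between 0 and 1.

```python
def empirical_bayes_estimate(weight, predicted, observed):
    """Corrected crash frequency E from (4)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * predicted + (1.0 - weight) * observed

# Hypothetical site: the GLM predicts 3 crashes, 7 were observed.
corrected = empirical_bayes_estimate(weight=0.4, predicted=3.0, observed=7.0)
```

Here the corrected value is 0.4 × 3 + 0.6 × 7 = 5.4, which lies between the GLM prediction and the observed count.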

5) Artificial Neural Networks (ANNs)

ANNs are mathematical models inspired by the biological
neural networks in the human brain. ANNs are used in
engineering to perform complex tasks such as pattern
recognition, forecasting, data compression and classification.
The effectiveness of an ANN is based on its ability to
approximate both linear and nonlinear functions to a required
degree of accuracy using a learning algorithm, and to build
“piece-wise” approximations of the functions [14].
Classification or forecasting using ANNs involves a training and
learning procedure in which historical data (a set of input data
with known outputs) are presented to the network. Usually large
amounts of such data are required for the training of the
network. The network goes through a learning process by
constructing a network of inputs and outputs, and weights
assigned to each mapping are adjusted at each iteration. The
method by which these weights and bias levels of a network are
updated is determined by the learning rule used. Thus, the
learning rule helps a neural network to learn from the existing
conditions and improve its performance. There are several
learning rules used in training neural networks. Notable among
the rules are the Hebbian, perceptron (error-correction), delta,
correlation and outstar learning rules [15]. However, the most
commonly used network type is the multilayer perceptron (MLP). An MLP
basically consists of three layers: input layer, hidden layer, and
output layer. MLP is a feed forward network in which
information flows from the input layer through the hidden to
the output layer to produce the outcome. These layers have
interconnected nodes (neurons). The interconnections are
assigned weights (representing information flow) which are
computed using mathematical functions. The outputs for
specific inputs are obtained by adjusting the weights to
minimize the errors between the output produced and the
desired output by error back-propagation. The MLP is known
to be a universal approximator because of its ability to
approximate continuous functions on a compact set of real
numbers with few assumptions. Activation functions,
also called transfer functions, are an essential component of
ANNs. Activation functions are applied in the neurons
of the ANN and introduce non-linearity into the network.
Each neuron calculates the weighted sum of its inputs
and adds a bias, and the activation function then determines
whether, and how strongly, the neuron is
activated. The three most common types of activation
functions used in an ANN are the sigmoid, the hyperbolic
tangent, and the rectified linear unit [16]. Authors in [17]
utilized ANNs to develop a model to show the relationship
between crash severity on urban highways, and traffic variables
such as traffic volume, flow speed, human factors and road,
vehicle and weather conditions. The study showed that MLP
with feed forward back propagation networks provided the best
results compared to other learning methods. Network
architecture with 2 hidden layers with 17 and 7 neurons,
respectively, was determined to be the best. Mean square errors
(MSE) within the acceptable range of 3% to 4% were achieved.
Also, correlation coefficients of 86% to 87% were achieved.
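
The three activation functions named above can be written in a few lines of NumPy. This is a generic sketch of their standard definitions, not code from any of the cited studies; the sample input `v` is arbitrary.

```python
import numpy as np

def sigmoid(v):
    """Maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def hyperbolic_tangent(v):
    """Maps any real input to the interval (-1, 1)."""
    return np.tanh(v)

def rectified_linear_unit(v):
    """Passes positive inputs unchanged and sets negative inputs to zero."""
    return np.maximum(0.0, v)

# Arbitrary weighted sums for three neurons.
v = np.array([-2.0, 0.0, 2.0])
outputs = {
    "sigmoid": sigmoid(v),
    "tanh": hyperbolic_tangent(v),
    "relu": rectified_linear_unit(v),
}
```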

III. METHODOLOGY

A. Study Area

This study is based on data obtained in the District of
Columbia (DC). The capital of the USA, Washington, DC is
divided into four quadrants: Northwest (NW),
Northeast (NE), Southeast (SE), and Southwest (SW), which
are further divided into eight wards. As of July 2018, the
population of DC was about 702,455 with a growth rate of
approximately 1.41% [18]. The city is highly urbanized, and
it is ranked the sixth most congested city in the United States,
with each driver spending an average of 63 hours in traffic
annually [19]. It has a land area of 68.34 mi² and a total of
1,503 miles of roadway comprised of local roads, collector
roads, minor arterials, principal arterials, freeways and
interstates [20]. Also, the city has about 7,700 intersections of
which 1,450 are signalized [21]. The American Society of Civil
Engineers’ 2017 infrastructure report card reported that about
95% of the roads in DC are in poor condition [22].

B. The Crash Database System

Crash prediction models are data dependent and as a result
the accuracy of the developed models depends largely on the
quality of the available crash data. To ensure that a reliable
model is developed, this research utilized traffic crash data
from the District Department of Transportation’s (DDOT’s)
crash database called Traffic Accident Reporting and Analysis
Systems Version 2.0 (TARAS2). The District of Columbia
Metropolitan Police Department (MPD) records traffic crash
information at the scene of crashes electronically on a Police
Department Form number 10 (PD-10) crash reporting form.
The crash data are then downloaded through secure servers from
MPD into DDOT’s database, processed, and made
available in TARAS2, which is an Oracle-based application.
TARAS2 contains data fields that can be broadly categorized


under vehicle characteristics, environmental conditions,
roadway characteristics, traffic exposure characteristics, as well
as crash location, date, time, crash type, crash severity and
information on the persons involved.

C. Data Extraction and Encoding

Eight years of crash data (2008-2015) were queried and
extracted from TARAS2. The data were then filtered to obtain
angle crashes involving two vehicles at unsignalized
intersections. Further, the extracted data were cleaned by
identifying and removing duplicate and incomplete crash
records and irrelevant data fields. In all, 3,307 data points were
extracted and used for analysis. The extracted data set
contained the following fields: accident complaint number,
main street name, side street name, year of accident, month of
accident, time of accident, day of week, quadrant of accident
occurrence, type of collision, road surface condition, street
lighting condition, lighting condition, weather condition, traffic
condition, traffic control type, drivers’ age, drivers’ gender,
contributing circumstances, and injury severity. Only numerical
data can be analyzed by ANNs. Hence, qualitative data must be
converted to quantitative data; both input and output data
must be encoded as either real or integer values. The
binary method (0 and 1) of encoding has been determined to
yield better results, since it helps minimize the loss function values
with respect to the models’ parameters. The loss value
indicates how well the model fits the data set: the lower the
loss function value, the better the model fits the data set. Table
II presents the variables and coding scheme used in this study.
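
The binary coding scheme can be illustrated with a small helper that converts one categorical field into 0/1 indicator variables. The function name and the example record are hypothetical; the weather levels correspond to variables X29 to X31 in Table II.

```python
# Levels of the weather condition field, in the order X29, X30, X31 of Table II.
WEATHER_LEVELS = ["Clear", "Rain", "Snow"]

def encode_binary(value, levels):
    """Return one 0/1 indicator per level: 1 if present, 0 otherwise."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels]

# Hypothetical crash record with rainy weather.
encoded = encode_binary("Rain", WEATHER_LEVELS)
```

For this record the indicators are 0 for Clear, 1 for Rain and 0 for Snow.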

D. Types of Collision

The crash types considered for this study are angle
collisions. Three types of angle collisions are specified: right-
angle, right turn, and left turn collisions.

• Right-angle collision: This type of collision occurs when
the side of one vehicle is impacted by the front of another
vehicle which is traveling in a direction at a right angle to the
direction of the former vehicle. Figure 1 depicts a right-
angle collision at an intersection.

• Right turn collision: This type of collision occurs when a
vehicle turning right at an intersection is impacted by a
vehicle from the other intersecting road. Figure 2 depicts a
right turn collision.

• Left turn collision: This type of collision occurs when a left
turning vehicle at an intersection is impacted by a vehicle
from the oncoming traffic. Figure 3 depicts a left turn
collision.

E. Injury Severity

The outcome variable describes the degree of injury
severity sustained by persons involved in a crash. The crash
database specifies five degrees of injury severity: No injury,
complain, non-disabling injury, disabling injury and fatal. Due
to the small percentage of fatal and disabling injury
crashes in the data set, all complain, non-disabling injury,
disabling injury and fatal crashes
were categorized as injury crashes. Table I shows the levels of
crashes used in the analysis.

Fig. 1. Right-angle collision

Fig. 2. Right turn collision

Fig. 3. Left turn collision

TABLE I. LEVELS OF INJURY SEVERITY

| Injury Severity | Level |
|---|---|
| No Injury | Non-Injury |
| Complain | Injury |
| Non-Disabling Injury | Injury |
| Disabling Injury | Injury |
| Fatal | Injury |

F. Data Standardization

To achieve accurate predictions from machine learning
models it is necessary that variables used in developing the
models are of equal scale. Also, most optimization algorithms that
minimize the loss function converge faster when variables are
of the same scale. The method of scaling used on this data set is
standardization. The raw scores (of the encoded data) are
converted to standard scores by subtracting the mean of each
variable from the raw score of each observation and then
dividing the difference by the standard deviation of the
variable. By doing so, the variables are transformed to have a
mean of zero and a unit variance. The standardized value, Z, of
each score of each variable is given by (5):

$$Z = \frac{X - \bar{X}}{\sigma} \tag{5}$$

where $\bar{X}$ is the mean of the variable, X is the encoded score of
each observation of a variable and σ is its standard deviation.
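
A minimal NumPy sketch of the standardization in (5) is given below. The use of the population standard deviation and the guard for constant columns are assumptions made for the illustration, and the small input matrix is hypothetical.

```python
import numpy as np

def standardize(X):
    """Column-wise standard scores: (X - mean) / standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (assumption)
    std[std == 0] = 1.0   # leave constant columns at zero instead of dividing by 0
    return (X - mean) / std

# Hypothetical encoded data: driver age and a 0/1 gender indicator.
raw = np.array([[25, 0], [40, 1], [61, 1], [33, 0]])
Z = standardize(raw)
```

After the transformation each column of `Z` has a mean of zero and unit variance.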


TABLE II. VARIABLE ENCODING

| Variable | Variable Name | Code |
|---|---|---|
| | Day of Crash | |
| X1 | Monday | 1-Present, 0-Otherwise |
| X2 | Tuesday | 1-Present, 0-Otherwise |
| X3 | Wednesday | 1-Present, 0-Otherwise |
| X4 | Thursday | 1-Present, 0-Otherwise |
| X5 | Friday | 1-Present, 0-Otherwise |
| X6 | Saturday | 1-Present, 0-Otherwise |
| X7 | Sunday | 1-Present, 0-Otherwise |
| | Time of Day | |
| X8 | A.M. Peak (06:00 – 10:00) | 1-Present, 0-Otherwise |
| X9 | Off Peak (10:00 – 15:00) | 1-Present, 0-Otherwise |
| X10 | P.M. Peak (15:00 – 19:00) | 1-Present, 0-Otherwise |
| X11 | Evening (19:00 – 00:00) | 1-Present, 0-Otherwise |
| X12 | Night (00:00 – 06:00) | 1-Present, 0-Otherwise |
| | Quadrant | |
| X13 | NW | 1-Present, 0-Otherwise |
| X14 | SW | 1-Present, 0-Otherwise |
| X15 | NE | 1-Present, 0-Otherwise |
| X16 | SE | 1-Present, 0-Otherwise |
| X17 | BN | 1-Present, 0-Otherwise |
| | Type of Collision | |
| X18 | Right Angle | 1-Present, 0-Otherwise |
| X19 | Left Turn | 1-Present, 0-Otherwise |
| X20 | Right Turn | 1-Present, 0-Otherwise |
| | Road Surface Condition | |
| X21 | Wet | 1-Present, 0-Otherwise |
| X22 | Dry | 1-Present, 0-Otherwise |
| | Street Lighting | |
| X23 | Light Off | 1-Present, 0-Otherwise |
| X24 | Light On | 1-Present, 0-Otherwise |
| X25 | None | 1-Present, 0-Otherwise |
| | Lighting Condition | |
| X26 | Dark | 1-Present, 0-Otherwise |
| X27 | Dark Lighted | 1-Present, 0-Otherwise |
| X28 | Daylight | 1-Present, 0-Otherwise |
| | Weather Condition | |
| X29 | Clear | 1-Present, 0-Otherwise |
| X30 | Rain | 1-Present, 0-Otherwise |
| X31 | Snow | 1-Present, 0-Otherwise |
| X32 | Traffic Condition | 0-Low, 1-Medium, 2-High |
| | Traffic Control Type | |
| X33 | Stop | 1-Present, 0-Otherwise |
| X34 | Yield | 1-Present, 0-Otherwise |
| X35 | None | 1-Present, 0-Otherwise |
| | Contributing Circumstances of Driver 1 | |
| X36 | No Violation D1 | 1-Present, 0-Otherwise |
| X37 | Alcohol/Drug Use D1 | 1-Present, 0-Otherwise |
| X38 | Speeding D1 | 1-Present, 0-Otherwise |
| X39 | STOP/YIELD Sign Violation D1 | 1-Present, 0-Otherwise |
| X40 | Improper Maneuvering D1 | 1-Present, 0-Otherwise |
| | Contributing Circumstances of Driver 2 | |
| X42 | No Violation D2 | 1-Present, 0-Otherwise |
| X43 | Alcohol/Drug Use D2 | 1-Present, 0-Otherwise |
| X44 | Speeding D2 | 1-Present, 0-Otherwise |
| X46 | Improper Maneuvering D2 | 1-Present, 0-Otherwise |
| X47 | Distraction D2 | 1-Present, 0-Otherwise |
| X48 | Age of Driver 1 | 1-Present, 0-Otherwise |
| X49 | Age of Driver 2 | 1-Present, 0-Otherwise |
| X50 | Gender of Driver 1 | 0-Female, 1-Male |
| X51 | Gender of Driver 2 | 0-Female, 1-Male |
| Y1 | Injury Severity | 0-No Injury, 1-Injury |

G. Development of Models

The process of classification by ANN is an iterative process
of weight adjustments based on information flow that mimics
the functioning of neurons in the human brain. The steps below
describe in detail how models for crash injury severity
classification were developed using ANN:

• Selection of network architecture.

• Training of neural network.

• Testing and evaluation of model.

1) Selection of Network Architecture

The network architecture was first set up. A multi-layer
perceptron (MLP) feedforward ANN was adopted to develop
classification models. An MLP consists of at least three layers:
an input layer, hidden layer(s) and an output layer. Each layer
consists of nodes or neurons. The neurons of each layer are
interconnected with those of the succeeding layer. Also, the
neurons of the hidden and output layers are embedded with
nonlinear activation functions. The MLP ANN architecture
used in this research consists of an input layer with 44 neurons
(each neuron represents each of the input variables, Xi in Table
II) and an output layer with 1 neuron, which is the target or
dependent variable, Y. The number of hidden layers and
neurons was varied over several iterations until the optimal numbers
of hidden layers and neurons which produced the best model
were obtained. Figure 4 shows the MLP ANN architecture used
in developing the model.

Fig. 4. MLP ANN
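
The architecture described above can be sketched as a plain NumPy forward pass. The layer sizes below (44 inputs, hidden layers of 5, 10 and 5 neurons, 1 output) follow the best model reported in the abstract, with rectified linear units in the hidden layers and a sigmoid output. The weight initialization, the function names and the random input vector are assumptions for illustration, and this is not the authors' Keras implementation.

```python
import numpy as np

def init_params(layer_sizes, rng):
    """One (weights, biases) pair per connection between consecutive layers."""
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Propagate one input vector through the network."""
    y = x
    for index, (W, b) in enumerate(params):
        v = W @ y + b  # weighted sum plus bias
        is_output = index == len(params) - 1
        # Sigmoid in the output layer, rectified linear unit in hidden layers.
        y = 1.0 / (1.0 + np.exp(-v)) if is_output else np.maximum(0.0, v)
    return y

rng = np.random.default_rng(0)
params = init_params([44, 5, 10, 5, 1], rng)
p_injury = forward(params, rng.normal(size=44))  # value between 0 and 1
```

The single sigmoid output can be read as the predicted probability of the injury class (Y = 1) and thresholded to obtain a class label.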

2) Training of Neural Network

Training of the neural network by backward propagation
was carried out in the following sequence:

• Presentation of training dataset to the network: The training
dataset was imported into the network to commence
training. The vector of independent variables was fed into
each input neuron connected to neurons of the first hidden
layer. The training process was initialized by randomly


selecting weights for all interconnections between the
neurons of the input and hidden layers.

• Forward Computation: The forward propagation was then
implemented by multiplying the weights with the values of
the input neurons and the sum products are stored in the
corresponding neurons of the hidden layer. The weighted
sums are subsequently transferred into an activation
function and based on the output of the functions, the
neuron is either activated or not. Mathematically this can be
expressed as:

$$v_j^{l} = \sum_{i=1}^{m} w_{ji}^{l}\, y_i^{(l-1)} \tag{6}$$

$$y_j^{l} = \varphi_j\left(v_j^{l}\right) \tag{7}$$

where $v_j^{l}$ is the weighted sum in the j-th neuron of the l-th hidden
layer, $w_{ji}^{l}$ is the weight coefficient of the j-th neuron of the l-th
layer that is fed from the i-th neuron in layer l-1, $y_i^{(l-1)}$ is the
output of the i-th neuron in the previous layer l-1, m is the number of
neurons in layer l-1, $y_j^{l}$ is the
output of the j-th neuron in layer l, and $\varphi_j$ is the activation
function, which is a rectified linear unit function in the hidden layers
and a sigmoid function in the output layer. Hence for the last
layer (output layer) l=L,

$$y_j^{L}(n) = o_j(n) \tag{8}$$

where $o_j(n)$ is the output at the n-th iteration.

• Computation of error: The error of the j-th neuron at the n-th
iteration is then computed as

$$e_j(n) = d_j(n) - o_j(n) \tag{9}$$

where $d_j$ is the target output.

• Backward computation: The weights in the network are
adjusted based on a local gradient, δ, which is a function of
the error, e, and computed as follows:

$$\delta_j^{l}(n) = e_j^{L}(n)\, \varphi_j'\left(v_j^{L}(n)\right) \tag{10a}$$

for neuron j in the output layer L, and

$$\delta_j^{l}(n) = \varphi_j'\left(v_j^{l}(n)\right) \sum_{k} \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n) \tag{10b}$$

for neuron j in the hidden layer l, where k indexes the neurons in the
succeeding layer l+1 and $\varphi'(\cdot)$ is the derivative of the function
$\varphi(\cdot)$. The weights in the network are then adjusted by the given
relation:

$$w_{ji}^{l}(n+1) = w_{ji}^{l}(n) + \alpha\left[\Delta w_{ji}^{l}(n-1)\right] + \eta\, \delta_j^{l}(n)\, y_i^{(l-1)}(n) \tag{11}$$

where η is the learning-rate parameter, α is the momentum
constant, and $\Delta w_{ji}^{l}(n-1)$ is the weight change applied at the
previous iteration.

• Iteration: The procedures in the three previous steps are
repeated for batches of 3 observations per iteration until the
stopping criterion of 100 epochs is met. Figure 5 illustrates
the training process.
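
The training sequence above (forward computation, error, backward computation, and weight update with momentum, repeated in batches of 3 for 100 epochs) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' TensorFlow/Keras code: the synthetic data, the learning rate and momentum values, the weight initialization, the averaging of updates over each batch, and the handling of biases as an extra constant input are all assumptions.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def add_bias(y):
    """Prepend a constant input of 1 so each neuron's bias is stored as a weight."""
    return np.hstack([np.ones((len(y), 1)), y])

def forward(W, x):
    """Forward computation: returns layer outputs ys and weighted sums vs."""
    ys, vs = [x], []
    for l, w in enumerate(W):
        v = add_bias(ys[-1]) @ w.T
        vs.append(v)
        ys.append(sigmoid(v) if l == len(W) - 1 else relu(v))
    return ys, vs

def train(X, d, sizes, eta=0.05, alpha=0.5, epochs=100, batch=3, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial weights; column 0 of each matrix holds the biases.
    W = [rng.normal(scale=0.5, size=(n_out, n_in + 1))
         for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    dW_prev = [np.zeros_like(w) for w in W]
    for _ in range(epochs):
        for start in range(0, len(X), batch):
            xb, db = X[start:start + batch], d[start:start + batch]
            ys, vs = forward(W, xb)
            e = db - ys[-1]                      # error, as in (9)
            delta = e * ys[-1] * (1.0 - ys[-1])  # output layer, sigmoid derivative
            for l in reversed(range(len(W))):
                step = eta * (delta.T @ add_bias(ys[l])) / len(xb)
                if l > 0:  # hidden-layer gradient, using weights before the update
                    delta = (delta @ W[l][:, 1:]) * (vs[l - 1] > 0)
                dW = alpha * dW_prev[l] + step   # momentum term plus gradient term
                W[l] += dW
                dW_prev[l] = dW
    return W

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
d = (X[:, :1] + X[:, 1:2] > 0).astype(float)
W = train(X, d, sizes=[4, 5, 10, 5, 1])
predictions = forward(W, X)[0][-1]
```

The update inside the loop mirrors (11): the new weight change is the momentum constant times the previous change plus the learning rate times the product of the local gradient and the input from the preceding layer.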

3) Model Testing and Evaluation

After the training of the network for the required number of
epochs (100), the model was tested using the test dataset. The
accuracy of the model was evaluated by the confusion matrix.

The number of hidden layers and neurons in the network
architecture was varied and the training process was repeated.
This iterative process was done until the model with the best
performance was achieved.

Fig. 5. ANN training process

4) Model Evaluation

The performance of each of the models was assessed using
the test dataset. The results were then evaluated by using the
data generated by a confusion matrix (CM). A CM contains
information about actual and predicted classifications done by a
classification system. Each row of the CM represents the
instances of an actual class and each column represents the
instances of a predicted class. Table III shows the confusion
matrix for a two-class classifier.

TABLE III. CONFUSION MATRIX

| Total No. of Observations | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | True Negative (TN) | False Positive (FP) |
| Actual Positive | False Negative (FN) | True Positive (TP) |

The entries of the CM are defined as follows: True Positive
(TP) instances are positive and correctly classified as positive,
True Negative (TN) instances are negative and correctly
classified as negative, False Positive (FP) instances are
negative but wrongly classified as positive, and False Negative
(FN) instances are positive but wrongly classified as negative.
Based on the CM, the following measures were computed to
evaluate the models developed.

• Accuracy (AC): The accuracy is the proportion of the total
number of predictions that were correctly classified. It is
computed as:

AC=(TN+TP)/(TN+FP+FN+TP) (12)

• Error Rate (ER): The error rate is the rate at which
predictions will be misclassified:

ER=1-AC (13)

• Sensitivity (S): It is the proportion of positive cases that
were correctly identified:

S=TP/(FN+TP) (14)


• Precision (P): It is the proportion of the predicted positive
cases that were correct:

P=TP/(FP+TP) (15)

• F-measure (F): It is the harmonic mean of S and P, and so
summarizes both in a single score. The value of F ranges from
0 to 1, where 1 indicates an excellent model and 0 indicates a
poor model. The F-measure is calculated as:

F=2(S·P)/(S+P) (16)
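
As an illustration, Equations (12) to (16) can be computed directly from the four entries of the confusion matrix. The sketch below is not the authors' code: the function name `classification_metrics` and the counts passed to it are hypothetical, and it assumes there is at least one actual positive and one predicted positive so that no denominator is zero.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute the measures of Eqs. (12)-(16) from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total                 # Eq. (12)
    error_rate = 1 - accuracy                    # Eq. (13)
    sensitivity = tp / (fn + tp)                 # Eq. (14)
    precision = tp / (fp + tp)                   # Eq. (15)
    f_measure = (2 * sensitivity * precision
                 / (sensitivity + precision))    # Eq. (16)
    return {
        "AC": accuracy,
        "ER": error_rate,
        "S": sensitivity,
        "P": precision,
        "F": f_measure,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tn=50, fp=10, fn=5, tp=35)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F is the harmonic mean of S and P, it never exceeds the larger of the two and is pulled toward the smaller one.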

H. Analysis Software

The ANN classification models were developed by using the
high-level general-purpose programming language Python.
Specifically, the Anaconda Python distribution was used. This is
an open source distribution with standard and robust libraries
for data processing, analysis and machine learning applications.
The NumPy and Pandas libraries were imported to facilitate data
preprocessing. Also, the TensorFlow and Keras libraries were
imported to develop the ANN models. In addition, the
descriptive statistics of the data were obtained using IBM
SPSS (Statistical Package for the Social Sciences).

IV. RESULTS

A. Descriptive Statistics

Tables IV and V present the descriptive statistics of the data
set. The frequencies of the categorical variables are presented in
Table IV, while Table V presents the mean and standard
deviation of the continuous variable Age. It can be observed
from Table IV that the highest number of crashes (1,252)
occurred during the off-peak period, from 10:00 A.M. to
3:00 P.M., while the fewest crashes (176) occurred at
night, between 12:00 A.M. and 6:00 A.M. Most of the crashes
occurred on Tuesdays, Wednesdays and Thursdays, while
Sundays recorded the fewest crashes. The Northwest
quadrant of Washington D.C. recorded the highest number of
crashes (1,167). Right-angle collision was the most frequently
occurring crash type. Most of the crashes occurred under
daylight, clear weather and light traffic conditions.
Although most crashes involved no violation on the
part of one or both drivers, distracted driving and Stop/Yield-sign
violation were also reported as comparatively frequent
contributing circumstances. Among the drivers involved, 3,936
were male and 2,678 were female. Of the 3,307 recorded
crashes, 1,274 resulted in injury. The rate of
injury crashes was highest during the night (41.24%), on
Fridays (41%), and in the Northeast quadrant (40.44%). The
injury rate was also highest for right-turn collisions (40.69%),
at locations without street lights (39.52%), in rainy weather
(50.57%), and under light traffic conditions (54.78%).
Intersections controlled by Yield signs recorded the highest
rate (70.59%) of injury crashes. This is consistent with the
fact that the contributing circumstance with the highest rate of
injury crashes (69.94%) was a Stop/Yield sign violation by at
least one driver. Crashes in which at least one driver was female
recorded the highest rate of injury crashes. A correlation
analysis was conducted to investigate the relationship between
driver age and injury severity. The results are presented in
Table VI. The Spearman’s Rho of -0.52 was found to be
statistically significant (p=0.03). This implies that the severity
of a crash increased with decreasing age of the drivers involved
in the crash.
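
For readers unfamiliar with the statistic, Spearman's Rho is the Pearson correlation computed on the ranks of the two variables, with tied values sharing their average rank. The following is a minimal sketch of that calculation only: the authors obtained their value with SPSS, the ages and severity codes below are hypothetical (0 = non-injury, 1 = injury is an assumed coding), and the p-value is not computed here.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's Rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical sample, not taken from the crash database.
ages = [19, 23, 31, 38, 45, 52, 60, 67]
severity = [1, 1, 0, 1, 0, 0, 1, 0]  # assumed coding: 1 = injury
rho = spearman_rho(ages, severity)
print(f"Spearman's Rho: {rho:.4f}")
```

A negative value, as in Table VI, means that higher ages tend to be paired with lower severity codes.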

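TABLE IV. CRASH FREQUENCIES

| No | Factor | Level | Total Crashes | Injury Crashes | Non-Injury Crashes | Injury Rate (%) |
|---|---|---|---|---|---|---|
| 1 | Period of Day | A.M. Peak | 730 | 296 | 435 | 40.49 |
| | | Off Peak | 1,252 | 466 | 785 | 37.25 |
| | | P.M. Peak | 776 | 298 | 478 | 38.4 |
| | | Evening | 373 | 142 | 230 | 38.17 |
| | | Night | 176 | 73 | 104 | 41.24 |
| 2 | Day of Week | Monday | 265 | 102 | 163 | 38.49 |
| | | Tuesday | 566 | 228 | 338 | 40.28 |
| | | Wednesday | 957 | 371 | 586 | 38.77 |
| | | Thursday | 657 | 243 | 414 | 36.99 |
| | | Friday | 400 | 160 | 240 | 40 |
| | | Saturday | 261 | 90 | 170 | 34.62 |
| | | Sunday | 201 | 80 | 122 | 39.6 |
| 3 | Quadrant | Northwest | 1,167 | 442 | 725 | 37.87 |
| | | Northeast | 858 | 347 | 511 | 40.44 |
| | | Southwest | 226 | 76 | 150 | 33.62 |
| | | Southeast | 984 | 382 | 602 | 38.82 |
| | | Boundary | 72 | 27 | 45 | 39.13 |
| 4 | Type of Collision | Right Angle | 1,338 | 530 | 808 | 39.61 |
| | | Left Turn | 1,217 | 438 | 779 | 39.61 |
| | | Right Turn | 752 | 306 | 446 | 40.69 |
| 5 | Street Lighting Condition | Lights Off | 2,503 | 967 | 1,536 | 38.63 |
| | | Lights On | 680 | 258 | 422 | 37.94 |
| | | None | 124 | 49 | 75 | 39.52 |
| 6 | Lighting Condition | Dark | 757 | 15 | 727 | 2.02 |
| | | Dark (Lighted) | 581 | 193 | 388 | 33.22 |
| | | Day Light | 1,967 | 1,063 | 906 | 53.99 |
| 7 | Weather Condition | Clear | 2,350 | 921 | 1,429 | 39.19 |
| | | Rain | 609 | 308 | 301 | 50.57 |
| | | Snow | 348 | 45 | 303 | 12.93 |
| 8 | Traffic Condition | Light | 2,178 | 1,193 | 985 | 54.78 |
| | | Medium | 808 | 71 | 737 | 8.79 |
| | | Heavy | 321 | 71 | 737 | 8.79 |
| 9 | Traffic Control Type | STOP Sign | 2,504 | 1,066 | 1,450 | 42.37 |
| | | YIELD Sign | 604 | 132 | 55 | 70.59 |
| | | None | 187 | 76 | 528 | 12.58 |
| 10 | Gender of Driver 1 | Male | 1,621 | 419 | 1,202 | 25.85 |
| | | Female | 1,686 | 855 | 831 | 50.71 |
| 11 | Gender of Driver 2 | Male | 2,315 | 1,026 | 1,289 | 44.32 |
| | | Female | 992 | 248 | 744 | 25 |
| 12 | Contributing Circumstance of Driver 1 | No Violation | 1,700 | 869 | 831 | 51.12 |
| | | Alcohol | 159 | 0 | 159 | 0 |
| | | Distracted | 682 | 122 | 560 | 17.89 |
| | | Speed | 430 | 134 | 296 | 31.16 |
| | | STOP/YIELD Sign Violation | 310 | 148 | 162 | 47.74 |
| | | Improper Maneuver | 24 | 2 | 22 | 8.33 |
| 13 | Contributing Circumstance of Driver 2 | No Violation | 1,041 | 7 | 764 | 0.91 |
| | | Alcohol | 160 | 0 | 160 | 0 |
| | | Distracted | 996 | 408 | 588 | 40.96 |
| | | Speed | 276 | 7 | 269 | 2.54 |
| | | STOP/YIELD Sign Violation | 672 | 470 | 202 | 69.94 |
| | | Improper Maneuver | 161 | 112 | 49 | 69.57 |
| 14 | Injury Severity | | 3,307 | 1,274 | 2,033 | 38.52 |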

TABLE V. DRIVER AGE STATISTICS

| Factor | Mean | Standard Deviation | Min. | Max. |
|---|---|---|---|---|
| Driver Age | 42.56 | 15.73 | 14 | 86 |

TABLE VI. AGE-INJURY SEVERITY CORRELATION ANALYSIS

| Factor | Test Statistic (Spearman’s Rho) | P-value |
|---|---|---|
| Age of Driver | -0.52 | 0.03 |

B. Spatial Distribution of Crashes

This section presents the results of the spatial analysis of
the crashes using the ArcGIS Pro software. The spatial
analysis included the spatial distribution of crashes
based on injury severity and a kernel density analysis for injury
crashes. The spatial distribution and density of crashes are
shown in Figures 6 and 7, respectively. Figure 6 shows that
most of the crashes were located in the NW quadrant, which
covers the downtown and central business district of
Washington DC. Figure 7 shows that the higher densities of
injury crashes are in the same region of Washington DC.

Fig. 6. Spatial distribution of crashes [source: ArcGISPro]

Fig. 7. Kernel density of injury crashes [source: ArcGISPro]

C. Results of Classification of Crashes

Twenty-five distinct ANN models were developed using
the training dataset. Each model was trained with batches of 3
observations per iteration until the stopping criterion of 100
epochs was met. The performance of each model was then
evaluated using the test dataset (which constitutes 25% of
the total dataset). The performance of the models after training
and testing is presented in Tables VII and VIII, respectively.

The tables list each model explored together with the
structure of its neural network (number of hidden layers and
number of neurons per layer). The performance measures
(accuracy, error rate, sensitivity, precision and F-measure) of
each model were computed and are also presented.
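
To give a sense of scale for this training setup, the short sketch below works out how many weight updates the stated batch size and epoch count imply. It is a back-of-the-envelope calculation rather than the authors' code: rounding the 25% test split up to a whole observation is an assumption (it agrees with the 827 observations in Table IX), as is counting a final partial batch as one iteration.

```python
import math

total_crashes = 3307   # crashes in the dataset
test_fraction = 0.25   # share held out for testing
batch_size = 3         # observations per iteration
epochs = 100           # stopping criterion

# Assumption: the test split is rounded up to a whole observation.
test_size = math.ceil(total_crashes * test_fraction)
train_size = total_crashes - test_size

# Assumption: a final partial batch still counts as one iteration.
iterations_per_epoch = math.ceil(train_size / batch_size)
total_updates = iterations_per_epoch * epochs

print(f"test observations:     {test_size}")
print(f"training observations: {train_size}")
print(f"iterations per epoch:  {iterations_per_epoch}")
print(f"total weight updates:  {total_updates}")
```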

TABLE VII. RESULTS OF TRAINING ANN

| Model | No. of Hidden Layers | No. of Neurons | AC | ER | S | P | F |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 20 | 0.9181 | 0.0819 | 0.8995 | 0.8892 | 0.8943 |
| 2 | 1 | 15 | 0.9032 | 0.0968 | 0.9162 | 0.8454 | 0.8794 |
| 3 | 1 | 5 | 0.8649 | 0.1351 | 0.8366 | 0.8170 | 0.8267 |
| 4 | 1 | 3 | 0.8573 | 0.1427 | 0.8461 | 0.7961 | 0.8203 |
| 5 | 2 | 25-20 | 0.9585 | 0.0415 | 0.9455 | 0.9465 | 0.9460 |
| 6 | 2 | 20-25 | 0.9472 | 0.0528 | 0.9435 | 0.9213 | 0.9322 |
| 7 | 2 | 20-15 | 0.9512 | 0.0488 | 0.9874 | 0.8964 | 0.9397 |
| 8 | 2 | 15-20 | 0.9258 | 0.0742 | 0.9539 | 0.8668 | 0.9083 |
| 9 | 2 | 10-15 | 0.9157 | 0.0843 | 0.9529 | 0.8473 | 0.8970 |
| 10 | 2 | 15-10 | 0.9302 | 0.0698 | 0.9445 | 0.8826 | 0.9125 |
| 11 | 2 | 5-10 | 0.8722 | 0.1278 | 0.8785 | 0.8067 | 0.8411 |
| 12 | 2 | 10-5 | 0.9060 | 0.0940 | 0.8953 | 0.8654 | 0.8801 |
| 13 | 2 | 6-3 | 0.8685 | 0.1315 | 0.8628 | 0.8086 | 0.8349 |
| 14 | 2 | 3-6 | 0.8597 | 0.1403 | 0.8440 | 0.8020 | 0.8224 |
| 15 | 2 | 2-2 | 0.8427 | 0.1573 | 0.8304 | 0.7767 | 0.8026 |
| 16 | 3 | 30-20-25 | 0.9516 | 0.0484 | 0.9832 | 0.9170 | 0.9490 |
| 17 | 3 | 25-30-20 | 0.9689 | 0.0311 | 0.9204 | 0.9565 | 0.9381 |
| 18 | 3 | 20-15-20 | 0.9402 | 0.0598 | 0.9916 | 0.8926 | 0.9395 |
| 19 | 3 | 15-20-15 | 0.9404 | 0.0596 | 0.9644 | 0.8916 | 0.9266 |
| 20 | 3 | 15-10-15 | 0.9293 | 0.0707 | 0.9738 | 0.8692 | 0.9185 |
| 21 | 3 | 10-15-10 | 0.9310 | 0.0690 | 0.8995 | 0.8677 | 0.8833 |
| 22 | 3 | 5-10-5 | 0.9115 | 0.0885 | 0.8859 | 0.8270 | 0.8554 |
| 23 | 3 | 10-5-10 | 0.9102 | 0.0898 | 0.9414 | 0.8293 | 0.8818 |
| 24 | 3 | 6-4-2 | 0.9159 | 0.0841 | 0.9058 | 0.8374 | 0.8702 |
| 25 | 3 | 6-2-6 | 0.9237 | 0.0763 | 0.9058 | 0.8547 | 0.8795 |

The accuracy, sensitivity, precision and F-measure (F)
performance measures range from 0 to 1, with values closer to
1 indicating better performance and values closer to 0
indicating worse performance. In contrast, models with error
rates (ER) closer to 0 are better than models with error rates
closer to 1. The results of the analysis in Table VII show that
after the training of the models, the accuracy ranged from
84.27% to 96.89%. Model 17 produced the best classification
accuracy (96.89%) with a corresponding error rate of 3.11%,
while Model 15 produced the worst accuracy (84.27%) with a
corresponding error rate of 15.73%. Model 18 had the highest
sensitivity (S) measure (0.9916), while Model 15 had the lowest
sensitivity measure. With regard to the precision measure,
Model 17 was the most precise (P) model with a precision of
0.9565, while Model 15 was the least precise one. Model 16
recorded the highest F-measure of 0.9490, while the lowest
F-measure (0.8026) was recorded by Model 15. The variation of
performance measures with varying models is shown in Figure 8.
Table VIII presents the results of the evaluation of the trained
models using the test dataset. The results show that the
accuracy (after testing) of the models ranged from 76.54% to
85.62%. Model 22 produced the best classification accuracy
(85.62%) with a corresponding error rate of 14.38%, while
Model 6 produced the worst accuracy. Model 14 had the highest
sensitivity measure, while Model 16 had the lowest sensitivity
measure. With regard to the precision measure, Model 15 was
the most precise model with a precision of 0.7850, while
Model 18 was the least precise model with a precision of
0.6882. Model 15 recorded the highest F-measure of 0.7875,
while the lowest F-measure was recorded by Model 6. The
variation of performance measures with varying models
is shown in Figure 9.
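
The comparison described above amounts to picking the maximum of a column in Table VIII. The sketch below illustrates this for four of the twenty-five models, using accuracy values copied from Tables VII and VIII and F-measures from Table VIII. The train/test accuracy gap it also prints is not a quantity reported in the paper; it is included here only as one possible way to read the two tables side by side.

```python
# Selected rows copied from Table VII (training) and Table VIII (testing).
results = {
    6:  {"train_ac": 0.9472, "test_ac": 0.7654, "test_f": 0.7006},
    15: {"train_ac": 0.8427, "test_ac": 0.8356, "test_f": 0.7875},
    17: {"train_ac": 0.9689, "test_ac": 0.8256, "test_f": 0.7206},
    22: {"train_ac": 0.9115, "test_ac": 0.8562, "test_f": 0.7586},
}

# Model with the highest test accuracy and the highest test F-measure.
best_by_accuracy = max(results, key=lambda m: results[m]["test_ac"])
best_by_f = max(results, key=lambda m: results[m]["test_f"])

# Difference between training and test accuracy for each model.
gaps = {m: round(r["train_ac"] - r["test_ac"], 4) for m, r in results.items()}

print(f"highest test accuracy:  Model {best_by_accuracy}")
print(f"highest test F-measure: Model {best_by_f}")
for model, gap in gaps.items():
    print(f"Model {model}: train-test accuracy gap = {gap:.4f}")
```

Among these four models, the larger networks show the larger gaps between training and test accuracy.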

Fig. 8. Variation of performance measures for training dataset using ANN

TABLE VIII. RESULTS OF TESTING ANN

| Model | No. of Hidden Layers | No. of Neurons | AC | ER | S | P | F |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 20 | 0.8005 | 0.1995 | 0.7900 | 0.7200 | 0.7534 |
| 2 | 1 | 15 | 0.8114 | 0.1886 | 0.7492 | 0.7587 | 0.7539 |
| 3 | 1 | 5 | 0.7896 | 0.2104 | 0.7524 | 0.7164 | 0.7339 |
| 4 | 1 | 3 | 0.8295 | 0.1705 | 0.7806 | 0.7781 | 0.7793 |
| 5 | 2 | 25-20 | 0.7872 | 0.2128 | 0.7210 | 0.7256 | 0.7233 |
| 6 | 2 | 20-25 | 0.7654 | 0.2346 | 0.7116 | 0.6900 | 0.7006 |
| 7 | 2 | 20-15 | 0.7836 | 0.2164 | 0.7304 | 0.7147 | 0.7225 |
| 8 | 2 | 15-20 | 0.7787 | 0.2213 | 0.7179 | 0.7112 | 0.7145 |
| 9 | 2 | 10-15 | 0.7944 | 0.2056 | 0.7586 | 0.7224 | 0.7401 |
| 10 | 2 | 15-10 | 0.7715 | 0.2285 | 0.7429 | 0.6890 | 0.7149 |
| 11 | 2 | 5-10 | 0.8198 | 0.1802 | 0.7680 | 0.7656 | 0.7668 |
| 12 | 2 | 10-5 | 0.7993 | 0.2007 | 0.7524 | 0.7339 | 0.7430 |
| 13 | 2 | 6-3 | 0.8174 | 0.1826 | 0.7774 | 0.7561 | 0.7666 |
| 14 | 2 | 3-6 | 0.8114 | 0.1886 | 0.8276 | 0.7233 | 0.7719 |
| 15 | 2 | 2-2 | 0.8356 | 0.1644 | 0.7900 | 0.7850 | 0.7875 |
| 16 | 3 | 30-20-25 | 0.8440 | 0.1560 | 0.6865 | 0.7252 | 0.7053 |
| 17 | 3 | 25-30-20 | 0.8256 | 0.1744 | 0.7398 | 0.7024 | 0.7206 |
| 18 | 3 | 20-15-20 | 0.8100 | 0.1900 | 0.8025 | 0.6882 | 0.7410 |
| 19 | 3 | 15-20-15 | 0.8300 | 0.1700 | 0.7837 | 0.7163 | 0.7485 |
| 20 | 3 | 15-10-15 | 0.8511 | 0.1489 | 0.7367 | 0.7460 | 0.7413 |
| 21 | 3 | 10-15-10 | 0.8532 | 0.1468 | 0.7712 | 0.7546 | 0.7628 |
| 22 | 3 | 5-10-5 | 0.8562 | 0.1438 | 0.7586 | 0.7586 | 0.7586 |
| 23 | 3 | 10-5-10 | 0.8457 | 0.1543 | 0.7524 | 0.7385 | 0.7453 |
| 24 | 3 | 6-4-2 | 0.8406 | 0.1594 | 0.8119 | 0.7379 | 0.7731 |
| 25 | 3 | 6-2-6 | 0.8340 | 0.1660 | 0.7868 | 0.7233 | 0.7538 |

Fig. 9. Variation of performance measures for testing dataset using ANN

V. DISCUSSION

The study sought to develop classification models to predict
injury severity of angle crashes involving two vehicles at
unsignalized intersections using ANNs. A total of 3,307
reported crashes from 2008 to 2015 were extracted from a
crash database and used in the analysis. Of the total number of
crashes, 1,274 resulted in injury and/or fatality, while the
remaining 2,033 crashes were non-injury crashes. The spatial
distribution of the crashes showed that the downtown area of
Washington DC experienced the highest frequency of crashes.
Also, most of the crashes occurred during off-peak periods and
under light traffic conditions. Right-angle collisions were the
most frequent collision type. The combination of driver
contributing circumstances that resulted in injury was a
Stop/Yield sign violation by one driver and no violation on the
part of the other driver.

The accuracy of classification models developed using
ANN on the test dataset generally tends to increase as the
number of hidden layers increases. Models with higher
accuracies were attained with three hidden layers. Model 22 was
the most accurate (85.62%) for predicting injury severity of
angle crashes at unsignalized intersections. This model has 3
hidden layers with 5, 10, and 5 neurons, respectively. The
activation function in the hidden layers is the rectified linear
unit (ReLU) function and the activation function in the output
layer is the sigmoid function. The confusion matrix of this
model is presented in Table IX. We can see that 51.5% of the
crashes were correctly classified as non-injury crashes, while
10.3% were wrongly classified as injury crashes. Similarly, 29%
of the crashes were correctly classified as injury crashes while
9.2% were wrongly classified as non-injury crashes. The
F-measure is a combined measure of both precision and
sensitivity. F-measures of the ANN models generally ranged
between 0.7 and 0.8, and the higher values of F-measure were
achieved with two hidden layers. Model 22 (highest accuracy)
and Model 15 (highest F-measure) are therefore the
best-performing ANN models for predicting injury
severity of angle crashes at unsignalized intersections.
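
To make the structure of the selected network concrete, the sketch below runs a single forward pass through a network with hidden layers of 5, 10, and 5 neurons, ReLU hidden activations, and a sigmoid output. It is an illustration in plain Python, not the authors' TensorFlow/Keras model: the weights are random placeholders rather than trained values, and the number of inputs (`n_inputs`), the input vector, and the 0.5 decision threshold are all hypothetical.

```python
import math
import random


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic sigmoid, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (placeholders for trained values)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def predict(inputs, layers):
    """ReLU on every hidden layer, sigmoid on the single output neuron."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(activations, weights, biases, sigmoid)[0]


rng = random.Random(0)
n_inputs = 14                        # hypothetical number of input features
sizes = [n_inputs, 5, 10, 5, 1]      # inputs, three hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

x = [rng.random() for _ in range(n_inputs)]   # hypothetical encoded crash
probability = predict(x, layers)
label = "injury" if probability >= 0.5 else "non-injury"  # assumed threshold
print(f"predicted probability of injury: {probability:.4f} ({label})")
```

The sigmoid output lies strictly between 0 and 1, which is what allows it to be read as a probability for a two-class problem.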

TABLE IX. CONFUSION MATRIX OF MODEL 22

| Total No. of Observations | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 431 | 77 |
| Actual Positive | 77 | 242 |

VI. CONCLUSION AND RECOMMENDATION

In conclusion, the most accurate ANN model for predicting
the severity of an injury sustained in a crash is a model with 3
hidden layers with 5, 10, and 5 neurons, respectively. The
activation functions in the hidden and output layers are the
rectified linear unit (ReLU) function and the sigmoid function,
respectively. This research explored only the ANN machine
learning technique. Future research can explore other
techniques such as decision trees, K-nearest neighbors
and linear discriminants. Also, other types of crashes at
unsignalized intersections can be explored. Further, these
analyses could be extended to signalized intersections.

REFERENCES

[1] T. R. Neuman, R. Pfefer, K. L. Slack, K. K. Hardy, D. W. Harwood, I.
B. Potts, D. J. Torbic, E. R. K. Rabbani, National Cooperative Highway

Engineering, Technology & Applied Science Research Vol. 9, No. 2, 2019, 3871-3880 3880

www.etasr.com Arhin & Gatiba: Predicting Injury Severity of Angle Crashes Involving Two Vehicles at Unsignalized …

Research Program: Guidance for Implementation of the AASHTO
Strategic Highway Safety Plan, Transportation Research Board, 2003

[2] World Health Organization, Global Status Report on Toad Safety 2015,
WHO, 2015

[3] National Highway Traffic Safety Administration, “USDOT Releases
2016 Fatal Traffic Crash Data”, available at: https://www.nhtsa.gov/
press-releases/usdot-releases-2016-fatal-traffic-crash-data, 2017

[4] National Highway Traffic Safety Administration, Traffic Safety Facts
2015, US Department of Transportation-National Highway Traffic
Safety Administration, 2015

[5] B. J. Russo, P. T. Savolainen, W. H. Schneider, P. C. Anastasopoulos,
“Comparison of factors affecting injury severity in angle collisions by
fault status using a random parameter bivariate ordered probit model”,
Analytic Methods in Accident Research, Vol. 2, pp. 21-29, 2014

[6] R. Garrido, A. Bastos, A. de Almeida, J. P. Elvas, “Prediction of Road
Accident Severity Using the Ordered Probit Model”, Transport
Research. Procedia, Vol. 3, pp. 214-223, 2014

[7] T. Sayed, F. Rodriguez, “Accident Prediction Models for Urban
Unsignalized Intersections in British Columbia”, Transportation
Research Record Journal of the Transportation Research Board, Vol.
1665, No. 1, pp. 93-99, 1999

[8] W. Ackaah, M. Salifu, “Crash prediction model for two-lane rural
highways in the Ashanti region of Ghana”, International Association of
Traffic and Safety Sciences Research, Vol. 35, No. 1, pp. 34-40, 2011

[9] M. Y. Lau, A. D. May, Accident Prediction Model Development:
Signalized Intersections, Institute of Transportation Studies, University
of California-Berkeley, 1988

[10] A. Kamer-Ainur, M. Marioara, “Errors And Limitations Associated
With Regression And Correlation Analysis”, Statistics and Economic
Informatics, pp. 710-712, 2007

[11] P. Chengye, P. Ranjitkar, “Modelling Motorway Accidents using
Negative Binomial Regression”, Journal of the Eastern Asia Society for
Transportation Studies, Vol. 10, pp. 1946-1963, 2013

[12] Z. Yang, L. Zhibin, L. Pan, Z. Liteng, “Exploring contributing factors to
crash injury severity at freeway diverge areas using ordered probit
model”, Procedia Engineering, Vol. 21, pp. 178-185, 2011

[13] Federal Highway Administration, “Highway Safety Improvement
Program Manual–Safety”, available at: https://safety.fhwa.dot.gov/
hsip/resources/fhwasa09029/sec6.cfm, 2011

[14] G. Dutta, P. Jha, A. K. Laha, N. Mohan, “Artificial Neural Network
Models for Forecasting Stock Price Index in the Bombay Stock
Exchange”, Journal of Emerging Market Finance, Vol. 5, No. 3, pp. 283-
295, 2006

[15] M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press,
1995

[16] S. Sharma, “Activation Functions in Neural Networks”, available at:
https://towardsdatascience.com/activation-functions-neural-networks-
1cbd9f8d91d6, 2017

[17] F. R. Moghaddam, S. Afandizadeh, M. Ziyadi, “Prediction of accident
severity using artificial neural networks”, International Journal of Civil
Engineering, Vol. 9, No. 1, pp. 41-49, 2011

[18] K. S. Jadaan, M. Al-Fayyad, H. F. Gammoh, “Prediction of Road Traffic
Accidents in Jordan using Artificial Neural Network (ANN)”, Journal of
Traffic Logistics Engineering, Vol. 2, No. 2, pp. 92-94, 2014

[19] Office of the State Superintendent of Education, “New U.S. Census
Bureau Numbers Officially Put DC’s Population Over 700,000”,
available at: https://osse.dc.gov/release/new-us-census-bureau-numbers-
officially-put-dc%E2%80%99s-population-over-700000, 2018

[20] T. Winship, “The 10 US cities with the worst traffic”, available at:
https://www.businessinsider.com/the-10-us-cities-with-the-worst-traffic-
2018-2, 2018

[21] District Department of Transportation, “DDOT by the Numbers”,
available at: https://ddot.dc.gov/page/ddot-numbers

[22] American Society of Civil Engineers, Repord Card for D.C.’s
Infrastructure, ASCE, 2016