Engineering, Technology & Applied Science Research, Vol. 9, No. 2, 2019, 3871-3880, www.etasr.com

Predicting Injury Severity of Angle Crashes Involving Two Vehicles at Unsignalized Intersections Using Artificial Neural Networks

Stephen A. Arhin, Howard University Transportation Research and Data Center, Washington, DC, USA
Adam Gatiba, Howard University Transportation Research and Data Center, Washington, DC, USA

Abstract: In 2015, about 20% of the 52,231 fatal crashes that occurred in the United States occurred at unsignalized intersections. The economic cost of these fatalities has been estimated to be in the millions of dollars. In order to mitigate the occurrence of these crashes, it is necessary to investigate their predictability based on the pertinent factors and circumstances that might have contributed to their occurrence. This study focuses on the development of models to predict injury severity of angle crashes at unsignalized intersections using artificial neural networks (ANNs). The models were developed based on 3,307 crashes that occurred from 2008 to 2015. Twenty-five different ANN models were developed. The most accurate model predicted the severity of an injury sustained in a crash with an accuracy of 85.62%. This model has 3 hidden layers with 5, 10, and 5 neurons, respectively. The activation functions in the hidden and output layers are the rectified linear unit (ReLU) function and the sigmoid function, respectively.

Keywords: crashes; unsignalized intersection; artificial neural network; injury severity

I. INTRODUCTION
Even though intersections constitute a relatively small proportion of the facilities of transportation systems, a significant number of crashes occur at these locations, especially in urban areas.
In California, for instance, an annual average of 1.5 crashes occurs at unsignalized intersections in rural locations, compared to an average of 2.5 crashes per year in urban locations [1]. Data from the World Health Organization (WHO) reveal that 1.25 million people die annually worldwide in road crashes. The economic cost of these deaths is estimated to be approximately $260 billion per year [2]. In the United States, a total of 37,456 fatalities in road-related crashes were reported in 2016 [3]. Though most of these crashes occurred on road segments, a significant number occurred at or near intersections. Out of the total of 52,231 fatal crashes in the United States in 2015, approximately 4.4% (2,298) occurred at STOP-controlled intersections, while 7.5% (3,917) occurred at intersections controlled by traffic signals. Intersections without any type of traffic control device recorded the highest number of fatal crashes (4,227) [4]. Several studies have investigated the causes of these crashes. These causes are either driver-induced or related to road geometry, road defects, vehicle defects, and atmospheric or weather conditions. Various countermeasures have been proposed and/or implemented to reduce the occurrence of crashes at intersections, and in some instances these have been successful. In order to effectively reduce the frequency and mitigate the severity of intersection-related crashes, it is necessary to explore their predictability based on the pertinent factors and circumstances that might have contributed to their occurrence. Several studies have resulted in the development of mathematical models that predict crashes on roadways in general and, in a few instances, at unsignalized intersections in particular. These mathematical models include linear regression and machine learning methods.
Given the varying characteristics of intersections, it is necessary to develop models that are focused and specific to a particular set of conditions. This study therefore focuses on the development of models to predict the severity of right-angle crashes involving two vehicles at unsignalized intersections in urban centers using ANNs.

II. LITERATURE REVIEW

A. Contributory Factors for Intersection-Related Crashes
Many factors determine the degree of injury sustained by people involved in crashes at unsignalized intersections. However, only certain factors have been shown to be statistically significant predictors. Authors in [5] assessed the degree of injury sustained by drivers involved in angle collisions in relation to the fault status of drivers. The results of the study showed that drivers who were not at fault tended to sustain more severe injuries than those who were at fault. It was further determined that injury severity was affected by factors including time of year, speed limit, age, gender, restraint/helmet use, and alcohol/drug use. Authors in [6] concluded that the road surface condition (wet or dry) was a significant predictor of injury severity. Additionally, female drivers were found to be more likely to sustain severe injuries than male drivers. Crashes in urban areas were determined to result in less serious injuries than crashes in rural areas [6]. Also, traffic volume on a major road is a significant predictor of crashes at unsignalized intersections [7].

Corresponding author: Stephen A. Arhin (saarhin@howard.edu)

The geometric characteristics and features of unsignalized intersections have also been found to be potential explanatory variables in crash prediction models.
Authors in [8] predicted the frequency of accidents at unsignalized intersections in urban areas using negative binomial models. It was concluded that, besides traffic exposure functions such as traffic flow, which usually significantly predict crashes, intersection geometrics, absence of street lighting and dedicated left-turning lanes are positively correlated with accident frequency at intersections. Typical geometric characteristics included number of lanes on the major road, width of lanes, and presence of a median on intersecting roads. The study further revealed that T-intersections with Yield control had a much lower accident potential than those with Stop control.

B. Crash Prediction Models
Several modeling techniques have been employed to predict crashes at intersections.

1) Linear Regression Models
Linear regression modeling is an approach to establishing a relationship between a scalar response, also called the dependent variable, and one or more explanatory (or independent) variables. Model parameters are estimated using a data set of values of the response and explanatory variables. The model is usually fitted to the observed data set using the least squares approach. Linear regression models take the form:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i \tag{1}$$

where $y_i$ is the $i$-th value of the dependent variable, $\beta_0, \beta_1, \ldots, \beta_p$ are estimated parameters, $x_{i1}, x_{i2}, \ldots, x_{ip}$ are the predictor variables for the $i$-th observation, and $\varepsilon_i$ is the error term. The error term is an independent and normally distributed random variable with a mean of zero and a variance greater than zero. Linear regression modeling has been applied in several studies to establish various relationships between the frequency of injury crashes and other traffic characteristics. Authors in [9] investigated the relationship between the number of injury or property damage only (PDO) crashes that occur annually at intersections and traffic and environmental factors.
The crash records (ranging from 1984 to 1987) of 2,488 intersections in California were sampled. The linear regression analysis employed in this study was conducted at two levels. In the first level, a simple linear regression model was developed with injury/PDO crashes per year as the response variable and traffic intensity, expressed in millions of vehicles entering the intersection per year from all approaches, as the predictor variable. In the second level, additional information such as design, traffic control, proportion of cross street traffic, and environmental features was included as predictor variables. The results of the analysis showed that the accuracy of the model improved as more predictor variables were added. Though linear regression models are easy to use and interpret, it has been shown that they are not ideal for crash predictions. Crashes are usually sporadic and random in nature and hence are not best fitted by linear relationships. Also, the assumption that the error term is normally distributed is not accurate for crash counts, which are discrete and non-negative. Further, some factors have been found to correlate strongly with each other, introducing multicollinearity and thereby invalidating such linear models [10]. To overcome the shortcomings of linear regression models, generalized linear models (GLMs) have been used to model crashes at intersections. GLMs are a flexible generalization of ordinary linear regression that can accommodate non-normally distributed error terms. The most common forms of generalized linear models used in crash prediction are the negative binomial (NB) model and the ordered probit model (OPM).

2) Negative Binomial Model
NB models are a generalization of Poisson regression. Unlike Poisson models, where the variance of the distribution of the response variable is equal to its mean, in NB models the variance differs from the mean.
NB models have been found to be suitable for crash predictions due to the nature of the dependent variables in such analyses. Usually the response required is the number of crashes at a specific location. Such responses are nonnegative integers and generally follow the NB distribution. The distribution is given by the following Poisson-Gamma distribution:

$$\Pr(Y = y_i \mid u_i, \alpha) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + u_i}\right)^{\alpha^{-1}} \left(\frac{u_i}{\alpha^{-1} + u_i}\right)^{y_i} \tag{2}$$

where $u_i$ is the mean of the dependent variable $y_i$ (modeled as a function of the predictor variables $x_i$ and the parameters $\beta$ to be estimated) and $\alpha$ is the heterogeneity parameter. Authors in [11] investigated the relationship between crash frequencies and factors such as traffic conditions, geometric and operational characteristics of roadways, and weather conditions, using data on crashes that occurred from 2004 to 2010 on a motorway in Auckland. The NB regression model developed had a goodness of fit, $\rho^2$, of 0.119. Additionally, several individual predictors such as length of road segments, AADT, number of lanes and shoulder width were found to be significant predictors in the model.

3) Ordered Probit Models
The ordered probit model (OPM) is used in developing models which have an ordered response. This modeling approach employs the probit link function. The latent continuous metric underlying the observed ordinal responses is partitioned into a series of regions corresponding to the ordinal categories. Generally, the probability of obtaining a particular outcome is given by:

$$\Pr(y_i = j \mid X_i) = \frac{\exp(\tau_j - X_i\beta)}{1 + \exp(\tau_j - X_i\beta)} - \frac{\exp(\tau_{j-1} - X_i\beta)}{1 + \exp(\tau_{j-1} - X_i\beta)} \tag{3}$$

where $y_i$ is an observable ordinal variable, $X_i$ is a vector of exogenous variables, $\beta$ is a vector of unknown parameters to be estimated, and $\tau_j$ is the threshold associated with the $j$-th ordinal partition interval; the thresholds are assumed to be in ascending order.
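As a rough illustration of how (3) turns a linear predictor and a set of ascending thresholds into category probabilities, the following minimal sketch evaluates the expression in plain Python. It follows the form of (3) as printed, which uses the logistic function; a probit link would replace `logistic_cdf` with the standard normal cumulative distribution function. The function names and all numeric values are hypothetical and are not taken from any of the cited studies.

```python
import math


def logistic_cdf(z):
    """Logistic CDF exp(z) / (1 + exp(z)), in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def category_probabilities(xb, thresholds):
    """Return P(y = j) for every ordinal category, given the linear predictor xb.

    thresholds must be in ascending order; K thresholds define K + 1 categories.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Cumulative probabilities P(y <= j), padded with 0 and 1 for the end categories.
    cumulative = [0.0] + [logistic_cdf(t - xb) for t in thresholds] + [1.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[j + 1] - cumulative[j] for j in range(len(cumulative) - 1)]


# Hypothetical linear predictor and thresholds, for illustration only.
probs = category_probabilities(xb=0.4, thresholds=[-1.0, 0.5, 2.0])
```

Because each probability is a difference of adjacent cumulative values, the probabilities sum to one, and ascending thresholds keep every term nonnegative.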
OPM has been applied in the development of several crash prediction models which seek to predict injury severity based on several factors. Authors in [12] developed an OPM for the severity of crashes experienced at freeway exits. Crash data for 326 locations in Florida were sampled. The results of the study indicated that the factors which significantly influenced crash severity included mainline lane number, length of ramp, difference of speed limits between mainline and ramp, light condition, weather condition, surrounding land type, alcohol/drug involvement, road surface condition, and crash type. The model developed had a goodness of fit of 0.019 and a chi-squared goodness of fit value of 95.63.

4) Empirical Bayes Refinement of the GLM
Crash estimates made with GLMs are susceptible to regression to the mean. Regression to the mean occurs when an unusually high number of accidents during one period is followed by a lower number during a similar subsequent period, even if no countermeasure has been implemented. GLMs do not account for this effect. Hence, to improve the accuracy of the predictions made with GLMs, the empirical Bayes (EB) method is usually applied. The EB method compensates for the regression-to-the-mean bias by pulling the crash count towards the mean. Thus, prior data (observed crash counts) are combined with the predicted crash frequency from the GLM to calculate a corrected value. The corrected value is expected to lie between the observed crash frequency and the predicted frequency from the GLM. This is expressed as:

$$E = \text{Weight} \times \mu + (1 - \text{Weight}) \times \text{Observed crash frequency} \tag{4}$$

where $E$ is the corrected value and $\mu$ is the average number of crashes (determined from the GLM) [13].
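The weighted combination in (4) can be illustrated with a short sketch. The text does not state how Weight is obtained, so it is treated here as a given input between 0 and 1; the function name and the numbers are hypothetical.

```python
def eb_estimate(observed, predicted, weight):
    """Empirical Bayes corrected crash frequency as in (4).

    observed  -- observed crash frequency at the site
    predicted -- average crash frequency predicted by the GLM (mu)
    weight    -- weight given to the GLM prediction, assumed to lie in [0, 1]
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return weight * predicted + (1.0 - weight) * observed


# Hypothetical site: 9 observed crashes, GLM predicts 4, weight of 0.25.
corrected = eb_estimate(observed=9, predicted=4, weight=0.25)
print(corrected)  # 7.75
```

With any weight in [0, 1] the result lies between the observed and predicted values, which is the property the text relies on.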
5) Artificial Neural Networks (ANNs)
ANNs are mathematical models inspired by the biological neural networks in the human brain. ANNs are used in engineering to perform complex tasks such as pattern recognition, forecasting, data compression and classification. The effectiveness of an ANN is based on its ability to approximate both linear and nonlinear functions to a required degree of accuracy using a learning algorithm, and to build "piece-wise" approximations of the functions [14]. Classification or forecasting using ANNs involves a training and learning procedure in which historical data (a set of input data with known outputs) are presented to the network. Usually large amounts of such data are required to train the network. The network goes through a learning process by constructing a mapping between inputs and outputs, and the weights assigned to each mapping are adjusted at each iteration. The method by which the weights and bias levels of a network are updated is determined by the learning rule used. Thus, the learning rule helps a neural network to learn from the existing conditions and improve its performance. There are several learning rules used in training neural networks. Notable among them are the Hebbian, perceptron (error-correction), delta, correlation and outstar learning rules [15]. The most commonly used network type, however, is the multilayer perceptron (MLP). An MLP basically consists of three layers: an input layer, a hidden layer, and an output layer. The MLP is a feedforward network in which information flows from the input layer through the hidden layer to the output layer to produce the outcome. These layers have interconnected nodes (neurons). The interconnections are assigned weights (representing information flow) which are computed using mathematical functions. The outputs for specific inputs are obtained by adjusting the weights to minimize the errors between the output produced and the desired output by error back-propagation.
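The feedforward flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy layer sizes, weights and biases are hypothetical, and the ReLU and sigmoid activations are chosen to mirror the configuration reported in the abstract rather than to reproduce the authors' implementation.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid function in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weighted sum + bias) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate the inputs through each layer in turn and return the final outputs."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = dense(signal, weights, biases, activation)
    return signal


# Hypothetical toy network: 3 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1], relu),
    ([[1.0, -1.5]], [0.2], sigmoid),
]
output = forward([1.0, 0.0, 1.0], layers)
```

The single sigmoid output lies strictly between 0 and 1, so it can be read as a score for the positive class; training would adjust `weights` and `biases` by back-propagation, which this sketch does not cover.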
The MLP is known to be a universal approximator because of its ability to approximate continuous functions on compact subsets of the real numbers under mild assumptions. Activation functions, also called transfer functions, are an essential component of ANNs. Activation functions are applied in the neurons of the ANN and introduce non-linearity into the network. Each neuron calculates the weighted sum of its inputs, adds a bias, and passes the result to the activation function, which determines whether the neuron should be activated or not. The three most common types of activation functions used in an ANN are the sigmoid, the hyperbolic tangent, and the rectified linear unit [16]. Authors in [17] utilized ANNs to develop a model of the relationship between crash severity on urban highways and variables such as traffic volume, flow speed, human factors, and road, vehicle and weather conditions. The study showed that MLP feedforward back-propagation networks provided the best results compared to other learning methods. A network architecture with 2 hidden layers of 17 and 7 neurons, respectively, was determined to be the best. Mean square errors (MSE) within an acceptable range of 3% to 4% were achieved, along with correlation coefficients of 86% to 87%.

III. METHODOLOGY

A. Study Area
This study is based on data obtained in the District of Columbia (DC). The capital of the USA, Washington, DC, is divided into four quadrants: Northwest (NW), Northeast (NE), Southeast (SE), and Southwest (SW), which are further divided into eight wards. As of July 2018, the population of DC was about 702,455, with a growth rate of approximately 1.41% [18]. The city is highly urbanized, and it is ranked the sixth most congested city in the United States, with each driver spending an average of 63 hours in traffic annually [19].
It has a land area of 68.34 mi² and a total of 1,503 miles of roadway comprising local roads, collector roads, minor arterials, principal arterials, freeways and interstates [20]. Also, the city has about 7,700 intersections, of which 1,450 are signalized [21]. The American Society of Civil Engineers' 2017 infrastructure report card reported that about 95% of the roads in DC are in poor condition [22].

B. The Crash Database System
Crash prediction models are data dependent, and as a result the accuracy of the developed models depends largely on the quality of the available crash data. To ensure that a reliable model is developed, this research utilized traffic crash data from the District Department of Transportation's (DDOT's) crash database, called Traffic Accident Reporting and Analysis Systems Version 2.0 (TARAS2). The District of Columbia Metropolitan Police Department (MPD) records traffic crash information electronically at the scene of crashes on a Police Department Form number 10 (PD-10) crash reporting form. The crash data are then transferred through secure servers from MPD into DDOT's database, processed, and made available in TARAS2, which is an Oracle-based application. TARAS2 contains data fields that can be broadly categorized under vehicle characteristics, environmental conditions, roadway characteristics, traffic exposure characteristics, as well as crash location, date, time, crash type, crash severity and information on persons involved.

C. Data Extraction and Encoding
Eight years of crash data (2008-2015) were queried and extracted from TARAS2. The data were then filtered to obtain angle crashes involving two vehicles at unsignalized intersections.
Further, the extracted data were cleaned by identifying and removing duplicate and incomplete crash records and irrelevant data fields. In all, 3,307 data points were extracted and used for analysis. The extracted data set contained the following fields: accident complaint number, main street name, side street name, year of accident, month of accident, time of accident, day of week, quadrant of accident occurrence, type of collision, road surface condition, street lighting condition, lighting condition, weather condition, traffic condition, traffic control type, drivers' age, drivers' gender, contributing circumstances, and injury severity. Only numerical data can be analyzed by ANNs. Hence, qualitative data must be converted to quantitative data; that is, both input and output data must be encoded as either real or integer values. In addition, the binary (0 and 1) method of encoding has been found to yield better results, since it tends to lower the value of the loss function with respect to the model parameters. The loss value indicates how well the model fits the data set: the lower the loss function value, the better the fit. Table II presents the variables and coding scheme used in this study.

D. Types of Collision
The crash types considered in this study are angle collisions. Three types of angle collisions are specified: right-angle, right turn, and left turn collisions.
• Right-angle collision: This type of collision occurs when the side of one vehicle is impacted by the front of another vehicle which is traveling at a right angle to the direction of the former vehicle. Figure 1 depicts a right-angle collision at an intersection.
• Right turn collision: This type of collision occurs when a vehicle turning right at an intersection is impacted by a vehicle from the other intersecting road. Figure 2 depicts a right turn collision.
• Left turn collision: This type of collision occurs when a left-turning vehicle at an intersection is impacted by a vehicle from the oncoming traffic. Figure 3 depicts a left turn collision.

E. Injury Severity
The outcome variable describes the degree of injury sustained by persons involved in a crash. The crash database specifies five degrees of injury severity: no injury, complain, non-disabling injury, disabling injury, and fatal. Due to the very small percentage of fatal and disabling injury crashes in the data set, all complain, injury and fatal crashes were categorized as injury crashes. Table I shows the levels of crashes used in the analysis.

Fig. 1. Right-angle collision
Fig. 2. Right turn collision
Fig. 3. Left turn collision

TABLE I. LEVELS OF INJURY SEVERITY

| Injury Severity | Level |
|---|---|
| No Injury | Non-Injury |
| Complain | Injury |
| Non-Disabling Injury | Injury |
| Disabling Injury | Injury |
| Fatal | Injury |

F. Data Standardization
To achieve accurate predictions from machine learning models, the variables used in developing the models should be on the same scale. Also, most optimization algorithms that minimize the loss function converge faster when the variables are on the same scale. The method of scaling used on this data set is standardization. The raw scores (of the encoded data) are converted to standard scores by subtracting the mean of each variable from the raw score of each observation and then dividing the difference by the standard deviation of the variable. By doing so, the variables are transformed to have a mean of zero and unit variance. The standardized value, $Z$, of each score of each variable is given by (5):

$$Z = \frac{X - \bar{X}}{\sigma} \tag{5}$$

where $\bar{X}$ is the mean of the variable, $X$ is the encoded score of each observation of the variable, and $\sigma$ is its standard deviation.

TABLE II.
VARIABLE ENCODING

| Group | Variable | Variable Name | Code |
|---|---|---|---|
| Day of Crash | X1 | Monday | 1-Present, 0-Otherwise |
| Day of Crash | X2 | Tuesday | 1-Present, 0-Otherwise |
| Day of Crash | X3 | Wednesday | 1-Present, 0-Otherwise |
| Day of Crash | X4 | Thursday | 1-Present, 0-Otherwise |
| Day of Crash | X5 | Friday | 1-Present, 0-Otherwise |
| Day of Crash | X6 | Saturday | 1-Present, 0-Otherwise |
| Day of Crash | X7 | Sunday | 1-Present, 0-Otherwise |
| Time of Day | X8 | A.M. Peak (06:00-10:00) | 1-Present, 0-Otherwise |
| Time of Day | X9 | Off Peak (10:00-15:00) | 1-Present, 0-Otherwise |
| Time of Day | X10 | P.M. Peak (15:00-19:00) | 1-Present, 0-Otherwise |
| Time of Day | X11 | Evening (19:00-00:00) | 1-Present, 0-Otherwise |
| Time of Day | X12 | Night (00:00-06:00) | 1-Present, 0-Otherwise |
| Quadrant | X13 | NW | 1-Present, 0-Otherwise |
| Quadrant | X14 | SW | 1-Present, 0-Otherwise |
| Quadrant | X15 | NE | 1-Present, 0-Otherwise |
| Quadrant | X16 | SE | 1-Present, 0-Otherwise |
| Quadrant | X17 | BN | 1-Present, 0-Otherwise |
| Type of Collision | X18 | Right Angle | 1-Present, 0-Otherwise |
| Type of Collision | X19 | Left Turn | 1-Present, 0-Otherwise |
| Type of Collision | X20 | Right Turn | 1-Present, 0-Otherwise |
| Road Surface Condition | X21 | Wet | 1-Present, 0-Otherwise |
| Road Surface Condition | X22 | Dry | 1-Present, 0-Otherwise |
| Street Lighting | X23 | Light Off | 1-Present, 0-Otherwise |
| Street Lighting | X24 | Light On | 1-Present, 0-Otherwise |
| Street Lighting | X25 | None | 1-Present, 0-Otherwise |
| Lighting Condition | X26 | Dark | 1-Present, 0-Otherwise |
| Lighting Condition | X27 | Dark Lighted | 1-Present, 0-Otherwise |
| Lighting Condition | X28 | Daylight | 1-Present, 0-Otherwise |
| Weather Condition | X29 | Clear | 1-Present, 0-Otherwise |
| Weather Condition | X30 | Rain | 1-Present, 0-Otherwise |
| Weather Condition | X31 | Snow | 1-Present, 0-Otherwise |
| Traffic Condition | X32 | Traffic Condition | 0-Low, 1-Medium, 2-High |
| Traffic Control Type | X33 | Stop | 1-Present, 0-Otherwise |
| Traffic Control Type | X34 | Yield | 1-Present, 0-Otherwise |
| Traffic Control Type | X35 | None | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 1 | X36 | No Violation D1 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 1 | X37 | Alcohol/Drug Use D1 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 1 | X38 | Speeding D1 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 1 | X39 | STOP/YIELD Sign Violation D1 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 1 | X40 | Improper Maneuvering D1 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 2 | X42 | No Violation D2 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 2 | X43 | Alcohol/Drug Use D2 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 2 | X44 | Speeding D2 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 2 | X46 | Improper Maneuvering D2 | 1-Present, 0-Otherwise |
| Contributing Circumstances of Driver 2 | X47 | Distraction D2 | 1-Present, 0-Otherwise |
| Driver Age | X48 | Age of Driver 1 | 1-Present, 0-Otherwise |
| Driver Age | X49 | Age of Driver 2 | 1-Present, 0-Otherwise |
| Driver Gender | X50 | Gender of Driver 1 | 0-Female, 1-Male |
| Driver Gender | X51 | Gender of Driver 2 | 0-Female, 1-Male |
| Outcome | Y1 | Injury Severity | 0-No Injury, 1-Injury |

G. Development of Models
Classification by ANN is an iterative process of weight adjustments based on information flow that mimics the functioning of neurons in the human brain. The steps below describe how the models for crash injury severity classification were developed using ANN:
• Selection of network architecture.
• Training of neural network.
• Testing and evaluation of model.

1) Selection of Network Architecture
The network architecture was set up first. A multi-layer perceptron (MLP) feedforward ANN was adopted to develop the classification models. An MLP consists of at least three layers: an input layer, one or more hidden layers, and an output layer. Each layer consists of nodes or neurons. The neurons of each layer are interconnected with those of the succeeding layer. Also, the neurons of the hidden and output layers are embedded with nonlinear activation functions. The MLP ANN architecture used in this research consists of an input layer with 44 neurons (each neuron represents one of the input variables, Xi, in Table II) and an output layer with 1 neuron, which is the target or dependent variable, Y. The number of hidden layers and neurons was varied over several iterations until the numbers of hidden layers and neurons which produced the best model were obtained. Figure 4 shows the MLP ANN architecture used in developing the model.

Fig. 4. MLP ANN

2) Training of Neural Network
Training of the neural network by backward propagation was carried out in the following sequence:
• Presentation of training dataset to the network: The training dataset was imported into the network to commence training.
The vector of independent variables was fed into the input neurons, which are connected to the neurons of the first hidden layer. The training process was initialized by randomly selecting weights for all interconnections between the neurons of the input and hidden layers.
• Forward computation: Forward propagation was then implemented by multiplying the values of the input neurons by the weights and storing the sums of the products in the corresponding neurons of the hidden layer. The weighted sums are subsequently passed to an activation function and, based on the output of the function, the neuron is either activated or not. Mathematically, this can be expressed as:

$$v_j^{l} = \sum_{i=0}^{m} w_{ji}^{l}\, y_i^{(l-1)} \tag{6}$$

$$y_j^{l} = \varphi_j\left(v_j^{l}\right) \tag{7}$$

where $v_j^{l}$ is the weighted sum in the $j$-th neuron of the $l$-th layer, $w_{ji}^{l}$ is the weight coefficient of the $j$-th neuron of the $l$-th layer that is fed from the $i$-th neuron in layer $l-1$, $y_i^{(l-1)}$ is the output of the $i$-th neuron in the previous layer $l-1$, $y_j^{l}$ is the output of the $j$-th neuron in layer $l$, and $\varphi_j$ is the activation function, which is a rectified linear unit function in the hidden layers and a sigmoid function in the output layer. Hence, for the last (output) layer, $l = L$:

$$y_j^{L}(n) = o_j(n) \tag{8}$$

where $o_j(n)$ is the output at the $n$-th iteration.
• Computation of error: The error of the $j$-th neuron at the $n$-th iteration is then computed as:

$$e_j(n) = d_j(n) - o_j(n) \tag{9}$$

where $d_j(n)$ is the target output.
• Backward computation: The weights in the network are adjusted based on a local gradient, $\delta$, which is a function of the error, $e$, and is computed as follows:

$$\delta_j^{L}(n) = e_j^{L}(n)\, \varphi_j'\left(v_j^{L}(n)\right) \tag{10a}$$

for neuron $j$ in the output layer $L$, and

$$\delta_j^{l}(n) = \varphi_j'\left(v_j^{l}(n)\right) \sum_{k} \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n) \tag{10b}$$

for neuron $j$ in the hidden layer $l$, where $k$ indexes the neurons in the succeeding layer $l+1$ and $\varphi'(\cdot)$ is the derivative of the function $\varphi(\cdot)$. The weights in the network are then adjusted according to:

$$w_{ji}^{l}(n+1) = w_{ji}^{l}(n) + \alpha\left[w_{ji}^{l}(n) - w_{ji}^{l}(n-1)\right] + \eta\, \delta_j^{l}(n)\, y_i^{(l-1)}(n) \tag{11}$$

where $\eta$ is the learning-rate parameter and $\alpha$ is the momentum constant.
• Iteration: The procedures in the three previous steps are repeated for batches of 3 observations per iteration until the stopping criterion of 100 epochs is met. Figure 5 illustrates the training process.

3) Model Testing and Evaluation
After the network was trained for the required number of epochs (100), the model was tested using the test dataset. The accuracy of the model was evaluated using the confusion matrix. The number of hidden layers and neurons in the network architecture was varied and the training process was repeated. This iterative process was continued until the model with the best performance was obtained.

Fig. 5. ANN training process

4) Model Evaluation
The performance of each of the models was assessed using the test dataset. The results were then evaluated using the data generated by a confusion matrix (CM). A CM contains information about the actual and predicted classifications made by a classification system. Each row of the CM represents the instances of an actual class and each column represents the instances of a predicted class. Table III shows the confusion matrix for a two-class classifier.

TABLE III. CONFUSION MATRIX

| Total No. of Observations | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | True Negative (TN) | False Positive (FP) |
| Actual Positive | False Negative (FN) | True Positive (TP) |

The entries of the CM are defined as follows: True Positive (TP) instances are positive and correctly classified as positive, True Negative (TN) instances are negative and correctly classified as negative, False Positive (FP) instances are negative but wrongly classified as positive, and False Negative (FN) instances are positive but wrongly classified as negative. Based on the CM, the following measures were computed to evaluate the models developed.
• Accuracy (AC): The proportion of the total number of predictions that were correctly classified:

$$AC = \frac{TN + TP}{TN + FP + FN + TP} \tag{12}$$

• Error Rate (ER): The rate at which predictions are misclassified:

$$ER = 1 - AC \tag{13}$$

• Sensitivity (S): The proportion of positive cases that were correctly identified:

$$S = \frac{TP}{FN + TP} \tag{14}$$

• Precision (P): The proportion of the predicted positive cases that were correct:

$$P = \frac{TP}{FP + TP} \tag{15}$$

• F-measure (F): A measure of the accuracy of the test model computed from S and P. The value of F ranges from 0 to 1, where 1 indicates an excellent model and 0 indicates a poor model:

$$F = \frac{2\,(S \cdot P)}{S + P} \tag{16}$$

H. Analysis Software
The classification models were developed using the high-level general-purpose programming language Python. Specifically, the Anaconda Python distribution was used. This is an open source distribution with standard and robust libraries for data processing, analysis and machine learning applications. The NumPy and Pandas libraries were imported to facilitate data preprocessing.
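As an illustration of the kind of preprocessing involved, the standardization in (5) can be sketched with the Python standard library alone. This is not the authors' NumPy/Pandas code: the function name and the sample ages are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form of the standard deviation was used.

```python
import statistics


def standardize(values):
    """Convert raw scores to standard scores, Z = (X - mean) / sd, as in (5)."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divides by N, not N - 1).
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mean) / sd for x in values]


# Hypothetical driver ages, for illustration only.
ages = [22, 35, 47, 58, 63]
z_scores = standardize(ages)
```

After the transformation the variable has a mean of zero and unit variance, which is the property Section III.F relies on. In practice the mean and standard deviation would typically be estimated on the training data and reused for the test data.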
Also, the TensorFlow and Keras libraries were imported to develop the ANN models. In addition, the descriptive statistics of the data were obtained using IBM SPSS Statistics.

IV. RESULTS

A. Descriptive Statistics
Tables IV and V present the descriptive statistics of the data set. The frequencies of the categorical variables are presented in Table IV, while Table V presents the mean and standard deviation of the continuous variable Age. It can be observed from Table IV that the highest number of crashes (1,252) occurred during the off-peak period, from 10:00 A.M. to 3:00 P.M., while the fewest crashes (176) occurred at night, between 12:00 A.M. and 6:00 A.M. Most of the crashes occurred on Tuesdays, Wednesdays and Thursdays, while Sundays recorded the fewest crashes. The Northwest quadrant of Washington D.C. recorded the highest number of crashes (1,167). Right-angle collisions were the most frequent crash type. Most of the crashes occurred under daylight, clear weather and light traffic conditions. Though most crashes involved no violation on the part of one or both drivers, distracted driving and Stop/Yield sign violation were also reported as comparatively frequent contributing circumstances. Among the drivers involved, 3,936 were male and 2,678 were female.

Of the 3,307 recorded crashes, 1,274 resulted in injury. The rate of injury crashes was highest during the night (41.24%), on Fridays (41%), and in the Northeast quadrant (40.44%). The injury rate was also highest for right-turn collisions (40.69%), at locations without street lights (39.52%), in rainy weather (50.57%), and under light traffic conditions (54.78%). Intersections controlled by Yield signs also recorded the highest rate (70.59%) of injury crashes. This is complemented by the fact that the highest rates of injury crashes resulted from at least one driver's failure to comply with a Stop/Yield sign.
Thus, the contributing circumstance with the highest injury rate (69.94%) was Stop/Yield sign violation. Crashes in which at least one driver was female recorded the highest rate of injury crashes. A correlation analysis was conducted to investigate the relationship between age and injury severity. The results are presented in Table VI. The Spearman's Rho of -0.52 was found to be statistically significant (p=0.03). This implies that the severity of a crash increased with decreasing age of the drivers involved in the crash.

TABLE IV. CRASH FREQUENCIES

| No | Factor | Level | Total Crashes | Injury Crashes | Non-Injury Crashes | Injury Rate (%) |
|---|---|---|---|---|---|---|
| 1 | Period of Day | A.M. Peak | 730 | 296 | 435 | 40.49 |
| | | Off Peak | 1,252 | 466 | 785 | 37.25 |
| | | P.M. Peak | 776 | 298 | 478 | 38.4 |
| | | Evening | 373 | 142 | 230 | 38.17 |
| | | Night | 176 | 73 | 104 | 41.24 |
| 2 | Day of Week | Monday | 265 | 102 | 163 | 38.49 |
| | | Tuesday | 566 | 228 | 338 | 40.28 |
| | | Wednesday | 957 | 371 | 586 | 38.77 |
| | | Thursday | 657 | 243 | 414 | 36.99 |
| | | Friday | 400 | 160 | 240 | 40 |
| | | Saturday | 261 | 90 | 170 | 34.62 |
| | | Sunday | 201 | 80 | 122 | 39.6 |
| 3 | Quadrant | Northwest | 1,167 | 442 | 725 | 37.87 |
| | | Northeast | 858 | 347 | 511 | 40.44 |
| | | Southwest | 226 | 76 | 150 | 33.62 |
| | | Southeast | 984 | 382 | 602 | 38.82 |
| | | Boundary | 72 | 27 | 45 | 39.13 |
| 4 | Type of Collision | Right Angle | 1,338 | 530 | 808 | 39.61 |
| | | Left Turn | 1,217 | 438 | 779 | 39.61 |
| | | Right Turn | 752 | 306 | 446 | 40.69 |
| 5 | Street Lighting Condition | Lights Off | 2,503 | 967 | 1,536 | 38.63 |
| | | Lights On | 680 | 258 | 422 | 37.94 |
| | | None | 124 | 49 | 75 | 39.52 |
| 6 | Lighting Condition | Dark | 757 | 15 | 727 | 2.02 |
| | | Dark (Lighted) | 581 | 193 | 388 | 33.22 |
| | | Day Light | 1,967 | 1,063 | 906 | 53.99 |
| 7 | Weather Condition | Clear | 2,350 | 921 | 1,429 | 39.19 |
| | | Rain | 609 | 308 | 301 | 50.57 |
| | | Snow | 348 | 45 | 303 | 12.93 |
| 8 | Traffic Condition | Light | 2,178 | 1,193 | 985 | 54.78 |
| | | Medium | 808 | 71 | 737 | 8.79 |
| | | Heavy | 321 | 71 | 737 | 8.79 |
| 9 | Traffic Control Type | STOP Sign | 2,504 | 1,066 | 1,450 | 42.37 |
| | | YIELD Sign | 604 | 132 | 55 | 70.59 |
| | | None | 187 | 76 | 528 | 12.58 |
| 10 | Gender of Driver 1 | Male | 1,621 | 419 | 1,202 | 25.85 |
| | | Female | 1,686 | 855 | 831 | 50.71 |
| 11 | Gender of Driver 2 | Male | 2,315 | 1,026 | 1,289 | 44.32 |
| | | Female | 992 | 248 | 744 | 25 |
| 12 | Contributing Circumstance of Driver 1 | No Violation | 1,700 | 869 | 831 | 51.12 |
| | | Alcohol | 159 | 0 | 159 | 0 |
| | | Distracted | 682 | 122 | 560 | 17.89 |
| | | Speed | 430 | 134 | 296 | 31.16 |
| | | STOP/YIELD Sign Violation | 310 | 148 | 162 | 47.74 |
| | | Improper Maneuver | 24 | 2 | 22 | 8.33 |
| 13 | Contributing Circumstance of Driver 2 | No Violation | 1,041 | 7 | 764 | 0.91 |
| | | Alcohol | 160 | 0 | 160 | 0 |
| | | Distracted | 996 | 408 | 588 | 40.96 |
| | | Speed | 276 | 7 | 269 | 2.54 |
| | | STOP/YIELD Sign Violation | 672 | 470 | 202 | 69.94 |
| | | Improper Maneuver | 161 | 112 | 49 | 69.57 |
| 14 | Injury Severity | | 3,307 | 1,274 | 2,033 | 38.52 |

TABLE V. DRIVER AGE STATISTICS

| Factor | Mean | Standard Deviation | Min. | Max. |
|---|---|---|---|---|
| Driver's Age | 42.56 | 15.73 | 14 | 86 |

TABLE VI. AGE-INJURY SEVERITY CORRELATION ANALYSIS

| Factor | Test Statistic (Spearman's Rho) | P-value |
|---|---|---|
| Age of Driver | -0.52 | 0.03 |

B. Spatial Distribution of Crashes
This section presents the results of the spatial analysis of the crashes, performed with the ArcGIS Pro software program. The analysis included the spatial distribution of crashes based on injury severity and a kernel density analysis for injury crashes. The spatial distribution and density of crashes are shown in Figures 6 and 7, respectively. Figure 7 shows that most of the crashes were located in the NW quadrant. This covers the downtown and central business district of Washington DC. Figure 7 also shows that higher densities of injury crashes are in the same region of Washington DC.

Fig. 6. Spatial distribution of crashes [source: ArcGISPro]

Fig. 7. Kernel density of injury crashes [source: ArcGISPro]

C. Results of Classification of Crashes
Twenty-five distinct ANN models were developed using the training dataset. Each model was trained with batches of 3 observations per iteration until the stopping criterion of 100 epochs was met. The performance of each model was then evaluated using the test data set (which constitutes 25% of the total dataset).
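As a rough illustration of this training schedule, the sketch below counts the weight updates implied by the stated settings (3,307 crashes, a 25% test set, batches of 3 observations, 100 epochs). The helper names `split_sizes` and `updates_per_epoch` are hypothetical. Rounding the test set up to a whole observation is an assumption, although it agrees with the 827 observations tabulated in Table IX. Counting the final partial batch as one update is also an assumption.

```python
import math


def split_sizes(n_total, test_fraction):
    """Return (n_train, n_test), rounding the test set up (assumed convention)."""
    n_test = math.ceil(n_total * test_fraction)
    return n_total - n_test, n_test


def updates_per_epoch(n_train, batch_size):
    """Number of weight updates in one pass over the training data.

    A final, smaller batch is counted as one update (assumption).
    """
    return math.ceil(n_train / batch_size)


n_train, n_test = split_sizes(3307, 0.25)   # 3,307 crashes, 25% held out
per_epoch = updates_per_epoch(n_train, 3)   # batches of 3 observations
total_updates = per_epoch * 100             # stopping criterion: 100 epochs

print(n_train, n_test, per_epoch, total_updates)  # 2480 827 827 82700
```

Under these assumptions each model sees 2,480 training observations and performs 827 weight updates per epoch.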
The performance of the models after training and testing is presented in Tables VII and VIII, respectively. The tables list the models explored and the structure of each neural network. The performance measures (accuracy, error rate, sensitivity, precision and F-measure) of each model were computed and are also presented.

TABLE VII. RESULTS OF TRAINING ANN

| Model | No. of Hidden Layers | No. of Neurons | AC | ER | S | P | F |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 20 | 0.9181 | 0.0819 | 0.8995 | 0.8892 | 0.8943 |
| 2 | 1 | 15 | 0.9032 | 0.0968 | 0.9162 | 0.8454 | 0.8794 |
| 3 | 1 | 5 | 0.8649 | 0.1351 | 0.8366 | 0.8170 | 0.8267 |
| 4 | 1 | 3 | 0.8573 | 0.1427 | 0.8461 | 0.7961 | 0.8203 |
| 5 | 2 | 25-20 | 0.9585 | 0.0415 | 0.9455 | 0.9465 | 0.9460 |
| 6 | 2 | 20-25 | 0.9472 | 0.0528 | 0.9435 | 0.9213 | 0.9322 |
| 7 | 2 | 20-15 | 0.9512 | 0.0488 | 0.9874 | 0.8964 | 0.9397 |
| 8 | 2 | 15-20 | 0.9258 | 0.0742 | 0.9539 | 0.8668 | 0.9083 |
| 9 | 2 | 10-15 | 0.9157 | 0.0843 | 0.9529 | 0.8473 | 0.8970 |
| 10 | 2 | 15-10 | 0.9302 | 0.0698 | 0.9445 | 0.8826 | 0.9125 |
| 11 | 2 | 5-10 | 0.8722 | 0.1278 | 0.8785 | 0.8067 | 0.8411 |
| 12 | 2 | 10-5 | 0.9060 | 0.0940 | 0.8953 | 0.8654 | 0.8801 |
| 13 | 2 | 6-3 | 0.8685 | 0.1315 | 0.8628 | 0.8086 | 0.8349 |
| 14 | 2 | 3-6 | 0.8597 | 0.1403 | 0.8440 | 0.8020 | 0.8224 |
| 15 | 2 | 2-2 | 0.8427 | 0.1573 | 0.8304 | 0.7767 | 0.8026 |
| 16 | 3 | 30-20-25 | 0.9516 | 0.0484 | 0.9832 | 0.9170 | 0.9490 |
| 17 | 3 | 25-30-20 | 0.9689 | 0.0311 | 0.9204 | 0.9565 | 0.9381 |
| 18 | 3 | 20-15-20 | 0.9402 | 0.0598 | 0.9916 | 0.8926 | 0.9395 |
| 19 | 3 | 15-20-15 | 0.9404 | 0.0596 | 0.9644 | 0.8916 | 0.9266 |
| 20 | 3 | 15-10-15 | 0.9293 | 0.0707 | 0.9738 | 0.8692 | 0.9185 |
| 21 | 3 | 10-15-10 | 0.9310 | 0.0690 | 0.8995 | 0.8677 | 0.8833 |
| 22 | 3 | 5-10-5 | 0.9115 | 0.0885 | 0.8859 | 0.8270 | 0.8554 |
| 23 | 3 | 10-5-10 | 0.9102 | 0.0898 | 0.9414 | 0.8293 | 0.8818 |
| 24 | 3 | 6-4-2 | 0.9159 | 0.0841 | 0.9058 | 0.8374 | 0.8702 |
| 25 | 3 | 6-2-6 | 0.9237 | 0.0763 | 0.9058 | 0.8547 | 0.8795 |

The accuracy (AC), sensitivity (S), precision (P) and F-measure (F) range from 0 to 1, with values closer to 1 indicating better performance and values closer to 0 indicating worse performance. In contrast, models with error rates (ER) closer to 0 are better than models with error rates closer to 1.
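The five measures reported in these tables follow directly from the four confusion-matrix counts, as in (12) to (16). A minimal sketch is given below. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute AC, ER, S, P and F from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive
    and at least one predicted positive).
    """
    accuracy = (tn + tp) / (tn + fp + fn + tp)      # Eq. (12)
    error_rate = 1 - accuracy                        # Eq. (13)
    sensitivity = tp / (fn + tp)                     # Eq. (14)
    precision = tp / (fp + tp)                       # Eq. (15)
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)  # Eq. (16)
    return {
        "AC": accuracy,
        "ER": error_rate,
        "S": sensitivity,
        "P": precision,
        "F": f_measure,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tn=50, fp=10, fn=5, tp=35)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

With these illustrative counts, 85 of the 100 observations are classified correctly, so AC is 0.85 and ER is 0.15, while S is 35/40 and P is 35/45.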
The results of the analysis in Table VII show that, after training, the accuracy of the models ranged from 84.27% to 96.89%. Model 17 produced the best classification accuracy (96.89%) with a corresponding error rate of 3.11%, while Model 15 produced the worst accuracy (84.27%) with a corresponding error rate of 15.73%. Model 18 had the highest sensitivity (S) of 0.9916, while Model 15 had the lowest sensitivity. With regard to precision (P), Model 17 was the most precise model with a precision of 0.9565, while Model 15 was the least precise. Model 16 recorded the highest F-measure of 0.9490, while the lowest F-measure (0.8026) was recorded by Model 15. The variation of the performance measures across the models is shown in Figure 8.

Fig. 8. Variation of performance measures for training dataset using ANN

Table VIII presents the results of the evaluation of the trained models using the test data set. The results show that the accuracy of the models after testing ranged from 76.54% to 85.62%. Model 22 produced the best classification accuracy (85.62%) with a corresponding error rate of 14.38%, while Model 6 produced the worst accuracy. Model 14 had the highest sensitivity, while Model 16 had the lowest sensitivity. With regard to precision, Model 15 was the most precise model with a precision of 0.7850, while Model 18 was the least precise model with a precision of 0.6882. Model 15 recorded the highest F-measure of 0.7875, while the lowest F-measure was recorded by Model 6. The variation of the performance measures across the models is shown in Figure 9.

TABLE VIII. RESULTS OF TESTING ANN

| Model | No. of Hidden Layers | No. of Neurons | AC | ER | S | P | F |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 20 | 0.8005 | 0.1995 | 0.7900 | 0.7200 | 0.7534 |
| 2 | 1 | 15 | 0.8114 | 0.1886 | 0.7492 | 0.7587 | 0.7539 |
| 3 | 1 | 5 | 0.7896 | 0.2104 | 0.7524 | 0.7164 | 0.7339 |
| 4 | 1 | 3 | 0.8295 | 0.1705 | 0.7806 | 0.7781 | 0.7793 |
| 5 | 2 | 25-20 | 0.7872 | 0.2128 | 0.7210 | 0.7256 | 0.7233 |
| 6 | 2 | 20-25 | 0.7654 | 0.2346 | 0.7116 | 0.6900 | 0.7006 |
| 7 | 2 | 20-15 | 0.7836 | 0.2164 | 0.7304 | 0.7147 | 0.7225 |
| 8 | 2 | 15-20 | 0.7787 | 0.2213 | 0.7179 | 0.7112 | 0.7145 |
| 9 | 2 | 10-15 | 0.7944 | 0.2056 | 0.7586 | 0.7224 | 0.7401 |
| 10 | 2 | 15-10 | 0.7715 | 0.2285 | 0.7429 | 0.6890 | 0.7149 |
| 11 | 2 | 5-10 | 0.8198 | 0.1802 | 0.7680 | 0.7656 | 0.7668 |
| 12 | 2 | 10-5 | 0.7993 | 0.2007 | 0.7524 | 0.7339 | 0.7430 |
| 13 | 2 | 6-3 | 0.8174 | 0.1826 | 0.7774 | 0.7561 | 0.7666 |
| 14 | 2 | 3-6 | 0.8114 | 0.1886 | 0.8276 | 0.7233 | 0.7719 |
| 15 | 2 | 2-2 | 0.8356 | 0.1644 | 0.7900 | 0.7850 | 0.7875 |
| 16 | 3 | 30-20-25 | 0.8440 | 0.1560 | 0.6865 | 0.7252 | 0.7053 |
| 17 | 3 | 25-30-20 | 0.8256 | 0.1744 | 0.7398 | 0.7024 | 0.7206 |
| 18 | 3 | 20-15-20 | 0.8100 | 0.1900 | 0.8025 | 0.6882 | 0.7410 |
| 19 | 3 | 15-20-15 | 0.8300 | 0.1700 | 0.7837 | 0.7163 | 0.7485 |
| 20 | 3 | 15-10-15 | 0.8511 | 0.1489 | 0.7367 | 0.7460 | 0.7413 |
| 21 | 3 | 10-15-10 | 0.8532 | 0.1468 | 0.7712 | 0.7546 | 0.7628 |
| 22 | 3 | 5-10-5 | 0.8562 | 0.1438 | 0.7586 | 0.7586 | 0.7586 |
| 23 | 3 | 10-5-10 | 0.8457 | 0.1543 | 0.7524 | 0.7385 | 0.7453 |
| 24 | 3 | 6-4-2 | 0.8406 | 0.1594 | 0.8119 | 0.7379 | 0.7731 |
| 25 | 3 | 6-2-6 | 0.8340 | 0.1660 | 0.7868 | 0.7233 | 0.7538 |

Fig. 9. Variation of performance measures for testing dataset using ANN

V. DISCUSSION
The study sought to develop classification models to predict the injury severity of angle crashes involving two vehicles at unsignalized intersections using ANNs. A total of 3,307 reported crashes from 2008 to 2015 were extracted from a crash database and used in the analysis. Of the total number of crashes, 1,274 resulted in injury and/or fatality, while the remaining 2,033 were non-injury crashes. The spatial distribution of the crashes showed that the downtown area of Washington DC experienced the highest frequency of crashes. Also, most of the crashes occurred during off-peak periods and under light traffic conditions. Right-angle collisions were the most frequent collision type.
The combination of driver contributing circumstances that most often resulted in injury was a Stop/Yield sign violation by one driver and no violation on the part of the other driver.

The accuracy of the classification models developed using ANN generally tends to increase as the number of hidden layers increases. The highest accuracies were attained by models with three hidden layers. Model 22 was the most accurate (85.62%) for predicting the injury severity of angle crashes at unsignalized intersections. This model has 3 hidden layers with 5, 10, and 5 neurons, respectively. The activation function in the hidden layers is the rectified linear unit (ReLU) function and the activation function in the output layer is the sigmoid function. The confusion matrix of this model is presented in Table IX. It shows that 51.5% of the crashes were correctly classified as non-injury crashes, while 10.3% were wrongly classified as injury crashes. Similarly, 29% of the crashes were correctly classified as injury crashes, while 9.2% were wrongly classified as non-injury crashes. The F-measure is a combined measure of precision and sensitivity. The F-measures of the ANN models generally ranged between 0.7 and 0.8, and the higher values were achieved with two hidden layers. Models 15 and 22 are the most accurate ANN models for predicting the injury severity of angle crashes at unsignalized intersections.

TABLE IX. CONFUSION MATRIX OF MODEL 22

| Total No. of Observations | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 431 | 77 |
| Actual Positive | 77 | 242 |

VI. CONCLUSION AND RECOMMENDATION
In conclusion, the most accurate ANN model for predicting the severity of an injury sustained in a crash has 3 hidden layers with 5, 10, and 5 neurons, respectively. The activation functions in the hidden and output layers are the rectified linear unit (ReLU) function and the sigmoid function, respectively. This research explored the ANN machine learning technique. Future research can explore other techniques such as decision trees, K-nearest neighbors and linear discriminants.
Also, other types of crashes at unsignalized intersections can be explored. Further, these analyses could be extended to signalized intersections.

REFERENCES
[1] T. R. Neuman, R. Pfefer, K. L. Slack, K. K. Hardy, D. W. Harwood, I. B. Potts, D. J. Torbic, E. R. K. Rabbani, National Cooperative Highway Research Program: Guidance for Implementation of the AASHTO Strategic Highway Safety Plan, Transportation Research Board, 2003
[2] World Health Organization, Global Status Report on Road Safety 2015, WHO, 2015
[3] National Highway Traffic Safety Administration, “USDOT Releases 2016 Fatal Traffic Crash Data”, available at: https://www.nhtsa.gov/press-releases/usdot-releases-2016-fatal-traffic-crash-data, 2017
[4] National Highway Traffic Safety Administration, Traffic Safety Facts 2015, US Department of Transportation-National Highway Traffic Safety Administration, 2015
[5] B. J. Russo, P. T. Savolainen, W. H. Schneider, P. C. Anastasopoulos, “Comparison of factors affecting injury severity in angle collisions by fault status using a random parameter bivariate ordered probit model”, Analytic Methods in Accident Research, Vol. 2, pp. 21-29, 2014
[6] R. Garrido, A. Bastos, A. de Almeida, J. P. Elvas, “Prediction of Road Accident Severity Using the Ordered Probit Model”, Transportation Research Procedia, Vol. 3, pp. 214-223, 2014
[7] T. Sayed, F. Rodriguez, “Accident Prediction Models for Urban Unsignalized Intersections in British Columbia”, Transportation Research Record: Journal of the Transportation Research Board, Vol. 1665, No. 1, pp. 93-99, 1999
[8] W. Ackaah, M. Salifu, “Crash prediction model for two-lane rural highways in the Ashanti region of Ghana”, International Association of Traffic and Safety Sciences Research, Vol. 35, No. 1, pp. 34-40, 2011
[9] M. Y. Lau, A. D. May, Accident Prediction Model Development: Signalized Intersections, Institute of Transportation Studies, University of California-Berkeley, 1988
[10] A. Kamer-Ainur, M. Marioara, “Errors And Limitations Associated With Regression And Correlation Analysis”, Statistics and Economic Informatics, pp. 710-712, 2007
[11] P. Chengye, P. Ranjitkar, “Modelling Motorway Accidents using Negative Binomial Regression”, Journal of the Eastern Asia Society for Transportation Studies, Vol. 10, pp. 1946-1963, 2013
[12] Z. Yang, L. Zhibin, L. Pan, Z. Liteng, “Exploring contributing factors to crash injury severity at freeway diverge areas using ordered probit model”, Procedia Engineering, Vol. 21, pp. 178-185, 2011
[13] Federal Highway Administration, “Highway Safety Improvement Program Manual–Safety”, available at: https://safety.fhwa.dot.gov/hsip/resources/fhwasa09029/sec6.cfm, 2011
[14] G. Dutta, P. Jha, A. K. Laha, N. Mohan, “Artificial Neural Network Models for Forecasting Stock Price Index in the Bombay Stock Exchange”, Journal of Emerging Market Finance, Vol. 5, No. 3, pp. 283-295, 2006
[15] M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, 1995
[16] S. Sharma, “Activation Functions in Neural Networks”, available at: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6, 2017
[17] F. R. Moghaddam, S. Afandizadeh, M. Ziyadi, “Prediction of accident severity using artificial neural networks”, International Journal of Civil Engineering, Vol. 9, No. 1, pp. 41-49, 2011
[18] K. S. Jadaan, M. Al-Fayyad, H. F. Gammoh, “Prediction of Road Traffic Accidents in Jordan using Artificial Neural Network (ANN)”, Journal of Traffic Logistics Engineering, Vol. 2, No. 2, pp. 92-94, 2014
[19] Office of the State Superintendent of Education, “New U.S. Census Bureau Numbers Officially Put DC’s Population Over 700,000”, available at: https://osse.dc.gov/release/new-us-census-bureau-numbers-officially-put-dc%E2%80%99s-population-over-700000, 2018
[20] T. Winship, “The 10 US cities with the worst traffic”, available at: https://www.businessinsider.com/the-10-us-cities-with-the-worst-traffic-2018-2, 2018
[21] District Department of Transportation, “DDOT by the Numbers”, available at: https://ddot.dc.gov/page/ddot-numbers
[22] American Society of Civil Engineers, Report Card for D.C.’s Infrastructure, ASCE, 2016