Engineering, Technology & Applied Science Research, Vol. 9, No. 4, 2019, 4574-4580, www.etasr.com
Arhin et al.: Acceptable Wait Time Models at Transit Bus Stops

Acceptable Wait Time Models at Transit Bus Stops

Stephen A. Arhin, Howard University Transportation Research Center, Washington DC, USA
Adam Gatiba, Howard University Transportation Research Center, Washington DC, USA
Melissa Anderson, Howard University Transportation Research Center, Washington DC, USA
Babin Manandhar, Howard University Transportation Research Center, Washington DC, USA
Melkamsew Ribbisso, Howard University Transportation Research Center, Washington DC, USA

Abstract-This study aimed to determine patrons’ acceptable wait times beyond the scheduled bus arrival time at bus stops in Washington, DC, and to develop accompanying prediction models that give decision-makers additional tools to improve patronage. The research relied primarily on a combination of manual and video-based data collection. Manual field data collection was used to survey patrons on their acceptable wait times at bus stops, while video-based data collection was used to obtain bus stop characteristics and operations. In all, 3,388 bus patrons at 71 selected bus stops were surveyed. In addition, operational data for 2,070 bus arrival events on 226 routes were extracted via video playback. Data were collected during the AM peak, PM peak and mid-day periods over nine months, from May 2018 through January 2019. The survey results showed that the minimum acceptable wait time beyond the scheduled arrival time was reported to be 1 minute, while the maximum acceptable wait time was reported to be 20 minutes. Regression analyses were conducted to develop models that predict the maximum acceptable wait time based on factors including temperature, presence of a shelter at the bus stop, average headway of buses, and patrons’ knowledge of bus arrival times.
The models were developed for the A.M., P.M. and mid-day periods. The F-statistics for the models were statistically significant, with p-values < 0.001 at the 5% level of significance. The variance explained by the models ($R^2$) ranged from 64% to 82%. Further, hypothesis testing revealed that although female patrons generally had lower maximum acceptable wait times than male patrons, the mean difference was not statistically significant. However, the mean differences in the maximum acceptable wait time of patrons based on ethnicity were statistically significant at the 5% level of significance. The study revealed that Caucasian patrons have significantly lower maximum acceptable wait times than patrons of other ethnic groups.

Keywords-crashes; unsignalized intersection; artificial neural network; injury severity

I. INTRODUCTION
The wait time at bus stops is one of the primary measures for assessing the reliability of transit services, especially in urban areas. The uncertainty associated with waiting affects bus patrons’ perception of the quality of the service provided. If transit buses arrive at their scheduled times, passengers are less likely to need alternative mode(s) of transportation. However, if buses are chronically late at bus stops, patrons may feel that the bus system is unreliable and are more likely to seek alternative modes of transportation. Studies in this subject area have therefore been of interest to transit service agencies and officials seeking more insight into improving quality of service.

II. LITERATURE REVIEW
A. Wait Time as a Measure of Transit Service Reliability
In assessing the reliability of transit services, transit agencies and officials have used passenger wait times, among other indicators, as a performance measure. Passengers’ perception of transit service quality is affected by wait times.
Wait time is considered an appropriate measure of service reliability for high-frequency routes, where passengers arrive randomly and the average wait time approximates half the headway [1]. For low-frequency services, passengers usually synchronize their arrival at bus stops with the arrival of buses, thus minimizing wait times [2]. Authors in [3] considered waiting cost functions to account for headway and service reliability. The study contends that, by analyzing the behavior of passengers, the cost of waiting can be broken down into two components: the actual mean time spent waiting and the potential waiting time. The potential waiting time is the additional time passengers have to budget for waiting and is determined as the 95th percentile of the waiting time. This has been found to be very sensitive to service reliability. Hence, by minimizing the waiting cost function, service reliability can be improved. A similar conclusion was reached in [4], which analyzed the service reliability of a high-frequency bus line in Helsinki using AVL and APC data. The study found that passengers assessed the reliability of bus services mainly in terms of additional waiting and travel time. It concluded that reducing wait and travel times increases passenger satisfaction, which in turn increases patronage.

Corresponding author: Stephen A. Arhin (saarhin@howard.edu)

B. Relationship Between Waiting Time and Headway
Headway is the time between two vehicles passing the same point while traveling in the same direction on a given route. Several studies have sought to establish the relationship between headway and the waiting times of passengers. One of the earliest studies, which focused (among other issues) on passenger wait times for bus services with short headways, was conducted in 1957 [2].
It concluded that the average waiting time of passengers who arrive randomly at a boarding point is at its minimum when the service is perfectly regular. The following model to estimate average wait time was suggested:

$$\bar{w} = \frac{\sum h^2}{2 \sum h} \tag{1}$$

where $\bar{w}$ is the average wait time and $h$ is the headway (in seconds). A study of passenger behavior on a bus network in Stuttgart (Germany) showed that passengers’ arrival at a bus stop is schedule-dependent when headways exceed 8 minutes. Thus, most passengers synchronize their arrivals with those of the buses, reducing the time spent waiting [5]. Another model, developed in [6], took into consideration the random arrival of passengers during peak periods. The random waiting time, $w_r$, was related to the headway $h$ by (2):

$$w_r = \frac{h}{2}\left[1 + \left(\frac{\sigma}{h}\right)^2\right] \tag{2}$$

where $\sigma$ is the standard deviation of the bus headway $h$. An analysis of passenger wait times and bus headways in Manchester (England) showed a linear relationship between wait time and headway [7]. The findings corroborated the previous study, concluding that the arrival behavior of passengers is schedule-dependent when headways exceed 8 minutes. A higher headway threshold of 12 minutes was, however, established in a study that utilized passenger arrival data in London [8]. Further, a comprehensive review of key elements of service reliability in Boston, Massachusetts revealed that irregular headways lead to variability in expected waiting times [9]. The average wait time of passengers has been estimated to be one-half of the headway. This simple model is valid when the arrival of passengers at the bus stop is random and the headways are regular. However, realistically, these conditions are never satisfied, leading to model inadequacy.

C. Passenger Wait Time Distribution and Modeling
A number of studies have examined the distribution of passenger wait times and developed models to estimate wait time.
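As a brief numerical aside on the headway-based models of Section II.B, the sketch below evaluates both expressions on hypothetical headway samples. It assumes that (1) takes the form Σh²/(2Σh), that h in (2) is the mean headway, and that σ is the population standard deviation. The function names and headway values are illustrative only and are not taken from the cited studies.

```python
from statistics import mean, pstdev


def wait_time_from_headways(headways):
    """Average wait as in (1), assumed form: sum(h^2) / (2 * sum(h))."""
    return sum(h * h for h in headways) / (2.0 * sum(headways))


def wait_time_from_variation(headways):
    """Average wait as in (2): (h / 2) * (1 + (sigma / h)^2), with h the mean headway."""
    h_mean = mean(headways)
    sigma = pstdev(headways)  # population standard deviation of the headways
    return (h_mean / 2.0) * (1.0 + (sigma / h_mean) ** 2)


# Hypothetical headways in seconds: a perfectly regular and an irregular service
# with the same mean headway of 600 s.
regular = [600, 600, 600, 600]
irregular = [300, 900, 300, 900]

regular_wait = wait_time_from_headways(regular)
irregular_wait = wait_time_from_headways(irregular)
irregular_wait_alt = wait_time_from_variation(irregular)
```

Under these assumptions the regular service gives a wait of half the headway, while the irregular service gives a longer wait even though the mean headway is unchanged, which is the effect of headway variability described above.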
Authors in [10] developed arrival distribution curves based on data collected at 28 bus, tram and commuter rail stations in Zurich (Switzerland). The stations were served by scheduled public transit with headways ranging from 2.33 to 30 minutes. The observations were made on weekdays during the morning, evening and mid-day periods. The analysis showed that both passenger arrivals and wait times have a logarithmic relationship with headway. It further concluded that passengers begin to arrive at stations near the scheduled departure times, even for very short headways.

The arrival rate of passengers transferring from rail to buses was fitted to normal, exponential, lognormal and gamma distributions [11]. It was concluded that the lognormal and gamma distributions had the most appropriate fit for passengers transferring directly and non-directly. Similar conclusions were reached in a study conducted in Beijing (China) [12]. In that study, passenger arrival times were fitted to extreme value, exponential, lognormal, gamma and normal distributions. The results showed that the arrival times of passengers at bus stops connected to rail stations were best fitted by the lognormal distribution, while the arrival times of passengers at bus stops not connected to rail stations were best fitted by the gamma distribution.

The distribution of actual and perceived passenger wait times, based on data collected from bus stops in London (UK), was investigated in [13]. The results showed that the actual wait time of passengers followed the gamma distribution, while the perceived wait time followed the lognormal distribution. Also, a study was conducted to develop a multiple linear regression model to predict the perceived wait time of passengers based on data collected at three bus stops in Harbin (China). In all, 234 passengers were surveyed.
Factors considered in the development of the model included gender, level of education, having a time device, presence of a companion, travel purpose, riding frequency, walking time, reserved waiting, waiting mood, waiting behavior, and waiting time interval (morning or evening peak). The significance of the factors in the model was tested at the 5% significance level. ANOVA results showed that gender, level of education, and walking time were not statistically significant predictors of perceived waiting time.

Beyond generalized linear models, other studies have used machine learning techniques to develop passenger wait time models. In [14], artificial neural networks (ANNs) were used to develop passenger wait time models based on data collected from passengers using a high-speed train service in Beijing. The predictors used in the model were trip distance, transport mode, travel time, familiarity with the service facility, and level of education. The architecture of the developed ANN model consists of one input layer with 5 neurons, two hidden layers with 8 and 3 neurons respectively, and an output layer with a single neuron. Sigmoid and purelin transfer functions were used as activation functions in the hidden and output layers respectively. The conjugate gradient method was used as the learning algorithm. The model was trained with a data set of 720 samples and validated with a data set of 336 samples. The developed model predicted passenger wait time with an average error of 9.2%.

III. METHODOLOGY
A. Study Area Description
This research is based on data obtained in the District of Columbia (DC). DC is divided into four (unequal) quadrants: Northwest (NW), Northeast (NE), Southeast (SE), and Southwest (SW), which are further divided into eight (8) Wards. As of 2017, the population of DC was approximately 694,000, with an annual growth rate of approximately 1.41%.
The City is highly urbanized and is ranked as the sixth most congested city in the United States, with each driver spending an average of 63 hours per year in traffic. The Washington Metropolitan Area Transit Authority (WMATA) is the agency that oversees the operations of Metrobus service in the area. WMATA has a fleet of 1,595 buses that make more than 400,000 trips each day. These buses serve about 11,500 bus stops and operate on 325 routes in DC and portions of Maryland and Virginia, covering a total land area of about 1,500 square miles. Of the total number of bus stops, 2,556 (22.2%) have shelters, while the remainder do not.

B. Data Collection
1) Selection of Bus Stops
The study considered 71 bus stops in DC at which bus operational and survey data were collected. Two main types of bus stops were considered: bus stops with and without a shelter. The bus stops were selected based on the criteria of being on routes with longer headways, having high patronage, and being close to a metro rail station. Data collection at the selected bus stops was conducted over nine months, from May 2018 through January 2019. Data were collected during the AM peak (7:00 AM-9:30 AM), PM peak (4:00 PM-6:30 PM) and mid-day (10:00 AM-2:30 PM) periods. Two forms of data collection were performed: a bus passenger survey and bus operational data collection. The data collection schedule was organized to achieve a robust sample size.

2) Survey Data Collection
Passengers waiting for the arrival of the next bus at the selected bus stops were randomly selected and interviewed during the morning, evening and mid-day periods from Monday to Friday. The field researchers conducted the survey using electronic forms on computer tablets and paper questionnaires.
The following information was obtained during the survey: temperature at the bus stop, presence of a shelter at the bus stop, arrival time and gender of passengers, knowledge of bus arrival time, and the maximum and minimum acceptable wait times beyond the scheduled bus arrival time for which the passenger is willing to wait. A total of 3,388 passengers were surveyed over the period of the study. When the minimum number of responses was not obtained during a particular peak period due to weather or low passenger turnout, additional passengers were surveyed on the same day and peak period the following week.

3) Bus Operational Data
Bus operational data were collected at each of the 71 selected bus stops by installing video recording cameras at the bus stops. The video recordings took place on weekdays (Monday to Friday) over a 12-hour duration (6:30 AM to 6:30 PM). The following data were obtained for each bus arrival event during the morning, evening and mid-day periods via video playback:
• Bus arrival time: a bus was determined to have arrived at a bus stop when it came to a complete stop, allowing boarding and alighting.
• Bus departure time: a bus had departed the bus stop when the last passenger had either boarded or alighted and the doors were shut.
From the collected data, bus arrival and departure times were used to compute headway by finding the difference between the arrival time of a bus and that of the preceding bus on the same route. Therefore, the headway was computed as:

$$H_A = AT_A - AT_B \tag{3}$$

where $H_A$ is the actual bus headway, $AT_A$ is the arrival time of bus A, and $AT_B$ is the arrival time of bus B (the preceding bus on the same route). In all, a total of 2,070 bus arrival events on 226 routes were extracted, computed and compiled in an Excel spreadsheet for further analysis.

C. Data Analysis
1) Descriptive Statistics
Descriptive statistics such as frequencies, mean, median, and standard deviation were computed for the bus stop, passenger and bus operational characteristics data.

2) Model Development
To investigate the relationship between the maximum acceptable waiting time and variables such as average headway, knowledge of bus arrival time, presence of shelter, and temperature at the bus stops, linear regression analyses were conducted. Regression models were developed for the A.M., P.M. and mid-day periods. The general regression model for maximum acceptable wait time took the following form:

$$MAWT_i = \beta_{0i} + \beta_{1i}(T) + \beta_{2i}(AH) + \beta_{3i}(KBAT) + \beta_{4i}(PS) + \varepsilon \tag{4}$$

where $MAWT$ is the maximum acceptable wait time, $AH$ the average headway, $T$ the temperature, $KBAT$ the knowledge of bus arrival time, and $PS$ the presence of shelter. $MAWT$ is the dependent variable, while $T$, $AH$, $KBAT$, and $PS$ are the independent variables. The constants $\beta_{ki}$ are the regression coefficients, with an associated error of $\varepsilon \sim N(0, \sigma^2)$ and with $k = 0, 1, \ldots, 4$ for the first, second, third, fourth and fifth regression coefficients respectively. Also, $i = 1$, 2 and 3 for the A.M., mid-day, and P.M. peak periods, respectively. In order to develop a robust model, the variables were tested to ensure they satisfied the assumptions of normality of errors, absence of multicollinearity, and homoscedasticity.

3) Hypothesis Testing
The test statistic primarily used in this study for the comparison is that of the mean. The hypothesis that there is a significant difference in the average MAWT of passengers based on their gender and ethnicity was tested at the 5% level of significance.

4) Difference in MAWT Based on Gender
It is hypothesized that there is a statistically significant difference in the average MAWT based on the passenger’s gender. This is mathematically expressed as:

$$H_0: X_1 = X_2 \tag{5}$$

$$H_A: X_1 \neq X_2 \tag{6}$$

where $X_1$ is the mean MAWT of female passengers and $X_2$ the mean MAWT of male passengers.
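The general regression form in (4) can be illustrated with an ordinary least squares fit. The following is a minimal sketch, not the study's actual procedure: it assumes untransformed predictors, uses NumPy, and runs on synthetic, noise-free data. The function name `fit_mawt_model`, the coefficient values, and the predictor ranges are all hypothetical.

```python
import numpy as np


def fit_mawt_model(t, ah, kbat, ps, mawt):
    """Least-squares fit of MAWT = b0 + b1*T + b2*AH + b3*KBAT + b4*PS."""
    # Design matrix with an intercept column followed by the four predictors.
    X = np.column_stack([np.ones(len(t)), t, ah, kbat, ps])
    beta, *_ = np.linalg.lstsq(X, np.asarray(mawt, dtype=float), rcond=None)
    return beta


# Synthetic data for one peak period (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(30.0, 95.0, n)              # temperature
ah = rng.uniform(5.0, 60.0, n)              # average headway, minutes
kbat = rng.integers(0, 2, n).astype(float)  # 1 if the passenger knows the arrival time
ps = rng.integers(0, 2, n).astype(float)    # 1 if the stop has a shelter

true_beta = np.array([2.0, 0.02, 0.10, -3.0, 1.5])  # hypothetical coefficients
mawt = (true_beta[0] + true_beta[1] * t + true_beta[2] * ah
        + true_beta[3] * kbat + true_beta[4] * ps)

beta_hat = fit_mawt_model(t, ah, kbat, ps, mawt)
```

In this sketch one such fit would be run per peak period (i = 1, 2, 3), each yielding its own coefficient vector.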
5) Difference in MAWT Based on Ethnicity
It is hypothesized that there is a significant difference in the average MAWT based on ethnicity. This is mathematically expressed as:

$$H_0: Y_1 = Y_2 = Y_3 = Y_4 = Y_5 \tag{7}$$

$$H_A: Y_1 \neq Y_2 \neq Y_3 \neq Y_4 \neq Y_5 \tag{8}$$

where $Y_1$-$Y_5$ are the mean MAWTs of African American, Caucasian, Hispanic, Asian and other passengers respectively.

A preliminary analysis of the data to test the parametric assumptions of normality and equality of variance indicated a statistically significant violation of these assumptions. The preliminary analysis showed a log-normal distribution of MAWT across gender and ethnicity, confirming the findings of previous studies. In order to test for statistically significant differences in the MAWT of passengers based on gender and ethnicity, the non-parametric Wilcoxon rank-sum test and Kruskal-Wallis test were used respectively. The Wilcoxon rank-sum test is a statistical analysis used to determine whether there is a significant difference between two independent groups. This method tests the null hypothesis by comparing the ranks of the observations in the two groups to decide whether or not the difference in mean ranks is statistically significant. The statistical significance of the Wilcoxon rank-sum test statistic $W_s$ is determined as follows:

$$\bar{W}_s = \frac{n_1 (n_1 + n_2 + 1)}{2} \tag{9}$$

$$SE_{\bar{W}_s} = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} \tag{10}$$

$$z = \frac{W_s - \bar{W}_s}{SE_{\bar{W}_s}} \tag{11}$$

where $W_s$ is the Wilcoxon rank-sum test statistic, $\bar{W}_s$ the mean of the test statistic, $SE_{\bar{W}_s}$ the standard error of the test statistic, $n_1$ the sample size of the male passengers, $n_2$ the sample size of the female passengers, and $z$ the z-score of the test statistic.
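The computation in (9)-(11) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: W_s is taken as the rank sum of the first group, tied observations receive their average rank, and no tie correction is applied to the standard error. The helper names and sample values are hypothetical and are not drawn from the survey data.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum_z(group1, group2):
    """z-score of the rank sum of group1, following (9)-(11)."""
    n1, n2 = len(group1), len(group2)
    ranks = rank_with_ties(list(group1) + list(group2))
    w_s = sum(ranks[:n1])                            # rank sum of group 1
    w_mean = n1 * (n1 + n2 + 1) / 2.0                # equation (9)
    se = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # equation (10), no tie correction
    return (w_s - w_mean) / se                       # equation (11)


# Hypothetical maximum acceptable wait times (minutes) for two small groups.
z_separated = wilcoxon_rank_sum_z([1, 2, 3], [4, 5, 6])
z_identical = wilcoxon_rank_sum_z([5, 10], [5, 10])
```

The sign of z only reflects which group is listed first, so a two-sided decision would compare its absolute value with the critical value.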
For a significance level set at 5%, z-scores with an absolute value greater than 1.96 are deemed statistically significant.

The Kruskal-Wallis test is used to determine whether there is a significant difference among three or more independent groups. This method tests the null hypothesis by comparing the ranks of the observations in the groups to decide whether or not the differences in mean ranks are statistically significant. The statistical significance of the Kruskal-Wallis test statistic $H$ is determined as:

$$H = \frac{12}{N(N+1)} \sum_{i} \frac{R_i^2}{n_i} - 3(N+1) \tag{12}$$

where $N$ is the total sample size, $R_i$ is the sum of ranks for each group, and $n_i$ is the sample size of each group. $H$ is then compared to a critical value $H_c$, which is obtained from the chi-square distribution. If $H$ is higher than $H_c$, the null hypothesis is rejected.

IV. RESULTS
A. Descriptive Statistics
The mean acceptable passenger wait times are presented in Table I. The descriptive statistics of the bus headways are presented in Table II. The mean headway was measured to be 1,119.5s (18.65 minutes). The minimum headway was measured to be 290.75s (4.83 minutes), while the maximum headway was measured to be 3,500s (58.33 minutes).

TABLE I. MEAN ACCEPTABLE WAIT TIMES
Category      Group            Avg. max. acceptable wait time (minutes)   Avg. min. acceptable wait time (minutes)
Time of day   AM               7.0                                        2.5
              MID              10.5                                       4.0
              PM               7.5                                        3.0
Shelter       Without shelter  7.5                                        3.0
              With shelter     9.0                                        3.0
Gender        Male             8.5                                        3.0
              Female           8.0                                        3.0
Ethnicity     White            7.0                                        2.0
              Black            8.5                                        3.0
              Hispanic         8.3                                        3.0
              Asian            8.4                                        3.5
              Other            8.5                                        3.0
KBAT          No               10.5                                       4.0
              Yes              7.0                                        2.5
Quadrant      NE               8.0                                        3.0
              NW               7.0                                        3.0
              SE               10.0                                       3.0
              SW               9.0                                        3.5

TABLE II. DESCRIPTIVE STATISTICS FOR BUS HEADWAY
Statistic   Value (s)
Mean        1,195.23
Median      1,097.28
Minimum     290.75
Maximum     3,500.00

B. Regression Analysis
This section presents the results of the regression analyses conducted to develop models that predict the MAWTs of bus passengers. Models were developed for the A.M., P.M., and mid-day periods.
Thus, three models were developed. The adequacy and significance of the regression models were tested at the 5% level of significance. The overall performance of the models was evaluated using the p-values of the models’ F-statistics, the $R^2$, and the adjusted $R^2$ values. Also, the statistical significance of the models’ predictors was evaluated using the p-values of the predictors’ F-statistics. In order to achieve the optimal relationship between the dependent variable, MAWT, and the independent variables (temperature, T; average headway, AH; presence of shelter, PS; and knowledge of bus arrival time, KBAT), several curve estimations between the dependent variable and each independent variable were performed. The transformations were necessary to obtain the best relationship between the dependent and independent variables. The expressions used to transform each independent variable are shown in Table III. Logistic and cubic transformations of AH and T respectively resulted in the most favorable relationships with the MAWT, while PS and KBAT remained untransformed. The summaries of the results of the regression analyses are presented in Table IV.

C. Model Testing
1) Kolmogorov-Smirnov (K-S) Test
The results of the K-S tests for MAWT show that the maximum difference D between the cumulative distributions of the predicted and observed MAWTs was, for all the models, less than the critical value of 1.36 at the 5% level of significance. This implies that the models sufficiently predict the observed values.

TABLE III. DATA TRANSFORMATION
Variable   Transformed variable   Selected relationship with dependent variable   Transformation formula
AH         AHTr                   Logistic                                        $f_1(x) = \ln(1/x)$
T          TTr                    Cubic                                           $f_2(x) = x^3$
KBAT       KBAT                   Linear                                          $f_3(x) = x$
PS         PS                     Linear                                          $f_4(x) = x$

2) Normality of Errors
The normality of errors assumption was tested using the normal probability plot. The observed cumulative probabilities of the standardized residuals were plotted against the expected cumulative probabilities of the standardized residuals. The plots showed that the data follow the diagonal lines for all the models, indicating that the errors are normally distributed.

3) Multicollinearity
The test for multicollinearity showed that the variance inflation factors (VIF) of all the variables in the models were less than the maximum value of 10. Thus, multicollinearity between the independent variables is absent.

TABLE IV. SUMMARY OF RESULTS
#   Peak Period   Model   $R^2$   Adj. $R^2$   F-Statistic   Sig.
1   AM   MAWT_AM = −0.40 + (1.07×10