By Nathan Adlam
The primary goal of this analysis is to work with the New York Division of Transportation to improve the efficiency and reliability of the city's school bus system. The city has been experiencing a significant number of bus breakdowns and delays, which inconvenience riders and strain the city's public transportation resources.
My main task is to analyze the provided data to identify patterns and factors that contribute to these breakdowns and delays. Let's get into it.
The data was downloaded as an Excel file and was rather large (282,160 rows). It was not very clean, so some cleaning was necessary before starting the analysis.
To preserve the raw data, all cleaning was done on a copy in a separate sheet. One cleaning step was to manually standardize the names of the bus service companies, since what appeared to be the same company was recorded under several spellings (Philips Bus Service, Philip Bus Service Inc., PHILIP BUS SERVICE, INC, etc.). Another important step was to clean up the delay-time column, which recorded each delay as a range of times rather than a single value. It was split into two columns, a lower bound and an upper bound, to make the data more usable.
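Both of these steps were done by hand in the spreadsheet. For anyone repeating them in code, here is a minimal Python sketch. The alias table, the function names, and the assumption that delay values look like "16-30 Min" are all hypothetical, not taken from the original workbook; values in any other format come back as (None, None) so they can be reviewed manually.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table standing in for the manual fixes: each key is a
# normalized spelling variant, each value the canonical company name.
ALIASES = {
    "philips bus service": "PHILIP BUS SERVICE",
    "philip bus service": "PHILIP BUS SERVICE",
}


def normalize_key(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and drop a trailing 'inc'."""
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    if words and words[-1] == "inc":
        words.pop()
    return " ".join(words)


def canonical_company(name: str) -> str:
    """Map a raw company name to its canonical spelling."""
    key = normalize_key(name)
    # Names without a recorded alias fall back to their upper-cased key.
    return ALIASES.get(key, key.upper())


def delay_bounds(text: str) -> Tuple[Optional[int], Optional[int]]:
    """Split a delay such as '16-30 Min' into (lower, upper) minutes.

    Assumes minute-based values; a single number like '45 min' becomes
    (45, 45). Anything else, e.g. '1/2 hour', returns (None, None).
    """
    if "min" not in text.lower():
        return None, None
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) == 1:
        return numbers[0], numbers[0]
    if len(numbers) == 2:
        return min(numbers), max(numbers)
    return None, None
```

With this sketch, "Philip Bus Service Inc." and "PHILIP BUS SERVICE, INC" both map to "PHILIP BUS SERVICE", and "16-30 Min" splits into (16, 30). The alias table still has to be filled in by a person, which matches the manual nature of the original cleanup.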
The last main cleaning step was to convert a date-time column into the day of the week. After that, the analysis could begin.
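In Excel, a formula such as =TEXT(A2,"dddd") is one way to get the day name from a date-time cell. The same conversion in Python is sketched below; the timestamp layout is an assumption about the export and should be adjusted to the real data, and the function name is illustrative.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "09/08/2015 08:10:00 AM"; adjust it to match
# the actual export before use.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Spelled out so the result does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def day_of_week(timestamp: str) -> str:
    """Return the weekday name for a timestamp string."""
    parsed = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    return DAY_NAMES[parsed.weekday()]  # weekday(): Monday == 0
```

For example, day_of_week("09/08/2015 08:10:00 AM") returns "Tuesday".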
The following questions were the starting point for the analysis.
What are the most common reasons for delays and breakdowns?
How do delay times vary by bus company and borough?
Is there a correlation between specific days of the week and the frequency of breakdowns or delays?
Let's start with the first question: What are the most common reasons for delays and breakdowns?
The vast majority of breakdowns are caused by mechanical problems or "won't start", which also appears to be a mechanical problem.
More frequent maintenance, on a weekly or monthly schedule, could help remedy this.
By far the most common cause of delays is heavy traffic. While traffic itself is not easily remedied, optimizing pickup times and locations could help reduce the time buses spend in it.
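The counts behind this question amount to a frequency table of reasons for each event type. Here is a minimal Python sketch, assuming each record has been reduced to an (event type, reason) pair; the labels "Breakdown" and "Running Late", the function name, and the toy rows are illustrative assumptions, not figures from the dataset.

```python
from collections import Counter


def top_reasons(rows, event_type, n=3):
    """Return the n most common reasons for one event type.

    rows: iterable of (event_type, reason) pairs, e.g.
    ("Breakdown", "Mechanical Problem") or ("Running Late", "Heavy Traffic").
    """
    counts = Counter(reason for kind, reason in rows if kind == event_type)
    total = sum(counts.values())
    # Pair each reason with its count and its percentage share of that event type.
    return [(reason, count, round(100 * count / total, 1))
            for reason, count in counts.most_common(n)]


rows = [
    ("Breakdown", "Mechanical Problem"),
    ("Breakdown", "Mechanical Problem"),
    ("Breakdown", "Won't Start"),
    ("Running Late", "Heavy Traffic"),
]
print(top_reasons(rows, "Breakdown"))
```

Running it separately for each event type gives one ranked list for breakdowns and one for delays.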
The next question: How do delay times vary by bus company and borough?
Let's take a look at a couple of charts.
The chart above displays the 10 bus companies with the highest average delay times. Because each delay is recorded only as a range, every company has both a lower and an upper average. This could be combined with a metric such as the number of breakdowns (relative to the number of trips) to gauge each company's total impact on delays and to review whether it makes sense to continue doing business with them.
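A ranking like this comes from averaging the lower-bound and upper-bound columns per company. Here is a minimal Python sketch, assuming each record is a (group, lower minutes, upper minutes) tuple; the function name and toy records are hypothetical, and ranking by the upper-bound average is an assumption, since the chart could equally be ordered by the lower bound.

```python
from collections import defaultdict
from statistics import mean


def top_by_average_delay(records, n=10):
    """Rank groups by average delay, using the lower/upper bound columns.

    records: iterable of (group, lower_minutes, upper_minutes).
    Returns (group, avg_lower, avg_upper) tuples, highest upper average first.
    """
    bounds = defaultdict(lambda: ([], []))
    for group, low, high in records:
        bounds[group][0].append(low)
        bounds[group][1].append(high)
    averages = [(group, mean(lows), mean(highs))
                for group, (lows, highs) in bounds.items()]
    # Assumption: rank by the upper-bound average; ranking by the lower bound
    # may give a different top n.
    averages.sort(key=lambda row: row[2], reverse=True)
    return averages[:n]


records = [("A", 16, 30), ("A", 31, 45), ("B", 0, 15)]
print(top_by_average_delay(records, n=1))
```

Passing the borough instead of the company as the group gives the borough comparison with the same function.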
The boroughs with the highest delay times are the most densely populated parts of NYC. These areas have the highest concentration of people and vehicles, which would likely account for the longer delays.
And the last question: Is there a correlation between specific days of the week and the frequency of breakdowns or delays?
There is a spike in delays on Monday and a drop on Friday. Since the majority of delays are caused by traffic, a plausible explanation for the Friday drop is that fewer people commute to work that day.
Breakdowns show a similar pattern: the highest percentage occurs on Monday, and the share gradually decreases through the rest of the week. Since breakdowns peak at the start of the week, scheduling some maintenance over the weekend, before buses return to service on Monday, could help prevent some of them.
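The day-of-week comparison boils down to the percentage of events falling on each weekday, computed separately for delays and for breakdowns. Here is a minimal Python sketch, assuming one weekday name per event; the function name and toy data are illustrative. This tabulates frequencies rather than running a formal correlation test.

```python
from collections import Counter

WEEKDAY_ORDER = ("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")


def weekday_shares(days):
    """Percentage of events on each weekday, in calendar order.

    days: iterable of weekday names, one per event. Days with no events
    are omitted.
    """
    counts = Counter(days)
    total = sum(counts.values())
    return {day: 100 * counts[day] / total
            for day in WEEKDAY_ORDER if counts[day]}


shares = weekday_shares(["Monday", "Monday", "Tuesday", "Friday"])
print(shares)
```

Calling it once with the breakdown days and once with the delay days gives the two weekly profiles side by side.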
So that's it for this NYC School Bus Breakdowns and Delays analysis. This was a great exercise with a real dataset, and its findings could add real value to the organization.