Headway/Frequency Estimation

Tom Buckley

2019-08-25

Introduction

This is a brief introduction to the functions in tidytransit that can be used to describe the frequency with which vehicles are scheduled to pass through routes and stops.

Using read_gtfs()

For convenience, passing frequency=TRUE to read_gtfs() adds a routes_frequency dataframe to the list of calculated dataframes in the gtfs object that it returns.

Key Assumptions

By default, read_gtfs assumes that frequency should be calculated for service that happens every weekday, from 6 am to 10 pm.

See the reference for the get_route_frequency() function for other options (e.g. weekends, other times of day).

By Route

View the headways along routes as a dataframe.

head(nyc$.$routes_frequency)
#> # A tibble: 6 x 5
#>   route_id median_headways mean_headways st_dev_headways stop_count
#>   <chr>              <int>         <int>           <dbl>      <int>
#> 1 1                      5             5            0.15         76
#> 2 2                      7            51          135.          120
#> 3 3                      8             8            0.08         68
#> 4 4                      6           115          205.           77
#> 5 5                      9           110          271.          102
#> 6 5X                    48            48            0            29

By Stop

View the headways at stops. stops_frequency is added to the list of gtfs dataframes read in by read_gtfs. Again, by default, frequency is calculated for service that happens every weekday from 6 am to 10 pm. See the reference for the get_stop_frequency function for other options (e.g. weekends, other times of day).

head(nyc$.$stops_frequency)
#> # A tibble: 6 x 6
#>   route_id direction_id stop_id service_id               departures headway
#>   <chr>           <int> <chr>   <chr>                         <int>   <dbl>
#> 1 1                   0 101N    ASP18GEN-1087-Weekday-00        177    5.42
#> 2 1                   0 103N    ASP18GEN-1087-Weekday-00        177    5.42
#> 3 1                   0 104N    ASP18GEN-1087-Weekday-00        177    5.42
#> 4 1                   0 106N    ASP18GEN-1087-Weekday-00        178    5.39
#> 5 1                   0 107N    ASP18GEN-1087-Weekday-00        183    5.25
#> 6 1                   0 108N    ASP18GEN-1087-Weekday-00        183    5.25
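The headway column in this output is consistent with dividing the length of the service window by the number of departures: the default window from 6 am to 10 pm is 960 minutes, and 960 / 177 is about 5.42. The sketch below reproduces that arithmetic in base R. Note that window_headway() is a hypothetical helper written for illustration, not a tidytransit function, and the formula is inferred from the table rather than taken from the package source.

```r
# Hypothetical helper (not part of tidytransit): average headway in minutes
# for a service window, ASSUMING headway = window length / departures.
window_headway <- function(departures, start_hour = 6, end_hour = 22) {
  window_minutes <- (end_hour - start_hour) * 60
  window_minutes / departures
}

# Departure counts copied from the stops_frequency output above.
departures <- c(177, 178, 183)
round(window_headway(departures), 2)
#> [1] 5.42 5.39 5.25
```

The same arithmetic shows why sparsely served stops produce very large values: three departures in the default window give 960 / 3 = 320 minutes.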

Mapping Route Frequencies

You can now map subway routes and color-code each route by how often trains come.

plot(nyc)
#> Calculating headways and spatial features. This may take a while
#> Calculating route and stop headways.

Mapping Stop Frequencies

Before we plot headways at stops, we must join the frequency table to the geometries for the stops.

some_stops_freq_sf <- nyc$.$stops_sf %>%
  left_join(nyc$.$stops_frequency, by="stop_id") %>%
  select(headway)

Then we can plot them.

plot(some_stops_freq_sf)

We will see some outliers for headway calculations in this plot.

In the NYC MTA schedule, a train shows up only a few times a day at a few stops. Since, by default, we calculate headways over the period from 6 am to 10 pm, the average headway for these stops can be as high as hundreds of minutes.

One quick solution to the outlier stops in the plot above is to throw out stops whose headways exceed a reasonable threshold. For example, we can filter out stops with headways of 60 minutes or more.

some_stops_freq_sf <- some_stops_freq_sf %>%
  filter(headway<60)
plot(some_stops_freq_sf)

If you’re interested in how to work with schedules and outlier stops like this, the timetable vignette, included in this package, is a great introduction.

Route Frequency Assumptions

Headways along routes, in the routes_frequency data frame, are based on summary statistics of the frequency with which vehicles pass through the stops in the stops_frequency data frame.

head(nyc$.$routes_frequency)
#> # A tibble: 6 x 5
#>   route_id median_headways mean_headways st_dev_headways stop_count
#>   <chr>              <int>         <int>           <dbl>      <int>
#> 1 1                      5             5            0.15         76
#> 2 2                      7            51          135.          120
#> 3 3                      8             8            0.08         68
#> 4 4                      6           115          205.           77
#> 5 5                      9           110          271.          102
#> 6 5X                    48            48            0            29

The median value for a route will more closely match what a rider might experience along that route. That the median works better than the mean is due to the outlier stops discussed above.
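To see why the median is the more robust summary, here is a minimal base R sketch on toy data. The routes "A" and "B" and their headway values are made up for illustration and are not from the NYC feed. summarise_route() is a hypothetical helper whose output names mirror the routes_frequency columns; it is not the package's implementation.

```r
# Toy per-stop headways in minutes (illustrative values only).
# Route "A" has one sparsely served outlier stop; route "B" has none.
stops <- data.frame(
  route_id = c("A", "A", "A", "A", "A", "B", "B", "B"),
  headway  = c(5, 5, 5, 5, 480, 8, 8, 8)
)

# Hypothetical helper: summary statistics of stop headways for one route.
summarise_route <- function(h) {
  c(median_headways = median(h),
    mean_headways   = mean(h),
    st_dev_headways = sd(h),
    stop_count      = length(h))
}

# One row per route, one column per statistic.
route_summary <- t(sapply(split(stops$headway, stops$route_id), summarise_route))
round(route_summary, 2)
# Route A: median 5, mean 100. Route B: median 8, mean 8.
```

A single outlier stop pulls the mean for route "A" up to 100 minutes, while the median stays at the 5 minutes that most stops on the route see.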

One way we can verify these estimates is by checking against reported headways.

For example, our estimated median headway for the 1 train from 6 AM to 10 PM is 5 minutes. This roughly matches the Wikipedia entry for this train, which reports headways of 3 minutes at rush hour, 6 minutes at mid-day, and 10 minutes at night.
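One way to make this comparison concrete is to convert the reported headways into an expected number of trains over the 6 AM to 10 PM window, and then back into a single average headway. The split of the window into rush-hour, mid-day, and night hours below is an assumption made purely for illustration; it comes from neither the schedule nor the Wikipedia entry.

```r
# Reported headways in minutes, as quoted in the text above.
reported_headway <- c(rush = 3, midday = 6, night = 10)

# ASSUMPTION: hours of each period inside the 6 AM to 10 PM window (sums to 16).
assumed_hours <- c(rush = 6, midday = 7, night = 3)

# Trains expected in each period, then the window-wide average headway.
trains <- assumed_hours * 60 / reported_headway
average_headway <- sum(assumed_hours) * 60 / sum(trains)
round(average_headway, 1)
#> [1] 4.6
```

Under these assumed hours the average works out to about 4.6 minutes, which is close to the estimated median of 5 minutes. A different split of the day would shift the result.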

Specific Days and Times

You might be interested in calculating headways for more specific times of day.

For example, what are rush hour headways like on a specific weekday (2018-08-23)? The set_hms_times and set_date_service_table functions will alter the feed for us, allowing us to filter by date.

nyc <- nyc %>% 
  set_hms_times() %>% 
  set_date_service_table()

Below we pull a service ID for a specific weekday (2018-08-23).

services_on_180823 <- nyc$.$date_service_table %>% 
  filter(date == "2018-08-23") %>% select(service_id)

See the servicepatterns and timetable vignettes for more advice on schedule filtering.

Then we calculate the route frequency in the afternoon rush hour.

nyc <- get_route_frequency(nyc, service_id = services_on_180823, start_hour = 16, end_hour = 19)
#> Calculating route and stop headways.
#> Warning in get_route_frequency(nyc, service_id = services_on_180823, start_hour = 16, : failed to calculate frequency--
#>             try passing a service_id from calendar_df
head(nyc$.$routes_frequency)
#> # A tibble: 6 x 5
#>   route_id median_headways mean_headways st_dev_headways stop_count
#>   <chr>              <int>         <int>           <dbl>      <int>
#> 1 1                      5             5            0.15         76
#> 2 2                      7            51          135.          120
#> 3 3                      8             8            0.08         68
#> 4 4                      6           115          205.           77
#> 5 5                      9           110          271.          102
#> 6 5X                    48            48            0            29
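As a rough illustration of what restricting the calculation to a time window means, the base R sketch below counts toy departures at a single stop between two hours and converts the count to a headway. The departure times are invented, headway_in_window() is a hypothetical helper rather than a tidytransit function, and treating the window as including start_hour but excluding end_hour is an assumption.

```r
# Toy departure times at one stop, as "HH:MM:SS" strings (illustrative only).
departure_time <- c("15:50:00", "16:00:00", "16:06:00", "16:12:00",
                    "17:30:00", "18:54:00", "19:00:00", "19:10:00")

# Hypothetical helper: headway in minutes within [start_hour, end_hour).
# ASSUMPTION: headway = window length in minutes / departures in the window.
headway_in_window <- function(times, start_hour, end_hour) {
  hour <- as.integer(substr(times, 1, 2))
  in_window <- hour >= start_hour & hour < end_hour
  (end_hour - start_hour) * 60 / sum(in_window)
}

# Five departures fall between 16:00 and 19:00, so 180 / 5 = 36 minutes.
headway_in_window(departure_time, start_hour = 16, end_hour = 19)
#> [1] 36
```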

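Again, the median headway for the 1 train roughly corresponds (within 1 minute) to the headways published in the Wikipedia entry. Note, however, that the call above emitted a warning and returned the same table as the default weekday calculation, so this output may not reflect the rush-hour filter.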