Methodology For Analysis of TTC’s Vehicle Tracking Data

Introduction

This article describes the methodology behind posts on this site that use vehicle tracking data from the Toronto Transit Commission to analyze route operations.  This is intended for readers with an interest in the technical process by which the raw tracking data are transformed and presented.

Over time, this process has evolved as new questions have been posed and as new ways to present the data have emerged.  This is not a definitive “way to do it”, but a description of the approaches and techniques I have used.

All monitoring data were provided by the TTC, to which I extend my thanks.  The processes, code and analysis are all my own and have been developed since early 2007.

Note: Several of the sample files here are in .xlsx format and are intended to be opened by Excel or a similar program that will automatically format them in charts and tables.

There is a change log for this article at the end.

The Vehicle Tracking System

The TTC’s vehicle monitoring system, called CIS (Communications and Information System), has been in use for about three decades.  In its original form, the method for locating vehicles was rather primitive and depended on a combination of:

  • Short-range transmitters called “signposts” at many locations in the city.  When a vehicle passed one, it would record the unique code from this signpost and include that value in its position reports.
  • Hub odometers on vehicles to measure distances travelled.
  • The scheduled location of each run on a route.

This system had many limitations, and producing clean data for analysis was challenging.  Each route had its peculiarities about which the less said, the better, but the architecture of the original design compromised the accuracy of the data.

The location resolution by CIS was fraught with problems because it was calculated based on two factors:  the identity of the most recent electronic “signpost” passed by the vehicle, and the distance travelled since the time that signpost was registered.  Notable by its absence was any information about the direction of travel.

Vehicle locations were reported only to the nearest intersection, and logical inconsistencies could occur when CIS lost track of the direction a vehicle was actually travelling or, indeed, whether it was still even on its assigned route.

A vehicle could go off route or make a U-turn, but CIS kept incrementing its position in the “expected” direction of travel.  Only when additional signpost readings confirmed that the vehicle was not where it was expected did CIS start reporting accurate locations.  This produced events I referred to as “teleportations” where vehicles “moved” instantaneously over large distances when their position was corrected.

All vehicles now have GPS units, and the monitoring data include latitude and longitude rather than computed street locations using the signpost, odometer and schedule data.  Although it is possible to get “rogue” readings from GPS errors, these typically are corrected in the next report, 20 seconds later, and the erroneous data can simply be discarded as being too far from the route’s actual location.

The data supplied by the TTC now contain one record per vehicle every 20 seconds, matching the CIS polling cycle in which the central system updates the information from every vehicle in the fleet.  These changes combine to give a much more detailed and accurate view of each vehicle’s location and behaviour.

Looking at the Raw Data

Sample_512_20100101_GPS_Data

This is one page of raw data for January 1, 2010 for the St. Clair 512 route.  The cars in question are leaving Roncesvalles Carhouse to enter service, and if you plug the latitude and longitude information into Google Maps, you will see where the cars were actually located.  The fields are:

  • Date, time, route, run, vehicle, latitude, longitude

Over the years, the exact format of the data supplied by the TTC has varied slightly (the biggest change came with the conversion to GPS), but this is the general idea.  Other data elements are available, but these are all that is needed for service analysis.

A larger set of data (covering the period up to 9:00 am) produces the following when plotted.

Sample_512_GPS_Plots

There are two pages in this example.  The first includes all the data letting Excel automatically set the boundaries of what it displays depending on the data range.  The “J” shaped object near the bottom is the St. Clair route itself (including the path from Roncesvalles to St. Clair via King and Bathurst).  Other data points well beyond are rogue GPS data.  These (and the carhouse trips) will be filtered out by the mapping process.

The second is the same information but with the bounds of the chart set to include only the actual route data.  Clearly visible here is the loop at Roncesvalles Carhouse, the short excursion through Bathurst Station, and the loops on the St. Clair line itself.

Note that although St. Clair Avenue is a more or less straight line, it is not horizontal because the Toronto street grid does not lie along the strict east-west and north-south world grid.

The following plots show a sample of GPS data points from three routes with no underlying map reference.  The bounds of the displays have been chosen to show only the area where each route operates although there are some rogue data points well outside this as in the St. Clair example above.

Sample_504_GPS_Plot

Sample_505_GPS_Plot

Sample_54_GPS_Plot

The outline of a route is quite clear along with common short turn routes, carhouse/garage, and the yards.  Note that for some routes, vehicles can be found on many streets other than their official route.  These data points must be excluded from the analysis.

Mapping The Data

Each route has its own two-dimensional position and form, and this makes a generalized approach to analysis of the raw data quite difficult.  The first step is to transform data from the route-specific GPS information into a more generalized format that can be used for any route.

In these analyses, the route is considered as a straight line regardless of its actual geography.  In effect, if the route were a piece of string on a map, the string is pulled out straight.  This allows every location on the route to have a position value in one dimension rather than two.

In my analyses based on pre-GPS data, each intersection reported by CIS was converted to a position on a route.  For example on the St. Clair car, the loop at St. Clair Station was 0, and the loop at Gunn’s Road was 650.  The scale I used was 100 = 1km.  A vehicle’s location was, at best, accurate to the reported intersection, and each intersection on the St. Clair route had an associated position in the range of 0 to 650.

With the advent of GPS, the position of a vehicle must be translated from global coordinates to the linear system.  For an east-west route, the latitude values (north-south) lie within a narrow band, while the longitude values (east-west) vary along the route.  It is extremely unlikely that any route’s vehicles move strictly in one compass direction because “east west” streets do not lie along lines of latitude.

However, this does not matter as long as a route (or segment of a route) is fairly straight.  For an east-west segment, the longitude of a vehicle can be scaled onto the internal co-ordinate system used for analysis.  As an example, Yonge to Bathurst is 2km, and this would be 200 in the internal system.  The longitudes of the two bounding intersections on an east-west street define the “0” and “200” positions, and everything else is scaled in between.  For north-south segments, the latitude performs the same function.
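
The scaling described above is a simple linear interpolation.  The following Python sketch shows the arithmetic for the Yonge to Bathurst example.  The function name and the two longitude values are illustrative assumptions, not surveyed figures or anything taken from my actual programs.

```python
def scale_to_position(value, value_at_start, value_at_end, pos_start, pos_end):
    """Linearly map a GPS co-ordinate onto the internal one-dimensional scale.

    value_at_start / value_at_end are the longitudes (or latitudes) of the two
    ends of the segment; pos_start / pos_end are the matching internal values.
    """
    fraction = (value - value_at_start) / (value_at_end - value_at_start)
    return pos_start + fraction * (pos_end - pos_start)


# Hypothetical longitudes for the two ends of a 2km east-west segment.
LON_YONGE = -79.3930
LON_BATHURST = -79.4180

# A vehicle halfway between the two intersections (scale: 100 = 1km).
midpoint = scale_to_position(-79.4055, LON_YONGE, LON_BATHURST, 0.0, 200.0)
print(round(midpoint, 1))
```

Because only the ratio within the segment matters, the slight tilt of the street grid away from true east-west does not affect the result for a reasonably straight segment.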

Each section of the route is defined by a “box” with:

  • maximum and minimum latitude and longitude values (the east, west, north and south edges of the box);
  • an indication of whether the primary direction of travel is east-west or north-south (it is also possible for travel to be west-east or south-north on U-shaped routes such as King);
  • the internal co-ordinate values corresponding to the corners of the box defining the route segment.

These boxes may touch at edges, but they do not overlap.  This ensures that any GPS reading is resolved to only one segment of a route.  Each segment has an associated length between the real bounds of the route it encloses.  I have generally used Google Street View’s computation of walking distances to get the separations between individual points on a route.  The segments may not all have the same orientation, but pasted together they represent the route to scale in one dimension.

In practice, things are a bit trickier, and even a straight route like St. Clair must be subdivided for various reasons:

  • Cars running to and from service via Bathurst Street will report longitudes within the range for St. Clair itself, but the vehicles will not actually be on the route.  The situation becomes even worse downtown with many nearby streets on a tight grid where streetcars may be diverting or short turning.
  • Occasionally, a GPS unit will report a spurious location, typically only for one data point.  Cars have reported locations in the wilds of Caledon, Barrie and the middle of Lake Ontario.  These events are rare, but they must be filtered out.
  • At some locations, a route is “wider” than otherwise.  St. Clair West Station is an example where cars travel both north-south and east-west.  Loops and carhouses have this characteristic, and the filtering system must be generous enough to include them while omitting the truly off-route values.

Using 504 King as an example, Roncesvalles Avenue is quite well defined and has no data points nearby other than in the carhouse.  The “east west” section is trickier because cars may be found on nearby streets short-turning.  If only their longitude were considered, they would appear to move back and forth on King when in fact they were not on it at all.  The big loop via Parliament and Dundas similarly needs to be filtered out by defining boxes tightly around the main route segments on Broadview, Queen and King.

This is illustrated in the following chart.

Sample_504_GPS_Boxes

Here the data points are overlaid by the boxes bounding each route segment.  For reference, a line showing the location of Queen Street (orange) has been included to indicate the need to keep each segment’s box close to King Street.

Sample_504_Map_Parms

The preceding table gives the numeric values associated with each box.  In the course of developing a new “map” for a route, some trial and error is required to discover unexpected places where vehicles wander off route and their locations must be filtered out.  This is a particular problem for buses which are not constrained by a track network.

This route-specific table gives the bounds of each segment (the corners of the boxes), and whether the location should be calculated N/S, E/W, S/N or W/E.  For the east-west segments, longitude determines the offset of a vehicle within a box.  For north-south segments, latitude is used.

In practice, building these tables has been the most tedious part of the exercise, but once completed for a route, changes are needed only to accommodate special cases such as long-term diversions.

In the case of King cars short-turning via Parliament/Dundas, cars disappear where they go off route, and reappear when they return.  The same approach can be used for route branches if the behaviour of vehicles on the branch itself is not part of the overall route analysis.

Branching routes require special treatment to isolate vehicle movements on each branch. Obviously a route with branches cannot be described within a one-dimensional range of values.  For analysis, branches are “lopped off” by leaving a gap in the internal values.

For example, in the combined analysis of the Kingston Road services 502 and 503, the internal values for the 502 route (Bingham Loop to McCaul Loop) range from 0 to 1000, while those for the 503 (Don Bridge to York) begin at 1400. The mapping routines treat a large jump in position as a break and do not interpolate travel over the intermediate “ghost” section of the route. Similarly, the conversion of vehicle positions to “as operated” schedules will skip over the break. The effect is just as if there were a long off-route diversion that is not part of the mapped route – vehicles disappear at the branching point and reappear in a separately mapped route segment. (See examples later of this situation.)

Samples for three bus routes:

Sample_36_GPS_Boxes

Sample_54_GPS_Boxes

Sample_100_GPS_Boxes

Finch West 36 is shown for the overall route and in five subsections (from west to east) at a finer scale to illustrate the handling of places where the route has curves and branches.

Flemingdon Park 100 is shown because it is a very twisty route in parts and poses some challenges in conversion to a linear map.  For example:

  • The Linkwood branch is omitted.  Buses taking this route vanish and reappear on the main route’s plot as a result.  This is a comparatively infrequent peak period service.
  • Buses operate in both directions via a “stub” on Concorde Place north of Wynford.  For the purpose of mapping, the entire stub is considered as having one position.
  • Buses operate in one direction only around the Gervais, Eglinton, Don Mills, Wynford loop.  The east and south path (westbound) is given the same co-ordinates as the north and east path (eastbound) and so the effect is as if this loop were flattened.

With this scheme, GPS values can be quickly and easily converted to the internal co-ordinate system.  The process for mapping a vehicle’s location to the internal one-dimensional view of a route is:

  • Scan the table of “boxes” defining each segment of the route to see whether the current point lies inside any of them (in other words, are both the latitude and longitude of the vehicle within the range defined for a specific box?).  If not, this is a rogue data point and it is ignored.  (It could be a GPS error, an off-route garage/carhouse trip, or a diversion.)
  • Each box has a direction associated with it based on which end of the route we consider to be “zero”, and the predominant compass direction of travel.
    • For example, on Broadview Avenue, vehicles move north to south (considering Broadview Station as zero) and so from the subway south to Queen, decreasing latitudes (movement southward) correspond to increasing values in the co-ordinate system.  Queen is, nominally, two km south of Danforth and is therefore “200” on our scale.  Any location in between is scaled from 0 to 200 based on its latitude within the range of the box.  This process is much simpler than trying to calculate actual distance travelled based on the geometry of the street(s) involved.  (Broadview Avenue is actually broken into two sections in the table, and the “zero” point is north of Danforth so that Queen is at “225”.)
    • Other route segments work the same way, but with different ranges of values appropriate to their position on the route.
    • There are several small segments/boxes defined between Broadview and Parliament because Queen and King are quite close together.  A box with Broadview/Queen at its northeast corner and King/Parliament at its southwest would include Queen west of the Don Bridge and data points that we don’t want.
    • Roncesvalles Avenue is south to north, but otherwise the premise is the same with distances being scaled from the south side of the box northward rather than north-to-south as on Broadview.
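
The lookup just described can be sketched in Python as follows.  This is a minimal illustration under stated assumptions, not my production code: the `Box` class, its field names, the two-letter direction codes and the co-ordinates of the sample box are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Box:
    south: float      # minimum latitude
    north: float      # maximum latitude
    west: float       # minimum longitude
    east: float       # maximum longitude
    direction: str    # "NS", "SN", "EW" or "WE": travel direction of increasing position
    pos_start: float  # internal value at the edge where the segment begins
    pos_end: float    # internal value at the opposite edge


def map_point(lat: float, lon: float, boxes: Sequence[Box]) -> Optional[float]:
    """Return the internal position for a GPS reading, or None if it is off route."""
    for box in boxes:
        if box.south <= lat <= box.north and box.west <= lon <= box.east:
            if box.direction == "NS":
                fraction = (box.north - lat) / (box.north - box.south)
            elif box.direction == "SN":
                fraction = (lat - box.south) / (box.north - box.south)
            elif box.direction == "EW":
                fraction = (box.east - lon) / (box.east - box.west)
            else:  # "WE"
                fraction = (lon - box.west) / (box.east - box.west)
            return box.pos_start + fraction * (box.pos_end - box.pos_start)
    return None  # rogue point: GPS error, carhouse trip or diversion


# Hypothetical box for a north-to-south segment running from 0 to 200.
sample_boxes = [Box(south=43.6590, north=43.6770, west=-79.3600, east=-79.3500,
                    direction="NS", pos_start=0.0, pos_end=200.0)]
print(map_point(43.6680, -79.3550, sample_boxes))  # about halfway along the segment
print(map_point(44.0000, -79.3550, sample_boxes))  # None: outside every box
```

In this sketch a point lying exactly on an edge shared by two boxes resolves to whichever box appears first in the table, which is harmless because both boxes give the same internal value at their common edge.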

Creating the Service Graphs

With the GPS data converted to a linear internal co-ordinate system, each vehicle’s data becomes a column in a table.  Each time slot takes one row in the table, with a 20-second increment corresponding to the CIS polling cycle.  For a 24-hour period, this gives a table 4320 (24×180) rows long.  A separate column is used for each vehicle.
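
As a small illustration of this layout, the Python sketch below computes the row number for a given time of day and builds an empty table with one column per vehicle.  The names `slot_index` and `empty_table` are assumptions made for this example only.

```python
SLOT_SECONDS = 20                                # CIS polling cycle
SLOTS_PER_DAY = 24 * 60 * 60 // SLOT_SECONDS     # 24 x 180 rows


def slot_index(hour, minute, second):
    """Row number (0-based) of the 20-second slot containing this time of day."""
    return (hour * 3600 + minute * 60 + second) // SLOT_SECONDS


def empty_table(vehicles):
    """One column per vehicle, one row per slot, all positions initially unknown."""
    return {vehicle: [None] * SLOTS_PER_DAY for vehicle in vehicles}


table = empty_table([4031, 4041])
table[4031][slot_index(7, 12, 40)] = 225.0       # hypothetical reading at 07:12:40
print(SLOTS_PER_DAY, slot_index(7, 12, 40))
```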

In cases where data points are missing (typically caused by out-of-bound readings or by cars that do not respond to every poll), any empty time slots are filled by interpolation between the “good” values.  In the old intersection-based data, these interpolations could cover a considerable distance because location reports might be several minutes apart.  With the GPS data, the gaps are typically only one data point wide except where a car “vanishes” within the downtown canyon.

For gaps of less than two minutes (six 20-second polling intervals), a vehicle position will be interpolated.  For larger gaps, the vehicle is assumed to have gone off route.
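
A sketch of this gap-filling rule in Python follows.  It assumes that a gap is measured as the elapsed time between the “good” readings on either side, and it adds an optional `max_jump` parameter to reflect the rule, described earlier for branching routes, that a large jump in position is a break rather than something to interpolate across.  The function name and parameters are hypothetical.

```python
def fill_gaps(column, slot_seconds=20, max_gap_seconds=120, max_jump=None):
    """Linearly interpolate short runs of missing (None) positions in one vehicle's column.

    A run is filled only if the time between the good values on either side is
    less than max_gap_seconds and, when max_jump is given, the change in
    position does not exceed it.  Longer gaps are left empty (vehicle off route).
    """
    filled = list(column)
    last_good = None
    for i, value in enumerate(column):
        if value is None:
            continue
        if last_good is not None and i - last_good > 1:
            start = column[last_good]
            short_gap = (i - last_good) * slot_seconds < max_gap_seconds
            small_jump = max_jump is None or abs(value - start) <= max_jump
            if short_gap and small_jump:
                step = (value - start) / (i - last_good)
                for k in range(1, i - last_good):
                    filled[last_good + k] = start + step * k
        last_good = i
    return filled


print(fill_gaps([0.0, None, None, 30.0]))   # two missing polls are interpolated
```

The threshold is a parameter here because the appropriate value depends on how one counts the polls at each end of a gap.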

The data are organized by vehicle number rather than by run number because it is vehicles that riders see providing service.  Run numbers are internal to the TTC schedules, and on some routes such as Queen, a car’s run number may be changed due to step back or step forward crewing.  This happens when an operator changes cars to get back on time, and takes their run number with them to the new car.  It is possible to have two or more cars reporting the same run number in the course of the day.

Even with the wonders of GPS, there are still a few oddities that can show up.  For example, a vehicle may actually be stationary, but it keeps reporting locations slightly different from each other on each observation.  This small sawtooth pattern is easy to see in the charts, but it can foul up some types of analysis that depend on deciding when a vehicle has actually turned around.  These are filtered out with a pattern recognition routine, and the oddball data are smoothed out to a better-behaved approximation.
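
The details of the pattern recognition are not spelled out here, but one simple stand-in that conveys the idea is a “deadband” filter: a vehicle is held at its last accepted position until it has moved by more than a small tolerance.  The Python sketch below is only that stand-in, with a hypothetical function name and tolerance, and should not be read as the routine actually used.

```python
def hold_small_jitter(positions, tolerance=1.0):
    """Suppress small back-and-forth movements in a list of positions.

    A reading replaces the held position only when it differs from it by more
    than `tolerance` (1.0 is 10 metres on a scale where 100 = 1km).  Missing
    values (None) are passed through unchanged.
    """
    smoothed = []
    held = None
    for p in positions:
        if p is None:
            smoothed.append(None)
            continue
        if held is None or abs(p - held) > tolerance:
            held = p
        smoothed.append(held)
    return smoothed


print(hold_small_jitter([50.0, 50.4, 49.8, 50.3, 55.0]))
```

The trade-off is that a very slowly moving vehicle will appear to advance in small steps rather than continuously.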

Another problem can arise in that the CIS data stream may contain two reports for the same vehicle at the same time. This is a side-effect of the way that tracking data are polled and reported on 20-second cycles. Sometimes, two reports for the same vehicle appear within one cycle. When this happens, the second value is ignored for purposes of plotting vehicle movements. (This approach is required by the internal data structure in various programs where there is only one “slot” to hold a vehicle’s location for each possible 20-second interval.)
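
The “first report wins” rule is easy to express in code.  This Python sketch assumes each report for one vehicle is a pair of seconds since midnight and internal position, in the order received; the function name is hypothetical.

```python
def first_report_per_slot(reports, slot_seconds=20):
    """Keep only the first report that falls in each 20-second slot.

    reports: iterable of (seconds_since_midnight, position) for ONE vehicle,
    in the order they appear in the data stream.
    Returns a dict mapping slot number -> position.
    """
    slots = {}
    for seconds, position in reports:
        slot = seconds // slot_seconds
        slots.setdefault(slot, position)   # a later report for the same slot is ignored
    return slots


# Two reports land in the slot starting at 07:00:00; the second is dropped.
print(first_report_per_slot([(25200, 10.0), (25205, 10.5), (25220, 11.0)]))
```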

With vehicle movements translated to a series of values in a one-dimensional space, they can be directly plotted for location and time producing a characteristic zig-zag chart first devised in the 19th century to represent railway timetables.  The spacing of the lines shows how close or far apart vehicles are, and the slope of the lines shows the speed of vehicles through various parts of the route.  Vehicles that are stopped show up as horizontal lines, and with the accuracy of GPS data, it is possible to “see” holds for traffic lights and for stops.  Where the stops are farside, two separate stops at an intersection can be resolved to distinguish delays due to stop service time from those caused by the signals.

The following file contains the data for all vehicles for the period 0700 to 0959 on route 504 King for May 1, 2013 in Excel format.

Sample_504_20130501_Mapped_Data

Note that in some cases there are gaps in the data corresponding to periods where cars “vanished” from a route.  For example, between 0712 and 0717, cars 4031 and 4041 appear at Queen & Broadview (roughly 225), travel a short distance (to about 270) and then disappear.  They reappear a bit later (at roughly 180).  These are trips operating into service from Russell Carhouse via Queen, Parliament and Dundas to Broadview because the west-to-north switch at Broadview is out of service pending repairs.

Some columns are empty because the vehicles to which they apply were not in service for the period included in this extract.

Sample_504_20130501_Chart

This chart is one page covering the AM peak for 504 King on May 1, 2013.  It is created directly from the mapped data in the file linked above.

Creating “As Operated” Timetables

Published TTC timetables show the scheduled times vehicles leave their termini, and scheduled running times define when vehicles should pass by various locations along the route.  An “as operated” timetable shows what the service actually looked like on one day.  These tables are not published in the articles, but they are an essential part of headway and link time analysis.

Just as one might see an interurban bus or railway timetable with a list of stations, the “as operated” timetable contains a set of “timepoints” on a route.  These are not the official timepoints from the schedules, but points chosen by me to subdivide the route into sections of interest for analysis. (Traffic engineers might refer to these locations as “screenlines”, but I use “timepoints” because this matches the way that the TTC carves up its routes.)

When a vehicle goes off route for a short turn or diversion, it will usually reappear somewhere else.  This presents a challenge for different types of analysis.

On the service graphs, we may want to see one vehicle as a continuous trace even when it appears to jump (via a diversion) from one point on the line to another.  However, for analysis of headways and link times, we don’t want to “see” that jump or include it as if the vehicle had travelled through the bypassed section of a route.  (This problem arises because a two-dimensional travel path is condensed into one dimension.)

There are two options:

  • Generate two versions of the mapped data, one including the jumps and one omitting them.  This is most easily done by treating the absence of a vehicle for more than some time threshold, or its arrival at a point considerably distant from the last “on route” place we saw it, as breaks.  This is considerably simpler with the GPS-based data and reports (usually) of vehicle locations every 20 seconds than it was with the old CIS data.
  • Avoid definition of “time points” for measuring headways and link times that lie within the span of a commonly-used short turn.  For example, don’t put a timepoint on King at Queen and Broadview but rather use a location north of Dundas or west of Parliament.  Even this approach can be fooled because not all short turns follow the same path.

Each route requires careful choice of the points where headways and link times will be measured to work around the geometry of the route and its common short turns.  However, once the “map” defining the route and its time points is set up and known to behave well, this can be recycled for any new data that comes in for analysis.

Scanning through each vehicle’s data, a program compares the reported locations for each pair of consecutive time intervals with the locations of timepoints.  Remember that by this stage in the process, all of the data have been converted to the single-dimensional internal position values, and we don’t have to worry about the physical latitudes and longitudes.  Various conditions can apply.

  1. The locations do not bracket a timepoint.  This is not a time of interest.
  2. The “before” and “after” locations bracket a timepoint, but do not exactly match its location.  In this situation, an artificial time midway between the two is created.  If the “before” location is reported at the “00” second, and the “after” is at “20”, then the timepoint is assumed to have been crossed on the “10”.  Since the length of one interval is only 20 seconds, this does not seriously misrepresent the exact time a vehicle passed a point.
  3. Either the “before” or the “after” locations match a timepoint.  If “before”, then this gives the time a vehicle left that point.  If “after”, this gives the time a vehicle arrived at that point.
  4. Both locations match a timepoint.  The vehicle is stationary right at the point of interest.
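
The four conditions above can be sketched as a single Python function.  The function name, the event labels and the choice to report the earlier time for a stationary vehicle are assumptions for illustration; times are in seconds.

```python
def crossing_time(t_before, pos_before, t_after, pos_after, timepoint):
    """Classify one pair of consecutive readings against a timepoint.

    Returns (event, time) where event is "cross", "depart", "arrive" or
    "stationary", or None if the readings do not involve the timepoint.
    """
    at_before = pos_before == timepoint
    at_after = pos_after == timepoint
    if at_before and at_after:
        return ("stationary", t_before)          # condition 4
    if at_before:
        return ("depart", t_before)              # condition 3, leaving the point
    if at_after:
        return ("arrive", t_after)               # condition 3, reaching the point
    low, high = sorted((pos_before, pos_after))  # works for either direction of travel
    if low < timepoint < high:
        return ("cross", (t_before + t_after) / 2)   # condition 2: midway time
    return None                                  # condition 1: not of interest


print(crossing_time(0, 95.0, 20, 105.0, 100.0))  # crossed midway through the interval
```

Exact matches are rare with scaled GPS values, so in practice most events fall under the “midway” case.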

Whenever the direction of travel reverses, locations in the internal co-ordinates will descend rather than ascend (or vice-versa).  This usually signals the start of a trip in the opposite direction.  For the purpose of timetable creation, this is the point where a new row in the table is created, but this has its challenges:

  • The rogue sawtooth behaviour in GPS data described above can be troublesome because a vehicle appears to be starting new trips.  Filtering by looking ahead in the data is needed to determine whether a vehicle has actually turned around.
  • Another problem can arise where a route has a large on-street loop around which vehicles may pass in either direction.  The point where a new trip in the opposite direction on a route begins is not necessarily the same as the point where the previous one ended.  An example is the Parliament/Dundas short turn of 504 King which can be used in either direction.  Cars can disappear and reappear at either King/Parliament or Broadview/Dundas.
  • Special considerations apply to locations where a car can legitimately reverse direction without starting a new trip.  For example, at St. Clair West Station, westbound cars run east briefly within the loop as they traverse the loading platform.  Although it is a short distance, this shows up in the GPS data.  Provided that no timepoints are defined within such an area, the timetable creation routine will not be confused by the reversal.  In this example, a timepoint east or west of the loop would behave properly, but not one within the loop itself.
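
One way to avoid being fooled by small reversals is to confirm a turnaround only after the vehicle has retreated a minimum distance from its furthest point.  The Python sketch below uses that threshold approach as an assumed stand-in for the look-ahead filtering; the function name and the threshold value are hypothetical, and the input is assumed to contain no missing values.

```python
def split_trips(positions, min_reversal=5.0):
    """Split a sequence of positions into trips at confirmed turnarounds.

    Returns a list of (start_index, end_index, direction), where direction is
    +1 for ascending internal values and -1 for descending.  A reversal counts
    only once the vehicle is at least min_reversal away from its furthest point.
    """
    trips = []
    start = 0
    extreme = 0      # index of the furthest point reached in the current direction
    direction = 0    # unknown until the vehicle has moved far enough
    for i in range(1, len(positions)):
        delta = positions[i] - positions[extreme]
        if direction == 0:
            if abs(delta) >= min_reversal:
                direction = 1 if delta > 0 else -1
                extreme = i
        elif delta * direction > 0:
            extreme = i                          # still progressing
        elif -delta * direction >= min_reversal:
            trips.append((start, extreme, direction))
            start = extreme                      # new trip begins at the turnaround
            direction = -direction
            extreme = i
    if direction != 0:
        trips.append((start, extreme, direction))
    return trips


# The small dips at indices 3 and 6 do not start new trips.
print(split_trips([0, 10, 20, 19.5, 20, 30, 28, 20, 10, 0]))
```

A short legitimate reversal, such as the one inside St. Clair West Station, would also be ignored by this sketch provided it is shorter than the threshold.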

TTC schedules refer to “up” and “down” trips.  Typically “up” is westbound or northbound, while “down” is eastbound or southbound.  Three versions of the timetable are written into separate files.

  • A complete timetable showing both directions.  This is used to check the overall process, and to debug problems with mapping that might generate unexpected behaviour (typically fragmentary trips).
  • A timetable showing only “up” trips.
  • A timetable showing only “down” trips.

Sample_504_20130501_Schedule

This file is the “as operated” schedule for 504 King for May 1, 2013.  It includes some bus shuttles that covered part of the route in the evening.

The data are organized in vehicle number sequence and by time within each vehicle.  Westbound trips (“up”) read left to right, while eastbound trips (“down”) read right to left.

Car 4000 starts out from Russell carhouse and first appears arriving at Broadview Station.  It makes two round trips to Dundas West Station and then runs in to Russell at the end of the AM peak.  Note the missing information for the downtown section on most trips for 4000 and a few other vehicles.

This is due to problems with these vehicles’ resolution of GPS information that causes them to give rogue information.  The problem lasts long enough that the mapping program has assumed that they went off route.  If this were a more pervasive behaviour (more cars, more locations), it would severely interfere with data analysis (not to mention route monitoring).

Only certain cars are affected and they show up repeatedly in the data like this.

Creating the Headway Charts

The direction-specific timetables give the departure times for each vehicle from each timepoint for one direction’s travel.  Reading “down” the columns of departures and sorting the entries by time gives the actual headways between successive vehicles.  Reading “across” the columns of locations gives the link times between successive locations.

If, for example, all of the departures westbound from Yonge St. are sorted by time, this gives the actual timetable for someone waiting on the platform at that location.  From that, it is easy to calculate the headways between vehicles.

Similarly, by using the “depart” and “arrive” times for any two points in one direction, one can get the link times for each trip.  Sorting by time of day gives the data that becomes the link table.

The calculated headways and link times are placed in a table organized by location and time-of-day.  These data directly create the headway and link time charts.

On busy bus routes with frequent service (e.g. Finch West), it is possible to have multiple vehicles cross a timepoint within the same CIS polling interval (typically when a traffic light turns green and a herd of buses moves off).  In this case, the first vehicle found in the data is assigned the headway to the preceding one, and the others are assigned a headway of zero.
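
The headway calculation at a single timepoint can be sketched in a few lines of Python.  The function name is hypothetical and the times in the example are invented; departures are given as seconds since midnight paired with a vehicle number.

```python
def headways(departures):
    """Compute headways at one timepoint for one direction.

    departures: list of (seconds_since_midnight, vehicle) in the order found
    in the data.  Returns a list of (seconds, vehicle, headway_seconds); the
    first vehicle of the day has no predecessor and is omitted.
    """
    # sorted() is stable, so vehicles with identical times keep their data order:
    # the first gets the headway to the preceding vehicle, the rest get zero.
    ordered = sorted(departures, key=lambda d: d[0])
    return [(cur[0], cur[1], cur[0] - prev[0])
            for prev, cur in zip(ordered, ordered[1:])]


sample = [(25200, 4031), (25560, 4041), (25560, 4050), (25380, 4000)]
for row in headways(sample):
    print(row)
```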

For each timepoint on one day, this process will produce a list of value pairs containing a time and a headway.  These are arranged in time sequence so that when plotted as a scatter diagram in Excel, the points can be joined with a line and can have a trend line interpolated.  The trend line gives a sense of the general movement of data values smoothing out the short-range noise, while the line following the points shows the erratic nature of headway values scattered around the trend.

This noise is a direct result of the variation in headways from vehicle bunching, and the data points tend to occupy a wide band looking more like a cloud than a closely related, well-behaved set of data.  Although the TTC’s goal is to have all service run ±3 minutes to schedule or to headway (depending on which report you read), the actual scatter of the data is well beyond a band six minutes wide.  For frequent routes (headways of 3 minutes or better), of course all headways below the scheduled value are “in bounds” because the lower bound is negative.

The trend lines also show the degree to which data may be quite similar overall despite the variations from day to day.  Headways may follow the same general pattern and average values, but be wildly different at the detail level from one day to the next.

Sample_504_20130501_Headways_WB

This file contains the headway information for “up” or westbound trips on 504 King for May 1, 2013.  Each of the “timepoints” I have defined has a set of columns with vehicle numbers, times, and headways between vehicles.  These data directly generate the following charts.

Sample_504_20130501_Headways_WB_Charts

Creating the Link Time Charts

Link times show the time required for a vehicle to travel from one point to another.  As discussed above, there are arrival and departure times at each “timepoint” for each vehicle.  The elapsed time between a departure from point “n” and arrival at “n+1” gives the link time.

As with headways, link times can also have multiple vehicles starting from a timepoint together, but they probably do not get to the next timepoint as a group, and each vehicle’s time is calculated separately.

The link times are plotted in the same manner as the headways with a trend line to show the overall movement of data values.  Generally speaking, link times are much more tightly clustered around the trend lines than headways.  This indicates that the time needed to get from point “a” to “b” is fairly reliable and predictable in most cases even though the spacing of individual vehicles may be less so.

A special case of link time calculations occurs at terminals.  Layover/recovery times vary immensely both on the schedules and among vehicles on a route.  It is not uncommon to see vehicles leave terminals in a bunch after even the last of the parade has had a few minutes’ rest.  This has nothing to do with schedules or traffic congestion or the myriad other excuses.  It is a simple case that nobody actually dispatches vehicles on a regular headway, and some operators take longer recovery times than are scheduled hoping to make up their time en route.

To analyse the variation in terminal times, a “link” is defined from a timepoint near the terminal (e.g. from just west of Yonge & St. Clair through St. Clair Station and back again).  These “link times” tend to vary much more than the times between points on a route.

Monthly Headway and Link Time Charts

The daily charts show headways and link times for each point along a route while the monthly charts consolidate daily data for each timepoint into one set of charts.  This allows comparison of data for multiple days.

In practice, the data are subdivided into weeks for display purposes so that the number of lines on one chart is kept within reason.  Generally speaking, trend lines will follow similar paths from day to day except when there is a major service disruption.  Events such as construction projects tend to show up for specific links on a route and for a period of time that the event causes congestion and delay.

Having a larger set of data to work with allows for generation of average and standard deviation charts. These are done separately for weekdays, Saturdays and Sunday/Holidays. The results can be thrown off if there was a considerable change in the route’s character during the month under study. For weekend values, there are usually only four or five days participating, and relatively few observations per hour. One oddball day due to major delays, a parade, etc., can distort the calculated average and SD values. Such situations need to be handled on an “as they arise” basis.
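The effect of one oddball day on a small weekend sample can be seen in a short Python sketch. This is not the code behind the charts. The tuple layout and the values are invented, and the use of the population form of the standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def stats_by_day_type(observations):
    """Group values by (day_type, hour) and return (n, average, SD).

    observations: (day_type, hour, minutes) tuples, a hypothetical layout.
    pstdev (population SD) is an assumption; the sample form would be
    statistics.stdev.
    """
    groups = defaultdict(list)
    for day_type, hour, minutes in observations:
        groups[(day_type, hour)].append(minutes)
    return {key: (len(v), mean(v), pstdev(v)) for key, v in groups.items()}


# Invented values: three ordinary Saturday readings and one 25-minute delay.
observations = [
    ("Weekday", 8, 4.0), ("Weekday", 8, 6.0),
    ("Saturday", 8, 5.0), ("Saturday", 8, 5.0),
    ("Saturday", 8, 5.0), ("Saturday", 8, 25.0),
]
stats = stats_by_day_type(observations)
```

With only four Saturday values, the single delayed trip doubles the average from 5 to 10 minutes and pushes the SD from zero to more than 8 minutes.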

Here are samples of the monthly headway and link time charts in their form up to September 2017:

29_201509_101_King_MonthHeadways

29_201505_10107_King_TransitRd_MonthLinks

The first set of charts shows the headways northbound at King & Dufferin for the month of September 2015. Each day has its own set of coloured dots and trend line. In addition, all of the weekdays are presented on one page. This is intended not to show individual data, but the overall behaviour for the month. The more spread out the “cloud” of data points is, the less reliable the service. This is also shown on the three “Stats” tabs which show the averages and standard deviations for each group of days.

The MonthLinks file shows the travel times from King to Transit Road (most of the route, not including the terminal loops) for May 2015. Again, the scatter in the dots gives a sense of the degree of variation, while trend lines for individual days show that not all days (especially on weekends) have the same behaviour.

The “Data” tabs for these charts are generated initially as .csv files and then imported into templates. The .csv versions will be used later in historical analyses (see below).

From fall 2017 onward, I redesigned the templates to add functionality and to simplify the month-to-month transition in the grouping of days into weeks:

  • Calculation of averages and standard deviations formerly performed in the program that digested monthly data was moved to a separate calculations tab (“Data2” in the samples below).
  • The calculation of quartile values was added to allow charting in box-and-whisker format.
  • The column arrangement in the .csv files was standardized so that each block of days representing one week always lies in the same place, as do Saturdays and Sunday/Holidays. This allows a monthly template to be recycled with only minor modification to include or exclude unused days (columns) from the day-by-day charts in each month.
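For readers unfamiliar with the quartile values behind a box-and-whisker chart, here is a minimal Python sketch. In the templates these values are calculated within the spreadsheet, so this is an illustration only. The "inclusive" interpolation method is an assumption, and the sample values are invented.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for one group of values.

    The "inclusive" method is an assumption made for this sketch.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Invented headways in minutes for one hour at one location.
summary = five_number_summary([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

The box spans Q1 to Q3 with the median marked inside it, and the whiskers extend to the minimum and maximum.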

These charts are direct descendants of the pair linked above, but with the benefit of formatting changes to improve both presentation and ease of use with month-to-month shifts in the calendar.

29_202003_101_King_MonthHeadways

29_202003_10106_King_SouthofWilson_MonthLinks

Within the spreadsheet, the “Data” tab contains the original .csv data with a month’s worth of data, while “Data2” contains calculations based on the data. Between them, these two tabs populate all of the charts on other tabs of the spreadsheet.

Headway Adherence

From the table of headways at each timepoint, it is possible to count the number of entries by period and location that fall within various bounds or values.  This, in theory, allows an estimate of headway adherence by the TTC’s formula of ±3 minutes.  However, there is a difficulty here given the actual behaviour of the schedules.

  • The transition from one headway to another occurs at different times for each location and direction of service.
  • These transitions are not necessarily instantaneous, and there may be a period when scheduled headways shift gradually from peak to off-peak (for example).
  • Vehicles that are being short turned do not necessarily contribute to the quality of service.  For example, a King car short turned eastbound at Church may be counted at Yonge, but it provides no useful service to riders waiting there.
  • Vehicles may serve a stop advertising one destination, but be short-turned before they reach a rider’s intended stop.  This will be very annoying depending on the origin-destination pattern of the riders, and the point at which the vehicle advertised its actual destination or the operator announced the change to riders.
  • The actual setting of the destination sign (even the digital ones) is not included in the CIS data stream, and so there is no way to tell where a vehicle claims to be going when it arrives at a stop, let alone to determine when the advertised destination changed.

Although I have published some analyses of the actual behaviour of services as compared with the TTC’s goals, these are tricky to build because of all of the caveats listed here.  This is a project not just for one rainy day.
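Setting the caveats aside, the basic count is simple. The Python sketch below assumes that a single scheduled headway applies to the whole period and reads “±3 minutes” as a band around that scheduled value. Both are simplifications, and the sample headways are invented.

```python
def adherence(headways, scheduled, tolerance=3.0):
    """Share of observed headways within +/- tolerance minutes of scheduled.

    Assumes one scheduled headway for the whole period, which the caveats
    above show is not always true.
    """
    if not headways:
        return None
    within = sum(1 for h in headways if abs(h - scheduled) <= tolerance)
    return within / len(headways)


# Invented observations against a 6-minute scheduled headway.
share = adherence([2, 5, 6, 9, 14], scheduled=6)
```

Here three of the five headways (5, 6 and 9 minutes) fall inside the band. A short turning vehicle would still be counted, which is exactly the weakness described in the list.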

Congestion and Average Speed Charts

It is impractical to chart every small section of a route as a separate link and produce individual local analyses.  General changes in link times over one or two kilometres will show areas of congestion, but for a fine-grained view, a different type of chart is needed.

Remember that the internal representation of vehicle positions was translated from GPS to a linear set of values.  Every 20 seconds this gives a snapshot of the location of each vehicle on a route.  The King route, for example, is nearly 13km one way, and on the internal scale of 1 unit to 10 metres, this means there are about 1300 possible locations for a car at each observation.  In practice, there are at most about 40 cars in service, and so for one set of observations, most locations are empty.  However, if the locations are counted over an hour, then the locations with the most observed cars are also going to be the locations where cars are travelling the slowest or are stopped.

Thinking back to the mapped data table, each row in that table is a snapshot for one moment in time giving the location of each vehicle (if it is active and on the route).  Collectively, 180 rows give one hour’s worth of data.  It is quite straightforward to spin through the entire table counting vehicles by location and hour, but care is needed to only count vehicles moving in the desired direction.
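The counting pass can be sketched in Python as follows. This is an illustration, not the published method. The snapshot layout is hypothetical, and the direction test (remembering the last observed movement of each vehicle so that a car stopped at a location keeps its direction) is an assumption.

```python
from collections import Counter


def count_by_location(snapshots, direction=1):
    """Count observations by (hour, location) for one direction of travel.

    snapshots: (seconds_since_midnight, {vehicle: location}) rows taken
    20 seconds apart, location in 10-metre units (hypothetical layout).
    direction: 1 for increasing location values, -1 for decreasing.
    """
    counts = Counter()
    heading = {}   # vehicle -> last observed direction of movement
    previous = {}  # vehicle -> location in the previous snapshot
    for seconds, positions in snapshots:
        hour = seconds // 3600
        for vehicle, location in positions.items():
            last = previous.get(vehicle)
            if last is not None and location != last:
                heading[vehicle] = 1 if location > last else -1
            # A stopped vehicle keeps the heading from its last movement.
            if heading.get(vehicle) == direction:
                counts[(hour, location)] += 1
        previous = positions
    return counts


# Invented data: car 4000 moves up the scale and sits at location 103.
snapshots = [
    (28800, {4000: 100, 4001: 500}),
    (28820, {4000: 103, 4001: 495}),
    (28840, {4000: 103, 4001: 495}),
    (28860, {4000: 103, 4001: 490}),
    (28880, {4000: 110, 4001: 490}),
]
counts = count_by_location(snapshots, direction=1)
```

Location 103 collects three observations in the 8:00 hour because the car stayed there through three snapshots, while car 4001, travelling the other way, is ignored.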

504_201403W1_WB_VehCounts

These charts, each showing one hour’s operation, show the onset and disappearance of congestion at various points along the route, but with the entire route visible on one chart.  Link time charts cover only a segment and do not break down locations within the segment.

Each stop on a route shows up as a clear spike because vehicles will tend to stop there for multiple 20-second intervals.  Where traffic is seriously congested ahead of a stop or a traffic signal, vehicles will spend more time and there will be higher counts within each unit.

A variation on this type of analysis looks at the change in position between each observation and the next one 20 seconds later.  This gives both the location and speed of a vehicle at a specific time.  When these speeds are averaged over an hour, the result is a chart that complements the congestion plots with low points at or near stops and congested areas, and high points in between.
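On the internal scale, one unit of movement per 20-second interval is 10 metres in 20 seconds, or 1.8 km/h. A Python sketch of the averaging follows. It assumes the snapshots have already been filtered to vehicles travelling in one direction with location values increasing in that direction, and the data layout and values are hypothetical.

```python
from collections import defaultdict

UNIT_METRES = 10       # internal scale: 1 unit = 10 metres
INTERVAL_SECONDS = 20  # spacing between observations


def average_speeds(snapshots):
    """Return {(hour, location): average speed in km/h}.

    snapshots: (seconds_since_midnight, {vehicle: location}) rows for one
    direction only (an assumption made for this sketch).
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of km/h, count]
    for (seconds, now), (_, later) in zip(snapshots, snapshots[1:]):
        hour = seconds // 3600
        for vehicle, location in now.items():
            if vehicle in later:
                moved = later[vehicle] - location
                kmh = moved * UNIT_METRES / INTERVAL_SECONDS * 3.6
                entry = totals[(hour, location)]
                entry[0] += kmh
                entry[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Invented data: one car runs 100 m, dwells for 20 seconds, then creeps 50 m.
speeds = average_speeds([
    (28800, {4000: 100}),
    (28820, {4000: 110}),
    (28840, {4000: 110}),
    (28860, {4000: 115}),
])
```

The car is credited with 18 km/h at location 100, and with an average of 4.5 km/h at location 110 where it was first stopped and then moving slowly.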

The two plots differ in that for the vehicle counts, the height of the spikes varies depending on service levels, while the speed charts are independent of the number of vehicles and make direct comparison between different times of the day quite easy. Stepping through the tabs of the spreadsheet, one can see areas of congestion form and recede by the hour.

504_201403W1_WB_SpeedStats

504_201403W1_EB_SpeedStats

Although the data can be consolidated (say for one set of weekdays, or for all Saturdays), this can mask effects that occur on only a few of the selected days.  For example, routes affected by night club activity tend to run comparatively freely early in the week, or on days with bad weather, but encounter traffic congestion and higher passenger demand on fair weather evenings late in the week.  Similarly, some routes are affected by traffic around shopping plazas or entertainment locations that does not show up every day of the week, or even on every weekend.

Selection of the specific days or periods to chart can be an iterative process using the monthly plots to identify days with generally similar behaviour, and then drilling down to the detail.

The charts above plot the speed profiles as a solid area, but for comparison purposes, lines work better. Here is a more recent example from an article about bus vs streetcar speeds on 501 Queen in November 2019. These charts compare average vehicle speeds eastbound on the western half of the route (Yonge to Roncesvalles).

This spreadsheet has many tabs:

  • 0600 to 2500 contain hourly data plots for each mode
  • 0600D to 2500D contain plots of the “delta” values between the two modes
  • TPData is used to create the vertical lines marking major intersections
  • Headings generates the chart titles
  • Data1 and Data2 contain the two sets of speed data for plotting
  • Delta calculates the difference between the two sets of data for each cell and populates the “D” suffix charts

The intent of this structure is to minimize changes needed to create a new set of comparison charts.

501SpeedStats_201911_EB_West_Comparison

Destination Charts

In some earlier articles, I included plots showing vehicle spacing and destinations.  The intent was to give a sense of the scatter of headways leaving a point and how this was magnified by the short turns (scheduled or otherwise) in service.

The technique used was to look at the “as operated” schedule for the location and direction to be used as a starting point, and then scan across to see the last timepoint where the vehicle was observed.  The combination of the departure time and the actual length of the trip defined a vertical bar, one for each vehicle as shown on the following charts:

200911DestinationsUpGlad

Data are shown for the 501 Queen route westbound from Gladstone for November 1-30, 2009.  The spaces between the bars show the regularity (or not) of the headways.  The level and reliability of service actually reaching each point outbound on the route are revealed by the spacing between bars at each location.

(Apologies for the format which uses numeric location values rather than location-specific grid lines.  When this chart was developed, I had not yet switched to the latter format, and as I have not been generating any in the past few years, I have not updated the presentation.)
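The scan that produces one bar per vehicle can be sketched in Python. The layout of the “as operated” schedule rows is hypothetical, as are the timepoints and times in the example, and this is not the code that produced the chart.

```python
def destination_bars(trips, timepoints, start):
    """Return (departure_time, start_index, last_index) for each trip.

    trips: one {timepoint: "HH:MM"} dict per vehicle trip (hypothetical
    layout of the as-operated schedule).
    timepoints: timepoint names in order of travel.
    """
    first = timepoints.index(start)
    bars = []
    for trip in trips:
        if start not in trip:
            continue  # this trip never passed the starting point
        reached = [i for i, name in enumerate(timepoints)
                   if i >= first and name in trip]
        # The bar runs from the starting point to the last point observed.
        bars.append((trip[start], first, max(reached)))
    return sorted(bars)


# Invented example: one full trip, one short turn, one trip starting later.
timepoints = ["Gladstone", "Roncesvalles", "Humber", "Long Branch"]
trips = [
    {"Gladstone": "08:00", "Roncesvalles": "08:06",
     "Humber": "08:20", "Long Branch": "08:45"},
    {"Gladstone": "08:03", "Roncesvalles": "08:10"},
    {"Roncesvalles": "08:15", "Humber": "08:30"},
]
bars = destination_bars(trips, timepoints, "Gladstone")
```

Each tuple gives the horizontal position of a bar (the departure time) and its vertical extent, so a short turn shows up as a shorter bar.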

Historical View of Headways and Link Times

A fairly common statement about traffic in Toronto is that it is worse than it used to be, and transit vehicles cannot get across their routes as quickly now as in days of yore. This is true in some cases, but an equally common situation is the disruption of a route by a temporary obstruction or diversion. With an accumulation of data from several months, it is possible to build histories of average travel times and headways, as well as SD values for these, to see how things have really evolved and where emerging problems might lie.

The process and chart format are the same for both types of analysis.

The monthly summaries (see above) contain all of the data for a given point (headway charts) or route segment (link time charts) organized by day, type of day and time.

To perform a review of a route over time, all of the monthly data for one point or link is read and consolidated by week and time (on a half hourly or hourly basis), with average and SD values calculated for each time interval for each week. Here are a few examples of the result:

504_105_Yonge_HeadwayHistory_SDC

504_214_BloorWest_HeadwayHistory_SDC

These charts show the evolution of headway values at Yonge westbound and at Bloor/Dundas southbound over the periods for which I have tracking data. Note how often the SD and average values overlap indicating the degree to which vehicles tend to operate in pairs.

504_11213_Jameson_EastofQueensway_LinkHistory_SDC

The section westbound from Jameson to The Queensway shows very large changes in travel time values, especially in the afternoon and evening when outbound motor traffic is attempting to reach (or to bypass) the Gardiner Expressway. The times vary a great deal depending on current conditions including factors such as closed ramps and bridges, traffic signal behaviour and special events that produce heavy traffic flows at times other than the “traditional” peaks.

504_20605_University_Yonge_LinkHistory_SDC

Some areas have quite static travel times seen over a long period with interruptions caused by construction projects. King eastbound from University to Yonge lost the curb lane to construction in late 2013 and this caused substantial increases in travel time.

504_20504_Yonge_Jarvis_LinkHistory_SDC

Travel time eastbound from Yonge to Jarvis is another example of a value that has changed little over time. One visible exception is at the start of September 2014 when the TIFF diversion sent all King service via Queen Street.

The purpose of these charts is both to flag specific weeks for more detailed study, and to verify whether conditions are changing on a long term trend as opposed to short-lived upheavals.

One challenge with data subdivided at this level (especially if done to a half-hour resolution) is that “n” can be fairly small for a specific week and time slot, and a single delay of a few cars on one day can produce a spike in average and SD values. This is a catch-22 between consolidation and reporting at a more detailed level.
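A Python sketch of the weekly consolidation shows why it helps to carry “n” along with each average. The use of ISO week numbers and of the population SD are assumptions made for the sketch, and the sample values are invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def weekly_history(observations, slot_minutes=30):
    """Return {(iso_year, iso_week, slot_start_minute): (n, average, SD)}.

    observations: (datetime, minutes) pairs for one point or link.
    Grouping by ISO week is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for when, minutes in observations:
        iso_year, iso_week, _ = when.isocalendar()
        slot = (when.hour * 60 + when.minute) // slot_minutes * slot_minutes
        groups[(iso_year, iso_week, slot)].append(minutes)
    return {key: (len(v), mean(v), pstdev(v)) for key, v in groups.items()}


# Invented values: two readings in one week, one in the next.
history = weekly_history([
    (datetime(2020, 3, 2, 8, 5), 5.0),
    (datetime(2020, 3, 3, 8, 20), 7.0),
    (datetime(2020, 3, 9, 8, 10), 6.0),
])
```

Keeping the count in each result makes it possible to flag week and time slots where the average rests on only a handful of observations.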

Conversion of Pre-GPS Data

When I started work on this back in 2007, the TTC was not yet using GPS data to record vehicle locations. Instead the tracking system estimated a vehicle’s location relative to specific intersections (mainly those with bus/car stops) through a combination of electronic signposts and hub odometer readings. This arrangement produced many strange behaviours in the data which were only corrected once the GPS info was added to the system.

For purposes of historical comparison, the old format data are converted to GPS by the simple process of matching intersection locations in that data with their GPS equivalents. From that point onward, the data are in a format that the current suite of programs understands.
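In outline, the matching is a table lookup. The Python sketch below uses a hypothetical record layout, and the co-ordinates are rough illustrative values rather than the ones used in the real conversion.

```python
def to_gps(records, intersections):
    """Replace intersection names with GPS co-ordinates.

    records: (vehicle, time, intersection_name) tuples from old-format data
    (hypothetical layout).
    intersections: {name: (latitude, longitude)} lookup table.
    Returns (converted, unmatched) so that unknown names can be reviewed.
    """
    converted, unmatched = [], []
    for vehicle, time, name in records:
        if name in intersections:
            lat, lon = intersections[name]
            converted.append((vehicle, time, lat, lon))
        else:
            unmatched.append((vehicle, time, name))
    return converted, unmatched


# Illustrative values only, not surveyed locations.
intersections = {"King & Yonge": (43.649, -79.378)}
records = [(4101, "08:00:00", "King & Yonge"),
           (4101, "08:02:00", "King & Nowhere")]
converted, unmatched = to_gps(records, intersections)
```

Keeping the unmatched records separate, rather than dropping them, makes it easier to spot intersections that are missing from the lookup table.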

Programming Languages and Data Formats

There is nothing very complex “under the covers” here.  No esoteric programming languages or data structures were used, and the skill, such as it is, lies in understanding the peculiarities of the data and how the information can be consolidated into a standardized format.

All of the raw data are handled in CSV files so that they can exist outside any kind of formal database structure.  The charts are all produced in Excel using templates that have been created specifically to take the CSV files as input.

The programming language is Rexx, something that will be familiar to old hand IBM mainframe folks like me, especially those from a Tech Support background.  This was originally developed as a scripting language for IBM’s VM/CMS environment, but spread from there to many other platforms and is available on an open source basis.  The birthday of Rexx is March 20, 1979.  According to the official history, the name just “sounded nice”, but lore has it that there was a “Rex” pub not far from the lab where its creator worked.  (This one was in Hursley, UK, not on Queen West.)

For more info, visit the Rexx Language Association.  The specific implementation I am using is Open Object Rexx running on Windows.

The Excel charts are generated using Visual Basic macros within Excel to drive the process of populating multiple charts for various days of a month, locations on a route, etc.

No, don’t ask.  I am not prepared to make the code open source.

Change Log

May 24, 2014:

  • Description of calculation of stats (averages, standard deviations, distributions) added.
  • Conversion of pre-GPS data

August 25, 2013:

  • Flemingdon Park 100 added as a sample route to illustrate a very twisting route layout with special mapping challenges.
  • Cutoff time for position interpolation reduced to two minutes to avoid errors at short off-route diversions and short turns.

October 6, 2015:

  • Updated for clarification.
  • Speed and Vehicle Count charts added/updated for new format.
  • Destination chart replaced with .xlsx version.

October 8, 2015:

  • Monthly headway and link time charts.
  • Headway and link time histories.

July 31, 2020:

  • All hotlinks have been changed to open in new tabs.
  • Some of the files linked from this article have been refreshed as they did not make the transition from my old website to the new one cleanly.

August 2, 2020:

  • Samples of the monthly headway and link time charts have been added to show underlying structural and functionality updates that occurred between fall 2017 and early 2020.
  • A sample of comparative travel speeds has been added to show the format for two separate sets of data.

3 thoughts on “Methodology For Analysis of TTC’s Vehicle Tracking Data”

  1. Thank you for this, Steve. I’ve been wondering recently if you would publish something like this.

    I have some questions if you have a minute:

    – Where do you get the CIS data from?

    Steve: From the TTC. I have an arrangement to request data for specific routes and periods. A few years ago, there was an idea to put this stuff online for general access, but it died with the Giambrone era.

    – I wonder if you can elaborate on this “downtown canyon”. Is it because of the tall buildings downtown interfering with certain vehicles’ GPS readings? Also, you say it only affects certain vehicles – so do you maintain a list of, or at least generally know about, which vehicles these are? (I’m not asking for a list of vehicle IDs – only interested in your process.)

    Steve: Yes. The problem is caused by reflections of GPS signals off of the buildings that confuse some readers. Most cars are not affected, but there are a few that show up regularly with wild readings starting roughly at University and ending a bit east of Yonge on King Street.

    Thanks again. It’s great – important, even – to see how you do what you do.

    Steve: Thanks. I know that some of these articles are tedious reading for many, but the documentation of the degree to which fact and fantasy about surface operations diverge is important. The single greatest impediment to attracting riders is service quality.

  2. The CIS data is actually obtainable by anyone. NextBus/TTC allows anyone to access the XML with the vehicle location at any given time. This is how all the apps get their data.

    In order to have a complete set of data, one will need to write a program that fetches the XML data every 20 or 30 seconds for a period of time. By parsing the XMLs, you’ll have all the GPS coordinates of each vehicle on the route. You can have night bus routes and all those lightly used routes too.

    Most app creators would use Java programming language because it has libraries for file fetching from the internet and XML parsing making it ideal for collecting data from NextBus.

    Steve, I applaud you for the box method to map the data. It really takes precise measuring to determine if the vehicle location is on the route or not.

    Steve: Yes, the real time data are available via Nextbus, but when, retrospectively, I want a month’s worth for ten routes, that’s a whole different matter. If the data arrived in XML format, they would be much bigger simply for all the tags. One month’s data for a busy route like King is 170mb after it has been unzipped, and that’s in csv format with no field identifiers.

  3. If you don’t mind, how did you label the stop names on the vertical axis of the service chart? I’ve been trying to google it, but I don’t even know what the function is called. Did you put the labels on using Excel?

    Steve: Ah, yes, that was tricky. The chart page has a “height” in units used to map the service, and each of the stops (and its corresponding horizontal line) has a value in that co-ordinate system.

    In the case of the chart you linked, for King, the height of the chart is 1300 units and Spadina is at 635. Therefore, this line (and its text) go at 635/1300 relative to zero as the horizontal axis. This has to be scaled against the chart’s physical location which comes from the values in the top and height values for ActiveChart.Axes(xlValue) (which just for fun counts down from the top of the chart, not up). The horizontal positioning comes from the Left and Width values for ActiveChart.Axes(xlPrimary).

    For each route, I wrote an Excel macro that spins through the list and converts the offset between the base line and the position of the stop into the page’s co-ordinate system. The text and the line are then drawn in at the proper vertical and horizontal positions on the page relative to the underlying chart. It took a while to get this to work properly, but once it did, the effect was almost magical.

    This is actually done on a blank chart to produce a template for each route with the vertical scale preset to the appropriate height and with the reference lines drawn in. A separate macro populates the daily charts by importing .csv files containing the chart data into the template, and saving each day’s under its appropriate name.

    Visual Basic is not my “first language”, but some of the code was developed by using the “Record Macro” function, performing what I wanted to do manually, and looking at the resulting code to see what Excel was expecting.
