Measuring Service Quality

The quality of service provided on Toronto’s streets and in the subway has been a major, long-running topic on this site.  As reported last week, the TTC has just issued its third quarterly report on surface route reliability relative to a target of scheduled headway ±3 minutes.  They acknowledge that the methodology behind these numbers is flawed, and seek a better way to track reliability from the riders’ point of view.

To that end, the TTC is looking at the “journey time metric” used in London, UK, which tracks an entire trip’s experience including access, waiting and transfer times.  Leaving aside the need to define multiple trips both in location (downtown, suburban, in between) and in time (peak commutes, midday, evening, weekends), I believe that multiple metrics are required to flag problems at a level that is both meaningful and revealing of problem specifics.

What follows is a slightly reworked version of a proposal I made to the TTC recently on this subject.

1. Granularity

Whatever metrics are used, they should be calculated on a disaggregated basis: by time of day, by day of week (including weekends), and by portion of route.  The outliers should be reported.

In other words, if there are 1,000 combinations of route, time period and location, and there are 500 poor performers, then we have a large and pervasive problem.  If there are only 100 really poor performers, then things may not be too bad.  However, if these are concentrated by period or location, this shows an area needing attention.

If we find that only 20% of “Saturday” meets the standard, this is a major issue given the level of weekend demand, but the problem could be submerged in the much more plentiful weekday statistics by quarterly averaging.  Route analyses published on this site show that evening and weekend services in many cases appear to operate with little or no supervision and completely unpredictable service quality.

Major routes should be subdivided for analysis so that good service on the central part of 504 King (say) does not mask problems on outer sections (e.g. west of Dufferin, east of Parliament).

Granular reporting allows the selection of an appropriate metric for the route, time and location.
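
As a rough illustration of how averaging can bury a problem, here is a minimal sketch that scores each combination of route, day type, period and segment separately. It assumes every observation has already been marked as meeting the standard or not. The tuple layout, the sample counts and the 0.7 threshold are invented for the example and are not TTC data.

```python
from collections import defaultdict


def cell_scores(observations):
    """Fraction of observations meeting the standard, per cell.

    Each observation is a hypothetical tuple:
    (route, day_type, period, segment, met_standard).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [met, total]
    for route, day_type, period, segment, met in observations:
        cell = (route, day_type, period, segment)
        counts[cell][0] += int(met)
        counts[cell][1] += 1
    return {cell: met / total for cell, (met, total) in counts.items()}


def poor_performers(scores, threshold):
    """Cells scoring below the threshold, worst first."""
    failing = [(score, cell) for cell, score in scores.items() if score < threshold]
    return [cell for score, cell in sorted(failing)]


# Invented sample: plentiful weekday data, sparse Saturday data.
weekday = ("504", "weekday", "am_peak", "central")
saturday = ("504", "saturday", "afternoon", "west_of_dufferin")
observations = (
    [weekday + (True,)] * 90 + [weekday + (False,)] * 10
    + [saturday + (True,)] * 2 + [saturday + (False,)] * 8
)

scores = cell_scores(observations)
overall = sum(obs[4] for obs in observations) / len(observations)
flagged = poor_performers(scores, threshold=0.7)
```

With these invented numbers the overall figure is about 84%, even though the Saturday cell meets the standard only 20% of the time. Only the per-cell view flags it.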

2. Headway Adherence and On Time Performance

These stats remain valuable, but only with the granular analysis and reporting described above.

Where headways are greater than some policy threshold (say 15 minutes), on time performance (OTP) is more meaningful than headway adherence.  Buses may be “properly” spaced, but they may not arrive when riders expect to see them.  This can foul up both wait times and transfer connections.

This is particularly important for night services, but it can also affect major routes with scheduled short-turns or branches that leave wider headways on the outer sections.  The metric appropriate for a line at Yonge Street in the peak period may not be appropriate for a branch well away from the core especially at evenings and weekends.

Very short headways (bunching) do not necessarily represent acceptable service even if they fall within the 3 minute rule.  Riders will tend to cram on the first bus or streetcar that appears.  If that vehicle is short turned, at least a through vehicle may be close behind.  A related problem is that all vehicles in a bunch may not be equally attractive to riders depending on their destination, even assuming that they all actually stop (buses in packs love to leap frog).
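
The two measures discussed in this section can be sketched as follows. This is only an illustration: the function names and the sample times are invented, the ±3 minute tolerance is reused for OTP purely as an assumption, and the 15 minute policy threshold is the "say 15 minutes" figure from the text rather than an adopted standard.

```python
def headway_adherence(arrivals, scheduled_headway, tolerance=3):
    """Share of observed headways within scheduled_headway +/- tolerance (minutes)."""
    headways = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    ok = sum(1 for h in headways if abs(h - scheduled_headway) <= tolerance)
    return ok / len(headways)


def on_time_performance(actual, scheduled, tolerance=3):
    """Share of trips arriving within tolerance minutes of the scheduled time."""
    ok = sum(1 for a, s in zip(actual, scheduled) if abs(a - s) <= tolerance)
    return ok / len(scheduled)


def preferred_metric(scheduled_headway, policy_threshold=15):
    """Pick OTP for wide headways, headway adherence otherwise."""
    return "otp" if scheduled_headway > policy_threshold else "headway"


# Vehicles running in pairs on a 4 minute scheduled headway (invented times,
# in minutes past the hour): gaps alternate between 1 and 7 minutes.
paired_arrivals = [0, 1, 8, 9, 16, 17]
paired_score = headway_adherence(paired_arrivals, scheduled_headway=4)

# A 20 minute service where the middle trip runs 6 minutes late.
late_score = on_time_performance(actual=[0, 26, 40], scheduled=[0, 20, 40])
```

In this example the paired service scores 100% on the ±3 minute rule even though riders see two vehicles together followed by a 7 minute gap, which is the bunching problem described above.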

3. Proportion of Service Operated

How many trips actually operate at a specific location and time period versus what is on the schedule?  If all trips are present, but either the headway or OTP metric is low, then we know that it’s line conditions (traffic, operation, management) that are the likely culprits.  If, however, trips are missing, this will cascade into wider headways and trips that are not just off schedule, but completely absent.

On frequent services, a missing vehicle (including short turns) may not trigger an exception to the headway metric.  Half of the peak scheduled subway service could be missing and the route would still score 100% on the 3 minute rule.  (The scheduled headway is 2’20”, and half the service would be 4’40”.  The service provided would fall within the span of 0’00” to 5’20” allowed by the metric.)
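
The subway example can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are invented, headways are in seconds, and the hourly trip counts are hypothetical round numbers chosen only to show a service with every second train missing.

```python
def proportion_operated(trips_operated, trips_scheduled):
    """Share of scheduled trips that actually ran."""
    return trips_operated / trips_scheduled


def passes_headway_rule(observed_s, scheduled_s, tolerance_s=180):
    """True if the observed headway is within the scheduled headway +/- tolerance."""
    lower = max(0, scheduled_s - tolerance_s)  # a headway cannot be negative
    upper = scheduled_s + tolerance_s
    return lower <= observed_s <= upper


scheduled_headway_s = 2 * 60 + 20             # 2'20" scheduled headway, from the text
observed_headway_s = 2 * scheduled_headway_s  # every second train missing

headway_ok = passes_headway_rule(observed_headway_s, scheduled_headway_s)

# Hypothetical trip counts for one hour at one location.
operated = proportion_operated(trips_operated=13, trips_scheduled=26)
```

The doubled headway still passes the 3 minute rule, while the proportion of service operated shows that half of the trips never ran. The two numbers answer different questions, which is why both are needed.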

4. Consolidated Metrics

By now, readers familiar with such management tools as “scorecards” will know that the amount of detail this process describes will be substantial, and the typical manager’s eyes will glaze over even at the thought of a chart or spreadsheet with myriad details.  Managers love to reduce their worlds to simple, one-dimensional numbers, or even something as simple as a traffic signal.  This sort of attitude drives those of us who know that the system fails or succeeds at the detail level absolutely bonkers.

Having calculated all of the details, the goal should be to meet a compound standard but still at a granular level by location.  For example, did service operate to reasonable headways and at the scheduled trip count?

Exception reporting is essential here.  Assuming that service is as good as TTC often claims it to be, then the number of instances of routes failing to meet criteria should be small.  These are the ones to flag for attention, especially if they show up again and again, or if they score poorly on a consistent basis over a long period.  Conversely, if only a handful of routes manages to stay off of the “bad behaviour list” and every edition of this report lands with a thud (real or virtual), then there is something very wrong.
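
One way a compound standard and a "bad behaviour list" might be put together is sketched below. The cell labels, the scores and both thresholds are placeholders invented for the example. In particular the 0.95 minimum share of trips operated is an assumption, not a TTC figure.

```python
from collections import Counter


def meets_compound_standard(headway_score, share_operated,
                            min_headway_score=0.7, min_share_operated=0.95):
    """True only if both the headway score and the trip count are acceptable."""
    return headway_score >= min_headway_score and share_operated >= min_share_operated


def repeat_offenders(reports, min_failures=2):
    """Cells that fail the compound standard in at least min_failures reports.

    reports is a list of dicts, one per reporting period, mapping a cell
    label to (headway_score, share_operated).
    """
    failures = Counter()
    for report in reports:
        for cell, (headway_score, share_operated) in report.items():
            if not meets_compound_standard(headway_score, share_operated):
                failures[cell] += 1
    return sorted(cell for cell, count in failures.items() if count >= min_failures)


# Invented scores for three reporting periods.
reports = [
    {"504 west / Sat pm": (0.55, 0.90), "504 central / am peak": (0.85, 0.99)},
    {"504 west / Sat pm": (0.60, 0.97), "504 central / am peak": (0.65, 0.99)},
    {"504 west / Sat pm": (0.75, 0.80), "504 central / am peak": (0.90, 1.00)},
]
chronic = repeat_offenders(reports)
```

Here the central section slips once and is not flagged, while the outer Saturday cell fails in every period, once because of missing trips alone, and lands on the list.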

If the manager wants a simple metric, it should be to get that report down to one page, and not just by printing in teensy-weensy type on the largest available sheet of paper.

5. Ridership and Crowding

If a service is overcrowded, this is extremely unpleasant for riders and deters transit use.  Riders may face full vehicles they cannot board, and even when they do squeeze on, their trip is in less than ideal circumstances.  TTC Service Standards were relaxed as a budgetary measure, albeit a one-time fix that cannot be repeated, and what small amount of surplus capacity that had been designed into the bus network was removed.  This leaves no room for growth, but keeps the budget hawks at Council happy with the supposed “efficiency” of the transit system.  How this is supposed to attract motorists to transit is a mystery.

The TTC takes riding counts regularly, and is supposed to be equipping vehicles with automatic passenger counters to aid in the frequency and granularity of counts.  This information should be reported, and it should not be consolidated into average loads per peak hour.  If vehicle loads are uneven (as they often are with poorly spaced service), then an average load will mask what riders actually experience.  Most riders are not on the half-empty vehicles.  Would-be riders who give up and walk (or take a taxi) are not counted at all.
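
The gap between the average load per vehicle and the load a typical rider experiences can be shown with a small calculation. This is a sketch with invented loads for vehicles running in pairs. The function names and the idea of weighting each vehicle by its own riders are offered as one possible approach, not as the TTC's method.

```python
def average_vehicle_load(loads):
    """Plain average of riders per vehicle."""
    return sum(loads) / len(loads)


def average_load_experienced(loads):
    """Average load seen by a rider: each vehicle is weighted by its own riders."""
    return sum(load * load for load in loads) / sum(loads)


# Invented loads for four vehicles running as two bunched pairs.
loads = [80, 20, 80, 20]

per_vehicle = average_vehicle_load(loads)
per_rider = average_load_experienced(loads)
share_on_crowded = sum(load for load in loads if load >= 80) / sum(loads)
```

With these numbers the average vehicle carries 50 riders, but the average rider is on a vehicle carrying 68, and 80% of riders are on the crowded vehicles. Riders who gave up and walked still do not appear anywhere in the calculation.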

The fix may be better line management, traffic priority, more service or a combination of all three.  With the political focus on big-ticket expansion programs, attention must be drawn to the shortcomings of the service provided today.

6. Journey Time

It will be impossible to construct every possible variation for trips in the network, but a set of representative journeys will certainly give a high level view.  These journeys must reflect not just the peak period, core oriented trips for which the system is optimized.  Suburb-to-suburb trips, midday trips, evening and weekend trips all need to be considered.

There is a “Catch 22” here.  Suppose that a journey involves a walk to a transit route, a wait for a bus, a ride to a subway terminal, a walk through that terminal to the train platform, a wait for a train, a ride when the train arrives, and finally a walk from the final station to one’s destination.  Transit planners know that these components are perceived very differently by riders.

For example, wait time, especially for an unpredictable service, is poisonous to the view of a convenient service.  There is anxiety associated with uncertainty about a vehicle’s arrival and the rider’s ability to board.  The time is not spent doing anything productive (e.g. travelling), and the environment may be less than ideal even in a subway station.  Wait times will be compounded, of course, if the service is overcrowded because a rider may not be able to board the first train.  This sort of problem needs to be included in the construction of the metric.

Conversely, some parts of the trip (walking, riding) may have fairly consistent values, and these could swamp large swings in the more annoying components of waits and transfers.
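
A small sketch shows how the steady components can dilute the waits, and how a weighting on wait time changes the picture. The trip components and minutes are invented, and the weight of 2.0 on waiting is purely an illustrative assumption, not a value taken from London or from the TTC.

```python
def journey_minutes(components, weights=None):
    """Total journey time; each component kind may carry a perception weight."""
    weights = weights or {}
    return sum(minutes * weights.get(kind, 1.0) for kind, minutes in components)


# Trip components as (kind, minutes), following the bus-to-subway example.
good_day = [("walk", 5), ("wait", 4), ("ride", 15), ("walk", 3),
            ("wait", 2), ("ride", 20), ("walk", 5)]
bad_day = [("walk", 5), ("wait", 12), ("ride", 15), ("walk", 3),
           ("wait", 6), ("ride", 20), ("walk", 5)]

WAIT_PENALTY = {"wait": 2.0}  # assumed weight, for illustration only

raw_increase = journey_minutes(bad_day) / journey_minutes(good_day) - 1
weighted_increase = (
    journey_minutes(bad_day, WAIT_PENALTY) / journey_minutes(good_day, WAIT_PENALTY) - 1
)
```

Both waits triple on the bad day, yet the unweighted journey time rises by only about 22%. With the assumed weight the increase is 40%, which is still far less dramatic than what the rider standing at the stop experienced.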

While journey times might provide another way of looking at service quality, they are not a substitute for tracking service behaviour at the detailed level.


Some of the service quality metrics will interact and fixing one problem may bring improvement in multiple values.  However, if we don’t know the details in time and location, and only vaguely sense that “the Finch West bus is a mess”, this feeds the TTC culture where “traffic congestion” and a “not our fault” attitude prevail.

Some issues — budget allocations for service and fleet levels, transit signal priority, parking bylaws and towing policies to keep streets clear — do require external assistance and difficult policy decisions by City Council.  Average reports over three months’ operation do not, however, pinpoint the problems or show the degree to which they are shared across the city.

Any metrics describing TTC service must be at a level riders can understand.  Too much consolidation, whether it is in time, location, or trip type, will hide valuable information that riders know from first-hand experience.

A goal that service will meet a standard 70% of the time is laughable.  Such a goal will guarantee that, on average, a typical 10 trip a week commuter will have 3 trips that don’t meet standard.  This could be a small variation, or a major disruption.  Averaged over three months, that 3-in-10 will be much worse on bad weather days and on days when unusual events disrupt the system.  An entire week without a problem will be rare.
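
The arithmetic behind that claim can be sketched in a few lines. The calculation assumes each trip meets the standard independently with the same 70% chance, which is a simplification: as noted above, failures cluster on bad days, so real weeks will vary more than this suggests. The function names are invented for the example.

```python
def expected_failures(trips, success_rate):
    """Average number of trips per week that miss the standard."""
    return trips * (1 - success_rate)


def chance_of_clean_week(trips, success_rate):
    """Probability that every trip meets the standard, assuming independent trips."""
    return success_rate ** trips


failures = expected_failures(trips=10, success_rate=0.7)
clean_week = chance_of_clean_week(trips=10, success_rate=0.7)
```

Under that assumption the commuter expects about 3 substandard trips a week, and the chance of a week with none at all is under 3%.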

That 3-in-10 goal has a long history at the TTC where the earlier metric — schedule adherence within three minutes — was the target.  The service could only manage a 70% rating a good deal of the time, and the “target” was set to match actual behaviour.  That’s no way to improve, only a way to say “we didn’t get any worse this week”.

Management has hidden behind the averages for too long.  Granular reporting and multiple metrics are needed to ferret out the problems.  Don’t report every item, but without the details (visible in public reports), the value of any measurement exercise is dubious.

Much of this information can be calculated retroactively from vehicle tracking data and this would allow transition to new metrics by seeing the effect of a new scheme on periods for which old-style reports are already available.  A trial on a few major routes could be used to evaluate various options and to understand how the new metrics behave.

The TTC keeps talking about its customers and how much more responsive they want to be to riders’ concerns.  This means more than cleaning up subway stations.  The organization needs to focus on the product they are actually selling — service.

9 thoughts on “Measuring Service Quality”

  1. Can you tell us whether your proposal to the TTC was well received and if they are going to use at least some of your excellent ideas? Knowing how their service REALLY works may be a bit of an embarrassment to them but knowing where the problems are is the only way to solve them.

    Steve: I have yet to receive a reaction. It was only sent to them on the weekend, and I decided that it deserved a wider audience.


  2. Transit signal priority eh … oh that’s a problem on the YUS Number 1 line! Signals don’t always function. So that’s why there are delays every single day!

    There were a few times on the 42 Cummer bus where drivers were nice to wait for me as I ran to the bus stop. So nice of them.


  3. Hi Steve, it’s only a matter of time before on-time performance is measured as the variance between predicted arrival time and actual arrival time. Scheduled times are becoming irrelevant (from a traveller’s perspective) in a world of real time information.

    Steve: I do not agree. Just because you hit the “predicted” time does not mean that the service is reliable. Moreover, if the predicted time is 20 minutes from now on a route that should run every 10 minutes, there is a big problem. For example, today I made a connection between the Queen car and the Coxwell bus (filling in for the Kingston Road streetcar service). For the second time in a week, when I looked at Nextbus, there were TWO buses at Coxwell Station. All of us waited for a double-headway gap. Yes, I knew what was going on, but if it had been a stormy day, I would not have been amused at an unnecessary wait.


  4. I did not write that hitting the predicted time means a service is reliable. What I meant was that there is an alternate way to consider on-time performance from a user’s perspective. I don’t think regular TTC customers experience their service in terms of scheduled times or headways anymore. The expectation these days is that if I always catch a bus at so-and-so time it should always arrive at that time, doesn’t matter if that’s the scheduled time or not as no one pays attention to scheduled times anymore. Or, if real time info tells me that a bus will arrive in 5 minutes then it should arrive in 5 minutes. In the end, it’s up to the customers to judge whether a service is on time or not, and they can only do that based on what they know.

    Steve: I agree with your position for comparatively frequent services, but for infrequent routes, especially where transfer connections are involved, predictability to a schedule is very important. It is extremely frustrating to watch in real time as the bus you thought you would catch is running early, or not paying attention to the schedule at all.


  5. Steve wrote:

    “Yes, I knew what was going on, but if it had been a stormy day, I would not have been amused at an unnecessary wait.”

    What was going on is unacceptable in any weather condition. If a bus is scheduled to arrive in 5 minutes, then it should arrive in around 5 minutes, not more than 10 minutes. I still think a drop back policy can work, especially for the 501 Queen. But for other routes it can work too.


  6. The most annoying part of this is that if they just put all the data on an amazon ec2 node and made it public all these numbers would be generated by the public in short order … there is a lot of data, and storing it costs money … but analyzing it is actually fairly quick and painless, and can be done by the public in a distributed way … Steve is interested in headways stats, I’m interested in proportion of service operated, Bob is interested in subways, Joe is interested in Buses … so we all would write whatever code is necessary to figure out what we were looking for … and in the end, there would be a couple dozen people working on it.

    I still haven’t figured out why the TTC thinks that they can do this better than the crowd.

    Steve: While Adam Giambrone was TTC Chair, there was a motion directing staff to make the data public as part of the City’s Open Data Initiative. However, Adam was distracted, and there was an election, and so the idea just vanished.


  7. Steve said:

    While Adam Giambrone was TTC Chair, there was a motion directing staff to make the data public as part of the City’s Open Data Initiative. However, Adam was distracted, and there was an election, and so the idea just vanished.

    Should this not appear on the occasional lists of ‘outstanding items’ that Staff provide? If it (and similar instructions to staff) were listed there it would ensure that issues like this (which staff probably hope to bury!) will be kept in mind.

    Steve: A lot fell off of the table at the end of the previous administration.


  8. It’s kind of sad that this post only has 7 comments while discussion about the nuances of the initial “D” can generate at least 50-60 comments.

    Considering that the actual quality of service is a fundamental part of making the TTC (and transit) experience reliable and convenient+comfortable (fundamentally important components of a successful transit system), we should make a greater effort to learn more about these technical areas.

    Cheers, Moaz


  9. Any response from TTC yet? I have this image of new CGMs, I mean CEOs, trying to affect a near-century of old habits. A question that lingers for me is whether the top suggestion from operators to improve reliability is to set run times ‘realistically’. Does Service Planning even have a response to drivers on why this is not done?

    Steve: I have not heard much back from the TTC on this, but do know that in the development of the new “Journey Time Metric” which is supposed to debut next year, the problem with inadequate running time has shown up. The next problem is the “why”, and that has more than one answer. Crowding does cause delay because it slows down stop service times, but there are also locations of genuine “congestion” which itself can have multiple causes. These effects can compound on each other so that what appears to be a small increase in one factor actually causes a much larger effect overall. Finally, of course, there is the basic question of whether scheduled times are adequate for typical service conditions, and whether operating practices actually include management of headways.

