Determine Sydney Train Delay Information

I am trying to determine whether a currently running Sydney Trains service is late or delayed, using the trip update real-time feed. How would I do this? I was hoping that the stop sequence (i.e. the “stop_time_update” field in the GTFS feed) would give me the delay information I need. Most of the time, however, the arrival and departure entries for all stops have 0 for both the timestamp and the delay (see below); only sometimes are the “delay” and “time” fields non-zero.

"stop_sequence": 0,
"stop_id": "2000342",
"arrival": {
  "delay": 0,
  "time": 0,
  "uncertainty": 0
},
"departure": {
  "delay": 0,
  "time": 0,
  "uncertainty": 0
},

I’m in the process of writing some detailed docs so I don’t forget all the little quirks in the system. Here’s an edited extract, with some examples.


For scheduled services

Services running as scheduled (that is, there are no transpositions causing changes to the stopping pattern) have a trip schedule_relationship of SCHEDULED:

entity {
  id: "92AE.1487.104.124.K.4.44060407"
  trip_update {
    trip {
      trip_id: "92AE.1487.104.124.K.4.44060407"
      schedule_relationship: SCHEDULED
      route_id: "CUL_1b"
    }[...]

For these trips, only a delay field is provided. To get the predicted times, take the scheduled arrival time and departure time from stop_times.txt (in the GTFS-static for realtime dataset) and add the delay in seconds. A negative value denotes that the vehicle is expected to arrive or depart ahead of schedule:

entity {
  id: "92AE.1487.104.124.K.4.44060407"
  trip_update {
    trip {
      trip_id: "92AE.1487.104.124.K.4.44060407"
      schedule_relationship: SCHEDULED
      route_id: "CUL_1b"
    }
    stop_time_update {
      arrival {
        delay: 45
      }
      departure {
        delay: 45
      }
      stop_id: "2560702"
      schedule_relationship: SCHEDULED
    }[...]
    timestamp: 1484886561
  }
}

Match each stop_time_update by the stop_id provided in the realtime and static feeds.
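
As a rough illustration of the above, here is a minimal Python sketch that adds the realtime delay to the static times, matching on stop_id. The function names and the dict layout are my own for this example, and the scheduled times are made up (only the stop_id and the 45 second delay come from the extract above). It assumes each stop_id appears once in the trip.

```python
def gtfs_time_to_seconds(hhmmss):
    """Convert a GTFS 'HH:MM:SS' string to seconds since the start of the service day.
    Hours may be 24 or more for trips that run past midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_gtfs_time(total):
    """Format seconds since the start of the service day back to 'HH:MM:SS'."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def apply_delays(scheduled_stops, delays_by_stop_id):
    """Return predicted times for every scheduled stop that has a realtime delay.

    scheduled_stops: rows from stop_times.txt for one trip (dicts with
        stop_id, arrival_time, departure_time).
    delays_by_stop_id: stop_id -> (arrival_delay, departure_delay) in seconds,
        taken from the stop_time_update entries.
    """
    predicted = []
    for row in scheduled_stops:
        if row["stop_id"] not in delays_by_stop_id:
            continue  # no realtime information for this stop
        arrival_delay, departure_delay = delays_by_stop_id[row["stop_id"]]
        predicted.append({
            "stop_id": row["stop_id"],
            "arrival_time": seconds_to_gtfs_time(
                gtfs_time_to_seconds(row["arrival_time"]) + arrival_delay),
            "departure_time": seconds_to_gtfs_time(
                gtfs_time_to_seconds(row["departure_time"]) + departure_delay),
        })
    return predicted


# Scheduled times here are invented for the example.
schedule = [{"stop_id": "2560702", "arrival_time": "08:15:00", "departure_time": "08:16:00"}]
print(apply_delays(schedule, {"2560702": (45, 45)}))
```

Times are kept as seconds since the start of the service day so that values past 24:00:00 survive the round trip.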


For replacement services

A replacement service is where the trip is an additional service or has had its scheduled stopping pattern altered. These trips have a trip schedule_relationship of REPLACEMENT.

REPLACEMENT trips for scheduled services are intended to slot in and replace all or a section of an existing service. They include a timestamp (time) for arrival and departure. delay is also provided for stops which were already part of the schedule (in stop_times.txt) and have not been skipped, added or replaced.

entity {
  id: "W557.1487.104.64.V.4.44060591"
  trip_update {
    trip {
      trip_id: "W557.1487.104.64.V.4.44060591"
      schedule_relationship: REPLACEMENT
      route_id: "BML_1"
    }
    stop_time_update {
      arrival {
        time: 1484886212 // <-- no delay provided, this stop is either added, or replaces another stop
      }
      departure {
        time: 1484886305
      }
      stop_id: "2148536"
    }
    stop_time_update {
      arrival {
        delay: 163 // <-- delay provided, this stop is listed in the existing schedule
        time: 1484887076
      }
      departure {
        delay: 133
        time: 1484887106
      }
      stop_id: "2750513"
    }

Replacement and additional stops are not denoted in any way in the stop_time_update entity. You must back check with the static schedule to determine which stops have been added or replaced.

(Replaced stops are platform changes. They can be linked using the stop’s parent_station field, except in some City Circle services which stop at Central twice.)

Where stops are skipped without replacement (e.g. a Bondi Junction to Waterfall service commencing at Central), the skipped stops (here, Bondi Junction to Town Hall) appear as SKIPPED (under stop_time_update > schedule_relationship).

Upon completion of the section of stops with added stops, replacement stops or skipped stops, the trip_update may resume reporting a SCHEDULED relationship.
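
The back check against the static schedule could look something like the sketch below. It is only one possible approach: the function name and labels are mine, the stop IDs are hypothetical placeholders, and it does not handle the City Circle case mentioned above, where the same parent station is visited twice.

```python
def classify_stops(realtime_stop_ids, scheduled_stop_ids, parent_station):
    """Label each realtime stop as 'scheduled', 'replaced' or 'added'.

    realtime_stop_ids: stop_ids in the order they appear in the trip_update.
    scheduled_stop_ids: stop_ids for the same trip from stop_times.txt.
    parent_station: stop_id -> parent_station, from stops.txt.
    """
    scheduled = set(scheduled_stop_ids)
    # Parent stations the trip was originally scheduled to call at.
    scheduled_parents = {parent_station.get(s) for s in scheduled_stop_ids}
    scheduled_parents.discard(None)

    labels = []
    for stop_id in realtime_stop_ids:
        if stop_id in scheduled:
            labels.append((stop_id, "scheduled"))
        elif parent_station.get(stop_id) in scheduled_parents:
            # Same station, different platform: treat as a platform change.
            labels.append((stop_id, "replaced"))
        else:
            labels.append((stop_id, "added"))
    return labels


# Hypothetical IDs: A1 and A2 are two platforms of station A.
parents = {"A1": "A", "A2": "A", "B1": "B", "C1": "C"}
print(classify_stops(["A2", "B1", "C1"], ["A1", "B1"], parents))
```

Stops marked SKIPPED in the feed can be read directly from the stop_time_update's schedule_relationship, so they are not inferred here.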

Thanks very much! I will go through this a little bit later. I do find it hard to get this kind of specific information. This is fantastic information.

I’d imagine the same logic applies to light rail and NSW trains as well?

LR and NSW Trains are far less fiddly. They don’t have the old REPLACEMENT schedule_relationship, which makes them easier to process. Both also have a stop_sequence as well as stop_id for matching with static data.

For some services, NSW Trains only provides estimates for a small number of stops ahead of its current position (because regional services tend to make up time).

That would be ARTC information out of 4-TRAK actually.

I have noticed this on TripView. Sometimes the actual departure time of a train running late shows up as an earlier time than the previously estimated arrival time. I always assume that the departure is correct and that the true arrival must be earlier than or equal to the departure.

It is unclear to me how much 4-TRAK information ARTC gives to TfNSW - does it include ALL freight and passenger information from Brisbane to Kalgoorlie, or just Trainlink services?

Geoff Lambert

There seems to have been a change in this behaviour for Trainlink Regional Services realtime GTFS? The forward projection of delay has more or less vanished or only applies to one or two stops. The delta now seems to be the delta for projected arrival times rather than projected departure times. This change seemed to happen on Friday. I am inferring this from the information on TripView.

It looks like the recent feed change is to do with what the final reported delay is. In most cases, the last reported stop now seems to have a delay of 0 or close to 0.

It’s always been the case that delay projections apply 1-2 stops ahead… but TripView and NextThere had propagated the last provided delay value to all following stops (this is in-line with the GTFS realtime specification).
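
To illustrate the propagation behaviour described above, here is a small sketch of carrying the last reported delay forward to the following stops. The function name and stop IDs are placeholders of mine, and this is not the actual TripView or NextThere code.

```python
def propagate_delays(ordered_stop_ids, reported_delays):
    """Carry the most recent reported delay forward to later stops.

    ordered_stop_ids: the trip's stops in travel order.
    reported_delays: stop_id -> delay in seconds, only for stops in the feed.
    Stops before the first reported delay get None (nothing known yet).
    """
    result = []
    last_delay = None
    for stop_id in ordered_stop_ids:
        if stop_id in reported_delays:
            last_delay = reported_delays[stop_id]
        result.append((stop_id, last_delay))
    return result


# Placeholder stops: only S2 and S3 are reported in the feed.
print(propagate_delays(["S1", "S2", "S3", "S4", "S5"], {"S2": 300, "S3": 0}))
```

With the feed change, a final reported delay of 0 (as for S3 here) means every later stop is shown as on time, which would explain the forward projection appearing to vanish.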

I can see both arrival and departure times reported in the current and pre-Friday feeds.

NSW Trains does not use ARTC data for these predictions; they have their own internal 4Trak for this.

@jxeeno about the above point:

Delay data takes into account operational dwell time for future stops (customer timetables do not have a dwell) and removes this dwell time from the estimate, which has the effect of reducing the delay at subsequent stations. This was done at the request of NSW Trains operations to be more representative of how the trains actually run.

Hope that makes sense.

Thanks,
Alex

Certainly makes sense. The new predictions are more reflective of what actually occurs :)

Based on the above information:

How do we calculate the actual delay using the time field?
We already have a scheduled arrival time in the open data. Do we just need to replace the scheduled arrival time with the ‘time’ field? Could you provide more details on this time field?
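
One way to think about it: time is an absolute POSIX timestamp for the predicted arrival or departure, so it can be used directly as the predicted time, and a delay can be derived by subtracting the scheduled time converted to the same form. Below is a minimal sketch of that conversion. It assumes you know the trip's service date and that times are in the Australia/Sydney timezone. The function name is mine, and the scheduled time in the example is back-calculated from the REPLACEMENT extract above (time 1484887076, delay 163), not read from the real stop_times.txt.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SYDNEY = ZoneInfo("Australia/Sydney")


def scheduled_epoch(service_date, gtfs_time, tz=SYDNEY):
    """Convert a service date (YYYYMMDD) and GTFS 'HH:MM:SS' to a POSIX timestamp.

    GTFS times are measured from "noon minus 12 hours" on the service day,
    which keeps them well defined on daylight saving changeover days.
    """
    day = datetime.strptime(service_date, "%Y%m%d")
    noon = datetime(day.year, day.month, day.day, 12, tzinfo=tz)
    hours, minutes, seconds = (int(p) for p in gtfs_time.split(":"))
    return int(noon.timestamp()) - 12 * 3600 + hours * 3600 + minutes * 60 + seconds


# Illustrative values only (scheduled time derived from the feed example above).
predicted = 1484887076
scheduled = scheduled_epoch("20170120", "15:35:13")
print(predicted - scheduled)  # delay in seconds
```

For stops that were added or replaced there is no matching scheduled time, so only the predicted time is meaningful for those.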