Timetables Complete GTFS


#1

Static timetables, stop locations, and route shape information in General Transit Feed Specification (GTFS) format for all operators, including regional, trackwork and transport routes not available in realtime feeds.


This is a companion discussion topic for the original entry at https://opendata.transport.nsw.gov.au/dataset/timetables-complete-gtfs

#2

Hi,

I’ve downloaded this data and started working with the train data specifically. However, I’m having trouble with the trip data. Two issues that I’d appreciate some advice on are:

  1. What does trip_id mean? This is a composite field which seems to have route_id embedded within it. Also, it does not conform with the documentation that is referenced on the download page.

Some sample trip id’s are 1273.TA.2-SCO-sj2-14.91.R, 1.TA.2-SCO-sj2-14.1.H. In each case, 2-SCO-sj2-14 is the associated route_id.

One document defined trip_id as …

The trip_id is the unique identifier for a particular trip. It is
composed of two fields. The first is the run number of the
trip. The second value is a unix timestamp which indicates
the planed start of the trip.

Another defines it as

The trip_id used to uniquely identify trips has a semantic content that could be used to
provide additional information about the timetabled train. The format is as follows:
<trip_name>.<timetable_id>.<timetable_version_id>.<dop_ref>.<set_type>.<number_of_cars>.<trip_instance>

Neither of these seems to align with the trip_id values present in the data.

  2. Duplicated trips. This may be a function of my issues with (1) above?

By way of example (there are others), the trips 1005.TA.2-SCO-sj2-14.67.R, 1006.TA.2-SCO-sj2-14.67.R, 1008.TA.2-SCO-sj2-14.67.R all run from Port Kembla to Thirroul on Wednesday departing at 04:49:01 and arriving at 05:22:00. These trips each have a different service id linking through to the calendar data. That said, the stops and timings are the same. This implies that there are multiple trains running at the same time between the same stations.
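One way to list such apparent duplicates systematically is to group trips by their full stop pattern. The following is a minimal Python sketch, assuming the standard GTFS `stop_times.txt` columns (`trip_id`, `stop_sequence`, `stop_id`, `arrival_time`, `departure_time`). The inline sample rows and the function name `find_identical_trips` are made up for illustration and are not taken from the bundle.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for stop_times.txt; these rows are invented for illustration.
SAMPLE_STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
A,04:49:01,04:49:01,S1,1
A,05:22:00,05:22:00,S2,2
B,04:49:01,04:49:01,S1,1
B,05:22:00,05:22:00,S2,2
C,06:00:00,06:00:00,S1,1
C,06:30:00,06:30:00,S2,2
"""

def find_identical_trips(stop_times_file):
    """Return groups of trip_ids whose ordered stops and times are identical."""
    stops_by_trip = defaultdict(list)
    for row in csv.DictReader(stop_times_file):
        stops_by_trip[row["trip_id"]].append(
            (int(row["stop_sequence"]), row["stop_id"],
             row["arrival_time"], row["departure_time"])
        )

    trips_by_pattern = defaultdict(list)
    for trip_id, stops in stops_by_trip.items():
        # Order by stop_sequence, then drop it so only stops and times are compared.
        pattern = tuple(stop[1:] for stop in sorted(stops))
        trips_by_pattern[pattern].append(trip_id)

    return [sorted(ids) for ids in trips_by_pattern.values() if len(ids) > 1]

print(find_identical_trips(io.StringIO(SAMPLE_STOP_TIMES)))
```

With the real file, `open("stop_times.txt", newline="")` would replace the `io.StringIO` wrapper. Identical patterns alone do not prove a duplicate, since the trips may belong to services that run on different dates.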


#3

Hi @k4werri, please have a read of our trains technical document - https://opendata.transport.nsw.gov.au/sites/default/files/Real-Time_Train_Technical_Document_v2.5.pdf

The trip_id is defined as:
<trip_name>.<timetable_id>.<timetable_version_id>.<dop_ref>.<set_type>.<number_of_cars>.<trip_instance>

I believe the example you have given is a bit more complex because those are two vehicles that join together to provide the service. You need to use block_id to combine them into the one trip. Hopefully one of our developers can explain that a bit better than I can.

Thanks,
Alex


#4

Hi Alex,

Thanks for your note. I remain uncertain. I had read the document you refer to before my initial post.

Regarding my 2nd question above, the sample three trips I quoted in the initial post all have block id equal to blank so I cannot see how using block id will resolve my observation of apparent duplicates.

I have found other cases where block id does sensibly connect different shorter trips into a combined longer trip. That is not the particular problem that motivated my initial post.

As to my first question above, again it was in the document you referenced that I found one of the two different formats for trip id. I cannot see how the sample trip id’s I quoted in my initial post (and are extracted directly from the data) align with the format for trip id you posted (and is presented in the document you reference). The format you quote has 7 components separated by periods. The trip id’s I provided have only 5 components separated by periods.

Regards.

Kevin.


#5

Hey Kevin,

The trip_id format varies depending on which GTFS bundle you’re using. In the Timetables Complete GTFS bundle, the trip_id is a concatenation of a route identifier and a bunch of internal identifiers that don’t mean much to consumers like us. You can check out the TransXChange release notes and get hints as to what the identifiers refer to, but in brief:

1273.TA.2-SCO-sj2-14.91.R

  • Trip/Vehicle Journey Code: 1273
  • Internal operator identifier: 2
  • Route Identifier: SCO
  • Internal identifier: sj2
  • Direction: R = inbound (or H = outbound)
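The breakdown above can be sketched in Python as follows. This is only an illustration that assumes every trip_id in this bundle has five dot-separated parts, as the two samples in this thread do. The names `TripIdParts`, `segment_2`, and `segment_4` are hypothetical, and the meaning of the `TA` and `91` segments is not stated in this thread, so they are kept as opaque strings.

```python
from typing import NamedTuple

class TripIdParts(NamedTuple):
    journey_code: str  # e.g. "1273" (trip/vehicle journey code)
    segment_2: str     # e.g. "TA"; meaning not stated in the thread
    route_id: str      # e.g. "2-SCO-sj2-14"
    segment_4: str     # e.g. "91"; meaning not stated in the thread
    direction: str     # "R" or "H"

def parse_trip_id(trip_id: str) -> TripIdParts:
    """Split a Complete-GTFS-style trip_id on dots (assumed 5 parts)."""
    parts = trip_id.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated parts, got {len(parts)}: {trip_id!r}")
    return TripIdParts(*parts)

parsed = parse_trip_id("1273.TA.2-SCO-sj2-14.91.R")

# The route_id itself is hyphen-separated in this sample; other route_ids
# may not follow the same four-part shape.
operator, route, internal, suffix = parsed.route_id.split("-")
print(parsed.journey_code, operator, route, internal, parsed.direction)
```

Treating unexplained segments as opaque strings keeps the parser usable as a lookup key without guessing at semantics.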

If you want to decode data such as set type or number of cars that you’ve referred to, you’ll have to use the Sydney Trains GTFS bundle which can be found here: https://opendata.transport.nsw.gov.au/dataset/public-transport-timetables-realtime

Re duplicated trips, you might want to check calendar_dates.txt to see what days each trip is excluded from running.


#6

@alejandro.felman,
It appears that the Private Coach Services data is neither comprehensive nor complete.
Many coach services are not included, such as Firefly, Greyhound, Australia Wide, etc.
For agencies that are included in the complete GTFS file, some operating routes are missing. An obvious example is the Sydney to Canberra route, which Murrays Coaches (agency_id = “B079”) currently operates, as does Greyhound (not included).
Is there any reason for these routes/agencies being excluded?
Chinh


#7

@chinhho related discussion here: Coach data


#8

Hi,

Thanks for your note. I’ve also had a look at the release notes you mention. I think I’ll just treat the trip id as a composite unique id for the moment.

As for duplicated trips, I’d already looked at the calendar and calendar_dates files before posting. No joy. Even with the extra information provided in those files, there are distinct trip_id’s which result in trains running on the same day, at the same time stopping at the same stations. Does not make sense.
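A check of this kind can be sketched in Python by expanding each service_id into the set of dates it runs on and intersecting the sets. The sketch assumes the standard GTFS layout: `calendar.txt` with weekday flags plus `start_date`/`end_date` in YYYYMMDD form, and `calendar_dates.txt` with `exception_type` 1 (added) or 2 (removed). The rows below and the function name `active_dates` are invented for illustration.

```python
from datetime import date, datetime, timedelta

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")

def parse_gtfs_date(text):
    """Convert a GTFS YYYYMMDD string to a date."""
    return datetime.strptime(text, "%Y%m%d").date()

def active_dates(service_id, calendar_rows, calendar_date_rows):
    """Expand one service_id into the set of dates it actually runs."""
    dates = set()
    for row in calendar_rows:
        if row["service_id"] != service_id:
            continue
        day = parse_gtfs_date(row["start_date"])
        end = parse_gtfs_date(row["end_date"])
        while day <= end:
            # date.weekday() is 0 for Monday, matching the WEEKDAYS order.
            if row.get(WEEKDAYS[day.weekday()]) == "1":
                dates.add(day)
            day += timedelta(days=1)
    for row in calendar_date_rows:
        if row["service_id"] != service_id:
            continue
        day = parse_gtfs_date(row["date"])
        if row["exception_type"] == "1":
            dates.add(day)
        elif row["exception_type"] == "2":
            dates.discard(day)
    return dates

# Invented sample: two Wednesday services with overlapping validity periods.
calendar_rows = [
    {"service_id": "X", "wednesday": "1", "start_date": "20190401", "end_date": "20190414"},
    {"service_id": "Y", "wednesday": "1", "start_date": "20190408", "end_date": "20190421"},
]
calendar_date_rows = [
    {"service_id": "Y", "date": "20190410", "exception_type": "2"},
]

x_dates = active_dates("X", calendar_rows, calendar_date_rows)
y_dates = active_dates("Y", calendar_rows, calendar_date_rows)
print(sorted(x_dates & y_dates))  # dates on which both services run
```

A non-empty intersection for two trips with identical stops and times would confirm that the feed really does schedule both on the same day.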

Regards.

Kevin.


#9

Hi @chinhho, I’m finding out exactly why some are included and some aren’t, will answer this in the thread linked by @jxeeno.

In saying that though, Murrays Coaches is in the data feeds, you can filter it by agency id to find the routes. See below.



#10

Thanks @alejandro.felman,
I can see that Murrays has Canberra - Wollongong and Canberra - Narooma.
But the route Canberra - Sydney is missing. This is just one example of missing routes for an agency that is included in the complete GTFS.
Chinh


#11

Hi @alejandro.felman and @david.phillips,
This is somewhat related to the “Feedback requested for enhancements to Complete GTFS” topic, but I am posting here since I have mainly worked with the complete GTFS posted on this page. I have also tested with the sample of the enhanced GTFS, and the problem described below is still present in that dataset.
I believe there is a problem with the Sydney Trains Network dataset, which raises concerns about data quality. I’ll describe it in as much detail as possible so that you can replicate the problem I found.

  1. Merging the calendar file with the trips file and filtering the shape_id (or route_id) to the T4 line, I can see that the T4 line is served by 59 distinct service_id values in this GTFS bundle.

  2. If I filter these 59 distinct service_id values to the regular Tuesday service (merge them with the calendar file and keep only rows where tuesday = 1), I get 23 service_id values.

  3. I merge the results of step 2 with the calendar_dates file and filter to date = “2019-04-23” (a Tuesday) to see which service_id values are actually scheduled to run on Tuesday 23 April 2019.

    In the results, the exception_type column is 2 for every row, meaning that all 23 of these service_id values that serve T4 are removed on that date, according to the GTFS bundle.

  4. I then look at calendar_dates for the same date to see if other service_id values are added to serve the T4. The result indicates that some service_id values are added on 23 Apr 2019, but NONE of these added service_id values actually serve the T4.

The GTFS data therefore tell me that no train will be scheduled to run on the T4 on the Tuesday of 23 Apr 2019. I therefore conclude that we have a serious issue with data quality.
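The four steps above can be approximated in Python as follows. This is a sketch under stated assumptions, not a port of the R code mentioned below: it assumes the standard GTFS columns in `calendar.txt`, `calendar_dates.txt`, and `trips.txt`, with dates in YYYYMMDD form. The sample rows, the route_id `R1`, and all function names are invented for illustration.

```python
from datetime import date

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")

def calendar_row(service_id, days, start, end):
    """Build a calendar.txt-style row for the invented sample data."""
    row = {name: "1" if name in days else "0" for name in WEEKDAYS}
    row.update(service_id=service_id, start_date=start, end_date=end)
    return row

def services_running_on(day, calendar_rows, calendar_date_rows):
    """Return the service_ids active on one date, after applying exceptions."""
    key = day.strftime("%Y%m%d")  # YYYYMMDD strings compare correctly as text
    running = set()
    for row in calendar_rows:
        in_range = row["start_date"] <= key <= row["end_date"]
        if in_range and row[WEEKDAYS[day.weekday()]] == "1":
            running.add(row["service_id"])
    for row in calendar_date_rows:
        if row["date"] != key:
            continue
        if row["exception_type"] == "1":    # service added on this date
            running.add(row["service_id"])
        elif row["exception_type"] == "2":  # service removed on this date
            running.discard(row["service_id"])
    return running

def trips_running_on(day, route_id, trips, calendar_rows, calendar_date_rows):
    """Return trip_ids on a route whose service is active on the given date."""
    running = services_running_on(day, calendar_rows, calendar_date_rows)
    return [t["trip_id"] for t in trips
            if t["route_id"] == route_id and t["service_id"] in running]

calendar_rows = [
    calendar_row("S1", {"tuesday"}, "20190301", "20190418"),
    calendar_row("S2", {"tuesday"}, "20190301", "20190531"),
]
calendar_date_rows = [{"service_id": "S2", "date": "20190423", "exception_type": "2"}]
trips = [
    {"trip_id": "T1", "route_id": "R1", "service_id": "S1"},
    {"trip_id": "T2", "route_id": "R1", "service_id": "S2"},
]

print(trips_running_on(date(2019, 4, 16), "R1", trips, calendar_rows, calendar_date_rows))
print(trips_running_on(date(2019, 4, 23), "R1", trips, calendar_rows, calendar_date_rows))
```

In the sample, the second query returns nothing because one service has expired and the other is removed by an exception, which mirrors the situation described in the steps.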

I share the R code just in case you want to replicate this problem in R

NOTE: If I look at a different date, such as 23 March 2019, I at least see some trips scheduled to serve the T4 line. Also, if I look at the same date, 23 Apr 2019, but a different network, such as Sydney Buses Network or TrainLink, I still see trips running on those networks. The problem described above also occurs on other dates (e.g. 25 Apr) and on other train lines operated by Sydney Trains.

Could you please investigate and confirm?
Chinh


#12

Hi @chinhho, if you have a look at just the Sydney Trains data you will see that it only has data up until the 18th of April. I know our documentation says that the data export is based on a 90 day period but that is not always the case:

5.2 Start & End Dates
As per Data Scope validity period, the export is based on a 90 day period. Many Start & End dates will reflect this period by being valid for the entire period. However there will be calendars with shorter validity periods that start in the future or end earlier. In general these will relate to change in schedules (e.g. a timetable amendment).

I believe best practice is to download the bundle daily and then work with the data one or two weeks in advance. Another developer might be able to confirm this or let you know how they work with the data.
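Before querying a specific date, it can help to check how far the downloaded data actually extends. The following Python sketch assumes the standard `calendar.txt` and `calendar_dates.txt` columns with YYYYMMDD dates. The rows and the function name `last_covered_date` are invented for illustration.

```python
from datetime import date, datetime

def last_covered_date(calendar_rows, calendar_date_rows):
    """Return the latest date any service could run on, or None if there are no rows.

    This is an upper bound: removal exceptions (exception_type 2) may mean
    that nothing actually runs on the returned date.
    """
    candidates = [row["end_date"] for row in calendar_rows]
    candidates += [row["date"] for row in calendar_date_rows
                   if row["exception_type"] == "1"]
    if not candidates:
        return None
    # YYYYMMDD strings sort chronologically, so max() on text is sufficient.
    return datetime.strptime(max(candidates), "%Y%m%d").date()

# Invented sample rows, already filtered to one operator's service_ids.
calendar_rows = [
    {"service_id": "A", "end_date": "20190418"},
    {"service_id": "B", "end_date": "20190410"},
]
calendar_date_rows = [
    {"service_id": "B", "date": "20190412", "exception_type": "1"},
]

last_day = last_covered_date(calendar_rows, calendar_date_rows)
query_day = date(2019, 4, 23)
print(last_day, query_day > last_day)
```

Because the complete bundle covers all operators, the rows would need to be filtered to the service_id values of the operator in question (via `trips.txt` and `routes.txt`) before this check says anything about that operator.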

Thanks,
Alex