Timetables Complete GTFS

@alejandro.felman,
It appears that the Private Coach Services data is neither comprehensive nor complete.
Many coach services are not included, such as Firefly, Greyhound, Australia Wide, etc.
For agencies that are included in the complete GTFS file, some operating routes are missing. An obvious example is the Sydney to Canberra route, which Murrays Coaches (agency_id = “B079”) currently operates, as does Greyhound (not included).
Is there any reason these routes/agencies are excluded?
Chinh

@chinhho, related discussion here: Coach data

Hi,

Thanks for your note. I’ve also had a look at the release notes you mention. I think I’ll just treat the trip_id as a composite unique ID for the moment.

As for duplicated trips, I had already looked at the calendar and calendar_dates files before posting. No joy. Even with the extra information in those files, there are distinct trip_ids that result in trains running on the same day, at the same time, stopping at the same stations. That does not make sense.

Regards.

Kevin.

Hi @chinhho, I’m finding out exactly why some are included and some aren’t, and will answer in the thread linked by @jxeeno.

That said, Murrays Coaches is in the data feeds; you can filter by agency_id to find its routes.
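For anyone who wants to reproduce that filter in code, here is a minimal Python sketch. It assumes the bundle has been unzipped and that routes.txt has the standard GTFS agency_id, route_id, route_short_name and route_long_name columns; the sample rows and route IDs are made up for illustration.

```python
import csv
import io


def routes_for_agency(routes_txt, agency_id):
    """Yield (route_id, route_short_name, route_long_name) for one agency_id."""
    for row in csv.DictReader(routes_txt):
        if row.get("agency_id") == agency_id:
            yield row["route_id"], row.get("route_short_name", ""), row.get("route_long_name", "")


# Made-up rows standing in for routes.txt. With a real bundle, open the file instead:
#   with open("routes.txt", newline="", encoding="utf-8-sig") as f: ...
SAMPLE_ROUTES = io.StringIO(
    "route_id,agency_id,route_short_name,route_long_name\n"
    "R1,B079,,Canberra - Wollongong\n"
    "R2,B079,,Canberra - Narooma\n"
    "R3,X999,,Somewhere Else\n"
)

murrays = list(routes_for_agency(SAMPLE_ROUTES, "B079"))  # B079 = Murrays Coaches
for route in murrays:
    print(route)
```

The `utf-8-sig` encoding in the comment simply tolerates a byte-order mark at the start of the file, should one be present.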


Thanks @alejandro.felman,
I can see that Murrays has Canberra - Wollongong and Canberra - Narooma.
But the route Canberra - Sydney is missing. This is just one example of missing routes for an agency that is included in the complete GTFS.
Chinh

Hi @alejandro.felman and @david.phillips,
This is somewhat related to the thread “Feedback requested for enhancements to Complete GTFS”, but I am posting here since I have mainly worked with the complete GTFS posted on this page. I have also tested the sample of the enhanced GTFS, and the problem described below still occurs in that dataset.
I believe there is a problem with the Sydney Trains Network dataset, which raises concerns about data quality. I have described it in as much detail as possible so that you can replicate the problem.

  1. Merging the calendar file with the trips file and filtering the shape_id (or route_id) to the T4 line, I can see that the T4 line is served by 59 distinct service_ids in this GTFS bundle.

  2. If I filter these 59 distinct service_ids to the regular Tuesday service (merge these service_ids with the calendar file and keep only rows where tuesday = 1), I get 23 service_ids.

  3. I merge the results of step 2 with the calendar_dates file and filter to date = “2019-04-23” (a Tuesday) to see which service_ids are actually scheduled to run on Tuesday 23 April 2019.

    If we look at the last column (exception_type), the values are all 2, meaning that all 23 of these service_ids that serve the T4 are removed on that date, according to the GTFS bundle.

  4. I then look at calendar_dates on the same date to see whether other service_ids are added to serve the T4. The result indicates that some service_ids are added on 23 Apr 2019, but NONE of these added service_ids actually serve the T4.
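Steps 1 to 4 can be sketched in Python as follows (the original analysis was done in R). This is an illustration under assumptions, not the code used above: it applies the usual GTFS rule (weekday flag and date range from calendar.txt, then exception_type 1 adds and exception_type 2 removes a service on a given date from calendar_dates.txt), and `t4_route_ids` is a hypothetical placeholder for whatever route_ids make up the T4 in the feed. The rows are made up to mirror the pattern described.

```python
import csv
import io
from datetime import date

DAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def read_rows(text):
    """Parse the contents of a GTFS .txt file into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def active_service_ids(calendar_rows, calendar_dates_rows, day):
    """service_ids running on `day`: calendar.txt first, then calendar_dates.txt exceptions."""
    ymd = day.strftime("%Y%m%d")  # GTFS dates are YYYYMMDD, so string comparison works
    weekday = DAY_COLUMNS[day.weekday()]
    active = {
        r["service_id"]
        for r in calendar_rows
        if r[weekday] == "1" and r["start_date"] <= ymd <= r["end_date"]
    }
    for r in calendar_dates_rows:
        if r["date"] != ymd:
            continue
        if r["exception_type"] == "1":    # 1 = service added on this date
            active.add(r["service_id"])
        elif r["exception_type"] == "2":  # 2 = service removed on this date
            active.discard(r["service_id"])
    return active


def trips_running(trips_rows, route_ids, service_ids):
    """trip_ids on the given routes whose service_id is active."""
    return [
        t["trip_id"]
        for t in trips_rows
        if t["route_id"] in route_ids and t["service_id"] in service_ids
    ]


# Made-up rows mirroring the pattern described above (not real TfNSW data).
calendar = read_rows(
    "service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date\n"
    "S1,0,1,0,0,0,0,0,20190401,20190430\n"
)
calendar_dates = read_rows(
    "service_id,date,exception_type\n"
    "S1,20190423,2\n"
    "S2,20190423,1\n"
)
trips = read_rows(
    "route_id,service_id,trip_id\n"
    "T4_EXAMPLE,S1,trip_a\n"
    "OTHER_ROUTE,S2,trip_b\n"
)
t4_route_ids = {"T4_EXAMPLE"}  # hypothetical: substitute the real T4 route_ids

for day in (date(2019, 4, 16), date(2019, 4, 23)):
    services = active_service_ids(calendar, calendar_dates, day)
    print(day, trips_running(trips, t4_route_ids, services))
```

With these sample rows, the T4 trip runs on Tuesday 16 April but nothing runs on 23 April, because S1 is removed that day and the added S2 serves a different route.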

The GTFS data therefore tells me that no trains are scheduled to run on the T4 on Tuesday 23 Apr 2019, so I conclude that we have a serious data quality issue.

I am sharing the R code in case you want to replicate this problem in R.

NOTE: If I look at a different date, such as 23 March 2019, at least some trips are scheduled to serve the T4 line. Also, if I look at the same date (23 Apr 2019) but a different network, such as the Sydney Buses Network or TrainLink, I still see trips running on those networks. The problem described above also happens on other dates (e.g. 25 Apr) and on other train lines operated by Sydney Trains.

Could you please investigate and confirm?
Chinh

Hi @chinhho, if you look at just the Sydney Trains data, you will see that it only has data up until 18 April. I know our documentation says that the data export is based on a 90-day period, but that is not always the case:

5.2 Start & End Dates
As per Data Scope validity period, the export is based on a 90 day period. Many Start & End dates will reflect this period by being valid for the entire period. However there will be calendars with shorter validity periods that start in the future or end earlier. In general these will relate to change in schedules (e.g. a timetable amendment).
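To see how far a given feed actually reaches, rather than assuming the full 90 days, one quick check is to take the earliest start and latest end across calendar.txt together with any added dates (exception_type 1) in calendar_dates.txt. A minimal sketch with made-up rows; the result is only an outer bound, since removals can shorten the real window.

```python
import csv
import io
from datetime import date, datetime


def _parse(yyyymmdd):
    """GTFS dates are YYYYMMDD strings."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


def coverage_window(calendar_rows, calendar_dates_rows):
    """Outer bounds (first, last) of the dates any service in the feed could run.

    Uses calendar.txt ranges plus dates added via calendar_dates.txt (exception_type 1).
    Removals (exception_type 2) are ignored, so the real window can only be narrower.
    """
    starts = [r["start_date"] for r in calendar_rows]
    ends = [r["end_date"] for r in calendar_rows]
    added = [r["date"] for r in calendar_dates_rows if r["exception_type"] == "1"]
    return _parse(min(starts + added)), _parse(max(ends + added))


# Made-up rows for illustration (not real TfNSW data).
calendar = list(csv.DictReader(io.StringIO(
    "service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date\n"
    "S1,1,1,1,1,1,0,0,20190301,20190418\n"
    "S2,0,0,0,0,0,1,1,20190405,20190414\n"
)))
calendar_dates = list(csv.DictReader(io.StringIO(
    "service_id,date,exception_type\n"
    "S3,20190415,1\n"
)))

first, last = coverage_window(calendar, calendar_dates)
days_left = (last - date(2019, 4, 10)).days  # days of data left after a 10 April download
print(first, last, days_left)  # 2019-03-01 2019-04-18 8
```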

I believe best practice is to download the bundle daily and then work with the data one or two weeks in advance; another developer might be able to confirm or let you know how they work with the data.
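A minimal sketch of the daily-download approach, keeping one dated zip per day so that each bundle can be imported on its own. It assumes the endpoint URL quoted in this thread, that it returns the bundle as a zip, and that the API key is sent as an `Authorization: apikey <key>` header; `build_request`, `snapshot_path` and `download_daily` are hypothetical helper names.

```python
import urllib.request
from datetime import date
from pathlib import Path

# URL as quoted in this thread; assumed to return the complete bundle as a zip.
COMPLETE_GTFS_URL = "https://api.transport.nsw.gov.au/v1/publictransport/timetables/complete"


def build_request(api_key):
    """Assumption: the Open Data Hub expects the header 'Authorization: apikey <key>'."""
    return urllib.request.Request(
        COMPLETE_GTFS_URL, headers={"Authorization": f"apikey {api_key}"}
    )


def snapshot_path(folder, day):
    """One file per download day, e.g. bundles/gtfs_complete_2019-04-10.zip."""
    return Path(folder) / f"gtfs_complete_{day.isoformat()}.zip"


def download_daily(api_key, folder="bundles", day=None):
    """Fetch the bundle into its own dated file and return the path."""
    target = snapshot_path(folder, day or date.today())
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(build_request(api_key), timeout=300) as resp:
        target.write_bytes(resp.read())
    return target
```

Calling `download_daily("YOUR_API_KEY")` once a day (from cron or a scheduler) would write a new dated file each time, leaving earlier bundles untouched for comparison.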

Thanks,
Alex

The given URI “https://api.transport.nsw.gov.au/v1/publictransport/timetables/complete” returns a status code of 500. Is this expected?

Hi Roger, that endpoint looks OK at the moment. When were you getting the 500?

Hi, I’m noticing the same issue in the complete GTFS bundle as @k4werri: I can see multiple trips that appear to be duplicates (i.e. trains running on the same day at the same time, stopping at the same stations). It appears to occur only in the Sydney Trains data, and block_id does not seem to explain it.
Below is a screenshot of a recently downloaded bundle. Yellow rows are Saturday services and orange rows are Sunday services. The second image shows that there are 6 services operating on two days at exactly the same time.
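One way to check for true duplicates is to compare each trip's full stop pattern (stop_id plus departure_time, in stop_sequence order), but only among trips that run on the same date. The sketch below assumes `running_today` has already been narrowed to a single service date using calendar.txt and calendar_dates.txt; comparing all trips without that filter would flag, for example, a Saturday trip and its identical Sunday twin. The sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def stop_patterns(stop_times_rows):
    """Map trip_id -> tuple of (stop_id, departure_time) in stop_sequence order."""
    by_trip = defaultdict(list)
    for r in stop_times_rows:
        by_trip[r["trip_id"]].append((int(r["stop_sequence"]), r["stop_id"], r["departure_time"]))
    return {trip: tuple((stop, dep) for _, stop, dep in sorted(stops)) for trip, stops in by_trip.items()}


def duplicate_trips(trip_ids, patterns):
    """Groups of trip_ids (all running on the same date) that share an identical pattern."""
    groups = defaultdict(list)
    for trip in trip_ids:
        if trip in patterns:
            groups[patterns[trip]].append(trip)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Made-up stop_times rows; trip_b's rows are deliberately out of order.
stop_times = list(csv.DictReader(io.StringIO(
    "trip_id,stop_sequence,stop_id,departure_time\n"
    "trip_a,1,STOP_X,08:00:00\n"
    "trip_a,2,STOP_Y,08:10:00\n"
    "trip_b,2,STOP_Y,08:10:00\n"
    "trip_b,1,STOP_X,08:00:00\n"
    "trip_c,1,STOP_X,09:00:00\n"
    "trip_c,2,STOP_Y,09:10:00\n"
)))
patterns = stop_patterns(stop_times)
running_today = ["trip_a", "trip_b", "trip_c"]  # trip_ids already filtered to one service date
print(duplicate_trips(running_today, patterns))  # [['trip_a', 'trip_b']]
```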

If I had to guess, I’d say the bundle includes trips from a range of timetable versions rather than only the latest timetable. Do you think this could be the issue?

Is there any way to know the latest timetable ID / timetable version ID for a GTFS bundle?

Hey @austenp,

We’re using the GTFS complete bundle and don’t have issues with duplicate trips you’re describing.

Having a quick look at the screenshot you’ve provided, it looks like you have a mix of old and new trips in your database. Are you clearing out old data before importing new data? For example, I wouldn’t expect to see both 125C.1317.129.128.A.8.58697160 and 125C.1197.103.128.A.8.57758629 in the same GTFS bundle because they represent the same trip but in different versions of the bundle published over time.
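If you load each bundle into SQLite, one way to guarantee that nothing from an earlier bundle survives is to drop and recreate every table on import. A minimal standard-library sketch (`sqlite3`, `csv`, `zipfile`); storing every column as untyped text and naming each table after its file are simplifying assumptions, and the tiny in-memory bundles are made up for the demo.

```python
import csv
import io
import sqlite3
import zipfile
from pathlib import PurePosixPath


def import_bundle(zip_source, con):
    """Load each .txt file in a GTFS zip into `con`, replacing any existing table."""
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue
            table = PurePosixPath(name).stem  # "trips.txt" -> "trips"
            with zf.open(name) as raw:
                rows = csv.reader(io.TextIOWrapper(raw, encoding="utf-8-sig", newline=""))
                header = next(rows)
                columns = ", ".join(f'"{col}"' for col in header)
                placeholders = ", ".join("?" for _ in header)
                with con:  # commit each file's load as one transaction
                    # Drop first so no rows from a previous bundle survive.
                    con.execute(f'DROP TABLE IF EXISTS "{table}"')
                    con.execute(f'CREATE TABLE "{table}" ({columns})')
                    con.executemany(
                        f'INSERT INTO "{table}" VALUES ({placeholders})',
                        (row for row in rows if row),  # skip blank lines
                    )


def _tiny_bundle(trips_csv):
    """Build an in-memory zip containing only trips.txt (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("trips.txt", trips_csv)
    buf.seek(0)
    return buf


con = sqlite3.connect(":memory:")
import_bundle(_tiny_bundle("route_id,service_id,trip_id\nR,S,old_1\nR,S,old_2\n"), con)
import_bundle(_tiny_bundle("route_id,service_id,trip_id\nR,S,new_1\n"), con)
remaining = [r[0] for r in con.execute('SELECT trip_id FROM "trips"')]
print(remaining)  # ['new_1']
```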

For the complete GTFS bundle, you also cannot assume IDs are consistent over time. For example, service ID TA+r1341+2 from a bundle published today could mean something completely different from TA+r1341+2 in a bundle published tomorrow. In fact, the current bundle has that service ID as a Thursday service rather than a Saturday service as in your screenshot:

"TA+r1341+2","0","0","0","1","0","0","0","20191105","20200203"
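The row above follows the GTFS calendar.txt column order (service_id, monday to sunday, start_date, end_date). A small sketch that decodes such a row, assuming it is passed without its header line; with a real file, reading it through `csv.DictReader` using the header is safer.

```python
import csv
import io

DAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def describe_service(calendar_line):
    """Decode one header-less calendar.txt line given in GTFS column order."""
    service_id, *flags, start_date, end_date = next(csv.reader(io.StringIO(calendar_line)))
    days = [day for day, flag in zip(DAY_COLUMNS, flags) if flag == "1"]
    return service_id, days, start_date, end_date


row = '"TA+r1341+2","0","0","0","1","0","0","0","20191105","20200203"'
print(describe_service(row))
# -> ('TA+r1341+2', ['thursday'], '20191105', '20200203')
```

Because service IDs are recycled between bundles, anyone keeping several bundles side by side would need to key such results by bundle as well as by service_id.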

Hi @jxeeno, thanks for jumping in. I agree it does look like a mix of old and new trips in the database. The screenshot is from a bundle downloaded a few days ago and imported into a blank SQLite database, so if there were old trips, they were already in the bundle. I’ll try again with a fresh bundle today, but I suspect the same issue will crop up.
From your experience with the data, is there any way to identify old/new trips if they were in the same bundle?

Good to know that service IDs are recycled too; I’ll keep that one in mind.

Hmm, it shouldn’t be possible for the bundle to have both old and new trips. If you’re certain this is a fresh import into a blank sqlite database, maybe those are for separate services after all.

The next thing to look at would be how you are handling calendar_dates exceptions. Are you correctly determining which service IDs operate on which days based on entries in both calendar.txt and calendar_dates.txt?
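A minimal sketch of that determination for a single service_id: expand the weekly pattern in calendar.txt over its date range, then apply calendar_dates.txt (exception_type 1 adds a date, exception_type 2 removes it). The made-up services A and B below share a Monday to Friday pattern and the same date range, yet never run on the same day once the exceptions are applied.

```python
from datetime import datetime, timedelta

DAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def _parse(yyyymmdd):
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


def service_dates(service_id, calendar_rows, calendar_dates_rows):
    """Every date `service_id` runs: weekly pattern from calendar.txt, then exceptions."""
    dates = set()
    for r in calendar_rows:
        if r["service_id"] != service_id:
            continue
        day, end = _parse(r["start_date"]), _parse(r["end_date"])
        while day <= end:
            if r[DAY_COLUMNS[day.weekday()]] == "1":
                dates.add(day)
            day += timedelta(days=1)
    for r in calendar_dates_rows:
        if r["service_id"] != service_id:
            continue
        if r["exception_type"] == "1":    # service added on this date
            dates.add(_parse(r["date"]))
        elif r["exception_type"] == "2":  # service removed on this date
            dates.discard(_parse(r["date"]))
    return dates


# Made-up example: A and B both run Monday to Friday, 4 to 15 Nov 2019, but
# calendar_dates removes A in the second week and B in the first week.
weekdays_only = {d: ("1" if i < 5 else "0") for i, d in enumerate(DAY_COLUMNS)}
calendar = [
    {"service_id": sid, **weekdays_only, "start_date": "20191104", "end_date": "20191115"}
    for sid in ("A", "B")
]
calendar_dates = (
    [{"service_id": "A", "date": f"201911{d:02d}", "exception_type": "2"} for d in range(11, 16)]
    + [{"service_id": "B", "date": f"201911{d:02d}", "exception_type": "2"} for d in range(4, 9)]
)

a = service_dates("A", calendar, calendar_dates)
b = service_dates("B", calendar, calendar_dates)
print(sorted(a & b))  # -> [] (no date on which both run)
```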


Hi @jxeeno, I think it’s user error after all. :blush: I tried a new bundle and thought the issue had disappeared. Then I dug deeper into the calendar exclusions of the fresh and old bundles and realised that, while there might be multiple seemingly overlapping services, the way the calendar exclusion dates are recorded knocks out the overlap (e.g. services A and B might both run Monday to Friday over the same date range, but calendar_dates might exclude service A for half that period and service B for the other half, rendering them effectively unique).
Thanks for your help!


Ah, great to hear! :slight_smile: Yes, the use of calendar_dates in the bundles can be a bit funky, but it’s compliant with the GTFS spec.

Hi there,

I see that the feed is updated periodically. Is there a way I can access historical GTFS data for the last 3-4 years (static timetable data for trains)? I can only see the latest dataset.

Have you seen Historical GTFS and GTFS Realtime | TfNSW Open Data Hub and Developer Portal?


Thanks for that. I was looking for static timetable data for trains though, I should’ve specified it in my comment earlier.

Sydney Trains or NSW Trains? There’s some on Historical GTFS Bundles and Timetables | TfNSW Open Data Hub and Developer Portal and User account | TfNSW Open Data Hub and Developer Portal.

In routes.txt, the documentation lists valid route_type values of 0-12; however, in the download linked above I have values of 2, 4, 106, 204, 401, 700, 712, 714 and 900.

2 and 4 match the documentation; filtering on them gives Rail and Ferries.

From a little joining of files, there seem to be:
106 NSW TrainLink Train
204 NSW TrainLink Coach
401 Sydney Metro
700 Buses
712 More buses (possibly school buses, by the looks of things?)
714 Buses again (potentially rail replacement services?)
900 Light Rail (Sydney & Newcastle)
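These codes look like Google's extended GTFS route types (based on the European Hierarchical Vehicle Type list) rather than the basic 0-12 range. Assuming that is the scheme in use, which TfNSW would need to confirm, a quick tally of route_type values with labels might look like this; the labels cover only the codes listed above, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Assumption: codes of 100 and above follow Google's extended GTFS route types.
# Labels cover only the codes seen in this feed; confirm them with TfNSW.
ROUTE_TYPE_LABELS = {
    2: "Rail (basic GTFS)",
    4: "Ferry (basic GTFS)",
    106: "Regional Rail Service",
    204: "Regional Coach Service",
    401: "Metro Service",
    700: "Bus Service",
    712: "School Bus Service",
    714: "Rail Replacement Bus Service",
    900: "Tram Service",
}


def route_type_summary(routes_txt):
    """Map each route_type in routes.txt to (label, number of routes)."""
    counts = Counter(int(row["route_type"]) for row in csv.DictReader(routes_txt))
    return {rt: (ROUTE_TYPE_LABELS.get(rt, "unknown"), n) for rt, n in sorted(counts.items())}


# Made-up rows; with a real bundle, pass open("routes.txt", newline="", encoding="utf-8-sig").
sample = io.StringIO("route_id,route_type\nR1,2\nR2,714\nR3,714\nR4,999\n")
summary = route_type_summary(sample)
for route_type, (label, count) in summary.items():
    print(route_type, label, count)
```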

Is there anywhere I can check my assumptions and confirm what these codes mean?