Timetables Complete GTFS

Hi @alejandro.felman and @david.phillips,
This is somewhat related to "Feedback requested for enhancements to Complete GTFS", but I am posting here since I have mainly worked with the Complete GTFS posted on this page. I have also tested the sample of the enhanced GTFS, and the problem described below is present in that dataset too.
I believe there is a problem with the Sydney Trains Network dataset that raises concerns about data quality. I have described it in as much detail as possible so that you can replicate it.

  1. Merging the calendar file with the trips file and filtering shape_id (or route_id) to the T4 line, I can see that the T4 line is served by 59 distinct service_id values in this GTFS bundle.

  2. If I filter these 59 distinct service_id values to the regular Tuesday service (merge them with the calendar file and keep only rows where tuesday = 1), I get 23 service_id values.

  3. I merge the results of point 2 with the calendar_dates file and filter to date = “2019-04-23” (a Tuesday) to see which service_id values are actually scheduled to run on Tuesday 23 April 2019.

    In the last column of the result (exception_type), every value is 2, meaning that all 23 of these service_id values that serve the T4 are removed on that date, according to the GTFS bundle.

  4. I then look at the calendar_dates file for the same date to see whether other service_id values are added to serve the T4. The result indicates that some service_id values are added on 23 April 2019, but NONE of them actually serve the T4.
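For readers who want to check the logic of steps 1 to 4 without the original data, here is a minimal sketch in Python using only the standard library. It is not the R code mentioned in this post. The three small tables are made up for illustration (the service_id values `svc_a` to `svc_d` and the route_id values `T4` and `T1` are hypothetical), and it assumes the standard GTFS conventions: dates stored as YYYYMMDD, and exception_type 1 meaning a service is added and 2 meaning it is removed. Unlike the steps above, it also checks the start_date and end_date range in the calendar file.

```python
import csv
import io
from datetime import date

# Hypothetical miniature feed. Real data would be read from trips.txt,
# calendar.txt and calendar_dates.txt in the GTFS bundle.
TRIPS = """route_id,service_id,trip_id
T4,svc_a,t1
T4,svc_b,t2
T1,svc_c,t3
"""
CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
svc_a,1,1,1,1,1,0,0,20190401,20190430
svc_b,0,0,0,0,0,1,1,20190401,20190430
svc_c,1,1,1,1,1,0,0,20190401,20190430
"""
CALENDAR_DATES = """service_id,date,exception_type
svc_a,20190423,2
svc_d,20190423,1
"""

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def services_for_route(trips, route_id):
    """Step 1: distinct service_id values used by trips on one route."""
    return {r["service_id"] for r in trips if r["route_id"] == route_id}


def active_services(calendar, calendar_dates, day):
    """Steps 2 to 4: service_id values that run on `day`."""
    ymd = day.strftime("%Y%m%d")
    weekday = WEEKDAYS[day.weekday()]  # Monday is 0 in Python
    # Regular weekly pattern, limited to the service's validity period.
    # YYYYMMDD strings compare correctly as plain strings.
    active = {
        r["service_id"]
        for r in calendar
        if r[weekday] == "1" and r["start_date"] <= ymd <= r["end_date"]
    }
    # Apply the exceptions for this date: 1 adds, 2 removes.
    for r in calendar_dates:
        if r["date"] != ymd:
            continue
        if r["exception_type"] == "1":
            active.add(r["service_id"])
        elif r["exception_type"] == "2":
            active.discard(r["service_id"])
    return active


day = date(2019, 4, 23)
active = active_services(rows(CALENDAR), rows(CALENDAR_DATES), day)
t4_running = services_for_route(rows(TRIPS), "T4") & active
print(sorted(t4_running))
```

With this made-up data the T4 set comes out empty, which mirrors the situation described above: the only weekday T4 service is removed by an exception, and the service that is added does not belong to any T4 trip. In the real bundle the T4 line may be identified by a different route_id or by shape_id, so the filter in `services_for_route` would need adjusting.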

The GTFS data therefore tells me that no trains are scheduled to run on the T4 on Tuesday 23 April 2019. I conclude that we have a serious data quality issue.

I am sharing the R code in case you want to replicate this problem in R.

NOTE: If I look at a different date, such as 23 March 2019, I do see some trips scheduled to serve the T4 line. Also, if I look at the same date (23 April 2019) but a different network, such as Sydney Buses Network or TrainLink, I still see trips running on these networks. The problem described above also occurs on other dates (e.g. 25 April) and on other train lines operated by Sydney Trains.

Could you please investigate and confirm?
Chinh