Conflicting information in two TfNSW GTFS feeds


#1

Hello,

Thank you for providing transport data and hosting the forum!

We are attempting to use two TfNSW GTFS datasets at the same time, as part of one logical merged dataset. These are Sydney Trains and NSW Trains.

The two datasets have data that seems inconsistent in at least one case.

As of 2017-02-22, data for Sydney Trains (https://api.transport.nsw.gov.au/v1/gtfs/schedule/sydneytrains) contains the following two stops:
stop_id: 2577146, stop_ll: -34.589305,150.597767, stop_name: “Robertson Bus”, location_type: 1, parent_station: null, wheelchair_boarding: 0
stop_id: 2577143, stop_ll: -34.589443,150.597539, stop_name: “Robertson Bus 1”, location_type: 0, parent_station: 2577146, wheelchair_boarding: 0

As of 2017-02-22, data for NSW Trains (https://api.transport.nsw.gov.au/v1/gtfs/schedule/nswtrains) contains the following stop:
stop_id: 2577146, stop_ll: -34.58930722,150.5978025, stop_name: “Illawarra Hwy Before Main St”, location_type: 0, parent_station: null, wheelchair_boarding: 1
and does not contain stop_id 2577143

Stop 2577143 is only served by trips for Sydney Trains routes. Stop 2577146 is only served by trips for NSW Trains routes.

The information for stop 2577146 differs between the two feeds. The different location_type is a particular problem for our use, and the different wheelchair_boarding could be an issue for accessibility-oriented apps.
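
For anyone wanting to reproduce the comparison, here is a minimal sketch of how such conflicts could be found. It assumes each feed's stops.txt has been extracted and uses the standard GTFS column names. The sample rows are just the three stops quoted above, and the helper names `read_stops` and `find_conflicts` are made up for illustration. Coordinates are deliberately left out of the comparison, since small differences there are expected.

```python
import csv
import io

# Attributes whose disagreement matters when two feeds are merged.
COMPARED_FIELDS = ("stop_name", "location_type", "parent_station", "wheelchair_boarding")


def read_stops(stops_txt):
    """Parse the text of a stops.txt file into {stop_id: row}."""
    return {row["stop_id"]: row for row in csv.DictReader(io.StringIO(stops_txt))}


def find_conflicts(stops_a, stops_b, fields=COMPARED_FIELDS):
    """Return {stop_id: {field: (value_a, value_b)}} for shared stop_ids that differ."""
    conflicts = {}
    for stop_id in sorted(stops_a.keys() & stops_b.keys()):
        diffs = {}
        for field in fields:
            value_a = (stops_a[stop_id].get(field) or "").strip()
            value_b = (stops_b[stop_id].get(field) or "").strip()
            if value_a != value_b:
                diffs[field] = (value_a, value_b)
        if diffs:
            conflicts[stop_id] = diffs
    return conflicts


# Sample rows taken from the stops quoted in this post (column layout assumed).
SYDNEY_TRAINS = """stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station,wheelchair_boarding
2577146,Robertson Bus,-34.589305,150.597767,1,,0
2577143,Robertson Bus 1,-34.589443,150.597539,0,2577146,0
"""
NSW_TRAINS = """stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station,wheelchair_boarding
2577146,Illawarra Hwy Before Main St,-34.58930722,150.5978025,0,,1
"""

conflicts = find_conflicts(read_stops(SYDNEY_TRAINS), read_stops(NSW_TRAINS))
for stop_id, diffs in conflicts.items():
    print(stop_id, diffs)
```

On the sample rows this reports only stop 2577146, with differing stop_name, location_type and wheelchair_boarding.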

As noted in the GTFS spec and in section 9.3 of https://opendata.transport.nsw.gov.au/sites/default/files/TfNSW_GTFS_release_notes.pdf, location_type should only be 1 for stops that are Parent Stations. Per the GTFS spec, Parent Stations cannot have service stopping at them (service must stop at a child stop). Taken separately, both datasets fulfill these requirements, but using both at the same time results in ambiguity.
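
To illustrate the ambiguity, here is a rough sketch of the check that fails once the feeds are combined. It assumes a naive merge in which the Sydney Trains record for 2577146 is kept, and that the stop_ids referenced by stop_times.txt in both feeds have been collected into one list. The function name `stations_with_service` is hypothetical.

```python
def stations_with_service(stops, served_stop_ids):
    """Return stop_ids flagged as stations (location_type 1) that trips also stop at."""
    stations = {sid for sid, row in stops.items() if row.get("location_type") == "1"}
    return sorted(stations & set(served_stop_ids))


# Merged stops, assuming the Sydney Trains version of 2577146 is the one kept.
merged_stops = {
    "2577146": {"location_type": "1", "parent_station": ""},
    "2577143": {"location_type": "0", "parent_station": "2577146"},
}

# Sydney Trains trips serve 2577143; NSW Trains trips serve 2577146.
served = ["2577143", "2577146"]

offending = stations_with_service(merged_stops, served)
print(offending)
```

Each feed passes this check on its own; only the combined data flags 2577146.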

As far as I can tell, the document https://opendata.transport.nsw.gov.au/sites/default/files/TfNSW_Realtime_Bus_Technical_Doc.pdf in section 4.8, field stop_id, says that stop_ids should be unique across TfNSW datasets. I would then expect a given stop_id to correspond to, at least, the same location_type (stop or station) in all TfNSW GTFS feeds.

Questions:

  1. Are stop_ids indeed intended to be unique across all GTFS and GTFSRT datasets provided by TfNSW at api.transport.nsw.gov.au?
  2. If the stop_ids are to be unique, are stops with the same stop_id intended to have logically consistent properties in different GTFS datasets?
  3. If the answers to 1 and 2 are yes, can the datasets for Sydney Trains and/or NSW Trains be corrected? (Changing the ID of the parent_station in the Sydney Trains dataset would seem to be the easiest way, but I don’t know what the impact might be internally.)
  4. If the datasets will be corrected, can you provide a rough estimate of when this might happen?
  5. If the stop_ids are NOT intended to be unique, do you have a recommended way to handle different information for stops in different datasets, other than dealing with different datasets completely separately?

Thanks,
–Jarek


#2

Hi Jarek, some good points here and it gets a little tricky…

In answer to your questions:

  1. Yes, but there may be some conflicting attributes as you have noted
  2. Yes, as above
  3. Yes, but as you have hinted, this change has an internal impact on Sydney Trains and isn’t as easy as it may seem on the surface
  4. I wouldn’t expect any changes to happen soon, but I will see what can be done to address inconsistencies
  5. Whilst they are intended to be unique, there are a few options. You could handle the datasets separately as you mention, or merge and give preference to the information in the NSW Trains bundle. If merging, make sure to reference the overlapping trips in both bundles identified here: https://opendata.transport.nsw.gov.au/reference-tables
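
As a rough illustration of the merge option in point 5, the sketch below lets the NSW Trains record win wherever both bundles define the same stop_id. It assumes stops have already been loaded into dictionaries keyed by stop_id, and the helper names are hypothetical. One side effect worth checking for: after such a merge, a Sydney Trains child stop can end up pointing at a parent_station that is no longer flagged as a station.

```python
def merge_stops(preferred, fallback):
    """Combine two {stop_id: row} maps; rows from `preferred` win on shared stop_ids."""
    merged = dict(fallback)
    merged.update(preferred)
    return merged


def children_of_non_stations(stops):
    """List (stop_id, parent_station) pairs whose parent is missing or not location_type 1."""
    problems = []
    for stop_id, row in sorted(stops.items()):
        parent = row.get("parent_station") or ""
        if parent and stops.get(parent, {}).get("location_type") != "1":
            problems.append((stop_id, parent))
    return problems


# Rows based on the Robertson example from the first post.
sydney_trains = {
    "2577146": {"stop_name": "Robertson Bus", "location_type": "1", "parent_station": ""},
    "2577143": {"stop_name": "Robertson Bus 1", "location_type": "0", "parent_station": "2577146"},
}
nsw_trains = {
    "2577146": {"stop_name": "Illawarra Hwy Before Main St", "location_type": "0", "parent_station": ""},
}

merged = merge_stops(preferred=nsw_trains, fallback=sydney_trains)
needs_review = children_of_non_stations(merged)
print(merged["2577146"]["stop_name"], needs_review)
```

Stops reported in `needs_review` would need their parent_station cleared or remapped by whoever does the merge; which of those is appropriate is not something this sketch decides.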

Some general info:
Sydney Trains has a nodal geography set up to handle train services specifically, and there are a few makeshift workarounds to handle bus services. A parent station needs to be applied to each “platform”. In the example for Robertson provided, 2577146 is being applied as the parent station, even though it is also an actual bus stop.

Also, by default all coach stops in the NSW Trains GTFS bundle are flagged as wheelchair accessible. This is due to downstream data dependencies, and in general wheelchair accessibility (WCA) should be taken from the trip level rather than from the stop.
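
A small sketch of what reading WCA at trip level could look like. It assumes "trip level" corresponds to the standard GTFS wheelchair_accessible field in trips.txt (0 = no information, 1 = accessible, 2 = not accessible). The trip_ids and the helper name are invented for illustration.

```python
def stops_with_accessible_trips(trip_accessibility, stop_times):
    """Return stop_ids served by at least one trip with wheelchair_accessible == "1".

    trip_accessibility: {trip_id: wheelchair_accessible value from trips.txt}
    stop_times: iterable of (trip_id, stop_id) pairs from stop_times.txt
    """
    accessible = set()
    for trip_id, stop_id in stop_times:
        if trip_accessibility.get(trip_id) == "1":
            accessible.add(stop_id)
    return accessible


# Hypothetical trips: one accessible coach, one with no accessibility information.
trip_accessibility = {"coach_a": "1", "coach_b": "0"}
stop_times = [("coach_a", "2577146"), ("coach_b", "2577146"), ("coach_b", "9999999")]

accessible_stops = stops_with_accessible_trips(trip_accessibility, stop_times)
print(sorted(accessible_stops))
```

Here the stop-level wheelchair_boarding flag is ignored entirely, so a stop only counts as accessible if some trip serving it is marked accessible.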

Hope that was of some use!


#3

Hello,

Thank you for the great response, and apologies for my very slow follow-up.

Your post answers my questions comprehensively. Just one follow-up, if possible: Would you recommend always giving preference to data coming from NSW Trains where it conflicts with Sydney Trains data, or are there some cases/places where Sydney Trains data is better, e.g. within “city Sydney” or “metropolitan Sydney”?

Thanks again,
–Jarek