Sydney Trains GTFS "for realtime" has invalid data


#1

“HM46.1433.101.16-20160514.I.1.40875826” stop 50 appears to depart Circular Quay 9 minutes before it arrives:

"HM46.1433.101.16-20160514.I.1.40875826","13:06:00","12:57:00","2000351","50","","1","1",""

This is with the Sydney Trains GTFS “for realtime” feed, dated 2016-04-25.

There are probably other similar errors, but OneBusAway stops at this one and goes no further. :frowning:
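For anyone who wants to look for the same problem themselves, here is a minimal sketch of the check in Python. It assumes `stop_times.txt` has a header row with the standard GTFS column names; the function names are just mine for illustration, and the sample row is the one quoted above.

```python
import csv
import io


def to_seconds(hms):
    """Convert a GTFS HH:MM:SS string to seconds from the start of the service day."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_reversed_stop_times(lines):
    """Yield (trip_id, stop_sequence, minutes_early) for rows that depart before they arrive."""
    for row in csv.DictReader(lines):
        # Skip rows with blank times; GTFS allows them at non-timepoint stops.
        if not row["arrival_time"] or not row["departure_time"]:
            continue
        arrival = to_seconds(row["arrival_time"])
        departure = to_seconds(row["departure_time"])
        if departure < arrival:
            yield row["trip_id"], row["stop_sequence"], (arrival - departure) // 60


# Assumption: the header follows the standard GTFS stop_times.txt column order.
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence,"
    "stop_headsign,pickup_type,drop_off_type,shape_dist_traveled\n"
    '"HM46.1433.101.16-20160514.I.1.40875826","13:06:00","12:57:00",'
    '"2000351","50","","1","1",""\n'
)
bad_rows = list(find_reversed_stop_times(sample))
print(bad_rows)
```

To run it on the real feed, pass an open `stop_times.txt` file handle instead of `sample`.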

I threw the files at FeedValidator which seems to pick up a lot of other errors – it’s reporting 6,596 errors and 36,225 warnings. My favourite is the Perth to Broken Hill service which takes 2 minutes (58 times the speed of sound)… :slight_smile:

Given some of the errors I think that the file has been hand fed through a desktop spreadsheet program at one point, destroying some of the information there.

The Ferry and Light Rail feeds don’t have major problems like the train one does.


#2

Thanks. We’ll have someone take a look.

Regards
Yvonne


#3

Yes, you’ll see a lot of errors in the Sydney Trains bundle. In general these should only be in trips that are not actually in service. For example, “HM46.1433.101.16-20160514.I.1.40875826” is a non-revenue service and shouldn’t be displayed to customers. Trips that are actually in service should not be affected.

As for the Perth service, Sydney Trains can’t properly schedule services outside of their nodal geography, so these times are essentially just dummy values. Something to do with trips that run over 24 hours I believe…

The Sydney Trains bundle is published via a slightly different process to the other modes. Errors in the validation reports for other modes will actually block publication, so you’ll never see them. Not so for Sydney Trains.


#4

Thanks for looking into this.

As I understand, non-revenue trips (such as moving a train back to the yard outside of peak or at the end of day) do need to be included in the feed. It’s useful for accessibility, because then you can tell a user “the next train does not stop at this platform”, so that they don’t stand up when they hear the train coming.

However, it is a nonsensical entry. I suspect that the “arrival_time” and “departure_time” columns are probably reversed. I could run this through a script to clean it up, but it gets to the point where the underlying feed bug should be fixed.
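The clean-up script I have in mind would be roughly the following sketch. It rests on my guess that the two columns are simply swapped on the affected rows, which nobody has confirmed, and `swap_reversed_times` is a made-up name.

```python
def to_seconds(hms):
    """Convert a GTFS HH:MM:SS string to seconds from the start of the service day."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def swap_reversed_times(rows):
    """Return copies of the rows, swapping arrival and departure where they are out of order."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy, so the caller's data is left untouched
        arrival, departure = row["arrival_time"], row["departure_time"]
        # Assumption: a departure earlier than the arrival means the columns were swapped.
        if arrival and departure and to_seconds(departure) < to_seconds(arrival):
            row["arrival_time"], row["departure_time"] = departure, arrival
        fixed.append(row)
    return fixed


rows = [{
    "trip_id": "HM46.1433.101.16-20160514.I.1.40875826",
    "arrival_time": "13:06:00",
    "departure_time": "12:57:00",
    "stop_sequence": "50",
}]
cleaned = swap_reversed_times(rows)
print(cleaned[0]["arrival_time"], cleaned[0]["departure_time"])
```

This only hides the symptom on the consumer side, which is why I would rather see it fixed at the source.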

Regarding the Perth-Broken Hill service, it is possible to represent times on trips that cross midnight. A service departing at Monday 13:00 would be listed as “13:00:00”, and then if it had a stop on Tuesday at 14:00, it would be listed as “38:00:00”.
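As a quick sketch of that arithmetic in Python (the helper names are hypothetical, and I am treating the hour count as starting from the beginning of the trip's service day):

```python
def gtfs_time(day_offset, hour, minute=0, second=0):
    """Format a time as GTFS HH:MM:SS, counting hours from the trip's first service day."""
    return f"{day_offset * 24 + hour:02d}:{minute:02d}:{second:02d}"


def to_seconds(hms):
    """Convert a GTFS HH:MM:SS string (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


monday_departure = gtfs_time(0, 13)  # same day: "13:00:00"
tuesday_stop = gtfs_time(1, 14)      # next day: 24 + 14 = "38:00:00"
elapsed_hours = (to_seconds(tuesday_stop) - to_seconds(monday_departure)) / 3600
print(monday_departure, tuesday_stop, elapsed_hours)
```

Because the hours just keep counting up, the times stay in increasing order along the trip and a consumer can subtract them directly.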


#5

The trip that terminates at Circular Quay is blocked to a trip that actually overlaps from Central (it shares the same stopping pattern and times). From memory, there is some slightly obscure logic that dictates how these are timetabled; I will try to get some more info for you. Agreed that having a departure time before the arrival time is generally not great.

And yes, the GTFS spec deals with 24-hour-plus trips just fine; unfortunately the issue is back in the Sydney Trains scheduling system. As a result, I would recommend not displaying the trips to Perth at all.


#6

For the Indian Pacific, though, there is a publicly listed timetable. Is there a way to give us that data? That would be better than a 2-minute trip, even if you can’t give us real-time. :stuck_out_tongue: