I’d like to find data that tells me, for a given service/train, the ‘capacity’ of said train. By ‘capacity’, I mean the number of seats.

I am using the Sydney GTFS static download which provides details of trips, services etc.

I have looked at the Train Occupancy - Nov 2016 to Feb 2017 datasets, which look like they go some of the way there, though I’m not sure I’m interpreting the ‘occupancy status’ and ‘occupancy range’ data items correctly. I have read the explanatory notes and am still uncertain.

Are they both indicators of actual occupancy? Or, is one actual occupancy and the other potential capacity?

I would like to get an estimate of the potential ‘capacity’ of a given train, which is essentially determined by the number of carriages and the carriage type. I’m confident of inferring carriage type from the abovementioned data but not the number.

Any thoughts appreciated.



Hi K

This may be useful to you: Sydney and Intercity train fleet

We usually express occupancy relative to a capacity value (as in potential capacity).

Hi Yvonne,

Thanks for your note and the link.

Apologies but I’m still uncertain as to how I should interpret the fields I mentioned from the Train Occupancy Data Set. The two fields are Occupancy Status and Occupancy Range. Are they an indicator of the same thing or different things?

Occupancy Status takes on values such as, among others, MANY_SEATS_AVAILABLE, FEW_SEATS_AVAILABLE and STANDING_ROOM_ONLY.

Occupancy Range takes on values such as, among others, Low: 0-399, Medium: 400-799 and High: 800+.

Entries in the Train Occupancy Data Set have many different combinations of the above, for example:

  • MANY_SEATS_AVAILABLE, Medium: 400-799
  • FEW_SEATS_AVAILABLE, Medium: 400-799

How should these pairings be interpreted?

Does the combination MANY_SEATS_AVAILABLE, Low: 0-399 mean that there are expected to be many seats available on a train that could seat 0-399 passengers?

Any guidance would be appreciated.



Both indicate the same thing (i.e. the number of passengers on board), but the bucket sizes are different. The Occupancy Range field is there to cater for scenarios where the train set used to operate the service isn’t known.

Consider an A set (Waratah) train with 894 seats available (see Dataset - TfNSW Open Data Hub and Developer Portal):

“FEW_SEATS_AVAILABLE, Medium: 400-799” would indicate that:

  • Occupancy Status: between 581 and 939 passengers on board (between 65% and 105% of 894 seats)
  • Occupancy Range: 400 - 799 passengers on board

You can find the intersection of the two to derive a tighter estimate, i.e. between 581 and 799 passengers on board.
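For anyone wanting to repeat this arithmetic across the dataset, here is a minimal Python sketch of the intersection step. It uses only the figures from the example above (894 seats, the 65% to 105% band for FEW_SEATS_AVAILABLE, and the Medium: 400-799 range). The function and constant names are made up for illustration, and the percentage bands for the other Occupancy Status values are not given in this thread, so they are left out.

```python
def status_bounds(seats, low_frac, high_frac):
    """Passenger-count bounds implied by an Occupancy Status band,
    expressed as fractions of the seated capacity."""
    return round(seats * low_frac), round(seats * high_frac)


def intersect(a, b):
    """Overlap of two inclusive (low, high) ranges, or None if they do not overlap."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


# Figures taken from the worked example in this reply.
SEATS_A_SET = 894                   # A set (Waratah) seats
FEW_SEATS_AVAILABLE = (0.65, 1.05)  # 65% to 105% of seats
MEDIUM_RANGE = (400, 799)           # Occupancy Range "Medium: 400-799"

status = status_bounds(SEATS_A_SET, *FEW_SEATS_AVAILABLE)
estimate = intersect(status, MEDIUM_RANGE)

print(status)    # (581, 939)
print(estimate)  # (581, 799)
```

If the train set is not known, the seat count is unknown too, and only the Occupancy Range bounds can be used.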

Thanks … that helps.