Invalid GTFS Data

Hate to be a pain, but your GTFS files still publish logically inconsistent data.

You are using different agency_ids for the same agency_name.

It’s not possible to make software work when identifiers don’t correctly describe what they represent in the real world.

How can the same agency have more than one unique agency_id?

Quite simply: change each agency_name slightly so that it describes only the part of the complete agency that its agency_id covers.

For example,
"2439","State Transit Sydney","","Australia/Sydney","EN","131500"
"2440","State Transit Sydney","","Australia/Sydney","EN","131500"
"2441","State Transit Sydney","","Australia/Sydney","EN","131500"
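To see the scale of the problem, agency.txt can be grouped by agency_name, flagging any name that maps to more than one agency_id. A minimal Python sketch: the function name is illustrative, and the header row in the sample assumes the usual agency.txt column order for the rows above.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_names(agency_csv: str) -> dict[str, list[str]]:
    """Return {agency_name: [agency_id, ...]} for names shared by several agency_ids."""
    ids_by_name = defaultdict(list)
    for row in csv.DictReader(io.StringIO(agency_csv)):
        # Strip whitespace so "Name" and "Name " are treated as the same agency.
        ids_by_name[row["agency_name"].strip()].append(row["agency_id"])
    return {name: ids for name, ids in ids_by_name.items() if len(set(ids)) > 1}


sample = (
    "agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone\n"
    '"2439","State Transit Sydney","","Australia/Sydney","EN","131500"\n'
    '"2440","State Transit Sydney","","Australia/Sydney","EN","131500"\n'
    '"2441","State Transit Sydney","","Australia/Sydney","EN","131500"\n'
)
print(find_duplicate_names(sample))
# {'State Transit Sydney': ['2439', '2440', '2441']}
```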

Simply adding a parenthetical qualifier to each agency_name would fix the problem. The data would become logically consistent, and software designed for GTFS could rely on it making sense.
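As a sketch of that fix (a hypothetical helper, not anything the publisher actually runs), each shared name gets a parenthetical qualifier. The labels dict is an assumed lookup from agency_id to a human-readable qualifier; where no label is known, the agency_id itself is used so the names still end up unique.

```python
from collections import Counter


def disambiguate(agencies: list[dict[str, str]], labels: dict[str, str]) -> list[dict[str, str]]:
    """Append "(qualifier)" to every agency_name that more than one row shares.

    labels maps agency_id -> qualifier (e.g. a region); ids without a label
    fall back to the agency_id so each resulting name is still distinct.
    """
    counts = Counter(a["agency_name"] for a in agencies)
    result = []
    for agency in agencies:
        agency = dict(agency)  # copy so the caller's rows are not mutated
        if counts[agency["agency_name"]] > 1:
            qualifier = labels.get(agency["agency_id"], agency["agency_id"])
            agency["agency_name"] = f"{agency['agency_name']} ({qualifier})"
        result.append(agency)
    return result


rows = [
    {"agency_id": "2439", "agency_name": "State Transit Sydney"},
    {"agency_id": "2440", "agency_name": "State Transit Sydney"},
]
for row in disambiguate(rows, {"2439": "Region 7", "2440": "Region 8"}):
    print(row["agency_name"])
# State Transit Sydney (Region 7)
# State Transit Sydney (Region 8)
```

Names that are already unique pass through untouched, so the change only affects the ambiguous rows.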

Please come to your senses and start making sense.


Hi @Webmaster, the different agency IDs correspond to different bus service regions. You need to look at the Reference Tables dataset to see how each agency is defined in the GTFS feeds, e.g., 2439 = region 7, 2440 = region 8, etc.

This is outlined in the notes on the API explorer and also in our documentation page.



In other words, I have to coerce the data on my end.


Then, if I happen to notice a reversion to good behavior, I can seek out and undo the coercion and keep things working. Maybe.
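A minimal sketch of coercion that undoes itself, assuming a hypothetical consumer-side override table keyed by agency_id: each override applies only while the feed still uses the same agency_name for several agency_ids, so once the publisher fixes the names, the overrides stop firing without anyone having to seek them out. The region labels follow the 2439 = region 7, 2440 = region 8 mapping from the reply above.

```python
from collections import Counter

# Hypothetical consumer-side overrides: agency_id -> name to display instead.
OVERRIDES = {
    "2439": "State Transit Sydney (Region 7)",
    "2440": "State Transit Sydney (Region 8)",
}


def display_names(agencies: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Map agency_id -> display name, overriding only names the feed still duplicates.

    agencies maps agency_id -> agency_name as published in the feed.
    """
    counts = Counter(agencies.values())
    return {
        agency_id: overrides[agency_id]
        if counts[name] > 1 and agency_id in overrides
        else name
        for agency_id, name in agencies.items()
    }
```

The design choice is that the override is conditional on the defect, not just on the agency_id, so a corrected feed wins automatically.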

Perhaps reorienting has merit. Let's not lose sight of the forest for the trees.


Just some food for thought for the TODO list.

Thank you for the reply.

Note: "2439 = region 7, 2440 = region 8, etc." and similar mappings do not appear in the Reference Tables dataset.

Is the numeric suffix on SMBSC and OSMBSC the region number, so that the following is correct?

'2439' => 'State Transit Sydney (Region 7)',
'2440' => 'State Transit Sydney (Region 8)',
'2441' => 'State Transit Sydney (Region 9)',

Does this hold for the other duplicate agency_names?

Hi @Webmaster, the ‘Overlapping agencies in GTFS feeds’ dataset shows this.

SMBSC = “Sydney Metropolitan Bus Service Contracts”
OSMBSC = “Outer Sydney Metropolitan Bus Service Contracts”

Like you said, each number after those corresponds to a region.
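If the contract codes in that dataset are written like SMBSC 7 or OSMBSC7 (the exact spelling is an assumption here; check it against the dataset), the region number can be pulled out with a regular expression. A minimal sketch with a hypothetical parse_contract helper:

```python
import re
from typing import Optional, Tuple

# Assumed format: contract prefix (SMBSC or OSMBSC), optional space, region number.
CONTRACT_RE = re.compile(r"^(O?SMBSC)\s*(\d+)$")

PREFIXES = {
    "SMBSC": "Sydney Metropolitan Bus Service Contracts",
    "OSMBSC": "Outer Sydney Metropolitan Bus Service Contracts",
}


def parse_contract(code: str) -> Optional[Tuple[str, int]]:
    """Return (contract family, region number), or None if the code does not match."""
    match = CONTRACT_RE.match(code.strip())
    if match is None:
        return None
    return PREFIXES[match.group(1)], int(match.group(2))
```

Codes that do not follow the assumed pattern return None rather than raising, so unexpected values are easy to spot.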

The same applies to all the duplicate agency names; there may be more than one region per agency.


Are the region numbers familiar to the public or are they internal?

For example, would the public know what “Region 7” means in “State Transit Sydney (Region 7)”?

Hi @Webmaster, yes this is all publicly available information.

You can even find this on Wikipedia:


Hello again,

Can you say what the differences among the duplicates in the non-realtime GTFS bundle represent in the real world?

The region pattern doesn’t seem to be holding.

For example:

Premier Motor Service, agency_id B025
Premier Motor Service, agency_id B083

Forest Coach Lines (No Real-Time), agency_id B052
Forest Coach Lines (No Real-Time), agency_id B057
Forest Coach Lines (No Real-Time) , agency_id B005

train replacement bus operators, agency_id 700
train replacement bus operators, agency_id 701

Sapphire Coast Buslines, agency_id B034
Sapphire Coast Buslines, agency_id B033

Do these agency_ids reflect distinctions that the public would recognize?


I would request that the agency_id and agency_name fields stay as is.

The agency name is displayed to end users, and they don’t care what region number a route is in. If you want to merge multiple regions using the same operator into one agency, that’s ok with me as a developer, but that sounds like a waste of my tax dollars…

From the spec:

agency_id (Optional): The agency_id field is an ID that uniquely identifies a transit agency. A transit feed may represent data from more than one agency. The agency_id is dataset unique. This field is optional for transit feeds that only contain data for a single agency.

agency_name (Required): The agency_name field contains the full name of the transit agency. Google Maps will display this name.
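Those two rules can be checked mechanically. A minimal sketch with a hypothetical check_agency_file helper; treating agency_id as needed whenever the file lists more than one agency is a reading of the last sentence of the agency_id entry. By the quoted text, neither rule requires agency_name to be unique, so rows like the State Transit Sydney ones pass.

```python
import csv
import io


def check_agency_file(agency_csv: str) -> list[str]:
    """Check agency.txt rows against the two quoted rules; return the problems found."""
    rows = list(csv.DictReader(io.StringIO(agency_csv)))
    problems = []
    seen = set()
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        agency_id = (row.get("agency_id") or "").strip()
        if not (row.get("agency_name") or "").strip():
            problems.append(f"line {line_no}: agency_name is required")
        if len(rows) > 1 and not agency_id:
            problems.append(f"line {line_no}: agency_id needed when the feed has several agencies")
        if agency_id and agency_id in seen:
            problems.append(f"line {line_no}: agency_id {agency_id!r} is not unique")
        seen.add(agency_id)
    return problems
```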