Hey Agarg,
The method for downloading the entire dataset no longer seems to work. I have written a small Python script that downloads all the txt files from 2020 onwards. Sharing it in case anyone finds it useful.
import os

import requests
import tqdm

# range() visits every year from 2020 through 2025; a bare (2020, 2025) tuple would visit only those two years.
for year in tqdm.tqdm(range(2020, 2026), desc="Year"):
    for month in tqdm.tqdm(range(1, 13), desc="Month"):
        month_identifier = f"{month:02d}"
        folder = f"data/{year}-{month_identifier}"
        for day in tqdm.tqdm(range(1, 32), desc="Day"):
            # Date stamp used in the file name, e.g. 20200105.
            date_identifier = f"{year}{month_identifier}{day:02d}"
            url = f"https://tfnsw-prod-opendata-tpa.s3-ap-southeast-2.amazonaws.com/Opal_Patronage/{year}-{month_identifier}/Opal_Patronage_{date_identifier}.txt"
            r = requests.get(url, allow_redirects=True, headers={
                'Referer': 'https://opendata.transport.nsw.gov.au/'
            })
            # Skip days with no file (e.g. 31 April, or a gap in the feed) without abandoning the rest of the month.
            if r.status_code != 200:
                continue
            os.makedirs(folder, exist_ok=True)
            with open(f"{folder}/Opal_Patronage_{date_identifier}.txt", 'wb') as f:
                f.write(r.content)
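If it helps, here is a rough sketch of another way to build the URLs: walk real calendar dates with datetime, so the script never asks for days that do not exist (30 February, 31 April and so on). The names BASE_URL and opal_urls are my own, and the URL pattern is simply copied from the script above, so treat it as an assumption that may stop working if the bucket layout changes. It only builds the URLs; the download itself would still go through requests.get with the Referer header as before.

```python
from datetime import date, timedelta

# URL prefix copied from the script above (assumed to still be valid).
BASE_URL = "https://tfnsw-prod-opendata-tpa.s3-ap-southeast-2.amazonaws.com/Opal_Patronage"


def opal_urls(start, end):
    """Yield (day, url) for every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        # Folder is YYYY-MM, file stamp is YYYYMMDD.
        yield day, f"{BASE_URL}/{day:%Y-%m}/Opal_Patronage_{day:%Y%m%d}.txt"
        day += timedelta(days=1)


if __name__ == "__main__":
    # Print the URLs for the first three days of 2020.
    for day, url in opal_urls(date(2020, 1, 1), date(2020, 1, 3)):
        print(day.isoformat(), url)
```

Passing date.today() as the end date would cover everything up to now without hard-coding the last year.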
Extraction process for Opal Patronage data is changing
What is happening?
The ETL (Extract, Transform, Load) tool that delivers the daily Opal Patronage Data feeds to the Open Data Hub is changing. A new Python-based ETL tool has been implemented and will begin extracting the feeds to the Open Data Hub from Tuesday 23 April 2024.
What does this mean for me?
There is no impact expected for Opal Patronage Data users from this change as the new ETL process produces the same data files as before. You should not expect to notice anything different with accessing or using Opal Patronage Data following the change.
How can I get further support?
If you have any questions or encounter any issues with the Opal Patronage Data, please contact OpenDataProgram@transport.nsw.gov.au