Are the privacy restrictions necessary?


#1

In the Ferry Opal data at least, no counts are captured if there are fewer than 18 taps on or 18 taps off in a 15-minute period. This makes the data almost unusable except for Circular Quay or Manly. I can’t see a justification for excluding counts where the number is less than 18, as no personal information is captured (income, religious affiliation, etc.). Privacy restrictions are relevant to ABS data, for example, because personal information may be collected, but this is not the case with Opal taps.
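To illustrate the effect I mean, here is a minimal Python sketch of a threshold-based suppression rule. The function name, the time labels and the counts are all made up for illustration; only the threshold of 18 comes from the published data, and the actual treatment applied to the Opal files may work differently.

```python
THRESHOLD = 18  # from the data notes: counts below 18 are not captured

def suppress_small_counts(counts, threshold=THRESHOLD):
    """Keep only the 15-minute bins whose tap count meets the threshold."""
    return {period: n for period, n in counts.items() if n >= threshold}

# Hypothetical tap-on counts per 15-minute bin at a low-demand wharf
wharf_counts = {"07:00": 12, "07:15": 17, "07:30": 18, "07:45": 5}

published = suppress_small_counts(wharf_counts)
print(published)  # {'07:30': 18}
```

Three of the four bins vanish, which is why a low-demand wharf can end up with almost no rows at all.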


#2

Thanks @RobinSandell for raising this. It’s an area where we tried to balance privacy against utility.

We had the data treated for privacy reasons. It’s probably better explained in the words of the data scientists:

Datasets released via a differentially private algorithm ensure that all known classes of privacy attacks (such as re-identification) have little chance of success. In particular, and very importantly, the mathematical properties of the algorithm and the resulting dataset provide the same protection against new privacy attacks that may be developed in the future, ensuring they too have little chance of success.

As you can appreciate, we treat our customers’ privacy very seriously and have done our due diligence to assure privacy whilst still providing some utility from the data.


#3

Thanks for the quick response, Yvonne. I’m not convinced the balance has been struck appropriately between privacy and utility. Some low-demand wharves record no counts at all in one week, as there is no 15-minute period in which more than 17 passengers tap (Birchgrove, Darling Point and Garden Island). Others have only a very small number of 15-minute periods where taps are counted (e.g. Greenwich (2), Kirribilli (8), Kissing Point (11)). I really can’t envisage how there could be a privacy breach when this particular CSV file has no sensitive personal information, not even the Opal card type.


#4

Hey @RobinSandell – I will need to leave it to those who know about data science and privacy to explain :slight_smile:
Have a good weekend.


#5

Hi Robin, while the “time with location” CSV doesn’t show taps for certain stations, the “location without time” file does show taps, albeit aggregated over 24 hours rather than 15 minutes. Does this help you?
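As a rough sketch of why the two files can differ, the Python below compares 15-minute bins against a 24-hour total for one wharf. The counts and names are hypothetical, and it assumes the daily figure is summed from the underlying taps before any small-count treatment, which has not been confirmed in this thread.

```python
THRESHOLD = 18  # the 15-minute cut-off mentioned earlier in the thread

def daily_total(counts_by_period):
    """Sum every 15-minute bin into a single 24-hour figure."""
    return sum(counts_by_period.values())

# Hypothetical raw tap-on counts for one wharf on one day
raw = {"07:00": 12, "07:15": 17, "07:30": 9, "17:00": 15}

# No single bin reaches the cut-off, so the time-based view is empty
visible_bins = {p: n for p, n in raw.items() if n >= THRESHOLD}
print(visible_bins)      # {}
print(daily_total(raw))  # 53
```

Under those assumptions the wharf has no rows in the time-based view but still appears in the daily one.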