A quantitative approach to Data Protection Impact Assessment
Gergely Biczók (Associate Professor, CrySys Lab)
Many are familiar with the expression “data is the new oil.” Ever-increasing amounts of information are produced, stored, processed, and transferred, enabling products and services across all industries. A substantial amount of this information can be used to identify an individual, and is known as Personally Identifiable Information, or PII. Processing PII comes with great risks to individuals’ rights, and can bring about real harm if not handled carefully. As such, data protection laws have been introduced around the globe to safeguard people from these kinds of dangers. The cornerstone of the EU’s data protection framework is the General Data Protection Regulation, or GDPR.
Applicable since May 25, 2018, the GDPR introduced many new concepts compared to its predecessor, for instance, the principle of accountability, the concepts of data protection by design and by default, as well as Data Protection Impact Assessments, or DPIAs.
Data Protection Impact Assessment
As mandated by Art. 35 GDPR (2018): “Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data.”
The list of processing activities that require a DPIA includes, but isn’t limited to, systematic and extensive profiling with significant effects, large-scale processing of sensitive data, and large-scale systematic monitoring of a publicly accessible area. However, the DPIA template is far from set in stone and has only a minimum set of contents: a systematic description of the processing operations, an assessment of the necessity and proportionality of the processing operations in relation to their purposes, an assessment of the risks to the rights and freedoms of data subjects, and the measures envisaged to address the risks. Even though the risk assessment requirement may entail some sort of scoring of the level of a given risk, the current DPIA is still a largely qualitative assessment and can lead to potentially arbitrary outcomes.
Let’s make it quantitative
DPIAs hold the potential for being the most useful new idea to come from the GDPR. A properly executed DPIA can be a major asset for every stakeholder: data subjects have enhanced protection, data controllers can develop and demonstrate compliance, and data protection authorities can more easily get the information they need. We argue that in certain cases, adding a quantitative element to a DPIA can mitigate its subjectivity and encourage informed decision-making. If we were equipped with numbers regarding the estimated impact of risks associated with a particular type of data processing, the following benefits could result:
- Data controllers could evaluate risk profiles of alternative solutions during system design/implementation. Furthermore, since an updated review of DPIA is required whenever the risk profile of a given processing type changes, controllers could more objectively assess the change in risk.
- When comparing DPIAs performed by different controllers, on similar processing instances, competent authorities could contrast their levels of accuracy against one another. Likewise, a controller could contest a negative decision taken by authorities, by referring to a similar DPIA from a different controller.
- Hard acceptance criteria could be set with regard to IT/process development and privacy by design and by default.
- Product owners and system designers could have a tool to help them convince upper management to invest in the development of privacy-enhancing technologies (Return on Privacy Investment, RoPI).
It is also important to note the limitations of using a quantitative method:
- We explicitly advocate for an integrated qualitative-plus-quantitative DPIA; the numbers are meant to complement the qualitative assessment, not replace it.
- In practice, only processing activities that yield particular risk profiles owing to their scale or scope should be characterized via a quantitative assessment. This helps SMEs keep the cost of compliance more affordable.
While the approach itself is technology-neutral, its implementation must inherently and necessarily be completely sector and technology-specific, given the vast array of different potential processing instances.
Based on these benefits and limitations, one could argue that software application development is the low-hanging fruit for a quantitative Data Protection Impact Assessment. Apps are everywhere, app families use the same platforms and access similar personal data categories through the same permissions, and only a few platforms exist (think mobile, social, cloud, and fintech categories).
Thought experiment: Cambridge Analytica
As several major news outlets reported in 2015, and again in 2018, approximately 50 million Facebook profiles were harvested by Aleksandr Kogan’s “thisisyourdigitallife” app through his company, Global Science Research, in collaboration with Cambridge Analytica (CA). This data was used to create a detailed psychological profile of every affected person, which in turn enabled CA to target them with personalized political ads, potentially impacting the outcome of the 2016 US presidential election.
From a technical standpoint, Facebook provides an API and a set of permissions that allow third-party apps to gain direct access to user information and transfer it to app providers (one “user” permission per attribute in the deprecated API v1.0). Up until 2014, a user’s profile information could also be acquired if one of their friends installed an app, a textbook case of interdependent privacy on Facebook. In short, when a friend installed an app, the app could request and be granted access to the user’s profile information, such as birthday, current location, and history (one “friend” permission per attribute in the deprecated API v1.0). The user herself would not be aware that a friend had installed an app collecting her information, so this was considered collateral damage. By default, unless a user manually unchecked the appropriate boxes in a hard-to-find menu, almost all of their profile information could be accessed by their friends’ applications. Many apps made use of this mechanism; let us now consider the imaginary average App X.
Suppose App X v1.0 directly asks a user installing it for five personal attributes (corresponding to five distinct “user” permissions). Now suppose that the company developing App X hired a new programmer, who was a real wiz with the Facebook API. He went back to the drawing board and added requests for three “friend” permissions. This way, App X v2.0 collected five personal data items directly from the installing user and three personal attributes indirectly from each of that user’s friends (three distinct “friend” permissions). Note that an average user had around 200 friends. A quick calculation yields the number of personal data items gained per app install: 1 × 5 + 200 × 3 = 605. Taking this further, the developer company made an adoption forecast for App X that showed a promising 100,000 active users in a couple of months. So, in a short amount of time, App X v2.0 was estimated to collect 60,500,000 personal data items as opposed to v1.0’s 500,000 (the sheer amount of personal data items collected grew 121-fold).
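The per-install and forecast figures above can be reproduced with a few lines of Python. This is only a sketch of the thought experiment: the constants are the illustrative numbers from the text, and the function and variable names are our own, not part of any Facebook API.

```python
# Illustrative figures from the App X thought experiment (not measured data).
USER_PERMISSIONS = 5        # attributes requested from the installing user
FRIEND_PERMISSIONS = 3      # attributes requested about each friend (v2.0 only)
AVG_FRIENDS = 200           # assumed average number of friends per user
FORECAST_INSTALLS = 100_000 # adoption forecast for App X


def items_per_install(user_perms: int, friend_perms: int, avg_friends: int) -> int:
    """Personal data items gained per install: one user plus all of their friends."""
    return 1 * user_perms + avg_friends * friend_perms


v1_per_install = items_per_install(USER_PERMISSIONS, 0, AVG_FRIENDS)
v2_per_install = items_per_install(USER_PERMISSIONS, FRIEND_PERMISSIONS, AVG_FRIENDS)

v1_total = v1_per_install * FORECAST_INSTALLS
v2_total = v2_per_install * FORECAST_INSTALLS
growth = v2_total / v1_total

print(f"v1.0: {v1_per_install} items per install, {v1_total:,} in total")
print(f"v2.0: {v2_per_install} items per install, {v2_total:,} in total")
print(f"growth factor: {growth:.0f}x")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when the permission set or the adoption forecast changes.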
Consider that risk equals likelihood times impact. The likelihood that an average Facebook user’s data is taken by App X grew 201-fold, since each install now exposes the installing user plus around 200 friends. Meanwhile, the impact per affected user dipped from 5 collected data items to 605/201 ≈ 3.01. In total, the risk grew 201 × (3.01/5) ≈ 121-fold, in line with the growth in the number of collected data items. This is a severe increase which, had the GDPR been applicable in 2014, would have warranted further inspection from the company. (Note that the calculation is simplified, omitting overlapping friend circles and personal data items of clearly different sensitivity, among others.)
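One way to read “risk equals likelihood times impact” for App X is to multiply the growth in likelihood by the relative change in impact per affected user. The sketch below does exactly that under the same simplifying assumptions as the text (200 friends per user, no overlapping friend circles, all data items equally sensitive); the variable names are ours.

```python
# Illustrative figures from the App X thought experiment (not measured data).
USER_PERMISSIONS = 5
FRIEND_PERMISSIONS = 3
AVG_FRIENDS = 200

# People whose data is exposed by a single install.
affected_v1 = 1                # only the installing user
affected_v2 = 1 + AVG_FRIENDS  # the user plus their friends

# Likelihood proxy: how many more people are reached per install.
likelihood_growth = affected_v2 / affected_v1

# Impact proxy: average number of data items collected per affected person.
impact_v1 = USER_PERMISSIONS / affected_v1
impact_v2 = (USER_PERMISSIONS + AVG_FRIENDS * FRIEND_PERMISSIONS) / affected_v2

# Risk = likelihood x impact, so the growth factors multiply.
risk_growth = likelihood_growth * (impact_v2 / impact_v1)

print(f"likelihood growth: {likelihood_growth:.0f}x")
print(f"impact per affected user: {impact_v1:.2f} -> {impact_v2:.2f}")
print(f"risk growth: {risk_growth:.1f}x")
```

A real assessment would replace these proxies with sector-specific estimates, for example by weighting data items by sensitivity.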
The example above shows a very specific implementation of a generic quantitative Data Protection Impact Assessment approach. This case study also hints that data protection professionals should work shoulder-to-shoulder with IT platform experts when carrying out a DPIA!
We would like to thank Iraklis Symeonidis and Lorenzo Dalla Corte for their contributions to this article.