Many "privacy" questionnaires focus on confidentiality, integrity, and availability. This document aims to provide a series of questions that fellow W3C members can use to do an initial privacy review of their standard.
- Does this specification have a "Privacy Considerations" section?
- Interesting features added to the web platform generally have privacy impacts. Documenting the various concerns and potential abuses in a separate and distinct "Privacy Considerations" section of a document is a good way to help implementers and web developers understand the risks that a feature presents, and to ensure that adequate mitigations are in place. If it seems like a feature does not have privacy impacts, then say so inline in the spec section for that feature: “There are no known privacy impacts of this feature.”
- Saying so explicitly in the specification serves several purposes:
- Shows that a spec author/editor has explicitly considered privacy when designing a feature.
- Provides some sense of confidence that there are no such impacts.
- Challenges security- and privacy-minded individuals to think of and find such instances (as well as the mere potential for such impacts).
- Demonstrates the spec author/editor’s receptivity to feedback about such impacts.
Does this specification collect personally derived data?
- Explanation: If the person involved in the transaction was not previously identifiable, personally derived data includes a large swath of data which could be used, on its own or in combination with other information, to identify them. The exact definition of what’s considered “personal information” varies, but could certainly include things like a home address, an email address, birthdates, usernames, fingerprints, video recordings, audio recordings, geographic location, or any other information derived from a person. If the person is already identifiable, personally derived data might become more significant when combined with other data (including general data, such as the time of day when the transaction is happening, or other personally derived data), enabling inferences to be drawn about the person involved.
- Example: If the specification under consideration exposes private information to the web, it is important to consider ways to mitigate the obvious impacts. For instance, a feature which uses biometric data (fingerprints or retina scans) could refuse to expose the raw data to the web, instead using the raw data only to unlock some origin-specific and ephemeral secret and transmitting that secret instead. Also, user mediation could be required, in order to ensure that no data is exposed without a user’s explicit choice (and hopefully understanding).
Does this specification generate personally derived data, and if so how will that data be handled?
- Explanation: If a standard generates personally derived information, care must be taken to preserve the privacy of the data that has been generated. Methods which can be adopted to protect personally derived data include, but are not limited to: de-identification, reporting data in aggregate, and encrypting the data.
- Example: WebRTC generates audio and/or video data. Depending on where the camera or microphone is recording, the information could be intensely personal. When generating personally derived data, developers should stop to consider:
- Why the data is being collected
- What the primary purpose of the processing is
- Where the data is being transferred
- Whether and how the data will be retained
- If it is retained, how long it will be kept
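As a rough illustration of two of the methods named above (de-identification and reporting data in aggregate), the following Python sketch replaces an identifier with a keyed hash and reports only group counts above a threshold. The function names, the `secret_key` parameter, and the `min_count` threshold of 5 are assumptions made for this example, not part of any specification. Note that a keyed hash is pseudonymization only; it does not by itself make data anonymous.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so the raw ID is not stored.

    Anyone holding secret_key can recompute the mapping, so the key must be
    protected (hypothetical helper, for illustration only).
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_counts(values, min_count=5):
    """Count occurrences of each value and drop groups smaller than min_count.

    Suppressing small groups reduces the chance that a reported count
    describes a single identifiable person. The threshold is an assumption.
    """
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= min_count}
```

A caller would store `pseudonymize(user_id, key)` instead of the raw identifier, and publish `aggregate_counts(...)` rather than per-user rows.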
Does this specification allow an origin direct access to a user’s location, and if so is that information minimized?
- Explanation: A user’s location is highly desirable information for a variety of use cases. It is also, understandably, information which many users are reluctant to share, as it can be both highly identifying and potentially unsafe to reveal. New features which make use of geolocation information, or which expose it to the web in new ways, should carefully consider how the risks of unfettered access to a user’s location could be mitigated.
- Example: Geolocation information can serve many use cases at a much less granular precision than the user agent can offer. For instance, a restaurant recommendation can be generated by asking for a user’s city-level location rather than a position accurate to the centimeter.
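One simple way to reduce the precision of a position before exposing it is to round the coordinates. The Python sketch below is a minimal illustration under that assumption; `coarsen_location` and its `decimals` parameter are hypothetical names, and rounding is only one of several possible minimization strategies. One decimal place of latitude corresponds to roughly 11 km north to south, which is in the neighborhood of city-level precision.

```python
def coarsen_location(latitude: float, longitude: float, decimals: int = 1):
    """Round coordinates to a fixed number of decimal places.

    With decimals=1, latitude is reported in steps of 0.1 degree
    (about 11 km). The east-west distance covered by 0.1 degree of
    longitude shrinks toward the poles.
    """
    return (round(latitude, decimals), round(longitude, decimals))


# A precise position is reduced to a coarse one before it is shared.
print(coarsen_location(48.858370, 2.294481))  # prints (48.9, 2.3)
```

A recommendation service that only needs the surrounding area could be given the coarse pair instead of the full-precision reading.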
How should this specification work in the context of a user agent’s "incognito" mode?
- Explanation: At the moment, sites are not told when the user is in "incognito" mode. Ideally, the feature would work in such a way that the website would not be able to determine that the user was in "incognito" mode, as this reveals that the user might consider their interaction sensitive. Less ideally, the feature wouldn’t work, but the website still wouldn’t be able to distinguish "incognito" mode from simply being denied permission to use the feature (for instance). Least ideally, the feature wouldn’t exist at all in "incognito" mode, which means that the user wouldn’t be exposing data, but the website can probably tell that the user is in that state. (The question of whether websites could be aware of, and hence offer to respect, "incognito" mode is a matter of current discussion.)
- Example: Disabling a feature which could reveal that the user is in "incognito" mode.
Is it possible to spoof/fake the data being generated for privacy purposes?
- Explanation: Users may have a legitimate need to falsify the data emanating from their machine. While standards makers are not obligated to make spoofing intuitive, they should not actively try to thwart it.
- Example: A user who is located in an oppressive regime may not wish to provide their exact geographic location, instead choosing to appear to be posting from a nearby country with less draconian laws.
Does the standard utilize data that is personally derived, i.e., derived from the interaction of a single person, or their device or address?
- Explanation: If so, even if anonymized, could it be re-correlated? If the data could be re-correlated, does the data record contain elements that would explicitly enable such re-correlation, such as unique identifiers?
- Example: Social Security numbers, employee ID numbers, etc.
Does the data record contain elements that would enable re-correlation when combined with other datasets through the property of intersection (commonly known as "fingerprinting")?
- Explanation: Sometimes seemingly innocuous pieces of information, in combination, can identify a user.
- Example: One record contains name, DOB, and city of birth; a second contains DOB, city of birth, and medical illness treated.
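The two-record example above can be sketched in a few lines of Python. The records, field names, and the `link_records` helper below are all invented for illustration. The point is only that a join on the shared fields (DOB and city of birth) attaches a name to a medical record that contained no name.

```python
def link_records(left, right, keys):
    """Join two lists of dicts on the fields named in keys."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            linked.append({**row, **match})
    return linked


# Made-up data: neither dataset alone ties a name to an illness.
registry = [
    {"name": "Alex Example", "dob": "1980-01-01", "birth_city": "Springfield"},
]
clinic = [
    {"dob": "1980-01-01", "birth_city": "Springfield", "illness": "asthma"},
    {"dob": "1975-06-30", "birth_city": "Shelbyville", "illness": "flu"},
]

linked = link_records(registry, clinic, ("dob", "birth_city"))
print(linked[0]["name"], linked[0]["illness"])  # prints: Alex Example asthma
```

The fewer people who share a given combination of field values, the more reliably such a join singles out one individual.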
Is the user likely to know if information is being collected?
- Explanation: Does the user get feedback on the patterns that the information could reveal (at any instant, or over time) so that they can adjust their behavior? Information flows should not be invisible: users should be able to see what information is being collected and adjust their behavior accordingly.
- Example: Does a camera icon appear on a site while the webcam is being utilized? Does a noise occur when a picture is taken? Does an LED light up when the camera is on?
Can the user easily, preferably through an element of the GUI, revoke consent granted to a particular feature?
- Explanation: Consent should not be a one-time affair, but an ongoing process.
- Example: If a user must clear all cookies and cache to revoke consent granted for their webcam, this is a poor consent model.
Once consent has been given, is there a mechanism whereby it can be automatically revoked after a reasonable, or user-configurable, period?
- Explanation: Consent should not be granted in perpetuity, because a user might forget after a while that they have given it, or it might have been given inadvertently by them or by someone else.
- Example: An authentication cookie for a social network log-in can have an expiry of a few hours or days so that a forgetful user will be automatically logged out. Similarly, a user who gives their consent for tracking using the DNT API can automatically stop being tracked after a reasonable period.
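A time-limited consent grant of the kind described above could be modeled as follows. This is a minimal Python sketch; the `ConsentGrant` class, its fields, and the 24-hour lifetime in the usage lines are assumptions for illustration and do not correspond to any existing permission API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ConsentGrant:
    """A record of consent that lapses on its own after a fixed lifetime."""

    feature: str
    granted_at: datetime
    lifetime: timedelta

    def is_valid(self, now: datetime) -> bool:
        # Consent is honored only strictly before the expiry instant.
        return now < self.granted_at + self.lifetime


granted = datetime(2024, 1, 1, tzinfo=timezone.utc)
grant = ConsentGrant("camera", granted, timedelta(hours=24))
print(grant.is_valid(granted + timedelta(hours=1)))  # prints True
print(grant.is_valid(granted + timedelta(days=2)))   # prints False
```

Making `lifetime` a per-grant value leaves room for the period to be user configurable rather than fixed by the implementation.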
Does this standard utilize strong end-to-end encryption?
- Explanation: As per RFC 7258, pervasive monitoring is an attack. The TAG has supported the use of end-to-end encryption to protect against pervasive monitoring.
- Example: A new standard that wishes to livestream video should do so utilizing end-to-end encryption so that eavesdroppers cannot intercept the stream.
Does this standard use the ReSpec linter to check for common privacy issues?
- linter.js will check that the URLs in your `respecConfig` are using TLS, and emit a warning if they are not.
- linter.js will check that your specification has a privacy and security considerations section, and link back to this document if it does not.
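The TLS check described above amounts to scanning configuration values for URLs whose scheme is `http`. The following Python sketch illustrates that idea only; it is not the linter's actual implementation, and the `find_insecure_urls` helper and the sample configuration values are assumptions made for this example.

```python
from urllib.parse import urlparse


def find_insecure_urls(config):
    """Return (key, url) pairs for string values that use plain http."""
    insecure = []
    for key, value in config.items():
        if isinstance(value, str) and urlparse(value).scheme == "http":
            insecure.append((key, value))
    return insecure


# Illustrative configuration; the values are placeholders.
sample_config = {
    "edDraftURI": "http://example.org/draft/",
    "github": "https://github.com/example/spec",
    "shortName": "example",
}

for key, url in find_insecure_urls(sample_config):
    print(f"warning: {key} does not use TLS: {url}")
```

Only top-level string values are inspected here; a fuller check would also need to walk nested lists and objects.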
Many thanks to Nick Doty for GitHub advice, and to the Privacy Interest Group and the Technical Architecture Group for their continued feedback.