A consistent routine of privacy detection, measurement and practice allows websites and services to improve their privacy practices. Privacy measurement is specifically used to find out what user data companies collect, how they collect it, and what they do with it.
OpenWPM is a privacy measurement tool developed by Steven Englehardt and Arvind Narayanan at Princeton University, who recently published a study, “Online tracking: A 1-million-site measurement and analysis.” Their tool gives web privacy measurement a chance to become a more commonplace practice. OpenWPM simulates users to collect data at large scale, then detects and classifies trackers. This automated process is designed to make browser privacy tools more effective by efficiently updating their tracking-protection lists.
What exactly is web tracking and why is it a problem? As consumers browse different websites, they are often monitored by “first parties,” which are the sites themselves. However, they are also observed by “third parties,” which are very often ad networks embedded on the web page. Third parties use cookies to uniquely identify the user and capture their browsing history. Often third parties obtain data such as email addresses or user characteristics without consumers’ awareness or permission. Cookie syncing is another tracking practice, one that allows different trackers to share user identifiers with one another.
Cookie syncing is hard to detect, particularly when IDs are exchanged server-to-server; in the browser, it shows up as an ID sent in the request URL or in the referer URL. The domain doubleclick.net is one of the most active in cookie syncing and shares 108 different cookies with 118 other third parties (this includes both events where it is a referer and where it is a receiver).
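The detection idea described above, looking for a known cookie ID inside a request URL or referer URL, can be sketched in a few lines of Python. This is a minimal illustration and not OpenWPM's actual implementation: the function name `find_synced_ids`, the input shapes, the `MIN_ID_LENGTH` cutoff and the example domains are all assumptions made for the sketch.

```python
from urllib.parse import urlsplit

# Assumption: very short cookie values are too common to be treated as user IDs.
MIN_ID_LENGTH = 8


def find_synced_ids(cookies, requests):
    """Return a set of (cookie_owner, receiver_host, cookie_value) tuples.

    cookies:  dict mapping an owner domain to the set of cookie values it set
    requests: iterable of (request_url, referer_url) pairs; referer may be None
    """
    hits = set()
    for request_url, referer_url in requests:
        receiver = urlsplit(request_url).hostname
        if receiver is None:
            continue
        # The ID may appear in the request URL itself or in the referer URL.
        haystacks = [url for url in (request_url, referer_url) if url]
        for owner, values in cookies.items():
            # A domain receiving its own ID is ordinary tracking, not a sync.
            if receiver == owner or receiver.endswith("." + owner):
                continue
            for value in values:
                if len(value) >= MIN_ID_LENGTH and any(value in url for url in haystacks):
                    hits.add((owner, receiver, value))
    return hits


# Hypothetical observations from one page visit.
cookies = {"tracker-a.example": {"abc123def456"}}
requests = [
    ("https://sync.tracker-b.example/pixel?uid=abc123def456", "https://news.example/"),
    ("https://tracker-a.example/p?uid=abc123def456", None),
    ("https://tracker-c.example/img", "https://tracker-a.example/r?id=abc123def456"),
]
for owner, receiver, value in sorted(find_synced_ids(cookies, requests)):
    print(f"{owner} shared {value} with {receiver}")
```

A plain substring match like this misses IDs that are encoded or hashed before being sent, and it cannot see any exchange that happens purely between servers, so it should be read as a lower bound on syncing activity.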
As Englehardt and Narayanan wrote in their introduction:
Web privacy measurement — observing websites and services to detect, characterize and quantify privacy-impacting behaviors — has repeatedly forced companies to improve their privacy practices due to public pressure, press coverage, and regulatory action. On the other hand, web privacy measurement presents formidable engineering and methodological challenges. In the absence of a generic tool, it has been largely confined to a niche community of researchers.
In the course of their online tracking study, the researchers used OpenWPM to determine that 81,000 third parties are present across the top one million sites. Of those 81,000, only 123 appear on more than 1% of sites. In fact, the top five third parties are all Google-owned domains. Across content categories, news sites had the most third-party tracking.
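To make the "more than 1% of sites" figure concrete, the following Python sketch shows how such a prevalence count could be computed from crawl results. It is an illustration only: the function name `prominent_third_parties`, the input mapping and the example domains are assumptions, and the study's own analysis pipeline may differ.

```python
from collections import Counter


def prominent_third_parties(site_to_third_parties, threshold=0.01):
    """Return {third_party: share_of_sites} for domains above the threshold.

    site_to_third_parties: dict mapping each crawled site to an iterable of
    the third-party domains observed on it.
    """
    total_sites = len(site_to_third_parties)
    if total_sites == 0:
        return {}
    counts = Counter()
    for third_parties in site_to_third_parties.values():
        # Count each third party at most once per site.
        counts.update(set(third_parties))
    return {
        domain: seen / total_sites
        for domain, seen in counts.items()
        if seen / total_sites > threshold
    }


# Hypothetical crawl of 200 sites.
crawl = {f"site{i}.example": ["big-tracker.example"] for i in range(200)}
crawl["site0.example"].append("rare-tracker.example")      # seen on 1 site
for i in range(3):
    crawl[f"site{i}.example"].append("mid-tracker.example")  # seen on 3 sites

for domain, share in sorted(prominent_third_parties(crawl).items()):
    print(f"{domain}: {share:.1%} of sites")
```

In this toy crawl, the tracker seen on a single site falls below the 1% cutoff, which mirrors the long tail the study describes: many third parties exist, but few are widespread.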
Overall, web privacy measurement has three parts: simulating users, recording observations and analyzing them. What sets OpenWPM apart is that it automates the first two parts of this process. It also runs 15 measurements on each site it assesses, covering stateful (cookie-based) tracking, stateless (fingerprinting-based) tracking and multiple fingerprinting techniques.
The automated creation of tracking-protection lists is a key benefit of OpenWPM. Its ability to collect data at scale and to detect and catalog trackers is the holy grail. In addition, its web-based analysis platform makes the results accessible to those with limited technical ability.