AsymCAR River

Walter Frick:

The online economy — from search to email to social media — is built in large part on the fact that consumers are willing to give away their data in exchange for products that are free and easy to use. The assumption behind this trade-off is that without giving up all that data, those products either couldn’t be so good or would have to come at a cost.

But a new working paper, released this week by Lesley Chiou of Occidental College and Catherine Tucker of MIT, suggests that the trade-off may not always be necessary. By studying the effects of privacy regulations in the EU, they attempted to measure whether the anonymization and de-identification of search data hurt the quality of search results.

Most search engines capture user data, including IP addresses and other data that can identify a user across multiple visits. This data then allows search companies to improve their algorithms and to personalize results for the user. At least, that’s the idea. To determine whether storage of users’ personal data improves search results, Chiou and Tucker looked at how search results from Bing and Yahoo differed before and after changes in the European Commission’s rules on data retention.

In 2008 the Commission recommended that search engines reduce the period over which they kept user records. In response, Yahoo decided to strengthen its privacy policy by anonymizing user data after 90 days. In 2010 Microsoft changed its policy and began deleting IP addresses associated with searches on Bing after six months, and all data points intended to identify a user across visits after 18 months. In 2011 Yahoo changed its policy again, this time deciding to store personal data longer (for 18 months rather than 90 days), which gave the researchers yet another chance to measure how changes in data storage affected search results. (Google did not change its policies during this period, and so is not included in the study. Some of Tucker’s past research has been funded by Google.)


Do Tech Companies Really Need All That User Data?