
Big Data, User Privacy, and OkCupid


by Angela Guess

Michael Zimmer reports in Wired, “On May 8, a group of Danish researchers publicly released a dataset of nearly 70,000 users of the online dating site OkCupid, including usernames, age, gender, location, what kind of relationship… they’re interested in, personality traits, and answers to thousands of profiling questions used by the site. When asked whether the researchers attempted to anonymize the dataset, Aarhus University graduate student Emil O. W. Kirkegaard, who was lead on the work, replied bluntly: ‘No. Data is already public.’ This sentiment is repeated in the accompanying draft paper, ‘The OKCupid dataset: A very large public dataset of dating site users,’ posted to the online peer-review forums of Open Differential Psychology, an open-access online journal also run by Kirkegaard: ‘Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.’”

Zimmer goes on, “For those concerned about privacy, research ethics, and the growing practice of publicly releasing large data sets, this logic of ‘but the data is already public’ is an all-too-familiar refrain used to gloss over thorny ethical concerns. The most important, and often least understood, concern is that even if someone knowingly shares a single piece of information, big data analysis can publicize and amplify it in a way the person never intended or agreed to. The ‘already public’ excuse was used in 2008, when Harvard researchers released the first wave of their ‘Tastes, Ties and Time’ dataset comprising four years’ worth of complete Facebook profile data harvested from the accounts of a cohort of 1,700 college students. And it appeared again in 2010, when Pete Warden, a former Apple engineer, exploited a flaw in Facebook’s architecture to amass a database of names, fan pages, and lists of friends for 215 million public Facebook accounts, and announced plans to make his database of over 100 GB of user data publicly available for further academic research.”

Read more here.

Photo credit: Flickr/ ms. Tea
