The recent meltdown of 23andme and what might become of their DNA database got me thinking about this question:
What happens to your data when a company goes bankrupt?
To say the past year has been a tough one for 23andme is an understatement. This latest turn of events, involving infighting between management and the board, the wholesale resignation of the board, and the delisting of the company's stock, comes on top of the massive data breach the company suffered in the fall of 2023, which led to investigations by privacy regulators and a $30 million lawsuit settlement.
Your Data Is Their Asset
The story of 23andme is the classic tale of a Silicon Valley startup gone wrong. An audacious, big-picture vision was pitched by a well-connected CEO. It raised a bunch of money based on a speculative business model that turned out to be ill-conceived. You could insert “WeWork” and tell the same story … except the stakes for 23andme’s demise involve the DNA data of millions of people. Even if we have never personally used this service, our relatives may have. This was how the Golden State Killer was caught, using DNA matching from a similar type of genetic ancestry site. One study of genetic databases published in Science noted:
“We project that about 60% of the searches for individuals of European descent will result in a third-cousin or closer match, which theoretically allows their identification using demographic identifiers” (Erlich et al., 2018)
Despite a privacy policy that says it will protect personal information, when it comes to changes in ownership (i.e., through a bankruptcy or sale), it seems like all bets are off. New ownership could choose to derive value from this data in myriad ways, possibly including sharing it with insurance companies, employers and law enforcement – three uses that current management has explicitly ruled out under its privacy policy.
Beyond the personal data, there is also the question of aggregate data. As the company’s current privacy policy states:
Aggregate Information is different from Personal Information
Aggregate Information is not Personal Information because Aggregate Information does not contain information about, nor can it reasonably be linked to, a specific individual. Aggregate Information is information about a group of people, such as an analysis or evaluation of a group. Aggregate Information describes the group as a whole in such a way that no specific individual may be reasonably identified. For example, the number of 23andMe customers with a specific variant or health condition is Aggregate Information. (23andme)
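The distinction in the policy quoted above can be made concrete with a toy sketch (the records, names and variant labels below are invented for illustration): the aggregate count retains no link back to any individual, which is why it escapes most personal-data protections.

```python
from collections import Counter

# Hypothetical personal information: each record identifies an individual.
customer_records = [
    {"name": "A. Smith", "email": "a@example.com", "variant": "BRCA1"},
    {"name": "B. Jones", "email": "b@example.com", "variant": "BRCA1"},
    {"name": "C. Lee",   "email": "c@example.com", "variant": "APOE4"},
]

# Aggregate information: counts per variant, with all identifiers discarded.
aggregate = Counter(record["variant"] for record in customer_records)
print(aggregate)  # Counter({'BRCA1': 2, 'APOE4': 1})
```

Note that the aggregate still describes the group: anyone known to belong to it can be affected by decisions made on the basis of these counts, which is exactly the concern raised below.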
My Data vs. Data About Me vs. Data Like Mine
Aggregate data is not afforded the same types of legal protections as personal data. Yet, when it comes to machine learning, there are still concerns about aggregate data. Here’s one way to think about it:
My Data = data that can be used to identify me directly. This is what we traditionally think of as personally identifiable information (PII). Typically this data has legal protections.
Data About Me = data that is collected adjacent to PII. Once anonymized, it is generally not subject to legal protections.
Data Like Mine = the patterns found in data that correlate to my data or data about me in some way. These patterns can be used to make decisions about me. They are inferences – statistical best guesses – based on the data at hand.
For example, when you use an app, the email address and name you use for the account would be considered “your data” and likely not used beyond the administration of your account. However, the other information you share with the app – let’s say it’s a mental health app – could constitute “data about you.” This might include data about your mood, or metadata like how often you use the app or what time of day you access it. The app might have learned through its training data that people with “data like yours” are at high risk of a particular health condition and thus less likely to be insurable. The patterns it identifies can be used to make decisions such as denying access to health insurance or offering you a higher premium.
Machine learning and predictive analytics are not necessarily concerned with you as an individual, but rather with the probability that you will behave a certain way, given how similar you are to others based on patterns in the data.
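One common form of this similarity-based reasoning can be sketched as a nearest-neighbours vote: a new user inherits the label of the users whose patterns most resemble theirs. The data, feature names and thresholds below are entirely hypothetical, and no real product is claimed to work this way; the sketch only illustrates how “data like mine” drives a prediction about me.

```python
import math

# Hypothetical labelled users from a mental-health app:
# (late_night_sessions_per_week, avg_mood_score, flagged_high_risk?)
training_data = [
    (9, 2.1, True),
    (8, 2.5, True),
    (7, 3.0, True),
    (1, 4.2, False),
    (2, 4.0, False),
    (0, 4.5, False),
]

def predict_high_risk(late_night_sessions, avg_mood, k=3):
    """Label a new user by majority vote among the k most similar users."""
    nearest = sorted(
        training_data,
        key=lambda row: math.dist((late_night_sessions, avg_mood), row[:2]),
    )[:k]
    votes = [label for _, _, label in nearest]
    return votes.count(True) > k // 2

# A user whose usage pattern resembles the high-risk group gets the
# high-risk label, even though nothing here identifies them personally.
print(predict_high_risk(8, 2.3))  # True
print(predict_high_risk(1, 4.1))  # False
```

The point of the sketch is that the prediction never consults anything unique to the individual: it is driven entirely by how closely their pattern matches the patterns of others.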
In the Western world, we tend to think about privacy – both legally and culturally – from an individual standpoint. Once data is anonymized and aggregated, that is to say, disconnected from an individual (reidentification issues notwithstanding), most data privacy laws no longer apply. Yet, even if your individual identity is not at play, harms can still befall those who are deemed to be part of a group, particularly a group with shared genetic traits. Genetic data opens questions surrounding a host of traditionally protected categories of data. These questions move into “privacy adjacent” territory. It’s an area that seems to be out of scope under our current privacy laws, which focus solely on personal information, even as we look to those laws to help address the adverse impacts of data-driven AI systems built on aggregated data.
“An algorithmic condensation of social conditions results in the acceleration of differences such as racialised deprivation or disability. AI can decide which humans are disposable.” – Dan McQuillan
The 23andme case highlights the risks around the issue of treating data – especially sensitive data – as just another corporate asset that can be bought and sold. If you do have a 23andme account, now is the time to consider exercising the option to delete your personal data while that choice is still available.
Send Me Your Questions!
I would love to hear about your data dilemmas or AI ethics questions and quandaries. You can send me a note at hello@ethicallyalignedai.com or connect with me on LinkedIn. I will keep all inquiries confidential and remove any potentially sensitive information – so please feel free to keep things high level and anonymous as well.
This column is not legal advice. The information provided is strictly for educational purposes. AI and data regulation is an evolving area and anyone with specific questions should seek advice from a legal professional.