Responsible Collection of Public Data

May 3, 2017

This is Part II of a series on ethics and public data collection. Part I was published in the Responsible Data Forum and can be found here.

For Donna, everything missing from the harsh realities of life could be found in children. They were sweet in a way a world apart from the gossiping and severe nunnery she left four years back. Caring in a way utterly foreign to her middle school students. She didn’t know how much longer she could put up with them. She wanted to make the world a better place, but her students were cruel and conniving. That’s when she met Terry. He was passionate, charismatic, and devoted fully to protecting the most innocent among us: the unborn. Donna left her job and joined him. They toured the eastern half of the country, fighting for their beliefs and for the world the way it should be. When Terry began advocating violence on behalf of their cause, Donna couldn’t resist.

I had known about Donna’s strict Catholic upbringing. I knew the crowd she hung out with. I knew what drove her. But I never really knew her. She was just a subject of an investigation into her and Terry’s anti-abortion group (neither name is real). Everything I had learned about her came from Googling her name and from the public parts of her social media presence. Anyone with internet access could have known her just as well.

A YouTube clip here, a missive on Facebook there, a few words with a local reporter, algorithms to make the information accessible — Donna’s small one-off shares quickly turned into a full character sketch. At what point does responsibility for privacy go from the one sharing to the one reporting?

At Omelas, we’re using the most advanced predictive analytics in the world to understand what draws people towards radicalization and what turns them away. We’re changing countering violent extremism from an intuition-based art into a data-driven science. Omelas’s potential to steer people away from violence could be life-saving, but realizing it requires creating sketches like the one of Donna by the tens of thousands. Before arriving at that stage, Omelas’s top priority is to develop an ethical framework on how to proceed.

Earlier this year, Pew released a survey showing the gap between the type of privacy people want and the type they receive. More than nine out of ten respondents said control over who sees their information was important, but only half were confident they understood what data of theirs is shared. The lack of confidence is justified: two out of three said their photos are available online, and nearly half said the same of their birthdate and of their email address. That’s more information than I had when I started looking into Donna. Spend $5 a month on a service like Spokeo, and an email address will give you social media profiles, forum registrations, usernames, phone numbers and more.

Information shared passively (metadata about how a user behaves, who they interact with, times of day they’re active) dwarfs the actively shared kind. Few take steps to keep that information hidden: only one in nine Firefox users had opted into the browser’s Do Not Track feature as of 2013, while only three in ten internet users regularly cleaned their cookies in 2007. Passively shared content can paint an even more vivid picture than actively shared content. Tracking services need only two weeks of metadata to learn your distinct browsing habits and tie new devices and new accounts back to you. Advertising platforms can use metadata to tell the type of car you want to buy, your role in your office, even whether you’re a dog or a cat person.

The portrait of an individual painted by publicly available information is deep and detailed. That makes choosing what details to include and omit perilous. Established fields provide some guidance. Statistical rigor demands clarity about uncertainty, clarity that conclusions reached represent a distribution of probabilities and not a perfect predictor of the future, and clarity that correlation is not causation. Journalistic ethics demand context and transparency. But applying predictive analytics and machine learning to countering violent extremism raises questions without ready parallels. The whole of a portrait can be far more sensitive than the sum of its parts. A person’s online profile may differ drastically from who they are in person or even in private online. New fields inevitably create new and unforeseen issues.

The ethical guidelines that exist for the field today are sparse. What kind of questions should we ask ourselves when writing ethical guidelines? Are we forgetting to consider certain factors? What types of players do we need to factor into our decision making? How do we stay updated with the latest data privacy laws and regulations?

We’re partnering with leaders in the intelligence and responsible data communities to develop a coherent framework to answer these questions. As we do so, the voice of the public will be invaluable in ensuring we account for all possibilities and concerns. Leave your thoughts in the comment section below or contact us at ethics@omelas.co.

Written by Omelas
