In discussions of privacy, people often fall back on trying to identify and define immediate harm, a worst-case scenario, or both. Both of these lenses are reductive and incomplete. Because data analysis often occurs invisibly to us, via proprietary algorithms that we don't even know are in play, assigning harm can be a matter of informed guesswork and inference. As one example, try explaining how and why your credit score is determined. This algorithmically defined number determines many opportunities we receive or don't receive, yet few of us can say with any certainty how this number is derived. Algorithms aren't neutral -- they're a series of human judgments automated in a formula. There isn't any single worst-case scenario, and discussions of worst-case scenarios risk creating a false vision that there is a single spectrum with "privacy" at one end and some vague "worst-case scenario" at the other -- and this is not how it works.
The reason privacy matters -- and the reason that profiling matters -- is that we are seeing increasingly experimental and untested uses of data, especially in the realm of predictive analytics. Products built on new statistical methods are used in hiring decisions, lending, mortgage decisions, finance, search, and personalization. The hype is that these new -- or "innovative" or "disruptive" -- uses of data will help us become more efficient and push past the biases of the past. However, this promise fails in at least two ways: First, algorithms contain the biases of their creators; second, the performance of these products fails to live up to the hype, so the benefits don't justify the risk.
Data collected in an educational setting -- by definition -- is data collected on people in the midst of enormous development, questioning, and growth. If people are doing adolescence right, they are making mistakes, asking questions, and breaking things, all in the name of growth and learning. In the context of, for example, an eighth-grade classroom, it all makes sense. But outside that context, it's very different. One of the promises of Big Data and learning analytics is that the data sets will be large enough to allow researchers to distill signal from noise, but, as noted earlier, the reality fails to live up to the hype.
How many of us have memories of our behavior from high school, middle school, or elementary school that make us wince? Those wince-able moments are our data trail. I mentioned earlier that talking about worst-case scenarios is an inaccurate frame, and this is why: There is no single data point that, if undone, can "fix" our pasts. However, data collected from our adolescence is bound to contain things that are inaccurate, temporary, flawed, or confusing -- for us and for people attempting to find patterns.
When we are aware of surveillance, it shifts the way we act. When students are habituated to surveillance from an early age, it has the potential to shift the way they develop. If this data is shared outside of an educational context, it creates the potential that every person attending a public school is fully profiled before she graduates. A commonly overlooked element of this conversation is that profiles never come from a single source; they are assembled and combined from multiple sources. When data collected within an educational context gets combined with data sets collected from social media or our personal browsing histories, different stories emerge.
For most people over 30 reading this post, our detailed records begin early in the 21st century, when we were 15 or older. For some kids in school now, their data profiles begin when their parents posted their ultrasounds on Facebook. While targeted advertising to kids is an immediate concern, at least targeted advertising is visible. Profiling by algorithm is invisible and forever. Requiring students to pay for their public educations with the data that will then be used to judge them sells out our kids. We can use data intelligently, but we need to have a candid conversation about what that means.