My Summer 2019 Internship Experience with the Common Sense Privacy Program

Topics:   Privacy Program

Jeff Graham | September 10, 2019

The following is posted on behalf of Ben Fleischmann, our 2019 summer intern, who jump-started our foray into machine learning.

Hi, I’m Ben Fleischmann, and this fall I’ll be a junior at Amherst College in Massachusetts. Going into college, I was about as directionless as a student can be. I had no idea what I wanted to do professionally, and I chose a small liberal arts school partly because I figured I would get a little taste of everything. This proved accurate: with the eight classes available to me in my first year, I dipped my toes into seven different departments. Among those, I took classes in chemistry, art, economics, and, most notably, computer science. I took an immediate liking to computer science, a subject that, despite growing up in San Francisco, I had never really been exposed to. I continued taking classes in the department into my sophomore year and declared a CS major toward the end of it.

I was lucky enough to be hired by Common Sense this summer for an eight-week internship with the privacy program. Going into the internship, I didn’t know much about what Common Sense does beyond their media reviews. The organization’s extensive advocacy work was completely off my radar. I learned about my team’s work on privacy practices, and specifically about the policy annotator tool they built to help clarify and evaluate the often obscure and dense language of company privacy policies.

When I learned that I would be working on expanding the annotator tool, I was very excited. I had many personal goals for my summer experience, one of which was to figure out what working as a software engineer was actually like. I enjoyed the tasks and puzzles that were set for me as a CS student, and I knew that I wanted to bring that sort of problem solving into my future career. But at the same time I didn’t want to, as one of my professors puts it, “spend my time maximizing sales of Barbie(™) dolls.” I was extremely lucky to work with the privacy team, honing my CS skills in a professional environment while working on a cause that I felt was morally upright and beneficial to society.

While at Common Sense, I mainly focused on experimenting with machine learning to move towards automating the privacy policy annotator that my manager, Jeff Graham, had built. The tool retrieves company policies and aids in annotating them under a rubric of 150 questions. The annotation process gives consumers helpful information about the privacy practices of the digital services they use, but it is still a human task, and each evaluation is time-consuming. At the outset of my internship, enough evaluations had been completed to constitute a data set potentially large enough to train a machine learning model.

The early weeks of my time at Common Sense were spent researching related work and brainstorming possible approaches to the problem. This was my first foray into machine learning, and I learned a ton in this initial period.

Then came the task of preprocessing the data. Each evaluation file holds a large quantity of data, including all of the policy text and the text relating to the annotations for each question on the rubric. I began by building a segmenter that would break the text of a policy down into pieces that approximately matched the length of the flagged text in the evaluations. I then needed to clean up the text by removing non-English words, URLs, and other miscellaneous noise that might affect the accuracy of a text classifier down the road. Next, I experimented with ways to vectorize the cleaned text, a process by which each segment is converted mathematically into a vector of numeric features that the computer can process. Once vectorized, the text can be used to train a machine learning model, and various classifiers can be built. I used annotated text relating to specific questions in the rubric to build classifiers that flagged unknown text as potentially relevant or irrelevant to the question at hand.

I have described the process linearly, but in reality every piece of this project was fluid and subject to change, and most of my time toward the end was spent tweaking parameters and methods to try to increase the performance of the classifiers. I was extremely daunted by my project at first, but I am very happy with the progress I made, and I learned more than I ever could have hoped.
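
For readers who want a concrete picture of the steps described above (segmenting, cleaning, vectorizing, and classifying), here is a minimal sketch in Python using only the standard library. It is not the code written during the internship: the function names, the word limit, the bag-of-words vectors, and the nearest-centroid classifier are all assumptions chosen for brevity, and keeping only ASCII alphabetic tokens is a crude stand-in for removing non-English words. The example sentences are made up for illustration.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[a-z]+")

def segment(policy_text, max_words=40):
    """Group sentences into segments of roughly max_words words (hypothetical limit)."""
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    segments, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        if current and count + n > max_words:
            segments.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        segments.append(" ".join(current))
    return segments

def clean(text):
    """Lowercase, strip URLs, and keep only ASCII alphabetic tokens."""
    return TOKEN_RE.findall(URL_RE.sub(" ", text.lower()))

def build_vocabulary(token_lists):
    """Map every distinct token to a column index."""
    terms = sorted({token for tokens in token_lists for token in tokens})
    return {term: index for index, term in enumerate(terms)}

def vectorize(tokens, vocabulary):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for term, count in Counter(tokens).items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def train(labeled_segments):
    """labeled_segments: (text, is_relevant) pairs for one rubric question."""
    token_lists = [clean(text) for text, _ in labeled_segments]
    vocabulary = build_vocabulary(token_lists)
    by_label = {True: [], False: []}
    for tokens, (_, label) in zip(token_lists, labeled_segments):
        by_label[label].append(vectorize(tokens, vocabulary))
    # One centroid (column-wise mean vector) per label that has examples.
    centroids = {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items() if vectors
    }
    return vocabulary, centroids

def flag_relevant(text, vocabulary, centroids):
    """Return the label whose centroid is most similar to the text."""
    vector = vectorize(clean(text), vocabulary)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

# Made-up training examples for a question about third-party data sharing.
training_examples = [
    ("We share your personal data with third party advertisers.", True),
    ("Personal data may be sold to third party partners.", True),
    ("Our office is open Monday through Friday.", False),
    ("Contact support for help resetting your password.", False),
]
vocabulary, centroids = train(training_examples)
for piece in segment("Data is shared with third party companies. "
                     "Our office hours are Monday through Friday.", max_words=8):
    print(piece, "->", flag_relevant(piece, vocabulary, centroids))
```

In practice, a library such as scikit-learn would typically replace the hand-rolled vectorizer and classifier, and the parameters worth tweaking (segment length, token filtering, feature weighting) correspond to the kind of tuning mentioned above.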

My experience with Common Sense this summer was overwhelmingly positive. This was, in many ways, a perfect internship. I was lucky enough to work with the wonderful people on the privacy team on an issue that I believe is morally important. I collaborated closely with Jeff, who guided me through my project and taught me a lot. He provided me with help and support and encouraged my explorations, and I’m very happy to have worked with such a great person. I was always greeted with smiles and hellos at the office, and I am grateful for the opportunity I had to learn so many technical and logistical skills in a professional environment. I will take what I learned this summer with me, and I am grateful to everybody who made my experience so memorable!