news.gnom.es: Quantifying Privacy: A Week of Location Data May Be an “Unreasonable Search” & New Law Review Article: Mosaic Theory and Machine Learning

News Gnomes: Quantifying Privacy: A Week of Location Data May Be an “Unreasonable Search”

When does the simple digital tracking of your location and movements — the GPS bleeps from most of our smartphones — start to be truly revealing? When do the data points and inferences that can be drawn from it strongly suggest, say, trips to a psychiatrist, a mosque, an abortion clinic, a strip club or an AIDS treatment center?

The answer, according to a new research paper, is about a week, when the data portrait of a person becomes sufficiently detailed to qualify as an “unreasonable search” and a potential violation of an individual’s Fourth Amendment rights.

. . .

Justice Samuel Alito stated that four weeks of location data collection, without a warrant, was “surely too long,” in that it would be enough for a detailed portrait of a person’s behavior. But Justice Antonin Scalia noted in his majority opinion that “it remains unexplained why a four-week investigation is ‘surely too long.’ ”

“That’s a fair critique,” observed Ms. Hutchins, the University of Maryland law professor. “We wanted to see what the science showed.”

The “research paper” is Steven M. Bellovin, Renée M. Hutchins, Tony Jebara, and Sebastian Zimmeck, When Enough is Enough: Location Tracking, Mosaic Theory, and Machine Learning, 8 NYU J. L. & Liberty 555 (2014). Abstract:

Since 1967, when it decided Katz v. United States, the Supreme Court has tied the right to be free of unwanted government scrutiny to the concept of reasonable expectations of privacy. An evaluation of reasonable expectations depends, among other factors, upon an assessment of the intrusiveness of government action. When making such assessment historically the Court considered police conduct with clear temporal, geographic, or substantive limits. However, in an era where new technologies permit the storage and compilation of vast amounts of personal data, things are becoming more complicated. A school of thought known as “mosaic theory” has stepped into the void, ringing the alarm that our old tools for assessing the intrusiveness of government conduct potentially undervalue privacy rights.

Mosaic theorists advocate a cumulative approach to the evaluation of data collection. Under the theory, searches are “analyzed as a collective sequence of steps rather than as individual steps.” The approach is based on the observation that comprehensive aggregation of even seemingly innocuous data reveals greater insight than consideration of each piece of information in isolation. Over time, discrete units of surveillance data can be processed to create a mosaic of habits, relationships, and much more. Consequently, a Fourth Amendment analysis that focuses only on the government’s collection of discrete units of data fails to appreciate the true harm of long-term surveillance—the composite.

In the context of location tracking, the Court has previously suggested that the Fourth Amendment may (at some theoretical threshold) be concerned with the accumulated information revealed by surveillance. Similarly, in the Court’s recent decision in United States v. Jones, a majority of concurring justices indicated willingness to explore such an approach. However, in general, the Court has rejected any notion that technological enhancement matters to the constitutional treatment of location tracking. Rather, it has decided that such surveillance in public spaces, which does not require physical trespass, is equivalent to a human tail and thus not regulated by the Fourth Amendment. In this way, the Court has avoided a quantitative analysis of the amendment’s protections.

The Court’s reticence is built on the enticingly direct assertion that objectivity under the mosaic theory is impossible. This is true in large part because there has been no rationale yet offered to objectively distinguish relatively short-term monitoring from its counterpart of greater duration. This article suggests that by combining the lessons of machine learning with the mosaic theory and applying the pairing to the Fourth Amendment we can see the contours of a response. Machine learning makes clear that mosaics can be created. Moreover, there are important lessons to be learned on when this is the case.

Machine learning is the branch of computer science that studies systems that can draw inferences from collections of data, generally by means of mathematical algorithms. In a recent competition, “The Nokia Mobile Data Challenge,” researchers evaluated machine learning’s applicability to GPS and cell phone tower data. From a user’s location history alone, the researchers were able to estimate the user’s gender, marital status, occupation and age. Algorithms developed for the competition were also able to predict a user’s likely future location by observing past location history. The prediction of a user’s future location could be even further improved by using the location data of friends and social contacts.
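To make the idea of predicting a future location from past history concrete, here is a minimal Python sketch. It is not the method used in the Nokia competition, which the abstract does not describe; it assumes a simple first-order frequency model in which the next place is guessed from whichever place most often followed the current one. The function names and the sample history are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(visits):
    """Count how often each place is immediately followed by each other place."""
    transitions = defaultdict(Counter)
    for current, following in zip(visits, visits[1:]):
        transitions[current][following] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent successor of `current`, or None if never seen."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical location history (illustrative labels, not real data).
history = ["home", "work", "gym", "home", "work", "cafe",
           "home", "work", "gym", "home"]
model = fit_transitions(history)

print(predict_next(model, "work"))     # gym
print(predict_next(model, "home"))     # work
print(predict_next(model, "airport"))  # None
```

Even this toy model becomes more confident as the history grows, which is the intuition behind the abstract's point that more data points yield more accurate predictions.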

Machine learning of the sort on display during the Nokia competition seeks to harness the data deluge of today’s information society by efficiently organizing data, finding statistical regularities and other patterns in it, and making predictions therefrom. Machine learning algorithms are able to deduce information—including information that has no obvious linkage to the input data—that may otherwise have remained private due to the natural limitations of manual and human-driven investigation. Analysts can train machine learning programs using one dataset to find similar characteristics in new datasets. When applied to the digital “bread crumbs” of data generated by people, machine learning algorithms can make targeted personal predictions. The greater the number of data points evaluated, the greater the accuracy of the algorithm’s results.

In five parts, this article advances the conclusion that the duration of investigations is relevant to their substantive Fourth Amendment treatment because duration affects the accuracy of the predictions. Though it was previously difficult to explain, for example, why an investigation of four weeks was substantively different from an investigation of four hours, we now have a better understanding of the value of aggregated data when viewed through a machine learning lens. In some situations, predictions of startling accuracy can be generated with remarkably few data points. Furthermore, in other situations accuracy can increase dramatically above certain thresholds. For example, a 2012 study found the ability to deduce ethnicity moved sideways through five weeks of phone data monitoring, jumped sharply to a new plateau at that point, and then increased sharply again after twenty-eight weeks. Similarly, the accuracy of identification of a target’s significant other improved dramatically after five days’ worth of data inputs. Experiments like these support the notion of a threshold, a point at which it makes sense to draw a Fourth Amendment line.
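The notion of a threshold can be illustrated with a short Python sketch. The accuracy figures below are invented purely for illustration and are not taken from the article or from the studies it cites; the helper names are likewise hypothetical. Given prediction accuracy measured at several surveillance durations, the sketch reports the first duration at which accuracy crosses a chosen line and the duration at which the sharpest jump occurs.

```python
def first_duration_over(accuracy_by_days, threshold):
    """Return the shortest duration whose accuracy meets the threshold, else None."""
    for days in sorted(accuracy_by_days):
        if accuracy_by_days[days] >= threshold:
            return days
    return None


def sharpest_jump(accuracy_by_days):
    """Return the duration at which accuracy rose most over the previous measurement."""
    days = sorted(accuracy_by_days)
    pairs = zip(days, days[1:])
    _, after = max(pairs, key=lambda p: accuracy_by_days[p[1]] - accuracy_by_days[p[0]])
    return after


# Made-up accuracy values keyed by days of monitoring (assumption, not real results).
hypothetical_accuracy = {1: 0.30, 3: 0.32, 5: 0.35, 7: 0.70, 14: 0.72, 28: 0.74}

print(first_duration_over(hypothetical_accuracy, 0.60))  # 7
print(sharpest_jump(hypothetical_accuracy))              # 7
print(first_duration_over(hypothetical_accuracy, 0.90))  # None
```

Which accuracy level should count as the legal line is a policy choice; the code only shows that, once a level is chosen, the corresponding duration can be read off the measurements.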

In order to provide an objective basis for distinguishing between law enforcement activities of differing duration, the results of machine learning algorithms can be combined with notions of privacy metrics, such as k-anonymity or l-diversity. While reasonable minds may dispute the most suitable minimum accuracy threshold, this article makes the case that the collection of data points allowing predictions that exceed selected thresholds should be generally deemed unreasonable searches in the absence of a warrant. Moreover, any new rules should take into account not only the data being collected but also the foreseeable improvements in the machine learning technology that will ultimately be brought to bear on it; this includes using future algorithms on older data.
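For readers unfamiliar with the privacy metrics named above, the following Python sketch computes k-anonymity and the simplest ("distinct") form of l-diversity for a small table. The records, field names, and function names are hypothetical, and the abstract does not say how the authors apply these metrics, so this is only an illustration of the definitions. A table is k-anonymous if every combination of quasi-identifier values is shared by at least k records; it is distinct l-diverse if every such group contains at least l different sensitive values.

```python
from collections import defaultdict


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    counts = defaultdict(int)
    for record in records:
        counts[tuple(record[q] for q in quasi_identifiers)] += 1
    return min(counts.values()) if counts else 0


def l_diversity(records, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found in any quasi-identifier group."""
    values = defaultdict(set)
    for record in records:
        values[tuple(record[q] for q in quasi_identifiers)].add(record[sensitive])
    return min(len(v) for v in values.values()) if values else 0


# Hypothetical records: coarse home and work areas plus one visited place.
records = [
    {"home": "A", "work": "X", "visited": "clinic"},
    {"home": "A", "work": "X", "visited": "gym"},
    {"home": "A", "work": "X", "visited": "clinic"},
    {"home": "B", "work": "Y", "visited": "mosque"},
    {"home": "B", "work": "Y", "visited": "mosque"},
]

print(k_anonymity(records, ["home", "work"]))             # 2
print(l_diversity(records, ["home", "work"], "visited"))  # 1
```

In this example the second group is 2-anonymous but only 1-diverse: knowing that someone lives in area B and works in area Y is enough to learn which place they visited, even though two people share that profile.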

In 2001, the Supreme Court asked “what limits there are upon the power of technology to shrink the realm of guaranteed privacy.” In this study, we explore an answer and investigate what lessons there are in the power of technology to protect the realm of guaranteed privacy. After all, as technology takes away, it also gives. The objective understanding of data compilation and analysis that is revealed by machine learning provides important Fourth Amendment insights. We should begin to consider these insights more closely.

