Data Path by r2hox via flickr

5 Reasons Big Data Needs a History

Rebecca Lemov—

Big Data is a topic that is big news, yet it is often raised in academic circles with trepidation. Here are some reasons why our understanding of big data, even as a fashion, can benefit from historical thinking.

  1. Big Data is a new(er) concept. The phrase “big data” is a neophyte as far as phrases go, only finding its way into the Oxford English Dictionary in 2013 along with such words as “fracking,” “live blogging,” and “kombucha.” Of course, this does not mean that no one employed the modifier “big” alongside the word “data” in previous decades; its use as a compound term dates at least to 1997, when several NASA engineers wrote a paper discussing the problems inherent in visualizing large data sets. Still, many today dismiss talk of “big data” as pure hype, of which, admittedly, there is some. However, when leading voices in fields from political science to history say that data, in ever greater amounts, is nothing less than the future of social science itself, and when other promoters describe the revolutionary effects of data on everything from police-work to piece-work (that is, from data-driven law enforcement to data-driven fashion design), then one can responsibly take notice. What is new here?
  2. But big data does have a history. One good way of “taking notice,” that is, of understanding this new concept, is to study the prehistory of big data. This is in part because big data is one of those technological phenomena often assumed not really to have a history at all. Like other life-changing technologies, it is often described as springing from the head of a genius, or perhaps a duo of geniuses in a Palo Alto garage not long ago. Thus it comes as a surprise, from the point of view of the popular press at least, to discover precursors who worked in fields related, if only tangentially, to computer science (such as library scientists calling themselves “documentalists”), and who in some cases worked entirely elsewhere (such as experimental fieldworkers of the 1930s using exotic psychological tests, or the seventeenth-century publishers of miniature bibles).
  3. The originators of big data didn’t account for a digital world. The prehistory of big data may look haphazard, but this tells us something important about the dynamics of data. In the middle of the twentieth century, a group of “pioneers of data” collaborated to cobble together a device that could hold dreams and other dream-like materials in high-tech formats. Their creation was like a Rube Goldberg machine for turning dreams into data. Proto-big data did not depend on the existence of digital storage devices or database management systems. Working at the cusp of the information age, these 1940s–1950s pioneers built a complex device out of techniques, relationships, and human and non-human actors. It disseminated neglected and oddly compiled data sets, creating an immense untapped resource. Its creators harvested nothing less than the dream life of countless people around the globe.
  4. There is some very personal data out there. It is often assumed that the shift to ever-more-personal data gathering has something to do with, on the one hand, the imperatives of surveillance and, on the other, the driving force of the capitalist market. At the same time, “big data” is almost always defined as large sets of information—seen as having nothing inherently to do with the personal turn in data collecting. My research highlights this contradiction by putting the “personal” part of big data at its core over half a century ago. It describes a much longer enchantment of the social sciences with intimate collections of data on a global scale. For example, “Get the data!” was the mantra of two anthropologists, George and Louise Spindler, who conducted fieldwork between 1948 and 1951 in the Wisconsin woods with Menominee American Indian people, those living both on and off the reservation. Carrying a heavy wire-recorder in the back seat of their Chevy, living in tents for months, they carried out interviews with young and old, using a method Louise had devised, the “Expressive Autobiographical Interview,” which was meant to delve into intimate yet unacknowledged parts of a person’s life and pull them out to be captured in data sets. Using the life history method, anthropologist John Adair encouraged a young Navajo recently returned from World War II to recall drinking tea with his Scottish girlfriend’s family, listening to Bob Hope (“not much of a show”), and flying over the recently liberated concentration camps. All this, Proustian in its detail, entered a vast data bank, meant to be accessible from any hallway or laboratory in the world.
  5. The seemingly relentless boosterism of pronouncements about big data is all around us. What is often neglected is its “shadow side.” The fantasy of total information has a long history and it breeds monsters, too. What a prehistory of big data reveals in stark relief is the “pathos of the perishable format.” Thinking they were creating a vast clearinghouse of all existing sociological data—for future purposes as yet unknown—the creators of this project unintentionally marooned their data on an analog format: the once-cutting-edge Microcard combined with READEX machines. The database of dreams slid into a sort of latency as the result of triple obsolescence: their theory went out of date, their storage platform lapsed into uselessness, and their project lost its funding imprimatur. This did not depend, in any direct way, on the arrival of digital formats. The database of dreams, a slumbering giant when I first came across it in the Library of Congress in 2008, held whole worlds of neglected data waiting to be resurrected.

Rebecca Lemov is associate professor of the history of science at Harvard University and past visiting scholar at the Max Planck Institute for the History of Science. She is the author of World as Laboratory, named a 2006 New York Times Editors’ Choice, and Database of Dreams.

