New research publication: Content analysis in an era of Big Data

My latest article (along with Rodrigo Zamith, my Ph.D. advisee, and wonderful colleague Alfred Hermida) has been published in the most recent issue of the Journal of Broadcasting and Electronic Media. We discuss the challenges of doing content analysis in an era of Big Data, and suggest a hybrid approach that blends computational and manual methods of data collection, filtering, coding, and analysis.

Here’s the full citation, including a link to a preprint version:

Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods. Journal of Broadcasting & Electronic Media, 57(1), 34–52. doi:10.1080/08838151.2012.76170 (preprint version)

What’s especially exciting is that the paper is part of a special issue of the journal that examines emerging methods in digital media research. A great team of guest editors, led by Jean Burgess, opens the issue with this introduction.

Our paper shows how we combined algorithmic and human-driven techniques to analyze Andy Carvin’s Twitter coverage of the Arab Spring: by computationally parsing and cleaning the data, by manually identifying source types, and by developing a Web-based interface to improve the accuracy of coding. As Alf mentioned on his site, a separate paper on our findings, “Sourcing the Arab Spring: A Case Study of Andy Carvin’s Sources on Twitter During the Tunisian and Egyptian Revolutions,” is forthcoming in the Journal of Computer-Mediated Communication, sometime in 2013.

Here’s the abstract from our JOBEM piece:

Massive datasets of communication are challenging traditional, human-driven approaches to content analysis. Computational methods present enticing solutions to these problems but in many cases are insufficient on their own. We argue that an approach blending computational and manual methods throughout the content analysis process may yield more fruitful results, and draw on a case study of news sourcing on Twitter to illustrate this hybrid approach in action. Careful combinations of computational and manual techniques can preserve the strengths of traditional content analysis, with its systematic rigor and contextual sensitivity, while also maximizing the large-scale capacity of Big Data and the algorithmic accuracy of computational methods.