TY - JOUR
T1 - Big data experiments with the archived Web
T2 - Methodological reflections on studying the development of a nation’s Web
AU - Brügger, Niels
AU - Nielsen, Janne
AU - Laursen, Ditte
PY - 2020
Y1 - 2020
AB - This article explores how archived Web sources can be used for historical studies of an entire national Web domain and its development over time. It presents the methodological challenges of large-scale studies using Web archive content and discusses the limitations and potential of this new type of study of Web history. It uses the entire Danish Web domain .dk from 2006 to 2015, as it has been preserved in the Danish national Web archive, as a case to exemplify how ‘a nation’ can be delimited on the Web and how an analytical design for this type of big data analysis using archived Web can be developed. This includes considering the characteristics of the archived Web as a historical source for academic studies as well as the specific characteristics of the data sources used. Our findings reveal some of the ways in which a nation’s digital landscape can be mapped by examining Web site sizes and hyperlinks, and we focus on discussing how these results shed light on the methodological challenges, reflections and choices that are an integral part of large-scale Web archive studies. The study demonstrates that hardware and software as well as human competences from various disciplines make it possible to perform large-scale historical studies of one of the biggest media sources of today, the World Wide Web.
U2 - 10.5210/fm.v25i3.10384
DO - 10.5210/fm.v25i3.10384
M3 - Journal article
SN - 1396-0466
VL - 25
JO - First Monday (Chicago)
JF - First Monday (Chicago)
IS - 3
ER -