Abstract
This paper presents a test of browser-based web crawling on a sample of streaming services’ websites and web players. We are especially interested in their graphical user interfaces, since the Royal Danish Library collects most of the content by other means. In a legal deposit setting, and for the purposes of this test, we argue that streaming services consist of three main parts: their catalogue, their metadata, and their graphical user interfaces. We find that collecting all three parts is essential in order to preserve and play back what we could call 'the streaming experience'. The goal of the test is to see whether we can capture a representative sample of the contemporary streaming experience, from the initial login to (momentary) playback of the content.
The Danish Web archive (Netarkivet) is currently implementing browser-based crawl systems to optimize its collection of the Danish Web sphere (Myrvoll et al., n.d.). The test will run on a local instance of Browsertrix (Webrecorder, n.d.). This will let us log in to services that require a local IP address. Our sample includes streaming services for books, music, TV series, and gaming.
In the streaming era, the very features that define streaming also threaten to impede access to important media history and cultural heritage. Streaming services are transnational and sit behind paywalls, while their content catalogues and interfaces change constantly (Colbjørnsen et al., 2021). This challenges the collection and preservation of how they present and play back the available content. On a daily basis, Danes stream more TV (47%) than they watch flow TV (37%), and six out of ten Danes subscribe to Netflix (Kantar-Gallup, 2022). Streaming is now standard for many and no longer a first-mover activity, at least in the Nordic region of Europe (Lüders et al., 2021).
The Danish Web archive collects the websites of streaming services as part of its quarterly cross-sectional crawls of the Danish Web sphere (The Royal Danish Library, n.d.). A recent analysis of its collection of websites and interfaces concluded that the automated collection process provides insufficient documentation of the Danish streaming services (Aegidius and Andersen, in review).
This paper presents findings from a test of browser-based crawls of streaming services’ interfaces. We will discuss the most prominent sources of errors and how we may optimize the collection of national and international streaming services.
References
Aegidius, A. L. & Andersen, M. M. T. (in review) Collecting streaming services. Convergence: The International Journal of Research into New Media Technologies.
Colbjørnsen, T., Tallerås, K., & Øfsti, M. (2021) Contingent availability: a case-based approach to understanding availability in streaming services and cultural policy implications. International Journal of Cultural Policy, 27(7), 936–951. https://doi.org/10.1080/10286632.2020.1860030
Lüders, M., Sundet, V. S., & Colbjørnsen, T. (2021) Towards streaming as a dominant mode of media use? A user typology approach to music and television streaming. Nordicom Review, 42(1), 35–57. https://doi.org/10.2478/nor-2021-0011
Kantar-Gallup (2022) Digital Life 2022. Available at: https://www.kantargallup.dk/digital-life-2022 (accessed 20 September 2023).
Myrvoll, A. K., Jackson, A., O'Brien, B., et al. (n.d.) Browser-based crawling system for all. Available at: https://netpreserve.org/projects/browser-based-crawling/ (accessed 26 May 2023).
The Royal Danish Library (n.d.) Netarkivet. Available at: https://www.kb.dk/en/find-materials/collections/netarkivet (accessed 20 September 2023).
Webrecorder (n.d.) Browsertrix-crawler. Available at: https://webrecorder.net/tools#browsertrix-crawler (accessed 20 September 2023).
| Translated title of the contribution (Danish) | Test af browserbaseret høstning af streaming services' brugerfalder |
|---|---|
| Original language | English |
| Publication date | 26 Apr 2024 |
| Number of pages | 10 |
| Status | Published - 26 Apr 2024 |
| Event | IIPC General Assembly and Web Archiving Conference 2024 - Bibliothèque nationale de France, Paris, France. Duration: 24 Apr 2024 → 26 Apr 2024. https://netpreserve.org/ga2024/ |
Conference
| Conference | IIPC General Assembly and Web Archiving Conference 2024 |
|---|---|
| Location | Bibliothèque nationale de France |
| Country/Territory | France |
| City | Paris |
| Period | 24/04/2024 → 26/04/2024 |
| Internet address | |
Keywords
- Browsertrix
- streaming
- svod
- Netflix
- DR
- TV2
- cultural heritage
- digital media