Description
Web as data – challenges and triumphs of creating and working with a derived web corpus

The most widespread form of access to the archived web is through the Wayback Machine, where the archived web is browsed much as it was when it was online. However, traditional ways of providing access through a library interface such as the Wayback Machine are not always ideal for researchers who want to build corpora around specific research questions. Indeed, a wide range of other forms of access are possible, from hyperlink analyses to studies of images, language, etc., including big data studies of large portions of the web. Such analyses may require specific software and computational tools, i.e. the need to work with the material as derived data using tools not available at the library. Moreover, the complexity of the data and collections means that many researchers will require individual assistance to understand the collection and its potential. Since 2018, the Royal Danish Library has made on-demand copies of its archived web, to be delivered as a corpus at the researcher's own institution. In this panel, we will explore the challenges and triumphs of a number of cases in which the invited speakers have used the library's archived web as a base collection from which a number of different corpora have been created, serving different research interests. Topics include, but are not limited to, corpus creation, collaboration between library and research institution, project support, data management, and research infrastructure. There are five individual contributions.
Period | 17 Oct 2022 → 18 Oct 2022 |
---|---|
Event type | Conference |
Location | Aarhus, Denmark |
Degree of recognition | International |