The Potentials and Challenges for Researchers and Web Archives Using the Persistent Web IDentifier (PWID)

Publication: Conference contribution › Paper › Research › peer-reviewed

Abstract

For researchers to live up to good research practice, we need to be able to make persistent references to content in web archives. In some cases, different types of Persistent Identifiers can be used. However, for references to web archive pages or elements, which need to be resolvable for more than 50 years, the Persistent Web IDentifier (PWID) is often the best choice. Many referencing guidelines and standards recommend that references to web archives be made via an archived URL. This is a challenge not only for closed web archives, but also for web archives that change the addresses of their web archive data. For instance, this happened when the Irish web archive migrated its holdings from an Internet Memory Foundation (IMF) platform (http://collection.europarchive.org/nli/) to an Archive-It web archive service (https://archive-it.org/home/nli). It will also be the case for web archives whose archive URLs change because of changes related to the Wayback Machine.

The PWID resolves many of the known issues with common identifiers because it is based on basic web archive metadata: the web archive, the archival time of the web element, the archived URL of the web element, and the precision or inherited interpretation of the PWID, such as page or part/file. Thus, once the web archive is identified, the archival time and archived URL can be used to find the resource, since these metadata are present in WARC. Finally, the information about the interpretation/precision of the resource can be used to choose the manifestation of the page and the form of access to the resource. This means that resolving a PWID does not rely on a separate registry of the contents of a web archive (which can be huge): the WARC metadata can be indexed (e.g. in CDX or Solr), and this index is able to support resolution. Furthermore, the design of the PWID has been based on bridge building between digital humanities researchers, web archivists, persistent identifier experts, Internet experts and others, in order to meet the requirements of being human readable, persistent, technology agnostic, global, algorithmically resolvable and accepted as a URN.
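The following minimal sketch illustrates how the four metadata parts described above could be read out of a PWID and turned into a lookup key for an index. It assumes the URN layout `urn:pwid:<archive>:<archival time>:<precision>:<archived URL>` with a UTC timestamp ending in `Z`; that ordering, the example URN, and the names `Pwid`, `parse_pwid` and `cdx_lookup_key` are assumptions made for illustration, not part of the abstract. The 14-digit timestamp follows the convention commonly used in CDX indexes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Pwid:
    archive_id: str          # which web archive holds the resource
    archival_time: datetime  # when the element was archived (UTC)
    precision: str           # intended interpretation, e.g. "page" or "part"
    archived_url: str        # URL of the element as it was archived


def parse_pwid(urn: str) -> Pwid:
    """Split a PWID URN into its parts (assumed layout, see lead-in)."""
    prefix = "urn:pwid:"
    if not urn.lower().startswith(prefix):
        raise ValueError("not a PWID URN")
    rest = urn[len(prefix):]

    # The archive identifier is taken to contain no colon.
    archive_id, sep, rest = rest.partition(":")
    if not sep or not archive_id:
        raise ValueError("missing archive identifier")

    # The timestamp itself contains colons, so split on its trailing "Z:".
    stamp, sep, rest = rest.partition("Z:")
    if not sep:
        raise ValueError("missing or non-UTC archival time")
    archival_time = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc
    )

    # Everything after the precision is the archived URL, colons included.
    precision, sep, archived_url = rest.partition(":")
    if not sep or not precision or not archived_url:
        raise ValueError("missing precision or archived URL")

    return Pwid(archive_id, archival_time, precision, archived_url)


def cdx_lookup_key(pwid: Pwid) -> tuple[str, str]:
    """Return (archived URL, 14-digit timestamp) for an index lookup."""
    return pwid.archived_url, pwid.archival_time.strftime("%Y%m%d%H%M%S")


# Illustrative value only.
example = parse_pwid(
    "urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk"
)
print(example.archive_id, cdx_lookup_key(example))
```

Because the key is derived only from metadata already recorded with the archived element, no separate registry is involved; a real resolver would still need to map the archive identifier to that archive's own index and access interface.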

Using the PWID, researchers gain a sustainable way to address web elements persistently. Web archives can benefit from the PWID, too, both with regard to implementing support for researchers and in the creation of the web archive when there are several manifestations of a web page. For example, the British Library web archive uses the PWID when archiving snapshots of web pages. Furthermore, since a PWID URN is a URI, it can be used as a URI identifier as is, e.g. where one is required for WARC identifiers. The PWID can become even more useful for researchers when it is incorporated into reference tools such as Zotero.

The presenters will discuss their different perspectives as a researcher within the humanities and as a computer scientist and web archivist. The presentation will cover challenges and experiences from each perspective, as well as future potential in support for, and expansion of, the PWID URN definition.

Original language: English
Publication date: 26 Apr 2024
Status: Published - 26 Apr 2024
Event: IIPC General Assembly and Web Archiving Conference 2024 - Bibliothèque nationale de France, Paris, France
Duration: 24 Apr 2024 to 26 Apr 2024
https://netpreserve.org/ga2024/

Conference

Conference: IIPC General Assembly and Web Archiving Conference 2024
Location: Bibliothèque nationale de France
Country/Territory: France
City: Paris
Period: 24/04/2024 to 26/04/2024