

Finding and Using Digital Archives: Using digitised archives critically

This guide covers how to find digitised and digital archives material and how to critically examine what you find.

Understanding digitised archival resources

As we saw on the previous page, there are many different reasons why archivists digitise collections, as well as many reasons why a collection might not be digitised. Once the material is digitised, it can then be presented in a number of different ways, all of which will impact how you search within that collection, and how you interpret it.

This page covers some of the questions you should ask of any digitised archive resource, before you start looking at the material within it, in order to assess how valuable it will be for your research and whether it is a trustworthy resource.

Who has created the site?

This information will usually be on the front page, sometimes in the header or the footer or on an 'About' page.

 Knowing who produced the resource will help you to understand what decisions may have been taken around the selection of material. 

For example, the Brooke Archives website brings together digitised documents from archives around the world relating to the Brooke family, who were known as the White Rajahs of Sarawak (in Malaysia). The material has been selected by the Brooke Heritage Trust and so is likely to reflect their perspective as the descendants of colonial rulers.

Does the material come from just one collection?

Digitisation can be a great way of bringing together collections that have become dispersed, and of allowing access to all the material about an individual or on a topic in one place.

In 'What Are Archives' we looked at the importance of thinking about archives as an aggregate of material. However, digitising material from multiple collections takes them out of their original context and puts them into a new one. It may mean that some aspects of the archive are more understandable - for example, if the two halves of a correspondence are brought together - but may also mean that the items lose some of the meaning that they had in their original context. For example, if there are 40 letters on a topic digitised from an archive collection that contains 60 letters in total, we can see that this is an important topic in that collection. However, if it is 40 letters out of 600, then the topic was less important to the writer than the digitised resource may lead us to assume.
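The comparison above comes down to simple arithmetic, sketched here in Python. The function name `topic_share` is purely illustrative, and the figures are the hypothetical ones from the example, not counts from any real collection.

```python
def topic_share(digitised_on_topic: int, collection_total: int) -> float:
    """Return the fraction of the whole collection made up by the digitised items."""
    if collection_total <= 0:
        raise ValueError("collection_total must be a positive number")
    return digitised_on_topic / collection_total


# Hypothetical figures from the example: 40 digitised letters on a topic,
# taken from a collection of either 60 or 600 letters in total.
small_collection = topic_share(40, 60)
large_collection = topic_share(40, 600)

print(f"40 of 60 letters:  {small_collection:.0%}")   # 67%
print(f"40 of 600 letters: {large_collection:.0%}")   # 7%
```

The same 40 letters make up about two thirds of the smaller collection but only about 7% of the larger one, which is why it helps to know the size of the original collection before judging how prominent a topic was.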

How is the digital material arranged?

Traditionally archives are arranged in a hierarchical structure. This helps you to understand how and why the records were created. 

Some archives, including the University of Westminster Archive, make digitised material available online through their online catalogue so that you can see it in this context. Other archives present their digital material in a thematic way, more like being at an exhibition.  This can be useful if your research aligns with their expected audience, but can be frustrating if it doesn't.

The archive of artist John Latham was digitised in 2009 and made available online via https://www.ligatus.org.uk/aae/. The website allows you to browse the archive under three different personalities, related to the central characters in Dostoevsky's The Brothers Karamazov (a major reference point for Latham). Using this website will help you to see how the mode of presenting the archive makes a difference to your experience as a researcher.

Are images of the records available or only indexes/transcriptions?

Although digitisation now tends to mean making digital images of documents available online, some older projects concentrated on providing transcriptions or databases of records. These projects tend to focus on records containing large numbers of names that are relevant to genealogists.

The Historic Hospital Admissions Records Project http://www.hharp.org/ is one example of this type of resource. Search results are given as database entries, not as photographs of the original pages, and so the information cannot be confirmed by the researcher. While this resource makes clear in the project description that it has been double-checked for typographical errors, other projects may not have been so rigorous. It should also be noted that some of the information has been standardised to enable easier searching, which may be a problem for some researchers.

Does the website have citation guidelines?

If a website includes citation guidelines, this will give you an indication that it is intended as an academic resource. You can see an example of this on the Darwin Correspondence Project website.

Ideally the website should also include the unique archival reference number/code for the item, especially where it is not presented within the archive's catalogue. The reference number/code will enable you to look the item up on the archive's catalogue and find out more contextual information about it.

Digitised archives should always be cited as webpages, to make it clear that you used a digitised version and not the physical item itself. However, it is useful to also include the reference number/code for the item where this is available. Webpages have a limited lifespan, whereas the reference number for the archive item should be persistent.

If you are planning on re-using any of the images you find in these digital resources, you should check the University's Copyright guides first.

Why was the site created?

This information should either be on the home page or an 'About' page. It should give you a good idea of how complete the resource is likely to be and who the intended audience are. A digitisation resource produced as part of an organisation's anniversary is likely to have more of a celebratory feel and be less complete than a resource produced for academic researchers.

A good example of an informative 'About' page is the Norfolk Record Office's Second Air Division Digital Archive. It not only explains the purpose of the website but why some things haven't been digitised.

Is the collection only available digitally?

When researchers use digital-only resources, they have to trust the host organisation because there is no way of checking any of the information. There can be perfectly legitimate reasons for making a resource digital-only. For example, many of the records digitised through the British Library's Endangered Archives Programme are in remote locations that would be difficult for researchers to travel to. However, if the host organisation is a commercial company providing only limited access to some of the archive through a digitised portal, then we may be more suspicious of their motives.

Is the text of the digitised documents searchable?

The search function on a digitised archive website will work in one of three ways:

  • Optical Character Recognition (OCR) has been applied to the text of the documents to make it machine readable
  • the text of the documents has been transcribed by people
  • the website is searching across descriptions or summaries of the archive documents, rather than the text of the documents itself.

It is important that you understand which of these is the case before you start searching as it will impact how you search.

Neither transcription nor OCR is infallible, and both can contain mistakes. Where you are searching OCR or transcribed text, the text is also more likely to contain abbreviations and historical spellings that may affect your search results.
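If you are able to download or copy the text of a document, one way to allow for OCR errors and historical spellings is to look for words that are close to your search term rather than identical to it. The following is a minimal sketch in Python using the standard library's `difflib` module. The function name `find_variants`, the sample sentence and the similarity cutoff of 0.8 are all assumptions made for illustration, and this is not how any particular archive website searches its own text.

```python
import difflib
import re


def find_variants(term: str, text: str, cutoff: float = 0.8) -> list[str]:
    """Return distinct words in `text` that closely resemble `term`.

    `cutoff` is a similarity score between 0 and 1. Lower values return
    more (and looser) matches. The default of 0.8 is an arbitrary choice.
    """
    # Split the text into lower-case words and drop duplicates.
    words = sorted(set(re.findall(r"[a-z]+", text.lower())))
    return difflib.get_close_matches(term.lower(), words, n=10, cutoff=cutoff)


# Invented sample text: "hofpital" imitates a long s (ſ) misread as "f" by OCR.
sample = "The publick hospital admitted the patient; the hofpital register was signed."

matches = find_variants("hospital", sample)
print(matches)
```

A search for the exact string "hospital" would miss the misread "hofpital", whereas the approximate match finds both. The trade-off is that a looser cutoff will also return unrelated words, so results still need to be checked by eye.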

Does the website allow user submissions?

Many online archives have sprung up that enable communities to submit digital copies of their own material. Often these relate to areas that traditional archives haven't collected, such as popular culture.

The Manchester Digital Music Archive is a very successful example of this type of project. Running since 2003, the website enables individuals to upload photographs or scans of set-lists, gig photos, press cuttings and other ephemera relating to Manchester's music scene, in order to share them with the wider community. Because this collection is entirely online, you are not able to visit and see any of these objects in person - they remain in the houses of the people who have uploaded them.

In a traditional archive setting, the archivists take responsibility for guaranteeing the authenticity of documents in the archive. This is done through establishing 'provenance' - knowing who has owned an object from the time it was created until it arrives at the archive. Researchers know that they can trust the digital version because they could visit the repository in person and check it against the original. In an online repository with user submissions, the provenance of documents is unclear. Trust relies on the community checking and policing the uploads of other users.

How complete is the digitisation?

There has been increasing discussion in the last few years about the 'gaps' and 'silences' in archives. Archives are always incomplete, but the way that they are arranged, catalogued and described can disguise this. Archivists cannot necessarily know what was destroyed or lost before a collection came to them, and can only describe what they have. Part of your role as a researcher is to think about what might not have survived, and why.

As we saw on the previous page, digitisation is an expensive process and access to digitised materials is bound by legislation. For this reason, the majority of projects will only aim to digitise a proportion of a collection. If the digitised material is presented within the catalogue structure, it will be clear what material was not, or could not be, digitised. However, if the digitised material is presented on a separate website, then this may not be immediately apparent. When using a digitised collection, it is good practice to look at the catalogue alongside it, in order to have a better idea of what is missing.