Psychology: Secondary datasets

Subject guide for psychology students

Finding and using secondary datasets

There are two ways of gathering the data you need in order to conduct your research: collecting new data or obtaining copies of existing data.

It is always a good idea when starting out on a project to check whether any existing data are relevant to your research questions. These data might include datasets you can download from online repositories or databases, or research data shared by your supervisor.

If you find third-party data you would like to use, you will need to understand and abide by the licence terms under which the data are made available, and you must acknowledge the authors of the data. You should also state where the data were obtained so that other researchers can obtain or request access to the same data. The ideal way to do this is to cite the data directly, just as you would an academic paper.

Finding secondary datasets

Datasets are generally published via data repositories or as supplementary material accompanying journal articles. It is increasingly standard practice for journal articles to include a Data Access Statement, which tells you where and how to access the data that underpin the research presented in the article. Publishing data with an article supports academic integrity and enables other researchers to reproduce and validate research findings.

DataCite (datacite.org) is a searchable registry of millions of research datasets that have been published and assigned a DOI (digital object identifier). Searching it with your own terms returns detailed descriptions of the datasets and the research studies to which they relate, as well as each dataset's location on the internet.

The UK Data Archive Data Catalogue is the UK’s largest searchable digital collection of social sciences and population research data, including medical data.

To understand how Creative Commons (CC) licences work, see our guide: https://libguides.westminster.ac.uk/copyrightresearchers/creativecommons

Formatting a reference to a dataset

If you are publishing your research in a journal, follow the style manual that the journal specifies. Each style manual has its own rules for referencing databases and datasets, for example:

APA 7th edition

Smith, M., & Jones, G. R. (2015). Title of dataset (Version) [Dataset]. https://doi.org/10.15125/12345

Harvard

Smith, M. and Jones, G. R. (2015) 'Title of dataset'. Available at: https://doi.org/10.15125/12345 (Accessed: 1 March 2022).
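
If you keep dataset details in a script or spreadsheet, the two patterns above can be generated consistently. The following is a minimal Python sketch, not an official implementation of either style. The DatasetRecord class and the function names are hypothetical, and the sketch assumes that DOIs are written as full https://doi.org/ links. Always check the output against the current style manual.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    """Hypothetical container for the details a dataset reference needs."""
    authors: list[str]  # names already inverted, e.g. "Smith, M."
    year: int
    title: str
    version: str
    doi: str            # bare DOI, e.g. "10.15125/12345"


def join_names(names: list[str], last_sep: str) -> str:
    """Join author names, using last_sep before the final name."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + last_sep + names[-1]


def format_apa(record: DatasetRecord) -> str:
    """Build a reference following the APA pattern shown in this guide."""
    authors = join_names(record.authors, ", & ")
    return (
        f"{authors} ({record.year}). {record.title} "
        f"({record.version}) [Dataset]. https://doi.org/{record.doi}"
    )


def format_harvard(record: DatasetRecord, accessed: date) -> str:
    """Build a reference following the Harvard pattern shown in this guide."""
    authors = join_names(record.authors, " and ")
    accessed_text = f"{accessed.day} {accessed:%B %Y}"
    return (
        f"{authors} ({record.year}) '{record.title}'. "
        f"Available at: https://doi.org/{record.doi} "
        f"(Accessed: {accessed_text})."
    )


example = DatasetRecord(
    authors=["Smith, M.", "Jones, G. R."],
    year=2015,
    title="Title of dataset",
    version="Version",
    doi="10.15125/12345",
)
print(format_apa(example))
print(format_harvard(example, date(2022, 3, 1)))
```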

Further information about citing research data

You may find the following external resources useful:

Citing a subset of data

Just as you may want to cite a particular passage from a textual work, you may need to cite a subset of data. For example, you may have queried a database and worked with the result set, or filtered out parts of a large dataset that were less relevant to your research. There are two ways of approaching this.

The first is similar to the approach you would take with a textual work. Cite the whole database or dataset, then provide the information the reader would need to extract the same subset. This could be the query you submitted to the database, and the date and time you submitted it. If there were multiple or complex steps involved, you may need to include this information in the supplementary data section instead.
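
One way to keep the query and its submission time together with the citation is to record them at the moment you run the query. The Python sketch below is illustrative only. The SubsetNote class, its field names and the wording of the note are assumptions, not a prescribed format, and the example query is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SubsetNote:
    """Hypothetical record of how a subset was extracted from a dataset."""
    dataset_citation: str   # the reference for the whole dataset
    query: str              # the exact query submitted to the database
    submitted_at: datetime  # when the query was submitted (timezone-aware)

    def as_text(self) -> str:
        """Return the citation followed by the details needed to repeat the query."""
        # Report the time in UTC so readers in any time zone can interpret it.
        stamp = self.submitted_at.astimezone(timezone.utc).strftime(
            "%Y-%m-%d %H:%M UTC"
        )
        return (
            f"{self.dataset_citation} "
            f"Subset retrieved with the query \"{self.query}\", "
            f"submitted {stamp}."
        )


note = SubsetNote(
    dataset_citation="Smith, M. and Jones, G. R. (2015) 'Title of dataset'.",
    query="age >= 18 AND country = 'UK'",
    submitted_at=datetime(2022, 3, 1, 14, 30, tzinfo=timezone.utc),
)
print(note.as_text())
```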

The other approach is to archive a snapshot of the subset you actually used. Some archives give you this option when you query a dynamic database. Otherwise, you must make sure that the terms and licence conditions of the data you used allow you to archive your copy.
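
If the licence allows you to archive your own copy, a snapshot can be as simple as a CSV file of the rows you used. The Python sketch below writes such a file and computes a SHA-256 checksum for it. The checksum is an addition that this guide does not require: it is one common way for others to confirm that they hold an identical copy. The function name, file name and column names are hypothetical.

```python
import csv
import hashlib
from pathlib import Path


def save_snapshot(rows, fieldnames, path):
    """Write the subset to a CSV file and return the file's SHA-256 checksum."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # Hash the bytes exactly as stored, so the checksum matches the archived file.
    return hashlib.sha256(path.read_bytes()).hexdigest()


if __name__ == "__main__":
    subset = [
        {"participant_id": 1, "score": 12},
        {"participant_id": 2, "score": 15},
    ]
    checksum = save_snapshot(
        subset, ["participant_id", "score"], "subset_snapshot.csv"
    )
    print(f"Snapshot written; SHA-256: {checksum}")
```

Depositing the file in a repository that assigns a DOI then gives you something you can cite directly.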