A free resource for all? Legal and ethical considerations when using CGDC for research

In this blog, Dr Stefan Ramsden from the OHOS History Lab at the University of Manchester examines copyright, GDPR and ethical issues around the use of CGDC for research.

Steve Strange with drinkers in The Packet pub, Tiger Bay, Cardiff, after a Spandau Ballet concert at Casablancas, 1981. Image by Graham Smith, sourced from People’s Collection Wales (https://www.peoplescollection.wales/discover), where it was uploaded by S.A. Brain and Co Ltd. The image is made available on the People’s Collection website for non-commercial use under a Creative Archive Licence (see https://www.peoplescollection.wales/creative-archive-licence).

Making community-generated digital content (CGDC) easier to find and to link with other sources will help to realise the huge research potential of this material. Yet, whilst CGDC represents a vast treasure trove of text, sound and imagery concerning past lives, there are significant legal and ethical issues associated with accessing personal material online, especially, but not only, when the individuals concerned are still living. This blog outlines some of the legal and ethical frameworks that researchers need to take into account when using CGDC, particularly when this material relates to the recent past. The types of CGDC discussed here include, but are not limited to: oral histories and textual autobiographical material; oral and textual accounts describing events, people and episodes; and historical images and video material.

Many of the issues raised below concerning the use of CGDC overlap significantly with those faced by historians using similar sources in curated institutional archives. Researchers using institutional archives must abide by rules governing the ethical and legal use of these materials, often signing a declaration to this effect.1 Similarly, when using online collections hosted by institutions such as the British Library, researchers can read about the ethical and legal guidelines and procedures the institution has followed (which will often include gaining informed consent from content donors) and are given clear guidance on how they may use the online material.2 However, collections of CGDC hosted online by community groups and individuals often come with little or no information about how the material was collected, or about what kind of informed consent (if any) donors gave for their material to appear online and be used by others for research. It is also often unclear what guidance, if any, was available to the organisations creating and hosting CGDC about making this sort of material public.

Whether an online historical source is made available by an institution or a community group, researchers are not free to use it simply as they wish, because of both legal rules and ethical frameworks. This blog briefly outlines the copyright and data protection implications of using CGDC before turning to some of the broader ethical issues. It is an introduction to the issues: researchers should consult the wide range of literature available on copyright, data protection and research ethics, and seek legal advice where appropriate.

Copyright Law

Material found online is subject to copyright law. Under UK law, every piece of expression recorded in a relatively fixed form (e.g. recordings, texts, images) is automatically the copyright of its creator from the moment of creation.3 In the case of images, copyright is generally assumed to rest with the creator and/or owner of the object represented in the image, though in the case of photographs it is sometimes asserted that the subject also has rights. Copyright can also be sold or otherwise assigned to someone else. Researchers should therefore not assume that the copyright in material found online is held by the group or institution that uploaded it. The British Library, for example, makes no promise that using material from its online collections will not infringe a third party’s copyright.4 Most material placed online by community groups carries no copyright information concerning its use or reuse.

Whether or not a particular website provides information about who owns the copyright, the researcher still needs to consider copyright if they are going to use the material for research and publication. There are two ways in which an individual can reproduce material in which they do not hold the copyright. The first is through the permission of the copyright holder; the second (discussed in more detail below) is via the exceptions made for non-commercial research. CGDC websites sometimes stipulate conditions about how their material can be used; these do not necessarily carry the force of law but should be considered for ethical reasons (see below). Sometimes explicit permission for copying and reuse is granted on the website hosting the CGDC, for example through a Creative Commons licence. Different types of Creative Commons licence exist, each with its own conditions, and a researcher relying solely on this type of permission would need to satisfy the conditions of the relevant licence. For example, the ‘ShareAlike’ Creative Commons licence stipulates that any outputs using the material must also be made freely available to all under the same licence.5 This could prevent publication in a paywalled journal or in a book for sale. Where a website hosting CGDC includes no information about copyright, it may be possible to obtain permission from the institution, group or individual that uploaded the material. If that institution or group is not the copyright holder, the researcher would need to locate the copyright holder(s) and seek their permission in order to rely on this justification for copying and publishing the material. This is not always easy, though many creators of CGDC are keen for their work to be cited and used.

However, copyright law is designed not solely to protect individual rights but also to allow intellectual property to be used for the enrichment and benefit of the wider community.6 The second justification for copying and publishing copyrighted material rests on exceptions to the requirement to seek the copyright holder’s permission. In the UK, these exceptions include non-commercial research and private study, and fair dealing for criticism, review or quotation.7 Under these rules, a researcher could reasonably claim non-commercial research as a justification for using copyrighted material, particularly if reproducing the work in a journal article rather than in a book for which they will receive payment. However, they would still need to ensure that their use falls within the ‘fair dealing’ stipulation, as there is no blanket exemption from copyright law for research. Much depends on the interpretation of ‘fair dealing’.8 The term is not precisely defined in law, but is generally understood as use of a copyrighted work that does not substitute for the original work and does not cause commercial loss to the copyright holder. It also implies that only a ‘limited extract’ of the material will be used (again, what constitutes a ‘limited extract’ is not precisely defined).9

Data Protection Law

The second form of legal restriction that researchers need to be aware of when using CGDC for research is the General Data Protection Regulation, or GDPR (implemented in UK law through the Data Protection Act 2018). This governs the way in which personal data relating to living people can be collected, stored and processed in the UK and European Union. Many types of CGDC are rich in personal data about what the regulation calls ‘data subjects’, and there may be numerous data subjects in any particular piece of CGDC. For example, an oral history interview may include not only the personal data of the narrator, but also that of other people referred to in the interview.

It is not only the person or institution that collects personal data and/or makes it available in a physical archive or on a website who needs to be concerned with GDPR. A researcher who copies the data and uses it for their research and writing becomes a ‘data controller’, and must establish a legal basis for storing and processing this data.10 Two legal bases are most relevant here. The first is ‘consent’. For researchers accessing CGDC online, it may be difficult or impossible to obtain consent from all data subjects. Even if consent were obtained from each data subject for the purposes of the research, consent to processing can be withdrawn, creating potential problems for the researcher. The Oral History Society therefore recommends that researchers, where possible, rely on the second legal basis for processing personal data: an exemption based on public interest. Whereas data subjects can demand that their personal data be erased, they cannot do so where the data is being used for ‘archiving purposes in the public interest’ or for ‘scientific, historical or statistical purposes’.11 This is relatively straightforward for researchers in institutions with a mandated public task connected to research, including universities, libraries and museums.12 Researchers not working in these institutions may instead be able to process personal data on the basis of ‘legitimate interests’. However, the processing of ‘special category’ data (data that reveals political opinions, religion, ethnicity, sexuality and other personal characteristics) requires a further, separate justification.13 Much oral history data, for example, is likely to fall into this category. Here, the researcher can rely on the public interest inherent in their research.
In all cases, the use of personal data must not cause ‘substantial damage or distress’ to the data subject(s), and researchers need to document the decisions they have taken in processing personal data and be able to explain how their use of this data complies with data protection law.14 When relying on an exemption, the researcher is still required to process personal data responsibly and to put in place measures to protect the privacy of data subjects (see the Information Commissioner’s Office guidelines and the Royal Historical Society’s guidelines for more detailed discussion).15

Finally, though Creative Commons licences are increasingly common for online material, including CGDC, it is important to note that material made available online under a Creative Commons licence is not exempt from the requirements of GDPR. The National Lottery Heritage Fund (NLHF), probably the biggest funder of CGDC and a promoter of CC licences as a way of increasing accessibility, recognises that these licences can lead researchers to believe, wrongly, that they are free to use material without restriction. As a result, the NLHF has introduced exemptions from its usual requirement that grant holders make material freely available via Creative Commons licences. These exemptions apply to special category data such as that often contained in oral history interviews and similar material.16

Ethical Considerations

To use CGDC relating to individuals ethically, particularly where those individuals are still alive, the researcher should do more than simply obey copyright and data protection law. Many uses of personal data might be legal yet strike us (and, more importantly, the individuals to whom the data relates) as ethically problematic. There is a growing literature discussing researchers’ reuse of material originally collected for other purposes, including social survey data and oral history, and this literature offers a variety of ethical reflections relevant to researchers who wish to use CGDC in their work.

A central ethical issue is the extent to which a researcher is free to use CGDC for purposes that the original producers, collectors and curators of this material would not have envisaged and to which they have not consented. Most writers on internet ethics argue against assuming that online materials can be used for research like any other published material, because the individuals concerned may not have given informed consent.17 When a researcher discovers CGDC that includes personal data and is considering using it for research, two immediate ethical questions about consent arise. The first is whether adequate informed consent was gathered for this personal data to be placed online. Did individuals fully understand how accessible and reusable their material would be, and consent on this basis? If there is reason to suspect that the original producer or subject of the data did not know that their material would appear online, or could not have fully understood the implications of this, the researcher needs to think very carefully about whether and how to use the material. The second is whether, if informed consent was obtained to collect the material and place it online, the purposes for which the researcher intends to use it are compatible with the spirit as well as the letter of the original consent. Joanna Bornat outlines this ethical problem in relation to her own reuse of oral history material collected by Margot Jefferys in the 1990s for a project studying the development of geriatric medicine in the UK. Interviewees had given permission for the use and archiving of their interview recordings and transcripts on the understanding that the research was about the history of geriatric medicine. However, when reviewing this material after Jefferys’ death, Bornat found that it told a strong story about the marginalisation of Indian doctors in the NHS.
She asked herself whether it would be ethically right to use these interviews to tell a story that might not sit well with the interviewees and that they had not understood to be the subject of the original research.18 In the end, Bornat decided to use the material because she concluded that the importance of telling a truthful, if uncomfortable, story about discriminatory practices in the development of geriatric medicine outweighed any displeasure that interviewees might feel. She argued that, so long as the original context of the interviews and changes in language over time were unpacked in her article, there were both ethical and methodological justifications for using the material.19

Although little has been written about the specific ethical problems of using CGDC for research, there is a literature on historians’ use of internet archives. Various internet archives have scraped and preserved websites from the mid-1990s onwards. Like CGDC, these archives represent a significant resource for historians, and they raise similar ethical issues. In his discussion of internet archives, Ian Milligan uses the example of GeoCities, one of the principal platforms through which private individuals could set up their own websites and share content in the early days of the World Wide Web in the mid-1990s.20 When GeoCities closed in 2009, the Internet Archive, aware of the historical importance of this material, scraped content from the millions of websites hosted there. This material can today be accessed through the Wayback Machine (https://archive.org/web/). It was impossible to obtain consent from the millions of individuals whose material was saved, but Milligan is clear that the need to act quickly to save the material meant that the public interest case for undertaking the scrape outweighed the ethical imperative to seek consent. However, he also argues that researchers must consider very carefully the ethical dilemmas involved in accessing and using this material. While some owners of GeoCities sites meant their material to be seen by as large an audience as possible, others shared deeply personal narratives and reflections that seem to have been intended only for a small audience of online friends. Many must have felt that their content enjoyed the anonymity of a ‘needle in a haystack’, and would not have imagined the ways in which future researchers might search and access it. Nor could the creators of these sites have imagined that their material might survive in the long term as part of the Internet Archive.
Many of these individuals will still be alive, and use of their material could potentially be harmful. Milligan advises researchers to proceed carefully and empathetically in accessing and using this data. He notes that ‘working with web archives requires constantly navigating a grey zone’, and advises that researchers using the Internet Archive pay close attention to context, looking for clues about whether an individual may have viewed their website as private or public. He also suggests that we need to use empathy, imagining how we ourselves might feel if a researcher were using our material.21 These are also good tips for researchers using CGDC.

There are strong reasons why, out of a duty of care for the privacy and feelings of research subjects, we need to be careful when using material we find online that relates to living individuals. Most ethical frameworks seek to balance the potential for harm to individuals against the researcher’s duty to produce research that is truthful and that benefits wider society. While the informed consent of research participants is seen as the gold standard in research with people, it is not always practically possible to obtain consent from the individuals whose personal information we find in archived collections, whether these are physical archives or collections existing online. This does not mean that we should simply avoid such material. Informed consent can rarely cover the full range of possible uses that a researcher might make of a particular data set,22 whilst the obligation to produce socially useful historical research may outweigh qualms about a lack of specific informed consent.23 Moreover, different ethical frameworks can lead to different positions on the balance to be struck between individual and collective good. For example, researchers in Scandinavia and continental Europe tend to adopt a ‘deontological’ approach, in which the central principle is to protect the human rights of autonomous citizens. Researchers in the US and the UK tend towards more ‘utilitarian’ principles, with a greater willingness to risk individuals’ rights for the public good that their research can bring.24

While different solutions to ethical problems are possible and defensible, depending on a researcher’s cultural, institutional and personal ethical context, it is always advisable to anticipate and consider ethical problems from the start and across the life of a research project. McKee and Porter contend that researchers need to begin with an ethical question: why are they undertaking the research in the first place, and will any societal benefit accrue from it? All subsequent ethical decisions should flow from this. There are distinct ethical dimensions to the planning, conducting, archiving and publication stages of research, and ethical review should be an ongoing and reflexive process throughout these stages.25 This is particularly true of research with CGDC, which is highly diverse and often consists of a complex blend of private and public information about people in the past and present.

Finally, researchers using CGDC must be aware of the power dynamics inherent in the research relationship. One of the many reasons why CGDC is often not equivalent to the source material we access in institutional archives or in published works is that it is generally produced, collected and curated by groups and individuals who are not professional historians. Their project might be ongoing, and the CGDC they produce will often be seen as a representation of themselves and their community. Groups and individuals producing CGDC can therefore feel a strong sense of ownership of, and investment in, their material. Academic historians, with the resources and authority bestowed by their institutional position, have a platform that enables them to project their own interpretation of the CGDC they find online. This creates an unequal power relationship between academic researchers and many producers of CGDC. Sensitivities around this power dynamic may be particularly acute when CGDC has been produced by marginalised groups, but any group that has collected CGDC representing aspects of its identity and made it available online may feel understandably aggrieved if researchers use its material for their own interpretive ends without consultation. It is therefore incumbent on researchers who use CGDC to attempt to engage with its producers. Researchers should explain how they propose to use the material, give producers the opportunity to express their opinions, and seek informed consent where appropriate. If producer and researcher disagree, negotiation may also be possible (perhaps involving offers to anonymise material or to share authorship).

Summary of Guidance

  • CGDC represents a rich resource for historical researchers and, in particular, can help to restore marginalised voices to the historical record. However, because this material has often been collected and placed online by groups and individuals who may work differently from academic and archival professionals, we have to take particular care in how we use it, making sure that we understand and work within legal, ethical and institutional frameworks.
  • All material is someone’s copyright. If in doubt, and if the copyright holder is known, it is best to seek permission; however, there are exceptions that allow the use of copyrighted material without permission for research and educational purposes, provided that certain criteria are met.
  • Creative Commons licences do not give the researcher free rein. The ‘ShareAlike’ licence, for example, requires that any output reusing the material is also made freely available under the same licence. Moreover, a researcher cannot always be certain that a website has the right to apply a Creative Commons licence to particular material (which might in fact be the copyright of another person).
  • Data protection law, including the Data Protection Act 2018 (which implemented the General Data Protection Regulation in the UK), stipulates that researchers handling personal data, regardless of its source, are ‘data controllers’ and must establish a legal basis for processing it. This legal basis must be grounded either in consent obtained from the data subject, or in an exemption based on the purposes for which the data is being processed (‘archiving purposes in the public interest’ or ‘scientific, historical or statistical purposes’). Researchers must document the decisions they have taken in processing personal data and be able to explain how these comply with data protection law.
  • It is not enough for researchers to satisfy themselves that they have met their legal obligations in using CGDC; they must also consider the ethics of using this material.
  • The literature on research ethics advises that researchers build ethical reflection into every stage of the research process, asking themselves questions about the kinds of harm that their research could cause and the kinds of benefits it could bring, whilst also involving the widest possible range of stakeholders in ethical discussions and informing themselves about how ethical considerations have been handled in similar research and with what results.
  • Because CGDC often represents the work and identities of community groups and individuals, researchers using CGDC should, in most cases, attempt to consult with these producers. Such consultation allows producers’ wishes to be included in the ethical calculus. This is not to say that in every instance the researcher must seek informed consent for use of CGDC, but that the wishes of those individuals to whom the material refers, and those individuals who have collected the material, should be considered carefully.
  • University researchers should ensure that they understand and engage with institutional frameworks, considering whether their use of CGDC will require formal peer ethical review.

Further Resources

More information about UK copyright law can be found here:

More information about UK data protection law can be found here:

A good discussion of ethical issues around internet research can be found here:

References and endnotes

  1. Mary Larson, ‘Steering Clear of the Rocks: A Look at the Current State of Oral History Ethics in the Digital Age’, Oral History Review 40, no. 1 (2013). For an institutional example, see The National Archives, ‘Using materials from The National Archives’, 2022, available online from: https://cdn.nationalarchives.gov.uk/documents/information-management/use-of-tna-materials.pdf [Accessed 5 September 2023].
  2. See, for example: The British Library, ‘Websites and Online Services’, available online from: https://www.bl.uk/about-us/terms-and-conditions/websites-and-online-services [Accessed 14 September 2023]; The British Library, ‘Collection Guides. Oral History’, available online from: https://www.bl.uk/collection-guides/oral-history#:~:text=Many%20sound%20recordings%20have%20been,in%20accredited%20higher%20education%20establishments. [Accessed 14 September 2023].
  3. British Copyright Council, ‘Information: How original literary works are protected by copyright in the UK’, webpage, available from: https://www.britishcopyright.org/information/how-original-literary-works-are-protected-by-copyright-in-the-uk [Accessed 4 September 2023].
  4. The British Library, ‘Websites and Online Services’.
  5. Creative Commons, ‘Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)’, available from: https://creativecommons.org/licenses/by-sa/4.0/ [Accessed 24 August 2023].
  6. Heidi A. McKee and James E. Porter, The Ethics of Internet Research: A Rhetorical, Case-Based Process, New York: Peter Lang, 2009, pp.53-4.
  7. GOV.UK, ‘Copyright Guidance. Exceptions to copyright’, 2021, webpage, available from: https://www.gov.uk/guidance/exceptions-to-copyright [Accessed 24 August 2023].
  8. McKee and Porter, The Ethics of Internet Research, p.55.
  9. GOV.UK, ‘Copyright Guidance. Exceptions to copyright’; British Copyright Council, ‘Using material from a published edition’, website, available from: https://www.britishcopyright.org/information/using-material-from-a-published-edition [Accessed 23 August 2023].
  10. Oral History Society, ‘Dealing with GDPR’, website, available from: https://www.ohs.org.uk/gdpr-2/#:~:text=GDPR%20applies%20to%20any%20organisation,%2C%20self%2Demployed%20or%20voluntary. [Accessed 24 August 2023].
  11. Oral History Society, ‘Dealing with GDPR’.
  12. Oral History Society, ‘Dealing with GDPR’.
  13. Katherine Foxhall, Data Protection and Historians in the UK, Royal Historical Society, 2020, available from: https://files.royalhistsoc.org/wp-content/uploads/2020/07/19092331/20200707_RHS_Data_Protection_Historians_WEB2.pdf [Accessed 27 September 2023], p.15; Information Commissioner’s Office, ‘UK GDPR Guidance and Resources’, available from: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/ [Accessed 27 September 2023].
  14. Oral History Society, ‘Dealing with GDPR’, p.6; Foxhall, Data Protection and Historians in the UK.
  15. Foxhall, Data Protection and Historians in the UK.
  16. Oral History Society, ‘Dealing with GDPR’.
  17. E.g. McKee and Porter, The Ethics of Internet Research; Bernard Enjolras et al., Internet Research Ethics, Cappelen Damm Akademisk NOASP, 2015; Association of Internet Researchers, ‘Internet Research: Ethical Guidelines 3.0’, 2019, available from: https://aoir.org/reports/ethics3.pdf [Accessed 5 September 2023].
  18. Joanna Bornat, ‘A Second Take: Revisiting Interviews with a Different Purpose’, Oral History 31, no. 1 (2003), pp.47-53.
  19. Bornat, ‘A Second Take…’; Joanna Bornat, Parvati Raghuram and Leroi Henry, ‘Revisiting the archives: a case study from the history of geriatric medicine’, Sociological Research Online 17, no. 2 (2012).
  20. Ian Milligan, History in the Age of Abundance? How the Web Is Transforming Historical Research, London: McGill-Queen’s University Press, 2019, pp.171-212.
  21. Milligan, History in the Age of Abundance, p.195.
  22. Bornat, ‘A Second Take…’, p.52.
  23. Milligan, History in the Age of Abundance, p.206.
  24. Association of Internet Researchers, ‘Internet Research: Ethical Guidelines 3.0’, p.5.
  25. McKee and Porter, The Ethics of Internet Research, pp.141-166.