In the past two decades, communities have adopted digital technologies to gather and record their collections in a form of ‘citizen history’ that has created a truly democratic and vast reservoir of new knowledge about the past. The intellectual and economic investment in such community-generated digital content (CGDC) is immense, and its rich and diverse content is one of the UK’s prime cultural assets. However, this content is ‘critically endangered’ due to technological and organisational barriers: CGDC has proved extraordinarily resistant to traditional methods of linking and integration, meaning resources commonly funded and produced by the public are often inaccessible or isolated, under-utilised, and at risk of disappearing altogether. Existing solutions to this problem, involving bespoke interventionist activities, are expensive, time-consuming, and unsustainable at scale, whilst any unsophisticated computational integration of this data would result in a lowest-common-denominator solution which would negate the meaning and purpose of both CGDC and its creators.
The Our Heritage, Our Stories (OHOS) project responds to this urgent challenge by bringing together researchers in digital humanities, archives, history, linguistics, and computer science, from the Universities of Glasgow and Manchester, with world-leading archive and digital infrastructure development at The National Archives (TNA), the project’s lead IRO. This team are harnessing cutting-edge approaches from cultural heritage, humanities, and computer science to dissolve existing barriers and develop scalable linking and discoverability across CGDC and the collections of TNA. This work is also being conducted in collaboration with leading UK heritage organisations, including Tate, the British Museum, the National Libraries of Scotland and Wales, the National Lottery Heritage Fund, the Public Record Office of Northern Ireland, and a network of smaller regional and local heritage organisations holding digital content created by and relating to communities.
Using multidisciplinary methods, OHOS is designing sophisticated automated tools that make previously unfindable and unlinkable CGDC searchable, connected, and discoverable within the national collection, while respecting and embracing its complexity and diversity. This new accessibility, which reveals and commemorates the previously hidden voices and stories of diverse communities across the UK, will be showcased to the world through a major new public-facing Observatory at TNA where people can access, reuse, and remix this newly integrated content. As we dissolve barriers and add meaningful links across these collections, we will make them accessible to new and varied audiences, open them up for research (demonstrated via multidisciplinary case studies), and embed new strategies for the future management of CGDC into heritage practice and training.
The lasting legacy of this project will be the wealth of previously siloed, hidden, and fragmented CGDC that it situates and renders discoverable. In doing so, we will revolutionise both our understanding of the past and the methods used to achieve that understanding, by developing cutting-edge tools, AI methods, historical and linguistic research, and new frameworks for sustainable archival practice. By enabling CGDC to be re-used and reimagined, we will help it survive and be nourished, for the future and for our shared national collection.
The archives work on Our Heritage, Our Stories is being led by the University of Glasgow. This work is developing a deep understanding of the ecosystem of community content, on which the other areas of the project will build, and is creating an institutional/organisational taxonomy for community archives that addresses their information ecosystems. By drawing on a full and nuanced understanding of CGDC’s lifecycle, context, preservation, and fragilities, this project workstream will produce a semantics-aware, FAIR-compliant, and sustainable post-custodial model for use across the sector. Through ethical and meaningful co-curation with community archives, it will also generate best-practice frameworks and effective training resources that can be embedded into archival education and professional development, while respecting the complexity, hybridity, and nuance encoded within those archives’ data and practice. This will be informed by iterative testing with our IRO partners, community archives, ALT, DPC, Wikidata, and our wider network.
The University of Glasgow is also in charge of overall project management and administration, including managing the Community Fund and other payments, liaising with the advisory board and stakeholder panel, organising workshops and project travel, and undertaking other centralised project functions. This work also ensures project coordination with partners and with other Towards a National Collection projects, and coordinates the publication of our White Papers on the project website.
The University of Manchester is leading the machine learning and AI research on the Our Heritage, Our Stories project. This work will focus on the AI-enabled interpretation and text-mining of complex, imperfect, and orphaned collection description data. It will develop and validate scalable and explainable machine learning and AI techniques to create a pipeline for the automated extraction and semantic enrichment of CGDC metadata from disparate collections, drawing on structured metadata such as core language elements, existing authority data, and terms from novel ontologies. The enriched metadata will then be developed into knowledge graphs to enable complex representations and search. This will be done in collaboration with the linguistics experts at the University of Glasgow, who will investigate the barriers that arise when collections cannot be described in a community’s own language, jargon, and dialect as it changes over time. This approach to language as a heritage object in its own right will ensure that archives not congruent with standard English descriptors are included in a truly national collection.
The University of Manchester is also producing key historical research demonstrator case studies that show the value of CGDC for research. These will engage academic, community, and family historians on the use and potential of discoverable and linked CGDC, including by generating crucial links between the creators and potential end-users of newly opened-up CGDC. They will make the compelling case that CGDC is core to a national collection for research. Central to this will be the establishment of a community of practice, in which we will identify and collaborate with a number of independent community-produced digital archives that record, safeguard, and make known records, artefacts, and oral histories of migration, religious and ethnic identity, and the social history of the post-war UK at a local level. The historical research lab will also produce studies of focused periods. Indicative examples include research on the records, artefacts, and oral histories of post-1950 migration into the UK, to reveal compelling new narratives about the histories and development of contemporary society, and new and richly detailed stories about community responses to war and its aftermath and impact.
The National Archives (TNA) is leading the pivotal building of knowledge, capacity, infrastructure, interfaces, and pathways that will facilitate the linking and integration of CGDC within our national collection. Central to this is the hosting of the project’s enriched data and the development of innovative tools and interfaces to allow scalable mixing, visualising, and sharing of CGDC for research and discovery. This will support manual and automated metadata production and harvesting, knowledge graph infrastructures, semantic linking and interoperability, multilingual/multidialectal crosswalks, and ontologies – all of which enable discoverability and integration across TNA’s existing world-class digital infrastructure, the nation’s vast trove of CGDC, and the wider work of the Towards a National Collection programme.