The EUscreenXL Aggregation Process

Author: Andrew Ormsby


EUscreenXL Aggregation has reached – and surpassed – its halfway mark. Anyone with an internet connection and a computer or mobile device can now search Europeana.eu to find more than 540,000 television and radio programmes, newsreels, home movies, amateur films and other contextual material related to the audiovisual domain. This content is freely available to view – via links on the Europeana portal – on the websites of the European broadcasters and audiovisual archives who, together with television historians, educators and technical experts, form the EUscreenXL consortium of partners.

Europeana has been described as Europe’s digital library: an online repository of millions of digitised cultural heritage items from archives, museums and libraries across Europe. These items are available to everybody, free of charge, and include books and manuscripts, photos and paintings, maps, sheet music and recordings.

An area in which Europeana has traditionally been lacking is the audiovisual realm. This is where EUscreenXL comes in. Our task is to become the pan-European aggregator in the audiovisual domain by making available on Europeana a total of one million items.

The aggregation process begins when partners upload large datasets (often consisting of tens of thousands of items) in CSV or XML format into the MINT mapping tool. This tool, which was developed by the National Technical University of Athens, enables partners to map their own metadata schema to the Europeana Data Model (or EDM). The EDM includes eight mandatory fields, which partners must provide in order for their collections to be accepted by Europeana. The most important field (along with Title, Description, Rights Statement and Subject) is a link to a stable, persistent URL where the digital object can be seen in its full information context on the content partner’s website. When the partner has mapped their schema to the EDM they publish their dataset, which is harvested from MINT every month by Europeana, to go live on the Europeana portal.
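For readers who like to see the idea in code, the mapping step can be sketched roughly as follows. This is only an illustration and not how MINT itself works: the column names, the target field names (such as is_shown_at) and the sample records are all hypothetical, and the check covers only the five mandatory fields named above, since the full list of eight is not given here.

```python
import csv
import io

# Assumption: only the five fields named in the text; the names are illustrative,
# not real EDM property names.
NAMED_MANDATORY_FIELDS = ["title", "description", "rights", "subject", "is_shown_at"]

# Hypothetical mapping from one partner's column names to the target field names.
PARTNER_MAPPING = {
    "Programme name": "title",
    "Synopsis": "description",
    "Licence": "rights",
    "Keywords": "subject",
    "Page URL": "is_shown_at",
}

# Invented sample export; the second record has no rights statement.
SAMPLE_CSV = """Programme name,Synopsis,Licence,Keywords,Page URL
Evening News,Daily bulletin,Public Domain,news,https://archive.example.org/items/1
Harvest Film,Amateur footage,,farming,https://archive.example.org/items/2
"""


def map_record(row, mapping):
    """Rename a partner's columns to the target field names, dropping unmapped ones."""
    return {
        target: (row.get(source) or "").strip()
        for source, target in mapping.items()
    }


def missing_fields(record, required=NAMED_MANDATORY_FIELDS):
    """Return the required fields that are absent or empty in a mapped record."""
    return [field for field in required if not record.get(field)]


def map_dataset(csv_text, mapping):
    """Map every row of a CSV export and split the results into accepted and rejected."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = map_record(row, mapping)
        gaps = missing_fields(record)
        if gaps:
            rejected.append((record, gaps))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    ok, failed = map_dataset(SAMPLE_CSV, PARTNER_MAPPING)
    print(f"{len(ok)} record(s) ready to publish, {len(failed)} incomplete")
    for record, gaps in failed:
        print(record["title"], "is missing:", ", ".join(gaps))
```

In this sketch a record with an empty mandatory field is set aside with a note of what is missing, which mirrors the idea that a collection is only accepted once the required fields are present.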

The fact that content partners need only provide a few mandatory metadata fields means that Europeana is able to accept huge quantities of data. The low entry threshold also means that smaller providers, with fewer resources, are able to provide access to their collections: an aim in keeping with the democratic spirit in which Europeana was conceived. In theory, a collection of, say, 100,000 items can be made available on Europeana in three simple steps: upload, map, publish. This is the current aggregation model in a nutshell.

Inevitably, what is simple in theory can present difficulties in practice. There has been criticism of certain aspects of the aggregation process. Some partners have described the workflow as slow and opaque. Europeana’s ‘top-down’ model can give the sense that the workflow is based on Europeana’s needs rather than those of the providers. Content can ‘disappear’ in the vastness of Europeana’s digital realm, and similar collections can be split up when projects come to an end, or are reborn under a different name. The portal interface itself is beginning to look dated and is in need of redesign.

It is to Europeana’s credit that they are taking note of these shortcomings and listening to the aggregators. Work is being done on developing a new aggregation model; the Europeana portal itself is being redesigned and rethought, both in terms of its look and purpose, with the aim of evolving it from a portal into a multi-sided platform, with more focus on highlighting smaller collections, developing online exhibitions, and a new emphasis on the creative reuse of content.

Furthermore, the fact that the current aggregation model works on a grand scale has many benefits. The scalability of the enterprise, due to shared workloads, metadata standards and domain expertise, means that we are able to aggregate very large datasets with fewer problems than one might expect from the sheer amount of information being handled. And given that the process does run relatively smoothly, sometimes it is easy to forget just how much data we are dealing with and take the process for granted.

The impression that the task is an industrial one – the sense that aggregation is about quantity rather than quality – is also, in some ways, misleading. There is quality in abundance: it is simply the case that the threshold for entry, in terms of metadata, is fairly low. This does not mean that all EUscreenXL content features the bare minimum of mandatory metadata fields, or that the quality of the digital object itself is compromised. The aim to be inclusive, rather than exclusive, is one of Europeana’s fundamental principles.

Although there is still much work to be done before we reach our final target of one million items, it is worth pausing to consider what we have been aggregating, rather than how much. The focus on numbers can seem relentless, and sometimes it is easy to forget the wonderful content. My aggregation pick for this blog is LCVA’s Tarybų Lietuva (Soviet Lithuania), which can be seen here. This collection comprises 650 newsreel stories from the Communist era, documenting Lithuanian life in the 1950s from a Soviet perspective. The metadata includes English summaries and the content is all Public Domain marked. Here, in miniature, we can see a measure of the success of EUscreenXL, in the way that content, previously unavailable to all but a handful of dedicated researchers, is now freely accessible to anyone with an internet connection and the desire to explore a shared European history.

 
