When digital content needs to be stored in long-term (cloud) storage, many thorny problems arise. One of the biggest is handling metadata from multiple sources. Merging different metadata schemas is often complex, and in many cases only customized solutions make it possible to archive such collections.
CONTENTUS, a German joint research project, demonstrates an interesting approach to overcoming these obstacles: its system offers a module called "metadata translator", which makes it easier to bring all metadata together in one place.
We wanted to learn more about this project and asked its creators to describe its background, goals and technical approach.
The brief interview below shares their insights:
Can you briefly describe the scope of CONTENTUS as a whole?
IRT: "The main objective of the project is to provide cultural institutions and other owners of audiovisual content with a rich toolbox of solutions that covers all processing steps from raw digital data to a semantic multimedia search environment, including a module for easy restoration of digitized film and video material.
The CONTENTUS framework and the methodologies and concepts developed in the project yield a system that supports cultural institutions in giving end users access to multimedia collections on a large scale. End users benefit from innovative semantic search options that are fuelled by the abundance of multimedia assets and metadata from various sources, including “traditional”, intellectually compiled data, automatically generated information, and internet-based resources."
Which standards are used to reach the goal of a single metadata repository?
IRT: "Within CONTENTUS, MPEG-7 is used for audio and audiovisual material. MPEG-7 is flexible because it is based on general and widely applicable concepts, but its complexity causes many problems, given that MPEG-7 was not specifically designed to describe broadcast content.
To specify the metadata objects and their structure, the MPEG-7 schema had to be extended with a new profile. CONTENTUS partners IRT and Fraunhofer HHI pushed the standardization of this MPEG-7 profile, named “AudioVisual Description Profile” (AVDP). The profile supports the description of audio, video or audiovisual content, ensuring a compatible document structure for all types of content.
A key feature of AVDP is the modularity of its descriptions (e.g. separating metadata that originates from different modalities or is produced by different tools). It is the first profile based on version 2 of MPEG-7, includes low-level audio and video descriptors, and was published as ISO/IEC 15938-9/Amd1. The associated schema (ISO/IEC 15938 Part 11) is in its last stage and will be released in summer 2012."
Your metadata translator is just one module, but an interesting one: Why was it created and how does it work?
IRT: "Broadcasters and media houses use various models for their metadata. These metadata are usually created manually and are therefore highly reliable. However, to make use of the data held in these legacy systems, a solution was required for integrating the different metadata schemas.
Therefore, IRT has developed a metadata translator that converts metadata instances of German broadcasters into the CONTENTUS-specific MPEG-7 format.
Figure: Metadata translator (CONTENTUS)
This solution is based on a central data model called BMF (Broadcast Metadata Exchange Format). It provides the prerequisites for uniformly labeling and describing information.
The BMF class model describes the meaning of, and the relationships between, the information that is relevant to and exchanged within television production processes. BMF is used as a central exchange format, i.e. the source metadata instance is first converted to BMF.
In a second step, the generated BMF instance is translated to the desired target format. This allows the exchange of metadata between different systems in a more cost-efficient way."
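The two-step translation described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the field names, the dictionary-based records and the three functions are hypothetical, and neither the real BMF class model nor the MPEG-7 schema looks like this.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Translator = Callable[[Record], Record]


def legacy_to_hub(src: Record) -> Record:
    """Step 1: map a source-specific record onto the central (hub) model.

    The source field names are invented for illustration.
    """
    return {
        "title": src["Sendetitel"],
        "broadcast_date": src["Sendedatum"],
    }


def hub_to_target(hub: Record) -> Record:
    """Step 2: map the hub record onto the target format.

    The target field names are a simplified stand-in, not real MPEG-7.
    """
    return {
        "Title": hub["title"],
        "CreationDate": hub["broadcast_date"],
    }


def translate(src: Record, to_hub: Translator, from_hub: Translator) -> Record:
    """Run both steps: source -> hub model -> target format."""
    return from_hub(to_hub(src))


source_record = {"Sendetitel": "Example Programme", "Sendedatum": "2012-05-01"}
result = translate(source_record, legacy_to_hub, hub_to_target)
print(result)
```

Because the target side only ever sees the hub record, it never needs to know which legacy system the data came from.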
How flexible is the metadata translator if someone wanted to apply it to other metadata systems?
IRT: "Because BMF serves as the central data model, the effort required to connect additional systems to those already integrated is limited. For example, if a new source format A is to be connected to the target formats X, Y and Z, the only thing needed is the translation between format A and BMF, because the connections between BMF and the target formats already exist."
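The example with formats A, X, Y and Z can be sketched as a small registry built around a hub model. This is a hedged illustration, not IRT's implementation: the `HubRegistry` class, the `make_target` helper and all field names are assumptions made for this sketch.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Translator = Callable[[Record], Record]


class HubRegistry:
    """Hypothetical registry of translators to and from a central hub model."""

    def __init__(self) -> None:
        self.to_hub: Dict[str, Translator] = {}
        self.from_hub: Dict[str, Translator] = {}

    def add_source(self, name: str, fn: Translator) -> None:
        self.to_hub[name] = fn

    def add_target(self, name: str, fn: Translator) -> None:
        self.from_hub[name] = fn

    def translate(self, record: Record, source: str, target: str) -> Record:
        # Always go through the hub: source -> hub -> target.
        return self.from_hub[target](self.to_hub[source](record))

    def translator_count(self) -> int:
        # Number of translators that had to be written.
        return len(self.to_hub) + len(self.from_hub)

    def reachable_pairs(self) -> int:
        # Number of source/target combinations those translators cover.
        return len(self.to_hub) * len(self.from_hub)


def make_target(prefix: str) -> Translator:
    """Build a toy target translator that prefixes every hub field name."""
    return lambda rec: {f"{prefix}:{key}": value for key, value in rec.items()}


registry = HubRegistry()
for target_name in ("X", "Y", "Z"):  # connections that already exist
    registry.add_target(target_name, make_target(target_name))

# Connecting the new source format A takes exactly one new translator.
registry.add_source("A", lambda rec: {"title": rec["a_title"]})

converted = registry.translate({"a_title": "Example"}, "A", "Y")
print(converted)
print(registry.translator_count(), registry.reachable_pairs())
```

More generally, with n source formats and m target formats, a hub needs n + m translators, while direct pairwise conversion would need n × m.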
"The big challenge is to achieve the complete transition from analog content to an all-digital processing and archiving”
Looking into the future: What are the challenges for digital media workflows in your opinion?
IRT: "One big challenge is to achieve the complete transition from analog to digital content handling and processing. Unstructured metadata has to be converted into digital data sets and integrated into existing workflows. There are lots of different file and metadata formats in one workflow.
To handle these formats within one production process, the best solution would be to use a single file format. But this is not realistic, because the market has established many different formats. An exchange format could therefore help translate one format into another cost-efficiently, with the option of adding further formats easily.
To help manage the ever-increasing workload of documentalists, CONTENTUS developed several technical approaches, e.g. in the areas of speech-to-text, face detection and video OCR. These technologies need to be developed further and adapted to the specific requirements of archives and broadcasters, but a big step forward has been made.
"Material that is not described properly will be lost."
Last but not least, staff must be trained in, and made aware of the importance of, describing material correctly in the metadata. This is very important for retrieval. Material that is not described properly will be lost. Adding new functionality (e.g. semantic linking, automatic feature extraction) to search and retrieval could be a good way to enhance search in broadcasting and media houses.
However, enhancing existing individual workflows (i.e. integrating new features) is obviously not straightforward, and for many media houses it will be a big challenge to adapt their workflows to their future needs."
Some background about the project and researchers involved:
CONTENTUS is a joint research project in the German Theseus program. Partners include:
• Deutsche Nationalbibliothek (DNB)
• Deutsche Thomson OHG (DTO)
• Institut für Rundfunktechnik GmbH (IRT)
• Mufin GmbH
• Fraunhofer HHI
• Hasso Plattner Institut
Background
The "Institut für Rundfunktechnik" (IRT, in English: Broadcast Technology Institute) is located in Munich, Germany, and serves as the primary research center for public broadcasting organizations in Germany, Austria and Switzerland.
Team working on CONTENTUS at IRT:
Christian Fey, Birgit Schmidt and Ronald Mies.