The Metadata 2020 Project Thread

Metadata 2020 has recently initiated six projects that form a unified framework supporting the metadata improvement goals and aspirations of the project:

  1. Researcher Communication - Exploring ways to align efforts between communities that aim to increase the impact and consistency of communication with researchers about metadata.

  2. Metadata Recommendation and Evaluation Mappings - To converge communities and publishers towards a shared set of recommended metadata concepts with related mappings between those recommended concepts and elements in important dialects.

  3. Defining the Terms We Use About Metadata - In order to communicate effectively about anything, a common language must be acknowledged, tacitly or purposefully. In the metadata space, there is no agreement on what words like ‘property’, ‘term’, ‘concept’, ‘schema’, or ‘title’ refer to. This project will develop a glossary of words associated with metadata, both for core concepts and disciplinary areas.

  4. Incentives for Improving Metadata Quality - To highlight downstream applications and the value of metadata for all parts of the community, telling real stories as evidence of how better metadata will meet their goals.

  5. Shared Best Practices and Principles - To build a set of high level best practices for using metadata across the scholarly communication cycle, in order to facilitate interoperability and easier exchange of information and data across the stakeholders in the process.

  6. Metadata Evaluation and Guidance - To identify and compare existing metadata evaluation tools and mechanisms for connecting the results of those evaluations to clear, cross-community guidance.

A schematic diagram of the framework that connects these projects is shown in Figure 1.

Figure 1 - Threads

Real metadata improvements depend critically on identifying meaningful motivations for the communities that contribute to and benefit from those improvements. Identifying and elucidating these motivators is the goal of Project 4, which underlies the entire Metadata 2020 effort.

Initial assessments of these motivating factors across Metadata 2020 emerged from a series of community meetings held during the last year. They included:

  • Increased/improved/optimised discoverability and utility of research products (papers, datasets, software, etc.)

  • Efficient scholarly communications for all stakeholder communities

  • Interoperability across multiple disciplines, standards and systems

  • Innovation and development across sectors

  • Improved research integrity through the use of persistent identifiers to disambiguate authors/institutions/funders

  • Define and understand the value propositions of quality metadata

  • Enhance opportunities for collaboration with other stakeholders

The communities convened during these initial meetings (researchers, publishers, librarians, data publishers and repositories, services platforms and tools, and funders) included disciplines and sectors spanning the academic publishing universe. These communities are shown in the upper left corner of Figure 1. They create metadata recommendations that reflect the use cases and motivators that are important to them. Those recommendations can be conceptual, or they can be associated with a particular metadata representation or dialect.

The metadata recommendations reflect community values and needs and, therefore, can be important indicators of the motivators being identified in Project 4. They also play a critical role in identifying commonalities between those motivators. Typically, the recommendations reflect terminology used in the community that creates the recommendation and these vocabularies vary. The first goal of Project 2 is to identify concepts that are shared across recommendations and create mappings that show those connections. The fidelity of these connections can vary, so a language like SKOS can be helpful in describing them. In some cases, like the one described below, the connections are more straightforward.
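As a rough illustration of how mapping fidelity might be recorded, the sketch below stores concept connections as SKOS-style triples and groups them by mapping property. The property names (`skos:exactMatch`, `skos:closeMatch`, `skos:narrowMatch`) come from SKOS itself, but the choice of property for each pair, the `FORCE11:` and `MetaDIG:` prefixes, and the function name are assumptions made for this example, not part of any published mapping.

```python
from collections import defaultdict

# SKOS mapping properties are real; which property fits each concept pair
# below is an assumption made for illustration, not a published mapping.
CONCEPT_MAPPINGS = [
    ("FORCE11:Software Name", "skos:closeMatch", "MetaDIG:Resource Title"),
    ("FORCE11:Version Number", "skos:exactMatch", "MetaDIG:Resource Version"),
    # narrowMatch: the target is treated as narrower than the source concept.
    ("FORCE11:Keywords", "skos:narrowMatch", "MetaDIG:Theme Keyword"),
]


def group_by_fidelity(mappings):
    """Group (source, target) pairs by the SKOS property that links them."""
    grouped = defaultdict(list)
    for source, relation, target in mappings:
        grouped[relation].append((source, target))
    return dict(grouped)


if __name__ == "__main__":
    for relation, pairs in group_by_fidelity(CONCEPT_MAPPINGS).items():
        print(relation, pairs)
```

Grouping by property makes it easy to separate connections that are safe to use in automated translation (exact matches) from those that need human review.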

Many community recommendations are closely associated with metadata dialects or specific representations like XML, RDF, or JSON and they include a mapping from the recommended concepts to specific elements in these representations. When identical (or similar) concepts are connected across recommendations, the related elements can be connected in order to support translations from one dialect to another, typically done programmatically, e.g. using XSLT. These element mappings are the second part of Project 2.
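The text mentions XSLT as the usual tool for such translations. As a simplified stand-in, the Python sketch below moves a single concept (the resource title) from a DataCite record into a JATS-like skeleton using the standard library. The sample record and the function name are hypothetical, and a real crosswalk would cover many more elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the DataCite kernel-4 schema family.
DCITE_NS = {"dcite": "http://datacite.org/schema/kernel-4"}

# Hypothetical, heavily trimmed DataCite record used only for this sketch.
SAMPLE_DATACITE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <titles><title>Example Software</title></titles>
</resource>"""


def datacite_title_to_jats(datacite_xml):
    """Copy the DataCite title into a minimal JATS-style article skeleton."""
    source = ET.fromstring(datacite_xml)
    # The root element is dcite:resource, so the path is relative to it.
    title = source.findtext("dcite:titles/dcite:title", default="", namespaces=DCITE_NS)

    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    group = ET.SubElement(meta, "title-group")
    ET.SubElement(group, "article-title").text = title
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(datacite_title_to_jats(SAMPLE_DATACITE))
```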

Once the set of elements associated with a particular recommendation and dialect are known, collections of metadata records written in that dialect can be evaluated in terms of that recommendation. Tools and techniques for implementing these evaluations, typically focusing on different aspects of the metadata (completeness, consistency, or quality), are the purview of Project 6.
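A completeness evaluation of this kind can be sketched in a few lines. The example below assumes a tiny recommendation of three concepts expressed as DataCite paths and reports, for each concept, the fraction of records that contain a non-empty value. The sample records, the `RECOMMENDATION` dictionary, and the function name are illustrative assumptions and do not describe any particular Project 6 tool.

```python
import xml.etree.ElementTree as ET

NS = {"dcite": "http://datacite.org/schema/kernel-4"}

# Hypothetical recommendation: concept name -> path relative to dcite:resource.
RECOMMENDATION = {
    "Resource Title": "dcite:titles/dcite:title",
    "Resource Version": "dcite:version",
    "Publisher": "dcite:publisher",
}

# Two made-up records: the first is complete, the second has only a title.
RECORDS = [
    """<resource xmlns="http://datacite.org/schema/kernel-4">
         <titles><title>Tool A</title></titles>
         <version>1.2</version>
         <publisher>Example Repository</publisher>
       </resource>""",
    """<resource xmlns="http://datacite.org/schema/kernel-4">
         <titles><title>Tool B</title></titles>
       </resource>""",
]


def completeness(records, recommendation=RECOMMENDATION):
    """Return, per concept, the fraction of records with a non-empty element."""
    counts = {concept: 0 for concept in recommendation}
    for xml_text in records:
        root = ET.fromstring(xml_text)
        for concept, path in recommendation.items():
            element = root.find(path, NS)
            if element is not None and (element.text or "").strip():
                counts[concept] += 1
    total = len(records)
    return {c: (n / total if total else 0.0) for c, n in counts.items()}


if __name__ == "__main__":
    print(completeness(RECORDS))
```

Consistency and quality checks would need additional rules (for example, controlled vocabularies or date formats); this sketch covers presence only.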

Evaluation results are only helpful to metadata providers if they are connected to clear guidance around shared best practices and principles. This is the connection between Projects 5 and 6. Metadata collections are evaluated in terms of community recommendations and the results are connected to guidance that describes best practices for addressing the recommendations. Those best practices are then communicated back to researchers and metadata providers (Project 1) in the context of the incentives (Project 4) using a consistent vocabulary identified and described in Project 3 (the second framework-spanning project).

A Real-World Example

A real-world example can help clarify these projects and the connections between them. Start with an incentive (P4): there is a growing need to ensure that researchers get credit for developing software that is used in their own research or, even more important, shared with others. Metadata about software can facilitate discovery and citation of this software and is, therefore, an important part of the solution to this problem. A working group representing a number of communities made a recommendation for content that should be included in metadata about software (FORCE11). This recommendation was completely conceptual – the authors did not connect it to any specific implementation.

The concepts included in the FORCE11 recommendation are listed in column 1 of Table 1. A Metadata Improvement and Guidance Project funded by NSF (MetaDIG) created a set of concepts from multiple recommendations and the FORCE11 concepts were mapped to those names (column 2 in Table 1). Concept descriptions from the MetaDIG project are given in column 3 of Table 1. Note that these names are intentionally general, i.e. they refer to concepts used to document many kinds of resources (complete list of MetaDIG concepts). In this case, the concepts are simple, and the mappings are straightforward. Some important supplementary concepts were added to the FORCE11 list from MetaDIG.

| FORCE11 Concept | MetaDIG Concept | Description |
| --- | --- | --- |
| Unique Identifier | Resource Identifier | Identifier for the resource described by the metadata |
| | Resource Identifier Type | The type of identifier used to uniquely identify the resource. |
| Software Name | Resource Title | A short description of the resource. The title should be descriptive enough so that when a user is presented with a list of titles the general content of the data set can be determined. |
| Author(s) | Author / Originator | The principal author of the resource |
| | Contributor Name | Contributor to the resource |
| Contributor Role | Contributor Role | The role of any individuals or institutions that contributed to the creation of the data. |
| Version Number | Resource Version | Version of the cited resource |
| Release Date | Publication Date | Date of publication of the cited resource |
| Location/Repository | Publisher | Publisher of the cited resource |
| Software License | Rights | Information about rights held in and over the resource |
| Description | Abstract | A paragraph describing the resource. |
| Keywords | Theme Keyword | A word or phrase that describes some (typically high-level) aspect of a resource. |
| Keywords | Keyword | A word or phrase that describes some aspect of a resource. Can be one of several types. |
| | Keyword Vocabulary | If you are following a guideline or using a shared vocabulary for the words/phrases in your “keywords” attribute, put the name of that guideline here. |


Table 1 illustrates the first part of P2 - metadata recommendation mapping. The second part of P2 requires a mapping of the FORCE11 concepts to an implementation(s). The concepts were mapped to [Version 4.1 of the DataCite metadata schema](https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf) by DataCite, creating an implementation of the FORCE11 recommendation in the DataCite dialect.

The second part of P2 (element mappings) can now be accomplished by identifying or defining 1) mappings of the same concepts to other dialects or 2) mappings or crosswalks of the DataCite metadata elements to other dialects. The first approach was taken in the MetaDIG project and the current mappings to several other dialects are shown in Table 2, which lists the concepts and descriptions along with XPath expressions for the elements that represent the concepts in six dialects from several disciplines. These mappings could be used to specify transforms for these concepts between any of these six dialects.
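One way to picture how concept-level mappings yield dialect-to-dialect crosswalks is to pair up the paths that two dialects provide for the same concept. The sketch below does this for three concepts whose paths are copied from Table 2; the dictionary layout and the `crosswalk` function are assumptions introduced for illustration.

```python
# Paths copied from a few rows of Table 2; the structure is an assumption.
CONCEPT_PATHS = {
    "Resource Title": {
        "DCITE": "/dcite:resource/dcite:titles/dcite:title",
        "EML": "/eml:eml//title",
        "JATS": "/article/front/article-meta/title-group/article-title",
    },
    "Publisher": {
        "DCITE": "/dcite:resource/dcite:publisher",
        "EML": "/eml:eml//publisher",
        "JATS": "/article/front/journal-meta/publisher/publisher-name",
    },
    "Resource Version": {
        "DCITE": "/dcite:resource/dcite:version",
        "JATS": "/article/back/ref-list/ref/element-citation|mixed-citation/version",
    },
}


def crosswalk(source, target, concept_paths=CONCEPT_PATHS):
    """Pair source and target paths for every concept both dialects support."""
    return {
        concept: (paths[source], paths[target])
        for concept, paths in concept_paths.items()
        if source in paths and target in paths
    }


if __name__ == "__main__":
    for concept, (src, dst) in crosswalk("DCITE", "EML").items():
        print(f"{concept}: {src} -> {dst}")
```

Concepts missing from either dialect (here, Resource Version has no EML path in the table) simply drop out of the crosswalk, which also shows where a translation would lose information.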

It is interesting to note that these mappings allow us to infer interest in the FORCE11 Software Guidelines across the diverse communities and use cases that motivated the creation of these dialects. Six of the fourteen concepts (43%) are included in all six dialects while five concepts (36%) are included in five dialects, three concepts (21%) in four dialects, and one concept (7%) in only one dialect. These differences may reflect real differences between communities or they may indicate that a more careful look at the mappings is required (certainly a possibility). In any case, these differences also indicate potential topics for the inter-community comparisons that the Metadata 2020 Project is taking on.
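Tallies of this kind can be derived mechanically from the mappings. The sketch below counts, for a handful of concepts, how many dialects provide at least one path and then builds a histogram of those counts. Only three rows of Table 2 are included, so the output is not meant to reproduce the percentages quoted above; the variable and function names are illustrative assumptions.

```python
from collections import Counter

# Dialects with at least one path per concept, taken from three Table 2 rows.
DIALECTS_BY_CONCEPT = {
    "Resource Identifier": ["DCAT", "DCITE", "EML", "HCLS", "ISO-1", "JATS"],
    "Resource Identifier Type": ["DCITE", "EML", "JATS"],
    "Resource Version": ["DCITE", "HCLS", "ISO-1", "JATS"],
}


def coverage_histogram(dialects_by_concept):
    """Map 'number of dialects covering a concept' -> 'number of concepts'."""
    return Counter(len(set(dialects)) for dialects in dialects_by_concept.values())


if __name__ == "__main__":
    histogram = coverage_histogram(DIALECTS_BY_CONCEPT)
    total = len(DIALECTS_BY_CONCEPT)
    for n_dialects in sorted(histogram, reverse=True):
        share = 100 * histogram[n_dialects] / total
        print(f"{histogram[n_dialects]} concept(s) in {n_dialects} dialects ({share:.0f}%)")
```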

Conclusions

We have presented a unifying framework for six metadata improvement projects recently initiated by the Metadata 2020 Project. The framework casts mappings between recommended metadata concepts and elements in terms of incentives for metadata improvements expressed in a common vocabulary and shared with metadata creators and users. The project already includes amazing expertise from the entire metadata universe. Please join us if you can contribute experience and ideas for moving forward.

FORCE11 Software Citation Guidelines

FORCE11 created software citation principles during 2016. These were mapped to DataCite in Appendix 5 of the DataCite 4.1 Schema description.

| Concept | Description | Dialect | Path |
| --- | --- | --- | --- |
| Resource Identifier | Identifier for the resource described by the metadata | DCAT | `/dct:identifier` |
| | | DCITE | `/dcite:resource/dcite:identifier[identifierType="DOI"] \| /dcite:resource/dcite:alternateIdentifiers/dcite:alternateIdentifier` |
| | | EML | `/eml:eml/@packageId` |
| | | HCLS | `idot:preferredPrefix` |
| | | HCLS | `idot:alternatePrefix` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:identifier/mcc:MD_Identifier/mcc:code//` |
| | | JATS | `/article/front/article-meta/article-id` |
| | | JATS | `/article/back/ref-list/ref/element-citation/pub-id` |
| Resource Identifier Type | The type of identifier used to uniquely identify the resource. | DCITE | `/dcite:resource/dcite:identifier/@identifierType` |
| | | EML | `/eml:eml/@system` |
| | | JATS | `/article/front/article-meta/article-id/@pub-id-type` |
| | | JATS | `/article/back/ref-list/ref/element-citation/pub-id/@pub-id-type` |
| Resource Title | A short description of the resource. The title should be descriptive enough so that when a user is presented with a list of titles the general content of the data set can be determined. | DCAT | `/dct:title` |
| | | DCITE | `/dcite:resource/dcite:titles/dcite:title` |
| | | EML | `/eml:eml//title` |
| | | HCLS | `dct:title` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:title//*` |
| | | JATS | `/article/front/article-meta/title-group/article-title` |
| | | JATS | `/article/back/ref-list/ref/element-citation\|mixed-citation/data-title` |
| Author / Originator | The principal author of the resource | DCITE | `/dcite:resource/dcite:creators/dcite:creator/*` |
| | | EML | `/eml:eml//creator` |
| | | HCLS | `dct:creator` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[normalize-space(cit:role/cit:CI_RoleCode)='author']` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[normalize-space(cit:role/cit:CI_RoleCode)='originator']` |
| | | JATS | `/article/front/article-meta/contrib-group/contrib[@contrib-type="author"]` |
| | | JATS | `/article/back/ref-list/ref/element-citation/person-group[@person-group-type='author']/` |
| Contributor Name | Contributor to the resource | DCITE | `/dcite:resource/dcite:contributors/dcite:contributor/dcite:contributorName` |
| | | EML | `/eml:eml//associatedParty` |
| | | HCLS | `dct:contributor` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[not(normalize-space(cit:role/cit:CI_RoleCode)[.='author' or .='principalInvestigator' or .='originator'])]/[contains(name(),'Name')]` |
| | | JATS | `/article/front/article-meta/contrib-group/contrib[@contrib-type!="author" and @contrib-type!="editor"]` |
| | | JATS | `/article/back/ref-list/ref/element-citation\|mixed-citation/person-group[@person-group-type!="author"]/name/` |
| Contributor Role | The role of any individuals or institutions that contributed to the creation of the data. | DCITE | `/dcite:resource/dcite:contributors/dcite:contributor/@contributorType` |
| | | EML | `/eml:eml///role` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[not(normalize-space(cit:role/cit:CI_RoleCode)[.='author' or .='principalInvestigator' or .='originator'])]/cit:role/cit:CI_RoleCode` |
| | | JATS | `//contrib/@contrib-type` |
| | | JATS | `/article/back/ref-list/ref/element-citation\|mixed-citation/person-group/@person-group-type` |
| Resource Version | Version of the cited resource | DCITE | `/dcite:resource/dcite:version` |
| | | HCLS | `pav:version` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo/mri:MD_DataIdentification/mri:citation/cit:CI_Citation/cit:edition//*` |
| | | JATS | `/article/back/ref-list/ref/element-citation\|mixed-citation/version` |
| Publication Date | Date of publication of the cited resource | DCAT | `/dct:issued` |
| | | DCITE | `/dcite:resource/dcite:publicationYear` |
| | | EML | `/eml:eml//pubDate` |
| | | HCLS | `/dct:issued` |
| | | ISO-1 | `//cit:CI_Citation/cit:date/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode='publication']/cit:date/gco:DateTime` |
| | | JATS | `/article/front/article-meta/pub-date[@date-type='original-publication' or @date-type='update']/` |
| | | JATS | `/article/back/ref-list/ref/element-citation\|mixed-citation/year` |
| Publisher | Publisher of the cited resource | DCAT | `/dct:publisher` |
| | | DCITE | `/dcite:resource/dcite:publisher` |
| | | EML | `/eml:eml//publisher` |
| | | HCLS | `dct:publisher` |
| | | ISO-1 | `//cit:CI_Responsibility[normalize-space(cit:role/cit:CI_RoleCode)='publisher']/cit:party/cit:CI_Organisation/cit:name//` |
| | | JATS | `/article/front/journal-meta/publisher/publisher-name` |
| | | JATS | `/article/back/ref-list/ref/element-citation/publisher-name` |
| Rights | Information about rights held in and over the resource | DCAT | `dct:license` |
| | | DCAT | `dct:rights` |
| | | DCITE | `/dcite:resource/dcite:rightsList/dcite:rights` |
| | | EML | `/eml:eml//intellectualRights` |
| | | HCLS | `dct:license` |
| | | HCLS | `dct:rights` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:resourceConstraints/mco:MD_LegalConstraints` |
| | | JATS | `/article/front/article-meta/permissions/*` |
| Abstract | A paragraph describing the resource. | DCAT | `/dct:description` |
| | | DCITE | `/dcite:resource/dcite:descriptions/dcite:description[@descriptionType='Abstract']/*` |
| | | EML | `/eml:eml//abstract` |
| | | HCLS | `dct:description` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:abstract//*` |
| | | JATS | `/article/front/article-meta/abstract` |
| Theme Keyword | A word or phrase that describes some (typically high-level) aspect of a resource. Note: The general identification keywords usually have a type of “theme” and are referred to as “theme keywords”. Other types and vocabularies are used for other information. | DCAT | `/dct:keyword` |
| | | DCITE | `/dcite:resource/dcite:subjects/dcite:subject` |
| | | EML | `/eml:eml//keywordSet/keyword[@keywordType='thematic']` |
| | | HCLS | `dcat:keyword` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='theme']/mri:keyword//*` |
| | | JATS | `/article/front/article-meta/kwd-group/kwd` |
| | | JATS | `/article/front/article-meta/article-categories/subj-group/subject` |
| Keyword | A word or phrase that describes some aspect of a resource. Can be one of several types. | DCAT | `/dct:keyword` |
| | | DCITE | `/dcite:resource/dcite:subjects/dcite:subject` |
| | | EML | `/eml:eml//keywordSet/keyword[not(contains(@keywordType,'place')) and not(contains(@keywordType,'place')) and not(contains(@keywordType,'thematic')) and not(contains(@keywordType,'temporal')) and not(contains(@keywordType,'discipline')) and not(contains(@keywordType,'stratum')) and not(contains(@keywordType,'taxonomic'))]` |
| | | EML | `/eml:eml//keywordSet/keyword[@keywordType='place']` |
| | | EML | `/eml:eml//keywordSet/keyword[@keywordType='taxonomic']` |
| | | EML | `/eml:eml//keywordSet/keyword[@keywordType='thematic']` |
| | | EML | `/eml:eml//keywordSet/keyword[@keywordType='temporal']` |
| | | EML | `/eml:eml//keywordSet/keyword[@keywordType='discipline']` |
| | | EML | `/eml:eml//keywordSet/keyword[@keywordType='stratum']` |
| | | HCLS | `dcat:keyword` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='theme']/mri:keyword//*` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='place']/mri:keyword//` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:extent/gex:EX_Extent/gex:geographicElement/gex:EX_GeographicDescription/gex:geographicIdentifier/mcc:MD_Identifier/mcc:code//` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='instrument']/mri:keyword//` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='platform']/mri:keyword//` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='project']/mri:keyword//` |
| | | JATS | `/article/front/article-meta/kwd-group/kwd` |
| | | JATS | `/article/front/article-meta/article-categories/subj-group/subject` |
| Keyword Vocabulary | If you are following a guideline or using a shared vocabulary for the words/phrases in your “keywords” attribute, put the name of that guideline here. | DCITE | `/dcite:resource/dcite:subjects/dcite:subject/@subjectScheme` |
| | | EML | `/eml:eml//keywordSet/keywordThesaurus` |
| | | HCLS | `void:vocabulary` |
| | | ISO-1 | `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='theme']/mri:thesaurusName/cit:CI_Citation` |


DCAT: Data Catalog Vocabulary, DCITE: DataCite Metadata Schema V4.1, EML: Ecological Metadata Language, HCLS: Dataset Descriptions: HCLS Community Profile, ISO-1: ISO 19115-1, JATS: Journal Article Tag Suite