The Metadata 2020 Project Thread
Metadata 2020 has recently initiated six projects that form a unified framework supporting the metadata improvement goals and aspirations of the project:
- Researcher Communication - Exploring ways to align efforts between communities that aim to increase the impact and consistency of communication with researchers about metadata.
- Metadata Recommendation and Evaluation Mappings - To converge communities and publishers towards a shared set of recommended metadata concepts with related mappings between those recommended concepts and elements in important dialects.
- Defining the Terms We Use About Metadata - In order to communicate effectively about anything, a common language must be acknowledged, tacitly or purposefully. In the metadata space, there is no agreement on what words like ‘property’, ‘term’, ‘concept’, ‘schema’, or ‘title’ refer to. This project will develop a glossary of words associated with metadata, both for core concepts and disciplinary areas.
- Incentives for Improving Metadata Quality - To highlight downstream applications and value of metadata for all parts of the community, telling real stories as evidence of how better metadata will meet their goals.
- Shared Best Practices and Principles - To build a set of high-level best practices for using metadata across the scholarly communication cycle, in order to facilitate interoperability and easier exchange of information and data across the stakeholders in the process.
- Metadata Evaluation and Guidance - To identify and compare existing metadata evaluation tools and mechanisms for connecting the results of those evaluations to clear, cross-community guidance.
A schematic diagram of the framework that connects these projects is shown in Figure 1.
Real metadata improvements depend critically on identifying meaningful motivations for the communities that contribute to and benefit from those improvements. Identifying and elucidating these motivators is the goal of Project 4, which underlies the entire Metadata 2020 effort.
Initial assessments of these motivating factors across Metadata 2020 emerged from a series of community meetings held during the last year. They included:
- Increased/improved/optimised discoverability and utility of research products (papers, datasets, software, etc.)
- Efficient scholarly communications for all stakeholder communities
- Interoperability across multiple disciplines, standards and systems
- Innovation and development across sectors
- Improved research integrity through the use of persistent identifiers to disambiguate authors/institutions/funders
- Define and understand the value propositions of quality metadata
- Enhance opportunities for collaboration with other stakeholders
The communities convened during these initial meetings (researchers, publishers, librarians, data publishers and repositories, services platforms and tools, and funders) included disciplines and sectors spanning the academic publishing universe. These communities are shown in the upper left corner of Figure 1. They create metadata recommendations that reflect the use cases and motivators that are important to them. Those recommendations can be conceptual, or they can be associated with a particular metadata representation or dialect.
The metadata recommendations reflect community values and needs and, therefore, can be important indicators of the motivators being identified in Project 4. They also play a critical role in identifying commonalities between those motivators. Typically, the recommendations reflect terminology used in the community that creates the recommendation and these vocabularies vary. The first goal of Project 2 is to identify concepts that are shared across recommendations and create mappings that show those connections. The fidelity of these connections can vary, so a language like SKOS can be helpful in describing them. In some cases, like the one described below, the connections are more straightforward.
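As a rough illustration of how mappings of varying fidelity might be recorded, the sketch below labels each concept pair with one of the SKOS mapping properties (`exactMatch`, `closeMatch`, `broadMatch`, `narrowMatch`, `relatedMatch`). The concept pairs are taken from Table 1 later in this post; the relation assigned to each pair, the `ConceptMapping` class, and the `shared_concepts` helper are assumptions made for illustration only, not part of any Metadata 2020 deliverable.

```python
from dataclasses import dataclass

# The five SKOS mapping properties (local names only).
SKOS_RELATIONS = ("exactMatch", "closeMatch", "broadMatch",
                  "narrowMatch", "relatedMatch")


@dataclass(frozen=True)
class ConceptMapping:
    """One link between a community concept and a shared concept (hypothetical helper)."""
    source: str    # concept name used in the community recommendation
    target: str    # shared concept name
    relation: str  # one of SKOS_RELATIONS

    def __post_init__(self):
        if self.relation not in SKOS_RELATIONS:
            raise ValueError(f"unknown SKOS mapping relation: {self.relation}")


# Concept pairs come from Table 1; the relation labels are illustrative guesses.
MAPPINGS = [
    ConceptMapping("Unique Identifier", "Resource Identifier", "exactMatch"),
    ConceptMapping("Software Name", "Resource Title", "closeMatch"),
    ConceptMapping("Location/Repository", "Publisher", "relatedMatch"),
]


def shared_concepts(mappings, accepted=("exactMatch", "closeMatch")):
    """Return {source: target} for mappings strong enough to treat as shared."""
    return {m.source: m.target for m in mappings if m.relation in accepted}


if __name__ == "__main__":
    print(shared_concepts(MAPPINGS))
```

Recording the relation explicitly lets a later step decide how strict to be: a translation tool might accept only `exactMatch` and `closeMatch`, while a discovery tool could also follow `relatedMatch` links.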
Many community recommendations are closely associated with metadata dialects or specific representations like XML, RDF, or JSON and they include a mapping from the recommended concepts to specific elements in these representations. When identical (or similar) concepts are connected across recommendations, the related elements can be connected in order to support translations from one dialect to another, typically done programmatically, e.g. using XSLT. These element mappings are the second part of Project 2.
Once the set of elements associated with a particular recommendation and dialect is known, collections of metadata records written in that dialect can be evaluated in terms of that recommendation. Tools and techniques for implementing these evaluations, typically focusing on different aspects of the metadata (completeness, consistency, or quality), are the purview of Project 6.
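A minimal sketch of a completeness evaluation of this kind is shown below, using only the Python standard library. It checks which recommended concepts have content in a single DataCite-style record. The concept-to-path table is a small subset of the DataCite paths in Table 2, rewritten relative to the root element because `xml.etree.ElementTree` supports only a limited XPath subset. The sample record, its DOI, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sample record below (DataCite kernel-4 namespace).
NS = {"dcite": "http://datacite.org/schema/kernel-4"}

# Concept -> path relative to the <resource> root (subset of Table 2).
RECOMMENDATION = {
    "Resource Identifier": "dcite:identifier",
    "Resource Title": "dcite:titles/dcite:title",
    "Author / Originator": "dcite:creators/dcite:creator",
    "Resource Version": "dcite:version",
}

# Hypothetical record, invented for this example.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.1234/example</identifier>
  <titles><title>Example Software</title></titles>
  <creators><creator><creatorName>Doe, Jane</creatorName></creator></creators>
</resource>"""


def has_content(element):
    """True if the element carries non-blank text or has child elements."""
    return bool((element.text or "").strip()) or len(element) > 0


def completeness(record_xml, recommendation=RECOMMENDATION):
    """Return {concept: bool} saying whether each concept is present."""
    root = ET.fromstring(record_xml)
    return {
        concept: any(has_content(el) for el in root.findall(path, NS))
        for concept, path in recommendation.items()
    }


def score(found):
    """Fraction of recommended concepts present in the record."""
    return sum(found.values()) / len(found)


if __name__ == "__main__":
    result = completeness(SAMPLE)
    print(result)
    print(score(result))  # 0.75 for the sample: the version is missing
```

Running the same check over every record in a collection and averaging the per-concept results would give a collection-level completeness picture; consistency and quality checks would need additional rules beyond simple presence.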
Evaluation results are only helpful to metadata providers if they are connected to clear guidance around shared best practices and principles. This is the connection between Projects 5 and 6. Metadata collections are evaluated in terms of community recommendations and the results are connected to guidance that describes best practices for addressing the recommendations. Those best practices are then communicated back to researchers and metadata providers (Project 5) in the context of the incentives (Project 4) using a consistent vocabulary identified and described in Project 3 (the second framework-spanning project).
A Real-World Example
A real-world example can help clarify these projects and the connections between them. Start with an incentive (P4): there is a growing need to ensure that researchers get credit for developing software that is used in their own research or, even more important, shared with others. Metadata about software can facilitate discovery and citation of this software and is, therefore, an important part of the solution to this problem. A working group representing a number of communities made a recommendation for content that should be included in metadata about software (FORCE11). This recommendation was completely conceptual – the authors did not connect it to any specific implementation.
The concepts included in the FORCE11 recommendation are listed in column 1 of Table 1. A Metadata Improvement and Guidance Project funded by NSF (MetaDIG) created a set of concepts from multiple recommendations and the FORCE11 concepts were mapped to those names (column 2 in Table 1). Concept descriptions from the MetaDIG project are given in column 3 of Table 1. Note that these names are intentionally general, i.e. they refer to concepts used to document many kinds of resources (complete list of MetaDIG concepts). In this case, the concepts are simple, and the mappings are straightforward. Some important supplementary concepts were added to the FORCE11 list from MetaDIG.
| FORCE11 Concept | MetaDIG Concept | Description |
|---|---|---|
| Unique Identifier | Resource Identifier | Identifier for the resource described by the metadata |
| | Resource Identifier Type | The type of identifier used to uniquely identify the resource. |
| Software Name | Resource Title | A short description of the resource. The title should be descriptive enough so that when a user is presented with a list of titles the general content of the data set can be determined. |
| Author(s) | Author / Originator | The principal author of the resource |
| | Contributor Name | Contributor to the resource |
| Contributor Role | Contributor Role | The role of any individuals or institutions that contributed to the creation of the data. |
| Version Number | Resource Version | Version of the cited resource |
| Release Date | Publication Date | Date of publication of the cited resource |
| Location/Repository | Publisher | Publisher of the cited resource |
| Software License | Rights | Information about rights held in and over the resource |
| Description | Abstract | A paragraph describing the resource. |
| Keywords | Theme Keyword | A word or phrase that describes some (typically high-level) aspect of a resource. |
| Keywords | Keyword | A word or phrase that describes some aspect of a resource. Can be one of several types. |
| | Keyword Vocabulary | If you are following a guideline or using a shared vocabulary for the words/phrases in your “keywords” attribute, put the name of that guideline here. |
Table 1 illustrates the first part of P2 - metadata recommendation mapping. The second part of P2 requires a mapping of the FORCE11 concepts to one or more implementations. The concepts were mapped to [Version 4.1 of the DataCite metadata schema](https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf) by DataCite, creating an implementation of the FORCE11 recommendation in the DataCite dialect.
The second part of P2 (element mappings) can now be accomplished by identifying or defining 1) mappings of the same concepts to other dialects or 2) mappings or crosswalks of the DataCite metadata elements to other dialects. The first approach was taken in the MetaDIG project and the current mappings to several other dialects are shown in Table 2, which lists the concepts and descriptions along with XPaths to the elements that represent the concepts in six dialects from several disciplines. These mappings could be used to specify transforms for these concepts between any of these six dialects.
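To make the idea of a concept-driven transform concrete, here is a small sketch that uses shared concepts as the pivot between two dialects. It reads values from a DataCite-style record and writes them into a JATS-like skeleton. The path table is a two-dialect subset of Table 2, rewritten relative to each root element; the sample record and all function names are hypothetical. The output is only a fragment and is not claimed to be valid JATS. A production crosswalk would more likely be written in XSLT, as noted earlier.

```python
import xml.etree.ElementTree as ET

NS = {"dcite": "http://datacite.org/schema/kernel-4"}

# Dialect -> concept -> path relative to that dialect's root element.
PATHS = {
    "DCITE": {
        "Resource Title": "dcite:titles/dcite:title",
        "Publisher": "dcite:publisher",
        "Resource Version": "dcite:version",
    },
    "JATS": {
        "Resource Title": "front/article-meta/title-group/article-title",
        "Publisher": "front/journal-meta/publisher/publisher-name",
    },
}

# Hypothetical source record, invented for this example.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <titles><title>Example Software</title></titles>
  <publisher>Example Repository</publisher>
  <version>1.0</version>
</resource>"""


def extract_concepts(xml_text, dialect):
    """Return {concept: [text values]} for concepts found in the record."""
    root = ET.fromstring(xml_text)
    values = {}
    for concept, path in PATHS[dialect].items():
        texts = [el.text.strip() for el in root.findall(path, NS)
                 if el.text and el.text.strip()]
        if texts:
            values[concept] = texts
    return values


def translatable(source, target):
    """Concepts that have a path in both dialects."""
    return sorted(set(PATHS[source]) & set(PATHS[target]))


def build(root_tag, path_values):
    """Create elements along simple slash paths (no prefixes or predicates)."""
    root = ET.Element(root_tag)
    for path, texts in path_values:
        steps = path.split("/")
        for text in texts:
            parent = root
            for step in steps[:-1]:
                child = parent.find(step)
                if child is None:
                    child = ET.SubElement(parent, step)
                parent = child
            ET.SubElement(parent, steps[-1]).text = text
    return root


def crosswalk(xml_text, source, target, target_root):
    """Carry every concept mapped in both dialects from source to target."""
    values = extract_concepts(xml_text, source)
    pairs = [(PATHS[target][c], v) for c, v in values.items()
             if c in PATHS[target]]
    return build(target_root, pairs)


if __name__ == "__main__":
    fragment = crosswalk(SAMPLE, "DCITE", "JATS", "article")
    print(ET.tostring(fragment, encoding="unicode"))
```

Concepts without a path in the target dialect (here, the version) are simply dropped, which is one reason the coverage differences between dialects matter for any real translation.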
It is interesting to note that these mappings allow us to infer interest in the FORCE11 Software Guidelines across the diverse communities and use cases that motivated the creation of these dialects. Six of the fourteen concepts (43%) are included in all six dialects while five concepts (36%) are included in five dialects, three concepts (21%) in four dialects, and one concept (7%) in only one dialect. These differences may reflect real differences between communities or they may indicate that a more careful look at the mappings is required (certainly a possibility). In any case, these differences also indicate potential topics for the inter-community comparisons that the Metadata 2020 Project is taking on.
Conclusions
We have presented a unifying framework for six metadata improvement projects recently initiated by the Metadata 2020 Project. The framework casts mappings between recommended metadata concepts and elements in terms of incentives for metadata improvements expressed in a common vocabulary and shared with metadata creators and users. The project already includes amazing expertise from the entire metadata universe. Please join us if you can contribute experience and ideas for moving forward.
FORCE11 Software Citation Guidelines
FORCE11 created software citation principles during 2016. These were mapped to DataCite in Appendix 5 of the DataCite 4.1 Schema description.
| Concept | Description | Dialect Paths |
|---|---|---|
| Resource Identifier | Identifier for the resource described by the metadata | **DCAT** `/dct:identifier`<br>**DCITE** `/dcite:resource/dcite:identifier[@identifierType="DOI"] \| /dcite:resource/dcite:alternateIdentifiers/dcite:alternateIdentifier`<br>**EML** `/eml:eml/@packageId`<br>**HCLS** `idot:preferredPrefix`<br>**HCLS** `idot:alternatePrefix`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:identifier/mcc:MD_Identifier/mcc:code//`<br>**JATS** `/article/front/article-meta/article-id`<br>**JATS** `/article/back/ref-list/ref/element-citation/pub-id` |
| Resource Identifier Type | The type of identifier used to uniquely identify the resource. | **DCITE** `/dcite:resource/dcite:identifier/@identifierType`<br>**EML** `/eml:eml/@system`<br>**JATS** `/article/front/article-meta/article-id/@pub-id-type`<br>**JATS** `/article/back/ref-list/ref/element-citation/pub-id/@pub-id-type` |
| Resource Title | A short description of the resource. The title should be descriptive enough so that when a user is presented with a list of titles the general content of the data set can be determined. | **DCAT** `/dct:title`<br>**DCITE** `/dcite:resource/dcite:titles/dcite:title`<br>**EML** `/eml:eml//title`<br>**HCLS** `dct:title`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:title//*`<br>**JATS** `/article/front/article-meta/title-group/article-title`<br>**JATS** `/article/back/ref-list/ref/element-citation\|mixed-citation/data-title` |
| Author / Originator | The principal author of the resource | **DCITE** `/dcite:resource/dcite:creators/dcite:creator/*`<br>**EML** `/eml:eml//creator`<br>**HCLS** `dct:creator`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[normalize-space(cit:role/cit:CI_RoleCode)='author']`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[normalize-space(cit:role/cit:CI_RoleCode)='originator']`<br>**JATS** `/article/front/article-meta/contrib-group/contrib[@contrib-type="author"]`<br>**JATS** `/article/back/ref-list/ref/element-citation/person-group[@person-group-type='author']/` |
| Contributor Name | Contributor to the resource | **DCITE** `/dcite:resource/dcite:contributors/dcite:contributor/dcite:contributorName`<br>**EML** `/eml:eml//associatedParty`<br>**HCLS** `dct:contributor`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[not(normalize-space(cit:role/cit:CI_RoleCode)[.='author' or .='principalInvestigator' or .='originator'])]/[contains(name(),'Name')]`<br>**JATS** `/article/front/article-meta/contrib-group/contrib[@contrib-type!="author" and @contrib-type!="editor"]`<br>**JATS** `/article/back/ref-list/ref/element-citation\|mixed-citation/person-group[@person-group-type!="author"]/name/` |
| Contributor Role | The role of any individuals or institutions that contributed to the creation of the data. | **DCITE** `/dcite:resource/dcite:contributors/dcite:contributor/@contributorType`<br>**EML** `/eml:eml///role`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[not(normalize-space(cit:role/cit:CI_RoleCode)[.='author' or .='principalInvestigator' or .='originator'])]/cit:role/cit:CI_RoleCode`<br>**JATS** `//contrib/@contrib-type`<br>**JATS** `/article/back/ref-list/ref/element-citation\|mixed-citation/person-group/@person-group-type` |
| Resource Version | Version of the cited resource | **DCITE** `/dcite:resource/dcite:version`<br>**HCLS** `pav:version`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo/mri:MD_DataIdentification/mri:citation/cit:CI_Citation/cit:edition//*`<br>**JATS** `/article/back/ref-list/ref/element-citation\|mixed-citation/version` |
| Publication Date | Date of publication of the cited resource | **DCAT** `/dct:issued`<br>**DCITE** `/dcite:resource/dcite:publicationYear`<br>**EML** `/eml:eml//pubDate`<br>**HCLS** `/dct:issued`<br>**ISO-1** `//cit:CI_Citation/cit:date/cit:CI_Date[cit:dateType/cit:CI_DateTypeCode='publication']/cit:date/gco:DateTime`<br>**JATS** `/article/front/article-meta/pub-date[@date-type='original-publication' or @date-type='update']/`<br>**JATS** `/article/back/ref-list/ref/element-citation\|mixed-citation/year` |
| Publisher | Publisher of the cited resource | **DCAT** `/dct:publisher`<br>**DCITE** `/dcite:resource/dcite:publisher`<br>**EML** `/eml:eml//publisher`<br>**HCLS** `dct:publisher`<br>**ISO-1** `//cit:CI_Responsibility[normalize-space(cit:role/cit:CI_RoleCode)='publisher']/cit:party/cit:CI_Organisation/cit:name//`<br>**JATS** `/article/front/journal-meta/publisher/publisher-name`<br>**JATS** `/article/back/ref-list/ref/element-citation/publisher-name` |
| Rights | Information about rights held in and over the resource | **DCAT** `dct:license`<br>**DCAT** `dct:rights`<br>**DCITE** `/dcite:resource/dcite:rightsList/dcite:rights`<br>**EML** `/eml:eml//intellectualRights`<br>**HCLS** `dct:license`<br>**HCLS** `dct:rights`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:resourceConstraints/mco:MD_LegalConstraints`<br>**JATS** `/article/front/article-meta/permissions/*` |
| Abstract | A paragraph describing the resource. | **DCAT** `/dct:description`<br>**DCITE** `/dcite:resource/dcite:descriptions/dcite:description[@descriptionType='Abstract']/*`<br>**EML** `/eml:eml//abstract`<br>**HCLS** `dct:description`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:abstract//*`<br>**JATS** `/article/front/article-meta/abstract` |
| Theme Keyword | A word or phrase that describes some (typically high-level) aspect of a resource. Note: The general identification keywords usually have a type of “theme” and are referred to as “theme keywords”. Other types and vocabularies are used for other information. | **DCAT** `/dct:keyword`<br>**DCITE** `/dcite:resource/dcite:subjects/dcite:subject`<br>**EML** `/eml:eml//keywordSet/keyword[@keywordType='thematic']`<br>**HCLS** `dcat:keyword`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='theme']/mri:keyword//*`<br>**JATS** `/article/front/article-meta/kwd-group/kwd`<br>**JATS** `/article/front/article-meta/article-categories/subj-group/subject` |
| Keyword | A word or phrase that describes some aspect of a resource. Can be one of several types. | **DCAT** `/dct:keyword`<br>**DCITE** `/dcite:resource/dcite:subjects/dcite:subject`<br>**EML** `/eml:eml//keywordSet/keyword[not(contains(@keywordType,'place')) and not(contains(@keywordType,'thematic')) and not(contains(@keywordType,'temporal')) and not(contains(@keywordType,'discipline')) and not(contains(@keywordType,'stratum')) and not(contains(@keywordType,'taxonomic'))]`<br>**EML** `/eml:eml//keywordSet/keyword[@keywordType='place']`<br>**EML** `/eml:eml//keywordSet/keyword[@keywordType='taxonomic']`<br>**EML** `/eml:eml//keywordSet/keyword[@keywordType='thematic']`<br>**EML** `/eml:eml//keywordSet/keyword[@keywordType='temporal']`<br>**EML** `/eml:eml//keywordSet/keyword[@keywordType='discipline']`<br>**EML** `/eml:eml//keywordSet/keyword[@keywordType='stratum']`<br>**HCLS** `dcat:keyword`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='theme']/mri:keyword//*`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='place']/mri:keyword//`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:extent/gex:EX_Extent/gex:geographicElement/gex:EX_GeographicDescription/gex:geographicIdentifier/mcc:MD_Identifier/mcc:code//`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='instrument']/mri:keyword//`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='platform']/mri:keyword//`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='project']/mri:keyword//`<br>**JATS** `/article/front/article-meta/kwd-group/kwd`<br>**JATS** `/article/front/article-meta/article-categories/subj-group/subject` |
| Keyword Vocabulary | If you are following a guideline or using a shared vocabulary for the words/phrases in your “keywords” attribute, put the name of that guideline here. | **DCITE** `/dcite:resource/dcite:subjects/dcite:subject/@subjectScheme`<br>**EML** `/eml:eml//keywordSet/keywordThesaurus`<br>**HCLS** `void:vocabulary`<br>**ISO-1** `/mdb:MD_Metadata/mdb:identificationInfo//mri:descriptiveKeywords/mri:MD_Keywords[normalize-space(mri:type/mri:MD_KeywordTypeCode)='theme']/mri:thesaurusName/cit:CI_Citation` |
DCAT: Data Catalog Vocabulary; DCITE: DataCite Metadata Schema V4.1; EML: Ecological Metadata Language; HCLS: Dataset Descriptions: HCLS Community Profile; ISO-1: ISO 19115-1; JATS: Journal Article Tag Suite