Survey about metadata

An international, multi-stakeholder survey about metadata awareness, knowledge, and use in scholarly communications

One of the goals of Metadata 20/20’s Research Communications Committee was to figure out how to engage and motivate researchers to invest in the metadata supporting scholarly outputs. Early on, the committee discovered that researchers are difficult to engage because they are a diverse and busy audience, and because metadata requirements are as varied as the panoply of research fields in the current scholarly communications landscape.

With this challenge facing the committee, the group decided to take a different approach to understanding researchers. Publishers, librarians, and data repository managers each interact with researchers and work with metadata. If the researchers are hard to pin down, why not approach the infrastructure around the researchers to study perceptions of metadata encountered by researchers?

This approach proved fruitful enough to produce a viable dataset about the perceptions and use of metadata in scholarly communications. Researchers were represented in the study, as were publishers, librarians, and repository managers. A survey of each of these groups produced results that confirmed known information: librarians, publishers, and repository managers on the whole know more about metadata than researchers. It also produced surprises: researchers are more closely aligned in metadata needs with repository managers than with librarians or publishers. The gap between researchers and either librarians or publishers is worth studying further. Perhaps the lamentations of librarians and publishers about a lack of researcher investment are due in part to librarians and publishers themselves?

The full analysis and data sets from this work are now published, and Metadata 20/20 hopes that they will provide a starting point for improving the metadata landscape in scholarly communications.

Study Background

Background: The Metadata 20/20 initiative is an ongoing effort to bring various scholarly communications stakeholder groups together to promote principles and standards of practice to improve the quality of metadata. To understand the perspectives and practices regarding metadata of the main stakeholder groups (librarians, publishers, researchers and repository managers), we conducted a survey during summer 2019. The survey content was generated by representatives from the stakeholder groups.

Methods: A link to an online survey (17 or 18 questions depending on the group) was distributed through multiple social media, listserv, and blog outlets. Responses were anonymous, with an optional entry for names and email addresses for those who were willing to be contacted later.

Results: Complete responses (N=211; 87 librarians, 27 publishers, 48 repository managers, and 49 researchers) representing 23 countries on four continents were analyzed and summarized for thematic content and ranking of awareness and practices.
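
As a quick arithmetic check on the figures above, the short Python sketch below sums the reported group counts and works out each group's share of the complete responses. The counts are taken directly from the Results statement; the variable names are illustrative only and are not part of the published analysis.

```python
# Complete responses per stakeholder group, as reported in the Results statement.
complete_responses = {
    "librarians": 87,
    "publishers": 27,
    "repository managers": 48,
    "researchers": 49,
}

# The group counts should add up to the reported N.
total = sum(complete_responses.values())

# Share of each group in percent, rounded to one decimal place.
shares = {
    group: round(100 * count / total, 1)
    for group, count in complete_responses.items()
}

for group, count in complete_responses.items():
    print(f"{group}: {count} ({shares[group]}%)")
print(f"total: {total}")
```

The total matches the reported N=211. Librarians make up roughly two fifths of the complete responses, which is worth keeping in mind when comparing themes across groups.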

Conclusions: Across the stakeholder groups, the level of awareness and usage of metadata methods and practices was highly variable. Clear gaps across the groups point to the need for consolidation of schema and practices, as well as broad educational efforts in order to increase knowledge and implementation of metadata in scholarly communications.

Study Outputs

The following outputs resulted from this study:

Survey Analysis Paper

  Analysis of the data collected during the 2019 survey. Each stakeholder group is analyzed separately and cross-stakeholder themes and insights are shared. The paper concludes with recommended areas for future study.  
  Kathryn A. Kaiser, Michelle Urberg, Maria Johnsson, Jennifer Kemp, Alice Meadows, Laura Paglione (2021). An international, multi-stakeholder survey about metadata awareness, knowledge, and use in scholarly communications. Quantitative Science Studies.  
Key Librarian Themes
  • Many librarians are educating and instructing users about metadata through classes and workshops, by individual consulting, and by web-based user guides.
  • Quality control of metadata in various systems is a major and time-consuming task for many librarians.
  • Librarian metadata work is spread across multiple systems, including library catalogs and discovery layers for item-level cataloguing, and Current Research Information Systems (CRIS) or other repositories where librarians check researchers’ publications.
  • Metadata is important: “… we aim to ensure that metadata is fit for purpose, well maintained and widely disseminated”—one of many statements indicating the importance of metadata according to librarians.
Key Publisher Themes
  • Publisher respondents see a lack of understanding of the benefits of metadata for end users and for discovery as the most critical issue for authors, although they de-emphasized promotion and guidance as an area for education support.
  • The publisher respondents reported limited or no opportunities for authors to update their own metadata after submitting it.
  • Because of wide variability in quality methods, metadata quality and completeness are likely to vary widely.
Key Repository Manager Themes
  • Repositories are found in many different corners of the scholarly communications life cycle and no two repositories are alike.
  • Repositories differ in their metadata needs, in particular in which pieces of information are necessary and important to record in order to make deposited content accessible.
  • Content intervention by humans is still required in repositories to address gaps in controlled vocabularies, but opportunities exist for automation to assist content and metadata ingestion.
Key Researcher Themes
  • The response patterns reflect a limited perspective among researchers as consumers of metadata, with little specific awareness of, or expressed need for, metadata for other research purposes.
  • The reported common uses of metadata reflect the position of researcher as a creator of content, but mostly for human consumption in small volumes rather than for machine readability or large volume work.
  • Full access rights and information about usage restrictions are top of mind for researchers. Although few questions asked about this directly, free-text responses and comments frequently mentioned these topics.

Survey Methods and Summary Results

  This appendix presents an analyzed summary of the raw data collected for the Metadata 20/20 survey. The purpose of the data summary is to draw attention to the unique responses of each stakeholder group. The authors of the article developed this analysis jointly.  
  Kaiser, K., Urberg, M., Johnsson, M., Kemp, J., Meadows, A., & Paglione, L. (2021). Metadata 2020 metadata usage survey methods and results summary (Version 0.1.0). Zenodo.  

Survey Questions

  This document records the finalized version of the survey that was distributed, following IRB approval, to a broad range of communities. The questions were answered by participants self-identifying as researchers, publishers, librarians, and repository managers. The purpose of these questions was to assess how metadata is understood by stakeholders in the scholarly communications life cycle. To the designers’ best knowledge, no other survey had previously attempted to analyze knowledge about, and perceptions of, metadata associated with publication. The survey was conceived as a primary output of the Metadata 20/20 Researcher Communications Project, and the survey instrument was developed with the assistance of stakeholder groups active in Metadata 20/20.  
  Metadata 2020, Kaiser, K., Urberg, M., Johnsson, M., Kemp, J., Meadows, A., & Paglione, L. (2021). Metadata 2020 metadata usage survey questions (Version 0.1.0). Zenodo.  

Raw Survey Data

  This document contains the Excel spreadsheet with all of the raw, unanalyzed responses to the Metadata 20/20 Survey. Each stakeholder group is represented with its own tab. In addition, one tab contains all of the responses and another contains the demographics for the survey. These tabs may include incomplete responses. Two other tabs contain comparisons of how each stakeholder group has self-prioritized metadata fields and metadata schema, which are discussed further in the paper. The responses come from 87 librarians, 27 publishers, 48 repository managers, and 49 researchers from 23 countries.  
  Paglione, L., Kaiser, K., Urberg, M., Johnsson, M., Kemp, J., & Meadows, A. (2021). Data from: An international, multi-stakeholder survey about metadata awareness, knowledge, and use in scholarly communications [Excel spreadsheet]. Dryad, Dataset.