About the project


The YAMZ Metadictionary (metadata dictionary) prototype is guided by the NSF-funded DataONE Metadata Working Group. The effort runs from June 2013 to July 2014. The prototype is a proof-of-concept web-based software service acting as an open registry of metadata terms from all domains and from all parts of "metadata speech". No login is required to search for and link to registry term definitions; anyone can register in order to log in and create terms.

We aim for the metadictionary to become a high-quality cross-domain metadata vocabulary that is directly connected to evolving user needs. Change will be rapid and affordable, with no need for panels of experts to convene and arbitrate to improve it. We expect dramatic simplification compared to the situation today, in which there is an overwhelming number of vocabularies (ontologies) to choose from.

Our hope is that users will be able to find most of the terms they need in one place (one vocabulary namespace), namely, the Metadictionary. This should minimize the need for maintaining expensive crosswalks with other vocabularies and cluttering up expressed metadata with lots of namespace qualifiers. Although it is not our central goal, the vocabulary is shovel-ready for those wishing to create linked data applications.

The source code for the YAMZ metadictionary is distributed as a Python package under the terms of a BSD license and is published on GitHub. Full documentation of the API is also available online.

Basic metadictionary structure

Every term in the dictionary belongs to one of three disjoint classes: vernacular, canonical, and deprecated. Classification of terms into these classes is a fully automated process based on voting and user reputation.
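
This page does not say how votes and user reputation are combined. One plausible scheme, sketched here purely as an assumption, weights each up or down vote by the voter's reputation and normalizes the result to a consensus score between -1 and 1. The names `Vote` and `consensus` and the formula itself are hypothetical; they are not taken from the YAMZ package.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    """One user's vote on a term (hypothetical model, not the YAMZ API)."""
    direction: int     # +1 for an up vote, -1 for a down vote
    reputation: float  # voter's reputation when the vote was cast, assumed >= 0


def consensus(votes: list[Vote]) -> float:
    """Reputation-weighted consensus in [-1, 1]; 0.0 when no weight has been cast.

    Assumed formula: sum(direction * reputation) / sum(reputation).
    """
    total_weight = sum(v.reputation for v in votes)
    if total_weight == 0:
        return 0.0
    return sum(v.direction * v.reputation for v in votes) / total_weight


votes = [Vote(+1, 10.0), Vote(+1, 2.0), Vote(-1, 4.0)]
print(consensus(votes))  # (10 + 2 - 4) / 16 = 0.5
```

Normalizing by total reputation keeps the score comparable between heavily and lightly voted terms; a real system might instead use raw sums or a time-decayed weight.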

Vernacular


  • unstable terms; link to a vernacular term only as "work-in-progress"
  • anyone can propose new vernacular terms (with user registration)
  • proposer(s) of a term "own" it initially; only they can make changes
  • ownership can be transferred to other community members at any time
  • communities of interest spring up around related term clusters
  • vernacular terms expire in 6 months unless renewed (editing auto-renews)
  • overall, the process is similar to the Internet-Draft process for Internet standards
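
The vernacular rules above (owner-only editing, transferable ownership, a 6-month lifetime that any edit renews) can be sketched as a small data model. This is a hypothetical illustration, not the YAMZ implementation: the class `VernacularTerm`, its methods, the term name `sampleDepth`, and the reading of "6 months" as six calendar months (clamped to the end of shorter months) are all assumptions.

```python
import calendar
from dataclasses import dataclass
from datetime import date

LIFETIME_MONTHS = 6  # vernacular terms expire after 6 months unless renewed


def add_months(d: date, months: int) -> date:
    """Shift d forward by whole calendar months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


@dataclass
class VernacularTerm:
    """Hypothetical model of a vernacular term's ownership and expiry rules."""
    term: str
    definition: str
    owners: set[str]
    renewed_on: date

    @property
    def expires_on(self) -> date:
        return add_months(self.renewed_on, LIFETIME_MONTHS)

    def is_expired(self, today: date) -> bool:
        return today >= self.expires_on

    def edit(self, user: str, new_definition: str, today: date) -> None:
        """Only owners may edit; any edit also renews the term."""
        if user not in self.owners:
            raise PermissionError(f"{user} does not own {self.term!r}")
        self.definition = new_definition
        self.renewed_on = today  # editing auto-renews

    def transfer(self, user: str, new_owners: set[str]) -> None:
        """An existing owner hands the term over to other community members."""
        if user not in self.owners:
            raise PermissionError(f"{user} does not own {self.term!r}")
        self.owners = set(new_owners)


t = VernacularTerm("sampleDepth", "Depth at which a sample was taken.", {"alice"}, date(2013, 8, 31))
print(t.expires_on)  # 2014-02-28
t.edit("alice", "Depth below the surface at which a sample was taken.", date(2014, 1, 15))
print(t.expires_on)  # 2014-07-15
```

Storing only the renewal date and deriving the expiry date keeps the "editing auto-renews" rule to a single assignment.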

Canonical


  • stable, unchanging terms, safe for long-term reference
  • terms move from vernacular to canonical by periodic approval
  • approval is a lightweight process supervised by community elders
  • canonical terms remain subject to voting or similar mechanisms

Deprecated


  • stable terms, but deprecated for long-term reference
  • terms move from canonical to deprecated (archival) status by periodic approval
  • archival terms have links such as "ObsoletedBy: term X"

All parts of every term, including definitions, examples, and illustrations, are dedicated to the public domain under the terms of the CC0 license.
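
The movement of terms between the three classes, one step at a time from vernacular to canonical to deprecated, with optional links such as "ObsoletedBy: term X", can be modeled as a tiny state machine. This is a sketch under assumptions: `TermClass`, `Term`, `approve`, and the term IDs are hypothetical names, and the helper records only which transitions are allowed, not who approves them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TermClass(Enum):
    VERNACULAR = "vernacular"
    CANONICAL = "canonical"
    DEPRECATED = "deprecated"


# Only forward moves are allowed: vernacular -> canonical -> deprecated.
NEXT_CLASS = {
    TermClass.VERNACULAR: TermClass.CANONICAL,
    TermClass.CANONICAL: TermClass.DEPRECATED,
}


@dataclass
class Term:
    """Hypothetical term record; new terms start out as vernacular."""
    term_id: str
    term_class: TermClass = TermClass.VERNACULAR
    obsoleted_by: Optional[str] = None  # e.g. the ID behind "ObsoletedBy: term X"


def approve(term: Term, target: TermClass, obsoleted_by: Optional[str] = None) -> None:
    """Apply one periodic-approval step, rejecting skipped or backward moves."""
    if NEXT_CLASS.get(term.term_class) is not target:
        raise ValueError(
            f"cannot move {term.term_id} from {term.term_class.value} to {target.value}"
        )
    if target is TermClass.DEPRECATED:
        term.obsoleted_by = obsoleted_by  # optional link to the replacing term
    term.term_class = target


old = Term("h0001")
approve(old, TermClass.CANONICAL)
approve(old, TermClass.DEPRECATED, obsoleted_by="h0042")
print(old.term_class.value, old.obsoleted_by)  # deprecated h0042
```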