We’ve been investigating options for storage and distribution of citation data in the Biodiversity Heritage Library. In particular, we are searching for an appropriate "core" format. The thought is that with an appropriately verbose, open, standard core format for our citations, we can transform that format into whatever other format we might want to support. By “verbose”, we mean a format that can support all of the information that we need to preserve. By “open”, we’re looking for a format that’s not tied exclusively to one system or vendor. And by “standard”, we’re hoping to identify a format that is widely recognized by the library community.
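A minimal sketch of the core-format idea, in Python. A hypothetical in-memory `Citation` record stands in for the core format. Its field names are our own illustration and are not taken from any of the candidates below. Each supported output is a separate writer over that record. Only an RIS writer is shown, using RIS tags (TY, AU, TI, JF, PY, VL, IS, SP, EP, ER) from the RIS description linked below. It is a sketch, not a complete or validated RIS implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Citation:
    """Hypothetical core citation record; field names are illustrative only."""
    kind: str                          # core type, e.g. "article" or "book"
    title: str
    authors: List[str] = field(default_factory=list)
    journal: Optional[str] = None
    year: Optional[str] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    start_page: Optional[str] = None
    end_page: Optional[str] = None

# Map core types to RIS reference types; anything unknown falls back to GEN.
RIS_TYPES = {"article": "JOUR", "book": "BOOK", "chapter": "CHAP"}

def to_ris(c: Citation) -> str:
    """Serialize one Citation as an RIS record: "TAG  - value" lines ending in ER."""
    lines = [f"TY  - {RIS_TYPES.get(c.kind, 'GEN')}"]
    lines += [f"AU  - {name}" for name in c.authors]   # one AU line per author
    tagged = [("TI", c.title), ("JF", c.journal), ("PY", c.year),
              ("VL", c.volume), ("IS", c.issue),
              ("SP", c.start_page), ("EP", c.end_page)]
    lines += [f"{tag}  - {value}" for tag, value in tagged if value]  # skip empty fields
    lines.append("ER  - ")                             # end-of-record marker
    return "\n".join(lines) + "\n"

# Every output format is just another writer over the same core record.
WRITERS: Dict[str, Callable[[Citation], str]] = {"ris": to_ris}

def export(c: Citation, fmt: str) -> str:
    """Render a core record in the requested output format."""
    return WRITERS[fmt](c)
```

Supporting another output format would mean registering another writer in `WRITERS`. The core record itself stays the same, which is the property we want from a "core" format.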
This Wikipedia article has helped guide our research: http://en.wikipedia.org/wiki/Comparison_of_reference_management_software. In particular, its information on which formats each application supports has been useful.
Following is a brief description of the format candidates we’ve investigated, as well as our preliminary conclusions.
- The following formats appear (at first glance) to be the most open, verbose, and recognized formats.
METS/MODS - Library of Congress standards
http://www.loc.gov/standards/mets/
http://www.loc.gov/standards/mods/ - examples can be found under the "Guidance" section
NLM – National Library of Medicine format
http://dtd.nlm.nih.gov/ - DTDs
http://www.ncbi.nlm.nih.gov/staff/beck/citations/citationtags.html - examples
EndNote (RIS/XML) – this seems to be the most widely adopted format
http://www.endnote.com/support/ensupport.asp - XML DTD is here
http://refdb.sourceforge.net/manual-0.9.4/c2166.html - RIS format description
- The following format is also a possibility, but it may be overly complex for our needs.
RDF
http://en.wikipedia.org/wiki/Resource_Description_Framework
http://www.w3.org/RDF/
- Here are other formats we looked at that appear to be deficient in one way or another.
UniXRef – this is the XML format CrossRef returns from its OpenURL resolver.
The verbosity of this format is good; it appears that a document using this format could contain all of the information that we require. However, it is unclear how widely this format has been adopted outside of specialized custom applications.
http://www.crossref.org/help/Content/04_Queries_and_retrieving/Bulk%20metadata%20distribution.htm – schema found by clicking on “Unified XML Schema – Overview”
COinS - not widely adopted
MARC - doesn't support article-level metadata (pages, etc.)
Dublin Core - Not detailed enough
BibTeX - Not detailed enough
OAI outputs – only a few defined outputs, which happen to be formats that are defined elsewhere (Dublin Core, RFC1807, MARC)
RefWorks - too proprietary
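To make "not detailed enough" concrete for Dublin Core, here is a rough Python check. It assumes hypothetical core field names and one plausible, non-authoritative mapping onto the 15 elements of simple (unqualified) Dublin Core. Under that mapping, volume, issue, and page fields have no dedicated element, so in simple DC they could only survive flattened into free text.

```python
from typing import Dict, List

# The 15 elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

# One plausible mapping from hypothetical core field names to DC elements.
CORE_TO_DC = {
    "title": "title",
    "authors": "creator",
    "year": "date",
    "publisher": "publisher",
    "journal": "source",
}

# Sanity check: every mapping target really is a simple DC element.
assert set(CORE_TO_DC.values()) <= DC_ELEMENTS

def unmapped_fields(record: Dict[str, object]) -> List[str]:
    """Return the core fields in `record` that have no dedicated DC element."""
    return sorted(k for k in record if k not in CORE_TO_DC)
```

For a typical journal article record, the fields this reports are exactly the article-level ones (volume, issue, start and end pages) that a core format needs to keep structured.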
If you have experience with one or more of these formats and would like to help us make our decision, please post your comments below.
Mike Lichtenberg
Missouri Botanical Garden
