bibTeX Definition in Web Ontology Language (OWL) Version 0.1

Working Draft

This version:
http://zeitkunst.org/bibtex/0.1/
Latest version:
http://zeitkunst.org/bibtex/0.1/
Editor:
Nick Knouf, MIT <nknouf@mit.edu>

NOTICE:

This is an early draft of the bibTeX definition in OWL. It is subject to change.

Abstract

As the semantic web grows, so does the need for formal ontology definitions in standard languages such as the Web Ontology Language (OWL) of the World Wide Web Consortium. At the same time, numerous projects that predate OWL can serve as useful foundations. One such project is bibTeX, a format for marking up bibliographic data. It is used primarily with LaTeX documents but is also useful for general bibliographic storage. This document describes bibTeX recast in OWL for use in RDF applications.

Status of this Document

Note:

This document is subject to change at any time. Because this is an early draft, the namespace may change, classes and properties may be added or dropped, and semantics may be modified before the final version. It is not recommended for production work or for work that you cannot change in the future.

With that disclaimer in mind, welcome to the specification for bibTeX in OWL. This document arose out of a need to mark up bibliographic data while creating a new website based on RDF, OWL, and other semantic web technologies. As the document matures, uses beyond bibTeX will likely emerge, and other classes and properties can (and probably will) be added.

There are a number of things to consider:

Contact me if you would like to help out, or if you have any comments.

Table of Contents

1 Introduction and (Short) History
2 Namespace and Prefix
3 Classes and Properties
    3.1 Classes
    3.2 Datatype Properties
4 Examples
5 Acknowledgements
6 References


1 Introduction and (Short) History

Markup and organization of bibliographic entries are vital in any work that requires careful documentation of sources. While many commercial and non-commercial solutions exist, one of the simplest and most widespread is bibTeX[bibtex]. The format is simple but complete, and easy to use but powerful. The following is an example:

@article{Gettys90,
   author = {Jim Gettys and Phil Karlton and Scott McGregor},
   title = {The {X} Window System, Version 11},
   journal = {Software Practice and Experience},
   volume = {20},
   number = {S2},
   year = {1990},
   abstract = {A technical overview of the X11 functionality.  This is an update
of the X10 TOG paper by Scheifler \& Gettys.}
}
This describes an Article with a key of Gettys90, along with the usual metadata of author, title, journal, and so on. The key allows one to cite the entry in a LaTeX document simply by using the instruction \cite{Gettys90}; bibTeX then takes care of all formatting and production of the bibliography.
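To make the structure of such an entry concrete, the following Python sketch splits a single entry into its type, key, and fields. It is not the bibTeX program's own parser: it handles only the brace-delimited name = {value} form used in the example above, and the function name parse_bibtex_entry is hypothetical.

```python
import re

_HEADER = re.compile(r'\s*@(\w+)\s*\{\s*([^,\s]+)\s*,')
_FIELD = re.compile(r'\s*(\w+)\s*=\s*\{')
_COMMA = re.compile(r'\s*,')


def parse_bibtex_entry(text):
    """Split one brace-delimited bibTeX entry into (type, key, fields).

    Sketch only: handles `name = {value}` fields with balanced braces;
    quoted values, bare numbers, @string macros and escaped braces are
    not supported.
    """
    header = _HEADER.match(text)
    if header is None:
        raise ValueError("not a bibTeX entry")
    entry_type, key = header.group(1).lower(), header.group(2)
    fields = {}
    pos = header.end()
    while (m := _FIELD.match(text, pos)) is not None:
        depth, i = 1, m.end()
        # Scan forward until the brace that opened the value is closed,
        # so nested groups such as {X} stay part of the value.
        while depth > 0:
            if i >= len(text):
                raise ValueError("unbalanced braces in field " + m.group(1))
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        fields[m.group(1).lower()] = text[m.end():i - 1]
        pos = i
        comma = _COMMA.match(text, pos)
        if comma is None:
            break  # no comma: this was the last field
        pos = comma.end()
    return entry_type, key, fields


entry = r"""@article{Gettys90,
   author = {Jim Gettys and Phil Karlton and Scott McGregor},
   title = {The {X} Window System, Version 11},
   year = {1990}
}"""
kind, key, fields = parse_bibtex_entry(entry)
print(kind, key, fields["title"])
# article Gettys90 The {X} Window System, Version 11
```

Note that the protective braces around {X} are kept in the value; whether to strip them when producing RDF literals is a separate design choice.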

bibTeX fits easily into the OWL way of describing the world with classes and properties. Others have previously marked up bibTeX in DAML+OIL[damloil], the predecessor of OWL; one such version was created by the USC Information Sciences Institute[uscbibtex]. However, as of yet there is no markup of bibTeX in OWL.

This document serves as the definition of bibTeX in OWL and gives some common examples. In all likelihood, the OWL definition will be extended and modified in the future to accommodate desired changes. The goal is for this ontology (and its subsequent revisions) to become the standard way of marking up bibliographic data on the semantic web.

2 Namespace and Prefix

The namespace for this ontology is http://purl.oclc.org/NET/nknouf/ns/bibtex#. This namespace will always point to the most up-to-date version of the ontology. RDF documents that use the bibTeX ontology should use this namespace rather than the actual URL of the file, which is currently http://zeitkunst.org/bibtex/0.1/bibtex.owl. The namespace prefix is bibtex. The full XML namespace declaration, therefore, is xmlns:bibtex="http://purl.oclc.org/NET/nknouf/ns/bibtex#".
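Because every RDF term is ultimately a full URI, the bibtex: prefix is only shorthand for the namespace above. A minimal Python sketch of that expansion follows; the NAMESPACES table and the expand() helper are illustrative and not part of the ontology, and the rdf and xsd entries are the standard W3C namespaces rather than anything defined here.

```python
# The bibtex namespace comes from this document; rdf and xsd are the
# standard W3C namespaces.
NAMESPACES = {
    "bibtex": "http://purl.oclc.org/NET/nknouf/ns/bibtex#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(qname):
    """Turn a prefixed name such as 'bibtex:Article' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError("unknown prefix in " + repr(qname))
    return NAMESPACES[prefix] + local


print(expand("bibtex:Article"))
# http://purl.oclc.org/NET/nknouf/ns/bibtex#Article
```

Expanding against the PURL namespace rather than the file URL keeps the resulting URIs stable when the ontology file moves.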

3 Classes and Properties

Nearly all of the classes and properties come directly from the bibTeX format document[bibtex]. There are a few additional properties and classes, some added to help organize bibTeX cardinality restrictions, others for ease of use. To reiterate, the classes and properties are subject to change!

3.1 Classes

Article
rdfs:subClassOf Entry
rdfs:label Article
rdfs:comment An article from a journal or magazine.
Book
rdfs:subClassOf Entry
rdfs:label Book
rdfs:comment A book with an explicit publisher.
Booklet
rdfs:subClassOf Entry
rdfs:label Booklet
rdfs:comment A work that is printed and bound, but without a named publisher or sponsoring institution.
Conference
rdfs:subClassOf Entry
rdfs:label Conference
rdfs:comment The same as INPROCEEDINGS, included for Scribe compatibility.
Entry
rdfs:label Entry
rdfs:comment Base class for all entries.
Inbook
rdfs:subClassOf Entry
rdfs:label Inbook
rdfs:comment A part of a book, which may be a chapter (or section or whatever) and/or a range of pages.
Incollection
rdfs:subClassOf Entry
rdfs:label Incollection
rdfs:comment A part of a book having its own title.
Inproceedings
rdfs:subClassOf Entry
rdfs:label Inproceedings
rdfs:comment An article in a conference proceedings.
Manual
rdfs:subClassOf Entry
rdfs:label Manual
rdfs:comment Technical documentation.
Mastersthesis
rdfs:subClassOf Entry
rdfs:label Mastersthesis
rdfs:comment A Master's thesis.
Misc
rdfs:subClassOf Entry
rdfs:label Misc
rdfs:comment Use this type when nothing else fits.
Phdthesis
rdfs:subClassOf Entry
rdfs:label Phdthesis
rdfs:comment A PhD thesis.
Proceedings
rdfs:subClassOf Entry
rdfs:label Proceedings
rdfs:comment The proceedings of a conference.
Techreport
rdfs:subClassOf Entry
rdfs:label Techreport
rdfs:comment A report published by a school or other institution, usually numbered within a series.
Unpublished
rdfs:subClassOf Entry
rdfs:label Unpublished
rdfs:comment A document having an author and title, but not formally published.
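Each class above corresponds to a bibTeX entry type, so a converter can derive the class from the @type of an entry. The Python sketch below assumes the class local names are exactly the labels listed above, appended to the namespace from Section 2; class_for_entry_type is a hypothetical helper.

```python
BIBTEX_NS = "http://purl.oclc.org/NET/nknouf/ns/bibtex#"

# The fourteen Entry subclasses listed in Section 3.1.
ENTRY_CLASSES = {
    "Article", "Book", "Booklet", "Conference", "Inbook", "Incollection",
    "Inproceedings", "Manual", "Mastersthesis", "Misc", "Phdthesis",
    "Proceedings", "Techreport", "Unpublished",
}


def class_for_entry_type(entry_type):
    """Map a bibTeX entry type such as '@ARTICLE' or 'phdthesis' to a class URI."""
    # bibTeX entry types are case-insensitive; the class names capitalize
    # only the first letter (Inproceedings, Phdthesis, ...).
    name = entry_type.strip().lstrip("@").capitalize()
    if name not in ENTRY_CLASSES:
        raise ValueError("no bibtex class for entry type " + repr(entry_type))
    return BIBTEX_NS + name


print(class_for_entry_type("@INPROCEEDINGS"))
# http://purl.oclc.org/NET/nknouf/ns/bibtex#Inproceedings
```

Types such as @string or @preamble are not entries and raise an error here. Mapping unknown types to Misc would also be possible, but it would silently hide typos in entry types.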

3.2 Datatype Properties

has abstract
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has abstract
rdfs:comment An abstract of the work.
has address
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has address
rdfs:comment Usually the address of the publisher or other type of institution. For major publishing houses, van Leunen recommends omitting the information entirely. For small publishers, on the other hand, you can help the reader by giving the complete address.
has affiliation
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has affiliation
rdfs:comment The authors' affiliation.
has annotation
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has annotation
rdfs:comment An annotation. It is not used by the standard bibliography styles, but may be used by others that produce an annotated bibliography.
has author
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has author
rdfs:comment The name(s) of the author(s), in the format described in the LaTeX book.
Editorial note 
This is tricky because order is not (generally) preserved in RDF documents. The problem arises when you want an author list in which the order is _extremely_ important. How should that be handled? Perhaps we want to define "hasPrimaryAuthor", "hasSecondaryAuthor", "hasTertiaryAuthor", and "hasRemainingAuthors", or something of that sort. This will have to be given more thought.
has booktitle
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has booktitle
rdfs:comment Title of a book, part of which is being cited. See the LaTeX book for how to type titles. For book entries, use the title field instead.
has chapter
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#nonNegativeInteger
rdfs:label has chapter
rdfs:comment A chapter (or section or whatever) number.
has contents
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has contents
rdfs:comment A Table of Contents.
has copyright
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has copyright
rdfs:comment Copyright information.
has crossref
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has crossref
rdfs:comment The database key of the entry being cross referenced.
has edition
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has edition
rdfs:comment The edition of a book--for example, "Second". This should be an ordinal, and should have the first letter capitalized, as shown here; the standard styles convert to lower case when necessary.
has editor
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has editor
rdfs:comment Name(s) of editor(s), typed as indicated in the LaTeX book. If there is also an author field, then the editor field gives the editor of the book or collection in which the reference appears.
Editorial note 
Again, the same issues that arose with the "hasAuthor" property apply here.
has institution
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has institution
rdfs:comment The sponsoring institution of a technical report.
Editorial note 
This could be an object property that refers to an external set of institution instances.
has ISBN
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has ISBN
rdfs:comment The International Standard Book Number.
has ISSN
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has ISSN
rdfs:comment The International Standard Serial Number. Used to identify a journal.
has journal
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has journal
rdfs:comment A journal name. Abbreviations are provided for many journals; see the Local Guide.
Editorial note 
This could optionally be an object property, whereby the range would refer to an external set of journal instances, thus providing standardized abbreviations for different bibliographic styles.
has key
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has key
rdfs:comment The key for a particular bibTeX entry. Note that the rdf:ID for each Entry instance could be the bibTeX key as well, possibly making this property redundant.
has keywords
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has keywords
rdfs:comment Key words used for searching or possibly for annotation.
has language
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has language
rdfs:comment The language the document is in.
has LCCN
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has LCCN
rdfs:comment The Library of Congress Control Number.
has location
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has location
rdfs:comment A location associated with the entry, such as the city in which a conference took place.
has month
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has month
rdfs:comment The month in which the work was published or, for an unpublished work, in which it was written. You should use the standard three-letter abbreviation, as described in Appendix B.1.3 of the LaTeX book.
has mrnumber
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has mrnumber
rdfs:comment The Mathematical Reviews number.
has note
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has note
rdfs:comment Any additional information that can help the reader. The first word should be capitalized.
has number
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has number
rdfs:comment The number of a journal, magazine, technical report, or of a work in a series. An issue of a journal or magazine is usually identified by its volume and number; the organization that issues a technical report usually gives it a number; and sometimes books are given numbers in a named series.
has organization
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has organization
rdfs:comment The organization that sponsors a conference or that publishes a manual.
has pages
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has pages
rdfs:comment One or more page numbers or range of numbers, such as 42-111 or 7,41,73-97 or 43+ (the `+' in this last example indicates pages following that don't form a simple range). To make it easier to maintain Scribe-compatible databases, the standard styles convert a single dash (as in 7-33) to the double dash used in TeX to denote number ranges (as in 7--33).
has price
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has price
rdfs:comment The price of the document.
has publisher
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has publisher
rdfs:comment The publisher's name.
Editorial note 
This is a case where an ObjectProperty might be a better choice, where the range is some set of publisher names defined in another ontology. That would allow all of the metadata for the publisher to be incorporated as needed.
has school
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has school
rdfs:comment The name of the school where a thesis was written.
Editorial note 
As with "hasPublisher", this could be an ObjectProperty that refers to an external set of school instances.
has series
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has series
rdfs:comment The name of a series or set of books. When citing an entire book, the title field gives its title and an optional series field gives the name of a series or multi-volume set in which the book is published.
has size
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has size
rdfs:comment The physical dimensions of a work.
has title
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has title
rdfs:comment The work's title, typed as explained in the LaTeX book.
has type
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has type
rdfs:comment The type of a technical report--for example, "Research Note".
has URL
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label has URL
rdfs:comment The WWW Uniform Resource Locator that points to the item being referenced. This is often used for technical reports to point to the FTP or web site where the PostScript source of the report is located.
has volume
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#nonNegativeInteger
rdfs:label has volume
rdfs:comment The volume of a journal or multivolume book.
has year
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#nonNegativeInteger
rdfs:label has year
rdfs:comment The year of publication or, for an unpublished work, the year it was written. Generally it should consist of four numerals, such as 1984, although the standard styles can handle any year whose last four nonpunctuation characters are numerals, such as '(about 1984)'.
how published
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label how published
rdfs:comment How something strange has been published. The first word should be capitalized.
human creator
rdfs:domain Entry
rdfs:range http://www.w3.org/2001/XMLSchema#string
rdfs:label human creator
rdfs:comment A generic human creator category, necessary in order to contain both author and editor.
page and/or chapter data
rdfs:domain Entry
rdfs:label page and/or chapter data
rdfs:comment A generic property to hold page and/or chapter data.
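When converting bibTeX fields to these properties, a converter needs two things per field: the property name and the datatype of the literal. The Python sketch below assumes property local names follow the "has" plus capitalized field pattern used in Section 4 (hasTitle, hasYear) and in the editorial notes (hasAuthor, hasPublisher). The names for the acronym fields and for how published are guesses from their rdfs:label values and should be checked against bibtex.owl. Datatypes follow the ranges above: chapter, volume, and year are xsd:nonNegativeInteger, everything else xsd:string. The helper property_for_field is hypothetical.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Ranges from Section 3.2: only these three fields are nonNegativeInteger.
INTEGER_FIELDS = {"chapter", "volume", "year"}

# Assumed local names for fields that do not follow the plain
# "has" + Capitalized pattern (guessed from their rdfs:label values).
SPECIAL_NAMES = {
    "isbn": "hasISBN",
    "issn": "hasISSN",
    "lccn": "hasLCCN",
    "url": "hasURL",
    "howpublished": "howPublished",
}


def property_for_field(field):
    """Return (property local name, datatype URI) for a bibTeX field name."""
    field = field.lower()
    name = SPECIAL_NAMES.get(field, "has" + field.capitalize())
    kind = "nonNegativeInteger" if field in INTEGER_FIELDS else "string"
    return name, XSD + kind


print(property_for_field("journal"))
# ('hasJournal', 'http://www.w3.org/2001/XMLSchema#string')
print(property_for_field("year"))
# ('hasYear', 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger')
```

One consequence of these ranges: a year written as '(about 1984)', which the standard bibTeX styles accept, is not a valid xsd:nonNegativeInteger literal, so a converter would have to decide how to handle such values.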

4 Examples

The original bibTeX example, shown above, cast as RDF using this ontology:


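<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
	<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:bibtex="http://purl.oclc.org/NET/nknouf/ns/bibtex#">
<bibtex:Article rdf:ID="Gettys90">
	<bibtex:hasKey rdf:datatype="&xsd;string">Gettys90</bibtex:hasKey>
	<bibtex:hasAuthor rdf:datatype="&xsd;string">Jim Gettys and Phil Karlton and Scott McGregor</bibtex:hasAuthor>
	<bibtex:hasTitle rdf:datatype="&xsd;string">The X Window System, Version 11</bibtex:hasTitle>
	<bibtex:hasJournal rdf:datatype="&xsd;string">Software Practice and Experience</bibtex:hasJournal>
	<bibtex:hasVolume rdf:datatype="&xsd;nonNegativeInteger">20</bibtex:hasVolume>
	<bibtex:hasNumber rdf:datatype="&xsd;string">S2</bibtex:hasNumber>
	<bibtex:hasYear rdf:datatype="&xsd;nonNegativeInteger">1990</bibtex:hasYear>
	<bibtex:hasAbstract rdf:datatype="&xsd;string">A technical overview of the X11 functionality.  This is an update
of the X10 TOG paper by Scheifler &amp; Gettys.</bibtex:hasAbstract>
</bibtex:Article>
</rdf:RDF>
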
As you can see, this is rather verbose; however, the verbosity makes every class, property, and datatype explicit in a machine-readable format.
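Writing RDF/XML by hand is error-prone: special characters such as & must be escaped, and every literal needs the right datatype. The sketch below uses Python's standard xml.etree.ElementTree module to serialize one entry. The helper entry_to_rdfxml is hypothetical, and it writes full xsd URIs instead of the &xsd; entity because ElementTree does not emit DTD entity declarations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBTEX = "http://purl.oclc.org/NET/nknouf/ns/bibtex#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Register prefixes so the serializer writes rdf: and bibtex: instead of ns0:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("bibtex", BIBTEX)


def entry_to_rdfxml(class_name, key, properties):
    """Serialize one entry as an RDF/XML string.

    properties is a list of (local name, xsd type name, value) tuples,
    e.g. ("hasYear", "nonNegativeInteger", "1990").
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    entry = ET.SubElement(root, f"{{{BIBTEX}}}{class_name}", {f"{{{RDF}}}ID": key})
    for name, datatype, value in properties:
        prop = ET.SubElement(entry, f"{{{BIBTEX}}}{name}",
                             {f"{{{RDF}}}datatype": XSD + datatype})
        prop.text = value  # ElementTree escapes '&' as '&amp;' on output
    return ET.tostring(root, encoding="unicode")


print(entry_to_rdfxml("Article", "Gettys90", [
    ("hasKey", "string", "Gettys90"),
    ("hasTitle", "string", "The X Window System, Version 11"),
    ("hasYear", "nonNegativeInteger", "1990"),
    ("hasAbstract", "string",
     "This is an update of the X10 TOG paper by Scheifler & Gettys."),
]))
```

Generating the markup this way leaves escaping and namespace declarations to the XML library, so only the mapping from bibTeX fields to properties needs care.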

5 Acknowledgements

Thanks go to the people on the #rdfig IRC channel and members of the rdfweb-dev mailing list for help and comments. Thanks also go to Oren Patashnik and Leslie Lamport for creating bibTeX, and to Dana Jacobsen for the excellent bibTeX webpage summary.

6 References

bibtex
bibTeX Format Description
damloil
DAML+OIL Specification
uscbibtex
USC Information Sciences Institute