bibTeX Definition in Web Ontology Language (OWL) Version 0.1
Working Draft
- This version: http://zeitkunst.org/bibtex/0.1/
- Latest version: http://zeitkunst.org/bibtex/0.1/
- Editor: Nick Knouf, MIT <nknouf@mit.edu>
Copyright © 2004, MIT. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/1.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
The copyright covers only this document, not the technologies described therein, including, but not limited to, OWL, bibTeX, RDF, and RDF Schema. In addition, while this document may bear a resemblance to W3C specifications, it is not a W3C document in any way. The document markup and style are modified from the W3C XML and XSL stylesheet, as they provide a convenient way to mark up documents of this type.
Abstract
As the semantic web grows, there is a need for more formal ontology definitions in standard languages such as the Web Ontology Language (OWL) of the World Wide Web Consortium. Numerous projects that predate OWL can serve as useful foundations. One such project is bibTeX, a method of marking up bibliographic data, primarily for use in LaTeX documents, but also useful for generic bibliographic storage. This document describes bibTeX recast in OWL for use in RDF applications.
Status of this Document
Note:
This document is subject to change at any time. As this is an early draft, the namespace may change, classes and properties may be added or dropped, and semantics may be modified before the final version. It is not recommended for production work or for work that you cannot change in the future.
With the disclaimer in the note in mind, welcome to the specification for bibTeX in OWL. This document arose out of a need to mark up bibliographic data while creating a new website based on RDF, OWL, and other semantic web technologies. As the document matures, its usefulness beyond bibTeX should become apparent, and other useful classes and properties can (and probably will) be added.
There are a number of things to consider:
- Figure out a good way of describing an ordered set of authors. The best way that I can think of to do this is to use rdf:Seq, but I'm stuck as to how to integrate this with the OWL ontology. Thoughts?
- As described below, it would be useful to refer to journals and the like by a resource rather than a literal. Compiling such a list is a monumental task; is there a central location where we can get this information and (hopefully) convert it into RDF?
- Looks like I didn't search deep enough; there is already a tool that exists to transform bibTeX into RDF. As well, there is some extensive work done by SWAD-Europe: Semantic blogging for bibliographies - Lessons Learnt and SWAD-Europe: Semantic Blogging and Bibliographies - Requirements Specification.
- Complete the XSL stylesheet that transforms the bibTeX OWL file into the XML source that is the basis for the classes and properties section.
- Define proper ranges for some of the properties. Many are listed as "&xsd;string", but are better described as more complicated values.
- Perhaps things that are marked up as &xsd;string should actually be &xsd;token? (thanks to evlist in #rdfig for the suggestion)
- Bruce D'Arcus has suggested a different schema that doesn't try to define such concrete terms; that might be better in the long run.
- There might be better ways to describe bibliographic data: LibDB
- Modify the ontology based on user input.
Contact me if you would like to help out, or if you have any comments.
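The &xsd;token suggestion in the list above can be illustrated with a small Python sketch. It shows the whitespace normalization that separates a token from a plain string. The helper name `to_token` is hypothetical and is not part of the ontology or of any tool mentioned here.

```python
import re


def to_token(value):
    """Normalize a string to the xsd:token lexical form."""
    # xsd:token allows no tabs or line breaks, no leading or trailing
    # spaces, and no runs of two or more spaces.
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


print(to_token("A technical overview.  This is an update\nof the X10 paper. "))
```

A value such as a multi-line abstract keeps its line breaks as an &xsd;string, whereas the token form collapses them to single spaces.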
1 Introduction and (Short) History
Markup and organization of bibliographic entries is vital for smooth functioning in any work that requires careful documentation of sources. While there exist many commercial and non-commercial solutions, one of the simplest and most widespread is bibTeX[bibtex]. This format is simple but complete, easy to use but powerful. The following is an example:
@article{Gettys90,
author = {Jim Gettys and Phil Karlton and Scott McGregor},
title = {The {X} Window System, Version 11},
journal = {Software Practice and Experience},
volume = {20},
number = {S2},
year = {1990},
abstract = {A technical overview of the X11 functionality. This is an update
of the X10 TOG paper by Scheifler \& Gettys.}
}
This describes an Article with a key of Gettys90, along with the usual metadata of author, title, journal, and so on. The key allows one to reference the bibTeX entry in a LaTeX document by simply using the instruction \cite{Gettys90}. All formatting and production of the bibliography is taken care of for the user.
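To make the structure of an entry concrete, here is a minimal Python sketch that splits a single entry into its type, key, and fields. It is an illustration only: `parse_entry` is a hypothetical helper, it assumes a well-formed entry with brace-delimited values, and it ignores other bibTeX features such as quoted values, @string macros, and escaped braces.

```python
import re

ENTRY = """@article{Gettys90,
  author = {Jim Gettys and Phil Karlton and Scott McGregor},
  title = {The {X} Window System, Version 11},
  journal = {Software Practice and Experience},
  volume = {20},
  number = {S2},
  year = {1990}
}"""

HEAD = re.compile(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD = re.compile(r"\s*(\w+)\s*=\s*\{")


def parse_entry(text):
    """Split one brace-delimited bibTeX entry into (type, key, fields)."""
    head = HEAD.match(text)
    if head is None:
        raise ValueError("not a bibTeX entry")
    entry_type, key = head.group(1).lower(), head.group(2)
    fields = {}
    pos = head.end()
    while True:
        # Each field looks like: name = {value}
        field = FIELD.match(text, pos)
        if field is None:
            break
        # Scan forward to the brace that closes this value, tracking nesting
        # so that inner groups such as {X} stay part of the value.
        depth, i = 1, field.end()
        while depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        fields[field.group(1).lower()] = text[field.end():i - 1]
        # Skip the optional comma that separates fields.
        pos = i + 1 if text[i:i + 1] == "," else i
    return entry_type, key, fields


entry_type, key, fields = parse_entry(ENTRY)
print(entry_type, key, sorted(fields))
```

The entry type corresponds to a class in the ontology and each field to a property, which is the mapping the rest of this document relies on.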
bibTeX easily fits into the OWL way of describing the world with classes and properties. Others have previously marked up bibTeX in DAML+OIL[damloil], the predecessor of OWL; one such ontology was created by the USC Information Sciences Institute[uscbibtex]. However, there has not yet been a markup of bibTeX in OWL.
This document serves as the definition of bibTeX in OWL and outlines some common examples. In all likelihood, there will be extensions and modifications to the OWL definition in the future to accommodate desired changes. The goal is for this (and the subsequent modifications) to become the standard way of marking up bibliographic data in the semantic web.
2 Namespace and Prefix
The namespace for this ontology is http://purl.oclc.org/NET/nknouf/ns/bibtex#. This namespace will always point to the most up-to-date version of the ontology. RDF documents that use the bibTeX ontology should use this namespace rather than the actual URL of the file, which currently is http://zeitkunst.org/bibtex/0.1/bibtex.owl. The namespace prefix is bibtex. The full XML namespace declaration, therefore, is xmlns:bibtex="http://purl.oclc.org/NET/nknouf/ns/bibtex#".
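As a rough illustration of how the declaration ends up in a document, the following Python sketch uses the standard library's ElementTree to bind the bibtex prefix and serialize a nearly empty RDF document. The single Article element is a placeholder added only so that the serializer emits the declaration; it is not meant as a complete entry.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBTEX_NS = "http://purl.oclc.org/NET/nknouf/ns/bibtex#"

# Bind the prefixes so the serializer writes bibtex: and rdf:
# rather than generated names such as ns0:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bibtex", BIBTEX_NS)

root = ET.Element(f"{{{RDF_NS}}}RDF")
# ElementTree only declares namespaces that are actually used,
# so add one placeholder element from the bibtex namespace.
ET.SubElement(root, f"{{{BIBTEX_NS}}}Article", {f"{{{RDF_NS}}}ID": "Gettys90"})

document = ET.tostring(root, encoding="unicode")
print(document)
```

Because the PURL is used as the namespace, documents produced this way should not need to change if the ontology file itself moves.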
3 Classes and Properties
Nearly all of the classes and properties come directly from the bibTeX format document[bibtex]. There are a few additional properties and classes, some added to help organize bibTeX cardinality restrictions, others for ease-of-use. To reiterate, the classes and properties are subject to change!
3.1 Classes
rdfs:subClassOf | Entry
rdfs:label | Article
rdfs:comment | An article from a journal or magazine.

rdfs:subClassOf | Entry
rdfs:label | Book
rdfs:comment | A book with an explicit publisher.

rdfs:subClassOf | Entry
rdfs:label | Booklet
rdfs:comment | A work that is printed and bound, but without a named publisher or sponsoring institution.

rdfs:subClassOf | Entry
rdfs:label | Conference
rdfs:comment | The same as INPROCEEDINGS, included for Scribe compatibility.

rdfs:label | Entry
rdfs:comment | Base class for all entries.

rdfs:subClassOf | Entry
rdfs:label | Inbook
rdfs:comment | A part of a book, which may be a chapter (or section or whatever) and/or a range of pages.

rdfs:subClassOf | Entry
rdfs:label | Incollection
rdfs:comment | A part of a book having its own title.

rdfs:subClassOf | Entry
rdfs:label | Inproceedings
rdfs:comment | An article in a conference proceedings.

rdfs:subClassOf | Entry
rdfs:label | Manual
rdfs:comment | Technical documentation.

rdfs:subClassOf | Entry
rdfs:label | Mastersthesis
rdfs:comment | A Master's thesis.

rdfs:subClassOf | Entry
rdfs:label | Misc
rdfs:comment | Use this type when nothing else fits.

rdfs:subClassOf | Entry
rdfs:label | Phdthesis
rdfs:comment | A PhD thesis.

rdfs:subClassOf | Entry
rdfs:label | Proceedings
rdfs:comment | The proceedings of a conference.

rdfs:subClassOf | Entry
rdfs:label | Techreport
rdfs:comment | A report published by a school or other institution, usually numbered within a series.

rdfs:subClassOf | Entry
rdfs:label | Unpublished
rdfs:comment | A document having an author and title, but not formally published.
3.2 Datatype Properties
rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has address
rdfs:comment | Usually the address of the publisher or other type of institution. For major publishing houses, van Leunen recommends omitting the information entirely. For small publishers, on the other hand, you can help the reader by giving the complete address.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has annotation
rdfs:comment | An annotation. It is not used by the standard bibliography styles, but may be used by others that produce an annotated bibliography.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has author
rdfs:comment | The name(s) of the author(s), in the format described in the LaTeX book.
Editorial note | This is tricky because order is not (generally) preserved in RDF documents. The problem arises when you want to have an author list where the order is _extremely_ important. How shall we do that? Perhaps we want to define "hasPrimaryAuthor", "hasSecondaryAuthor", "hasTertiaryAuthor", and "hasRemainingAuthors", or something of that sort. This will have to be given more thought.
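One possible direction for the author-ordering question is an rdf:Seq, whose members are ordered by definition. The Python sketch below splits a bibTeX author field and builds such a container. It is only an illustration: `author_seq` is a hypothetical helper, and how the container would attach to the ontology (the author property currently has a string range) remains an open question.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)


def author_seq(author_field):
    """Build an rdf:Seq whose rdf:li children preserve author order."""
    seq = ET.Element(f"{{{RDF_NS}}}Seq")
    # bibTeX separates names with the word "and". This naive split
    # assumes no name contains " and " inside braces.
    for name in author_field.split(" and "):
        li = ET.SubElement(seq, f"{{{RDF_NS}}}li")
        li.text = name.strip()
    return seq


seq = author_seq("Jim Gettys and Phil Karlton and Scott McGregor")
print(ET.tostring(seq, encoding="unicode"))
```

In RDF/XML, rdf:li elements are shorthand for the numbered membership properties rdf:_1, rdf:_2, and so on, so the order survives even when the triples themselves are stored unordered.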
rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has booktitle
rdfs:comment | Title of a book, part of which is being cited. See the LaTeX book for how to type titles. For book entries, use the title field instead.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has edition
rdfs:comment | The edition of a book--for example, "Second". This should be an ordinal, and should have the first letter capitalized, as shown here; the standard styles convert to lower case when necessary.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has editor
rdfs:comment | Name(s) of editor(s), typed as indicated in the LaTeX book. If there is also an author field, then the editor field gives the editor of the book or collection in which the reference appears.
Editorial note | Again, the same issues that arose with the "hasAuthor" property apply here.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has institution
rdfs:comment | The sponsoring institution of a technical report.
Editorial note | This could be an object property that refers to an external set of institution instances.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has journal
rdfs:comment | A journal name. Abbreviations are provided for many journals; see the Local Guide.
Editorial note | This could optionally be an object property, whereby the range would refer to an external set of journal instances, thus providing standardized abbreviations for different bibliographic styles.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has key
rdfs:comment | The key for a particular bibTeX entry. Note that the rdf:ID for each Entry instance could be the bibTeX key as well, possibly making this property redundant.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has month
rdfs:comment | The month in which the work was published or, for an unpublished work, in which it was written. You should use the standard three-letter abbreviation, as described in Appendix B.1.3 of the LaTeX book.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has number
rdfs:comment | The number of a journal, magazine, technical report, or of a work in a series. An issue of a journal or magazine is usually identified by its volume and number; the organization that issues a technical report usually gives it a number; and sometimes books are given numbers in a named series.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has pages
rdfs:comment | One or more page numbers or range of numbers, such as 42-111 or 7,41,73-97 or 43+ (the `+' in this last example indicates pages following that don't form a simple range). To make it easier to maintain Scribe-compatible databases, the standard styles convert a single dash (as in 7-33) to the double dash used in TeX to denote number ranges (as in 7--33).
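The dash conversion described for the pages field can be sketched in a few lines of Python. This is an approximation of what the standard styles do, not their actual implementation, and `normalize_pages` is a hypothetical name.

```python
import re


def normalize_pages(pages):
    """Turn single dashes between page numbers into the TeX double dash."""
    # Match a dash that has no other dash on either side, so that an
    # existing "--" is left unchanged.
    return re.sub(r"(?<!-)-(?!-)", "--", pages)


for value in ("7-33", "7,41,73-97", "7--33", "43+"):
    print(value, "->", normalize_pages(value))
```

A tool that converts bibTeX to RDF would have to decide whether to store the value as written or in this normalized form; the ontology's string range permits either.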
rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has publisher
rdfs:comment | The publisher's name.
Editorial note | This is a case where an ObjectProperty might be a better choice, where the range is some set of publisher instances defined in another ontology. That would allow all of the metadata for the publisher to be incorporated as needed.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has school
rdfs:comment | The name of the school where a thesis was written.
Editorial note | As with "hasPublisher", this could be an ObjectProperty that refers to an external set of school instances.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has series
rdfs:comment | The name of a series or set of books. When citing an entire book, the title field gives its title and an optional series field gives the name of a series or multi-volume set in which the book is published.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#string
rdfs:label | has URL
rdfs:comment | The WWW Uniform Resource Locator that points to the item being referenced. This is often used for technical reports to point to the FTP or web site where the PostScript source of the report is located.

rdfs:domain | Entry
rdfs:range | http://www.w3.org/2001/XMLSchema#nonNegativeInteger
rdfs:label | has year
rdfs:comment | The year of publication or, for an unpublished work, the year it was written. Generally it should consist of four numerals, such as 1984, although the standard styles can handle any year whose last four nonpunctuation characters are numerals, such as '(about 1984)'.
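Because the year property has an integer range, a bibTeX value such as '(about 1984)' cannot be stored as written. The Python sketch below applies the "last four nonpunctuation characters" rule from the comment to recover an integer. The helper `year_value` is hypothetical, and treating whitespace like punctuation is an assumption made here, not something the bibTeX documentation states.

```python
import string


def year_value(year_field):
    """Return the year as an int, or None if the field does not end in one."""
    # Drop punctuation (and, by assumption, whitespace), then inspect
    # the last four remaining characters.
    kept = [c for c in year_field
            if c not in string.punctuation and not c.isspace()]
    last_four = "".join(kept[-4:])
    if len(last_four) == 4 and all(c in string.digits for c in last_four):
        return int(last_four)
    return None


for value in ("1984", "(about 1984)", "n.d."):
    print(repr(value), "->", year_value(value))
```

Values for which no year can be recovered would need either a different property or a wider range than the one currently defined.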
rdfs:domain | Entry
rdfs:label | page and/or chapter data
rdfs:comment | A generic property to hold page and/or chapter data.
4 Examples
The original bibTeX example, shown above, cast as RDF using this ontology:
<bibtex:Article rdf:ID="Gettys90">
<bibtex:hasKey rdf:datatype="&xsd;string">Gettys90</bibtex:hasKey>
<bibtex:hasTitle rdf:datatype="&xsd;string">The X Window System, Version 11</bibtex:hasTitle>
<bibtex:hasJournal rdf:datatype="&xsd;string">Software Practice and Experience</bibtex:hasJournal>
<bibtex:hasVolume rdf:datatype="&xsd;nonNegativeInteger">20</bibtex:hasVolume>
<bibtex:hasNumber rdf:datatype="&xsd;string">S2</bibtex:hasNumber>
<bibtex:hasYear rdf:datatype="&xsd;nonNegativeInteger">1990</bibtex:hasYear>
<bibtex:hasAbstract rdf:datatype="&xsd;string">A technical overview of the X11 functionality. This is an update
of the X10 TOG paper by Scheifler &amp; Gettys.</bibtex:hasAbstract>
</bibtex:Article>
As you can see, this is rather verbose; however, the verbosity allows for complete definition of the classes and properties in a machine readable format.
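Since the mapping from fields to properties is mechanical, the verbose form need not be typed by hand. The following Python sketch generates an element like the one above from a dictionary of fields. It rests on several assumptions: `entry_to_rdf` is a hypothetical helper, property names are formed as "has" plus the capitalized field name as in the example, only volume and year are typed as integers, and full datatype URIs are written out in place of the &xsd; entity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"
BIBTEX_NS = "http://purl.oclc.org/NET/nknouf/ns/bibtex#"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bibtex", BIBTEX_NS)

# Fields typed as integers in the example above; all others are strings.
INTEGER_FIELDS = {"volume", "year"}


def entry_to_rdf(entry_type, key, fields):
    """Build a bibtex:<Type> element with one has<Field> child per field."""
    entry = ET.Element(f"{{{BIBTEX_NS}}}{entry_type.capitalize()}",
                       {f"{{{RDF_NS}}}ID": key})
    # The key is emitted as a property too, as in the example above.
    for name, value in {"key": key, **fields}.items():
        datatype = "nonNegativeInteger" if name in INTEGER_FIELDS else "string"
        prop = ET.SubElement(entry, f"{{{BIBTEX_NS}}}has{name.capitalize()}",
                             {f"{{{RDF_NS}}}datatype": XSD_NS + datatype})
        prop.text = value
    return entry


article = entry_to_rdf("article", "Gettys90", {
    "title": "The X Window System, Version 11",
    "journal": "Software Practice and Experience",
    "volume": "20",
    "number": "S2",
    "year": "1990",
})
root = ET.Element(f"{{{RDF_NS}}}RDF")
root.append(article)
print(ET.tostring(root, encoding="unicode"))
```

The serializer escapes characters such as & automatically, which avoids the most common well-formedness error when bibTeX text is copied into XML.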
5 Acknowledgements
Thanks go to the people on the #rdfig IRC channel and members of the rdfweb-dev mailing list for help and comments. Thanks also go to Oren Patashnik and Leslie Lamport for creating bibTeX and Dana Jacobsen for the excellent bibTeX webpage summary.