fontomri: Ontology for fMRI Subject Databases

fontomri	N. Knouf
	Brain and Cognitive Sciences, MIT
	November 21, 2003

Abstract

This is a request for comments on a shared fMRI subject database ontology defined using the Web Ontology Language (OWL) and implemented using the Resource Description Framework (RDF).

1. Non-Technical Introduction
2. Technologies to Use
3. Previous Work
4. Possible Ontology Elements
4.1 More about Person
4.1.1 Person Instance
4.1.2 More about Experimenter
4.1.3 More about Subject
5. Ontology Definition
§ References
§ Author's Address
A. Copyright

1. Non-Technical Introduction

Functional neuroimaging has exploded over the last decade into a common tool for investigating numerous questions about the brain. With this explosion comes the need to keep track of subject information easily, completely, and properly. In addition, recent federal regulations such as HIPAA necessitate strict control of identifiable information for years into the future. There is also a desire by some in the community to facilitate the sharing of fMRI data, something that is only possible when we agree on a common method of cataloging subject information.

The goal of fontomri is to provide an easy-to-use framework for both day-to-day subject management and long-term historical record-keeping. In addition, future work aims to combine data from analyses with subject information to provide a complete record of everything done with a subject's data.

Given that many details about subjects reside on post-it notes, a centralized database would be useful not only to the researcher but to the lab as a whole. With any discussion of centralized subject data, however, come confidentiality concerns, especially with the new HIPAA regulations. This document gives guidelines on how to mitigate these concerns, both in the design of the database itself and in the design of the hardware that runs the database.

2. Technologies to Use

This document proposes a standard fMRI ontology defined in the Web Ontology Language (OWL)[1], with instance data expressed in the Resource Description Framework (RDF)[2]. Both OWL and RDF are recommendations of the World Wide Web Consortium, the organization behind the standards for the web, among many other things.

By using these open technologies we ensure that the ontology will be free and open now and into the future. In addition, both OWL and RDF are text-based, using XML as their serialization format (in the case of RDF, XML is the preferred format, but not the only one), and are thus easily readable by both humans and machines. Finally, OWL and RDF 'future-proof' our work, so that coming tools that allow for reasoning and beyond-keyword searching will work with this ontology (see more information at the World Wide Web Consortium's website for the Semantic Web).
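
As a rough illustration of what text-based instance data might look like, the sketch below uses Python's standard xml.etree.ElementTree module to emit one RDF/XML description. The namespace http://example.org/fontomri# and the class and property names Subject and subject-id are placeholders assumed for this example only; the ontology itself is not yet finalized (see Section 5).

```python
import xml.etree.ElementTree as ET

# The RDF namespace is the standard one; the fontomri namespace is a
# hypothetical placeholder, since the draft does not define one yet.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FONTOMRI_NS = "http://example.org/fontomri#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("fontomri", FONTOMRI_NS)


def subject_to_rdfxml(subject_id):
    """Serialize one Subject instance as an RDF/XML string."""
    root = ET.Element("{%s}RDF" % RDF_NS)
    # A typed node: <fontomri:Subject rdf:about="...">
    node = ET.SubElement(
        root,
        "{%s}Subject" % FONTOMRI_NS,
        {"{%s}about" % RDF_NS: FONTOMRI_NS + subject_id},
    )
    # A literal-valued property of that node (assumed property name).
    prop = ET.SubElement(node, "{%s}subject-id" % FONTOMRI_NS)
    prop.text = subject_id
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(subject_to_rdfxml("S0001"))
```

The output is ordinary XML, so it can be inspected in a text editor and read back by any XML or RDF parser.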

Security is extremely important to the success of fontomri. We must ensure that subject data cannot be tied to identifiable information, as required by HIPAA regulations. We also want to ensure that all transactions between the user (researcher) and the database are secure.

In brief, since there needs to be some connection between a subject's name and a subject ID, fontomri proposes an MD5 hash of the subject's name xor'ed with a site-specific password. This method, much like that used on UNIX systems for the protection of user passwords, makes connecting a subject's name with their data prohibitively difficult.
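
The proposal can be read in more than one way. The sketch below assumes one reading: the bytes of the name are XORed with the (repeating) bytes of the site password, and the result is then hashed with MD5. The function name subject_hash and the UTF-8 encoding are assumptions made for illustration, not part of the proposal.

```python
import hashlib
from itertools import cycle


def subject_hash(name, site_password):
    """Return the MD5 hex digest of the name XORed with the site password.

    Assumed reading of the proposal: XOR first, then hash. The password
    bytes repeat if the name is longer than the password.
    """
    name_bytes = name.encode("utf-8")
    key_bytes = site_password.encode("utf-8")
    if not key_bytes:
        raise ValueError("site password must not be empty")
    mixed = bytes(n ^ k for n, k in zip(name_bytes, cycle(key_bytes)))
    return hashlib.md5(mixed).hexdigest()


if __name__ == "__main__":
    # Hypothetical name and password, for demonstration only.
    print(subject_hash("John Doe", "site-secret"))
```

The same name and password always give the same digest, so a site can recompute the hash to look a subject up. Names would need to be written consistently (spacing, capitalization) for that lookup to succeed.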

On the hardware front, network access to the database will be limited to localhost connections. Data will be sent from a user's computer to the host over SSL. Scripts on the host will then connect to the database to input the data. The goal is to use as much encryption as possible without causing an undue burden on the end-user.

3. Previous Work

Given the importance of the problem, it is surprising that there is so little publicly available software for subject management. Some of this may be because knowledge management tools have only recently become widely available. However, one group at Dartmouth has been working on a tool called the fMRI Data Management Tool. This tool uses the Protégé knowledge management system as the main interface to the ontology.

@@TODO add information about fmri-dmt work and why it's not the best@@

4. Possible Ontology Elements

To begin, it would be good to enumerate the main elements that we'd like to include in our ontology. More information on each element will be given in the subsections.

Person

Person refers to any human being involved in an experiment, including both the subject and the experimenter. Conceivably this ontology could be extended to deal with non-human primates, thus the need to be pedantic from the start.

Experiment

Experiment refers to all of the information related to a study, including but not limited to experimental parameters, scanner information, notes, and so on. For most studies this will be the most involved element.

Study

Study refers to the combination of one or more Experiments and one or more Persons (most likely at least one experimenter and one subject). Other data related to a Study include publication information.

As is most likely obvious to the reader, the elements listed above are not enough to describe a study, but it is convenient to start with the basic-level containers that hold all of the subsequent information. The following subsections go into more detail about the types of information contained within the elements.
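
The three containers can be sketched as plain Python data classes to show how they nest. The field names (parameters, scanner, notes, publications) are assumptions drawn loosely from the descriptions above, and the check that a Study has at least one Experiment and one Person simply mirrors the "one or more" wording.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Person:
    """Any human involved in an experiment (subject or experimenter)."""
    name: str


@dataclass
class Experiment:
    """Experimental parameters, scanner information, notes, and so on."""
    parameters: Dict[str, str] = field(default_factory=dict)
    scanner: str = ""
    notes: List[str] = field(default_factory=list)


@dataclass
class Study:
    """One or more Experiments combined with one or more Persons."""
    experiments: List[Experiment]
    persons: List[Person]
    publications: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the "one or more" wording of the Study description.
        if not self.experiments:
            raise ValueError("a Study needs at least one Experiment")
        if not self.persons:
            raise ValueError("a Study needs at least one Person")


if __name__ == "__main__":
    study = Study(
        experiments=[Experiment(scanner="3T")],
        persons=[Person("John Doe")],
    )
    print(study)
```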

4.1 More about Person

The two options for a Person are Subject and Experimenter. It is important to note that in many cases an Experimenter may also be a Subject, so there is no a priori reason to prevent the statement "Experimenter isA Subject". It may also be convenient to define a "generic" Person with basic information, allowing the statement "Experimenter is JohnDoe," where "JohnDoe" is an instance of Person.

4.1.1 Person Instance

In most cases it would be useful to create an instance of Person, which MUST include the following:

Name

This way, you can create the syllogism:

JohnDoe IsA Subject
Subject IsA Person
Person hasProperty Name

From these statements it follows that JohnDoe has a Name. Such an inference, while simple, is at the heart of this methodology and eliminates redundancy (if done correctly).
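
The inference can be sketched in a few lines of Python. The code treats the three statements as subject-predicate-object triples and follows IsA links to collect inherited properties. This is a toy illustration, not an OWL reasoner: it uses a single IsA link for both "instance of" and "subclass of", as the statements above do, whereas RDF and OWL distinguish the two.

```python
TRIPLES = [
    ("JohnDoe", "IsA", "Subject"),
    ("Subject", "IsA", "Person"),
    ("Person", "hasProperty", "Name"),
]


def infer_properties(triples, individual):
    """Collect every property reachable from individual via IsA links."""
    is_a = {}
    has_property = {}
    for subj, pred, obj in triples:
        if pred == "IsA":
            is_a.setdefault(subj, set()).add(obj)
        elif pred == "hasProperty":
            has_property.setdefault(subj, set()).add(obj)

    properties = set()
    seen = set()
    pending = [individual]
    while pending:
        node = pending.pop()
        if node in seen:
            continue  # guard against cycles in the IsA links
        seen.add(node)
        properties |= has_property.get(node, set())
        pending.extend(is_a.get(node, ()))
    return properties


if __name__ == "__main__":
    print(infer_properties(TRIPLES, "JohnDoe"))
```

Name is stated once, on Person, and every Subject or Experimenter picks it up through the chain of IsA links.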

4.1.2 More about Experimenter

The following items MUST be specified for any instance of Experimenter:

Affiliation
Title
Contact information
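
A MUST requirement of this kind can be checked mechanically before a record is stored. The sketch below assumes an Experimenter record is held as a dictionary; the key names affiliation, title, and contact_information are hypothetical spellings of the three items above.

```python
# Hypothetical key names for the three required Experimenter items.
REQUIRED_EXPERIMENTER_FIELDS = ("affiliation", "title", "contact_information")


def missing_experimenter_fields(record):
    """Return the required fields that are absent or empty in record."""
    return [
        name for name in REQUIRED_EXPERIMENTER_FIELDS
        if not record.get(name)
    ]


if __name__ == "__main__":
    record = {"affiliation": "Brain and Cognitive Sciences, MIT"}
    print(missing_experimenter_fields(record))
```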

4.1.3 More about Subject

The definition of a Subject is the most involved of all Person definitions. It is also one of the most important, as it defines some of the major parameters of your Study.

A Subject MUST have the following:

subject-id

A unique subject ID not tied to any identifiable information about the subject, including, but not limited to, date of birth, date of scan, subject initials, weight, social security number, address, phone number, etc. This is mandated by HIPAA. More importantly, the unique subject-id is what ties a subject's information to a particular experiment; it is what will appear in any published manuscript, and it will be the only link to a subject's identifiable information.

subject-hash

The name of the subject, in MD5 form. It is necessary to have some sort of link between a particular subject in an experiment and the identifiable information for the subject, but HIPAA seems to prevent this. The solution is to take a cue from the management of passwords on a UNIX system. Passwords are stored as an MD5 hash kept in clear text; the hash is one-way, meaning that even if you have the hash, you cannot (short of brute force) recover the password. Thus, this hash will be computed from the name of the subject (as defined in the instance of a Person). This does have a disadvantage: anyone who obtains the hash could run a dictionary search to try to crack the name of the subject. Ways to mitigate this include always sending information to the subject database in encrypted form (a good rule to follow anyway) and ensuring the physical and network security of the machine storing the database. With that said, this element is subject to change because of the concerns described.
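
The two required fields can be sketched together as follows. The subject-id is drawn at random so that it carries no information about the subject, while the subject-hash is the plain MD5 of the name as described above. The ID format (a prefix plus eight hexadecimal digits) and all function names are assumptions made for illustration. The last function shows why the dictionary search is a real concern when the name is hashed on its own.

```python
import hashlib
import secrets


def new_subject_id(existing_ids, prefix="S"):
    """Return a random ID that is not already in existing_ids.

    Nothing about the subject goes into the ID, so it cannot leak
    initials, dates, or other identifiable information.
    """
    while True:
        candidate = prefix + secrets.token_hex(4).upper()
        if candidate not in existing_ids:
            return candidate


def name_hash(name):
    """Plain MD5 of the subject's name, as described for subject-hash."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def dictionary_search(target_hash, candidate_names):
    """Return the candidate whose hash matches target_hash, if any."""
    for name in candidate_names:
        if name_hash(name) == target_hash:
            return name
    return None


if __name__ == "__main__":
    leaked = name_hash("John Doe")
    # Anyone holding the hash and a list of plausible names can test them.
    print(dictionary_search(leaked, ["Jane Roe", "John Doe"]))
```

Mixing in the site-specific password proposed in Section 2 means an attacker who holds only the hash can no longer test candidate names directly.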

@@TODO make it easy to add comments on subjects and get subject information on a day-to-day basis @@

@@TODO make it possible in the study section to make the statement "experimenter isAn author" @@

5. Ontology Definition

@@TODO finalize definition @@

References

[1]	"OWL Guide".
[2]	"RDF Primer".

Author's Address

	Nick Knouf
	Brain and Cognitive Sciences, MIT
	NE20-443d
	77 Mass Ave
	Cambridge, MA 02139
	US
EMail:	nknouf@mit.edu
URI:	http://web.mit.edu/bcs/nklab/
