What is an ontology?

"Ontology," as an abstract noun, refers to the branch of philosophy that studies existence. In Information Science, however, an "ontology," as a concrete noun, is a way of showing the properties of a subject area and how they are related. It does so by defining a set of terms and relational expressions that represent the entities in that subject area.

In i2b2's domain of Clinical Research Informatics, an ontology is a collection of healthcare-related "concepts" that define a set of terms representing the various categories of clinical and translational science. Categories such as demographics, diagnoses, procedures, laboratory tests, and medications are described as "concepts" in an i2b2 ontology.

Unlike some other ontologies in Information Science, an i2b2 ontology does not define relational expressions among the concepts or terms. Therefore, strictly speaking, an i2b2 ontology is actually a "taxonomy," which is a hierarchical group of terms or concepts, without any relational expressions among them.

If you recall Linnaeus's "binomial nomenclature" model for naming living organisms (from your high-school biology class), then you already know what a taxonomy is. For instance, the common house fly Musca domestica and other flies in the family Muscidae are members of the order Diptera, which belongs to the class Insecta, phylum Arthropoda, kingdom Animalia. A taxonomy defines the hierarchical "is-a" relationship between two concepts. Similarly, a Camry is a Toyota, which is an automobile, which is a type of motorized, wheeled conveyance.
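
The "is-a" chain in the house-fly example can be sketched as a small parent-lookup table. This is only an illustration in Python; the `PARENT` dictionary and the `lineage` and `is_a` helpers are hypothetical names and are not part of i2b2.

```python
# Each term points to its single parent; the root (Animalia) has no entry.
PARENT = {
    "Musca domestica": "Muscidae",
    "Muscidae": "Diptera",
    "Diptera": "Insecta",
    "Insecta": "Arthropoda",
    "Arthropoda": "Animalia",
}

def lineage(term):
    """Return the term followed by all of its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def is_a(term, ancestor):
    """True if `ancestor` appears anywhere above `term` in the taxonomy."""
    return ancestor in lineage(term)[1:]

# Print the path from the root down to the species.
print(" > ".join(reversed(lineage("Musca domestica"))))
```

Because every term has exactly one parent, the structure is a tree, which is what allows a taxonomy to be displayed as nested folders.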

Why does i2b2 have an ontology?

An i2b2 instance requires an ontology so that researchers can find "concepts" of interest and then locate patients whose medical records contain those "concepts."

The healthcare concepts contained in the ontology are the building blocks of an i2b2 query. The i2b2 ontology is represented in the upper-left corner of the i2b2 webclient (user interface). See the screenshot below.

These concepts are rendered as a collection of nested folders and generally represent child-parent (or is-a) relationships. Researchers locate these concepts by using a hierarchical tree, or taxonomy. When you log in to the demo i2b2 site (username "demo", password "demouser" are already filled in), you will see the i2b2 ontology tree on the upper-left side of the user interface. We also refer to that listing as a group of ontologies, where each category or top-level taxonomy is called an ontology in its own right.

In this screenshot, we would say that i2b2 is displaying an ontology with 13 top-level ontology trees or categories, namely: Clinical Trials, Custom Metadata, Demographics, Diagnoses, etc.

[Screenshot: i2b2Webclient-Demo-20250225.png]

For instance:

diabetes mellitus is an endocrine disorder, which is a type of diagnosis; it will be found in both the "Diagnoses" tree and in the "Diagnoses (ICD10)" tree;
aspirin is a non-steroidal anti-inflammatory drug, which is a type of medication; it will be found in the "Medications" tree;
tonsillectomy is a type of surgery, which is a procedure; it will be found in the "Procedures" tree;
hemoglobin A1c is a blood test, which is a type of laboratory test; it will be found in the "Laboratory Tests" tree; and
race is a patient-demographic descriptor; it will be found in the "Demographics" tree.

Here is another screenshot, where you can see the hierarchy for "Race" in the "Demographics" tree:

[Screenshot: i2b2Webclient-Demo-Race.png]

How does it work? How does the i2b2 ontology make the patient data queryable?

First, a little more terminology. The i2b2 "patient data" are stored in the Clinical Research Chart database, or "CRC database." The ontology of concepts that describe those patient data is known as the "metadata." The metadata or ontology is stored in the Metadata database, or "ONT database."

The i2b2 metadata in the ONT database work together with the patient data in the CRC database. Each patient "observation" in the CRC database must have a code associated with it, and that code must match a healthcare concept — diagnosis, procedure, medication, lab test, demographic descriptor, etc. — that exists in the metadata database. So the CRC patient data will be queryable in i2b2 only if the patient facts in the CRC database are recorded utilizing standard codes that are referenced in the i2b2 metadata (ontology trees).
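
The matching rule can be illustrated with a small check for codes that appear in the patient facts but not in the metadata. This is a simplified sketch with in-memory Python data, not the actual i2b2 database schema; `metadata_codes`, `observations`, the sample patient numbers, and the `LOCAL:XYZ` code are all made up for illustration.

```python
# Codes referenced by concepts in the (hypothetical) metadata.
metadata_codes = {"ICD9:28.3"}

# (patient number, concept code) pairs standing in for CRC observations.
observations = [
    (1, "ICD9:28.3"),
    (2, "ICD9:28.3"),
    (3, "LOCAL:XYZ"),  # a local code with no matching metadata concept
]

def unqueryable_codes(observations, metadata_codes):
    """Return the set of fact codes that no metadata concept references."""
    return {code for _, code in observations if code not in metadata_codes}

print(unqueryable_codes(observations, metadata_codes))
```

In this sketch, the fact recorded for patient 3 could not be found through the ontology until a concept carrying its code is added to the metadata.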

For instance, if the patient's electronic health record indicates that the patient had the procedure "tonsillectomy with adenoidectomy," then that fact needs to be recorded in the CRC database using the standard code for that particular procedure. In the case of using the default i2b2 Procedures metadata tree, that code would be "ICD9:28.3". A query in the user interface for "tonsillectomy with adenoidectomy" will be translated by i2b2 into a query in the database for patients who have the code ICD9:28.3 in their CRC data records.
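
The translation from a concept name to a code-based patient count might look roughly like the following. This is a minimal sketch using Python's built-in `sqlite3` module; the two tables, their columns, the sample rows, and the `count_patients` helper are simplified stand-ins invented for illustration and do not reproduce the real i2b2 schema or query engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the metadata (ONT) side: concept name and its code.
    CREATE TABLE metadata (name TEXT, code TEXT);
    -- Stand-in for the patient data (CRC) side: one row per observation.
    CREATE TABLE observation (patient_num INTEGER, code TEXT);

    INSERT INTO metadata VALUES ('Tonsillectomy with adenoidectomy', 'ICD9:28.3');
    INSERT INTO observation VALUES (1, 'ICD9:28.3');
    INSERT INTO observation VALUES (1, 'ICD9:28.3');  -- same patient, second record
    INSERT INTO observation VALUES (2, 'ICD9:28.3');
    INSERT INTO observation VALUES (3, 'OTHER:1');    -- placeholder unrelated code
""")

def count_patients(conn, concept_name):
    """Look up the concept's code, then count distinct patients with that code."""
    row = conn.execute(
        "SELECT code FROM metadata WHERE name = ?", (concept_name,)
    ).fetchone()
    if row is None:
        return 0  # the concept is not in the metadata, so nothing can match
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT patient_num) FROM observation WHERE code = ?",
        (row[0],),
    ).fetchone()
    return count

print(count_patients(conn, "Tonsillectomy with adenoidectomy"))
```

The researcher only ever sees the concept name; the code is what actually links the two databases.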

In summary, the ontology is made up of medical concepts and is visible to researchers through the webclient. Each medical concept in the ontology includes one or more specific codes that match medical observation facts (such as diagnoses, medications, and procedures). A researcher builds a query by selecting the concepts of interest, and i2b2 counts the medical data that match those concepts' codes.

Where did the i2b2 ontology "concepts" come from?

Typically, the concepts that make up the various categories or "domains" are based on medical terminology standards published by various institutions. Concepts can also be created ad hoc for a specific i2b2 instance by the i2b2 administrator at that site. (For more information on custom metadata, please see Ontologies 201.)

The commonly used standards representing basic structured clinical patient data include:

Source	Standard	Relevant i2b2 Domains or Categories
Health Level 7 International	HL7 Administrative	Demographics
American Medical Association (AMA)	Current Procedural Terminology (CPT; Level I of HCPCS)	Procedures
US Centers for Medicare & Medicaid Services (CMS)	Healthcare Common Procedure Coding System (HCPCS)	Procedures
World Health Organization (WHO); US versions maintained by the US National Center for Health Statistics (NCHS) and CMS	International Classification of Diseases (ICD)	Diagnoses (ICD9-CM, ICD10-CM), Procedures (ICD9-PROC, ICD10-PCS)
Regenstrief Institute	Logical Observation Identifiers Names and Codes (LOINC)	Measurements (Laboratory Tests, Vital Signs)
US Food and Drug Administration (FDA)	National Drug Code (NDC)	Medications
US National Library of Medicine	RxNorm (part of UMLS)	Medications
International Health Terminology Standards Development Organisation (IHTSDO), aka SNOMED International	Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT)	Demographics, Diagnoses, Procedures, Measurements, Medications
US National Library of Medicine	Unified Medical Language System (UMLS)	Demographics, Diagnoses, Procedures, Measurements, Medications
Veterans Administration Medications (VA Classes)		Medications

Do all i2b2 instances always have the same ontology?

No. Several i2b2 ontologies have been developed and are openly available for use. Any organization may also modify an existing ontology for its own use, or develop a new ontology. (See Ontologies 201.)

What ontologies can I use right away ('out of the box')?

The i2b2 database loading modules come with at least 3 sets of "metadata" or ontology trees. These are the demo ontology, the ACT ontology, and the ACT-on-OMOP ontology.

Name	Description	Included Domains	Target Data Model
i2b2 demo ontology	default metadata from i2b2 authors	fixme	i2b2 Common Data Model (star-schema); default CRC database has matching concepts
ACT ontology	ENACT project	fixme	i2b2 Common Data Model (star-schema); ACT CRC demo database has matching concepts
ACT-on-OMOP ontology	ENACT project	fixme	i2b2 Common Data Model (star-schema), but modified with views into the OMOP Common Data Model; CRC database loaded with SYNPUF demo data has matching concepts

That may be the "demo" ontology, but it doesn't resemble the "ACT" ontology that our PI was showing us. How many different ontology trees does i2b2 have?

How about a thumbnail description of these ontologies? How do they differ from each other?

How should I choose which ontology to use?

The ontology you choose depends on your local goals and local data. If your primary purpose is to understand and demonstrate how the i2b2 webclient works, the i2b2 demo ontology is sufficient. If your goal is to set up an i2b2 instance to support research, start with the ACT ontology, which includes a wide range of terminologies that should cover most code sets you will find in your EMR data. OMOP is a common data model, so if your local data are already stored in that format, the ACT-on-OMOP ontology will be the most relevant.

Are the ontology and data stored separately? How do they connect and where are the data?

You just mentioned Domains. What do you mean by that?

There are many terms related to i2b2 ontologies that are new to me. Is there a Glossary?

Yes! Here is a Glossary that provides explainers for many of the key terms related to i2b2 ontologies and the i2b2 architecture.

What's the relationship between the metadata and the data?

How do I deploy my chosen ontology tree?

What should my metadata database look like when I am done?