Ontology Working Group
Introduction

This tutorial page is intended for i2b2 administrators who are new to i2b2 and clinical informatics. Maybe you've been asked to deploy i2b2 for your institution. Maybe your institution already has an i2b2 instance, and you've just inherited it. Either way, you are curious about how things work and how to set up or modify the ontology database. This page should give you some insight.

Part 1: Ontology Basics

What is an ontology?

In Philosophy, "ontology" is the branch that studies existence. In Information Science, however, an "ontology" is a way of listing the attributes of a subject area and showing how these attributes are interrelated. An ontology for a subject area or "domain" consists of a set of terms that represent the entities in that domain, along with a set of expressions that define how those entities are related to each other.

In i2b2's domain of Clinical Research Informatics, an ontology is a collection of healthcare-related "concepts" or terms. These terms represent the various categories of information in clinical and translational science. Categories such as Patient Demographics, Diagnoses, Procedures, Laboratory Tests, and Medications are described using hierarchical lists of "concepts" in an i2b2 ontology.

Unlike some other ontologies in Information Science, an i2b2 ontology does not typically define relationship expressions among the concepts or entities. Therefore, strictly speaking, an i2b2 ontology is generally more akin to a "taxonomy," which is a hierarchical group of terms or concepts, without any expressions defining the relationships among them.

For instance, if you recall (from your high-school biology class) Linnaeus's "binomial nomenclature" model of how to name living organisms, then you already know what a taxonomy is. As another example, a Camry is a Toyota, which is an automobile, which is a type of motorized, wheeled conveyance.

Why does i2b2 have an ontology?

An i2b2 instance requires an ontology so that researchers can locate concepts of interest and locate patients whose medical records contain these concepts.

The healthcare concepts contained in the ontology are the building blocks of an i2b2 query. The i2b2 ontology is represented in the upper-left corner of the i2b2 webclient (user interface). See the screenshot below.

These concepts are rendered as a collection of nested folders and generally represent child-parent relationships. Researchers locate these concepts by using a hierarchical tree. When you use the demo i2b2 site, you will see the i2b2 ontology tree on the upper-left side of the user interface. 

In this screenshot, we would say that i2b2 is displaying an ontology with 13 top-level trees, or categories, namely: Clinical Trials, Custom Metadata, Demographics, Diagnoses, etc.

[Screenshot: the i2b2 webclient, with the ontology tree in the upper-left corner]

Here are some examples of child-parent (or is-a) relationships found in i2b2's ontologies of healthcare concepts:

  • diabetes mellitus is an endocrine disorder, which is a type of diagnosis; it will be found in both the "Diagnoses" tree and in the "Diagnoses (ICD10)" tree;
  • aspirin is a non-steroidal anti-inflammatory drug, which is an analgesic, which is a type of medication; it will be found in the "Medications" tree;
  • tonsillectomy is a type of surgery, which is a procedure; it will be found in the "Procedures" tree;
  • hemoglobin A1c is a blood test, which is a type of laboratory test; it will be found in the "Laboratory Tests" tree; and
  • race is a patient-demographic descriptor; it will be found in the "Demographics" tree.
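The is-a chains in these examples can be pictured as a simple parent lookup. The following is a minimal sketch in Python, not how i2b2 stores its ontology; the `PARENT` dictionary and the `ancestors` and `is_a` helpers are hypothetical names introduced only for illustration.

```python
# Each concept points to its parent; a top-level category has no entry.
# Concept names are taken from the examples above; the structure is illustrative.
PARENT = {
    "aspirin": "non-steroidal anti-inflammatory drug",
    "non-steroidal anti-inflammatory drug": "analgesic",
    "analgesic": "medication",
    "tonsillectomy": "surgery",
    "surgery": "procedure",
}


def ancestors(concept):
    """Walk up the is-a chain and return every ancestor, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    """Return True if `category` appears anywhere above `concept`."""
    return category in ancestors(concept)


print(ancestors("aspirin"))
print(is_a("tonsillectomy", "procedure"))
```

Because every concept has exactly one parent in this sketch, it behaves like a taxonomy: it records where a term sits in the hierarchy, but nothing else about how terms relate to each other.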

...

Basic structured clinical patient data are commonly represented using a number of different "coding schemes": ICD-9, ICD-10, CPT-4, HCPCS, SNOMED CT, LOINC, RxNorm, UMLS, and VA Classes. See Appendix B for further information.

...

There are many terms related to i2b2 ontologies that are new to me. Is there a Glossary?

Yes! Here is a Glossary that provides explainers for many of the key terms related to i2b2 ontologies and the i2b2 architecture.


...

Part 2: Choosing Your Ontology

Do all i2b2 instances share the same i2b2 ontology?

No. Several i2b2 ontologies have been developed and are openly available for use. Any organization may also modify an existing ontology for its own use, or develop a new ontology. (See Ontologies 201.)

The i2b2 software includes i2b2 ontologies that you can use right away. There are additional ontology trees that can substitute for, or can be added to, the standard i2b2 ontology trees. (See Appendix E – Advanced Ontologies.)

Which ontologies can I use right away ('out of the box')?

The i2b2 database-loading modules come with at least 3 sets of "metadata" or ontology trees. These are the demo ontology, the ACT ontology, and the ACT-on-OMOP ontology. 

Name | Description | Target Data Model
i2b2 Demo Ontology | default metadata from i2b2 authors | i2b2 Common Data Model (star-schema); default CRC demo database has matching concepts
ACT Ontology | ENACT project | i2b2 Common Data Model (star-schema); ACT CRC demo database has matching concepts
ACT-on-OMOP Ontology | ENACT project | i2b2 Common Data Model (star-schema), but modified with views into the OMOP Common Data Model; the CRC database loaded with SYNPUF demo data has matching concepts

To view and compare details of the domains and coding schemes included in each ontology, see Appendix B.

Info
i2b2 Common Data Model

A Common Data Model (CDM) is a way of organizing data into standardized structures and observational content to enable alignment of patient data across multiple organizations. Each i2b2 ontology leverages the i2b2 Common Data Model (CDM), which is based on a "star schema": instead of separate tables for diagnoses, medications, and other data types, all patient observations are stored in a single "fact" table, and the ontology describes the different codes that are placed in this fact table.
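To make the "single fact table" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `fact`, its columns, and the placeholder codes are assumptions chosen for illustration; they are not the actual i2b2 schema or real clinical codes.

```python
import sqlite3

# One in-memory "fact" table holds every kind of observation.
# Table name, column names, and codes are illustrative placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact (patient_id INTEGER, concept_code TEXT, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [
        (1, "DX:EXAMPLE-1", "2020-01-05"),   # a diagnosis
        (1, "MED:EXAMPLE-1", "2020-01-06"),  # a medication
        (2, "LAB:EXAMPLE-1", "2021-03-10"),  # a laboratory test
    ],
)


def patients_with(code_prefix):
    """Return the patients that have at least one fact whose code starts with the prefix."""
    rows = conn.execute(
        "SELECT DISTINCT patient_id FROM fact "
        "WHERE concept_code LIKE ? ORDER BY patient_id",
        (code_prefix + "%",),
    )
    return [row[0] for row in rows]


print(patients_with("DX:"))
print(patients_with("LAB:"))
```

Diagnoses, medications, and lab tests all live in the same table here; only the code tells them apart, which is why the ontology's description of those codes matters so much.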

How should I choose which ontology to use?

This is a key question/concept.

The ontology you choose depends on your institution's goals and patient data. If your goal is simply to understand and show how i2b2 works, then the i2b2 demo ontology is sufficient. If your goal is to set up i2b2 for research use, one of the ACT ontologies will be far more useful. The ACT ontologies are more modern, more robust, and will satisfy the needs of more researchers.

Furthermore, any institution that is planning to join the ENACT Network will need to use an ACT ontology to ensure compatibility with the other institutions in the network.

When setting up the patient data for i2b2, your institution needs to decide how it will conduct the ETL mapping from the EHR to the i2b2 CRC database. If the patient data will be used only by i2b2 and SHRINE, many institutions store them in the default i2b2/tranSMART Common Data Model (CDM). In this case, they typically use the ACT Ontology with all its various coding standards; the patient data from the EHR system need to be mapped to the multiple coding schemes in the ACT Ontology as they are loaded into the CRC database.

If the patient data will also be queried by systems other than i2b2, many institutions store them in the OMOP Common Data Model (CDM) in the CRC database. In this case, they typically use the ACT-on-OMOP Ontology, which relies chiefly on the SNOMED CT coding standard; the patient data from the EHR system need to be mapped to the SNOMED CT coding scheme as they are loaded into the CRC database.

Info
For some institutions, the ETL mapping from the EHR to the i2b2 CRC databases is the most problematic process in the setup. Those institutions may decide to minimize the complexity of their ETL, and simply copy the coding scheme from their EHR into the patient records in the CRC database. If the coding schemes in their EHR are not standard coding schemes, then they may have to customize their i2b2 ontology to reflect the coding schemes present in their CRC database.


Tip Box

Setting up i2b2 can be a complex undertaking. You can learn a lot about how it works, and get the system up and running most quickly, by first setting up your i2b2 instance with the i2b2 Demo Ontology and demo patient data. Those databases will comprise a "Demo" project in your i2b2 instance.

When you have proven the deployment with the demo project, you can add a separate, new project for research. In this case you could use actual patient data in a second CRC database and an ACT Ontology in a second ONT database. Those new databases will comprise a "research" project in your i2b2 instance.


...

Part 3: How Ontologies and Patient Data Are Tied Together

How does it work? How does the i2b2 ontology make the patient data queryable?

First, a little more terminology. The i2b2 "patient data" are stored in the i2b2 Clinical Research Chart database, or "CRC database." The concepts that describe those patient data are known as the "metadata." The metadata are stored in the i2b2 Metadata database, or "ONT database."

The i2b2 metadata in the ONT database work together with the patient data in the CRC database. Each patient "observation" in the CRC database must have a code associated with it, and that code must match a healthcare concept — diagnosis, procedure, medication, lab test, demographic descriptor, etc. — that exists in the ONT database. So the CRC patient data will be queryable in i2b2 only if the patient facts in the CRC database are recorded utilizing standard codes that are referenced in the i2b2 metadata (ONT database).

For instance, if the patient's electronic health record indicates that the patient had the procedure "tonsillectomy with adenoidectomy," then that fact needs to be recorded in the CRC database using the standard code for that particular procedure. In the case of using the default i2b2 Procedures metadata tree, that code would be "ICD9:28.3". When a researcher makes a query in the user interface for "tonsillectomy with adenoidectomy," that query will be translated by i2b2 into a query for patients in the CRC database who have the code ICD9:28.3 in their data records.
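The translation from a concept to a code lookup can be sketched in a few lines of Python. This is a simplified illustration, not i2b2's actual query engine: the `ONTOLOGY` and `FACTS` structures, the patient numbers, and the `OTHER:0000` code are hypothetical, and only the code ICD9:28.3 comes from the example above.

```python
# Metadata side (ONT): concept name -> code. Only ICD9:28.3 is from the text.
ONTOLOGY = {"tonsillectomy with adenoidectomy": "ICD9:28.3"}

# Patient-data side (CRC): (patient number, code) pairs. All values are made up.
FACTS = [
    (101, "ICD9:28.3"),
    (102, "OTHER:0000"),
    (103, "ICD9:28.3"),
]


def find_patients(concept_name):
    """Look up the concept's code, then return patients whose facts carry that code."""
    code = ONTOLOGY[concept_name]
    return sorted({patient for patient, fact_code in FACTS if fact_code == code})


print(find_patients("tonsillectomy with adenoidectomy"))
```

The researcher only ever picks the concept by name; the match against patient records happens entirely on the code.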

Info
It's important to understand that each institution will have its own protocols for coding diagnoses, procedures, medications, etc., in the patient electronic health records (aka EHR, as in Cerner or Epic databases), and that these protocols may use standard or non-standard codes. So, when preparing the CRC database for i2b2, it's necessary for the ETL process (extract-transform-load) to map the codes from the patient EHR into the codes that are present in the i2b2 metadata. For instance, let's say a patient's EHR record includes an NDC code for a medication. And let's say that your institution's i2b2 ontology tree only has an RxNorm code for that type of medication. Then the medication record from the EHR should be mapped into a patient record in the i2b2 CRC database that uses the appropriate RxNorm code. If the patient record in the CRC database has the NDC code from the EHR, then it would not be matched in a query when the query is using the RxNorm code.
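The mapping step described in this note might look roughly like the Python sketch below. It assumes a simple lookup table from source codes to target codes; the function name `map_records` and every code shown are placeholders, not real NDC or RxNorm values or part of any i2b2 tool.

```python
# Hypothetical crosswalk from EHR codes to the codes used in the i2b2 metadata.
# Both codes are placeholders, not real NDC or RxNorm identifiers.
NDC_TO_RXNORM = {"NDC:EXAMPLE-A": "RXNORM:EXAMPLE-1"}


def map_records(ehr_records, code_map):
    """Split EHR records into mapped facts and records with no known target code."""
    mapped, unmapped = [], []
    for patient_id, source_code in ehr_records:
        target_code = code_map.get(source_code)
        if target_code is None:
            # Left as-is, this record would never match an ontology-based query.
            unmapped.append((patient_id, source_code))
        else:
            mapped.append((patient_id, target_code))
    return mapped, unmapped


records = [(1, "NDC:EXAMPLE-A"), (2, "NDC:EXAMPLE-B")]
mapped, unmapped = map_records(records, NDC_TO_RXNORM)
print(mapped)
print(unmapped)
```

Collecting the unmapped records separately, rather than loading them unchanged, makes it easier to see which source codes still need a mapping before the data go into the CRC database.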


Info
If your local institution does not have data in the CRC database for a certain domain in your chosen i2b2 ontology, then user queries referencing that domain may come back empty. To avoid that, you can exclude that domain from the i2b2 user interface, so that the domain without matching data in the CRC database is never used in a query.
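One way to find candidate domains to exclude is to compare the codes in each ontology domain against the codes actually present in the patient data. The Python sketch below assumes you have already extracted both sets; the function name, domain names, and codes are hypothetical placeholders.

```python
def domains_without_data(ontology_codes_by_domain, fact_codes):
    """Return the domains whose ontology codes never appear in the patient data."""
    return sorted(
        domain
        for domain, codes in ontology_codes_by_domain.items()
        if not (codes & fact_codes)  # empty intersection: no matching facts
    )


# Placeholder inputs: codes per ontology domain, and codes found in the facts.
ontology_codes = {
    "Diagnoses": {"DX:EXAMPLE-1", "DX:EXAMPLE-2"},
    "Medications": {"MED:EXAMPLE-1"},
    "Clinical Trials": {"TRIAL:EXAMPLE-1"},
}
codes_in_facts = {"DX:EXAMPLE-2", "MED:EXAMPLE-1"}

print(domains_without_data(ontology_codes, codes_in_facts))
```

A domain reported here is only a candidate for exclusion; whether to hide it from the user interface remains a local decision.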

What's the relationship between the metadata and the patient data?

...

  • TABLE_ACCESS: a configuration table that tells i2b2 which ontology tables to load into the user interface (required)
  • SCHEMES: a configuration table that tells i2b2 which coding schemes are being used in the ontology tables (required, but may be empty)
  • one or more "ontology" tables (required):
    • each ontology table will typically consist of a hierarchy of concepts for a particular category or domain of healthcare data; for instance:
      • in the i2b2 demo ontology, the I2B2 table contains concepts for patient demographics
      • in the ACT ontology, the ACT_DEM_V41 table contains concepts for patient demographics
  • CUSTOM_META: an empty ontology table; may include custom concepts added by you
  • totalnum and totalnum_report: tables dedicated to recording the number of patients who have the concepts present in the ontology trees; these tables are NOT displayed to the user as part of the ontology tree in the user interface, and they are empty until you run a "totalnum" report (see Ontologies 102 – Patient Counts ("totalnum"))
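The required and optional roles in this list can be turned into a simple sanity check. The Python sketch below is only an illustration of the rules stated above, not an i2b2 utility; the function `check_ont_tables` is hypothetical, and you would supply the set of table names found in your own ONT database.

```python
# Configuration tables the list above marks as required.
REQUIRED_CONFIG_TABLES = {"TABLE_ACCESS", "SCHEMES"}


def check_ont_tables(present_tables, ontology_tables):
    """Return a list of problems; an empty list means the basic rules are met."""
    problems = [
        f"missing required table: {name}"
        for name in sorted(REQUIRED_CONFIG_TABLES - present_tables)
    ]
    # At least one ontology table (for example I2B2 or ACT_DEM_V41) must exist.
    if not (ontology_tables & present_tables):
        problems.append("no ontology table found")
    return problems


# Example: a demo-style database that lacks the SCHEMES table.
present = {"TABLE_ACCESS", "I2B2", "CUSTOM_META"}
print(check_ont_tables(present, {"I2B2", "ACT_DEM_V41"}))
```

CUSTOM_META and the totalnum tables are not checked here, since the list does not mark them as required.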

...
