Ontology Working Group



...

Learning Objectives

This tutorial page is intended for i2b2 administrators who are new to i2b2 and clinical informatics. Maybe you've been asked to deploy i2b2 for your institution. Maybe your institution already has an i2b2 instance, and you've just inherited it. You are curious about how things work and how to set up or modify the ontology database.

This page will help you achieve these goals:

  • Understanding what an ontology is for i2b2
  • Choosing an appropriate ontology tree for i2b2

  • Deploying an ontology tree for i2b2

  • Understanding how the ontology enables queries in i2b2

Part 1: Ontology Basics

What is an ontology?

"Ontology" represents the branch of Philosophy that discusses "existence." But in Information Science, an "ontology" is a way of listing the attributes of a subject area and showing how these attributes are interrelated. An ontology for a subject area or "domain" consists of a set of terms that represent the entities in (or attributes of) that domain, along with a set of expressions that define how the entities in that domain are related to each other.

In i2b2's domain of "Clinical Research Informatics," an ontology is a collection of healthcare-related "concepts" or terms. These terms represent the various categories of information in clinical and translational science. In an i2b2 ontology, categories such as Patient Demographics, Diagnoses, Procedures, Laboratory Tests, and Medications are described using hierarchical lists of "concepts."

Unlike some other ontologies in Information Science, an i2b2 ontology does not typically define relationship expressions among the concepts or entities. Therefore, strictly speaking, an i2b2 ontology is generally more akin to a "taxonomy," which is a hierarchical group of terms or concepts, without any expressions defining the relationships among them.

...

These concepts are rendered as a collection of nested folders and generally represent child-parent relationships. Researchers locate these concepts by "drilling down" through a hierarchical tree, or by using the "search" tool. When you use the demo i2b2 site, you will see the i2b2 ontology tree on the upper-left side of the user interface.

...

Here is another screenshot, where you can see the expansion for "Race" in the "Demographics" tree:


This diagram shows the high-level workflow:

[Image: high-level workflow diagram]

Where do i2b2 ontology "concepts" come from?

Typically, the concepts that make up the various categories or "domains" are based on medical terminology standards published by various institutions. Concepts can also be created ad hoc for a specific i2b2 instance by the i2b2 administrator for that site. (For more information on custom metadata, please see Ontologies 201.)

Basic structured clinical patient data are commonly coded using a number of standard "coding schemes": ICD-9, ICD-10, CPT-4, HCPCS, SNOMED CT, LOINC, RxNorm, UMLS, and VA Classes. We mention this here because these codes appear inside the ontology tables, and they are also referenced in the user interface. See Appendix B for further information.

There are many terms related to i2b2 ontologies that are new to me. Is there a Glossary?

Yes! Here is a Glossary that provides explainers for many of the key terms related to i2b2 ontologies and the i2b2 architecture.

...

No. Several i2b2 ontologies have been developed and are openly available for use. Any organization may also modify an existing ontology for its own use, or develop a new ontology. (See Ontologies 201 – Custom Metadata – Additions and Modifications.)

The i2b2 software includes i2b2 ontologies that you can use right away. There are additional ontology trees that can substitute for, or can be added to, the standard i2b2 ontology trees. (See Appendix F – Advanced Ontologies.)

Which ontologies can I use right away ('out of the box')?

The i2b2 database-loading modules come with at least 3 sets of "metadata" or ontology trees. These are the Demo Ontology, the ACT Ontology, and the ACT-on-OMOP Ontology.

...


...


...

Inside the i2b2 software release folder you will find the three ontology tree folders (demo, act, and act-omop) at this location (Linux syntax): i2b2/edu.harvard.i2b2.data/Release_m-n/NewInstall/Metadata, where m is the version number and n is the release identifier.
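If you want to confirm which of the three ontology folders are present in your copy of the release, a short script can check for you. This is only a minimal sketch: the function name and the idea of passing in the Metadata directory are assumptions made for illustration, and you would substitute the real path from your own download.

```python
from pathlib import Path

# Folder names as listed on this page.
ONTOLOGY_FOLDERS = ("demo", "act", "act-omop")


def find_ontology_folders(metadata_dir):
    """Report which ontology folders exist under the given Metadata directory.

    metadata_dir is the path to .../NewInstall/Metadata in your own download;
    this helper is hypothetical and not part of the i2b2 software.
    """
    base = Path(metadata_dir)
    return {name: (base / name).is_dir() for name in ONTOLOGY_FOLDERS}


if __name__ == "__main__":
    # Replace this placeholder with the Metadata path from your release folder.
    for name, present in find_ontology_folders("path/to/NewInstall/Metadata").items():
        print(f"{name}: {'found' if present else 'missing'}")
```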

Info

You may review or download the database-loading modules on GitHub: https://github.com/i2b2/i2b2-data. You may also download them from i2b2's Software Download page: https://www.i2b2.org/software/.

Each ontology is designed to be used with some version of the i2b2 Common Data Model.

Name | Folder | Description | When To Use | Target Data Model
i2b2 Demo Ontology | demo | default metadata from i2b2 authors; legacy categories | demonstration | i2b2 Common Data Model (star-schema)
ACT Ontology | act | modern categories and concepts, supplied by the ENACT project | production | i2b2 Common Data Model (star-schema)
ACT-on-OMOP Ontology | act-omop | modern categories and concepts, supplied by the ENACT project | production, at sites using OMOP CDM | multi-fact-table-enhanced i2b2 Common Data Model (star-schema)

To view and compare details of the domains and coding schemes included in each ontology, see Appendix B.

How should I choose which ontology to use?

...

The ontology you choose depends on your institution's research goals and patient data.

Research Goals

If your goal is simply to understand and show how i2b2 works, then the i2b2 demo ontology is sufficient. If your goal is to set up i2b2 for research use, one of the ACT ontologies will be far more useful. The ACT ontologies are more modern, more robust, and will satisfy the needs of more researchers.

Furthermore, any institution that is planning to join the ENACT Network will need to use an ACT ontology to ensure compatibility with the other institutions in the network.

Patient Data

First, a little more terminology. The i2b2 "patient data" are stored in the i2b2 Clinical Research Chart database, or "CRC database." The ontology categories and concepts that describe those patient data are known as the "metadata." The metadata are stored in the i2b2 Metadata database, or "ONT database."

When setting up the patient data for i2b2, your institution needs to decide how it will conduct the ETL mapping from the EHR to the i2b2 CRC database. If your patient data are going to be set up only for i2b2 and SHRINE use, consider the approach many institutions take: setting up the patient data in the default i2b2/tranSMART Common Data Model (CDM). In this case, they typically use the ACT Ontology with its robust collection of coding standards; the patient data from the EHR system need to be mapped to the multiple coding schemes in the ACT Ontology as they are loaded into the CRC database.

If your patient data are going to be set up in a database for queries by other systems besides i2b2, consider the approach many institutions take: using the OMOP Common Data Model (CDM) for the patient data in the CRC database. In this case, they typically use the ACT-on-OMOP Ontology, which relies chiefly on the OMOP Vocabulary standard; the patient data from the EHR system need to be mapped to the appropriate OMOP Vocabulary coding scheme as they are loaded into the OMOP CDM in the CRC database.

Info
For some institutions, the ETL mapping from the EHR to the i2b2 CRC database is the most problematic process in the setup. Those institutions may decide to minimize the complexity of their ETL, and simply copy the coding scheme from their EHR into the patient records in the CRC database. If the coding schemes in their EHR are not standard coding schemes, then they may have to customize their i2b2 ontology to reflect the coding schemes present in their CRC database.
Tip Box

It is highly recommended for newcomers to begin their i2b2 journey by first setting up the Demo Ontology with the Demo CRC patient data.


Setting up i2b2 can be a complex undertaking. You can learn a lot about how it works, and get the system up and running most quickly, by first setting up your i2b2 instance with the i2b2 Demo Ontology and demo patient data. Those databases will comprise a "Demo" project in your i2b2 instance.

When you have proven the deployment with the demo project, you can add a separate, new project for research. In this case you could use actual patient data in a second CRC database and an ACT Ontology in a second ONT database. Those new databases will comprise a "research" project in your i2b2 instance.

Part 3: How Ontologies and Patient Data Are Tied Together

How does it work? How does the i2b2 ontology make the patient data queryable?

First, a little more terminology. The i2b2 "patient data" are stored in the i2b2 Clinical Research Chart database, or "CRC database." The concepts that describe those patient data are known as the "metadata." The metadata are stored in the i2b2 Metadata database, or "ONT database."

The i2b2 metadata in the ONT database work together with the patient data in the CRC database. Each patient "observation" in the CRC database must have a code associated with it, and that code must match a healthcare concept — diagnosis, procedure, medication, lab test, demographic descriptor, etc. — that exists in the ONT database. So the CRC patient data will be queryable in i2b2 only if the patient facts in the CRC database are recorded utilizing standard codes that are referenced in the i2b2 metadata (ONT database).

For instance, if the patient's electronic health record indicates that the patient had the procedure "tonsillectomy with adenoidectomy," then that fact needs to be recorded in the CRC database using the standard code for that particular procedure. In the case of using the default i2b2 Procedures metadata tree, that code would be "ICD9:28.3". When a researcher makes a query in the user interface for "tonsillectomy with adenoidectomy," that query will be translated by i2b2 into a query for patients in the CRC database who have the code ICD9:28.3 in their data records.

Info
It's important to understand that each institution will have its own protocols for coding diagnoses, procedures, medications, etc., in the patient electronic health records (aka EHR, as in Cerner or Epic databases), and that these protocols may use standard or non-standard codes. So, when preparing the CRC database for i2b2, it's necessary for the ETL process (extract-transform-load) to map the codes from the patient EHR into the codes that are present in the i2b2 metadata. For instance, let's say a patient's EHR record includes an NDC code for a medication. And let's say that your institution's i2b2 ontology tree only has an RxNorm code for that type of medication. Then the medication record from the EHR should be mapped into a patient record in the i2b2 CRC database that uses the appropriate RxNorm code. If the patient record in the CRC database has the NDC code from the EHR, then it would not be matched in a query when the query is using the RxNorm code.
Info
If your local institution does not have data in the CRC database for a certain domain in your chosen i2b2 ontology, then user queries referencing that domain may come back empty. To avoid that, you can exclude that domain from the i2b2 user interface, so that the domain without matching data in the CRC database is never used in a query.

What's the relationship between the metadata and the patient data?

This is another key question/concept.

Actually, I inherited an i2b2 instance. How can I tell which ontology it's using?

You may compare the tables in your ONT database with the variations shown in Appendix D – i2b2 Ontology Tables.
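As a rough first pass before consulting the appendix, you can also look at the table names themselves. The sketch below is a heuristic based only on the naming patterns in the table lists shown later on this page (ACT tables begin with ACT_, and ACT-on-OMOP tables additionally end with _OMOP). The function name is hypothetical, and a customized instance may not follow these patterns.

```python
def guess_ontology(table_names):
    """Guess which out-of-the-box ontology an ONT database holds.

    Heuristic only: relies on the table-name patterns listed on this page.
    The order of the checks matters, because the ACT installations shown
    on this page also contain the I2B2 table.
    """
    names = {name.upper() for name in table_names}
    if any(n.startswith("ACT_") and n.endswith("_OMOP") for n in names):
        return "ACT-on-OMOP Ontology"
    if any(n.startswith("ACT_") for n in names):
        return "ACT Ontology"
    if "I2B2" in names:
        return "i2b2 Demo Ontology"
    return "unknown"


if __name__ == "__main__":
    print(guess_ontology(["I2B2", "SCHEMES", "TABLE_ACCESS", "ACT_DEM_V41"]))
```

How you obtain the list of table names depends on your database platform, so that step is left out here.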


...

Part 3: Deploying Your Ontology

How do I deploy my chosen ontology tree?

See the installation instructions here: 3.7 Metadata Tables

What should my metadata database look like when I am done?

See the list of metadata tables here: Appendix D – i2b2 Ontology Tables

What else is pertinent to setting up my first ontology?

Two very important topics for understanding the relationship between patient data (CRC) and metadata (ONT) are i2b2 Projects and CRC-Based Metadata. And it is critical to understand that because these two databases work in tandem, the full deployment of the ontology is not complete until the CRC database is deployed as well.

i2b2 Projects

The pairing of a metadata database with a patient database (ONT plus CRC) defines a "project" in i2b2. For instance, the pairing of the i2b2 demo ontology with the i2b2 demo patient data would define a "demo" project in i2b2. In that same i2b2 instance, an institution could create a second ONT database for the ACT ontology, and a second CRC database for the institution's actual patient data; the pairing of those two new databases would define a genuine research project in i2b2.

It's important to note that each ONT database is designed to be associated with only one CRC database; that is, each ONT database is designed to belong to only one i2b2 project. This is because the ONT database actually includes patient counts from a CRC database, and for that reason it can be tied to only a single CRC database.

The converse is also true. Each CRC database is designed to be paired with only one ONT database. This is because the CRC database actually includes some very specific metadata that must match the metadata in the ONT database.

The instructions for creating multiple projects are outside the scope of this Ontologies tutorial page.

CRC-Based Metadata

There is a special metadata table that resides in the CRC database: the CONCEPT_DIMENSION table. Although it resides in the CRC patient database and is part of the i2b2/tranSMART patient CDM, it is still metadata. This table is crucial for the success of the i2b2 query mechanism. You may consider this the "secret sauce" that joins the metadata with the patient data. There are settings in this table that must match certain values in the ONT database. For a complete explanation of how the CRC database's metadata table is used, please see Ontologies 103 – Ontology Table Structure and Query Mechanism.

A second metadata table also resides in the CRC database. This is called the QT_BREAKDOWN_PATH table. This defines how certain queries are performed. Again, there are settings in this table that must match certain values in the ONT database.


...

Part 4: How Ontologies and Patient Data Are Tied Together

How does it work? How does the i2b2 ontology make the patient data queryable?

This is another key question/concept.

The metadata spell out all the various healthcare concepts or terms that a researcher may wish to query for in the user interface, and the patient data must be coded in such a way that they reflect the codes defined in the metadata.

Only when a patient's codes match a query's ontology codes will the patient be counted as part of a query result.

But there is more to the relationship between the ONT and CRC databases. One additional topic is the i2b2 "project." The other additional topic is the "secret sauce" of the relationship.

i2b2 Projects

The pairing of a metadata database with a patient database (ONT plus CRC) defines a "project" in i2b2. For instance, the pairing of the i2b2 demo ontology with the i2b2 demo patient data would define a "demo" project in i2b2. In that same i2b2 instance, an institution could create a second ONT database for the ACT ontology, and a second CRC database for the institution's actual patient data; the pairing of those two new databases would define a genuine research project in i2b2.

It's important to note that each ONT database is designed to be associated with only one CRC database; that is, each ONT database is designed to belong to only one i2b2 project.

The converse is not true. Each CRC database is designed to be paired with any number of ONT databases; that is, each CRC database may belong to multiple i2b2 projects.

The instructions for creating multiple projects are outside the scope of this Ontologies tutorial page.

Metadata "Secret Sauce"

There is a special metadata table that resides in the CRC database. This metadata table is called the CONCEPT_DIMENSION table. It resides in the CRC patient database, and it is part of the i2b2/tranSMART patient CDM, but it is still definitely metadata. You may consider this the "secret sauce" that joins the metadata with the patient data.

So what is the role of this metadata in the CRC database?

The ONT database provides a hierarchy of healthcare concepts, and each concept is identified in the ONT database by a unique "path." The concept's path resembles the path of a file in a computer filesystem. Just as directories in a computer filesystem are hierarchical, so the paths of the concepts reflect the hierarchy of healthcare concepts. In the CRC database, all the healthcare concepts (diagnoses, medications, procedures, and measurements) are coded in the patient records using coding schemes like ICD, RxNorm, CPT, and LOINC; the patient data do not include the hierarchical paths from the ONT database. The role of the CONCEPT_DIMENSION table is to map each hierarchical path from the ONT database to its matching code in the CRC patient data. It's this mapping that allows a path from the ONT database to be used in a query on the CRC database; without this mapping, i2b2 would not be able to match ONT concepts with CRC patients.

The CONCEPT_DIMENSION table provides the mapping between all the healthcare concepts (paths) present in an ONT database and the diagnosis, procedure, medication, and measurement codes present in that ONT database's associated CRC patient data. (If a CRC database is associated with more than one ONT database — more than one project — then the CONCEPT_DIMENSION table would need to include the mappings for all the relevant hierarchical paths from all the associated ONT databases for that CRC database.)
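The path-to-code mapping can be pictured as a simple lookup. The sketch below is illustrative only: it models CONCEPT_DIMENSION as a Python dictionary, and it assumes that selecting a folder means collecting every code whose path begins with the folder's path, which this page does not spell out. The "Acute pharyngitis" entry and the helper names are added here purely for illustration.

```python
def make_path(*parts):
    """Build a backslash-delimited concept path with leading and trailing backslashes."""
    return "\\" + "\\".join(parts) + "\\"


# Parent folder taken from the example path used on this page.
ACUTE_INFECTIONS = make_path(
    "i2b2", "Diagnoses", "Respiratory system (460-519)",
    "Acute respiratory infections (460-466)",
)

# A toy stand-in for CONCEPT_DIMENSION: concept path -> concept code.
# The pharyngitis row is an illustrative assumption, not copied from this page.
concept_dimension = {
    ACUTE_INFECTIONS + "(462) Acute pharyngitis\\": "ICD9:462",
    ACUTE_INFECTIONS + "(463) Acute tonsillitis\\": "ICD9:463",
}


def codes_under(path, mapping):
    """Return the sorted codes of every concept at or below the given path."""
    return sorted(code for p, code in mapping.items() if p.startswith(path))


if __name__ == "__main__":
    print(codes_under(ACUTE_INFECTIONS, concept_dimension))
```

Because the patient records carry only codes, a lookup like this is what lets a path chosen in the ontology tree reach the matching patient facts.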


What should my metadata database look like when I am done?

When your metadata database is installed, it will have the following tables (per i2b2 v1.8.1):

  • TABLE_ACCESS: a configuration table that tells i2b2 which ontology tables to load into the user interface (required)
  • SCHEMES: a configuration table that tells i2b2 which coding schemes are being used in the ontology tables (required, but may be empty)
  • one or more "ontology" tables (required):
    • each ontology table will typically consist of a hierarchy of concepts for a particular category or domain of healthcare data; for instance:
      • in the i2b2 demo ontology, the I2B2 table contains concepts for patient demographics
      • in the ACT ontology, the ACT_DEM_V41 table contains concepts for patient demographics
  • CUSTOM_META: an empty ontology table; may include custom concepts added by you
  • totalnum and totalnum_report: tables dedicated to recording the number of patients who have the concepts present in the ontology trees; these tables are NOT displayed to the user as part of the ontology tree in the user interface; these tables are empty until you run a "totalnum" report (see Ontologies 102 – Patient Counts ("totalnum"))

After installation, all required tables will be populated, indexed, and ready to use.

After installation, there will also be a number of SQL-based procedures added to the ONT database, to prepare for running the "totalnum" report.

This table shows all the tables you should expect to find in your ONT database when you have loaded your i2b2 ontology:

...

  • BIRN
  • CUSTOM_META
  • I2B2
  • ICD10_ICD9
  • ONT_PROCESS_STATUS
  • PHI
  • SCHEMES
  • TABLE_ACCESS
  • totalnum
  • totalnum_report

...

  • ACT_COVID_V41
  • ACT_CPT4_PX_V41
  • ACT_DEM_V41
  • ACT_HCPCS_PX_V41
  • ACT_ICD10_ICD9_DX_V4
  • ACT_ICD10CM_DX_V41
  • ACT_ICD10PCS_PX_V41
  • ACT_ICD9CM_DX_V4
  • ACT_ICD9CM_PX_V4
  • ACT_LOINC_LAB_V4
  • ACT_LOINC_LAB_PROV_V41
  • ACT_MED_ALPHA_V41
  • ACT_MED_VA_V41
  • ACT_RESEARCH_V41
  • ACT_SDOH_V41
  • ACT_VAX_V41
  • ACT_VISIT_DETAILS_V41
  • ACT_VITAL_SIGNS_V4
  • ACT_ZIPCODE_V41
  • BIRN
  • CUSTOM_META
  • I2B2
  • ICD10_ICD9
  • ONT_PROCESS_STATUS
  • SCHEMES
  • TABLE_ACCESS
  • totalnum
  • totalnum_report

...

  • ACT_COVID_V41_OMOP
  • ACT_CPT4_PX_V41_OMOP
  • ACT_DEM_V41_OMOP
  • ACT_HCPCS_PX_V41_OMOP
  • ACT_ICD10_ICD9_DX_V4_OMOP
  • ACT_ICD10CM_DX_V41_OMOP
  • ACT_ICD10PCS_PX_V41_OMOP
  • ACT_ICD9CM_DX_V4_OMOP
  • ACT_ICD9CM_PX_V4_OMOP
  • ACT_LOINC_LAB_V4_OMOP
  • ACT_LOINC_LAB_PROV_V41_OMOP
  • ACT_MED_ALPHA_V41_OMOP
  • ACT_MED_VA_V41_OMOP
  • ACT_RESEARCH_V41_OMOP
  • ACT_SDOH_V41_OMOP
  • ACT_VAX_V41_OMOP
  • ACT_VISIT_DETAILS_V41_OMOP
  • ACT_VITAL_SIGNS_V4_OMOP
  • ACT_ZIPCODE_V41_OMOP
  • BIRN
  • CUSTOM_META
  • I2B2
  • ICD10_ICD9
  • ONT_PROCESS_STATUS
  • SCHEMES
  • TABLE_ACCESS
  • totalnum
  • totalnum_report

Each patient "observation" in the CRC database must have a code associated with it, and that code must match a healthcare concept — diagnosis, procedure, medication, lab test, demographic descriptor, etc. — that is defined in the ONT database. So the CRC patient data will be queryable in i2b2 only if the patient facts in the CRC database are recorded utilizing standard codes that are defined in the i2b2 metadata (ONT database).

For instance, if the patient's electronic health record indicates that the patient had the diagnosis "Acute tonsillitis," then that fact needs to be recorded in the CRC database using the standard code for that particular diagnosis as defined in the ONT database. In the case of using the default i2b2 Diagnoses metadata tree, that code would be "ICD9:463". When a researcher makes a query in the user interface for "Acute tonsillitis," then i2b2 will query the CRC database for all patients who have the code ICD9:463 in their data records.


Let's view a high-level illustration of the mechanism behind a typical query. Here's that concept's definition from the diagnoses table in the ONT (metadata) database:

Excerpt from Diagnoses Table in Demo Ontology (I2B2 table in ONT metadata database)
Concept Name, displayed to user | Concept Path, a unique identifier | other fields
Acute tonsillitis | \i2b2\Diagnoses\Respiratory system (460-519)\Acute respiratory infections (460-466)\(463) Acute tonsillitis\ | ...

When the user selects that Concept Name in the user interface, the i2b2 Query Tool associates that with the Concept Path. When running the query, i2b2 looks for that path in the concepts table in the CRC database, which looks like this...

Excerpt from Concepts Table in Demo Ontology (CONCEPT_DIMENSION table in CRC database)
Concept Path, a unique identifier | Concept Code, used in facts table | other fields
\i2b2\Diagnoses\Respiratory system (460-519)\Acute respiratory infections (460-466)\(463) Acute tonsillitis\ | ICD9:463 | ...

...and i2b2 associates that Concept Path with the Concept Code for that concept. i2b2 then locates those patients in the facts table in the CRC database that share that Concept Code...

Excerpt from Facts Table in Demo Patient Data (OBSERVATION_FACT table in CRC database)
Patient ID | Visit ID | Concept Code | other fields
1000000003 | 474819 | ICD9:463 | ...
1000000003 | 475819 | ICD9:463 | ...
1000000003 | 483428 | ICD9:463 | ...
1000000003 | 478327 | ICD9:463 | ...
1000000003 | 478484 | ICD9:463 | ...
1000000055 | 473104 | ICD9:463 | ...

...and reports a count of those patients back to the user at the conclusion of the query. In this illustration, there were 2 unique patients (over 6 encounters) that had this Concept Code in the facts table, so the user's query for patients with a diagnosis of "Acute tonsillitis" will return a count of 2 patients.
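The three lookups illustrated above can be sketched as a single query. This is a simplified model, not i2b2's actual query code: it places cut-down versions of the three tables in one in-memory SQLite database (in a real installation the metadata and patient data live in separate ONT and CRC databases), keeps only the columns shown in the excerpts, and the function names are hypothetical.

```python
import sqlite3


def make_path(*parts):
    """Build a backslash-delimited concept path with leading and trailing backslashes."""
    return "\\" + "\\".join(parts) + "\\"


TONSILLITIS_PATH = make_path(
    "i2b2", "Diagnoses", "Respiratory system (460-519)",
    "Acute respiratory infections (460-466)", "(463) Acute tonsillitis",
)


def build_demo_db():
    """Create an in-memory database holding the excerpt rows shown above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE i2b2 (c_name TEXT, c_fullname TEXT);
        CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT);
        CREATE TABLE observation_fact (
            patient_num INTEGER, encounter_num INTEGER, concept_cd TEXT
        );
        """
    )
    conn.execute("INSERT INTO i2b2 VALUES (?, ?)",
                 ("Acute tonsillitis", TONSILLITIS_PATH))
    conn.execute("INSERT INTO concept_dimension VALUES (?, ?)",
                 (TONSILLITIS_PATH, "ICD9:463"))
    visits = [
        (1000000003, 474819), (1000000003, 475819), (1000000003, 483428),
        (1000000003, 478327), (1000000003, 478484), (1000000055, 473104),
    ]
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?, ?)",
        [(patient, visit, "ICD9:463") for patient, visit in visits],
    )
    return conn


def count_patients(conn, concept_name):
    """Concept name -> concept path -> concept code -> distinct patients."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT f.patient_num)
        FROM i2b2 AS m
        JOIN concept_dimension AS c ON c.concept_path = m.c_fullname
        JOIN observation_fact AS f ON f.concept_cd = c.concept_cd
        WHERE m.c_name = ?
        """,
        (concept_name,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(count_patients(build_demo_db(), "Acute tonsillitis"))  # prints 2
```

Counting distinct patient numbers, rather than rows, is what turns six encounters into a result of 2 patients.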

Info

The query mechanism illustrated above applies to most queries, but not all. For more details, see Ontologies 103 – Ontology Table Structure and Query Mechanism.


Info

It's important to understand that each institution will have its own protocols for coding diagnoses, procedures, medications, etc., in the patient electronic health records (aka EHR, as in Cerner or Epic databases), and that these institutional protocols may use standard, non-standard, or proprietary codes. So, when loading patient data into the CRC database for i2b2, it's necessary for the data curators who perform the ETL process (extract-transform-load) to map the codes existing in the patient EHR into the codes that are defined in the i2b2 metadata (ONT database).

For instance, let's say a patient's EHR record includes an NDC code for a medication. And let's say that your institution's i2b2 ontology tree only has an RxNorm code for that particular medication. Then the medication record from the EHR would need to be mapped into a patient record in the i2b2 CRC database in such a way that the ontology's RxNorm code (not the EHR's NDC code) appears in the patient record. If the patient record in the CRC database has the NDC code from the EHR, then it would not be matched in a query, since the i2b2 query would be using the RxNorm code.
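The mapping step in that example might look something like the sketch below. It is a toy illustration, not a real ETL tool: the crosswalk is a plain dictionary, the function name is hypothetical, and the code values are made-up placeholders rather than real NDC or RxNorm codes.

```python
def transform_records(records, crosswalk):
    """Rewrite EHR codes into ontology codes during the 'transform' step.

    Returns two lists: records whose code was mapped, and records with no
    mapping, which would otherwise be loaded with a code no query can match.
    """
    mapped, unmapped = [], []
    for record in records:
        target_code = crosswalk.get(record["code"])
        if target_code is None:
            unmapped.append(record)
        else:
            # Copy the record, replacing only the code.
            mapped.append({**record, "code": target_code})
    return mapped, unmapped


if __name__ == "__main__":
    # Placeholder codes for illustration only.
    crosswalk = {"NDC:EXAMPLE-1": "RXNORM:EXAMPLE-A"}
    ehr_records = [
        {"patient": 1, "code": "NDC:EXAMPLE-1"},
        {"patient": 2, "code": "NDC:EXAMPLE-2"},
    ]
    mapped, unmapped = transform_records(ehr_records, crosswalk)
    print("mapped:", mapped)
    print("needs review:", unmapped)
```

Keeping the unmapped records in a separate list gives the data curators something concrete to review, instead of silently loading codes that will never match.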


Info
For some institutions, the ETL mapping from the EHR to the i2b2 CRC database is the most problematic process in the setup. Those institutions may decide to minimize the complexity of their ETL, and simply copy the coding scheme from their EHR into the patient records in the CRC database. If the coding schemes in their EHR are not standard coding schemes, then they will likely have to customize their i2b2 ontology to reflect the coding schemes present in their CRC database. In this case, they may reduce the complexity of the ETL mapping, but this will likely increase the complexity of setting up the ontology tree. For details on how to customize the ontology tree, see Ontologies 201 – Custom Metadata – Additions and Modifications.

What if my institution records new concepts into the EHR that are not already included in the ontology?

It's not unusual for institutions to include coding in their EHR to reflect novel diagnoses, new medications, etc. And some institutions prefer to use proprietary codes for some clinical information. If your institution is using concepts or codes that are not reflected in your i2b2 ontology tree, then your researchers will not be able to query for those concepts in i2b2.

The remedy for this is to customize your ontology trees to include the new concepts that are missing from the i2b2 ontologies. See Ontologies 201 – Custom Metadata – Additions and Modifications.

Tip Box

If your new concepts are not strictly proprietary, then please consider contacting the appropriate coding authority and asking them to include your new concepts in the next release of their terminology.

What if my institution's EHR is missing concepts that are found in one of the ontology trees?

Never fear! No institution will be using all of the concepts in every ontology tree. It is normal to have concepts in the ontology tree that match none of your patient records.

To prevent your institution's researchers from inadvertently choosing an "unused" concept in their queries, we recommend adding patient counts to each concept in the ontology tree. You can learn about how to do this in Ontologies 102 – Patient Counts ("totalnum").
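To see why counts help, consider this small sketch. It is not the "totalnum" procedure that i2b2 provides; it is a simplified illustration that counts distinct patients per concept from toy data and flags the concepts with no patients. It counts only facts coded directly to each concept and does not roll counts up to parent folders. The function name and the "Example unused concept" row are hypothetical.

```python
from collections import defaultdict


def patient_counts(concept_dimension, facts):
    """Count distinct patients per concept path.

    concept_dimension: dict of concept path -> concept code
    facts: iterable of (patient_num, concept_cd) pairs
    """
    patients_by_code = defaultdict(set)
    for patient_num, concept_cd in facts:
        patients_by_code[concept_cd].add(patient_num)
    return {
        path: len(patients_by_code.get(code, set()))
        for path, code in concept_dimension.items()
    }


if __name__ == "__main__":
    concepts = {
        "\\i2b2\\Diagnoses\\(463) Acute tonsillitis\\": "ICD9:463",
        # Hypothetical concept with no matching patient records.
        "\\i2b2\\Diagnoses\\Example unused concept\\": "EXAMPLE:0",
    }
    facts = [(1000000003, "ICD9:463"), (1000000003, "ICD9:463"),
             (1000000055, "ICD9:463")]
    for path, count in patient_counts(concepts, facts).items():
        note = "" if count else "  <- no patients; a query here returns nothing"
        print(f"{count:>3}  {path}{note}")
```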

Suggested Next Steps

In addition, in your CRC database, there should be a CONCEPT_DIMENSION table.


...
