...
Learning Objectives
This tutorial page is intended for i2b2 administrators who are new to i2b2 and clinical informatics. Maybe you've been asked to deploy i2b2 for your institution. Maybe your institution already has an i2b2 instance, and you've just inherited it. You are curious about how things work and how to set up or modify the ontology database.
...
The i2b2 database-loading modules come with at least 3 sets of "metadata" or ontology trees. These are the Demo Ontology, the ACT Ontology, and the ACT-on-OMOP Ontology. Inside the i2b2 software release folder you will find the three ontology tree folders (demo, act, and act-omop) at this location (Linux syntax): i2b2/edu.harvard.i2b2.data/Release_m-n/NewInstall/Metadata, where m is the version number and n is the release identifier.
Info |
---|
You may review or download the database-loading modules on GitHub: https://github.com/i2b2/i2b2-data. You may also download them from i2b2's Software Download page: https://www.i2b2.org/software/. |
Each ontology is designed to be used with some version of the i2b2 Common Data Model.
...
How should I choose which ontology to use?
This is a key question/concept.
The ontology you choose depends on your institution's research goals and patient data.
Research Goals
If your goal is simply to understand and show how i2b2 works, then the i2b2 demo ontology is sufficient. If your goal is to set up i2b2 for research use, one of the ACT ontologies will be far more useful. The ACT ontologies are more modern, more robust, and will satisfy the needs of more researchers.
Furthermore, any institution that is planning to join the ENACT Network will need to use an ACT ontology to ensure compatibility with the other institutions in the network.
Patient Data
First, a little more terminology. The i2b2 "patient data" are stored in the i2b2 Clinical Research Chart database, or "CRC database." The ontology categories and concepts that describe those patient data are known as the "metadata." The metadata are stored in the i2b2 Metadata database, or "ONT database."
When setting up the patient data for i2b2, your institution needs to decide how it will conduct the ETL mapping from the EHR to the i2b2 CRC database. If your patient data are going to be set up only for i2b2 and SHRINE use, then many institutions set up the patient data in the default i2b2/tranSMART Common Data Model (CDM). In this case, they typically use the ACT Ontology with its robust collection of coding standards; the patient data from the EHR system need to be mapped to the multiple coding schemes in the ACT Ontology as they are loaded into the CRC database.
If your patient data are going to be set up in a database for queries by other systems besides i2b2, then many institutions use the OMOP Common Data Model (CDM) for their patient data in the CRC database. In this case, they typically use the ACT-on-OMOP Ontology, which relies chiefly on the OMOP Vocabulary standard; the patient data from the EHR system need to be mapped to the appropriate OMOP Vocabulary coding scheme as they are loaded into the OMOP CDM in the CRC database.
Tip Box |
---|
It is highly recommended for newcomers to begin their i2b2 journey by first setting up the Demo Ontology with the Demo CRC patient data. Setting up i2b2 can be a complex undertaking. You can learn a lot about how it works, and get the system up and running most quickly, by first setting up your i2b2 instance with the i2b2 Demo Ontology and demo patient data. Those databases will comprise a "demo" project in your i2b2 instance. When you have proven i2b2's deployment with the demo project, you can add a separate, new project for research. In this case you could use actual patient data in a second CRC database and an ACT Ontology in a second ONT database. Those new databases will comprise a "research" project in your i2b2 instance. |
Actually, I inherited an i2b2 instance. How can I tell which ontology it's using?
You may compare the tables in your ONT database with the variations shown in Appendix D – i2b2 Ontology Tables.
...
Part 3: Deploying Your Ontology
How do I deploy my chosen ontology tree?
See the installation instructions here: 3.7 Metadata Tables
What should my metadata database look like when I am done?
See the list of metadata tables here: Appendix D – i2b2 Ontology Tables
What else is pertinent to setting up my first ontology?
Two very important topics for understanding the relationship between patient data (CRC) and metadata (ONT) are i2b2 Projects and CRC-Based Metadata. It is also critical to understand that, because these two databases work in tandem, the deployment of the ontology is not complete until the CRC database is deployed as well.
...
The converse is also true. Each CRC database is designed to be paired with only one ONT database. This is because the CRC database actually includes some very specific metadata that must match the metadata in the ONT database. There is one exception: if more than one ONT database shares the same information that appears in the CRC database, then multiple projects can be supported by that single CRC database.
The instructions for creating multiple projects are outside the scope of this Ontologies tutorial page.
...
How does it work? How does the i2b2 ontology make the patient data queryable?
This is another key question/concept.
The patient data in the CRC database work in tandem with the metadata in the ONT database to allow i2b2 to conduct queries for the researcher. The metadata spell out all the various healthcare concepts or terms that a researcher may wish to query for in the user interface, and the patient data must be coded in such a way that they reflect the codes defined in the metadata.
...
For instance, if the patient's electronic health record indicates that the patient had the diagnosis "Acute tonsillitis," then that fact needs to be recorded in the CRC database using the standard code for that particular diagnosis as defined in the ONT database. In the case of using the default i2b2 Diagnoses metadata tree, that code would be "ICD9:463". When a researcher makes a query in the user interface for "Acute tonsillitis," then i2b2 will query the CRC database for all patients who have the code ICD9:463 in their data records.
Let's view a high-level illustration of the mechanism behind a typical query. Here's that concept's definition from the diagnoses table in the ONT (metadata) database:
Excerpt from Diagnoses Table in Demo Ontology (I2B2 table in ONT metadata database) | ||
---|---|---|
Concept Name, displayed to user | Concept Path, a unique identifier | other fields |
Acute tonsillitis | | ... |
When the user selects that Concept Name in the user interface, the i2b2 Query Tool associates that with the Concept Path. When running the query, i2b2 looks for that path in the concepts table in the CRC database, which looks like this...
Excerpt from Concepts Table in Demo Ontology (CONCEPT_DIMENSION table in CRC database) | ||
---|---|---|
Concept Path, a unique identifier | Concept Code, used in facts table | other fields |
| | ... |
...and i2b2 associates that Concept Path with the Concept Code for that concept. i2b2 then locates those patients in the facts table in the CRC database that share that Concept Code...
Excerpt from Facts Table in Demo Patient Data (OBSERVATION_FACT table in CRC database) | |||
---|---|---|---|
Patient ID | Visit ID | Concept Code | other fields |
| | | ... |
| | | ... |
| | | ... |
| | | ... |
| | | ... |
| | | ... |
...and reports a count of those patients back to the user at the conclusion of the query. In this illustration, there were 2 unique patients (over 6 encounters) that had this Concept Code in the facts table, so the user's query for patients with a diagnosis of "Acute tonsillitis" will return a count of 2 patients.
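The lookup chain illustrated above can be sketched in a few lines of Python using an in-memory SQLite database. This is only a simplified illustration, not the actual i2b2 query engine: the concept path, patient numbers, and encounter numbers are placeholders invented for the example, both "databases" are collapsed into a single connection, and the path is matched exactly for simplicity.

```python
import sqlite3

# Placeholder path: the real Concept Path for this concept is not reproduced here.
TONSILLITIS_PATH = "\\Diagnoses\\(placeholder)\\Acute tonsillitis\\"


def build_demo_databases() -> sqlite3.Connection:
    """Create tiny stand-ins for the ONT and CRC tables in one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE i2b2 (c_name TEXT, c_fullname TEXT);
        CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT);
        CREATE TABLE observation_fact (
            patient_num INTEGER, encounter_num INTEGER, concept_cd TEXT
        );
        """
    )
    # ONT side: concept name shown to the user, plus its unique path.
    conn.execute(
        "INSERT INTO i2b2 VALUES (?, ?)", ("Acute tonsillitis", TONSILLITIS_PATH)
    )
    # CRC side: the same path, associated with the code used in the facts table.
    conn.execute(
        "INSERT INTO concept_dimension VALUES (?, ?)", (TONSILLITIS_PATH, "ICD9:463")
    )
    # Hypothetical facts: 2 patients over 6 encounters share the code,
    # plus one unrelated fact that must not be counted.
    facts = [
        (1001, 1, "ICD9:463"), (1001, 2, "ICD9:463"), (1001, 3, "ICD9:463"),
        (1002, 4, "ICD9:463"), (1002, 5, "ICD9:463"), (1002, 6, "ICD9:463"),
        (1003, 7, "DEMO:OTHER"),
    ]
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", facts)
    return conn


def count_patients(conn: sqlite3.Connection, concept_name: str) -> int:
    """Follow name -> path -> code -> facts and count distinct patients."""
    row = conn.execute(
        "SELECT c_fullname FROM i2b2 WHERE c_name = ?", (concept_name,)
    ).fetchone()
    if row is None:
        return 0  # the concept is not defined in the ontology
    (count,) = conn.execute(
        """
        SELECT COUNT(DISTINCT f.patient_num)
        FROM observation_fact AS f
        JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
        WHERE c.concept_path = ?
        """,
        (row[0],),
    ).fetchone()
    return count


if __name__ == "__main__":
    print(count_patients(build_demo_databases(), "Acute tonsillitis"))
```

Counting distinct patient numbers rather than rows is what turns six matching encounters into a result of two patients.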
Info |
---|
The query mechanism illustrated above applies to most queries, but not all. For more details, see Ontologies 103 – Ontology Table Structure and Query Mechanism. |
Info |
---|
It's important to understand that each institution will have its own protocols for coding diagnoses, procedures, medications, etc., in the patient electronic health records (aka EHR, as in Cerner or Epic databases), and that these institutional protocols may use standard, non-standard, or proprietary codes. So, when loading patient data into the CRC database for i2b2, it's necessary for the data curators who perform the ETL process (extract-transform-load) to map the codes existing in the patient EHR into the codes that are defined in the i2b2 metadata (ONT database). For instance, let's say a patient's EHR record includes an NDC code for a medication. And let's say that your institution's i2b2 ontology tree only has an RxNorm code for that particular medication. Then the medication record from the EHR would need to be mapped into a patient record in the i2b2 CRC database in such a way that the ontology's RxNorm code (not the EHR's NDC code) appears in the patient record. If the patient record in the CRC database has the NDC code from the EHR, then it would not be matched in a query, since the i2b2 query would be using the RxNorm code. |
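As a rough illustration of the mapping step described above, the following Python sketch translates EHR codes into ontology codes during the "transform" stage and sets aside anything it cannot map. The crosswalk dictionary, the code prefixes, and the function names are all hypothetical, and the code values are placeholders rather than real NDC or RxNorm identifiers; a real ETL would draw its crosswalk from a curated terminology source.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical crosswalk from EHR source codes to the codes defined in the ontology.
# All values are placeholders, not real NDC or RxNorm codes.
NDC_TO_RXNORM = {
    "NDC:00000-0000-01": "RXNORM:0000001",
    "NDC:00000-0000-02": "RXNORM:0000001",  # several source codes may share one target
}


def map_to_ontology_code(ehr_code: str) -> Optional[str]:
    """Return the ontology code for an EHR code, or None if no mapping exists."""
    return NDC_TO_RXNORM.get(ehr_code)


def transform_records(
    ehr_records: Iterable[Tuple[int, str]],
) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Split (patient_id, ehr_code) records into loadable and unmapped lists."""
    loaded, unmapped = [], []
    for patient_id, ehr_code in ehr_records:
        ontology_code = map_to_ontology_code(ehr_code)
        if ontology_code is None:
            # Loading the raw EHR code would leave the fact unreachable by queries.
            unmapped.append((patient_id, ehr_code))
        else:
            loaded.append((patient_id, ontology_code))
    return loaded, unmapped
```

Keeping the unmapped records in a separate list, instead of silently dropping them, gives the data curators a worklist of codes that still need a mapping or an ontology customization.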
...
Info |
---|
For some institutions, the ETL mapping from the EHR to the i2b2 CRC database is the most problematic process in the setup. Those institutions may decide to minimize the complexity of their ETL, and simply copy the coding scheme from their EHR into the patient records in the CRC database. If the coding schemes in their EHR are not standard coding schemes, then they will likely have to customize their i2b2 ontology to reflect the coding schemes present in their CRC database. In this case, they may reduce the complexity of the ETL mapping, but this will likely increase the complexity of setting up the ontology tree. |
Info |
---|
If your local institution does not have data in the CRC database for a certain domain in your chosen i2b2 ontology, then user queries referencing that domain may come back empty. To avoid that, you can exclude that domain from the i2b2 user interface, so that the domain without matching data in the CRC database is never used in a query. For details on how to customize the ontology tree, see Ontologies 201 – Custom Metadata – Additions and Modifications. |
What if my institution records new concepts into the EHR that are not already included in the ontology?
It's not unusual for institutions to include coding in their EHR to reflect novel diagnoses, new medications, etc. And some institutions prefer to use proprietary codes for some clinical information. If your institution is using concepts or codes that are not reflected in your i2b2 ontology tree, then your researchers will not be able to query for those concepts in i2b2.
The remedy for this is to customize your ontology trees to include the new concepts that are missing from the i2b2 ontologies. See Ontologies 201 – Custom Metadata – Additions and Modifications.
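Before customizing the ontology, it can help to know which codes are affected. The sketch below is one simple way to list them, assuming you have already extracted two collections: the codes present in your patient facts and the codes defined in your metadata. The function name and the sample codes (including the "LOCAL:" prefix) are hypothetical.

```python
from typing import Iterable, List


def find_unqueryable_codes(
    fact_codes: Iterable[str], ontology_codes: Iterable[str]
) -> List[str]:
    """Return codes that appear in patient facts but are not defined in the ontology."""
    return sorted(set(fact_codes) - set(ontology_codes))


if __name__ == "__main__":
    # Hypothetical inputs: one standard code and one proprietary local code.
    facts = ["ICD9:463", "LOCAL:0001", "ICD9:463"]
    ontology = ["ICD9:463"]
    print(find_unqueryable_codes(facts, ontology))
```

Each code in the returned list is a candidate either for an ETL mapping to an existing ontology code or for a new custom concept.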
Tip Box |
---|
If your new concepts are not strictly proprietary, then please consider contacting the appropriate coding authority, asking them to include your new concepts into the next release of their terminology. |
What if my institution's EHR is missing concepts that are found in one of the ontology trees?
If your institution's patient data do not include concepts from one of the ontology trees, then queries made from that tree's concepts may come up with zero results.
Trees with zero matching patient records can be handled in these ways:
...
Never fear! No institution will be using all of the concepts in every ontology tree. It is normal to have concepts in the ontology tree that match none of your patient records.
To prevent your institution's researchers from inadvertently choosing an "unused" concept in their queries, we recommend adding patient counts to each concept in the ontology tree. You can learn about how to do this in Ontologies 102 – Patient Counts ("totalnum").
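To make the idea of per-concept patient counts concrete, here is a minimal Python sketch. It is not the procedure described in Ontologies 102; it only illustrates the principle, under the assumption that a folder's count should include the distinct patients found under any of its descendant concepts. All paths, codes, and patient IDs are invented for the example.

```python
from typing import Dict, Iterable, Set, Tuple


def patient_counts(
    ontology_paths: Iterable[str],
    code_by_path: Dict[str, str],
    facts: Iterable[Tuple[int, str]],
) -> Dict[str, int]:
    """Count distinct patients for each ontology path, including descendant paths."""
    # Index the facts: which patients carry each concept code?
    patients_by_code: Dict[str, Set[int]] = {}
    for patient_id, code in facts:
        patients_by_code.setdefault(code, set()).add(patient_id)

    counts: Dict[str, int] = {}
    for path in ontology_paths:
        patients: Set[int] = set()
        for concept_path, code in code_by_path.items():
            # A concept belongs to this node if its path starts with the node's path.
            if concept_path.startswith(path):
                patients |= patients_by_code.get(code, set())
        counts[path] = len(patients)
    return counts


if __name__ == "__main__":
    # Hypothetical tree: one folder with two leaf concepts.
    paths = ["\\Demo\\", "\\Demo\\A\\", "\\Demo\\B\\"]
    codes = {"\\Demo\\A\\": "DEMO:A", "\\Demo\\B\\": "DEMO:B"}
    demo_facts = [(1, "DEMO:A"), (2, "DEMO:A"), (1, "DEMO:A")]
    for path, count in patient_counts(paths, codes, demo_facts).items():
        print(path, count)
```

A concept whose count is zero, like the second leaf in this example, is exactly the kind of "unused" concept that a displayed patient count would warn researchers about.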
Suggested Next Steps
- Visit Appendix D – i2b2 Ontology Tables to learn more about the structure of the metadata.
- Visit Appendix E – Test Queries to learn how to run some "sanity check" queries on your new ONT and CRC databases.
- Visit Ontologies 102 – Patient Counts ("totalnum") to learn how to add patient counts to your ontology concepts in the user interface.
...