Test page
How to quickly add a few concepts to i2b2 ontology
- This content was presented at 2020 Harvard Symposium and is available in this video recording: https://www.youtube.com/watch?v=A5bIvNDIb3c (starting at about 5 minute mark and lasting approximately 20 minutes)
- Not everyone is adventurous to look “under the hood” so to say, but it is rather straight-forward and not scary. The beauty of i2b2 is its flexibility - you can fairly easily add new concepts to the ontology and you are immediately ready to load corresponding data and work with it in your webclient (or other compatible interfaces).
- There are many ways of adding rows to database tables, but I will discuss using a tool with a graphical user interface, such as Oracle SQL Developer for Oracle database (there are similar tools for every RDBMS, there’s a plethora of choices). I will assume that you know how to connect to your i2b2 database using such a tool.
- Export the content of your ontology table to .csv file and work with it in Excel. Subsequently, you will need to drop the table and replace it with the content of the .csv file you modified by importing it into your database, or you can do this work directly in your database table.
- By default, the ontology lives in a table called “i2b2” although you can change that name and have it live in multiple tables.
- The basic structure of the ontology table(s) in i2b2
- The table might look a little intimidating at first glance, but once you understand the idea behind it, it is straightforward to work with. Here are the main columns in the ontology (also frequently referred to as the metadata) table:
- C_HLEVEL
- This column carries a whole number which indicates the depth of the hierarchical level for a given ontology concept. Typically 0 means the very top - root level - of the hierarchy, 1 is the first “child,” 2 is the child the first, and so on.
- C_FULLNAME
- This is the most important column. It is my practice whenever I work with i2b2 ontologies in a spreadsheet or a database to sort the table by c_fullname - you will understand why in a moment
- Please note that I am simplifying here and in order for this simplification to work and not cause problems, you must ensure that your c_fullname is kept in sync (values be identical) with c_dimcode
- C_fullname is a path. Backslash acts as a separator, and individual elements of the path indicate the relationship of a given concept to the top of the hierarchy, referred to as the root. In other words, it’s the way you can get from the root to the current concept - a path!
- The element in the path that has no “children” is a terminal node or a “leaf.” Its ancestors are “folders.” In i2b2 webclient, folders get rendered with a folder icon and leaves - with a page icon. See c_visualattributes below for the continuation of the leaf and folder discussion.
- By convention, the root is “\i2b2\” - doesn’t have to be, but frequently is.
- C_NAME
- Concept description or a name
- This is shown in the webclient, so think about readability
- Webclient will alphabetize “sibling” elements in your ontology, so you might need to get clever about C_NAME if you want to enforce a certain order of concepts
- Tip: try “zz “ (that’s z-z-space) in front of your C_NAME and see what happens in webclient
- C_SYNONYM_CD
- Default is “N” and we’ll concentrate on those. If you see “Y” it means that the row you are looking at contains a synonym - out of scope for this tutorial.
- C_VISUALATTRIBUTES
- Three-letter values populate this column, but we will focus on the first 2 only:
- 1st letter determines if it’s a folder (F) or a leaf - a terminal node (L). There are other possible values, but it’s out of scope. Just remember to ignore rows where the 1st letter is R - those are modifiers, and that C behaves almost but not quite like F.
- 2nd letter - we’ll always keep it “A” (stands for “active”) for the purposes of this tutorial
- Three-letter values populate this column, but we will focus on the first 2 only:
- C_TOTALNUM
- Really important column but not for this discussion
- C_BASECODE
- This one is optional but convenient - fill in the code for your concepts here and it’ll be easier for you to visually understand and manage your ontology
- C_METADATAXML
- This controls popups in i2b2 webclient for concepts like lab results. This will be explained elsewhere.
- C_FACTTABLECOLUMN
- Default is “concept_cd”
- C_TABLENAME
- Default is “concept_dimension”
- C_COLUMNNAME
- Default is “concept_path”
- C_COLUMNDATATYPE
- Default is “T”
- C_OPERATOR
- Default is “LIKE”
- C_DIMCODE
- “Dim” code stands for dimension code
- Technically concept_path in concept_dimension table equates to c_dimcode.
- It is important to remember to keep c_dimcode the same as c_fullname, but beware that this is a simplification
- For more advanced behavior, you may need to allow your c_fullname and c_dimcode to diverge, but we will not cover it in this presentation
- Might want to ask Lori Phillips <LCPHILLIPS@PARTNERS.ORG> to chime in with more details here
- C_COMMENT
- I do not remember whether and how it gets rendered in the webclient
- Ok to leave blank
- C_TOOLTIP
- Self-explanatory - appears on hover-over in the webclient
- Ok to leave blank
- M_APPLIED_PATH
- Default is “@”
- UPDATE_DATE
- Date in the format of “dd-mmm-yy”
- DOWNLOAD_DATE
- Date in the format of “dd-mmm-yy”
- IMPORT_DATE
- Date in the format of “dd-mmm-yy”
- SOURCESYSTEM_CD
- You can put in the name of your organization
- VALUETYPE_CD
- Ok to leave blank
- M_EXCLUSION_CD
- Ok to leave blank
- C_PATH
- Ok to leave blank
- C_SYMBOL
- Ok to leave blank
- C_HLEVEL
- The table might look a little intimidating at first glance, but once you understand the idea behind it, it is straightforward to work with. Here are the main columns in the ontology (also frequently referred to as the metadata) table:
- As an example, let’s go through the process of adding a U07.1 “COVID-19” diagnosis to your existing i2b2 ontology
- Let’s assume that you are starting with an existing diagnoses ontology and c_fullname for your Diagnosis folder looks something like “\i2b2\Diagnosis\”
- We will need to add a folder for the U chapter to accommodate our U07.1 code because up until April 1, 2020, ICD-10-CM did not have any codes beginning with U.
- C_hlevel will be 2 because the root (“\i2b2\) is 0 and your “Diagnosis” folder is 1
- C_fullname will be “\i2b2\Diagnosis\(U00-U85)\”
- C_name is “Codes for special purposes”
- C_synonym is kept at “N”
- C_visualattributes is “FA “ for Active Folder, and it’s a folder because it will have “children” beneath it
- Lastly, you can add the optional c_basecode as “ICD10CM:U00-U85”
- All other values should be set in accordance with descriptions above, being consistent with other rows of your ontology file and being mindful of keeping c_dimcode the same as c_fullname
- If at this point you can add this row to your i2b2 metadata table, commit the change and look at your webclient - you will see a brand new “Codes for special purposes” folder in your Diagnosis folder. It’s that easy!
Let’s add remaining rows. The pertinent info is:
C_HLEVEL
C_FULLNAME
C_NAME
C_VISUALATTRIBUTES
C_BASECODE
2
\i2b2\Diagnosis\U00-U85\
Codes for special purposes
FA
ICD10CM:U00-U85
3
\i2b2\Diagnosis\U00-U85\U00-U49\
Provisional assignment of new diseases of uncertain etiology or emergency use
FA
ICD10CM:U00-U49
4
\i2b2\Diagnosis\U00-U85\U00-U49\U07\
Conditions of Uncertain Etiology
FA
ICD10CM:U07
5
\i2b2\Diagnosis\U00-U85\U00-U49\U07\U07.1
2019-nCoV acute respiratory disease (COVID-19)
LA
ICD10CM:U07.1
- In this table, notice the last row with c_visualattribute of “LA “ - that’s the last element in this sub-tree (or hierarchy) and so it’s the leaf.
- There is one more crucial step in this process. If you recall, we said that c_basecode was optional. But you will recognize the content of that column as being the exact observation codes you will be storing in your observation_fact table. There must be a way to link ontology concepts with codes in observation_fact table. I2b2 accomplishes that via concept_dimesion table. So in addition to adding new ontology concepts, you also must make necessary changes to your concept_dimension table.
- Making changes to concept_dimension table is covered in one of the subsequent sections - please see below.
- Let’s assume that you added the desired concepts to your ontology table and made the necessary changes to your concept_dimension table. Now, when you load patient observations into your observation_fact table and populate the concept_cd with “ICD10CM:U07.1” you will be able to query for patients with COVID-19 diagnosis.
- Address c_fullname vs c_dimcode
- How to deploy, incl. TABLE_ACCESS; in multiple tables
- How to make changes
Add Comment