Ontology Tools
Space shortcuts
Space Tools

Informatics for Integrating Biology and the Bedside

 

 

 

 

 

 

 

 

i2b2 Ontology Mapper Data Installation Guide

(Linux/Oracle/SQL Server/Postgresql)

Data

 

 

 

 

 

 

Document Version:

1.1.0

i2b2 Software Version:

1.7.01

 

Table of Contents

Document Management

About this Guide

1. Pre-Requisites

1.1 Required Software

1.2 Assumptions

1.2.1 PM setup

2. Creating i2b2 MapData TABLES

2.1 Unzip the data package

2.2 Create Mapdata tables and load data

2.3 Next Steps

License

 

 

Document Management

 

Revision Number

Date

Author

Description of change

1.0.0

08/06/14

Lori Phillips

Created 1.0 version of document.

1.1.0

03/18/15

Lori Phillips

Updated for i2b2 MetaMap project

 

 

 

 

 

About this Guide

Informatics for Integrating Biology and the Bedside (i2b2) is one of the sponsored initiatives of the NIH Roadmap National Centers for Biomedical Computing ( http://www.bisti.nih.gov/ncbc/ ) .

One of the goals of i2b2 is to provide clinical investigators broadly with the software tools necessary to collect and manage project-related clinical research data in the genomics age as a cohesive entity; a software suite to construct and manage the modern clinical research chart.  This guide with the provided database scripts will help you to learn how to install the i2b2 data package for the optional Ontology Mapper (MAP) cell.

 

 

  1. Pre-Requisites
    1.     Required Software

The i2b2 database can be setup in either an Oracle, SQLServer or Postgres environment.  This document assumes that a working i2b2 hive is already in place and you are adding the MAP cell and its required data.

 

1.2         Assumptions

1.2.1 PM setup

 

The data in this package provides data for a domain of i2b2dem and a project id of Demo.

The assumption is that the PM service has been configured as such.  You can reference the PM Installation Guide for further information about setting up the PM.

You will need to ensure that group Demo has been created as shown in the Project Data section of the PM Installation Guide .  Once you have verified the projects are setup you can once again reference the PM Installation Guide to assign users to the Demo project id.

This package also assumes that a user with EDITOR role has been configured for the Demo project id.

  1. CREATing i2b2 MAPData TABLES

This chapter outlines the process of creating the i2b2 map tables. If you are using the Mapping VM you can skip to section 2.2 step 4 where you load your own data. Since the tables in the VM are prepopulated with demo data, you will have to truncate the tables first and then load your data.  All passwords are on the VM are ‘demouser’.

 

2.1         Unzip the data package

The package contains a folder called Automated_Data_Package. Under this directory are folders for Mapdata, and Hivedata.  These folders map to the following schemas:

 

Directory Folder

Schema

Project

Mapdata

i2b2mapdata

Demo

Hivedata

i2b2hive

 

 

Create a user account i2b2mapdata and i2b2mapdata schema much in the same way you did for the i2b2metadata user / i2b2metadata schema.  Please consult the general i2b2 installation guide if necessary:

https://community.i2b2.org/wiki/display/getstarted/3.3+Create+Database+User+Accounts

 

2.2         Create Mapdata tables and load data

The following outlines the steps to take to create the mapdata tables and load the demo data into them.

 

  1. cd Mapdata ’ of your working directory
  1. Create the mapdata tables, indexes and sequences by executing the scripts found in create_MapTables_oracle.sql or create_MapTables_sqlserver.sql

 

  1. Load the demo map data by loading the similarly-named flat files into each table.

Load i2b2_map.txt into i2b2_MAP, etc.

 

Both the i2b2_MAP and MAP_TABLE_ACCESS are prepopulated with UMLS ICD9 “map_to” data..

 

TABLE

CONTENT

I2B2_MAP

Prepopulated to the correct “map to” diagnosis (ICD9) data. It is modified so all leaf nodes are folders.

If populating with your ‘map to’ data, modify so all leaf nodes are set to folders ; e.g set c_visualattributes = ‘FAE where c_visualattributes = ‘LAE’

MAP_TABLE_ACCESS

Points to the root node of the I2B2_MAP Table.

 

 

Load your own data as follows:

TABLE

CONTENT

INTEGRATION

Contains the ontology you are mapping to (*see Note below).  In our demo installation it contains a copy of diagnosis (ICD9) data. Ideally it should have c_basecodes in your local format if it differs from the default format of ‘ICD9:xxx.yy’  If you are creating mappings for the  PCORI project, be sure to preserve and populate the PCORI_BASECODE field (basecodes in original PCORI format).

As a precaution run the following on the table after creation*.

update integration

set sourcesystem_cd = ‘Integration’

where sourcesystem_cd = ‘Integration_tool’

PROJECT_ONT_MAPPING

Contains mapping data where destination indicates ‘Mapped to” and source indicates terms you wish to map  (**see Note below) Our demo installation preconfigures this table to a small set of local diagnoses codes.

SCHEMES

Enter the schemes for both the source (wish to map) and destination (mapped to) ontologies

  • *Note: If the content of the Integration table is itself a product of a prior mapping, modify the sourcesystem_cd from ‘Integration_tool’ to ‘Integration’.  You are highly encouraged to perform this step as a precaution.
  • **Note: When loading the project_ont_mapping table with your own data, refer to the Addendum section later in this document for details.

 

Create Hive tables and load data

The following outlines the steps to take to create the hive tables and load the data into them.

 

  1. cd Hivedata ’ of your working directory
  2. Create the mapdata tables, indexes and sequences by executing the scripts found in create_i2b2hive_tables_oracle.sql or create_i2b2hive_tables_sqlserver.sql

 

  1. Load the data.

Load ont_map_db_lookup.txt into ONT_MAP_DB_LOOKUP

 

The data loaded into the i2b2hive ont_map_ db_lookup tables presumes that the default target location pointing to the hive you are currently setting up is “ i2b2demo ” and the project_path is “Demo/”.  Modify these fields and the datasource name to match your configuration.

 

  • This target location is also referred to as the domain of the hive and should match the domain that was setup initially for your i2b2 instance.

 

 

In the client i2b2.properties file:

I2b2.1= i2b2D emo,REST,http://54.146.154.235:9090/i2b2/rest/PMService/

#I2b2.2=YourSite,REST,http://jbossHost:jbossPort/i2b2/rest/PMService/

 

The I2b2.1 target location (“ i2b2D emo ”) points to the demonstration i2b2 MetaMap hive.  The hive you are currently setting up is I2b2.2 . in the properties file. Please be sure to rename “ YourSite ” to your domain.

 

2.3         Next Steps

At this point in the installation process you are ready to proceed with the Ontology Mapper Cell installation.

 

 

 

 

 


ADDENDUM: Ontology Mapping Table
s

 

 

PROJECT_ONT_MAPPING

COLUMN NAME

DATA TYPE

(ORACLE)

DATA TYPE

(SQL)

SOURCE_CODING_SYSTEM

VARCHAR2(50)

VARCHAR(50)

SOURCE_BASECODE

VARCHAR2(50)

VARCHAR(50)

SOURCE_NAME

VARCHAR2(2000)

VARCHAR(2000)

SOURCE_FULLNAME

VARCHAR2(700)

VARCHAR(700)

SOURCE_TOOLTIP

VARCHAR2(900)

VARCHAR(700)

SOURCE_TABLE_CD

VARCHAR2(25)

VARCHAR(25)

SOURCE_KEY

VARCHAR2(725)

VARCHAR(725)

DESTINATION_CODING_SYSTEM

VARCHAR2(50)

VARCHAR(50)

DESTINATION_BASECODE

VARCHAR2(50)

VARCHAR(50)

DESTINATION_NAME

VARCHAR2(2000)

VARCHAR(2000)

DESTINATION_FULLNAME

VARCHAR2(700)

VARCHAR(700)

DESTINATION_TABLE_CD

VARCHAR2(25)

VARCHAR(25)

DESTINATION_KEY

VARCHAR2(725)

VARCHAR(725)

MAPPING_SOURCE

VARCHAR2(50)

VARCHAR(50)

VAR_FLAG

VARCHAR2(25)

VARCHAR(25)

FLAG

INT

INT

STATUS_CD

VARCHAR2(25)

VARCHAR(25)

UPDATE_DATE

DATE

DATETIME

C_TOTALNUM

INT

INT

 

 

 

 

Definition of Fields in the Ontology Mapping Table

 

source_coding_system

source_coding_system is coding system we wish to map from. Examples include ICD9 (for diagnoses), or NDC (for medications). Or it may be any number of other coding systems, even home-grown ones.   If there is no source_coding_system name set it to (null).

source_basecode

source_basecode is the term that describes the ontological concept within the source coding system. This may be an ICD9 code (for diagnoses), or an NDC code (for medications) . Or it may be any number of other coding systems, even home-grown ones.  The basecode should be set to the same format as it would appear in your fact table. 

source_name

source_name is the descriptive text value for the term we wish to map. It is what is displayed in the i2b2 user interface.

source_tooltip

source_ tooltip describes the hierarchy of ontological concept within its source hierarchy .  It is displayed as a tooltip within the mapping tool user interface to give scope to the term to be mapped.  If none exist, set it to (null).

source_table_cd (optional)

source_ table_cd has been added to assist in SHRINE mappings. In the SHRINE adapter mapping xml file, terms are mapped by keys vs. fullnames.  This field stores the table_cd corresponding to the source term’s i2b2 table_access entry.

source_key (optional)

source_ key has been added to assist in SHRINE mappings. In the SHRINE adapter mapping xml file, terms are mapped by keys vs. fullnames.  This field represents the source term’s key as it appears in the source i2b2 system.

 

 

In some cases you may have mappings from a third party.  If so populate the destination columns as follows.  Otherwise set all destination columns to (null).

destination_coding_system

destination_coding_system is coding system we wish to map to. Examples include SNO or ICD10 (for diagnoses), or RXNORM (for medications). Or it may be any number of other coding systems, even home-grown ones.   .   If there is no destination_coding_system name set it to (null).

 

destination_basecode

destination_basecode is the term that describes the ontological concept within the destination_coding system. This may be an ICD10 or SNOMED code (for diagnoses), or an RxNORM code (for medications) . Or it may be any number of other coding systems, even home-grown ones.  It is assumed that the destination_basecode is prepended with the destination coding system name, such as: SNO:85189001.

destination_name

destination _name is the descriptive text value for the term we are mapping to.

destination_fullname

destination_fullname is the hierarchical path that leads to the term for the i2b2 coding system we wish to map to. Below is an example of destination_fullname for the term ‘Acute appendicitis’.  It is shown on several lines but is actually one concatenated line in the destination_fullname field. Each ‘\’ represents another hierarchical level.

 

\Clinical findings

\Acute appendicitis (disorder)

Be sure to set this field to (null) for all unmapped terms.

 

destination_table_cd (optional)

destination _ table_cd has been added to assist in SHRINE mappings. In the SHRINE adapter mapping xml file, terms are mapped by keys vs. fullnames.  This field stores the table_cd corresponding to the SHRINE term’s table_access entry.

destination_key (optional)

destination _ key has been added to assist in SHRINE mappings. In the SHRINE adapter mapping xml file, terms are mapped by keys vs. fullnames.  This field represents the destination term’s key as it appears in the SHRINE system.

mapping_source (optional)

mapping_source represents the originating source of the mapping data or the name of the i2b2 user that modified the mapping.

var_flag (optional)

var_flag is an optional field that may be used to describe the quality of the mapping as may be specified in original (third party) mapping data.  The terms ‘EXACT’, ‘APPROXIMATE’, ‘COMBINATION’ are supported in the visual display as blue, green and yellow flags respectively. Mappings with a null var_flag appear with purple flags.

flag (optional)

flag is an optional field that may be used to store numerical quality of mapping values that may be provided by an external mapping source.  It is not used by the MAP service.  If desired, the numerical flag values may be converted by the i2b2 user to a descriptive term and placed in the var_flag field specified above. 

status_cd

status_cd is used to signify the status of a mapping.  It can have the following values:

status_cd = ‘A’ or null active mapping

  ‘D’ marked for deletion

  ‘V’ verified mapping

update_date (optional)

update_date indicates the date the term’s mapping was modified in some way such as reassigning, deleting or verifying the mapping.

 

 

c_totalnum (optional)

c_totalnum is an optional field that lists the number of patients that have an observation of the term in a project associated with the source scheme.  It is intended as a way to help prioritize which terms need to be mapped. For example, if time or resources are limited you can choose a threshold (“terms that appear in 10 patients or more”)  and map only those terms that meet that threshold.

 


License

The i2b2 source code is licensed under the i2b2 Software License Software. This includes but not limited to all code in the edu.harvard.mgh.i2b2.* package namespace.