Health Ontology Mapper
Health Ontology Mapper (HOM) Installation Document – 2.1


Table of Contents

 

Table of Contents

Document Version History

Pre-requisites for Health Ontology Mapper (HOM)

Mapping Interpreter Install and Execution Instructions

Configuring Disposition Maps

Configuring BioPortal Maps

Installing HOM in a SQL Server Environment

Parallel processing of HOM maps


Document Version History

 

Date       | Version | Description                                                                                              | Author(s)
03/18/2010 | 1.0     | Initial version                                                                                          | Prakash Lakshminarayanan
03/31/2010 | 1.1     | Updated with I2B2 schema upgrade info for HOM and patch install instructions for Observation Fact View  | Prakash Lakshminarayanan
04/02/2010 | 1.2     | Updated with SQL Server conversion notes                                                                 | Elishema Fishman (UW)
04/14/2010 | 1.2.1   | Added note on Oracle database                                                                            | Prakash Lakshminarayanan
04/30/2010 | 1.3     | Release 1.3                                                                                              | Prakash Lakshminarayanan
05/21/2010 | 1.4     | Added support for NDC codes and writing to file                                                          | Prakash Lakshminarayanan
06/18/2010 | 1.4.1   | Removed patch for Observation Fact View                                                                  | Prakash Lakshminarayanan
07/19/2010 | 1.4.2   | Removed alter scripts                                                                                    | Prakash Lakshminarayanan
07/29/2010 | 1.5     | Added pre-requisites & additional RxNorm config info                                                     | Prakash Lakshminarayanan
08/16/2010 | 1.6     | Added info on configuring Disposition maps                                                               | Prakash Lakshminarayanan
08/29/2011 | 2.0     | Added info on configuring BioPortal maps & minor changes                                                 | Prakash Lakshminarayanan
09/19/2011 | 2.1     | Added support for parallel processing of maps                                                            | Prakash Lakshminarayanan

 

 

Pre-requisites for Health Ontology Mapper (HOM)

 

  • Sun JDK 1.5

http://java.sun.com/javase/downloads/index_jdk5.jsp

 

  • Working i2b2 environment

https://www.i2b2.org/software/index.html
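
To confirm the JDK prerequisite on the machine that will run the Mapping Interpreter, check the Java version from a command prompt (a 1.5.x version string is expected):

java -version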

 

Mapping Interpreter Install and Execution Instructions

 

  • The Mapping Interpreter component of HOM translates source data into the target format using instance maps.  Download the Mapping Interpreter executable (MappingInterpreter-2.1.jar) from the downloads section.

 

  • Extract the executable into a local folder using the following command:

jar xvf MappingInterpreter-2.1.jar

 

  • Create the HOM schema (see HOM_Schema_Design.gif) either from scratch or by upgrading an existing I2B2 schema, per your database environment.

Execute the following scripts in the given order to apply the HOM schema changes to an existing I2B2 schema running on Oracle:

  1. scripts\oracle\I2B2DEMODATA_HOM_Upgrade_Oracle.sql (for i2b2 1.3 schema)

or

scripts\oracle\I2B2DEMODATA_HOM_Upgrade_Oracle_For_i2b2_16.sql (for i2b2 1.6 schema)

  2. scripts\oracle\OntoMapper_Metadata_Insert_Oracle.sql
  3. scripts\oracle\Pharmacy_Cpt_Oracle.sql
  4. scripts\oracle\Pharmacy_Obs_Oracle.sql

 

Execute the following scripts in the given order to create a new I2B2 1.3 schema with the HOM upgrade for Sybase IQ:

  1. scripts\sybase\I2B2DEMODATA_IQ.sql
  2. scripts\sybase\I2B2METADATA_IQ.sql
  3. scripts\sybase\I2B2WORKDATA_IQ.sql
  4. scripts\sybase\I2B2HIVE_IQ.sql
  5. scripts\sybase\I2B2_Metadata_Insert_IQ.sql
  6. scripts\sybase\OntoMapper_Metadata_Insert_IQ.sql
  7. scripts\sybase\Pharmacy_Cpt_IQ.sql
  8. scripts\sybase\Pharmacy_Obs_IQ.sql

 

Execute the following scripts in the given order to create a new I2B2 1.3 schema with the HOM upgrade for Oracle:

  1. scripts\oracle\I2B2DEMODATA_Oracle.sql
  2. scripts\oracle\I2B2METADATA_Oracle.sql
  3. scripts\oracle\I2B2WORKDATA_Oracle.sql
  4. scripts\oracle\I2B2HIVE_Oracle.sql
  5. scripts\oracle\I2B2_Metadata_Insert_Oracle.sql
  6. scripts\oracle\OntoMapper_Metadata_Insert_Oracle.sql
  7. scripts\oracle\Pharmacy_Cpt_Oracle.sql
  8. scripts\oracle\Pharmacy_Obs_Oracle.sql

Note: For Oracle the following database link needs to be created in the i2b2demodata schema per your environment prior to running the Pharmacy*.sql scripts. 

create public database link i2b2metadata connect to i2b2metadata identified by <i2b2metadata_password> using '<db_name>'
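
As a sketch, the Oracle scripts above can be run from SQL*Plus while connected as the owner of the target schema; the connect strings below reuse the placeholders from the database link statement and assume the scripts are run from the extraction directory (repeat for the remaining scripts in the order listed above):

sqlplus i2b2demodata/<i2b2metadata_password>@<db_name> @scripts\oracle\I2B2DEMODATA_HOM_Upgrade_Oracle.sql
sqlplus i2b2demodata/<i2b2metadata_password>@<db_name> @scripts\oracle\Pharmacy_Cpt_Oracle.sql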

 

  • Configure the database properties for the i2b2demodata and i2b2metadata schemas per your environment in mapper.properties (found under the extraction directory); a sketch of these entries follows the next step.

 

  • Change the upload folder property (UPLOAD_DIR) for mapper instance xml files in mapper.properties (found under the extraction folder) per your environment.
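
A minimal mapper.properties sketch covering the entries described in the previous two steps.  The database key names are illustrative assumptions (match them to the keys actually present in the shipped mapper.properties); UPLOAD_DIR is the property named above:

# i2b2demodata / i2b2metadata connection settings (key names and values are examples)
demodata.jdbc.url=jdbc:oracle:thin:@<db_host>:1521:<db_name>
demodata.user=i2b2demodata
demodata.password=<i2b2demodata_password>
metadata.jdbc.url=jdbc:oracle:thin:@<db_host>:1521:<db_name>
metadata.user=i2b2metadata
metadata.password=<i2b2metadata_password>

# Folder from which mapper instance xml files are picked up
UPLOAD_DIR=C:\\hom\\upload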

 

  • Configure the logger properties in log4j.properties, found under the extraction folder.
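
A minimal log4j.properties sketch using standard log4j 1.x keys (the logger categories HOM expects may differ; adjust the log file path to your environment):

log4j.rootLogger=INFO, homfile
log4j.appender.homfile=org.apache.log4j.RollingFileAppender
log4j.appender.homfile.File=logs/hom-mapper.log
log4j.appender.homfile.MaxFileSize=10MB
log4j.appender.homfile.MaxBackupIndex=5
log4j.appender.homfile.layout=org.apache.log4j.PatternLayout
log4j.appender.homfile.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n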

 

  • Configure the RxNormMapper.xml file found under the xml folder.  The ‘sourceConceptPath’ and ‘expression’ tags in RxNormMapper.xml need to be configured as shown below prior to execution.  The ‘expression’ tag can also be configured with a different drug name along with its corresponding ‘sourceConceptPath’.  Per the RxNorm data load scripts (Pharmacy_xx_xx.sql) mentioned above, the concept path for any drug can be located by looking up the CONCEPT_PATH column against the drug name in the NAME_CHAR column; both columns are in the CONCEPT_DIMENSION table of the i2b2demodata schema (a sample lookup query follows the XML example below).  When changing concept paths, the ‘name’ tag under ‘provenance’ needs to be updated with a new name for the map.

  <inputSet>
    <sourceConceptSet>
      <sourceConceptPath>\00Pharmacy\000000000P\0013118446\</sourceConceptPath>
    </sourceConceptSet>
  </inputSet>

  <scriptlet language="RxNorm">
    <column sourceColumn="tval_char" targetColumn="tval_char" sourceType="string" targetType="string">
      <expression>protease</expression>
    </column>
  </scriptlet>
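
As referenced above, the concept path for a drug can be located in CONCEPT_DIMENSION.  A minimal Oracle SQL sketch (the drug name 'protease' is just the example used in this document):

-- run while connected to the i2b2demodata schema
select concept_path, name_char
  from concept_dimension
 where lower(name_char) like '%protease%';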

 

Note: For proper RxNorm map execution, the concept_cd & name_char combination in CONCEPT_DIMENSION should match the concept_cd & tval_char combination in OBSERVATION_FACT for any drug concept path (CONCEPT_DIMENSION) specified in the xml via the ‘sourceConceptPath’ element.

 

  • Configure the ‘RxNORM_NDC’ property in mapper.properties (found under the extraction folder) if you need RxNorm processing by NDC codes instead of drug names.  In this scenario the ‘expression’ tag in RxNormMapper.xml should contain the NDC code of the drug to be processed.  Also ensure that the NAME_CHAR column in CONCEPT_DIMENSION and the TVAL_CHAR column in OBSERVATION_FACT are populated with the NDC code of the drug rather than its name.

 

  • To improve the performance of the Mapping Interpreter, the output of RxNorm processing can be written to a file and later bulk loaded into the database.  This is configured via the ‘WRITE_TO_FILE’ property found in mapper.properties.  Enabling this property generates a file OBSERVATION_FACT_x_y.txt in the upload folder configured earlier, where ‘x’ is the id of the executed mapper xml file from the MAP_DIMENSION table and ‘y’ is the file sequence number when multiple files are generated as a result of the ‘FILE_SIZE_LIMIT’ specified in mapper.properties.  The generated files have to be bulk loaded into the OBSERVATION_FACT table; the column sequence (of the table) in which data is written is included as the first line of each file.  The column separator in the generated data files can be configured via the ‘COL_SEPARATOR’ property in mapper.properties.  A bulk-load sketch follows.
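
A minimal Oracle SQL*Loader control-file sketch for bulk loading a generated file.  This assumes '|' as the COL_SEPARATOR and example values for x and y; SKIP=1 skips the header line that records the column order, and the column list shown is abbreviated and illustrative (copy the actual order from the file's first line and add date format masks as needed):

OPTIONS (SKIP=1)
LOAD DATA
INFILE 'OBSERVATION_FACT_1_1.txt'
APPEND
INTO TABLE observation_fact
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
  encounter_num,
  patient_num,
  concept_cd,
  -- ...remaining columns in the order given by the file header...
  tval_char,
  nval_num
)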

 

  • Execute the Mapping Interpreter (from the extraction directory) by passing the requisite parameters.  The Mapping Interpreter takes an xml file (mapper instance file) as input.

java -Xms512m -Xmx1024m com.ucsf.mapper.common.OntoMapInterpreter file=xml\RxNormMapper.xml

 

  • If the interpreter cannot be run due to class-path issues, the Mapping Interpreter jar can alternatively be re-created to include the modified configuration files and then executed as a jar from the extraction folder.

jar cmf META-INF\MANIFEST.MF MappingInterpreter-2.1.jar .

java -Xms512m -Xmx1024m -jar MappingInterpreter-2.1.jar file=xml\RxNormMapper.xml

 

  • Once a mapper instance file has been executed, it can be re-run by supplying its id (MAP_ID), found in the MAP_DIMENSION table of the I2B2DEMODATA schema.

java -Xms512m -Xmx1024m com.ucsf.mapper.common.OntoMapInterpreter id=1

or

java -Xms512m -Xmx1024m -jar MappingInterpreter-2.1.jar id=1

 

  • To execute the mapper for all the drugs in the database, use the RxNormMapperBulk.xml file (found under the xml folder) after configuring ‘sourceConceptPath’ as shown below (per the included Pharmacy data scripts).  For other data sources, the root node of the drug concept paths in the CONCEPT_DIMENSION table (i2b2demodata schema) needs to be specified here.  Also ensure that the wildcard character ‘*’ is used in the ‘expression’ tag and that the ‘language’ attribute of the scriptlet is ‘RxNormBulk’ for bulk execution.

  <inputSet>
    <sourceConceptSet>
      <sourceConceptPath>\00Pharmacy\</sourceConceptPath>
    </sourceConceptSet>
  </inputSet>

  <scriptlet language="RxNormBulk">
    <column sourceColumn="tval_char" targetColumn="tval_char" sourceType="string" targetType="string">
      <expression>*</expression>
    </column>
  </scriptlet>

 

  • The mapper can also be executed for all drugs starting with a particular letter by configuring ‘sourceConceptPath’ as shown below (per the included Pharmacy data scripts).  This approach can be extended to other data sources by specifying the ‘sourceConceptPath’ accordingly.

<sourceConceptPath>\00Pharmacy\000000000A\</sourceConceptPath>

 

Note: Every new RxNorm map should be given a new filename, in addition to a unique map name in the ‘name’ tag under the ‘provenance’ section of the xml.

Configuring Disposition Maps

  • Disposition maps allow local discharge disposition codes to be converted to HL7 standard codes.  For disposition maps, the ‘sourceConceptPath’ tag needs to be configured similarly to RxNorm bulk maps; all records matching the specified ‘sourceConceptPath’ are taken up for Disposition processing.

<sourceConceptPath>\Dispositio\</sourceConceptPath>

 

  • The scriptlet ‘language’ attribute should be configured as ‘Disposition’, and the ‘expression’ tag should contain the JEXL (Java Expression Language) expression used to convert the disposition codes to the HL7 standard.

  < scriptlet language = "Disposition" >

< column sourceColumn = "tval_char" targetColumn = "tval_char" sourceType = "string" targetType = "string" >

  < expression > tval_char.equalsIgnoreCase(' rou ') ? '1'

  : (tval_char.equalsIgnoreCase(' hos ') ? '2'

  : (tval_char.equalsIgnoreCase(' snf ') ? '3'

  : (tval_char.equalsIgnoreCase(' oto ') ? '30'

: (tval_char.equalsIgnoreCase(' ltc ')

|| tval_char.equalsIgnoreCase(' reh ')

|| tval_char.equalsIgnoreCase(' rhi ')

|| tval_char.equalsIgnoreCase(' otf ')

|| tval_char.equalsIgnoreCase(' oth ') ? '5'

  : (tval_char.equalsIgnoreCase(' icf ')

|| tval_char.equalsIgnoreCase(' hhc ')

|| tval_char.equalsIgnoreCase(' hhe ') ? '6'

  : (tval_char.equalsIgnoreCase(' ama ') ? '7'

  : (tval_char.equalsIgnoreCase(' lpp ')

|| tval_char.equalsIgnoreCase(' srh ') ? '14'

  : (tval_char.equalsIgnoreCase(' dec ') ? '20'

  : ''))))))))

  </ expression >

  </ column >

</ scriptlet >

 

  • The ‘sourceColumn’ attribute of the column tag can be either tval_char or nval_num, depending on which column holds the discharge code.  If nval_num is used for the conversion, the ‘sourceType’ and ‘targetType’ attributes should be changed to ‘double’ (see the sketch below).
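
A sketch of the column tag when the discharge code is stored in nval_num (this assumes the converted code is also written to nval_num; adjust targetColumn if it should be written elsewhere):

<column sourceColumn="nval_num" targetColumn="nval_num" sourceType="double" targetType="double">
  <expression> <!-- JEXL expression over nval_num goes here --> </expression>
</column>
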

Configuring BioPortal Maps

 

  • BioPortal maps operate entirely off the BioPortal web site, based on point-to-point maps defined on the site.  For BioPortal maps, the ‘sourceConceptPath’ tag needs to be configured similarly to RxNorm maps; all records matching the specified ‘sourceConceptPath’ are taken up for BioPortal processing.

<sourceConceptPath>\DataSource\000000o624\0000000on7\0000000sex\</sourceConceptPath>

 

  • The targetOntologyId element needs to be configured within targetDataModel with the BioPortal Ontology Id of the target ontology.

<targetDataModel>

  <targetOntologyId>1649</targetOntologyId>

</targetDataModel>

 

  • The scriptlet ‘language’ attribute should be configured as ‘BioPortal’, and the ‘expression’ tag should contain the REST URI of the BioPortal term representing the ‘sourceConceptPath’ defined above.  The scriptlet ‘sourceOntologyId’ attribute needs to be configured with the BioPortal Ontology Id of the source ontology, and the ‘apiKey’ scriptlet attribute needs to be configured with a valid BioPortal apiKey.  The ‘sourceColumn’ attribute of the column tag can be either tval_char or nval_num, depending on which column holds the code to be processed.  If nval_num is used for the conversion, the ‘sourceType’ and ‘targetType’ attributes should be changed to ‘double’.

< scriptlet language = "BioPortal" sourceOntologyId = "1648" apiKey = "24e47498-54e0-11e0-9d7b-005056aa3316" >

< column sourceColumn = "tval_char" targetColumn = "tval_char" sourceType = "string" targetType = "string" >

< expression > http://re s t.bioontology.org/bioportal/virtual/ontology/1648?conceptid=OSHPD_0000000sex </ expression >

</ column >

</ scriptlet >

 

Note: BioPortal maps work only with the i2b2 1.6 schema.  Refer to the earlier section ‘Mapping Interpreter Install and Execution Instructions’ for applying the HOM schema changes to an i2b2 1.6 schema.

Installing HOM in a SQL Server Environment

 

The HOM project offers two schema sets: Sybase, and a back-ported version of the Oracle schema.  If installing i2b2 in a Sybase environment, the HOM schemas do not need to be altered.

The Sybase and SQL Server tables are quite similar and therefore require few changes to the schema set.  For this reason, it is recommended to install the Sybase version of the schemas shipped with HOM in a SQL Server environment.  (Note: it is assumed that all the type changes were already applied when the schemas were converted to Sybase from the Oracle version.)

 

When converting to SQL Server, the following changes must be made to the Sybase schema:

  • Groupings: parentheses must be added in several places.
  • Index updates: in Sybase one can index an arbitrary number of column values, whereas SQL Server imposes a maximum index of 15, so an arbitrary index is not possible.  Because there is no way to translate this change in indexes, these optimizations are no longer possible, and there may be unknown ramifications.
  • Translation to large variable-length strings: in Sybase, the VARCHAR length is 8192, which is outside the range of acceptable values in SQL Server.  In the SQL Server schema, the VARCHAR length must be switched to MAX, around 9600 (may be hardware dependent).
  • Sequence disparities: Sybase uses an auto-increment feature for sequences, while SQL Server uses Identity.  This must be switched manually in the schema (see the sketch after this list).
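
A hypothetical illustration of the string-length and sequence changes (the table and column names are examples, not taken from the HOM schema set):

-- Sybase IQ style (illustrative):
--   CREATE TABLE example_dim (
--       example_id   INTEGER DEFAULT AUTOINCREMENT,
--       example_text VARCHAR(8192)
--   );

-- SQL Server equivalent:
CREATE TABLE example_dim (
    example_id   INT IDENTITY(1,1),   -- Identity replaces the Sybase auto-increment
    example_text VARCHAR(MAX)         -- MAX replaces the 8192 length, which SQL Server does not accept
);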

 

Noteworthy concerns:

  • i2b2 ships with SQL Server and Oracle schemas, but after these changes have been made, i2b2 will be running the Sybase version of the schemas.  The various updates to the schema set may result in type differences (e.g., a 20-bit integer in the original version may become a 32- or 16-bit integer in the Sybase version).  These potential misalignments are not completely understood, though it is presumed that they were thoughtfully considered and discussed before implementation.
    • Result: you may notice an impact on data imports if you pull in data from the original SQL Server schema, because the amount of data inside the system makes it difficult to determine the variation in data types and precision.

 

  • The XML_Result_ID column of the QT_XML_Result table in the i2b2demodata schema failed to import, which resulted in a failed import of the entire table.  Further investigation revealed that these tables are not used, so this issue is unimportant.  (Note: this could be unique to the UW environment, but it may be an issue that is fundamental to Sybase and SQL Server.)

 

Parallel processing of HOM maps

 

HOM maps can be run in parallel, i.e. multiple HOM maps can be initiated simultaneously against the same database.  The status of such maps can be monitored via the STATUS column of the MAP_DIMENSION table.  When a map is initiated, a new entry is added to the MAP_DIMENSION table with status ‘P’ (processing); when the map completes, its status changes to ‘C’ (completed).  The status of sequential maps can be monitored the same way.
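
A minimal status check using only the MAP_DIMENSION columns described above (run against the i2b2demodata schema):

select map_id, status
  from map_dimension
 where status = 'P';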