You are viewing an old version of this page. View the current version.
Compare with Current
View Page History
« Previous
Version 3
Next »
Optimization of i2b2 query strategy
i2b2 queries often contain multiple attributes which are combined with AND clauses (e.g. "diagnosis = diabetes AND age <= 18 years"). The i2b2 CRC cell queries these attributes consecutively from the OBSERVATION_FACT table, using the result sets of each query as a base for the next query. Thus the amount of patients or encounters that need to be evaluated decrease over each step of the query plan. In order to narrow down the search space quickly, it would be ideal to first query for attributes occurring rarely in the overall dataset. While this optimization is supported by i2b2, the platform needs to know about the frequency of each concept in the overall dataset to make use of it.
The ontology table "I2B2" contains the column "C_TOTALNUM" for this purpose. It provides the count of distinct patients for each concept in the overall dataset. The CRC utilizes this column to sort the order of query steps in the overall query plan in order to first query for attributes with low occurence. However, the C_TOTALNUM column is not routinely filled during loading of the OBSERVATION_FACT table, which leads to this optimization not being applied in many cases.
The following SQL Script can be exected at the end of an ETL (extraction, transformation, loading) pathway to calculate all value for the C_TOTALNUM column automatically. The script was developed on an Oracle 11g platform but should be easily portable to other relational databases.
-- SQL Script to calculate the column i2b2.c_totalnum
-- 1. Backup the i2b2 table
CREATE TABLE t_i2b2 AS (SELECT * FROM i2b2);
-- 2a. Calculate count of distinct patients for all branches and leaves of concept hierarchy (regular concepts
CREATE TABLE t_totalnum AS
SELECT i2b.c_tablename,
i2b.c_fullname,
COUNT(DISTINCT obs.patient_num) AS c_totalnum
FROM observation_fact obs
JOIN concept_dimension con ON obs.concept_cd = con.concept_cd
JOIN i2b2 i2b ON con.concept_path LIKE i2b.c_fullname || '%'
GROUP BY i2b.c_tablename,i2b.c_fullname;
-- 2b. Calculate count of distinct patients for all branches and leaves of concept hierarchy (modifier concepts
INSERT INTO t_totalnum (c_tablename, c_fullname, c_totalnum)
SELECT i2b.c_tablename,
i2b.c_fullname,
COUNT(DISTINCT obs.patient_num) AS c_totalnum
FROM observation_fact obs
JOIN modifier_dimension con ON obs.modifier_cd = con.modifier_cd
JOIN i2b2 i2b ON con.modifier_path LIKE i2b.c_fullname || '%'
GROUP BY i2b.c_tablename,i2b.c_fullname;
-- 3. Truncate i2b2 table
TRUNCATE TABLE i2b2;
-- 4. Repopulate the i2b2 table from its backup and the generated counts
INSERT INTO i2b2 (c_hlevel, c_fullname, c_name, c_synonym_cd,
c_visualattributes, c_totalnum, c_basecode, c_metadataxml,
c_facttablecolumn, c_tablename, c_columnname, c_columndatatype,
c_operator, c_dimcode, c_comment, c_tooltip, m_applied_path,
update_date, download_date, import_date, sourcesystem_cd, valuetype_cd,
m_exclusion_cd, c_path, c_symbol)
SELECT i2b.c_hlevel,
i2b.c_fullname, i2b.c_name, i2b.c_synonym_cd, i2b.c_visualattributes,
NVL(tnm.c_totalnum, 0), i2b.c_basecode, i2b.c_metadataxml,
i2b.c_facttablecolumn, i2b.c_tablename, i2b.c_columnname,
i2b.c_columndatatype, i2b.c_operator, i2b.c_dimcode, i2b.c_comment,
i2b.c_tooltip, i2b.m_applied_path, i2b.update_date, i2b.download_date,
i2b.import_date, i2b.sourcesystem_cd, i2b.valuetype_cd,
i2b.m_exclusion_cd, i2b.c_path, i2b.c_symbol
FROM t_i2b2 i2b
LEFT JOIN t_totalnum tnm ON i2b.c_tablename = tnm.c_tablename AND i2b.c_fullname = tnm.c_fullname;
-- 5. Drop temporary tables
DROP TABLE t_totalnum;
DROP TABLE t_i2b2;
-- 6. Commit changes
COMMIT;