i2b2 queries often contain multiple attributes which are combined with AND clauses (e.g. "diagnosis = diabetes AND age <= 18 years"). The i2b2 CRC cell queries these attributes consecutively from the OBSERVATION_FACT table, using the result sets of each query as a base for the next query. Thus the amount of patients or encounters that need to be evaluated decrease over each step of the query plan. In order to narrow down the search space quickly, it would be ideal to first query for attributes occurring rarely in the overall dataset. While this optimization is supported by i2b2, the platform needs to know about the frequency of each concept in the overall dataset to make use of it.
The ontology table "I2B2" contains the column "C_TOTALNUM" for this purpose. It provides the count of distinct patients for each concept in the overall dataset. The CRC utilizes this column to sort the order of query steps in the overall query plan in order to first query for attributes with low occurence. However, the C_TOTALNUM column is not routinely filled during loading of the OBSERVATION_FACT table, which leads to this optimization not being applied in many cases.
The following SQL Script can be exected at the end of an ETL (extraction, transformation, loading) pathway to calculate all value for the C_TOTALNUM column automatically. The script was developed on an Oracle 11g platform but should be easily portable to other relational databases.
-- SQL Script to calculate the column i2b2.c_totalnum -- 1. Backup the i2b2 table CREATE TABLE t_i2b2 AS (SELECT * FROM i2b2); -- 2a. Calculate count of distinct patients for all branches and leaves of concept hierarchy (regular concepts CREATE TABLE t_totalnum AS SELECT i2b.c_tablename, i2b.c_fullname, COUNT(DISTINCT obs.patient_num) AS c_totalnum FROM observation_fact obs JOIN concept_dimension con ON obs.concept_cd = con.concept_cd JOIN i2b2 i2b ON con.concept_path LIKE i2b.c_fullname || '%' GROUP BY i2b.c_tablename,i2b.c_fullname; -- 2b. Calculate count of distinct patients for all branches and leaves of concept hierarchy (modifier concepts INSERT INTO t_totalnum (c_tablename, c_fullname, c_totalnum) |
SELECT i2b.c_tablename, i2b.c_fullname, COUNT(DISTINCT obs.patient_num) AS c_totalnum FROM observation_fact obs JOIN modifier_dimension con ON obs.modifier_cd = con.modifier_cd JOIN i2b2 i2b ON con.modifier_path LIKE i2b.c_fullname || '%' GROUP BY i2b.c_tablename,i2b.c_fullname; -- 3. Truncate i2b2 table TRUNCATE TABLE i2b2; -- 4. Repopulate the i2b2 table from its backup and the generated counts INSERT INTO i2b2 (c_hlevel, c_fullname, c_name, c_synonym_cd, c_visualattributes, c_totalnum, c_basecode, c_metadataxml, c_facttablecolumn, c_tablename, c_columnname, c_columndatatype, c_operator, c_dimcode, c_comment, c_tooltip, m_applied_path, update_date, download_date, import_date, sourcesystem_cd, valuetype_cd, m_exclusion_cd, c_path, c_symbol) SELECT i2b.c_hlevel, i2b.c_fullname, i2b.c_name, i2b.c_synonym_cd, i2b.c_visualattributes, NVL(tnm.c_totalnum, 0), i2b.c_basecode, i2b.c_metadataxml, i2b.c_facttablecolumn, i2b.c_tablename, i2b.c_columnname, i2b.c_columndatatype, i2b.c_operator, i2b.c_dimcode, i2b.c_comment, i2b.c_tooltip, i2b.m_applied_path, i2b.update_date, i2b.download_date, i2b.import_date, i2b.sourcesystem_cd, i2b.valuetype_cd, i2b.m_exclusion_cd, i2b.c_path, i2b.c_symbol FROM t_i2b2 i2b LEFT JOIN t_totalnum tnm ON i2b.c_tablename = tnm.c_tablename AND i2b.c_fullname = tnm.c_fullname; -- 5. Drop temporary tables DROP TABLE t_totalnum; DROP TABLE t_i2b2; -- 6. Commit changes COMMIT; |
The script temporarily truncates the i2b2 table and repopulaates it from a backup. This should be kept in mind if local modifications of i2b2 installations include triggers on this table. The script creates and drops 2 temporary tables during execution. |