Page History

...

Column indexes are a long established method to improve query performance by speeding up the selection of relevant rows as well as join operations. In order to achieve this effect, the distinct values of selected table columns are stored separately, together with pointers to table rows containing each value. The database can leverage the often smaller size of these index structures to more quickly locate relevant data rows, instead of having to scan the full original table. Indexes are often stored in specialized structures (e.g. binary trees) which are optimized for quick access. Indexes can be constructed on single columns as well as multiple columns (composite indexes), which are concatenated to form the actual index value. Composite indexes provide an additional performance benefit when multiple columns are queried that are all part of the composite index, as only a single index structure needs to be examined, instead of several. However, composite indexes can only be applied if the provided search criteria are matching the order of columns in the index (e.g. if a composite index is concatenated out of columns A, B and C, it cannot be applied if column A is not part of the query).

Whether the database actually uses indexes or not is decided by the query optimizer at runtime, depending on table statistics and cached performance data from previous queries, and explicit "optimizer hints" that can be inluded in SQL queries, among others. The query strategy actually chosen can be examined at runtime (e.g. with the Oracle EXPLAIN PLAN statement) or by inspecting query logs.

...

Index Name	Task	Contained columns
OBSERVATION_FACT_PK	primary key	ENCOUNTER_NUM, CONCEPT_CD, PROVIDER_ID, START_DATE
FACT_CNPT_PAT_ENCT_IDX	optimize queries by concept code	CONCEPT_CD, INSTANCE_NUM, PATIENT_NUM, ENCOUNTER_NUM
FACT_NOLOB	optimize queries by patient id and start date	PATIENT_NUM, START_DATE, CONCEPT_CD, ENCOUNTER_NUM
FACT_PATCON_DATE_PRVD_IDX	optimize queries by patient id and concept code	PATIENT_NUM, CONCEPT_CD, START_DATE, END_DATE
OF_CTX_BLOB	optimize fulltext queries to BLOB column	OBSERVATION_BLOB

With the exception of the OF_CTX_BLOB index, all default indexes are composites covering multiple columns. As mentioned above, composite indexes can only be applied when query criteria are included that match the order in which the composite index is concatenated. E.g. the FACT_NOLOB index would speed up searching for a concept code only when at least a patient number and a start date is also included (which precedes the concept code in the concatenation). This leaves only the FACT_CNPT_PAT_ENCT_INDEX to support queries by concept code without the necessity to include additional mandatory columns. The OF_CTX_BLOB column is a special "fulltext" index that relates only to the OBSERVATION_BLOB

...

column, which can contain long texts (e.g. discharge letters, radiology findings) and is not filled for the majority of typical i2b2 data items.

As described above, only one index is useful for queries by concept code. The application of this index is also hardcoded into CRC query SQL by means of an optimizer hint forcing the query optimizer to use this index even if it would usually decide otherwise. However, the index only covers concept codes, not modifier codes. As modifiers have gained increasing importance since i2b2 v1.6, it is recommended to drop this index and replace it with 2 new indexes covering both concept as well as modifier codes. Dropping renaming the original index allows the query optimizer to freely choose appropriate indexes without having to remove the offending optimizer hint from the CRC cell source code.

The following Oracle SQL script drops the FACT_CNPT_PAT_ENCT_IDX and replaces it with 2 separate indexes for concept and modifier codes:

Code Block

language	csharp
collapse	true

DROP INDEX fact_cnpt_pat_enct_idx;
CREATE INDEX idrt_fact_cnpt_pat_enct_idx ON observation_fact (concept_cd, instance_num, patient_num, encounter_num);
CREATE INDEX idrt_fact_mdf_pat_enct_idx  ON observation_fact (modifier_cd, instance_num, patient_num, encounter_num);
COMMIT;

Info
The script should be invoked with the user credentials of the schema containing the i2b2 ontology table.

Related information: Partitioning of tables and indexes can provide further performance improvements. The following section contains a modified script to create partitioned indexes. Also, the ETL process can be sped up by dropping indexes before loading and recreating them afterwards. asd

Versions Compared

Old Version 7

New Version 8

Key

Pages

Recently updated