IDRT - Integrated Data Repository Toolkit
Space shortcuts
Space Tools

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: i2b2 idrt etl extraction transformation loading optimization index maintenance partition partitions partitioning delta update insert bulk loader jdbc

...

Info

It should be noted that bulk loaders are database specific. If compatibility across several databases is required, it may be necessary to either create several adapted versions of the ETL pathway (affecting only the "final" step of actually loading generated CSV files) or to not use a bulk loader. E.g., the IDRT pathways stick to JDBC components to achieve greater cross-database compatibility)

Partially Loading Fact Data

Truncating and fully reloading fact table data is standard practice in data warehousing. However, for large tables this approach may not be feasible e.g. due to loading duration exceeding the available downtime of the warehouse.

Delta updates are an alternative approach which relies on timestamps or update flags in the source data to selectively insert new rows or update existing rows in the destination table. Depending on the volume of data changed between loading intervals this approach can notably reduce loading duration. However, a reliable source of timestamps/flags is required, which may be difficult to prove (is the source application reliably maintaining all creation or update timestamps with all possible transcations?). Alternatively, trigger-based mechanisms are possible to reliably carry over INSERTs or UPDATEs from a source system into the data warehouse. Delta updates are highly specific to the local setting and relevant source data, so their implementation details are out of scope of this guide.

Especially during the design phase or when new data types are being added to an i2b2 warehouse, it may be necessary to frequently reload only a specific portion of the overall dataset (e.g. only lab data). If the OBSERVATION_FACT table has been partitioned (as described in the page on query optimization), individual partitions can be quickly truncated, which can be notably faster than issuing DELETE statements for individual records or wildcarded concept codes.

The following Oracle SQL statement selectively truncates the LAB partition (laboratory findings):

Code Block
languagesql
ALTER TABLE observation_fact TRUNCATE PARTITION lab;
Info

The script should be invoked with the credentials of the schema containing the OBSERVATION_FACT table.

Please note that the statement will delete all data in the LAB partition.