IDRT - Integrated Data Repository Toolkit
Space shortcuts
Space Tools

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note

Please note that the database will store all statements running up to the next commit in a temporary area (e.g. the TEMP tablespace), which needs to be adequately sized to avoid tablespace overflows.

Using a dedicated bulk loader

Many databases provide a dedicated "bulk loader" tool which is optimized for loading large amounts of source data "en bloc" into a table, e.g. Oracle SQL*Loader. Such loaders often provide additional performance boosts through bypassing transaction- and consistency mechanisms (e.g. "DIRECT PATH" and "UNRECOVERABLE" options), which effectively stream data directly onto the disk. In a data warehouse setting, these kinds of optimizations are generally considered safe to use as the database is not used in a transactional setting and write operations are centrally controlled by an ETL pathway. Use of a bulk loader with enabled optimizations can lead to notable performance improvements of up to 100x when compared to individual INSERT statements.

Bulk loaders usually require CSV source files containing the raw data to be loaded and a separate controlfile containing all required information about the destination table and columns as well as optimization options to be applied.

ETL tools often provide access to bulk loaders through dedicated nodes or tasks. E.g. Talend Open Studio provides an insert node as well as a bulk loader node which are interchangeable. 

Info

It should be noted that bulk loaders are database specific. If compatibility across several databases is required, it may be necessary to either create several adapted versions of the ETL pathway (affecting only the "final" step of actually loading generated CSV files) or to not use a bulk loader. E.g., the IDRT pathways stick to JDBC components to achieve greater cross-database compatibility)