Uploaded image for project: 'i2b2 Core Software'
  1. i2b2 Core Software
  2. CORE-277

Temporal query exclusion problem

    XMLWordPrintable

Details

    • Bug
    • Status: New
    • Major
    • Resolution: Unresolved
    • None
    • TBD
    • CRC Cell
    • None
    • Rank:
      0|i003pz:
    • i2b2 Desktop (Workbench), Query Tool, Temporal Query Tool
    • Temporal Queries

    Description

      Symptoms and cause:
      -----------------------------
      SQL statements generated to run temporal query are copying nearly the entire fact table into temp tables 3 times (sequentially, and wipes it off first before creating a new one). The temp tables eat through the allotted space causing the query to fail.

      This occurs when the temporal query contains at least one term that is “excluded.” This is tested on a customized i2b2 instance with nearly 73000 patients. However, this affects i2b2 as well as the code that handles temporal queries are the same. The same query on the same set of patients would succeed in an environment where more temp space is allotted.

      Can it possibly be better optimized to reduce the space required?


      Discussion and Possible Fixes:
      ----------------------------------------------
      Both potential routes presented below require some careful thinking and decision as there does not seem to be an easy fix and both routes require changes to server side. Other ideas are welcome.

      1. One thought is that the non-excluded part of the temporal query can be pulled out and copied into the population portion (non-temporal) of the query so that the search space will be first narrowed by the population portion, reducing the number of rows that needs to be copied into the temp table.

      Client side can be made to automatically do this. However, preliminary tests (performed manually) showed that it had no effect – queries still failed. This is because the generated SQL statements do the temporal part of the query first and the population part later.

      Unless the server is optimized so that the population part is performed first and apply its results to the scope of the temporal part, this approach does not work.

      2. A second approach is to narrow the scope of the exclusion in temporal queries. Instead of allowing exclusions to be applied to each individual terms (fine-grained), we change it so that it applies only to temporal sequences (coarse-grained). This means instead of letting users look for patients who had (no A1C Test prior to a diabetes diagnosis), we ONLY allow users to look for patients who do not have (an A1C Test prior to a diabetes diagnosis). This can avoid copying the entire fact table into temp tables because we can run (A1C Test prior to a diabetes diagnosis) and then compute for its complement. In addition, by pulling out the NOT (exclusion), NOT becomes more natural to understood by the users. However, this approach also has some consequences: (1) It has reduced expressivity -- it cannot express, for example (no A1C Test followed by a diabetes diagnosis followed by another no A1C Test). (2) Both the server code and client code (web client + workbench) will need significant redesign/modification.

      Attachments

        Activity

          People

            Unassigned Unassigned
            tdw9 Taowei David Wang
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: