Overview
The Data Exporter functionality enables an i2b2 user to request patient data and create the data file for the patients found by a desired query. In the configured workflow, a table of variables is designed using create data table for export tool. The predefined data tables are then generated under Data Request breakdown types that can be selected in the Run Query dialog, After the query run, the data requests are displayed under the Data Request Manager tool. Users will be able to view their data requests as well as the status of the data file generation. When a data request is made, an e-mail is to a predefined data manager The data manager (Manager user role) processes the request by generating the datafile under the Data Request Manager tool. The patient data is exported as a file and stored in a specified location for retrieval.
The patient data file specifications are configured in the database using configuration parameters in the database.
Data Export Workflow
Data Table Creation - User process
The Create Data Table feature allows User to design a table of variables of interest using the Design table feature. The table can then be saved as a template that can be loaded and re-used to request data export.
Design Table
The DESIGN TABLE panel allows User to create a data table with list of variables of interest.
User logs into the web client and accesses Data Export tool under the Analysis Tools plugin
The design table panel displays list of predefined variables that will be included by default in the data export file. These predefined variables are Gender, Age, Race, and Ethnicity
From the Ontology Terms tab, drag and drop additional variables of interest onto the DESIGN TABLE grid. These variables will be added to the list of variables included in the data export. The predefined variables are locked and cannot be unchecked since they are required for every data table. Other variables can be deleted under Actions column to exclude them from the data set file.
Aggregation method for the variable can be applied by clicking on the Value in the Aggregation Method column to assign aggregate options to each variable
Aggregate Options
The availability of aggregation options depends on the type of concept it aggregates on.
Example:
Aggregation options for a non-numeric variable such as Diagnosis are Existence (Yes/No), Count: Number of Concepts, Count: Number of Dates, Count Number of Encounters, Count Number of Facts, Count Number of Providers, Date (First), Date (Most Recent), Most Frequent Concept (Codes)
Aggregation options for concepts that support numeric values such as labs include Average, Min, Max, etc. in addition to all the functions available to normal variables
Aggregation Options | Explanation |
Existence (Yes/No) | Whether the patient has an observation of this concept. This is the default option. |
Count: Number of Concepts | Total number of concepts |
Count: Number of Dates | Total number of dates for the participant |
Count: Number of Encounters | Total number of encounters for the participant |
Count: Number of Facts | Total number of observations |
Count: Number of Providers | Total number of providers for the participant |
Date: First Date | Date of earliest observation |
Date: Last Date | Date of the most recent observation |
Calc: First Value | Minimum value of all numeric values observations |
Calc: Last Value | Maximum value of all numeric values observations |
Preview Table
Table Design can be previewed to get a preview of the data export file that will be generated.
User clicks on Preview Table to verify the variables added are displayed in the data columns in the format desired.
Design and Preview is an iterative process where User to add and remove variables as well as to assign or edit Aggregate options
Save Table
- User clicks on Save Table
Load Table
Saved table definition enables the User to Load a saved table definition to be used for data export
- User clicks on Load menu option to display the list of your saved Table definitions. Under MY TABLES,
- Select desired table definition and clicks on LOAD button
The loaded table definition can be further refined and previewed before saving it for final datafile request
Data Request - User process
- User logs into web client and creates a query.
- In the Run Query dialog box, User created table definitions are displayed under Data Request(s) section
- Select one or more table definition Request checkboxes along with Patient set breakdown checked and click run query
4. Emails are automatically sent - one to the User's-email, to inform them the request has been made; and one to the specified manager user to inform them to generate the data file.
5. The table definition name with the status is displayed as one of the breakdown items in the previous query.
Data Request Manager - List User's Data Requests
After the Data request query finishes (in the previous query panel), the request details and the status of the request are logged under the Data Request Manager tool.
- Click on Tools=> Data Request Manager plugin
The Data Request Manager displays all the data requests user made along with their status
Only data requests that the User created from the data table definition breakdowns are displayed under the Request Data Manager.
View Data Request Details
2. Click on View Details
View Details displays the Request details as well as other options to
- View the Query (Click on Query ID number to display the previous query in the Find Patients window)
- View the Data Table Definition (Click on View under Data Request Type drop down Request item
- Option to Withdraw Request (Click on Withdraw Request button under Actions section.
- Option to enter Comments
Note tthat if a Data Request is Withdrawn, you will need to submit request in the Comments box to reverse the Status.
Data Export - Manager process
The User created data requests are managed by the Manager user using the Data Request Manager tool.
Data Request Manager - List Users Data Requests
- Click on Tools=> Data Request Manager plugin.
All the data requests submitted by the Users are listed under the List of Export Data Requests. The initial status of the requests is Submitted.
The Manager User can view the details of the Requests using VIEW DETAILS button and generate the Data files using Create File(s) button)
View Data Request Details
Click on View Details.
The View Details page is similar to the display as a Non-Manager user,. In addition the
Generate data file
The data export runs in the background and the file is generated in a specified folder
Example Export files (all data are fake)
User-created Demographics
User-created Diagnosis
Data Export Configuration
Design and Architecture
The i2b2 breakdown architecture has been modified to support new breakdown types for data request and export. HIVE_CELL_PARAMS has new parameters for global configuration, the QT_RESULT_TYPE table has new entries for query export types, and the QT_BREAKDOWN_PATH defines the request and export. These are XML documents containing e-mails to send and SQL to execute. These breakdown types display in the run query dialog. User e-mails are sent to the e-mail address of the currently logged-in use. Data Manager e-mails are sent to an address configured in HIVE_CELL_PARAMS, and the data manager e-mail in an export is also saved as a file in the export location. The export file generation location, also configured in HIVE_CELL_PARAMS, is on the local drive hosting the i2b2 application.
User Roles and Actions
User visibility to Data Table Creation, Data Request and Data Export is based on User role configuration. Configuration is managed by the Admin user using Admin Dashboard plugin.
Below is the list of Actions available based on the User role.
User Role | Create Data Table | Data Request | Data Request Manager View Details | Data Export/Generate Data file | Change Request Status |
---|---|---|---|---|---|
Obfuscated | Enabled | Enabled | Enabled | Disabled | Disabled |
Aggregate | Enabled | Enabled | Enabled | Disabled | Disabled |
Protected | Enabled | Enabled | Enabled | Disabled | Disabled |
LDS | Enabled | Enabled | Enabled | Disabled | Disabled |
Manager/No LDS | Enabled/disabled?since on LDS | Enabled/disabled? | Enabled | Enabled | Enabled |
Manager/LDS | Enabled | Enabled | Enabled | Enabled | Enabled |
Admin/noLDS | disabled?since on LDS | Enabled/Disabled? | Enabled? | Enabled | Enabled |
Admin | Enabled | Enabled | Enabled | Enabled | Enabled |
Database changes
In order to support the data Table design, a new table RPDO_TABLE_REQUEST is added. The metadata in Data Table design is stored in the table when the data Table is saved.
There are 4 patient_dimension default concepts that are loaded into the RPDO_TABLE_REQUEST which are required. These are: race_cd (ethnicity), reace_cd(race), age_in_years_num, sex_cd and gender
Any other user created concepts in the data table design will be stored in the RPDO_TABLE_REQUEST. Below is the list that correspond to values in the Data Table
RPDO_TABLE_REQUEST PARAMETERS | DATA TABLE Values |
---|---|
Table_Request_id | auto generated incremental value, corresponding to each concept in the data table. Unique for each concept |
Table_instance_id | unique for each Data table, auto generated incremental value( same value for all the concepts underneath) |
Configuring Database Parameters
Data table definition details
QT_QUERY_RESULT_TYPE
The data table definitions are logged as breakdowns in the QT_QUERY_RESULT_TYPE table. The values are populated dynamically when the user saves the data table definition.
Following values are stored
RESULT_TYPE_ID: <TABLE_INSTANCE_ID>
NAME: Auto generated number: RPDO_<n>
VISUAL_ATTRIBUTE_TYPE_ID is set to LU
USER_ROLE_CD is set to DATA_LDS /NULL?
DISPLAY_TYPE_ID: CATNUM
CLASSNAME:
Example:
RESULT_TYPE_ID | NAME | DESCRIPTION | DISPLAYTYPE_ID | VISUAL_ATTRIBUTE_TYPE_ID | USER_ROLE_CD | CLASSNAME |
1159 | RPDO_78 | User-created Demographics Data Request | CATNUM | LU | DATA_LDS | edu.harvard.i2b2.crc.dao.setfinder.QueryResultPatientRequest |
We recommend data requests be limited to DATA_LDS users (irrespective of Admin path role - admin, user, manager) and data exports be limited to MANAGER users (irrespective of data path role - data_lds, data_agg,data_obf, data_deid). The user role determines if exports/requests are visible and runnable.
QT_QUERY_RESULT_TYPE | user role | User-created data request | data Request Manager | Visibility/Actions |
---|---|---|---|---|
DATA_LDS | DATA_LDS | Success | Success | Only user-created requests |
MANAGER with DATA_LDS | DATA_LDS | Success | Success | All user-created requests |
MANAGER without DATA_LDS | MANAGER | ? | ? | ? |
Admin | Admin | ? | ? | ? |
Data Request definition details
QT_BREAKDOWN_PATH
Qt_breakdown_path logs the table definition value and the result_instance_id. to be associated with the ptset. query.
The table definition is logged as the table_instance_id when the user saves the table_definition. The result_instance_id is numeric value of the table_instance_id,
NAME: <Table_Instance_id>
VALUE: <Result_instance_id>
Example:
NAME | VALUE |
---|---|
RPDO_78 | EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=78 @RESULT_INSTANCE_ID={{{RESULT_INSTANCE_ID}}} @MIN_ROW=0 @MAX_ROW=10000 |
Data file format/location and Email parameters set-up: HIVE_CELL_PARAMS
The generated Data file format, location and email server parameters are configurable in the HIVE_CELL_PARAMS.
The <Filename> parameter in the QT_BREAKDOWN_PATH definition will override the default value specified in the HIVE_CELL_PARAMS.
Wildfly must be restarted for changes to the HIVE_CELL_PARAMs to take effect.
Cell ID | Parameter Name | Example Value | Notes |
CRC | edu.harvard.i2b2.crc.exportcsv.datamanageremail | userid@partners.org | Email address used for sending request/export e-mails |
CRC | edu.harvard.i2b2.crc.smtp.host | smtp.partners.org | SMTP host |
CRC | edu.harvard.i2b2.crc.smtp.port | 25 | SMTP port |
CRC | edu.harvard.i2b2.crc.smtp.ssl.enabled | FALSE | TRUE will enable SSL |
CRC | edu.harvard.i2b2.crc.smtp.auth | FALSE | TRUE will enable SMTP authentication |
CRC | edu.harvard.i2b2.crc.smtp.username | none | SMTP username (required for SMTP authentication) |
CRC | edu.harvard.i2b2.crc.smtp.password | none | SMTP password (required for SMTP authentication) |
CRC | edu.harvard.i2b2.crc.smtp.enabled | FALSE | TRUE will enable e-mails |
CRC | edu.harvard.i2b2.crc.smtp.from.fullname | Data Manager | Name that e-mails will be sent from. |
CRC | edu.harvard.i2b2.crc.smtp.from.email | datamanager@site.org | E-mail address that e-mails will be sent from. |
CC | edu.harvard.i2b2.crc.smtp.subject | i2b2 Data Request | Subject line for e-mails. |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultescapecharacter | " | Escape character for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.maxfetchrows | -1 | Maximum number of rows to export, or -1 for no limit |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultlineend | \n | Line ending for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultseperator | \t | Field separator for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.resultfetchsize | 50000 | Number of records retrieved during each database fetch. |
CRC | edu.harvard.i2b2.crc.exportcsv.filename | {{{PROJECT_ID}}}/{{{DATE_yyyyMMdd}}}_{{{FULL_NAME}}}.tsv | Parameterized template for export file names. If the extension is .zip, the file is zipped. |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultquotechar | " | Quote character for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.workfolder | /tmp/i2b2 | Folder on the i2b2 server for data exports |
CRC | edu.harvard.i2b2.crc.exportcsv.zipencryptmethod | none | Encryption method for the exported ZIP file. One of STANDARD, NONE, or AES. |
Request and Export Results:
Metadata about data request and export queries are stored in the QT_XML_RESULT table, in the XML_VALUE field:
Query Type | XML_VALUE example |
---|---|
Data Request | <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <ns10:i2b2_result_envelope xmlns:ns6="http://www.i2b2.org/xsd/cell/crc/psm/querydefinition/1.1/" xmlns:ns5="http://www.i2b2.org/xsd/hive/msg/1.1/" xmlns:ns8="http://www.i2b2.org/xsd/cell/pm/1.1/" xmlns:ns7="http://www.i2b2.org/xsd/cell/crc/psm/analysisdefinition/1.1/" xmlns:ns9="http://www.i2b2.org/xsd/cell/ont/1.1/" xmlns:ns10="http://www.i2b2.org/xsd/hive/msg/result/1.1/" xmlns:ns2="http://www.i2b2.org/xsd/hive/pdo/1.1/" xmlns:ns4="http://www.i2b2.org/xsd/cell/crc/psm/1.1/" xmlns:ns3="http://www.i2b2.org/xsd/cell/crc/pdo/1.1/"> <body> <ns10:result name="PATIENT_DEMOGRAPHIC_REQUEST"> <data column="patientCount" type="int">1</data> <data column="RequestEmail" type="string">This user act in project NODE9 requested i2b2 request entitled - "ACT_Vaccination_14_53_14", submitted on May 22, 2024 2:53:13 PM, with the query master of 1492. </data> <data column="DataManagerEmail" type="string">datamanager@site.org</data> </ns10:result> </body> </ns10:i2b2_result_envelope> |
Data Export | <?xml version="1.0" encoding="UTF-8" standalone="yes"?> |
Software Changes:
- Data: New entries in HIVE_CELL_PARAMS, QT_QUERY_RESULT_TYPE, and QT_BREAKDOWN_PATH define the exporter configuration.
- Java code: New breakdown classes and updates to existing java classes to support the data exporter functionality.