Overview
The Data Exporter functionality enables an i2b2 user to create a data table definition and request patient data for the desired query. In the data table workflow, a table of variables is designed using the Create Data Table for Export tool. The predefined data tables are then displayed under the Data Request(s) breakdown types that can be selected in the Run Query dialog. After the query runs, the Manager user can view and manage the data requests under the Data Request Manager tool. i2b2 Users can view the status and details of their data requests. Emails are generated for both the data Manager and the i2b2 User when the request is submitted. The data Manager processes the request by generating the data file under the Data Request Manager tool. The patient data is exported as a file and stored in a specified location for retrieval.
The patient data file is generated according to the data table definition.
Data Export Workflow
Data Table Creation - User process
The Create Data Table feature allows User to design a table of variables of interest using the Design table feature. The table can then be saved as a template that can be loaded and re-used to request data export.
Design Table
The DESIGN TABLE panel allows i2b2 User to create a data table definition with list of variables of interest.
User logs into the web client and accesses create data table for export tool under the Tools plugin
The DESIGN TABLE panel displays list of predefined variables that will be included by default in the data export file. These predefined variables are Gender, Age, Race
From the Ontology Terms tab, drag and drop additional variables of interest onto the DESIGN TABLE grid. These variables will be added as a list of variables included in the data export. The predefined variables are locked and cannot be unchecked since they are required for every data table. Other variables can be deleted under Actions column to exclude them from the data file.
Aggregation method for the variable can be applied by clicking on the drop down box in the Aggregation Method column to assign aggregate options to each variable
Aggregate Options
The availability of aggregation options depends on the type of concept it aggregates on.
Example:
Aggregation options for a non-numeric variable such as Diagnosis are Existence (Yes/No), Count: Number of Concepts, Count: Number of Dates, Count: Number of Encounters, Count: Number of Facts, Count: Number of Providers, Date (First), Date (Most Recent), and Most Frequent Concept (Codes).
Aggregation options for concepts that support numeric values such as labs include Average, Min, Max, etc. in addition to all the functions available to normal variables
Aggregation Options | Explanation |
---|---|
Existence (Yes/No) | Whether the patient has an observation of this concept. This is the default option. |
Count: Number of Concepts | Total number of concepts |
Count: Number of Dates | Total number of dates for the participant |
Count: Number of Encounters | Total number of encounters for the participant |
Count: Number of Facts | Total number of observations |
Count: Number of Providers | Total number of providers for the participant |
Date: First Date | Date of earliest observation |
Date: Last Date | Date of the most recent observation |
Calc: First Value | Value of the earliest numeric observation |
Calc: Last Value | Value of the most recent numeric observation |
Calc: Number of Values | Number of numeric observations |
Calc: Average Value | Average of all numeric observations |
Calc: Minimum Value | Lowest of all numeric observations |
Calc: Maximum Value | Highest of all numeric observations |
Calc: Median Value | Median of all numeric observations |
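To make the options in the table above concrete, the following is a minimal Python sketch of how several of them could be computed for one patient and one concept. It is an illustration only, not the exporter's implementation (which runs as a database stored procedure); the record layout, the sample values, and the `aggregate` function are assumptions introduced here.

```python
from datetime import date
from statistics import mean, median

# Hypothetical observations for one patient and one concept:
# (observation date, encounter id, numeric value or None).
observations = [
    (date(2024, 1, 5), "E1", 180.0),
    (date(2024, 3, 9), "E2", 210.0),
    (date(2024, 3, 9), "E2", 195.0),
]


def aggregate(obs, option):
    """Return the value of one aggregation option for a list of observations."""
    values = [v for _, _, v in obs if v is not None]
    ordered = sorted(obs, key=lambda o: o[0])
    if option == "Existence (Yes/No)":
        return "Yes" if obs else "No"
    if option == "Count: Number of Facts":
        return len(obs)
    if option == "Count: Number of Dates":
        return len({d for d, _, _ in obs})
    if option == "Count: Number of Encounters":
        return len({e for _, e, _ in obs})
    if option == "Date: First Date":
        return ordered[0][0] if ordered else None
    if option == "Date: Last Date":
        return ordered[-1][0] if ordered else None
    if option == "Calc: Number of Values":
        return len(values)
    if option == "Calc: Average Value":
        return mean(values) if values else None
    if option == "Calc: Minimum Value":
        return min(values) if values else None
    if option == "Calc: Maximum Value":
        return max(values) if values else None
    if option == "Calc: Median Value":
        return median(values) if values else None
    raise ValueError(f"unsupported option: {option}")
```

For the sample records, `aggregate(observations, "Count: Number of Facts")` counts three observations, while the date and encounter counts are both two because two observations share a date and an encounter.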
Preview Table
The table design can be previewed to see the layout of the data export file that will be generated.
User clicks on PREVIEW TABLE to verify that the added variables are displayed as data columns in the desired format.
Design and Preview is an iterative process in which the User can add and remove variables as well as assign or edit Aggregate options.
Save Table
- Click on Save
- The table definition can be either saved under MY TABLES or PROJECT SHARED TABLES
SYSTEM SHARED TABLES is available only to Admins, who use it to make predefined templates available to i2b2 Users.
MY TABLES and PROJECT SHARED TABLES table definitions are editable. SYSTEM SHARED TABLES definitions are non-editable.
Load Table
Load Table enables the User to load a saved table definition to be used for data export.
- User clicks on the Load menu option to display Load Table Definition with the list of saved definitions
- User selects the desired table definition and clicks on the LOAD button
Table definitions loaded this way can be further refined and saved before submitting a data request.
Data Request - User process
Request 1.8.2 data exports
- User logs into web client and creates a query.
- In the Run Query dialog box, User created table definitions are displayed under Data Request(s) section
- Select one or more User Created:<table definition Request> checkboxes
Request 1.8.1 data exports
User can now request data for 1.8.1 data request breakdowns along with 1.8.2 requests.
4. Select 1.8.1 data request check boxes
5. Click run query
6. Emails are sent automatically: one to the User, confirming that the request has been made, and one to the data Manager, informing them of the User's request.
7. The table definition name with the FINISHED status is displayed as one of the breakdown items in the previous query results.
Data Request Manager
User's Data Requests List
User submitted data requests (both 1.8.2 and 1.8.1) are logged under the Data Request Manager tool.
- Click on Tools=> Data Request Manager plugin
The Data Request Manager displays data requests along with their status and details
Only the User's own data requests are displayed under the Data Request Manager.
View Data Request Details
2. Click on View Details
View Details displays the Request details and Request status as well as other options
- Query Name (Click on Query ID number to display the previous query in the Find Patients window)
- Data Table Definition (Click on View under the Data Request Type drop down Request item)
- Option to Withdraw Request
- Option to enter Comments
- Log info box displays the User's actions and a log of the Request status
Note that if a Data Request is Withdrawn, you will need to submit the request in the Comments box to reverse the Status.
Data Export - Manager process
The User created data requests are managed by the data Manager using the Data Request Manager tool.
Data Request Manager
Data Requests List
- Click on Tools=> Data Request Manager plugin.
The list of User submitted data requests is displayed. The initial status of each request is Submitted.
The Manager User can view the details of the Requests using the VIEW DETAILS button and generate the data files using the Create File(s) button.
View Data Request Details
Click on View Details.
The View Details panel is similar to the one displayed for a non-Manager user, except that the Manager User can change the Data Request Status.
The Log info box displays the status of the Data Request as well as the Data File creation status.
Create Data File(s)
Click on Create File(s) on the Data Request Manager page
The data export runs in the background and the file is generated in the folder defined in HIVE_CELL_PARAMS. The data file generation status is displayed in the Status column of the Data Request Manager.
The status changes as the data file is processed, from Submitted to File in Progress to File Available.
Data Request Status | Data column Value |
---|---|
Submitted | Submitted |
Withdrawn | Cancelled |
Denied | Incomplete |
File in Progress | Queued |
File available | File available |
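As a small illustration, the status table above can be expressed as a lookup, together with the progression described for a successful export. This is a sketch only; the names `STATUS_TO_DATA_COLUMN`, `EXPORT_PROGRESSION`, and `next_status` are hypothetical and do not come from the i2b2 code base.

```python
# Request status shown in the Data Request Manager, mapped to the
# "Data column Value" listed in the table above.
STATUS_TO_DATA_COLUMN = {
    "Submitted": "Submitted",
    "Withdrawn": "Cancelled",
    "Denied": "Incomplete",
    "File in Progress": "Queued",
    "File available": "File available",
}

# Progression described for a request that ends with a generated file.
EXPORT_PROGRESSION = ["Submitted", "File in Progress", "File available"]


def next_status(current):
    """Return the next status of a successful export, or None at the end.

    Raises ValueError for statuses outside the progression
    (for example Withdrawn or Denied).
    """
    i = EXPORT_PROGRESSION.index(current)
    return EXPORT_PROGRESSION[i + 1] if i + 1 < len(EXPORT_PROGRESSION) else None
```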
Example Export files (all data are fake)
2 sets of files are generated for each data export.
Definition file: has the variable names and data types for the data file
Data file: has the values for the variables
User-created Demographics
Definition file
Example:1969_Definition_For_Demographics_8_5_25.csv
TABLE_INSTANCE_ID | TABLE_INSTANCE_NAME | USER_ID | GROUP_ID | SET_INDEX | C_FACTTABLECOLUMN | C_TABLENAME | COLUMN_NAME | C_FULLPATH | C_COLUMNNAME | C_COLUMNDATATYPE | C_OPERATOR | C_DIMCODE | AGG_TYPE | CONSTRAIN_BY_DATE_TO | CONSTRAIN_BY_DATE_FROM | CONSTRAIN_BY_VALUE_OPERATOR | CONSTRAIN_BY_VALUE_CONSTRAINT | CONSTRAIN_BY_VALUE_UNIT_OF_MEASURE | CONSTRAIN_BY_VALUE_TYPE | CREATE_DATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | DEM_6_27 | demoManager | SQLServerLarge | 0 | sex_cd | patient_dimension | Gender | @ | @ | @ | @ | @ | Value | 37:14.3 | ||||||
7 | DEM_6_27 | demoManager | SQLServerLarge | 1 | age_in_years_num | patient_dimension | Age | @ | @ | @ | @ | @ | Value | 37:14.3 | ||||||
7 | DEM_6_27 | demoManager | SQLServerLarge | 2 | race_cd | patient_dimension | Race | @ | @ | @ | @ | @ | Value | 37:14.3 | ||||||
7 | DEM_6_27 | demoManager | SQLServerLarge | 3 | patient_num | patient_dimension | (children) < 18 years old | \\ACT_DEMO\ACT\Demographics\Age\< 18 years old\ | birth_date | N | > | dateadd(YY, -18, GETDATE()) | Exists | 37:14.3 |
Data file:
Example: 1969_Demographics_8_5_25_08052025.cs
patient_num | Gender | Age | Race | (children) < 18 years old | 75-84 years old | >= 65 years old |
---|---|---|---|---|---|---|
26 | M | 72 | @ | No | Yes | Yes |
53 | F | 55 | WHITE | No | No | No |
87 | F | 59 | WHITE | No | No | No |
92 | F | 88 | DECLINED | No | No | Yes |
94 | F | 67 | ASIAN | No | No | Yes |
153 | M | 77 | @ | No | Yes | Yes |
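A downstream consumer might read the data file as follows. This is a minimal sketch using Python's standard `csv` module; the rows are copied from the example above and written as comma-separated text, which is an assumption (the actual separator, quote character, and line ending depend on the site's HIVE_CELL_PARAMS settings).

```python
import csv
import io

# Rows copied from the example data file, written here as comma-separated
# text. The real separator depends on the site configuration.
sample = """patient_num,Gender,Age,Race,(children) < 18 years old,75-84 years old,>= 65 years old
26,M,72,@,No,Yes,Yes
53,F,55,WHITE,No,No,No
87,F,59,WHITE,No,No,No
92,F,88,DECLINED,No,No,Yes
94,F,67,ASIAN,No,No,Yes
153,M,77,@,No,Yes,Yes
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Patients flagged as 65 or older by the exported Yes/No column.
flagged = [r["patient_num"] for r in rows if r[">= 65 years old"] == "Yes"]

# The same set recomputed from the Age column, as a consistency check.
by_age = [r["patient_num"] for r in rows if int(r["Age"]) >= 65]
```

Comparing `flagged` with `by_age` is a quick way to confirm that an Existence (Yes/No) column agrees with the demographic values in the same file.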
User-created Labs
Definition file
Example: 2282_Definition_For_LAB_TESTS_HUGEPT_AGGOPTIONS
TABLE_INSTANCE_ID | TABLE_INSTANCE_NAME | USER_ID | GROUP_ID | SET_INDEX | C_FACTTABLECOLUMN | C_TABLENAME | COLUMN_NAME | C_FULLPATH | C_COLUMNNAME | C_COLUMNDATATYPE | C_OPERATOR | C_DIMCODE | AGG_TYPE | CONSTRAIN_BY_DATE_TO | CONSTRAIN_BY_DATE_FROM | CONSTRAIN_BY_VALUE_OPERATOR | CONSTRAIN_BY_VALUE_CONSTRAINT | CONSTRAIN_BY_VALUE_UNIT_OF_MEASURE | CONSTRAIN_BY_VALUE_TYPE | CREATE_DATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
310 | LAB_TESTS_HUGEPT_AGGOPTIONS | act | NODE9 | 0 | sex_cd | patient_dimension | Gender | @ | @ | @ | @ | @ | Value | 25:57.3 | ||||||
310 | LAB_TESTS_HUGEPT_AGGOPTIONS | act | NODE9 | 1 | age_in_years_num | patient_dimension | Age | @ | @ | @ | @ | @ | Value | 25:57.3 | ||||||
310 | LAB_TESTS_HUGEPT_AGGOPTIONS | act | NODE9 | 2 | race_cd | patient_dimension | Race | @ | @ | @ | @ | @ | Value | 25:57.3 | ||||||
310 | LAB_TESTS_HUGEPT_AGGOPTIONS | act | NODE9 | 3 | race_cd | patient_dimension | Ethnicity | @ | @ | @ | @ | @ | Value | 25:57.3 | ||||||
310 | LAB_TESTS_HUGEPT_AGGOPTIONS | act | NODE9 | 4 | concept_cd | concept_dimension | Cholesterol (Group:CHOL) | \\i2b2_LABS\i2b2\Labtests\LAB\(LLB16) Chemistry\(LLB17) Lipid Tests\CHOL\ | concept_path | N | like | \i2b2\Labtests\LAB\(LLB16) Chemistry\(LLB17) Lipid Tests\CHOL\ | NumFacts | 25:57.3 |
Data file:
Example: 2282_LAB_TESTS_HUGEPT_AGGOPTIONS_08122025
patient_num | Gender | Age | Race | Ethnicity | Cholesterol (Group:CHOL) |
---|---|---|---|---|---|
9 | M | 49 | white | white | 1 |
89 | F | 57 | white | white | 1 |
13 | F | 53 | white | white | 1 |
21 | F | 72 | white | white | 7 |
77 | F | 80 | white | white | 1 |
Data Export Configuration
Design and Architecture
The i2b2 breakdown architecture is modified to support the data table definition and new breakdown types for User created data requests. The database tables have been modified to support the data table definition parameters
- RPDO_TABLE_REQUEST table stores the default data table parameters
- HIVE_CELL_PARAMS has new parameters for global and email configurations and the data file generation location (currently on local drive)
- QT_QUERY_RESULT_TYPE table logs new entries for data table definition breakdown types
- QT_BREAKDOWN_PATH logs the data table definition and the data export execution details.
- QT_XML_RESULT logs XML documents containing data request status and e-mail details.
User Roles and Actions
User visibility to Data Table Creation, Data Request and Data Export is based on User role configuration. Configuration is managed by the Admin user using Admin Dashboard plugin.
Below is the list of Actions available based on the User role. (future implementation)
User Role | Create Data Table | Data Request | Data Request Manager View Details | Data Export/Generate Data file | Change Request Status |
---|---|---|---|---|---|
Obfuscated | Enabled | Enabled | Enabled | Disabled | Disabled |
Aggregate | Enabled | Enabled | Enabled | Disabled | Disabled |
Protected | Enabled | Enabled | Enabled | Disabled | Disabled |
LDS | Enabled | Enabled | Enabled | Disabled | Disabled |
Manager/No LDS | Enabled | Enabled | Enabled | Disabled | Disabled |
Manager/LDS | Enabled | Enabled | Enabled | Enabled | Enabled |
Admin | Enabled | Enabled | Enabled | Enabled | Enabled |
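The role table above reduces to a simple rule: every role can create tables, submit requests, and view details, while only Manager/LDS and Admin can generate files or change a request's status. A minimal sketch of that rule follows; the action identifiers and the `is_enabled` function are hypothetical names chosen for illustration, and the real checks are performed by i2b2 itself.

```python
ROLES = {
    "Obfuscated", "Aggregate", "Protected", "LDS",
    "Manager/No LDS", "Manager/LDS", "Admin",
}

# Hypothetical identifiers for the five action columns of the table.
ACTIONS = (
    "create_data_table",
    "data_request",
    "view_details",
    "generate_data_file",
    "change_request_status",
)

# Only these roles have the last two columns Enabled in the table.
EXPORT_ROLES = {"Manager/LDS", "Admin"}


def is_enabled(role, action):
    """Return True if the table above marks the action as Enabled for the role."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action in ("generate_data_file", "change_request_status"):
        return role in EXPORT_ROLES
    return True
```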
Database Configuration changes
RPDO_TABLE_REQUEST
In order to support the data Table design, a new table RPDO_TABLE_REQUEST is added. The metadata in Data Table design is stored in the table when the data Table is saved.
Four required patient_dimension default rows are loaded into RPDO_TABLE_REQUEST. These are: vital_status_cd, race_cd, age_in_years_num, and sex_cd (gender).
The default rows insert script is provided in the data install folder of the release
QT_BREAKDOWN_PATH
QT_Breakdown_Path table has been modified to include a new column Group_id. It logs the Project_id value.
Column Name | Data Type |
---|---|
Group_Id | Varchar (50) |
Database updates- Data table definition creation
New rows are dynamically logged in db tables when the user creates the data table definition or submits a data request
RPDO_TABLE_REQUEST
Data table design variables and values are dynamically inserted in the RPDO_TABLE_REQUEST at the time of data table Definition Save.
RPDO_TABLE_REQUEST PARAMETERS | DATA TABLE Values |
---|---|
TABLE_REQUEST_ID | auto generated incremental value, corresponding to each concept in the data table. Unique for each concept example: 1282,1283, 1284 |
TABLE_INSTANCE_ID | unique for each Data table ( same value for all the concepts underneath) example: 17 |
TABLE_INSTANCE_NAME | Data table definition name |
USER_ID | data request user |
GROUP_ID | project id |
SET_INDEX | auto incremented for each concept name for the table instance |
C_FACTTABLECOLUMN | concept_cd value in fact table or patient_num value in patient_dimension |
C_TABLENAME | corresponding to table storing concept_cd/patient_num( concept_dimension or patient_dimension ) |
COLUMN_NAME | corresponds to value in column_name of data table definition |
C_FULLPATH | corresponds to metadata table column value |
C_COLUMNNAME | value in concept_path column in concept_dimension/patient_dimension |
C_COLUMNDATATYPE | corresponds to metadata table column value |
C_OPERATOR | corresponds to metadata table column value |
C_DIMCODE | corresponds to metadata table column value |
AGG_TYPE | corresponds to Aggregate option selection in data table definition |
CONSTRAIN_BY_DATE_TO | corresponds to date constraint option selection in data table definition |
CONSTRAIN_BY_DATE_FROM | corresponds to date constraint option selection in data table definition |
CONSTRAIN_BY_VALUE_OPERATOR | corresponds to set value constraint selection in data table definition |
CONSTRAIN_BY_VALUE_CONSTRAINT | corresponds to set value constraint selection in data table definition |
CONSTRAIN_BY_VALUE_UNIT_OF_MEASURE | corresponds to set value constraint selection in data table definition |
CONSTRAIN_BY_VALUE_TYPE | corresponds to set value constraint selection in data table definition |
CREATE_DATE | corresponds to the data table definition creation date |
SHARED | Y (for project and system shared table definitions); N (User table definition) |
DELETE_FLAG | Y or N |
C_VISUALATTRIBUTES | values are LA or LH (based on visibility table definitions panel) |
QT_QUERY_RESULT_TYPE
The data table definition is logged as a breakdown type in the QT_QUERY_RESULT_TYPE table when the user saves the data table definition.
RESULT_TYPE_ID | NAME | DESCRIPTION | DISPLAYTYPE_ID | VISUAL_ATTRIBUTE_TYPE_ID | USER_ROLE_CD | CLASSNAME |
---|---|---|---|---|---|---|
Auto generated number example : 144 | RPDO_<TABLE_INSTANCE_ID> example: RPDO_17 | User-created <Table_Name>Request | CATNUM | LU (User and system shared tables) LP (Project shared tables) | DATA_LDS | edu.harvard.i2b2.crc.dao.setfinder.QueryResultPatientRequest |
We recommend that data requests be limited to DATA_LDS users and data exports be limited to MANAGER users. The user role determines whether exports and requests are visible and runnable. Refer to the section on User Roles and Actions.
QT_BREAKDOWN_PATH
A new row is logged when the user saves the data table definition.
- Name column logs the Table_instance_id when the user saves the table_definition.
- Value column logs the EXEC statement of the stored procedure that generates the data file.
- Group_id has the value of project_id
- The result_instance_id variable is updated with the numeric value from QT_QUERY_RESULT_INSTANCE where result_type_id (Ptset) = 1 at the time the data file is created.
Example:
NAME | VALUE | Group_id |
---|---|---|
RPDO_<Table_instance_id> Example: RPDO_17 | EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=<Table_instance_id>, @RESULT_INSTANCE_ID={{{RESULT_INSTANCE_ID}}}, @MIN_ROW=0, @MAX_ROW=10000 | SQLServerLarge |
Database updates- Data request submission
1.8.2 requests
QT_QUERY_RESULT_INSTANCE
When the User runs a query with a User created Data Request breakdown option selected, a row is logged in QT_QUERY_RESULT_INSTANCE for the result_type_id generated in QT_QUERY_RESULT_TYPE.
Column Name | Value (Example) |
---|---|
Result_Type_id | 144 |
Result_instance_id | 1729 |
QT_BREAKDOWN_PATH
The EXEC statement in Value column is updated with value for table_instance_id
Example:
NAME | VALUE |
---|---|
RPDO_<Table_instance_id> Example: RPDO_17 | EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=17, @RESULT_INSTANCE_ID={{{RESULT_INSTANCE_ID}}}, @MIN_ROW=0, @MAX_ROW=99999999 |
{{{RESULT_INSTANCE_ID}}} is replaced with QT_QUERY_RESULT_INSTANCE.Result_instance_id when the stored procedure is executed in the back end at the time of data file creation.
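The placeholder substitution can be pictured with a short Python sketch. This is not the CRC cell's actual code; `resolve_exec` is a hypothetical helper, and the template string is written with comma-separated parameters as T-SQL requires.

```python
PLACEHOLDER = "{{{RESULT_INSTANCE_ID}}}"


def resolve_exec(template, result_instance_id):
    """Replace the placeholder with a numeric result_instance_id.

    The int() conversion rejects non-numeric input, so arbitrary text
    cannot be spliced into the statement.
    """
    if PLACEHOLDER not in template:
        raise ValueError("template has no RESULT_INSTANCE_ID placeholder")
    return template.replace(PLACEHOLDER, str(int(result_instance_id)))


# Template modeled on the VALUE column example; values are from this page.
template = (
    "EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=17, "
    "@RESULT_INSTANCE_ID={{{RESULT_INSTANCE_ID}}}, @MIN_ROW=0, @MAX_ROW=99999999"
)
statement = resolve_exec(template, 1729)
```

Because the stored template keeps the placeholder, the same QT_BREAKDOWN_PATH row can be reused for every query that requests the same table definition.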
QT_XML_RESULT
A row is logged for the result_instance_id of the data request submitted at the time of query run. Metadata about data request queries are stored in the QT_XML_RESULT table, in the XML_VALUE field. The metadata gets updated as the data export process is completed (from data request submission to data file creation)
Column Name | Value (example) |
---|---|
Result_Instance_Id | 1729 |
XML_Value | <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <ns10:i2b2_result_envelope xmlns:ns6="http://www.i2b2.org/xsd/cell/crc/psm/analysisdefinition/1.1/" xmlns:ns5="http://www.i2b2.org/xsd/hive/msg/1.1/" xmlns:ns8="http://www.i2b2.org/xsd/cell/pm/1.1/" xmlns:ns7="http://www.i2b2.org/xsd/cell/crc/psm/querydefinition/1.1/" xmlns:ns9="http://www.i2b2.org/xsd/cell/ont/1.1/" xmlns:ns10="http://www.i2b2.org/xsd/hive/msg/result/1.1/" xmlns:ns2="http://www.i2b2.org/xsd/hive/pdo/1.1/" xmlns:ns4="http://www.i2b2.org/xsd/cell/crc/psm/1.1/" xmlns:ns3="http://www.i2b2.org/xsd/cell/crc/pdo/1.1/"> <body> <ns10:result name="RPDO_17"> <data column="SUBMITTED" type="string">20250808_125429</data> <data column="EMAIL" type="string">rmetta@mgb.org</data> <data column="QUEUED" type="string">20250808_125631</data> <data column="PROCESSING" type="string">20250808_125631</data> <data column="DIRECTORY" type="string">/opt/dataexport/SQLServerLarge/724</data> <data column="FINISHED" type="string">20250808_125633</data> <data column="APPROVEDBY" type="string">demoManager</data> </ns10:result> </body> </ns10:i2b2_result_envelope> |
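To inspect a request's progress outside the web client, the XML_VALUE document can be parsed with a standard XML library. The sketch below uses Python's `xml.etree.ElementTree` on an abbreviated copy of the example (unused namespace declarations and the e-mail entry are omitted); `read_request_status` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RESULT_NS = "http://www.i2b2.org/xsd/hive/msg/result/1.1/"

# Abbreviated copy of the XML_VALUE example above.
xml_value = (
    '<ns10:i2b2_result_envelope xmlns:ns10="' + RESULT_NS + '">'
    "<body>"
    '<ns10:result name="RPDO_17">'
    '<data column="SUBMITTED" type="string">20250808_125429</data>'
    '<data column="QUEUED" type="string">20250808_125631</data>'
    '<data column="PROCESSING" type="string">20250808_125631</data>'
    '<data column="DIRECTORY" type="string">/opt/dataexport/SQLServerLarge/724</data>'
    '<data column="FINISHED" type="string">20250808_125633</data>'
    '<data column="APPROVEDBY" type="string">demoManager</data>'
    "</ns10:result>"
    "</body>"
    "</ns10:i2b2_result_envelope>"
)


def read_request_status(xml_text):
    """Return (result name, {column: value}) from an XML_VALUE document."""
    root = ET.fromstring(xml_text)
    # <body> and <data> are un-namespaced; <result> is in the result namespace.
    result = root.find(f"body/{{{RESULT_NS}}}result")
    if result is None:
        raise ValueError("no result element found")
    columns = {d.get("column"): d.text for d in result.findall("data")}
    return result.get("name"), columns


name, columns = read_request_status(xml_value)
```

The presence of a FINISHED entry in `columns` indicates that file creation has completed, and DIRECTORY gives the folder in which the file was written.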
1.8.1 Requests
Three new default rows are logged in RPDO_TABLE_REQUEST when a user runs a query and submits a 1.8.1 data request for the first time. This enables the 1.8.1 requests to be displayed on the Data Request Manager panel along with 1.8.2 requests. Subsequent requests use the same default rows for display.
RPDO_TABLE_REQUEST
QT_QUERY_RESULT_TYPE
A row is logged in QT_QUERY_RESULT_TYPE for the default table definition. Subsequent data request submissions use the same row for the default rows' result_type_id.
QT_QUERY_RESULT_INSTANCE
A row is logged in QT_QUERY_RESULT_INSTANCE for the same result_type_id generated in QT_QUERY_RESULT_TYPE every time the user submits a request.
QT_XML_RESULT
A row is logged for the result_instance_id of the data request submitted at the time of the query run. Metadata about data request queries (including the user's email address) is stored in the XML_VALUE field of the QT_XML_RESULT table. The metadata is updated as the data export process progresses from data request submission to data file creation.
Data file creation
1.8.2 Data file creation
Create File(s) executes the usp_rpdo2 stored procedure in the back end by calling the EXEC statement in QT_BREAKDOWN_PATH to create the data file.
1.8.1 Data file creation
Create File(s) executes the <data request name>.csv entry in QT_BREAKDOWN_PATH: the SELECT statement in the VALUE column is run to create the data file. The result_instance_id value of the query is inserted dynamically each time a data file is created for the same table_instance_id and is not stored in QT_BREAKDOWN_PATH.
Data file format/location and Email parameters set-up
HIVE_CELL_PARAMS
The generated Data file format, location and email server parameters are configurable in the HIVE_CELL_PARAMS.
Wildfly must be restarted for changes to HIVE_CELL_PARAMS to take effect.
Cell ID | Parameter Name | Example Value | Notes |
---|---|---|---|
CRC | edu.harvard.i2b2.crc.smtp.host | smtp.partners.org | SMTP host |
CRC | edu.harvard.i2b2.crc.smtp.port | 25 | SMTP port |
CRC | edu.harvard.i2b2.crc.smtp.ssl.enabled | FALSE | TRUE will enable SSL |
CRC | edu.harvard.i2b2.crc.smtp.auth | FALSE | TRUE will enable SMTP authentication |
CRC | edu.harvard.i2b2.crc.smtp.username | none | SMTP username (required for SMTP authentication) |
CRC | edu.harvard.i2b2.crc.smtp.password | none | SMTP password (required for SMTP authentication) |
CRC | edu.harvard.i2b2.crc.smtp.enabled | FALSE | TRUE will enable e-mails |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultescapecharacter | " | Escape character for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.maxfetchrows | -1 | Maximum number of rows to export, or -1 for no limit |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultlineend | \n | Line ending for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultseperator | \t | Field separator for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.resultfetchsize | 50000 | Number of records retrieved during each database fetch. |
CRC | edu.harvard.i2b2.crc.exportcsv.filename | {{{PROJECT_ID}}}/{{{DATE_yyyyMMdd}}}_{{{FULL_NAME}}}.tsv | Parameterized template for export file names. If the extension is .zip, the file is zipped. |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultquotechar | " | Quote character for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.workfolder | /tmp/i2b2 | Folder on the i2b2 server for data exports |
CRC | edu.harvard.i2b2.crc.exportcsv.zipencryptmethod | none | Encryption method for the exported ZIP file. One of STANDARD, NONE, or AES. |
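The effect of the file name template and the formatting parameters can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the exporter's Java code: `expand_filename` and `write_rows` are hypothetical helpers, only the three placeholders shown in the example value are handled, and the meaning given to {{{FULL_NAME}}} (a descriptive name for the export) is an assumption.

```python
import csv
import io
from datetime import date


def expand_filename(template, project_id, full_name, today):
    """Fill the three placeholders that appear in the example template value."""
    return (
        template.replace("{{{PROJECT_ID}}}", project_id)
        .replace("{{{DATE_yyyyMMdd}}}", today.strftime("%Y%m%d"))
        .replace("{{{FULL_NAME}}}", full_name)
    )


def write_rows(rows, separator="\t", quotechar='"', line_end="\n"):
    """Format rows using the example separator, quote character, and line ending."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=separator, quotechar=quotechar, lineterminator=line_end
    )
    writer.writerows(rows)
    return buf.getvalue()


name = expand_filename(
    "{{{PROJECT_ID}}}/{{{DATE_yyyyMMdd}}}_{{{FULL_NAME}}}.tsv",
    "SQLServerLarge",   # project id taken from the examples on this page
    "Demographics",     # hypothetical value for FULL_NAME
    date(2025, 8, 5),
)
text = write_rows([["patient_num", "Gender"], ["26", "M"]])
```

With the example values, the template resolves to a path relative to the configured work folder, and the rows are written as tab-separated text.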
PM_PROJECT_PARAMS
Stores the parameters for the data Manager's email address, the subject line, and the email details for the User and the data Manager.
Software Changes:
- Data: New entries in RPDO_TABLE_REQUEST, HIVE_CELL_PARAMS, QT_QUERY_RESULT_TYPE, and QT_BREAKDOWN_PATH define the exporter configuration.
- Java code: New breakdown classes and updates to existing java classes to support the data exporter functionality.