Overview
The Data Exporter functionality enables an i2b2 user to create a data table definition and request patient data for a desired query. In the data table workflow, a table of variables is designed using the Create Data Table for Export tool. The predefined data tables are then displayed as Data Request(s) breakdown types that can be selected in the Run Query dialog. After the query runs, the Manager user can view and manage the data requests in the Data Request Manager tool, and i2b2 Users can view the status and details of their own data requests. Emails are generated for both the data Manager and the i2b2 User when the request is submitted. The data Manager processes the request by generating the data file in the Data Request Manager tool. The patient data is exported as a file and stored in a specified location for retrieval.
The patient data file is generated according to the data table definition.
Data Export Workflow
Data Table Creation - User process
The Create Data Table feature allows the User to design a table of variables of interest using the Design Table feature. The table can then be saved as a template that can be loaded and reused to request a data export.
Design Table
The DESIGN TABLE panel allows the i2b2 User to create a data table definition with a list of variables of interest.
The User logs into the web client and opens the Create Data Table for Export tool under the Tools plugin.
The DESIGN TABLE panel displays the predefined variables that are included by default in the data export file. These predefined variables are Gender, Age, and Race.
From the Ontology Terms tab, drag and drop additional variables of interest onto the DESIGN TABLE grid. These are added to the list of variables included in the data export. The predefined variables are locked and cannot be unchecked, since they are required for every data table. Other variables can be deleted using the Actions column to exclude them from the data file.
An aggregation method can be assigned to each variable by selecting an option from the drop-down box in the Aggregation Method column.
Aggregate Options
The availability of aggregation options depends on the type of concept it aggregates on.
Example:
Aggregation options for a non-numeric variable such as Diagnosis are Existence (Yes/No), Count: Number of Concepts, Count: Number of Dates, Count: Number of Encounters, Count: Number of Facts, Count: Number of Providers, Date (First), Date (Most Recent), and Most Frequent Concept (Codes).
Aggregation options for concepts that support numeric values, such as labs, include Average, Min, Max, etc., in addition to all the options available for non-numeric variables.
Aggregation Options | Explanation |
---|---|
Existence (Yes/No) | Whether the patient has an observation of this concept. This is the default option. |
Count: Number of Concepts | Total number of concepts |
Count: Number of Dates | Total number of dates for the participant |
Count: Number of Encounters | Total number of encounters for the participant |
Count: Number of Facts | Total number of observations |
Count: Number of Providers | Total number of providers for the participant |
Date: First Date | Date of the earliest observation |
Date: Last Date | Date of the most recent observation |
Calc: First Value | Minimum value of all numeric-value observations |
Calc: Last Value | Maximum value of all numeric-value observations |
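The non-numeric options in the table can be illustrated with a minimal Python sketch. It is not the exporter's implementation: the `Observation` class and its field names are hypothetical, simplified stand-ins for a patient's observation rows, and counting distinct concepts for "Count: Number of Concepts" is an assumption about how that option is defined.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Hypothetical, simplified stand-in for one observation (fact) row."""
    concept_cd: str
    encounter_num: int
    provider_id: str
    start_date: date


def aggregate(observations, option):
    """Apply one aggregation option to a single patient's observations."""
    if option == "Existence (Yes/No)":
        return "Yes" if observations else "No"
    if option == "Count: Number of Concepts":
        # Assumption: distinct concept codes are counted.
        return len({o.concept_cd for o in observations})
    if option == "Count: Number of Dates":
        return len({o.start_date for o in observations})
    if option == "Count: Number of Encounters":
        return len({o.encounter_num for o in observations})
    if option == "Count: Number of Facts":
        return len(observations)
    if option == "Count: Number of Providers":
        return len({o.provider_id for o in observations})
    if option == "Date: First Date":
        return min((o.start_date for o in observations), default=None)
    if option == "Date: Last Date":
        return max((o.start_date for o in observations), default=None)
    raise ValueError(f"Unsupported aggregation option: {option}")


# Fake observations for one patient.
facts = [
    Observation("ICD10:E11.9", 101, "P1", date(2020, 1, 5)),
    Observation("ICD10:E11.9", 102, "P2", date(2021, 3, 9)),
    Observation("ICD10:E11.65", 102, "P2", date(2021, 3, 9)),
]
for name in ("Existence (Yes/No)", "Count: Number of Facts", "Date: First Date"):
    print(name, "->", aggregate(facts, name))
```

Each option reduces all of a patient's observations for one variable to a single cell, which is why the exported data file has one row per patient.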
Preview Table
The table design can be previewed to see what the generated data export file will look like.
The User clicks on PREVIEW TABLE to verify that the added variables are displayed as data columns in the desired format.
Design and Preview is an iterative process in which the User can add and remove variables as well as assign or edit aggregate options.
Save Table
- Click on Save
- The table definition can be either saved under MY TABLES or PROJECT SHARED TABLES
SYSTEM SHARED TABLES is available as a save location only to Admins; it holds the predefined templates that are made available to i2b2 Users.
Load Table
Load Table Definition enables the User to load a saved table definition to be used for data export.
- The User clicks on the Load menu option to display Load Table Definition with the list of user-saved definitions
- Select the desired table definition and click on the LOAD button
The loaded table definition can be further refined and previewed before saving it for the final data file request.
Data Request - User process
User-created data requests
- User logs into web client and creates a query.
- In the Run Query dialog box, User-created table definitions are displayed under the Data Request(s) section
- Select one or more User Created:<table definition Request> checkboxes
Data Request - Request 1.8.1 data exports
The User can now request data for 1.8.1 data request breakdowns along with 1.8.2 requests.
4. Select the 1.8.1 data request checkboxes
5. Click Run Query
6. Emails are sent automatically: one to the User, confirming that the request has been made, and one to the data Manager, informing them of the User's request.
7. The table definition name with the FINISHED status is displayed as one of the breakdown items in the previous query results.
Data Request Manager - User's Data Requests List
User submitted data requests are logged under the Data Request Manager tool.
- Click on the Tools => Data Request Manager plugin
The Data Request Manager displays the User's submitted data requests along with their status and details.
Only the data requests created by the User are displayed in the Data Request Manager.
View Data Request Details
2. Click on View Details
View Details displays the request details and request status, as well as other options:
- Query Name (click on the Query ID number to display the previous query in the Find Patients window)
- Data Table Definition (click on View under the Data Request Type drop-down Request item)
- Option to Withdraw Request
- Option to enter Comments
- Log info box, which displays the User's actions and a log of the Request status
Note that if a Data Request is withdrawn, you will need to submit a request in the Comments box to have the Status reversed.
Data Export - Manager process
The User-created data requests are managed by the data Manager using the Data Request Manager tool.
Data Request Manager - Users Data Requests List
- Click on the Tools => Data Request Manager plugin.
The list of User-submitted data requests is displayed. The initial status of each request is Submitted.
The Manager user can view the details of a request using the VIEW DETAILS button and generate the data files using the Create File(s) button.
View Data Request Details
Click on View Details.
The View Details panel is similar to the one shown to non-Manager users, except that the Manager user can change the Data Request Status.
The Log info box displays the status of the Data Request as well as the Data File creation status.
Create Data File(s)
Click on Create File(s) on the Data Request Manager page.
The data export runs in the background and the file is generated in the folder defined in HIVE_CELL_PARAMS. The data file generation status is displayed in the Status column of the Data Request Manager.
The status changes as the data file is processed: from Submitted to File in Progress to File Available.
Data Request Status | Data column Value |
---|---|
Submitted | Submitted |
Withdrawn | Cancelled |
Denied | Incomplete |
File in Progress | Queued |
File available | File available |
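For scripts that read the stored values, the mapping in the table above can be kept in a small lookup. This is a sketch only: the dictionary copies the table verbatim, while the `data_value_for` helper and its case-insensitive matching are illustrative assumptions, not part of i2b2.

```python
# Displayed status -> stored data column value, copied from the table above.
STATUS_TO_DATA_VALUE = {
    "Submitted": "Submitted",
    "Withdrawn": "Cancelled",
    "Denied": "Incomplete",
    "File in Progress": "Queued",
    "File available": "File available",
}

# Lower-cased index so that "File Available" and "File available" both match.
_LOOKUP = {status.lower(): value for status, value in STATUS_TO_DATA_VALUE.items()}


def data_value_for(status: str) -> str:
    """Return the stored value for a displayed Data Request Status."""
    try:
        return _LOOKUP[status.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown data request status: {status!r}") from None


print(data_value_for("Withdrawn"))
```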
Example Export files (all data are fake)
Two kinds of files are generated for each data export.
Definition file: has the variable names and data types for the data file
TABLE_INSTANCE_ID | TABLE_INSTANCE_NAME | USER_ID | GROUP_ID | SET_INDEX | C_FACTTABLECOLUMN | C_TABLENAME | COLUMN_NAME | C_FULLPATH | C_COLUMNNAME | C_COLUMNDATATYPE | C_OPERATOR | C_DIMCODE | AGG_TYPE | CONSTRAIN_BY_DATE_TO | CONSTRAIN_BY_DATE_FROM | CONSTRAIN_BY_VALUE_OPERATOR | CONSTRAIN_BY_VALUE_CONSTRAINT | CONSTRAIN_BY_VALUE_UNIT_OF_MEASURE | CONSTRAIN_BY_VALUE_TYPE | CREATE_DATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | Reeta_DEM_6_27 | demoManager | SQLServerLarge | 0 | sex_cd | patient_dimension | Gender | @ | @ | @ | @ | @ | Value | | | | | | | 37:14.3 |
7 | Reeta_DEM_6_27 | demoManager | SQLServerLarge | 1 | age_in_years_num | patient_dimension | Age | @ | @ | @ | @ | @ | Value | | | | | | | 37:14.3 |
7 | Reeta_DEM_6_27 | demoManager | SQLServerLarge | 2 | race_cd | patient_dimension | Race | @ | @ | @ | @ | @ | Value | | | | | | | 37:14.3 |
7 | Reeta_DEM_6_27 | demoManager | SQLServerLarge | 3 | patient_num | patient_dimension | (children) < 18 years old | \\ACT_DEMO\ACT\Demographics\Age\< 18 years old\ | birth_date | N | > | dateadd(YY, -18, GETDATE()) | Exists | | | | | | | 37:14.3 |
7 | Reeta_DEM_6_27 | demoManager | SQLServerLarge | 4 | patient_num | patient_dimension | 75-84 years old | \\ACT_DEMO\ACT\Demographics\Age\75-84 years old\ | birth_date | N | BETWEEN | dateadd(YY, -85, dateadd(dd, +1,getdate())) AND dateadd(YY, -74, getdate()) | Exists | | | | | | | 37:14.3 |
7 | Reeta_DEM_6_27 | demoManager | SQLServerLarge | 5 | patient_num | patient_dimension | >= 65 years old | \\ACT_DEMO\ACT\Demographics\Age\>= 65 years old\ | birth_date | N | <= | dateadd(YY, -65, GETDATE()) | Exists | | | | | | | 37:14.3 |
Data file: has the values for the variables
patient_num | Gender | Age | Race | (children) < 18 years old | 75-84 years old | >= 65 years old |
---|---|---|---|---|---|---|
26 | M | 72 | @ | No | Yes | Yes |
53 | F | 55 | WHITE | No | No | No |
87 | F | 59 | WHITE | No | No | No |
92 | F | 88 | DECLINED | No | No | Yes |
94 | F | 67 | ASIAN | No | No | Yes |
96 | F | 29 | WHITE | No | No | No |
128 | F | 62 | WHITE | No | No | Yes |
153 | M | 77 | @ | No | Yes | Yes |
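A generated data file can be loaded with a few lines of Python. The sketch below assumes that the file has a header row and uses a tab as the field separator and a double quote as the quote character (the example values of the separator and quote character parameters in HIVE_CELL_PARAMS); adjust both for your site. The `read_export` helper is hypothetical, and the sample rows are a subset of the fake data shown above.

```python
import csv
import io


def read_export(text, delimiter="\t", quotechar='"'):
    """Parse the contents of a data file into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return list(reader)


# Fake sample content; a real script would read the exported file from disk.
sample = (
    "patient_num\tGender\tAge\tRace\t>= 65 years old\n"
    "26\tM\t72\t@\tYes\n"
    "53\tF\t55\tWHITE\tNo\n"
    "92\tF\t88\tDECLINED\tYes\n"
)

rows = read_export(sample)
older = [row["patient_num"] for row in rows if row[">= 65 years old"] == "Yes"]
print(len(rows), "patients,", len(older), "flagged as >= 65 years old")
```

Because the column names come from the data table definition, a column such as `>= 65 years old` is addressed by its full display name.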
User-created Demographics
1969_Definition_For_Demographics_8_5_25.csv
1969_Demographics_8_5_25_08052025.csv
User-created Labs
Data Export Configuration
Design and Architecture
The i2b2 breakdown architecture is modified to support the data table definition and new breakdown types for User-created data requests. The database tables have been modified to support the data table definition parameters:
- RPDO_TABLE_REQUEST table stores the default data table parameters
- HIVE_CELL_PARAMS has new parameters for global and email configuration and the data file generation location (currently on a local drive)
- QT_QUERY_RESULT_TYPE table has new entries for data table definition breakdown types
- QT_BREAKDOWN_PATH is modified to log the data table definition and the data export execution details
- QT_XML_RESULT logs XML documents containing e-mail details
User Roles and Actions
User visibility of Data Table Creation, Data Request, and Data Export is based on User role configuration. Configuration is managed by the Admin user using the Admin Dashboard plugin.
The table below lists the actions available for each User role (future implementation).
User Role | Create Data Table | Data Request | Data Request Manager View Details | Data Export/Generate Data file | Change Request Status |
---|---|---|---|---|---|
Obfuscated | Enabled | Enabled | Enabled | Disabled | Disabled |
Aggregate | Enabled | Enabled | Enabled | Disabled | Disabled |
Protected | Enabled | Enabled | Enabled | Disabled | Disabled |
LDS | Enabled | Enabled | Enabled | Disabled | Disabled |
Manager/No LDS | Enabled | Enabled | Enabled | Disabled | Disabled |
Manager/LDS | Enabled | Enabled | Enabled | Enabled | Enabled |
Admin | Enabled | Enabled | Enabled | Enabled | Enabled |
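The role matrix above can be expressed as a simple permission check. This is a sketch of the table only, not of how i2b2 enforces roles (the table itself is marked as a future implementation); the action identifiers and the `is_enabled` function are hypothetical names.

```python
# Role names as they appear in the table above.
ROLES = (
    "Obfuscated", "Aggregate", "Protected", "LDS",
    "Manager/No LDS", "Manager/LDS", "Admin",
)

# Hypothetical identifiers for the five table columns.
ACTIONS = (
    "create_data_table",
    "data_request",
    "view_details",
    "generate_data_file",
    "change_request_status",
)

# Per the table, only these roles may generate files or change the status.
EXPORT_ROLES = frozenset({"Manager/LDS", "Admin"})
RESTRICTED_ACTIONS = frozenset({"generate_data_file", "change_request_status"})


def is_enabled(role: str, action: str) -> bool:
    """Return True if the table marks the action as Enabled for the role."""
    if role not in ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    if action not in ACTIONS:
        raise ValueError(f"Unknown action: {action!r}")
    if action in RESTRICTED_ACTIONS:
        return role in EXPORT_ROLES
    return True  # every role may create tables, request data, and view details


print(is_enabled("Manager/No LDS", "generate_data_file"))
```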
Database Configuration
RPDO_TABLE_REQUEST
To support the data table design, a new table, RPDO_TABLE_REQUEST, is added. The metadata in the data table design is stored in this table when the data table is saved.
Four required patient_dimension default rows are loaded into RPDO_TABLE_REQUEST: vital_status_cd, race_cd, age_in_years_num, and sex_cd (gender).
The insert script for the default rows is provided in the data install folder of the release.
QT_BREAKDOWN_PATH
QT_Breakdown_Path table has been modified to include a new column Group_id. It logs the Project_id value.
Column Name | Data Type |
---|---|
Group_Id | Varchar (50) |
Database Updates
New rows are logged in the database tables when a data table definition is created and when a data request is submitted.
RPDO_TABLE_REQUEST
Data table design variables and values are dynamically logged in the RPDO_TABLE_REQUEST at the time of data table Definition Save.
RPDO_TABLE_REQUEST PARAMETERS | DATA TABLE Values |
---|---|
Table_Request_id | Auto-generated incremental value, unique for each concept in the data table. Example: 1282, 1283, 1284 |
Table_instance_id | Unique for each data table (same value for all the concepts in it). Example: 17 |
QT_QUERY_RESULT_TYPE
The data table definitions are logged as breakdowns in the QT_QUERY_RESULT_TYPE table. The rows are inserted dynamically when the user saves the data table definition.
RESULT_TYPE_ID | NAME | DESCRIPTION | DISPLAYTYPE_ID | VISUAL_ATTRIBUTE_TYPE_ID | USER_ROLE_CD | CLASSNAME |
---|---|---|---|---|---|---|
Auto generated number example : 144 | RPDO_<TABLE_INSTANCE_ID> example: RPDO_17 | User-created <Table_Name>Request | CATNUM | LU | DATA_LDS | edu.harvard.i2b2.crc.dao.setfinder.QueryResultPatientRequest |
We recommend that data requests be limited to DATA_LDS users and data exports be limited to MANAGER users. The user role determines whether exports and requests are visible and runnable. Refer to the section on User Roles and Actions.
QT_BREAKDOWN_PATH - Table Definition Creation
A new row is logged when the user saves the data table definition.
- The Name column logs the Table_instance_id when the user saves the table definition.
- The Value column logs the EXEC statement of the stored procedure that generates the data file.
- Group_id has the value of project_id.
- The RESULT_INSTANCE_ID placeholder is replaced with the numeric value from QT_QUERY_RESULT_INSTANCE where result_type_id = 1 (patient set) when the data file is created.
Example:
NAME | VALUE | Group_id |
---|---|---|
RPDO_<Table_instance_id> Example: RPDO_17 | EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=<Table_instance_id> @RESULT_INSTANCE_ID={{{RESULT_INSTANCE_ID}}} @MIN_ROW=0 @MAX_ROW=10000 | SQLServerLarge |
QT_QUERY_RESULT_INSTANCE
When the User runs a query with a User-created Data Request option selected, a row is logged in QT_QUERY_RESULT_INSTANCE for the result_type_id generated in QT_QUERY_RESULT_TYPE.
Column Name | Value (Example) |
---|---|
Result_Type_id | 144 |
Result_instance_id | 1729 |
QT_BREAKDOWN_PATH - Data File Creation
The Value column is updated with the table_instance_id and result_instance_id values in the EXEC statement, and the stored procedure that generates the data file is executed.
Example:
NAME | VALUE |
---|---|
RPDO_<Table_instance_id> Example: RPDO_17 | EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=17 @RESULT_INSTANCE_ID=1729 @MIN_ROW=0 @MAX_ROW=10000 |
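The substitution step can be sketched in Python as follows. This only illustrates how a triple-brace placeholder such as `{{{RESULT_INSTANCE_ID}}}` might be resolved; the actual replacement is done inside the i2b2 CRC cell, and the `fill_placeholders` function and its integer-only check are assumptions made for this example.

```python
import re

# Stored statement with the placeholder still unresolved (table instance 17).
TEMPLATE = (
    "EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=17 "
    "@RESULT_INSTANCE_ID={{{RESULT_INSTANCE_ID}}} @MIN_ROW=0 @MAX_ROW=10000"
)

# Matches {{{NAME}}} and captures NAME.
PLACEHOLDER = re.compile(r"\{\{\{(\w+)\}\}\}")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each {{{NAME}}} with an integer value from `values`."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder {name}")
        value = values[name]
        # Accept plain integers only, so arbitrary text cannot reach the statement.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"Placeholder {name} requires an integer value")
        return str(value)

    return PLACEHOLDER.sub(replace, template)


print(fill_placeholders(TEMPLATE, {"RESULT_INSTANCE_ID": 1729}))
```

Restricting the substituted values to integers is a design choice of the sketch: the ids come from the database, and nothing else should be spliced into an EXEC statement.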
QT_XML_RESULT
Metadata about data request queries is stored in the XML_VALUE field of the QT_XML_RESULT table.
Column Name | Value (example) |
---|---|
Result_Instance_Id | 1729 |
XML_Value | <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <ns10:i2b2_result_envelope xmlns:ns6="http://www.i2b2.org/xsd/cell/crc/psm/analysisdefinition/1.1/" xmlns:ns5="http://www.i2b2.org/xsd/hive/msg/1.1/" xmlns:ns8="http://www.i2b2.org/xsd/cell/pm/1.1/" xmlns:ns7="http://www.i2b2.org/xsd/cell/crc/psm/querydefinition/1.1/" xmlns:ns9="http://www.i2b2.org/xsd/cell/ont/1.1/" xmlns:ns10="http://www.i2b2.org/xsd/hive/msg/result/1.1/" xmlns:ns2="http://www.i2b2.org/xsd/hive/pdo/1.1/" xmlns:ns4="http://www.i2b2.org/xsd/cell/crc/psm/1.1/" xmlns:ns3="http://www.i2b2.org/xsd/cell/crc/pdo/1.1/"> <body> <ns10:result name="RPDO_17"> <data column="SUBMITTED" type="string">20250808_125429</data> <data column="EMAIL" type="string">rmetta@mgb.org</data> <data column="QUEUED" type="string">20250808_125631</data> <data column="PROCESSING" type="string">20250808_125631</data> <data column="DIRECTORY" type="string">/opt/dataexport/SQLServerLarge/724</data> <data column="FINISHED" type="string">20250808_125633</data> <data column="APPROVEDBY" type="string">demoManager</data> </ns10:result> </body> </ns10:i2b2_result_envelope> |
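To inspect an XML_VALUE document outside i2b2, the `data` elements can be read with Python's standard library. The sketch below uses a shortened copy of the example above with only the `ns10` namespace declared and without the e-mail entry; the `parse_result_envelope` and `parse_timestamp` helpers are hypothetical, and the timestamp layout (`yyyyMMdd_HHmmss`) is inferred from the example values rather than documented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace bound to the ns10 prefix in the example document.
RESULT_NS = "http://www.i2b2.org/xsd/hive/msg/result/1.1/"

SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns10:i2b2_result_envelope xmlns:ns10="http://www.i2b2.org/xsd/hive/msg/result/1.1/">
  <body>
    <ns10:result name="RPDO_17">
      <data column="SUBMITTED" type="string">20250808_125429</data>
      <data column="DIRECTORY" type="string">/opt/dataexport/SQLServerLarge/724</data>
      <data column="FINISHED" type="string">20250808_125633</data>
      <data column="APPROVEDBY" type="string">demoManager</data>
    </ns10:result>
  </body>
</ns10:i2b2_result_envelope>"""


def parse_result_envelope(xml_value: str) -> dict:
    """Return the result name and a column -> text mapping from an XML_VALUE string."""
    # Parse from bytes so the encoding declaration in the prolog is honored.
    root = ET.fromstring(xml_value.encode("utf-8"))
    result = root.find(".//{" + RESULT_NS + "}result")
    if result is None:
        raise ValueError("No result element found in the envelope")
    columns = {d.get("column"): d.text for d in result.findall("data")}
    return {"name": result.get("name"), "columns": columns}


def parse_timestamp(value: str) -> datetime:
    """Parse a status timestamp, assuming the yyyyMMdd_HHmmss layout."""
    return datetime.strptime(value, "%Y%m%d_%H%M%S")


parsed = parse_result_envelope(SAMPLE)
print(parsed["name"], parsed["columns"]["DIRECTORY"])
```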
Data file format/location and Email parameters set-up: HIVE_CELL_PARAMS
The generated Data file format, location and email server parameters are configurable in the HIVE_CELL_PARAMS.
WildFly must be restarted for changes to HIVE_CELL_PARAMS to take effect.
Cell ID | Parameter Name | Example Value | Notes |
---|---|---|---|
CRC | edu.harvard.i2b2.crc.exportcsv.datamanageremail | userid@partners.org | Email address used for sending request/export e-mails |
CRC | edu.harvard.i2b2.crc.smtp.host | smtp.partners.org | SMTP host |
CRC | edu.harvard.i2b2.crc.smtp.port | 25 | SMTP port |
CRC | edu.harvard.i2b2.crc.smtp.ssl.enabled | FALSE | TRUE will enable SSL |
CRC | edu.harvard.i2b2.crc.smtp.auth | FALSE | TRUE will enable SMTP authentication |
CRC | edu.harvard.i2b2.crc.smtp.username | none | SMTP username (required for SMTP authentication) |
CRC | edu.harvard.i2b2.crc.smtp.password | none | SMTP password (required for SMTP authentication) |
CRC | edu.harvard.i2b2.crc.smtp.enabled | FALSE | TRUE will enable e-mails |
CRC | edu.harvard.i2b2.crc.smtp.from.fullname | Data Manager | Name that e-mails will be sent from. |
CRC | edu.harvard.i2b2.crc.smtp.from.email | datamanager@site.org | E-mail address that e-mails will be sent from. |
CRC | edu.harvard.i2b2.crc.smtp.subject | i2b2 Data Request | Subject line for e-mails. |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultescapecharacter | " | Escape character for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.maxfetchrows | -1 | Maximum number of rows to export, or -1 for no limit |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultlineend | \n | Line ending for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultseperator | \t | Field separator for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.resultfetchsize | 50000 | Number of records retrieved during each database fetch. |
CRC | edu.harvard.i2b2.crc.exportcsv.filename | {{{PROJECT_ID}}}/{{{DATE_yyyyMMdd}}}_{{{FULL_NAME}}}.tsv | Parameterized template for export file names. If the extension is .zip, the file is zipped. |
CRC | edu.harvard.i2b2.crc.exportcsv.defaultquotechar | " | Quote character for export files |
CRC | edu.harvard.i2b2.crc.exportcsv.workfolder | /tmp/i2b2 | Folder on the i2b2 server for data exports |
CRC | edu.harvard.i2b2.crc.exportcsv.zipencryptmethod | none | Encryption method for the exported ZIP file. One of STANDARD, NONE, or AES. |
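The file name template can be illustrated with a short Python sketch. The meaning given to each placeholder here (project id, current date as yyyyMMdd, and the requesting user's full name) is an assumption based on the placeholder names; the exporter may support other placeholders or resolve them differently. The `expand_filename` and `is_zipped` helpers are hypothetical.

```python
from datetime import date


def expand_filename(template: str, project_id: str, full_name: str, today: date) -> str:
    """Expand the triple-brace placeholders in an export file name template."""
    replacements = {
        "{{{PROJECT_ID}}}": project_id,
        "{{{DATE_yyyyMMdd}}}": today.strftime("%Y%m%d"),
        "{{{FULL_NAME}}}": full_name,
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template


def is_zipped(filename: str) -> bool:
    """Per the Notes column, a .zip extension means the export is zipped."""
    return filename.lower().endswith(".zip")


TEMPLATE = "{{{PROJECT_ID}}}/{{{DATE_yyyyMMdd}}}_{{{FULL_NAME}}}.tsv"
name = expand_filename(TEMPLATE, "SQLServerLarge", "Demo User", date(2025, 8, 8))
print(name, is_zipped(name))
```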
Software Changes:
- Data: New entries in RPDO_TABLE_REQUEST, HIVE_CELL_PARAMS, QT_QUERY_RESULT_TYPE, and QT_BREAKDOWN_PATH define the exporter configuration.
- Java code: New breakdown classes and updates to existing Java classes to support the data exporter functionality.