Overview

The Data Exporter functionality enables an i2b2 user to request patient data for the patients found by a desired query. In the configured workflow, data is requested and generated through new breakdown types that can be selected in the Run Query dialog. Users will see "data request" breakdowns, which send an e-mail to a predefined data manager requesting a type of data. The data manager (Manager user role) processes the request by rerunning the query with a special "data export" breakdown. The patient data is exported as a file and stored in a specified location for retrieval.

The patient data export types, request options, and file specifications are configured in the database using specific request type entries and configuration parameters in the database.

Data Export Workflow

Data Table Creation- User-created Data Table (User process)

The Create Data Table feature allows User to design a table of variables of interest using the Design table feature. The table can then be saved as a template  that can be loaded and re-used to request data export.

Design Table

The DESIGN TABLE panel allows User to create a data table with list of variables of interest.

  1. User logs into the webclient and accesses Data Export tool under the Analysis Tools plugin

  2. The design table panel displays list of predefined variables that will be included by default in the data export file. These predefined variables are Gender, Age, Race, and Ethnicity

  3. From the Ontology Terms tab, drag and drop additional variables of interest onto the DESIGN TABLE grid. These variables will be added to the list of variables included in the data export. The predefined variables are locked and cannot be unchecked since they are required for every data table.  Other variables can be deleted under Actions column to exclude them from the data set file.

  4. Aggregation method for the variable can be applied by clicking on the Value in the Aggregation Method column to assign aggregate options to each variable 

Aggregate Options

The availability of aggregation options depends on the type of concept it aggregates on.

Example:

Aggregation Options

Explanation

Existence (Yes/No)

Whether the patient has an observation of this concept. This is the default option.

Count: Number of Concepts

Total number of concepts

Count: Number of Dates

Total number of dates for the participant

Count: Number of Encounters

Total number of encounters for the participant

Count: Number of Facts

Total number of observations

Count: Number of Providers

Total number of providers for the participant

Date (First)

Date of earliest observation

Date (Most Recent)

Date of the most recent observation

Count

Total number of observations

All Concepts (Names/Text)

All concept names are listed

Most Frequent Concept (Names/Text)

Most frequent concept name

All Concepts (Codes)

All concept codes are listed

Most Frequent Concept (Codes)

Most frequent concept code

Minimum Value

Minimum value of all numeric values observations

Maximum Value

Maximum value of all numeric values observations

Median Value

Median value of all numeric values observations

Average Value

Average value of all numeric values observations

Mode (Most Frequent Value)

Most frequent value for numeric, enum, blob observations

List of All Values

List of all values for numeric, enum, blob observations

Preview Table

Table Design can be previewed to get a preview of the data export file that will be generated.

  1. User clicks on Preview Table to verify the variables added are displayed  in the data columns in the format desired.


Design and Preview is an iterative process where User to add and remove variables and to assign or edit Aggregate Method


Save Table

  1. User clicks on Save Table

Load Table

Saved table definition enables the User to Load  a saved table definition to be used for data export

  1. User clicks on Load menu option to display the list of your saved Table definitions. Under MY TABLES,
  2. Select desired table definition and clicks on LOAD button


The loaded table definition can be further refined and previewed before saving it for final datafile request



Data Request - User-created Data Request (User process)

  1. User logs into webclient and creates and runs a query.
  2. In the Run Query dialog box, select one or more User created Data table definition Request checkboxes and click run query

  3. Emails are automatically sent - one to the requester's e-mail, to inform them the request has been made; and one to the specified manager user to inform them to run the user's export request. The query name (in previous queries) is tagged with the query master id, so it can be identified by the manager.


Data Export - User-created Data Requests (Manager process)

User-created data requests are managed by Data Request Manager tool where a manager user can view the request details and generate data files

Data Request Manager



View Data Request Details


Generate data file 


The data export runs in the background and the file is generated in a specified folder 


Example Export files (all data are fake)

User-created Demographics


User-created Diagnosis


Data Export Configuration

Design and Architecture

The i2b2 breakdown architecture has been modified to support new breakdown types for data request and export. HIVE_CELL_PARAMS has new parameters for global configuration, the QT_RESULT_TYPE table has new entries for query export types, and the QT_BREAKDOWN_PATH defines the request and export. These are XML documents containing e-mails to send and SQL to execute. These breakdown types display in the run query dialog. User e-mails are sent to the e-mail address of the currently logged-in use. Data Manager e-mails are sent to an address configured in HIVE_CELL_PARAMS, and the data manager e-mail in an export is also saved as a file in the export location. The export file generation location, also configured in HIVE_CELL_PARAMS, is on the local drive hosting the i2b2 application. 

User Roles and Actions



User RoleCreate Data TableData RequestData Request ManagerGenerate Data file 
ObfuscatedSuccessSuccessNo VisibilityDisabled
AggregateSuccessSuccessNo VisibilityDisabled
ProtectedSuccessSuccessNo VisibilityDisabled
LDSSuccessSuccessNo VisibilityDisabled
Manager/No LDSSuccessSuccessNo VisibilityDisabled
Manager/LDSSuccessSuccessSuccessSuccess
AdminSuccessSuccessNo VisibilityDisabled


Database changes

In order to support the  data Table  design, a new table RPDO_TABLE_REQUEST is added.  The metadata in Data Table design is stored in the table when the data Table is saved.

There are 4 patient_dimension default concepts that  are loaded into the RPDO_TABLE_REQUEST which are required. These are: race_cd (ethnicity), reace_cd(race), age_in_years_num, sex_cd and gender


Any other user created concepts in the data table design will be stored in the RPDO_TABLE_REQUEST.  Below is the list that correspond to values in the Data Table

RPDO_TABLE_REQUEST PARAMETERSDATA TABLE Values
Table_Request_idauto generated incremental value, corresponding to each concept in the data table. Unique for each concept
Table_instance_idunique for each Data table, auto generated incremental value( same value for all the concepts underneath)


Configuring Database Parameters

Data table definition details

QT_QUERY_RESULT_TYPE

The data table definitions are logged as breakdowns in the QT_QUERY_RESULT_TYPE table. The values are populated dynamically when the user saves the data table definition.

Following values are stored

RESULT_TYPE_ID: <TABLE_INSTANCE_ID>

NAME: Auto generated number: RPDO_<n>

VISUAL_ATTRIBUTE_TYPE_ID is set to LU 

USER_ROLE_CD is set to DATA_LDS /NULL?

DISPLAY_TYPE_ID: CATNUM

CLASSNAME: 

Example:

RESULT_TYPE_IDNAMEDESCRIPTIONDISPLAYTYPE_IDVISUAL_ATTRIBUTE_TYPE_IDUSER_ROLE_CDCLASSNAME
1159RPDO_78User-created Demographics Data RequestCATNUMLUDATA_LDSedu.harvard.i2b2.crc.dao.setfinder.QueryResultPatientRequest



We recommend data requests be limited to DATA_LDS users (irrespective of Admin path role - admin, user, manager) and data exports be limited to MANAGER users (irrespective of data path role - data_lds, data_agg,data_obf, data_deid). The user role determines if exports/requests are visible and runnable. 

 QT_QUERY_RESULT_TYPEuser roleUser-created data requestdata Request ManagerVisibility/Actions

 DATA_LDS

DATA_LDS

Success

Success

Only  user-created requests

MANAGER with

DATA_LDS

DATA_LDS

Success

Success

All user-created requests

MANAGER

without DATA_LDS

MANAGER

?

??

Admin

Admin

?

??



Data Request definition details

QT_BREAKDOWN_PATH

Qt_breakdown_path logs the table definition value and the result_instance_id. to be associated with the ptset. query.

The table definition is logged as the table_instance_id when the user saves the table_definition.  The result_instance_id is numeric value of the table_instance_id,



NAME:  <Table_Instance_id>

VALUE:  <Result_instance_id>

Example:

NAMEVALUE
RPDO_78

EXEC i2b2synthea8.dbo.usp_rpdo2 @TABLE_INSTANCE_ID=78 @RESULT_INSTANCE_ID={{{RESULT_INSTANCE_ID}}} @MIN_ROW=0 @MAX_ROW=10000


Data file format/location and Email parameters set-up: HIVE_CELL_PARAMS

The generated Data file format, location and email server parameters are configurable in the HIVE_CELL_PARAMS. 

The <Filename> parameter  in the QT_BREAKDOWN_PATH  definition will override the default value specified in the HIVE_CELL_PARAMS. 

Wildfly must be restarted for changes to the HIVE_CELL_PARAMs to take effect.



Cell IDParameter NameExample ValueNotes
CRCedu.harvard.i2b2.crc.exportcsv.datamanageremailuserid@partners.orgEmail address used for sending request/export e-mails
CRCedu.harvard.i2b2.crc.smtp.hostsmtp.partners.orgSMTP host
CRCedu.harvard.i2b2.crc.smtp.port25SMTP port
CRCedu.harvard.i2b2.crc.smtp.ssl.enabledFALSETRUE will enable SSL
CRCedu.harvard.i2b2.crc.smtp.authFALSETRUE will enable SMTP authentication
CRCedu.harvard.i2b2.crc.smtp.usernamenoneSMTP username (required for SMTP authentication)
CRCedu.harvard.i2b2.crc.smtp.passwordnoneSMTP password (required for SMTP authentication)
CRCedu.harvard.i2b2.crc.smtp.enabledFALSETRUE will enable e-mails
CRCedu.harvard.i2b2.crc.smtp.from.fullnameData ManagerName that e-mails will be sent from.
CRCedu.harvard.i2b2.crc.smtp.from.emaildatamanager@site.orgE-mail address that e-mails will be sent from.
CCedu.harvard.i2b2.crc.smtp.subjecti2b2 Data RequestSubject line for e-mails.
CRCedu.harvard.i2b2.crc.exportcsv.defaultescapecharacter"Escape character for export files
CRCedu.harvard.i2b2.crc.exportcsv.maxfetchrows-1Maximum number of rows to export, or -1 for no limit
CRCedu.harvard.i2b2.crc.exportcsv.defaultlineend\nLine ending for export files
CRCedu.harvard.i2b2.crc.exportcsv.defaultseperator\tField separator for export files
CRCedu.harvard.i2b2.crc.exportcsv.resultfetchsize50000Number of records retrieved during each database fetch.
CRCedu.harvard.i2b2.crc.exportcsv.filename{{{PROJECT_ID}}}/{{{DATE_yyyyMMdd}}}_{{{FULL_NAME}}}.tsvParameterized template for export file names. If the extension is .zip, the file is zipped.
CRCedu.harvard.i2b2.crc.exportcsv.defaultquotechar"Quote character for export files
CRCedu.harvard.i2b2.crc.exportcsv.workfolder/tmp/i2b2Folder on the i2b2 server for data exports
CRCedu.harvard.i2b2.crc.exportcsv.zipencryptmethodnoneEncryption method for the exported ZIP file. One of STANDARD, NONE, or AES.

Request and Export Results:

Metadata about data request and export queries are stored in the QT_XML_RESULT table, in the XML_VALUE field:


Query TypeXML_VALUE example
Data Request<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns10:i2b2_result_envelope xmlns:ns6="http://www.i2b2.org/xsd/cell/crc/psm/querydefinition/1.1/" xmlns:ns5="http://www.i2b2.org/xsd/hive/msg/1.1/" xmlns:ns8="http://www.i2b2.org/xsd/cell/pm/1.1/" xmlns:ns7="http://www.i2b2.org/xsd/cell/crc/psm/analysisdefinition/1.1/" xmlns:ns9="http://www.i2b2.org/xsd/cell/ont/1.1/" xmlns:ns10="http://www.i2b2.org/xsd/hive/msg/result/1.1/" xmlns:ns2="http://www.i2b2.org/xsd/hive/pdo/1.1/" xmlns:ns4="http://www.i2b2.org/xsd/cell/crc/psm/1.1/" xmlns:ns3="http://www.i2b2.org/xsd/cell/crc/pdo/1.1/">
<body>
<ns10:result name="PATIENT_DEMOGRAPHIC_REQUEST">
<data column="patientCount" type="int">1</data>
<data column="RequestEmail" type="string">This user act in project NODE9 requested i2b2 request
entitled - "ACT_Vaccination_14_53_14", submitted on May 22, 2024 2:53:13 PM, with the query master of 1492.
</data>
<data column="DataManagerEmail" type="string">datamanager@site.org</data>
</ns10:result>
</body>
</ns10:i2b2_result_envelope>
Data Export

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns10:i2b2_result_envelope xmlns:ns6="http://www.i2b2.org/xsd/cell/crc/psm/querydefinition/1.1/" xmlns:ns5="http://www.i2b2.org/xsd/hive/msg/1.1/" xmlns:ns8="http://www.i2b2.org/xsd/cell/pm/1.1/" xmlns:ns7="http://www.i2b2.org/xsd/cell/crc/psm/analysisdefinition/1.1/" xmlns:ns9="http://www.i2b2.org/xsd/cell/ont/1.1/" xmlns:ns10="http://www.i2b2.org/xsd/hive/msg/result/1.1/" xmlns:ns2="http://www.i2b2.org/xsd/hive/pdo/1.1/" xmlns:ns4="http://www.i2b2.org/xsd/cell/crc/psm/1.1/" xmlns:ns3="http://www.i2b2.org/xsd/cell/crc/pdo/1.1/">
    <body>
        <ns10:result name="PATIENT_LAB_CSV">
            <data column="RowCount" type="int">3</data>
            <data column="Log" type="string">Environment Settings:---------------------------------------------------------------------------
 
Machine name: localhost
Operating system: Linux 3.10.0-1160.99.1.el7.x86_64
Login user: wildfly
Version number: [6.3.2.1]
Config title: 2. PRODUCTION 
Run directory:     /opt/dataexport/act/1491/Lab.csv
 File Name: /opt/dataexport/act/1491/Lab.csv
 md5 hash: d41d8cd98f00b204e9800998ecf8427e
</data>
            <data column="QueryMasterID" type="string">1491</data>
        </ns10:result>
    </body>
</ns10:i2b2_result_envelope>

Software Changes: