Data Science Bundle
Combining clinical and genomic data using i2b2 and tranSMART to perform complex analyses of real-world data

Introduction


This data science bundle supports complex analyses of real-world clinical and genomic data. It includes i2b2, which enables querying and cohort identification, and tranSMART, which adds a suite of tools for data exploration, R-based advanced analytics (e.g., correlation analysis, heat maps, and PCA), and genomic modules for genome-wide association studies (GWAS) and high-dimensional data analysis such as RNA-seq.


Figure 1. High-level view of the bundle. The i2b2 common data model integrates clinical and genomic data. i2b2 provides tools for query and cohort selection, and tranSMART contains modules for high-dimensional analyses.

Use Cases


Use cases for this bundle include:


Bundle Components


This bundle includes documentation on how to install and configure the following items:


Demo

A public demo of this bundle is available at the following URL:
http://shrine-node3.i2b2transmartplugins.org/XXXXXX
The demo runs both i2b2 and tranSMART on the same database, populated with Synthea demo data.

Technical Architecture


i2b2 Components


i2b2 consists of independent applications, called "cells", that each provide different functionality (Figure 2). A collection of cells forms an i2b2 "hive". Most i2b2 hives include (1) a Project Management (PM) cell for authentication and authorization; (2) a Clinical Research Chart (CRC) cell, which contains patient data and the query engine; and (3) an Ontology (ONT) cell, which describes the concepts and codes contained within the CRC cell. Many i2b2 hives also include (4) a Workplace (Work) cell, which lets users "bookmark" frequently used items in the user interface and share them with collaborators; and (5) an Identity Management (IM) cell, which allows authorized users to retrieve identified patient data.

Cells communicate with each other by sending i2b2 XML messages to each other's APIs. When a cell receives a request message, it queries a table in the HiveData database to determine the location of its main database for the user's project. The exception is the PM cell, which uses a single database for all projects. The i2b2 Web Client is written entirely in HTML and JavaScript; it communicates with a Web Proxy on a web server, which redirects messages to the appropriate cell.
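The per-project database lookup can be illustrated with a small Python sketch. It is a conceptual model, not i2b2 code: the request XML is heavily simplified (real i2b2 messages carry many more header fields), and `HIVE_LOOKUP`, `PM_DATABASE`, `project_of`, `resolve_database`, and the connection strings are hypothetical names chosen for illustration. The point it shows is that a cell reads the project from the incoming message and resolves its database through a lookup table, while the PM cell always uses one database.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the lookup table in the HiveData database:
# maps (cell, project_id) to the connection string of that cell's main database.
HIVE_LOOKUP = {
    ("CRC", "Demo"): "postgresql://db-host/i2b2demodata",
    ("ONT", "Demo"): "postgresql://db-host/i2b2metadata",
    ("WORK", "Demo"): "postgresql://db-host/i2b2workdata",
}

# The PM cell uses a single database for all projects (hypothetical URL).
PM_DATABASE = "postgresql://db-host/i2b2pm"

# Simplified request message; assumed structure, not the full i2b2 schema.
SAMPLE_REQUEST = """
<request>
  <message_header>
    <project_id>Demo</project_id>
  </message_header>
  <message_body/>
</request>
"""


def project_of(message_xml: str) -> str:
    """Extract the project id from a simplified request message."""
    root = ET.fromstring(message_xml.strip())
    project = root.findtext("message_header/project_id")
    if project is None:
        raise ValueError("request has no project_id")
    return project


def resolve_database(cell: str, message_xml: str) -> str:
    """Return the database a cell should use for this request."""
    if cell == "PM":
        # PM ignores the project: one database serves every project.
        return PM_DATABASE
    key = (cell, project_of(message_xml))
    try:
        return HIVE_LOOKUP[key]
    except KeyError:
        raise LookupError(f"no database registered for {key}") from None


print(resolve_database("CRC", SAMPLE_REQUEST))  # postgresql://db-host/i2b2demodata
```

Keeping the lookup outside the cell code is what lets one hive serve several projects whose data live in different databases.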

Figure 2. i2b2 components.

tranSMART Components

The tranSMART web user interface is a single Tomcat application with an extended set of plugins, which can be enabled or disabled in the configuration file.
TranSMART is delivered with a set of supporting applications:

System Requirements


i2b2

i2b2 requirements can be found here. A summary of the key requirements:


tranSMART

TranSMART is supported with:

The additional database requirements (e.g., Oracle Enterprise) are needed to support partitioning of the large tables used for omics data in the tranSMART-specific schemas.
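Why partitioning matters for these tables can be shown with a toy Python model. This is a conceptual sketch only, not the tranSMART schema: the `Measurement` and `PartitionedTable` names are hypothetical, and partitioning by dataset (study) is an assumed partition key. It shows the two benefits usually sought from partitioning large omics tables: queries for one dataset scan only that dataset's partition, and removing a dataset is a single partition drop instead of a row-by-row delete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    dataset: str  # hypothetical partition key, e.g. a study name
    probe: str
    patient: str
    value: float


class PartitionedTable:
    """Toy list-partitioned table: one bucket of rows per dataset."""

    def __init__(self) -> None:
        self._partitions: dict[str, list[Measurement]] = defaultdict(list)

    def insert(self, row: Measurement) -> None:
        # Each row is routed to its dataset's partition on insert.
        self._partitions[row.dataset].append(row)

    def query(self, dataset: str, probe: str) -> list[Measurement]:
        # Partition pruning: only this dataset's bucket is scanned,
        # so cost does not grow with rows from other datasets.
        # .get() avoids creating an empty partition for unknown datasets.
        return [r for r in self._partitions.get(dataset, []) if r.probe == probe]

    def drop_dataset(self, dataset: str) -> int:
        # Removing a dataset drops its whole partition at once;
        # returns the number of rows removed.
        return len(self._partitions.pop(dataset, []))


table = PartitionedTable()
table.insert(Measurement("STUDY_A", "probe1", "p1", 2.5))
table.insert(Measurement("STUDY_A", "probe2", "p1", 1.0))
table.insert(Measurement("STUDY_B", "probe1", "p2", 3.0))
print(len(table.query("STUDY_A", "probe1")))  # 1
print(table.drop_dataset("STUDY_B"))  # 1
```

In a real database the same effect comes from the engine's native partitioning feature, which is why the database edition matters.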

Installation

i2b2 Install


tranSMART Install

Installation instructions are on the tranSMART wiki. They apply generally to any Linux operating system.
Install scripts are provided for a set of supported operating systems. They assume a fresh, clean installation of the operating system, and they can be amended and re-run if problems occur (e.g., files not in the expected path or format, or changes to the components or requirements for installing R from the public R distributions).