This data science bundle supports complex analyses of real-world clinical and genomic data. It includes i2b2, which enables query and cohort identification, and tranSMART, adds a suite of tools for data exploration, R-based advanced analytics (e.g., correlation analysis, heat maps, PCA, etc.), and genomic modules for Genome Wide Association Studies (GWAS) and high dimensional data analysis such as RNAseq.
Figure 1. High-level view of the bundle. The i2b2 common data model integrates clinical and genomic data. i2b2 provides tools for query and cohort selection, and tranSMART contains modules for high-dimensional analyses.
Uses cases for this bundle include:
This bundle includes documentation on how to install and configure the following items:
A public demo of this bundle is available at the following URL:
http://shrine-node3.i2b2transmartplugins.org/
It consists of both i2b2 and tranSMART running on the same database with Synthea demo data.
i2b2 consists of independent applications that provide different functionality called "cells" (Figure 1). A collection of cells form an i2b2 "hive". Most i2b2 hives include (1) a Project Management (PM) cell for authentication and authorization; (2) a Clinical Research Chart (CRC) cell, which contains patient data and the query engine; and, (3) an Ontology (ONT) cell, which describes the concepts and codes contained within the CRC cell. Many i2b2 hives also include (4) a Workplace (Work) cell, which enables users to "bookmark" frequently used items in the user interface and share these with collaborators; and (5) an Identity Management (IM), which allows authorized users to retrieve identified patient data. Cells communicate with each other using i2b2 XML messages sent to APIs. When a cell receives a request message, it queries a table in the HiveData database to determine the location of main database for that cell, based on the user's project. An exception is the PM cell, which uses a single database for all projects. The i2b2 Web Client is written entirely in HTML and JavaScript. It communicates with a Web Proxy on a web server, which redirects messages to the appropriate cell.
Figure 2. i2b2 components.
The TranSMART web user interface is a single tomcat application with an extended set of plugins which may be enabled/disabled in the configuration file.
TranSMART is delivered with a set of supporting applications:
i2b2 requirements can be found here. A summary of the key requirements:
TranSMART is supported with:
The additional database requirements (e.g. Oracle Enterprise) ar eto support partitioning for the large ables used for omics data in the tranSMART-specific schemas.
Installation instructions are on the transmart wiki. They can be used generally on any Linux operating system.
Install scripts are provided to install on a set of supported operating systems. They are provided for a fresh clean installation of the operating system and can be amended and re-launched in case of problems (e.g. files not in the expected path/format, or changs to the components/requirements for R installation from the public R distributions)