The plain-text clinical narrative holds a wealth of information.  The purpose of this cell is to harness that unstructured information by allowing i2b2 users to query it and join it with existing i2b2 concepts.  Currently, the entire note is commonly stored as a single row in the observation_blob field of the i2b2 observation_fact table.  cTAKES, a natural language processing (NLP) system, can 'read' through plain-text notes, extract concepts from them, and transform those concepts into structured, normalized information.  The cell integrates cTAKES with i2b2 by formatting the output of cTAKES in the i2b2 observation_fact table format (facts, concepts, modifiers, and values), which can then be queried easily through existing i2b2 interfaces.
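
As a rough illustration of this formatting step, the sketch below maps one extracted concept onto a handful of observation_fact columns. It is only a sketch under stated assumptions: the `ExtractedMention` class, the `to_observation_fact` function, the `NLP|` code prefix, the `NLP|NEGATED` modifier code, and the `CTAKES` source label are all hypothetical names chosen for the example, and only a subset of the observation_fact columns is shown. It does not reflect the actual cTAKES output types or the cell's real code scheme.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExtractedMention:
    """Hypothetical, simplified view of one concept found in a note."""
    patient_num: int
    encounter_num: int
    note_date: date
    cui: str       # normalized concept identifier, e.g. a UMLS CUI
    negated: bool  # True if the mention was negated in the text


def to_observation_fact(m: ExtractedMention) -> dict:
    """Map one mention onto a subset of observation_fact columns."""
    return {
        "encounter_num": m.encounter_num,
        "patient_num": m.patient_num,
        "concept_cd": "NLP|" + m.cui,  # assumed code prefix
        "start_date": m.note_date.isoformat(),
        # "@" is the i2b2 placeholder for "no modifier";
        # the negation modifier code is an assumption.
        "modifier_cd": "NLP|NEGATED" if m.negated else "@",
        "sourcesystem_cd": "CTAKES",  # assumed source label
    }


# Sample values only; the CUI is an illustrative identifier.
mention = ExtractedMention(1001, 5001, date(2010, 3, 15), "C0004096", negated=False)
row = to_observation_fact(mention)
print(row["concept_cd"], row["modifier_cd"])
```

Keeping negation in modifier_cd rather than in the concept code is one possible design: the same concept code then matches both asserted and negated mentions, and a query can separate them by modifier.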
There will be two main components:

  • An administrative tool (cTAKES GUI) that will allow users to specify the input data source of the note(s), the output of the note(s), and the NLP pipeline to be used.  The cTAKES GUI will be a web interface, packaged as a WAR file so that it can be easily deployed to standard servlet containers such as Tomcat.  The configuration information will be stored so that it can be reused for future experiments.  The output in this case will be written in the observation_fact format.

  • An interface for users to query the extracted data.  We plan to reuse the existing i2b2 web client tool by adding an 'NLP' ontology that contains all of the extracted concepts, which can then be used to filter results and be joined with other ontologies such as demographics or codified data.
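
To make the second component more concrete, the sketch below shows roughly the kind of join that becomes possible once extracted concepts sit in observation_fact next to other i2b2 data. It is a toy illustration, not the query the i2b2 web client actually generates: it uses an in-memory SQLite database, heavily simplified versions of the observation_fact and patient_dimension tables, a hypothetical `NLP|` concept code prefix and `NLP|NEGATED` modifier code, made-up sample rows, and a hypothetical helper named `patients_with_concept`.

```python
import sqlite3

# In-memory database with simplified stand-ins for two i2b2 tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT, modifier_cd TEXT);
INSERT INTO patient_dimension VALUES (1001, 'F'), (1002, 'M'), (1003, 'F');
INSERT INTO observation_fact VALUES
  (1001, 'NLP|C0004096', '@'),
  (1002, 'NLP|C0004096', '@'),
  (1003, 'NLP|C0004096', 'NLP|NEGATED');
""")


def patients_with_concept(conn, concept_cd, sex_cd):
    """Return patients of the given sex with a non-negated NLP fact for the concept."""
    sql = """
        SELECT DISTINCT f.patient_num
        FROM observation_fact f
        JOIN patient_dimension p ON p.patient_num = f.patient_num
        WHERE f.concept_cd = ?
          AND f.modifier_cd = '@'   -- '@' means no modifier, i.e. not negated here
          AND p.sex_cd = ?
        ORDER BY f.patient_num
    """
    return [r[0] for r in conn.execute(sql, (concept_cd, sex_cd))]


print(patients_with_concept(conn, "NLP|C0004096", "F"))  # prints [1001]
```

Patient 1003 is excluded because the only matching fact carries the assumed negation modifier, and patient 1002 is excluded by the demographic filter.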


  • Design and Architecture
  • Installation Guide

Contact Information

Pei J Chen
Children's Hospital Boston
E-mail: pei.chen[at]

Guergana K Savova, PhD
Children's Hospital Boston
E-mail: guergana.savova[at]

Jonathan Bickel, MD
Children's Hospital Boston
E-mail: jonathan.bickel[at]

