Health Ontology Mapper
Space shortcuts
Space Tools

 

i2b2 Clinical Research Chart (CRC) Design Document

July 30, 2008  - SM

 

 

 

Overview of the CRC

 

The Data Repository Cell (also called the Clinical Reseach Chart, or CRC), is designed to hold data from clinical trials, medical record systems and laboratory systems, along with many other types of clinical data from heterogeneous sources.  The CRC stores this data in three tables, the patient, visit and observation tables. In addition to these three tables, there are three lookup tables, the concept, provider and code tables, and two mapping tables, patient_mapping and visit_mapping.

 

The three data tables, along with two of the lookup tables, concept and provider, make up the star schema of the warehouse.  The code table is strictly a lookup table and is not part of the star schema per se. All of the tables that are part of the CRC are described in this document.

 

 

 

 

Description of the i2b2 Data Mart

 

The i2b2 data mart is a data warehouse modeled on the star schema structure first proposed by Ralph Kimball.  The database schema looks like a star, with one central fact table surrounded radially by one or more dimension tables. The most important concept regarding the construction of a star schema is identifying what constitutes a fact.

 

In healthcare, a logical fact is an observation on a patient. It is important to note that an observation may not represent the onset or date of the condition or event being described, but instead is simply a recording or a notation of something.  For example, the observation of ‘diabetes’ recorded in the database as a ‘fact’ at a particular time does not mean that the condition of diabetes began exactly at that time, only that a diagnosis was recorded at that time (there may be many diagnoses of diabetes for this patient over time).

 

The fact table contains the basic attributes about the observation, such as the patient and provider numbers, a concept code for the concept observed, a start and end date, and other parameters described below in this document.  In i2b2, the fact table is called observation_fact..

 

Dimension tables contain further descriptive and analytical information about attributes in the fact table. A dimension table may contain information about how certain data is organized, such as a hierarchy that can be used to categorize or summarize the data.  In the i2b2 Data Mart, there are four dimension tables that provide additional information about fields in the fact table:  patient_dimension, concept_dimension,  visit_dimension, and provider_dimension.


 


 

 

                                                       

Description of the Tables

 

General Information

 

The observation table has only required columns.  The patient_dimension and visit_dimension tables have both required and optional columns.  All the tables have five technically-oriented, or administrative, columns :              

 

                                          

Column name

Data type

Nullable

Definition

Update_Date

datetime

Yes

Date the row was updated by the source system (date is obtained from the source system)

Download_Date

datetime

Yes

Date the data was downloaded from the source system.

Import_Date

datetime

Yes

Date data was imported into the CRC

Sourcesystem_Cd

varchar(50)

Yes

Coded value for the data source system

Upload_Id

  decimal(38,0)

Yes

A numeric id given to the upload

 

 

 

Observation Fact

 

The observation_fact table is the fact table of the i2b2 star schema and represents the intersection of the dimension tables. Each row describes one observation about a patient made during a visit.  Most queries in the i2b2 database require joining the observation_fact table with one or more dimension tables together. 

 

Encounter_num can be joined to encounter_num in the Visit_Dimension table.

 

Patient_num can be joined to patient_num in the Patient_Dimension and Visit_Dimension tables.

 

Practitioner_id can be joined to c_basecode in the Provider_Dimension table.

 

 

                                                                     

                                                                               

 

                           

 

Observation_Fact

Key

Column Name

Column Definition

Nullable? (Default = Y)

PK

Encounter_num

Encoded i2b2 patient visit number.

NO

 

Patient_num

Encoded i2b2 patient number.

NO

PK

Concept_Cd

ID number for observation of interest (i.e. diagnoses, procedures, medications, lab test)

 

PK

Provider_ID

Practitioner id or provider id.

 

PK

Start_Date

Starting date-time of observation (mm/dd/yyyy)

 

PK

Modifier_cd

Ranking of Modifiers

1, 2, 3, …

1.1, 1.2. 1.3…

1.1.1, 1.1.2.  1.1.3…

 

 

ValType_cd

Format of the concept.

N = Numeric

T = Text

 

Tval_char

The operator used when valtype_cd = “N”

E = Equals

L = Less Than

G = Greater Than

 

Nval_num

Stores a numerical value

 

ValueFlag_cd

Used to flag certain outlying or abnormal values

H = High,

L=Low

A = Abnormal

 

Quantity_num

Quantity of nval

 

Units_cd

Units of measurement or nval

 

End_Date

The ending date-time for the observation

 

Location_cd

A location code, such as for a clinic

 

Confidence_num

Assessment of accuracy of data

 

 

Observation_blob

   Holds any extra data that exists, often encrypted PHI

 

 

Update_date

(as above)

 

 

Download_date

(as above)

 

 

Import_date

(as above)

 

 

Sourcesystem_cd

(as above)

 

 

Upload_id

(as above)

 

 

                           

 

 

 

Patient Dimension

 


 

Each record in the Patient_Dimension table represents a patient in the database .  The table includes demographics fields such as gender, age, race, etc.  Most attributes of the patient dimension table are discrete (i.e. Male/Female, Zip code, etc.).

 

Patient_num can be joined to patient_num in the Observation_Fact and Visit_Dimension tables.

 

An example Patient table is shown below.  The Patient table has four required columns, Patient_Num, Birth_Date, Death_Date, and Vital_Status_Cd.  The patient_num column must be filled (must not be null).    The patient_num column is the primary key for the table and therefore can not contain duplicates.  This column holds a reference number for the patient within the data repository.   It is an integer.  The birth_date and death_date columns can be null, and are date-time fields.  They contain the birth and death dates for the patient if they exist.  They are not standardized to a specific time zone, a limitation that may need to be addressed in the future.  The vital_status_cd column contains a code that represents the vital status of the patient, and the precision of the vital status data.  The codes for this field were determined arbitrarily, as there was no standardized coding system for their representation.  The values for the vital_status_cd column are N for living (Null date) and Y for deceased (accurate to day), M for deceased (accurate to month), and X for deceased (accurate to year).

 

The Patient table may have an unlimited number of optional columns and their data types and coding systems are local implementation-specific.  In the example Patient table, there are eight optional columns.  The rules for using the codes in the columns to perform queries are represented in the metadata.  For example, the columns shown below include a race_cd column and a statecityzip_path column.  Codes from the race_cd column are enumerated values that may be grouped together to achieve a desired result, such as if there are 4 codes for the “white” race such as W, WHITE, WHT, and WHITE-HISPANIC, they can be counted directly to determine the number of white-race patients in the database.  Codes from the statecityzip_path are strings that represent hierarchical information.  In that way, the string is queried from left to right in a string comparison to determine what patients are returned by the query, such as if a code is MA\BOSTON\02114 and all the patients in BOSTON are desired, the string “MA\BOSTON\*” (where * is a wildcard) would be queried.

 

 

                               

 

 

 

Visit Dimension

 

 

The Visit_Dimension table represents a session where observations were made.  This session can involve a patient directly, such as a visit to a doctor’s office, or it can involve the patient indirectly, as in running several tests on a tube of the patient’s blood.  Several observations can be made during a visit.  All visits must have a start time associated with them, but they may or may not have an end date.  The visit record also contains specifics about the location of the session, such as which hospital or clinic the session occurred at, and whether the patient was an inpatient or outpatient at the time of the visit. 

 

Encounter_num can be joined to encounter_num in the Observation_Fact table.

Patient_num can be joined to patient_num in the Observation_Fact and Visit_Dimension tables.

 

An example Visit table is shown below.  The Visit table has four required columns Patient_Num, Start_Date, End_Date, and Active_Status_Cd. The visit_num column is the primary key for the table and therefore can not contain duplicates.  This column holds a reference number for the visit within the data repository.  It is an integer.  The start_date and end_date columns can be null, and are date-time fields.  Because a visit is considered to be an event, there is a distinct beginning and ending date and time for the event.  However, these dates may not be recorded and the active_status_cd is used to record whether the event is still ongoing, and the precision of the available dates.  Conceptually, this makes it very similar to the vital_status_cd column in the Patient table.  The codes for this field were determined arbitrarily, as there was no standardized coding system for their representation.  The values for the active_status_cd column are shown in the appendix.

 

The Visit table may have an unlimited number of optional columns, but their data types and coding systems are local implementation specific.  In the example Patient table, there are four optional columns.  The rules for using the codes in the columns to perform queries are represented in the metadata, and the values within the columns follow a similar pattern as described above for the Patient table.

 

                                   

                          

 

 

 

 

 

Concept Dimension

 

The concept_dimension table contains one row for each concept.  Possible concept types are diagnoses, procedures, medications and lab tests. The structure of the table gives enough flexibility to store virtually any concept type, such as demographics and genetics data.

 

c_basecode can be joined to concept_cd in the Observation_Fact table.

 

The concept_path is a path that delineates the concept’s hierarchy.

              The concept_code is the code that represents the diagnosis, procedure, or any other coded value.

Name_char is the actual name of the concept.

 

 

               

 

 

 

 

Provider Dimension

 

Each record in the Provider_Dimension table represents a doctor or provider at an institution. 

 

The provider_path is the path that describes the how the provider fits into the institutional hierarchy. Institution, department, provider name and a code may be included in the path.

 

            c_basecode can be joined to practitioner_id in the Observation_Fact table.

 

                                                    

 

 

             

 

                            Code_Lookup      

 

The code_looup table contains coded values for different fields in the CRC.  For example, in the visit_dimension table, there is the location_cd  field that may have different values  for different  types hospital locations and these values would be stored in the code lookup table. The first four fields of the table might look like this:

 

 

 

 

                                                          

 

 

 

 

 

     Patient_Mapping

 

The patient_mapping table maps the i2b2 patient_num to an encrypted number from the source_system (patient_ide --  the ‘e’ in ide is for encrypted) Patient_ide_source contains the name of the source system.  Patient_ide_status gives the status of the patient number in the source system, for example, if it is Active or Inactive or Deleted or Merged. 

 

 

                         

 

 

 

Encounter_Mapping

 

The encounter_mapping table maps the i2b2 encounter_number to an encrypted number from the source system (encounter_ide_source --  the ‘e’ in ide is for encrypted) .  Encounter_ide_source contains the name of the source system.  Encounter_ide_status

Gives the status of the encounter in the source system, for example, if it is Active, Inactive, Deleted or Merged.