Health Ontology Mapper

Data Discovery and Data Request Lifecycle Tracking

May 1, 2008

1. Introduction

 

To integrate disparate data within the Health System, an initiative is underway to build an Integrated Data Repository (IDR) that will collectively hold and integrate clinical, clinical research, biomedical, biosciences, economic, administrative, and public health data. This integrated data store will serve the needs of both the Health System and Translational Research. The data will then be transformed into useful information that can be structured and managed with ease. The IDR will be an active environment that allows continuous access to the latest detailed data and built-in analysis to drive optimal clinical and research activities. This product will be a compilation of internal data fields drawn from clinical systems and aggregated into a single environment.

 

 

1.1 Purpose

 

 

This document will outline the interaction between a clinical researcher and an IDR system.  Our primary focus is to document the clinical research process where IDR serves as the primary source of data.  This may include the confidential introduction by the researcher of new data into the environment which must be joined with data already resident within the IDR.

 

The purpose of this document is to give the reader a detailed understanding of the goals of this project, customer requirements, and expectations.

 

1.3 Document Definitions

 

IDR – Integrated Data Repository

IS – Information Services

ETL – Extract, Transform, and Load

EMPI – Enterprise Master Patient Index

HIPAA – The Health Insurance Portability and Accountability Act

o Enacted by the US Congress to establish, amongst other items, a national standard for protecting the security and privacy of health information. ( http://www.hhs.gov/ocr/hipaa/ )

HL7 – Health Level 7

o Standard used for information transportation amongst disparate IT systems.

o “HL7 is an international community of healthcare subject matter experts and information scientists collaborating to create standards for the exchange, management and integration of electronic healthcare information. HL7 promotes the use of such standards within and among healthcare organizations to increase the effectiveness and efficiency of healthcare delivery for the benefit of all.” ( http://www.hl7.org/ )

ODBC – Open Database Connectivity ( Reference: http://en.wikipedia.org/wiki/ODBC )

VPN – Virtual Private Network ( Reference: http://en.wikipedia.org/wiki/Virtual_private_network )

i2b2 – Informatics for Integrating Biology and the Bedside

CTMS – Clinical Trial Management System


2. Overview

 

2.1 Project Perspective

 

Since this repository is planned to be the primary source of data on clinical outcomes for researchers, it is imperative to understand how data is currently being extracted, interpreted, and used by researchers.

 

It is important to define these interactions for the following reasons:

 

  1. To properly vet the functional requirements of an IDR based on the business process of our researcher customers.
  2. To better identify areas of improvement in IDR technology that may be required in order to most effectively service the needs of the research community.
  3. To identify all data and document artifacts which are typically generated as a result of interaction with the IDR.  These artifacts must be identified, and a determination must be made regarding which of them may contain PHI (Protected Health Information) and so require additional security and compliance measures for their proper handling.
  4. To identify the publication process as it relates to the IDR.  This is necessary so that information can be shared with collaborators, used for patient recruitment, provided as a result of oversight when necessary, and sent for publication on websites and in journals.  These activities must be properly defined so that the security and confidentiality of PHI are maintained.  This is also critical for protecting the institution’s rights with regard to any new Intellectual Property generated and the possible patenting of that information.

 

2.2 Project Objectives

 

With the introduction of IDR technology, we expect that the typical business process followed by a researcher will be fundamentally altered by the availability of this new and powerful technology.

 

Specifically,

  1. At the very beginning of the research enterprise it is common for a faculty member to seek to validate certain hypothesized correlations between data elements collected from disparate sources.  Until now it has been very difficult to look quickly and effectively for the possible existence of these ad-hoc correlations.  With the advent of the IDR, however, a faculty member will be able to gain access to the IDR environment and very quickly establish the relevance of a hypothesis.  Please note that access to the de-identified data by the faculty member may require a waiver from the IRB (Institutional Review Board).  Some sites have allowed unrestricted access to the IDR by all faculty; however, such practices may not typically be acceptable to the IRB for such a large and powerful warehouse of information.
  2. Quick interaction with IDR will help with inclusion/exclusion research criteria in a cohort discovery.
  3. Correlations between data elements don’t really prove anything.  However they can be used to bolster the case that a specific line of inquiry may have merit.  That support will typically be used to argue for grant moneys and IRB approval.
  4. During this process, access to custom UIs for Time Domain Query, Genetic Epidemiology, etc. (depending on the researcher’s domain) to refine these speculative hypotheses would be extremely useful.
  5. Prior to the conduct of a study and during the process of writing the clinical protocol (an SOP), a study design phase will typically be conducted.  During study design a large number of ad-hoc queries will be made against the IDR.  It is therefore expected that a general purpose data mining tool would be highly useful.  Also, during the study design phase the researcher may typically require access to a statistics package such as SAS or Stata; thus, secure access must be provided for data extraction and analysis.
  6. Once the study has been designed, a cohort of patients must be identified. IDR will be used to generate that list.  This functionality of an IDR is expected to save a large amount of money and time during the conduct of a clinical trial.  However, these de-identified cohorts will typically be considered to be a form of IP which should not be shared with commercial entities without prior consent from the Technology Transfer (Data Warehouse) department.
  7. The researcher provides the documentation given by the IRB to the Business Analyst associated with the IDR as proof of sufficient access privileges.
  8. With IRB approval for access to PHI an identified set of patients will be obtained and the process of recruitment of patients into the trial will begin.  The IDR will provide the researcher with a more accurate list of patients.  Although it is expected that the process of recruitment will not be altered by the IDR, the fact that the PHI data is derived from the IDR will have a significant impact on the researcher’s interaction with these systems.  For example, a medical center will provide the information as requested under IRB approval but the establishment of such a large warehouse of information will require the very secure handling of that PHI both within the IDR itself and most importantly by the researcher that is supplied that data by the IDR.  By supplying the researcher with PHI data we are assuming some of the responsibility for maintaining a secure environment where that researcher may safely conduct patient recruitment.
  9. Patients are often contacted via their physician, and this doctor must be willing to follow the prescribed clinical protocol.  As such, the doctor must sign an Investigator Approval form (a contract which usually contains a non-disclosure agreement, patent protection and release of liability) and must be supplied with the necessary training, documentation and forms (a case report form).
  10. Once the doctor has signed the Investigator Approval contract those contracts must be stored (and may be considered a form of PHI).

 

 

3. Current State

 

Five methods of collecting clinical data are in common use today.

 

3.1 Clinical Data Collection Methods:

 

  1. Data entry of the information from a paper source
  2. Direct electronic data capture by the Investigator
  3. Data from a third party, such as a lab service
  4. Capture of data from electronic equipment present at the Investigator site
  5. Direct electronic data entry by a patient (a patient diary)

 

3.2 Types of data collected for clinical research:

1.     Case Report Form Data – data entered manually by the patient or by the investigator’s staff into a predefined form

  1. Blank – a “blank” is a CRF with all of its associated instructions and field constraints.
  2. eCRF – a PDF transform of the data entry screens used to enter the CRF
  3. Archival eCRF – a PDF and an associated computer audit trail that shows who modified the data, when, and what the old values were before they were altered.
  4. Sub-CRF – a CRF that was sent to the FDA because the patient died, had a serious adverse event, or withdrew from the study.
  5. CRF Annotation – a document that describes the association between the fields in a tabular report to the fields on the CRF where the data was first entered.

2.     Patient Recorded Diaries (a kind of ePRO)

3.     Electronic Data Capture

  1. Data collected from clinical systems.
  2. Information collected by the patient personally (patient portals)
  3. Data that is entered directly into a computer system, such as an EMR, without first being written down on paper
  4. Data that is first entered on paper and then electronically converted to a computer format, either through scanning or manual entry.

4.     Source Data Employed Frequently in Clinical Research

  1. Documents
  2. Hospital records
  3. Clinical patient charts
  4. Lab notes
  5. Informal memoranda
  6. Patient diaries
  7. Evaluation checklists
  8. Pharmacy dispensing info
  9. Data from automated systems
  10. Electronic representations of paper records (that have been verified against the source paper and later certified)
  11. Photos
  12. X-rays
  13. Subject records within the pharmacy
  14. Medical department notes from clinical environments

 

 

3.3 Type of Research and Business Process 

  1. Clinical (patient care – “at the bedside”)

More description of the business process will go in this section – Kevin Haynes from Penn will provide a write-up per the 5-27-08 meeting

  2. Clinical Research

More description of the business process will go in this section – Mark Weiner & John Holms from Penn will provide a write-up per the 5-27-08 meeting

  3. Basic Research

More description of the business process will go in this section – UCSF & Penn will provide a write-up per the 5-27-08 meeting


3.4 Data request workflow

This section should probably be broken into smaller sections that identify where patients are being lost.

 

1.     Step 1 - Cohort Creation

This step defines the set of patients to be studied for a particular research project.  The cohort can be defined from any combination of the five elements mentioned in section 3.2: laboratory results, procedures performed, diagnoses, demographics or visit criteria.  In some cases the cohort is supplied directly by the client.  An example of this last case is an ongoing study with a previously defined group of subjects.

 

However, when extracting data from electronic records there is great potential for unintentionally excluding or including some patient groups, depending on how the data is coded and interpreted.

 

2.     Step 2 - Disqualification from the cohort.  This can be based on any combination of the five factors shown; these are the same five factors as used in cohort creation.  Although logically, one might consider disqualification to be part of cohort creation (and in fact it is), practical considerations usually (but not always) lead to doing it as a separate step.  The size of the cohorts involved and the need to use computing resources efficiently usually require running at least some disqualification criteria on a cohort created by other criteria.

 

3.     Step 3 - Data pull.  This step pulls the data wanted for the final cohort, that is, the patients who have passed the disqualification phase.  The results desired fall into the same five categories as are used for cohort creation and disqualification, as shown above.  In some cases, this data is pulled within the same time frame as the cohort creation; in other cases, it is pulled periodically or a limited number of times at pre-determined intervals.  Typically, it is pulled in the same time frame in a retrospective study and later in a prospective study.  In some cases, this concluding data is not provided by THREDS (The Health Record Data Service).

 

All combinations of the factors listed above are theoretically possible, but not all are observed in practice.  Also worth noting is that data sources other than THREDS may be part of any study.
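
The three steps described above (cohort creation, disqualification, data pull) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Patient` record, its fields, and the example criteria are hypothetical and do not reflect the actual THREDS or IDR schema.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical, simplified patient record; not an actual THREDS or IDR schema."""
    patient_id: str
    age: int
    diagnoses: set = field(default_factory=set)


def create_cohort(patients, include):
    """Step 1: keep every patient that satisfies the inclusion predicate."""
    return [p for p in patients if include(p)]


def disqualify(cohort, exclusions):
    """Step 2: drop any patient matched by at least one exclusion predicate."""
    return [p for p in cohort if not any(rule(p) for rule in exclusions)]


def pull_results(cohort, fields):
    """Step 3: extract only the requested fields for the final cohort."""
    return [{name: getattr(p, name) for name in fields} for p in cohort]


# Illustrative data only.
patients = [
    Patient("A", 52, {"stroke"}),
    Patient("B", 47, {"stroke", "dementia"}),
    Patient("C", 60, {"diabetes"}),
]

cohort = create_cohort(patients, lambda p: "stroke" in p.diagnoses)
final = disqualify(cohort, [lambda p: "dementia" in p.diagnoses])
results = pull_results(final, ["patient_id", "age"])
print(results)
```

Running disqualification as a separate pass over an already reduced cohort mirrors the practical consideration noted in Step 2: the exclusion rules only touch the patients that survived cohort creation.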

 

Some examples might make this clearer.  In some cases, I have simplified these studies to reduce the number of factors actually being considered or obscured some factors to protect confidentiality of the research.

 

 

The diagram above summarizes the workflow for data requests.

This diagram still needs to be redesigned

 

 

3.4 Use cases

This is a contribution from Paul Norris. We may change to different use cases

Study 1:   Calculate incremental cost effectiveness between different colorectal screening tests and identify factors related to non-adherence to screening.  Prospective study with ongoing recruitment.

 

Cohort – Primary care patients within a certain age range who have upcoming appointments within the next 90 days.  This is pulled from two tables, one with demographics and one with scheduling.

 

Disqualification – Must be due for screening, as determined by not having had any of 3 screening tests within the specific recommended date range for each of the 3 tests.  Must not have any of a list of other diagnoses that are considered to interfere with the study.  This data is pulled for the cohort from a table of combined diagnoses and already performed procedures.

 

Results – The results provided to the investigator from THREDS are sufficient identifiers to identify and contact the patient to recruit them for the study.  Final study results will come from patient charts, checking if they got one or more recommended screening tests and which tests they got.
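
As a rough illustration, the Study 1 cohort and disqualification criteria could be expressed as below. The age range, test names, and screening intervals are placeholder assumptions, since the study description does not state them, and the patient dictionary layout is hypothetical.

```python
from datetime import date, timedelta

# Hypothetical values: the source does not give the age range, the test names,
# or the recommended screening interval for each test.
AGE_RANGE = (50, 75)
LOOKAHEAD_DAYS = 90
SCREENING_INTERVAL_DAYS = {"test_a": 365, "test_b": 5 * 365, "test_c": 10 * 365}


def in_cohort(age, next_appointment, today):
    """Cohort: age within range and an appointment within the next 90 days."""
    if next_appointment is None:
        return False
    days_ahead = (next_appointment - today).days
    return AGE_RANGE[0] <= age <= AGE_RANGE[1] and 0 <= days_ahead <= LOOKAHEAD_DAYS


def due_for_screening(last_test_dates, today):
    """Due only if none of the three tests falls inside its recommended interval."""
    for test, interval in SCREENING_INTERVAL_DAYS.items():
        last = last_test_dates.get(test)
        if last is not None and (today - last) <= timedelta(days=interval):
            return False
    return True


def eligible(patient, today, interfering_diagnoses):
    """Apply cohort criteria first, then both disqualification criteria."""
    return (
        in_cohort(patient["age"], patient["next_appointment"], today)
        and due_for_screening(patient["last_tests"], today)
        and not (patient["diagnoses"] & interfering_diagnoses)
    )


today = date(2008, 5, 1)
patient = {
    "age": 60,
    "next_appointment": date(2008, 6, 15),
    "last_tests": {"test_b": date(2001, 1, 1)},  # older than its interval
    "diagnoses": set(),
}
print(eligible(patient, today, {"hypothetical_interfering_dx"}))
```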

 

Study 2:   Retrospective study to quantify racial differences in health care utilization and kidney function decline related to chronic kidney disease.

 

 

Cohort – Adults who had two outpatient serum creatinine measurements within the last two years during a specified ten year period.

 

Disqualification – none for this study

 

Results –

  1. Demographic information, including ethnicity and measures of socioeconomic status
  2. Outpatient and inpatient visits
  3. All results for a list of 20 different laboratory tests
  4. Date of death and cause of death
  5. Any of a list of approximately 50 diagnoses
  6. Any of a list of approximately 30 procedures
  7. Sources of information outside THREDS include renal clinic notes and the United States Renal Data System

 

Study 3: Insulin resistance after stroke.  This study tests the hypothesis that a particular drug known to diminish insulin resistance may be beneficial to stroke victims in avoiding a repeat incident.  This study is prospective and contributes to a randomized clinical trial.

 

Cohort – Inpatients or outpatients with diagnosis of stroke.  A new cohort is run each month for patients with a diagnosis received in the prior month.

 

Disqualification –

  1. Known to be deceased in the clinical database or the California Death Registry.
  2. Prior diagnosis of stroke.
  3. Any of a list of diagnoses thought to be detrimental or possibly interfering with the study

 

Results –

  1. Contact information for qualifying patients, including their providers
  2. Discharge summaries or visit notes
  3. Future clinic appointments for the qualifying cohort.
  4. Looking at the study design, it seems quite likely that there will be drug efficacy measures outside THREDS.
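
The monthly cohort run for Study 3 might look something like the following sketch. The event layout, the diagnosis labels, and the function names are assumptions made for illustration; real diagnosis data would use coded values and the death check would consult both sources named above.

```python
from datetime import date


def prior_month(run_date):
    """Return (start, end) of the month before run_date; end is exclusive."""
    first_of_this_month = run_date.replace(day=1)
    if first_of_this_month.month == 1:
        start = date(first_of_this_month.year - 1, 12, 1)
    else:
        start = date(first_of_this_month.year, first_of_this_month.month - 1, 1)
    return start, first_of_this_month


def monthly_stroke_cohort(diagnosis_events, deceased_ids, excluded_diagnoses, run_date):
    """Patients newly diagnosed with stroke last month, minus disqualified ones."""
    start, end = prior_month(run_date)
    by_patient = {}
    for event in diagnosis_events:
        by_patient.setdefault(event["patient_id"], []).append(event)

    qualifying = []
    for patient_id, events in by_patient.items():
        strokes = sorted(e["date"] for e in events if e["code"] == "stroke")
        if not any(start <= d < end for d in strokes):
            continue  # no stroke diagnosis in the prior month
        if strokes[0] < start:
            continue  # disqualified: prior diagnosis of stroke
        if patient_id in deceased_ids:
            continue  # disqualified: known to be deceased
        if any(e["code"] in excluded_diagnoses for e in events):
            continue  # disqualified: interfering diagnosis
        qualifying.append(patient_id)
    return sorted(qualifying)


# Illustrative data only.
events = [
    {"patient_id": "P1", "code": "stroke", "date": date(2008, 5, 10)},
    {"patient_id": "P2", "code": "stroke", "date": date(2007, 3, 1)},
    {"patient_id": "P2", "code": "stroke", "date": date(2008, 5, 20)},
    {"patient_id": "P3", "code": "stroke", "date": date(2008, 5, 5)},
    {"patient_id": "P4", "code": "stroke", "date": date(2008, 5, 12)},
    {"patient_id": "P4", "code": "dementia", "date": date(2008, 1, 1)},
    {"patient_id": "P5", "code": "stroke", "date": date(2008, 4, 15)},
]
print(monthly_stroke_cohort(events, {"P3"}, {"dementia"}, date(2008, 6, 3)))
```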

 

This note is intended to provide an overview of the THREDS workflow for a typical study.  What I believe it also shows is that this workflow encompasses a very wide variety of specific workflows and that the studies can be quite complex.

 

3.5 Research Data request - workflow examples at Penn

This is just a space holder….

 

3.6 Research Data request - workflow examples at UCSF

This is just a space holder….

 

3.7 Research Data request - workflow examples at CHOP

This is just a space holder….


Research Data Request Lifecycle

 


4. Capturing User Requests

 

The Discovery Interface will capture new requests for user access. 

 

4.1 User Interface – Logging user requests

 

The “Request New User” interface will provide the following features:

 

  1. Prompt for user background information required by the IRB, including the name and contact information of the Principal Investigator, research staff, and administrative contact.  This screen will completely describe exactly who will be granted access to the data.

  2. Prompt for the abstract, which describes the research to be conducted, including relevant background information.

  3. An outline of the proposed project plan.

  4. A data selection screen which allows the researcher to browse what data are available in the system and select the data elements of interest.

  5. An approval status interface which will tell the researcher the status of the data request generated by this system.

  6. This system will also allow a requester to browse through existing requests.  That way, if another researcher has already made a similar request for data access, that work can be reused for the new request as well.
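
One possible shape for the record captured by this interface is sketched below. The field names, the status values, and the use of set overlap (Jaccard similarity) to find similar earlier requests are all assumptions for illustration; the requirements above do not prescribe a data model or a matching method.

```python
from dataclasses import dataclass, field
from enum import Enum


class RequestStatus(Enum):
    # Hypothetical status values; the source does not enumerate them.
    SUBMITTED = "submitted"
    UNDER_IRB_REVIEW = "under IRB review"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DataRequest:
    """Hypothetical record for one data request logged through the interface."""
    request_number: int
    request_name: str
    principal_investigator: str
    research_staff: list
    administrative_contact: str
    abstract: str
    project_plan: str
    data_elements: set = field(default_factory=set)
    status: RequestStatus = RequestStatus.SUBMITTED


def similar_requests(new_request, existing_requests, min_overlap=0.5):
    """Return existing requests whose data elements overlap the new request."""
    matches = []
    for other in existing_requests:
        union = new_request.data_elements | other.data_elements
        if not union:
            continue
        overlap = len(new_request.data_elements & other.data_elements) / len(union)
        if overlap >= min_overlap:
            matches.append(other)
    return matches


earlier = DataRequest(1, "CKD utilization", "Dr. A", ["Analyst B"], "Admin C",
                      "abstract text", "plan text", {"creatinine", "age"})
unrelated = DataRequest(2, "Diabetes registry", "Dr. D", [], "Admin E",
                        "abstract text", "plan text", {"hba1c"})
new = DataRequest(3, "Kidney function decline", "Dr. F", [], "Admin G",
                  "abstract text", "plan text", {"creatinine", "age", "ethnicity"})

print([r.request_number for r in similar_requests(new, [earlier, unrelated])])
```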

4.2 Self-Service

 

The researcher may also look at the existing types of data that are available by clicking on the Data Selection tab.  This screen will enable the requester to check for available data without any assistance. If the requester comes to the conclusion that there is not enough data to support the study, the study and request can be appropriately modified or reconsidered. 

 

It is anticipated that the self-service system may not be able to provide all of the information required by a researcher; however, if a large portion of the requester’s questions can be answered via this self service interface, it will greatly contribute to the scalability of an IDR.

 

 


By clicking on Data Request Status, a requester can check on the progress of the request. He or she can also enter comments.  Researchers may search for a request by entering either the request number, request name, or criteria. (Screen below will be expanded to include the type of query, time estimates, and more defined drop down lists)

 

 

The last tab is only accessible to Data Repository Analysts (BA or Data Analyst).  Analysts may enter comments and lessons learned, as well as edit or look up existing SQL code.

 

This interface will be filled out by the Business Analyst. The Business Analyst will enter the IRB approval number and attach a copy of the protocol.
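
A minimal sketch of how the information entered by the Business Analyst could gate data access follows. The record fields, the access level names, and the rule that no access is granted until both the IRB approval number and the protocol are recorded are assumptions for illustration, not stated policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApprovalRecord:
    """Hypothetical record filled out by the Business Analyst."""
    request_number: int
    irb_approval_number: Optional[str] = None
    protocol_attachment: Optional[str] = None  # e.g. file name of the protocol
    phi_approved: bool = False  # True only if the IRB approved access to PHI


def access_level(record):
    """Assumed rule: no access without a recorded approval and protocol."""
    if not record.irb_approval_number or not record.protocol_attachment:
        return "none"
    return "identified" if record.phi_approved else "de-identified"


pending = ApprovalRecord(request_number=1)
approved = ApprovalRecord(2, "IRB-0000", "protocol.pdf")
approved_phi = ApprovalRecord(3, "IRB-0001", "protocol.pdf", phi_approved=True)

for record in (pending, approved, approved_phi):
    print(record.request_number, access_level(record))
```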

5. Benefits of IDR

  1. To allow multiple custom changes to the exclusion criteria within a cohort discovery
  2. Expedited cohort discovery process
  3. Integrated data from all clinical systems and research data stores
  4. other…

 

6. Inference Based Ontology Mapping

 

Once a researcher has determined what data elements they require, a request for access to that data must be approved by the IRB.  Once access has been approved the researcher will be given access to a view of the IDR data requested.

 

The researcher will have access to both the direct source data and, when necessary, translated (ontology-mapped) data housed in the IDR.  That data will have been translated by a rules-based system and will be housed within “harvest tables” within the data warehouse.
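
To make the idea of rules-based translation into harvest tables concrete, here is a deliberately simple lookup-rule sketch. It is not an inference engine and is not the project's actual mapper: the source system names, local codes, rule layout, and row layout are all hypothetical. The only external fact used is the standard creatinine conversion of 1 mg/dL = 88.4 µmol/L.

```python
# Hypothetical mapping rules: each rule translates one local code from one
# source system into a shared target term and unit.
RULES = [
    {"source_system": "lab_a", "local_code": "CREAT",
     "target_term": "serum_creatinine", "target_unit": "mg/dL", "scale": 1.0},
    {"source_system": "lab_b", "local_code": "CR-S",
     "target_term": "serum_creatinine", "target_unit": "mg/dL",
     "scale": 1 / 88.4},  # lab_b is assumed to report in umol/L
]


def build_harvest_rows(source_rows, rules):
    """Translate source rows with the rules; return (harvest_rows, unmapped_rows)."""
    index = {(r["source_system"], r["local_code"]): r for r in rules}
    harvest, unmapped = [], []
    for row in source_rows:
        rule = index.get((row["source_system"], row["local_code"]))
        if rule is None:
            unmapped.append(row)  # keep for review rather than dropping silently
            continue
        harvest.append({
            "patient_id": row["patient_id"],
            "term": rule["target_term"],
            "value": round(row["value"] * rule["scale"], 2),
            "unit": rule["target_unit"],
        })
    return harvest, unmapped


# Illustrative data only.
source_rows = [
    {"patient_id": "A", "source_system": "lab_a", "local_code": "CREAT", "value": 1.1},
    {"patient_id": "B", "source_system": "lab_b", "local_code": "CR-S", "value": 176.8},
    {"patient_id": "C", "source_system": "lab_c", "local_code": "XYZ", "value": 3.0},
]

harvest, unmapped = build_harvest_rows(source_rows, RULES)
print(harvest)
print(len(unmapped))
```

Keeping the untranslated source rows alongside the harvest rows matches the statement above that the researcher has access to both the direct source data and the translated data.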

 

6.1 Types of electronic clinical data that will typically be captured:

1.     Data derived from other sources within the IDR.

2.     Data entered by the researcher’s staff that may need to be joined with data within the IDR.

3.     Data that may typically flow into the IDR from a source CTMS environment.

4.     Data obtained from a CRO (Contract Research Organization)

 

Some of the Roles:

1.     Sponsor – the organization that is sponsoring the researcher (The Researcher)

2.     EDC System Owner – the organization (or company) that owns the electronic data capture system

3.     CRO – Contract Research Organization

4.     Clinical Research Associate – oversees the clinical research project and verifies the data collected against source documents.

5.     Investigator – the person who conducts the clinical research, or his/her staff

 

5.2 Research Data Request analysis upon project implementation 

  1. Post implementation analysis (benefits…)
  2. workflow diagram
  3. Use cases
  4. Research Data request - workflow examples at Penn
  5. Research Data request - workflow examples at UCSF
  6. Research Data request - workflow examples at CHOP
  7. ….

