Health Ontology Mapper

The Integrated Data Repository (IDR)

 

Data Discovery and Data Request Lifecycle

 

 

Authors: Maggie Massary¹, Ketty Mobed², Mark Weiner¹, Marco Casale³, Prakash Lakshminarayanan², John Holmes¹, Kevin Haynes¹, Hillari Allen², Paul Norris², Davera Gabriel⁴, Rob Wynden²

¹ University of Pennsylvania – Health System

² University of California, San Francisco – Academic Research Systems

³ University of Rochester – Medical Center

⁴ University of California, Davis – Clinical & Translational Science Center

August 12, 2008

1. Introduction

 

In an effort to integrate disparate data within the Health System, an initiative is underway to build an Integrated Data Repository (IDR) that collectively holds and integrates clinical and biomedical research data, along with economic, administrative, and public health information. This integrated data store will serve the needs of both the Health System and Clinical/Translational Research. The integrated data can then be transformed into useful information that can be easily structured and managed. The IDR will be an active environment that allows continuous access to the latest detailed data and built-in analysis to drive optimal clinical practice and biomedical research activities. It will compile internal and external data fields, drawn from protected electronic medical systems and publicly available data files, into a single secure environment.

 

 

1.1 Purpose

 

This document outlines the interaction between biomedical researchers or clinicians and the IDR system. Our primary focus is to summarize, and present in detail, the different types of biomedical research pathways for which the IDR can serve as the primary source of data, as well as the utility of the IDR for clinical bedside practice. The IDR environment may also serve as a secure storage site for the investigator's own data, which can then be joined with data already residing within the IDR.

 

The purpose of this document is to give the reader a detailed understanding of the goals of the project, the customer requirements, and underlying expectations.

 

 

2. Overview

 

2.1 Project Perspective

 

Since this repository is intended to be the primary source of clinical and biomedical data, it is imperative to understand how data is currently being extracted, interpreted, and used by biomedical researchers and clinicians.

 

It is important to define these interactions for the following reasons:

 

  1. To properly vet the functional requirements of the IDR based on the business process of our research customers.
  2. To better identify areas of improvement in IDR technology. This may be required in order to most effectively service the needs of the research community.
  3. To identify all data and document artifacts that are typically generated as a result of interaction with the IDR. These artifacts must be identified, and a determination must be made as to which of them may contain Protected Health Information (PHI) and therefore require additional security and compliance measures for their proper handling.
  4. To identify the publication process as it relates to the IDR. This is necessary when sharing information with collaborators (for instance, on patient recruitment), to provide oversight when necessary, and so that data may be submitted for publication on websites and in journals. These activities must be properly defined so that the security and confidentiality of PHI are maintained. This is also critical for protecting the institutions’ rights with regard to any new Intellectual Property generated and the possible patenting of that information.

 

2.2 Project Properties and Objectives

 

We expect that the introduction of IDR technology will fundamentally alter the typical business process followed by a researcher.

 

Specifically,

  1. At the very beginning of the research enterprise it is common for a researcher to set out to validate certain hypothesized correlations between data elements collected from disparate sources (see Figure 1). Until now it has been very difficult to quickly and effectively look for these ad-hoc correlations. However, with the advent of the IDR, a biomedical investigator will be able to gain access to this comprehensive data and very quickly establish the relevance of a hypothesis or biomedical research idea.¹
  2. Quick interaction with the IDR will help refine the inclusion/exclusion criteria used in cohort discovery.
  3. Correlations between data elements do not by themselves prove anything. However, they can be used to bolster the case that a specific line of inquiry may have merit. That support will typically be used to argue for grant funding and IRB approval.
  4. During this process, access to custom user interfaces (UIs) for Time Domain Query, Genetic Epidemiology, etc. (depending on the researcher’s domain) would be extremely useful for refining these speculative hypotheses.
  5. Prior to the conduct of a study, and while the clinical protocol (a standard operating procedure, or SOP) is being written, a study design phase is typically conducted.

As a general-purpose data mining tool, the IDR is extremely helpful for running large numbers of ad-hoc queries to aid in study design. During the study design phase the researcher also typically requires access to a statistical software package such as SAS, STATA or SPSS; secure access can therefore also be provided for data extraction and analysis in the pre-research phase.

  6. Once the study has been designed, a cohort of patients must be identified, and the IDR would be used to generate that list. This functionality is expected to save a large amount of money and time during the conduct of a clinical trial or other biomedical research. However, these de-identified cohorts will typically be considered a form of Intellectual Property (IP) that should not be shared with commercial entities without prior consent from the Technology Transfer (Data Warehouse) Office.¹
  7. The researcher provides the documentation issued by the IRB to the Business Analyst associated with the IDR as proof of sufficient access privileges. With IRB approval for access to PHI, an identified set of patients will be obtained and the recruitment of patients into the trial will begin. The IDR will provide the researcher with a more accurate list of patients.² By supplying the researcher with PHI, we assume some of the responsibility for maintaining a secure environment in which that researcher may safely conduct patient recruitment.

 

  8. Physicians who contact their patients about study participation must be willing to follow the prescribed clinical protocol and must sign an Investigator approval form (a contract that usually contains a non-disclosure agreement, patent protection and a release of liability). They must also be supplied with the necessary training, documentation and forms (such as a case report form).

¹ Please note that even access to de-identified data by the investigator may require a waiver from the IRB. Some sites have allowed all faculty unrestricted access to the IDR; however, such practices are unlikely to be acceptable to the IRB for such a large and powerful warehouse of information.

 

² Although the process of recruitment itself is not expected to be altered by the IDR, the fact that the PHI is derived from the IDR will have a significant impact on the researcher’s interaction with these systems. For example, a medical center will provide the information requested under IRB approval, but the establishment of such a large warehouse of information will require very secure handling of that PHI, both within the IDR itself and, most importantly, by the researcher to whom the IDR supplies that data.

  9. All signed Investigator Approval contracts must be securely stored (as they may be considered a form of PHI).

 

 

Figure 1. The Current State of Biomedical Research Flow

3. Current and Projected Data Collection and Processing Stages & Anticipated Challenges

 

There are five different methods of collecting clinical practice and biomedical research data in common use today.

 

3.1 Clinical Data Collection Methods:

 

  1. Data entry of the information from a paper source
  2. Data extracted from individual electronic clinical systems such as electronic medical records (EMR) or computerized physician order entry (CPOE)
  3. Data from a third party such as a lab service
  4. Capture of data from electronic equipment present at the investigator site
  5. Direct electronic data entry by a patient (a patient diary).

 

3.2 Types of data used in clinical and biomedical research:

  1. Case Report Form Data – data entered manually by the patient or by the clinician/investigator’s staff into a predefined form
    1. Blank – a “blank” is a case report form (CRF) with all of its associated instructions and field constraints.
    2. eCRF – a portable document format (PDF) rendering of the data entry screens used to enter the CRF
    3. Archival eCRF – a PDF and an associated computer audit trail that shows who modified the data, when, and what the old values were before they were altered.
    4. Sub-CRF – a CRF that was sent to the FDA because the patient died, had a serious adverse event, or withdrew from the study.
    5. CRF Annotation – a document that maps the fields in a tabular report to the fields on the CRF where the data was first entered.
  2. Patient Recorded Diaries (a kind of ePRO)
  3. Electronic Data Sources
    1. Data collected or compiled in clinical / research / public health systems.
    2. Information collected by the patient personally (patient portals)
    3. Data that is entered directly into a computer system, such as an EMR, without first being written down on paper
    4. Data that is first entered on paper and then electronically converted to a computer format, either through scanning or manual entry.
  4. Source Data Employed Frequently in Biomedical Research
    1. Documents
    2. Hospital records
    3. Clinical patient charts
    4. Lab notes
    5. Informal memoranda
    6. Patient diaries
    7. Evaluation checklists
    8. Pharmacy dispensing info
    9. Data from automated systems
    10. Electronic representations of paper records (that have been verified against the source paper and later certified)
    11. Photographs
    12. X-rays
    13. Subject records within the pharmacy
    14. Medical department notes from clinical environments
    15. Hospital specimen banks, e.g., pathology departments and blood banks
    16. Local, state or federal public health reports and data banks, e.g., cancer registries and death indexes

 

 

3.3 Type of Research, Business Processes and the Vision for IDR

 

a. Biomedical Research

Biomedical research encompasses four distinct research arenas: 1) clinical research, 2) population research, 3) animal research and 4) bench research. Research in all four arenas is necessary to reach the common goal of improving individual and population health. A researcher may draw on more than one research arena for a specific research project, so the availability of the IDR will be of great help in streamlining the data gathering and analysis process.

Figure 2 illustrates that the work and business flows in the four research arenas outlined above are very similar.

Figure 3 presents a simplified, generic flow of the transactions that the biomedical researcher needs to undertake, from the pre-research phase to the post-research phase. The possible interactions of a researcher with the IDR are also illustrated.

Use Case Examples for Biomedical Research

The use case examples below illustrate typical requests and needs of biomedical researchers in the four research arenas described above, and the required interactions with the IDR.

 

Use Cases for Clinical Research (CR)

 

CR-Study 1:

Title : Specific Toxicities, Adverse Events, and Hypoglycemia of Various Diabetes Therapies 

Description : The purpose of this retrospective clinical study is to investigate how well blood sugar levels are controlled by each diabetes therapy and to quantify blood sugar levels for each drug. The study will further explore specific clinical outcomes and adverse events associated with diabetes therapies, such as heart failure, renal disease, liver dysfunction, polyps, and weight gain.

Assumption : A sufficient amount of patient information is available in the clinical database(s) to perform a power analysis and produce conclusive evidence to support the hypotheses.

Data Plan : To aggregate patients with diabetes, based on a single unique patient identifier (i.e., the MRN), into various defined cohorts, including aggregations based on combinations of the number, frequency and duration of diagnosis codes, relevant lab parameters, and the intensity and duration of medications.

Research Requirements : Sufficiently complete data is available in the IDR to perform the primary and associated secondary analyses. These data should include not only effect modifiers and outcomes but also elements, such as comorbidities and demographics, that could be used as stratifiers to adjust for confounding.

Conclusion : The pulled and aggregated dataset is suitable for the proposed analysis. The dataset needs to be exported into a structured format that can be loaded into the analytical software.*

Alternative Conclusion : The aggregated dataset is not adequate or suitable for the proposed analysis. The principal investigator may need to redefine the data search before a new dataset can be pulled and aggregated.
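The CR-Study 1 data plan groups patients by MRN using combinations of the number, frequency and duration of diagnosis codes. The following is a minimal Python sketch of one such rule, assuming diagnosis records arrive as (MRN, code, date) entries. The ICD-9-CM `250` prefix for diabetes, the thresholds `min_codes` and `min_span_days`, and the helper name `diabetes_cohort` are illustrative assumptions, not criteria taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    mrn: str        # unique patient identifier (medical record number)
    code: str       # diagnosis code; ICD-9-CM is assumed here
    dx_date: date   # date the code was recorded


def diabetes_cohort(records, prefix="250", min_codes=2, min_span_days=30):
    """Return the MRNs that meet a hypothetical cohort rule: diabetes codes
    recorded on at least `min_codes` distinct dates, with the first and last
    of those dates at least `min_span_days` apart."""
    dates_by_mrn = defaultdict(set)
    for rec in records:
        if rec.code.startswith(prefix):
            dates_by_mrn[rec.mrn].add(rec.dx_date)

    cohort = set()
    for mrn, dates in dates_by_mrn.items():
        span = (max(dates) - min(dates)).days
        if len(dates) >= min_codes and span >= min_span_days:
            cohort.add(mrn)
    return cohort


records = [
    Diagnosis("A1", "250.00", date(2007, 1, 5)),
    Diagnosis("A1", "250.02", date(2007, 3, 1)),
    Diagnosis("B2", "250.00", date(2007, 2, 1)),   # only one diabetes code
    Diagnosis("C3", "401.9", date(2007, 2, 1)),    # not a diabetes code
]
print(diabetes_cohort(records))  # {'A1'}
```

Lab parameters and medication intensity or duration, which the data plan also mentions, could be added as further per-MRN conditions in the same loop.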

 

CR-Study 2:

Title : The Impact of Avandia® and Other Similar Glycemic Controlling Medications on Cases of Myocardial Infarction (MI)

Description : This is a comparative study to determine whether the use of different types of glycemic control drugs, such as Avandia, which is frequently used to control diabetes, has an impact on cases of MI. A clinical investigator would like to search the available clinical data and compare Avandia and similar drugs with respect to the occurrence of MI.

Assumption : Electronic health record data, clinical trials data and medication data are available in the IDR.

Data Plan : The Clinical Investigator submits the request specifying all drug and disease outcome codes under investigation, plus any additional data points needed for the planned aggregated data analyses.

Research Requirements : All approvals are in place. Sufficiently complete medication data and accurate disease status data are available in the IDR to perform the comparative analyses, including de-identified demographic information and comorbidities for stratification and adjustment purposes. The extracted data needs to be converted and delivered in a format specified by the investigator.*

Conclusion : The Clinical Investigator obtains an aggregated de-identified data set regarding clinical trials or any patient encounters on the use of Avandia and similar medications with any association related to MI. The data set is composed of variables from the IDR which are derived from electronic health record data, clinical trials data and medication data.

Alternate Conclusion : Some of the requested therapeutic data are either unavailable or incomplete. Based on the results, the investigator reassesses and redefines the original scientific research plan.

 

CR-Study 3:

Title : A Randomized Clinical Trial to Compare the Therapeutic Effects of Combination Drugs and Singly Dispensed Drugs in Post-Coronary Angioplasty Patients

Description : The intent of this study is to compare differences in therapeutic delivery and effects among post-coronary angioplasty patients with high cholesterol and high blood pressure who are randomly allocated to 1) the regularly prescribed combination drug Caduet®, 2) Norvasc® and Lipitor® dispensed separately, or 3) generic amlodipine and atorvastatin.

Assumption : Hospital data for patients who have just undergone coronary angioplasty are readily identifiable through the IDR.

Data Plan : Over a 12-month period, patients who are either scheduled for or have just undergone coronary angioplasty will be identified through the IDR, and the requested patient information will be forwarded to the investigator. The data received should include the date and time of surgery, patient identifiers such as DOB, contact information, surgeon name and other specified demographic data.

Research Requirements : Date and surgery information can be readily identified and extracted from the clinic IDR prior to, or immediately after, the coronary angioplasty procedure. The clinical researcher submits the specified patient data request with proof of all required approvals. The data will be delivered to the investigator in the pre-specified form.*

Conclusion : The specified data is extracted, merged and made available to the researcher.

Alternate Conclusion 1 : The researcher was not able to provide all the required documentation. The data extraction will be put on hold until the researcher has been able to provide required information.

Alternate Conclusion 2 : The requested information cannot be pulled within the prescribed time frame. Refer this problem back to the investigator.
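Alternate Conclusion 1 of CR-Study 3 puts extraction on hold until the researcher supplies all required documentation. A minimal Python sketch of that gate follows. The document names are hypothetical placeholders: the text only states that PHI requests need IRB approval for PHI access and that even de-identified access may need an IRB waiver; the actual list would come from the IRB and institutional policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ON_HOLD = "on hold"
    READY_FOR_EXTRACTION = "ready for extraction"


# Hypothetical document names; the real list would be set by the IRB and
# the institution's data release policy.
REQUIRED_FOR_PHI = frozenset({"irb_approval", "phi_access_approval"})
REQUIRED_FOR_DEIDENTIFIED = frozenset({"irb_waiver"})


@dataclass
class DataRequest:
    request_id: str
    requests_phi: bool
    documents: set = field(default_factory=set)  # documents the researcher supplied


def review(request):
    """Hold the extraction until every required document has been supplied."""
    required = REQUIRED_FOR_PHI if request.requests_phi else REQUIRED_FOR_DEIDENTIFIED
    missing = sorted(required - request.documents)
    status = Status.ON_HOLD if missing else Status.READY_FOR_EXTRACTION
    return status, missing


request = DataRequest("CR-Study-3", requests_phi=True, documents={"irb_approval"})
print(review(request))  # (<Status.ON_HOLD: 'on hold'>, ['phi_access_approval'])
```

Returning the list of missing documents, rather than only a yes/no decision, gives the Business Analyst something concrete to send back to the researcher.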

 

Use Cases for Population Research (PR)

 

PR-Study 1:

Title : Prenatal Care, Delivery Procedure, Length of Post-Partum Stay, Health Insurance, and Demographic Characteristics at Time of Delivery in Three In-system Hospitals 

Description : This de-identified retrospective observational study will compare prenatal care frequency, type of delivery procedure, duration of post-partum hospital stay, and demographic and health insurance characteristics for all women who have delivered in three in-system hospitals in the past 5 years.

Assumption : Information on prenatal care frequency, complete delivery records, health insurance and demographics is available in the IDR for all women who have delivered at the three in-system hospital sites.

Data Plan : De-identified, individual-level information on prenatal care frequency, delivery method, duration of post-partum hospital stay, health insurance, and demographic characteristics needs to be extracted and merged from clinical databases available in the system’s IDR. The merged and de-identified data will be analyzed by the investigator, controlling for influential confounders.

Research Requirements : Prenatal visit frequencies and complete delivery (discharge) and insurance information for patients are available from clinic charts in IDR for the three in-system hospitals for the last 5 years. The data can be extracted and merged into a specified dataset type.*

Conclusion : All requested 5-year data for the 3 in-system hospitals exist and are extractable into one data file to be analyzed by the investigator.

Alternate Conclusion : Prenatal visit history is not available for many of the women and only partial 5-year data exists for one of the in-system hospitals, but complete 3-year data exists for all hospitals under study. Refer back to the investigator and wait for further directives.
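The PR-Study 1 alternate conclusion hinges on a coverage check: which hospitals have data for every year of the 5-year window, and how many recent years all of them share. A minimal Python sketch, assuming each de-identified delivery record has been reduced to a (hospital, year) pair; the hospital labels and the helper name `coverage_report` are hypothetical. It checks only whether any delivery records exist for each hospital-year, not whether prenatal visit history is complete within them.

```python
def coverage_report(delivery_years, hospitals, window_end, window_years=5):
    """Check which years of the study window each hospital has data for.

    delivery_years: iterable of (hospital, year) pairs, one per de-identified
    delivery record. Returns (gaps, common_years): the missing years per
    hospital, and the number of most recent years covered by every hospital.
    """
    wanted = set(range(window_end - window_years + 1, window_end + 1))
    years_by_hospital = {h: set() for h in hospitals}
    for hospital, year in delivery_years:
        if hospital in years_by_hospital:
            years_by_hospital[hospital].add(year)

    gaps = {h: sorted(wanted - years)
            for h, years in years_by_hospital.items() if wanted - years}

    # Count back from window_end until some hospital is missing a year.
    common_years = 0
    for year in range(window_end, window_end - window_years, -1):
        if all(year in years for years in years_by_hospital.values()):
            common_years += 1
        else:
            break
    return gaps, common_years


# Hypothetical hospitals A, B and C; C only has data from 2005 onward.
deliveries = [(h, y) for h in ("A", "B") for y in range(2003, 2008)]
deliveries += [("C", y) for y in range(2005, 2008)]
print(coverage_report(deliveries, ["A", "B", "C"], window_end=2007))
# ({'C': [2003, 2004]}, 3)
```

A result like this corresponds to the alternate conclusion above: partial 5-year data for one hospital, but complete 3-year data for all of them.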

 

PR-Study 2:

Title : Do Age and Race/Ethnicity Matter in 3-Yr. Outcomes of Different Types of Implemented Prostate Cancer Therapy on Quality of Life?

Description : This is a prospective observational study to investigate how prostate cancer therapeutics impact quality of life with regard to age and race/ethnicity. Over a 12-month period, 250 men newly diagnosed with and treated for prostate cancer will be recruited into this quality of life study and followed up for 36 months after the initial treatment procedure. Required data for initial recruitment will include Early Case Ascertainment (ECA) data from the regional or hospital cancer registry, including the physician and patient information needed to recruit prostate cancer patients.

Assumption : ECA of prostate cancer and relevant physician and pathology data is captured and available on IDR.

Data Plan : MD and patient information and pathology confirmation for newly diagnosed prostate cancer patients located in the IDR are merged and forwarded to the investigator on a continuous basis for the 12-month recruitment period, or for a shorter period if target recruitment numbers are reached earlier.

Research Requirements : The investigator has full IRB study approval; early identification of prostate cancer cases, contact information on diagnosing physician and patient, and definitive laboratory results are available and can be extracted from IDR and merged.

Conclusion : The required MD, patient and laboratory information can be extracted in a timely manner and merged and forwarded on a regular basis to the investigator for the next 12 months.

Alternate Conclusion : ECA is not readily available at the research hospital, but only through the State’s Cancer Registry. It will need to be determined whether access to the State Cancer Registry is permissible and, if so, which data elements are available. Forward this information to the investigator and await further directives.

 

Use Cases for Animal Research (AR)

 

AR-Study 1:

Title : Propagation of Human Brain Tumor Cell Lines in Mice and Other Rodents

Description : The intent of this de-identified research is to find the best animal (rodent) model in which glioblastoma cells (brain tumor cells) can be propagated and maintained for further in vitro research purposes. Freshly autopsied human glioblastoma cells will be identified through the clinical pathology database available in the IDR. The available and appropriate pathology tissues will then be retrieved and extracted. Glioblastoma cells will then be injected into different rodent species (e.g., mice, rats, hamsters) and monitored, measured and evaluated over time for normal growth, pathology, and animal and cell survival for future harvesting purposes.

Assumption : The clinical database available in the IDR contains detailed reports of the type, date of biopsy and location of banked pathological tissues, including biopsied pathology specimens from brain tumor patients.

Data Plan : The investigator submits a detailed data request to locate newly autopsied unaltered pathological glioblastoma tissues using IDR. The deliverable data also should include date of biopsy and the specific brain location biopsied.

Research Requirements : The investigator has all the required approvals. The extracted information specifies where the requested pathological tissues are banked and also contains contact information for the banking location.

Conclusion : The IDR can link specific pathology and tissue bank requests to existing collected pathology and tissue samples and the location where the samples have been processed and stored.

Alternate Conclusion : The pathology department does not receive and store fresh biopsied cancer tissue samples. However, newly obtained pathology tissue samples are recorded and kept in another investigator’s laboratory. The IDR sends the original investigator this information so that the research protocol can be modified according to any resulting agreement between the investigators.

 

AR-Study 2:

Title : Using a Mouse Model for the Evaluation of Pathogenesis and Immunity to Specific Influenza Virus Strains Isolated from Humans

Description : The intent of this study is to evaluate growth, pathogenesis and immunity of specific influenza virus strains isolated from humans in mouse models. Identified human viral isolates from influenza-infected patients will be inoculated intranasally into groups of homogeneous laboratory mice, and their course of illness and herd immunity to the illness will be documented and evaluated.

Assumption : Immediate (time-sensitive) information of human infectious disease occurrences and laboratory documentation on infectious disease isolates are captured in the clinical database(s) and available in IDR.

Data Plan : The IDR notifies the investigator when newly infected influenza tissues or isolates of the specified strains become available, including who diagnosed the case and where the material is stored.

Research Requirements : The study is time-sensitive. The sampled virally infected tissues or isolates must still be viable and intact to be prepared for inoculation into mice. The investigator (with all approvals in place) will need to be notified through IDR who the diagnosing clinician was and when and where the isolated and confirmed viral specimens are being stored.

Conclusion : The requested information is available in IDR within the set time frame and can be forwarded to the investigator, who in turn receives the infectious tissue samples to prepare and inoculate the laboratory mice.

Alternate Conclusion : The available information in IDR does not meet the time constraints necessary for viable viral tissues to be harvested and prepared. This limitation needs to be explained to the investigator and further research directives need to be awaited.

 

 

Use Cases for Bench Research (BR)

 

BR-Study 1:

Title : In vitro Development of a New and Powerful Multi-Drug Regimen to Treat Multidrug-Resistant Tuberculosis (MDR-TB)

Description : The purpose of this multi-phased study is, first, to develop a powerful new multi-drug regimen in vitro to treat MDR-TB. If successful, the next step will be to test this new drug regimen in vivo (in animal models). For this study, MDR Mycobacterium tuberculosis (mt) bacterial colonies harvested from infected human biologic samples (e.g., sputum) are required. The investigator needs specimen storage and access information, as well as detailed de-identified information on patient treatment and the specific drug susceptibility properties of the bacteria in the collected and banked biologic samples, available in either clinical charts or State Health Department documentation.

Assumption : IDR has the capacity to store and make accessible this type of required information to the investigator.

Data Plan : A clinic- or institution-wide record search for banked and available MDR-mt colonies within the IDR will generate a list of labs (or locations) where the specified organisms are housed. Specific treatment outcomes for each banked specimen are available as well. This information is compiled and forwarded on to the investigator.*

Research Requirements : The investigator has all the appropriate approvals. The IDR contains all the requested information, which can be shared with the investigator. The investigator has the appropriate means to receive, store and maintain bacterial samples.

Conclusion : The requested search items are known and available in IDR. The information is compiled and sent to the investigator.

Alternate Conclusion : Not all of the required information is available through the IDR. However, a list of local and state investigators who have studied, stored or maintained mt colonies, either in the past or currently, is available through the IDR. Refer these results back to the initial investigator and wait for further directives.

 

BR-Study 2:

Title : Comparison of Genetic Ancestry and Genetic Markers of Small Cell and Non-Small Cell Lung Cancer Patients

Description : The scope of the study involves using de-identified frozen blood samples, collected during another investigator’s population-based lung cancer study and banked at that investigator’s laboratory. Since the specific lung cancer diagnoses of the study patients have been histologically identified and documented, the present investigator requests only those blood samples that have been histologically identified as either small cell carcinoma or non-small cell carcinoma. Genetic ancestry and marker studies will be carried out on the DNA components of the stored blood samples and compared between the two histologies.

Assumption : Diagnostic and histological information for the stored blood samples and laboratory contact information is available on IDR and study and lab id numbers can be linked.

Data Plan : In the IDR where the previous investigator’s data may be available, the study id number and histology code of those patients who have been diagnosed and coded as either small or non-small cell lung carcinoma will be extracted and linked to the lab database to identify lab numbers linked to the stored blood samples. The aggregated data set contains patient study id number, lab number and histology code. Based on the lab id number, the correct frozen blood specimen can be pulled and genetically analyzed.

Research Requirements : The investigator has all the appropriate approvals. The IDR contains the previous investigator’s study outcome and laboratory information.

Conclusion : Study id and lab id are linkable and the histology code can be extracted from the study data file. An aggregated list with three data points is procured for the investigator.

Alternate Conclusion : The study information containing patient study ids, lung cancer diagnoses and histology codes is available in the IDR. However, the laboratory data is not. Study id numbers and histology codes for the patients under study can be pulled and transmitted to the investigator, along with a short message informing the investigator that the lab data is not available in the IDR. Await further directives.

 

* A data dictionary will be provided to define the variables for analysis.
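The BR-Study 2 data plan is essentially a join: filter study records by histology, link each study id to a lab id, and return three data points per patient. A minimal Python sketch, assuming study and lab records are available as dictionaries and that each study id maps to at most one lab id. The histology labels `SCLC`/`NSCLC` stand in for whatever codes the IDR actually holds, and the helper name `link_study_to_lab` is hypothetical. Unlinked study ids are returned separately so the alternate conclusion (lab data missing) can be reported to the investigator.

```python
# Hypothetical histology labels; the IDR would hold coded values instead.
TARGET_HISTOLOGIES = frozenset({"SCLC", "NSCLC"})


def link_study_to_lab(study_rows, lab_rows, targets=TARGET_HISTOLOGIES):
    """Join study records to lab records on study id.

    Returns (linked, unlinked): linked is a list of
    (study_id, lab_id, histology) tuples, the three data points the
    investigator needs; unlinked lists target-histology study ids that have
    no lab record in the IDR.
    """
    lab_by_study = {row["study_id"]: row["lab_id"] for row in lab_rows}
    linked, unlinked = [], []
    for row in study_rows:
        if row["histology"] not in targets:
            continue  # neither small cell nor non-small cell
        lab_id = lab_by_study.get(row["study_id"])
        if lab_id is None:
            unlinked.append(row["study_id"])
        else:
            linked.append((row["study_id"], lab_id, row["histology"]))
    return linked, unlinked


study = [
    {"study_id": "S1", "histology": "SCLC"},
    {"study_id": "S2", "histology": "NSCLC"},
    {"study_id": "S3", "histology": "OTHER"},
]
lab = [{"study_id": "S1", "lab_id": "L9"}]
print(link_study_to_lab(study, lab))
# ([('S1', 'L9', 'SCLC')], ['S2'])
```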

 

 

b. Clinical Patient Care – “Bedside” Practice

The IDR is also foreseen to be used for another purpose. Clinicians administering patient care at the bedside would be able to use the IDR in real time to pull up relevant information on general patient characteristics and treatment profiles for specific diseases and conditions.

 

Use Case for patient care-bedside practice:

 

Title : Local Real-time Antibiogram

Description : A clinician is examining a new inpatient with fever and other signs of infection localizing to a specific body system. Cultures are drawn, but the clinician wants to initiate an empiric antibiotic regimen that is consistent with the patient’s age, allergies, and renal function, as well as with the typical organisms associated with that body system and the antibiotic resistance patterns recently seen in that part of the hospital.

Cohort Requirement : Patients who are clinically and demographically similar to the current patient.

Data Requirement : Access to current patient characteristics and basic decision support, to avoid presenting regimens to which the patient is allergic or doses that are inconsistent with the patient’s age and renal function; decision support to suggest possible infectious etiologies given the patient’s symptoms; and access to comparable microbiology culture data on prior patients who had similar presentations.

Conclusion : The generated dataset shows the final diagnoses of patients whose clinical presentations were similar to the current patient’s, as well as summarized results of those prior patients’ cultures (including organisms isolated and resistance patterns), the antibiotic regimens chosen for them, and their overall clinical course. This data should be integrated with the current patient’s clinical status to avoid highlighting antibiotic regimens that would be inappropriate for the current patient based on allergies or renal function.
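The summarized culture results in this use case amount to a small, local antibiogram: for each organism and antibiotic, the share of prior isolates that were susceptible. A minimal Python sketch, assuming culture results for the similar prior patients have already been selected and reduced to (organism, antibiotic, susceptible) tuples; the helper name `summarize_susceptibility` is hypothetical. It excludes only antibiotics whose exact names appear in the patient’s allergy list; drug-class cross-reactivity, renal dose adjustment and the isolate de-duplication rules used for formal cumulative antibiograms are out of scope.

```python
from collections import defaultdict


def summarize_susceptibility(cultures, allergies=frozenset()):
    """Percent of isolates susceptible, per (organism, antibiotic) pair.

    cultures: iterable of (organism, antibiotic, susceptible) tuples drawn
    from prior patients with similar presentations. Antibiotics listed in
    `allergies` are left out so they are never highlighted for this patient.
    """
    tested = defaultdict(int)
    susceptible = defaultdict(int)
    for organism, antibiotic, is_susceptible in cultures:
        if antibiotic in allergies:
            continue
        tested[(organism, antibiotic)] += 1
        if is_susceptible:
            susceptible[(organism, antibiotic)] += 1
    return {key: round(100 * susceptible[key] / n, 1) for key, n in tested.items()}


cultures = [
    ("E. coli", "ciprofloxacin", True),
    ("E. coli", "ciprofloxacin", False),
    ("E. coli", "ceftriaxone", True),
    ("E. coli", "ampicillin", False),
]
print(summarize_susceptibility(cultures, allergies={"ampicillin"}))
# {('E. coli', 'ciprofloxacin'): 50.0, ('E. coli', 'ceftriaxone'): 100.0}
```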

 

 

3.4 Potential Challenges of IDR Utilization to the Investigator

Several challenges and gaps in using the IDR are apparent to the investigator:

a. Data Accessibility – The investigator may know of other relevant data sources that are pertinent to the anticipated research but are not housed in the IDR.

b. Duplicate Data with Differing Values – Some available data, such as date of birth or date of death, may be captured in more than one database housed in IDR; some values of this duplicate data may be different.

c. Data Collection Protocols – To write up their own protocol an investigator may need to know how exactly other IDR housed data were collected.

d. Data Quality – The cleanliness, reliability and interoperability of data and data sources included in IDR need to be addressed and ascertained

e. IRB Issues – All data residing in IDR need to be backed by institute-specific IRB approvals.

f. Data Ownership / Intellectual Property Issues – The need to have data ownership issues be resolved and documented for all IDR available data
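Challenge (b) can at least be surfaced automatically. The following is a minimal Python sketch, assuming records from the different source databases have already been linked to a common patient identifier (for example through the Enterprise Master Patient Index listed in the Appendix). The tuple layout, the function name `find_conflicts`, the source names and the identifiers are hypothetical.

```python
from collections import defaultdict


def find_conflicts(records):
    """Flag fields whose values disagree across source systems for the same patient.

    records: iterable of (patient_id, source_system, field, value) tuples.
    Returns {(patient_id, field): {source_system: value}} for conflicting fields only.
    If one source reports the same field twice, its last value is kept.
    """
    seen = defaultdict(dict)
    for patient_id, source, field, value in records:
        seen[(patient_id, field)][source] = value
    return {
        key: by_source
        for key, by_source in seen.items()
        if len(set(by_source.values())) > 1  # more than one distinct value
    }


rows = [
    ("MRN001", "registration", "date_of_birth", "1950-03-14"),
    ("MRN001", "billing", "date_of_birth", "1950-03-14"),
    ("MRN002", "registration", "date_of_birth", "1962-07-01"),
    ("MRN002", "lab", "date_of_birth", "1962-01-07"),
]
print(find_conflicts(rows))
# {('MRN002', 'date_of_birth'): {'registration': '1962-07-01', 'lab': '1962-01-07'}}
```

The sketch only detects disagreement; deciding which source to trust for a given field is a data-governance question that the document leaves open.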


4. Capturing User Requests

Figure 4. Research Data Request Lifecycle

Figure 4 illustrates the general architecture and flow of information in the IDR, from data discovery to the completion of a study.

4.1 The Data Discovery User Interface (UI)

The Data Discovery UI (Figure 5) will provide detailed views of the data available in the IDR and let researchers self-select the specific variables they need.

This screen will provide the following features:

  1. Prompt the principal investigator (PI) to name a data discovery query and save it, or to recall a saved query
  2. Prompt the PI to search the available databases, select specific data criteria, and view further explanations or characteristics of specified data points
  3. Prompt the PI to indicate which data sources are to be used
  4. Give the PI an overview of the selected variable list

Figure 5. Data Discovery UI

4.2 IDR Request User Interface

This screen will log user requests: it will capture a new request or allow the PI to access a saved query or a previous data request. It will also record who is requesting IDR access and for what purpose.

The ‘IDR Request’ interface (Figure 6) will provide the following features:

  1. Prompt the PI to mark the request as ‘new’ with a relevant project title, or to choose from the list of requests logged in previous sessions
  2. Prompt for the name and contact information of the PI or PI-designated contact person (imported from the User Background UI where applicable)
  3. Prompt for a brief description of the proposed research
  4. Ask whether the study has already been IRB approved, and allow relevant documents to be uploaded

Figure 6. IDR Request UI

4.3 Data Request User Interface

The Data Request UI (Figure 7) will summarize the anticipated research and specify the data criteria and parameters in detail. This screen will:

  1. Prompt the PI to select the research type and to select and upload the required IRB / IACUC (for animal research) information and documentation
  2. Prompt for the name and contact information of the PI or PI-designated requester (imported from the User Background UI where applicable)
  3. Prompt for a detailed research plan or research abstract to be uploaded
  4. Prompt the PI to narrow the criteria search and designate variable parameters
  5. Prompt the requester to choose the report format

Figure 7. Data Request UI

4.4 Other Projected User Interfaces

Several other user interfaces will also be incorporated into the working schema of the IDR.

  1. The Background User Information UI is where the PI initially enters his/her professional/institutional affiliation and contact information.
  2. The Mapping UI – here the PI can request a specified mapping of requested or external data sources.
  3. The Formatting UI – here the PI can request a specific available format.
  4. The Data Status Request UI – here the PI can check the status of his/her specific data request (by request ID number) and retrieve the processed data set or query (from a defined drop-down list).
  5. The BA-only UI is available only to the Business Analyst (BA); it is where all relevant queries are performed and the status of each request is kept up to date. Analysts may also enter comments and lessons learned, and edit or look up existing SQL code.
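The Data Status Request UI and the BA-only UI imply that each request, keyed by its request ID, moves through a sequence of statuses that the Business Analyst updates and the PI can look up. The document does not define those statuses, so the states, the transition table and the `RequestTracker` class below are hypothetical; this is a minimal in-memory Python sketch of the idea, not the IDR's design.

```python
from enum import Enum


class RequestStatus(Enum):
    # Hypothetical states; the document does not define a status vocabulary.
    SUBMITTED = "submitted"
    IRB_PENDING = "irb_pending"
    APPROVED = "approved"
    IN_PROGRESS = "in_progress"
    READY = "ready"
    REJECTED = "rejected"


# Assumed transitions a Business Analyst may make from the BA-only UI.
TRANSITIONS = {
    RequestStatus.SUBMITTED: {RequestStatus.IRB_PENDING, RequestStatus.REJECTED},
    RequestStatus.IRB_PENDING: {RequestStatus.APPROVED, RequestStatus.REJECTED},
    RequestStatus.APPROVED: {RequestStatus.IN_PROGRESS},
    RequestStatus.IN_PROGRESS: {RequestStatus.READY},
    RequestStatus.READY: set(),
    RequestStatus.REJECTED: set(),
}


class RequestTracker:
    """Keeps the current status of each data request, keyed by request ID."""

    def __init__(self):
        self._status = {}

    def submit(self, req_id):
        if req_id in self._status:
            raise ValueError(f"request {req_id} already exists")
        self._status[req_id] = RequestStatus.SUBMITTED

    def update(self, req_id, new_status):
        """BA-side update; rejects any transition not listed in TRANSITIONS."""
        current = self._status[req_id]
        if new_status not in TRANSITIONS[current]:
            raise ValueError(f"cannot move {req_id} from {current.value} to {new_status.value}")
        self._status[req_id] = new_status

    def status(self, req_id):
        """PI-side lookup, as offered by the Data Status Request UI."""
        return self._status[req_id]


tracker = RequestTracker()
tracker.submit("REQ-0001")
tracker.update("REQ-0001", RequestStatus.IRB_PENDING)
tracker.update("REQ-0001", RequestStatus.APPROVED)
print(tracker.status("REQ-0001").value)  # approved
```

Keeping the allowed transitions in one table means a request cannot, for example, be marked ready before it has been approved, which mirrors the requirement in Section 6 that IRB approval precede data access.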

5. Benefits of IDR

  1. Multiple custom changes to the exclusion criteria within a cohort discovery
  2. Expedited cohort discovery process
  3. Integrated data from all clinical systems and research data stores
  4. other…

6. Rules Based Ontology Mapping

After a researcher has determined which data elements they require, a request for access to those data must be approved by the IRB. Once access has been approved, the researcher will be given access to a view of the requested IDR data.

The researcher will have access both to the direct source data and, when necessary, to translated (ontology-mapped) data housed in the IDR. The translated data will have been produced by a rules-based system and stored in “harvest tables” within the data warehouse.
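As an illustration of what rules-based translation into a harvest table could look like, the Python sketch below applies an ordered list of rules to source rows, records which rule produced each harvest row, and sets aside rows that no rule matches for review. The rule format, the first-match-wins policy, the local codes (`TEMP_F`, `TMP`, `HR`) and the target concept `BODY_TEMP_C` are all hypothetical; the document does not specify the rule language or the harvest-table schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SourceRow:
    """One observation as delivered by a source system (hypothetical layout)."""
    source_system: str
    local_code: str
    value: float
    units: str


@dataclass(frozen=True)
class Rule:
    """Map rows for which `matches` is true to `target_concept` (hypothetical rule format)."""
    name: str
    matches: Callable[[SourceRow], bool]
    target_concept: str
    convert: Callable[[float], float] = lambda v: v  # identity unless units must change


def apply_rules(row: SourceRow, rules: list[Rule]) -> Optional[dict]:
    """Return a harvest-table row from the first matching rule, or None if unmapped."""
    for rule in rules:
        if rule.matches(row):
            return {
                "source_system": row.source_system,
                "local_code": row.local_code,
                "concept": rule.target_concept,
                "value": rule.convert(row.value),
                "rule": rule.name,  # provenance: which rule produced this row
            }
    return None


RULES = [
    Rule("temp_fahrenheit", lambda r: r.local_code == "TEMP_F", "BODY_TEMP_C",
         lambda f: round((f - 32) * 5 / 9, 1)),  # convert °F to °C
    Rule("temp_celsius", lambda r: r.local_code == "TMP" and r.units == "degC", "BODY_TEMP_C"),
]

rows = [
    SourceRow("clinic_a", "TEMP_F", 98.6, "degF"),
    SourceRow("clinic_b", "TMP", 37.2, "degC"),
    SourceRow("clinic_b", "HR", 72.0, "bpm"),
]
harvest, unmapped = [], []
for r in rows:
    mapped = apply_rules(r, RULES)
    if mapped is None:
        unmapped.append(r)  # left for review rather than silently dropped
    else:
        harvest.append(mapped)
print([(h["concept"], h["value"]) for h in harvest])
# [('BODY_TEMP_C', 37.0), ('BODY_TEMP_C', 37.2)]
```

Keeping the rule name on each harvest row preserves provenance, so a researcher who has access to both the source data and the translated data can trace a mapped value back to the rule that produced it.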

6.1 Types of electronic clinical data that will typically be captured:

  1. Data derived from other sources within the IDR.
  2. Data entered by the researcher’s staff that may need to be joined with data within the IDR.
  3. Data that may typically flow into the IDR from a source CTMS environment.
  4. Data obtained from a CRO (Contract Research Organization).

6.2 Some of the Roles

  1. Sponsor – the organization that is sponsoring the researcher (The Researcher)
  2. EDC System Owner – the organization (or company) that owns the electronic data capture system
  3. CRO – Contract Research Organization
  4. Clinical Research Associate or Project Coordinator/Manager – oversees the clinical research project and verifies the data collected against source documents.
  5. Principal Investigator – the person who has IRB approval to conduct the research and overall responsibility for the study

APPENDIX

IDR – Integrated Data Repository

IS – Information Services

ETL – Extract Transform and Load

EMPI – Enterprise Master Patient Index

HIPAA – The Health Insurance Portability and Accountability Act

o Enacted by the US Congress to establish, amongst other items, a national standard for protecting the security and privacy of health information. (http://www.hhs.gov/ocr/hipaa/)

HL7 – Health Level 7

o Standard used for information transportation amongst disparate IT systems.

o “HL7 is an international community of healthcare subject matter experts and information scientists collaborating to create standards for the exchange, management and integration of electronic healthcare information. HL7 promotes the use of such standards within and among healthcare organizations to increase the effectiveness and efficiency of healthcare delivery for the benefit of all.” (http://www.hl7.org/)

ODBC - Open Database Connectivity (Reference: http://en.wikipedia.org/wiki/ODBC)

VPN - Virtual Private Network (Reference: http://en.wikipedia.org/wiki/Virtual_private_network)

i2b2 – Informatics for Integrating Biology and the Bedside

CTMS - Clinical Trial Management System