I. Introduction and prerequisites

This document describes how to create scriptlets for the GIRI platform. It is recommended to look at the example scriptlets (config.xml & mainscript.r) and at how they behave in the webclient. Here you will find comprehensive background information and some hints for developing such scriptlets.

It is assumed that GIRI is correctly installed (as described in install_notes.txt) and that you have access to the root scriptlet folder, where the scriptlets are stored. Furthermore, an i2b2 account that enables logging into the i2b2 webclient interface is required. Besides this document's content, you need knowledge of the R programming language.

II. Workflow of the GIRI end user

a) The end user clicks on "Statistical functions" in the analysis view of the i2b2 webclient plugin. The scriptlet list is loaded as a drop-down menu: for every subfolder under the root scriptlet folder, an entry is shown in the scriptlet list. If there are config.xml files, they are checked for XML validity and an error is shown if one is not valid.
b) The end user chooses one scriptlet from the drop-down list. The scriptlet options then appear.
c) The end user defines the following input parameters:
   - 1 or more patient sets.
   - 0 or more concepts.
   - 0 or more additional input parameters.
d) The end user clicks on "View results" and can now:
   - Inspect results displayed as value(s), tables or plot images.
   - Inspect the output stream (stdout or stderr) of R (useful for debugging the script).
   - Download .csv files of the tables.
   - Download .svg files of the plots.
   - Download the R environment file.

III. Workflow of the GIRI scriptlet developer

a) Create a new subfolder (= scriptlet folder) in the root scriptlet folder.
b) Create a config.xml file inside this folder (or copy it from an existing scriptlet) and edit it as required.
c) Create a mainscript.r file inside this folder and use the R programming language for your statistical computations.
d) Test and debug your script.

Steps b), c) and d) are described in detail in the following sections.

IV. Creating a config.xml file

First of all: as the GIRI platform follows the principle "Convention over configuration", you do not even need a config.xml file (see Example 1). In that case the scriptlet's name corresponds to the name of the scriptlet folder, and configuration options stay undefined (e.g. the scriptlet description) or keep their default values. However, it is recommended to have a config.xml file, at least for giving the end user some usage information.

A config.xml file consists of three main parts (each of which can be omitted if not required): a) settings, b) additional inputs, c) custom outputs.

a) settings: General scriptlet settings and description texts.
   - title: The scriptlet's title, which is used instead of the scriptlet's folder name. It appears in the scriptlet drop-down list and as a heading, and is also used internally.
   - description: This text is displayed directly below the heading on the "Choose scriptlet" tab.
   - passROutput (boolean): Should the stdout output of the R script (e.g. created with the print function) be captured and displayed on the results tab?
   - passRErrors (boolean): Should the stderr output (error messages) of the R script be captured and displayed on the results tab? This is especially useful for debugging.
   - resultDescription: This text is displayed at the top of the "View results" tab.
   - plotDescription: This text is displayed above the plot images.
b) additional inputs: Give the end user the possibility to further customize and influence the result. There are 3 different types of additional input parameters.
   - text: A simple textbox. It is possible to set a title, a description and the number of lines (-> height) of the box.
   - dropdown: A drop-down list with predefined options. It is possible to set a title, a description and the available options.
   - concept: An additional concept drag & drop field. The concept path will be transferred.
     It is possible to set a title and a description.

   All additional input variable names and descriptions are displayed in the web frontend. In R they are initialized as key/value pairs (see below).

c) custom outputs: Besides the standard output variables (see below), it is possible to define custom output variables. The advantage is that these outputs have their own names and descriptions, which are displayed on the "View results" page.

V. Creating a mainscript.r file

The mainscript.r file holds the R code that is processed in an R instance. It runs in an environment with preinitialized data that can be used for computations. These data come either from the i2b2 data warehouse or from the user-defined additional input parameters.

1. Input

All additional input parameters are saved in the character vector GIRI.input. The elements are named with the names given in the config.xml file, so it can be seen as a String/String key/value map. You can access them, for example, with GIRI.input["Plot label"].

As described above, the end user defines 1 or more patient sets and 0 or more concepts. For every patient set the following data are available:
- GIRI.patients: Data coming from the patient_dimension i2b2 database table for the specified patient set.
- GIRI.observations: All observations of the specified patient set that are assigned to at least one of the specified concepts. Data come from the observation_fact i2b2 database table. (Not available if no concept was specified.)
- GIRI.events: Same as GIRI.observations, but with data coming from the visit_dimension table.
- GIRI.observers: Same as GIRI.observations, but with data coming from the provider_dimension table.
- GIRI.modifiers: Same as GIRI.observations, but with data coming from the modifier_dimension table.

These data structures are R lists: data for the first specified patient set (Patient set 1) is hence found in GIRI.patients[[1]], GIRI.observations[[1]], GIRI.events[[1]], GIRI.observers[[1]] and GIRI.modifiers[[1]].
For the second patient set (Patient set 2), set the index to 2. The tables in these lists have the same columns as the corresponding i2b2 DB tables.

If more than one concept is specified, all corresponding observations, events etc. are combined. They can be separated again in R by using:
- GIRI.concept.names: Character vector that holds the names (concept_paths) of the specified concepts. Its length equals the number of specified concepts (and is independent of the specified patient sets).

2. Output

Three types of output data can be displayed in the "View results" tab:

- R data: All R objects can be made available directly by assigning them to reserved output variables. We distinguish between standard output variables and custom output variables.
  Standard output variables do not have custom names and descriptions and do not need to be specified in the config.xml file. They are named GIRI.output.1, GIRI.output.2, GIRI.output.3 ... Please make sure that the first standard output variable is named GIRI.output.1 and that all successive variables have an incrementing index without gaps.
  Custom output variables have to be assigned inside the GIRI.output list, using exactly the name that has been specified in the config.xml file. For example:
      GIRI.output[["My output name"]] <- 42
  Note that objects of class "data.frame" and "matrix" are displayed as HTML tables, with the possibility to download a csv file of the table for further processing. All other data are converted to a string representation (toString function in R).
- Plots: Before the script runs, the svg output device is activated (svg command in R). After the script has finished, the plots are automatically saved by dev.off(). So the only thing you have to do for plotting is to use a plot command in R. If you use it multiple times, multiple plots are created. The plots can also be downloaded by the end user.
  If you have good reasons for initializing your own plotting device, you have to consider two things:
  - You have to use the svg command, as other formats are currently not supported.
  - Use a filename that ends in "/plot%03d.svg" and is prefixed with the absolute path to the plot directory of the webserver (e.g. /var/www/webclient/js-i2b2/cells/plugins/GIRIPlugin/assets/plots).
- Output from stdout or stderr: If activated in the config.xml file, output from these R streams is passed directly as text to the webclient frontend.

Additionally, it is possible to save the whole R environment file for debugging or further use.

VI. Testing and debugging

A good approach for developing a new scriptlet is:
a) Create a config.xml file and an otherwise empty mainscript.r (assign dummy values to the custom output variables).
b) Choose your scriptlet in the web frontend and drag and drop some patient sets and concepts.
c) Download the R environment and load it into an R session ( load("path to RImage") ).
d) Play around interactively in this R session and write your script. Test your script with source("path to mainscript.r").
e) Finally, test your scriptlet in the webclient (considering edge cases) with passRErrors enabled.

VII. Further notes and known issues

- Errors while processing the R script can cause it to abort. In that case you cannot rely on even unaffected data being displayed correctly, or on the downloadable R environment file being intact and up to date.
- Assigning an empty data.frame to an output variable (as well as some other operations on empty data.frames) causes errors. This often happens, e.g., when there are no observations for a patient set in the database. So check this case carefully.
- If several patient sets are used for a scriptlet, there can be performance issues. This is due neither to the implementation of the GIRI platform nor (in most cases) to slow processing of the R script.
  The reason is that for every patient set one "GetPDOfromInputList" message request has to be sent to the i2b2 CRC cell, which takes a significant amount of time.
- Some functions will not work or will cause errors. This happens mainly if you do not have sufficient permissions for an action (opening a port, writing a file to the filesystem...) or if you use functions that:
  a) let you choose a file interactively,
  b) read something from the console as input,
  c) load or save the history.
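
To tie the points of this document together, the following is a minimal sketch of what a mainscript.r could look like. It is an illustration under stated assumptions, not a reference implementation. The block at the top only mocks the objects that GIRI is described as preinitializing, so that the sketch can be run in a plain R session; inside a real scriptlet these lines must be left out. The column names (patient_num, age_in_years_num, sex_cd) are assumed to match a lower-case patient_dimension table, the input name "Plot label" and the custom output name "Mean age" are hypothetical config.xml entries, and it is assumed that GIRI.output already exists as a list when the script starts.

```r
# --- Mock setup (assumption): stands in for what GIRI preinitializes. ---
# Leave this block out of a real mainscript.r.
GIRI.input <- c("Plot label" = "Age distribution")   # hypothetical input name
GIRI.patients <- list(
  data.frame(patient_num      = 1:4,                  # assumed column names
             age_in_years_num = c(34, 51, 47, 62),
             sex_cd           = c("F", "M", "F", "M"),
             stringsAsFactors = FALSE)
)
GIRI.output <- list()   # assumed to be provided by the platform
pdf(NULL)               # null device as a stand-in for the svg device GIRI opens

# --- Scriptlet logic ---
patients   <- GIRI.patients[[1]]            # data for Patient set 1
plot.label <- GIRI.input[["Plot label"]]    # additional input as a plain string

if (is.null(patients) || nrow(patients) == 0) {
  # Guard: do not assign an empty data.frame to an output variable.
  GIRI.output.1 <- "No patients found for patient set 1."
} else {
  mean.age <- mean(patients$age_in_years_num, na.rm = TRUE)

  # Standard output variable; a data.frame is shown as a table.
  GIRI.output.1 <- data.frame(n = nrow(patients), mean_age = mean.age)

  # Custom output variable; the name has to match the one in config.xml.
  GIRI.output[["Mean age"]] <- mean.age

  # A plain plot command is enough; no device handling in the scriptlet.
  hist(patients$age_in_years_num, main = plot.label, xlab = "Age in years")
}

# --- Mock teardown (assumption): GIRI closes its own device itself. ---
invisible(dev.off())
```

During development, the same script body (without the mock blocks) can be tested against a downloaded R environment with load("path to RImage") followed by source("path to mainscript.r").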