Variant Call Format (VCF) is an efficient, nonredundant way of storing gene sequence variations in text files. Any i2b2 client site that wants to use this data in i2b2 to query for genotyped subjects needs an ETL process to convert the raw VCF data into an i2b2-queryable format. The ETL process parses variant annotations out of the raw VCF files, transforms them, and stores them in the observation_blob field of the observation_fact table in the i2b2 data mart.
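As a rough illustration of the parsing step only, the sketch below reads VCF data lines and splits the INFO column into key/value annotations. It is written in Python for brevity and is not the ETL application shipped with the package. The function names (`parse_info`, `iter_variants`, `read_vcf_gz`) and the dictionary layout are hypothetical, and the transformation into the observation_blob format is not shown because it depends on the site's ETL.

```python
import gzip


def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    annotations = {}
    if info_field in ("", "."):
        return annotations
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        annotations[key] = value if sep else True
    return annotations


def iter_variants(lines):
    """Yield one dict per VCF data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            # A data line needs the 8 fixed columns: CHROM..INFO.
            continue
        chrom, pos, var_id, ref, alt, _qual, _filter, info = fields[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": var_id,
            "ref": ref,
            "alt": alt.split(","),  # multiple ALT alleles are comma-separated
            "info": parse_info(info),
        }


def read_vcf_gz(path):
    """Hypothetical helper: stream variants from a gzip-compressed VCF file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_variants(handle)
```

Because `iter_variants` accepts any iterable of text lines, it can be tried on a few in-memory lines before being pointed at a real file through `read_vcf_gz`.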
In the genomics example package, we have provided a sample ETL .NET application that can parse zipped VCF files (using the gunzip library only) with genetic variants discovered by the Illumina Multi-Ethnic Genotyping Array and load them to an i2b2 SQL Server database (only). A sample zipped VCF file is also included. The .NET application, sample VCF file, and required SQL Server scripts can be found in the “ETL Process” folder inside the package.
To test this application with the sample VCF file, follow these steps: