[CORE-204] Duplicate data in GitHub repo i2b2-data Created: 11/Jan/16  Updated: 25/Jan/16  Resolved: 14/Jan/16

Status: Closed
Project: i2b2 Core Software
Component/s: Data
Affects Version/s: 1.7.07
Fix Version/s: 1.7.07

Type: Bug Priority: Major
Reporter: Keith Dwyer Assignee: Janice Donahoe
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Rank: 0|i002lb:
i2b2 Sponsored Project/s: i2b2 Core
Developer Notes: Removed duplicate files. Deleted the release_1-7 folder and all its contents.
Testing Notes: TEST STATUS: Completed
TESTED BY: Janice Donahoe

Test Date: 01/13/2016
Build Number:
Test Status: Passed Testing

Clients Tested :
     Not applicable

Environments Tested :
     Browsers: Not applicable for this test
     Databases: Oracle, PostgreSQL, SQL Server
     Client OS: Not applicable for this test

Test Comments:
Tested with the latest Data build; it appears to be working correctly. Bamboo reported no errors when running the install scripts.

An unrelated issue was found with the Bamboo scripts. The tests for age-related queries failed because the new year changed the ages of the test patients. These tests will be updated to reflect the test patients' new ages.

The files in the GitHub repo i2b2-data (the equivalent of the old i2b2createdb artifact) total 3 GB. Previously, this artifact took up 1.51 GB when unzipped.

The repo contains the "edu.harvard.i2b2.data" folder, which has always been present. However, it also contains an additional "release_1-7" folder holding the same data as the "Release_1-7" folder underneath "edu.harvard.i2b2.data".

This duplicate data is unnecessary, and one of these folders should be removed. Judging from the repo history, "release_1-7" has not been committed to in 6 months, while "edu.harvard.i2b2.data" has seen recent activity.
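Whether two folders really hold the same data can be checked mechanically rather than by eye. The following Python sketch is an illustration only, not a tool used in this ticket: it hashes every file under two directory trees with SHA-256 and reports files that exist in only one tree or whose contents differ. The helper names tree_digests and compare_trees are hypothetical.

```python
import hashlib
import os
from pathlib import Path

# Hypothetical helpers for illustration; not part of the i2b2 tooling.


def tree_digests(root):
    """Map each file's path (relative to root, POSIX-style) to its SHA-256 hex digest."""
    root = Path(root)
    # os.walk silently yields nothing for a missing directory, which would
    # make two missing paths look "identical", so fail loudly instead.
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large data files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(a, b):
    """Return (only_in_a, only_in_b, differing) as sorted lists of relative paths."""
    da, db = tree_digests(a), tree_digests(b)
    only_a = sorted(set(da) - set(db))
    only_b = sorted(set(db) - set(da))
    differing = sorted(p for p in set(da) & set(db) if da[p] != db[p])
    return only_a, only_b, differing
```

From the root of a local i2b2-data checkout, a call such as compare_trees("release_1-7", "edu.harvard.i2b2.data/Release_1-7") would return three empty lists if the folders are exact duplicates. That path assumes Release_1-7 sits directly under edu.harvard.i2b2.data; adjust it if the folder is nested deeper.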

Comment by Janice Donahoe [ 12/Jan/16 ]
Thanks Keith! I will take a look at this and fix it right away. I will also add a note to the current 1.7.07-RC1 release advising users not to download it, because the data is duplicated.
Comment by Keith Dwyer [ 13/Jan/16 ]
Would it be possible to remove the duplicate data directory from both the master branch and the 1.7.07-RC1 tag?
Additionally, which folder (edu.harvard.i2b2.data/ or release_1-7/) will remain in the repository? My first guess would be the one updated most recently (edu.harvard.i2b2.data, which matches the old createdb archives from i2b2.org). However, the previous 1.7.06 tag does not have this folder (it only has release_1-7/, which does not match the old createdb archives from i2b2.org).

The answer affects our internal deployment scripts.
Comment by Janice Donahoe [ 14/Jan/16 ]
I updated the master branch and tagged it v1.7.07-RC2. The folder that remains is edu.harvard.i2b2.data; we kept this one to prevent problems sites might otherwise have running their Ant scripts. The tag and zip file for the final released product will be named v1.7.07.
Comment by Janice Donahoe [ 25/Jan/16 ]
On 01/22/2016, the 1.7.07 release was made available at the following locations.

 - zip files for release 1.7.07 are available on this site. These include both the code and the documentation.

 - source code has been tagged with v1.7.07.
Generated at Fri Jun 05 15:28:00 UTC 2020 using JIRA 7.6.3#76005-sha1:8a4e38d34af948780dbf52044e7aafb13a7cae58.