[CORE-204] Duplicate data in GitHub repo i2b2-data Created: 11/Jan/16 Updated: 25/Jan/16 Resolved: 14/Jan/16
|Project:||i2b2 Core Software|
|Reporter:||Keith Dwyer||Assignee:||Janice Donahoe|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
|i2b2 Sponsored Project/s:||
|Developer Notes:||Removed duplicate files. Deleted the release_1-7 folder and all its contents.|
|Testing Notes:|| TEST STATUS: Completed
COMPLETION DATE: 01/13/2016
TESTED BY: Janice Donahoe
Test Date: 01/13/2016
Build Number: 1.7.07.0002
Test Status: Passed Testing
Clients Tested :
Environments Tested :
Browsers: Not applicable for this test
Databases: Oracle, PostgreSQL, SQL Server
Client OS: Not applicable for this test
Tested with the latest data build, which appears to work correctly. Bamboo reported no errors when running the install scripts.
An unrelated issue was found with the Bamboo scripts: the tests for age-related queries failed because of the new year. These tests will be updated to reflect the new ages of the test patients.
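Hard-coded age expectations go stale as the calendar advances. A minimal sketch of one way to avoid that, assuming age is counted in whole years from a birth date (the ticket does not say how the Bamboo tests compute it); the function name `age_on` and the sample dates are hypothetical:

```python
from datetime import date


def age_on(birth_date, as_of):
    """Return the age in whole years on the date `as_of`."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical test patient, not taken from the i2b2 demo data.
birth = date(1980, 6, 15)
expected_age = age_on(birth, date.today())  # recomputed on every test run
```

Deriving the expected value from the run date, rather than storing a fixed number, means the test would not need editing each January.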
The total size of all files inside the GitHub repo i2b2-data (the equivalent of the old i2b2createdb artifact) is 3 GB. Previously, this artifact took up 1.51 GB when unzipped.
Looking at the repo, there is the "edu.harvard.i2b2.data" folder, which has always been present. However, there is also an additional "release_1-7" folder which contains the same data as the "Release_1-7" folder underneath "edu.harvard.i2b2.data".
This duplicate data is unnecessary, and one of these folders should be removed. Looking at the repo, it appears the "release_1-7" folder hasn't been committed to in 6 months, while "edu.harvard.i2b2.data" has seen recent activity.
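A claim that two folders hold the same data can be checked by hashing every file in each tree and comparing the results by relative path. The following is a minimal sketch, not the procedure used for this ticket; the function names `tree_digests` and `compare_trees` are hypothetical, and the paths in the usage note assume a local clone laid out as described above.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large data files are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[rel] = h.hexdigest()
    return digests


def compare_trees(root_a, root_b):
    """Classify files as identical, different, or present in only one tree."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    common = set(a) & set(b)
    return {
        "identical": sorted(p for p in common if a[p] == b[p]),
        "different": sorted(p for p in common if a[p] != b[p]),
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }
```

For example, `compare_trees("release_1-7", "edu.harvard.i2b2.data/Release_1-7")` run from the root of a clone would report whether any file differs between the two folders before one of them is deleted.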
|Comment by Janice Donahoe [ 12/Jan/16 ]|
|Thanks Keith! I will take a look at this and fix it right away. I will also add a note to the current 1.7.07-RC1 release advising users not to download it because the data is duplicated.|
|Comment by Keith Dwyer [ 13/Jan/16 ]|
Would it be possible to update both the master branch and the 1.7.07-RC1 tag with the removed duplicate data directory?
Additionally, which folder (edu.harvard.i2b2.data/ or release_1-7/) is the one that will remain in the repository? My first guess would be the one that has most recently been updated (edu.harvard.i2b2.data, which matches the old createdb archives from i2b2.org). However, the previous 1.7.06 tag does not have this folder (it only has release_1-7/, which does not match the old createdb archives from i2b2.org).
This would affect our internal deployment scripts.
|Comment by Janice Donahoe [ 14/Jan/16 ]|
|I updated the master branch and it now has a new tag called v1.7.07-RC2. The folder that remains is edu.harvard.i2b2.data. We decided to keep this one to prevent any problems sites may have running their ant scripts. The tag and zip file for the final released product will be called v1.7.07.|
|Comment by Janice Donahoe [ 25/Jan/16 ]|
On 01/22/2016, the 1.7.07 Release was made available at the following locations.
- Zip files for release 1.7.07 are available on this site. These include both the code and documentation.
- Source code has been tagged with v1.7.07.