i2b2 Core Software / CORE-204

Duplicate data in GitHub repo i2b2-data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.07
    • Fix Version/s: 1.7.07
    • Component/s: Data
    • Labels: None
    • i2b2 Core
    • Removed duplicate files. Deleted the release_1-7 folder and all its contents.
    • TEST STATUS: Completed
      COMPLETION DATE: 01/13/2016
      TESTED BY: Janice Donahoe

      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

      Test Date: 01/13/2016
      Build Number: 1.7.07.0002
      Test Status: Passed Testing

      Clients Tested:
           Not applicable

      Environments Tested:
           Browsers: Not applicable for this test
           Databases: Oracle, PostgreSQL, SQL Server
           Client OS: Not applicable for this test

      Test Comments:
      Tested with the latest Data build, and it appears to be working correctly. Bamboo reported no errors when running the install scripts.

      ISSUES FOUND:
      An unrelated issue was found with the Bamboo scripts. The tests for age-related queries failed because of the new year. These tests will be updated to reflect the new ages of the test patients.
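The age-related test failures noted above come from expected values that drift as the calendar advances. The issue does not describe how the Bamboo tests store their expected ages, so the following is only a minimal sketch of one alternative: deriving the expected age from a birth date and the run date instead of hard-coding it. The function name `age_on` and the sample dates are hypothetical and are not taken from the i2b2 test suite.

```python
from datetime import date


def age_on(birth_date: date, as_of: date) -> int:
    """Return the number of completed years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical test patient born on 5 January 1980.
birth = date(1980, 1, 5)
age_before_new_year = age_on(birth, date(2015, 12, 31))
age_on_test_date = age_on(birth, date(2016, 1, 13))
```

A test written this way would pass `date.today()` as `as_of` when computing the expected value, so the expectation moves with the calendar.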

    Description

      The total size of all files inside the GitHub repo i2b2-data (the equivalent of the old i2b2createdb artifact) is 3 GB. Previously, this artifact took up 1.51 GB when unzipped.

      Looking at the repo, there is the "edu.harvard.i2b2.data" folder, which has always been present. However, there is also an additional "release_1-7" folder which contains the same data as the "Release_1-7" folder underneath "edu.harvard.i2b2.data".

      This duplicate data is unnecessary, and one of these folders should be removed. Looking at the repo, it appears the "release_1-7" folder hasn't been committed to in 6 months, while "edu.harvard.i2b2.data" has seen recent activity.
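One way to confirm that two folders really hold the same data before deleting either is to compare them file by file using content hashes. The sketch below is illustrative only and is not the method used to resolve this issue. It assumes a local checkout of the repo, and the helper names (`file_digest`, `tree_digests`, `compare_trees`) are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root (POSIX style) to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(left: Path, right: Path) -> dict[str, list[str]]:
    """Classify relative paths as identical, different, or on one side only."""
    left_map, right_map = tree_digests(left), tree_digests(right)
    shared = left_map.keys() & right_map.keys()
    return {
        "identical": sorted(p for p in shared if left_map[p] == right_map[p]),
        "different": sorted(p for p in shared if left_map[p] != right_map[p]),
        "only_left": sorted(left_map.keys() - right_map.keys()),
        "only_right": sorted(right_map.keys() - left_map.keys()),
    }
```

Run from the root of a checkout, `compare_trees(Path("release_1-7"), Path("edu.harvard.i2b2.data/Release_1-7"))` would list which files are byte-for-byte duplicates and which differ or exist in only one folder.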

        Activity

          jmd86 Janice Donahoe added a comment -
          Thanks Keith! I will take a look at this and fix it right away. I will also add a note to the current 1.7.07-RC1 release advising users not to download it, because the data is duplicated.
          kdwyer Keith Dwyer added a comment -
          Would it be possible to update both the master branch and the 1.7.07-RC1 tag with the removed duplicate data directory?
          Additionally, which folder (edu.harvard.i2b2.data/ or release_1-7/) is the one that will remain in the repository? My first guess would be the one that has most recently been updated (edu.harvard.i2b2.data, which matches the old createdb archives from i2b2.org). However, the previous 1.7.06 tag does not have this folder (it only has release_1-7/, which does not match the old createdb archives from i2b2.org).

          This would affect our internal deployment scripts.
          jmd86 Janice Donahoe added a comment -
          I updated the master branch, and it now has a new tag called v1.7.07-RC2. The folder that remains is edu.harvard.i2b2.data. We decided to keep this one to prevent any problems sites may have running their ant scripts. The tag and zip file for the final released product will be called v1.7.07.
          jmd86 Janice Donahoe added a comment - - edited
          On 01/22/2016, the 1.7.07 Release was made available at the following locations.

          https://www.i2b2.org/software/
           - zip files for release 1.7.07 are available on this site. This includes both the code and documentation.

          https://github.com/i2b2
           - source code has been tagged with v1.7.07.

          People

            Assignee: jmd86 Janice Donahoe
            Reporter: kdwyer Keith Dwyer