Dataverse 5.1
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Large File Upload for Installations Using AWS S3
Newly added support for multipart upload through the API and UI (Issue #6763) allows files larger than 5 GB to be uploaded to Dataverse when an installation is running on AWS S3. Previously, only non-AWS S3 storage configurations allowed uploads larger than 5 GB.
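As a rough sketch, a client first asks Dataverse for presigned upload URLs and then sends the file (in parts, for large files) directly to S3. This assumes the direct-upload uploadurls endpoint described in the Big Data Support section of the Developers Guide; the server URL, API token, DOI, and size below are placeholders:

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/EXAMPLE

# For a ~6 GB file, the response includes one presigned URL per part plus abort/complete URLs
curl -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_ID&size=6000000000"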
Dataset-Specific Stores
In previous releases, configuration options were added that allow each dataverse to have a specific store enabled. This release adds even more granularity, with the ability to set a dataset-level store.
Major Use Cases
Newly-supported use cases in this release include:
- Users can now upload files larger than 5 GB on installations running AWS S3 (Issue #6763, PR #6995)
- Administrators will now be able to specify a store at the dataset level in addition to the Dataverse level (Issue #6872, PR #7272)
- Users will have their dataset's directory structure retained when uploading a dataset with shapefiles (Issue #6873, PR #7279)
- Users will now be able to download zip files through the experimental Zipper service when the set of downloaded files contains duplicate names (Issue #80, PR #7276)
- Users will now be able to download zip files with the proper file structure through the experimental Zipper service (Issue #7255, PR #7258)
- Administrators will be able to use new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load (Issue #4225, PR #7211)
Notes for Dataverse Installation Administrators
New API for setting a Dataset-level Store
- This release adds a new API for setting a dataset-specific store. Learn more in the Managing Dataverse and Datasets section of the Admin Guide.
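A minimal sketch, assuming the dataset-level endpoint follows the same storageDriver pattern as the dataverse-level API added in 4.20 (the store label s3big and $DATASET_ID are placeholders; check the Admin Guide for the exact path and required superuser token):

# Set the store to be used for new files in this dataset
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d "s3big" http://localhost:8080/api/admin/datasets/$DATASET_ID/storageDriver

# Inspect which store the dataset is currently assigned
curl -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/admin/datasets/$DATASET_ID/storageDriver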
Multipart Upload Storage Monitoring, Recommended Use for Multipart Upload
Charges may be incurred for storage reserved for multipart uploads that are not completed or cancelled. Administrators may want to do periodic manual or automated checks for open multipart uploads. Learn more in the Big Data Support section of the Developers Guide.
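As one way to check, the AWS CLI can list uploads that were started but never completed or aborted, and abort them to release the reserved storage (the bucket name, key, and upload ID below are placeholders):

# List open multipart uploads in the bucket backing the store
aws s3api list-multipart-uploads --bucket my-dataverse-bucket

# Abort a stale upload to stop incurring storage charges for its parts
aws s3api abort-multipart-upload --bucket my-dataverse-bucket --key path/to/stale/file --upload-id EXAMPLEUPLOADID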
While multipart uploads can support much larger files, and can have advantages in terms of robust transfer and speed, they are more complex than single part direct uploads. Administrators should consider taking advantage of the options to limit use of multipart uploads to specific users by using multiple stores and configuring access to stores with high file size limits to specific Dataverses (added in 4.20) or Datasets (added in this release).
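For example, per-store upload limits can steer large files to a dedicated store. A sketch, assuming the JSON form of the :MaxFileUploadSizeInBytes setting introduced alongside multiple-store support (the s3big label and the limit values are placeholders):

# 10 GB default limit, 512 GB limit on the store reserved for big data
curl -X PUT -d '{"default":"10737418240","s3big":"549755813888"}' http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes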
New APIs for keeping Solr records in sync
This release adds new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load. Learn more in the Solr section of the Admin Guide.
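As a sketch of how these might be called (endpoint names as documented in the Solr section of the Admin Guide; confirm them there before scripting against a production instance):

# Report database objects missing from the index and orphaned Solr documents
curl http://localhost:8080/api/admin/index/status

# Remove Solr documents whose database objects no longer exist
curl http://localhost:8080/api/admin/index/clear-orphans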
Documentation for Purging the Ingest Queue
At times, it may be necessary to cancel long-running Ingest jobs in the interest of system stability. The Troubleshooting section of the Admin Guide now has specific steps.
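A minimal sketch of the kind of commands those steps involve, assuming the Payara-bundled message queue and the default DataverseIngest destination (paths and broker credentials vary; treat the Troubleshooting section as authoritative):

# Inspect the ingest queue
/usr/local/payara5/mq/bin/imqcmd -u admin query dst -t q -n DataverseIngest

# Purge all pending ingest jobs
/usr/local/payara5/mq/bin/imqcmd -u admin purge dst -t q -n DataverseIngest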
Biomedical Metadata Block Updated
The Life Science Metadata block (biomedical.tsv) was updated: "Other Design Type", "Other Factor Type", "Other Technology Type", and "Other Technology Platform" fields were added. See the "Additional Upgrade Steps" below if you use this block in your installation.
Notes for Tool Developers and Integrators
Spaces in File Names
Dataverse Installations using S3 storage will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.1+.
Complete List of Changes
For the complete list of code changes in this release, see the 5.1 Milestone in GitHub.
For help with upgrading, installing, or general questions, please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
- These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.
- Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)
- Stop Payara, remove the generated directory, then start Payara again.

service payara stop
rm -rf <payara install path>/payara/domains/domain1/generated
service payara start
- Deploy this version.

<payara install path>/bin/asadmin deploy dataverse-5.1.war
- Restart Payara.
Additional Upgrade Steps
- Update the Biomedical Metadata Block (if used):

wget https://github.com/IQSS/dataverse/releases/download/v5.1/biomedical.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"
- Check if your Solr installation is running with the latest schema.xml config file (https://github.com/IQSS/dataverse/releases/download/v5.1/schema.xml) and update it if needed.
- Run the updateSchemaMDB.sh script to generate updated Solr schema files and preserve any other custom fields in your Solr configuration. For example (modify the path names as needed):

cd /usr/local/solr-7.7.2/server/solr/collection1/conf
wget https://github.com/IQSS/dataverse/releases/download/v5.1/updateSchemaMDB.sh
chmod +x updateSchemaMDB.sh
./updateSchemaMDB.sh -t .

See http://guides.dataverse.org/en/5.1/admin/metadatacustomization.html?highlight=updateschemamdb for more information.
- Run ReExportAll to update JSON exports. See http://guides.dataverse.org/en/5.1/admin/metadataexport.html?highlight=export#batch-exports-through-the-api for details.
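For example, a minimal sketch using the batch export API documented at the link above (endpoint name as given in the Admin Guide; confirm it there before running):

curl http://localhost:8080/api/admin/metadata/reExportAll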