Dataverse Software 5.13
This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Schema.org Improvements (Some Backward Incompatibility)
The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.
Please be advised that these improvements may break integrations that rely on the old, less compliant structure. For details, see the "Backward Incompatibilities" section below. (Issue #7349)
Folder Uploads via Web UI (dvwebloader, S3 only)
For installations using S3 for storage with direct upload enabled, a new tool called DVWebloader can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similar to what the DVUploader tool does on the command line, but with the convenience of the browser UI). See Folder Upload in the User Guide for details. (PR #9096)
Long Descriptions of Collections (Dataverses) are Now Truncated
Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)
License Sorting
Licenses shown in the dropdown in the UI can now be sorted by superusers. See the Sorting Licenses section of the Installation Guide for details. (PR #8697)
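As a sketch of what this looks like in practice (the exact endpoint is documented in the Sorting Licenses section; the license ID and sort value below are placeholders), a superuser could set a license's position in the dropdown via the API:
# Sketch: give the license with database ID 1 sort position 2
# (endpoint per the Sorting Licenses section; values are placeholders)
curl -X PUT -H "X-Dataverse-key:$API_TOKEN" http://localhost:8080/api/licenses/1/:sortOrder/2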
Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search
Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)
Support for NetCDF and HDF5 Files
NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and more modern NetCDF 4 files are detected based on content. Detection for older HDF4 files is only done through the file extension ".hdf", as before.
For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the dataverse-previewers repo.
An extractNcml API endpoint has been added, primarily for installations with existing NetCDF and HDF5 files. After upgrading, administrators can iterate through these files and attempt to extract an NcML file from each.
See the NetCDF and HDF5 section of the User Guide for details. (PR #9239)
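For example, a sketch of calling the new endpoint on one existing file (assuming it follows the files API pattern described in the guides; $FILE_ID and $API_TOKEN are placeholders):
# Sketch: attempt NcML extraction for an already-uploaded NetCDF/HDF5 file
curl -H "X-Dataverse-key:$API_TOKEN" -X POST http://localhost:8080/api/files/$FILE_ID/extractNcml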
Support for .eln Files (Electronic Laboratory Notebooks)
The .eln file format is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, and so on.
Improved Security for External Tools
External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See Authorization Options in the External Tools documentation for details. (PR #9001)
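As a rough sketch of how a tool opts in (the manifest keys below follow the Authorization Options documentation; the tool itself and its URL are hypothetical), the tool's manifest declares which API calls it needs signed URLs for, and is then registered as usual:
# Sketch: register a hypothetical tool whose manifest requests signed URLs
# for one API call ("allowedApiCalls" keys per the External Tools docs)
cat > exampleTool.json <<'EOF'
{
  "displayName": "Example Explorer",
  "description": "A hypothetical dataset-level explore tool.",
  "scope": "dataset",
  "types": ["explore"],
  "toolUrl": "https://tool.example.edu/launch",
  "allowedApiCalls": [
    {
      "name": "retrieveDatasetJson",
      "httpMethod": "GET",
      "urlTemplate": "/api/v1/datasets/{datasetId}",
      "timeOut": 10
    }
  ]
}
EOF
curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools --upload-file exampleTool.json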
Geospatial Search (API Only)
Geospatial search is supported via the Search API using two new parameters: geo_point and geo_radius.
The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
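For example (a sketch based on the Search API documentation; treating geo_point as latitude,longitude and geo_radius as a distance in kilometers):
# Sketch: find anything indexed within 1.5 km of the given point
# (coordinate order and units per the Search API docs)
curl "http://localhost:8080/api/search?q=*&geo_point=42.3,-71.1&geo_radius=1.5"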
Reproducibility and Code Execution with Binder
Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)
CodeMeta (Software) Metadata Support (Experimental)
Experimental support for research software metadata deposits has been added.
By adding a metadata block for CodeMeta, we take another step toward adding first class support of diverse FAIR objects, such as research software and computational workflows.
There is more work underway to make Dataverse installations around the world "research software ready."
Note: Like the metadata block for computational workflows before it, CodeMeta is listed under Experimental Metadata in the guides. Experimental means it's brand new, opt-in, and might need future tweaking based on experience from usage in the field. We hope for feedback from installations on the new metadata block so we can optimize it and lift it out of the experimental stage. (PR #7877)
Mechanism Added for Stopping a Harvest in Progress
It is now possible for a sysadmin to stop a long-running harvesting job. See Harvesting Clients in the Admin Guide for more information. (PR #9187)
API Endpoint Listing Metadata Block Details Has Been Extended
The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:
- controlledVocabularyValues: All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
- isControlledVocabulary: Whether or not this field has a controlled vocabulary.
- multiple: Whether or not the field supports multiple values.
See Metadata Blocks in the API Guide for details. (PR #9213)
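For example, fetching the citation block shows the new per-field detail:
# The response now includes controlledVocabularyValues, isControlledVocabulary,
# and multiple for each field in the block
curl http://localhost:8080/api/metadatablocks/citation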
Advanced Database Settings
You can now enable advanced database connection pool configurations useful for debugging and monitoring, as well as other settings. Of particular interest may be sslmode=require. See the new Database Persistence section of the Installation Guide for details. (PR #8915)
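A sketch of how extra connection parameters might be passed (the option name dataverse.db.parameters is an assumption to be confirmed against the Database Persistence section):
# Sketch: require SSL on database connections
# (treat the dataverse.db.parameters name as an assumption; see the guide)
/usr/local/payara5/bin/asadmin create-jvm-options '-Ddataverse.db.parameters=sslmode=require'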
Support for Cleaning up Leftover Files in Dataset Storage
Experimental feature: leftover files in a dataset's storage location that are not in the dataset's file list, but that follow the Dataverse technical naming convention for dataset files, can be removed with the new Cleanup Storage of a Dataset API endpoint.
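A sketch of a careful first call (the dryrun parameter follows the API Guide; review the reported files before running without it):
# Sketch: list leftover files for a dataset without deleting anything yet
# ($PERSISTENT_ID and $API_TOKEN are placeholders)
curl -H "X-Dataverse-key:$API_TOKEN" "http://localhost:8080/api/datasets/:persistentId/cleanStorage?persistentId=$PERSISTENT_ID&dryrun=true"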
OAI Server Bug Fixed
A bug introduced in 5.12 was preventing the Dataverse OAI server from serving incremental harvesting requests from clients. It was fixed in this release (PR #9316).
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release not already mentioned above include:
- Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983, See also Configuration Guide)
- To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
- Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
- A persistent identifier type, CSRT, has been added to the Related Publication field's ID Type child field. For datasets published with CSRT IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
- Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.
New JVM Options and MicroProfile Config Options
The following JVM option is now available:
- dataverse.personOrOrg.assumeCommaInPersonName - the default is false
The following MicroProfile Config options are now available (these can be treated as JVM options):
- dataverse.files.uploads - alternative storage location of generated temporary files for UI file uploads
- dataverse.api.signing-secret - used by signed URLs
- dataverse.solr.host
- dataverse.solr.port
- dataverse.solr.protocol
- dataverse.solr.core
- dataverse.solr.path
- dataverse.rserve.host
The following existing JVM options are now available via MicroProfile Config:
- dataverse.siteUrl
- dataverse.fqdn
- dataverse.files.directory
- dataverse.rserve.host
- dataverse.rserve.port
- dataverse.rserve.user
- dataverse.rserve.password
- dataverse.rserve.tempdir
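MicroProfile Config options can be set like any other JVM option, or as environment variables using the standard MicroProfile name mapping (dots become underscores, letters uppercased). For example:
# As a JVM option...
/usr/local/payara5/bin/asadmin create-jvm-options '-Ddataverse.solr.host=solr.example.edu'
# ...or as an environment variable (standard MicroProfile mapping)
export DATAVERSE_SOLR_HOST=solr.example.edu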
Notes for Developers and Integrators
See the "Backward Incompatibilities" section below.
Backward Incompatibilities
Schema.org
The following changes have been made to Schema.org exports (necessary for the improvements mentioned above):
- Descriptions are now joined and truncated to less than 5K characters.
- The "citation"/"text" key has been replaced by a "citation"/"name" key.
- File entries now report their MIME type as 'encodingFormat' rather than 'fileFormat', to better conform with the Schema.org specification for DataDownload entries. Download URLs are now sent for all files unless the dataverse.files.hide-schema-dot-org-download-urls setting is set to true.
- Author/creators now have an @type of Person or Organization, and any affiliation (affiliation for Person, parentOrganization for Organization) is now an object of @type Organization.
License Files
License files are now required to contain the new "sortOrder" column. When attempting to create a new license without this field, an error will be returned. See the Configuring Licenses section of the Installation Guide for reference.
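A minimal sketch of a conforming license definition (endpoint and field names per the Configuring Licenses section; the values are illustrative):
# Sketch: a license JSON that includes the now-required sortOrder field
cat > customLicense.json <<'EOF'
{
  "name": "Custom License 1.0",
  "uri": "https://example.edu/licenses/custom-1.0",
  "shortDescription": "An illustrative custom license.",
  "active": true,
  "sortOrder": 10
}
EOF
curl -X POST -H 'Content-Type: application/json' -H "X-Dataverse-key:$API_TOKEN" http://localhost:8080/api/licenses --upload-file customLicense.json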
Complete List of Changes
For the complete list of code changes in this release, see the 5.13 milestone on GitHub.
Installation
If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!
After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from version 4.x to 5.0 of the Dataverse software following the instructions in the release notes for version 5.0. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.13.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse-<version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.13.war
5. Restart Payara
service payara stop
service payara start
6. Reload citation metadata block
wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory.
wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.properties
cp citation.properties /home/dataverse/langBundles
7. Replace Solr schema.xml to allow multiple production locations and to support geospatial indexing. See the specific instructions below for installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).
Note: with this release, support for indexing of the experimental workflow metadata block has been removed from the standard schema.xml. If you are using the workflow metadata block, be sure to follow the instructions in step 7b below to maintain support for indexing workflow metadata.
7a. For installations without custom or experimental metadata blocks:
- Stop the Solr instance (usually service solr stop, depending on the Solr installation/OS; see the Installation Guide).
- Replace schema.xml:
cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
- Start the Solr instance (usually service solr start, depending on Solr/OS).
7b. For installations with custom or experimental metadata blocks:
- Stop the Solr instance (usually service solr stop, depending on the Solr installation/OS; see the Installation Guide).
- Edit the following line in your schema.xml (to indicate that productionPlace is now multiValued="true"):
<field name="productionPlace" type="string" stored="true" indexed="true" multiValued="true"/>
- Add the following lines to your schema.xml to add support for geospatial indexing:
<!-- Dataverse geospatial search -->
<!-- https://solr.apache.org/guide/8_11/spatial-search.html#rpt -->
<field name="geolocation" type="location_rpt" multiValued="true" stored="true" indexed="true"/>
<!-- https://solr.apache.org/guide/8_11/spatial-search.html#bboxfield -->
<field name="boundingBox" type="bbox" multiValued="true" stored="true" indexed="true"/>
<!-- Dataverse - per GeoBlacklight, adding a field type for BBoxField that enables, among other things, overlap ratio calculations -->
<fieldType name="bbox" class="solr.BBoxField" geo="true" distanceUnits="kilometers" numberType="pdouble" />
- Restart the Solr instance (usually service solr start, depending on Solr/OS).
Optional Upgrade Step: Reindex Linked Dataverse Collections
Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections. In order to fix the display of collections that have already been linked, you must re-index the linked collections. This query will provide a list of commands to re-index the affected collections:
select 'curl http://localhost:8080/api/admin/index/dataverses/'
|| tmp.dvid from (select distinct dataverse_id as dvid
from dataverselinkingdataverse) as tmp
The result of the query will be a list of re-index commands such as:
curl http://localhost:8080/api/admin/index/dataverses/633
where '633' is the id of the linked collection.
Optional Upgrade Step: Run File Detection on .eln Files
Now that .eln files are recognized, you can run the Redetect File Type API on them to switch them from "unknown" to "ELN Archive". Afterward, you can reindex these files to make them appear in search facets.
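A sketch of the redetection call for a single file (the dryRun parameter follows the API Guide; preview first, then rerun with dryRun=false to save the change):
# Sketch: preview redetection of one .eln file, then persist it
# ($FILE_ID and $API_TOKEN are placeholders)
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "http://localhost:8080/api/files/$FILE_ID/redetect?dryRun=true"
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "http://localhost:8080/api/files/$FILE_ID/redetect?dryRun=false"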