Dataverse 6.8

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.8 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights

Highlights for Dataverse 6.8 include:

  • Open OnDemand integration
  • Logs for diagnosing PID failures
  • Link permission split off from publish permission
  • New and improved APIs
  • Bug fixes

Features Added

Open OnDemand Integration

Open OnDemand, a web frontend to High Performance Computing (HPC) resources, has been integrated with Dataverse, allowing files to be downloaded from Dataverse installations around the world or from a specific dataset landing page if an external tool is enabled. Additionally, after computation is complete, datasets can be created in Dataverse from Open OnDemand and files can be uploaded. See the docs, #11768 and #11769.

Logs for Diagnosing PID Failures

A new feature flag called enable-pid-failure-log can be enabled to help diagnose PID failures. When set, Dataverse will log failed requests for dataset and file pages via persistentId to monthly log files of the form PIDFailures_<yyyy-MM>.log. These failures may indicate that someone has shared a draft PID without publishing, or that a '.' or other character has been appended to the PID, either of which may be of interest to site administrators. The new log files can be used in concert with the pidreport.py script to generate and email monthly PID failure reports. See #11601.
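
As a rough sketch of how an administrator might locate the current month's log file, the snippet below builds the file name from the PIDFailures_<yyyy-MM>.log pattern. The helper name pid_failure_log_name is hypothetical, and the log directory is an assumption (Payara's default domain log directory); check where your installation actually writes these files.

```shell
#!/bin/bash
# Hypothetical helper: build the log file name for a given month (yyyy-MM),
# following the PIDFailures_<yyyy-MM>.log pattern.
pid_failure_log_name() {
  printf 'PIDFailures_%s.log\n' "$1"
}

# Assumption: Payara is in /usr/local/payara6 and the log files are written
# to the domain's logs directory. Adjust both to match your installation.
PAYARA="${PAYARA:-/usr/local/payara6}"
LOG_DIR="$PAYARA/glassfish/domains/domain1/logs"

# Print the expected path of the log file for the current month.
echo "$LOG_DIR/$(pid_failure_log_name "$(date +%Y-%m)")"
```

The printed path could then be passed to tail or less to inspect recent failures.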

Link Permission Split Off from Publish Permission

Linking or unlinking a dataset or dataverse now requires the new "LinkDataset" or "LinkDataverse" permissions, respectively. Previously, this action was covered by the "PublishDataset" or "PublishDataverse" permission. Splitting linking into its own permission, separate from publishing, allows for more fine-grained access control, if you choose to implement custom roles that differ from the roles that Dataverse ships with. A (Flyway) database migration script will be run automatically such that all roles that have permission to publish will continue to have permission to link. See #11691.

Bugs Fixed

  • A bug introduced in v6.7 caused files added to a draft version after the initial dataset version was published not to be indexed. This has been fixed. See #11776 and #11779.
  • The styled citations available through the "View Styled Citations" menu were including extra characters, e.g. 'doi:' in the URL form of the PIDs in the citation. This is now fixed. See #11629.
  • The updates added in #11268 to support keeping the history of curation status labels incorrectly showed curation statuses added prior to v6.7 as the current one, regardless of whether newer statuses existed. This bug has been fixed. See #11685. (As a workaround for 6.7, admins can add createtime dates, which must be prior to when 6.7 was installed, to the curationstatus table for entries that have null createtimes. The code fix in this version properly handles null dates as indicating older, pre-v6.7 curation statuses.)
  • The addDataset and addDataverse API endpoints now trigger user notifications. See #1342 and #11696.
  • A bug introduced in v6.5 broke Handle parsing when using a lowercase shoulder. This is now fixed. See #11592.
  • Due to changes in how the commons-lang3 Java library handles a non-ascii character, two keys in the citation.properties and citation.tsv files have changed to include i instead of ɨ. controlledvocabulary.language.magɨ_(madang_province) has been changed to controlledvocabulary.language.magi_(madang_province) and controlledvocabulary.language.magɨyi has been changed to controlledvocabulary.language.magiyi. In the upgrade instructions below, we indicate that the citation metadata block should be reloaded. Translations will need to make the same adjustment. See #11632.
  • When following the container demo tutorial, it was not possible to update Solr fields after adding additional metadata blocks. This has been fixed. See #11722 and #11723.

API Updates

Templates API

New endpoints have been implemented in the Dataverses API for the management of dataset templates:

  • POST /dataverses/{id}/templates: Creates a template for a given collection id.
  • GET /dataverses/{id}/templates: Lists the templates for a given collection id.

See the guides, #11562, #11565, #11703, and #11704.
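
A minimal sketch of listing a collection's templates with curl follows. The helper name templates_url, the collection alias "root", and the use of an API token are assumptions for illustration; the /api prefix is the usual base path for Dataverse API calls.

```shell
#!/bin/bash
# Assumption: Dataverse is reachable at this base URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the templates endpoint URL for a collection id or alias.
templates_url() {
  printf '%s/api/dataverses/%s/templates\n' "$SERVER_URL" "$1"
}

# Print the URL for the (assumed) collection alias "root".
templates_url root

# List the templates against a live server (uncomment to run):
# curl -H "X-Dataverse-key: $API_TOKEN" "$(templates_url root)"
```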

File Categories API

A new API was added that returns a list of categories (both built-in and custom) that may be applied to the files of a given dataset. See the guides, #11634, and #11668.

MyData Collection List API

The MyData Collection List API is used to get a list of the collections in which an authenticated user can create a dataset. The userIdentifier={userName} parameter can be used by a superuser to get the collections for a specific user.

See also the guides, #11525, and #11681.

Search API: datasetCount and show_collections

The search index now includes datasetCount for each collection, counting published, linked, and harvested datasets. Collections can be filtered using datasetCount (e.g., datasetCount:[1000 TO *]), and the value is returned in Dataverse search results via the Search API. See #10190.
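
As an illustration, the filter above could be passed to the Search API as a filter query (fq). This is only a sketch: the helper name dataset_count_filter is hypothetical, and the server URL is an assumption.

```shell
#!/bin/bash
# Assumption: Dataverse is reachable at this base URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build a filter query matching collections
# that contain at least N datasets.
dataset_count_filter() {
  printf 'datasetCount:[%s TO *]\n' "$1"
}

# Print the filter for collections with 1000 or more datasets.
dataset_count_filter 1000

# Search collections against a live server (uncomment to run).
# -G with --data-urlencode lets curl encode the brackets, spaces, and asterisk.
# curl -G "$SERVER_URL/api/search" \
#   --data-urlencode "q=*" \
#   --data-urlencode "type=dataverse" \
#   --data-urlencode "fq=$(dataset_count_filter 1000)"
```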

The Search API now supports a show_collections parameter for dataset results. When the parameter is set, each result includes a collections array showing the dataset's parent and linked collections. Each entry includes id, name, and alias, for example:

"collections": [
  {
    "id": 42,
    "name": "My cool collection",
    "alias": "myCoolCollection"
  }
]

See also the guides and #11558.
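
A request using the parameter might look like the sketch below. It assumes the parameter is enabled by passing show_collections=true; the helper name and the search term "data" are hypothetical.

```shell
#!/bin/bash
# Assumption: Dataverse is reachable at this base URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build a dataset search URL with show_collections enabled.
# The search term is inserted as-is, so it must already be URL-safe.
show_collections_search_url() {
  printf '%s/api/search?q=%s&type=dataset&show_collections=true\n' "$SERVER_URL" "$1"
}

# Print the URL for the (assumed) search term "data".
show_collections_search_url data

# Run the search against a live server (uncomment to run):
# curl "$(show_collections_search_url data)"
```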

Listing Collections a Dataset Has Been Linked To

The API for listing the collections a dataset has been linked to (api/datasets/$linked-dataset-id/links) is no longer restricted to superusers. For unpublished datasets, users need the "View Unpublished Dataset" permission to access the API. Unpublished collections in the list require the "View Unpublished Dataverse" permission; otherwise, they are hidden. See #11492.
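
For example, a non-superuser with sufficient permissions could call the endpoint roughly as follows. The dataset id 24 and the helper name are placeholders, and the API token is only needed when the dataset or collections involved are unpublished.

```shell
#!/bin/bash
# Assumption: Dataverse is reachable at this base URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the links endpoint URL for a dataset id.
dataset_links_url() {
  printf '%s/api/datasets/%s/links\n' "$SERVER_URL" "$1"
}

# Print the URL for the placeholder dataset id 24.
dataset_links_url 24

# List linked collections against a live server (uncomment to run):
# curl -H "X-Dataverse-key: $API_TOKEN" "$(dataset_links_url 24)"
```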

Listing Collections a Collection Has Been Linked To

The API for listing the collections a collection has been linked to now returns a different, backward-incompatible JSON format. See #11633, #11669, and the API Changelog (also listed under Backward Incompatible Changes, below). Also, additional fields are now being returned. See #11724 and #11728.

Listing Metadata Blocks: isAdvancedSearchFieldType

The API endpoints api/{dataverse-alias}/metadatablocks and /api/metadatablocks/{block_id} have been extended to include isAdvancedSearchFieldType, which indicates whether the field can be used in advanced search. See #11614 and #11617.

Notifications API: unreadCount, markAsRead, inAppNotificationFormat

The Notifications API has been updated in various ways:

  • The JSON returned from listing notifications now includes a "displayAsRead" boolean to indicate if a notification has been read. See #11650 and #11664.
  • You can get a count of unread notifications via a new unreadCount API endpoint.
  • You can mark a notification as read via a new markAsRead API endpoint.
  • The JSON can be returned using inAppNotificationFormat. See #11648 and #11696.
  • A bug was fixed where a NullPointerException was thrown when retrieving notifications without a requestor. See #11703 and #11704.

Edit File Metadata: Empty Values Clear Data

Previously, the POST /files/{id}/metadata API ignored fields with empty values. Now the API updates those fields with the empty values, effectively clearing the data. Missing fields are still ignored.

An optional query parameter (sourceLastUpdateTime) was added to verify that the metadata being edited has not been modified since it was retrieved, so that the update is not based on stale data.

See also the guides, #11392, #11439, and the API Changelog (also listed under Backward Incompatible Changes, below).
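
The new behavior can be sketched as follows: sending an empty string and an empty array clears those fields, while fields left out of the JSON are untouched. The file id 42 and the helper name are placeholders; the jsonData form field follows the existing convention for this endpoint.

```shell
#!/bin/bash
# Assumption: Dataverse is reachable at this base URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the file metadata endpoint URL for a file id.
file_metadata_url() {
  printf '%s/api/files/%s/metadata\n' "$SERVER_URL" "$1"
}

# An empty string clears the description and an empty array clears the categories.
# Fields that are not listed here (e.g. dataFileTags) are left unchanged.
JSON_DATA='{"description":"","categories":[]}'

# Print what would be sent, using the placeholder file id 42.
printf 'POST %s\njsonData=%s\n' "$(file_metadata_url 42)" "$JSON_DATA"

# Send the update to a live server (uncomment to run):
# curl -H "X-Dataverse-key: $API_TOKEN" -X POST \
#   -F "jsonData=$JSON_DATA" "$(file_metadata_url 42)"
```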

Get Customization File Contents API

A new API has been added to get customization file contents: analytics, homepage, header, footer, style, and logo. See the guides, #11448, and #11467.

Retrieving URLs to Launch External Tools

New API calls have been added to retrieve the URLs needed to launch external tools on specific datasets and files:

  • /api/datasets/$DATASET_ID/externalTool/$TOOL_ID/toolUrl: docs
  • /api/files/$FILE_ID/externalTool/$TOOL_ID/toolUrl: docs

If the dataset/file is not public, the caller must authenticate and have permission to view the dataset/file. In such cases, the generated URL will include a callback token containing a signed URL the tool can use to retrieve all the parameters it is configured for. See Backward Incompatible Changes, below, for a change to the JSON response. See #11760.

Security Updates

This release contains important security updates. If you are not receiving security notices, please sign up by following the steps in the guides.

Authentication Updates

The authentication updates listed below were introduced as we work toward supporting the new Dataverse frontend, a React-based Single Page Application (SPA). We list them here for completeness, but unless you are experimenting with the new frontend or using OIDC directly, they will probably have no impact on your installation.

  • We've strengthened the security of the api-bearer-auth-use-builtin-user-on-id-match feature flag. It will now only work when the provided bearer token includes an idp claim that matches the Keycloak Service Provider identifier. By enforcing this check, the risk of impersonation from other identity providers is significantly reduced, since they would need to be explicitly configured with this specific, non-standard identifier. See the list of feature flags, #11689, and #11763.
  • A new feature flag api-bearer-auth-use-shib-user-on-id-match supports the use of clients in instances that have historically allowed login via Shibboleth. Specifically, with this flag enabled, when an OIDC bridge is configured to allow OIDC login with validation by the bridged Shibboleth providers, users with existing Shibboleth-based accounts in Dataverse can log in to those accounts, thereby maintaining access to their existing content and retaining their roles. (For security reasons, Dataverse's current support for direct login via Shibboleth cannot be used in browser-based clients.) See the list of feature flags, #11605, and #11622.
  • A new feature flag api-bearer-auth-use-oauth-user-on-id-match supports the use of clients in instances that have historically allowed login via GitHub, ORCID, or Google. Specifically, with this flag enabled, when an OIDC bridge is configured to allow OIDC login with validation by the bridged OAuth providers, users with existing GitHub, ORCID, or Google accounts in Dataverse can log in to those accounts, thereby maintaining access to their existing content and retaining their roles. See the list of feature flags, #11671, and #11645.

Finally, there is one other authentication-related update that has the potential to affect a small number of Dataverse installations. See the EOL announcement below about the InCommon Federation feed for details.

Developer Updates

Writing External Exporters

The getDatasetFileDetails data structure now contains "directoryLabel" (file path). See #10523 and #11618.

End-Of-Life (EOL) Announcements

PostgreSQL 13 Reaches EOL on 13 November 2025

We mentioned this in the Dataverse 6.6 release notes, but as a reminder, according to https://www.postgresql.org/support/versioning/ PostgreSQL 13 reaches EOL on 13 November 2025. As mentioned in the Installation Guide, we recommend running PostgreSQL 16 since it is the version we test with in our continuous integration (since February 2025). The Dataverse 5.4 release notes explained the upgrade process from 9 to 13 (e.g. pg_dumpall, etc.) and the steps will be similar. If you have any problems, please feel free to reach out (see "getting help" in these release notes).

For Dataverse Instances That Use Shibboleth as Members of the InCommon Federation

Please note that most of the known Dataverse instances that support Shibboleth logins do so without being part of InCommon, and therefore are not affected. All such instances will be able to continue using the old login workflow without needing to make any configuration changes.

For the relatively few instances using InCommon: Since InCommon discontinued their old-style federation metadata feed, a new Shibboleth implementation has been added to utilize the recommended replacements: the MDQ protocol and the WayFinder service. In order to continue using InCommon, such instances will need to modify their shibd configuration and their registration with InCommon, plus set a new feature flag. See the upgrade instructions below for details. See also #11404 and #11502.

New Settings

  • dataverse.feature.api-bearer-auth-use-oauth-user-on-id-match
  • dataverse.feature.api-bearer-auth-use-shib-user-on-id-match
  • dataverse.feature.enable-pid-failure-log
  • dataverse.feature.shibboleth-use-localhost
  • dataverse.feature.shibboleth-use-wayfinder
  • dataverse.person-or-org.assume-comma-in-person-name
  • dataverse.person-or-org.org-phrase-array

The settings dataverse.personOrOrg.assumeCommaInPersonName and dataverse.personOrOrg.orgPhraseArray now support configuration via MicroProfile Config (MPConfig). (Previously, they were only configurable as JVM options.) Their MPConfig names are dataverse.person-or-org.assume-comma-in-person-name and dataverse.person-or-org.org-phrase-array, respectively, for consistency with naming conventions. In addition to the existing asadmin JVM option method, any supported MicroProfile Config API source can now be used to set their values (as with all other MPConfig settings). For backwards compatibility, dataverse.personOrOrg.assumeCommaInPersonName is still supported. However, dataverse.personOrOrg.orgPhraseArray is not, due to a change in the expected value format, as mentioned under Backward Incompatible Changes, below. dataverse.person-or-org.org-phrase-array now expects a comma-separated list of phrases as a value instead of a JsonArray of strings. The upgrade instructions below indicate to update both the name and value format if using the old setting. See #11485.
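
One commonly used MicroProfile Config source is environment variables. Under the MicroProfile Config naming rule, a property can be supplied through an environment variable whose name is the property name with every character that is not alphanumeric or an underscore replaced by an underscore, then converted to upper case. The sketch below derives that name; the helper mpconfig_env_name is hypothetical, not something Dataverse ships.

```shell
#!/bin/bash
# Hypothetical helper: derive the environment variable name that
# MicroProfile Config maps to a given property name.
mpconfig_env_name() {
  printf '%s\n' "$1" \
    | tr -c '[:alnum:]_\n' '_' \
    | tr '[:lower:]' '[:upper:]'
}

# Print the environment variable names for two of the new settings.
mpconfig_env_name dataverse.person-or-org.org-phrase-array
mpconfig_env_name dataverse.feature.enable-pid-failure-log
```

This can be handy for container-based installations, where settings are often passed in as environment variables rather than JVM options.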

Deprecated Settings

  • dataverse.personOrOrg.assumeCommaInPersonName
  • dataverse.personOrOrg.orgPhraseArray

Backward Incompatible Changes

Generally speaking, see the API Changelog for a list of backward-incompatible API changes.

dataverse.personOrOrg.orgPhraseArray

The setting dataverse.personOrOrg.orgPhraseArray has been renamed to dataverse.person-or-org.org-phrase-array and now expects a comma-separated list of phrases as a value instead of a JsonArray of strings. See #11485.

Edit Metadata API Changes

  • For POST /api/files/{id}/metadata passing an empty string ("description":"") or array ("categories":[]) will no longer be ignored. Empty fields will now clear out the values in the file's metadata. To ignore the fields simply do not include them in the JSON string. See #11439.
  • For PUT /api/datasets/{id}/editMetadata the query parameter "sourceInternalVersionNumber" has been removed and replaced with "sourceLastUpdateTime" to verify that the data being edited hasn't been modified and isn't stale. See #11439.

Different JSON Format When Listing Collection Links

The API for listing the collections a collection has been linked to now returns a different, backward-incompatible JSON format. See #11633, #11669.

/api/externalTools Response

The responses from the GET /api/externalTools and /api/externalTools/{id} are now formatted as JSON (previously the toolParameters and allowedApiCalls were JSON serialized as strings) and any configured "requirements" are included. See #11760.

Complete List of Changes

For the complete list of code changes in this release, see the 6.8 milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please see getting help in the Installation Guide.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.7.1.

0. These instructions assume that you are upgrading from the immediate previous version. See tags on GitHub for a list of versions. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.

Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. List deployed applications

$PAYARA/bin/asadmin list-applications

2. Undeploy the previous version (should match "list-applications" above)

$PAYARA/bin/asadmin undeploy dataverse-6.7.1
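
If you prefer not to type the application name by hand, it can be picked out of the list-applications output. This is only a sketch: the helper deployed_dataverse_app is hypothetical, and the sample output format (name in the first column) is an assumption based on typical asadmin output.

```shell
#!/bin/bash
# Hypothetical helper: read `asadmin list-applications` output on stdin and
# print the name of the first application whose name starts with "dataverse".
deployed_dataverse_app() {
  awk '$1 ~ /^dataverse/ { print $1; exit }'
}

# Demonstrate with sample output in the assumed format.
printf 'dataverse-6.7.1  <ejb, web>\nCommand list-applications executed successfully.\n' \
  | deployed_dataverse_app

# Against a live server (uncomment to run):
# APP="$("$PAYARA/bin/asadmin" list-applications | deployed_dataverse_app)"
# "$PAYARA/bin/asadmin" undeploy "$APP"
```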

3. Download and deploy this version

wget https://github.com/IQSS/dataverse/releases/download/v6.8/dataverse-6.8.war
$PAYARA/bin/asadmin deploy dataverse-6.8.war

Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.

sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start

4. Update metadata blocks

These changes reflect incremental improvements made to the handling of core metadata fields.

Reload the citation.tsv file to handle the commons-lang3 change mentioned above.

Expect the loading of the citation block to take several seconds because of its size (especially due to the number of languages).

wget https://raw.githubusercontent.com/IQSS/dataverse/v6.8/scripts/api/data/metadatablocks/citation.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv

5. For installations with internationalization or text customizations:

Please remember to update translations via Dataverse language packs.

If you have text customizations you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.8/src/main/java/propertyFiles.

6. Update Solr schema

Due to changes in the Solr schema (the addition of field "datasetCount"), updating the Solr schema and reindexing is required.

Download the updated schema.xml file:

wget https://raw.githubusercontent.com/IQSS/dataverse/v6.8/conf/solr/schema.xml
cp schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf

6a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).

  • Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):

wget https://raw.githubusercontent.com/IQSS/dataverse/v6.8/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml

Note that Docker-based installations use a different directory: solr/data/data/collection1/conf/schema.xml.

  • Start Solr instance (usually service solr start depending on Solr/OS).

7. Reindex Solr

curl http://localhost:8080/api/admin/index

8. Update dataverse.personOrOrg.orgPhraseArray, if used.

If you are using the dataverse.personOrOrg.orgPhraseArray setting, rename it to dataverse.person-or-org.org-phrase-array and replace the JSON array of strings with a comma-separated list. See also the docs for this setting and the New Settings and Backward Incompatible Changes sections above.
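
The value conversion can be sketched with sed, as below. The helper json_array_to_csv is hypothetical and deliberately simple: it assumes a flat, single-line JSON array of plain strings with no embedded commas, escaped quotes, or backslashes. The sample phrases are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: convert a flat JSON array of plain strings,
# e.g. ["a","b"], into a comma-separated list, e.g. a,b.
json_array_to_csv() {
  printf '%s\n' "$1" | sed \
    -e 's/^[[:space:]]*\[[[:space:]]*//' \
    -e 's/[[:space:]]*][[:space:]]*$//' \
    -e 's/"[[:space:]]*,[[:space:]]*"/,/g' \
    -e 's/^"//' \
    -e 's/"$//'
}

# Old-style value (placeholder phrases) and its converted form.
OLD_VALUE='["Portable","LLC", "Institute for"]'
NEW_VALUE="$(json_array_to_csv "$OLD_VALUE")"
echo "$NEW_VALUE"

# Set the renamed option as a JVM option (uncomment to run):
# "$PAYARA/bin/asadmin" create-jvm-options "-Ddataverse.person-or-org.org-phrase-array=$NEW_VALUE"
```

Review the converted value by hand before applying it, especially if any phrase contains punctuation.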

9. InCommon federation login update

If your instance is offering institutional Shibboleth logins as part of the InCommon federation, you must make some changes to your service configuration:

Note that if your Dataverse instance is using Shibboleth outside of InCommon, your login workflow should continue working unchanged, so please skip this section.

a. Configure your Service Provider (SP) in the InCommon Federation Manager to use WayFinder following their instructions.

b. Reconfigure your locally-running shibd service to use WayFinder and the new MDQ metadata retrieval protocol.
Download and place the new production signing key in /etc/shibboleth and name it inc-md-cert-mdq.pem.
Change the SSO and MetadataProvider sections of the /etc/shibboleth/shibboleth2.xml configuration file as follows:

<SSO discoveryProtocol="SAMLDS" discoveryURL="https://wayf.incommonfederation.org/DS/WAYF">
     SAML2 SAML1
</SSO>

and

<MetadataProvider type="MDQ" id="incommon" ignoreTransport="true" cacheDirectory="inc-mdq-cache"
  maxCacheDuration="86400" minCacheDuration="60" baseUrl="https://mdq.incommon.org/">
    <MetadataFilter type="Signature" certificate="inc-md-cert-mdq.pem"/>
    <MetadataFilter type="RequireValidUntil" maxValidityInterval="1209600"/>
</MetadataProvider>

See How to configure a Shibboleth service provider (SP) to use MDQ for more information.

c. Set the feature flag dataverse.feature.shibboleth-use-wayfinder=true.
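
One way to set the flag is as a JVM option via asadmin, sketched below. This assumes the flag is being set as a JVM system property; any other supported MicroProfile Config source should work as well, and a Payara restart may be needed for the option to take effect.

```shell
#!/bin/bash
# Assumption: Payara 6 is installed in /usr/local/payara6.
PAYARA="${PAYARA:-/usr/local/payara6}"

# The feature flag expressed as a JVM system property.
FLAG_OPTION="-Ddataverse.feature.shibboleth-use-wayfinder=true"
printf '%s\n' "$FLAG_OPTION"

# Apply it to a live server (uncomment to run):
# "$PAYARA/bin/asadmin" create-jvm-options "$FLAG_OPTION"
```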
