Dataverse 6.8
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.8 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!
Release Highlights
Highlights for Dataverse 6.8 include:
- Open OnDemand integration
- Logs for diagnosing PID failures
- Link permission split off from publish permission
- New and improved APIs
- Bug fixes
Features Added
Open OnDemand Integration
Open OnDemand, a web frontend to High Performance Computing (HPC) resources, has been integrated with Dataverse, allowing files to be downloaded from Dataverse installations around the world or from a specific dataset landing page if an external tool is enabled. Additionally, after computation is complete, datasets can be created in Dataverse from Open OnDemand and files can be uploaded. See the docs, #11768 and #11769.
Logs for Diagnosing PID Failures
A new feature flag called enable-pid-failure-log can be enabled to help diagnose PID failures. When it is set, Dataverse logs failed requests for dataset and file pages made via persistentId to monthly log files named PIDFailures_<yyyy-MM>.log. Such failures may indicate that someone has shared a draft PID without publishing, or that a '.' or other character has been appended to the PID, which may interest site administrators. The new log files can be used with the pidreport.py script to generate and email monthly PID failure reports. See #11601.
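The following minimal sketch turns the flag on and works out which monthly file to check. It assumes feature flags are set as Payara JVM options in the usual dataverse.feature.* form. The helper names are hypothetical. The directory the log files are written to is not stated here, so check the guides for it.

```shell
# Hypothetical helpers for the enable-pid-failure-log feature flag.
PAYARA="${PAYARA:-/usr/local/payara6}"

# Print the log file name (PIDFailures_<yyyy-MM>.log) for month $1,
# or for the current month when no argument is given.
pid_failure_log_name() {
  local month="${1:-$(date +%Y-%m)}"
  printf 'PIDFailures_%s.log\n' "$month"
}

# Enable the flag as a JVM option; Payara must be restarted afterwards.
enable_pid_failure_log() {
  "$PAYARA/bin/asadmin" create-jvm-options '-Ddataverse.feature.enable-pid-failure-log=true'
}
```

For example, pid_failure_log_name 2025-09 prints PIDFailures_2025-09.log. In container deployments, the same flag can likely be set through the environment variable DATAVERSE_FEATURE_ENABLE_PID_FAILURE_LOG, following MicroProfile Config naming rules.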
Link Permission Split Off from Publish Permission
Linking or unlinking a dataset or dataverse now requires the new "LinkDataset" or "LinkDataverse" permissions, respectively. Previously, this action was covered by the "PublishDataset" or "PublishDataverse" permission. Splitting linking into its own permission, separate from publishing, allows for more fine-grained access control, if you choose to implement custom roles that differ from the roles that Dataverse ships with. A (Flyway) database migration script will be run automatically such that all roles that have permission to publish will continue to have permission to link. See #11691.
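To illustrate the finer-grained control this enables, the sketch below builds a hypothetical custom role that can link datasets and collections without being able to publish them, then posts it to a collection. The role alias, name, and permission mix are invented for illustration. The POST /api/roles?dvo=... call reflects our understanding of the existing roles API and is an assumption, so check the API guide before relying on it. SERVER_URL, API_TOKEN, and the CURL override are assumptions of this sketch.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Emit JSON for a hypothetical "linker" role: it can link datasets and
# collections but has neither PublishDataset nor PublishDataverse.
linker_role_json() {
  cat <<'EOF'
{
  "alias": "linker",
  "name": "Linker",
  "description": "Can link datasets and collections but cannot publish them.",
  "permissions": ["LinkDataset", "LinkDataverse"]
}
EOF
}

# Create the role in the collection whose alias is $1 (requires API_TOKEN).
# CURL can be overridden (for example with a stub) for dry runs.
create_linker_role() {
  linker_role_json | "${CURL:-curl}" -sS -X POST \
    -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" \
    --data-binary @- "$SERVER_URL/api/roles?dvo=$1"
}
```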
Bugs Fixed
- A bug introduced in v6.7, which prevented files added after the initial dataset version was published from being indexed in draft versions, has been fixed. See #11776 and #11779.
- The styled citations available through the "View Styled Citations" menu included extra characters, such as 'doi:', in the URL form of the PIDs. This is now fixed. See #11629.
- The updates that keep the history of curation status labels, added in #11268, caused curation statuses added before v6.7 to be shown as the current status, even when newer statuses existed. This bug has been fixed. See #11685. (As a workaround on 6.7, admins can add createtime dates to curationstatus table entries that have null createtimes. These dates must be earlier than the date 6.7 was installed. The code fix in this version treats null dates as indicating older, pre-v6.7 curation statuses.)
- The addDataset and addDataverse API endpoints now trigger user notifications. See #1342 and #11696.
- A bug introduced in v6.5 broke Handle parsing when using a lowercase shoulder. This is now fixed. See #11592.
- Due to changes in how the commons-lang3 Java library handles a non-ASCII character, two keys in the citation.properties and citation.tsv files have changed to include i instead of ɨ. controlledvocabulary.language.magɨ_(madang_province) has been changed to controlledvocabulary.language.magi_(madang_province), and controlledvocabulary.language.magɨyi has been changed to controlledvocabulary.language.magiyi. In the upgrade instructions below, we indicate that the citation metadata block should be reloaded. Translations will need to make the same adjustment. See #11632.
- When following the container demo tutorial, it was not possible to update Solr fields after adding additional metadata blocks. This has been fixed. See #11722 and #11723.
API Updates
Templates API
New endpoints have been implemented in the Dataverses API for the management of dataset templates:
- POST /dataverses/{id}/templates: Creates a template for a given collection id.
- GET /dataverses/{id}/templates: Lists the templates for a given collection id.
See the guides, #11562, #11565, #11703, and #11704.
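The sketch below calls both endpoints with curl. It assumes the endpoints live under /api/ like the rest of the native API and accept the usual X-Dataverse-key header. SERVER_URL, API_TOKEN, the CURL override, and the function names are assumptions of this sketch. The template JSON format is not described in these notes, so template.json is only a placeholder.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# List the templates of the collection identified by $1.
# CURL can be overridden (for example with a stub) for dry runs.
list_templates() {
  "${CURL:-curl}" -sS -H "X-Dataverse-key:$API_TOKEN" \
    "$SERVER_URL/api/dataverses/$1/templates"
}

# Create a template in collection $1 from JSON file $2 (format per the guides).
create_template() {
  "${CURL:-curl}" -sS -X POST -H "X-Dataverse-key:$API_TOKEN" \
    -H "Content-type:application/json" --upload-file "$2" \
    "$SERVER_URL/api/dataverses/$1/templates"
}
```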
File Categories API
A new API was added that returns a list of categories (both built-in and custom) that may be applied to the files of a given dataset. See the guides, #11634, and #11668.
MyData Collection List API
The MyData Collection List API returns the collections in which an authenticated user can create a dataset. A superuser can pass the userIdentifier={userName} parameter to get the collections for a specific user.
See also the guides, #11525, and #11681.
Search API: datasetCount and show_collections
The search index now includes datasetCount for each collection, counting published, linked, and harvested datasets. Collections can be filtered using datasetCount (e.g., datasetCount:[1000 TO *]), and the value is returned in Dataverse search results via the Search API. See #10190.
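The sketch below runs a datasetCount query against the Search API. It assumes the range expression can be passed as a filter query (fq) with type=dataverse; these notes give only the query syntax, so putting it in q may work as well. With -G, curl appends each URL-encoded --data-urlencode pair to the query string. SERVER_URL, the CURL override, and the function name are assumptions.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Search for collections whose datasetCount is at least $1.
# CURL can be overridden (for example with a stub) for dry runs.
search_collections_by_count() {
  "${CURL:-curl}" -sS -G "$SERVER_URL/api/search" \
    --data-urlencode 'q=*' \
    --data-urlencode 'type=dataverse' \
    --data-urlencode "fq=datasetCount:[$1 TO *]"
}
```

For example, search_collections_by_count 1000 asks for collections with 1000 or more datasets.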
The Search API now supports a show_collections parameter for dataset results. When the parameter is set, each result includes a collections array showing the dataset's parent and linked collections. Each entry includes id, name, and alias, for example:
"collections": [
{
"id": 42,
"name": "My cool collection",
"alias": "myCoolCollection"
}
]
See also the guides and #11558.
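The sketch below requests dataset results with their collections. It assumes show_collections=true is the way to set the parameter. SERVER_URL, the CURL override, and the function name are assumptions.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Search datasets matching $1 and include each result's parent and linked
# collections. CURL can be overridden (for example with a stub) for dry runs.
search_datasets_with_collections() {
  "${CURL:-curl}" -sS -G "$SERVER_URL/api/search" \
    --data-urlencode "q=$1" \
    --data-urlencode 'type=dataset' \
    --data-urlencode 'show_collections=true'
}
```

Assuming results sit under data.items as in other Search API responses, piping the output to jq -r '.data.items[].collections[].alias' would list the collection aliases.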
Listing Collections a Dataset Has Been Linked To
The API for listing the collections a dataset has been linked to (api/datasets/$linked-dataset-id/links) is no longer restricted to superusers. For unpublished datasets, users need the "View Unpublished Dataset" permission to access the API. Unpublished collections in the list require the "View Unpublished Dataverse" permission; otherwise, they are hidden. See #11492.
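The sketch below calls this endpoint. It sends an API token only when API_TOKEN is set, because a token is needed at least for unpublished datasets. SERVER_URL, API_TOKEN, the CURL override, and the function name are assumptions.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# List the collections that dataset $1 (database id) has been linked into.
# CURL can be overridden (for example with a stub) for dry runs.
dataset_links() {
  if [ -n "${API_TOKEN:-}" ]; then
    "${CURL:-curl}" -sS -H "X-Dataverse-key:$API_TOKEN" \
      "$SERVER_URL/api/datasets/$1/links"
  else
    "${CURL:-curl}" -sS "$SERVER_URL/api/datasets/$1/links"
  fi
}
```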
Listing Collections a Collection Has Been Linked To
The API for listing the collections a collection has been linked to now returns a different, backward-incompatible JSON format. See #11633, #11669, and the API Changelog (also listed under Backward Incompatible Changes, below). Also, additional fields are now being returned. See #11724 and #11728.
Listing Metadata Blocks: isAdvancedSearchFieldType
The API endpoints /api/dataverses/{dataverse-alias}/metadatablocks and /api/metadatablocks/{block_id} have been extended to include isAdvancedSearchFieldType, which indicates whether a field can be used in advanced search. See #11614 and #11617.
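The sketch below fetches both kinds of metadata block listings. It assumes the collection-level endpoint is served under /api/dataverses/, as other collection endpoints are. SERVER_URL, the CURL override, and the function names are assumptions.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Fetch one metadata block definition, for example "citation".
# CURL can be overridden (for example with a stub) for dry runs.
metadatablock() {
  "${CURL:-curl}" -sS "$SERVER_URL/api/metadatablocks/$1"
}

# Fetch the metadata blocks of the collection whose alias is $1.
collection_metadatablocks() {
  "${CURL:-curl}" -sS "$SERVER_URL/api/dataverses/$1/metadatablocks"
}
```

For a rough tally, metadatablock citation | grep -o '"isAdvancedSearchFieldType":[a-z]*' | sort | uniq -c counts how many citation fields are or are not usable in advanced search.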
Notifications API: unreadCount, markAsRead, inAppNotificationFormat
The Notifications API has been updated in various ways:
- The JSON returned from listing notifications now includes a "displayAsRead" boolean to indicate if a notification has been read. See #11650 and #11664.
- You can get a count of unread notifications via a new unreadCount API endpoint.
- You can mark a notification as read via a new markAsRead API endpoint.
- The JSON can be returned using inAppNotificationFormat. See #11648 and #11696.
- A bug was fixed where a NullPointerException was being thrown when retrieving notifications without a requestor. See #11703 and #11704.
Edit File Metadata: Empty Values Clear Data
Previously, the API POST /files/{id}/metadata ignored fields with empty values. Now, the API applies empty values, which effectively clears the data in those fields. Fields that are omitted are still ignored.
An optional query parameter (sourceLastUpdateTime) was added to ensure that the update is not based on stale data, that is, that the metadata has not been modified since the client last retrieved it.
See also the guides, #11392, #11439, and the API Changelog (also listed under Backward Incompatible Changes, below).
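The sketch below clears a file's description and categories. It assumes the endpoint takes its JSON in a multipart jsonData field, as the existing file metadata API does. It also assumes the sourceLastUpdateTime value is already URL-safe; the expected timestamp format is not given here. SERVER_URL, API_TOKEN, the CURL override, and the function name are assumptions.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Clear the description and categories of file $1. Empty values are now
# applied instead of ignored; fields left out of jsonData are untouched.
# $2 (optional) is passed as sourceLastUpdateTime to guard against stale edits.
# CURL can be overridden (for example with a stub) for dry runs.
clear_file_fields() {
  "${CURL:-curl}" -sS -X POST -H "X-Dataverse-key:$API_TOKEN" \
    -F 'jsonData={"description":"","categories":[]}' \
    "$SERVER_URL/api/files/$1/metadata${2:+?sourceLastUpdateTime=$2}"
}
```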
Get Customization File Contents API
A new API has been added to get customization file contents: analytics, homepage, header, footer, style, and logo. See the guides, #11448, and #11467.
Retrieving URLs to Launch External Tools
New API calls have been added to retrieve the URLs needed to launch external tools on specific datasets and files:
- /api/datasets/$DATASET_ID/externalTool/$TOOL_ID/toolUrl (see the docs)
- /api/files/$FILE_ID/externalTool/$TOOL_ID/toolUrl (see the docs)
If the dataset or file is not public, the caller must authenticate and have permission to view it. In such cases, the generated URL will include a callback token containing a signed URL that the tool can use to retrieve all the parameters it is configured for. See Backward Incompatible Changes, below, for a change to the JSON response. See #11760.
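The sketch below wraps both calls. It assumes they are plain GET requests that accept the usual X-Dataverse-key header; the HTTP method is not stated here, so check the docs. SERVER_URL, API_TOKEN, the CURL override, and the function names are assumptions.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Get the launch URL of external tool $2 for dataset $1.
# CURL can be overridden (for example with a stub) for dry runs.
dataset_tool_url() {
  "${CURL:-curl}" -sS -H "X-Dataverse-key:$API_TOKEN" \
    "$SERVER_URL/api/datasets/$1/externalTool/$2/toolUrl"
}

# Get the launch URL of external tool $2 for file $1.
file_tool_url() {
  "${CURL:-curl}" -sS -H "X-Dataverse-key:$API_TOKEN" \
    "$SERVER_URL/api/files/$1/externalTool/$2/toolUrl"
}
```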
Security Updates
This release contains important security updates. If you are not receiving security notices, please sign up by following the steps in the guides.
Authentication Updates
The authentication updates listed below were introduced as we work toward supporting the new Dataverse frontend, a React-based Single Page Application (SPA). We list them here for completeness. Unless you are experimenting with the new frontend or working with OIDC directly, they will probably have no impact on your installation.
- We've strengthened the security of the api-bearer-auth-use-builtin-user-on-id-match feature flag. It will now only work when the provided bearer token includes an idp claim that matches the Keycloak Service Provider identifier. By enforcing this check, the risk of impersonation from other identity providers is significantly reduced, since they would need to be explicitly configured with this specific, non-standard identifier. See the list of feature flags, #11689, and #11763.
- A new feature flag api-bearer-auth-use-shib-user-on-id-match supports the use of clients in instances that have historically allowed login via Shibboleth. Specifically, with this flag enabled, when an OIDC bridge is configured to allow OIDC login with validation by the bridged Shibboleth providers, users with existing Shibboleth-based accounts in Dataverse can log in to those accounts, thereby maintaining access to their existing content and retaining their roles. (For security reasons, Dataverse's current support for direct login via Shibboleth cannot be used in browser-based clients.) See the list of feature flags, #11605, and #11622.
- A new feature flag api-bearer-auth-use-oauth-user-on-id-match supports the use of clients in instances that have historically allowed login via GitHub, ORCID, or Google. Specifically, with this flag enabled, when an OIDC bridge is configured to allow OIDC login with validation by the bridged OAuth providers, users with existing GitHub, ORCID, or Google accounts in Dataverse can log in to those accounts, thereby maintaining access to their existing content and retaining their roles. See the list of feature flags, #11671, and #11645.
Finally, there is one other authentication-related update that may affect a small number of Dataverse installations. See the EOL announcement below about the InCommon Federation feed for details.
Developer Updates
Writing External Exporters
The getDatasetFileDetails data structure now contains "directoryLabel" (file path). See #10523 and #11618.
End-Of-Life (EOL) Announcements
PostgreSQL 13 Reaches EOL on 13 November 2025
We mentioned this in the Dataverse 6.6 release notes, but as a reminder, according to https://www.postgresql.org/support/versioning/ PostgreSQL 13 reaches EOL on 13 November 2025. As mentioned in the Installation Guide, we recommend running PostgreSQL 16 since it is the version we test with in our continuous integration (since February 2025). The Dataverse 5.4 release notes explained the upgrade process from 9 to 13 (e.g. pg_dumpall, etc.) and the steps will be similar. If you have any problems, please feel free to reach out (see "getting help" in these release notes).
For Dataverse Instances That Use Shibboleth as Members of the InCommon Federation
Please note that most of the known Dataverse instances that support Shibboleth logins do so without being part of InCommon, and therefore are not affected. All such instances will be able to continue using the old login workflow without needing to make any configuration changes.
For the relatively few instances using InCommon: Since InCommon discontinued their old-style federation metadata feed, a new Shibboleth implementation has been added to utilize the recommended replacements: the MDQ protocol and the WayFinder service. In order to continue using InCommon, such instances will need to modify their shibd configuration and their registration with InCommon, plus set a new feature flag. See the upgrade instructions below for details. See also #11404 and #11502.
New Settings
- dataverse.feature.api-bearer-auth-use-oauth-user-on-id-match
- dataverse.feature.api-bearer-auth-use-shib-user-on-id-match
- dataverse.feature.enable-pid-failure-log
- dataverse.feature.shibboleth-use-localhost
- dataverse.feature.shibboleth-use-wayfinder
- dataverse.person-or-org.assume-comma-in-person-name
- dataverse.person-or-org.org-phrase-array
The settings dataverse.personOrOrg.assumeCommaInPersonName and dataverse.personOrOrg.orgPhraseArray now support configuration via MicroProfile Config (MPConfig). (Previously, they were only configurable as JVM options.) For consistency with naming conventions, their MPConfig names are dataverse.person-or-org.assume-comma-in-person-name and dataverse.person-or-org.org-phrase-array, respectively. In addition to the existing asadmin JVM option method, any supported MicroProfile Config API source can now be used to set their values, as with all other MPConfig settings. For backward compatibility, dataverse.personOrOrg.assumeCommaInPersonName is still supported. However, dataverse.personOrOrg.orgPhraseArray is not, because the expected value format has changed, as mentioned under Backward Incompatible Changes, below. The new dataverse.person-or-org.org-phrase-array setting expects a comma-separated list of phrases instead of a JsonArray of strings. The upgrade instructions below explain how to update both the name and the value format if you use the old setting. See #11485.
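Environment variables are one common MPConfig source. The sketch below derives the environment-variable form of a setting name using the MicroProfile Config mapping rule: the name is uppercased, and every character that is not a letter or digit becomes an underscore. The function name is hypothetical.

```shell
# Derive the environment-variable name MicroProfile Config checks for a
# property: uppercase it, then turn every non-alphanumeric character into "_".
mpconfig_env_name() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/[^[:upper:][:digit:]]/_/g'
}
```

For example, mpconfig_env_name dataverse.person-or-org.assume-comma-in-person-name prints DATAVERSE_PERSON_OR_ORG_ASSUME_COMMA_IN_PERSON_NAME. Exporting that variable would be one way to set the value outside asadmin.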
Deprecated Settings
- dataverse.personOrOrg.assumeCommaInPersonName
- dataverse.personOrOrg.orgPhraseArray
Backward Incompatible Changes
Generally speaking, see the API Changelog for a list of backward-incompatible API changes.
dataverse.personOrOrg.orgPhraseArray
The setting dataverse.personOrOrg.orgPhraseArray has been renamed to dataverse.person-or-org.org-phrase-array and now expects a comma-separated list of phrases as a value instead of a JsonArray of strings. See #11485.
Edit Metadata API Changes
- For POST /api/files/{id}/metadata, an empty string ("description":"") or an empty array ("categories":[]) is no longer ignored. Empty fields now clear the corresponding values in the file's metadata. To leave a field unchanged, omit it from the JSON. See #11439.
- For PUT /api/datasets/{id}/editMetadata the query parameter "sourceInternalVersionNumber" has been removed and replaced with "sourceLastUpdateTime" to verify that the data being edited hasn't been modified and isn't stale. See #11439.
Different JSON Format When Listing Collection Links
The API for listing the collections a collection has been linked to now returns a different, backward-incompatible JSON format. See #11633, #11669.
/api/externalTools Response
The responses from GET /api/externalTools and GET /api/externalTools/{id} are now formatted as JSON (previously, toolParameters and allowedApiCalls were serialized as JSON strings), and any configured "requirements" are included. See #11760.
Complete List of Changes
For the complete list of code changes in this release, see the 6.8 milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please see getting help in the Installation Guide.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.7.1.
0. These instructions assume that you are upgrading from the immediate previous version. See tags on GitHub for a list of versions. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.
Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. List deployed applications
$PAYARA/bin/asadmin list-applications
2. Undeploy the previous version (should match "list-applications" above)
$PAYARA/bin/asadmin undeploy dataverse-6.7.1
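The name passed to undeploy must match what list-applications reports. The sketch below picks that name out and undeploys it. It assumes the deployed name starts with "dataverse" and is the first whitespace-separated field on its line. The function names are hypothetical.

```shell
PAYARA="${PAYARA:-/usr/local/payara6}"

# Read "asadmin list-applications" output on stdin and print the names of
# deployed applications that start with "dataverse".
dataverse_apps() {
  awk '$1 ~ /^dataverse/ { print $1 }'
}

# Undeploy the currently deployed Dataverse, refusing to guess if the
# listing shows zero or several candidates.
undeploy_current_dataverse() {
  local app
  app=$("$PAYARA/bin/asadmin" list-applications | dataverse_apps)
  if [ "$(printf '%s\n' "$app" | grep -c .)" -ne 1 ]; then
    echo "expected exactly one dataverse application, found: $app" >&2
    return 1
  fi
  "$PAYARA/bin/asadmin" undeploy "$app"
}
```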
3. Download and deploy this version
wget https://github.com/IQSS/dataverse/releases/download/v6.8/dataverse-6.8.war
$PAYARA/bin/asadmin deploy dataverse-6.8.war
Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start
4. Update metadata blocks
These changes reflect incremental improvements made to the handling of core metadata fields.
Reload the citation.tsv file to handle the commons-lang3 change mentioned above.
Expect the loading of the citation block to take several seconds because of its size (especially due to the number of languages).
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.8/scripts/api/data/metadatablocks/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
5. For installations with internationalization or text customizations:
Please remember to update translations via Dataverse language packs.
If you have text customizations, you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.8/src/main/java/propertyFiles.
6. Update Solr schema
Due to changes in the Solr schema (the addition of field "datasetCount"), updating the Solr schema and reindexing is required.
Download the updated schema.xml file:
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.8/conf/solr/schema.xml
cp schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf
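As a guard against copying an older schema, the sketch below checks that the downloaded file declares the new field before copying it. It assumes the field appears in schema.xml as name="datasetCount". The function name is hypothetical.

```shell
# Copy schema file $1 into Solr config directory $2, but only if it
# declares the datasetCount field added in this release.
install_solr_schema() {
  if ! grep -q 'name="datasetCount"' "$1"; then
    echo "$1 does not declare datasetCount; is it the 6.8 schema?" >&2
    return 1
  fi
  cp "$1" "$2"
}
```

For example: install_solr_schema schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf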
6a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
- Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.8/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml
Note that Docker-based installations use a different directory: solr/data/data/collection1/conf/schema.xml.
- Start Solr instance (usually service solr start, depending on Solr/OS).
7. Reindex Solr
curl http://localhost:8080/api/admin/index
8. Update dataverse.personOrOrg.orgPhraseArray, if used.
If you are using the dataverse.personOrOrg.orgPhraseArray setting, rename it to dataverse.person-or-org.org-phrase-array and replace the JSON array of strings with a comma-separated list. See also the docs for this setting and the New Settings and Backward Incompatible Changes sections above.
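The value conversion can be sketched as below. It assumes no phrase contains a comma, a double quote, or a JSON escape sequence, because such phrases cannot be expressed safely in the comma-separated form. The function name is hypothetical.

```shell
# Turn the old JSON-array value, e.g. ["Project", "Consortium"], into the
# comma-separated value expected by dataverse.person-or-org.org-phrase-array.
org_phrases_to_csv() {
  printf '%s\n' "$1" | sed \
    -e 's/^[[:space:]]*\[//' \
    -e 's/\][[:space:]]*$//' \
    -e 's/"[[:space:]]*,[[:space:]]*"/,/g' \
    -e 's/^[[:space:]]*"//' \
    -e 's/"[[:space:]]*$//'
}
```

For example, org_phrases_to_csv '["Research Group", "Lab"]' prints Research Group,Lab. If you set the result with asadmin create-jvm-options, escape any colon in a phrase as \: because asadmin treats an unescaped colon as a separator.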
9. InCommon federation login update
If your instance is offering institutional Shibboleth logins as part of the InCommon federation, you must make some changes to your service configuration:
Note that if your Dataverse instance is using Shibboleth outside of InCommon, your login workflow should continue working unchanged, so please skip this section.
a. Configure your Service Provider (SP) in the InCommon Federation Manager to use WayFinder following their instructions.
b. Reconfigure your locally running shibd service to use WayFinder and the new MDQ metadata retrieval protocol.
Download and place the new production signing key in /etc/shibboleth and name it inc-md-cert-mdq.pem.
Change the SSO and MetadataProvider sections of the /etc/shibboleth/shibboleth2.xml configuration file as follows:
<SSO discoveryProtocol="SAMLDS" discoveryURL="https://wayf.incommonfederation.org/DS/WAYF">
SAML2 SAML1
</SSO>
and
<MetadataProvider type="MDQ" id="incommon" ignoreTransport="true" cacheDirectory="inc-mdq-cache"
maxCacheDuration="86400" minCacheDuration="60" baseUrl="https://mdq.incommon.org/">
<MetadataFilter type="Signature" certificate="inc-md-cert-mdq.pem"/>
<MetadataFilter type="RequireValidUntil" maxValidityInterval="1209600"/>
</MetadataProvider>
See How to configure a Shibboleth service provider (SP) to use MDQ for more information.
c. Set the feature flag dataverse.feature.shibboleth-use-wayfinder=true.
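Steps b and c can be sketched as two small functions. This assumes shibd is managed by systemd under the unit name shibd and that shibd -t is available to test the configuration; adjust both for your platform. SHIB_DIR and the function names are hypothetical. The first function needs root. The second should run as the Payara user and be followed by a Payara restart.

```shell
PAYARA="${PAYARA:-/usr/local/payara6}"
# Where the MDQ signing key is looked for; shibd -t itself reads its
# default configuration location.
SHIB_DIR="${SHIB_DIR:-/etc/shibboleth}"

# Step b (as root): confirm the signing key is in place, test the edited
# configuration, and restart shibd only if the test passes.
check_and_restart_shibd() {
  if [ ! -f "$SHIB_DIR/inc-md-cert-mdq.pem" ]; then
    echo "missing $SHIB_DIR/inc-md-cert-mdq.pem" >&2
    return 1
  fi
  shibd -t || return 1
  systemctl restart shibd
}

# Step c (as the Payara user): enable WayFinder support, then restart Payara.
enable_wayfinder_flag() {
  "$PAYARA/bin/asadmin" create-jvm-options '-Ddataverse.feature.shibboleth-use-wayfinder=true'
}
```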