github backstage/backstage release-2021-02-16

3 years ago

Hey, I just made a Pull Request!

Closes #3790

This PR adds the Splunk On-Call plugin.

This plugin provides:

  • A list of incidents
  • A way to trigger a new incident for specific users and/or teams
  • A way to acknowledge or resolve an incident
  • Details about the people on call

The approach is broadly the same as for the PagerDuty plugin, except that I have adapted it to the Splunk On-Call logic and the available API routes.

For example, the information shown for an entity relates to a specific team, since Splunk On-Call has no notion of a service the way PagerDuty does.
So you have to associate an entity with a team.

It could be interesting to support several teams associated with the same entity in the future.

Incident creation

Here is how the modal works to trigger a new incident.
It behaves like the incident creation modal in the Splunk On-Call portal.

Incidents list

The incident actions behave like the incident actions in the Splunk On-Call portal.

Here is how the incidents list works.
You can click on the action to acknowledge or resolve the selected incident.

The list of incidents is scoped to the current team.

For the moment I'm reproducing the Splunk On-Call dashboard behavior, but it might be more relevant to display only the incidents that require action.

Error cases

Here is the list of the different possible error cases.

Configuration

To perform certain actions (create, acknowledge, or resolve an incident), you need to provide the username of the user performing the action.
The user supplied must be a valid Splunk On-Call user and a member of your organization.

In app-config.yaml:

splunkoncall:
  username: <SPLUNK_ON_CALL_USERNAME>

The information displayed for each entity is based on the team name.
If you want to use this plugin for an entity, you need to add the following annotation to it:

annotations:
  splunk-on-call.com/team: <SPLUNK_ON_CALL_TEAM_NAME>
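
As a rough illustration of how a component could read this annotation, here is a minimal TypeScript sketch. The `EntityLike` type and the `getTeamName` helper are hypothetical names made up for this example, not the plugin's actual code; only the annotation key comes from the description above.

```typescript
// Hypothetical, simplified entity shape (assumption): the real Backstage
// Entity type carries many more fields than this.
type EntityLike = {
  metadata: { name: string; annotations?: Record<string, string> };
};

// Annotation key as given in this PR description.
const SPLUNK_ON_CALL_TEAM = 'splunk-on-call.com/team';

// Hypothetical helper: returns the trimmed team name, or undefined when the
// entity has no (or an empty) team annotation.
function getTeamName(entity: EntityLike): string | undefined {
  const annotations = entity.metadata.annotations;
  const value = annotations ? annotations[SPLUNK_ON_CALL_TEAM] : undefined;
  if (value === undefined) {
    return undefined;
  }
  const trimmed = value.trim();
  return trimmed === '' ? undefined : trimmed;
}

const annotatedEntity: EntityLike = {
  metadata: {
    name: 'example-service',
    annotations: { [SPLUNK_ON_CALL_TEAM]: 'example-team' },
  },
};
const plainEntity: EntityLike = { metadata: { name: 'no-team-service' } };
```

With these sample values, `getTeamName(annotatedEntity)` yields the team name and `getTeamName(plainEntity)` yields `undefined`, which a UI could use to decide whether to render the plugin card at all.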

Here is the list of the different API routes used:

  • getIncidents: Fetches a list of incidents.
  • getOnCallUsers: Fetches the list of users in an escalation policy.
  • triggerAlarm: Triggers an incident for specific users and/or specific teams.
  • acknowledgeIncident: Acknowledges an incident.
  • resolveIncident: Resolves an incident.
  • getUsers: Gets a list of users for your organization.
  • getTeams: Gets a list of teams for your organization.
  • getEscalationPolicies: Gets a list of escalation policies for your organization.

As the Splunk On-Call API doesn't have a route to get a specific user or team, we must use the getUsers and getTeams routes and build a hashmap from the results. For example, getOnCallUsers returns only the usernames of the users, and we need to look up each user's email/firstName/lastName.
(I think this is also how the Splunk On-Call dashboard does it: when we display the list of incidents and switch between teams, no API call is made again.)
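
The lookup described above can be sketched roughly as follows. This is an illustrative TypeScript sketch under stated assumptions, not the plugin's actual implementation: the `User` shape is assumed from the field names mentioned above, and `indexByUsername` and `resolveOnCallUsers` are hypothetical helper names.

```typescript
// Assumed user shape, based only on the fields named in the description.
type User = {
  username: string;
  email: string;
  firstName: string;
  lastName: string;
};

// Hypothetical helper: index the full user list (e.g. the result of getUsers)
// by username so that individual users can be found without further API calls.
function indexByUsername(users: User[]): Map<string, User> {
  const byUsername = new Map<string, User>();
  for (const user of users) {
    byUsername.set(user.username, user);
  }
  return byUsername;
}

// Hypothetical helper: turn the usernames returned for the on-call rotation
// into full user records, skipping usernames that are not in the index.
function resolveOnCallUsers(
  usernames: string[],
  byUsername: Map<string, User>,
): User[] {
  const resolved: User[] = [];
  for (const username of usernames) {
    const user = byUsername.get(username);
    if (user !== undefined) {
      resolved.push(user);
    }
  }
  return resolved;
}

// Made-up sample data for illustration only.
const allUsers: User[] = [
  { username: 'jdoe', email: 'jdoe@example.com', firstName: 'Jane', lastName: 'Doe' },
  { username: 'asmith', email: 'asmith@example.com', firstName: 'Alex', lastName: 'Smith' },
];
const userIndex = indexByUsername(allUsers);
const onCall = resolveOnCallUsers(['asmith', 'unknown-user'], userIndex);
```

Because the index is built once from the full list, switching between teams only needs new lookups in the map rather than another request, which matches the behavior described above.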

Here is the final workflow.

I think there is room for improvement but I wanted to get your opinion on the current result.

✔️ Checklist

  • [X] A changeset describing the change and affected packages.
  • [X] Added or updated documentation
  • [X] Tests for new functionality and regression tests for bug fixes
  • [X] Screenshots attached (for UI changes)
