@wavejumper
Last active June 28, 2021 01:12

Introducing Temporary Policies

New in kPow 79: temporary policies, along with a whole suite of new admin features!

Temporary policies give admins the ability to assign access control policies for a fixed duration. A common use case is granting a user TOPIC_INSPECT access to read data from a topic for an hour while they resolve an issue in a production environment.

This blog post introduces temporary policies through an all-too-common real-world scenario.

Scenario

You wake up one morning to a dreaded sight: a poison message has taken down one of your services! Your team decides the simplest solution is to skip the message by incrementing your consumer group's offset for the topic.

Now here's the problem: access to production is limited, and for such a simple action (incrementing the offset) a team member generally has to jump through the hoops of configuring the VPN, connecting to the jumpbox, and making sure they execute the right combination of bash commands against the Kafka cluster.

These operations always feel unnecessarily time-consuming, brittle, and frustrating, especially in a time-critical moment when you need to restore a production service. On top of that, the jumpbox generally has full access to the Kafka cluster, and there is no audit log recording the actions taken.

Temporary policies, combined with kPow's existing role-based access control (RBAC) and powerful mutation actions, aim to improve this experience and give teams the confidence to make changes easily in a secured environment like production when things go wrong.

Configuring Role-Based Access Control

In this example, our identity provider supplies two roles: devs and owners.

We will make anyone with the owners role a kPow admin and give them GROUP_EDIT access to the production cluster. Because we don't configure any access policies for the devs role, it is implicitly denied.

Our example RBAC YAML file might look something like this:

admin_roles:
  - "owners"

authorized_roles:
  - "owners"
  - "devs"

policies:
  -
    actions:
      - GROUP_EDIT
    effect: Allow
    resource:
      - "*"
    role: "owners"
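To make the implicit-deny behavior concrete, here is a minimal Python sketch of how a policy list like the one above could be evaluated. It is not kPow's implementation: the Policy class and is_allowed function are hypothetical, resources are simplified to plain strings, and the rule that an explicit Deny overrides any Allow is an assumption borrowed from common policy engines.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Policy:
    role: str
    actions: frozenset[str]
    effect: str                  # "Allow" or "Deny"
    resources: tuple[str, ...]   # "*" matches every resource


def is_allowed(user_roles: Iterable[str], action: str, resource: str,
               policies: Iterable[Policy]) -> bool:
    """Allow only if some matching policy allows the action and none denies it."""
    roles = set(user_roles)
    allowed = False
    for policy in policies:
        if policy.role not in roles or action not in policy.actions:
            continue  # policy does not apply to this user or action
        if "*" not in policy.resources and resource not in policy.resources:
            continue  # policy does not cover this resource
        if policy.effect == "Deny":
            return False  # assumption: an explicit Deny overrides any Allow
        if policy.effect == "Allow":
            allowed = True
    return allowed  # no matching Allow means the request is implicitly denied


# Mirrors the YAML above: only the owners role is granted GROUP_EDIT.
POLICIES = [
    Policy(role="owners", actions=frozenset({"GROUP_EDIT"}),
           effect="Allow", resources=("*",)),
]

print(is_allowed(["owners"], "GROUP_EDIT", "production", POLICIES))  # True
print(is_allowed(["devs"], "GROUP_EDIT", "production", POLICIES))    # False
```

The devs role is authorized to log in, but with no policy naming it, every action falls through to the implicit deny.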

The poison pill

Today is the unfortunate day when your team has to make a change to the production cluster.

Your team lead has been briefed on the plan and decides to temporarily grant the devs role Allow access for GROUP_EDIT on the cluster.

This is done through the Temporary Policies section of kPow's settings:

[Screenshot: creating a temporary policy]

Once the policy is created, every team member receives a Slack notification about it:

[Screenshot: Slack notification]
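A temporary policy can be thought of as an ordinary policy with an expiry time. The sketch below is hypothetical rather than kPow's actual code: it models the team lead's grant as a devs Allow for GROUP_EDIT that expires after one hour (the duration and timestamps are assumptions), and only counts policies that are still active when a permission is checked. As before, "explicit Deny wins" is an assumed rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Policy:
    role: str
    action: str
    effect: str                            # "Allow" or "Deny"
    expires_at: Optional[datetime] = None  # None marks a permanent policy

    def is_active(self, now: datetime) -> bool:
        # A temporary policy stops applying once its expiry time is reached.
        return self.expires_at is None or now < self.expires_at


def is_allowed(roles: Iterable[str], action: str,
               policies: Iterable[Policy], now: datetime) -> bool:
    role_set = set(roles)
    matching = [p for p in policies
                if p.is_active(now) and p.role in role_set and p.action == action]
    if any(p.effect == "Deny" for p in matching):
        return False  # assumption: an explicit Deny wins
    return any(p.effect == "Allow" for p in matching)  # otherwise implicit deny


# Arbitrary example time at which the team lead creates the grant.
granted_at = datetime(2021, 1, 15, 9, 0, tzinfo=timezone.utc)
policies = [
    Policy("owners", "GROUP_EDIT", "Allow"),
    # Hypothetical one-hour temporary grant for the devs role.
    Policy("devs", "GROUP_EDIT", "Allow", expires_at=granted_at + timedelta(hours=1)),
]

print(is_allowed(["devs"], "GROUP_EDIT", policies, granted_at + timedelta(minutes=30)))  # True
print(is_allowed(["devs"], "GROUP_EDIT", policies, granted_at + timedelta(hours=2)))     # False
```

Evaluating expiry at check time, rather than deleting the policy with a background job, means access lapses on schedule even if a cleanup task is delayed.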

Incrementing the offset

Next, a member of your team is tasked with incrementing the consumer group's offset for the problematic topic.

The developer checks the application logs and finds that the poison message is in partition 3 of topic tx_trade1. The failing consumer group is named trade_b2.

The developer then opens kPow, navigates to the "Workflows" tab, and selects the consumer group.

From the consumer group view, the developer selects the partition and clicks "Skip Offset".

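This schedules the mutation rather than applying it immediately. Kafka only allows a consumer group's committed offsets to be changed while the group has no active members, so the offset will be incremented once someone on the team scales down the trade_b2 service.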

[Screenshot: Skip Offset]
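The sketch below models what skipping an offset does to the group's committed offset, using hypothetical in-memory ConsumerGroup and SkipOffset classes rather than a real Kafka client. In Kafka, a committed offset points at the next record the group will read, so skipping the poison record means advancing the committed offset for tx_trade1 partition 3 by one. The mutation is only applied once the group has no active members, mirroring the scale-down step described above; the starting offset 1042 and member count are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConsumerGroup:
    name: str
    active_members: int
    # Committed offset per (topic, partition). In Kafka this is the offset of
    # the next record the group will read, so a stuck group sits on the poison record.
    committed: dict[tuple[str, int], int] = field(default_factory=dict)


@dataclass
class SkipOffset:
    """Hypothetical scheduled mutation that advances one partition's offset by one."""
    topic: str
    partition: int
    applied: bool = False

    def try_apply(self, group: ConsumerGroup) -> bool:
        # Kafka only accepts offset changes for a group with no active members,
        # so the mutation waits until the service has been scaled down.
        if self.applied or group.active_members > 0:
            return False
        group.committed[(self.topic, self.partition)] += 1  # step past the poison record
        self.applied = True
        return True


group = ConsumerGroup("trade_b2", active_members=3,
                      committed={("tx_trade1", 3): 1042})  # 1042 is a made-up offset
skip = SkipOffset("tx_trade1", 3)

print(skip.try_apply(group))                # False: trade_b2 is still running
group.active_members = 0                    # the team scales the service down
print(skip.try_apply(group))                # True
print(group.committed[("tx_trade1", 3)])    # 1043
```

Tracking the applied flag keeps the mutation idempotent: a retry after the scale-down cannot accidentally skip a second record.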

Post-Mortem

After the production incident, you can use kPow to aid your incident post-mortem.

Every action taken was persisted to kPow's audit log topic for data governance.

You can view a full recorded history of the actions taken to restore the production service.

[Screenshot: Audit Log]

Inspecting the audit log message reveals the offset that was skipped.

[Screenshot: Audit Log message detail]
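When writing up the post-mortem, it can help to pull the relevant audit events into a timeline. The sketch below assumes the audit records have already been read from the audit log topic into Python dicts; the field names, action names, users, and values are all hypothetical and do not reflect kPow's actual audit log schema. It keeps only the events inside the incident window and sorts them oldest first.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical audit records. Field and action names are illustrative only
# and do NOT reflect kPow's real audit log format.
RECORDS = [
    {"ts": "2021-01-15T09:20:00+00:00", "user": "dev@example.com",
     "action": "SKIP_OFFSET", "group": "trade_b2",
     "topic": "tx_trade1", "partition": 3, "offset": 1042},
    {"ts": "2021-01-15T09:05:00+00:00", "user": "lead@example.com",
     "action": "CREATE_TEMPORARY_POLICY", "role": "devs",
     "policy_action": "GROUP_EDIT"},
    {"ts": "2021-01-14T14:00:00+00:00", "user": "ops@example.com",
     "action": "CREATE_TOPIC"},
]


def incident_timeline(records: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Return the records whose timestamp falls in [start, end], oldest first."""
    def ts(record: dict) -> datetime:
        return datetime.fromisoformat(record["ts"])

    return sorted((r for r in records if start <= ts(r) <= end), key=ts)


start = datetime.fromisoformat("2021-01-15T09:00:00+00:00")
end = datetime.fromisoformat("2021-01-15T10:00:00+00:00")
for event in incident_timeline(RECORDS, start, end):
    print(event["ts"], event["user"], event["action"])
```

With these sample records, the timeline shows the temporary policy being created before the offset was skipped, and the skip record carries the partition and offset that were affected.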

You can also use kPow's Data Inspect functionality to view the poison message and investigate why it took down the consumer group.

[Screenshot: Data Inspect]

