
Non-graceful Shutdown Broker

· 2 min read
Christopher Kujawa

Today I did not have much time for the chaos day, because I was writing the Gameday Summary and an incident review, taking part in incidents, etc. So enough chaos for one day :)

But I wanted to merge the PR from Peter and test how our brokers behave if they are not shut down gracefully. I did that on Wednesday (21-10-2020).

PR Merge

Before merging, I ran the new chaos experiment again against a Production M cluster. It worked quite smoothly. PR #41 is merged 🎉

Non-graceful shutdown

Currently, in our experiments we do a normal kubectl delete pod, which performs a graceful shutdown: the application has time to stop its services etc. It would be interesting to see how Zeebe handles non-graceful shutdowns. To achieve that, we can use the option --grace-period=0. For more information you can read for example this
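To make the difference between the two restart variants concrete, here is a minimal sketch. The helper name `delete_cmd` and the pod name `zeebe-0` are assumptions made for illustration; this is not the script used in zeebe-chaos. The helper only prints the command instead of running it, so it can be tried without a cluster. Depending on the kubectl version, `--grace-period=0` may only take full effect together with `--force`; the post does not cover that, so the sketch sticks to the option named above.

```shell
#!/usr/bin/env bash
# Sketch only: delete_cmd is a hypothetical helper, zeebe-0 a placeholder pod name.

# Print the kubectl command for restarting a broker pod.
#   $1: pod name
#   $2: "graceful" (default) or "non-graceful"
delete_cmd() (
  pod=$1
  mode=${2:-graceful}
  if [ "$mode" = "non-graceful" ]; then
    # --grace-period=0 is the option named in the post.
    echo "kubectl delete pod $pod --grace-period=0"
  else
    # Plain delete: the pod gets time to stop its services.
    echo "kubectl delete pod $pod"
  fi
)

delete_cmd zeebe-0
delete_cmd zeebe-0 non-graceful
```

Printing the command rather than executing it keeps the sketch side-effect free; piping the output to a shell would run it against the currently selected namespace.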

I added additional experiments to our normal follower and leader restart experiments, so that we now have both graceful and non-graceful restarts. Both seem to work without any issues. I was also able to fix some bash script errors with the help of shellcheck. Related issue: https://github.com/zeebe-io/zeebe-chaos/issues/42.

Example output:

(chaostk) [zell kubernetes/ ns:f45d4dee-f73a-4733-9cd4-a4aa8b022376-zeebe]$ chaos run leader-terminate/experiment.json 
[2020-10-21 15:57:23 INFO] Validating the experiment's syntax
[2020-10-21 15:57:23 INFO] Experiment looks valid
[2020-10-21 15:57:23 INFO] Running experiment: Zeebe Leader restart non-graceful experiment
[2020-10-21 15:57:23 INFO] Steady-state strategy: default
[2020-10-21 15:57:23 INFO] Rollbacks strategy: default
[2020-10-21 15:57:23 INFO] Steady state hypothesis: Zeebe is alive
[2020-10-21 15:57:23 INFO] Probe: All pods should be ready
[2020-10-21 15:57:23 INFO] Probe: Should be able to create workflow instances on partition 3
[2020-10-21 15:57:27 INFO] Steady state hypothesis is met!
[2020-10-21 15:57:27 INFO] Playing your experiment's method now...
[2020-10-21 15:57:27 INFO] Action: Terminate leader of partition 3 non-gracefully
[2020-10-21 15:57:33 INFO] Steady state hypothesis: Zeebe is alive
[2020-10-21 15:57:33 INFO] Probe: All pods should be ready
[2020-10-21 15:58:28 INFO] Probe: Should be able to create workflow instances on partition 3
[2020-10-21 15:58:32 INFO] Steady state hypothesis is met!
[2020-10-21 15:58:32 INFO] Let's rollback...
[2020-10-21 15:58:32 INFO] No declared rollbacks, let's move on.
[2020-10-21 15:58:32 INFO] Experiment ended with status: completed
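In the output above, the probe "All pods should be ready" is logged at 15:57:33 after the termination, and the next probe is logged at 15:58:28. Assuming each probe line is written when the probe starts, this suggests the pods needed roughly 55 seconds to become ready again. A small sketch of that arithmetic follows; `to_seconds` and `elapsed` are hypothetical helper names, not part of zeebe-chaos or the Chaos Toolkit.

```shell
#!/usr/bin/env bash
# Sketch only: to_seconds and elapsed are hypothetical helpers for illustration.

# Convert an HH:MM:SS timestamp into seconds since midnight.
to_seconds() (
  h=${1%%:*}        # text before the first ":"
  rest=${1#*:}      # text after the first ":"
  m=${rest%%:*}
  s=${rest#*:}
  # Drop one leading zero so values such as "08" are not read as octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
)

# Print the seconds between two timestamps on the same day.
#   $1: start, $2: end
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Timestamps taken from the two probe lines after the termination.
elapsed 15:57:33 15:58:28   # prints 55
```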

Related commits:

Participants

  • @zelldon