
2 posts tagged with "resiliency"


Job push resiliency

· 7 min read
Christopher Kujawa
Chaos Engineer @ Zeebe
Nicolas Pepin-Perreault
Senior Software Engineer @ Zeebe

In today's chaos day we experimented with job push resiliency.

We ran the following experiments today:

  1. Job streams should be resilient to gateway restarts/crashes
  2. Job streams should be resilient to leadership changes/leader restarts
  3. Job streams should be resilient to cluster restarts

TL;DR; All experiments succeeded and showed that job streams stay resilient even when components restart. 🚀

Gateway Termination

· 8 min read
Christopher Kujawa
Chaos Engineer @ Zeebe

In today's chaos day, we wanted to experiment with the gateway and resiliency of workers.

In recent weeks we have seen some issues in our benchmarks when gateways were restarted; see zeebe#11975.

We did a similar experiment in the past. Today we want to focus on self-managed setups (benchmarks with our Helm charts). Ideally, we can automate this soon as well.

Today Nicolas joined me on the chaos day 🎉

TL;DR; We were able to show that the workers (clients) can reconnect after a gateway is shut down ✅ Furthermore, we discovered a potential performance issue under lower load, which impacts process execution latency (zeebe#12311).