2 posts tagged with "resiliency"

· 7 min read
Christopher Kujawa
Nicolas Pepin-Perreault

In today's chaos day we experimented with job push resiliency.

We ran the following experiments today:

  1. Job streams should be resilient to gateway restarts/crashes
  2. Job streams should be resilient to leadership changes/leader restarts
  3. Job streams should be resilient to cluster restarts

TL;DR; All experiments succeeded and demonstrated resiliency even across component restarts. 🚀

· 8 min read
Christopher Kujawa

In today's chaos day, we wanted to experiment with the gateway and resiliency of workers.

In recent weeks, we have seen some issues in our benchmarks when gateways were restarted (see zeebe#11975).

We did a similar experiment in the past; today we want to focus on self-managed setups (benchmarks with our Helm charts). Ideally, we can automate this soon as well.

Today Nicolas joined me on the chaos day 🎉

TL;DR; We were able to show that the workers (clients) can reconnect after a gateway is shut down ✅ Furthermore, we discovered a potential performance issue under lower load, which impacts process execution latency (zeebe#12311).