🎉 Happy to announce that we are broadening the scope of our Chaos days, to look holistically at the whole Camunda Platform, starting today.
In past Chaos days, we mostly concentrated on Zeebe performance and stability.
Today, we will look at the Operate import performance and how Zeebe processing throughput might affect (or not?) the throughput and latency of the Operate import. Is it decoupled as we thought?
The import time is an important metric: it represents the time until data from Zeebe processing is
visible to the user (excluding Elasticsearch's indexing). It is measured from when the Zeebe processor writes the record to the log until Operate reads (imports) it from Elasticsearch and converts it into its data model. We have received a lot of feedback (and experienced it ourselves) that
Operate often lags behind or is too slow, and of course we want to investigate and tackle this.
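
To make the definition of the import time concrete, here is a minimal Python sketch of how such a latency could be computed per record and summarized. It is only an illustration of the definition above, not how Operate or our benchmarks actually measure it; the `ImportedRecord` type, its field names, and the `summarize` helper are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ImportedRecord:
    """Hypothetical pair of timestamps for one record, in epoch milliseconds."""
    written_at_ms: int   # when the Zeebe processor wrote the record to the log
    imported_at_ms: int  # when Operate finished importing and converting it


def import_latency_ms(record: ImportedRecord) -> int:
    """Import time of a single record: import timestamp minus write timestamp."""
    return record.imported_at_ms - record.written_at_ms


def summarize(records: list[ImportedRecord]) -> dict:
    """Summarize import latencies so that different runs can be compared."""
    if not records:
        raise ValueError("need at least one record to summarize")
    latencies = sorted(import_latency_ms(r) for r in records)
    return {
        "count": len(latencies),
        "min_ms": latencies[0],
        "median_ms": median(latencies),
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    # Made-up sample values, only to show the calculation.
    sample = [
        ImportedRecord(written_at_ms=1000, imported_at_ms=1500),
        ImportedRecord(written_at_ms=2000, imported_at_ms=2200),
        ImportedRecord(written_at_ms=3000, imported_at_ms=3900),
    ]
    print(summarize(sample))
```

Note that the difference is only meaningful if both timestamps come from reasonably synchronized clocks, and that, as stated above, it does not separate out the time the record spends in the exporter or in Elasticsearch before Operate reads it.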
The results from this Chaos day and the related benchmarks should help us better understand how the current Operate
import performs and what affects it. This will likely become a series of posts that investigate the topic further. In general,
the data will give us guidance and comparable numbers for improving the import time in the future. See also the related GitHub issue #16912, which aims to improve it.
TL;DR; We were not able to show that Zeebe throughput doesn't affect the Operate import time. On the contrary, we have seen that a higher Zeebe throughput can affect Operate positively. Surprisingly, Operate was faster to
import when Zeebe produced more data (with a higher throughput). One possible explanation is that Operate was then less idle.