Dynamic Scaling with Data Loss
We continue our previous experiments on dynamic scaling, this time also testing whether the cluster survives data loss during the process.
One goal is to verify that we have not accidentally introduced a single point of failure into the cluster. Another is to ensure that data loss does not corrupt the cluster topology.
TL;DR: Even with data loss, the scaling completes successfully and produces the expected results. However, we found that during scaling, a single broker from the previous cluster configuration can become a single point of failure by preventing a partition from electing a leader. This is not strictly a bug, but it is something we want to improve.