
Not produce duplicate Keys

· 6 min read
Christopher Kujawa

Due to some incidents and critical bugs we observed in the last weeks, I wanted to spend some time understanding the issues better and experimenting with how we could detect them. One of the issues we observed was that keys were generated more than once, so they were no longer unique (#8129). I will describe this property in more depth in the next section.

TL;DR; We were able to design an experiment which helps us detect duplicated keys in the log. Further work should be done to automate this experiment and run it against newer versions.

Unique Keys

In Zeebe, each element must have a cluster-wide unique key. This is a property we rely on in several areas inside, but also outside of, Zeebe (external services). We can call it an invariant we have to guarantee.

In order to have cluster-wide unique keys, we encode the partition id in the keys. You can check this code for more details. Furthermore, only the Leader of a partition is in charge of and allowed to generate new keys. If a fail-over happens, the new Leader needs to continue generating new keys; it has to resume where the previous Leader left off.
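To make the encoding a bit more tangible, here is a small sketch (not part of the original issue) of how such a key could be decomposed, assuming the upper bits of the 64-bit key hold the partition id and the lower 51 bits hold the partition-local counter:

  # Sketch: decode a Zeebe key, assuming the partition id sits in the
  # upper bits and the lower 51 bits hold the partition-local counter.
  KEY=2251799813685257
  KEY_BITS=51
  echo "partition id:  $(( KEY >> KEY_BITS ))"
  echo "local counter: $(( KEY & ((1 << KEY_BITS) - 1) ))"

For the keys shown later in this post this yields partition 1 with counters 8, 9, 10 and so on, which is why a repeated counter on the same partition directly translates into a duplicated key.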

The last part was exactly the issue we had, see #8129. The new leader generated keys which had already been generated by the previous leader. This caused inconsistencies in our internal state, but also in external services like Operate.

Chaos Experiment

This time we take a slightly different approach, since we know the experiment will fail and we want to examine how we can detect the weakness. In order to understand how we can detect it and verify that it will not happen again, I want to run an experiment against 1.2.0. Later we should automate this experiment and run it against newer versions.

As described above, the issue was caused by a fail-over; to be more specific, the Follower had already reached the end of the log (replayed all events), so on fail-over it had nothing left to replay. In order to reproduce this situation we will run the following experiment (a scripted sketch follows the list):

  1. Deploy a simple process model, with a start and end event.
  2. Start an instance and await the result zbctl create instance "simpleProcess" --withResult --insecure
  3. Wait a short period of time
  4. Restart the current Leader k delete pod zell-chaos-zeebe-2
  5. Wait until a new Leader is chosen
  6. Start an instance and await the result zbctl create instance "simpleProcess" --withResult --insecure

After doing this we should copy the data from the new Leader and examine the log (with zdb). Alternatively, we could also verify the exporter log in Elastic.
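Copying the data out of the new Leader's pod can for example be done with kubectl cp; the pod name and the data path inside the container are assumptions and depend on the concrete deployment:

  # Copy the partition data from the new Leader pod; pod name and data path
  # inside the container are assumptions and depend on the deployment.
  kubectl cp zell-chaos-zeebe-1:/usr/local/zeebe/data ./data

Afterwards zdb can be run against the copied raft-partition directory, as shown further below.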

In general it should be clear that, when we take a look at the log, a key can appear multiple times, but it should always reference one single entity/element.

A key is normally generated when processing a command and is written to the log for the first time as part of a follow-up event. For example, when we process the ACTIVATE_ELEMENT command, we write the key of the new entity with the ELEMENT_ACTIVATED event to the log. Keys are assigned for other types of entities as well, like process instance creations, deployments etc.; there it works in a similar way.

If a follow-up event contains a key which is not occurring for the first time, then we have a problem, and this is how we want to detect it.

Expected

To follow the chaos engineering approach we define a hypothesis here, even if we know it might not hold true for 1.2.0. Later we can reuse it for newer versions, where it should apply.

We expect that, even if all records have been processed and a fail-over happens, the new leader does not generate the same keys again.

Actual

We started a benchmark with version 1.2.0, three brokers and one partition (to reduce the scope) and followed the experiment described above.

The second process instance creation returned a key (2251799813685258) which was only incremented by one compared to the first process instance (2251799813685257). At this point we already knew that our experiment had caused the issue: multiple new entities are generated during the processing of a process instance, and we awaited the result of the first process instance creation. This means more than one key had already been generated (on the old leader), which was not picked up by the new leader.

We copied the data from the new Leader and examined the log via zdb. First we printed the complete log into a separate file as JSON:

  zdb log print -p raft-partition/partitions/1 | jq > data.json

This makes it human readable and also processable by other tools, like jq. After checking the file we already found one duplicate.
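Listing every key written in a follow-up event together with the term it was written in, for instance, already gives a good overview (the jq expression assumes the structure of the excerpt below, i.e. a top-level records array with term and entries per index):

  # List key and term of all entries; assumes the data.json structure shown below.
  jq -r '.records[] | select(.entries != null) | .term as $t | .entries[] | "\(.key) (term \($t))"' data.json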

The first occurrence of the key was part of index 12 in term 1, and the duplicate was part of index 20 in term 2 (the new leader's term):

  {
    "index": 12,
    "term": 1,
    "entries": [
      {
        "partitionId": 1,
        "value": {
          "version": 1,
          "processDefinitionKey": 2251799813685249,
          "bpmnProcessId": "simpleProcess",
          "processInstanceKey": 2251799813685256,
          "parentElementInstanceKey": -1,
          "flowScopeKey": -1,
          "parentProcessInstanceKey": -1,
          "elementId": "simpleProcess",
          "bpmnElementType": "PROCESS"
        },
        "timestamp": 1636627776427,
        "position": 26,
        "valueType": "PROCESS_INSTANCE",
        "intent": "ACTIVATE_ELEMENT",
        "recordType": "COMMAND",
        "rejectionType": "NULL_VAL",
        "rejectionReason": "",
        "brokerVersion": "1.2.0",
        "sourceRecordPosition": 25,
        "key": 2251799813685256
      },
      {
        "partitionId": 1,
        "value": {
          "version": 1,
          "processDefinitionKey": 2251799813685249,
          "bpmnProcessId": "simpleProcess",
          "processInstanceKey": 2251799813685256,
          "variables": {}
        },
        "timestamp": 1636627776427,
        "position": 27,
        "valueType": "PROCESS_INSTANCE_CREATION",
        "intent": "CREATED",
        "recordType": "EVENT",
        "rejectionType": "NULL_VAL",
        "rejectionReason": "",
        "brokerVersion": "1.2.0",
        "sourceRecordPosition": 25,
        "key": 2251799813685257
      }
    ]
  },
  {
    "index": 20,
    "term": 2,
    "entries": [
      {
        "partitionId": 1,
        "value": {
          "version": 1,
          "processDefinitionKey": 2251799813685249,
          "bpmnProcessId": "simpleProcess",
          "processInstanceKey": 2251799813685257,
          "parentElementInstanceKey": -1,
          "flowScopeKey": -1,
          "parentProcessInstanceKey": -1,
          "elementId": "simpleProcess",
          "bpmnElementType": "PROCESS"
        },
        "timestamp": 1636627887305,
        "position": 46,
        "valueType": "PROCESS_INSTANCE",
        "intent": "ACTIVATE_ELEMENT",
        "recordType": "COMMAND",
        "rejectionType": "NULL_VAL",
        "rejectionReason": "",
        "brokerVersion": "1.2.0",
        "sourceRecordPosition": 45,
        "key": 2251799813685257
      },
      {
        "partitionId": 1,
        "value": {
          "version": 1,
          "processDefinitionKey": 2251799813685249,
          "bpmnProcessId": "simpleProcess",
          "processInstanceKey": 2251799813685257,
          "variables": {}
        },
        "timestamp": 1636627887305,
        "position": 47,
        "valueType": "PROCESS_INSTANCE_CREATION",
        "intent": "CREATED",
        "recordType": "EVENT",
        "rejectionType": "NULL_VAL",
        "rejectionReason": "",
        "brokerVersion": "1.2.0",
        "sourceRecordPosition": 45,
        "key": 2251799813685258
      }
    ]
  }

Since reading and checking the complete file manually is hard and error-prone, we examined the data with jq. This worked quite well; we were able to detect the duplicates with a jq expression.

jq -r '.records[] | select(.entries != null) | .entries[] | select(.intent == "ELEMENT_ACTIVATED" or .intent == "CREATED") | .key' data.json | sort | uniq -c -d | awk '{print $2}'

2251799813685257
2251799813685258
2251799813685263
2251799813685264

We can reuse this expression in further experiments and as a basis for automating them.
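A small sketch of how such a check could be automated, for example as a pipeline step: the script below reuses the zdb and jq invocations from above and fails if any duplicated key is found (the partition path and file names are the ones used in this post and may need adjustment).

  #!/bin/bash
  # Fails with exit code 1 if any key appears more than once in follow-up events.
  set -euo pipefail

  zdb log print -p raft-partition/partitions/1 | jq > data.json

  duplicates=$(jq -r '.records[] | select(.entries != null) | .entries[]
      | select(.intent == "ELEMENT_ACTIVATED" or .intent == "CREATED") | .key' data.json \
    | sort | uniq -d)

  if [ -n "$duplicates" ]; then
    echo "Found duplicated keys:"
    echo "$duplicates"
    exit 1
  fi
  echo "No duplicated keys found."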