Bug Fix Review & Postmortem
1. Incident Summary
Findora Mainnet stopped generating blocks on 02/08/2023 at 11:30 AM PST. Gate.io generated a specific type of transaction which triggered a bug in the UTXO code. The team at Discreet Labs ran a rollback with validators and brought Mainnet back online.
2. Incident Impact
Mainnet consensus on Findora paused for 2 hours and 5 minutes. No transactions could be processed during this time.
3. Incident Detection
The chain stopped at 11:30 AM PST, and the DevOps team received an alarm at 11:40 AM PST. The proactive monitoring solution measures the block interval over a 10-minute window, which accounts for the 10-minute gap between the halt and the alarm. Shorter windows can produce false alarms.
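The detection rule described above can be sketched as a simple timestamp comparison. This is a minimal illustration, not the team's monitoring code: the function name `chain_stalled` and the constant `STALL_WINDOW` are hypothetical, and the date assumes 02/08/2023 is read in US month/day order.

```python
from datetime import datetime, timedelta

# 10-minute counting period, taken from the text above.
STALL_WINDOW = timedelta(minutes=10)

def chain_stalled(last_block_time: datetime, now: datetime,
                  window: timedelta = STALL_WINDOW) -> bool:
    """Return True when no new block has been seen for at least `window`."""
    return now - last_block_time >= window

# Last block seen at 11:30 AM; check again at 11:35 AM and 11:40 AM.
last_block = datetime(2023, 2, 8, 11, 30)
print(chain_stalled(last_block, datetime(2023, 2, 8, 11, 35)))  # False
print(chain_stalled(last_block, datetime(2023, 2, 8, 11, 40)))  # True
```

A shorter window would raise the alarm sooner, but it would also fire whenever a block is merely slow, which is the false-alarm trade-off noted above.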
4. Response Time & Recovery
The team responded to this incident at 11:40 AM PST and immediately began contacting community leaders and validators to coordinate a workaround. Mainnet was rolled back three blocks to remove the transaction from Gate.io. No transactions other than those from Gate.io were affected by the rollback.
5. Timeline of Events
- 11:30 AM PST
Mainnet consensus halted.
- 11:40 AM PST
Proactive alarms were received. The recovery process began.
- 12:45 PM PST
The network was staged for rollback. The team began contacting validators to perform the rollback.
- 1:35 PM PST
Mainnet resumed producing blocks.
6. Root Cause
This incident was caused by two factors:
1. Findora's original design uses JSON as the transaction data format. JSON does not guarantee that the order of fields stays the same after a deserialize/serialize round trip, which makes it a poor choice for a transaction data format. When the data is long, or when some fields are long (for example, a transaction with more than one signature; there may be other cases), the order of the fields becomes unstable.
2. There is a problem with the consensus apphash calculation process. The hash of the transaction Merkle tree should be calculated before deserialization, or this hash calculation should be skipped.
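The two factors above can be illustrated with a small sketch. It is not Findora's code: the transaction shape, the field names, and the reversed field order on "node B" are assumptions chosen only to simulate an unstable field order. It shows that two nodes holding the same logical transaction get different hashes once the JSON is re-serialized, while a hash taken over the received bytes before deserialization stays the same.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def hash_before_parse(raw: bytes) -> str:
    """Hash the bytes exactly as received, before any deserialization."""
    return sha256_hex(raw)

# Hypothetical transaction bytes as received from a client.
raw_tx = b'{"body": {"to": "abc", "amount": 10}, "signatures": ["s1", "s2"]}'

def reserialize_in_order(raw: bytes) -> bytes:
    """Node A: parse, then write fields back in the order they were read."""
    return json.dumps(json.loads(raw)).encode()

def reserialize_reordered(raw: bytes) -> bytes:
    """Node B: parse, then write top-level fields in reverse order (simulated)."""
    obj = json.loads(raw)
    return json.dumps(dict(reversed(list(obj.items())))).encode()

node_a = reserialize_in_order(raw_tx)
node_b = reserialize_reordered(raw_tx)

# Both nodes hold the same logical transaction...
same_content = json.loads(node_a) == json.loads(node_b)
# ...but hashes taken after re-serialization disagree.
same_hash_after = sha256_hex(node_a) == sha256_hex(node_b)
# Each node hashing the bytes it received gets the same value.
same_hash_before = hash_before_parse(raw_tx) == hash_before_parse(bytes(raw_tx))

print(same_content, same_hash_after, same_hash_before)  # True False True
```

If a value like the re-serialized hash feeds into the apphash, nodes that disagree on field order cannot agree on the resulting state, which is consistent with the halt described in this report.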
7. Lessons Learned
Improvements can be made to the UTXO-side code to avoid this problem in the future. There are multiple options available to accomplish this, including a Findora Improvement Proposal (FIP) to constrain the Merkle tree calculation rule.
8. Corrective Action
Short-Term Solution:
Create a filter in the RPC service (endpoint) to block transactions that cause crashes.
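A filter of this kind might look like the sketch below. The actual filter criteria are not stated in this report, so everything here is hypothetical: the `signatures` field name, the `has_multiple_signatures` rule (based only on the "more than one signature" case mentioned in the root cause), and the `accept_transaction` entry point.

```python
import json
from typing import Callable, List

def has_multiple_signatures(tx: dict) -> bool:
    """Hypothetical rule: flag transactions with more than one signature."""
    sigs = tx.get("signatures")
    return isinstance(sigs, list) and len(sigs) > 1

# Rules that cause a transaction to be rejected; more can be appended.
BLOCK_RULES: List[Callable[[dict], bool]] = [has_multiple_signatures]

def accept_transaction(raw: bytes) -> bool:
    """Return False for malformed input or any transaction matching a rule."""
    try:
        tx = json.loads(raw)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    if not isinstance(tx, dict):
        return False
    return not any(rule(tx) for rule in BLOCK_RULES)

print(accept_transaction(b'{"signatures": ["s1"]}'))        # True
print(accept_transaction(b'{"signatures": ["s1", "s2"]}'))  # False
```

Rejecting at the RPC endpoint keeps such transactions out of the mempool, so validators never have to process them while the long-term fix is in progress.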
Long-Term Solution:
A code refactor of the UTXO-related logic is being composed and reviewed, and will be proposed via a Findora Improvement Proposal (FIP).
About Findora
Findora is a Layer-1 protocol delivering zero-knowledge solutions to Web3.
Findora integrates two ledgers into a single chain: an EVM ledger for interoperability and a UTXO ledger optimized for zk operations. This dual-layer architecture lets Findora encrypt blockchain data for programmable transparency and public use. By providing new use cases, Findora’s zk tech prepares Web3 for real-world adoption.
We appreciate our developers and would love to onboard you to the Findora ecosystem! Please reach out, and join our social channels for more.
Discord | Twitter | Reddit | Telegram | Youtube | LinkedIn | Facebook | Newsletter