EVM Staking Update Postmortem

A personal note from CEO Sam Harrison:

As far back as the summer of 2022, EVM Staking has been discussed within the Core Developer Group of Findora. Since its inception as a proof-of-stake chain, staking had been performed on the UTXO layer of the project. While the “account-less” paradigm of the UTXO layer gave Findora advantages for running zero-knowledge proofs and added opportunities for confidentiality and privacy, as a “computational layer,” UTXO left much to be desired.


The EVM layer was added to increase interoperability, to add smart contract functionality, and to allow the wider developer community to use existing EVM toolsets to build their dApps and other projects. During this time, staking remained on the UTXO layer, and the only way to stake was through the native Findora wallet.


We needed to shift the security of the chain to the EVM layer and simplify the staking process for the Findora community. This change came with two added benefits. First, moving staking to the EVM makes third-party integrations for staking services easier. Second, EVM staking allows us to integrate directly into compatible wallets, thereby reducing the friction of staking. Less friction should lead to more tokens being staked, which will both add security to the network and attract more delegators to the ecosystem of validators.


However, such a fundamental transition is not a simple feat.


In the process of that transition, and over the past two months, the Findora community has been saddled with a degraded wallet and staking experience. The code to shift staking to the EVM layer ended up impacting several of the connections between the staking service and wallets.


Everyone here at Discreet Labs (myself included) had high hopes that the initial bugs would have straightforward solutions that could be implemented in a timely manner. This was not the case. Fortunately, we have corrected these issues and are very excited to roll out EVM staking. You can find most of the issues and their resolutions in the postmortem below.


Failure, they say, is the best teacher. We know that our quality assurance process in the past has not been adequate and that interdepartmental communication between our globally distributed teams needs to be improved. Remedies for these shortcomings have been implemented, and we plan to include third-party QA testing in future rollouts.


I want to thank our community for their support and passion, including those who voiced their concerns about the matter in our social media channels. My thanks especially go out to Patrick from EasyNode for releasing several critical updates to the validator toolbox for the community. You are a rock star.


Finally, I want to reiterate that this was a hugely complicated task, and while we did encounter many issues, I want to thank the engineers and developers in the Core Developer Group for bringing us across the finish line. Thank you, very much, for getting it done. We are now moving forward with a tight focus on the work ahead.


The first priority is ecosystem engagement. I am happy to announce the creation of an Ecosystem Advisory Group. At my invitation, several leaders of projects running on Findora have agreed to meet with me on a regular basis. The structure and size of this group remain undefined, but the goal is to hear from these leaders regularly, to get their feedback on upcoming technical advancements, and to provide a direct channel for infrastructure requests.


The Ecosystem Advisory Group will also, hopefully, serve as a funnel for quality projects to be introduced to our business development team. This team has not been idle, and now that the EVM Staking bugs have been resolved, I anticipate several announcements in the coming weeks.


I will miss those who parted ways with us. I respect them and their choices. However, I do believe that they are making a mistake. The future of this project is very bright. The coming weeks will see the curtain pulled back on some initiatives that we have been working on and they will change FRA for the better.


With Sincerity,

Sam


Executive Summary

On September 25th, Discreet Labs engineers began rolling out a major upgrade that would move staking from the UTXO layer to smart contracts on the EVM ledger. The upgrade, unfortunately, introduced several issues impacting users and validators, leading to a degraded user experience.


The core of these issues centered around full node performance problems, which cascaded into display issues in both desktop and mobile wallets, manifesting as incorrect or null staking data. Simultaneously, users experienced difficulties with reward claiming due to these node issues, affecting both delegators and validators.


Validators also faced other issues, such as an inability to remove themselves from jailed status, part of a new auto-quarantining mechanism introduced with the EVM staking code. Findora validators were also unable to freely claim their commission rewards. Collectively, these problems not only reduced the network’s reliability but also damaged user trust.


However, with the resolution of these technical issues, the new EVM Staking portal is ready for production, providing a much better and easier experience for those wishing to stake on the Findora Network. Furthermore, staking is managed by smart contracts, simplifying the deployment of future consensus updates by requiring fewer mandatory validator upgrades.

Issues Breakdown

This postmortem will track the four main issues that arose from the deployment of the EVM Staking upgrade: 


  1. Full node reliability, which led to wallet display issues
  2. The inability of validators to unjail themselves
  3. Issues delegators faced in claiming rewards
  4. The inability of validators to claim rewards

The Full Node Database Lock Issue

What Went Wrong

The full node issues originated from performance bottlenecks. These nodes, crucial for data maintenance, struggled with inefficient resource utilization, which caused database locks. The database locks left full nodes unresponsive to API queries as new HTTP/API requests came in. The wallets, which rely on full nodes for real-time data, began showing incorrect or null values for staking-related data in both the desktop and mobile apps because the nodes could not process requests promptly and accurately.


The engineering team implemented a temporary fix of using Web3 RPC nodes, which are centralized but more efficient. The team then restructured the databases and added redundancy to the full node automated systems to prevent the database lock issue from slowing the network.
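
The fallback idea behind the temporary fix can be sketched roughly as follows. This is a minimal illustration, not the code the wallets actually run: the function `first_valid_value` and the two stand-in data sources are hypothetical names invented for this example, and a real client would make network calls with timeouts instead.

```python
from typing import Callable, Optional, Sequence


def first_valid_value(sources: Sequence[Callable[[], Optional[int]]]) -> Optional[int]:
    """Return the first non-null answer from an ordered list of data sources."""
    for source in sources:
        try:
            value = source()
        except (TimeoutError, ConnectionError):
            # Treat a timed-out or unreachable source like a locked node
            # and move on to the next one.
            continue
        if value is not None:
            return value
    # Every source failed: report "unknown" rather than a misleading zero.
    return None


def locked_full_node() -> Optional[int]:
    """Hypothetical stand-in for a full node whose database is locked."""
    raise TimeoutError("database locked")


def rpc_node() -> Optional[int]:
    """Hypothetical stand-in for a Web3 RPC node returning a staked amount."""
    return 1500


staked = first_valid_value([locked_full_node, rpc_node])
```

Ordering the sources this way lets a client keep preferring full nodes once they recover, while still showing a real value during an outage.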

Timeline of the Issue

  • September 28: Initial identification of performance issues with a full node. Concurrently, the Findora wallet started displaying null values for staked, unstaking, and claimable amounts.
  • October 3-6: Continued efforts to resolve full node and wallet display issues. Updates on API methods to correct display issues.
  • October 19: Specific issues with the iOS wallet app were noted.
  • October 24-25: Introduction and implementation of Web3 RPC Nodes.


Fixes Implemented

To address these challenges, several measures were taken:

  1. Improving Full Node Performance: The engineering team worked on isolating and resolving the full node performance issue by restructuring the database.
  2. Automated Monitoring Systems: An automated monitoring system now tracks node performance so that a node can be restarted much more quickly if a database lock occurs.
  3. Updating Wallet APIs: Corrections were made in the API methods used by the wallet and block explorer to ensure accurate data retrieval and display.
  4. Ongoing Wallet App Updates: Regular updates were released for both desktop and mobile wallets to address the display issues and improve overall functionality. Additional updates are in the works for mobile wallet stability.
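
As a rough illustration of the automated monitoring described in item 2, the sketch below restarts a node only after several consecutive failed health probes. The class name `NodeWatchdog`, the `max_failures` default, and the probe sequence are all assumptions made for this example; the postmortem does not describe how the real monitoring system decides when to restart.

```python
from dataclasses import dataclass


@dataclass
class NodeWatchdog:
    """Hypothetical watchdog: request a restart after N failed probes in a row."""

    max_failures: int = 3
    consecutive_failures: int = 0

    def record_probe(self, responded: bool) -> bool:
        """Record one health probe; return True when a restart should be triggered."""
        if responded:
            # Any successful probe clears the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            # Reset the streak so the restarted node gets a fresh count.
            self.consecutive_failures = 0
            return True
        return False


watchdog = NodeWatchdog(max_failures=3)
probes = [False, False, True, False, False, False]
decisions = [watchdog.record_probe(p) for p in probes]
```

Requiring a streak of failures, rather than reacting to a single slow response, avoids restarting a node that is merely busy.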


Improper Jailing of Validators

What Went Wrong

Unfortunately, at the launch of EVM Staking, there was no mechanism for validators to manually remove themselves from jail, a quarantine status imposed when a validator's performance drops. This, combined with how easy it was to miss blocks during the full node database locks, resulted in many validators being jailed unfairly with no way to remedy the situation.


Fix Implemented

The threshold at which a validator was jailed was temporarily lowered to give validators more margin for error. A permanent fix came from Patrick of EasyNode, who updated the validator toolbox to give validators a way to unjail themselves. Also, as the network stabilized, fewer validators were jailed in the first place.
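
To show why lowering a threshold can give validators more margin, here is a small sketch that assumes jailing is based on a minimum fraction of blocks signed within a recent window. That rule, the function `is_jailed`, and the numbers used are assumptions for illustration only; the postmortem does not state the actual jailing formula.

```python
def is_jailed(signed_blocks: int, window: int, min_signed_ratio: float) -> bool:
    """Assumed rule: jail a validator whose signed share falls below the minimum."""
    if window <= 0:
        raise ValueError("window must be positive")
    return signed_blocks / window < min_signed_ratio


# A validator that signed 80 of the last 100 blocks during the node outages.
jailed_at_original_threshold = is_jailed(80, 100, min_signed_ratio=0.9)
jailed_at_lowered_threshold = is_jailed(80, 100, min_signed_ratio=0.5)
```

Under this assumed rule, the same validator is jailed at the stricter setting and left active at the lowered one.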

Delegators’ Inability to Claim Tokens

What Went Wrong

Immediately after the update, it appeared that delegators could not claim rewards because rewards and unbonding amounts were not properly displayed in the wallet. This was a cosmetic issue caused by the failure of the full nodes and Web3 RPC endpoints. Delegators were eventually able to see their reward balances and claim them.


Fix Implemented

Once the full node issue was fixed, delegators were able to use their wallets to claim rewards.

Validators’ Inability to Claim Commissions

What Went Wrong

A separate issue arose for validators in early November: validators were only able to claim their commission after delegators claimed their rewards. As a result, validators were earning far less than they should have based on their commission rates.
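
A small worked example makes the shortfall concrete. It assumes a simplified model in which commission is a flat percentage of each delegator's rewards and is only released once that delegator claims; the delegator names, reward amounts, 10% rate, and the function `commission_received` are all hypothetical, and the real staking contracts may account for commission differently.

```python
from decimal import Decimal
from typing import Dict, Set


def commission_received(
    rewards: Dict[str, Decimal], claimed: Set[str], rate: Decimal
) -> Decimal:
    """Commission released to the validator, counting only delegators who claimed."""
    return sum(
        (amount * rate for delegator, amount in rewards.items() if delegator in claimed),
        Decimal("0"),
    )


# Hypothetical rewards accrued by three delegators of one validator.
rewards = {"alice": Decimal("100"), "bob": Decimal("300"), "carol": Decimal("600")}
rate = Decimal("0.10")

# What the validator is owed if commission does not depend on claims.
owed = commission_received(rewards, set(rewards), rate)
# What the validator can actually claim when only one delegator has claimed.
available = commission_received(rewards, {"alice"}, rate)
```

In this example the validator is owed 100 tokens of commission but can only access 10 until the other delegators claim.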


Fix Implemented

The Discreet Labs team issued an optional upgrade, v0.4.4, which allows validators to manually claim rewards. Validators will be able to download the upgrade and claim rewards using the Validator Toolbox by December 9th.

Collective Impact

The collective impact of these issues meant that users and validators struggled to use the desktop or mobile version of their wallets. Not only was it hard to stake, but it was very difficult to claim rewards, and transactions took longer. In many ways, users felt like they were not able to interact with the network. The performance of the wallets has now been mostly restored to its pre-update level, and the network’s security has been moved to the EVM layer of the chain.

Lessons Learned

This particularly disappointing upgrade from Discreet Labs has caused the team to reflect on what can be done internally to improve rollouts and prevent similar slow-moving disasters.

Thorough Internal and External QA Testing

Our chief takeaway is that more quality assurance testing is needed. Not only do products need to be thoroughly tested internally before being released, but they need to go through external quality assurance testing as well. We’ve already started to change this, having members of the community test v1 of the new EVM Staking platform before launching it. Further, the Engineering and PM teams will avoid rushed deployments regardless of deadlines; if a product or feature is not ready for mainnet, it will be delayed until QA tests show optimal results.

Better Communication with Engineering

The remote nature of the Discreet Labs team can make communication difficult. We’ve set up a number of internal channels and procedures that should facilitate better communication with project managers and engineers in the future to avoid slow feedback loops and implement quicker escalation paths.


About Findora

Findora is a Layer-1 protocol delivering zero-knowledge solutions to Web3.


Findora integrates two ledgers into a single chain: an EVM ledger for interoperability and a UTXO ledger optimized for zk operations. This dual-layer architecture lets Findora encrypt blockchain data for programmable transparency and public use. By providing new use cases, Findora’s zk tech prepares Web3 for real-world adoption.


We appreciate our developers and would love to onboard you to the Findora ecosystem. Please reach out, and join our social channels for more.


Discord | Twitter | Reddit | Telegram | YouTube | LinkedIn | Facebook | Newsletter