2️⃣ Validium And The Layer 2 Two-By-Two — Issue No. 99
Earlier this month, the team at StarkWare announced they were pushing their novel layer 2 scaling solution to the Ethereum mainnet. Link.
The project, called StarkEx, doesn't exactly conform to the accepted definitions of existing constructions, such as zkRollup, Optimistic Rollup, or Plasma. StarkWare co-founder Eli Ben-Sasson has suggested "Validium" as a general term to describe this new construction, and the name seems likely to stick after an endorsement from Vitalik Buterin. Link.
In this edition of Build Blockchain, we'll review the layer 2 design space emerging in the Ethereum ecosystem through the lens of a classic 2-by-2 matrix. We'll close by looking at Validium, which fills out this matrix, and discuss the pros and cons to this new approach.
The Layer 2 Two-By-Two
The evolution of layer 2 scaling research in the Ethereum community has been a case study in the way engineering works in the real world. Whereas promising ideas often seem to work in theory, the real richness of the tradeoff space isn't properly understood until engineers try to concretely implement those ideas. After years of research and attempts at such implementations, tradeoffs in the Ethereum layer 2 ecosystem can be boiled down to this 2-by-2 matrix, which I first saw proposed by StarkWare's Avihu Levy. Link.
On a layer one blockchain, all computation is executed on the main chain, and all data is stored there as well. Layer 2 solutions, then, can be classified by how they address the scalability of these two bottlenecks: computation and data storage. This is what's reflected in the 2-by-2 above. Let's proceed by filling in the squares of this matrix, concluding with the new Validium construction.
Plasma
The promising, theoretical idea that kicked off scaling research in the Ethereum ecosystem was Plasma. It was first proposed by Vitalik Buterin and Joseph Poon in 2017, and it lives in the bottom right corner of our matrix. Link.

At its core, the idea of Plasma is relatively straightforward. To achieve greater scalability, Plasma moves both computation and data storage off of the main blockchain and into a second layer chain. The operators of this second layer network periodically post "state commitments" to the main chain in the form of Merkle tree roots. If an operator posts an invalid state transition, a user can submit a fraud proof to a smart contract on the main chain. If fraud is shown, this contract slashes the operator's required deposit.
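The commitment scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not StarkWare's or any production Plasma implementation; the account encoding and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Compute the root of a Merkle tree over the given leaves."""
    level = [h(leaf) for leaf in leaves]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# The operator commits to the layer 2 state with a single root hash...
state = [b"alice:100", b"bob:50"]
committed_root = merkle_root(state)

# ...and a watcher who holds the full data can recompute the root and
# treat any mismatch as grounds for a fraud proof.
invalid = merkle_root([b"alice:0", b"bob:150"])  # invalid transition
assert invalid != committed_root                  # fraud detected
```

Note that detecting fraud here requires recomputing the tree from the full state, which is exactly why data availability matters in the next section.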
While this idea is simple and elegant, the devil is (as usual) in the details. One such critical detail is data availability. Operators can be punished via fraud proofs if they post an invalid commitment to the main chain. For a user to submit a fraud proof, though, they need the data that went into that state commitment in the first place. What if the plasma operators simply refuse to publish this data? This would enable them to advance the network to invalid states without being held accountable.
Many workarounds to this problem were proposed, such as long withdrawal delays, which would enable a "mass exit" from the Plasma chain in the case of bad behavior. After years of research, though, a workable implementation of this mechanism has yet to materialize. This led to the exploration of other quadrants of our 2-by-2 matrix.
zkRollup
The next quadrant to receive significant attention is catercorner to Plasma. The construction that lives there is known as zkRollup, and interest in it was kicked off by an ETH Research post by none other than Vitalik Buterin. Link.

zkRollup solves the data availability problem by, well... not really trying to solve it. Rollup constructions simply post the data for all layer 2 transactions on the base chain as arguments to a mainnet smart contract. This makes the data available to anyone observing the blockchain as so-called "calldata," and limits the scalability benefits conferred by zkRollups to the computational axis.
Unlike Plasma, which relies on an incentivization scheme enforced by fraud proofs to ensure computation is correct, zkRollups use zero knowledge proofs validated on the main chain to ensure invalid state transitions can never occur. All the computation is thus "rolled up" into the proof, and the need to trust or validate the operators is removed.
It should also be noted that zkRollups do confer some relatively minor scalability benefits on the data storage axis as well. For one, data sent to the contract can be compacted, and calldata doesn't need to be held as active state by full nodes, reducing the burden on anyone who chooses to run one. zkRollups also remove the need to include signature data on the chain, because transaction validity is enforced by the zero knowledge proof itself.
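To make the data savings concrete, here is a hypothetical compact encoding of a rollup transfer. The field widths and size comparisons below are assumptions for illustration, not the wire format of any actual zkRollup.

```python
import struct

# A hypothetical compact rollup transfer: 4-byte sender index, 4-byte
# recipient index, 8-byte amount -- 16 bytes total. Compare with a normal
# signed transfer carrying two 20-byte addresses, a 32-byte amount, and a
# ~65-byte signature: over 100 bytes. The signature can be dropped because
# the zero knowledge proof already attests that every transfer was valid.
def encode_transfer(from_idx: int, to_idx: int, amount: int) -> bytes:
    return struct.pack(">IIQ", from_idx, to_idx, amount)

batch = b"".join(encode_transfer(1, 2, 10**18) for _ in range(100))
assert len(batch) == 1600  # 100 transfers fit in 1.6 kB of calldata
```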
The primary drawback of zkRollup is the same thing that makes it powerful: it relies on cutting-edge cryptography in the form of zero knowledge proofs. In addition to being very difficult to implement securely, the zero knowledge proofs currently available are prohibitively inefficient when generalized. Existing zkRollup implementations are thus forced to focus on application-specific use cases, like Loopring's decentralized layer 2 exchange, for example. Link.
Optimistic Rollups
The desire for generalized smart contract support in layer 2 led researchers to explore constructions similar to zkRollups, but without relying on zero knowledge proofs. One obvious option here is a return to interactive fraud proofs, which gives us so-called Optimistic Rollups.

Since we've already discussed Plasma and zkRollups, the idea behind Optimistic Rollups is straightforward. They retain the use of calldata to make all layer 2 data available on the base chain, but leverage fraud proofs, like those proposed for Plasma, to punish operators who attempt to advance to invalid states.
Because of these tradeoffs, Optimistic Rollups provide the smallest scalability benefits of the four quadrants. This is also what makes them interesting, though: generalized Optimistic Rollups are practical today. They don't rely on any advanced tech or large unsolved research problems. Link.
Several teams, like the folks at the aptly named "Optimism," are close to bringing this construction to mainnet. Link.
Validium
This brings us, finally, to the last blank cell of our layer 2 two-by-two. In the final remaining quadrant, we can now fill in the newly named Validium, and discuss StarkEx, the first implementation of this set of tradeoffs. Link.

Validium returns to the idea of keeping layer 2 data off-chain, unlocking much greater scalability gains than the rollup constructions can offer. Unlike Plasma, though, Validium doesn't rely on fraud proofs for validating computation, instead adopting zero knowledge proofs. Like zkRollup, this means Validium is currently limited to application-specific implementations, and StarkEx is indeed purpose-built for use by decentralized exchanges.
This tradeoff, though, comes with some notable advantages over Plasma. The main chain verification of the zero knowledge proofs makes it impossible for operators to advance invalid state transitions. This mitigates the damage that network operators can perpetrate by withholding data. It would be impossible, for example, for colluding operators to push a state which transferred a user's tokens to their own wallets. This removes the need to design "mass exit" incentive games or to include long withdrawal delays in the protocol.
As several other researchers have pointed out, usage of zero knowledge proofs is not a cure-all for data availability attacks. Operators can, for example, make valid changes to state using properly funded accounts they control, but by withholding the data about those transactions, prevent other users from submitting the Merkle proofs they need for their own withdrawals. Link.
This attack effectively freezes account balances, and opens up users to bribery attacks from the operators, who might refuse to publish the required state unless the users surrender some fraction of their funds. Link.
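The attack described above hinges on the fact that a withdrawal requires a Merkle inclusion proof, and the sibling hashes in that proof come from data only the operator may hold. A minimal sketch of such a proof check, with an assumed hash function and proof format:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    """Check a Merkle inclusion proof. `proof` is a list of
    (sibling_hash, sibling_is_right) pairs from leaf to root."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

# With two accounts, the on-chain root commits to both. Alice's withdrawal
# proof consists of Bob's leaf hash -- data she cannot produce herself if
# the operator withholds it.
alice, bob = b"alice:100", b"bob:50"
root = h(h(alice) + h(bob))
assert verify_inclusion(alice, [(h(bob), True)], root)
```

The zero knowledge proof guarantees the root is valid, but it cannot conjure up the sibling hashes a user needs, which is precisely the gap the attack exploits.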
To lessen the likelihood of such an attack, the StarkWare team has used what I would describe as an "engineering hack." I use that term quite affectionately, as someone who has deployed my fair share of engineering hacks in my career! The StarkEx product includes a federated "Data Availability Committee," whose members are required to sign data and keep it available at all times. As long as one of them is honest and operating, users should always be able to get the data they need to make withdrawals.
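The committee guarantee can be modeled as a simple quorum check: a state update is only accepted if enough committee members have attested that they hold the underlying data, after which availability survives as long as any one signer remains honest. The member names and threshold below are made up for illustration; StarkEx's actual signing rules differ in their details.

```python
# Hypothetical Data Availability Committee: a state root is accepted only
# if a quorum of known members have signed an attestation that they hold
# the off-chain data behind it.
COMMITTEE = {"member_1", "member_2", "member_3", "member_4", "member_5"}
QUORUM = 3  # illustrative threshold, not StarkEx's actual parameter

def data_attested(signers: set) -> bool:
    """Accept a state update only if enough committee members signed."""
    return len(signers & COMMITTEE) >= QUORUM

assert data_attested({"member_1", "member_2", "member_3"})
assert not data_attested({"member_1", "mallory"})  # outsiders don't count
```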
This solution isn't perfect, but it's probably acceptable for many use cases. Remember, everything is about tradeoffs. Compared to a completely trustless DEX on the base chain, a StarkEx exchange does involve slightly higher third-party risk. In exchange for that risk, a StarkEx exchange provides orders of magnitude better performance, a property that's important for serious traders. Compared to a centralized exchange, a Validium layer 2 is still far more secure and trust-minimized.
The Engineering Sausage
As I mentioned earlier, the progression of Ethereum's layer 2 scaling research is a great example of how the sausage is really made in an engineering context. While it might be tempting to take a cynical view of this seemingly slow and meandering process, I don't find it surprising at all.

The same kind of iterative drift through an initially unseen design space occurs in most research settings. As engineers are forced to grapple with the constraints reality places on their theories, the real tradeoffs required come into focus. In the case of Ethereum layer 2 scaling, this discovery process is just playing out in the public square, with a lot of scrutiny from those who follow the crypto space and had high hopes for a quick breakthrough.
To me, the layer 2 Ethereum ecosystem looks like a healthy one. Through trial and error, researchers and implementers have painstakingly mapped out the design space. Various teams are now rapidly homing in on practical solutions that balance these tradeoffs to meet real user needs. When you combine all this progress on layer 2 with the fact that Ethereum's base chain has seen sustained congestion in recent months, we may be reaching a tipping point. Over the next 12-18 months, I suspect we'll finally start seeing meaningful adoption of layer 2 solutions.