Ethereum Prysm client incident review: 248 blocks missed over 42 epochs, validators lost about 382 ETH

【BitPush】On December 4, the Ethereum mainnet suffered a significant disruption. The Prysm client team has since published a detailed post-incident report reconstructing what happened.

That day, during the Fusaka upgrade period, Prysm beacon nodes ran into serious trouble while processing certain attestations. Node resources were quickly exhausted, validator requests backed up, and large numbers of blocks and attestations were missed. The numbers are striking: from epoch 411439 to epoch 411480, a span of 42 epochs, 248 of 1344 slots had no block, a miss rate of about 18.5%. Network participation dropped as low as 75% at one point, and validators lost approximately 382 ETH in attestation rewards.
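The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the mainnet value of 32 slots per epoch and treats the epoch range as inclusive; the function name is made up for illustration.

```python
SLOTS_PER_EPOCH = 32  # Ethereum mainnet constant


def missed_slot_rate(first_epoch, last_epoch, missed_blocks):
    """Return (epochs, slots, miss rate) for an inclusive epoch range."""
    epochs = last_epoch - first_epoch + 1
    slots = epochs * SLOTS_PER_EPOCH
    return epochs, slots, missed_blocks / slots


epochs, slots, rate = missed_slot_rate(411439, 411480, 248)
print(epochs, slots, f"{rate:.1%}")  # 42 1344 18.5%
```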

What was the root cause? Prysm received attestations, possibly from nodes that had fallen out of sync, that referenced a block root from the previous epoch. To validate these attestations, Prysm had to repeatedly replay old epoch states and run expensive epoch transitions. As concurrent requests piled up, the nodes eventually ran out of resources. Notably, the flaw was introduced by Prysm PR 15965, which had been deployed to testnets a month earlier without triggering this scenario.

The fix came in two steps. First, as a temporary stopgap, the --disable-last-epoch-target flag was enabled on v7.0.0. Then releases v7.0.1 and v7.1.0 shipped the long-term fix: attestations are verified against the head state, which avoids repeatedly replaying historical states. The issue began to ease after 04:45 UTC, and by epoch 411480 network participation had recovered to over 95%.
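To illustrate why the change of verification path matters, here is a toy model of the two approaches. It is not Prysm code: the class, method names, and cost accounting are all hypothetical, and it simply assumes that each replayed attestation costs one simulated epoch transition while verification against the head state costs none.

```python
class ToyBeaconNode:
    """Toy model only; names do not mirror Prysm's implementation."""

    def __init__(self):
        self.epoch_transitions = 0  # count of simulated expensive operations

    def verify_by_replay(self, attestation):
        # Assumed cost model: regenerate the old state and run one
        # epoch transition for every attestation that arrives.
        self.epoch_transitions += 1
        return True

    def verify_with_head_state(self, attestation):
        # Assumed cost model: the head state is already in memory,
        # so no historical state is replayed.
        return True


# 1000 hypothetical attestations that reference a previous-epoch block root
attestations = [{"target_root": "previous-epoch-root"}] * 1000

replaying = ToyBeaconNode()
for att in attestations:
    replaying.verify_by_replay(att)

using_head = ToyBeaconNode()
for att in attestations:
    using_head.verify_with_head_state(att)

print(replaying.epoch_transitions, using_head.epoch_transitions)  # 1000 0
```

In this simplified model the expensive work grows linearly with the number of such attestations on the replay path, which matches the article's description of resource exhaustion under concurrent load.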

The Prysm team also used the incident to reflect. They noted that it once again shows why client diversity matters. If a bug hits a client that runs more than one-third of the network, the chain may temporarily fail to finalize; above two-thirds, the entire chain is at risk of failing. The team also acknowledged that communication around feature flags had been unclear and that its test environments could not realistically simulate large numbers of out-of-sync nodes. They plan to improve their testing strategy and configuration management.
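The two thresholds can be expressed as a small classifier. This is a sketch of the rule of thumb as stated in the report, with a made-up function name and hypothetical share values; it does not reflect any client's actual market share.

```python
from fractions import Fraction

ONE_THIRD = Fraction(1, 3)
TWO_THIRDS = Fraction(2, 3)


def bug_impact(client_share):
    """Classify the worst-case impact of a bug in a client with this share."""
    if client_share > TWO_THIRDS:
        return "risk of the entire chain failing"
    if client_share > ONE_THIRD:
        return "finality may temporarily stall"
    return "missed blocks and attestations, finality unaffected"


# Hypothetical shares, for illustration only
for share in (Fraction(1, 4), Fraction(2, 5), Fraction(7, 10)):
    print(share, "->", bug_impact(share))
```

Exact fractions are used so that a share of exactly one-third or two-thirds falls in the lower bracket, in line with the "more than" wording.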
