github pokt-network/pocket-core RC-0.5.2.9

RC-0.5.2.9

After nine Beta releases and two months' worth of continuous internal and external testing, investigation, and QA, Pocket Network's Engineering team feels that the resource problems of RC-0.5.0 are fixed (see below for known issues) with the upcoming RC-0.5.2.
Official upgrade guide here

Important Release Notes

  1. Delete Session.DB before upgrading from RC-0.5.1
    • rm -rf <datadir>/session.db
  2. Run this release with the following environment variable: export GODEBUG="madvdontneed=1"
    Link to Golang Issue
  3. Use the default config for all options (except unique configurations like moniker, external addr, etc.). You have two options:
    • Remove the <datadir>/config/config.json file, execute a CLI command (to regenerate the file), and update the custom configurations
    • Run the pocket util update-configs command (creates a new config file and backs up the old config file)
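
The three release-note steps above can be sketched as a small shell helper. This is only a sketch, not an official PNI script: the function name `prepare_upgrade` and its datadir argument are made up for illustration, and it assumes the node is stopped and that `pocket util update-configs` acts on the datadir the pocket CLI is already configured to use.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): prepares a datadir for RC-0.5.2.
# Usage: prepare_upgrade <datadir>
prepare_upgrade() {
    datadir="$1"
    if [ -z "$datadir" ] || [ ! -d "$datadir" ]; then
        echo "datadir not found: $datadir" >&2
        return 1
    fi

    # Step 1: delete the session database left over from RC-0.5.1.
    rm -rf "$datadir/session.db"

    # Step 2: set the Go runtime flag required by this release.
    export GODEBUG="madvdontneed=1"

    # Step 3: regenerate the config with defaults (the old file is backed up).
    # Assumption: the command targets the CLI's configured datadir.
    # Skipped with a notice if the pocket binary is not on PATH.
    if command -v pocket >/dev/null 2>&1; then
        pocket util update-configs
    else
        echo "pocket not found on PATH; run 'pocket util update-configs' manually" >&2
    fi
}
```

For example, `prepare_upgrade "$HOME/.pocket"` would apply the steps to a datadir at that path (the path is only an example). When the function is defined and called in the current shell, the exported GODEBUG value stays set for processes started later from that shell. Custom settings such as the moniker still need to be re-applied to the new config by hand.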

GoLevelDB is the only supported database from RC-0.5.2 onward

  • Users who previously ran CLevelDB might hit problems because of known incompatibilities between the two databases
  • PNI will temporarily provide a backup datadir for download, to avoid syncing from scratch:
    13K .zip
    13K .tar.gz
  • After uncompressing these files, place the contents in the <datadir>/data folder
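
Unpacking the backup into place might look like the sketch below. The helper name `restore_backup` is hypothetical, the archive path is whatever file was downloaded, and the sketch assumes the archive's contents sit at the archive root (if they are wrapped in a top-level folder, move them up one level afterwards). It relies on the standard `tar` and `unzip` tools.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): unpacks a downloaded backup
# archive into <datadir>/data.
# Usage: restore_backup <archive> <datadir>
restore_backup() {
    archive="$1"
    datadir="$2"

    # Pick the extraction tool from the file extension.
    case "$archive" in
        *.tar.gz)
            mkdir -p "$datadir/data" || return 1
            tar -xzf "$archive" -C "$datadir/data"
            ;;
        *.zip)
            mkdir -p "$datadir/data" || return 1
            unzip -q "$archive" -d "$datadir/data"
            ;;
        *)
            echo "unsupported archive type: $archive" >&2
            return 1
            ;;
    esac
}
```

Run it only while the node is stopped, and consider moving any existing `<datadir>/data` aside first so old and new database files are not mixed.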

Context And Original Issues

After a series of related issues against Pocket Core's RC-0.5.0 were opened (#1115 #1094 #1116 #1117 ++) in October 2020, PNI opened a formal investigation into the resource consumption of RC-0.5.0 (and subsequently the more stable RC-0.5.1). The main metric of concern with RC-0.5.0 resources is 'Memory' (virtual, real, RSS, you name it), with a very tangible 'Memory Leak'. 'Relay Stability', though a primary concern for any release, is a secondary concern for RC-0.5.2, as RC-0.5.1 seemed to solve the immediate, emergency-level Code 66 errors that plagued blocks 6K-7.5K. Speed is a tertiary concern: RC-0.5.0 took 10+ hours to sync to Mainnet Block 7000.

Tooling

To debug the issues above, several tools were used to determine their root causes.

Listed in no particular order:

Debugging and Changelog

Immediately, PNI's team recognized many optimizations to be made within Pocket Core's own source code. These include the following:

- Delete local Relay/Challenge Evidence on Code 66 failures
- Log relay errors to nodes (don't just return to clients)
- Added configuration to pre-validate auto transactions
- Sending proofs/claims moved to EndBlock
- Load only Blockmeta for PrevCtx
- Added configurable cache for PrevCtx, Validators, and Applications
- Don't broadcast claims/proofs if syncing
- Spread out claims/proofs between non-session blocks
- Added max claim age configuration for proof submission 
- Reorganized non-consensus breaking code in Relay/Merkle Verify for efficiency before reads from state
- Configuration to remove ABCILogs
- Fixed (pseudo) memory leak in Tendermint's RecvPacketMsg()
- Sessions only store addresses and not entire structs
- Only load bare minimum for relay processing
- Add order to AccountTxs query & blockTxsQuery RPC
- Reduce AccountTxsQuery & blockTxsQuery memory footprint

The results were quite significant in both speed and initial resource usage. The subsequent Beta releases targeted bug fixes and small improvements made necessary by the drastic breaking changes in the original Beta.

- Nondeterministic hash fix
- Code 89 Fix
- Evidence Seal Fix
- Fixes header.TotalTxs !=
- Fixes header.NumTxs !=
- Updating TM version and Version Number to BETA-0.5.2.3
- Upgraded AccountTxs and BlockTxs to use ReducedTxSearch
- Implemented Reduced TxSearch in Tendermint

With all of this, the speed and 'Relay Stability' concerns seem to be solved. However, the 'Memory Leak' was not fixed. Transparently, the team was surprised and unsure how to proceed with tackling the issue. One thing was clear: more visibility was needed to solve it. With the addition of some much-needed tooling (see above), the hunt was on for the leak culprit. Here's a taste of the testing the team did to hunt down this issue:

- 72 hour simulations in Docker
- Clean Room Relay Stress Tests in GCP
- Mainnet `Validator` and `Full Node` Simulations
- Snapshot comparisons between different versions
- Upgrade Path (0.5.1-0.5.2) simulations
- And Much Much More XD

With the help of some close partners and community members, memory offenders were checked off the list:

- Moved IAVL from Tendermint to Pocket Core
- Call LazyLoadVersion/Store for all queries and PrevCtx()
- Reduced Tendermint P2P EnsurePeers actions to prevent leak
- Lowered P2P config to far more conservative numbers
- Updated FastSync to default to V1
- Exposed default leveldb options
- Switched to only go-leveldb for leak benchmarking/performance reasons
- Child process to run madvdontneed if not set
- Updated P2P configs
- Fixed nil txIndexer bug (Tendermint now sets txindexer and blockstore)
- Removed event type and used Tendermint's abci.Event

Finally, in Beta-0.5.2.8, memory usage seemed to hold at a constant level.

Evidence

IAVL ISSUES

[Screenshot, 2020-11-23]

Memory Bump during a block

[Screenshot, 2020-11-27]

IAVL NODE CLONE

[Screenshot, 2020-12-04]

Append Events

[Screenshot, 2020-12-11]

Tendermint True Bit Indices

[Screenshot, 2020-11-28]

Multiple GCVIS heap stability at Beta-0.5.2.7

[Screenshots, 2020-12-07 and 2020-12-11 (two)]

Evidence of cache growth from mempool

[Screenshot, 2020-12-11]

External Reports from community members

[Three images]

Disclaimer

Though memory usage seems to be both significantly decreased and stabilized, the team is still not convinced that the memory growth issue is fully fixed (though this suspicion is not currently supported by evidence). The team expects to dive deeper and provide even more visibility into Tendermint and Pocket Core in future releases.
