Release 2.9.5-alpha3


This is an alpha update and may not be ready for production use. This software was prepared by the Digital History Association, in cooperation with the wider Arweave ecosystem.

This release includes several bug fixes. It passes all automated tests and has undergone a base level of internal testing, but is not considered production ready. We only recommend upgrading if you believe one of the listed bug fixes will improve your mining experience.

Support for repack-in-place from the replica.2.9 format

This release introduces support for repack-in-place from replica.2.9 to unpacked or to a different replica.2.9 address. In addition we've made several performance improvements and fixed a number of edge case bugs which may previously have caused some chunks to be skipped by the repack process.

Performance

Due to how replica.2.9 chunks are processed, the parameters for tuning repack-in-place performance have changed. There are 5 main considerations:

  • Repack footprint size: replica.2.9 chunks are grouped in footprints of chunks. A full footprint is 1024 chunks distributed evenly across a partition.
  • Repack batch size: The repack-in-place process reads some number of chunks, repacks them, and then writes them back to disk. The batch size controls how much data is read at once. Previously, a batch size of 10 meant that 10 contiguous chunks would be read, repacked, and written. However, to handle replica.2.9 data efficiently, the batch size now indicates the number of footprints to process at once, so a batch size of 10 means that 10 footprints will be read, repacked, and written. Since a full footprint is 1024 chunks, processing a batch size of 10 now requires 10,240 chunks, or roughly 2.5 GiB, to be held in memory.
  • Available RAM: The footprint size and batch size drive how much RAM is required by the repack in place process. And if you're repacking multiple partitions at once, the RAM requirements can grow quickly.
  • Disk IO: If you determine that disk IO is your bottleneck, you'll want to increase the batch size as much as you can, since reading contiguous chunks is generally much faster than reading non-contiguous chunks.
  • CPU: In some cases you may find that CPU is your bottleneck instead - this can happen when repacking from a legacy format like spora_2_6, or when repacking many partitions between two replica.2.9 addresses. The saving grace is that when CPU is the bottleneck, a large batch size buys you little, so you can reduce your batch size or footprint size to ease off on your memory utilization.
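
To put rough numbers on the RAM consideration, here is a minimal sketch (Python, not part of the node) of the memory math described above. It assumes every chunk in a batch is held in memory at once and that every chunk is a full 256 KiB:

    # Rough RAM estimate for repack-in-place, based on the figures above.
    # Assumption (not taken from the node source): every chunk in the batch is
    # held in memory at once, and every chunk is a full 256 KiB.
    CHUNK_SIZE_BYTES = 256 * 1024      # 256 KiB per chunk
    FOOTPRINT_CHUNKS = 1024            # chunks in a full replica.2.9 footprint

    def repack_ram_bytes(batch_size_footprints, partitions=1):
        """Approximate memory needed to hold one batch per repacked partition."""
        chunks = batch_size_footprints * FOOTPRINT_CHUNKS * partitions
        return chunks * CHUNK_SIZE_BYTES

    print(repack_ram_bytes(10) / 2**30)                # ~2.5 GiB, as in the example above
    print(repack_ram_bytes(10, partitions=4) / 2**30)  # ~10 GiB when repacking 4 partitions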

To control all these factors, repack-in-place has 2 config options:

  • repack_batch_size: controls the batch size - i.e. the number of footprints processed at once
  • repack_cache_size_mb: sets the total amount of memory to allocate to the repack-in-place process per partition. So if you set repack_cache_size_mb to 2000 and are repacking 4 partitions, you can expect the repack-in-place process to consume roughly 8 GiB of memory. Note: the node will automatically set the footprint size based on your configured batch and cache sizes - this typically means that it will reduce the footprint size as much as needed. A smaller footprint size will increase your CPU load, as it will result in your node generating the same entropy multiple times. For example, if your footprint size is 256 the node will need to generate the same entropy 4 times in order to process all 1024 chunks in the full footprint.
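
The numbers in the examples above can be reproduced with a short sketch. This is illustrative only - the footprint-selection rule is internal to the node - so it just shows the two relationships spelled out above: total memory scales with the number of partitions being repacked, and a smaller footprint means regenerating the same entropy more often.

    import math

    FULL_FOOTPRINT = 1024     # chunks in a complete replica.2.9 footprint

    def total_cache_gib(repack_cache_size_mb, partitions):
        # repack_cache_size_mb is per partition, so total memory scales with partitions.
        return repack_cache_size_mb * partitions / 1024

    def entropy_passes(footprint_size):
        # Number of times the same entropy must be generated to cover a full footprint.
        return math.ceil(FULL_FOOTPRINT / footprint_size)

    print(total_cache_gib(2000, 4))   # ~7.8 GiB -- the "roughly 8 GiB" example above
    print(entropy_passes(256))        # 4 passes, as in the footprint-size example
    print(entropy_passes(1024))       # 1 pass with a full footprint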

Debugging

This release also includes a new option on the data-doctor inspect tool that may help with debugging packing issues.

/bin/data-doctor inspect bitmap <data_dir> <storage_module>

Example: /bin/data-doctor inspect bitmap /opt/data 36,En2eqsVJARnTVOSh723PBXAKGmKgrGSjQ2YIGwE_ZRI.replica.2.9

This will generate a bitmap in which every pixel represents the packing state of a specific chunk. The bitmap is laid out so that each vertical column of pixels is a complete entropy footprint. Here is an example bitmap:

[Example bitmap: bitmap_storage_module_5_En2eqsVJARnTVOSh723PBXAKGmKgrGSjQ2YIGwE_ZRI.replica.2.9]

This bitmap shows the state of one node's partition 5 that has been repacked to replica.2.9. The green pixels are chunks that are in the expected replica.2.9 format, the black pixels are chunks that are missing from the miner's dataset, and the pink pixels are chunks that are too small to be packed (prior to partition ~9, users were allowed to pay for chunks that were smaller than 256KiB - these chunks are stored unpacked and can't be packed).
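
If you want to summarize one of these bitmaps without eyeballing it, a small helper along these lines can tally the pixel colors. The exact RGB values used for each state aren't documented here, so this sketch (which assumes Pillow is installed) simply reports the most common colors and their share of the partition:

    from collections import Counter
    from PIL import Image   # pip install Pillow

    def summarize_bitmap(path):
        """Tally pixel colors in a data-doctor bitmap and print each color's share."""
        img = Image.open(path).convert("RGB")
        counts = Counter(img.getdata())
        total = img.width * img.height
        for color, n in counts.most_common(5):
            print(f"{color}: {n} pixels ({100 * n / total:.1f}%)")

    # Hypothetical file name -- use whatever file data-doctor writes for your module:
    # summarize_bitmap("bitmap_storage_module_5_<mining address>.png")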

New prometheus metrics

  • ar_mempool_add_tx_duration_milliseconds: The duration in milliseconds it took to add a transaction to the mempool.
  • reverify_mempool_chunk_duration_milliseconds: The duration in milliseconds it took to reverify a chunk of transactions in the mempool.
  • drop_txs_duration_milliseconds: The duration in milliseconds it took to drop a chunk of transactions from the mempool.
  • del_from_propagation_queue_duration_milliseconds: The duration in milliseconds it took to remove a transaction from the propagation queue after it was emitted to peers.
  • chunk_storage_sync_record_check_duration_milliseconds: The time in milliseconds it took to check that the fetched chunk range is actually registered by the chunk storage.
  • fixed_broken_chunk_storage_records: The number of fixed broken chunk storage records detected when reading a range of chunks.
  • mining_solution: Replaces the mining_solution_failure and mining_solution_total metrics with a single metric, using labels to differentiate the mining solution state.
  • chunks_read: The counter is incremented every time a chunk is read from chunk_storage.
  • chunk_read_rate_bytes_per_second: The rate, in bytes per second, at which chunks are read from storage. The type label can be 'raw' or 'repack'.
  • chunk_write_rate_bytes_per_second: The rate, in bytes per second, at which chunks are written to storage.
  • repack_chunk_states: The count of chunks in each state. 'type' can be 'cache' or 'queue'.
  • replica_2_9_entropy_generated: The number of bytes of replica.2.9 entropy generated.
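
To confirm the new metrics are being exported on a running node, a quick check along these lines works, assuming the node's HTTP API listens on the default port 1984 and serves Prometheus metrics at /metrics:

    import urllib.request

    # Assumption: the node's HTTP API is on the default port 1984 and exposes /metrics.
    NEW_METRICS = (
        "ar_mempool_add_tx_duration_milliseconds",
        "chunk_read_rate_bytes_per_second",
        "chunk_write_rate_bytes_per_second",
        "repack_chunk_states",
        "replica_2_9_entropy_generated",
    )

    def dump_new_metrics(base_url="http://localhost:1984"):
        """Print any exported metric lines that start with one of the new metric names."""
        body = urllib.request.urlopen(f"{base_url}/metrics", timeout=10).read().decode()
        for line in body.splitlines():
            if line.startswith(NEW_METRICS):
                print(line)

    dump_new_metrics()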

Bug fixes and improvements

  • Several updates to the mining cache logic. These changes address a number of edge case performance and memory bloat issues that can occur while mining.
    • Guidance on setting the mining_cache_size_mb config: for now you can set it to 100x the number of partitions you are mining against. So if you are mining against 64 partitions on your node you would set it to 6400.
  • Improve transaction validation performance; this should reduce the frequency of "desyncs", i.e. nodes should now be able to handle a higher network transaction volume without stalling
    • Do not delay ready_for_mining on validator nodes
    • Make sure identical tx-status pairs do not cause extra mempool updates
    • Cache the owner address once computed for every TX
  • Reduce the time it takes for a node to join the network:
    • Do not re-download local blocks on join
    • Do not re-write written txs on join
    • Reduce the per-peer retry budget on join from 10 to 5
  • Fix edge case that could occasionally cause a mining pool to reject a replica.2.9 solution.
  • Fix edge case crash that occurred when a coordinated miner timed out while fetching partitions from peers
  • Fix a bug where a storage module crossing the end of the weave could cause syncing to stall
  • Fix a bug where a crash during peer interval collection could cause syncing to stall
  • Fix a bug where VDF sessions could be missed when disable vdf_server_pull is set
  • Fix race condition where we may not detect double-signing
  • Optionally fix broken chunk storage records on the fly
    • Set enable fix_broken_chunk_storage_record to turn the feature on.

Full Changelog: N.2.9.5-alpha2...N.2.9.5-alpha3

Community involvement

A huge thank you to all the Mining community members who contributed to this release by identifying and investigating bugs, sharing debug logs and node metrics, and providing guidance on performance tuning!

Discord users (alphabetical order):

  • BerryCZ
  • bigbang
  • BloodHunter
  • Butcher_
  • core_1_
  • doesn't stay up late
  • dzeto
  • edzo
  • Evalcast
  • EvM
  • grumpy.003
  • Iba Shinu
  • JamsJun
  • jimmyjoe7768
  • lawso2517
  • MaSTeRMinD
  • metagravity
  • Qwinn
  • radion_nizametdinov
  • RedMOoN
  • sam
  • smash
  • sumimi
  • tashilo
  • Vidiot
  • wybiacx
