CasperLabs/CasperLabs v0.18.2

pre-release
  • Fixes a bug in the initial synchronization where, if all nodes failed in a round, syncing reset all the way to Genesis instead of resuming from where it left off after the restart. This was especially bad when the single bootstrap node failed while there were no other peers yet, as it reset syncing to zero. This has been in testing for 10 days.
  • Fixes a bug where the LFB (Last Finalized Block) could pick the wrong fork choice and get stuck for good. This appears to be timing-dependent; locally it was most likely with low round exponents, and it can happen straight after Genesis or much later. An affected node must have its state wiped clean to start again. The fix has been tested for 6 days.
  • Changes the number of synchronizations, block downloads and block validations that can run concurrently to 5, 2 and 2, respectively; previously up to one per validator was allowed. These limits were added to avoid the exhaustion of database connections that seemed to have affected everyone on testnet and LRT: raising the connection limit to 100 brought everything to a crawl, while 20 seemed workable. Archit has since raised these tunables to higher values, so set them appropriately before release, perhaps to 8-10. A minimal sketch of the limiting scheme follows the list. This has been in testing for 6 days.
  • Changes the initial synchronization to pull block headers first and only then do the scheduling, to minimise the time the stream (with a database connection behind it) is kept open; see the sketch after this list. This has been in testing for 6 days.
  • Fixes a bug where the LFB can stop advancing, even across restarts with a clean slate. This seems to depend on the DAG shape and the timing of events around era switch blocks, where the voting matrix is initialised empty and then stays that way because there are no more messages for that era, only for the next one. It came up locally with fixed voting periods, and also on LRT3, though I don't know what settings that uses. The fix has been in testing for 3 days.
  • Changes the thread pool on which ongoing synchronizations run and incoming requests are handled from a cached pool to a fork-join pool, so that there is less pressure on database connections. You can tune --server-min-parallelism and --server-parallelism-cpu-multiplier to give it more headroom (see the sketch after this list). This made the rejected tasks in the ingress pool go away. This has been tested for 3 days.
  • Optimizes the fork choice panorama preparation so that it no longer traverses the j-past-cone of a block to find the previous message from a given validator in an era; instead it trusts that if a message isn't cited by the block, then the creator hasn't seen one (which holds because we use direct justifications). This fixes a case where validator A doesn't see messages from validator B but validator C does: for C to validate blocks coming from A, it spent up to 14 seconds per block restoring the latest message A could have seen from B, which meant a network hiccup could make initial synchronization untenable. A sketch of the lookup follows the list. This fix has been in testing for 1 day.
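
For the concurrency limits above, here is a minimal Scala sketch of how such caps can be enforced with semaphores. All names (`ConcurrencyLimits`, `withPermit`, `downloadBlock`) are hypothetical illustrations, not the node's actual API.

```scala
import java.util.concurrent.Semaphore

// Hypothetical sketch: cap how many synchronizations, block downloads
// and block validations may run at once, instead of allowing one per
// validator as before.
object ConcurrencyLimits {
  // Values shipped in this release; the notes above suggest raising
  // them (e.g. to 8-10) may be appropriate.
  val maxSynchronizations = new Semaphore(5)
  val maxBlockDownloads   = new Semaphore(2)
  val maxBlockValidations = new Semaphore(2)

  // Run `task` only while holding a permit, so at most N instances
  // execute concurrently and the database connection pool is not
  // exhausted.
  def withPermit[A](limit: Semaphore)(task: => A): A = {
    limit.acquire()
    try task
    finally limit.release()
  }
}

// Usage (downloadBlock is hypothetical):
// ConcurrencyLimits.withPermit(ConcurrencyLimits.maxBlockDownloads) { downloadBlock(hash) }
```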
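For the headers-first synchronization, the sketch below shows the two-phase shape under assumed types; `streamHeaders` and `scheduleDownload` are hypothetical stand-ins for the node's internals.

```scala
// Hypothetical sketch: pull the block headers in one short-lived
// streaming call, release the stream (and the database connection
// behind it) immediately, and only then schedule the block downloads,
// so the connection is not held open for the whole sync.
final case class BlockHeader(hash: String, parents: List[String])

def initialSync(
    streamHeaders: () => Iterator[BlockHeader], // hypothetical: streams headers from a peer
    scheduleDownload: BlockHeader => Unit       // hypothetical: enqueues the full-block fetch
): Unit = {
  // Phase 1: drain the header stream eagerly; the stream is closed here.
  val headers = streamHeaders().toList

  // Phase 2: schedule downloads with no stream held open.
  headers.foreach(scheduleDownload)
}
```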
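The two flags --server-min-parallelism and --server-parallelism-cpu-multiplier come from the release itself; how they feed into the pool size below is an assumed wiring, shown only to illustrate a fork-join pool with bounded parallelism.

```scala
import java.util.concurrent.{ExecutorService, ForkJoinPool}

// Hypothetical sketch: size a fork-join pool from the two tunables,
// replacing the unbounded cached pool so fewer tasks compete for
// database connections at the same time.
def serverPool(
    minParallelism: Int,  // --server-min-parallelism
    cpuMultiplier: Double // --server-parallelism-cpu-multiplier
): ExecutorService = {
  val cpus        = Runtime.getRuntime.availableProcessors()
  val parallelism = math.max(minParallelism, math.ceil(cpus * cpuMultiplier).toInt)
  new ForkJoinPool(parallelism)
}
```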
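Finally, a sketch of the fork choice optimization, with hypothetical `Message` and `Block` types: because blocks carry direct justifications, a validator absent from them was simply not seen by the creator, so the expensive j-past-cone traversal reduces to a single lookup.

```scala
// Hypothetical types standing in for the node's block model.
final case class Message(validator: String, hash: String)
final case class Block(creator: String, justifications: Map[String, Message])

// Before: recursive traversal of the justification DAG, up to ~14s per block.
// After: one map lookup; None means the creator saw no message from
// that validator (valid because justifications are direct).
def latestMessageSeen(block: Block, validator: String): Option[Message] =
  block.justifications.get(validator)
```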
