ultraq/thymeleaf-layout-dialect 2.1.0 on GitHub

Be less strict with HTML templates that are auto-balanced by Attoparser (usually a result of not knowing which HTML elements cause auto-closing behaviours), instead only using tags that are in the original templates to influence the "model level". While the old strictness was a great tool for learning more about the HTML spec whenever it raised an error, the relaxed behaviour is more in line with how Thymeleaf itself behaves (#138)
Reveal the processed content and layout title values on the layout object (#137)
Huge improvements to the memory profile of the layout dialect (#102, #139)

Details of performance improvements between 2.0 and 2.1

I've uploaded the very basic web app that I have been using for all of my memory tests since the first report of a memory leak in 2.0.0. The GitHub repo for that project can be found here: https://github.com/ultraq/thymeleaf-layout-dialect-benchmark

It does very little, only using layout dialect features so that other things that would normally be a part of an application don't impact the tests. It does use Spring MVC however, which might be unnecessary overhead, but I have a good amount of faith in the Spring project's robustness such that I don't think it'll affect the results too badly.

That web app is started with the YourKit profiling agent attached, and object allocation profiling is also enabled. The web app is then stress tested with a simple JMeter test plan (included in the repo) that simulates concurrent users and load to exaggerate any problems in the layout dialect. At the end of the test, a forced GC is done and a memory snapshot is taken to see how the app is at rest.

Thymeleaf Layout Dialect 2.0.4

Main takeaways:

The JMeter test took about 3 minutes to complete (started around the 30 second mark), with requests taking an average of 1.674 seconds each
Old generation space at 99MB
35 garbage collections
27 million object allocations
4 seconds spent in GC
Several items taking over 10MB of retained memory (none of them appearing as dominators however, so are potentially GC'able, but don't seem to have been collected)
Majority of the object allocations taking place in the IModelExtensions.findModel closure, which uses a Groovy feature of dynamic metaclass creation

Thymeleaf Layout Dialect 2.1.0

Differences:

The JMeter test took about 1 minute to complete (also started around the 30 second mark), with requests taking an average of 452ms to complete (at least 3x faster)
Old generation space at 22MB (memory footprint roughly 1/5th the size)
21 garbage collections (40% fewer GCs)
7.1 million object allocations (74% fewer objects created)
1 second spent in GC (75% less time spent in GC)
Only 1 item taking over 10MB of retained memory (dominator profile looking mostly the same however)
Majority of the object allocations no longer in a Groovy dynamic metaclass method, but in one of Thymeleaf's utility projects, unbescape
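As a sanity check on the percentages in the list above, here is a small sketch that recomputes them from the raw figures of the two runs. The class and method names are made up for this example and are not part of the benchmark project.

```java
import java.util.Locale;

// Illustrative helper only: recomputes the deltas quoted for 2.0.4 vs 2.1.0.
final class BenchmarkDeltas {

    /** Percentage reduction going from `before` to `after`. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    /** How many times larger `before` is than `after`. */
    static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Figures copied from the 2.0.4 and 2.1.0 results in these notes.
        System.out.printf(Locale.ROOT, "Average request time: %.1fx faster%n",
            ratio(1674, 452));                    // 3.7x
        System.out.printf(Locale.ROOT, "Old generation: %.1fx smaller%n",
            ratio(99, 22));                       // 4.5x
        System.out.printf(Locale.ROOT, "Garbage collections: %.1f%% fewer%n",
            percentReduction(35, 21));            // 40.0%
        System.out.printf(Locale.ROOT, "Object allocations: %.1f%% fewer%n",
            percentReduction(27.0, 7.1));         // 73.7%
        System.out.printf(Locale.ROOT, "Time in GC: %.1f%% less%n",
            percentReduction(4, 1));              // 75.0%
    }
}
```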

Changes made and lessons learned

The change that had the biggest impact on the performance profile of the layout dialect was the removal of Groovy's dynamic metaclass creation. Here's a line representative of what that is:

thymeleaf-layout-dialect/Source/nz/net/ultraq/thymeleaf/models/extensions/IModelExtensions.groovy

Line 211 in b9f0000

model.metaClass.startIndex = eventIndex

What that line did was add a property, startIndex, to the object which I would use later on in the dialect to know exactly where a model started. A similar property was added for where the model ended. This was done because Thymeleaf 3 did away with DOM nodes, which made it much harder for dialects to track the things that make up an "element". Instead, dialects have to rely on queues of what Thymeleaf calls "events" (text, tags, comments, anything that you write into a template), with no clear demarcation of elements.

This was a huge convenience for the code, but as shown in the object allocation profiling above, something about it incurred a massive cost in memory. This was Groovy under-the-hood code, so I don't know exactly what goes on there to provide programmers this convenience.

So I removed lines like that throughout the layout dialect, and implemented an additional step for calling code to search for the start/end of a model after the model was received. Theoretically this should have been slower because it's an additional O(n) lookup on code that already did an O(n) lookup to retrieve the model in the first place, but practically it beat out the dynamic metaclass allocation.
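To illustrate the kind of after-the-fact lookup described above, here is a minimal sketch in Java rather than the dialect's own Groovy. The Event and EventType types are made-up stand-ins for Thymeleaf's template events, and the method names are hypothetical; the real dialect code and the Thymeleaf API differ. The point is only that the start and end of a model can be recovered with a plain linear scan, with no properties attached to the model object.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical types, not the Thymeleaf or layout dialect API.
final class ModelBoundsSketch {

    enum EventType { OPEN, CLOSE, OTHER }

    static final class Event {
        final EventType type;
        final String name;

        Event(EventType type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    /** O(n) scan for the first open tag with the given name, or -1 if absent. */
    static int findStart(List<Event> events, String tagName) {
        for (int i = 0; i < events.size(); i++) {
            Event event = events.get(i);
            if (event.type == EventType.OPEN && event.name.equals(tagName)) {
                return i;
            }
        }
        return -1;
    }

    /** O(n) scan from an open tag to its matching close tag, tracking nesting depth. */
    static int findEnd(List<Event> events, int startIndex) {
        int depth = 0;
        for (int i = startIndex; i < events.size(); i++) {
            EventType type = events.get(i).type;
            if (type == EventType.OPEN) {
                depth++;
            } else if (type == EventType.CLOSE) {
                depth--;
                if (depth == 0) {
                    return i;
                }
            }
        }
        return -1; // unbalanced input: no matching close tag found
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(EventType.OTHER, "text"),
            new Event(EventType.OPEN, "div"),
            new Event(EventType.OPEN, "p"),
            new Event(EventType.OTHER, "text"),
            new Event(EventType.CLOSE, "p"),
            new Event(EventType.CLOSE, "div"),
            new Event(EventType.OTHER, "text"));

        int start = findStart(events, "div");
        int end = findEnd(events, start);
        System.out.println(start + ".." + end); // prints 1..5
    }
}
```

Both scans are linear in the number of events, so they add a second O(n) pass on top of the one that retrieved the model, but they allocate nothing per model.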

Lesson learned: be careful/sparing with dynamic metaclasses. The convenience they provide is a boon for programmers, but in a critical part of the code their benefits may not outweigh the costs.

Notes, caveats, and final words

These numbers are specific to the benchmark that I ran them on, so don't expect to see improvements of a similar scale in your own app. However, this all just means that the layout dialect should now be even less of a drain on your own app's memory and CPU profiles, thus allowing you to focus on the performance of your app rather than the performance of your libraries.

If you continue to experience performance problems though, feel free to raise an issue but also provide memory profiles if at all possible. I've actually quite enjoyed digging into and fixing up these things as I learn a lot from it in the process.
