A Grails Memory Leak
Sunday, September 7th, 2008

I’m in the process of building a large database to load-test our application at work. Index usage and execution plans are quite different with many rows than with a few, so I’m creating millions of rows in tables for domain classes that will see a lot of activity, to find missing indexes, inefficient queries, etc.
I used the typical strategy for Hibernate batch processing – periodically flush() and clear() the Session to push the data to the database and free up the memory used by domain instances. But even with a 1GB heap, the process kept running out of memory after several hours, and it ran very slowly for the last couple of hours – typically indicative of heavy garbage-collection thrashing.
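In code, the idiom looks roughly like this. Since the real thing needs a Hibernate Session, this sketch substitutes a minimal stand-in class just to show the control flow – the StubSession class, batch size, and method names are illustrative, not Hibernate’s API:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDemo {

    // Minimal stand-in for a Hibernate Session: it accumulates pending
    // instances until flush() writes them out and clear() drops the
    // references so they become garbage-collection candidates.
    static class StubSession {
        final List<Object> firstLevelCache = new ArrayList<>();
        int flushed = 0;

        void save(Object o) { firstLevelCache.add(o); }
        void flush()        { flushed += firstLevelCache.size(); }
        void clear()        { firstLevelCache.clear(); }
    }

    static StubSession insertRows(int total, int batchSize) {
        StubSession session = new StubSession();
        for (int i = 0; i < total; i++) {
            session.save("thing_" + i);
            if ((i + 1) % batchSize == 0) {
                session.flush();  // push the pending batch to the database
                session.clear();  // release the references
            }
        }
        session.flush();          // final partial batch, if any
        session.clear();
        return session;
    }

    public static void main(String[] args) {
        StubSession s = insertRows(100_000, 100);
        // prints flushed=100000 retained=0
        System.out.println("flushed=" + s.flushed
                + " retained=" + s.firstLevelCache.size());
    }
}
```

With GORM the shape is the same: save each instance, then flush() and clear() the current session every N iterations so the first-level cache stays bounded.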
Profiling the process with YourKit showed that domain instance counts were steadily increasing. Repeatedly running gc() had no effect on these instances, although collection clearly ran, since memory usage and other object counts dropped (note that a call to gc() can’t force garbage collection – it’s only a request, which the JVM is free to ignore).
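The advisory nature of gc() is easy to see with a WeakReference. This sketch (hypothetical, not from the profiling session) has to retry the request in a loop precisely because a single call may do nothing:

```java
import java.lang.ref.WeakReference;

public class GcRequestDemo {

    // Keep requesting collection until the weak referent disappears or we
    // give up. System.gc() is only a hint, so one call is not guaranteed
    // to run a collection -- hence the retry loop.
    static boolean collected(WeakReference<?> ref, int maxTries) {
        for (int i = 0; i < maxTries && ref.get() != null; i++) {
            System.gc();
            try { Thread.sleep(10); } catch (InterruptedException ignored) { }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);

        // While a strong reference exists, no amount of gc() frees the object.
        collected(ref, 5);
        System.out.println("still reachable: " + (ref.get() == strong));

        strong = null;  // drop the only strong reference
        System.out.println("collected after release: " + collected(ref, 50));
    }
}
```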
The code and domain classes aren’t complicated and didn’t have any obvious references that would keep instances from being garbage collected, and I kept ripping out more and more code to try to find the cause. Finally I got it down to the smallest code block that illustrated the problem:
for (int i = 0; i < 100000; i++) {
   new Thing(name: "thing_${i}")
   Thread.sleep(50) // to allow time to watch things in the profiler
}
I ran this in the Grails console with Hibernate 2nd-level caching turned off, both in a transaction and not (to eliminate the transaction as the culprit). Just creating the instances without saving them – Hibernate isn’t involved at all in this code – triggered the problem.
So I described the issue on the Grails User mailing list, and after an interesting discussion Graeme mentioned that it had to do with the validation Errors objects. He suggested setting errors to null once the instances were no longer needed, i.e.:
for (int i = 0; i < 100000; i++) {
   Thing thing = new Thing(name: "thing_${i}")
   thing.errors = null
   Thread.sleep(50) // to allow time to watch things in the profiler
}
This should work in a web context, but I was working in the console, so I got a NullPointerException. The problem was clear when I looked at the MetaClass code in DomainClassGrailsPlugin:
static final PROPERTY_INSTANCE_MAP = new org.codehaus.groovy.grails.support.SoftThreadLocalMap()

metaClass.setErrors = { Errors errors ->
   def request = RCH.getRequestAttributes()?.request
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   storage[key] = errors
}

metaClass.getErrors = {->
   def request = RCH.getRequestAttributes()?.request
   def errors
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   errors = storage[key]
   if(!errors) {
      errors = new BeanPropertyBindingResult(delegate, delegate.getClass().getName())
      storage[key] = errors
   }
   errors
}
Instead of storing the Errors in the Request, which doesn’t exist in the console, it stores them in a thread-local Commons Collections ReferenceMap. This map doesn’t accept nulls, so instead I had to access it directly, i.e.:
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
This worked – memory usage was fine and I could see that instances were being garbage-collected while running under the profiler.
But this raises the question – why was I in this mess in the first place? Why are there Errors objects involved at all if I’m not calling validate() or save(), just creating new instances?
ControllersGrailsPlugin wires up a DataBindingDynamicConstructor that takes a Map, to allow creating domain instances like this:
def foo = new Foo(prop1: 123, prop2: 'foo', prop3: new Date())
This dynamic constructor copies the map data into the new instance using a GrailsDataBinder in DataBindingUtils.bindObjectToInstance(), and at the end calls setErrors(), passing the BindingResult (which implements Errors) generated while copying the data. These are stored in the Map (or as a Request attribute when running in a web context), and this turned out to be the reference that kept the instances from becoming garbage-collection candidates even though they were no longer in scope in the loop.
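The mechanism is easy to reproduce outside Grails. In the sketch below (class and method names are mine, not the plugin’s), a fake errors object wrapping each new instance is stored in a thread-local map keyed by identity hash code – the same shape the plugin uses – and every instance stays strongly reachable through the map values until the map is cleared:

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorsLeakDemo {

    // Stand-in for BeanPropertyBindingResult: like the real class, it holds
    // a reference to the bean it was created for.
    static class FakeErrors {
        final Object target;
        FakeErrors(Object target) { this.target = target; }
    }

    // Rough analogue of PROPERTY_INSTANCE_MAP: a per-thread map whose keys
    // embed the instance's identity hash code. (The real SoftThreadLocalMap
    // holds values softly; a plain HashMap keeps this sketch self-contained.)
    static final ThreadLocal<Map<String, FakeErrors>> STORAGE =
            ThreadLocal.withInitial(HashMap::new);

    static void setErrors(Object instance) {
        String key = "ERRORS_" + instance.getClass().getName()
                + "_" + System.identityHashCode(instance);
        STORAGE.get().put(key, new FakeErrors(instance));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            Object thing = new Object();  // "new Thing(...)" stand-in
            setErrors(thing);
        }
        // The loop is done and no local variable refers to the instances,
        // yet every one is still reachable through the map values.
        System.out.println("retained: " + STORAGE.get().size());

        // The fix from the post, expressed against this sketch:
        STORAGE.get().clear();
        System.out.println("after clear: " + STORAGE.get().size());
    }
}
```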
So, mystery solved. In the end this is all academic for me: I needed to get things working, so I rewrote the data-population code in straight JDBC (and Java), and it runs quite a bit faster than the equivalent Grails/Groovy/GORM code. The overhead isn’t usually an issue for small web requests, but for a process that takes many hours it definitely makes a difference.