A Grails Memory Leak
I’m in the process of building a large database to load-test our application at work. Index usage and execution plans are quite different with many rows than with a few, so I’m creating millions of rows in tables for domain classes that will see a lot of activity to find missing indexes, inefficient queries, etc.
I used the typical strategy for Hibernate batch processing –
flush()
and clear()
the Session
periodically to push the data to the database and free up memory used by domain instances. But even with a 1-gig heap, the process kept running out of memory after several hours, and was running very slowly for the last couple of hours – typically indicative of lots of garbage-collection thrashing.
Profiling the process using YourKit showed that domain instance counts were steadily increasing. Repeatedly running
gc()
had no effect on these instances, although it clearly ran since memory usage and other object counts dropped (note that calls to gc()
can’t force garbage collection, it’s just a request).
The code and domain classes aren’t complicated and didn’t have any obvious couplings that could cause instances to not be garbage collected, and I kept ripping out more and more code to try to find what was causing the problem. Finally I got it down to the smallest code block that illustrated the problem:
for (int i = 0; i < 100000; i++) { new Thing(name: "thing_${i}") Thread.sleep(50) // to allow time to watch things in the profiler }
I ran this in the Grails console and with Hibernate 2nd-level caching turned off, both in a transaction and not (to eliminate the transaction as the culprit). Just creating the instances and not saving them – in this code Hibernate isn’t involved at all – triggered the problem.
So I described the issue on the Grails User mailing list and after an interesting discussion Graeme mentioned that it had to do with the validation
Errors
objects. He suggested setting errors to null
after the instances were no longer used, i.e
for (int i = 0; i < 100000; i++) { Thing thing = new Thing(name: "thing_${i}") thing.errors = null Thread.sleep(50) // to allow time to watch things in the profiler }
This should work in a web context, but I was working in the console, so I got a NullPointerException
. The problem was clear when I looked at the MetaClass
code in DomainClassGrailsPlugin
:
static final PROPERTY_INSTANCE_MAP = new org.codehaus.groovy.grails.support.SoftThreadLocalMap() metaClass.setErrors = { Errors errors -> def request = RCH.getRequestAttributes()?.request def storage = request ? request : PROPERTY_INSTANCE_MAP.get() def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}" storage[key] = errors } metaClass.getErrors = {-> def request = RCH.getRequestAttributes()?.request def errors def storage = request ? request : PROPERTY_INSTANCE_MAP.get() def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}" errors = storage[key] if(!errors) { errors = new BeanPropertyBindingResult( delegate, delegate.getClass().getName()) storage[key] = errors } errors }
Instead of storing the Errors
in the Request
, which in the console doesn’t exist, it stores it in a Thread-local Commons Collections
ReferenceMap
. This map doesn’t accept null
s, so instead I had to access it directly, i.e.
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
This worked – memory usage was fine and I could see that instances were being garbage-collected while running under the profiler.
But this begs the question – why was I in this mess in the first place? Why are there Errors
objects involved if I’m not calling validate()
or save()
, just creating new instances?
ControllersGrailsPlugin
wires up a DataBindingDynamicConstructor
that takes a Map
to allow creating domain instances like this:
def foo = new Foo(prop1: 123, prop2: 'foo', prop3: new Date())
This dynamic constructor copies the map data using a GrailsDataBinder
into the new instance in DataBindingUtils.bindObjectToInstance()
, and at the end calls setErrors()
passing the BindingResult
(which implements Errors
) generated from copying the data. These are stored in the Map
(or as a Request
attribute if running in a web context) and this ended up being the reference that kept the instances from being garbage collection candidates even though they were no longer in scope in the loop.
So, mystery solved. In the end this is all academic for me since I needed to get things working so I rewrote the data population code using JDBC (and Java) and it runs quite a bit faster than the equivalent Grails/Groovy/GORM code. This isn’t usually an issue for small web requests but for a process that takes many hours it definitely makes a difference.
Great write up on this, always good to see the solution illustrated so clearly.
I’m running into a memory leak on my grails app.
I have a pretty substantial amount of data processing, including domain object instantiation, running inside a quartz job. If things are being persisted indefinitely in thread local, this is probably the cause.
Very informative, thanks for the post.
Thanks Burt, this post was a big help in figuring out what was going on with some batch loading I’m doing through GORM. I wish I had found it a couple of days ago :).
[…] reading through a couple of very helpful posts by Burt Beckwith, and reading through the Hibernate chapter on batch processing, we have all […]
Hi Burt,
Where does this piece of code:
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
need to go…is it related to every domain class access?
Thanks
M
Mark –
Typically when doing batch imports in Hibernate you flush and clear the session regularly, e.g. every 25 or 100 inserts or so. I put the call to clear PROPERTY_INSTANCE_MAP in there, e.g.:
but you could also call it after every save() since as long as that’s successful you no longer need the associated errors.
I’m trying to implement this solution but the code is throwing an exception as below.
My domain is:
class RealEstateLead extends Lead{
Integer bedrooms = 0
Integer minArea
Integer maxPrice
Boolean finance = null
Boolean readyToLive = null
Boolean underConstruction = null
Set district = new HashSet()
FamilyIncomeRange familyIncomeRange
BigDecimal downPayment
/* 1-Venda 2-Aluguel 3-Aluguel por Temporada 4-Aluguel de quartos*/
Integer negotiationTypeId
DealType negotiationType
Integer realEstateTypeId
RealEstateType realEstateType
static hasMany = [district: Region]
static embedded = [‘familyIncomeRange’]
static transients = [
‘negotiationType’,
‘realEstateType’
]
static mapping = {
district lazy: false, cache: true, fetch: ‘join’
}
The batch process just update a group of objects that matches some criterias.
Is there a way to avoid the relationships just for this update?
Because I just need to update the status field.
leadsToNotify.eachWithIndex { lead, index ->
lead.status = LeadStatus.EXPIRED.value
lead.save(flush:true)
Message: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]; nested exception is org.hibernate.NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
Line | Method
->> 199 | run in org.quartz.core.JobRunShell
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
Caused by HibernateSystemException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]; nested exception is org.hibernate.NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
->> 78 | doCall in br.com.fisgo.ActiveExpirationService$_executeList_closure1
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
| 70 | executeList in br.com.fisgo.ActiveExpirationService
| 45 | executeAll in ”
| 25 | execute in br.com.fisgo.jobs.FindLeadsToExpireJob
| 199 | run in org.quartz.core.JobRunShell
^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
Caused by NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
->> 78 | doCall in br.com.fisgo.ActiveExpirationService$_executeList_closure1
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
| 70 | executeList in br.com.fisgo.ActiveExpirationService
| 45 | executeAll in ”
| 25 | execute in br.com.fisgo.jobs.FindLeadsToExpireJob
| 199 | run in org.quartz.core.JobRunShell
^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
[…] This post was mentioned on Twitter by Ryuichi Ohtaki, ふむふむ. ふむふむ said: RT rohtaki: A Grails Memory Leak http://bit.ly/8Mbpwy なるほど。これでバッチ処理でメモリーリークするわけだ。 […]
You (and Graeme) saved my day. Thanks !
Hi Burt
In grails 1.3.6 Is the call to DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear() still needed when doing batch imports in Hibernate?
Thanks
Bala
Hi Burt,
First of all, great post. This was really helpful during my current project.
I did run into a small issue when applying this technique in my Grails 1.3.7 application. As you had originally reported, the objects I create during a batch process were not getting cleaned up. It appears to be related to the “error” property on the domain classes.
I tried adding DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear(), but it did not help clean out the objects. I checked out the code, and in DomainClassGrailsPlugin around line 118 I saw that Grails is now putting these references to the domain objects in RequestContextHolder.getRequestAttributes().
I added to the same batch cleanup step RequestContextHolder.resetRequestAttributes(). It seems to have done the job. I can’t vouch for the complete correctness, that is something I’ll leave to the experts, but for now I’m seeing the memory behavior I expect.
It looks like this has been cleaned up in 2.0 (http://jira.grails.org/browse/GRAILS-6828) so I’m probably going to limp along on the hack I mentioned above until I can upgrade.
Steve
Hi Steve
how did you manage to do RequestContextHolder.resetRequestAttributes()
because i get org.codehaus.groovy.grails.exceptions.GrailsRuntimeException: java.lang.IllegalStateException: No thread-bound request found
RequestContextHolder.resetRequestAttributes() worked for me. Back to a nice saw tooth patern.
I did squirrel away the current attributes..
def ra = RequestContextHolder.currentRequestAttributes()
batchService.longRunningStuff() // RequestContextHolder.resetRequestAttributes() called in here
RequestContextHolder.setRequestAttributes(ra)
[…] Burt Beckwith […]