A Grails Memory Leak

I’m in the process of building a large database to load-test our application at work. Index usage and execution plans are quite different with many rows than with a few, so I’m creating millions of rows in tables for domain classes that will see a lot of activity to find missing indexes, inefficient queries, etc.

I used the typical strategy for Hibernate batch processing – flush() and clear() the Session periodically to push the data to the database and free up memory used by domain instances. But even with a 1-gig heap, the process kept running out of memory after several hours, and was running very slowly for the last couple of hours – typically indicative of lots of garbage-collection thrashing.

Profiling the process using YourKit showed that domain instance counts were steadily increasing. Repeatedly running gc() had no effect on these instances, although it clearly ran since memory usage and other object counts dropped (note that calls to gc() can’t force garbage collection, it’s just a request).

The code and domain classes aren’t complicated and didn’t have any obvious couplings that could cause instances to not be garbage collected, and I kept ripping out more and more code to try to find what was causing the problem. Finally I got it down to the smallest code block that illustrated the problem:

for (int i = 0; i < 100000; i++) {
   new Thing(name: "thing_${i}")
   Thread.sleep(50) // to allow time to watch things in the profiler
}

I ran this in the Grails console and with Hibernate 2nd-level caching turned off, both in a transaction and not (to eliminate the transaction as the culprit). Just creating the instances and not saving them – in this code Hibernate isn’t involved at all – triggered the problem.

So I described the issue on the Grails User mailing list and after an interesting discussion Graeme mentioned that it had to do with the validation Errors objects. He suggested setting errors to null after the instances were no longer used, i.e

for (int i = 0; i < 100000; i++) {
   Thing thing = new Thing(name: "thing_${i}")
   thing.errors = null
   Thread.sleep(50) // to allow time to watch things in the profiler
}

This should work in a web context, but I was working in the console, so I got a NullPointerException. The problem was clear when I looked at the MetaClass code in DomainClassGrailsPlugin:

static final PROPERTY_INSTANCE_MAP =
     new org.codehaus.groovy.grails.support.SoftThreadLocalMap()

metaClass.setErrors = { Errors errors ->
   def request = RCH.getRequestAttributes()?.request
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   storage[key] = errors
}

metaClass.getErrors = {->
   def request = RCH.getRequestAttributes()?.request
   def errors
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   errors = storage[key]
   if(!errors) {
      errors =  new BeanPropertyBindingResult(
              delegate, delegate.getClass().getName())
      storage[key] = errors
   }
   errors
}

Instead of storing the Errors in the Request, which in the console doesn’t exist, it stores it in a Thread-local Commons Collections ReferenceMap. This map doesn’t accept nulls, so instead I had to access it directly, i.e.

DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()

This worked – memory usage was fine and I could see that instances were being garbage-collected while running under the profiler.

But this begs the question – why was I in this mess in the first place? Why are there Errors objects involved if I’m not calling validate() or save(), just creating new instances?

ControllersGrailsPlugin wires up a DataBindingDynamicConstructor that takes a Map to allow creating domain instances like this:

def foo = new Foo(prop1: 123, prop2: 'foo', prop3: new Date())

This dynamic constructor copies the map data using a GrailsDataBinder into the new instance in DataBindingUtils.bindObjectToInstance(), and at the end calls setErrors() passing the BindingResult (which implements Errors) generated from copying the data. These are stored in the Map (or as a Request attribute if running in a web context) and this ended up being the reference that kept the instances from being garbage collection candidates even though they were no longer in scope in the loop.

So, mystery solved. In the end this is all academic for me since I needed to get things working so I rewrote the data population code using JDBC (and Java) and it runs quite a bit faster than the equivalent Grails/Groovy/GORM code. This isn’t usually an issue for small web requests but for a process that takes many hours it definitely makes a difference.

This entry was posted on Sunday, September 07th, 2008 at 8:58pm and is filed under grails, hibernate, java. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

15 Responses to “A Grails Memory Leak”

Jack says:

September 23, 2008 at 9:58 pm

Great write up on this, always good to see the solution illustrated so clearly.
Trails says:

July 25, 2009 at 11:18 am

I’m running into a memory leak on my grails app.

I have a pretty substantial amount of data processing, including domain object instantiation, running inside a quartz job. If things are being persisted indefinitely in thread local, this is probably the cause.

Very informative, thanks for the post.
Ted Naleid says:

September 12, 2009 at 3:08 pm

Thanks Burt, this post was a big help in figuring out what was going on with some batch loading I’m doing through GORM. I wish I had found it a couple of days ago :).
Ted Naleid » Batch Import Performance with Grails and MySQL says:

October 1, 2009 at 12:54 am

[…] reading through a couple of very helpful posts by Burt Beckwith, and reading through the Hibernate chapter on batch processing, we have all […]
Mark O'Brien says:

November 6, 2009 at 8:06 am

Hi Burt,

Where does this piece of code:
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()

need to go…is it related to every domain class access?

Thanks
M
Burt says:

November 6, 2009 at 2:09 pm

Mark –
Typically when doing batch imports in Hibernate you flush and clear the session regularly, e.g. every 25 or 100 inserts or so. I put the call to clear PROPERTY_INSTANCE_MAP in there, e.g.:
```
for (int i = 0; i < count; i++) {
  new Thing(...).save()
  if (i > 0 && i % BATCH_SIZE == 0) {
    session.flush()
    session.clear()
    DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
  }
}
```
but you could also call it after every save() since as long as that’s successful you no longer need the associated errors.
- eduardo says:
  
  July 6, 2012 at 8:39 am
  
  I’m trying to implement this solution but the code is throwing an exception as below.
  
  My domain is:
  
  class RealEstateLead extends Lead{
  
  Integer bedrooms = 0
  Integer minArea
  Integer maxPrice
  Boolean finance = null
  Boolean readyToLive = null
  Boolean underConstruction = null
  Set district = new HashSet()
  FamilyIncomeRange familyIncomeRange
  BigDecimal downPayment
  
  /* 1-Venda 2-Aluguel 3-Aluguel por Temporada 4-Aluguel de quartos*/
  Integer negotiationTypeId
  DealType negotiationType
  
  Integer realEstateTypeId
  RealEstateType realEstateType
  
  static hasMany = [district: Region]
  
  static embedded = [‘familyIncomeRange’]
  
  static transients = [
  ‘negotiationType’,
  ‘realEstateType’
  ]
  
  static mapping = {
  district lazy: false, cache: true, fetch: ‘join’
  }
  
  The batch process just update a group of objects that matches some criterias.
  Is there a way to avoid the relationships just for this update?
  Because I just need to update the status field.
  
  leadsToNotify.eachWithIndex { lead, index ->
  lead.status = LeadStatus.EXPIRED.value
  lead.save(flush:true)
  
  Message: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]; nested exception is org.hibernate.NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
  Line | Method
  ->> 199 | run in org.quartz.core.JobRunShell
  – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
  ^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
  Caused by HibernateSystemException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]; nested exception is org.hibernate.NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
  ->> 78 | doCall in br.com.fisgo.ActiveExpirationService$_executeList_closure1
  – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
  | 70 | executeList in br.com.fisgo.ActiveExpirationService
  | 45 | executeAll in ”
  | 25 | execute in br.com.fisgo.jobs.FindLeadsToExpireJob
  | 199 | run in org.quartz.core.JobRunShell
  ^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
  Caused by NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
  ->> 78 | doCall in br.com.fisgo.ActiveExpirationService$_executeList_closure1
  – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
  | 70 | executeList in br.com.fisgo.ActiveExpirationService
  | 45 | executeAll in ”
  | 25 | execute in br.com.fisgo.jobs.FindLeadsToExpireJob
  | 199 | run in org.quartz.core.JobRunShell
  ^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
Tweets that mention An Army of Solipsists -- Topsy.com says:

January 20, 2010 at 8:20 pm

[…] This post was mentioned on Twitter by Ryuichi Ohtaki, ふむふむ. ふむふむ said: RT rohtaki: A Grails Memory Leak http://bit.ly/8Mbpwy なるほど。これでバッチ処理でメモリーリークするわけだ。 […]
Cédric Champeau says:

October 26, 2010 at 5:16 am

You (and Graeme) saved my day. Thanks !
Bala says:

December 23, 2010 at 6:55 pm

Hi Burt

In grails 1.3.6 Is the call to DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear() still needed when doing batch imports in Hibernate?

Thanks
Bala
Steve Ronderos says:

December 21, 2011 at 1:38 am

Hi Burt,

First of all, great post. This was really helpful during my current project.

I did run into a small issue when applying this technique in my Grails 1.3.7 application. As you had originally reported, the objects I create during a batch process were not getting cleaned up. It appears to be related to the “error” property on the domain classes.

I tried adding DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear(), but it did not help clean out the objects. I checked out the code, and in DomainClassGrailsPlugin around line 118 I saw that Grails is now putting these references to the domain objects in RequestContextHolder.getRequestAttributes().

I added to the same batch cleanup step RequestContextHolder.resetRequestAttributes(). It seems to have done the job. I can’t vouch for the complete correctness, that is something I’ll leave to the experts, but for now I’m seeing the memory behavior I expect.

It looks like this has been cleaned up in 2.0 (http://jira.grails.org/browse/GRAILS-6828) so I’m probably going to limp along on the hack I mentioned above until I can upgrade.

Steve
Mick says:

March 23, 2012 at 3:55 am

Hi Steve

how did you manage to do RequestContextHolder.resetRequestAttributes()

because i get org.codehaus.groovy.grails.exceptions.GrailsRuntimeException: java.lang.IllegalStateException: No thread-bound request found
Randy says:

April 4, 2012 at 10:39 pm

RequestContextHolder.resetRequestAttributes() worked for me. Back to a nice saw tooth patern.
Randy says:

April 4, 2012 at 10:43 pm

I did squirrel away the current attributes..

def ra = RequestContextHolder.currentRequestAttributes()
batchService.longRunningStuff() // RequestContextHolder.resetRequestAttributes() called in here
RequestContextHolder.setRequestAttributes(ra)
Bulk Insert With Grails « The Geek Credential says:

May 25, 2012 at 2:30 pm

[…] Burt Beckwith […]

A Grails Memory Leak

15 Responses to “A Grails Memory Leak”

Search

Pages

Categories

Archive

Calendar

Blogroll

Meta

April 2024
M	T	W	T	F	S	S
« Dec
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30