A Grails Memory Leak

I’m in the process of building a large database to load-test our application at work. Index usage and execution plans are quite different with many rows than with a few, so I’m creating millions of rows in tables for domain classes that will see a lot of activity to find missing indexes, inefficient queries, etc.

I used the typical strategy for Hibernate batch processing – flush() and clear() the Session periodically to push the data to the database and free up memory used by domain instances. But even with a 1-gig heap, the process kept running out of memory after several hours, and was running very slowly for the last couple of hours – typically indicative of lots of garbage-collection thrashing.

Profiling the process using YourKit showed that domain instance counts were steadily increasing. Repeatedly running gc() had no effect on these instances, although it clearly ran since memory usage and other object counts dropped (note that calls to gc() can’t force garbage collection, it’s just a request).

The code and domain classes aren’t complicated and didn’t have any obvious couplings that could cause instances to not be garbage collected, and I kept ripping out more and more code to try to find what was causing the problem. Finally I got it down to the smallest code block that illustrated the problem:

for (int i = 0; i < 100000; i++) {
   new Thing(name: "thing_${i}")
   Thread.sleep(50) // to allow time to watch things in the profiler
}

I ran this in the Grails console and with Hibernate 2nd-level caching turned off, both in a transaction and not (to eliminate the transaction as the culprit). Just creating the instances and not saving them – in this code Hibernate isn’t involved at all – triggered the problem.

So I described the issue on the Grails User mailing list and after an interesting discussion Graeme mentioned that it had to do with the validation Errors objects. He suggested setting errors to null after the instances were no longer used, i.e

for (int i = 0; i < 100000; i++) {
   Thing thing = new Thing(name: "thing_${i}")
   thing.errors = null
   Thread.sleep(50) // to allow time to watch things in the profiler
}

This should work in a web context, but I was working in the console, so I got a NullPointerException. The problem was clear when I looked at the MetaClass code in DomainClassGrailsPlugin:

static final PROPERTY_INSTANCE_MAP =
     new org.codehaus.groovy.grails.support.SoftThreadLocalMap()

metaClass.setErrors = { Errors errors ->
   def request = RCH.getRequestAttributes()?.request
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   storage[key] = errors
}

metaClass.getErrors = {->
   def request = RCH.getRequestAttributes()?.request
   def errors
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   errors = storage[key]
   if(!errors) {
      errors =  new BeanPropertyBindingResult(
              delegate, delegate.getClass().getName())
      storage[key] = errors
   }
   errors
}

Instead of storing the Errors in the Request, which in the console doesn’t exist, it stores it in a Thread-local Commons Collections ReferenceMap. This map doesn’t accept nulls, so instead I had to access it directly, i.e.

DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()

This worked – memory usage was fine and I could see that instances were being garbage-collected while running under the profiler.

But this begs the question – why was I in this mess in the first place? Why are there Errors objects involved if I’m not calling validate() or save(), just creating new instances?

ControllersGrailsPlugin wires up a DataBindingDynamicConstructor that takes a Map to allow creating domain instances like this:

def foo = new Foo(prop1: 123, prop2: 'foo', prop3: new Date())

This dynamic constructor copies the map data using a GrailsDataBinder into the new instance in DataBindingUtils.bindObjectToInstance(), and at the end calls setErrors() passing the BindingResult (which implements Errors) generated from copying the data. These are stored in the Map (or as a Request attribute if running in a web context) and this ended up being the reference that kept the instances from being garbage collection candidates even though they were no longer in scope in the loop.

So, mystery solved. In the end this is all academic for me since I needed to get things working so I rewrote the data population code using JDBC (and Java) and it runs quite a bit faster than the equivalent Grails/Groovy/GORM code. This isn’t usually an issue for small web requests but for a process that takes many hours it definitely makes a difference.

15 Responses to “A Grails Memory Leak”

  1. Jack says:

    Great write up on this, always good to see the solution illustrated so clearly.

  2. Trails says:

    I’m running into a memory leak on my grails app.

    I have a pretty substantial amount of data processing, including domain object instantiation, running inside a quartz job. If things are being persisted indefinitely in thread local, this is probably the cause.

    Very informative, thanks for the post.

  3. Ted Naleid says:

    Thanks Burt, this post was a big help in figuring out what was going on with some batch loading I’m doing through GORM. I wish I had found it a couple of days ago :) .

  4. [...] reading through a couple of very helpful posts by Burt Beckwith, and reading through the Hibernate chapter on batch processing, we have all [...]

  5. Mark O'Brien says:

    Hi Burt,

    Where does this piece of code:
    DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()

    need to go…is it related to every domain class access?

    Thanks
    M

  6. Burt says:

    Mark -
    Typically when doing batch imports in Hibernate you flush and clear the session regularly, e.g. every 25 or 100 inserts or so. I put the call to clear PROPERTY_INSTANCE_MAP in there, e.g.:

    for (int i = 0; i < count; i++) {
      new Thing(...).save()
      if (i > 0 && i % BATCH_SIZE == 0) {
        session.flush()
        session.clear()
        DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
      }
    }

    but you could also call it after every save() since as long as that’s successful you no longer need the associated errors.

    • eduardo says:

      I’m trying to implement this solution but the code is throwing an exception as below.

      My domain is:

      class RealEstateLead extends Lead{

      Integer bedrooms = 0
      Integer minArea
      Integer maxPrice
      Boolean finance = null
      Boolean readyToLive = null
      Boolean underConstruction = null
      Set district = new HashSet()
      FamilyIncomeRange familyIncomeRange
      BigDecimal downPayment

      /* 1-Venda 2-Aluguel 3-Aluguel por Temporada 4-Aluguel de quartos*/
      Integer negotiationTypeId
      DealType negotiationType

      Integer realEstateTypeId
      RealEstateType realEstateType

      static hasMany = [district: Region]

      static embedded = ['familyIncomeRange']

      static transients = [
      'negotiationType',
      'realEstateType'
      ]

      static mapping = {
      district lazy: false, cache: true, fetch: ‘join’
      }

      The batch process just update a group of objects that matches some criterias.
      Is there a way to avoid the relationships just for this update?
      Because I just need to update the status field.

      leadsToNotify.eachWithIndex { lead, index ->
      lead.status = LeadStatus.EXPIRED.value
      lead.save(flush:true)

      Message: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]; nested exception is org.hibernate.NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
      Line | Method
      ->> 199 | run in org.quartz.core.JobRunShell
      - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - –
      ^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
      Caused by HibernateSystemException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]; nested exception is org.hibernate.NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
      ->> 78 | doCall in br.com.fisgo.ActiveExpirationService$_executeList_closure1
      - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - –
      | 70 | executeList in br.com.fisgo.ActiveExpirationService
      | 45 | executeAll in ”
      | 25 | execute in br.com.fisgo.jobs.FindLeadsToExpireJob
      | 199 | run in org.quartz.core.JobRunShell
      ^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread
      Caused by NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [br.com.fisgo.Region#45179]
      ->> 78 | doCall in br.com.fisgo.ActiveExpirationService$_executeList_closure1
      - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - – - –
      | 70 | executeList in br.com.fisgo.ActiveExpirationService
      | 45 | executeAll in ”
      | 25 | execute in br.com.fisgo.jobs.FindLeadsToExpireJob
      | 199 | run in org.quartz.core.JobRunShell
      ^ 546 | run in org.quartz.simpl.SimpleThreadPool$WorkerThread

  7. [...] This post was mentioned on Twitter by Ryuichi Ohtaki, ふむふむ. ふむふむ said: RT rohtaki: A Grails Memory Leak http://bit.ly/8Mbpwy なるほど。これでバッチ処理でメモリーリークするわけだ。 [...]

  8. You (and Graeme) saved my day. Thanks !

  9. Bala says:

    Hi Burt

    In grails 1.3.6 Is the call to DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear() still needed when doing batch imports in Hibernate?

    Thanks
    Bala

  10. Steve Ronderos says:

    Hi Burt,

    First of all, great post. This was really helpful during my current project.

    I did run into a small issue when applying this technique in my Grails 1.3.7 application. As you had originally reported, the objects I create during a batch process were not getting cleaned up. It appears to be related to the “error” property on the domain classes.

    I tried adding DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear(), but it did not help clean out the objects. I checked out the code, and in DomainClassGrailsPlugin around line 118 I saw that Grails is now putting these references to the domain objects in RequestContextHolder.getRequestAttributes().

    I added to the same batch cleanup step RequestContextHolder.resetRequestAttributes(). It seems to have done the job. I can’t vouch for the complete correctness, that is something I’ll leave to the experts, but for now I’m seeing the memory behavior I expect.

    It looks like this has been cleaned up in 2.0 (http://jira.grails.org/browse/GRAILS-6828) so I’m probably going to limp along on the hack I mentioned above until I can upgrade.

    Steve

  11. Mick says:

    Hi Steve

    how did you manage to do RequestContextHolder.resetRequestAttributes()

    because i get org.codehaus.groovy.grails.exceptions.GrailsRuntimeException: java.lang.IllegalStateException: No thread-bound request found

  12. Randy says:

    RequestContextHolder.resetRequestAttributes() worked for me. Back to a nice saw tooth patern.

  13. Randy says:

    I did squirrel away the current attributes..

    def ra = RequestContextHolder.currentRequestAttributes()
    batchService.longRunningStuff() // RequestContextHolder.resetRequestAttributes() called in here
    RequestContextHolder.setRequestAttributes(ra)

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.