Archive for the Category 'hibernate'

Grails App-Info plugin

Saturday, July 17th, 2010

Update July 5, 2012: See this blog post for an updated test app and support for Grails 2.0+.

I released a new Grails plugin today, app-info. This plugin exposes a lot of internal information for a Grails application. Most of the code for this plugin dates back a long time and was originally mostly JSPs that I would copy into whatever application I was working on to give myself a view into what’s going on. Over time as I started using Spring and Hibernate I added more pages to display that information, and it now also shows information about Grails too.
(more…)

Delayed SessionFactory Creation in Grails

Monday, January 25th, 2010

The topic of delaying DataSource and SessionFactory creation until some point after startup has come up a few times on the Grails user mailing list so I thought I’d give it a shot. I got it working, but it’s not pretty.

Grails (and Hibernate) will create up to three connections during initialization so the primary focus is to avoid those. In addition the DataSource will pre-instantiate connections, so we’ll delay those as well.
(more…)

Clustering Grails

Saturday, December 19th, 2009

I did a talk on Grails clustering at the Groovy & Grails eXchange 2009 in London last week and wanted to put up the slides and the sample application and clustering scripts. This was the third time I’ve given a version of this talk (first at a Boston Grails meetup and again at SpringOne 2GX) so I’m way overdue getting this up.
(more…)

Boston Grails Meetup GORM Presentation

Thursday, August 13th, 2009

I gave a talk last week at the Boston Grails meetup and wanted to share the slides. There were two code demonstrations in the Grails console so I’ll describe those here.

The general gist of the presentation was that mapped collections in Hibernate (and therefore Grails, since GORM wraps Hibernate) are a potentially risky choice for representing One-to-Many and Many-to-Many relationships.

To see what’s going on I added

loggingSql = true

to DataSource.groovy so we’d see the generated SQL in the console when comparing the standard and modified versions of the code.

First we create a Library and associate 20 (a completely arbitrary number) Visits:

def library = new Library(name: 'the library')
20.times {
   library.addToVisits(new Visit(personName: "person$it"))
   library.save(flush: true)
}

Clear the output window and add one more Visit:

def library = Library.list()[0]
library.addToVisits(new Visit(personName: "new person"))
library.save(flush: true)

and you’ll see SQL similar to

select this_.id as id3_0_, this_.version as version3_0_, this_.name as name3_0_ 
from library this_;

select visits0_.library_id as library3_1_, visits0_.id as id1_, visits0_.id as id2_0_,
   visits0_.version as version2_0_, visits0_.library_id as library3_2_0_,
   visits0_.person_name as person4_2_0_, visits0_.visit_date as visit5_2_0_ 
from visit visits0_  where visits0_.library_id=?;

insert  into visit (id, version, library_id, person_name, visit_date) 
values (null, ?, ?, ?, ?);

call identity();

update library
set version=?, name=?
where id=? and version=?;

which means that it’s first loading the Library, then loading all of its previous Visits (!), then creating the new Visit, and finally bumping the version of the Library since it’s “changed”. Not good – not only do we incur a potentially huge performance hit to add one new instance (once there are a large number already), we also run a high risk of a false optimistic locking failure.

Doing the equivalent work with the updated versions:

def library = new Library(name: 'the library').save()
20.times {
    new Visit(personName: "person$it", library: library).save(flush: true)
}

and

def library = Library.list()[0]
new Visit(personName: "new person", library: library).save(flush: true)

we get SQL that makes a lot more sense:

select this_.id as id2_0_, this_.version as version2_0_, this_.name as name2_0_ 
from library this_;

insert into visit (id, version, library_id, person_name, visit_date) 
values (null, ?, ?, ?, ?);

call identity();

First the Library is loaded, and then the Visit is created. No other Visit instances need to be loaded since we know the new instance is unique, and there’s no need to version the Library.

For the Many-to-Many example, we create a Role and grant it to 20 Users:

Role role = new Role(name: 'ROLE_USER').save()
20.times {
   def user = new User(username: "user$it").save()
   user.addToRoles(role)
   user.save(flush: true)
}

and clear the console and grant the Role to one more User:

def role = Role.list()[0]
def user = new User(username: "new person").save()
user.addToRoles(role)
user.save(flush: true)

and you’ll see SQL similar to

select this_.id as id0_0_, this_.version as version0_0_, this_.name as name0_0_
from role this_;

insert into user (id, version, username) 
values (null, ?, ?);

call identity();

select users0_.role_id as role1_0_, users0_.user_id as user2_0_ 
from user_roles users0_
where users0_.role_id=?;

update role
set version=?, name=?
where id=?  and version=?;

update user 
set version=?, username=? 
where id=? and version=?;

insert into user_roles (user_id, role_id) 
values (?, ?);

So we can see that it first loads the Role, then creates the User, then loads all of the previous Users that have that Role (proxies in this case, but still a lot of instances for no good reason), then it bumps the version for both the Role and the User, and finally adds a row to the join table.

Doing the same work with the updated versions:

Role role = new Role(name: 'ROLE_USER').save()
20.times {
	def user = new User(username: "user$it").save()
	UserRole.create(user, role, true)
}

and

def role = Role.list()[0]
def user = new User(username: "new user").save()
UserRole.create(user, role, true)

we get more reasonable SQL:

select this_.id as id4_0_, this_.version as version4_0_, this_.name as name4_0_ 
from role this_;

insert into user(id, version, username)
values (null, ?, ?);

call identity();

select userrole_.user_id, userrole_.role_id
from user_role userrole_
where userrole_.user_id=? and userrole_.role_id=?;

insert into user_role(user_id, role_id) 
values (?, ?);

where we can see that it loads the Role, creates the User, does a lightweight unique check in the join table, and then inserts the join table row.


Download the presentation here and the sample projects at here and here.


An Alternative Approach for Grails Integration Tests

Tuesday, December 30th, 2008

I like the approach that Grails takes for integration (and unit) testing; it’s similar to the testing strategy that I’ve used in non-Grails apps. Using HSQLDB to create an in-memory test database significantly reduces the time required to run database tests, and Hibernate’s database independence means you don’t need to pollute application code to make it testable.

Thomas Mueller originally wrote Hypersonic SQL but he ended up discontinuing it, and then other developers picked up where he left off, creating HSQLDB. For the past few years Mueller has been developing H2 from scratch and it has benefited from hindsight and lessons learned from Hypersonic and HSQLDB. According to the HSQLDB website there are some cool new features coming in 1.9 and 2.0, but H2 is already significantly more advanced. So I’ve chosen to use H2 here, but it would be pretty much the same with HSQLDB – just change the configuration properties.


There are many options for database testing; the approach Grails takes is to start a transaction for each test and roll it back afterwards. The approach I’d previously used was to rebuild the database between tests. Since the database is in-memory this is very fast and allows what I think is a more natural coding style. Since each test starts with a guaranteed clean database, there’s no need for a single wrapper transaction – I run everything in autocommit mode, similar to the way application code works in the app. If transactional behavior is required, just group multiple statements in a transaction – Grails makes it simple with withTransaction blocks and with transactional Services.

The key to this approach is a base test class that extends GroovyTestCase and which all Domain class test classes (or any test that accesses the database) extend:

import groovy.sql.Sql

import org.apache.log4j.Logger
import org.codehaus.groovy.grails.orm.hibernate.cfg.DefaultGrailsDomainConfiguration
import org.hibernate.cfg.Configuration

import org.hibernate.tool.hbm2ddl.SchemaExport

/**
 * Abstract base class for H2-based integration tests.
 * Rebuilds an in-memory database for each test.
 */
abstract class AbstractIntegrationTestCase extends GroovyTestCase {

  private static Configuration _configuration

  /** Disable Grails transaction around the tests. */
  def transactional = false

  def dataSource
  def grailsApplication
  def messageSource
  def sessionFactory

  /**
   * Replace the base class logger with one that logs using the actual class name.
   */
  protected final Logger log = Logger.getLogger(getClass())

  /**
   * {@inheritDoc}
   * @see junit.framework.TestCase#setUp()
   */
  @Override
  protected void setUp() {
    super.setUp()

    if (!_configuration) {
      // 1-time creation of the configuration

      Properties properties = new Properties()
      properties.setProperty 'hibernate.connection.driver_class', 'org.h2.Driver'
      properties.setProperty 'hibernate.connection.username', 'sa'
      properties.setProperty 'hibernate.connection.password', ''
      properties.setProperty 'hibernate.connection.url', 'jdbc:h2:mem:test'
      properties.setProperty 'hibernate.dialect', 'org.hibernate.dialect.H2Dialect'

      _configuration = new DefaultGrailsDomainConfiguration(
        grailsApplication: grailsApplication,
        properties: properties)
    }

    // rebuild the database before each test.
    new SchemaExport(_configuration).create(false, true)
    loadTestData dataSource

    clearSession()
  }

  protected void loadTestData(dataSource) {
    // override as necessary; by default run GORM create statements,
    // but could also load sql script(s)
  }

  /**
   * Run a JDBC query independent of Hibernate to verify the table count.
   * @param table  the table to query
   * @param where  optional WHERE clause
   * @return  the count
   */
  protected int findTableCount(String table, String where = null)  {

    String sqlSelect = "SELECT COUNT(0) FROM " + table
    if (where) {
      sqlSelect += " WHERE " + where
    }

    Sql.newInstance(dataSource).firstRow(sqlSelect)[0]
  }

  /**
   * Save or create the bean if it validates, and log validation errors otherwise.
   * @param bean  the bean
   * @param flush  if <code>true</code>, flush on save/update
   * @param throwException  if <code>true</code>, throw an exception if
   *    validation fails instead of returning null
   * @return  the saved/updated instance or <code>null</code>
   *    if there was a problem
   */
  protected def saveOrUpdate(bean, flush = false, throwException = true) {
    bean.validate()
    if (bean.hasErrors()) {
      println "problem creating/updating bean: ${bean}"
      def locale = Locale.getDefault()
      bean.errors.each { fieldErrors ->
        fieldErrors.allErrors.each { error ->
          println messageSource.getMessage(error, locale)
        }
      }
      if (throwException) {
        fail 'validation failed in saveOrUpdate'
      }
      return null
    }

    return bean.save(flush: flush)
  }

  protected void clearSession() {
    sessionFactory.currentSession.clear()
  }

  protected def getDomainClass() {
    return grailsApplication.getArtefact('Domain', getDomainClassName())
  }

  protected String getDomainClassName() {
    return getClass().name - 'Tests'
  }
}

It sets transactional = false to keep Grails from starting a transaction around each test. There’s a static variable to ensure that the one-time Configuration creation runs just once for all tests, and it runs SchemaExport.create() to run the drop and create DDL statements to create new empty tables for each test. A protected loadTestData() method is used to populate test data. Individual tests might need specific data but often there’s a core set of data that all tests use – this is good place to populate that shared data and to override as necessary.

Other utility methods include findTableCount() (with an optional WHERE clause) to check table row counts independent of Hibernate, saveOrUpdate() to save or update Domain class instances and log validation errors to help debug data population code issues, and clearSession() to clear the Hibernate Session’s 1st-level cache (to force reloading saved instances).

Here’s some sample test code from a test class:

void testWhatever() {

   Thing.withTransaction { status ->
      saveOrUpdate new Thing(...)
      saveOrUpdate new OtherThing(...)
      ...
   }

   clearSession()

   assertEquals ...
   assertTrue ...
}

but transactions aren’t required:

void testSomethingElse() {

   saveOrUpdate new Thing(...)
   assertEquals 5, Thing.count()
   assertTrue ...
}

For the most part I find that the test code I write using this isn’t much different from what I’d write in a regular Grails app. But it is definitely more flexible, and I find it’s more intuitive.

A Grails Memory Leak

Sunday, September 07th, 2008

I’m in the process of building a large database to load-test our application at work. Index usage and execution plans are quite different with many rows than with a few, so I’m creating millions of rows in tables for domain classes that will see a lot of activity to find missing indexes, inefficient queries, etc.

I used the typical strategy for Hibernate batch processing – flush() and clear() the Session periodically to push the data to the database and free up memory used by domain instances. But even with a 1-gig heap, the process kept running out of memory after several hours, and was running very slowly for the last couple of hours – typically indicative of lots of garbage-collection thrashing.

Profiling the process using YourKit showed that domain instance counts were steadily increasing. Repeatedly running gc() had no effect on these instances, although it clearly ran since memory usage and other object counts dropped (note that calls to gc() can’t force garbage collection, it’s just a request).

The code and domain classes aren’t complicated and didn’t have any obvious couplings that could cause instances to not be garbage collected, and I kept ripping out more and more code to try to find what was causing the problem. Finally I got it down to the smallest code block that illustrated the problem:

for (int i = 0; i < 100000; i++) {
   new Thing(name: "thing_${i}")
   Thread.sleep(50) // to allow time to watch things in the profiler
}

I ran this in the Grails console and with Hibernate 2nd-level caching turned off, both in a transaction and not (to eliminate the transaction as the culprit). Just creating the instances and not saving them – in this code Hibernate isn’t involved at all – triggered the problem.

So I described the issue on the Grails User mailing list and after an interesting discussion Graeme mentioned that it had to do with the validation Errors objects. He suggested setting errors to null after the instances were no longer used, i.e

for (int i = 0; i < 100000; i++) {
   Thing thing = new Thing(name: "thing_${i}")
   thing.errors = null
   Thread.sleep(50) // to allow time to watch things in the profiler
}

This should work in a web context, but I was working in the console, so I got a NullPointerException. The problem was clear when I looked at the MetaClass code in DomainClassGrailsPlugin:

static final PROPERTY_INSTANCE_MAP =
     new org.codehaus.groovy.grails.support.SoftThreadLocalMap()

metaClass.setErrors = { Errors errors ->
   def request = RCH.getRequestAttributes()?.request
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   storage[key] = errors
}

metaClass.getErrors = {->
   def request = RCH.getRequestAttributes()?.request
   def errors
   def storage = request ? request : PROPERTY_INSTANCE_MAP.get()
   def key = "org.codehaus.groovy.grails.ERRORS_${delegate.class.name}_${System.identityHashCode(delegate)}"
   errors = storage[key]
   if(!errors) {
      errors =  new BeanPropertyBindingResult(
              delegate, delegate.getClass().getName())
      storage[key] = errors
   }
   errors
}

Instead of storing the Errors in the Request, which in the console doesn’t exist, it stores it in a Thread-local Commons Collections ReferenceMap. This map doesn’t accept nulls, so instead I had to access it directly, i.e.

DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()

This worked – memory usage was fine and I could see that instances were being garbage-collected while running under the profiler.

But this begs the question – why was I in this mess in the first place? Why are there Errors objects involved if I’m not calling validate() or save(), just creating new instances?

ControllersGrailsPlugin wires up a DataBindingDynamicConstructor that takes a Map to allow creating domain instances like this:

def foo = new Foo(prop1: 123, prop2: 'foo', prop3: new Date())

This dynamic constructor copies the map data using a GrailsDataBinder into the new instance in DataBindingUtils.bindObjectToInstance(), and at the end calls setErrors() passing the BindingResult (which implements Errors) generated from copying the data. These are stored in the Map (or as a Request attribute if running in a web context) and this ended up being the reference that kept the instances from being garbage collection candidates even though they were no longer in scope in the loop.

So, mystery solved. In the end this is all academic for me since I needed to get things working so I rewrote the data population code using JDBC (and Java) and it runs quite a bit faster than the equivalent Grails/Groovy/GORM code. This isn’t usually an issue for small web requests but for a process that takes many hours it definitely makes a difference.

Using GORM outside of Grails part 2 – Swing

Sunday, September 07th, 2008

In an earlier post I wrote about getting GORM to work outside of Grails. It worked, but wasn’t really usable since it could only execute script files, so its usefulness was pretty limited in a real application. Greg Bond replied on the mailing (here and here) with some great enhancements that allowed it to work without script files. So I fleshed that out some more and now have a working implementation and even a Swing application to demonstrate.

The key was that Greg used grails compile to generate his domain class files. I’d just been using the groovyc Ant task, and when I looked at what the Grails compile script the difference turned out to be that Grails uses a Grails-aware subclass of Groovyc, org.codehaus.groovy.grails.compiler.GrailsCompiler. It allows you to specify a resourcePattern attribute to point at the domain class .groovy files for special treatment.

So now instead of one sample project there’s three. One is the gorm standalone project, which creates a jar (gorm_standalone.jar) containing GormHelper which bootstraps GORM. The other two are the sample application, split into GORM domain classes and the Swing application. The domain class application contains the domain class .groovy files plus DataSource.groovy, and most importantly an Ant script that builds a usable jar (domainclasses.jar). The Swing application uses gorm_standalone.jar and domainclasses.jar as libraries and displays a simple UI showing the results of database queries.

Here’s a quick screen shot:

You can download the sample apps here:

GORM standalone app

Sample app domain class app

Sample Swing app

A Grails Plugin for Multiple DataSources

Thursday, September 04th, 2008

There have been a few requests on the Grails user mailing list about using multiple data sources in a Grails app, i.e. some domain classes use one data source and database and others use another. Grails doesn’t directly support this – there’s only one DataSource and one SessionFactory, and all domain classes use them. But it turns out it’s not that difficult to support this (and it doesn’t involve too many ugly hacks …)


There are some implications of the approach I took. This doesn’t provide XA transactions, 2PC, etc. It’s just a partitioning of classes between two or more datasources. The way it works is to run after the HibernateGrailsPlugin and DomainClassPlugin have done their work. Then it uses a configuration defined in grails-app/conf/Datasources.groovy and creates one or more extra DataSource, SessionFactory, TransactionManager, etc. and re-runs the HibernateGrailsPlugin‘s doWithDynamicMethods closure for the appropriate subset of domain classes. This way when you call a magic GORM method (e.g. list(), get(), findByNameAndDate(), etc.) it will use the correct underlying datasource. Any domain class not defined as using a secondary datasource will use the ‘core’ datasource defined in DataSource.groovy.

Another issue is that all domain classes stay defined in the core datasource/SessionFactory – the existing behavior isn’t changed, other than redefining the metaclass methods to use another datasource. The only effect of this is that if you use dsCreate = ‘create-drop’ or ‘create’ or ‘update’ for the core datasource, all tables will be created in the core database even though some won’t be used.

Datasources DSL

The DSL used in Datasources.groovy is very similar to the format of DataSource.groovy. One difference is that the ‘hibernate’ section is inside the ‘datasource’ section, and there are a few extra attributes.

Name Type Required Description
name String yes datasource name, used as a Spring bean suffix, e.g. ‘ds2′
readOnly boolean no, defaults to false if true, the datasource and corresponding transactional services will be read-only
driverClassName String yes same as in DataSource
url String yes same as in DataSource
username String no same as in DataSource
password String no same as in DataSource
dbCreate String yes same as in DataSource
dialect String or Class yes (no autodetect) same as in DataSource
jndiName String no same as in DataSource
pooled boolean no, defaults to false same as in DataSource
loggingSql boolean no, defaults to false same as in DataSource
logSql boolean no, defaults to false same as in DataSource
environments List<String> no, defaults to [‘development’, ‘test’, ‘production’] list of environments this DataSource should be active in
domainClasses List<String> or List<Class> yes the domain classes that should use this DataSource
services List<String> no short names of the services that should use this DataSource (same as Spring bean without ‘Service’, e.g. ‘user’ for UserService)

See the sample app (link below) for a usage example.

OpenSessionInView

An OpenSessionInViewInterceptor is defined for each datasource, so the features that it provides are available to all domain classes. For example you can load a domain instance and set a property, and it will be detected as dirty and pushed to the database. Also, lazy loaded collections will load since there’s an active session available.

Further, if you save, create, load, etc. domain instances from multiple datasources in one controller method, all will work fine.

Transactional Services

By default, any service defined as transactional will use the core datasource. If you want a service to use a specific datasource, add its name to the ‘services’ attribute for a datasource definition. If there’s no one datasource for a particular service, you can still define programmatic transactions using withTransaction on any domain class for the appropriate datasource for each method or code block.

HibernateTemplate

I can’t think of non-contrived reasons to do so, but it’s possible to use a domain class in two or more datasources. The problem here is that the metaclass methods will end up mapped to the last declared datasource, so there’s no way to use GORM for the other datasource(s). However you can use Spring’s HibernateTemplate yourself – it has a lot of the functionality of GORM (GORM uses it under the hood). You can use the convenience method DatasourcesUtils.newHibernateTemplate(String dsName) to create a HibernateTemplate configured with the SessionFactory for the named datasource.

Usage

You might want to use this plugin even if you have only one database. Since you can define a datasource as being read-only and point read-only domain classes at it, your prevent yourself from accidentally creating, deleting, or updating instances.

I’ve created a basic (and rather contrived) test application. It has three datasources and five domain classes:

  • Country, State
  • Library, Book
  • Visit

Country and State are just lookup tables, so they use a read-only datasource. Visit has a weak foreign key to Library, but this is not enforced since they’re stored in two databases. It’s the responsibility of the application to ensure that the library id for each visit is valid. This is to simulate having a second database for auditing.

To test the app:

If you want to manually manage the tables in each database instead of letting Hibernate do it for you, you can use the schema-export script from here to capture the DDL for all the tables. Then you can execute the DDL statements for each database separately.


The plugin is in the Grails repository, so to use it just run “grails install-plugin datasources”. You can download the test application here.

A Gant Script to Call Hibernate’s SchemaExport

Monday, March 31st, 2008

Being new to Grails, I’ve been closely following the User and Dev mailing lists for tips and there was a recent discussion on the User list about accessing or generating the Hibernate schema script. The schema that Hibernate builds is good but typically DBAs need to customize things and the Hibernate output is a good starting point. It’s also useful to view the DDL between builds to monitor database changes.

In a traditional Spring/Hibernate app I use the schemaexport Ant task that comes with Hibernate Tools (this is a wrapper for SchemaExport), or by calling it directly if I need more control than the Ant task provides.

Running SchemaExport programmatically is pretty straightforward – you just have to build a Configuration from your domain classes (using hbm.xml files or annotated classes). But domain classes in Grails are (typically) .groovy files; I looked at the Grails/GORM code but couldn’t figure out how to properly build the Application, so the domain classes weren’t found and the Configuration was empty.

I’ve been reading the Grails source code trying to figure out how things work and I was looking at the code for the command-line scripts (e.g. “grails war”, “grails test-app”, etc.) and realized that I could borrow some setup code from there and get SchemaExport to work. It took a few hours to get going and a few more to distill it down to proper idiomatic Groovy (I am still a Java programmer …).

The code is below but you can download the script here. Some usage notes:

  • Usage: grails [environment] schema-export [generate | export] [stdout] [filename]
  • The action (generate or export), stdout option, and the filename are optional; if ommitted the action is generate, the filename defaults to ddl.sql in the project base dir, and ddl is only sent to stdout if the stdout option is present
  • Example 1: ‘grails schema-export’ generates ddl.sql for the default configuration
  • Example 2: ‘grails prod schema-export stdout’ generates ddl.sql for the ‘prod’ configuration, sending ddl to stdout in addition to the default file
  • Example 3: ‘grails dev schema-export c:/foo/bar/grails-app.sql’ generates the file ‘c:/foo/bar/grails-app.sql’ for the ‘dev’ configuration
  • Example 4: ‘grails prod schema-export export’ exports the schema (drop and create) using the ‘prod’ configuration
  • Example 5: ‘grails prod schema-export export stdout’ exports the schema using the ‘prod’ configuration, echoing the ddl to stdout
import org.hibernate.dialect.Dialect
import org.hibernate.dialect.DialectFactory
import org.hibernate.tool.hbm2ddl.SchemaExport
import org.codehaus.groovy.grails.orm.hibernate.cfg.DefaultGrailsDomainConfiguration
import org.springframework.jdbc.datasource.DriverManagerDataSource
import org.springframework.jdbc.support.JdbcUtils

grailsHome = Ant.project.properties.'environment.GRAILS_HOME'
includeTargets << new File("${grailsHome}/scripts/Bootstrap.groovy")

Properties props = new Properties()
String filename = "${basedir}/ddl.sql"
boolean export = false
boolean stdout = false

def configClasspath = {

  getClass().classLoader.rootLoader?.addURL(new File(classesDirPath).toURL())

  Ant.copy(todir: classesDirPath, file: "${basedir}/application.properties")
  Ant.copy(todir: classesDirPath, failonerror: false) {
    fileset(dir: "${basedir}/grails-app/conf",
           excludes: '*.groovy, log4j*, hibernate, spring')
    fileset(dir: "${basedir}/grails-app/conf/hibernate")
    fileset(dir: "${basedir}/src/java", excludes: '**/*.java')
  }
}

def configureFromArgs = {

  args = args ?: ''
  args.split('\n').each { arg ->
    arg = arg.trim()
    if (arg.length() > 0) {
      if (arg == 'export') {
        export = true
      }
      else if (arg == 'generate') {
        export = false
      }
      else if (arg == 'stdout') {
        stdout = true
      }
      else {
        // assume filename override
        filename = arg
      }
    }
  }
}

def populateProperties = {

  File dsFile = new File("${basedir}/grails-app/conf/DataSource.groovy")
  def dsConfig = null
  if (dsFile.exists()) {
    dsConfig = new ConfigSlurper(grailsEnv).parse(dsFile.text)
  }

  props.'hibernate.connection.username' = dsConfig?.dataSource?.username ?: 'sa'
  props.'hibernate.connection.password' = dsConfig?.dataSource?.password ?: ''
  props.'hibernate.connection.url' =
       dsConfig?.dataSource?.url ?: 'jdbc:hsqldb:mem:testDB'
  props.'hibernate.connection.driver_class' =
       dsConfig?.dataSource?.driverClassName ?: 'org.hsqldb.jdbcDriver'
  if (dsConfig?.dataSource?.dialect) {
    def dialect = dsConfig.dataSource.dialect
    if (dialect instanceof Class) {
      dialect = dialect.name
    }
    props.'hibernate.dialect' = dialect
  }
  else {
    println("WARNING: Autodetecting the Hibernate Dialect; "
    + "consider specifying the class name in DataSource.groovy")
    try {
      def ds = new DriverManagerDataSource(
          props.'hibernate.connection.driver_class',
          props.'hibernate.connection.url',
          props.'hibernate.connection.username',
          props.'hibernate.connection.password')
      String dbName = JdbcUtils.extractDatabaseMetaData(ds,
          'getDatabaseProductName')
      int majorVersion = JdbcUtils.extractDatabaseMetaData(ds,
          'getDatabaseMajorVersion')
      props.'hibernate.dialect' =
        DialectFactory.determineDialect(dbName, majorVersion).class.name
    }
    catch (Exception e) {
      println "ERROR: Problem autodetecting the Hibernate Dialect: ${e.message}"
      throw e
    }
  }
}

target('default': 'Run Hibernate SchemaExport') {
  depends(classpath, checkVersion, configureProxy, packageApp)

  configureFromArgs()

  File file = new File(filename)
  Ant.mkdir(dir: file.parentFile)

  configClasspath()
  loadApp()

  populateProperties()

  def configuration = new DefaultGrailsDomainConfiguration(
      grailsApplication: grailsApp,
      properties: props)

  def schemaExport = new SchemaExport(configuration)
    .setHaltOnError(true)
    .setOutputFile(file.path)
    .setDelimiter(';')

  String action = export ? "Exporting" : "Generating script to ${file.path}"
  println "${action} in environment '${grailsEnv}' using properties ${props}"

  if (export) {
    // 1st drop, warning exceptions
    schemaExport.execute(stdout, true, true, false)
    schemaExport.exceptions.clear()
    // then create
    schemaExport.execute(stdout, true, false, true)
  }
  else {
    // generate
    schemaExport.execute(stdout, false, false, false)
  }

  if (!schemaExport.exceptions.empty) {
    ((Exception)schemaExport.exceptions[0]).printStackTrace()
  }
}

Hibernate anti-pattern #1 – hashCode/equals

Friday, December 14th, 2007

I’ve been doing a lot of Hibernate refactoring and performance tuning at work lately and have been spending time wrestling with some interesting anti-patterns that have crept into the code. One is our particularly bad hashCode/equals implementation which has some pretty frustrating implications1.

There’s a lengthy discussion here about various ways to implement hashCode and equals for Hibernate-managed entities. The post is great, and the discussion in the comments is also good.


The assumption is that the 1st-level cache (and also the 2nd-level cache if implemented) ensures that instances are unique. So our base entity class has a final equals method that uses the same logic as Object – two instances are equal if they’re the same (==). Likewise, hashCode is final and returns System.identityHashCode(this), the same value as if the method weren’t overridden (we have a base class between Object and our base entity class that necessitates explicitly reverting the behavior to that of Object):

@Override
public final boolean equals(final Object o) {
  return o == this;
}

@Override
public final int hashCode() {
  return System.identityHashCode(this);
}

This turns out to be overly simplistic and optimistic – it sure would be nice if that’s all it took, since there’s no equals/hashCode maintenance for new entity classes, no custom entity-specific logic to implement.

If you call Foo foo1 = session.get(Foo.class, 123) and Foo foo2 = session.get(Foo.class, 123), the 2nd call will return a reference to the cached 1st instance – there’s only 1 database hit. This is one of the primary benefits of using an ORM framework like Hibernate – there’s some overhead in its approach but optimizations like this will tend to far outweigh the costs as long as you’re using the framework appropriately.

So in this case, foo1 == foo2, and the equals method works fine.

But if you mix eagerly loaded entity instances and proxies, weird things happen. For example, suppose Bar has a lazy many-to-one reference to Foo (or a lazy-loaded collection of Foos); == will always be false even if the target entity of the proxy is the same as the non-proxy instance:

Foo foo = (Foo)session.get(Foo.class, 123);
Bar bar = (Bar)session.get(Bar.class, 321);

// always false since bar's Foo is a proxy
System.out.println(bar.getFoo() == foo);

// true for proper equals, always false for us
System.out.println(bar.getFoo().equals(foo));

The result is that data access implementation logic has bled up past the service tier and into the web tier; the web tier shouldn’t have to be aware of proxies but now has to be. If there’s any risk of comparing proxied and non-proxied instances, we have to use primary key equality to test instance equality instead of the much more natural equals.

What’s worse, we can’t mix persistent and new entity instances in collections that maintain uniqueness (Sets/Maps). A new instance can never be equal to a persistent instance even if every non-pk property value is the same, so we again have to deal with the data tier bleeding into other tiers and jump through hoops to get Set uniqueness and contains to work.


So here’s my proposed approach. Comparing two persistent instances using primary key is very efficient since primary keys are unique. This doesn’t help when comparing persistent and new instances though, and is inappropriate for usage in hashCode for the reasons discussed in the Hibernate post referenced above. Specifically, the hashCode will change from whatever you use before the instance is persisted to the primary key (or some value derived from it) afterwards. This will confuse hash-based collections since the instance will most likely be in the wrong hash bucket and cannot be found.

So equals compares primary keys if both are non-null. If either is null, then the class must define the fields that determine uniqueness and compare those.

hashCode always uses the same fields used in the equals comparison and never the primary key. If the entity is immutable or the fields that are used for the calculation are, then the hashCode could be cached to save the cost of unnecessary recalculation.

Most of this logic is in the base entity class, with abstract methods that force each entity class to define entity-specific behavior:

import java.io.Serializable;

import javax.persistence.Column;
import javax.persistence.MappedSuperclass;
import javax.persistence.Transient;
import javax.persistence.Version;

import org.apache.commons.lang.builder.HashCodeBuilder;
import org.hibernate.Hibernate;

/**
 * Abstract base class for entities.
 * @param <T> the type
 */
@MappedSuperclass
public abstract class BaseEntity<T extends BaseEntity<T>>
       extends ValueObject {

  private int _version;

  /**
   * Default.
   */
  protected BaseEntity() {
    // default
  }

  // optimistic locking version, private since only Hibernate uses these
  @Version @Column(name = "oplock", nullable = false)
  @SuppressWarnings("unused")
  private int getVersion() {
    return _version;
  }
  @SuppressWarnings("unused")
  private void setVersion(final int version) {
    _version = version;
  }

  /**
   * Utility method for <code>equals()</code> methods.
   * @param o1  one object
   * @param o2  another object
   * @return <code>true</code> if they're both <code>null</code> or both equal
   */
  protected boolean areEqual(final Object o1, final Object o2) {
    if (o1 == null) {
      if (o2 != null) {
        return false;
      }
    }
    else if (!o1.equals(o2)) {
      return false;
    }

    return true;
  }

  /**
   * Utility method for <code>equals()</code> methods.
   * @param s1  one string
   * @param s2  another string
   * @param ignoreCase if <code>false</code> do case-sensitive comparison
   * @return <code>true</code> if they're both <code>null</code> or both equal
   */
  protected boolean areEqual(
      final String s1,
      final String s2,
      final boolean ignoreCase) {
    // for use in custom equals() methods

    if (s1 == null && s2 == null) {
      return true;
    }

    if (s1 == null || s2 == null) {
      return false;
    }

    return ignoreCase ? s1.equalsIgnoreCase(s2) : s1.equals(s2);
  }

  /**
   * Utility method for <code>equals()</code> methods.
   * @param d1  one date
   * @param d2  another date
   * @return <code>true</code> if they're both <code>null</code> or both equal
   */
  protected boolean areEqual(
      final Date d1,
      final Date d2) {
    // for use in custom equals() methods

    if (d1 == null && d2 == null) {
      return true;
    }

    if (d1 == null || d2 == null) {
      return false;
    }

    return d1.getTime() == d2.getTime();
  }

  /**
   * Utility method for <code>equals()</code> methods.
   * @param f1  one float
   * @param f2  another float
   * @return <code>true</code> if they're equal
   */
  protected boolean areEqual(
      final float f1,
      final float f2) {
    // for use in custom equals() methods

    return Float.floatToIntBits(f1) == Float.floatToIntBits(f2);
  }

  /**
   * Utility method for <code>hashCode()</code> methods.
   * @param values  the values to use in calculation
   * @return  the hash code value
   */
  protected int calculateHashCode(final Object... values) {
    HashCodeBuilder builder = new HashCodeBuilder();
    for (Object value : values) {
      builder.append(value);
    }
    return builder.toHashCode();
  }

  /**
   * {@inheritDoc}
   */
  @Override
  public int hashCode() {
    return calculateHashCode(getHashCodeData());
  }

  /**
   * Get the data used to calculate hash code;
   * use getters not fields in case the instance is a proxy.
   * @return the data
   */
  @Transient
  protected abstract Object[] getHashCodeData();

  /**
   * Allows type-specific "this".
   * @return  this
   */
  @Transient
  protected abstract T getThis();

  /**
   * Get the primary key.
   * @return  the pk, or <code>null</code> if not set or if not supported
   */
  @Transient
  public abstract Serializable getPk();

  /**
   * {@inheritDoc}
   */
  @SuppressWarnings("unchecked")
  @Override
  public final boolean equals(final Object other) {
    if (this == other) {
      return true;
    }

    if (other == null
        || // looks into the target class of a proxy if necessary
        !Hibernate.getClass(other).equals(Hibernate.getClass(this))) {
      return false;
    }

    // if pks are both set, compare
    if (getPk() != null) {
      Serializable otherPk = ((BaseEntity)other).getPk();
      if (otherPk != null) {
        return getPk().equals(otherPk);
      }
    }

    return dataEquals((T)other);
  }

  /**
   * Compare data only; null, class, and pk have been checked.
   * @param other the other instance
   * @return <code>true</code> if equal
   */
  protected abstract boolean dataEquals(T other);
}

A concrete subclass must implement getHashCodeData and dataEquals, for example using the username property for a User class:

/**
 * {@inheritDoc}
 */
@Override
protected boolean dataEquals(final User other) {
  return areEqual(getUsername(), other.getUsername(), true);
}

/**
 * {@inheritDoc}
 */
@Override
@Transient
protected Object[] getHashCodeData() {
  return new Object[] { getUsername() };
}

This has worked well so far. It’s consistent, even when an instance transitions from new to persistent and its primary key becomes non-null. The data used for hashCode calculation never changes, so that’s consistent. The data for equals does change, but it stays consistent – comparing two persistent entities or mixing persistent and new returns the correct result in all cases since non-pk data is used for comparison.

Note that equals is final but hashCode isn’t to allow caching.


My original implementation of this was way too simplistic, just using the Jakarta Commons Lang reflection methods that I described here. This approach is great for regular value objects and DTOs, but is way too expensive for Hibernate entities since every field is used, even collections and the primary key. Lazy-loaded collections will be retrieved from the database to calculate hashCode and equals, which is almost always inappropriate.

So I worked on a slightly smarter approach that skipped the primary key and collections but still used reflection for the rest of the fields. This is still too expensive, and breaks when using proxies since the fields will be null – the getter methods delegate to the data that’s in the proxy’s target instance.

Eventually I ended up with the current implementation, which has lasted for a while with no issues.




  1. Ironically, The Decidertm was asked recently why this choice was made and he couldn’t defend it, but was sure that there was a good reason for it and wasn’t willing to change anything even given the implications of the decision.[back]

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.