Hibernate anti-pattern #1 – hashCode/equals
I've been doing a lot of Hibernate refactoring and performance tuning at work lately and have been spending time wrestling with some interesting anti-patterns that have crept into the code. One is our particularly bad hashCode/equals implementation which has some pretty frustrating implications1.
There's a lengthy discussion here
about various ways to implement hashCode and equals for Hibernate-managed entities. The post is great, and the discussion in the comments is also good.
The assumption is that the 1st-level cache (and also the 2nd-level cache if implemented) ensures that instances are unique. So our base entity class has a final equals method that uses the same logic as Object - two instances are equal if they're the same (==). Likewise, hashCode is final and returns System.identityHashCode(this), the same value as if the method weren't overridden (we have a base class between Object and our base entity class that necessitates explicitly reverting the behavior to that of Object):
This turns out to be overly simplistic and optimistic - it sure would be nice if that's all it took, since there's no equals/hashCode maintenance for new entity classes, no custom entity-specific logic to implement.
If you call Foo foo1 = session.get(Foo.class, 123) and Foo foo2 = session.get(Foo.class, 123), the 2nd call will return a reference to the cached 1st instance - there's only 1 database hit. This is one of the primary benefits of using an ORM framework like Hibernate - there's some overhead in its approach but optimizations like this will tend to far outweigh the costs as long as you're using the framework appropriately.
So in this case, foo1 == foo2, and the equals method works fine.
But if you mix eagerly loaded entity instances and proxies, weird things happen. For example, suppose Bar has a lazy many-to-one reference to Foo (or a lazy-loaded collection of Foos); == will always be false even if the target entity of the proxy is the same as the non-proxy instance:
The result is that data access implementation logic has bled up past the service tier and into the web tier; the web tier shouldn't have to be aware of proxies but now has to be. If there's any risk of comparing proxied and non-proxied instances, we have to use primary key equality to test instance equality instead of the much more natural equals.
What's worse, we can't mix persistent and new entity instances in collections that maintain uniqueness (Sets/Maps). A new instance can never be equal to a persistent instance even if every non-pk property value is the same, so we again have to deal with the data tier bleeding into other tiers and jump through hoops to get Set uniqueness and contains to work.
So here's my proposed approach. Comparing two persistent instances using primary key is very efficient since primary keys are unique. This doesn't help when comparing persistent and new instances though, and is inappropriate for usage in hashCode for the reasons discussed in the Hibernate post referenced above. Specifically, the hashCode will change from whatever you use before the instance is persisted to the primary key (or some value derived from it) afterwards. This will confuse hash-based collections since the instance will most likely be in the wrong hash bucket and cannot be found.
So equals compares primary keys if both are non-null. If either is null, then the class must define the fields that determine uniqueness and compare those.
hashCode always uses the same fields used in the equals comparison and never the primary key. If the entity is immutable or the fields that are used for the calculation are, then the hashCode could be cached to save the cost of unnecessary recalculation.
Most of this logic is in the base entity class, with abstract methods that force each entity class to define entity-specific behavior:
import javax.persistence.Column;
import javax.persistence.MappedSuperclass;
import javax.persistence.Transient;
import javax.persistence.Version;
import org.apache.commons.lang.builder.HashCodeBuilder;
import org.hibernate.Hibernate;
/**
* Abstract base class for entities.
* @param <T> the type
*/
@MappedSuperclass
public abstract class BaseEntity<T extends BaseEntity<T>>
extends ValueObject {
private int _version;
/**
* Default.
*/
protected BaseEntity() {
// default
}
// optimistic locking version, private since only Hibernate uses these
@Version @Column(name = "oplock", nullable = false)
@SuppressWarnings("unused")
private int getVersion() {
return _version;
}
@SuppressWarnings("unused")
private void setVersion(final int version) {
_version = version;
}
/**
* Utility method for <code>equals()</code> methods.
* @param o1 one object
* @param o2 another object
* @return <code>true</code> if they're both <code>null</code> or both equal
*/
protected boolean areEqual(final Object o1, final Object o2) {
if (o1 == null) {
if (o2 != null) {
return false;
}
}
else if (!o1.equals(o2)) {
return false;
}
return true;
}
/**
* Utility method for <code>equals()</code> methods.
* @param s1 one string
* @param s2 another string
* @param ignoreCase if <code>false</code> do case-sensitive comparison
* @return <code>true</code> if they're both <code>null</code> or both equal
*/
protected boolean areEqual(
final String s1,
final String s2,
final boolean ignoreCase) {
// for use in custom equals() methods
if (s1 == null && s2 == null) {
return true;
}
if (s1 == null || s2 == null) {
return false;
}
return ignoreCase ? s1.equalsIgnoreCase(s2) : s1.equals(s2);
}
/**
* Utility method for <code>equals()</code> methods.
* @param d1 one date
* @param d2 another date
* @return <code>true</code> if they're both <code>null</code> or both equal
*/
protected boolean areEqual(
final Date d1,
final Date d2) {
// for use in custom equals() methods
if (d1 == null && d2 == null) {
return true;
}
if (d1 == null || d2 == null) {
return false;
}
return d1.getTime() == d2.getTime();
}
/**
* Utility method for <code>equals()</code> methods.
* @param f1 one float
* @param f2 another float
* @return <code>true</code> if they're equal
*/
protected boolean areEqual(
final float f1,
final float f2) {
// for use in custom equals() methods
return Float.floatToIntBits(f1) == Float.floatToIntBits(f2);
}
/**
* Utility method for <code>hashCode()</code> methods.
* @param values the values to use in calculation
* @return the hash code value
*/
protected int calculateHashCode(final Object... values) {
HashCodeBuilder builder = new HashCodeBuilder();
for (Object value : values) {
builder.append(value);
}
return builder.toHashCode();
}
/**
* {@inheritDoc}
*/
@Override
public int hashCode() {
return calculateHashCode(getHashCodeData());
}
/**
* Get the data used to calculate hash code;
* use getters not fields in case the instance is a proxy.
* @return the data
*/
@Transient
protected abstract Object[] getHashCodeData();
/**
* Allows type-specific "this".
* @return this
*/
@Transient
protected abstract T getThis();
/**
* Get the primary key.
* @return the pk, or <code>null</code> if not set or if not supported
*/
@Transient
public abstract Serializable getPk();
/**
* {@inheritDoc}
*/
@SuppressWarnings("unchecked")
@Override
public final boolean equals(final Object other) {
if (this == other) {
return true;
}
if (other == null
|| // looks into the target class of a proxy if necessary
!Hibernate.getClass(other).equals(Hibernate.getClass(this))) {
return false;
}
// if pks are both set, compare
if (getPk() != null) {
Serializable otherPk = ((BaseEntity)other).getPk();
if (otherPk != null) {
return getPk().equals(otherPk);
}
}
return dataEquals((T)other);
}
/**
* Compare data only; null, class, and pk have been checked.
* @param other the other instance
* @return <code>true</code> if equal
*/
protected abstract boolean dataEquals(T other);
}
A concrete subclass must implement getHashCodeData and dataEquals, for example using the username property for a User class:
This has worked well so far. It's consistent, even when an instance transitions from new to persistent and its primary key becomes non-null. The data used for hashCode calculation never changes, so that's consistent. The data for equals does change, but it stays consistent - comparing two persistent entities or mixing persistent and new returns the correct result in all cases since non-pk data is used for comparison.
Note that equals is final but hashCode isn't to allow caching.
My original implementation of this was way too simplistic, just using the Jakarta Commons Lang
reflection methods that I described here
. This approach is great for regular value objects and DTOs, but is way too expensive for Hibernate entities since every field is used, even collections and the primary key. Lazy-loaded collections will be retrieved from the database to calculate hashCode and equals, which is almost always inappropriate.
So I worked on a slightly smarter approach that skipped the primary key and collections but still used reflection for the rest of the fields. This is still too expensive, and breaks when using proxies since the fields will be null - the getter methods delegate to the data that's in the proxy's target instance.
Eventually I ended up with the current implementation, which has lasted for a while with no issues.
- Ironically, The Decidertm was asked recently why this choice was made and he couldn't defend it, but was sure that there was a good reason for it and wasn't willing to change anything even given the implications of the decision.[back]







December 15th, 2007 06:40
A more cryptic by shorter way to test equality is the following.
return (o1 == o2) || (o1 != null && o1.equals(o2));
December 15th, 2007 19:42
Many folks, for valid technical reasons, assign guids in their entity constructors and don’t use lazy loading. Web services and Swing apps come to mind, even web apps that prefer to use detached objects. Never assume lazy loading is the ideal object graph navigation solution. Remember only one rule: every app is, indeed, unique in its requirements.
May 25th, 2008 02:16
I really like your idea, thanks a lot. The one thing that I dn’t like is that there is reliance in this model on Hibernate. And there really doesn’t have to be. The only reason you use it is to handle the proxy issue. But with generics, you can actually get the “real” class of the object at runtime without the help of hibernate.
@SuppressWarnings(“unchecked”)
public Class entityClass() {
ParameterizedType thisType = (ParameterizedType) getClass().getGenericSuperclass();
return (Class) thisType.getActualTypeArguments()[0];
}
and then modify you equals:
@SuppressWarnings(“unchecked”)
@Override
public final boolean equals(final Object other) {
if (this == other) {
return true;
}
if (other == null) {
return false;
}
if (!entityClass().isAssignableFrom(other.getClass())) {
return false;
}
// if pks are both set, compare
if (getPk() != null) {
Serializable otherPk = ((BaseEntity)other).getPk();
if (otherPk != null) {
return getPk().equals(otherPk);
}
}
}
return dataEquals((T) other); //we can cast this because of the class checking above
}
May 25th, 2008 02:17
sorry — you lose all the parameterization in my previous post! <T> was filtered out…
May 25th, 2008 09:48
Er… that had some bugs as well…. but you get the idea.
September 23rd, 2008 09:42
What about cases where your fields are not immutable? Wouldn’t this break the relationship between equals and hashcode? Maybe I am wrong but this will ONLY be valid when the objects are immutable. I know you mention that if the entity or fields are immutable that the hashcode could be cached, but I would think it would not only be uncacheable, but also invalid otherwise.
example, you have two instances of the same entity which has already been saved and given an id. You change a field (lets say name) in one of the instances. equals will return true, but hashcode will be different (which will cause issues when storing in a collection).
instance A: id = 1, name=”A”
instance B: id = 1, name=”B”
A.equals(B) returns true, but A.hashcode() != B.hashcode().
September 23rd, 2008 23:58
@Chris
You’re right that this is a potential problem, but this is Hibernate after all, and you’re not going to have two different instances of a persistent entity with the same pk. It happens, unfortunately (especially mixing lazy-loaded proxies with regular instances) but it’s not the normal behavior.
November 14th, 2008 15:17
@Maulin
You can actually “tunnel” the classes of proxied entities by having a new method that delegates to getClass() rather than using generic subclassing and reflection:
public Class getRealClass() { getClass(); }
September 7th, 2009 07:01
@Burt
You could also use only id object in the hashCode calculation when present this will fix the problem as noticed by Chris.
January 28th, 2010 06:38
Fine solution. Thank you for sharing it, this post saves me a lot of typing.
January 28th, 2010 06:39
And many problems too.
February 16th, 2010 18:52
The lack of equals and hashCode implementations promotes the same issues as when using Java/JPA/Hibernate – they need to be defined on the “natural key” of the entity as the author noted.
My vote is that GORM should always require you to declare a natural key as well as the object’s primary key – just as you would by convention with JPA/Hibernate and Java.