Hibernate anti-pattern #1 – hashCode/equals
I’ve been doing a lot of Hibernate refactoring and performance tuning at work lately and have been spending time wrestling with some interesting anti-patterns that have crept into the code. One is our particularly bad hashCode
/equals
implementation which has some pretty frustrating implications1.
There’s a lengthy discussion here about various ways to implement
hashCode
and equals
for Hibernate-managed entities. The post is great, and the discussion in the comments is also good.
The assumption is that the 1st-level cache (and also the 2nd-level cache if implemented) ensures that instances are unique. So our base entity class has a final equals
method that uses the same logic as Object
– two instances are equal if they’re the same (==). Likewise, hashCode
is final and returns System.identityHashCode(this)
, the same value as if the method weren’t overridden (we have a base class between Object
and our base entity class that necessitates explicitly reverting the behavior to that of Object
):
@Override public final boolean equals(final Object o) { return o == this; } @Override public final int hashCode() { return System.identityHashCode(this); }
This turns out to be overly simplistic and optimistic – it sure would be nice if that’s all it took, since there’s no equals
/hashCode
maintenance for new entity classes, no custom entity-specific logic to implement.
If you call Foo foo1 = session.get(Foo.class, 123)
and Foo foo2 = session.get(Foo.class, 123)
, the 2nd call will return a reference to the cached 1st instance – there’s only 1 database hit. This is one of the primary benefits of using an ORM framework like Hibernate – there’s some overhead in its approach but optimizations like this will tend to far outweigh the costs as long as you’re using the framework appropriately.
So in this case, foo1 == foo2, and the equals
method works fine.
But if you mix eagerly loaded entity instances and proxies, weird things happen. For example, suppose Bar
has a lazy many-to-one reference to Foo
(or a lazy-loaded collection of Foo
s); == will always be false even if the target entity of the proxy is the same as the non-proxy instance:
Foo foo = (Foo)session.get(Foo.class, 123); Bar bar = (Bar)session.get(Bar.class, 321); // always false since bar's Foo is a proxy System.out.println(bar.getFoo() == foo); // true for proper equals, always false for us System.out.println(bar.getFoo().equals(foo));
The result is that data access implementation logic has bled up past the service tier and into the web tier; the web tier shouldn’t have to be aware of proxies but now has to be. If there’s any risk of comparing proxied and non-proxied instances, we have to use primary key equality to test instance equality instead of the much more natural equals
.
What’s worse, we can’t mix persistent and new entity instances in collections that maintain uniqueness (Set
s/Map
s). A new instance can never be equal to a persistent instance even if every non-pk property value is the same, so we again have to deal with the data tier bleeding into other tiers and jump through hoops to get Set
uniqueness and contains
to work.
So here’s my proposed approach. Comparing two persistent instances using primary key is very efficient since primary keys are unique. This doesn’t help when comparing persistent and new instances though, and is inappropriate for usage in hashCode
for the reasons discussed in the Hibernate post referenced above. Specifically, the hashCode
will change from whatever you use before the instance is persisted to the primary key (or some value derived from it) afterwards. This will confuse hash-based collections since the instance will most likely be in the wrong hash bucket and cannot be found.
So equals
compares primary keys if both are non-null
. If either is null
, then the class must define the fields that determine uniqueness and compare those.
hashCode
always uses the same fields used in the equals
comparison and never the primary key. If the entity is immutable or the fields that are used for the calculation are, then the hashCode
could be cached to save the cost of unnecessary recalculation.
Most of this logic is in the base entity class, with abstract methods that force each entity class to define entity-specific behavior:
import java.io.Serializable; import javax.persistence.Column; import javax.persistence.MappedSuperclass; import javax.persistence.Transient; import javax.persistence.Version; import org.apache.commons.lang.builder.HashCodeBuilder; import org.hibernate.Hibernate; /** * Abstract base class for entities. * @param <T> the type */ @MappedSuperclass public abstract class BaseEntity<T extends BaseEntity<T>> extends ValueObject { private int _version; /** * Default. */ protected BaseEntity() { // default } // optimistic locking version, private since only Hibernate uses these @Version @Column(name = "oplock", nullable = false) @SuppressWarnings("unused") private int getVersion() { return _version; } @SuppressWarnings("unused") private void setVersion(final int version) { _version = version; } /** * Utility method for <code>equals()</code> methods. * @param o1 one object * @param o2 another object * @return <code>true</code> if they're both <code>null</code> or both equal */ protected boolean areEqual(final Object o1, final Object o2) { if (o1 == null) { if (o2 != null) { return false; } } else if (!o1.equals(o2)) { return false; } return true; } /** * Utility method for <code>equals()</code> methods. * @param s1 one string * @param s2 another string * @param ignoreCase if <code>false</code> do case-sensitive comparison * @return <code>true</code> if they're both <code>null</code> or both equal */ protected boolean areEqual( final String s1, final String s2, final boolean ignoreCase) { // for use in custom equals() methods if (s1 == null && s2 == null) { return true; } if (s1 == null || s2 == null) { return false; } return ignoreCase ? s1.equalsIgnoreCase(s2) : s1.equals(s2); } /** * Utility method for <code>equals()</code> methods. * @param d1 one date * @param d2 another date * @return <code>true</code> if they're both <code>null</code> or both equal */ protected boolean areEqual( final Date d1, final Date d2) { // for use in custom equals() methods if (d1 == null && d2 == null) { return true; } if (d1 == null || d2 == null) { return false; } return d1.getTime() == d2.getTime(); } /** * Utility method for <code>equals()</code> methods. * @param f1 one float * @param f2 another float * @return <code>true</code> if they're equal */ protected boolean areEqual( final float f1, final float f2) { // for use in custom equals() methods return Float.floatToIntBits(f1) == Float.floatToIntBits(f2); } /** * Utility method for <code>hashCode()</code> methods. * @param values the values to use in calculation * @return the hash code value */ protected int calculateHashCode(final Object... values) { HashCodeBuilder builder = new HashCodeBuilder(); for (Object value : values) { builder.append(value); } return builder.toHashCode(); } /** * {@inheritDoc} */ @Override public int hashCode() { return calculateHashCode(getHashCodeData()); } /** * Get the data used to calculate hash code; * use getters not fields in case the instance is a proxy. * @return the data */ @Transient protected abstract Object[] getHashCodeData(); /** * Allows type-specific "this". * @return this */ @Transient protected abstract T getThis(); /** * Get the primary key. * @return the pk, or <code>null</code> if not set or if not supported */ @Transient public abstract Serializable getPk(); /** * {@inheritDoc} */ @SuppressWarnings("unchecked") @Override public final boolean equals(final Object other) { if (this == other) { return true; } if (other == null || // looks into the target class of a proxy if necessary !Hibernate.getClass(other).equals(Hibernate.getClass(this))) { return false; } // if pks are both set, compare if (getPk() != null) { Serializable otherPk = ((BaseEntity)other).getPk(); if (otherPk != null) { return getPk().equals(otherPk); } } return dataEquals((T)other); } /** * Compare data only; null, class, and pk have been checked. * @param other the other instance * @return <code>true</code> if equal */ protected abstract boolean dataEquals(T other); }
A concrete subclass must implement getHashCodeData
and dataEquals
, for example using the username property for a User
class:
/** * {@inheritDoc} */ @Override protected boolean dataEquals(final User other) { return areEqual(getUsername(), other.getUsername(), true); } /** * {@inheritDoc} */ @Override @Transient protected Object[] getHashCodeData() { return new Object[] { getUsername() }; }
This has worked well so far. It’s consistent, even when an instance transitions from new to persistent and its primary key becomes non-null
. The data used for hashCode
calculation never changes, so that’s consistent. The data for equals
does change, but it stays consistent – comparing two persistent entities or mixing persistent and new returns the correct result in all cases since non-pk data is used for comparison.
Note that equals
is final but hashCode
isn’t to allow caching.
My original implementation of this was way too simplistic, just using the Jakarta Commons Lang reflection methods that I described here
. This approach is great for regular value objects and DTOs, but is way too expensive for Hibernate entities since every field is used, even collections and the primary key. Lazy-loaded collections will be retrieved from the database to calculate
hashCode
and equals
, which is almost always inappropriate.
So I worked on a slightly smarter approach that skipped the primary key and collections but still used reflection for the rest of the fields. This is still too expensive, and breaks when using proxies since the fields will be null
– the getter methods delegate to the data that’s in the proxy’s target instance.
Eventually I ended up with the current implementation, which has lasted for a while with no issues.
- Ironically, The Decidertm was asked recently why this choice was made and he couldn’t defend it, but was sure that there was a good reason for it and wasn’t willing to change anything even given the implications of the decision.[back]
A more cryptic by shorter way to test equality is the following.
return (o1 == o2) || (o1 != null && o1.equals(o2));
Many folks, for valid technical reasons, assign guids in their entity constructors and don’t use lazy loading. Web services and Swing apps come to mind, even web apps that prefer to use detached objects. Never assume lazy loading is the ideal object graph navigation solution. Remember only one rule: every app is, indeed, unique in its requirements.
I really like your idea, thanks a lot. The one thing that I dn’t like is that there is reliance in this model on Hibernate. And there really doesn’t have to be. The only reason you use it is to handle the proxy issue. But with generics, you can actually get the “real” class of the object at runtime without the help of hibernate.
@SuppressWarnings(“unchecked”)
public Class entityClass() {
ParameterizedType thisType = (ParameterizedType) getClass().getGenericSuperclass();
return (Class) thisType.getActualTypeArguments()[0];
}
and then modify you equals:
@SuppressWarnings(“unchecked”)
@Override
public final boolean equals(final Object other) {
if (this == other) {
return true;
}
if (other == null) {
return false;
}
if (!entityClass().isAssignableFrom(other.getClass())) {
return false;
}
// if pks are both set, compare
if (getPk() != null) {
Serializable otherPk = ((BaseEntity)other).getPk();
if (otherPk != null) {
return getPk().equals(otherPk);
}
}
}
return dataEquals((T) other); //we can cast this because of the class checking above
}
sorry — you lose all the parameterization in my previous post! <T> was filtered out…
Er… that had some bugs as well…. but you get the idea.
What about cases where your fields are not immutable? Wouldn’t this break the relationship between equals and hashcode? Maybe I am wrong but this will ONLY be valid when the objects are immutable. I know you mention that if the entity or fields are immutable that the hashcode could be cached, but I would think it would not only be uncacheable, but also invalid otherwise.
example, you have two instances of the same entity which has already been saved and given an id. You change a field (lets say name) in one of the instances. equals will return true, but hashcode will be different (which will cause issues when storing in a collection).
instance A: id = 1, name=”A”
instance B: id = 1, name=”B”
A.equals(B) returns true, but A.hashcode() != B.hashcode().
@Chris
You’re right that this is a potential problem, but this is Hibernate after all, and you’re not going to have two different instances of a persistent entity with the same pk. It happens, unfortunately (especially mixing lazy-loaded proxies with regular instances) but it’s not the normal behavior.
@Maulin
You can actually “tunnel” the classes of proxied entities by having a new method that delegates to getClass() rather than using generic subclassing and reflection:
public Class getRealClass() { getClass(); }
@Burt
You could also use only id object in the hashCode calculation when present this will fix the problem as noticed by Chris.
Fine solution. Thank you for sharing it, this post saves me a lot of typing.
And many problems too. 🙂
The lack of equals and hashCode implementations promotes the same issues as when using Java/JPA/Hibernate – they need to be defined on the “natural key” of the entity as the author noted.
My vote is that GORM should always require you to declare a natural key as well as the object’s primary key – just as you would by convention with JPA/Hibernate and Java.
I had been struggling with this issue for a while – I found your solution very useful, thanks ))
Hi guys,
Interesting (and complex 🙂 ) general approach!
What if I have an entity that does not have a natural key? E.g. some representation of an index card:
Icard {
String id;
String title;
String side1;
}
In my case, I want to be able to change both the title and the side1 information and none of them should be unique…
/Tavi
We just take a UUID for the primary key and don’t leave it to hibernate to assign it. So we assign it during object construction. This way you can have a simple equals/hashcode just over the UUID id.