I’ve been doing a lot of Hibernate refactoring and performance tuning at work lately and have been spending time wrestling with some interesting anti-patterns that have crept into the code. One is our particularly bad hashCode
/equals
implementation which has some pretty frustrating implications.
There’s a lengthy discussion here
about various ways to implement hashCode
and equals
for Hibernate-managed entities. The post is great, and the discussion in the comments is also good.
The assumption is that the 1st-level cache (and also the 2nd-level cache if implemented) ensures that instances are unique. So our base entity class has a final equals
method that uses the same logic as Object
– two instances are equal if they’re the same (==). Likewise, hashCode
is final and returns System.identityHashCode(this)
, the same value as if the method weren’t overridden (we have a base class between Object
and our base entity class that necessitates explicitly reverting the behavior to that of Object
):
@Override
public final boolean equals(final Object o) {
return o == this;
}
@Override
public final int hashCode() {
return System.identityHashCode(this);
}
This turns out to be overly simplistic and optimistic – it sure would be nice if that’s all it took, since there’s no equals
/hashCode
maintenance for new entity classes, no custom entity-specific logic to implement.
If you call Foo foo1 = session.get(Foo.class, 123)
and Foo foo2 = session.get(Foo.class, 123)
, the 2nd call will return a reference to the cached 1st instance – there’s only 1 database hit. This is one of the primary benefits of using an ORM framework like Hibernate – there’s some overhead in its approach but optimizations like this will tend to far outweigh the costs as long as you’re using the framework appropriately.
So in this case, foo1 == foo2, and the equals
method works fine.
But if you mix eagerly loaded entity instances and proxies, weird things happen. For example, suppose Bar
has a lazy many-to-one reference to Foo
(or a lazy-loaded collection of Foo
s); == will always be false even if the target entity of the proxy is the same as the non-proxy instance:
Foo foo = (Foo)session.get(Foo.class, 123);
Bar bar = (Bar)session.get(Bar.class, 321);
// always false since bar's Foo is a proxy
System.out.println(bar.getFoo() == foo);
// true for proper equals, always false for us
System.out.println(bar.getFoo().equals(foo));
The result is that data access implementation logic has bled up past the service tier and into the web tier; the web tier shouldn’t have to be aware of proxies but now has to be. If there’s any risk of comparing proxied and non-proxied instances, we have to use primary key equality to test instance equality instead of the much more natural equals
.
What’s worse, we can’t mix persistent and new entity instances in collections that maintain uniqueness (Set
s/Map
s). A new instance can never be equal to a persistent instance even if every non-pk property value is the same, so we again have to deal with the data tier bleeding into other tiers and jump through hoops to get Set
uniqueness and contains
to work.
So here’s my proposed approach. Comparing two persistent instances using primary key is very efficient since primary keys are unique. This doesn’t help when comparing persistent and new instances though, and is inappropriate for usage in hashCode
for the reasons discussed in the Hibernate post referenced above. Specifically, the hashCode
will change from whatever you use before the instance is persisted to the primary key (or some value derived from it) afterwards. This will confuse hash-based collections since the instance will most likely be in the wrong hash bucket and cannot be found.
So equals
compares primary keys if both are non-null
. If either is null
, then the class must define the fields that determine uniqueness and compare those.
hashCode
always uses the same fields used in the equals
comparison and never the primary key. If the entity is immutable or the fields that are used for the calculation are, then the hashCode
could be cached to save the cost of unnecessary recalculation.
Most of this logic is in the base entity class, with abstract methods that force each entity class to define entity-specific behavior:
import java.io.Serializable;
import javax.persistence.Column;
import javax.persistence.MappedSuperclass;
import javax.persistence.Transient;
import javax.persistence.Version;
import org.apache.commons.lang.builder.HashCodeBuilder;
import org.hibernate.Hibernate;
/**
* Abstract base class for entities.
* @param <T> the type
*/
@MappedSuperclass
public abstract class BaseEntity<T extends BaseEntity<T>>
extends ValueObject {
private int _version;
/**
* Default.
*/
protected BaseEntity() {
// default
}
// optimistic locking version, private since only Hibernate uses these
@Version @Column(name = "oplock", nullable = false)
@SuppressWarnings("unused")
private int getVersion() {
return _version;
}
@SuppressWarnings("unused")
private void setVersion(final int version) {
_version = version;
}
/**
* Utility method for <code>equals()</code> methods.
* @param o1 one object
* @param o2 another object
* @return <code>true</code> if they're both <code>null</code> or both equal
*/
protected boolean areEqual(final Object o1, final Object o2) {
if (o1 == null) {
if (o2 != null) {
return false;
}
}
else if (!o1.equals(o2)) {
return false;
}
return true;
}
/**
* Utility method for <code>equals()</code> methods.
* @param s1 one string
* @param s2 another string
* @param ignoreCase if <code>false</code> do case-sensitive comparison
* @return <code>true</code> if they're both <code>null</code> or both equal
*/
protected boolean areEqual(
final String s1,
final String s2,
final boolean ignoreCase) {
// for use in custom equals() methods
if (s1 == null && s2 == null) {
return true;
}
if (s1 == null || s2 == null) {
return false;
}
return ignoreCase ? s1.equalsIgnoreCase(s2) : s1.equals(s2);
}
/**
* Utility method for <code>equals()</code> methods.
* @param d1 one date
* @param d2 another date
* @return <code>true</code> if they're both <code>null</code> or both equal
*/
protected boolean areEqual(
final Date d1,
final Date d2) {
// for use in custom equals() methods
if (d1 == null && d2 == null) {
return true;
}
if (d1 == null || d2 == null) {
return false;
}
return d1.getTime() == d2.getTime();
}
/**
* Utility method for <code>equals()</code> methods.
* @param f1 one float
* @param f2 another float
* @return <code>true</code> if they're equal
*/
protected boolean areEqual(
final float f1,
final float f2) {
// for use in custom equals() methods
return Float.floatToIntBits(f1) == Float.floatToIntBits(f2);
}
/**
* Utility method for <code>hashCode()</code> methods.
* @param values the values to use in calculation
* @return the hash code value
*/
protected int calculateHashCode(final Object... values) {
HashCodeBuilder builder = new HashCodeBuilder();
for (Object value : values) {
builder.append(value);
}
return builder.toHashCode();
}
/**
* {@inheritDoc}
*/
@Override
public int hashCode() {
return calculateHashCode(getHashCodeData());
}
/**
* Get the data used to calculate hash code;
* use getters not fields in case the instance is a proxy.
* @return the data
*/
@Transient
protected abstract Object[] getHashCodeData();
/**
* Allows type-specific "this".
* @return this
*/
@Transient
protected abstract T getThis();
/**
* Get the primary key.
* @return the pk, or <code>null</code> if not set or if not supported
*/
@Transient
public abstract Serializable getPk();
/**
* {@inheritDoc}
*/
@SuppressWarnings("unchecked")
@Override
public final boolean equals(final Object other) {
if (this == other) {
return true;
}
if (other == null
|| // looks into the target class of a proxy if necessary
!Hibernate.getClass(other).equals(Hibernate.getClass(this))) {
return false;
}
// if pks are both set, compare
if (getPk() != null) {
Serializable otherPk = ((BaseEntity)other).getPk();
if (otherPk != null) {
return getPk().equals(otherPk);
}
}
return dataEquals((T)other);
}
/**
* Compare data only; null, class, and pk have been checked.
* @param other the other instance
* @return <code>true</code> if equal
*/
protected abstract boolean dataEquals(T other);
}
A concrete subclass must implement getHashCodeData
and dataEquals
, for example using the username property for a User
class:
/**
* {@inheritDoc}
*/
@Override
protected boolean dataEquals(final User other) {
return areEqual(getUsername(), other.getUsername(), true);
}
/**
* {@inheritDoc}
*/
@Override
@Transient
protected Object[] getHashCodeData() {
return new Object[] { getUsername() };
}
This has worked well so far. It’s consistent, even when an instance transitions from new to persistent and its primary key becomes non-null
. The data used for hashCode
calculation never changes, so that’s consistent. The data for equals
does change, but it stays consistent – comparing two persistent entities or mixing persistent and new returns the correct result in all cases since non-pk data is used for comparison.
Note that equals
is final but hashCode
isn’t to allow caching.
My original implementation of this was way too simplistic, just using the Jakarta Commons Lang
reflection methods that I described here
. This approach is great for regular value objects and DTOs, but is way too expensive for Hibernate entities since every field is used, even collections and the primary key. Lazy-loaded collections will be retrieved from the database to calculate hashCode
and equals
, which is almost always inappropriate.
So I worked on a slightly smarter approach that skipped the primary key and collections but still used reflection for the rest of the fields. This is still too expensive, and breaks when using proxies since the fields will be null
– the getter methods delegate to the data that’s in the proxy’s target instance.
Eventually I ended up with the current implementation, which has lasted for a while with no issues.
I’ve been doing a lot of Hibernate refactoring and performance tuning at work lately and have been spending time wrestling with some interesting anti-patterns that have crept into the code....