Why You Should Avoid Using JPA/Hibernate in Production

April 3, 2021

Disclaimer: I passionately hate JPA and Hibernate.

My relationship with Hibernate (JPA didn’t exist yet back then) has been rocky from the very beginning. Sometime between 2005 and 2007, an interviewer asked me how I’d go about mapping a 1-N relationship with Hibernate. “I have no idea what Hibernate is,” I said.

Then, in 2008, I got a job at Softage on a project based on Swing and Hibernate. I had zero commercial experience with either, and I felt I was terrible at the job. I spent a few weeks worrying about it and then quit.

No Java developer can avoid JPA altogether, though, so naturally I had to live with it too. I’ve had to dig up and fix a mountain of bugs and performance issues caused by the coding style JPA enforces, and to do that, I had to understand how JPA works. I’ll never get those hours back. That’s why I hate JPA.

In this article, I discuss both the facts and my own experience. It’s up to you to interpret them.

JPA’s Philosophy

Here’s my loose take on JPA’s philosophy: "Forget the database; just declare your object model. Work with it as if it’s already entirely in memory. We’ll take care of saving all the objects to the database."

Perhaps the official philosophy behind JPA is different; I couldn’t find a statement of it in my research. Still, that’s certainly what the folk version sounds like.

Hibernate’s Simplified Model

Figure 1. Hibernate’s Simplified Model

To make the 'Just treat all your objects as if they’re already in memory' promise come true, Hibernate works like this:

The app starts a transaction via entityManager.getTransaction().begin() (transactions can be kept in memory as well, which doesn’t contradict JPA’s philosophy);
The app loads the data via entityManager:
1. entityManager builds queries and receives table rows via the JDBC driver.
2. The ORM creates proxies for entity objects from these table rows.
3. Before passing the objects on to the app, entityManager stores them in the persistence context;
The app modifies the objects via setters;
1. Since these are proxies, the setters also mark the objects as dirty along the way.
The app commits the transaction via entityManager.getTransaction().commit();
1. At this point, entityManager walks through the persistence context and writes all new and dirty objects to the database.
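To make the steps above concrete, here’s a toy, stdlib-only model of a persistence context. Every class and method in it (Customer, ToyEntityManager, find, commit) is a hypothetical illustration, not Hibernate’s API. One deliberate difference from the simplified model: Hibernate’s default dirty checking (without bytecode enhancement) compares each managed entity with a snapshot taken at load time rather than relying on setters, so the sketch does the same. Either way, the observable effect matches the description above: nothing is written until commit, and then only changed objects are updated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical entity; a real one would carry @Entity/@Id annotations.
class Customer {
    final long id;
    private String name;

    Customer(long id, String name) { this.id = id; this.name = name; }

    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

// Toy persistence context: NOT Hibernate's API, just the idea behind it.
class ToyEntityManager {
    private final Map<Long, String> table;                       // "database": id -> name
    private final Map<Long, Customer> managed = new HashMap<>(); // persistence context
    private final Map<Long, String> snapshots = new HashMap<>(); // state at load time
    final List<String> executedSql = new ArrayList<>();

    ToyEntityManager(Map<Long, String> table) { this.table = table; }

    Customer find(long id) {
        Customer known = managed.get(id);
        if (known != null) return known;          // one row -> one object per context
        executedSql.add("SELECT id, name FROM customer WHERE id = " + id);
        Customer loaded = new Customer(id, table.get(id));
        managed.put(id, loaded);
        snapshots.put(id, loaded.getName());
        return loaded;
    }

    // "Commit": compare every managed object with its snapshot and write the dirty ones.
    void commit() {
        for (Customer c : managed.values()) {
            if (!Objects.equals(c.getName(), snapshots.get(c.id))) {
                executedSql.add("UPDATE customer SET name = '" + c.getName() + "' WHERE id = " + c.id);
                table.put(c.id, c.getName());
                snapshots.put(c.id, c.getName());
            }
        }
    }
}

class PersistenceContextDemo {
    public static void main(String[] args) {
        Map<Long, String> table = new HashMap<>(Map.of(1L, "Alice", 2L, "Bob"));
        ToyEntityManager em = new ToyEntityManager(table);

        Customer alice = em.find(1L);
        em.find(2L);
        alice.setName("Alice Smith");   // no SQL yet: just an in-memory change
        em.commit();                    // only the dirty object is written

        em.executedSql.forEach(System.out::println);
    }
}
```

Running it prints two SELECT statements followed by a single UPDATE for Alice.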

Now, it might seem everything’s perfect. Look, mom, none of the boilerplate code you’d usually need to work with a database! It’s almost as if you could hand this task over to a Java bootcamp graduate and they’d start delivering value from day one. However, you can only get away with this while working on the very first version of a simple system used by you and your QA. Once the system comes under real load and starts to evolve, you see this design crumble.

JPA’s Advantages

Everything has both pros and cons. I must admit there are some bright sides even to JPA.

Without a doubt, it’s the most commonly used database access solution for Java. This implies three more upsides:

There’s a multitude of free JPA learning materials for any level. As long as you’re writing idiomatic JPA code, you’ll easily find a solution for any problem you might have online.
It’s easy to hire a developer who’s familiar with JPA. Pick almost anyone, and there’s a 99% chance they have at least some experience with it.
JPA is supported by everyone, everywhere. For example, Kotlin ships a dedicated compiler plugin for JPA compatibility.

What’s more, JPA uses the imperative object-oriented paradigm everyone’s familiar with. Need someone who can work with JPA? Anyone on the market will fit the bill. This model is genuinely easy to use and, in most cases, helps you arrive at a perfectly smooth solution.

Taken as an isolated task, saving entity graphs to the database and loading them back is something JPA handles perfectly well.

JPA is also good at smoothing over the differences between SQL dialects, which comes in handy if your project has to support multiple DBMSes. However, this only works as long as you stick to the lowest common denominator of those dialects, and as a rule, each dialect’s best features, performance-wise, are unique to it.

Finally, it’s a solid solution: I can’t recall running into any bugs in Hibernate itself.

There are downsides to everything as well, though, and JPA has quite a lot of these.

JPA’s issues

The root of all JPA’s problems is paradigmatic, not technical. JPA tries to pretend there’s no database at all. In particular, it tries to take the developer’s mind off the fact that every change must be reflected in the database. Yet DBMSes are, essentially, state mutation managers. So JPA has no choice but to use the imperative programming model: that’s the only way to hand the app a "POJO" and still keep track of its state mutations. In pursuing this, JPA rules out the more ergonomic declarative model.

JPA undermines both design and performance. First, let’s take a look at how JPA can jeopardize software design.

Classes must be open for inheritance

JPA requires entity classes to be open for inheritance:

The entity class must not be final
— JSR 338: Java™ Persistence API; Version 2.2; "2.1 The Entity Class"

Yet you should either design and document your classes for inheritance or prohibit it. To quote the classics: Effective Java, "Item 19: Design and document for inheritance or else prohibit it."

Designing a class for inheritance takes much more effort than defining a data structure with a bunch of fields plus getters and setters for them.

I’ve never seen a JPA Entity designed with inheritance in mind.

Although the inheritability of JPA entities can potentially cause problems, I’ve never actually run into any.
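One practical reason for the rule is that providers such as Hibernate implement lazy loading with runtime-generated subclasses of your entity, and a final class can’t be subclassed. The stdlib-only sketch below hand-writes such a subclass (Author and AuthorProxy are hypothetical names) and shows one of those potential problems: a getClass()-based equals(), as IDEs commonly generate it, no longer treats the proxy and the real object as equal.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Non-final on purpose: the proxy below must be able to subclass it.
class Author {
    private String name;

    protected Author() {}                       // needed by the subclass below
    Author(String name) { this.name = name; }

    public String getName() { return name; }

    // A common IDE-generated equals(): compares getClass() and fields.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(getName(), ((Author) o).getName());
    }

    @Override public int hashCode() { return Objects.hashCode(getName()); }
}

// Hand-written stand-in for a generated lazy-loading proxy subclass.
class AuthorProxy extends Author {
    private final Supplier<Author> loader;
    private Author target;

    AuthorProxy(Supplier<Author> loader) { this.loader = loader; }

    @Override public String getName() {
        if (target == null) target = loader.get();   // "query" on first access
        return target.getName();
    }
}

class ProxyDemo {
    public static void main(String[] args) {
        Author real = new Author("Tolkien");
        Author proxy = new AuthorProxy(() -> new Author("Tolkien"));
        System.out.println(proxy.getName());     // Tolkien
        System.out.println(real.equals(proxy));  // false: different runtime classes
    }
}
```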

Default constructors

JPA requires every entity class to have a default (no-arg) constructor:

The entity class must have a no-arg constructor.
— JSR 338: Java™ Persistence API; Version 2.2; "2.1 The Entity Class," https://github.com/javaee/jpa-spec/blob/master/jsr338-MR/JavaPersistence.pdf

Note that default constructors are an antipattern and a ticking bomb: they allow objects that violate their invariants to exist, and they cause temporal coupling. Look here for more details.

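You can partly avoid this problem by making the default constructor package-private and marking it as @Deprecated. (Strictly speaking, the spec asks for a public or protected no-arg constructor, but Hibernate accepts a package-private one as well.)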

I’ve never seen anyone but me adhere to this practice, though.
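Here’s a minimal sketch of that practice in plain Java, with the JPA annotations left out so it compiles without a provider. Account and its fields are hypothetical. The public constructor enforces the invariants; the deprecated no-arg one exists only for the persistence provider, and application code calling it from another class gets a deprecation warning.

```java
import java.util.Objects;

// Plain Java; a real entity would also carry @Entity, @Id and column mappings.
class Account {
    private String iban;
    private long balanceCents;

    /**
     * For the persistence provider only: package-private and deprecated,
     * so callers in other classes get a compiler warning.
     * (The JPA spec itself asks for public or protected.)
     */
    @Deprecated
    Account() {}

    public Account(String iban, long balanceCents) {
        this.iban = Objects.requireNonNull(iban, "iban");
        if (balanceCents < 0) {
            throw new IllegalArgumentException("balance must not be negative");
        }
        this.balanceCents = balanceCents;
    }

    public String getIban() { return iban; }
    public long getBalanceCents() { return balanceCents; }
}

class AccountDemo {
    public static void main(String[] args) {
        Account ok = new Account("DE00 0000 0000", 100);
        System.out.println(ok.getIban() + " " + ok.getBalanceCents());
        try {
            new Account("DE00 0000 0000", -1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```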

Objects must be mutable

JPA can’t work with immutable objects by design; mutability is built into its specification:

An update to the state of an entity includes both the assignment of a new value to a persistent property or field of the entity as well as the modification of a mutable value of a persistent property or field
— JSR 338: Java™ Persistence API; Version 2.2; "3.2.4 Synchronization to the Database"

If your entire model is mutable, though, you get all the issues that come with mutable state.

To minimize its abstraction leaks, JPA needs to make sure a table row maps to exactly one object in memory. So if, instead of mutating an object, you create a new instance with an updated state, JPA will treat it as a new object and, naturally, link it to a new table row. If you try to save this new instance, JPA will attempt to insert it, which will cause a primary key uniqueness violation.

You can partly work around this by making your entities immutable and performing updates only through explicit UPDATE queries. However, this only works well as long as you need to update a single object. Things get much more complicated with an immutable object graph: you’ll need to write UPDATE queries for each entity type by hand and run them across the graph, by hand as well.
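The clash can be simulated without a real provider. In the sketch below, IdentityTrackingSession is a hypothetical stand-in that, like an ORM’s identity map, treats only the instance it loaded as managed. A modified copy of an immutable Product is therefore saved as a brand-new row and hits the duplicate key, while an explicit update by ID works. (A Java record can’t be a JPA entity in the first place, since records are final and have no no-arg constructor.)

```java
import java.util.HashMap;
import java.util.Map;

// An immutable "entity" with a wither that returns an updated copy.
record Product(long id, String title) {
    Product withTitle(String newTitle) { return new Product(id, newTitle); }
}

// Hypothetical ORM-like session with an identity map: one managed instance per ID.
class IdentityTrackingSession {
    final Map<Long, String> table = new HashMap<>();          // "database": id -> title
    private final Map<Long, Product> identityMap = new HashMap<>();

    Product load(long id) {
        return identityMap.computeIfAbsent(id, key -> new Product(key, table.get(key)));
    }

    // ORM-style save: the managed instance is UPDATEd, any other instance is INSERTed.
    void save(Product p) {
        if (identityMap.get(p.id()) == p) {
            table.put(p.id(), p.title());                                          // UPDATE
        } else if (table.containsKey(p.id())) {
            throw new IllegalStateException("duplicate primary key: " + p.id()); // INSERT fails
        } else {
            table.put(p.id(), p.title());                                          // INSERT
            identityMap.put(p.id(), p);
        }
    }

    // Workaround: an explicit UPDATE by ID that doesn't care about object identity.
    void update(Product p) {
        if (!table.containsKey(p.id())) throw new IllegalStateException("no row with id " + p.id());
        table.put(p.id(), p.title());
    }
}

class ImmutableEntityDemo {
    public static void main(String[] args) {
        IdentityTrackingSession session = new IdentityTrackingSession();
        session.table.put(1L, "Old title");

        Product renamed = session.load(1L).withTitle("New title");
        try {
            session.save(renamed);                 // treated as a new object
        } catch (IllegalStateException e) {
            System.out.println("save failed: " + e.getMessage());
        }
        session.update(renamed);                   // explicit UPDATE works
        System.out.println(session.table.get(1L));
    }
}
```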

Bad procedural programming style

The previous two points, combined with several more minor JPA constraints, degrade the programming style to procedural. You get data structures without behavior (JPA entities) and imperative procedures that manipulate them (services). Welcome to 1981.

Back in the seventies, classics like Larry Constantine in Structured Design described the universal structure of maintainable programs: input, then transformation, then output.

This structure lives on today in the guise of Clean Architecture and Functional Core/Imperative Shell.

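JPA, however, turns it into a structure where input and output are hidden inside the transformation: the business logic reads from and writes to the database implicitly, through entity getters and setters.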

This hidden IO is very difficult to analyze, and that difficulty leads to big problems. With performance: try working out how many SQL queries are executed while handling a request. With regressions: try working out which rows will change in the database, and how. On top of that, such a structure forces you to use mocks in business rule (transformation) tests, and tests with mocks probe method implementations instead of contracts, so they break after every little refactoring.
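For contrast, here’s a minimal sketch of the input/transformation/output structure in the Functional Core/Imperative Shell style. All names are hypothetical, and an in-memory map stands in for the database. The discount rule is a pure function, so it can be tested with plain values and no mocks, and every read and write is visible in one place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

record Order(long id, long totalCents, boolean vip) {}

// Functional core: a pure transformation, testable with plain values and no mocks.
final class Pricing {
    private Pricing() {}

    static List<Order> applyVipDiscount(List<Order> orders) {
        return orders.stream()
                .map(o -> o.vip() ? new Order(o.id(), o.totalCents() * 90 / 100, true) : o)
                .toList();
    }
}

// Imperative shell: the only place that does IO (an in-memory map stands in for the DB).
class OrderStore {
    final Map<Long, Order> rows = new HashMap<>();

    List<Order> loadAll() { return new ArrayList<>(rows.values()); }
    void saveAll(List<Order> orders) { orders.forEach(o -> rows.put(o.id(), o)); }
}

final class DiscountJob {
    private DiscountJob() {}

    static void run(OrderStore store) {
        List<Order> input = store.loadAll();                    // input: explicit reads
        List<Order> output = Pricing.applyVipDiscount(input);   // transformation: pure
        store.saveAll(output);                                  // output: explicit writes
    }

    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        store.rows.put(1L, new Order(1, 2000, true));
        store.rows.put(2L, new Order(2, 2000, false));
        run(store);
        System.out.println(store.rows);
    }
}
```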

This isn’t directly related to JPA, but in my practice, JPA developers believe they’re programming in an OO style and don’t study "old junk" such as structured programming and structured design. As a result, they write bad procedural code with low cohesion, high coupling, scopes of effect exceeding scopes of control (see 9.4 Scope of effect/scope of control), and so on.

Welcome to 1981. I’d advise against using the goto statement, too.

All code becomes side effect code

JPA transforms virtually all of your code into side effects code.

Any getter may execute a query, or start doing so tomorrow. Any function call may mutate your object and add a new UPDATE statement to the transaction.

You can read more about all the issues that come with side-effect code here (in Russian).

Let’s now take a look at some performance issues JPA can bring.

Lazy loading

JPA is big on lazy loading. It’s the default option for OneToMany and ManyToMany relations. Also, in the JPA world, lazy loading is considered "the best practice."

I wouldn’t be surprised if lazy loading was responsible for 1 percent of global energy consumption. Lazy loading was the reason behind 90% of performance issues I’ve had to deal with in JPA-based projects.

Time and again, I’ve greatly improved the performance of parts of JPA-based systems with this algorithm:

Count the number of queries run by the code.
Your heart will miss a beat once you see a few hundred queries instead of just a few. Get it back in rhythm.
Throw all your code away. Write a handful of queries by hand. Write new code based on them.
Voila.
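Steps 1 and 3 can be illustrated with a small, stdlib-only simulation. CountingDb and ItemReport are hypothetical: CountingDb is an in-memory store that counts the queries it serves. Looping over orders and loading each order’s items separately, which is what a lazy collection does, costs 1 + N queries, while a hand-written WHERE order_id IN (...) query plus in-memory grouping costs 2, regardless of N.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

record OrderRow(long id) {}
record ItemRow(long orderId, String sku) {}

// Hypothetical in-memory "database" that counts every query it serves.
class CountingDb {
    int queries = 0;
    final List<OrderRow> orders = new ArrayList<>();
    final List<ItemRow> items = new ArrayList<>();

    List<OrderRow> findOrders() {                               // SELECT * FROM orders
        queries++;
        return List.copyOf(orders);
    }

    List<ItemRow> findItemsByOrderId(long orderId) {            // what a lazy getter runs per order
        queries++;
        return items.stream().filter(i -> i.orderId() == orderId).toList();
    }

    List<ItemRow> findItemsByOrderIds(Collection<Long> ids) {   // one WHERE order_id IN (...)
        queries++;
        return items.stream().filter(i -> ids.contains(i.orderId())).toList();
    }
}

final class ItemReport {
    private ItemReport() {}

    // Lazy-loading style: 1 query for the orders + 1 per order.
    static int countItemsLazily(CountingDb db) {
        int total = 0;
        for (OrderRow o : db.findOrders()) total += db.findItemsByOrderId(o.id()).size();
        return total;
    }

    // Hand-written style: 2 queries, then group in memory.
    static int countItemsBatched(CountingDb db) {
        List<OrderRow> orders = db.findOrders();
        Set<Long> ids = orders.stream().map(OrderRow::id).collect(Collectors.toSet());
        Map<Long, List<ItemRow>> byOrder = db.findItemsByOrderIds(ids).stream()
                .collect(Collectors.groupingBy(ItemRow::orderId));
        int total = 0;
        for (OrderRow o : orders) total += byOrder.getOrDefault(o.id(), List.of()).size();
        return total;
    }
}
```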

I can only speculate here, but I believe this is what usually happens behind the scenes:

1. The developer needs to implement a new feature ASAP.
2. In the part of the code where the new feature is supposed to go, there’s already an object with a getter returning a list that contains all the necessary data.
3. The developer invokes that getter and loops through the result.
4. From here, things go one of four ways:
   In about 60% of cases, the developer doesn’t realize they’re adding a new query by invoking the getter. By looping through the result, they’re adding N more.
   In 30% of cases, they do realize it but brush it off because "premature optimization is the root of all evil."
   In 7% of cases, they add a new task to the technical debt graveyard.
   Finally, in just 3% of cases, they take full responsibility, get the deadline extended, and come up with an efficient solution.
   In my own JPA projects, it goes roughly the same way; at best, my split is 60/0/30/10.
5. The developer repeats step 3 a few times. Bonus points for a bunch of nested loops with lazy loading: the number of queries multiplies with every level of nesting.
6. The developer runs some tests on demo data with just a couple of rows per table. No problems arise.
7. Voila! You’re now free to hire me to fix all these performance issues.

With lazy loading, you must always be on the lookout. Every time you write something like entity.getXXXs(), ask yourself whether it may introduce an N+1 query problem. Personally, I lack the self-discipline for that.
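To see how nesting multiplies the damage, here’s a toy model in which each lazy collection "runs a query" on first access. LazyList, QueryCounter, NestedLoops, and the Shop* records are hypothetical. Iterating customers, then their orders, then each order’s items costs 1 + C + C × O queries for C customers with O orders each, so every additional level of nesting multiplies the count again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Global query counter for the demo (not thread-safe; fine for a sketch).
final class QueryCounter {
    static int count = 0;
    private QueryCounter() {}
}

// Toy lazy collection: the first get() "runs a query" and caches the result.
final class LazyList<T> {
    private final Supplier<List<T>> loader;
    private List<T> loaded;

    LazyList(Supplier<List<T>> loader) { this.loader = loader; }

    List<T> get() {
        if (loaded == null) {
            QueryCounter.count++;
            loaded = loader.get();
        }
        return loaded;
    }
}

record ShopItem(String sku) {}
record ShopOrder(LazyList<ShopItem> items) {}
record ShopCustomer(LazyList<ShopOrder> orders) {}

final class NestedLoops {
    private NestedLoops() {}

    static List<ShopCustomer> loadCustomers(int customers, int ordersEach, int itemsEach) {
        QueryCounter.count++;                                   // the top-level SELECT
        List<ShopCustomer> result = new ArrayList<>();
        for (int c = 0; c < customers; c++) {
            result.add(new ShopCustomer(new LazyList<>(() -> {
                List<ShopOrder> orders = new ArrayList<>();
                for (int o = 0; o < ordersEach; o++) {
                    orders.add(new ShopOrder(new LazyList<>(() -> {
                        List<ShopItem> items = new ArrayList<>();
                        for (int i = 0; i < itemsEach; i++) items.add(new ShopItem("sku-" + i));
                        return items;
                    })));
                }
                return orders;
            })));
        }
        return result;
    }

    static int countItems(List<ShopCustomer> customers) {
        int n = 0;
        for (ShopCustomer customer : customers) {
            for (ShopOrder order : customer.orders().get()) {   // 1 query per customer
                n += order.items().get().size();               // 1 query per order
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (int customers : new int[] {10, 20, 40}) {
            QueryCounter.count = 0;
            int items = countItems(loadCustomers(customers, 5, 3));
            System.out.println(customers + " customers, " + items + " items, "
                    + QueryCounter.count + " queries");
        }
    }
}
```

With 5 orders per customer, the query count goes from 61 to 121 to 241 as the number of customers doubles.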

Speaking of lazy loading, we must also mention the infamous LazyInitializationException. You’d be surprised how often I still stumble upon it in production apps.
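The mechanics behind the exception fit in a few lines. In this stdlib-only sketch, ToyJpaSession and LazyCollection are hypothetical, and the thrown IllegalStateException stands in for Hibernate’s org.hibernate.LazyInitializationException. A collection initialized while its session is open stays readable, but the first access to an uninitialized one after the session closes fails.

```java
import java.util.List;
import java.util.function.Supplier;

// Toy session: only tracks whether it is still open.
final class ToyJpaSession implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() { return open; }

    @Override public void close() { open = false; }
}

// Toy lazy collection bound to the session that created it.
final class LazyCollection<T> {
    private final ToyJpaSession session;
    private final Supplier<List<T>> loader;
    private List<T> value;

    LazyCollection(ToyJpaSession session, Supplier<List<T>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<T> get() {
        if (value == null) {
            if (!session.isOpen()) {
                throw new IllegalStateException("failed to lazily initialize collection: no session");
            }
            value = loader.get();   // would be a SELECT in a real ORM
        }
        return value;
    }
}

class LazyInitDemo {
    public static void main(String[] args) {
        ToyJpaSession session = new ToyJpaSession();
        LazyCollection<String> touched = new LazyCollection<>(session, () -> List.of("a", "b"));
        LazyCollection<String> untouched = new LazyCollection<>(session, () -> List.of("c"));
        touched.get();               // initialized inside the "transaction"
        session.close();

        System.out.println(touched.get());
        try {
            untouched.get();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```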

Finally, here’s a problem unique to JPA: it offers no convenient means of managing lazy loading dynamically. You could use NamedEntityGraph in some cases, but it’s quite cumbersome, so you’ll be sorely tempted to go back to plain lazy loading.

You need an extra query to update entities

This issue is similar to the immutable objects one mentioned above. You’ll face it once you need to update an entity based on an external DTO, e.g. one you received in an HTTP request. There are two ways to do this in JPA:

The idiomatic way: run an extra SELECT query to load the object into the persistence context, then mutate it.
The efficient way: issue an explicit UPDATE query, just as with immutable objects.

The first way seems questionable from the efficiency standpoint. The second one looks like you’re fighting the framework. Wasn’t it supposed to make your life easier?
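The trade-off can be sketched by logging the statements each approach would send. UsersTable, UserDto, and both methods are hypothetical, and an in-memory map plays the users table. The idiomatic path costs a SELECT plus an UPDATE, while the direct path sends a single UPDATE and uses the affected-row count (which JPA’s Query.executeUpdate() returns) to detect a missing row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

record UserDto(long id, String email) {}

// Hypothetical in-memory table plus a log of the SQL each approach would send.
class UsersTable {
    final Map<Long, String> emails = new HashMap<>();
    final List<String> log = new ArrayList<>();

    // Idiomatic JPA: load into the persistence context, mutate, let dirty checking flush.
    boolean updateViaLoad(UserDto dto) {
        log.add("SELECT id, email FROM users WHERE id = ?");
        if (!emails.containsKey(dto.id())) return false;
        if (!Objects.equals(emails.get(dto.id()), dto.email())) {
            log.add("UPDATE users SET email = ? WHERE id = ?");
            emails.put(dto.id(), dto.email());
        }
        return true;
    }

    // Efficient: one UPDATE; the affected-row count tells whether the row existed.
    boolean updateDirectly(UserDto dto) {
        log.add("UPDATE users SET email = ? WHERE id = ?");
        int affectedRows = emails.containsKey(dto.id()) ? 1 : 0;
        if (affectedRows == 1) emails.put(dto.id(), dto.email());
        return affectedRows == 1;
    }
}
```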

In theory, you could also keep your entities in an HTTP session. In the era of horizontal scaling, however, this option is better left in theory.

You need an extra query to reference an entity

Here’s a third problem that stems from the very same root. Let’s say you need to create a new entity that references an existing one with a known ID. There are two ways to do that in JPA: you can either run an extra query, sacrificing performance, or fight JPA.
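One way to sidestep the extra query is to model the reference as nothing more than a typed ID, which is the idea behind Spring Data JDBC’s AggregateReference. The sketch below is plain Java with hypothetical names (CustomerId, NewOrder, OrderInsert): everything the INSERT needs is already in hand, so no SELECT for the customer is required. For completeness, JPA’s EntityManager.getReference() can also avoid the SELECT in Hibernate by returning an uninitialized proxy, but it brings along the proxy caveats discussed above.

```java
import java.util.List;

// Typed ID wrapper: references another aggregate without loading it.
record CustomerId(long value) {}

record NewOrder(long id, CustomerId customerId, long totalCents) {}

final class OrderInsert {
    private OrderInsert() {}

    static final String SQL = "INSERT INTO orders (id, customer_id, total_cents) VALUES (?, ?, ?)";

    // Bind parameters for the single INSERT: the foreign key is just the wrapped ID.
    static List<Long> params(NewOrder order) {
        return List.of(order.id(), order.customerId().value(), order.totalCents());
    }

    public static void main(String[] args) {
        NewOrder order = new NewOrder(7, new CustomerId(42), 1999);
        System.out.println(SQL + " with " + params(order));
    }
}
```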

Caching

Basically, you can’t cache JPA entities.

If your entities have setters, they’re uncacheable simply because you can’t synchronize concurrent access to them.

Even if your JPA entities are immutable, once cached they become useless as soon as the transaction they were loaded in is closed. You’ll still be able to read their data, but you won’t be able to reference them.

Finally, for an entity with lazy fields, you’ll eventually get the LazyInitializationException.
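By contrast, what caches well is an immutable, session-free snapshot. In this sketch, ProductView and ProductViewCache are hypothetical, and the loader function stands in for a plain SQL query. ConcurrentHashMap.computeIfAbsent applies the loader at most once per key, so concurrent callers share a single loaded instance per ID.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

// Immutable, session-free snapshot: safe to share between threads and requests.
record ProductView(long id, String title, long priceCents) {}

final class ProductViewCache {
    private final ConcurrentHashMap<Long, ProductView> cache = new ConcurrentHashMap<>();
    private final LongFunction<ProductView> loader;   // e.g. a plain SQL query
    final AtomicInteger loads = new AtomicInteger();

    ProductViewCache(LongFunction<ProductView> loader) { this.loader = loader; }

    ProductView get(long id) {
        return cache.computeIfAbsent(id, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    public static void main(String[] args) {
        ProductViewCache cache = new ProductViewCache(id -> new ProductView(id, "Product " + id, 100));
        System.out.println(cache.get(1) == cache.get(1));
        System.out.println("loads: " + cache.loads.get());
    }
}
```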

I’m sure this list will go on and on. For now, I’ve just touched the very tip of the iceberg here.

It seems JPA can be used without sacrificing either design or performance. You’ll have to abandon idiomatic code, though, which eliminates all the advantages of using JPA. This approach is rarely, if ever, talked about, so learning materials are scarce. Hardly any developers are familiar with it, and maintenance can get tricky.

So naturally, we come to the question of whether JPA is worth it if we prioritize design and performance. If the answer’s "no," what are the alternatives?

JPA Alternatives

None of the issues outlined above is inherent to object-relational mapping per se. They’re inherent to one very specific approach to ORM, a natural consequence of trying to emulate working with objects in memory.

Other solutions exist where you don’t need to sacrifice design and performance for the sake of idiomatic code. Some of these resemble JPA.

Spring Data Jdbc/R2dbc

docs.spring.io/spring-data/jdbc

Right now, I prefer working with databases via Spring Data Jdbc/R2dbc (SDJ).

This technology can tick some of the boxes that are commonly thought to be unique to JPA:

Developers who are familiar with Spring Data JPA already know most of SDJ.
It’s still the good old Spring Data tech that can automagically generate implementations for methods such as findByName(name: String).
It’s a 'reliable solution from a trusted vendor,' which makes it easier to sell to your client or CTO than other alternatives.

At the same time, SDJ is quite ergonomic by design:

Spring Data JDBC aims to be much simpler conceptually, by embracing the following design decisions:
If you load an entity, SQL statements get run. Once this is done, you have a completely loaded entity. No lazy loading or caching is done.
If you save an entity, it gets saved. If you do not, it does not. There is no dirty tracking and no session.
There is a simple model of how to map entities to tables. It probably only works for rather simple cases. If you do not like that, you should code your own strategy. Spring Data JDBC offers only very limited support for customizing the strategy with annotations.
— Spring Data JDBC Reference Documentation, https://docs.spring.io/spring-data/jdbc/docs/2.1.7/reference/html/#jdbc.why

A bit further down, we read:

Try to stick to immutable objects — Immutable objects are straightforward to create as materializing an object is then a matter of calling its constructor only. Also, this avoids your domain objects to be littered with setter methods that allow client code to manipulate the objects state. If you need those, prefer to make them package protected so that they can only be invoked by a limited amount of co-located types. Constructor-only materialization is up to 30% faster than properties population.
Provide an all-args constructor — Even if you cannot or don’t want to model your entities as immutable values, there’s still value in providing a constructor that takes all properties of the entity as arguments, including the mutable ones, as this allows the object mapping to skip the property population for optimal performance.
— Spring Data JDBC Reference Documentation, https://docs.spring.io/spring-data/jdbc/docs/2.1.7/reference/html/#mapping.general-recommendations
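Here’s what following that advice might look like, as a plain-Java sketch with no Spring on the classpath. PurchaseOrder is a hypothetical aggregate root modeled as a record: it’s built only through its all-args constructor, changes produce new copies via a wither, and the toy repository writes the whole aggregate only when save is called, mirroring the "If you do not, it does not" rule quoted above. In a real SDJ project, the identifier would carry @Id and the repository would typically be an interface extending CrudRepository.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

record OrderLine(String sku, int quantity) {
    OrderLine {
        if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
    }
}

// Aggregate root: immutable, created only through its all-args (canonical) constructor.
record PurchaseOrder(long id, List<OrderLine> lines) {
    PurchaseOrder {
        lines = List.copyOf(lines);              // defensive, unmodifiable copy
    }

    PurchaseOrder withLine(OrderLine line) {     // "wither": returns a modified copy
        List<OrderLine> copy = new ArrayList<>(lines);
        copy.add(line);
        return new PurchaseOrder(id, copy);
    }
}

// Toy repository with SDJ-like semantics: no dirty tracking, the whole aggregate is written on save.
class PurchaseOrderRepository {
    private final Map<Long, PurchaseOrder> rows = new HashMap<>();

    PurchaseOrder save(PurchaseOrder order) {
        rows.put(order.id(), order);
        return order;
    }

    Optional<PurchaseOrder> findById(long id) { return Optional.ofNullable(rows.get(id)); }
}
```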

What’s more, even though…

All Spring Data modules are inspired by the concepts of “repository”, “aggregate”, and “aggregate root” from Domain Driven Design.
— Spring Data JDBC Reference Documentation, https://docs.spring.io/spring-data/jdbc/docs/2.1.7/reference/html/#jdbc.domain-driven-design

…every single Spring Data JPA-based project I’ve ever encountered in the wild was nothing like that. They’d usually ignore DDD, create a repository per table, and have a complete bidirectional graph of all entities.

It seems the team behind SDJ is of the same opinion:

These are possibly even more important for Spring Data JDBC, because they are, to some extent, contrary to normal practice when working with relational databases.
— Spring Data JDBC Reference Documentation

This so-called "normal practice" is a disastrous nightmare from the design, maintenance, and performance standpoints. It can quickly give you a makeshift solution to the problem at hand, though.

Since SDJ doesn’t have lazy loading, you won’t get away with this "normal practice." The team will have to actually design the data model and break it down into aggregates.

So far, I’ve only tried these technologies (that is, JDBC and R2DBC) in two small-scale projects, but I’ve been quite happy with the results.

jOOQ

jooq.org

jOOQ is the first JPA alternative I’ve had successful commercial experience with.

jOOQ provides a Java DSL for building SQL queries. It also features a powerful infrastructure for query execution, as well as DAO generation for CRUD operations.

There are two main downsides to it. First, code generation adds an extra build step. Second, you’ll need a paid license to work with commercial databases.

Ebean

ebean.io

Ebean is yet another technology I’ve had fairly successful commercial experience with.

This tech is the closest you can get to JPA; it’s a full-fledged ORM. Unlike JPA, though, Ebean poses no strict design limitations and shows much better performance by default.

Apart from the official docs, though, learning materials on Ebean are few and far between, and I’ve noticed some peculiarities in its behavior. Moreover, Ebean relies on an annotation processor that slows the build down quite a bit and doesn’t always work smoothly in IntelliJ IDEA.

Still, I delivered the project on time. I even managed to develop just my usual fair share of new premature gray hair.

MyBatis

mybatis.org

I haven’t had the chance to try MyBatis myself in a commercial environment. As far as I know, though, it’s a popular JPA alternative as well.

What to do if JPA is unavoidable

Often, JPA is unavoidable. Some of us are handed a large legacy system that must be maintained; others, a new project whose technology stack is dictated by an "Architect" or the customer.

After publishing this article, I stumbled upon this post. Its author describes all the rules (and a few more) that I use to minimize JPA’s damage in projects where I didn’t manage to avoid it. In particular, I recommend that you:

Stop having public default constructors and setters
Keep JPA DAOs outside of the domain as much as you can
Stop adding multi-directional associations
Stop adding entity mappings whenever it’s possible
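A minimal sketch of the second rule, keeping JPA types at the boundary. InvoiceEntity stands in for a JPA entity (annotations omitted so it compiles with the standard library alone), Invoice is the immutable domain type, and InvoiceMapper converts between them next to the DAO; all three names are hypothetical.

```java
import java.util.Objects;

// Persistence model: stand-in for a JPA entity (annotations omitted to stay stdlib-only).
class InvoiceEntity {
    Long id;
    String customerEmail;
    long amountCents;

    protected InvoiceEntity() {}   // for the persistence provider and the mapper
}

// Domain model: immutable, validated, knows nothing about JPA.
record Invoice(long id, String customerEmail, long amountCents) {
    Invoice {
        Objects.requireNonNull(customerEmail, "customerEmail");
        if (amountCents < 0) throw new IllegalArgumentException("amount must not be negative");
    }
}

// The mapping lives at the boundary, next to the DAO/repository.
final class InvoiceMapper {
    private InvoiceMapper() {}

    static Invoice toDomain(InvoiceEntity e) {
        return new Invoice(e.id, e.customerEmail, e.amountCents);
    }

    static InvoiceEntity toEntity(Invoice invoice) {
        InvoiceEntity e = new InvoiceEntity();
        e.id = invoice.id();
        e.customerEmail = invoice.customerEmail();
        e.amountCents = invoice.amountCents();
        return e;
    }
}
```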

Conclusion

In my opinion, JPA comes in handy when you need a fast, poor, and cheap solution.

That means using JPA makes sense if:

you need to come up with a quick prototype, or
you need to develop a small-scale internal system that’s meant for just a few dozen tables and users.

In these cases, even storing entities in the HTTP session would make practical sense.

I’d recommend against JPA if your project will be facing a higher load or feature a more complex domain model. Here, you’d be much better off using one of the alternative technologies instead.

Links

More links with criticism of JPA and hacks to work around its problems: