Sunday, April 22, 2012

JPA Pagination and batch reading of one-to-many relations

Context: You work at a library and you are to develop a web service which consumers can use to access library information. E.g. there is to be a service to browse authors and their associated books. The library have more than 1 million books from about just as many authors.

Pagination is one way to browse a large result set of a query as e.g. returned from the JPQL query “select a from Author a order by a.lastName, a.firstName “, and luckily, JPA supports query pagination out of the box.

If you in front of your JPA component have a web service which is to return a list of complex structures like e.g. authors and the books they have written, you will need to load all data to be returned. This might cause a performance problem, because with N authors in the result set, you will get N database calls, to fetch the books for each author, in total N+1 database calls.

Maybe you would consider to “remove” the N database calls of the books and select the books together with the authors e.g. using join fetch of the books e.g. “select a from Author a join fetch a.books order by a.lastName, a.firstName” (*), that would give you all your needed information in 1 database call. The problem is that pagination in JPA does not work (as expected) with join fetch of a one-to-many association.

If you are using eclipselink then a solution to this potential performance problem could be to fetch associated data using batch reading. In the author example above you would then be able to reduce N+1 database calls into 2 calls. One call to get the authors and one call to get all the authors associated books.

Eclipselink supports three types of batch reading: EXISTS, IN and JOIN.

One thing to notice is that when using batch reading in eclipselink then associated data will be loaded in batches the first time they are accessed. This means you need to access data to load them. I.e. data is not eagerly loaded.

The javadoc for the BatchFetchType.IN option is shown here

This uses an SQL IN clause in the batch query passing in the source object Ids. This has the advantage of only selecting the objects not already contained in the cache, and can work better with cursors, or if joins cannot be used. This may only work for singleton Ids on some databases.

Below in green I have shown three different ways to configure the use of batch reading in eclipselink. I have shown it for BatchFetchType.IN, but it applies as well for EXISTS and JOIN.

Example 1: Annotation on the associated @OneToMany mapped property
@Entity
@Table(name="AUTHOR")
public class Author implements Serializable {
    …

    @OneToMany(mappedBy="author")
    @BatchFetch(value=BatchFetchType.IN)
    private Set books;

    ...
}
Example 2: @QueryHint in @NamedQuery
@NamedQueries(value={
@NamedQuery(
    name="Author.selectAll", 
    query="select a from Author a order by a.lastName, a.firstName", 
    hints={
        @QueryHint(name="eclipselink.batch.type", value= "IN"),
        @QueryHint(name=QueryHints.BATCH, value="a.books")})})
Example 3: setHint on the query
TypedQuery q = em.createQuery("select a from Author a 
		order by a.lastName, a.firstName", Author.class);
q.setHint("eclipselink.batch.type", "IN");
q.setHint(QueryHints.BATCH, "a.books");
Further reading:

(*): Or “select distinct a from Author a join fetch a.books order by a.lastName, a.firstName” if you want to remove duplicates

No comments: