Thursday, June 9, 2011

How to improve JPA performance by 1,825%

The Java Persistence API (JPA) provides a rich persistence architecture. JPA hides much of the low-level drudgery of database access, freeing the application developer from worrying about the database and allowing them to concentrate on developing the application. However, this abstraction can lead to poor performance if the application programmer does not consider how their implementation affects database usage.

JPA provides several optimization features and techniques, as well as some pitfalls waiting to snag the unwary developer. Most JPA providers also offer a plethora of additional optimization features and options. In this blog entry I will explore the various optimization options and techniques, and a few of the common pitfalls.

The application is a simulated database migration from a MySQL database to an Oracle database. Perhaps there are more optimal ways to migrate a database, but it is surprising how good JPA's performance can be, even when processing hundreds of thousands or even millions of records. Perhaps it is not a straightforward migration, or the application's business logic is required, or perhaps the application is already persisted through JPA, so using JPA to migrate the database is just easiest. Regardless, this fictitious use case is a useful demonstration of how to achieve good performance with JPA.

The application consists of an Order processing database. The model contains a Customer, Order and OrderLine. The application reads all of the Orders from one database, and persists them to the second database. The source code for the example can be found here.

The initial code for the migration is pretty simple:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("order");
EntityManager em = emf.createEntityManager();
EntityManagerFactory emfOld = Persistence.createEntityManagerFactory("order-old");
EntityManager emOld = emfOld.createEntityManager();
Query query = emOld.createQuery("Select o from Order o");
List<Order> orders = query.getResultList();
em.getTransaction().begin();
// Reset old Ids, so they are assigned from the new database.
for (Order order : orders) {
    order.setId(0);
    order.getCustomer().setId(0);
}
for (Order order : orders) {
    em.persist(order);
    for (OrderLine orderLine : order.getOrderLines()) {
        em.persist(orderLine);
    }
}
em.getTransaction().commit();
em.close();
emOld.close();
emf.close();
emfOld.close();

The example test runs this migration using 3 variables for the number of Customers, Orders per Customer, and OrderLines per Order. So, 1000 customers, each with 10 orders, and each with 10 order lines, would be 111,000 objects.

The test was run on a virtualized 64 bit Oracle Sun server with 4 virtual cores and 8 gigs of RAM. The databases run on similar machines. The test is single threaded, running in Oracle Sun JDK 1.6. The tests are run using EclipseLink JPA 2.3, and migrating from a MySQL database to an Oracle database.

This code functions fine for a small database migration. But as the database size grows, some issues become apparent. It actually handles 100,000 objects surprisingly well, taking about 2 minutes, which is impressive given the code is thoroughly unoptimized and persists all 100,000 objects in a single persistence context and transaction.

Optimization #1 - Agent

EclipseLink implements LAZY for OneToOne and ManyToOne relationships using bytecode weaving. EclipseLink also uses weaving to perform many other optimizations, such as change tracking and fetch groups. The JPA specification provides the hooks for weaving in EJB 3 compliant application servers, but in Java SE or other application servers weaving is not performed by default. To enable EclipseLink weaving in Java SE for this example, the EclipseLink agent is used, via the Java -javaagent:eclipselink.jar option. If dynamic weaving is unavailable in your environment, another option is to use static weaving, for which EclipseLink provides an ant task and command line utility.
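
For example, launching the migration in Java SE with the agent could look like the following (the main class and jar names here are placeholders for illustration, not from the actual example):

java -javaagent:eclipselink.jar -cp eclipselink.jar:order-example.jar example.Migration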

Optimization #2 - Pagination

In theory, at some point you should run out of memory by bringing the entire database into memory in a single persistence context. So next I increased the size to 1 million objects, and this gave the expected out-of-memory error. Interestingly, this was with a heap size of only 512 meg. If I had used the entire 8 gigs of RAM, I could, in theory, have persisted around 16 million objects in a single persistence context. If I gave the virtualized machine the full 98 gigs of RAM available on the server, perhaps it would even be possible to persist 100 million objects. Perhaps the days when it made no sense to pull an entire database into RAM are behind us, and this is no longer such a crazy thing to do. But, for now, let's assume it is an idiotic thing to do, so how can we avoid it?

JPA provides a pagination feature that allows a subset of a query to be read. This is supported through the Query setFirstResult and setMaxResults API. So instead of reading the entire database in one query, the objects will be read page by page, and each page will be persisted in its own persistence context and transaction. This avoids ever having to read the entire database, and should also, in theory, make the persistence context more optimized by reducing the number of objects it needs to process together.

Switching to pagination is relatively easy for the original orders query, but some issues crop up with the relationship to Customer. Since orders can share the same customer, it is important that each order does not insert a new customer, but uses the existing one. If the customer for an order was already persisted on a previous page, then the existing one must be used. This requires a query to find the matching customer in the new database, which introduces some performance issues we will discuss later.

The updated code for the migration using pagination is:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("order");
EntityManagerFactory emfOld = Persistence.createEntityManagerFactory("order-old");
EntityManager emOld = emfOld.createEntityManager();
Query query = emOld.createQuery("Select o from Order o order by o.id");
int pageSize = 500;
int firstResult = 0;
query.setFirstResult(firstResult);
query.setMaxResults(pageSize);
List<Order> orders = query.getResultList();
boolean done = false;
while (!done) {
    if (orders.size() < pageSize) {
        done = true;
    }
    EntityManager em = emf.createEntityManager();
    em.getTransaction().begin();
    Query customerQuery = em.createNamedQuery("findCustomByName");
    // Reset old Ids, so they are assigned from the new database.
    for (Order order : orders) {
        order.setId(0);
        customerQuery.setParameter("name", order.getCustomer().getName());
        try {
            Customer customer = (Customer)customerQuery.getSingleResult();
            order.setCustomer(customer);
        } catch (NoResultException notPersistedYet) {
            // Customer does not yet exist, so null out its Id to have it persisted.
            order.getCustomer().setId(0);
        }
    }
    for (Order order : orders) {
        em.persist(order);
        for (OrderLine orderLine : order.getOrderLines()) {
            em.persist(orderLine);
        }
    }
    em.getTransaction().commit();
    em.close();
    firstResult = firstResult + pageSize;
    query.setFirstResult(firstResult);
    if (!done) {
        orders = query.getResultList();
    }
}
emOld.close();
emf.close();
emfOld.close();

Optimization #3 - Query Cache

This introduces a lot of queries for customer by name (10,000 to be exact), one for each order. This is not very efficient, and can be improved through caching. In EclipseLink there is both an object cache and a query cache. The object cache is enabled by default, but objects are only cached by Id, so it does not help us with the query on the customer's name. So, we can enable a query cache for this query. A query cache is specific to the query, and caches the query results keyed on the query name and its parameters. A query cache is enabled in EclipseLink using the query hint "eclipselink.query-results-cache"="true". This should be set where the query is defined, in this case in the orm.xml. This reduces the number of queries for customer to 1,000 (one per distinct customer name), which is much better.
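
For reference, if the query were defined with annotations instead of in the orm.xml, the same hint could be set with a @QueryHint; a minimal sketch, assuming the query is defined on the Customer entity:

@Entity
@NamedQuery(
    name = "findCustomByName",
    query = "Select c from Customer c where c.name = :name",
    hints = { @QueryHint(name = "eclipselink.query-results-cache", value = "true") })
public class Customer {
    // Id, name, and other mappings as in the example model.
}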

There are other solutions besides the query cache. EclipseLink also supports in-memory querying. In-memory querying means evaluating the query against the objects in the object cache, instead of accessing the database. In-memory querying is enabled through the query hint "eclipselink.cache-usage"="CheckCacheOnly". If you enabled a full cache on customer, then as you persisted the orders all of the existing customers would be in the cache, and you would never need to access the database. Another, manual, solution is to maintain a Map in the migration code, keying the new customers by name (see the sketch below). For all of the above solutions, if the cache is given a fixed size (the query cache defaults to a size of 100), you never need all of the customers in memory at the same time, so there are no memory issues.
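
A minimal sketch of the manual Map approach, assuming the same model classes as the example; this would replace the per-order customer query inside each page:

// Cache migrated customers by name, to avoid querying the new database per order.
Map<String, Customer> customersByName = new HashMap<String, Customer>();
for (Order order : orders) {
    order.setId(0);
    Customer existing = customersByName.get(order.getCustomer().getName());
    if (existing != null) {
        // Customer was already persisted on a previous page, so reuse it.
        order.setCustomer(existing);
    } else {
        // First time this customer is seen, so reset its Id to have it persisted.
        order.getCustomer().setId(0);
        customersByName.put(order.getCustomer().getName(), order.getCustomer());
    }
}

Note that this simple version keeps all of the customers in memory (1,000 here), trading memory for queries.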

Optimization #4 - Batch Fetch

The most common performance issue in JPA is the fetch of relationships. If you query n orders, and access their order-lines, you get n queries for order-line. This can be optimized through join fetching and batch fetching. Join fetching joins the relationship in the original query and selects from both tables. Batch fetching executes a second query for the related objects, but fetches them all at once, instead of one by one. Because we are using pagination, this makes optimizing the fetch a little more tricky. Join fetch will still work, but since order-lines are join fetched, and there are 10 order-lines per order, the page size that was 500 orders is now only 50 orders (and their 500 order-lines). We could resolve this by increasing the page size to 5,000, but given that in a real application the number of order-lines is not fixed, this becomes a bit of a guess. But the page size was just a heuristic number anyway, so this is no real issue. Another issue with join fetching and pagination is that the first and last objects of a page may not have all of their related objects, if their rows fall across a page boundary. Fortunately EclipseLink is smart enough to handle this, but it does require 2 extra queries, for the first and last order of each page. Join fetching also has the drawback that it selects duplicate order data when a OneToMany is join fetched. Join fetching is enabled in JPQL using join fetch o.orderLines.
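
The join fetch variant of the migration query (the same query that appears commented out in the final code below):

Query query = emOld.createQuery("Select o from Order o join fetch o.orderLines");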

Batch fetching normally works by joining the original query with the relationship query, but because the original query uses pagination, this will not work. EclipseLink supports three types of batch fetching: JOIN, EXISTS, and IN. IN works with pagination, so we can use IN batch fetching. Batch fetching is enabled through the query hints "eclipselink.batch"="o.orderLines" and "eclipselink.batch.type"="IN". This reduces the n queries for order-line to 1. So for each batch/page of 500 orders, there will be 1 query for the page of orders, 1 query for the order-lines, and 50 queries for customer.
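
As set in the final optimized code below:

query.setHint("eclipselink.batch", "o.orderLines");
query.setHint("eclipselink.batch.type", "IN");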

Optimization #5 - Read Only

The application is migrating from the MySQL database to the Oracle database, so it is only reading from MySQL. When you execute a query in JPA, all of the resulting objects become managed as part of the current persistence context. This is wasteful here, as managed objects are tracked for changes and registered with the persistence context. EclipseLink provides a "eclipselink.read-only"="true" query hint that allows the persistence context to be bypassed. This can be used for the migration, as the objects read from MySQL will never be written back to MySQL.
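
As used in the final optimized code below:

query.setHint("eclipselink.read-only", "true");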

Optimization #6 - Sequence Pre-allocation

We have optimized the first part of the application, reading from the MySQL database. The second part is to optimize the writing to Oracle.

The biggest issue with the writing process is that the Id generation is using an allocation size of 1. This means that for every insert there is an update and a select for the next sequence number. This is a major issue, as it effectively doubles the amount of database access. By default JPA uses a pre-allocation size of 50 for TABLE and SEQUENCE Id generation, and 1 for IDENTITY Id generation (a very good reason to never use IDENTITY Id generation). But frequently applications are unnecessarily paranoid about holes in their Id values and set the pre-allocation size to 1. By changing the pre-allocation size from 1 to 500, we reduce the database accesses by about 1,000 per page.
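
The orm.xml below sets this through allocation-size="500". The annotation equivalent would look something like the following sketch (assuming Order uses TABLE Id generation with the ORD_SEQ generator, as in the example's orm.xml; the Id field type is an assumption):

@Entity
public class Order {
    @Id
    @TableGenerator(name = "ORD_SEQ", allocationSize = 500)
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "ORD_SEQ")
    private long id;
    // Remaining mappings as in the example model.
}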

Optimization #7 - Cascade Persist

I must admit I intentionally added the next issue to the original code. Notice that in the for loop persisting the orders, I also loop over the order-lines and persist them. This would be required if the order did not cascade the persist operation to order-line. However, I also made the orderLines relationship cascade, as well as order-line's order relationship. The JPA spec defines somewhat unusual semantics for its persist operation, requiring that the cascade persist be called every time persist is called, even if the object is an existing object. This makes cascading persist a potentially dangerous thing to do, as it can trigger a traversal of your entire object model on every persist call. This is an important point, and I added this issue purposefully to highlight it, as it is a common mistake in JPA applications. The cascading persist causes each persist call on an order-line to persist its order, and every order-line of the order again. This results in n^2 persist calls. Fortunately there are only 10 order-lines per order, so this only results in 100 extra persist calls per order. It could have been much worse: if the customer defined a relationship back to its orders, you would have 1,000 extra calls per order. The persist does not need to do anything, as the objects are already persisted, but the traversal can be expensive. So, in JPA you should either mark your relationships cascade persist, or call persist in your code, but not both. In general I would only recommend cascading persist for logically dependent relationships (i.e. things that would also cascade remove).
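
A sketch of the mappings that trigger this behavior; the exact annotations in the example's source may differ, but the shape is the same:

@Entity
public class Order {
    // Cascading persist here means persisting an order persists its order-lines...
    @OneToMany(mappedBy = "order", cascade = CascadeType.PERSIST)
    private List<OrderLine> orderLines;
}

@Entity
public class OrderLine {
    // ...and cascading persist here means each order-line re-persists its order,
    // which re-cascades to every order-line again: n^2 persist calls.
    @ManyToOne(cascade = CascadeType.PERSIST)
    private Order order;
}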

Optimization #8 - Batch Writing

Many databases provide an optimization that allows a batch of write operations to be performed as a single database access. There is both parametrized and dynamic batch writing. For parametrized batch writing a single parametrized SQL statement can be executed with a batch of parameter values, instead of a single set of parameter values. This is very optimal, as the SQL only needs to be parsed once, and all of the data can be passed optimally to the database.

Dynamic batch writing requires dynamic (non-parametrized) SQL that is batched into a single big statement and sent to the database all at once. The database then needs to process this huge string and execute each statement. This requires the database to do a lot of work parsing the statement, so it is not always optimal. It does reduce the number of database accesses, so if the database is remote or poorly connected to the application, this can still result in an improvement.

In general parametrized batch writing is much more optimal; on Oracle it provides a huge benefit, whereas dynamic batch writing does not. JDBC defines the API for batch writing, but not all JDBC drivers support it; some support the API but then execute the statements one by one, so it is important to verify that your database supports the optimization before relying on it. In EclipseLink, batch writing is enabled using the persistence unit property "eclipselink.jdbc.batch-writing"="JDBC".

Another important aspect of using batch writing is that you must have the same SQL (DML, actually) statement executed in a grouped fashion within a single transaction. Some JPA providers do not order their DML, so you can end up ping-ponging between two statements, such as the order insert and the order-line insert, making batch writing ineffective. Fortunately EclipseLink orders and groups its DML, so usage of batch writing reduces the database access from 500 order inserts and 5,000 order-line inserts to 55 (the default batch size is 100). We could increase the batch size using "eclipselink.jdbc.batch-writing.size"; increasing the batch size to 1,000 reduces the database accesses to 6 per page.
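
Both properties appear in the persistence.xml below; they could equally be passed programmatically when creating the factory, as in this sketch:

Map<String, String> properties = new HashMap<String, String>();
// Optimization #8 - parametrized JDBC batch writing, with a larger batch size.
properties.put("eclipselink.jdbc.batch-writing", "JDBC");
properties.put("eclipselink.jdbc.batch-writing.size", "1000");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("order-opt", properties);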

Optimization #9 - Statement Caching

Every time you execute an SQL statement, the database must parse that statement and execute it. Most of the time an application executes the same set of SQL statements over and over. By using parametrized SQL and caching the prepared statement, you can avoid the cost of having the database re-parse the statement.

There are two levels of statement caching: one on the database, and one on the JDBC client. Most databases maintain a parse cache automatically, so you only need to use parametrized SQL to make use of it. Caching the statement on the JDBC client normally provides the bigger benefit, but requires some work. If your JPA provider is providing you with your JDBC connections, then it is responsible for statement caching. If you are using a DataSource, such as in an application server, then the DataSource is responsible for statement caching, and you must enable it in your DataSource config. In EclipseLink, when using EclipseLink's connection pooling, you can enable statement caching using the persistence unit property "eclipselink.jdbc.cache-statements"="true". EclipseLink uses parametrized SQL by default, so this does not need to be configured.

Optimization #10 - Disabling Caching

By default EclipseLink maintains a shared 2nd level object cache. This is normally a good thing, and improves read performance significantly. However, in our application we are only inserting into Oracle, and never reading, so there is no point in maintaining a shared cache. We can disable it using the EclipseLink persistence unit property "eclipselink.cache.shared.default"="false". However, we are reading customer, so we re-enable caching for Customer using "eclipselink.cache.shared.Customer"="true".

Optimization #11 - Other Optimizations

EclipseLink provides several other, more specific optimizations. I would not really recommend all of these in general, as they are fairly minor and have certain caveats, but they are useful in use cases such as migration, where the process is well defined.

These include the following persistence unit properties:
  • "eclipselink.persistence-context.flush-mode"="commit" - Avoids the cost of flushing on every query execution.
  • "eclipselink.persistence-context.close-on-commit"="true" - Avoids the cost of resuming the persistence context after the commit.
  • "eclipselink.persistence-context.persist-on-commit"="false" - Avoids the cost of traversing and persisting all objects on commit.
  • "eclipselink.logging.level"="off" - Avoids some logging overhead.
The fully optimized code:
EntityManagerFactory emf = Persistence.createEntityManagerFactory("order-opt");
EntityManagerFactory emfOld = Persistence.createEntityManagerFactory("order-old");
EntityManager emOld = emfOld.createEntityManager();
System.out.println("Migrating database.");
Query query = emOld.createQuery("Select o from Order o order by o.id");
// Optimization #4 - batch fetch
// #4a - join fetch:
//Query query = emOld.createQuery("Select o from Order o join fetch o.orderLines");
// #4b - batch fetch (more optimal, as it avoids duplicating the Order data):
query.setHint("eclipselink.batch", "o.orderLines");
query.setHint("eclipselink.batch.type", "IN");
// Optimization #5 - read-only
query.setHint("eclipselink.read-only", "true");
// Optimization #2 - pagination
int pageSize = 500;
int firstResult = 0;
query.setFirstResult(firstResult);
query.setMaxResults(pageSize);
List<Order> orders = query.getResultList();
boolean done = false;
while (!done) {
    if (orders.size() < pageSize) {
        done = true;
    }
    EntityManager em = emf.createEntityManager();
    em.getTransaction().begin();
    Query customerQuery = em.createNamedQuery("findCustomByName");
    // Reset old Ids, so they are assigned from the new database.
    for (Order order : orders) {
        order.setId(0);
        customerQuery.setParameter("name", order.getCustomer().getName());
        try {
            Customer customer = (Customer)customerQuery.getSingleResult();
            order.setCustomer(customer);
        } catch (NoResultException notPersistedYet) {
            // Customer does not yet exist, so null out its Id to have it persisted.
            order.getCustomer().setId(0);
        }
    }
    for (Order order : orders) {
        em.persist(order);
        // Optimization #7 - rely on cascade persist, avoid n^2 persist calls
        //for (OrderLine orderLine : order.getOrderLines()) {
        //    em.persist(orderLine);
        //}
    }
    em.getTransaction().commit();
    em.close();
    firstResult = firstResult + pageSize;
    query.setFirstResult(firstResult);
    if (!done) {
        orders = query.getResultList();
    }
}
emOld.close();
emf.close();
emfOld.close();
The optimized persistence.xml:
<persistence-unit name="order-opt" transaction-type="RESOURCE_LOCAL">
    <!-- Optimization #6, #3 - sequence pre-allocation and query result cache (defined in the orm.xml mapping file) -->
    <mapping-file>META-INF/order-orm.xml</mapping-file>
    <class>model.Order</class>
    <class>model.OrderLine</class>
    <class>model.Customer</class>
    <properties>
        <!-- Change this to access your own database. -->
        <property name="javax.persistence.jdbc.driver" value="oracle.jdbc.OracleDriver" />
        <property name="javax.persistence.jdbc.url" value="jdbc:oracle:thin:@ottvm028.ca.oracle.com:1521:TOPLINK" />
        <property name="javax.persistence.jdbc.user" value="jsutherl" />
        <property name="javax.persistence.jdbc.password" value="password" />
        <property name="eclipselink.ddl-generation" value="create-tables" />
        <!-- Optimization #9 - statement caching -->
        <property name="eclipselink.jdbc.cache-statements" value="true" />
        <!-- Optimization #8 - batch writing -->
        <property name="eclipselink.jdbc.batch-writing" value="JDBC" />
        <property name="eclipselink.jdbc.batch-writing.size" value="1000" />
        <!-- Optimization #10 - disable caching for batch insert (caching only improves reads, so it only adds overhead for inserts) -->
        <property name="eclipselink.cache.shared.default" value="false" />
        <!--  Except for Customer which is shared by orders -->
        <property name="eclipselink.cache.shared.Customer" value="true" />
        <!-- Optimization #11 - turn logging off -->
        <!-- property name="eclipselink.logging.level" value="FINE" /-->
        <property name="eclipselink.logging.level" value="off" />
        <!-- Optimization #11 - close EntityManager on commit, to avoid the cost of resume -->
        <property name="eclipselink.persistence-context.close-on-commit" value="true" />
        <!-- Optimization #11 - avoid the auto flush cost on query execution -->
        <property name="eclipselink.persistence-context.flush-mode" value="commit" />
        <!-- Optimization #11 - avoid the cost of persist on commit -->
        <property name="eclipselink.persistence-context.persist-on-commit" value="false" />
    </properties>
</persistence-unit>
The optimized orm.xml:
<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings version="2.1"
    xmlns="http://www.eclipse.org/eclipselink/xsds/persistence/orm"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <named-query name="findCustomByName">
        <query>Select c from Customer c where c.name = :name</query>
        <hint name="eclipselink.query-results-cache" value="true"/>
    </named-query>
    <entity class="model.Order">
        <table-generator name="ORD_SEQ" allocation-size="500"/>
    </entity>
    <entity class="model.Customer">
            <table-generator name="CUST_SEQ" allocation-size="500"/>
    </entity>

</entity-mappings>
So, what is the result? The original unoptimized code took on average 133,496 milliseconds (~2 minutes) to process ~100,000 objects. The fully optimized code took only 6,933 milliseconds (~7 seconds). This is very good, and means it could process 1 million objects in about 1 minute. The optimized code is a 1,825% improvement over the original code.

But how much did each optimization affect the final result? To answer this question I ran the test 3 times with the fully optimized version, but with each optimization removed in turn. This worked out better than starting with the unoptimized version and adding each optimization separately, as some optimizations get masked by the lack of others. So, in the table below, the bigger the % difference, the more effective the removed optimization was (the numbers refer to the optimization sections above).

Optimization                                  Average Result (ms)    % Difference
None (unoptimized)                            133,496                1,825%
All optimizations                             6,933                  0%
#1 - without agent                            7,906                  14%
#2 - without pagination                       8,679                  25%
#4a - join fetch instead of batch fetch       11,836                 71%
#4b - without batch fetch                     17,344                 150%
#5 - without read-only                        8,323                  20%
#6 - without sequence pre-allocation          30,396                 338%
#7 - with redundant persist loop              7,947                  14%
#8 - without batch writing                    75,751                 992%
#9 - without statement cache                  7,233                  4%
#10 - with shared cache enabled               7,925                  14%
#11 - without other optimizations             7,332                  6%

This shows that batch writing was the best optimization, followed by sequence pre-allocation, then batch fetching.

35 comments:

  1. I know you're wanting to primarily use JPA APIs but I'm surprised you didn't mention using a scrollable resultset (as documented at http://wiki.eclipse.org/EclipseLink/Examples/JPA/Pagination#Using_a_ScrollableCursor) instead of pagination. For large numbers of records this can make a huge difference since you end up issuing just a single query against the database to pull out your data instead of one per page.

    Hibernate supports a similar feature. Hopefully scrollable results will make their way into the JPA spec sometime soon.

    Corey

  2. Wonderful article. Thanks for sharing this.

  3. Very good tutorial, I would like a version in Spanish because my English is not very good ... But thanks anyway.

  4. Great nuggets of wisdom, probably earned through trial and tribulation which is exactly why this is timely for my current project. Thanks for sharing!

  5. I don't have words. this is truly fantastic information enriched by lot of experience. Thanks a lot mate for sharing such invaluable information.

    Javin
    How to use comparator and comparable in java with example

  6. Amazing article, awesome blog...

    James, is there an email address I can contact you in private?

  7. I'm not sure about how to increase preallocation size. Can you please provide me an example for "Optimization #6 - Sequence Pre-allocation" please?

  8. @TableGenerator(name="MY_SEQ", allocationSize=100)
    or,
    @SequenceGenerator(name="MY_SEQ", allocationSize=100)
    see,
    http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Sequence_Strategies

  9. Thanks a lot James. Do you have any setting to enable seperate connection pool for sequence allocation? I don't use any JTA datasource. In my persistence.xml i have these configs:

    name="javax.persistence.jdbc.url"value="xxx"

    name="javax.persistence.jdbc.password"value="xx"

    name="javax.persistence.jdbc.driver"
    value="com.mysql.jdbc.Driver"

    name="javax.persistence.jdbc.user" value="xxxx"

    name="eclipselink.target-database" value="MYSQL"

    name="eclipselink.jdbc.sequence-connection-pool" value="true"

    name="eclipselink.jdbc.read-connections.min" value="1"

    name="eclipselink.jdbc.write-connections.min" value="1"

    What else should i add??

    Thanks.

  10. Thank you for a well written article with practical steps I can take to speed up my project. Technical posts like this can take forever to write, so thanks for the effort.

  11. @Prasath, if you are on the latest release, remove the read/write min setting, by default a single combined pool is now used with a initial of 1, so is more efficient, normally your min should be your max to be most efficient, replace the sequence setting with, "eclipselink.connection-pool.sequence.initial"="1"

Thanks for taking the time to write that. Good info to know. Currently my company is using "IBATIS" and pure "SQL"s as database persistence mechanism. I like SQL query very much, especially in tuning, but i just do not like code all SQL query in Java application, it's easy hit typo error and what a stupid and tedious job? Finally my company has a new project come in, i decided this is the right time to propose Hibernate as our new java database persistence mechanism to my boss.
    personal injury attorney tampa fl

  13. We are just starting of with a new project and decide on JPA/EJB3.0 in Glassfish with Oracle DB. This article is outstanding with the information you discussed here.
    One problem we have and I would really appreciate any input.
    We are using Netbeans to Generate the Persistence Entities. Then used Netbeans to generate the Session Beans for the Entity Classes.
    Database triggers are used to generate the PK value upon DB insert.
    This all works well for us, except that in some instances we need to get the insert PK value back as we need to insert that as part of a reference into other parts. This is all part of the same TX that needs to be committed/rolled back.
    When we query the Entity, the inserted ID is still 0.
    Is there any way of getting this generated value back before flushing the TX?

    The facade (Session Bean) generated look like this
    :
    @Stateless
    public class Tsc06JobQueueFacade extends AbstractFacade {

    public static Logger logger = Logger.getLogger("Tsc06JobQueueFacade");
    @PersistenceContext(unitName = "za.co.fnds_fnds-core_ejb_1.0.0PU")
    private EntityManager em;

    @Override
    protected EntityManager getEntityManager() {
    return em;
    }

    public Tsc06JobQueueFacade() {
    super(Tsc06JobQueue.class);
    }


    And the entity looks like this:

    @Entity
    @Table(name = "TSC06_JOB_QUEUE")
    @XmlRootElement
    @NamedQueries({
    @NamedQuery(name = "Tsc06JobQueue.findAll", query = "SELECT t FROM Tsc06JobQueue t"),
    @NamedQuery(name = "Tsc06JobQueue.findByJobRunId", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobRunId = :jobRunId"),
    @NamedQuery(name = "Tsc06JobQueue.findByJobStartTime", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobStartTime = :jobStartTime"),
    @NamedQuery(name = "Tsc06JobQueue.findByJobEndTime", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobEndTime = :jobEndTime")})
    public class Tsc06JobQueue implements Serializable {
    private static final long serialVersionUID = 1L;

    public static Logger logger = Logger.getLogger("Tsc06JobQueue");

    // @Max(value=?) @Min(value=?)//if you know range of your decimal fields consider using these annotations to enforce field validation
    @Id
    @Basic(optional = false)
    @NotNull
    @Column(name = "JOB_RUN_ID")
    // @SequenceGenerator( name = "appJobSeq", sequenceName = "TSC06_JOB_RUN_ID_SEQ", allocationSize = 1, initialValue = 1 )
    // @GeneratedValue( strategy = GenerationType.SEQUENCE, generator = "appJobSeq" )
    private BigDecimal jobRunId;

    I have tried using the sequence generator option as per uncomented lines, but no success.

    Any help/comments appreciated!
    Andre

  14. We solved this problem - when generating entities the Netbeans tool create the pk fields as not-null. For inserting the object you then need to populate the value with 0. This caused Eclipselink to ignore any reference to sequences,etc.
    Changing the pk column(s) to null-able and then not specifying the pk columns, allows Eclipselink to query the sequence and populate the column and object values correctly. So we learn every hour!

  15. Extremely good piece of writing!

  16. Thanks for your wonderful information which helped us to join java online training

  17. Really nice, would love to have it extended to include the hibernate equivalents.
    Sadly pagination seems to be the best way to improve performance, but it's also the one thing I want to avoid..

  18. Sorry, sir, are JPA OneToOne and ManyToOne LAZY by default? They are EAGER.

  19. This comment has been removed by a blog administrator.

  20. Hi James,

    Thanks for sharing this great article. I have some query related to findCustomByName namedQuery. My understanding is that after using eclipselink.query-results-cache hint as true, all the results including null will be cached. Which means in following try-catch-block, for some customer which does not exist initially, NoResultException should always be raised.

    try {
    Customer customer = (Customer)customerQuery.getSingleResult();
    order.setCustomer(customer);
    } catch (NoResultException notPersistedYet) {
    // Customer does not yet exist, so null out id to have it persisted.
    order.getCustomer().setId(0);
    }

    And we should end up with 10000 Customers instead of 1000 in database. It will be great if you can shed light on mistake in my understanding.

    Thanks.

  21. I think you are correct, the code should be using the hint, "eclipselink.query-results-cache.ignore-null"="true", in 2.5 it should also be using the API setInvalidateOnChange(false), as by default any insert to customer will invalidate the query result cache.

    I think originally the ignore-null option was not working, so that was the default behavior when I ran these tests.

  22. Actually, at the time I wrote this cache indexes were not supported yet, so now you could also use a @CacheIndex instead of the query cache for the customer query.

    There have been a lot of performance enhancements added to EclipseLink since this post, perhaps its time for a "How to improve JPA performance by 2,825%" post.

  23. Hi James,

    Thanks for your response. It is actually encouraging to know that we can do further optimization. I would eagerly wait for your "How to improve JPA performance by 2,825%" post.

    May I ask for further help. It will be great if you could give one example of using setInvalidateOnChange. I think this should resolve my query at http://stackoverflow.com/questions/17465692/eclipselink-query-results-cache-ignore-null-not-caching-any-result

    Also I tried to use @CacheIndex with eclipse link 2.5 but was not successful.
    a) First time customer not found,
    b) created customer,
    c) trying to look for same customer. customer still not found.
    Resulting in 10000 customer.). May I get blessed with some example code, please.

    Thanks.

  24. For an example of @CacheIndex refer to post http://java-persistence-performance.blogspot.com/2013/03/but-what-if-im-not-querying-by-id.html, it includes sample code.

  25. It would be nice to see the source files for the classes involved. I tried the JDBC batch writing, but I din't get any performance gain.
    I use MYSQL and have auto generated primary keys, but since JPA need to know the primary key for each inserted object (to keep the persistence context consistent), batching is not possible, since "select last_insert_id()" on MYSQL only returns the ID of the last inserted record, and not all the keys generated during a batch insert.

    Replies
    1. The code for this is here,

      http://git.eclipse.org/c/eclipselink/examples/performance.git/tree/jpa-performance

      For MySQL you need to enable statement rewriting, see,

      http://java-persistence-performance.blogspot.com/2013/05/batch-writing-and-dynamic-vs.html

      Also DON'T use IDENTITY id generation, instead use TABLE id generation, or a SEQUENCE in other databases.

    2. Thanks James.
      I have to support 7 different DBMS's with the same entity classes (which is one of the reasons we're using JPA), so I'm going to use TableGeneration since it will work for all.
      For MySQL (Datasources) we'll have to detect if the customer has configured the connection string for statement rewriting, and then log a 'reduced performance' warning.
      I'm curious to see what the performance gains will be on each of the DBMS.

  26. Hi Mr. Sutherland,
    I wonder if you can comment on my question on sof.com
    Here is link.
    http://stackoverflow.com/questions/21174028/eclipselink-entity-mappings-cache

    thanks
    Gopi

  27. Hi!!

    I can't see the point when you say "But the page size was just a heuristic number anyway, so no real issue"

    And what about if I want 4-orders pages, with order-lines join fetched? If a page has, for example, its first order with more than 4 order lines, that page will have only 1 order!!!

  28. Is there any further reading you would recommend on this?

    LDS Infotech
    Oracle Partners India

  29. Thanks for the article, James. It seems like you could use JPA to maximize data usage when accessing a database; could you apply this (or even use JPA) to create self-improving predictive scoring models? Or any predictive modeling, really, it doesn't have to be lead scoring. Predictive modeling requires a lot of iterations and a language/platform that could dynamically write and rewrite data based on the previous output could be useful for setting up predictive models. Can JPA handle this?

  30. This comment has been removed by the author.

  31. This comment has been removed by the author.

  32. This comment has been removed by the author.
