Thursday, June 9, 2011

How to improve JPA performance by 1,825%

The Java Persistence API (JPA) provides a rich persistence architecture. JPA hides much of the low-level drudgery of database access, freeing the application developer from worrying about the database and allowing them to concentrate on developing the application. However, this abstraction can lead to poor performance if the application programmer does not consider how their implementation affects database usage.

JPA provides several optimization features and techniques, as well as some pitfalls waiting to snag the unwary developer. Most JPA providers also offer a plethora of additional optimization features and options. In this blog entry I will explore the various optimization options and techniques, and a few of the common pitfalls.

The application is a simulated database migration from a MySQL database to an Oracle database. Perhaps there are more optimal ways to migrate a database, but it is surprising how good JPA's performance can be, even when processing hundreds of thousands or even millions of records. Perhaps it is not a straightforward migration, or the application's business logic is required, or perhaps the application is already persisted through JPA, so using JPA to migrate the database is just easiest. Regardless, this fictitious use case is a useful demonstration of how to achieve good performance with JPA.

The application consists of an Order processing database. The model contains a Customer, Order and OrderLine. The application reads all of the Orders from one database, and persists them to the second database. The source code for the example can be found here.

The initial code for the migration is pretty simple:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("order");
EntityManager em = emf.createEntityManager();
EntityManagerFactory emfOld = Persistence.createEntityManagerFactory("order-old");
EntityManager emOld = emfOld.createEntityManager();
Query query = emOld.createQuery("Select o from Order o");
List<Order> orders = query.getResultList();
em.getTransaction().begin();
// Reset old Ids, so they are assigned from the new database.
for (Order order : orders) {
    order.setId(0);
    order.getCustomer().setId(0);
}
for (Order order : orders) {
    em.persist(order);
    for (OrderLine orderLine : order.getOrderLines()) {
        em.persist(orderLine);
    }
}
em.getTransaction().commit();
em.close();
emOld.close();
emf.close();
emfOld.close();

The example test runs this migration using 3 variables for the number of Customers, Orders per Customer, and OrderLines per Order. So, 1000 customers, each with 10 orders, and each with 10 order lines, would be 111,000 objects.

The test was run on a virtualized 64 bit Oracle Sun server with 4 virtual cores and 8 gigs of RAM. The databases run on similar machines. The test is single threaded, running in Oracle Sun JDK 1.6. The tests are run using EclipseLink JPA 2.3, and migrating from a MySQL database to an Oracle database.

This code functions fine for a small database migration. But as the database size grows, some issues become apparent. It actually handles 100,000 objects surprisingly well, taking about 2 minutes, which is impressive given the code is thoroughly unoptimized and persists all 100,000 objects in a single persistence context and transaction.

Optimization #1 - Agent

EclipseLink implements LAZY for OneToOne and ManyToOne relationships using bytecode weaving. EclipseLink also uses weaving to perform many other optimizations, such as change tracking and fetch groups. The JPA specification provides the hooks for weaving in EJB 3 compliant application servers, but in Java SE or other application servers weaving is not performed by default. To enable EclipseLink weaving in Java SE for this example, the EclipseLink agent is used, via the Java -javaagent:eclipselink.jar option. If dynamic weaving is unavailable in your environment, another option is to use static weaving, for which EclipseLink provides an ant task and command line utility.
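
For example, launching the migration in Java SE with the agent could look like the following (the main class and jar names here are placeholders for illustration, not from the actual example):

java -javaagent:eclipselink.jar -cp eclipselink.jar:order-example.jar example.Migration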

Optimization #2 - Pagination

In theory, at some point you should run out of memory by bringing the entire database into memory in a single persistence context. So next I increased the size to 1 million objects, and this gave the expected out-of-memory error. Interestingly, this was with a heap size of only 512 meg. If I had used the entire 8 gigs of RAM, I could, in theory, have persisted around 16 million objects in a single persistence context. If I gave the virtualized machine the full 98 gigs of RAM available on the server, perhaps it would even be possible to persist 100 million objects. Perhaps the days when it made no sense to pull an entire database into RAM are behind us, and this is no longer such a crazy thing to do. But, for now, let's assume it is an idiotic thing to do, so how can we avoid it?

JPA provides a pagination feature that allows a subset of a query to be read. This is supported through the Query setFirstResult and setMaxResults API. So instead of reading the entire database in one query, the objects will be read page by page, and each page will be persisted in its own persistence context and transaction. This avoids ever having to read the entire database, and should also, in theory, make the persistence context more optimized by reducing the number of objects it needs to process together.

Switching to pagination is relatively easy for the original orders query, but some issues crop up with the relationship to Customer. Since orders can share the same customer, it is important that each order does not insert a new customer, but uses the existing one. If the customer for an order was already persisted on a previous page, then the existing one must be used. This requires a query to find the matching customer in the new database, which introduces some performance issues we will discuss later.

The updated code for the migration using pagination is:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("order");
EntityManagerFactory emfOld = Persistence.createEntityManagerFactory("order-old");
EntityManager emOld = emfOld.createEntityManager();
Query query = emOld.createQuery("Select o from Order o order by o.id");
int pageSize = 500;
int firstResult = 0;
query.setFirstResult(firstResult);
query.setMaxResults(pageSize);
List<Order> orders = query.getResultList();
boolean done = false;
while (!done) {
    if (orders.size() < pageSize) {
        done = true;
    }
    EntityManager em = emf.createEntityManager();
    em.getTransaction().begin();
    Query customerQuery = em.createNamedQuery("findCustomByName");
    // Reset old Ids, so they are assigned from the new database.
    for (Order order : orders) {
        order.setId(0);
        customerQuery.setParameter("name", order.getCustomer().getName());
        try {
            Customer customer = (Customer)customerQuery.getSingleResult();
            order.setCustomer(customer);
        } catch (NoResultException notPersistedYet) {
            // Customer does not yet exist, so null out its Id to have it persisted.
            order.getCustomer().setId(0);
        }
    }
    for (Order order : orders) {
        em.persist(order);
        for (OrderLine orderLine : order.getOrderLines()) {
            em.persist(orderLine);
        }
    }
    em.getTransaction().commit();
    em.close();
    firstResult = firstResult + pageSize;
    query.setFirstResult(firstResult);
    if (!done) {
        orders = query.getResultList();
    }
}
emOld.close();
emf.close();
emfOld.close();

Optimization #3 - Query Cache

This introduces a lot of queries for customer by name (10,000 to be exact), one for each order. This is not very efficient, and can be improved through caching. In EclipseLink there is both an object cache and a query cache. The object cache is enabled by default, but objects are only cached by Id, so it does not help us with the query on the customer's name. So, we can enable a query cache for this query. A query cache is specific to the query, and caches the query results keyed on the query name and its parameters. A query cache is enabled in EclipseLink using the query hint "eclipselink.query-results-cache"="true". This should be set where the query is defined, in this case in the orm.xml. This reduces the number of queries for customer to 1,000 (one per distinct customer name), which is much better.
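
For reference, if the query were defined with annotations instead of in the orm.xml, the same hint could be set with a @QueryHint; a minimal sketch, assuming the query is defined on the Customer entity:

@Entity
@NamedQuery(
    name = "findCustomByName",
    query = "Select c from Customer c where c.name = :name",
    hints = { @QueryHint(name = "eclipselink.query-results-cache", value = "true") })
public class Customer {
    // Id, name, and other mappings as in the example model.
}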

There are other solutions besides the query cache. EclipseLink also supports in-memory querying. In-memory querying means evaluating the query against the objects in the object cache, instead of accessing the database. In-memory querying is enabled through the query hint "eclipselink.cache-usage"="CheckCacheOnly". If you enabled a full cache on customer, then as you persisted the orders all of the existing customers would be in the cache, and you would never need to access the database. Another, manual, solution is to maintain a Map in the migration code, keying the new customers by name (see the sketch below). For all of the above solutions, if the cache is given a fixed size (the query cache defaults to a size of 100), you never need all of the customers in memory at the same time, so there are no memory issues.
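
A minimal sketch of the manual Map approach, assuming the same model classes as the example; this would replace the per-order customer query inside each page:

// Cache migrated customers by name, to avoid querying the new database per order.
Map<String, Customer> customersByName = new HashMap<String, Customer>();
for (Order order : orders) {
    order.setId(0);
    Customer existing = customersByName.get(order.getCustomer().getName());
    if (existing != null) {
        // Customer was already persisted on a previous page, so reuse it.
        order.setCustomer(existing);
    } else {
        // First time this customer is seen, so reset its Id to have it persisted.
        order.getCustomer().setId(0);
        customersByName.put(order.getCustomer().getName(), order.getCustomer());
    }
}

Note that this simple version keeps all of the customers in memory (1,000 here), trading memory for queries.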

Optimization #4 - Batch Fetch

The most common performance issue in JPA is the fetch of relationships. If you query n orders, and access their order-lines, you get n queries for order-line. This can be optimized through join fetching and batch fetching. Join fetching joins the relationship in the original query and selects from both tables. Batch fetching executes a second query for the related objects, but fetches them all at once, instead of one by one. Because we are using pagination, this makes optimizing the fetch a little more tricky. Join fetch will still work, but since order-lines are join fetched, and there are 10 order-lines per order, the page size that was 500 orders is now only 50 orders (and their 500 order-lines). We could resolve this by increasing the page size to 5,000, but given that in a real application the number of order-lines is not fixed, this becomes a bit of a guess. But the page size was just a heuristic number anyway, so this is no real issue. Another issue with join fetching and pagination is that the first and last objects of a page may not have all of their related objects, if their rows fall across a page boundary. Fortunately EclipseLink is smart enough to handle this, but it does require 2 extra queries, for the first and last order of each page. Join fetching also has the drawback that it selects duplicate order data when a OneToMany is join fetched. Join fetching is enabled in JPQL using join fetch o.orderLines.
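
The join fetch variant of the migration query (the same query that appears commented out in the final code below):

Query query = emOld.createQuery("Select o from Order o join fetch o.orderLines");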

Batch fetching normally works by joining the original query with the relationship query, but because the original query uses pagination, this will not work. EclipseLink supports three types of batch fetching: JOIN, EXISTS, and IN. IN works with pagination, so we can use IN batch fetching. Batch fetching is enabled through the query hints "eclipselink.batch"="o.orderLines" and "eclipselink.batch.type"="IN". This reduces the n queries for order-line to 1. So for each batch/page of 500 orders, there will be 1 query for the page of orders, 1 query for the order-lines, and 50 queries for customer.
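
As set in the final optimized code below:

query.setHint("eclipselink.batch", "o.orderLines");
query.setHint("eclipselink.batch.type", "IN");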

Optimization #5 - Read Only

The application is migrating from the MySQL database to the Oracle database, so it is only reading from MySQL. When you execute a query in JPA, all of the resulting objects become managed as part of the current persistence context. This is wasteful here, as managed objects are tracked for changes and registered with the persistence context. EclipseLink provides a "eclipselink.read-only"="true" query hint that allows the persistence context to be bypassed. This can be used for the migration, as the objects read from MySQL will never be written back to MySQL.
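
As used in the final optimized code below:

query.setHint("eclipselink.read-only", "true");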

Optimization #6 - Sequence Pre-allocation

We have optimized the first part of the application, reading from the MySQL database. The second part is to optimize the writing to Oracle.

The biggest issue with the writing process is that the Id generation is using an allocation size of 1. This means that for every insert there is an update and a select for the next sequence number. This is a major issue, as it effectively doubles the amount of database access. By default JPA uses a pre-allocation size of 50 for TABLE and SEQUENCE Id generation, and 1 for IDENTITY Id generation (a very good reason to never use IDENTITY Id generation). But frequently applications are unnecessarily paranoid about holes in their Id values and set the pre-allocation size to 1. By changing the pre-allocation size from 1 to 500, we reduce the database accesses by about 1,000 per page.
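
The orm.xml below sets this through allocation-size="500". The annotation equivalent would look something like the following sketch (assuming Order uses TABLE Id generation with the ORD_SEQ generator, as in the example's orm.xml; the Id field type is an assumption):

@Entity
public class Order {
    @Id
    @TableGenerator(name = "ORD_SEQ", allocationSize = 500)
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "ORD_SEQ")
    private long id;
    // Remaining mappings as in the example model.
}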

Optimization #7 - Cascade Persist

I must admit I intentionally added the next issue to the original code. Notice that in the for loop persisting the orders, I also loop over the order-lines and persist them. This would be required if the order did not cascade the persist operation to order-line. However, I also made the orderLines relationship cascade, as well as order-line's order relationship. The JPA spec defines somewhat unusual semantics for its persist operation, requiring that the cascade persist be called every time persist is called, even if the object is an existing object. This makes cascading persist a potentially dangerous thing to do, as it can trigger a traversal of your entire object model on every persist call. This is an important point, and I added this issue purposefully to highlight it, as it is a common mistake in JPA applications. The cascading persist causes each persist call on an order-line to persist its order, and every order-line of the order again. This results in n^2 persist calls. Fortunately there are only 10 order-lines per order, so this only results in 100 extra persist calls per order. It could have been much worse: if the customer defined a relationship back to its orders, you would have 1,000 extra calls per order. The persist does not need to do anything, as the objects are already persisted, but the traversal can be expensive. So, in JPA you should either mark your relationships cascade persist, or call persist in your code, but not both. In general I would only recommend cascading persist for logically dependent relationships (i.e. things that would also cascade remove).
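
A sketch of the mappings that trigger this behavior; the exact annotations in the example's source may differ, but the shape is the same:

@Entity
public class Order {
    // Cascading persist here means persisting an order persists its order-lines...
    @OneToMany(mappedBy = "order", cascade = CascadeType.PERSIST)
    private List<OrderLine> orderLines;
}

@Entity
public class OrderLine {
    // ...and cascading persist here means each order-line re-persists its order,
    // which re-cascades to every order-line again: n^2 persist calls.
    @ManyToOne(cascade = CascadeType.PERSIST)
    private Order order;
}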

Optimization #8 - Batch Writing

Many databases provide an optimization that allows a batch of write operations to be performed as a single database access. There is both parametrized and dynamic batch writing. For parametrized batch writing a single parametrized SQL statement can be executed with a batch of parameter values, instead of a single set of parameter values. This is very optimal, as the SQL only needs to be parsed once, and all of the data can be passed optimally to the database.

Dynamic batch writing requires dynamic (non-parametrized) SQL that is batched into a single big statement and sent to the database all at once. The database then needs to process this huge string and execute each statement. This requires the database to do a lot of work parsing the statement, so it is not always optimal. It does reduce the number of database accesses, so if the database is remote or poorly connected to the application, this can still result in an improvement.

In general parametrized batch writing is much more optimal; on Oracle it provides a huge benefit, whereas dynamic batch writing does not. JDBC defines the API for batch writing, but not all JDBC drivers support it; some support the API but then execute the statements one by one, so it is important to verify that your database supports the optimization before relying on it. In EclipseLink, batch writing is enabled using the persistence unit property "eclipselink.jdbc.batch-writing"="JDBC".

Another important aspect of using batch writing is that you must have the same SQL (DML, actually) statement executed in a grouped fashion within a single transaction. Some JPA providers do not order their DML, so you can end up ping-ponging between two statements, such as the order insert and the order-line insert, making batch writing ineffective. Fortunately EclipseLink orders and groups its DML, so usage of batch writing reduces the database access from 500 order inserts and 5,000 order-line inserts to 55 (the default batch size is 100). We could increase the batch size using "eclipselink.jdbc.batch-writing.size"; increasing the batch size to 1,000 reduces the database accesses to 6 per page.
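
Both properties appear in the persistence.xml below; they could equally be passed programmatically when creating the factory, as in this sketch:

Map<String, String> properties = new HashMap<String, String>();
// Optimization #8 - parametrized JDBC batch writing, with a larger batch size.
properties.put("eclipselink.jdbc.batch-writing", "JDBC");
properties.put("eclipselink.jdbc.batch-writing.size", "1000");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("order-opt", properties);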

Optimization #9 - Statement Caching

Every time you execute an SQL statement, the database must parse that statement and execute it. Most of the time an application executes the same set of SQL statements over and over. By using parametrized SQL and caching the prepared statement, you can avoid the cost of having the database re-parse the statement.

There are two levels of statement caching: one on the database, and one on the JDBC client. Most databases maintain a parse cache automatically, so you only need to use parametrized SQL to make use of it. Caching the statement on the JDBC client normally provides the bigger benefit, but requires some work. If your JPA provider is providing you with your JDBC connections, then it is responsible for statement caching. If you are using a DataSource, such as in an application server, then the DataSource is responsible for statement caching, and you must enable it in your DataSource config. In EclipseLink, when using EclipseLink's connection pooling, you can enable statement caching using the persistence unit property "eclipselink.jdbc.cache-statements"="true". EclipseLink uses parametrized SQL by default, so this does not need to be configured.

Optimization #10 - Disabling Caching

By default EclipseLink maintains a shared 2nd level object cache. This is normally a good thing, and improves read performance significantly. However, in our application we are only inserting into Oracle, and never reading, so there is no point in maintaining a shared cache. We can disable it using the EclipseLink persistence unit property "eclipselink.cache.shared.default"="false". However, we are reading customer, so we re-enable caching for Customer using "eclipselink.cache.shared.Customer"="true".

Optimization #11 - Other Optimizations

EclipseLink provides several other, more specific optimizations. I would not really recommend all of these in general, as they are fairly minor and have certain caveats, but they are useful in use cases such as migration, where the process is well defined.

These include the following persistence unit properties:
  • "eclipselink.persistence-context.flush-mode"="commit" - Avoids the cost of flushing on every query execution.
  • "eclipselink.persistence-context.close-on-commit"="true" - Avoids the cost of resuming the persistence context after the commit.
  • "eclipselink.persistence-context.persist-on-commit"="false" - Avoids the cost of traversing and persisting all objects on commit.
  • "eclipselink.logging.level"="off" - Avoids some logging overhead.
The fully optimized code:
EntityManagerFactory emf = Persistence.createEntityManagerFactory("order-opt");
EntityManagerFactory emfOld = Persistence.createEntityManagerFactory("order-old");
EntityManager emOld = emfOld.createEntityManager();
System.out.println("Migrating database.");
Query query = emOld.createQuery("Select o from Order o order by o.id");
// Optimization #4 - batch fetch
// #4a - join fetch:
//Query query = emOld.createQuery("Select o from Order o join fetch o.orderLines");
// #4b - batch fetch (more optimal, as it avoids duplicating the Order data):
query.setHint("eclipselink.batch", "o.orderLines");
query.setHint("eclipselink.batch.type", "IN");
// Optimization #5 - read-only
query.setHint("eclipselink.read-only", "true");
// Optimization #2 - pagination
int pageSize = 500;
int firstResult = 0;
query.setFirstResult(firstResult);
query.setMaxResults(pageSize);
List<Order> orders = query.getResultList();
boolean done = false;
while (!done) {
    if (orders.size() < pageSize) {
        done = true;
    }
    EntityManager em = emf.createEntityManager();
    em.getTransaction().begin();
    Query customerQuery = em.createNamedQuery("findCustomByName");
    // Reset old Ids, so they are assigned from the new database.
    for (Order order : orders) {
        order.setId(0);
        customerQuery.setParameter("name", order.getCustomer().getName());
        try {
            Customer customer = (Customer)customerQuery.getSingleResult();
            order.setCustomer(customer);
        } catch (NoResultException notPersistedYet) {
            // Customer does not yet exist, so null out its Id to have it persisted.
            order.getCustomer().setId(0);
        }
    }
    for (Order order : orders) {
        em.persist(order);
        // Optimization #7 - rely on cascade persist, avoid n^2 persist calls
        //for (OrderLine orderLine : order.getOrderLines()) {
        //    em.persist(orderLine);
        //}
    }
    em.getTransaction().commit();
    em.close();
    firstResult = firstResult + pageSize;
    query.setFirstResult(firstResult);
    if (!done) {
        orders = query.getResultList();
    }
}
emOld.close();
emf.close();
emfOld.close();
The optimized persistence.xml:
<persistence-unit name="order-opt" transaction-type="RESOURCE_LOCAL">
    <!-- Optimization #6, #3 - sequence pre-allocation and query result cache (defined in the orm.xml mapping file) -->
    <mapping-file>META-INF/order-orm.xml</mapping-file>
    <class>model.Order</class>
    <class>model.OrderLine</class>
    <class>model.Customer</class>
    <properties>
        <!-- Change this to access your own database. -->
        <property name="javax.persistence.jdbc.driver" value="oracle.jdbc.OracleDriver" />
        <property name="javax.persistence.jdbc.url" value="jdbc:oracle:thin:@ottvm028.ca.oracle.com:1521:TOPLINK" />
        <property name="javax.persistence.jdbc.user" value="jsutherl" />
        <property name="javax.persistence.jdbc.password" value="password" />
        <property name="eclipselink.ddl-generation" value="create-tables" />
        <!-- Optimization #9 - statement caching -->
        <property name="eclipselink.jdbc.cache-statements" value="true" />
        <!-- Optimization #8 - batch writing -->
        <property name="eclipselink.jdbc.batch-writing" value="JDBC" />
        <property name="eclipselink.jdbc.batch-writing.size" value="1000" />
        <!-- Optimization #10 - disable caching for batch insert (caching only improves reads, so it only adds overhead for inserts) -->
        <property name="eclipselink.cache.shared.default" value="false" />
        <!--  Except for Customer which is shared by orders -->
        <property name="eclipselink.cache.shared.Customer" value="true" />
        <!-- Optimization #11 - turn logging off -->
        <!-- property name="eclipselink.logging.level" value="FINE" /-->
        <property name="eclipselink.logging.level" value="off" />
        <!-- Optimization #11 - close EntityManager on commit, to avoid the cost of resume -->
        <property name="eclipselink.persistence-context.close-on-commit" value="true" />
        <!-- Optimization #11 - avoid the auto flush cost on query execution -->
        <property name="eclipselink.persistence-context.flush-mode" value="commit" />
        <!-- Optimization #11 - avoid the cost of persist on commit -->
        <property name="eclipselink.persistence-context.persist-on-commit" value="false" />
    </properties>
</persistence-unit>
The optimized orm.xml:
<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings version="2.1"
    xmlns="http://www.eclipse.org/eclipselink/xsds/persistence/orm"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <named-query name="findCustomByName">
        <query>Select c from Customer c where c.name = :name</query>
        <hint name="eclipselink.query-results-cache" value="true"/>
    </named-query>
    <entity class="model.Order">
        <table-generator name="ORD_SEQ" allocation-size="500"/>
    </entity>
    <entity class="model.Customer">
            <table-generator name="CUST_SEQ" allocation-size="500"/>
    </entity>

</entity-mappings>
So, what is the result? The original unoptimized code took on average 133,496 milliseconds (~2 minutes) to process ~100,000 objects. The fully optimized code took only 6,933 milliseconds (~7 seconds). This is very good, and means it could process 1 million objects in about 1 minute. The optimized code is a 1,825% improvement over the original code.

But how much did each optimization affect the final result? To answer this question I ran the test 3 times with the fully optimized version, but with each optimization removed in turn. This worked out better than starting with the unoptimized version and adding each optimization separately, as some optimizations get masked by the lack of others. So, in the table below, the bigger the % difference, the more effective the removed optimization was (the numbers refer to the optimization sections above).

Optimization                                  Average Result (ms)    % Difference
None (unoptimized)                            133,496                1,825%
All optimizations                             6,933                  0%
#1 - without agent                            7,906                  14%
#2 - without pagination                       8,679                  25%
#4a - join fetch instead of batch fetch       11,836                 71%
#4b - without batch fetch                     17,344                 150%
#5 - without read-only                        8,323                  20%
#6 - without sequence pre-allocation          30,396                 338%
#7 - with redundant persist loop              7,947                  14%
#8 - without batch writing                    75,751                 992%
#9 - without statement cache                  7,233                  4%
#10 - with shared cache enabled               7,925                  14%
#11 - without other optimizations             7,332                  6%

This shows that batch writing was the best optimization, followed by sequence pre-allocation, then batch fetching.

35 comments:

  1. I know you're wanting to primarily use JPA APIs but I'm surprised you didn't mention using a scrollable resultset (as documented at http://wiki.eclipse.org/EclipseLink/Examples/JPA/Pagination#Using_a_ScrollableCursor) instead of pagination. For large numbers of records this can make a huge difference since you end up issuing just a single query against the database to pull out your data instead of one per page.

    Hibernate supports a similar feature. Hopefully scrollable results will make their way into the JPA spec sometime soon.

    Corey

  2. Wonderful article. Thanks for sharing this.

  3. Very good tutorial, I would like a version in Spanish because my English is not very good ... But thanks anyway.

  4. Great nuggets of wisdom, probably earned through trial and tribulation which is exactly why this is timely for my current project. Thanks for sharing!

  5. I don't have words. this is truly fantastic information enriched by lot of experience. Thanks a lot mate for sharing such invaluable information.

    Javin
    How to use comparator and comparable in java with example

  6. Amazing article, awesome blog...

    James, is there an email address I can contact you in private?

  7. I'm not sure about how to increase preallocation size. Can you please provide me an example for "Optimization #6 - Sequence Pre-allocation" please?

  8. @TableGenerator(name="MY_SEQ", allocationSize=100)
    or,
    @SequenceGenerator(name="MY_SEQ", allocationSize=100)
    see,
    http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Sequence_Strategies

  9. Thanks a lot James. Do you have any setting to enable seperate connection pool for sequence allocation? I don't use any JTA datasource. In my persistence.xml i have these configs:

    name="javax.persistence.jdbc.url"value="xxx"

    name="javax.persistence.jdbc.password"value="xx"

    name="javax.persistence.jdbc.driver"
    value="com.mysql.jdbc.Driver"

    name="javax.persistence.jdbc.user" value="xxxx"

    name="eclipselink.target-database" value="MYSQL"

    name="eclipselink.jdbc.sequence-connection-pool" value="true"

    name="eclipselink.jdbc.read-connections.min" value="1"

    name="eclipselink.jdbc.write-connections.min" value="1"

    What else should i add??

    Thanks.

  10. Thank you for a well written article with practical steps I can take to speed up my project. Technical posts like this can take forever to write, so thanks for the effort.

  11. @Prasath, if you are on the latest release, remove the read/write min setting, by default a single combined pool is now used with a initial of 1, so is more efficient, normally your min should be your max to be most efficient, replace the sequence setting with, "eclipselink.connection-pool.sequence.initial"="1"

Thanks for taking the time to write that. Good info to know. Currently my company is using "IBATIS" and pure "SQL"s as database persistence mechanism. I like SQL query very much, especially in tuning, but i just do not like code all SQL query in Java application, it's easy hit typo error and what a stupid and tedious job? Finally my company has a new project come in, i decided this is the right time to propose Hibernate as our new java database persistence mechanism to my boss.
    personal injury attorney tampa fl

  13. We are just starting of with a new project and decide on JPA/EJB3.0 in Glassfish with Oracle DB. This article is outstanding with the information you discussed here.
    One problem we have and I would really appreciate any input.
    We are using Netbeans to Generate the Persistence Entities. Then used Netbeans to generate the Session Beans for the Entity Classes.
    Database triggers are used to generate the PK value upon DB insert.
    This all works well for us, except that in some instances we need to get the insert PK value back as we need to insert that as part of a reference into other parts. This is all part of the same TX that needs to be committed/rolled back.
    When we query the Entity, the inserted ID is still 0.
    Is there any way of getting this generated value back before flushing the TX?

    The facade (Session Bean) generated look like this
    :
    @Stateless
    public class Tsc06JobQueueFacade extends AbstractFacade {

    public static Logger logger = Logger.getLogger("Tsc06JobQueueFacade");
    @PersistenceContext(unitName = "za.co.fnds_fnds-core_ejb_1.0.0PU")
    private EntityManager em;

    @Override
    protected EntityManager getEntityManager() {
    return em;
    }

    public Tsc06JobQueueFacade() {
    super(Tsc06JobQueue.class);
    }


    And the entity looks like this:

    @Entity
    @Table(name = "TSC06_JOB_QUEUE")
    @XmlRootElement
    @NamedQueries({
    @NamedQuery(name = "Tsc06JobQueue.findAll", query = "SELECT t FROM Tsc06JobQueue t"),
    @NamedQuery(name = "Tsc06JobQueue.findByJobRunId", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobRunId = :jobRunId"),
    @NamedQuery(name = "Tsc06JobQueue.findByJobStartTime", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobStartTime = :jobStartTime"),
    @NamedQuery(name = "Tsc06JobQueue.findByJobEndTime", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobEndTime = :jobEndTime")})
    public class Tsc06JobQueue implements Serializable {
    private static final long serialVersionUID = 1L;

    public static Logger logger = Logger.getLogger("Tsc06JobQueue");

    // @Max(value=?) @Min(value=?)//if you know range of your decimal fields consider using these annotations to enforce field validation
    @Id
    @Basic(optional = false)
    @NotNull
    @Column(name = "JOB_RUN_ID")
    // @SequenceGenerator( name = "appJobSeq", sequenceName = "TSC06_JOB_RUN_ID_SEQ", allocationSize = 1, initialValue = 1 )
    // @GeneratedValue( strategy = GenerationType.SEQUENCE, generator = "appJobSeq" )
    private BigDecimal jobRunId;

    I have tried using the sequence generator option as per uncomented lines, but no success.

    Any help/comments appreciated!
    Andre

  14. We solved this problem - when generating entities the Netbeans tool create the pk fields as not-null. For inserting the object you then need to populate the value with 0. This caused Eclipselink to ignore any reference to sequences,etc.
    Changing the pk column(s) to null-able and then not specifying the pk columns, allows Eclipselink to query the sequence and populate the column and object values correctly. So we learn every hour!

  15. Extremely good piece of writing!

  16. Thanks for your wonderful information which helped us to join java online training

  17. Really nice, would love to have it extended to include the hibernate equivalents.
    Sadly pagination seems to be the best way to improve performance, but it's also the one thing I want to avoid..

  18. Sorry, sir, are JPA OneToOne and ManyToOne LAZY by default? They are EAGER.

  19. This comment has been removed by a blog administrator.

  20. Hi James,

    Thanks for sharing this great article. I have some query related to findCustomByName namedQuery. My understanding is that after using eclipselink.query-results-cache hint as true, all the results including null will be cached. Which means in following try-catch-block, for some customer which does not exist initially, NoResultException should always be raised.

    try {
    Customer customer = (Customer)customerQuery.getSingleResult();
    order.setCustomer(customer);
    } catch (NoResultException notPersistedYet) {
    // Customer does not yet exist, so null out id to have it persisted.
    order.getCustomer().setId(0);
    }

    And we should end up with 10000 Customers instead of 1000 in database. It will be great if you can shed light on mistake in my understanding.

    Thanks.

  21. I think you are correct, the code should be using the hint, "eclipselink.query-results-cache.ignore-null"="true", in 2.5 it should also be using the API setInvalidateOnChange(false), as by default any insert to customer will invalidate the query result cache.

    I think originally the ignore-null option was not working, so that was the default behavior when I ran these tests.

  22. Actually, at the time I wrote this cache indexes were not supported yet, so now you could also use a @CacheIndex instead of the query cache for the customer query.

    There have been a lot of performance enhancements added to EclipseLink since this post, perhaps its time for a "How to improve JPA performance by 2,825%" post.

  23. Hi James,

    Thanks for your response. It is actually encouraging to know that we can do further optimization. I would eagerly wait for your "How to improve JPA performance by 2,825%" post.

    May I ask for further help. It will be great if you could give one example of using setInvalidateOnChange. I think this should resolve my query at http://stackoverflow.com/questions/17465692/eclipselink-query-results-cache-ignore-null-not-caching-any-result

    Also I tried to use @CacheIndex with eclipse link 2.5 but was not successful.
    a) First time customer not found,
    b) created customer,
    c) trying to look for same customer. customer still not found.
    Resulting in 10000 customer.). May I get blessed with some example code, please.

    Thanks.

  24. For an example of @CacheIndex refer to post http://java-persistence-performance.blogspot.com/2013/03/but-what-if-im-not-querying-by-id.html, it includes sample code.

  25. It would be nice to see the source files for the classes involved. I tried the JDBC batch writing, but I din't get any performance gain.
    I use MYSQL and have auto generated primary keys, but since JPA need to know the primary key for each inserted object (to keep the persistence context consistent), batching is not possible, since "select last_insert_id()" on MYSQL only returns the ID of the last inserted record, and not all the keys generated during a batch insert.

    Replies
    1. The code for this is here,

      http://git.eclipse.org/c/eclipselink/examples/performance.git/tree/jpa-performance

      For MySQL you need to enable statement rewriting, see,

      http://java-persistence-performance.blogspot.com/2013/05/batch-writing-and-dynamic-vs.html

      Also DON'T use IDENTITY id generation, instead use TABLE id generation, or a SEQUENCE in other databases.

    2. Thanks James.
      I have to support 7 different DBMS's with the same entity classes (which is one of the reasons we're using JPA), so I'm going to use TableGeneration since it will work for all.
      For MySQL (Datasources) we'll have to detect if the customer has configured the connection string for statement rewriting, and then log a 'reduced performance' warning.
      I'm curious to see what the performance gains will be on each of the DBMS.

  26. Hi Mr. Sutherland,
    I wonder if you can comment on my question on sof.com
    Here is link.
    http://stackoverflow.com/questions/21174028/eclipselink-entity-mappings-cache

    thanks
    Gopi

  27. Hi!!

    I can't see the point when you say "But the page size was just a heuristic number anyway, so no real issue"

    And what about if I want 4-orders pages, with order-lines join fetched? If a page has, for example, its first order with more than 4 order lines, that page will have only 1 order!!!

  28. Is there any further reading you would recommend on this?

    LDS Infotech
    Oracle Partners India

  29. Thanks for the article, James. It seems like you could use JPA to maximize data usage when accessing a database; could you apply this (or even use JPA) to create self-improving predictive scoring models? Or any predictive modeling, really, it doesn't have to be lead scoring. Predictive modeling requires a lot of iterations and a language/platform that could dynamically write and rewrite data based on the previous output could be useful for setting up predictive models. Can JPA handle this?

  30. This comment has been removed by the author.

  31. This comment has been removed by the author.

  32. This comment has been removed by the author.
