Java and the relational database
Experience shows that the bigger the project, the more interesting the problems we encounter along the way. One of the biggest projects I have had the chance to participate in during my long programming adventure is Comarch Business Banking (CBB). It’s a powerful platform for streamlining all banking operations, which makes it an excellent example for discussing data consistency. After all, the first thing that comes to mind when we speak of technical transactions is payment transactions in banking systems, along with the need to keep the system in a correct state – both while these transactions are being executed and afterwards.
It would seem that the topics covered here should already be well known and should not cause too many problems – after all, the technologies involved (JPA, JTA, Hibernate, Spring transaction management) are not new. However, during the implementation of the CBB project, and others, I have noticed that data-access problems are still a big part of programmers' daily struggle. Sometimes things just don’t get saved, or get saved in excess; other times we lock things up or can’t see data that should be in the database. It also gets interesting when we want to ditch an application server, which usually greatly facilitates transaction management.
This piece is only an introduction to a number of issues that come with the relational database in enterprise-class applications; in subsequent publications I’ll present new topics or go deeper into those already presented.
The code presented in the examples was simplified as much as possible in order to improve readability.
Spring Proxy and @Transactional
The blessings of Spring sometimes become our curse. Something that should keep working under certain conditions just ceases to work at some point. A tedious analysis process then begins, usually with a breakpoint set somewhere in the org.springframework packages. So, is it worth going with Spring? Maybe it would be better to go back to developing the key mechanisms of our application ourselves? Spring is undoubtedly the better option, but it is still important to understand its basics.
One of them is Spring AOP and the associated dynamic proxy mechanism. In proxy mode (the default one), calls to Spring bean methods are intercepted, allowing additional code to run before the target method is actually invoked. This makes it possible to handle technical concerns, such as transactions, with aspect-oriented programming. However, the proxy is created when the Spring context is initialized and is tightly bound to the bean being created; wherever we inject the bean, we get an object wrapped in a proxy. By calling a method on such an object, we trigger the Spring mechanisms associated with that bean – for example, handling a transaction defined with the @Transactional annotation. The problem occurs when a call bypasses the proxy, as with the local method call in the case below:
@Service
@AllArgsConstructor
public class UserService {

    private UserRepository userRepository;

    public void updateUserPhone(String phone) {
        User user = userRepository.loadUser();
        user.setPhone(phone);
        updateUser(user);
    }

    @Transactional
    public void updateUser(User user) {
        userRepository.updateUser(user);
    }
}
The call to the updateUserPhone method is not wrapped in a transaction, and since the call to updateUser is local, the @Transactional annotation is ignored. For the above code to work, the following approach could be used (similar constructs could be found in the monstrous “eight-thousanders” of the EJB 2.x era):
@Service
public class UserService {

    @Autowired
    private UserRepository userRepository;

    @Lazy
    @Autowired
    private UserService userService;

    public void updateUserPhone(String phone) {
        User user = userRepository.loadUser();
        user.setPhone(phone);
        userService.updateUser(user);
    }

    @Transactional
    public void updateUser(User user) {
        userRepository.updateUser(user);
    }
}
This approach is obviously not recommended; it makes more sense simply to annotate the updateUserPhone method with @Transactional as well.
It should also be remembered that for the aspect-oriented mechanisms to work, the proxy must be fully initialized. Therefore, transactions in initialization blocks such as @PostConstruct should be avoided. This is mentioned in the documentation:
“Also, the proxy must be fully initialized to provide the expected behaviour so you should not rely on this feature in your initialization code, i.e. @PostConstruct.”
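The self-invocation pitfall can be reproduced without Spring at all, using a plain JDK dynamic proxy – the same mechanism Spring AOP uses for interface-based beans. The sketch below uses illustrative names; the interceptor merely records which calls pass through it, where Spring would begin and commit a transaction:

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class SelfInvocationDemo {

    interface UserOps {
        void updateUserPhone(String phone);
        void updateUser(String phone);
    }

    static class UserOpsImpl implements UserOps {
        @Override
        public void updateUserPhone(String phone) {
            // local call: dispatched on 'this', so it bypasses any proxy
            updateUser(phone);
        }

        @Override
        public void updateUser(String phone) {
            // pretend to persist the change
        }
    }

    // returns the names of the methods that actually went through the proxy
    static List<String> run() {
        List<String> intercepted = new ArrayList<>();
        UserOps target = new UserOpsImpl();
        UserOps proxy = (UserOps) Proxy.newProxyInstance(
                UserOps.class.getClassLoader(),
                new Class<?>[]{UserOps.class},
                (p, method, methodArgs) -> {
                    // this is where Spring would start a transaction
                    intercepted.add(method.getName());
                    return method.invoke(target, methodArgs);
                });
        proxy.updateUserPhone("123456789");
        return intercepted;
    }

    public static void main(String[] args) {
        System.out.println(SelfInvocationDemo.run()); // prints [updateUserPhone]
    }
}
```

Only the outer call is intercepted; the nested updateUser() is a plain local call on the target object and never reaches the proxy – exactly why the @Transactional on the inner method is ignored.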
PlatformTransactionManager
PlatformTransactionManager is the central interface used by Spring to manage transactions. It has two main implementations: JpaTransactionManager and JtaTransactionManager.
JpaTransactionManager
The JpaTransactionManager pays dividends in simple applications with one DataSource and one PersistenceContext. It is worth remembering this, because trying to define a second PersistenceContext (using EntityManagerFactory) can lead to unexpected problems. JpaTransactionManager is always associated with only one EntityManagerFactory. The example configuration may look like this:
@Bean(name = "transactionManagerFactory")
LocalContainerEntityManagerFactoryBean transactionManagerFactory(EntityManagerFactoryBuilder builder,
        DataSource dataSource) {
    return builder
            .dataSource(dataSource)
            .packages("com.comarch")
            .persistenceUnit("DEFAULT_PERSISTENCE_UNIT")
            .build();
}

@Bean(name = "transactionManager")
PlatformTransactionManager transactionManager(
        @Qualifier("transactionManagerFactory") EntityManagerFactory entityManagerFactory) {
    return new JpaTransactionManager(entityManagerFactory);
}
However, there’s no reason why we couldn’t add a second EntityManagerFactory:
@Bean(name = "userTransactionManagerFactory")
LocalContainerEntityManagerFactoryBean userTransactionManagerFactory(EntityManagerFactoryBuilder builder,
        DataSource dataSource) {
    return builder
            .dataSource(dataSource)
            .packages("com.comarch")
            .persistenceUnit("USERS_PERSISTENCE_UNIT")
            .build();
}
and then use it to implement our repository:
@PersistenceContext(unitName = "USERS_PERSISTENCE_UNIT")
private EntityManager userEntityManager;

@Transactional
public void updateUser(User user) {
    userEntityManager.merge(user);
}
Such code will not raise any error, but it won’t work either – the entity will not be saved in the database. If you use two PersistenceContexts in the same transaction (here DEFAULT_PERSISTENCE_UNIT and USERS_PERSISTENCE_UNIT), only the entities saved through DEFAULT_PERSISTENCE_UNIT will reach the database.
The key to discovering why this happens is understanding how Hibernate (and other JPA implementations) works. The merge() method merely attaches the entity to the current context; only a call to the flush() method signals that the current context should be synchronized with the database. In this example, however, we did not call flush() manually. That operation is performed by the JpaTransactionManager just before the transaction is committed, and since the JpaTransactionManager is associated with one specific PersistenceContext (here DEFAULT_PERSISTENCE_UNIT), the synchronization is performed exclusively on that context. The second context is ignored. This is all the more confusing because Spring will in no way inform us that something is wrong. Only when we manually run flush() for USERS_PERSISTENCE_UNIT will we find out about the problem (or, more precisely, that there is no transaction for this context).
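The mechanics can be illustrated with a deliberately simplified, plain-Java model – no JPA involved; Context, merge() and flush() below only mimic the real behaviour of a persistence context and its transaction manager:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: merge() only attaches an entity, flush() synchronizes with the "database".
public class UnitOfWorkDemo {

    static class Context {
        private final Map<String, String> managed = new HashMap<>();
        private final Map<String, String> database;

        Context(Map<String, String> database) {
            this.database = database;
        }

        void merge(String id, String state) {
            managed.put(id, state); // attach only; nothing hits the database yet
        }

        void flush() {
            database.putAll(managed); // synchronize this context with the database
        }
    }

    static Map<String, String> run() {
        Map<String, String> db = new HashMap<>();
        Context defaultCtx = new Context(db); // stands in for DEFAULT_PERSISTENCE_UNIT
        Context usersCtx = new Context(db);   // stands in for USERS_PERSISTENCE_UNIT

        defaultCtx.merge("1", "Jan");
        usersCtx.merge("2", "Anna");

        // The transaction manager flushes only the one context it is bound to:
        defaultCtx.flush();
        return db;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {1=Jan} - Anna silently never arrives
    }
}
```

The entity merged into the second context is silently lost at commit time, just as in the JPA case above.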
JtaTransactionManager
The second main implementation of PlatformTransactionManager is JtaTransactionManager. Here, the problems described above do not occur, because JtaTransactionManager is not directly tied to a PersistenceContext. It can use a transaction managed by the application server (through implementations such as WebLogicJtaTransactionManager) or by a standalone transaction manager (e.g. Narayana, Atomikos).
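As a sketch only – on WebLogic, for instance, the manager could be wired up as below (standalone managers such as Atomikos or Narayana ship with their own Spring integration instead):

@Bean(name = "transactionManager")
PlatformTransactionManager transactionManager() {
    // delegates transaction demarcation to the JTA implementation
    // provided by the WebLogic application server
    return new WebLogicJtaTransactionManager();
}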
Hibernate Session Cache
The basic cache provided by Hibernate, despite its simplicity, saves a lot of server time. Once loaded, an entity is available in memory and can be easily and, most importantly, quickly retrieved again. This way, we don't have to build our own cache mechanism or, worse, contort the business logic to avoid performance problems caused by excessive database round trips. However, as intuition suggests, there are also some limitations that can cause problems in certain cases.
The first one is that the cache applies to entities retrieved by primary key using the EntityManager.find() method and does not work for queries, even ones as simple as the following:
entityManager.createQuery("select u from User u where u.id = :id", User.class)
        .setParameter("id", id)
        .getSingleResult()
You should also remember to define the equals() and hashCode() methods correctly. This is particularly important when the state of an entity may change. When flush() is called, Hibernate checks whether the entities have changed relative to what is in the cache. This allows it to skip unnecessary updates and update only the entities that were actually modified. However, with badly implemented equals() and hashCode() methods, Hibernate may incorrectly mark entities as modified and issue unnecessary updates.
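One common convention – sketched below as an illustration, not the only valid approach – bases equality solely on the primary key, so that changing mutable state (like the phone number) never alters the entity's identity:

```java
import java.util.Objects;

public class UserEntity {

    private Long id;      // primary key: stable identity of the entity
    private String phone; // mutable state: deliberately excluded from equals()

    public UserEntity(Long id, String phone) {
        this.id = id;
        this.phone = phone;
    }

    public void setPhone(String phone) {
        this.phone = phone;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserEntity)) return false;
        // id-based equality; entities without an id are only equal to themselves
        return id != null && id.equals(((UserEntity) o).id);
    }

    @Override
    public int hashCode() {
        // caveat: if ids are generated on persist, instances placed in
        // hash-based collections before saving need extra care
        return Objects.hashCode(id);
    }
}
```

With this convention, two instances representing the same row compare equal regardless of their mutable fields, so a changed phone number is reported by the dirty check on field snapshots rather than by a change of identity.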
The next point is the cache's scope. As the name suggests, it is tied to Hibernate's session, a role which in turn is played by the PersistenceContext. In the past, sessions were opened by hand:
Session session = factory.openSession();
Nowadays Spring takes care of it – fine, but when exactly? And here the answer, of course, can only be: it depends.
For web applications, this is the responsibility of the OpenEntityManagerInViewInterceptor, which is linked to the HTTP request lifecycle. It creates PersistenceContext at the very beginning of request processing. This is the most comfortable situation because we have the same session (so also cache) in the whole process, whether transactional or not.
If the action is initiated in a different way, the PersistenceContext will only be created when needed. So, if we are dealing with a transactional process, the need for a PersistenceContext arises as soon as the transaction starts. In this case, the same PersistenceContext is valid for the whole transaction, and we have a consistent cache for it. The situation gets complicated when the process is not transactional: then the context will only be created when the EntityManager is actually used. Worse still, it will not be tied to any wider scope, so it will be created anew on every reference to the EntityManager. In that case, the cache brings practically no benefit.
To illustrate this, below are two cases of non-transactional endpoints.
The implementation of the first one:
@GetMapping(value = "/testLoad")
public void testLoad(@RequestParam String id) {
    LOGGER.info("First load: " + userRepository.loadUser(id).getName());
    LOGGER.info("Second load: " + userRepository.loadUser(id).getName());
}
The result:
Hibernate: select user0_.ID as ID1_15_0_, user0_.NAME as NAME2_15_0_, user0_.PHONE as PHONE3_15_0_, products1_.USER_ID as USER_ID4_14_1_, products1_.ID as ID1_14_1_, products1_.ID as ID1_14_2_, products1_.AMOUNT as AMOUNT2_14_2_, products1_.NAME as NAME3_14_2_, products1_.USER_ID as USER_ID4_14_2_ from activities.USERS user0_ left outer join activities.PRODUCTS products1_ on user0_.ID=products1_.USER_ID where user0_.ID=?
2019-12-30 18:52:24,042 INFO - First load: Jan
2019-12-30 18:52:24,043 INFO - Second load: Jan
The implementation of the second one:
@GetMapping(value = "/testLoad")
public void testLoad(@RequestParam String id) {
    new Thread(() -> {
        LOGGER.info("First load: " + userRepository.loadUser(id).getName());
        LOGGER.info("Second load: " + userRepository.loadUser(id).getName());
    }).start();
}
The result:
Hibernate: select user0_.ID as ID1_15_0_, user0_.NAME as NAME2_15_0_, user0_.PHONE as PHONE3_15_0_, products1_.USER_ID as USER_ID4_14_1_, products1_.ID as ID1_14_1_, products1_.ID as ID1_14_2_, products1_.AMOUNT as AMOUNT2_14_2_, products1_.NAME as NAME3_14_2_, products1_.USER_ID as USER_ID4_14_2_ from activities.USERS user0_ left outer join activities.PRODUCTS products1_ on user0_.ID=products1_.USER_ID where user0_.ID=?
2019-12-30 18:54:56,712 INFO - First load: Jan
Hibernate: select user0_.ID as ID1_15_0_, user0_.NAME as NAME2_15_0_, user0_.PHONE as PHONE3_15_0_, products1_.USER_ID as USER_ID4_14_1_, products1_.ID as ID1_14_1_, products1_.ID as ID1_14_2_, products1_.AMOUNT as AMOUNT2_14_2_, products1_.NAME as NAME3_14_2_, products1_.USER_ID as USER_ID4_14_2_ from activities.USERS user0_ left outer join activities.PRODUCTS products1_ on user0_.ID=products1_.USER_ID where user0_.ID=?
2019-12-30 18:54:56,714 INFO - Second load: Jan
In the first case, both database accesses happen within the same PersistenceContext (created by the OpenEntityManagerInViewInterceptor), so the second fetch of the entity does not require a trip to the database, only to the cache. In the second case, although it is also an HTTP endpoint, the database access has been delegated to a new thread, in which the PersistenceContext created by the interceptor does not apply. Since the calls are also not covered by a transaction, there is no single PersistenceContext spanning both database accesses. The result is that the same query is executed twice.
Of course, this synthetic example is not very lifelike, but it does shed light on one of the many reasons why spawning your own threads in Spring-managed applications should be preceded by profound reflection. It is also always worthwhile to analyze the SQL query logs to verify whether Hibernate's behavior is in line with our expectations. You may also consider using readOnly for read-only processes (more on this in the future).
Rollback
In addition to committing transactions and persisting all the data, it is equally important to correctly roll back changes when they should not ultimately reach the database. This can be done manually using TransactionAspectSupport:
TransactionAspectSupport.currentTransactionStatus().setRollbackOnly();
But this is obviously not the best approach. Therefore, by default, Spring automatically rolls back the transaction when a transactional method ends with an exception. However, it does so only for RuntimeException and Error; checked exceptions are omitted. This behavior is a bit unintuitive and can lead to strange situations and inconsistencies. Like most of the original approaches to enterprise solutions, it is derived from the EJB specifications. The following explanation can be found in the Spring documentation:
“While the EJB default behavior is for the EJB container to automatically roll back the transaction on a system exception (usually a runtime exception), EJB CMT does not roll back the transaction automatically on an application exception (that is, a checked exception other than java.rmi.RemoteException). While the Spring default behavior for declarative transaction management follows EJB convention (roll back is automatic only on unchecked exceptions), it is often useful to customize this.”
However, the default behaviour can be changed in one of two ways:
- locally – by configuring the @Transactional annotation (the rollbackFor attribute of Spring's annotation, or rollbackOn of the javax.transaction one),
- globally – by overriding the Spring configuration:
@Configuration
public class CustomProxyTransactionManagementConfiguration extends ProxyTransactionManagementConfiguration {

    @Bean
    @Role(BeanDefinition.ROLE_INFRASTRUCTURE)
    @Override
    public TransactionAttributeSource transactionAttributeSource() {
        return new AnnotationTransactionAttributeSource() {
            @Override
            protected TransactionAttribute determineTransactionAttribute(AnnotatedElement element) {
                TransactionAttribute ta = super.determineTransactionAttribute(element);
                if (ta == null) {
                    return null;
                }
                return new DelegatingTransactionAttribute(ta) {
                    @Override
                    public boolean rollbackOn(Throwable ex) {
                        return super.rollbackOn(ex) || ex instanceof Exception;
                    }
                };
            }
        };
    }
}
In Spring Boot 2.1 it may be necessary to set the spring.main.allow-bean-definition-overriding=true property.
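For completeness, the local variant boils down to a single attribute on the annotation; UserValidationException below is a hypothetical checked exception used purely for illustration:

// rolls back also on checked exceptions (Exception and its subclasses)
@Transactional(rollbackFor = Exception.class)
public void updateUser(User user) throws UserValidationException {
    userRepository.updateUser(user);
}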
That's all for now. I very much hope the next part will come soon, looking at the problems that may arise from concurrent access to data.
Szymon Kubicz, Senior Designer – Developer, Comarch