Overview

This page is intended to be a space for collaboration and sharing of solutions and workarounds for various Kuali Rice performance issues. This page should not serve as a substitute for JIRA, but rather attempts to provide a single point where people can learn about potential performance gotchas and how to avoid and/or address them.

For real-time discussion on Kuali Rice performance, you can visit the #kuali channel on irc.freenode.net

Workflow Engine Processing

SELECT ... FOR UPDATE WAIT 3600 against KREW_DOC_HDR_T

Description

Use of SELECT ... FOR UPDATE against KREW_DOC_HDR_T is problematic, especially for transactions that lock more than one document. At least when using an Oracle database, deadlocks in this scenario appear not to be recoverable until the lock wait timeout has elapsed (which by default is 3600 seconds, i.e. one hour). The reason Rice performs this row-level locking is that workflow documents are oftentimes processed or updated concurrently (by more than one user or by more than one background processing thread). If no locking is performed, frequent OptimisticLockExceptions are thrown from OJB when the document is updated. This "serialization" of processing at the database level "solves" the problem but is very taxing on the database and leads to problematic deadlock situations.
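
As a concrete illustration, the sketch below (hypothetical code, not the actual Rice implementation; the DOC_HDR_ID column name is assumed from the table's primary key) shows the kind of statement involved:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class DocumentLockSketch {

        // Acquires a row-level lock on the document's header row. Any other
        // transaction trying to lock the same row blocks for up to 3600 seconds
        // (Oracle's WAIT clause) before failing with an error.
        void lockDocument(Connection conn, String documentId) throws SQLException {
            String sql = "SELECT DOC_HDR_ID FROM KREW_DOC_HDR_T"
                    + " WHERE DOC_HDR_ID = ? FOR UPDATE WAIT 3600";
            PreparedStatement statement = conn.prepareStatement(sql);
            try {
                statement.setString(1, documentId);
                statement.executeQuery(); // lock is held until the transaction ends
            } finally {
                statement.close();
            }
        }
    }

Two transactions that each lock more than one document this way, but in different orders, can deadlock, and on Oracle the blocked session may simply sit in the WAIT until the timeout expires.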

Possible Solutions and Workarounds

Notes/Comments

Direct access to Kuali Rice database from client application

Description

Direct database access to certain tables is part of the way in which embedded KEW is implemented. As additional client applications come online, the number of connections established to the database grows, which requires planning and resource allocation on the database side. For example, ten client applications each configured with a connection pool of twenty could hold up to two hundred connections against the Rice database.

Possible Solutions and Workarounds

Notes/Comments

Action List

Caching Behavior and deadlocks in KREW_USR_OPTN_T

Description

Action List caching behavior is controlled by inserting and deleting records in the KREW_USR_OPTN_T table. This can cause a lot of contention and occasional deadlocks against this table. Additionally, over time this table can grow very large, accumulating many entries whose option ID starts with "REFRESH_ACTION_LIST".
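
As a rough sketch of the mechanism (hypothetical code; the actual Rice implementation, column names, and option key format may differ), flagging a user's action list as stale amounts to inserting a marker row that is later consumed when the action list is rebuilt:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class ActionListRefreshFlagSketch {

        // Flags the user's cached action list as stale by inserting a marker row.
        // Every event that touches a user's action items writes (and later deletes)
        // one of these rows, which is the source of the contention described above.
        void requestRefresh(Connection conn, String principalId) throws SQLException {
            String sql = "INSERT INTO KREW_USR_OPTN_T (PRNCPL_ID, PRSN_OPTN_ID, VAL)"
                    + " VALUES (?, ?, 'true')";
            PreparedStatement statement = conn.prepareStatement(sql);
            try {
                statement.setString(1, principalId);
                // Real option IDs start with this prefix, per the description above.
                statement.setString(2, "REFRESH_ACTION_LIST");
                statement.executeUpdate();
            } finally {
                statement.close();
            }
        }
    }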

Possible Solutions and Workarounds

Notes/Comments

Large Action Lists

Description

Very large action lists are not handled well in many cases, resulting in large amounts of memory usage. This is a particular problem when modifying a group in KIM by either adding or removing a member, which triggers a large number of action items and action requests to be loaded into memory for processing. Large action lists can happen when people are the recipients of a large number of FYI requests and don't clear them frequently enough. One of the main issues here is the use of the ORM to load these items for processing, since the ORM will cache them in memory for the life of the transaction, even once the loaded objects go out of local scope in the code that is processing the data.

An example of problematic code in Rice which loads the entire action list into memory can be found below; it comes from the WorkgroupMembershipChangeProcessor class:

    private void updateActionListForUserRemovedFromGroup(String principalId, String groupId) {
        // Check the group the user was removed from, plus all of its parent groups.
        List<String> allGroupsToCheck = KIMServiceLocator.getIdentityManagementService().getParentGroupIds(groupId);
        allGroupsToCheck.add(0, groupId);
        // Problematic: this loads the user's entire action list into memory, and the
        // ORM keeps every ActionItem cached for the life of the transaction.
        Collection<ActionItem> actionItems = getActionListService().findByPrincipalId(principalId);
        for (ActionItem item : actionItems) {
            if (item.isWorkgroupItem()) {
                for (String groupIdToCheck : allGroupsToCheck) {
                    if (item.getGroupId().equals(groupIdToCheck)) {
                        getActionListService().deleteActionItem(item);
                    }
                }
            }
        }
    }

Possible Solutions and Workarounds
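
One possible direction (a sketch only, not code from Rice: it assumes the PRNCPL_ID and GRP_ID columns of KREW_ACTN_ITM_T, and a real fix would also need to replicate whatever bookkeeping deleteActionItem performs) is to push the filtering and deletion down to the database rather than materializing every action item through the ORM:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class TargetedActionItemDeleteSketch {

        // Deletes the matching workgroup action items in a single statement instead
        // of loading the user's whole action list through the ORM. Assumes groupIds
        // is non-empty (the caller always has at least the group itself).
        void deleteItemsForGroups(Connection conn, String principalId, List<String> groupIds)
                throws SQLException {
            StringBuilder sql = new StringBuilder(
                    "DELETE FROM KREW_ACTN_ITM_T WHERE PRNCPL_ID = ? AND GRP_ID IN (");
            for (int i = 0; i < groupIds.size(); i++) {
                sql.append(i == 0 ? "?" : ", ?");
            }
            sql.append(")");
            PreparedStatement statement = conn.prepareStatement(sql.toString());
            try {
                statement.setString(1, principalId);
                for (int i = 0; i < groupIds.size(); i++) {
                    statement.setString(i + 2, groupIds.get(i));
                }
                statement.executeUpdate();
            } finally {
                statement.close();
            }
        }
    }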

Notes/Comments

Remote CustomActionListAttribute callbacks from Kuali Rice Standalone Server

Description

TODO...

Possible Solutions and Workarounds

Notes/Comments

Finding primary delegates when loading action list

Description

Every time a user loads their action list, a list of the user's primary delegates is created. To do this, Rice essentially runs the following query:

select * from KREW_ACTN_ITM_T where DLGN_TYP = 'P' and (DLGN_PRNCPL_ID = '<current user's principal ID>' or DLGN_GRP_ID in (<list of group IDs the user is a member of>))

This query can be very slow, causing the action list to load slowly every time it is accessed.

Possible Solutions and Workarounds

Adding an index that covers the columns used in the query's WHERE clause can speed the lookup up considerably, for example:

create index KREW_ACTN_ITM_TI# on KREW_ACTN_ITM_T (DLGN_TYP, DLGN_PRNCPL_ID, DLGN_GRP_ID);

Notes/Comments

Document Search

Remote SecurityAttribute callbacks from Kuali Rice Standalone Server for document search result filtering

Description

Document search performs security filtering on each row in the result set that is returned. Document search supports the ability to create a custom SecurityAttribute which can be invoked remotely. The problem is that the attribute gets invoked for every single row returned from a document search that uses it, so each individual row results in a remote callback into a client application. This generates a large amount of network and bus traffic that could be accomplished more efficiently with a single batched call.
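
A batched callback along the following lines (a hypothetical interface sketch; nothing like this exists in Rice today, and all names are illustrative) would reduce the traffic to one remote call per search:

    import java.util.List;

    public interface BatchSecurityAttribute {

        // Given every candidate row from one document search, return only the
        // document IDs the principal is authorized to see. One remote call
        // replaces one callback per result row.
        List<String> filterAuthorizedDocumentIds(String principalId, List<String> candidateDocumentIds);
    }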

Possible Solutions and Workarounds

Notes/Comments

Document Search becomes slow as KREW_DOC_HDR_T and KREW_DOC_HDR_EXT_*_T grow larger

Description

TODO...

Possible Solutions and Workarounds

Notes/Comments

Re-resolving Workflow Requests

Rule Change Re-Resolution

Description

When a change is made to a routing rule, it triggers a requeue of any documents with pending action requests that could be affected by that rule change. This can often result in a massive number of documents being requeued, depending on the number of outstanding documents.

Role Membership Change Re-Resolution

Description

When a change is made to the membership of a role, it triggers a requeue of any documents with pending action requests that could be affected by that membership change. This can often result in a massive number of documents being requeued and processed, depending on the number of outstanding documents.

Kuali Nervous System

SessionDocumentService causes poor performance when working with large documents

Description

See "Disabling of the SessionDocumentService" at the bottom of this page: UC Davis Production Environment Details

Kuali Service Bus

Quartz Deadlocks

Description

When an asynchronous message sent on the bus fails, it is scheduled for retry in Quartz. When many thousands of messages fail around the same time (which can happen when an application is down), this causes a large amount of database contention on the KRSB_QRTZ_LOCKS table because of its use of SELECT ... FOR UPDATE.
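
Roughly what happens under the hood (a sketch based on Quartz's default database-backed lock handling; the exact statement and lock name can vary by Quartz version and configuration) is that every scheduling operation first serializes on a single lock row:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class QuartzLockContentionSketch {

        // Each thread scheduling a retry must first acquire this row lock, so
        // thousands of near-simultaneous failures all queue up behind one row.
        void acquireSchedulerLock(Connection conn) throws SQLException {
            String sql = "SELECT * FROM KRSB_QRTZ_LOCKS WHERE LOCK_NAME = ? FOR UPDATE";
            PreparedStatement statement = conn.prepareStatement(sql);
            try {
                statement.setString(1, "TRIGGER_ACCESS");
                statement.executeQuery(); // held until the scheduling transaction completes
            } finally {
                statement.close();
            }
        }
    }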

Possible Solutions and Workarounds

Notes/Comments

Too Much Cache Flushing Activity

Description

All Rice client apps that run embedded KEW or embedded KIM services publish an OSCacheNotificationService, which essentially receives messages whenever cacheable data is updated. This works well in general, but there are a few problematic scenarios:

  1. When a document type is updated, it sends out a flush message for each document type in the hierarchy. It should really only need to send one message to each cache endpoint, combining all relevant information into a single message (see the sketch after this list). As it stands, this can result in hundreds of messages sent to each individual endpoint.
  2. Whenever a rule is updated, it sends out a flush message for that rule for each document type in its hierarchy. As with document types, it should be possible to send all related information in a single message. This can result in hundreds of messages sent to each individual cache endpoint.
  3. In many cases, certain clients might not even care about certain pieces of information. For example, the Kuali Financial System application is (probably) never going to load Kuali Coeus rules. However, if a KC routing rule is modified, KFS is still sent a message to let it know it can flush that rule from its cache.
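
A combined flush message might look something like the sketch below (a hypothetical type; this is not the actual notification payload Rice sends):

    import java.io.Serializable;
    import java.util.List;

    // Hypothetical payload carrying every cache key affected by a single document
    // type or rule change, so that each endpoint receives one message instead of
    // hundreds of per-document-type messages.
    public class CombinedCacheFlushMessage implements Serializable {

        private final List<String> cacheKeysToFlush;

        public CombinedCacheFlushMessage(List<String> cacheKeysToFlush) {
            this.cacheKeysToFlush = cacheKeysToFlush;
        }

        public List<String> getCacheKeysToFlush() {
            return cacheKeysToFlush;
        }
    }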

Misbehaving Clients

Constant Updates

Description

Some applications are using Rice in ways that cause frequent (and presumably unnecessary) database writes. This renders any caching solution ineffective since the cache is always dirty.

One example is the way Kuali Coeus creates many KIM roles on the fly, which results in lots of cache flush messages and effectively eliminates caching of roles in KIM (since the KIM cache flush is very coarse-grained).

Possible Solutions and Workarounds

Notes/Comments

OJB (Object Relational Bridge)

Persistence Broker Pool settings not allowing enough brokers in the pool to accommodate connection pool settings.

TODO

XAPool and JOTM

XAPool does not recover properly from a connection drop from the database server.

TODO