
Overview

As we reduce direct database access from rice clients and expose services remotely, service-level caching becomes more important. Previously, rice had no standard, comprehensive approach to caching, which was problematic for many reasons explained later in this document. As caching takes on a greater role in rice, it is clear that we must have a well-thought-out plan for it. The caching solution we are looking for must have the following properties:

  1. usable by developers without introducing bugs
  2. current (not built on dead technologies)
  3. concise (doesn't pollute the codebase with caching logic)
  4. flexible (works for most/many caching situations)
  5. supports client/server side caching
  6. tunable/customizable (max cache size, cache to disk, etc)
  7. supports distributed caching (will it work with the KSB?)
  8. performant
  9. usable by kuali clients for their own caching needs not just rice
  10. version compatible
  11. pluggable (allows using different caching implementations)

Legacy Caching

Currently rice does caching in two ways which are explained below.

RiceCacheAdministrator

The RiceCacheAdministrator handles local caching and distributed cache flush through a central service. To use the RiceCacheAdministrator you must do the following:
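The original snippet is not preserved on this page; the following is a rough sketch of the usage pattern, with hypothetical service and key names (the RiceCacheAdministrator calls shown, getFromCache/putInCache/flushEntry, reflect the legacy API from memory and should be verified against the source):

```java
// Sketch only: "IdentityServiceImpl", "Principal", and the key format are
// hypothetical; the RiceCacheAdministrator method names should be verified.
public class IdentityServiceImpl implements IdentityService {

    private RiceCacheAdministrator cacheAdministrator;

    public Principal getPrincipal(String principalId) {
        String cacheKey = "IdentityServiceImpl:getPrincipal:" + principalId;
        Principal cached = (Principal) cacheAdministrator.getFromCache(cacheKey);
        if (cached != null) {
            return cached;
        }
        Principal principal = getPrincipalFromDatabase(principalId); // database read
        cacheAdministrator.putInCache(cacheKey, principal);
        return principal;
    }

    public void updatePrincipal(Principal principal) {
        updatePrincipalInDatabase(principal); // database write
        // flush the stale entry locally and trigger the distributed flush
        cacheAdministrator.flushEntry(
                "IdentityServiceImpl:getPrincipal:" + principal.getPrincipalId());
    }
}
```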

As you can see, this is very verbose. In fact, one might argue that the caching logic hides what the service is actually doing: reading from and writing to a database. There is also the possibility of cache key conflicts with other rice code and rice client apps. In addition to the verbosity, the RiceCacheAdministrator has been plagued with performance problems. Finally, the RiceCacheAdministrator is built on a dead technology, OSCache. One good thing about this approach is that a client who wants to force a cache flush can go straight to the RiceCacheAdministrator without a service directly exposing cache flush methods. The other good thing is that cache behavior can be tuned through OSCache configuration.

Service-specific solutions

Since the RiceCacheAdministrator was not acceptable from a performance perspective, many of our KIM services did their own thing with caching. The solution in the KIM apis has been to have several synchronized Maps along with exposing cache flush methods. For example:
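The original snippet is not preserved on this page; the following is a hedged sketch of the hand-rolled pattern described above, with hypothetical class and method names rather than the actual KIM code:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// A sketch of the service-specific caching pattern: a synchronized Map
// plus a flush method exposed on the service API itself.
public class PermissionServiceImpl {

    // one of the "several synchronized Maps" acting as a cache
    private final Map<String, String> permissionCache =
            Collections.synchronizedMap(new HashMap<String, String>());

    public String getPermission(String permissionId) {
        String cached = permissionCache.get(permissionId);
        if (cached != null) {
            return cached;
        }
        String permission = readPermissionFromDatabase(permissionId);
        permissionCache.put(permissionId, permission);
        return permission;
    }

    // the leaky abstraction: a cache flush method on the service interface
    public void flushPermissionCache() {
        permissionCache.clear();
    }

    private String readPermissionFromDatabase(String permissionId) {
        return "permission-" + permissionId; // stand-in for a database read
    }
}
```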

This approach solved some of the performance problems of the RiceCacheAdministrator, but it was even more verbose. It was also very error prone: it was easy to synchronize incorrectly, store mutable objects, forget to flush, and so on. There have been numerous bugs as a result of the complexity of rolling our own caching. Furthermore, because there was no standard approach, every service's caching was a little different. Finally, in order to support distributed cache flushes, each service had to expose a cache flush method. This is a leaky abstraction: caching should, by definition, be hidden from the client. Exposing cache flush methods on a remote service forces rice to support those methods in the future for version compatibility purposes.

Proposal

Spring 3.1 includes a declarative cache abstraction API. This annotation-driven approach significantly reduces caching logic. The only thing service authors should have to do is annotate service interfaces (or implementation code) with Spring cache annotations. For example:
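The original snippet is not preserved on this page; a hedged sketch of an annotated interface (the service, object, and cache names are hypothetical):

```java
public interface PermissionService {

    // cache the result in the "PermissionCache" cache under a simple string key
    @Cacheable(value = "PermissionCache", key = "'id=' + #id")
    Permission getPermission(@WebParam(name = "id") String id);

    // a destructive call evicts the (potentially stale) cached entries
    @CacheEvict(value = "PermissionCache", allEntries = true)
    void updatePermission(Permission permission);
}
```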

Then the service implementation would look like:
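Again a hedged sketch with hypothetical names; the point is that no caching logic appears in the implementation:

```java
// The annotations on the interface drive caching; the implementation is
// pure database logic. Names and helper methods are illustrative only.
public class PermissionServiceImpl implements PermissionService {

    public Permission getPermission(String id) {
        return readPermissionFromDatabase(id); // database read, no cache code
    }

    public void updatePermission(Permission permission) {
        writePermissionToDatabase(permission); // database write, no cache code
    }
}
```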

As you can see, all of the boilerplate caching logic melts away through the wonders of AOP proxies. When Spring creates a Spring-managed service (bean), it automatically returns a proxy containing the caching logic. This works great in most cases but falls apart when clients call services remotely, because the remote proxy is not created by Spring but by the KSB (ServiceConnectorFactory). To handle this case, we will need to wrap our remote proxies in cache proxies directly.

To make sure the annotations are actually read by Spring, we must include the following in our Spring XML files:
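The snippet itself is not preserved here; per the Spring documentation the declaration looks like this (the cacheManager bean name is an assumption):

```xml
<!-- enables processing of @Cacheable/@CacheEvict on beans in this context -->
<cache:annotation-driven cache-manager="cacheManager"/>
```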

and declare a cache manager like:
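For example, an ehcache-backed manager using Spring's standard factory beans (bean ids and the config location are assumptions):

```xml
<!-- Spring CacheManager adapting an underlying ehcache instance -->
<bean id="cacheManager" class="org.springframework.cache.ehcache.EhCacheCacheManager">
    <property name="cacheManager" ref="ehcache"/>
</bean>

<!-- creates the native ehcache CacheManager from its own config file -->
<bean id="ehcache" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
    <property name="configLocation" value="classpath:ehcache.xml"/>
</bean>
```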

Because Spring uses proxies, there is a slight overhead in going through an extra layer. This will probably not be a problem, but if it is, Spring provides the option of using AspectJ aspect weaving, which removes the proxying at the expense of added complexity.

(Diagram: Caching in Rice)

Open Points

  • how to handle distributed flushes? Probably involves using EHcache on the backend and creating our own distributed cache flush impl that goes through the KSB. The basic design is figured out (see diagram) but we need to put it to code.
  • how to reduce traffic through KSB for cache flush notification that clients don't care about?
    • We have a possible solution for this: have a config parameter (a ParameterService parameter) holding an applicationId's opt-out list. This parameter will be a list of cache names (as regex) and cache keys (as regex). Before sending out distributed flush messages, the sender will consult each application's parameter value to see whether it cares about a given message.
  • Clients need to make sure they are not calling SOAP services in loops; that will kill rice. What do we do about that? Anything?
  • How do we name our caches? After service name? After fully-qualified object we are caching? We want to avoid the possibility of key conflicts. We want to be granular enough to allow tuning of specific caches. We also want to pick a scheme that will not change because this could affect version compatibility. Imagine a new Rice server is sending cache flush messages that old clients don't understand because a cache was renamed.
  • Do we have multiple CacheManagers and/or multiple remotable cache endpoints? One advantage is allowing different configs per cache manager. Would having a single SOAP endpoint for all cache messages be a bottleneck? This is addressed below, as there are important considerations.

The Implementation

The above proposal has been put to code while addressing many of the Open Points listed above. Here is the code explained in more detail. To understand the various parts of the Spring cache abstraction and the implementation, it is recommended that you read the Spring cache documentation before going any further.

The Spring Parts

  • CacheManager: An interface that defines a way to retrieve a particular cache. A cache manager has a name and manages one or more Cache objects
  • Cache: An interface that defines a data structure to hold objects to cache. The cache has a name and can be thought of as a Map-like structure. In fact, some Cache implementations are backed by a java.util.Map.
  • Cacheable: An annotation to use on a spring-managed (or non-spring-managed w/ kuali extensions) bean to enable method caching. This annotation has two important parts. One or more cache name(s) to put the cached object in and the key to use for caching. Both should be present. It is recommended that cache keys be simple string (or primitive) values.
  • CacheEvict: An annotation to use on a spring-managed (or non-spring-managed w/ kuali extensions) bean to enable cache eviction. This annotation has several important parts. You must always specify one or more cache name(s). You can optionally specify either a clearAll flag to force the entire cache to be cleared or you can specify a cache key so that only one item is cleared from the Cache.
  • Spring annotation processor: an XML snippet to enable Spring caching on Spring beans. You must specify the CacheManager to use for caching. There are several optional settings that can be used on this declaration which won't be explained here.

Important: Due to the way rice is using the Spring Expression Language with Cacheable & CacheEvict annotations, rice must be compiled with debug symbols.

The Kuali Parts

  • CacheService: An interface that defines operations to invoke on a local cache. This is used in distributed cache operations. Currently only supports flush style operations.
  • CacheServiceImpl: The default implementation of the CacheService. It contains a reference to a CacheManager and invokes caching operations on it. Most standard Kuali apps will have multiple CacheService endpoints remotely available.
  • DistributedCacheManagerDecorator: A CacheManager that decorates an existing CacheManager. It adds distributed caching operations by retrieving a list of CacheServices deployed on the bus and calling each one asynchronously. In the future, this will only call CacheService endpoints that are interested in receiving a certain message. Although some of the diagrams on this page may suggest that the distributed cache messages execute immediately, they are actually queued up and sent in bulk at the end of a transaction. This means that our distributed caching is transaction aware. The queuing nature of this class helps decrease the chattiness of cache flush messages on the KSB. Important! Since all cache keys must generate stable SOAP values, all cache keys are coerced to a String by this decorator. This is why our cache keys should be primitive values; otherwise we might be relying on unstable toString implementations.
  • CacheProxy: A utility class that provides an extension to the Spring cache abstraction. It allows proxying of non-spring-managed beans with Spring caching behavior. This is used for client-side caching behavior on remote proxies. See Spring enhancement JIRA

A Real Example

FooService.java
FooSpringBeans.xml
FooServiceBusSpringBeans.xml
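The example files themselves are not inlined on this page; a hedged sketch of what FooService.java might contain, following the namespace-based cache naming standard described on this page:

```java
// Hypothetical names throughout; the cache name follows the
// namespace-style naming convention to avoid key conflicts.
public interface FooService {

    @Cacheable(value = "http://rice.kuali.org/foo/v2_0/Foo", key = "'id=' + #id")
    Foo getFoo(@WebParam(name = "id") String id);

    @CacheEvict(value = "http://rice.kuali.org/foo/v2_0/Foo", allEntries = true)
    void updateFoo(Foo foo);
}
```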

Standards and Rules

Version Compatibility Rules

  1. Cache Names cannot change (using the object's namespace is a good way to enforce this)
  2. Cache Keys cannot change (may want to create a utility method for this on each object we are caching....)
  3. Always use simple keys (Strings or primitives)
  4. When doing a single evict (allEntries=false), an object can only be present under a single cache key. *more on this below

Suggested Standards

  1. Only effectively immutable/thread-safe objects should be cached!
  2. One cache manager per module (e.g. KimCacheManager, KewCacheManager)
  3. One cache per top-level object (e.g. Permission, Responsibility)
  4. One remotely available CacheService per CacheManager *more on this below
  5. Use jdk-style proxying *more on this below
  6. All Remotable services should cache.
  7. Always annotate service interfaces so remote proxies automatically get client-side caching

Notes on Standards, Rules, etc.

Many CacheService Endpoints: One CacheService endpoint per CacheManager allows client apps to use rice's caching infrastructure without sending distributed cache flush messages to apps that don't care. For example: KC exposes a remote service (AwardService) to KFS. KC hands KFS a fully cache-annotated service interface. The KFS and KC clusters can participate in distributed cache messages without bothering other Kuali apps that never call the AwardService and don't have an AwardCacheService exposed remotely. Another interesting prospect is that a Kuali ecosystem may have rice installs with different "modules" enabled. This design allows the rice installs to only receive messages for the modules they have enabled (XXXCacheService available).

Spanning CacheManagers: This design cannot currently handle flushing across CacheManagers. This is a limitation, although in practice it may not matter. For example: say the GenericType object is used and cached in both KIM and KEW (KimCacheManager, KewCacheManager). If a KIM API updates the GenericType object, the KimCacheManager will handle flushing the KIM module cache, but the KewCacheManager's cache will be stale.

In situations where we definitively need to access another cache manager, we could execute the following code in the service implementation (in normal cases this should be avoided):
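A minimal sketch using the Spring Cache API (the bean, cache, and key names are hypothetical; CacheManager.getCache and Cache.evict are the standard Spring calls):

```java
// After a KIM update, manually evict the stale entry from the KEW module's
// cache manager as well. "kewCacheManager" is an assumed injected bean.
public void updateGenericType(GenericType genericType) {
    updateGenericTypeInDatabase(genericType);
    Cache kewCache = kewCacheManager.getCache("GenericTypeCache");
    if (kewCache != null) {
        kewCache.evict("id=" + genericType.getId());
    }
}
```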

Same object, multiple cache keys: See Spring enhancement JIRA #2 for more info. It seems like we will be doing a lot of @CacheEvict(value="cache_name", allEntries=true) because the same object may be present under multiple cache keys. Not exactly sure what to do about this... We could have a cache per method, but that would be hard to manage. Maybe the underlying caching implementations can handle this for us?

In situations where we definitively want to avoid flushing an entire cache, we could execute the following code in the service implementation (in normal cases this should be avoided):
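A hedged sketch (hypothetical names): evict each known key for the object rather than clearing the whole cache with allEntries=true:

```java
// Evict every cache key the object may be stored under. The keys listed
// here are illustrative; a real service would need to know them all.
public void updatePermission(Permission permission) {
    updatePermissionInDatabase(permission);
    Cache cache = cacheManager.getCache("PermissionCache");
    if (cache != null) {
        cache.evict("id=" + permission.getId());
        cache.evict("name=" + permission.getName());
    }
}
```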

jdk proxying? With the Spring caching abstraction you can either proxy a service to inject the caching logic (like a decorator) or use bytecode weaving with AspectJ. Proxying is a simpler solution, though less performant than AspectJ. Unless jdk proxying becomes a significant bottleneck (which seems doubtful), code weaving should be an option implementers can turn on but not one enabled by default. Tuning the cache settings (like the ehcache settings) is probably more important than proxying versus code weaving.

pushing/priming: Distributed cache updates (pushing updates to clients), cache priming, and cache warming are currently not supported.

where to cache? Although we have primarily targeted our remotable services for caching, there is no reason why caching couldn't be used anywhere in rice or a client application. We just need to be mindful of the version compatibility rules.

caching mutable objects? This depends on the implementation of the caching framework. If using ConcurrentHashMap as a caching implementation, then mutable values should NOT be cached. If using ehcache then mutable values can be cached as long as the cache is configured correctly to do a defensive copy. The safest rule of thumb in rice is to only store immutable values in a cache. This gives implementers the greatest flexibility in regards to what caching implementation to use.

duplicate cache flush messages: This is the biggest drawback to this design. The server has to be the entity that sends out the distributed cache flush messages. Why? Because the server knows whether a destructive call succeeded and therefore caused a stale cache. Since the server does not know which client made the service request, the server will send a cache flush message to the calling client even though that client already cleared its own cache. If there were some way to pass along the instanceId of the calling client, this could be avoided. It appears the RiceCacheAdministrator (RiceDistributedCacheListener) has the same limitation if used for client- and server-side caching. Maybe the KSB could maintain a ThreadLocal variable that contains the calling client's applicationId, instanceId, etc. It could do this through some interceptor-style pattern. The interceptor would need to make sure the variable is cleared even when exceptions happen. The thread-local idea is kind of a code smell but may be just what the doctor ordered in this case.

make sure we support bundled: This should be working now, but we need to confirm that this still works correctly when running in dev.mode in a bundled architecture.

no compile dependency on ehcache: By using Spring's Cache Abstraction there is no need to compile against any ehcache APIs. In fact, the maven dependency for ehcache is runtime only (which could even be switched to optional). It's important that we be mindful of this in the future because this allows implementers to switch ehcache for some other solution (like JBoss' native caching support).

cache keys: Cache keys should be made up of the important arguments to a method and optionally the method name. The key is meant to uniquely identify a method's return value in a cache. A few examples are:
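The original examples are not preserved here; the following hedged illustrations use Spring's SpEL key expressions with hypothetical service methods:

```java
// key built from the single important argument
@Cacheable(value = "PermissionCache", key = "'id=' + #id")
Permission getPermission(String id);

// key built from several arguments plus the method name to avoid collisions
@Cacheable(value = "PermissionCache",
           key = "'hasPermission,principalId=' + #principalId + ',permissionName=' + #permissionName")
boolean hasPermission(String principalId, String permissionName);
```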

Caching Administration UI

Requirements

The caching UI should allow a system administrator to visualize the "local" caches in a running instance of a cache-enabled Kuali application. The administrator should have the ability to trigger a distributed cache flush of cached item(s). To demonstrate the items that must be displayed on this UI, see the following example:

  • KimCacheManager
    • RoleCache
      • CacheEntry (id-1)
      • CacheEntry (id-2)
    • PermissionCache
  • KewCacheManager
    • DocumentTypeCache
      • CacheEntry(ParameterDocumentType)

With the above example, an admin should be able to do the following:

  • Flush All CacheManagers (KimCacheManager, KewCacheManager)
  • Flush KimCacheManager
  • Flush RoleCache in KimCacheManager
  • Flush CacheEntry (id-1) in RoleCache in KimCacheManager

Access to the screen and flush actions must also be locked down through KIM Permissions.

Non-requirements

  • We have not identified the need to do a non-distributed flush through the UI (local flush).
  • We have not identified the need to do a complete flush of all caches across the Kuali ecosystem from a single point. For example: if you wanted to flush a KFS-specific cache, you would have to log in to the KFS admin screen to perform that action rather than pushing an uber-flush button from rice.
  • We have not identified the need to dynamically disable caching from a UI on a running application

Notes

We should probably use KRAD to produce our UI. This will be a good "dogfood" test for KRAD. It is non-traditional in that cache information is not backed by a database or DataObjects like most KRAD screens are. We could certainly make DataObject representations of all the cache information. If we do that, we may want to be careful to load the information lazily (possibly via ajax or something) because we may not want to iterate over all cache information just to render a screen. There could be considerable overhead in doing that.

Putting it all together

Below are a couple pseudo examples of UML sequence diagrams to help illustrate a couple standard call flows.

(Diagrams: "Rice Client Caching - Get From Cache" and "Rice Client Caching - Flush Cache")

Implementation Plug Points

One critical piece of this design is the ability to plug in different cache implementations with very little impact on the rice codebase. Why would you want to do this? Simply put, some application servers or infrastructures have alternative caching frameworks that have advantages over what we provide with rice. In order to achieve this, the rice team (and other Kuali apps) must make an effort NOT to use a caching framework directly in code but to always go through Spring's caching abstraction. In rice we will achieve this by making our default caching implementation (ehcache) a runtime or optional dependency. Remember: the following customization hints will have to be applied for every module of rice and every cache-enabled Kuali app.

Option 1: replacing the default caching implementation

To do this you must replace (or override) the following Spring entries for the local CacheManagers. For example:
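The original entries are not preserved on this page; a hedged sketch of what the defaults might look like (bean ids and the config location are assumptions):

```xml
<!-- the module's local ehcache-backed CacheManager -->
<bean id="kimLocalCacheManager" class="org.springframework.cache.ehcache.EhCacheCacheManager">
    <property name="cacheManager" ref="kimEhcache"/>
</bean>

<bean id="kimEhcache" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
    <property name="configLocation" value="classpath:kim-ehcache.xml"/>
</bean>
```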

Could be replaced with:
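A hedged sketch of a replacement, here using Spring's ConcurrentMap-backed manager purely as an illustration (the bean id is an assumption):

```xml
<!-- any Spring CacheManager implementation can be substituted here -->
<bean id="kimLocalCacheManager"
      class="org.springframework.cache.concurrent.ConcurrentMapCacheManager"/>
```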

Option 2: replacing the Distributed CacheManager

Doing Option 1 changes the caching implementation but still uses the Kuali Service Bus for transaction-aware flush messages. Many caching implementations already provide these facilities. You could remove or replace the following:
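The original snippet is missing; a hedged sketch (the class package and property names are assumptions, not the verified rice code):

```xml
<!-- the decorator that adds KSB-distributed flush messaging; assumed names -->
<bean id="kimCacheManager"
      class="org.kuali.rice.core.impl.cache.DistributedCacheManagerDecorator">
    <property name="cacheManager" ref="kimLocalCacheManager"/>
</bean>
```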

Doing this may mean that the CacheService endpoints are no longer used, so the following entries could be removed as well:
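Again a hedged sketch with assumed bean and package names:

```xml
<!-- the module's remotely exposed CacheService endpoint; assumed names -->
<bean id="kimCacheService" class="org.kuali.rice.core.impl.cache.CacheServiceImpl">
    <property name="cacheManager" ref="kimLocalCacheManager"/>
</bean>
```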

And finally remember to update the cache section of the Spring files like the following:
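A hedged sketch (bean names are assumptions): point annotation processing at the replacement CacheManager:

```xml
<cache:annotation-driven cache-manager="kimLocalCacheManager"/>
```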

Option 3: Aspect Weaving

There has been some concern that rice's choice to use jdk proxying may cause some overhead. To switch to aspect weaving, which is more performant, change the following:
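The snippet is not preserved here; per the Spring documentation the proxy-mode declaration looks like this (the cache-manager name is an assumption):

```xml
<cache:annotation-driven cache-manager="cacheManager" mode="proxy"/>
```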

to
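Per the Spring documentation, the AspectJ-mode declaration (cache-manager name assumed):

```xml
<cache:annotation-driven cache-manager="cacheManager" mode="aspectj"/>
```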

You must also include the spring-aspects.jar on the classpath.

References

Kuali Rice JIRA
Design/Code Review

Spring Cache Abstraction
EhCache
Spring enhancement JIRA #1
Spring enhancement JIRA #2
Spring bug JIRA #1


10 Comments

  1. Regarding naming... for remotable objects, what if we used namespace/name? Example for Parameter: http://rice.kuali.org/core/2_0/parameter. This will always be compatible and would never cause conflicts. Kind of a long name for a cache, though... Also, not sure what we do in places where we want to use caching but not for remotable objects.

  2. Another couple of things rice has never really supported: distributed cache updates (pushing updates to clients), and cache priming or warming. Something to think about.

  3. Important from the Spring Docs:

    Note
    <cache:annotation-driven/> only looks for @Cacheable/@CacheEvict on beans in the same application context it is defined in. This means that, if you put <cache:annotation-driven/> in a WebApplicationContext for a DispatcherServlet, it only checks for @Cacheable/@CacheEvict beans in your controllers, and not your services. See Section 16.2, “The DispatcherServlet” for more information.

  4. Table 28.1. Cache SpEL available metadata

    Looks like you can get the name/class of the target being invoked. I believe this would allow us to figure out if it is a remote proxy and thus not send distributed cache flush messages. This might make for a clean solution actually.

    1. handling this in another way. The remote proxies only deal with the local cache while everything else uses the DistributedCacheManagerDecorator. Much cleaner than the idea above.

  5. Since we are creating a distributed cache we cannot use the "Default Key Generation". We must always specify a cacheKey. The cacheKey must be made up of all the important parameters of the method. We may want to settle on the key using the following format: arg1Name=arg1Value, arg2Name=arg2Value.... Note that the cacheKeys should NOT change across rice versions so I suggest we use the names of the args in the @WebParam annotation.

  6. After we fully add caching in a consistent manner we need to go through rice and make sure we are calling the methods that are doing caching and not dropping down to the Dao layer or using BOService....

  7. We are going to want to make sure we have a decent number of methods on our CacheEndpoint so that a client can clear a key from multiple caches, or clear multiple keys, in one request. We don't want clients calling distributed message sends in a loop. Bad client!

    1. Actually, I'm not sure we need this with our current design. Traditional clients will probably not call the CacheService directly except through the DistributedCacheManagerDecorator.

  8. Interesting problem.

    With the cache endpoint, this will be a service exposed on the bus. Normally our rice services must be typed to a specific type JAXB understands. In this case, though, keys and values can actually be any type of Object. Realistically we want keys to be some safe type like a String, so we could make our cache implementation coerce keys to a String, with the caveat that the key you are using had better have an overridden, well-behaved toString method. OK, the key problem is solved, but is this what we want? What about values? This isn't a problem right now because our cache endpoint only deals with eviction, so there aren't any methods on it that take the "value". Of course, supporting put-type operations (for cache priming, etc.) would cause other challenges, like: does the value I'm broadcasting exist on application X? For example: KC broadcasts "I just updated Foo; please proactively stuff the new value in your cache because you are going to want it for sure." KFS says: what the hell is Foo? I don't understand... Maybe these types of operations are something the cache infrastructure will never support.