Hal was lucky enough to be able to do research work at Oracle, even before the Sun acquisition.
His talk covered how he approached architecting and implementing a Jess-based distributed rules engine that supports a large-scale system (called C0), which also relies on Coherence. Hal’s long experience with OODBs and app servers certainly came through.
Hal made a number of interesting points.
Hal needed to make sure that management does not consume too many resources (in particular, no more than the resources being managed…), and that decision-making is located at its natural place, to avoid introducing more cost and coupling into the system.
One of the key decisions was to separate local decision-making from global decision-making, while ensuring proper coordination between the two.
Given the complexity of the distributed system and the event nature of its dynamics, the system had to remain asynchronous and event-driven.
The rules engine was set up as the driver of the application, as opposed to a “consultant”. With that architecture choice, the rules engine ended up in a mode where it continuously evaluated the observed state, applying the declarative rules to the state transitions.
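To make the "driver" idea concrete, here is a minimal sketch of an engine loop that keeps evaluating the observed state and fires declarative rules until nothing matches. It is plain Python, not Jess; the `Rule` class, the `run_engine` function, and the "keep the desired number of workers running" goal are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class Rule:
    name: str
    condition: Callable[[State], bool]  # when does the rule apply?
    action: Callable[[State], None]     # what does it change in the state?


def run_engine(state: State, rules: List[Rule], max_cycles: int = 100) -> List[str]:
    """Fire the first matching rule repeatedly until no rule matches.

    max_cycles is a safety bound so a badly written rule cannot loop forever.
    """
    fired: List[str] = []
    for _ in range(max_cycles):
        matched = next((r for r in rules if r.condition(state)), None)
        if matched is None:
            break  # the observed state satisfies every goal
        matched.action(state)
        fired.append(matched.name)
    return fired


# Hypothetical goal: the number of running workers should reach the desired count.
rules = [
    Rule(
        name="start-worker",
        condition=lambda s: s["running"] < s["desired"],
        action=lambda s: s.update(running=s["running"] + 1),
    ),
]

state = {"running": 1, "desired": 3}
fired = run_engine(state, rules)
```

In this sketch the application code never asks the engine for a verdict; the engine itself reacts to the state and drives it toward the goal, which is the opposite of the "consultant" style.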
The rules architecture was set up in a hierarchy, with goals becoming more abstract and declarative as you climb the hierarchy. The organization allows for breaking down the processing into local problems – where you have the most control and information to do so – enabling robust parallelization. Global problems, on the other hand, were handled within a global context. This approach not only allows for efficient, robust parallel execution; it also minimizes communication and friction.
Given that everything is communicated through facts, execution ends up being in-database fact manipulation with some Java execution.
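A rough way to picture the local/global split, under assumptions of my own: each node resolves what it can with local information and passes only a small summary upward, and the global level reasons over those summaries. The function names, the restart policy, and the `restart_limit` threshold below are hypothetical and are not taken from the talk.

```python
from typing import Dict, List


def local_decide(node: str, processes: Dict[str, str]) -> dict:
    """Local rule: restart failed processes on this node, report only a summary."""
    restarted = [name for name, status in processes.items() if status == "failed"]
    for name in restarted:
        processes[name] = "running"
    return {"node": node, "restarts": len(restarted)}


def global_decide(summaries: List[dict], restart_limit: int = 2) -> List[str]:
    """Global rule: flag nodes whose restart count exceeds the limit."""
    return [s["node"] for s in summaries if s["restarts"] > restart_limit]


cluster = {
    "vm-1": {"a": "running", "b": "failed"},
    "vm-2": {"c": "failed", "d": "failed", "e": "failed"},
}

# Each local decision touches only its own node, so these calls could run in parallel.
summaries = [local_decide(node, procs) for node, procs in cluster.items()]
flagged = global_decide(summaries)
```

Only the summaries cross the local/global boundary, which is one way to keep communication small.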
Hal makes the point that while declarative systems (enabled by rules-based approaches) do not perform any worse than procedural systems, they exhibit much better representation and management characteristics. Amen!
In terms of the implementation, Hal described the work that needed to go into integrating Jess and Coherence. In essence, the work consisted of having a Coherence continuous query on the cache trigger the engine. All the state remained stored in the cache.
He explored the use of backward chaining to select which facts to consider – i.e. modifying the continuous query on the cache – this is pretty cool.
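The continuous-query pattern can be sketched with a toy in-memory cache: a standing filter is registered once, and every matching entry (existing or new) is pushed to a listener that feeds the engine. This is not the Coherence API; `ObservableCache`, `add_continuous_query`, and the fact layout are hypothetical stand-ins written for this illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[Any], bool]
Listener = Callable[[str, Any], None]


class ObservableCache:
    """Toy cache that notifies listeners whose standing filter matches an entry."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._queries: List[Tuple[Predicate, Listener]] = []

    def add_continuous_query(self, predicate: Predicate, listener: Listener) -> None:
        self._queries.append((predicate, listener))
        # Deliver entries that already match, then keep watching future puts.
        for key, value in list(self._data.items()):
            if predicate(value):
                listener(key, value)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value  # the cache remains the single store of state
        for predicate, listener in self._queries:
            if predicate(value):
                listener(key, value)


cache = ObservableCache()
working_memory: Dict[str, Any] = {}  # stand-in for the engine's fact base


def on_fact(key: str, fact: Any) -> None:
    """Hand a matching cache entry to the engine (here: just record it)."""
    working_memory[key] = fact


cache.put("proc-1", {"type": "process", "status": "failed"})
cache.add_continuous_query(lambda f: f["status"] == "failed", on_fact)
cache.put("proc-2", {"type": "process", "status": "failed"})
cache.put("proc-3", {"type": "process", "status": "running"})
```

Changing the predicate changes which facts the engine ever sees, so narrowing the filter is a cheap way to limit the engine's workload.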
The approach actually went further, leveraging rules for the configuration, the workflow, etc… Declarative rules-driven everything!
Hal did need to make one key modification to Jess – essentially adding a GUID to uniquely identify facts in the cache.
Some metrics from the tests:
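The talk did not go into how the GUID was added, so the following is only an illustration of why a stable identifier helps: two facts with identical content stay distinct as cache entries, and an updated fact overwrites its earlier version rather than creating a duplicate. The `Fact` class and its fields are assumptions of this sketch, not Jess internals.

```python
import uuid
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Fact:
    name: str
    slots: Tuple[Tuple[str, str], ...]
    # Identifier assigned once at creation and used as the cache key.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


content = (("host", "vm-1"), ("status", "running"))
a = Fact("process", content)
b = Fact("process", content)  # same content, but a different fact

store: Dict[str, Fact] = {f.guid: f for f in (a, b)}

# Updating a fact keeps its GUID, so the cache entry is replaced in place.
updated = replace(a, slots=(("host", "vm-1"), ("status", "failed")))
store[updated.guid] = updated
```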
- EC based cloud deployment
- 500 VMs with 10 system VMs
- 5000 managed processes
- 15% CPU load on active system processes, 2% network bandwidth (very good), and 1% of host CPU for managed nodes.
The system has not yet gone into production.
Hal walked us through a few future considerations for this system. These include (but are not restricted to):
- Implementation of ECA state diagrams into rules
- And… dynamic provisioning of rules set (debugging, analysis, monitoring) – which would be a very interesting development.
More futuristic, although interesting: taking this same model and applying it to BPEL/SOA.