Do you remember how Artificial Intelligence (AI) became taboo?
Many of us were fascinated by Artificial Intelligence a few decades ago. This discipline carried extraordinary potential. We started dreaming about expert systems that would outperform humans, systems that would write themselves, and other fantastic progress outpacing our own capabilities. We certainly did a great job exciting our imagination, leading to a long list of sci-fi movies and other fantasies.
Then winter came
The technology ended up disappointing the masses. We wanted to believe in the incredible potential but realized that the current approach was much more limited than we had hoped. I personally believe that the industry mostly failed in managing expectations. The technological progress was nothing to be ashamed of. Lots of techniques, algorithms and new approaches came out of this period but living up to those over-inflated expectations was unrealistic.
I was personally involved in Expert Systems back then. Neuron Data had a great product: Nexpert Object. In other posts I might describe some of the projects I did back in the 90’s. It was very exciting… although, admittedly, the systems did not write themselves. The skills required to transcode the business expertise were not common. The learning curve was steep.
It was harsh, certainly, to bury all those efforts and banish the terminology altogether. The dream did not vanish completely though. AI remained alive in the collective imagination. More movies and books kept satisfying our hunger for that dream of fabricated intelligence that could help humanity (or destroy it, as we enjoy fearing in those movies), that could approach humanity so closely that we could confuse an android with a real person, or that could perform amazing tasks.
AI technology also survived the winter in hibernation. Those passionate AI researchers looked for more realistic objectives for the technology. I read once in a book — back in 1997 — a great expression that I have shared more than once with some of you in the past: “From Artificial Intelligence to Intelligent Agents”. If nothing else, this book helped me realize back then that we needed to change our approach to AI. Instead of the monolithic expert system, there was an opportunity to add intelligence in small specific tasks distributed over the network. The budding concept of Decision Services was born. It may or may not be a coincidence that Blaze Advisor and ILOG JRules were conceived around that time. The BRMS movement was another perspective on AI, still focused on adding intelligence to our systems but intelligence that came straight from the Business Experts, intelligence that was under their control the whole time.
AI spanned much more than one technology in reality and many other fields of research kept investing and refining the technology for the same purpose of making systems smarter. It is no wonder that, after a long and rigorous winter, AI is finally able to bubble up to the public once again, this time as a more mature discipline, less ambitious in many ways and more accomplished too.
And now we are at the dawn of the Summer of AI
BRMS and Decision Management are certainly topics I am passionate about but, looking around me, I realize the phenomenal progress and applicability of other techniques (that I am also very interested in, although not as dedicated to).
As proof that AI may have finally become an accepted term for the public (again), I have collected a few pieces of evidence. The list is long. I decided to focus on a couple of recent articles.
How could I ignore the Watson phenomenon? This is most definitely the triggering event for this flood of AI publicity. Granted, IBM had Deep Blue playing chess in the 90’s, but we may have considered the game too structured to recognize the talent. Now, beating the Jeopardy champions is a greater challenge since it requires more than brute force. Winning the game requires an amazing ability to deal with a general lack of precision that does require “intelligence”. For the first time in a very long time, the Press was impressed by the performance of the machine, by its intelligence. Not that the world is desperately looking for greater Jeopardy champions, but the idea that such technology could be used in other contexts where precision is approximate at best, where data is partially known, is extremely appealing. Think about call centers or emergency situations where humans are pressed to make decisions when data-points are lacking.
Factoid: Joseph Bigus, who wrote the book I quoted earlier, is a Senior Technical Staff Member at the IBM T.J. Watson Research Center. AI is a small world I keep realizing.
After the wave of press coverage — I was going to say tsunami but decided not to out of respect for our Japanese friends — for the Watson project at IBM, more thoughts converged on the usability and potential usefulness of AI in other areas. Peter Norvig elaborated on the progress made by AI in this great article. I like in particular his analysis on the limitation of Expert Systems: reliance on Experts interviews.
Learning turned out to be more important than knowing. In the 1960s and 1970s, many A.I. programs were known as “Expert Systems,” meaning that they were built by interviewing experts in the field (for example, expert physicians for a medical A.I. system) and encoding their knowledge into logical rules that the computer could follow. This approach turned out to be fragile, for several reasons. First, the supply of experts is sparse, and interviewing them is time-consuming. Second, sometimes they are expert at their craft but not expert at explaining how they do it. Third, the resulting systems were often unable to handle situations that went beyond what was anticipated at the time of the interviews.
From a Decision Management perspective, we do face a similar challenge but that would be the topic for another post.
The third proof of the Summer of AI is the recent Turing Award going to Leslie Valiant. AI is popular again. Although Leslie’s work is not very recent, he is being recognized now for his contribution to machine learning.
I could go on and on. You have probably seen articles on AI in general newspapers. Summer is here.
Man Versus Machine
The main difference between the old days of AI and the new Summer that may be starting now is the role of the Machine. We dreamt of Machines that would be able to replace Humans.
The possibility that one day Machines will replace humans has of course been at the center of long debates, raising deep issues, going as far as making us question what being human really means. Kurzweil has famously argued that the Singularity is near and will have profound implications for human evolution (we will transcend biology, he claims). On a more negative note, Bill Joy wrote a famous article in Wired in 2000, in which he worried that we will in effect lose control of our technology and run the risk of becoming an endangered species. Recently, the Atlantic published a long article on Mind vs Machine, in which a more nuanced approach is taken: yes, Machines may well pass the Turing test, but that does not signify a path towards irrelevance for humans.
The reality in my mind is that we need Machines that can augment Humans. We need better processing power to supplement our human limitations, but we are not ready yet to let a Machine make the final decision. Think about Healthcare, for example: it is appealing to think that a virtual doctor could have access to the latest and greatest research on every possible topic and would be able to compare and analyze all possible treatments, including the side effects and possible risks. Hasn’t Star Trek (Voyager) already painted that vision with its holographic doctor? But in the end, we like that a real person is making those life-and-death decisions, with ethical safeguards we would be hard-pressed to implement completely and accurately for a machine.
John Seely Brown, from the Deloitte Center for the Edge and author of “The Power of Pull”, commented in a recent article in the NY Times that machines that are facile at answering questions only serve to obscure what remains fundamentally human. My take is that the success of AI resides in the ability to combine both. If we could combine the Machine’s incredible power with the unique intuition of Humans, we could get the best of both worlds.
In part 1 of this blog series, I covered the Rete Algorithm, its origin, and even the origin of its name. Now, as promised, I am going to explain how it works. The Rete Algorithm is one of several decision engine algorithms used in modern rule engines such as Sparkling Logic SMARTS. To learn more about decision performance and the decision engines that drive today’s automated decisions, see our recent blog post, Decision Engine Performance.
It is a challenge to explain how the Rete algorithm works simply, so as not to lose our less technical audience and yet keep things interesting enough for the techies. I hope I will reach that goal. And without further ado, let me introduce the Rete algorithm!
How it works
The brilliant idea that Dr. Charles Forgy developed was to decouple the evaluation of hypotheses from execution sequencing.
Why is that brilliant you might ask? When large volumes of business rules need to be assessed against facts or data, the constant screening for applicability can be costly — both in terms of evaluation (and re-evaluation) and in terms of ordering the statements properly. This innovative approach for inference allows savings in both areas. Let’s explore how.
In the absence of this algorithm, or its derivatives, the traditional approach is to sequence and nest the IF-THEN-ELSE statements of the rules in such a way that you capitalize on the test evaluations, reusing them as much as possible. When you think about it, this is akin to creating the leanest decision tree possible, in graphical form or, more likely, in straight code. This “sequential” approach breaks down (mostly in terms of performance) when rules need to execute as a result of the execution of other rules. This is what we call inference. In a sequential approach, you would have to restart the complete ruleset evaluation after each rule execution (if rules are prioritized) or after a full pass through the ruleset (in the absence of prioritization). Keep in mind too that prioritization might prevent you from reusing test evaluations altogether in a sequential approach: if the conditions are not co-located in the logic, you will likely have to perform those tests multiple times. Systems have become much more sophisticated than they used to be back then, and most BRMS products provide a sequential deployment option, which is still pretty good when inference is not needed. The beauty of those products is that you can still manage the rules independently of each other while the sequential engine takes care of optimizing the generated spaghetti code.
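The restart-from-the-top behavior described here can be sketched in a few lines. This is a toy illustration, not any product's engine; the rule format (condition and action functions over a fact dictionary) is invented for the example:

```python
# A toy illustration (not any product's engine) of the "sequential"
# approach with prioritized rules: fire the first applicable rule,
# then restart the full scan from the top, until nothing fires.

def run_sequential(facts, rules):
    """rules: list of (condition, action) pairs, scanned in order."""
    fired = True
    while fired:
        fired = False
        for condition, action in rules:
            if condition(facts):
                action(facts)   # the action may change the facts...
                fired = True
                break           # ...so restart the complete evaluation
    return facts

# Made-up two-rule example where one rule's action enables another:
rules = [
    (lambda f: f["status"] == "Gold" and not f.get("certificates"),
     lambda f: f.update(certificates=8)),
    (lambda f: f["miles"] > 100_000 and f["status"] != "Gold",
     lambda f: f.update(status="Gold")),
]
result = run_sequential({"miles": 150_000, "status": "Silver"}, rules)
# result: status upgraded to Gold, then 8 upgrade certificates granted
```

Note how the second rule's execution forces a complete rescan before the first rule can fire: that rescan is exactly the cost Rete avoids.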
For illustration, I’ll use a simple set of rules that we are mostly familiar with: airline miles calculations. The decision service is responsible for processing the frequent flyer’s activity.
if award miles for last year or current year > 25,000 then status = Silver
if award miles for last year or current year > 100,000 then status = Gold
If flight is less than 500 miles then award 500 miles
If flight is 500 miles or more then award flight miles
if category is business or first then award 50% bonus miles
if status is Gold and airline is not partner then award 100% bonus miles
if status is Silver and airline is not partner then award 20% bonus miles
if member signed up for 3-flights-for-5k-in-March and number of return flights in March 2011 = 3 then award 5,000 additional miles
if status becomes Gold then award 8 upgrade certificates
I will leave it up to the creative minds of the Airline marketers to add more promotions to our short list here: hotel or rental car bonuses, incentive programs, lifetime members, accelerator programs, etc.
This example does not use a lot of inference, but it is enough to illustrate our point. If you run through the rules sequentially (let’s keep the current order) and this Frequent Flyer member reaches Gold status in that very transaction, we need to process the rules once more to update the status and award the proper bonus miles and other incentives.
Inference engines assess all the applicable rules and then fire the first applicable rule, propagate the changes and fire again. How does that work?
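Before diving into the network itself, here is a toy sketch of that cycle: match all rules whose conditions hold (the conflict set), fire the highest-priority one, then re-match because the fired action may have changed the facts. The rules below are simplified stand-ins for the airline rules above, with made-up priorities; this is an illustration of the idea, not any vendor's implementation:

```python
# Toy "recognize-act" cycle of an inference engine. "Refraction"
# (the fired list) keeps a rule from firing twice on the same data.

def run_inference(facts, rules):
    """rules: list of (priority, name, condition, action) tuples."""
    fired = []
    while True:
        conflict_set = [r for r in rules
                        if r[1] not in fired and r[2](facts)]
        if not conflict_set:
            return facts, fired
        _prio, name, _cond, action = max(conflict_set, key=lambda r: r[0])
        action(facts)
        fired.append(name)

rules = [
    (10, "bank flight miles",
     lambda f: f["flight"] >= 500,
     lambda f: f.update(miles=f["miles"] + f["flight"])),
    (5, "gold status",
     lambda f: f["miles"] > 100_000,
     lambda f: f.update(status="Gold")),
    (1, "gold bonus",
     lambda f: f["status"] == "Gold",
     lambda f: f.update(miles=f["miles"] + f["flight"])),  # 100% bonus
]
facts, order = run_inference(
    {"miles": 150_000, "status": None, "flight": 2_419}, rules)
# order: bank flight miles -> gold status -> gold bonus; miles: 154,838
```

The point is that each firing can enable further rules without restarting the whole evaluation; the Rete network is what makes the re-match step cheap.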
Constructing the Rete Network
The Rete network is the heart of the Rete algorithm. It is made of nodes that each hold a list of objects satisfying the associated condition. The original Rete algorithm worked on facts, but I will simplify the description and refer to objects, since all commercial engines have evolved to be object-oriented nowadays.
The Rete network is constructed when you compile a rules project — only once when you start that service (or class for simpler designs) — and then shared across all invocations.
The discrimination tree is the first part of the Rete network. It starts with Alpha nodes that are associated with Classes (in the object-oriented sense). All instances of a given class will be listed in any given Alpha node. The discrimination happens by adding conditions as single nodes attached to the Alpha node or to another parent node.
Let’s review what that means for our Airline example:
Alpha nodes are created for each class: Frequent Flyer Account and Flight.
Conditions are then appended:
- Frequent Flyer
- status is Gold
- status is Silver
- award miles for last year or current year > 25k
- award miles for last year or current year > 100k
- signed up for “3-flights-for-5k-in-March” promotion
- number of flights in March = 3
- miles >= 500
- miles < 500
- airline is not a partner
- airline is partner
- category is business or first
Each node represents an additional test to the series of conditions applied upstream. If you follow a path from top to bottom, you should be able to read the complete set of conditions THAT APPLY TO ONE GIVEN CLASS OF OBJECTS.
Finally, the nodes are connected across classes. This is where we can combine the conditions for a non-partner flight taken by a Gold member. We call those nodes “joins”, as we combine the list of objects that satisfy the conditions on one branch with the list of objects that satisfy the conditions on another branch.
On the performance side, you may have been warned in the past not to combine too many patterns in a single rule. It is clear, looking under the hood, that the product of all possible accounts by all possible flights could result in a combinatorial explosion. The more discrimination upfront, the better, obviously.
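A quick back-of-the-envelope calculation shows why. With hypothetical numbers (1,000 accounts, 1,000 flights, invented proportions of GOLD accounts and partner flights), a join with no upfront discrimination must consider a million pairs, while filtering each branch first shrinks the work dramatically:

```python
# Hypothetical sizes: 1,000 accounts, 1,000 flights.
accounts = [{"id": i, "status": "Gold" if i % 10 == 0 else "Silver"}
            for i in range(1000)]
flights = [{"id": j, "partner": j % 2 == 0} for j in range(1000)]

# A join with no discrimination considers every (account, flight) pair.
naive_pairs = len(accounts) * len(flights)               # 1,000,000

# Discriminating upfront: only GOLD accounts and non-partner flights
# ever reach the join, so far fewer pairs are materialized.
gold = [a for a in accounts if a["status"] == "Gold"]        # 100
non_partner = [f for f in flights if not f["partner"]]       # 500
discriminated_pairs = len(gold) * len(non_partner)           # 50,000
```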
Each path eventually ends with the action part of a rule. The content of the actions is irrelevant for the Rete network; you could replace the labels here with the names of the rules (I did not name the rules in our example, so I displayed the actual actions). The rules can be reconstituted by following the incoming paths. All nodes are AND-ed. This is an interesting point: ORs are typically not natively supported, so rules are duplicated for each OR-ed flavor. If you were to add a rule to grant the same bonus for SILVER or GOLD customers traveling to Hawaii, you would just have nodes connected to the existing GOLD and SILVER nodes, as if the rules were written separately. In terms of performance or maintenance, it does not matter though, since the Rete algorithm handles the supposed duplication as efficiently as any manual code would.
This is a good time to emphasize the savings you get as the number of rules increases. Let’s say that you want to introduce a new promotion for GOLD customers to/from specific destinations, or for reached milestones (50k and 75k award miles). The nodes that test for GOLD status do not need to be created or duplicated. The test, as well as its execution, is leveraged and reused transparently, regardless of the order in which the rules need to be executed.
Rete Cycle: Evaluate
Let’s look now at what happens at runtime when you invoke that service with customer data. Your design will dictate whether those transactions are applied “real-time” (or on-demand) or whether you want to execute a batch of transactions at once, at night when you have access to your Mainframe window for example. Rules can actually be used either way and in many projects, they are actually used both ways by two different systems. For an Insurance underwriting project, you might deploy the same rules online in your self-service website as well as your older batch application that processes the applications coming from your brokers.
Let’s assume that we are processing transactions one by one for simplicity.
The evaluation phase consists of running the data through the Rete network to identify the applicable rules, those for which all conditions are satisfied. The nodes in the network will hold lists of objects that satisfy the conditions in the incoming path.
In our airline example, Joe flies from Washington DC (IAD) to San Francisco (SFO). The flight accounts for 2,419 miles, as I recall. Starting with 150k award miles already banked, Joe should earn double miles (4,838) thanks to his GOLD status. How does Rete propagate those facts to make the same decision?
The Account Alpha node references Joe as a Frequent Flyer. Because of his 150k award miles, he also qualifies for the “> 100k” condition. He is referenced in this node too. My handwriting did not allow me to write down Joe in the small real estate, so I just highlighted the node with an X. In reality, Joe would be referenced specifically, as we need to keep track of who qualifies in case we process multiple objects at once. For this exercise, I also assume that the service does not store the GOLD status and computes it every time, so the associated node still has an empty list.
Similarly, the IAD-SFO flight is referenced in the list of all flights in the transaction. The flight is more than 500 miles and on the airline. The associated lists reference the flight accordingly.
At that point in time, we do not have any join to worry about, since the account is not yet known to qualify for GOLD status.
Two rules are satisfied. All facts have been propagated. We are ready for execution.
Rete Cycle: Execute
The rules for which all conditions are satisfied are said to be active in the agenda. The agenda contains the list of all rules that should be executed, along with the list of objects responsible for making the conditions true. I insist on the word “SHOULD”. The agenda will sort them according to their priorities (and other conflict resolution methods). If the execution of a rule invalidates a rule with a lower priority, then this second rule will logically never execute. This would have been the case if Joe had started with 99k award miles: he would have qualified for the SILVER bonus until we banked the IAD-SFO miles and granted him GOLD status. Joe can qualify for SILVER or GOLD bonus miles, but certainly not both.
Priorities are a touchy subject in the Rules community, as some purists are set against their use.
The agenda lists the two rules (GOLD status and 500+ Flight Miles). The first one is executed. In our airline example, order does not matter for those two rules. Let’s assume that flight miles are awarded first. Joe has 152,419 miles in his account so far.
Do it again and again…
After the execution of the first rule, facts are propagated again. Flights have not changed so none of that subtree is re-evaluated. Joe’s account has changed so the conditions related to award miles are re-assessed, with no change in our example. The GOLD status rule remains active in the agenda.
We are ready to execute the following rule: GOLD Status which is the only rule left in the agenda.
As a result, the GOLD Status node is updated for Joe, which leads the “GOLD status and not partner” join node to update as well, which in turn leads the 100% GOLD bonus rule to become activated in the agenda.
In the end, Joe gets the proper credit of 154,838 miles in his award account.
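Spelling out the walk-through's arithmetic (IAD-SFO is 2,419 miles and the 100% GOLD bonus doubles the flight miles):

```python
start = 150_000            # Joe's award miles before the transaction
flight_miles = 2_419       # rule: flight of 500 miles or more
gold_bonus = flight_miles  # rule: 100% bonus for GOLD on the airline

total = start + flight_miles + gold_bonus
print(total)  # 154838
```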
As you add more rules that apply to GOLD or SILVER customers, the Rete network will grow, always reusing the same nodes as much as possible, which is the part that takes time to do properly by hand if you are still coding it manually. The spaghetti in the “join” part of the algorithm is a nightmare to maintain properly and can cost you significant debug time. This trivial example can quickly become messy when you consider the many different ways you can qualify for status: with eligible segments rather than eligible miles, with accelerators, with lifetime miles, as well as with the incentive promotions that you might get by default and those that you need to sign up for. Reusing those conditions, and their execution, regardless of when and where they show up is a performance boost that largely compensates for the initial overhead when inference is needed.
When inference is not needed and you only go through one cycle of propagation, it is debatable of course.
In my opinion, Charles opened the door to an algorithmic solution for properly enforcing many rules in an uncertain order. In other words, he created an approach in which very large volumes of rules can be assessed in a short time, much faster than with any other algorithm when rules need to be executed repeatedly on a given data set. The further performance improvements he introduced in Rete II, Rete III and lately in Rete-NT have also changed the initial “ideal” characteristics of a Rete project to better support much larger data sets, countering what was marketed as the “Rete wall” by the non-Rete vendors.
It is somewhat surprising that no new algorithmic breakthroughs (other than Charles’s) have been made since then. Business Rules execution is still largely based on Rete, or sequential evaluation, but hardly anything else. It reminds me of sharks and their evolution. I remember, in my life science days, learning that sharks have not evolved much in a very long time… likely because they are perfectly adapted to their environment.
Editor’s Note: This post was originally published in March 2011 and has been updated for accuracy and comprehensiveness.
In Part 1 of this blog series, we introduced the Rete Algorithm and its origin. I ran a poll on the origin of the Rete name. Thank you all for your participation!
Let me now share the results of our little poll on the origin of the name.
Let me remind you of the options:
- RETE – per the Latin word for net
- RETE – per the Latin word for comb, which refers more precisely to the first part of the network called the discrimination tree
- RETE – per the Latin word for comb, in that case referring to the complex structure of a honeycomb
- RETE – per the Italian word for network or fishing net
- RETE – per the English word for an anatomical network of blood vessels and nerve fibers
- RETE – per the English word for the pierced plate on an astrolabe, having projections whose points correspond to the fixed stars
- RETE – as Rete California, Sonoma County’s ultimate shopping destination for guys and gals with discriminating taste in designer brands, referring to the complex network of shops
And the answer is in this short video! Do check it out, as the answer is going to surprise more than 9 out of 10 of the respondents!
I want to thank Michael for his Xtranormal skills.
To compensate for the “fun break” in my Valentine posting, I decided to tackle a much more technical and deep subject: The Rete Algorithm, and in this post more specifically, the origin of the Rete Algorithm.
As you probably know if you have read any Rules material, Rete is the dominant algorithm out there. There are actually two schools: those who believe in Inference and those who don’t. Rete is the foundational algorithm for executing rules in Inference mode. I discussed in an earlier post the difference and importance of the various algorithms used in commercial BRMS. You certainly do not need to know how it works to use those kinds of technologies, just as you do not need to understand mechanics to drive a car. That being said, I know that some curious minds out there do enjoy a little sneak peek under the hood now and then. In today’s post, I will focus on the origin of the Rete Algorithm. In part 2, I will explain how it really works.
My First Encounter
Dr. Charles L. Forgy developed this famous algorithm in the late 70’s, but it took a little while for it to become widely popular, when Business Rules technology finally emerged. I do not recall Rete being taught at school back when I got my Masters in Computer Science, although we did a fair amount of Artificial Intelligence. When I was a young consultant, I got my hands dirty with Expert Systems before I was introduced to ILOG Rules. As I joined the company, I became a big fan of the technology, and eventually of JRules when it was released. I remember when Changhai Ke in person came to the US to teach us the Rete algorithm and I finally learned the internals of this great technology. Of course, ILOG, Blaze and most inference engine providers have made slight modifications to the algorithm to improve its speed or, sometimes, add capabilities. Unfortunately, you will have to wait until my next post to dive deep into the algorithm.
What surprised me the most though through the years is that there is actually little literature on the algorithm itself. I was shocked when I joined Blaze Software and only a handful (if that many) knew how it actually worked inside. As I finally accepted, you do not need to know how it works to use it but I thought that, as vendors, we had a duty of knowing “mechanics”. It was over 10 years ago and I was still young and purist back then.
Rete was the product of Charles’s work at Carnegie Mellon University. While the algorithm was refined from the original working paper to his PhD thesis, we often reference the final paper published in 1982. The genius of the approach is to keep in memory the “status” of individual condition evaluations, removing the need for constant recalculation. As a result, rules projects, which are by definition pattern-matching intensive, can be evaluated much faster than with a systematic approach, especially as new facts need to be propagated.
“Inference” projects are the ideal use case for this approach since they make interrelated decisions — rule1 (High Spender) implies rule2 (VIP Customer), which implies rule3 (Free Shipping), etc. — that would be tedious to recompute constantly as we look at facts. In this example, a new fact such as a significant purchase return might affect the initial decision, eventually leading to shipping charges being added back to the invoice. This is a simplistic example, since we typically have all the facts upfront in a checkout service. Most academic examples of inference projects are more complicated to describe in business terms, as they involve monkeys trying to reach a banana by jumping on tables and chairs, or guests to be seated next to each other based on their interests.
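The High Spender / VIP / Free Shipping chain can be sketched as a tiny forward-chaining loop. The spend threshold and fact names are invented for illustration:

```python
# A minimal forward-chaining sketch: conclusions are re-derived from
# the current facts, so a changed premise changes the whole chain.

def infer(facts):
    rules = [
        (lambda f: f["spend"] >= 1_000,   "high_spender"),
        (lambda f: f.get("high_spender"), "vip"),
        (lambda f: f.get("vip"),          "free_shipping"),
    ]
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition(facts) and not facts.get(conclusion):
                facts[conclusion] = True
                changed = True
    return facts

facts = infer({"spend": 2_500})   # derives all three conclusions
# A new fact (a large purchase return) changes the premise, and the
# whole chain of conclusions disappears on re-derivation:
revised = infer({"spend": 400})   # derives nothing
```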
The best usage of the Rete network I have seen in a business environment was likely Alarm Correlation and Monitoring. This implies a “stateful” kind of execution, where alarms are received over time as they occur. When you consider that the actual Telecom network is made of thousands of pieces of equipment to keep in “mind” while processing the alarms, it is no wonder that Rete outperforms brute force. When one alarm is raised on one Router, the Rete network does not have to reprocess all past events to realize that we have reached the threshold of 5 major alerts on the same piece of equipment, trigger the proper treatment, and eventually provide a probable diagnosis. Hours of processing time turned into seconds in Network Management Systems. Fabulous.
Not all Business Rules projects require inference, though. When you have all data known upfront and there is very little back-and-forth between the rules, sequential may be as fast as, or faster than, Rete. When Inference is needed though, per a French saying, “y a pas photo” (literally, “no need for a picture”): there is no doubt.
Origin of the Name
I heard many different origins for the name of the algorithm. They all refer to the structure created in memory, the Rete Network, that looks like a complicated mesh of nodes and links. I wonder if you will guess which one is the real one! I will publish the real answer given to me personally by Charles in the next post!
Technically, the answer is not an acronym, so the right way to refer to RETE is not in all-caps. I should correct myself and from now on refer to it as “Rete”. I have done better over the years, but I still like the uppercase!
The original Rete algorithm evolved over the years.
Most BRMS vendors developed their own algorithms, known in elite circles as XRete or uni-rete, which mostly involved performance improvements and built-in support for collections, for example.
Rete 2 is the version that Charles developed in his own engine called OPS/J. The performance improvements broke the legendary “Rete wall” which referred to the dramatic performance degradation when the number of objects in working memory increased.
Rete 3 is the evolution of Rete 2 developed for RulesPower and subsequently integrated with Blaze Advisor. I am not at liberty to tell you much about the trade secrets in there. To avoid any wrath from FICO, I will not say anything.
So what is next?
We announced in our blog last year that Charles had come up with yet another variation on the Rete algorithm that dwarfs the performance of its predecessor. You might wonder why he did not call it Rete 4, following the established naming scheme…
Was that in celebration of Windows NT, which also broke the previously established number sequence?
I will not make you guess this time! Rete 4 is the name of a private Italian television channel, and the name was not up for grabs without copyright infringement at worst and confusion at best. For some time, Charles referred to the algorithm as TECH, but acknowledged that it was not Google-able, so he decided to rename it Rete-NT.
I am looking forward to part 2 now… Talk to you soon!
It is hot off the press and not yet publicly announced (the website has not even launched) but we heard from chairman Jason Morris that Rules Fest will come (again) to us in the Bay area. Take note in your calendar, the show will take place at:
Monday through Thursday, October 24-27, 2011
San Francisco, CA
I hope to see you there! Stay tuned for more details as they get released.
David shared his extensive experience with building rules-based systems: building an engine from scratch, building a whole framework around it, and using it.
His example – 5 boxes, two red, two green and one blue – highlights the problem of interpreting even very simple statements.
Take: (box (color ?c))
The “truth” perspective: What (distinct) colors are used by the boxes?
The “implementation” perspective: What color is each box?
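A few lines make the two readings concrete, using David's five boxes (two red, two green, one blue); the representation below is invented for the illustration:

```python
# David's five boxes: two red, two green, one blue.
boxes = ["red", "red", "green", "green", "blue"]

# "Truth" reading of (box (color ?c)): which distinct colors are used?
distinct_colors = set(boxes)              # 3 answers: red, green, blue

# "Implementation" reading: one match per box, ?c bound per instance.
per_box_matches = [("box", color) for color in boxes]   # 5 activations
```

Three answers versus five activations: the same pattern, two very different results.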
The discrepancy, while cartoonish in this case, is real. This is why I believe we cannot really use low-level languages without a lot of training, and this is why an untrained programmer – let alone a “business” type – cannot really leverage them. Even worse, they may think they can, leading to potentially huge consequences.
David expresses this as: (paraphrasing) “we need to be explicit in our rules engines, we need to remove the ambiguities as early as possible in the expression chain”.
To illustrate the power of syntax and extended engine support for it (and I apologize if I misunderstood):
David’s example in terms of low level syntax
when (customer (state ?s)) then (logical-assert(customer-in-state ?s))
when (customer-in-state ?s) then (print ?s)
Supporting David’s point, the same in Blaze Advisor’s SRL (the pattern declarations are re-usable through rules and can have their own additional filtering clauses):
customer is any customer.
s is any state.
if (at least 1 customer satisfies its state is equal to s) then print s.
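The behavior that logical assertion buys you can be sketched in a few lines of Python (a conceptual illustration of truth maintenance, not the mechanics of either engine): the derived fact only survives while at least one supporting fact does.

```python
# Minimal truth-maintenance sketch: a logically asserted fact
# (customer-in-state ?s) stays true only while at least one
# customer fact in that state supports it.
customers = [("alice", "CA"), ("bob", "CA"), ("carol", "NY")]

def derived_states(facts):
    # Re-derive the supported states from the current customer facts,
    # the way a logical assert keeps derivations in sync with support.
    return {state for _, state in facts}

assert derived_states(customers) == {"CA", "NY"}

# Retract carol: the only support for NY goes away, and so does the
# derived fact -- no manual retraction of customer-in-state needed.
customers = [c for c in customers if c[0] != "carol"]
assert derived_states(customers) == {"CA"}
```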
David went into “fact collisions”.
Take the following:
customer N is preferred if their volume is high
customer N is preferred if they’ve ordered recently
customer N is preferred
This last fact is unconditional. When it is retracted, what do we really mean? To retract just that assertion, or to blow away the derived truth?
David makes the point that this is a problem at the low level, and that business rules systems do not surface it. I agree – this is precisely the reason why the engines supporting BRMS have invested in syntax and engine extensions and constraints.
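A small Python sketch of the collision (the names and structure are mine, not David's): the same fact carries several justifications, and one reading of “retract” removes only the unconditional support while the rule-derived support keeps the fact alive.

```python
# Sketch of a "fact collision": the fact (preferred N) can carry
# several justifications at once, so "retract" is ambiguous about
# which of them should go.
justifications = {
    ("preferred", "N"): [
        "rule: volume is high",
        "rule: ordered recently",
        "unconditional assert",
    ],
}

def retract_unconditional(fact):
    # One reading: remove only the unconditional support; the fact
    # survives because the rule-derived justifications remain.
    justifications[fact] = [
        j for j in justifications[fact] if j != "unconditional assert"
    ]
    return bool(justifications[fact])  # is the fact still true?

still_true = retract_unconditional(("preferred", "N"))
```

The other reading – blowing away the truth entirely – would empty the justification list; a low-level engine forces you to know which one you meant.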
David makes the following distinction between types of rules:
– Logical (stateless) rules
– Application (stateful) rules
With Application rules, keeping track of the state of the world (or the truth) is a core issue.
I am not a fan of this distinction, and I do not think it brings a lot to David’s very good points. First, the issues with dealing with truth also exist in typical stateless invocations, simply because most decisions are multi-step – not just one rule set/group activation, but many of them at different points in time through different conditions. Furthermore, the distinction between stateless invocations, short-duration stateful invocations, and long-duration stateful invocations is too coarse: it’s a fairly gray area when you look at what the industry does with these engines.
This is why the engines in BRMS implementations are not any simpler than those used at the lower level: they also have to cope with that and need to minimize these issues. They are actually more complex in their implementation – most of them started with well-known implementations and extended them.
But aside from this difference of opinion on the classification, I agree with David’s points.
Another thing I would like to throw into the discussion is that we do need to get out of the closed-world assumption whenever we end up dealing with anything connected to the rest of the world. We simply cannot assume that all we need to know is known, or even knowable. That has a huge implication for this whole “truth” issue.
I am not going into a theoretical academic discussion here – I am going into the very concrete requirement that we need to engineer into rules engine technology some level of support for open-world realities – and I really mean “some”, as in at least a practical, workable compromise. Some BRMS rules engines do support things like “unknown” as a potential value for anything. Others combine that with support for things like “unavailable” – allowing one to distinguish between what can become knowable through opportunistic backward chaining to get to the value, and what has no chance of being known in the corresponding context. All the while allowing rules to actually reason against “unknown”, “unavailable”, “known”, “not known”, “available” values, etc…
This is not even specific to rules engines. Witness what happens with DBMSs and the semantic (ab)use of NULL.
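As a hedged illustration of what such support can look like (the sentinel names, the decision function, and the threshold are all hypothetical, not any specific BRMS API), rules can reason directly against the value states rather than only against concrete values:

```python
# Open-world value states: besides a concrete value, a slot can be
# UNKNOWN (not known yet, possibly obtainable via backward chaining)
# or UNAVAILABLE (no chance of being known in this context).
UNKNOWN = object()
UNAVAILABLE = object()

def credit_decision(income):
    if income is UNAVAILABLE:
        return "refer"    # reason explicitly about the permanent gap
    if income is UNKNOWN:
        return "ask"      # trigger chaining to try to obtain the value
    return "approve" if income > 50000 else "decline"

assert credit_decision(UNKNOWN) == "ask"
assert credit_decision(UNAVAILABLE) == "refer"
assert credit_decision(80000) == "approve"
```

The point is that “unknown” and “unavailable” are first-class states a rule can test, not just an absence that silently fails to match.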
Hal was lucky enough to be able to do research work at Oracle, even before the Sun acquisition.
His talk was about how he approached architecting and implementing a Jess-based distributed rules engine, supporting a large-scale system (called C0) that also relied on Coherence. Hal’s long experience with OODBs and app servers certainly came through.
Hal made a number of interesting points.
Hal needed to make sure that not too many resources were used for management (in particular, not more than what was being managed…), and that decision-making was located at its natural place to avoid introducing more cost and coupling within the system.
One of the key decisions was the separation between local decision-making and global decision-making – and ensuring their proper separation of concerns and coordination.
Given the complexity of the distributed system and the event nature of its dynamics, the system had to remain asynchronous and event-driven.
The rules engine was set up as the driver of the application, as opposed to a “consultant”. With that architecture choice, the rules engine ended up in a mode where it continuously evaluated the observed state, applying the declarative rules to the state transitions.
The rules architecture was set up in a hierarchy, with goals becoming more abstract and declarative as you climb the hierarchy. The organization allows for breaking down the processing into local problems – where you have the most control and information – enabling robust parallelization. Global problems, on the other hand, were handled within a global context. This approach not only allows for efficient, robust parallel execution, it also allows for minimal communication and friction.
Given that all is communicated through facts, the execution ends up being in-database fact manipulation with some Java execution.
Hal makes the point that while declarative systems (enabled by rules-based approaches) do not perform any worse than procedural systems, they exhibit much better representation and management characteristics. Amen!
In terms of the implementation, Hal described the work that needed to go into integrating Jess and Coherence. In essence the work consisted in having a Coherence continuous query on the cache to trigger the engine. All the state remained stored in the cache.
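As a conceptual sketch of that pattern – deliberately not the Coherence or Jess API, just plain Python with hypothetical names – a cache can evaluate a continuous-query predicate on every update and notify the engine only for facts that enter the query result:

```python
# Conceptual sketch (NOT the Coherence API): a cache that runs a
# "continuous query" predicate on every put and notifies listeners,
# the way Hal's setup triggered the engine from cache changes.
class ContinuousQueryCache:
    def __init__(self, predicate):
        self.predicate = predicate
        self.store = {}
        self.listeners = []

    def put(self, key, value):
        self.store[key] = value
        if self.predicate(value):  # fact entered the query result
            for listener in self.listeners:
                listener(key, value)

fired = []
cache = ContinuousQueryCache(lambda v: v.get("cpu", 0) > 0.9)
cache.listeners.append(lambda k, v: fired.append(k))  # stand-in for the engine

cache.put("node-1", {"cpu": 0.95})  # matches: engine would be triggered
cache.put("node-2", {"cpu": 0.20})  # no match: nothing fires
```

The state lives in the cache; the engine only sees the facts the query selects, which is also what makes the backward-chaining refinement of the query interesting.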
He explored the use of backward chaining to select which facts to consider – i.e. modifying the continuous query on the cache – which is pretty cool.
The approach actually went further, leveraging rules for the configuration, the workflow, etc… Declarative rules-driven everything!
Hal did need to make one key modification to Jess – essentially adding a GUID to uniquely identify facts in the cache.
Some metrics corresponding to the tests made:
– EC based cloud deployment
– 500 VMs with 10 system VMs
– 5000 managed processes
– 15% CPU load on active system processes, 2% network bandwidth (very good), and 1% of host CPU for managed nodes.
The system has not yet gone into production.
Hal walked us through a few future considerations for this system. These include (but are not restricted to):
– Implementation of ECA state diagrams into rules
– And… dynamic provisioning of rule sets (debugging, analysis, monitoring) – which would be a very interesting development.
More futuristic, although interesting: taking this same model and applying it to BPEL/SOA.
Mark Proctor – using the most technology during the show (projection control, laser pointer, etc…) – focused on improving rule systems, in particular through combining technologies and making them massively parallel. Mark is a very passionate engineer – investing a lot of energy in Drools and making progress in engine technology.
To start: “Rule engines suck. How can we make them suck less?”. Why do they suck in Mark’s opinion?
– There are too many limits to the expressiveness of rules
– It’s too difficult to control execution flow when it’s needed
– There is too much ambiguity and complexity (things like infinite loops through data changes and rules firing, etc.)
He also added a number of things that I think have nothing to do with rules: transactions, for example. Those are concerns that are, in my opinion, external to rules engines. He is talking essentially about production rule systems, of course. The examples he gives use the usual old-style notation that has not evolved – but only hard-core rules engine folks use that type of syntax and work at that level.
You would expect me to react. I like Mark, but, come on, there were some of us doing work in this space between the early guys and Drools, and we did do some good things. Stating that nothing has changed in the last 30 years is a good provocation artifact, but I think it is unfair. And I am also a programmer; I still program a lot.
When we designed Blaze Advisor in the mid-to-late ’90s, it included a high-level syntax (contrast SRL and the syntax used by other engines), full support for execution flow control, function support, support for the open-world hypothesis (not just “unknown” but also “known”, “unavailable”, “available”, etc…), forward chaining through Rete and opportunistic backward chaining, etc… The product also included formal static analysis – based on flow control and canonical representations – to identify not just firing cycles, but logic completeness and overlap issues, etc… It even ended up with multiple engines under a single framework – Rete, sequential, constraints, predictive model executor,…
When we started selling Blaze Advisor, we sold it as an engine with an IDE – no server deployment, no repository, no templates, no business user interface – and we sold to technical people, lots of hardcore AI folks that we had to convince to pay a lot of money for rules engine technology. So, sorry, there has been a lot of progress, rewarded by significant market success, in engine technology. Of course there is room for improvement, but I wanted to make that point with respect to Mark’s flamboyant statement. I appreciate Mark’s motivation – keep improving the engine technology – but we can also do it by taking into account the progress that has been made.
Going back to the presentation. Mark spent some time talking about the complexity introduced by salience-based conflict resolution. Yes, that is a key problem in applications. His proposal is to use declarative logic to handle the cases in which there is conflict – an approach similar in concept to what Soar does (as presented yesterday). He then drew the parallel between rules and queries and gradually moved into an explanation of how functional programming concepts can enrich rules engines.
This is the right direction, and one that is also taken by much more mainstream efforts, such as the group responsible for C# – witness the introduction of lambdas, closures, and cool things like LINQ. He finally spent time talking about some of the aspects of parallelizing engine execution and operating with sets (witness the work on MVCC). Both are important, and I do agree that changes in engine technology to properly address them are necessary and will introduce significant improvements, extending their applicability. Having looked at some of these things in the past, I can confirm that what Mark emphasizes in terms of implementation complexity is absolutely true – these are tough problems to crack properly. So… while I reacted to the reasons given on why current rule engines suck, I am glad that passion is going into making them better.
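To illustrate the rules-as-queries parallel in one toy sketch of mine (Python comprehensions and lambdas standing in for the C# features; the fact schema is invented): a production rule's condition part is essentially a declarative query over facts.

```python
# A rule's condition part, written as a declarative query over facts.
facts = [
    {"type": "customer", "name": "N", "volume": "high"},
    {"type": "customer", "name": "M", "volume": "low"},
]

# Rule: a customer is preferred if their volume is high.
# The condition is a predicate (lambda); matching it is a query.
condition = lambda f: f["type"] == "customer" and f["volume"] == "high"
preferred = [f["name"] for f in facts if condition(f)]
```

The engine's job is then to evaluate such queries incrementally as facts change – which is exactly where the parallelization and set-oriented work Mark described comes in.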