One of the subjects that frequently comes up when considering decision engines is performance, and, more broadly, the performance characterization of decisions: how do decision engines cope with high throughput, low response, high concurrency scenarios?
In fact, the whole world of forward-chaining business rules engines (based on the RETE algorithm) was able to develop largely because of the capability of that algorithm to cope with some of the scalability challenges that were seen. If you want to know more about the principles of RETE, see here.
However, there is significantly more to decision engine performance than the RETE algorithm.
A modern decision engine, such as SMARTS;, will be exercised in a variety of modes:
- Interactive invocations where the caller sends data or documents to be decided on, and waits for the decision results
- Batch invocations where the caller streams large data or sets of documents through the decision engine to get the results
- Simulation invocations where the caller streams large data and sets of documents through the decision to get both the decision results and decision analytics computations made on them
Let’s first look at performance as purely execution performance at the decision engine level.
The SMARTS; decision engine allows business users to implement decisions using a combination of decision logic representations:
- Decision flows
- Rules groups in rule sets
- Lookup models
- Predictive models
These different representations provide different perspectives of the logic, and the most optimal representation for implementing, reviewing, and optimizing the decision. For example, SMARTS allows you to cleanly separate the flow of control within your decision from your smaller decision steps – check the document data is valid first, then apply knock out rules, then etc.
However, SMARTS does also something special for these representations: it executes them with dedicated engines tuned for high performance for their specific task. For example, if you were to implement a predictive model on top of a rules engine, your result will typically be sub-par. However, in SMARTS, each predictive model is executed by a dedicated and optimized execution engine.
Focusing on what is traditionally called business rules, SMARTS provides:
- A compiled sequential rules engine
- A compiled Rete-NT inference rules engine
- A fully indexed lookup model engine
These are different engines, and apply to different use cases, and they are optimized for their specific application.
Compiled Sequential Rules Engine
This engine simply takes the rules in the rule set, orders them by explicit or implicit priority, and evaluates the rule guards and premises in the resulting order. Once a rule has its guard and premise evaluated to true, it fires. If the rule set is exclusive, the rule set evaluation is over, and if not, the next rule in the ordered list is evaluated.
There is a little bit more than that to it, but that’s the gist.
The important point is that there is no interpreter involved – this is executed in code compiled to the bytecode of the chosen architecture (Java or .NET or .NET Core). So, this executes at bytecode speed and gets optimized by the same JITs as any code in the same architecture.
This yields great performance when the number of transactions is very large, and the average number of rules evaluated (i.e. having their guards and premises evaluated) is not very large. For example, we’ve implemented batch fraud systems processing over 1 billion records for fraud in less than 45 minutes on 4 core laptops.
When the number of rules becomes very large, in the 100K+ in a single rule set, then the cost of evaluating premises that do not match starts getting too high. Our experience is that is very likely that with that number of rules your problem is in fact a lookup problem and would be better served by a lookup model(As an aside, lookup models also provide superior management for large numbers of rules). If that is not the case, then a Rete-NT execution would be better.
Compiled Rete-NT Inference Rules Engine
This engine takes the rules within the rule set and converts them to a network following the approach described in this blog post. What this approach does is revert the paradigm – in RETE, the data is propagated through the network, and the rules ready to fire are more optimally found and put in an agenda. The highest priority rule in the agenda is retrieved, and executed. The corresponding changes get propagated into the network, the agenda updated, etc., until there is nothing left in the agenda.
One important distinction with respect to the sequential engine is that in the case of the Rete-NT engine, a rule that already fired may well be put back in the agenda. This capability is sometimes required by the problem being solved.
Again, there is much more to it, but this is the principle.
SMARTS implements the Rete-NT algorithm – which is the latest optimized version of the algorithm provided by its inventor, Charles Forgy, who serves on the Sparking Logic Advisory Board. RETE-NT has been benchmarked to be between 10 and 100 times faster than previous versions of the algorithm in inference engine tests. In addition, SMARTS compiles to the bytecode of the chosen architecture everything that is not purely the algorithm, allowing all expressions to be evaluated at bytecode speed.
In the case where your number of rules per rule set is very large, in the 100K+ range, and you are not dealing with a lookup model, the RETE-NT engine yields increasingly better performance compared to the sequential engine. SMARTS has been used with 200k+ rules in rule sets – these rules end up exercising 100s of fields in your object model, and the Rete-NT advantages make these rule sets perform better than the pure sequential engine.
Fully Indexed Lookup Model Engine
There are numerous cases where what a rule set with a large number of rules is doing is selecting out of a large number of possibilities. For example, going through a list of medication codes to find those matching parametrized conditions.
In many systems, that is done outside a decision engine, but in some cases, it makes sense to make it part of the decision. For example, when it is managed by the same people and at the same time, it is intimately tied to the rest of the decision logic, and it needs to go through its lifecycle exactly like the rules.
SMARTS provides a Lookup Model specifically for those cases: you specify the various possibilities as data, and then a query that is used to filter from the data the subset that matches the parameters being used. At a simplified level, this engine works by essentially doing what database engines do: indexing the data as much as possible and convert the query into a search through indexes. Because the search is fully indexed, scalability with the number of possibilities is great, and superior to what you can get with the Sequential or Rete-NT engine.
The SMARTS decision engine can also run concurrent branches of the decision logic, implemented in any combination of the engines above. That allows a single invocation to the decision engine to execute on more than 1 core of your system. There are heavy computation cases in which such an approach yields performance benefits.
Of course, performance is also impacted by deployment architecture choices. We’ll explore that topic in the next post in this series.
The Rete Algorithm Demystified blog series attracted a huge crowd. I want to thank you for your readership! Following the popular demand, let me continue on the series and add a few words on the latest and greatest Rete-NT.
Well, this is where I can’t say much without violating Charles’s trade secrets… Sorry!
So what can I share?
For the evolution of the Rete Algorithm, Charles has focused primarily on runtime performance, looking for ways to accelerate rule evaluations and reduce memory usage. Mission accomplished.
Faster: With Rete III, the speed increase came with the ability to efficiently handle a larger number of objects per transaction. With Rete-NT, the speed increase came from optimizations on complex joins in the Rete network. As described in part 2, the discrimination tree performs a product of object lists. The list of all drivers satisfying x,y,z requirement is combined with the list of all vehicles matching some other requirements, for example, producing the cartesian cross product. The more patterns, you add, the more joins will be added. This has been referred to as the problem of multi-patterns. The combinatorial explosion is kept under control with the latest algorithm, in a dramatically different way than previously attempted, achieving unprecedented performance. This algorithm shines when business rules involve complex conditions, which tends to be the case in real-life applications.
Slimmer: This is related to the complex join speed increase. The less combinatorial explosion, the less memory usage. It is actually a lot more sophisticated than that, but I am unfortunately bound to secrecy… The most important thing to remember is that memory usage goes down quite a bit. This is a concern that software architects can truly appreciate!
James Owen scrutinized the actual performance increase and published the results in InfoWorld last year. Although the overhead may slow down a tiny bit the performances for overly simple tests, the performance gain is huge: an order of magnitude faster than the previous generation.
Does performance matter?
Most rules engines have achieved some excellent levels of runtime performance, so performance for the sake of performance is not an objective in itself.
I am excited about Rete-NT because it improves performance where it is needed. Previous generations of Rete engines put pressure on rules writers to design rules that avoid as much as possible multi-patterns. This algorithmic innovation removes a painful hurdle, or at least move the boundary. In my career, especially in the past 2 years at Sparkling Logic, I have come across new use cases that do require more flexibility, more expressibility that would be hard to cope with using less efficient algorithms. One thing we can always seem to be able to count on is for complexity to increase…
How does that compare to non-inference engines?
You can over-simplify the inference versus compiled sequential debate by saying that:
- Rete shines when the number of rules is large and the number of objects in memory is small
- Non-inference shines when the number of rules is small and the number of objects in memory is large
Rete-NT changes the game a bit by expanding the scope of problems that can be handled effectively. As a result, non-inference engines dominate a smaller and smaller number of use cases, while Rete keeps its lead on large rulebases.
In part 1 of this blog series, I covered the Rete Algorithm and its origin and even the origin of its name. Now as promised, I am going to explain how it works. The Rete Algorithm is one of several decision engine algorithms used in modern rule engines such as Sparkling Logic SMARTS, to learn more about decision performance and the decision engines that drive today’s automated decisions see our recent blogpost, Decision Engine Performance.
It is a challenge to explain how the Rete algorithm works simply, so as not to lose our less technical audience and yet keep things interesting enough for the techies. I hope I will reach that goal. And without further ado, let me introduce the Rete algorithm!
How it works
The brilliant idea that Dr. Charles Forgy developed was to decouple the evaluation of hypothesis from execution sequencing.
Why is that brilliant you might ask? When large volumes of business rules need to be assessed against facts or data, the constant screening for applicability can be costly — both in terms of evaluation (and re-evaluation) and in terms of ordering the statements properly. This innovative approach for inference allows savings in both areas. Let’s explore how.
In absence of this algorithm, or derivatives, the traditional approach is to sequence and nest the IF-THEN-ELSE statements of the rules in such a way that you capitalize on the test evaluations — you reuse them as much as possible. When you think about it, it would be akin to creating the leanest decision tree possible, in a graphical form or more likely in straight code. This “sequential” approach breaks (mostly in terms of performance) when rules need to execute as a result of the execution of other rules. This is what we call inference. In a sequential approach, you would have to restart the complete ruleset evaluation after each rule execution (if rules are prioritized) or after a full pass through the ruleset (in absence of prioritization). Keep in mind too that prioritization might prevent you from reusing test evaluations all together in a sequential approach: if the conditions are not co-located in the logic, then you will likely have to perform those tests multiple times. Systems have become much more sophisticated than they used to be back then, and most BRMS products provide a sequential deployment option which is still pretty good when inference is not needed. The beauty of those products is that you can still manage the rules independently of each other and the sequential engine takes care of optimizing the generated spaghetti code.
For illustration, I’ll use a simple set of rules that we are mostly familiar with: airline miles calculations. The decision service is responsible for processing the frequent flyer’s activity.
if award miles for last year or current year > 25,000 then status = Silver
if award miles for last year or current year > 100,000 then status = Gold
If flight is less than 500 miles then award 500 miles
If flight is 500 miles or more then award flight miles
if category is business or first then award 50% bonus miles
if status is Gold and airline is not partner then award 100% bonus miles
if status is Silver and airline is not partner then award 20% bonus miles
if member signed up for 3-flights-for-5k-in-March and number of return flights in March 2011 = 3 then award 5,000 additional miles
if status becomes Gold then award 8 upgrade certificates
I will live it up to the creative minds of the Airline marketers to add more promotions to our short list here: hotel or rental car bonuses, incentive programs, lifetime members, accelerator programs, etc.
This example does not use a lot of inference but that is enough to illustrate our point. If you run through the rules, sequentially — let’s keep the current order — and this Frequent Flyer member reaches Gold status in that very transaction, we need to process the rules once more to update the status and award the proper bonus miles and other incentives.
Inference engines assess all the applicable rules and then fire the first applicable rule, propagate the changes and fire again. How does that work?
Constructing the Rete Network
The Rete network is the heart of the Rete algorithm. It is made of nodes that each hold a list of objects that satisfy the associated condition. The original Rete algorithm worked out of facts but I will simplify the description to refer to objects since all commercial engines have evolved to be object-oriented nowadays.
The Rete network is constructed when you compile a rules project — only once when you start that service (or class for simpler designs) — and then shared across all invocations.
The discrimination tree is the first part of the Rete network. It starts with Alpha nodes that are associated with Classes (in the object-oriented sense). All instances of a given class will be listed in any given Alpha node. The discrimination happens by adding conditions as single nodes attached to the Alpha node or to another parent node.
Let’s review what that means for our Airline example:
Alpha nodes are created for each class: Frequent Flyer Account and Flight.
Conditions are then appended:
- Frequent Flyer
- status is Gold
- status is Silver
- award miles for last year or current year > 25k
- award miles for last year or current year > 100k
- signed up for “3-flights-for-5k-in-March” promotion
- number of flights in March = 3
- miles >= 500
- miles < 500
- airline is not a partner
- airline is partner
- category is business or first
Each node represents an additional test to the series of conditions applied upstream. If you follow a path from top to bottom, you should be able to read the complete set of conditions THAT APPLY TO ONE GIVEN CLASS OF OBJECTS.
Finally the nodes are connected across classes. This is where we can combine the conditions for a non-partner flight taken by a Gold member. We call those “joints” as we will combine the list of objects that verify conditions on one branch with the list of objects that verify the conditions on another branch.
On the performance side, you may have been warned in the past not to combine too many patterns in a single rule. It is clear looking under the hood that the product of all possible accounts by all possible flights could result into a combinatorial explosion. The more discrimination upfront, the better obviously.
The path eventually end with the action part of the rules. The content of the actions is irrelevant for the Rete network. you could replace the labels here with the name of the rule (I did not name the rules in our example so I displayed the actual actions). The rules can be reconstituted by following the incoming paths. All nodes are AND-ed. This is an interesting point. ORs are typically not natively supported: rules are duplicated for each OR-ed flavor. If you were to add a rule to grant the same bonus for SILVER or GOLD customers traveling to Hawaii then you would just have nodes connected to the existing GOLD and SILVER nodes as if they were written as separate rules. In terms of performance or maintenance, it does not matter though since the Rete algorithm handles the supposed duplication as efficiently as any manual code would.
It is a great time to emphasize the great savings you would get when the number of rules increases. Let’s say that you want to introduce a new promotion for GOLD customers to/from specific destinations or for reached milestones (50k and 75k award miles). The nodes that test for GOLD status do not need to be created or duplicated. The test as well as the execution of the test is leveraged and reused transparently regardless of the order in which the rules need to be executed.
Rete Cycle: Evaluate
Let’s look now at what happens at runtime when you invoke that service with customer data. Your design will dictate whether those transactions are applied “real-time” (or on-demand) or whether you want to execute a batch of transactions at once, at night when you have access to your Mainframe window for example. Rules can actually be used either way and in many projects, they are actually used both ways by two different systems. For an Insurance underwriting project, you might deploy the same rules online in your self-service website as well as your older batch application that processes the applications coming from your brokers.
Let’s assume that we are processing transactions one by one for simplicity.
The evaluation phase consist in running the data through the Rete network to identify the applicable rules, for which all conditions are satisfied. The nodes in the network will hold lists of objects that satisfy the conditions in the incoming path.
In our airline example, Joe flies from Washington DC (IAD) to San Francisco (SFO). Flight accounts for 2,419 miles I recall. Starting with already 150k award miles, Joe should bank double miles (4,838) for his GOLD status. How does Rete propagates those facts to make the same decision?
The Account Alpha node references Joe as a Frequent Flyer. Because of his 150k award miles, he also qualifies for the “> 100k” condition. He is referenced in this node too. My handwriting did not allow me to write down Joe in the small real estate so I just highlighted the node with a X. In reality, Joe would be referenced specifically as we need to keep track of who qualifies in case we process multiple objects at once. For this exercise, I also assume that service does not store the GOLD status and computes it every time so the associated node still has an empty list.
Similarly, the IAD-SFO flight is referenced in the list of all flights in the transaction. The flight is more than 500 miles and on the airline. The associated lists reference the flight accordingly.
At that point in time we do not have any joint to worry about since the account status is not known yet to qualify for GOLD.
2 rules are satisfied. All facts have been propagated. We are ready for execution.
Rete Cycle: Execute
The rules for which all conditions are satisfied are said to be active in the agenda. The agenda contains the list of all rules that should be executed, along with the list of objects that are responsible for the conditions to be true. I insist on the word “SHOULD”. The agenda will sort them according to their priorities (and other conflict resolution methods). If the execution of a rules invalidates a rule with a lower priority, then this second rule will logically never execute. This would have been the case if Joe had started with 99k award miles – he would have qualified for the SILVER bonus until we bank the IAD-SFO miles and granted him GOLD status. Joe can qualify for SILVER or GOLD bonus miles but certainly not both.
Priorities are a touchy subject in the Rules community as some purists are set against its use.
The agenda lists the 2 rules (GOLD status and 500+ Flight Miles). The first one is executed. In our airline example, order does not matter for those 2 rules. Let’s assume that flight miles are awarded first. Joe gets 152,419 miles in his account so far.
Do it again and again…
After the execution of the first rule, facts are propagated again. Flights have not changed so none of that subtree is re-evaluated. Joe’s account has changed so the conditions related to award miles are re-assessed, with no change in our example. The GOLD status rule remains active in the agenda.
We are ready to execute the following rule: GOLD Status which is the only rule left in the agenda.
As a result, the GOLD Status node is updated for Joe, which leads to the GOLD status and not partner node to update as well, which leads to the 100% GOLD bonus rule to become activated in the agenda.
In the end, Joe gets the proper credit of 154,839 in his award miles account.
As you add more rules that apply to GOLD or SILVER customers, the RETE network will grow always reusing the same nodes as much as possible — which is the part that takes time to do properly by hand if you are still coding it manually. Those spaghetti in the “Joint” part of the algorithm are a nightmare to maintain properly and can cost you significant debug time. This trivial example can become quickly messy when you consider the many different ways you can qualify for status with eligible segments rather than eligible miles, with accelerators, with lifetime miles, as well as the incentive promotions that you might get by default and those that you need to sign up for. Reusing those conditions — and their execution — regardless of when and where they show up is a performance boost that largely compensate for the initial overhead when inference is needed.
When inference is not needed and you only go through one cycle of propagation, it is debatable of course.
In my opinion, Charles opened the door for an algorithmic solution to properly ensuring the enforcement of many rules in an uncertain order. In other words, he created an approach where very large volumes of rules could be assessed in a short time — much faster than any other algorithms when rules need to be executed repeatedly on a given data set. The further performance improvements he introduced in Rete II, Rete III and lately in Rete-NT have also changed the initial “ideal” characteristics of a Rete project to better support much larger data set — what was marketed as the Rete wall by the non-Rete vendors.
It is somewhat surprising that no new algorithm break-through (other than Charles’s) were made since then. The basis for Business Rules execution is vastly based on Rete, or sequential, but hardly anything else. It reminds me of the sharks and their evolution. I remember, in my life science days, learning that sharks have not evolved at all in a very long time… likely because they are perfectly adapted to their environment.
Editor’s Note: This post was originally published in March 2011 and has been updated for accuracy and comprehensiveness.
In Part 1 of this blog series, we introduced the Rete Algorithm and its origin. I ran a poll on the origin of the Rete name. Thank you all for your participation!
Let me now share the results of our little poll on the origin of the name……………………
Let me remind you the options:
- RETE – per the Latin word for net
- RETE – per the Latin word for comb, which refers more precisely to the first part of the network called the discrimination tree
- RETE – per the Latin word for comb, in that case referring to the complex structure of a honeycomb
- RETE – per the Italian word for network or fishing net
- RETE – per the English word for an anatomical network of blood vessels and nerve fibers
- RETE – per the English word for the pierced plate on an astrolabe, having projections whose points correspond to the fixed stars
- RETE – as Rete California, Sonoma County’s ultimate shopping destination for guys and gals with discriminating taste in designer brands, referring to the complex network of shops
And the answer is in this short video! Do check it out as the answer is going to surprise more than 9 out 10 of the respondents!
I want to thank Michael for his Xtranormal skills.
To compensate for the “fun break” in my Valentine posting, I decided to tackle a much more technical and deep subject: The Rete Algorithm, and in this post more specifically, the origin of the Rete Algorithm.
As you probably know if you have read any Rules material, Rete is the dominant algorithm out there. There are actually 2 schools: those that believe in Inference and those who don’t. Rete is the foundational algorithm for executing rules in Inference mode. I discussed in an earlier post the difference and importance of the various algorithms used in commercial BRMS. You certainly do not need to know how it works to use those kinds of technologies — you do not need to understand mechanics to drive a car. That being said, I know that some curious minds out there do enjoy a little sneak peek under the hood now and then. In today’s post, I will focus on the origin of the Rete Algorithm. In part 2, I will explain how it really works.
My First Encounter
Dr Charles L. Forgy developed this infamous algorithm in the late 70’s but it took a little while for the algorithm to become widely popular, when Business Rules technology finally emerged. I do not recall that Rete was taught at school back then when I got my Masters in Computer Science. We did a fair amount of Artificial Intelligence though. When I was a young consultant, I got my hands dirty with Expert Systems before I was introduced to ILOG Rules. As I joined the company, I became a big fan of the technology and eventually JRules when it was released. I remember when Changhai Ke in person came to the US to teach us the Rete algorithm and I finally learned the internals of this great technology. Of course ILOG, Blaze and most inference engine providers have made slight modifications to the algorithm to improve its speed, or sometimes add capabilities. Unfortunately you will have to wait until my next post to dive deep in the algorithm.
What surprised me the most though through the years is that there is actually little literature on the algorithm itself. I was shocked when I joined Blaze Software and only a handful (if that many) knew how it actually worked inside. As I finally accepted, you do not need to know how it works to use it but I thought that, as vendors, we had a duty of knowing “mechanics”. It was over 10 years ago and I was still young and purist back then.
Rete was the product of Charles’s work at Carnegie Mellon University. While the algorithm has been refined from the original working paper to his PhD thesis, we often reference the final paper published in 1982. The genius of the approach is to keep in memory the “status” of individual condition evaluations removing the need for constant calculations. As a result, rules project that are by definition pattern-matching intensive can be evaluated much faster, especially as new facts need to be propagated, than using a systematic approach.
“Inference” projects are the ideal use case for this approach since they make interrelated decisions — rule1 (High Spender) implies rule2 (VIP Customer) which implies rule3 (Free Shipping), etc. — that would be tedious to recompute constantly as we look at facts. In this example, a new fact of a significant purchase return might affect the initial decision eventually leading to shipping charges to be added to the invoice. This is a simplistic example since we typically have all facts upfront in a checkout service. Most academic examples of inference projects are more complicated to describe in business terms as they involve monkeys trying to reach a banana by jumping on tables and chairs, or guests to seat next to each other based on interest.
The best usage of the Rete network I have seen in a business environment was likely Alarm Correlation and Monitoring. This implies a “stateful” kind of execution where alarms are received over time as they occur. When you consider that the actual Telecom network is made of thousands of pieces of equipment to keep in “mind” while processing the alarms, there is no wonder that Rete outperforms brute force. When one alarm is raised on one Router, the Rete network does not have to reprocess all past events to realize that we reached the threshold of 5 major alerts on the same piece of equipment and trigger the proper treatment, eventually providing a probable diagnostic. Hours of processing time turned into seconds in Network Management Systems. Fabulous.
All Business Rules projects do not require inference though. When you have all data known upfront and there is very little back-and-forth between the rules, then sequential maybe as fast or faster than Rete. When Inference is needed though, per a French saying “y a pas photo” — meaning literally “no need for a picture”, there is no doubt.
Origin of the Name
I heard many different origins for the name of the algorithm. They all refer to the structure created in memory, the Rete Network, that looks like a complicated mesh of nodes and links. I wonder if you will guess which one is the real one! I will publish the real answer given to me personally by Charles in the next post!
Technically the answer is not an acronym so the right way to refer to RETE is not in all-caps. So I should correct it and from now on refer to it as “Rete”. I have done better over the years but I still like uppercase!
The original Rete algorithm evolved over the years.
Most BRMS vendors developed their own algorithm that are known in the elite circles as XRete or uni-rete, which involved mostly performance improvements and built-in support for collections for example.
Rete 2 is the version that Charles developed in his own engine called OPS/J. The performance improvements broke the legendary “Rete wall” which referred to the dramatic performance degradation when the number of objects in working memory increased.
Rete 3 is the evolution of Rete 2 developed for RulesPower and subsequently integrated with Blaze Advisor. I am not at liberty to tell you much about the trade secrets in there. To avoid any wrath from FICO, I will not say anything.
So what is next?
We announced in our blog last year that Charles had come up with yet another variation on the Rete algorithm that dwarfs the performance of its predecessor. You might wonder why he did not call it Rete 4, following the established naming scheme…
Was that in celebration of Windows-NT which broke also the number sequence previously established?
I will not make you guess this time! Rete 4 is the name of a private Italian television channel and was not available for grab without copyright infringement at worst and confusion at best. For some time, Charles referred to the algorithm as TECH but acknowledged that it was not Google-able so decided to rename it Rete-NT.
I am looking forward to part 2 now… Talk to you soon!