
Origin of the Rete Algorithm

Written by: Carole-Ann Berlioz. Published on: Feb 21, 2011. 9 comments

To compensate for the “fun break” in my Valentine posting, I decided to tackle a much more technical and deep subject: The Rete Algorithm, and in this post more specifically, the origin of the Rete Algorithm.

As you probably know if you have read any Rules material, Rete is the dominant algorithm out there. There are actually two schools: those who believe in Inference and those who don't. Rete is the foundational algorithm for executing rules in Inference mode. I discussed in an earlier post the difference and importance of the various algorithms used in commercial BRMS. You certainly do not need to know how it works to use these technologies — you do not need to understand mechanics to drive a car. That being said, I know that some curious minds out there enjoy a little sneak peek under the hood now and then. In today's post, I will focus on the origin of the Rete algorithm. In part 2, I will explain how it really works.


My First Encounter

Dr. Charles L. Forgy developed this famous algorithm in the late 1970s, but it took a little while for it to become widely popular, when Business Rules technology finally emerged. I do not recall Rete being taught at school back when I got my Masters in Computer Science, though we did a fair amount of Artificial Intelligence. As a young consultant, I got my hands dirty with Expert Systems before I was introduced to ILOG Rules. When I joined the company, I became a big fan of the technology, and eventually of JRules when it was released. I remember when Changhai Ke came in person to the US to teach us the Rete algorithm and I finally learned the internals of this great technology. Of course, ILOG, Blaze, and most inference engine providers have made slight modifications to the algorithm to improve its speed or sometimes add capabilities. Unfortunately, you will have to wait until my next post to dive deep into the algorithm.

What surprised me the most over the years, though, is how little literature there actually is on the algorithm itself. I was shocked when I joined Blaze Software to find that only a handful of people (if that many) knew how it actually worked inside. As I finally accepted, you do not need to know how it works to use it, but I thought that, as vendors, we had a duty to know the "mechanics". That was over 10 years ago, and I was still young and purist back then.

About Rete

Rete was the product of Charles's work at Carnegie Mellon University. While the algorithm was refined from the original working paper to his PhD thesis, we usually reference the final paper published in 1982. The genius of the approach is to keep in memory the "status" of individual condition evaluations, removing the need for constant recalculation. As a result, rules projects, which are by definition pattern-matching intensive, can be evaluated much faster than with a systematic approach, especially as new facts need to be propagated.
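To make the idea concrete, here is a minimal sketch of condition-result caching, loosely inspired by the "alpha memory" concept in Rete. All names and the string-based fact representation are my own illustrative assumptions, not Forgy's design:

```python
# Illustrative sketch only: the engine stores which facts currently
# satisfy each rule condition, so a new fact touches each condition
# once instead of forcing a full re-evaluation of everything.

class ConditionMemory:
    def __init__(self, test):
        self.test = test          # predicate over a single fact
        self.matches = set()      # facts currently satisfying the test

    def add_fact(self, fact):
        # Evaluate the condition once, on arrival; remember the result.
        if self.test(fact):
            self.matches.add(fact)

    def remove_fact(self, fact):
        self.matches.discard(fact)

# One memory per condition (hypothetical "gold customer" test).
is_gold = ConditionMemory(lambda f: f.startswith("gold:"))
is_gold.add_fact("gold:alice")
is_gold.add_fact("silver:bob")
print(is_gold.matches)  # {'gold:alice'}
```

A real Rete network goes much further (it also joins conditions across facts), but the caching principle is the same.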

"Inference" projects are the ideal use case for this approach since they make interrelated decisions — rule1 (High Spender) implies rule2 (VIP Customer), which implies rule3 (Free Shipping), etc. — that would be tedious to recompute constantly as we look at facts. In this example, a new fact such as a significant purchase return might affect the initial decision, eventually leading to shipping charges being added to the invoice. This is a simplistic example, since we typically have all facts upfront in a checkout service. Most academic examples of inference projects are harder to describe in business terms, as they involve monkeys trying to reach a banana by jumping on tables and chairs, or guests to be seated next to each other based on their interests.
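The rule1 → rule2 → rule3 chain above can be sketched as naive forward chaining. This is illustrative only (a real engine like Rete avoids exactly this kind of brute-force re-scanning), and the fact names are hypothetical:

```python
# Naive forward chaining: keep applying rules until no new facts
# are derived.  Each rule is (condition over facts, derived fact).

def run_rules(facts, rules):
    """Repeatedly apply rules until the fact set reaches a fixpoint."""
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition(facts) and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical rules mirroring the chain in the text:
# High Spender -> VIP Customer -> Free Shipping
rules = [
    (lambda f: "high_spender" in f, "vip_customer"),
    (lambda f: "vip_customer" in f, "free_shipping"),
]

facts = run_rules({"high_spender"}, rules)
print(sorted(facts))  # ['free_shipping', 'high_spender', 'vip_customer']
```

Retracting the "high_spender" fact (say, after a large purchase return) would require re-deriving the dependent conclusions, which is precisely the bookkeeping Rete handles incrementally.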

The best usage of the Rete network I have seen in a business environment was probably Alarm Correlation and Monitoring. This implies a "stateful" kind of execution, where alarms are received over time as they occur. When you consider that an actual Telecom network is made of thousands of pieces of equipment to keep in "mind" while processing the alarms, it is no wonder that Rete outperforms brute force. When one alarm is raised on a Router, the Rete network does not have to reprocess all past events to realize that we have reached the threshold of 5 major alerts on the same piece of equipment; it can trigger the proper treatment, eventually providing a probable diagnosis. Hours of processing time turned into seconds in Network Management Systems. Fabulous.
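The stateful principle can be sketched in a few lines: keep a running count per device rather than re-scanning the event history on each alarm. The class, device names, and diagnostic message are all hypothetical; the threshold of 5 comes from the example above:

```python
# Sketch of stateful alarm correlation (illustrative only).
from collections import defaultdict

class AlarmCorrelator:
    def __init__(self, threshold=5):
        self.threshold = threshold
        self.major_counts = defaultdict(int)  # device -> running count

    def on_alarm(self, device, severity):
        """Process one alarm incrementally; return a diagnosis at threshold."""
        if severity == "major":
            self.major_counts[device] += 1
            if self.major_counts[device] == self.threshold:
                return f"probable fault on {device}"
        return None

correlator = AlarmCorrelator()
diagnosis = None
for _ in range(5):
    diagnosis = correlator.on_alarm("router-1", "major") or diagnosis
print(diagnosis)  # probable fault on router-1
```

Each incoming alarm costs constant work per device, regardless of how many events were processed before — the essence of why Rete beats brute force in this scenario.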

Not all Business Rules projects require inference, though. When all data is known upfront and there is very little back-and-forth between the rules, then sequential execution may be as fast as Rete, or faster. When inference is needed, though, per a French saying, "y a pas photo" — literally "no need for a picture" — there is no doubt.

Origin of the Name

I have heard many different origins for the name of the algorithm. They all refer to the structure created in memory, the Rete network, which looks like a complicated mesh of nodes and links. I wonder if you will guess which one is the real one! I will publish the real answer, given to me personally by Charles, in the next post!

Technically, the name is not an acronym, so the right way to write it is not in all caps. I should correct myself and from now on refer to it as "Rete". I have done better over the years, but I still like the uppercase!

Many Retes

The original Rete algorithm evolved over the years.

Most BRMS vendors developed their own variations, known in elite circles as XRete or Uni-Rete, which mostly involved performance improvements and built-in support for collections, for example.

Rete 2 is the version that Charles developed in his own engine, OPS/J. Its performance improvements broke the legendary "Rete wall": the dramatic performance degradation seen when the number of objects in working memory increased.

Rete 3 is the evolution of Rete 2 developed for RulesPower and subsequently integrated with Blaze Advisor.  I am not at liberty to tell you much about the trade secrets in there.  To avoid any wrath from FICO, I will not say anything.

So what is next?

We announced in our blog last year that Charles had come up with yet another variation on the Rete algorithm that dwarfs the performance of its predecessor.  You might wonder why he did not call it Rete 4, following the established naming scheme…

Was that in celebration of Windows NT, which also broke the previously established numbering sequence?

I will not make you guess this time! Rete 4 is the name of a private Italian television channel and was not up for grabs without copyright infringement at worst and confusion at best. For some time, Charles referred to the algorithm as TECH, but he acknowledged that it was not Google-able and decided to rename it Rete-NT.

I am looking forward to part 2 now…  Talk to you soon!

9 thoughts on “Origin of the Rete Algorithm”

  1. Hi Carole-Ann,

    Thank you for mentioning me in your post.

    It is a shame to see that rule-based technology has matured (I hope it has) with almost no support from the academic community.

    Back 20 years ago, there were many papers: some published variations of RETE, others challenged RETE. Just to cite a few names: TREAT, lazy matching, agenda-less optimization, etc.

    It is true that one does not have to know the internals to use a rule engine. Yet it was frequent to see articles proposing to re-order the conditions or the tests to make the RETE better (you will surely tell us later what a better RETE is). Some proposed to feed the optimizations with execution results. All of this is so heuristic that we end up telling users how to optimize a RETE manually. 20 years later, it's not so clear that we have made significant progress in this area. I wonder what the old rules guys are doing now. Do they follow the CEP trend, prefer predictive analytics, or simply want to manipulate more math with optimization techniques?

    I will not reveal too much and will reserve some comments for your future posts.

    Changhai Ke

    1. Hi Changhai,

      It is my pleasure. I should be the one thanking you for teaching me Rete in the first place and getting me hooked on those technologies!

      You are quite right on the lack of research on Rete. Most of the improvements I know of were made either by Charles Forgy himself or rules vendors like yourself or Carlos / Johan. Academic research has been focusing much more on other aspects of Decision Management like analytics or semantic web. The few academic speakers at Rules Fest seemed to apply Rete or other algorithms to a specific purpose.

      Should we infer that Rete is mature enough and so widely accepted that there is no justification for more research?

      Changhai, you should take that as a compliment!

      I think you might be right, though, that clever & powerful algorithms get less emphasis in research labs. France is one country that usually values algorithms, but most of the world prefers math. Do you remember how much skepticism we faced with Constraint Propagation algorithms versus traditional linear programming?

      I look forward to your comments in our Part 2! I think we might even need a Part 3 contributed by Carlos to satisfy your expectations.



  2. Hi Carole-Ann,

    I have to agree with you that few people understand how Rete works. In fact, even if I google Rete today, there are very few materials detailing the internal of Rete, even fewer with good examples. Your series is going to be a big help.

    To be honest, in the (business) decision management domain, where we implement and automate business policies and decisions, I have hardly read a good example where inference is required. As you rightfully pointed out, if we have the facts upfront, there are no new/removed/modified facts during decision making (meaning stateless), and it's important to make a snap decision in a limited timeframe, sequential is probably much faster. I'll be interested if you can give more concrete examples where Rete is the choice in the business world.

    Now, on monitoring applications, I can clearly see inference dominates. It's a world where decisions are based not on a snapshot of facts but on a series of events over time. How is inference used in CEP? I suppose CEP is a great transport + framework for such monitoring applications; is Rete THE choice of pattern matching in CEP?



    1. Hi Kenny,

      Indeed, in most transactional business applications, Rete is an overhead. That being said, if you have a collection of facts to take into consideration for all transactions — facts that seldom vary, but more frequently than your code, very much like your business logic — then Rete can save you time, as the state of the Rete network can be persisted and reused across transactions.

      Let me pick a good example to illustrate… Product configuration for example. You might have a product configuration to validate with numerous parts and pieces. I have seen over the years such applications for physical goods like chemicals or trucks as well as services such as a life insurance policy or a complex Telecom offering (Internet / Voice / Mobile). In that case, there are often numerous conditions that you might want to pre-assess on compatible / incompatible / required components regardless of the actual product order you will process. Rete implementations out there are quite good at doing that.

      Regarding CEP, this is a very à propos comment. CEP is often referred to as a technology when it is more of a domain.

      There are Rete-based CEP implementations out there. If I recall, Mark Proctor presented on this very subject at a past Rules Fest event. Or maybe it was Charles Forgy in his talk on Closely Coupled Parallel Rulebases… Anyway, many technologies are involved in Event Correlation / Event Processing, and Rete certainly brings some good features to the table.

      Thanks, Kenny!

  3. The way I describe Rete in a technical nutshell is that it’s an exhaustive partial inference cache. Of course, that doesn’t help most non-programmers. 😉

    @Kenny, regardless of speed, and regardless of changing (stateful) facts, the largest advantage of rule-writing is to be able to express, understand, and *maintain* complex tests much easier than if they’re baked into an imperative code block.

    1. David,

      No argument on expressiveness of rules. It is indeed the largest benefit. 🙂

      One of the few examples I can think of that benefits from inference is aggregation. For example, I want to reward my son if he gets A for at least 2 subjects on his report card. In inference, I can write:

      If Subject.Grade = A Then increment counter
      If counter >= 2 Then reward

      Same logic in sequential will probably be:

      If number of subjects with grade A >= 2 Then reward

      Then, the aggregation will be implemented in imperative code, behind the scenes, outside the rules.

      Is it what you had in mind?
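As an editorial aside, the two styles in this exchange can be sketched in Python. The report-card data is hypothetical; this only contrasts incremental counting with a one-shot aggregate:

```python
# Hypothetical report card: subject -> grade.
grades = {"math": "A", "science": "A", "art": "B"}

# "Inference" style: a counter incremented fact by fact,
# with the reward decision triggered by the counter.
counter = 0
for subject, grade in grades.items():
    if grade == "A":
        counter += 1
reward_inference = counter >= 2

# Declarative style: the aggregation computed in one expression.
reward_declarative = sum(1 for g in grades.values() if g == "A") >= 2

print(reward_inference, reward_declarative)  # True True
```

Both reach the same decision; the difference is where the aggregation lives — inside the rules as incremental state, or outside them as a pre-computed value.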



      1. (Just joined group, so I may need to catch up on some terminology used here.)

        To me, saying:
        If count (grade of subject == A) > 2 Then reward
        is Declarative.

        If grade of subject == A Then increment Counter
        If Counter > 2 Then reward
        is Procedural

        At some level they are the same but the second approach seems to me like a domain expert asked to think like a procedural programmer.

        I’m probably not quite understanding your point or being nit-picky. I’ll have to read more in the group for the terminology on inference, imperative, aggregation, etc.

  4. Interesting discussion.

    People who use a rule engine do not need to know in detail the algorithm behind it, whether it is RETE, one of its variations, or another one. When a developer is requested to optimize a rule base, then he/she may need to understand the execution behind the scenes and play with different optimization factors.

    To complement, I think that there are 2 important aspects in using a rule engine (an inference engine):

    The first is the “programming paradigm”. You can add rules in any order; you can add exceptional rules. The rules and the data are not (or only weakly) coupled; rules become an aspect of the data processing. The rule base provides you with reactive processing of the changes that occur on the data.

    Then there is the level of the “rule language”. There is of course a question of expressiveness, or the power of the rule language, and the question of whether the rules are easy to understand by reading them. In this regard, we can accept that the language is a bit technical, such as the usage of “.” for fields, while staying relatively understandable. We do not necessarily have to turn the whole rule into a smooth English sentence (which, BTW, can often turn out to be fuzzy). A rule language like the old OPS is not desirable either; it was concise, but too cryptic. (The reverse question being “how to parse natural language text to infer rules”, which is a very pertinent and very challenging question.)


  5. For a retired engineer, this was old home week. I’ve worked with forward-chaining and backward-chaining engines. Rule-based solutions, and understanding how the RETE network operated and what influenced it, were a big focus for me. This was a BIG thing in 1986. Dave Ferucci (RPI) worked on one of our projects and I learned a lot from my grad student. Sadly, this was at a time when a corporation (Digital) was down-sizing. Just found this thread and am enjoying the discoveries.
