One of the subjects that frequently comes up when considering decision engines is performance, and, more broadly, the performance characterization of decisions: how do decision engines cope with high throughput, low response, high concurrency scenarios?
In fact, the whole world of forward-chaining business rules engines (based on the RETE algorithm) was able to develop largely because of the capability of that algorithm to cope with some of the scalability challenges that were seen. If you want to know more about the principles of RETE, see here.
However, there is significantly more to decision engine performance than the RETE algorithm.
A modern decision engine, such as SMARTS;, will be exercised in a variety of modes:
- Interactive invocations where the caller sends data or documents to be decided on, and waits for the decision results
- Batch invocations where the caller streams large data or sets of documents through the decision engine to get the results
- Simulation invocations where the caller streams large data and sets of documents through the decision to get both the decision results and decision analytics computations made on them
Let’s first look at performance as purely execution performance at the decision engine level.
The SMARTS; decision engine allows business users to implement decisions using a combination of decision logic representations:
- Decision flows
- Rules groups in rule sets
- Lookup models
- Predictive models
These different representations provide different perspectives of the logic, and the most optimal representation for implementing, reviewing, and optimizing the decision. For example, SMARTS allows you to cleanly separate the flow of control within your decision from your smaller decision steps – check the document data is valid first, then apply knock out rules, then etc.
However, SMARTS does also something special for these representations: it executes them with dedicated engines tuned for high performance for their specific task. For example, if you were to implement a predictive model on top of a rules engine, your result will typically be sub-par. However, in SMARTS, each predictive model is executed by a dedicated and optimized execution engine.
Focusing on what is traditionally called business rules, SMARTS provides:
- A compiled sequential rules engine
- A compiled Rete-NT inference rules engine
- A fully indexed lookup model engine
These are different engines, and apply to different use cases, and they are optimized for their specific application.
Compiled Sequential Rules Engine
This engine simply takes the rules in the rule set, orders them by explicit or implicit priority, and evaluates the rule guards and premises in the resulting order. Once a rule has its guard and premise evaluated to true, it fires. If the rule set is exclusive, the rule set evaluation is over, and if not, the next rule in the ordered list is evaluated.
There is a little bit more than that to it, but that’s the gist.
The important point is that there is no interpreter involved – this is executed in code compiled to the bytecode of the chosen architecture (Java or .NET or .NET Core). So, this executes at bytecode speed and gets optimized by the same JITs as any code in the same architecture.
This yields great performance when the number of transactions is very large, and the average number of rules evaluated (i.e. having their guards and premises evaluated) is not very large. For example, we’ve implemented batch fraud systems processing over 1 billion records for fraud in less than 45 minutes on 4 core laptops.
When the number of rules becomes very large, in the 100K+ in a single rule set, then the cost of evaluating premises that do not match starts getting too high. Our experience is that is very likely that with that number of rules your problem is in fact a lookup problem and would be better served by a lookup model(As an aside, lookup models also provide superior management for large numbers of rules). If that is not the case, then a Rete-NT execution would be better.
Compiled Rete-NT Inference Rules Engine
This engine takes the rules within the rule set and converts them to a network following the approach described in this blog post. What this approach does is revert the paradigm – in RETE, the data is propagated through the network, and the rules ready to fire are more optimally found and put in an agenda. The highest priority rule in the agenda is retrieved, and executed. The corresponding changes get propagated into the network, the agenda updated, etc., until there is nothing left in the agenda.
One important distinction with respect to the sequential engine is that in the case of the Rete-NT engine, a rule that already fired may well be put back in the agenda. This capability is sometimes required by the problem being solved.
Again, there is much more to it, but this is the principle.
SMARTS implements the Rete-NT algorithm – which is the latest optimized version of the algorithm provided by its inventor, Charles Forgy, who serves on the Sparking Logic Advisory Board. RETE-NT has been benchmarked to be between 10 and 100 times faster than previous versions of the algorithm in inference engine tests. In addition, SMARTS compiles to the bytecode of the chosen architecture everything that is not purely the algorithm, allowing all expressions to be evaluated at bytecode speed.
In the case where your number of rules per rule set is very large, in the 100K+ range, and you are not dealing with a lookup model, the RETE-NT engine yields increasingly better performance compared to the sequential engine. SMARTS has been used with 200k+ rules in rule sets – these rules end up exercising 100s of fields in your object model, and the Rete-NT advantages make these rule sets perform better than the pure sequential engine.
Fully Indexed Lookup Model Engine
There are numerous cases where what a rule set with a large number of rules is doing is selecting out of a large number of possibilities. For example, going through a list of medication codes to find those matching parametrized conditions.
In many systems, that is done outside a decision engine, but in some cases, it makes sense to make it part of the decision. For example, when it is managed by the same people and at the same time, it is intimately tied to the rest of the decision logic, and it needs to go through its lifecycle exactly like the rules.
SMARTS provides a Lookup Model specifically for those cases: you specify the various possibilities as data, and then a query that is used to filter from the data the subset that matches the parameters being used. At a simplified level, this engine works by essentially doing what database engines do: indexing the data as much as possible and convert the query into a search through indexes. Because the search is fully indexed, scalability with the number of possibilities is great, and superior to what you can get with the Sequential or Rete-NT engine.
The SMARTS decision engine can also run concurrent branches of the decision logic, implemented in any combination of the engines above. That allows a single invocation to the decision engine to execute on more than 1 core of your system. There are heavy computation cases in which such an approach yields performance benefits.
Of course, performance is also impacted by deployment architecture choices. We’ll explore that topic in the next post in this series.
Integration with data is key to a successful decision application: Decision Management Systems (DMS) benefit from leveraging data to develop, test and optimize high value decisions.
This blog post focuses on the usage of data by the DMS for the development, testing and optimization of automated decisions.
Read More »
A key benefit of using a Decision Management System is to allow the life-cycle of automated decisions to be fully managed by the enterprise.
When the decision logic remains in the application code, it becomes difficult to separate access to decision logic code from the rest. For example, reading through pages of commit comments to find the ones relevant to the decision is close to impossible. And so is ensuring that only resources with the right roles can modify the logic.
Clearly, this leads to the same situation you would be in if your business data were totally immersed in the application code. You would not do that for your business data, you should not do that for your business decision logic for exactly the same reasons.
Decision Management Systems separate the decision logic from the rest of the code. Thus, you get the immense benefit of being able to update the decision logic according to the business needs. But the real benefit comes when you combine that with authentication and access control:
- you can control who has access to what decision logic asset, and for what purpose
- and you can trace who did what to which asset, when and why
Of course, a lot of what is written here applies to other systems than Decision Management Systems. But this is particularly important in this case.
Roles and access control
The very first thing to consider is how to control who has access to what in the DMS. This is access control — but note that we also use authorization as an equivalent term.
In general, one thinks of access control in terms of roles ans assets. Roles characterize how a person interacts with the assets in the system.
And the challenge is that there are many roles involved in interacting with your automated decision logic. The same physical person may fill many roles, but those are different roles: they use the decision management system in different ways. In other words, these different roles have access to different operations on different sets of decision logic assets.
Base roles and access control needs
Typically, and this is of course not the only way of splitting them, you will have roles such as the following:
The administrator role administers the system but rarely is involved in anything else. In general, IT or operations resources are those with this role.
- Decision definer
The decision definer role is a main user role: this role is responsible for managing the requirements for the automated decision and its expected business performance. Typically, business owners and business analysts are assigned this role.
- Decision implementer
The decision implementer role is the other main user role: this role designs, implements, tests and optimizes decisions. Generally, business analysts, data analysts or scientists, decision owners, and sometimes business-savvy IT resources are given this role.
- Decision tester
The decision tester role is involved in business testing of the decisions: validating they really do fit what the business needs. Usually, business analysts, data analysts and business owners fill this role.
- Life-cycle manager
The life-cycle manager role is responsible for ensuring that enterprise-compliant processes are followed as the decision logic assets go from requirements to implementation to deployment and retirement.
More advanced needs
There may be many other roles, and the key is to realize that how the enterprise does business impacts what these roles may be. For example, our company has a number of enterprise customers who have two types of decision implementer roles:
- General decision implementer: designs, implements the structure of the decision and many parts of it, tests and optimizes it
- Restricted decision implementer: designs and implements only parts of the decision — groups of rules, or models
The details on what the second role can design and implement may vary from project to project, etc.
Many other such roles may be defined: those who can modify anything but the contract between the automated decision and the application that invokes, etc.
It gets more complicated: you may also need to account for the fact that only specific roles can manage certain specific assets. For example, you may have a decision that incorporates a rate computation table that only a few resources can see, although it is part of what the system manages and executes.
Requirements for the Decision Management System
Given all this, the expectation is that the DMS support directly, or through an integration with the enterprise systems, the following:
- Role-based access control to the decision logic asset
- And ability to define custom roles to fit the needs of the enterprise and how it conducts its business
- And ability to have roles that control access to specific operations on specific decision logic assets
This can be achieved in a few ways. In general:
- If all decision assets are in a system which is also managed by the enterprise authentication and access control system: you can directly leverage it
- And if that is not the case: you delegate authentication and basic access control to the enterprise authentication and access control system, and manage the finer-grained access control in the DMS, tied to the external authentication
Of course, roles are attached to a user, and in order to guarantee that the user is the right one, you will be using an authentication system. There is a vast number of such systems in the enterprise, and they play a central role in securing the assets the enterprise deals with.
The principle is that for each user that needs to have access to your enterprise systems, you will have an entry in your authentication system. Thus, the authentication system will ensure the user is who the user claims, and apply all the policies the enterprise wants to apply: two-factor authentication, challenges, password changes, etc. Furthermore, it will also control when the user has access to the systems.
This means that all systems need to make sure a central system carries out all authentications. And this includes the Decision Management System, of course. For example:
- The DMS is only accessible through another application that does the proper authentication
- Or it delegates the authentication to the enterprise authentication system
The second approach is more common in a services world with low coupling.
Requirements for the Decision Management System
The expectation is that the DMS will:
- Delegate its authentication to the enterprise authentication and access control systems
- Or use the authentication information provided by an encapsulating service
Vendors in this space have the challenge that in the enterprise world there are many authentication systems, each with potentially more than one protocol. Just in terms of protocols, enterprises use:
- OpenID Connect
- and more
Additionally, enterprises are interested in keeping a close trace of who does what and when in the Decision Management System. Of course, using authentication and the fact that users will always operate within the context of an authenticated session largely enables them to do so.
But this is not just a question of change log: you also want to know who has been active, who has exported and imported assets, who has generated reports, who has triggered long simulations, etc.
Furthermore, there are three types of usages for these traces:
- Situational awareness: you want to know what has been done recently and why
- Exception handling: you want to be alerted if a certain role or user carries out a certain operation. For example, when somebody updates a decision in production.
- Forensics: you are looking for a particular set of operations and want to know when, who and why. For example, for compliance verification reasons.
A persisted and query-able activity stream provides support for the first type of usage. And an integration with the enterprise log management and communication management systems support the other types of usages.
Requirements for the Decision Management System
The expectation is that the DMS will:
- Provide an activity stream users can browse through and query
- And support an integration with the enterprise systems that log activity
- And provide an integration with the enterprise systems that communicate alerts
There are many more details related to these authentication, access control and trace integrations. Also, one interesting trend is the move towards taking all of these into account for the beginning as the IT infrastructure moves to the models common in the cloud, even when on-premise.
This blog is part of the Technical Series, stay tuned for more![Image Designed by security from Flaticon]
Decision Management and Business Rules Management platforms cater to the needs of business oriented roles (business analysts, business owners, etc.) involved in operational decisions. But they also need to take into account the constraints of the enterprise and its technology environment.
Among those constraints are the ones that involve integrations. This is the first series of posts exploring the requirements, approaches and trade-offs for decision management platform integrations with the enterprise eco-system.
Operational decisions do not exist in a vacuum. They
- are embedded in other systems, applications or business processes
- provide operational decisions that other systems carry out
- are core contributors to the business performance of automated systems
- are critical contributors to the business operations and must be under tight control
- must remain compliant, traced and observed
- yet must remain flexible for business-oriented roles to make frequent changes to them
Each and every one of these aspects involves more than just the decision management platform. Furthermore, more than one enterprise system provides across-application support for these. Enterprises want to use such systems because they reduce the cost and risk involved in managing applications.
For example, authentication across multiple applications is generally centralized to allow for a single point of control on who has access to them. Otherwise, each application implements its own and managing costs and risk skyrocket.
In particular, decision management platforms end up being a core part of the enterprise applications, frequently as core as databases. It may be easy and acceptable to use disconnected tools to generate reports, or write documents; but it rarely is acceptable to not manage part of core systems. In effect, there is little point in offering capabilities which cannot cleanly fit into the management processes for the enterprise; the gain made by giving business roles control of the logic is negated by the cost and risk in operating the platform.
In our customer base, most do pay attention to integrations. Which integrations are involved, and with which intensity, depends on the customer. However, it is important to realize that the success of a decision management platform for an enterprise also hinges on the quality of its integrations to its systems.
Which integrations matter?
We can group the usual integrations for decision management platforms in the following groups:
- Authentication and Access Control
- Implementation Support
- Management Audit
- Life-cycle management
- Execution Audit
- Business Performance Tracking
Authentication and access control integrations are about managing which user has access to the platform, and, beyond that, to which functionality within the platform.
Implementation support integrations are those that facilitate the identification, implementation, testing and optimization of decisions within the platform: import/export, access to data, etc.
Management audit integrations enable enterprise systems to track who has carried out which operations and when within the platform.
Life-cycle management integrations are those that support the automated or manual transitioning of decisions through their cycles: from inception to implementation and up to production and retirement.
Similarly, execution integrations enable the deployment of executable decisions within the context of the enterprise operational systems: business process platforms, micro-services platforms, event systems, etc. Frequently, these integrations also involve logging or audit systems.
Finally, performance tracking integrations are about using the enterprise reporting environment to get a business-level view of how well the decisions perform.
Typically, different types of integrations interest different roles within the enterprise. The security and risk management groups will worry about authentication, access control and audit. The IT organization will pay attention to life-cycle management and execution. Business groups will mostly focus on implementation support and performance tracking.
The upcoming series of blog posts will focus on these various integrations: their requirements, their scope, their challenges and how to approach them.
In the meantime, you can read the relevant posts in the “Best Practices” series:
A little while ago, I ran into a question in Quora that hit me in the stomach… figuratively, of course. Someone asked “why do rules engines fail to gain mass adoption?“. I had mixed feelings about it. In one hand, I am very proud of our decision management industry, and how robust and sophisticated our rules engines have become. In the other hand, I must admit that I see tons of projects not using this technology that would help them so much. I took a little time to reflect on the actual roadblocks to rules engines.
A couple of points I want to stress first
In addition to the points I make below, with a little more time to think about it, I think it boils down to evangelization. We, in the industry, have not been doing a good job educating the masses about the value of the technology, and its ease of use. We rarely get visibility up the CxO level. Business rules is never one of the top 10 challenges of executives, though it might be in disguise. We need to do a better job.
I’m signing up, with my colleagues, for an active webinar series, so that we can address one of the roadblocks to rules engines, and decision management!
Rules are so important, they are already part of platforms
The other key aspect to keep in mind is that business rules are so important in systems that they often become a de-facto component of the ecosystem. Business rules might be used under the form of BPM rules or other customization, but not called out as a usage for rules engines. Many platforms will claim they include business rules. The capability might be there, though it may not be as rich as a decision management system. Many vertical platforms like Equifax’s InterConnect platform include a full-blown decision management system though. When decision makers have to allocate the budget for technology, this becomes another of the roadblocks to rules engines, as they assume that rules are covered by the platform. Sometimes they are right, often not.
Rules in code is not a good idea
Let me stress once more that burying your rules into code or SQL procedures is not a good idea. It is one of the roadblocks to rules engines excuse we have heard probably the most. I explain down below that this is tempting for software developers to go back to their comfort zone. This is not sustainable. This is not flexible. We did that many decades ago as part of Cobol system, mostly because decision management systems did not exist back then. We suffered with maintenance of these beasts. In many occurrences, the maintenance was so painful that we had to patch the logic with pre-processing or post-processing to avoid touching the code. We have learned from these days that logic, when it is complex and/or when it changes decently often, needs to be externalized.
Business owners do not want to go to IT and submit a change request. They want to be able to see the impact of a change before they actually commit to the change. They want the agility to tweak their thresholds or rate tables within minutes or hours, not days and weeks. While there is testing needed for rules like for any software, it is much more straight-forward as it does not impact the code. It is just about QA testing and business testing.
Here is my answer:
I have been wondering the same thing. Several decades ago, I discovered expert systems at school, in my AI class. I fell in love with them, and even more so with rules engines as they were emerging as commercial products.
While coding is more powerful and intuitive than it used to be, the need is still there to make applications more agile. Having software developers change code is certainly more painful than changing business logic in a separate component, ie the decision service.
Some argue that the technology is difficult:
- ability to find issues
Is the technology too difficult to use?
Because of syntax
I can attest that writing LISP back in the days was nothing ‘intuitive’. Since then, thanks God, rules syntax has improved, as programming syntax did too. Most business analysts I have worked with have found the syntax decently understandable (except for the rare rules engine that still use remnants of OPS5). With a little practice, the syntax is easily mastered.
Furthermore, advances in rules management have empowered rules writers with additional graphical representations like decision tables, trees, and graph. At Sparkling Logic, we went a step further to display the rules as an overlay on top of transaction data. This is as intuitive as it gets in my opinion.
Because of debugging
The second point seems more realistic. When rules execute, they do not follow a traditional programmatic sequence. The rules that apply simply fire. Without tooling, you might have to have faith that the result will be correct. Or rely on a lot of test case data! Once again, technology has progressed and tooling is now available to indicate which rules fired for a given transaction, what path was taken (in the form of an execution trace), etc. For the savvy business analyst, understanding why rules did or did not execute has become a simplistic puzzle game… You just have to follow the crumbs.
So why are rules not as prevalent as they should be?
Is the technology too easy to not use?
Because of ownership
I am afraid to say that IT might be responsible. While it is now a no-brainer to delegate to a database for storage, and for some other commonly accepted components in the architecture for specialized functions, it remains a dilemma for developers to let business analysts take ownership. You need to keep in mind that business rules are typically in charge of the core of the system: the decisions. If bad decisions are made, or if no decision can be made, some heads will roll.
Because of budget
Even if the management of decisions is somewhat painful, and inflexible, it is a common choice to keep these cherished rules inside the code, or externalized in a database or configuration file. The fact that developers have multiple options to encode these rules is certainly not helping. First, they can see it as their moment of fun, for those that love to create their own framework (and there are plenty of them). Second, it does not create urgency for management to allocate a budget. It ends up being a build vs. buy decision.
Without industry analysts focused exclusively on decision management, less coverage by publications, and less advertisement by the tech giants, the evangelization of the technology is certainly slower than it deserves to be.
Yet, it should be used more…
Because of Decision Analytics
I would stress that the technology deserves to be known, and used. Besides agility and flexibility (key benefits of the technology), there is more that companies could benefit from. In particular, decision analytics are on top of my list. Writing and executing rules is clearly an important aspect. But I believe that measuring the business performance is also critical. With business rules, it is very easy to create a dashboard for your business metrics. You can estimate how good your rules are before you deploy them, and you can monitor how well they perform on a day-to-day basis.
Because of ease of integration
For architects, there are additional benefits too in terms of integration. You certainly do not want to rewrite rules for each system that needs to access them. Rules engines deployed as a component can be integrated with the online and batch system, and any changing architecture, without any rewrite, duplication, or any nightmare.
With that in mind, I hope that you will not let these roadblocks to rules engines stop you. There are plenty of reasons to consider decision management systems, or rules engines as they are often called. You will benefit greatly:
- Flexibility to change your decision logic as often and as quickly as desired
- Integration and deployment of predictive analytics
- Testing from a QA and business perspective
- Measure business performance in sand-box and in production
- and yet, it will integrate beautifully in your infrastructure
Our Best Practices Series has focused, so far, on authoring and lifecycle management aspects of managing decisions. This post will start introducing what you should consider when promoting your decision applications to Production.
Make sure you always use release management for your decision
Carole-Ann has already covered why you should always package your decisions in releases when you have reached important milestones in the lifecycle of your decisions: see Best practices: Use Release Management. This is so important that I will repeat her key points here stressing its importance in the production phase.
You want to be 100% certain that you have in production is exactly what you tested, and that it will not change by side effect. This happens more frequently than you would think: a user may decide to test variations of the decision logic in what she or he thinks is a sandbox and that may in fact be the production environment.
You also want to have complete traceability, and at any point in time, total visibility on what the state of the decision logic was for any decision rendered you may need to review.
Everything they contributes to the decision logic should be part of the release: flows, rules, predictive and lookup models, etc. If your decision logic also includes assets the decision management system does not manage, you open the door to potential execution and traceability issues. We, of course, recommend managing your decision logic fully within the decision management system.
Only use Decision Management Systems that allow you to manage releases, and always deploy decisions that are part of a release.
Make sure the decision application fits your technical environments and requirements
Now that you have the decision you will use in production in the form of a release, you still have a number of considerations to take into account.
It must fit into the overall architecture
Typically, you will encounter one or more of the following situations
• The decision application is provided as a SaaS and invoked through REST or similar protocols (loose coupling)
• The environment is message or event driven (loose coupling)
• It relies mostly on micro-services, using an orchestration tool and a loose coupling invocation mechanism.
• It requires tight coupling between one (or more) application components at the programmatic API level
Your decision application will need to simply fit within these architectural choices with a very low architectural impact.
One additional thing to be careful about is that organizations and applications evolve. We’ve seen many customers deploy the same decision application in multiple such environments, typically interactive and batch. You need to be able to do multi-environment deployments a low cost.
It must account for availability and scalability requirements
In a loosely coupled environments, your decision application service or micro-service with need to cope with your high availability and scalability requirements. In general, this means configuring micro-services in such a way that:
• There is no single point of failure
○ replicate your repositories
○ have more than one instance available for invocation transparently
• Scaling up and down is easy
Ideally, the Decision Management System product you use has support for this directly out of the box.
It must account for security requirements
Your decision application may need to be protected. This includes
• protection against unwanted access of the decision application in production (MIM attacks, etc.)
• protection against unwanted access to the artifacts used by the decision application in production (typically repository access)
Make sure the decision applications are deployed the most appropriate way given the technical environment and the corresponding requirements. Ideally you have strong support from your Decision Management System for achieving this.
Leverage the invocation mechanisms that make sense for your use case
You will need to figure out how your code invokes the decision application once in production. Typically, you may invoke the decision application
• separately for each “transaction” (interactive)
• for a group of “transactions” (batch)
• for stream of “transactions” (streaming or batch)
Choosing the right invocation mechanism for your case can have a significant impact on the performance of your decision application.
Manage the update of your decision application in production according to the requirements of the business
One key value of Decision Management Systems is that with them business analysts can implement, test and optimize the decision logic directly.
Ideally, this expands into the deployment of decision updates to the production. As the business analysts have updated, tested and optimized the decision, they will frequently request that it be deployed “immediately”.
Traditional products require going through IT phases, code conversion, code generation and uploads. With them, you deal with delays and the potential for new problems. Modern systems such as SMARTS do provide support for this kind of deployment.
There are some key aspects to take into account when dealing with old and new versions of the decision logic:
• updating should be a one-click atomic operation, and a one-API call atomic operation
• updating should be safe (if the newer one fails to work satisfactorily, it should not enter production or should be easily rolled back)
• the system should allow you to run old and new versions of the decision concurrently
In all cases, this remains an area where you want to strike the right balance between the business requirements and the IT constraints.
For example, it is possible that all changes are batched in one deployment a day because they are coordinated with other IT-centric system changes.
Make sure that you can update the decisions in Production in the most diligent way to satisfy the business requirement.
Track the business performance of your decision in production
Once you have your process to put decisions in the form of releases in production following the guidelines above, you still need to monitor its business performance.
Products like SMARTS let you characterize, analyze and optimize the business performance of the decision before it is put in production. It will important that you continue with the same analysis once the decision is in production. Conditions may change. Your decisions, while effective when they were first deployed, may no longer be as effective after the changes. By tracking the business performances of the decisions in production you can identify this situation early, analyze the reasons and adjust the decision.
In a later installment on this series, we’ll tackle how to approach the issue of decision execution performance as opposed to decision business performance.
Let’s continue with our series on best practices for your decision management projects. We covered what not to do in rule implementation, and what decisions should return. Now, let’s take a step back, and consider how to think about decisions. In other words, I want to focus on the approaches you can take when designing your decisions.
Think about decisions as decision flows
The decision flow approach
People who know me know that I love to cook. To achieve your desired outcome, recipes give you step by step instructions of what to do. This is in my opinion the most natural way to decompose a decision as well. Decision flows are recipes for making a decision.
In the early phases of a project, I like to sit down with the subject matter experts and pick their brain on how they think about the decision at hand. Depending on the customer’s technical knowledge, we draw boxes using a whiteboard or Visio, or directly within the tool. We think about the big picture, and try to be exhaustive in the steps, and sequencing of the steps to reach our decision. In all cases, the visual aid allows experts who have not prior experience in decision management design to join in, and contribute to the success of the project.
What is a decision flow
In short, a decision flow is a diagram that links decision steps together. These links could be direct links, or links with a condition. You may follow all the links that are applicable, or only take the first one that is satisfied. You might even experiment on a step or two to improve your business performance. In this example, starting at the top, you will check that the input is valid. If so, you will go through knock-off rules. If there is no reason to decline this insurance application, we will assess the risk level in order to rate it. Along the way, rules might cause the application to be rejected or referred. In this example, green ball markers identify the actual path for the transaction being processed. You can see that we landed in the Refer decision step. Heatmaps also show how many transactions flow to each bucket. 17% of our transactions are referred.
Advantages of the decision flow approach
The advantage of using this approach is that it reflects the actual flow of your transactions. It mirrors the steps taken in a real life. It makes it easy to retrace transactions with the experts and identify if the logic needs to be updated. Maybe the team missed some exotic paths. maybe the business changed, and the business rules need to be updated. When the decision flow links to actual data, you can use it also as a way to work on your strategies to improve your business outcome. If 17% referral rate is too high, you can work directly with business experts on the path that led to this decision and experiment to improve your outcome.
Think about decisions as dependency diagrams
A little background
In the early days of my career, I worked on a fascinating project for the French government. I implemented an expert system that helped them diagnose problems with missile guidance systems. The experts were certainly capable of layout of the series of steps to assess which piece of equipment was faulty. However, this is not how they were used to think. Conducting all possible tests upfront was not desirable. First, there was a cost to these tests. But more importantly, every test could cause more damage to these very subtle pieces of engineering.
As it was common back then in expert systems design, we thought more in a “backward chaining” way. That means that we reversed engineered our decisions. We collected evidences along the way to narrow down the spectrum of possible conclusions.
If the system was faulty, it could be due to the mechanical parts or to the electronics onboard. If it was mechanical, there were 3 main components. To assess whether it was the first component, we could conduct a simple test. If the test was negative, we could move on to the second component. Etc.
In the end, thinking about dependencies was much more efficient than a linear sequence, for this iterative process.
The dependency diagram approach
Today, the majority of the decision management systems might pale in sophistication compared to this expert system. But the approach taken by experts back then is not so different from the intricate knowledge in the head of experts nowadays in a variety of fields. We see on a regular basis projects that seem better laid out in terms of dependencies. Or at least, it seems more natural to decompose them this way to extract this precious knowledge.
What is a dependency diagram
A dependency diagram starts with the ultimate decision you need to make. The links do not illustrate sequence, as they do in the decision flows. Rather, they illustrate dependencies obviously, showing what input or sub-decision needs to feed into the higher level decision. In this example, we want to determine the risk level, health-wise, of a member in a wellness program. Many different aspects feed into the final determination. From a concrete perspective, we could look at obesity, blood pressure, diabetes, and other medical conditions to assess the current state. From a subjective perspective, we could assess aggravating or improving factors like activity and nutrition. For each factor, we would look at specific data points. Height and weight will determine BMI, which determines obesity.
Similarly to the expert system, there is no right or wrong sequence. Lots of factors help make the final decision, and they will be assessed independently. One key difference is that we do not diagnose the person here. We can consider all data feeds to make the best final decision. Branches are not competing in the diagram, they contribute to a common goal. The resulting diagram is what we call a decision model.
Advantages of the dependency diagram approach
Dependency diagrams are wonderful ways to extract knowledge. As you construct your decision model, you decompose a large problem into smaller problems, for which several experts in their own domain can contribute their knowledge. When decisions are not linear, and the decision logic has not yet been documented, this is the right approach.
This approach is commonly used in the industry. OMG has standardized the notation under the “DMN” label, which stands for Decision Model and Notation. This approach allows you to harvest knowledge, and document source rules.
Choose the approach that is best for you
Decision flows are closest to an actual implementation. In contrast, dependency diagrams, or decision models, focus on knowledge. But they feed straight into decision management systems. In the end, think about decisions in the way that best fits your team and project. The end result will translate into an executable decision flow no matter what.
Now that we covered some basics about what not to do when writing rules, I would like to focus on what to do. The first aspect that comes to mind is the very basics of making decisions: the decision you make.
It is a common misconception that decision services make only one decision. There is of course a leading decision that sets a value to true, to a string or compute a numerical value. Examples are:
- approving a loan application
- validating an insurance claim
- flagging a payment transaction as fraudulent
A decision service typically makes that key decision, but it will often come with other sub-decisions. For example:
- risk assessment…
- best mortgage product…
- payment amount…
- likely type of fraud…
Making decisions or not
In my career, I have seen projects in many different industries. They applied to totally different types of decisions. I heard a few times, now and then, the desire to simplify the decision outcome. Just making a decision ‘in some cases’ is not enough though. Don’t assume that no decision equates to the opposite decision. This over-simplification is not a good design. Your decision services need to make decisions.
Business rules often flag a negative outcome, such as a decline decision, a suspicion of fraudulent activity, etc. It might be tempting to only respond when the adverse action occurs. If the person is not declined or referred, then he or she is approved. Why state it explicitly, when it is an obvious truth?
Your decision service, in such a design, would return “Declined” when the applicant is declined, and nothing when he or she is not.
There is no wrongdoing in flagging bad situations. I personally believe strongly in affirmatively stating the positive outcome as well. Your service eventually determines that your application is declined. It should respond that the application was approved as well, when applicable. It seems intuitive to focus on this negative outcome if that is what your decision service is designed to do. But it is also possible that the applicant could not be vetted due to lacking information.
Don’t let any doubt as to what your decision service decides.
Why is it useful?
You could assume that ‘no bad decision’ is equivalent to a ‘good decision’ of course. I find it brittle though. When an unexpected situation arises, assuming you reached a positive outcome can hurt your business. If your decision logic failed in the middle, don’t let it go unnoticed.
Don’t infer from this statement that I do not trust business rules. Clearly, business rules will faithfully reproduce the decision logic you authored. I am concerned about data changing over time. Established decision logic can change over time too, and often will.
Do not take for granted that all possible paths are covered. Today, your rules might check that all the mandatory pieces of information are provided. Let’s assume your decision service requires a new piece of data tomorrow. If the business analyst in charge forgets to check for it up front, you will reach a no decision when this data is missing. You will end up with many approved applications that were invalid to start with.
In simple terms, I highly recommend preparing for the unexpected, by not allowing it to happen without notice. Always state clearly the decision you reach. Add a rule to set the status to “Approved” if the application is not “Declined” or “Referred”. If and when you encounter these no decision situations, just update your decision logic accordingly. Plus, you significantly increase your chances to catch these problems at QA time. You don’t want to scramble after deployment of your decision logic into Production. That is the 101 rule about making decisions.