Using the if-then-else framework, Part 3

Enhance the framework to support large-scale projects

The if-then-else framework is a tool for coding complex branching logic in a maintainable way. In the first two parts of this series, I described how to use the framework in a simple but typical case. The purpose of this final installment is to study the problems that arise when you place more significant demands on the framework. I will show how its performance starts to bog down as you increase the number of conditions, rules, and actions. I will then identify the performance bottlenecks, introduce a sequencing mechanism that will automate bookkeeping, and introduce an automated consistency checker that will throw an exception if inconsistent rules have been loaded. These modifications will not alter the fundamental steps for using the framework, already described in Parts 1 and 2. The end result will be a tool for handling branching logic that is industry ready and much more efficient.

TEXTBOX:

TEXTBOX_HEAD: Using the if-then-else framework: Read the whole series!

:END_TEXTBOX

For convenience, Listing 1 below displays the implementation of the if-then-else framework for the example used in Parts 1 and 2. I suggest you review the use of Conditions, Actions, Rules, the Updateable interface, and the Invoker subclass before diving into the rest of the discussion. (You may also wish to compare my approach to coding branching logic with the Hashed Adapter Objects design pattern, discussed in Mark Grand’s book on design patterns — see Resources for more information.)

Listing 1. The URLProcessor_good class with user-defined inner classes

class URLProcessor_good extends URLProcessor implements logic.Updateable {
    static final String URL_STR = "urlString";
    URLProcessor_good(){
        super();
    }
    void decideUrl() {
        try {
            Invoker inv = new ConcreteInvoker((Updateable)this);
            inv.execute();
        }
        catch(NestingTooDeepException ntde) {}
        catch(IllegalExpressionException iee) {}
        catch(RuleNotFoundException rnfe) {}
        catch(DataNotFoundException dnfe) {}
    }
/** Updateable interface implementation */
    public void doUpdate(Hashtable ht) {
       if(ht == null) return;
       Object ob = ht.get(Action.MAIN_KEY);
       if(ob != null && ob instanceof String) {
           setUrl((String)ob);
       }
    }
    static class OtherCondition extends Condition {
        public static final String KEY = "key";
        public OtherCondition(Hashtable ht) {
            super(ht);
        }
        public Boolean evaluate() throws DataNotFoundException {
            Object ob = null;
            if(getData() == null || (ob = getData().get(KEY)) == null) {
                throw new DataNotFoundException();
            }
            if(!(ob instanceof DataBank.Region)) {
                return Boolean.FALSE;
            }
            DataBank.Region region = (DataBank.Region)ob;
            boolean result = !region.equals(DataBank.WEST_REGION) &&
                             !region.equals(DataBank.EAST_REGION);
            return new Boolean(result);
        }
    }
    static class MemberCondition extends Condition {
        public static final String KEY = "id";
        public static final String TABLE = "table";
        public MemberCondition(Hashtable ht) {
            super(ht);
        }
        public Boolean evaluate() throws DataNotFoundException {
            Object id = null, table = null;
            if(getData() == null || 
               ((id = getData().get(KEY)) == null) || 
                ((table = getData().get(TABLE))==null) ||
                 !(table instanceof Hashtable)) {
                throw new DataNotFoundException();
            }                                
            Hashtable members = (Hashtable)table;
            return new Boolean(members.containsKey(id));
        }
    }
    public class ConcreteInvoker extends Invoker {
        public ConcreteInvoker(Updateable ud) throws NestingTooDeepException {
            super(ud);
        }
        public void loadRules() throws NestingTooDeepException {
            rules = new Rules();
            try {
                //             Conditions:                              Actions:
                //        East|West|Other|Lim|Member    
                rules.addRule("TFFT*",               new Action(ud,EAST_PRIVILEGED));
                rules.addRule("TFFF*",                    new Action(ud,EAST_NOT_PRIVILEGED));
                rules.addRule("FTFTT",               new Action(ud,WEST_MEMBER_PRIVILEGED));
                rules.addRule("FTFTF",               new Action(ud,WEST_NONMEMBER_PRIVILEGED)); 
                rules.addRule("FTFFT",               new Action(ud,WEST_MEMBER_NOT_PRIVILEGED));
                rules.addRule("FTFFF",               new Action(ud,WEST_NONMEMBER_NOT_PRIVILEGED)); 
                rules.addRule("FFT**",               new Action(ud,OTHER_REGION));
             }
             catch(NestingTooDeepException e) {
                 throw e;
             }
        }
                
        public void loadConditions() throws IllegalExpressionException {
            try {  
                conditions = new Vector(5);
                conditions.addElement(new Condition(db.getRegion(),DataBank.EAST_REGION));
                conditions.addElement(new Condition(db.getRegion(),DataBank.WEST_REGION));
                Hashtable other = new Hashtable();
                other.put(OtherCondition.KEY,db.getRegion());
                conditions.addElement(new OtherCondition(other));
                conditions.addElement(new Condition(db.LIMIT_THRESHOLD, Condition.LESS,db.getLimit() ));                         
                Hashtable mem = new Hashtable();
                mem.put(MemberCondition.TABLE, db.getWestMembers());
                mem.put(MemberCondition.KEY, db.getUserId());
                conditions.addElement(new MemberCondition(mem));
            }
            catch(IllegalExpressionException e) {
                throw e;
            }
        }
    }
}

Main issues

The following issues, which I outlined at the end of Part 2, will provide a structure for this discussion:

  1. Order of evaluation. As you can see in Listing 1, the comments in the loadRules() method spell out the intended order of evaluation. I always consider conditions in the following order, whether I am loading conditions or creating a sequence of Booleans: East, West, Other, Limit, Member.

    But what happens when the number of conditions grows to 15 or 20? You can no longer expect to rely on a difficult-to-read set of notes in your documentation as a reliable means to guarantee adherence to a particular order of evaluation. Clearly, this part of the framework begs for an automated solution that can eliminate human error.

  2. Performance. In Listing 1, you will notice that one of the exceptions that is handled in the decideURL() method (which is the point of control for handling exceptions) is the NestingTooDeepException. The framework will throw this exception whenever the number of Condition instances exceeds 30. This limitation is in place to guard against unexpectedly poor performance. When the number of conditions gets too close to 30, performance becomes unacceptably slow. You can actually force the framework to falter with as few as 12 artificially created conditions (I will show how this works shortly). A review of the framework code will reveal one main reason for performance degradation: the presence of two exponential-time algorithms. Here you need faster algorithms that will improve performance under all conditions.
  3. Consistency checking. In Part 2, I pointed out how careless use of wildcards in creating Boolean sequences can easily result in a pair of inconsistent rules. Recall that a rule is a matchup between a Boolean string and a corresponding Action instance. Typically, you add a rule to the application’s list of rules using the addRule(String boolStr, Action action) method of Invoker. Two such rules are inconsistent if the same Boolean string is matched with two or more different actions.

    It is easy to introduce inconsistent rules inadvertently when using wildcards. For instance, suppose you have Action instances action1 and action2. You add one rule that associates the string F*TT to action1 and later add another rule that associates the string FF*T to action2. Because the wildcard (*) indicates that both the true (T) and false (F) cases have to be considered, these rules implicitly associate the string FFTT to action1 in the first case and to action2 in the second. In other words, the use of wildcards in these two rules has introduced an inconsistency. The framework, as it operates now, will not detect the inconsistency, and will quietly select one of the rules to use. Here you need an automated consistency checker that will alert the user to any inconsistency as soon as it arises during the rule-loading process.

Order of evaluation

To reduce the risk of error in adhering to an evaluation order, you need to automate the procedure that guarantees that the same order that is used in loading conditions is also used in building up Boolean strings during the rule-loading phase. To do this, you need to move the order specification out of the comments section and into the code, and then use this piece of code to properly arrange the conditions and the Boolean strings as you load them.

One way to specify an order of evaluation within code is to associate the objects to be ordered with integer constants. You can then use the natural ordering of the integers to maintain your intended order of objects. To be concrete, I will create an interface of appropriately named constants that specifies an order of evaluation for my main example (see Listing 1):

public interface OrderOfEvaluation {
            final int EAST_CONDITION  = 0;
            final int WEST_CONDITION  = 1;
            final int OTHER_CONDITION = 2;
            final int LIMIT_CONDITION = 3;
            final int MEMBER_CONDITION= 4;
}

This simple device precisely specifies the following order for evaluating conditions: East, West, Other, Limit, and Member. As soon as my Invoker subclass implements the new interface, it can use these constants to preserve the order by passing them as arguments as Invoker loads up conditions and rules. To handle the logic involved and provide the necessary public methods, I provide wrappers, called sequencers, for the vector of conditions and table of rules.

I’ll begin with the implementation of a ConditionSequencer:

public class ConditionSequencer {
    Vector conditions;
    public ConditionSequencer(Vector conditions) {
        this.conditions = conditions;
    }
    public void addCondition(Condition c, int position) throws RuleNotFoundException {
        if(conditions == null) throw new RuleNotFoundException();
        if(position >= conditions.capacity()) throw new RuleNotFoundException();
        conditions.insertElementAt(c, position);
   }
}

You instantiate ConditionSequencer by passing in the vector of conditions that you need to load. You can then add conditions one by one using the addCondition() method. This method is very much like the addElement(..) method provided by the Vector class, except that it requires as a second argument the position in Vector to store the condition. This integer argument is just the corresponding interface constant. A call to addCondition(), therefore, inserts the specified condition into the vector of conditions at the specified location. A RuleNotFoundException is thrown if you fail to initialize conditions or fail to set its capacity to a sufficiently large value. The code below shows how you would begin to rewrite the loadConditions() method in Listing 1, using this new approach:

public void loadConditions() throws IllegalExpressionException, RuleNotFoundException{
    try {  
       conditions = new Vector(numConditions);
       ConditionSequencer sequencer = new ConditionSequencer(conditions);
       sequencer.addCondition(new Condition(db.getRegion(),DataBank.EAST_REGION), 
                              EAST_CONDITION);
       sequencer.addCondition(new Condition(db.getRegion(),DataBank.WEST_REGION),
                              WEST_CONDITION);
          . . .
     }
     catch(Exception e){}
}

The technique for sequencing rules is similar but slightly more complicated. Here, the crucial step in preserving the order of evaluation lies in the arrangement of characters in each Boolean string. The proper order of the characters in TFTF requires that the T at position 0 signify that condition 0 should evaluate to true, and that the F at position 1 signify that condition number 1 should evaluate to false, and so forth. You can use the interface constants to ensure that this matchup is occurring correctly by encapsulating the requirement in a small class containing both a Boolean value and the appropriate condition name. I call this class ConditionValue; a skeleton of the class appears below:

public class ConditionValue {
    private String value;
    private int position;
    //constructor
    public ConditionValue(int pos, char c); 
}

The instance variable value in the class will always be a T, F, or *. And the variable position will always store the value of the appropriate interface constant, and therefore the correct location of the value in the Boolean string.

To complete loading a rule, you need to match a sequence of ConditionValues (which specifies a Boolean string with an enforced order) with an appropriate Action. I will store such a sequence of ConditionValues in a collection type known as a Set. (I use this data type, which is an unordered collection, to emphasize that the information about order is not stored in the collection but rather in each of the ConditionValues.) Sun added a Set data type to the Java API in JDK 1.2, but I have implemented my own specialized version, which JDK 1.1 users can use as well. I have called this new class ConditionValueSet.

To associate a ConditionValueSet with an Action, I have introduced another new class, Rule, which encapsulates the notion of a rule. A Rule has two attributes: a ConditionValueSet and an Action. Here is a skeleton of the Rule class:

public class Rule {
    public Rule(ConditionValueSet cvs, Action action);
    public Action getAction();
    public ConditionValueSet getCVS();
    public void setAction(Action anAction);
    public void setCVS(ConditionValueSet aCVS);
}

Finally, you need a rule sequencer, analogous to the ConditionSequencer class, that will serve as a wrapper to Rules and will handle the logic involved in rule loading. The RuleSequencer class will let you add rules (instances of Rule) one by one with an addRule() method. This method will unpack the data stored in the Rule instance and invoke the addRule() method on Rules, ensuring that the Boolean string is assembled according to the order of evaluation specified in the component ConditionValues. Here is the skeleton of RuleSequencer:

public class RuleSequencer {
    //constructor
    public RuleSequencer(Rules rules);
    public void addRule(Rule rule);
}

Below, I show the first part of a rewrite of the loadRules() method from Listing 1 that takes into account this new approach:

public void loadRules(){
    rules = new Rules();
    RuleSequencer sequencer = new RuleSequencer(rules);
    try {
        //add rule TFFT*, EAST_PRIVILEGED
        ConditionValueSet cvs_eastPriv = new ConditionValueSet();
        cvs_eastPriv.addMember(new ConditionValue(EAST_CONDITION,'T'));
        cvs_eastPriv.addMember(new ConditionValue(WEST_CONDITION,'F'));
        cvs_eastPriv.addMember(new ConditionValue(OTHER_CONDITION,'F'));
        cvs_eastPriv.addMember(new ConditionValue(LIMIT_CONDITION,'T'));
        cvs_eastPriv.addMember(new ConditionValue(MEMBER_CONDITION,'*'));
        sequencer.addRule(new Rule(cvs_eastPriv, new Action(ud,EAST_PRIVILEGED)));
       . . .
    }
    catch(Exception e){}
}

As you can see, this new approach involves one additional step at the top level of loadRules(). In addition to creating a new Rules instance, you also create a new RuleSequencer. It is then straightforward, albeit a little tedious, to create one ConditionValue for each of the Conditions being considered, associating each with a truth value. You add these ConditionValues one by one to a ConditionValueSet, associate this set with the appropriate Action by creating a new Rule instance, and then add this rule to the RuleSequencer.

For lack of space, I have not explained all the backend details of processing these steps; you can find these in the code itself in Resources.

By automating the enforcement of an order of evaluation, you avoid the pitfalls of relying on a specification in the code’s comments section. The price you pay for this solution is an increased load of coding, particularly for loading rules. If you had 100 or more rules (a not uncommon occurrence), the increased coding demands would certainly become too great.

When you have a large number of rules, it is often possible to automate the rule-loading process. With many rules, you often have relatively few actions, and this lets you load rules by executing several loops, one for each action. I have given a fairly simple example of larger-scale rule loading in my test environment (see the Main class in the sampleProfile package in the new code file in Resources).

To add extra flexibility to the Invoker class’s initialization (to initialize additional user-defined classes that you might use for automated rule loading), I have slightly changed the way you use the framework, whether or not you use sequencers. In the updated version, when you wish to run the framework, you make the following sequence of calls:

    Invoker inv = new ConcreteInvoker(anUpdateable);
    inv.init();
    inv.execute();

The line of code that differs from the original framework is the explicit call to the Invoker‘s init() method. You can now squeeze in other lines of code for initialization before calling the loadRules() method (which is always called by init()).

Performance

When I began performance testing of my framework, I already knew that one of the big bottlenecks would be in the initialization section of the code, which handles rule loading. As you will see, I implemented some rather slow (but convenient) algorithms in this section. On the other hand, I expected the runtime section of code — responsible for loading conditions, locating rules, and looking up actions and firing them off — to perform well because the processes involved are simple, and because I stored rules in a Hashtable. A Hashtable allows for lookups that are as fast as possible (for a discussion of the optimality of Hashtables for lookups, see references to Lafore and Grand in Resources).

Performance testing revealed that close to 100 percent of execution time occurred in the rule-loading sections of code, particularly as the number of conditions and rules increased. Moreover, I was able to artificially slow framework performance to a crawl using relatively few conditions and many wildcards — but had no performance problems with fewer wildcards. Therefore, my efforts to optimize will focus on rule-loading code with many wildcards.

Before examining why the framework does so poorly in these scenarios, I’ll describe my testing approach: I created a test environment that implemented the framework and did a simple kind of automated rule loading (a simple case of the rule loading I described in the last section). I created just one instance of Action, which I reused in all my rules. I required every Boolean string to begin with one of the following four T-F sequences: TT, TF, FT, FF. The test environment required that I specify how many wildcards I wished to use; the environment would then append that many copies of * to each of these four base strings, and then load these wildcard-intensive strings in the loadRules() section of code, each matched with the single Action instance defined earlier. This environment was ideal for determining exactly how many wildcards were needed to degrade performance.

To begin the investigation of rule-loading performance, consider the diagrammatic view given in Figure 1 below.

Figure 1. Code responsible for rule loading

As the diagram indicates, loadRules() makes multiple calls to the addRule() method of Rules. Each instance of such a call entails a trip to the BooleanTupleFactory to transform a Boolean string into one or more BooleanTuples. The Rules instance then matches each of the BooleanTuples with the Action instance passed in, using the BooleanTuple as a key and the Action as a value in the Hashtable ruleTable inside Rules.

As you may recall, a BooleanTuple encapsulates the notion of a sequence of Booleans — in particular, it represents a sequence of Boolean values not as a string of Ts and Fs, but as an array of Booleans. I chose to use BooleanTuples as keys in my Hashtable of rules rather than raw strings of Ts and Fs (and wildcards) because it was easy to formulate a nice hashing algorithm for BooleanTuples. (The hashing algorithm for the Java String class, prior to JDK 1.2, was known to be relatively inefficient — see Weiss’s book in Resources. See also the inline comments about my hashCode() implementation in BooleanTuple.)

The BooleanTupleFactory generates one or more BooleanTuples from each string of Ts, Fs, and wildcards. In doing so, the factory replaces any Boolean string having a * with two strings — one in which the * is replaced with a T, another in which it is replaced with an F. The factory then transforms each Boolean string that it generates into an instance of BooleanTuple, which is then loaded into the rules table. This inevitable production of so many Boolean strings in order to accommodate wildcards provides one of the main reasons for performance degradation. A review of the code will make this point clearer, and will uncover other performance problems as well:

public class BooleanTupleFactory {
     
     public static BooleanTuple [] getTuples(String tf) throws NestingTooDeepException {
          if(tf == null || tf.equals("")) return null;
          if(tf.length() > BooleanTupleConstants.MAX_LEVELS_OF_NESTING){
               throw new NestingTooDeepException("Can't have more than "+
                       BooleanTupleConstants.MAX_LEVELS_OF_NESTING + " levels of nesting");    
          }
          int numTuples = countTuples(tf);
          int len = tf.length();
          int numStars = countStars(tf);
          int [] indicesOfStarredCols = {};
          if(numStars > 0) {
              indicesOfStarredCols = computeIndicesOfStarredCols(tf);
          }
          StringBuffer [] tfArr = new StringBuffer[numTuples];
          BooleanTuple [] btArr = new BooleanTuple[numTuples];
          for(int col = 0; col < len; ++col) {
               for(int row = 0; row < numTuples; ++row) {
                    if(col == 0) tfArr[row] = new StringBuffer("");                    
                    char c = tf.charAt(col);
                    tfArr[row].append(obtainBoolChar(col, row, c, indicesOfStarredCols));
               }
          }
          for(int row = 0; row < numTuples; ++row) {
               btArr[row] = getTuple(tfArr[row].toString());
          }
          return btArr;
     }
     
      public static int truthTable(int starCol, int row, int numStarCols) {
          //validate input 
          if(starCol < 0 || starCol > numStarCols - 1 || row < 0 || 
             row > Math.pow(2,numStarCols)-1 || numStarCols < 1) {
                
               System.out.println("Invalid input ["+starCol+", "+row+", "+numStarCols+"]"+
                                  "for truthTable method");
               return -1;
        }
        //if starCol is largest possible, return 0 if row lies in first half of the rows, 
        // return 1 if in second half
        if( starCol == numStarCols - 1) {
            return (row < Math.pow(2,numStarCols-1)) ? 0 : 1;
        }
        //if starCol is one of the smaller possible values, use the version of truthTable
        //for which total number of columns is 1 less than the current value
        else {
            return truthTable(starCol, row % ((int)Math.pow(2,numStarCols-1)), numStarCols-1);
        }
    }
    private static char obtainBoolChar(int col,int row,char c, int[] indicesOfStarredCols) {
          switch(c) {
               case 'T':
               case 't':
                    return 'T';
               case 'F':
               case 'f':
                    return 'F';
               case '*':
                   //find index of starred col index in the indicesOfStarredCols array
                   int adjustedColNum = Util.findIndex(col, indicesOfStarredCols);
                   
                   //call truthTable to determine appropriate truth value for this 
                      //row and column
                   int booleanAsNumeric = 
                       truthTable(adjustedColNum, row, indicesOfStarredCols.length);
                       
                   //convert to 'T' or 'F'
                    return (booleanAsNumeric == 0) ? 'T' : 'F';
               default:
                    break;
          }
          return 'F';
     }
     private static int countStars(String s) {
          int count = 0;
          if(s == null) return count;
          for(int i = 0; i < s.length(); ++i) {
               if(s.charAt(i) == '*') ++count;
          }
          return count;
     }
     
      private static int [] computeIndicesOfStarredCols(String s) {
          int arraySize = countStars(s);
          int [] arrayOfIndices = new int[arraySize];
          int currIndex = 0;
          for(int i = 0; i < s.length(); ++i) {
            if(s.charAt(i) == '*') {
                arrayOfIndices[currIndex] = i;
                ++currIndex;
             }
          }
          return arrayOfIndices;
      }
     
     private static int countTuples(String s) {
          return (int)Math.pow(2,countStars(s));
     }
     
     private static Boolean getBoolean(char c) {
          if(c == 't' || c == 'T') return Boolean.TRUE;
          return Boolean.FALSE;
     }
     private static BooleanTuple getTuple(String tf) throws NestingTooDeepException {
          if(tf == null || tf.equals("")) return null;
          if(tf.length() > BooleanTupleConstants.MAX_LEVELS_OF_NESTING){
               throw new NestingTooDeepException("Can't have more than "+
                       BooleanTupleConstants.MAX_LEVELS_OF_NESTING + " levels of nesting");    
          }
          int len = tf.length();
          Vector v = new Vector(len);
          
          for(int i = 0; i < len; ++i) {
               char c = tf.charAt(i);
               Boolean b = getBoolean(c);
               v.addElement(b);
          }
          return new BooleanTuple(v);
     }
    
}

In the code, the BooleanTupleFactory receives a request to produce an array of BooleanTuples from a string of Ts, Fs, and wildcards when its getTuples(String tf) method is called. Early on, this method determines how many BooleanTuples it will have to create — the method countTuples(tf) takes care of this computation. A close look at this computation reveals a major bottleneck: if the string tf has no wildcard, clearly only one BooleanTuple will be created. If tf has one wildcard, two BooleanTuples will be required — replacing * with T in one case and F in the other.

In general, if there are n wildcards, then 2n BooleanTuples must be created. This explains why performance degrades so quickly as the number of wildcards grows: if the average number of wildcards in a set of Boolean strings is n, then the average number of BooleanTuples created by the BooleanTupleFactory for each Boolean string is 2n. Therefore, this tuple-creating algorithm, which I will refer to as the BooleanTuple Algorithm, is an exponential-time algorithm.

Moving further through the code, you will see that an array of strings (tfArr) and an array of BooleanTuples (btArr) are initialized. The code will first load up tfArr with strings having just Ts and Fs, after replacing all occurrences of * in the original string tf. Then it will transform each of these strings into a BooleanTuple and load it into btArr. The first for loop loads tfArr; the second for loop loads btArr.

As I’ve already mentioned, because of the exponential character of the BooleanTuple Algorithm, loading up the array tfArr in the first loop is a performance killer. However, you can find another performance problem in the call to obtainBoolChar(). The purpose of this method is to read the next unexamined character in tf and append the appropriate character in tfArr. If it finds a T or an F, it simply appends the T or F, respectively. If it finds a * however, it sometimes has to append a T and sometimes an F. The way it makes this decision involves another horrendous algorithm.

To see what’s involved, let’s briefly consider the notion of a truth table. A truth table with k symbols is a table that lists all possible values of Ts and Fs for k propositional symbols p1, p2, …, pk. Below are truth tables for the cases k = 1 and k = 2. (For these simple cases, I’ll use symbols p and q rather than p1 and p2):

Table 1. Simple truth tables
p
T
F
p q
T T
F T
T F
F F

Notice that each table displays all the possible ways Ts and Fs could be assigned to p (in the first table) and to p and q (in the second table).

When the getTuples() method discovers that there are k wildcards in the Boolean string tf, it views these wildcards as determining a truth table with k symbols, because Ts and Fs need to be assigned to each of the k occurrences of *, just as in a truth table.

When the obtainChar() method encounters a *, it makes a call to the truthTable() method in order to locate which value (T or F) to read next. It does this by recursion, which in this case means that it first builds the truth table for one symbol, then for two, and so forth, all the way up to k symbols; it then reads off the appropriate value from the table.

The truthTable() method has to build up all these truth tables from scratch each time it is called. Since the number of rows in a truth table grows exponentially in relation to the starting number of symbols, this is another exponential-time algorithm. I will refer to this recursive truth-table-generating algorithm as the Truth Table Algorithm.

I will present my solutions to these algorithms in a descriptive, rather than technical, way because a full account of the details would take me beyond my space constraints. You will certainly be able to fill in any gaps by looking at the relevant source code (see Resources).

Fixing the Truth Table Algorithm

The problem with the Truth Table Algorithm is not that it builds so many truth tables, but that it does so every time the calling function requests the value of a cell in one of these tables. The sensible thing to do would be to create all the necessary truth tables in a session during initialization. This is the solution I have implemented in the new logic package.

In the updated framework, you now create a sequence of truth tables in the revised Util class, via the static method truthTable(int k), where the parameter k specifies the number of symbols of the largest truth table to be created. The method creates a three-dimensional array to represent the sequence of truth tables and stores it in a public static variable TRUTH_TABLES. When the BooleanTupleFactory method obtainBoolChar() makes a request to find out which truth value occupies a particular cell of a particular truth table, it reads off the value stored in TRUTH_TABLES.

This revision requires another small change to the framework: in your implementation of the loadRules() method, you must pass in an extra parameter, maxNumStars (which specifies the maximum number of wildcards occurring in any Boolean string), when you construct a new Rules instance. You can then use this value in calling truthTable(int k).

Fixing the BooleanTuple Algorithm

The main premise of the BooleanTuple Algorithm is that, before transforming Boolean strings into BooleanTuples and loading up the rule table, you transform all wildcards into actual Ts and Fs, creating potentially huge numbers of BooleanTuples. An alternative would be to use the raw Boolean strings — which may contain wildcards — as the Hashtable keys, in place of BooleanTuples.

Using raw Boolean strings as keys, however, has a basic flaw. To see the difficulty, consider a simple example: you have just two rules with three Conditions, involving two Actions, action1 and action2. Suppose the two Boolean strings involved in these rules are FF* and TFF, and that, using the addRule() method, you match FF* with action1 and TFF with action2 in the ruleTable. Eventually, the application will perform an evaluation of the Conditions involved, and a call to the Invoker subclass will pass this sequence of Conditions to the IfThenElse class for evaluation. Each Condition will evaluate to true or false. For the sake of discussion, let’s say that the corresponding Boolean string turns out to be FFT. When the lookUp() method attempts to use the string FFT as a key to look up the appropriate Action in the ruleTable, it will fail to find this key since this exact string is not one of the ruleTable keys. Of course the problem is that the hash of FFT probably won’t look anything like the hash of FF*; even if it did, the equals() method defined for the string class has no way of knowing that the string FFT is equivalent to the string FF*.

Despite that obvious difficulty, using the raw Boolean strings as keys is appealing because there are typically so few of them to worry about. However, by introducing an auxiliary array, I can sidestep this difficulty. I will use raw Boolean strings as keys in my rules Hashtable; I will also have on hand an auxiliary array that stores these raw Boolean strings. When I load rules, I will match a Boolean string — possibly containing wildcards — with an Action in the rules table as usual. I will also add this Boolean string to my array. When it comes time to look up a Boolean string (which consists only of Ts and Fs), I first compare the lookup string with each element in my auxiliary array. However, the comparison I use will not be the usual equals() method. Instead, I will define a new equals(String s, String t) method that returns false if the two strings have unequal size, or if, for some integer i less than s.length(), the ith character in one of the strings is a T, and the ith character in the other string is an F. This equals() method will, for example, declare that FF* and FFT are equal.

Using this equals() method, I will search my auxiliary array for a match with my test string, and, unless a user was careless in loading rules, I will certainly find a match. The match that I find may contain wildcards, but it will definitely be one of the keys in my rules Hashtable. Thus, it’s guaranteed that I’ll obtain the appropriate Action to fire off.

The solution works, but at what price? The lookup time does increase somewhat. Let’s take a look at the efficiency of this new search algorithm. First of all, let’s say that there are m conditions under consideration and n rules that are loaded — in particular, n Boolean strings to consider. First, to compare two Boolean strings of length m using the new equals() method would require c*m comparisons in the worst case, where c is a small constant (around 3 or 4, depending on the implementation). To see this, note that the worst case occurs when this character-by-character comparison of two Boolean strings does not turn up a mismatch. For that case to occur, the procedure has to examine every character in both strings. In particular, the procedure must check that the first string does not have a T at position i when the other string has an F there (two comparisons) and vice versa (two more comparisons). For the sake of discussion, let’s say that c = 4. Of course, if the worst case involves 4*m comparisons, the average case will require only half as many — namely, 2*m comparisons.

Next, as I compare a test string against the strings in the auxiliary array, I can expect that on average, the procedure will have to search 50 percent of the elements before finding a match. This means that it will compare against n/2 strings. Until a match is found, I can expect average case performance of the comparison procedure provided by the equals() method (as described above). Thus, the total number of comparisons on average is 2 * m * (n/2) = m*n. In practice, n doesn’t tend to be much larger than m, so the average lookup time is on the order of m2, which isn’t too bad for an unsorted collection. In practice, I can load thousands of rules and thousands of conditions using this new algorithm and still maintain acceptable lookup times.

To see the implementation of the new algorithm, look at the new framework class Algorithm_manyStars, and examine the addRule() and lookUp() methods. You will notice that to initialize the auxiliary array, this class needs to know how many rules there are. For this reason, you must now pass this data into the new Rules class when you create a new Rules instance. Also, the new equals() method, which performs a comparison of strings that takes into account the wildcard symbol, is located in the updated Util class.

It appears that this new algorithm makes the old one obsolete. For the most part, this is true. However, as I will explain in the next section, there are special cases in which the old algorithm works better; there is certainly little difference between the two in terms of performance when there are no wildcards. To handle two algorithms, either of which may be dynamically invoked, I have created an abstract class, Algorithm, and two subclasses, Algorithm_manyStars and Algorithm_fewStars. When the Rules class needs to select an algorithm to use, the framework uses certain criteria to decide one or the other, and then turns control over to the selected class. I will say more about this new design in the next section.

Since I have decided not to abandon the original rule-handling algorithm, now encapsulated in Algorithm_fewStars, any extra performance boost to this section of code will still be worthwhile. My main improvement, as described earlier, is the new version of the Truth Table Algorithm. I have also sharpened the performance of loading and looking up rules by replacing the use of a Java Hashtable with an array of Actions. I can use an array here as a super-fast Hashtable because of the extremely nice hashing algorithm that I’ve implemented in the BooleanTuple class. I can guarantee that BooleanTuples, which represent distinct Boolean sequences, always have distinct hash values. I use those values as lookup indexes for the new ruleTable array. Now, when the framework adds a rule (in response to a call to the addRule() method), it obtains the BooleanTuple‘s hash value and places the corresponding Action instance in the ruleTable array using the hash value as an index.

My next section, which covers consistency checking, will make it clear that, at least in certain cases, the old algorithm for rule handling outperforms the new one. I will develop precise criteria for deciding which one you should use in any given situation.

Consistency checking

It is vital to verify that no pairs of inconsistent rules are creeping into your framework implementations. Luckily, implementing a reliable consistency checker is easy to do. The basic idea is that, with each call to addRule(boolStr, action) in Rules, you should look to see if the key boolStr has already been used, and, if so, whether in that usage its matching Action is the same as action. If the key has already been used but has been matched with a different Action, you need to throw an InconsistentRulesException.

Since I have two algorithms for loading rules, I will need to implement this consistency-checking strategy for both of them. In fact, since this discussion compares these algorithms thoroughly, I will take a moment to describe how I have integrated them into the new framework design. Figures 2 and 3 below show their roles and relationships.

Figure 2. Class diagram for the new algorithm classes
Figure 3. Interactions with the new algorithm classes

As you can see in the new design, the Rules class now aggregates a single instance of Algorithm, which it obtains from a call to the AlgorithmAndConsisCheckSelector that selects the correct algorithm. All the criteria that I use for selecting an algorithm live in AlgorithmAndConsisCheckSelector. Algorithm is an abstract class with two primary abstract methods: addRule(String boolStr, Action action) and lookUp(Vector results). Its two subclasses, Algorithm_manyStars and Algorithm_fewStars, implement these methods using the different algorithms described in the last section. The manyStars algorithm (which works well with many wildcards) is the new, speedier algorithm I described, while the fewStars algorithm (used only when the number of wildcards is small) is the old algorithm.

When Rules receives a request to addRule() or to lookUp(), it delegates the request to its Algorithm instance, which then invokes its own algorithm to handle the request. (This design is a version of the Strategy pattern described in the well-known Design Patterns — see Resources.)

In implementing consistency checking for the old algorithm, I will invoke consistency checking inside the BooleanTupleFactory at the point where the Boolean string that has been passed in has been converted to an array of Boolean strings with no wildcards — in other words, just as they are being converted to BooleanTuples. Each time the factory produces another BooleanTuple to add to the BooleanTuple array, I invoke its hashCode() method, check to see if that code is matched with a nonnull Action in the rule table, and, if so, whether the passed in Action is equal to the Action at this hash index. This procedure is extremely efficient. Below is my implementation of this consistency checker:

private static void checkConsistency(Action action, int hash, BooleanTuple booltup,Action[] ruleTable) {
   Action currentAction = ruleTable[hash];
   if(currentAction != null && !currentAction.equals(action)) {
       throw new InconsistentRulesException();
   } 
}

The loadTuples() method in the BooleanTupleFactory that calls this method hashes the new BooleanTuple and passes the hash value in, along with the Action, BooleanTuple, and ruleTable.

For the new algorithm, I will also perform consistency checking as each rule is added. To compare each new Boolean string with the ones added so far, I will need to use the slower lookup procedure that this algorithm uses. Here is the implementation:

for(int i = 0; i < nextAvailableIndex; ++i) {
    if(Util.equals(boolStrings[i], boolStr) && 
       !(action.equals((Action)ruleTable.get(boolStrings[i])))) {
              throw new InconsistentRulesException();
    }
}

To evaluate this procedure’s efficiency, let’s say that there are m Conditions and n Rules. Suppose you are in the process of adding the kth rule, so that your auxiliary array has k-1 strings so far. It will take on average 4*m*(k-1)/2 = 2*m*(k-1) comparisons to verify that the new Boolean string has not been used so far (the typical situation). Summing over the n rules yields on the order of m*n*n comparisons. At best, when n is close to m, this is a cubic algorithm; at worst, the computing time is proportional to the square of a large number of rules.

In the presence of many rules — over 1,000, say — the algorithm just described has to perform around 10 million comparisons in order to load rules with consistency checking. (By contrast, if you load this many rules without wildcards using the old algorithm, you will find little, if any, performance degradation due to consistency checking.)

Using the test environment I described earlier, I discovered some valuable performance details in comparing the algorithms in loading large numbers of rules. Though exact execution times will vary, a performance comparison between the algorithms is reasonably accurate across platforms and processors. I have summarized my test results in the following table:

Table 2. Best performing algorithms with consistency checking
Average number of wildcards
Number of rules 0 1 2 3 4
500-1,000 old new new new new
1,000-2,000 old old old new new
2,000-4,000 old old old new new
4,000-8,000 old old old
8,000-16,000 old old

The entries in the main cells of Table 2 indicate which of the algorithms performs the best in varying situations. The blank cells indicate that neither algorithm performs acceptably under those conditions.

In my tests, all my Boolean strings had exactly the same number of wildcards. Therefore, what I call the “average number of wildcards” in Table 2 always turned out to be the actual number of wildcards in each string. In practice, however, the number of wildcards varies from string to string. When there are many rules, it would be unrealistic to expect a framework user to compute the average number of wildcards in order to use the framework. Luckily, it is possible to estimate this average number using the number of rules and conditions — see inline comments in the computeAvgNumStars() in the new Rules class for details. Because I need to know the average number of wildcards, the new Rules constructor requires that the number of conditions be passed in. (This, combined with requirements mentioned earlier, means that you can construct a new Rules instance only if you pass in the number of conditions, rules, and maximum wildcards occurring.)

Those test results led me to allow a framework user the option of turning consistency checking on or off (by default, it is off), and also the option of selecting one of the algorithms, rather than letting the framework use its criteria to make the choice (when the user does not select an algorithm, the framework makes a selection based on these test results; the logic is encapsulated in the class AlgorithmAndConsisCheckSelector).

When the consistency-checking option is selected, and the framework cannot find a suitable algorithm (for example, if there are more than 16,000 rules), or if an algorithm has also been selected but is unsuitable, the framework throws a PoorPerformanceException. You specify these options in the new framework by using new versions of the Rules constructor. The signatures of the two main versions of this constructor are as follows:

public Rules(int numConditions, int numRules, int maxNumStars);
public Rules(int numConditions, int numRules, int maxNumStars, Algorithm alg, Boolean 
  consisCheck);

If you don’t wish to use any of the options, you can use the first or second constructor — in the latter case, you would pass in null in the last two arguments. If you wish to use just one of the options, you would use the second constructor and pass in null for the argument you don’t want.

Summary of results

The following table summarizes the results of my performance/maintenance analysis and enhancement session:

Table 3. Summary of solutions to performance/maintenance challenges
Problem Solution
No enforcement of proper order of evaluation The new framework provides, as an option, condition and rule sequencers that do enforce an order of evaluation. This order is specified with a user-defined interface of integer constants, implemented by the user-defined subclass of Invoker.
Performance degradation as the number of conditions and rules increases Analysis and testing showed that the bottleneck was located in the addRules() method. The new framework provides a new default algorithm for adding rules and doing lookups. Since the old algorithm is still occasionally preferable, an enhanced version of that algorithm has been retained; the framework now dynamically chooses which algorithm to use based on criteria gathered from testing.
Lack of consistency checking as rules are added The new framework makes consistency checking available. Since this feature can degrade performance, it is provided as a selectable option rather than as the default. The new consistency checking feature does not degrade performance when the number of rules does not exceed 1,000.

The changes that I have introduced in this article involve very few modifications to the way a user interfaces with the framework. However, I do need to note the few changes made in that area. I have summarized them all in the table below:

Table 4. New idioms for interfacing with the modified framework.
Framework function How user interfaces with new framework
Use of the Invoker class When you create an instance of Invoker, before calling the execute() method you must now explicitly call its init() method. This makes it possible to initialize other classes, such as a rule loader, before calling the loadRules() method (which is called by init()).
New Exceptions When you create and initialize Invoker and ask it to execute(), you must handle the following new Exceptions: InconsistentRulesException, and PoorPerformanceException. (Note that the NestingTooDeepException has been replaced by the PoorPerformanceException.)
Use of the Rules class To create a new Rules instance (which is typically done when you implement the loadRules() method), you must use one of the new constructors. The three-parameter constructor requires that you pass in the number of conditions, number of rules, and maximum number of wildcards. The five-parameter constructor requires that you pass in the same first three parameters along with a selection of an algorithm (an instance of Algorithm) and a choice (a Boolean) about consistency checking — passing in Boolean.TRUE means that you are requesting consistency checking.
Optional use of order of evaluation, interface, and sequencers If you want to enforce adherence to an order of evaluation, you can create an interface of integer constants whose names correspond to Condition instances, and then use sequencers to load rules and load conditions. This approach is optional, though; it is still possible to use the framework in the original way, using only code comments as a reminder about the order of evaluation.
Optional use of consistency checking By default, consistency checking is not performed. You can request the framework to activate consistency checking when you create a new Rules instance, by passing in a Boolean.TRUE as the fifth parameter in the five-parameter Rules constructor.
Optional selection of an algorithm If you do not select an algorithm, the framework will select one by considering the number of rules, the average number of wildcards used in Boolean strings, whether consistency checking has been requested, and by consulting, if necessary, an internal table that indicates which algorithm performs better in a variety of scenarios. If you do select an algorithm, the framework will attempt to use it. To create an instance of an algorithm, you can use the following code: Algorithm alg = Algorithm)Util.getInstance(Algorithm_<choice>.NAME>); where <choice> is either fewStars or manyStars.

Conclusion

Careful analysis and testing of the framework algorithms and implementation commitments has resulted in the creation of a new and better product. The revised framework performs well even in the presence of thousands of conditions and rules. It also provides new user options and safety mechanisms — such as consistency checking and automatic enforcement of order of evaluation — that enable a framework user to reduce the risks involved in using the framework for large-scale projects. For logic-intensive projects in which you must implement complex branching logic, I would recommend the if-then-else framework as the right tool for the job.

Paul Corazza is a software professional who
has applied his PhD training in mathematical logic to a broad range
of pure and applied fields in the past twelve years. In the last
four years, he has concentrated on the development of business
rules frameworks, joining design patterns from the object-oriented
world with tools for analysis of syntax from the world of logic.
Paul is president and lead consultant of Corazza Software
Solutions, a Java consulting firm that provides software
engineering and training services.

Source: www.infoworld.com