Keeping outliers out of the loop?

This post is about dealing with outliers in the data when scripting a loop or writing a While ( ) function in FileMaker. The post will show examples in scripting a loop, but the same reasoning applies to While ( ) functions.

As before, this post is about optimizing in situations of long running scripts that usually deal with big found sets of records and that need to be processed as fast as possible. That is, it is about situations where waiting times are ( too ) long and optimization can make a significant difference.

When I started writing this post, I thought keeping outliers out of the loop is a relatively simple matter. As becomes clear below, that is indeed the case with examples 1 and 2. They could be considered Low Hanging Fruit. With example 3 and more complicated cases involving multiple groups of outliers, it turns out to be a different matter. Much relies on the possibilities of storing ( and indexing ) the conditions that define those different groups.

 

Outliers

With outliers, I mean exceptions in the data or in the treatment of data. However, the word ‘exception’ in programming refers to exceptional situations that could include unexpected data, but also errors that occur during execution. This post is not about such exceptions but about certain expected data or situations that nevertheless need different treatment than the rest. Here are some typical examples:

 

Example 1 : first or last record

Typically in databases, the first or the last record may need a different treatment than the rest. For example because the treatment involves retrieving data from the previous record, or the next respectively.

Schematically, the script would contain code like this:

Please note that although the examples in this post look like FileMaker ( hereafter FM ) script steps, they are merely pseudo code.


Go to Record/Request/Page [ First ]

Loop

  If [ Get ( ActiveRecordNumber ) = 1 ]

    < Treatment of first record >

  Else

    < Treatment of other records >

  End If
  
  Go to Record/Request/Page [ Next; Exit after last ]
  
End Loop





In all but one of the records, the test for the active record number resolves to False, and only in one case to True. That is unnecessary overhead for all those other records.

 

Example 2 : certain records require different treatment

Another typical example is where outliers are defined by a particular test on their data. There may be more than one such outliers, but their number is limited.

Let us say that in some system for package deliveries in The Netherlands, all clients need a certain treatment, except those who live in Amsterdam.

Schematically the minimal script would look like the following.


Go to Record/Request/Page [ First ]

Loop

  If [ Condition A ]

    < Treatment of Condition A compliant records >

  Else

    < Treatment of other records >

  End If
  
  Go to Record/Request/Page [ Next; Exit after last ]
  
End Loop





Instead of just one record there are multiple records that require separate treatment, but they still are a minority. For all the other records, the If-Else block is a waste of time.

 

Example 3 : multiple groups of records each require their respective treatment

Let us say that not only people living in Amsterdam ( Condition A ) need different treatment, but also those in Rotterdam ( Condition B ). Of the remaining clients, those having more than 100 orders ( Condition C ) also need their own treatment.

Schematically the minimal script would look like the following.


Go to Record/Request/Page [ First ]

Loop

  If [ Condition A ]

    < Treatment of Condition A compliant records >

  Else If [ Condition B ]

    < Treatment of Condition B compliant records >

  Else If [ Condition C ]

    < Treatment of Condition C compliant records >

  Else

    < Treatment of other records >

  End If
  
  Go to Record/Request/Page [ Next; Exit after last ]
  
End Loop





Assuming that 100 orders per client is uncommon, the If-Else if-Else if-Else block is a waste for most of the records, the first two tests ( for Condition A and B ) are a waste for Condition C compliant records, and the first test is a waste for Condition B compliant records.

 

Solution – take them out of the loop

Is the solution worth the effort?

The solution, as indicated by this post’s title, is to take the outliers out of the loop and run the loop only for the ‘other records’. Before going into the ‘how’ of that, it should be noted that, more often than not, doing the (re)programming work is NOT worth the effort.

The If, Else If, and Else script lines per se do not cost a lot of time. Neither does skipping the treatment blocks in between them. To the best of my knowledge, it is only worth the effort if all micro seconds count heavily, and if one is willing to pay the price of (re-)programming and other potential prices [ See below ].

The crux for time saving lies in the number and the nature of the Tests. To get to the treatment of the other records, FM needs to execute the calculations for the Tests preceding that block. If their number is small and the calculations simple, this will take little time. However, if their number is big and the calculations sufficiently complex, then those may indeed cause a tremendous drag.

Before re-programming it may be worth the effort to first see if those calculations can be made to execute faster and if they or their results can be stored in fields. See the other posts in the series on Low Hanging Fruit.

If that can not be done or does not provide sufficient improvement, one can consider moving on and taking the outliers out of the loop. The following sections will deal with the three examples respectively. The solutions for keeping the exceptions out of the loop differ for the different examples and are increasingly more complex

 

Example 1

In example 1, the solution is very simple. It would schematically look like this:


Go to Record/Request/Page [ First ]

< Treatment of first record >

Go to Record/Request/Page [ Next ]

Loop

  < Treatment of other record >
  
  Go to Record/Request/Page [ Next; Exit after last ]
  
End Loop





Interestingly, the code is actually simpler because the entire If-Else If-Else structure disappeared, and even the test disappeared.

 

Example 2

In the case of Example 2, one takes the outliers out by Finding all records that comply to Condition A, looping through these records and give them the treatment for Condition A compliant records. Next, one switches to the omitted records and constrains them with Condition B, and then loops through these records. The code would schematically look like this:


Perform Find [ Condition A ]

If [ Get ( FoundCount ) ]

  Go to Record/Request/Page [ First ]

  Loop

    < Treatment of Condition A compliant record >

    Go to Record/Request/Page [ Next; Exit after last ]

  Loop
  
End If


Show Omitted Only


Constrain Found set [ Condition B ]

If [ Get ( FoundCount ) ]

  Go to Record/Request/Page [ First ]

  Loop

    < Treatment of Condition B compliant record >

    Go to Record/Request/Page [ Next; Exit after last ]

  Loop
  
End If





To search for records that comply to A, one could add a (un)stored calculation field ( let us call this field condition_A ) with Condition A pasted into it, then perform a search on condition_A = True.

Note that performing a Find in general is much faster than looping through records and checking conditions on every single one. However, there are many things to consider to optimize the calculation of condition_A. Preferably, one wants to arrive at a stored and indexed calculation field. ( For Condition B that is not the case ) Unfortunately, this is not always possible, or may require significant additional re-design of the database structure and/or (re-)writing of scripts and calculations.

 

Example 3

The solution for Example 3 and similar cases with even more tests is more complicated. Perhaps even to the point that it does not work faster.

The important thing to realize in Examples 2 and 3 is that Condition B and subsequent tests are performed subsequently. Which means that in terms of Boolean algebra, Condition B is not simply Condition B, but ‘Condition B and not Condition A’. In Example 2, this is achieved by the steps of Show Omitted Only followed by a Constrain Found Set.

In Example 3, this trick can not be repeated because Show Omitted Only will re-introduce records that comply to Condition A.

I see three basic solutions:

Example 3, solution 1

One is to somehow incorporate the results or tests of previous searches into the next. For Condition B the code can remain as it is. For Condition C, this would schematically look like this ( One would have to add logic to deal with the possibility that the Find for Condition A has no result. )


< Finding and treating Condition B compliant records >

Perform Find [ Condition A ]

Constrain Found Set [ Condition B ]

Constrain Found Set [ Condition C ]

< Looping through Condition C compliant records >





And so on for Condition D and subsequent tests.

Obviously, the more groups of outliers, the more often tests have to be repeated. This may at some point end up being more time costly than leaving the treatment of the different groups in the original loop! Where ‘some point’ is will depend on the actual FM solution and the only definitive answer would be reached through testing.

An alternative would be to include condition_A in the calculation for condition_B, and condition_B in condition_C, and so on. If the fields are unstored calculation fields, this would also result in escalating repetitions of tests. If however all these fields can be stored and indexed calculations, then that could be a viable and fast alternative. After all, and as mentioned before, finds based on indexed fields are substantially much faster than looping through individual records and checking those fields.

Example 3, solution 2

The other basic solution is to work with multiple windows. One window which keeps track of the progressive constraining of the found set, and one window to find the records complying to the next condition and to process them. This window will be closed immediately after.

The script for Example 3 would schematically look like this:


Set Window Title [ Of Window: "progressive constrain" ]

Perform Find [ Condition A ]

< Looping through Condition A compliant records >

Show Omitted Only



New Window [ Name: Condition B ]

Constrain Found Set [ Condition B ]

< Looping through Condition B compliant records >

Close Window

Select Window [ Name: "progressive constrain" ]

Constrain Found Set [ Specified Find Requests: Omit Records; Condition B ]



New Window [ Name: Condition C ]

Constain Found Set [ Condition C ]

< Looping through Condition C compliant records >

Close Window

Select Window [ Name: "progressive constrain"]

Constrain Found Set [ Specified Find Requests: Omit Records; Condition C ]



With this approach, each test ( with the exception of Condition A ) will need to be executed twice, once to find records complying to a particular condition, and once to omit them. This is still double the amount of work, but with additional outlier groups of records, the number of tests will not escalate beyond this factor two. Of course there is the additional work of opening and closing windows, but this too won’t escalate.

As before, the question is at which point – i.e. at how many and what kind of tests – this solution is faster than that of the original single loop of Example 3.

Example 3, solution 3

The third basic solution is a variant to the previous one. The idea is to mark the processed records through a boolean field, as part of their treatment. This uses only one window, and every search needs to be done only once. This means that each test except the first, needs to be expanded to : table::mark_field and Condition X. In multi-user situations where there is a risk that multiple users run the script at the same time, this needs solving. It can be done, but it will come at a price: either marking records becomes more time consuming, or users can not run the script at the same time.

 

A Traps

I see two mayor traps with solutions discussed in this post.

One trap has to do with the fact that, compared with the original examples, the current found set is different, that is, smaller for each of the groups of outliers. This will be a problem to address – and perhaps a deal breaker – when the processing of the records somehow depends on the found set and/or the order in which records occur. For example when summary calculations on the found set are used or the GetNthRecord ( ) calculation.

The other trap has to do with the structure of the If-Else if-…-Else block. In the examples they are straightforward, and even then introduced the problem of the sequence logic of the tests and how that needs to be reproduced when taking each of the groups of outliers their own loop. When there are multiple If-Else if-…-Else blocks in a row, or when If-Else if-…-Else blocks are nested, reproducing their logic can become rather complicated.

 

B Limitations

Other than the limitations following from the traps discussed above and the price below, I see no other.

 

C Price

A mayor price to pay is that programming or re-programming an existing script costs time and a different way of thinking about algorithms.

Another price is that the resulting code may become difficult to maintain, depending on how big the differences are between the processing of the respective outlier groups. Also, usually, a loop not only contains an If-Else if-…-Else block but also other code that need to be performed irrespective of which outlier group a record is in. Both the size of the differences and the code outside the If-Else if-…-Else block can result in the need to repeat large batches of code in different places. Obviously, this repetition can be prevented by moving the code to sub-scripts, but that would introduce the overhead of calling those and possibly moving data to and from the sub-scripts.

Last but not least, a price one may have to pay ( i.e. it is a risk, not necessarily a price ) is to do all the refactoring work and testing, only to find out that it did not result in significant improvement.

 

Conclusions

In the post, I have discussed how to improve performance of dealing with different groups of ‘outliers’ in scripts iterating through large found sets. Alternatives to the common solution of an If-Else if-…-Else block within a loop iterating through a found set were outlined and discussed. For the simple cases of a different treatment for the first or last record, or of only one group of outlier records, the solution is simple and straightforward. For cases with multiple outlier groups, ‘taking them out of the loop’ and giving each their own loop, the solution is more complicated.

With all solutions certain traps were identified that have to do with differences in found sets ( between the original example, and the alternative solution ) and the logic of the code within the loop and the structure of the If-Else if-…-Else block(s).

The actual performance improvement will depend on both the data and the code, and comparative testing may be needed to determine if there is an improvement and how big it is.

For more complicated cases with more than one outlier groups of records, much will depend on the possibility of storing the condition results – or at least parts thereof – in stored and indexed fields.

Frank van der Most, 15 April 2026

 

 

 

 

 


Leave a comment