ISO standards and their stories
The Untold History of Algorithm A
I recently faced the problem of trying to understand why different people applying the same algorithm to the same data obtained different results, even though everyone was apparently applying the algorithm correctly.
The algorithm in question, known as Algorithm A, is one of the most widely used statistically robust algorithms in the area of quality assurance. It was first published under the name Algorithm A in the ISO 5725-5 Standard in 1998. Anyone involved in the evaluation of quantitative proficiency tests has been familiar with it since it was taken over in the ISO 13528 in 2005.
Indeed, it is in the context of PT evaluation that I was confronted with inconsistencies between independent applications of Algorithm A. Unable to find an explanation for the divergent results, I tried a more systematic approach and wrote a program enabling me to play around with different termination conditions for the algorithm’s iterations. What I found is that the convergence behavior can be problematic. An example is provided in the appendix.
Now, I do not want to claim originality for this finding.
Indeed, in a letter addressed to Dr. Peter Lischer in June 1988, Prof. Brian D. Ripley, mathematics professor at Glasgow’s University of Strathclyde, wrote the following:
“The convergence can be a problem, particularly with triplicate replications for which I found 50 or more iterations were necessary together with a very strict stopping condition.”
Again, in October 1988, Ripley to Lischer:
“… the iterative procedure converged very slowly, and I was terminating when three significant figures were unchanged.”
“… in that one case over 100 iterations are needed and I now iterate until six significant figures are unchanged.”
Allow me briefly to provide some context for Ripley’s comments.
In May 1987, Dr. Peter Lischer, a Swiss mathematician and specialist in robust algorithms employed at the Eidgenössische Forschungsanstalt für Agrikulturchemie und Umwelthygiene, gave a presentation in Geneva in the context of a workshop on Harmonization. In his presentation, Lischer made the following proposal:
“Instead of the hitherto usual outlier tests, robust statistical methods should be used to calculate repeatability r and reproducibility R and the means.” 
It is worthwhile quoting the reasons Lischer evoked:
“For the classical calculation formulae in the analysis of variances, the data to be evaluated should belong to a normal distribution. However, every analytical chemist knows that deviant or suspect results occur much more frequently. There are many reasons for this; it is enough if just one parameter of the analytical process is not completely under control. Since very few suspect values deviate by orders of magnitude, it is often difficult to decide whether the suspect result should be regarded as valid.
In evaluating collaborative studies extreme results or all results from a suspect laboratory often are eliminated before the analysis of variances is conducted, in order to avoid obtaining excessively high values for r and R and, hence, an excessively bad evaluation of the precision of the method. But any such elimination inevitably entails the risk of evaluating the precision of the method too well.” 
Though written almost thirty years ago, this description of the dilemmas attending outlier elimination should sound familiar to anyone involved in method validation today.
In his presentation, Lischer then went on to present a robust algorithm based on Huber’s Proposal 2. Proposal 2 had first been published in P. J. Huber's seminal paper "Robust estimation of a location parameter" . In his PhD Thesis “Robust Regression” (Ecole Polytechnique Fédérale de Lausanne, 1973), Lischer had shown the consistency and asymptotic normality of Huber’s Proposal 2 Algorithm for regression parameters. In the following, I will refer to Lischer’s robust algorithm as “Proposal 2”.
The time was apparently ripe for Lischer’s call for the application of robust algorithms. Dr. Roger Wood of the Royal Society of Chemistry’s Analytical Methods Committee (AMC) was present at the 1987 Geneva workshop and took a copy of the abstract of Lischer’s talk (actually an English summary of his German article “Robuste Ringversuchsauswertung” published in the journal Lebensmittel-Technologie) back with him to the UK. A few months later, Dr. Michael Thompson, chairman of the AMC, wrote a letter to Lischer telling him that he shared his reservations about outlier tests and that the AMC was considering the use of robust statistics in within-laboratory data quality control. And we finally return to Ripley’s June 1988 letter to Lischer, from which I have already quoted, and which begins as follows:
“The Statistical Sub-Committee of the Royal Society of Chemistry’s Analytical Methods Committee brought to my attention your paper in Lebensmittel-Technologie and also a typescript with E. Walter ‘New proposal for calculating precision data from collaborative studies.’ Our Sub-Committee is currently investigating robust methods and so was interested to see your work. It does however appear to contain a number of errors.”
Ripley proceeds to enumerate these errors, mainly minor technicalities, easily remedied. The last error listed in Ripley’s letter, however, concerned the convergence of the iterative procedure. I have already quoted one of the relevant passages from Ripley’s first letter (June 1988), as well as two related passages from Ripley’s later correspondence with Lischer. The point is that Ripley was aware of and concerned about the convergence problems of Proposal 2.
Peter Lischer took Ripley’s criticisms regarding Proposal 2 very seriously. He managed to get Prof. Frank Hampel (ETH Zürich) interested, and one of Hampel’s students, Andreas Reichenbach, was assigned the task of evaluating the performance of Proposal 2 for his Diploma Thesis. The upshot was that Proposal 2 was substantially modified, and it is this modified version which was eventually published in Chapter 60 of the Swiss Food Manual in 1990. In the following, I will refer to the modified version of Proposal 2 as the Swiss Food Manual Algorithm. In a March 1990 letter to Ripley, Lischer described the improvements to Proposal 2 in the following terms:
“You will see that we have modified the algorithm. Simulations of a student of Prof. Hampel have confirmed your conjecture that the convergence of the algorithm of Huber ‘proposal 2’ is bad. We are computing now the location and scale separately so the convergence is better and the breakdown point higher (35 % instead of 25 %).”
The point is that in Proposal 2, the location and scale parameters (i.e. the mean and the standard deviation) are both updated in each iteration. The main idea behind the modifications which Hampel, Reichenbach and Lischer made to Proposal 2 was to apply the iteration process to compute first the mean and then to carry out a second iteration process to compute the standard deviation.
And that should have been the end of Proposal 2.
Three years later, however, Lischer received a letter (October 1993) from Prof. Peter-Theodor Wilrich, professor at Berlin’s Freie Universität. At the time, Wilrich was the chairman of the ISO/TC 69/SC 6, i.e., the scientific committee responsible for the preparation of the new ISO 5725 Standard. A first draft of ISO 5725-5, the part in which robust statistical methods would be introduced, had been prepared by Dr. Roger Sym, then convener of Working Group 1 of the above-mentioned scientific committee.
In his letter to Lischer, Wilrich wrote that he was concerned about the fact that the robust algorithm described in Sym’s draft, and thus about to be canonized in the new ISO 5725 Standard, was the one “prevalent in England.” Wilrich preferred Lischer’s robust algorithm (i.e. the Swiss Food Manual Algorithm), but had been unable to convince Sym to adopt the latter. Given Lischer’s expertise in the field of robust statistics, Wilrich was hoping Lischer would be willing to provide him with “adequate ammunition” in order to persuade Sym.
The robust algorithm “prevalent in England” was none other than Proposal 2. In fact, four years previously, in December 1989, while Hampel, Lischer and Reichenbach were busy investigating and modifying Proposal 2, the AMC had published an article called “Robust Statistics—How Not to Reject Outliers.” The authors’ list includes: Dr. M. Thompson, Prof. B. D. Ripley and Dr. R. Wood. On page 1695, they describe Proposal 2 and even name it as such: “This combined estimator is known as H15 or ‘Huber proposal 2.’” 
At the time, Lischer had not yet communicated to Ripley the results from the Zürich investigation of and modification to Proposal 2. This explains why the Swiss Food Manual Algorithm is not referenced in the AMC paper. Given Ripley’s concerns about Proposal 2, as documented in his correspondence with Lischer, it is quite surprising that three years later, when Wilrich sought Lischer’s assistance to convince Sym, Proposal 2 was still the robust algorithm “prevalent in England” and about to be adopted in the new ISO 5725 Standard.
Complying with Wilrich’s request for “adequate ammunition”, Lischer wrote a 17 page article explaining the advantages of the Swiss Food Manual Algorithm—to no avail. In 1998, Part 5 of the new ISO 5725 Standard was published. In Section 6 “Robust methods for data analysis,” two robust algorithms are described, the first of which, introduced as “Algorithm A”, is Proposal 2. In Annex B of ISO 5725-5, the AMC paper  is referenced as the source for Algorithm A. And the rest, as they say, is history.
I would like to thank Dr. Peter Lischer for kindly providing copies of the documents referenced above.
Consider the following data:
These 28 measurement values can be divided into two groups. The first one represents 75 % of the measurements, with a mean of 10.11. The second consists of the remaining 25 % of the values, with a mean of 12.54. A robust algorithm with breakdown point greater than 25 % should disregard the second group.
Algorithm A yields the following results:
As can be seen, the values for the mean x* and the standard deviation s* depend on the termination condition for the iterations. Note that the value 10.60 for x* lies closer to the overall mean of 10.72 than to the mean of the first group of measurements of 10.11. This does not meet our expectations for a robust algorithm. More worrisome is the fact that, taking the values x*=10.60 and s*=0.97, the upper tolerance limit is
x*+ 2s*=12.54. This means that, of the 7 outliers, 4 would lie within the tolerance limits. This would not happen if we took the values corresponding to the weaker termination condition. In other words, whether or not a measurement value lies within the tolerance limits depends on the implementation of the algorithm. And stricter termination conditions yield results which do not meet our expectations for a robust algorithm.
P. Lischer and E. Walter, "New Proposal for calculating precision data from collaborative studies," 1987.
P. J. Huber, Robust estimation of a location parameter, Ann. Math. Statist., Vol. 35 , 1964.
Analytical Methods Committee, Royal Society of Chemistry, "Robust Statistics--How Not to Reject Outliers. Part 1. Basic Concepts," Analyst, Vol. 114, pp. 1693-1697, December 1989.