# I do hate sums

**Posted:** February 1, 2011

**Filed under:** Data Analysis

> I do hate sums.
>
> There is no greater mistake than to call arithmetic an exact science.
>
> There are … hidden laws of number which it requires a mind like mine to perceive.
>
> For instance, if you add a sum from the bottom up, and then again from the top down, the result is always different.
>
> — Mrs. La Touche, *Letters of a Noble Woman*

The abacus is perhaps the oldest computational device humans have used. Its primary purpose is to aid in addition. Computational devices have evolved from the first abacus (circa 2400 BC) to warehouses of thousands of computers capable of complex computation, simulation and data analysis. The most important operation performed by any computational device, however, is still addition. It is hard to find an operation more central to computing than repeated addition, or summation. Sums are everywhere: no mean, no variance, no model can be computed without summing up some data points. When approximating irrational numbers like π, or integrals of functions, we use summations. Despite our technological advances, we haven't created an error-free summation device. The reason for this deficiency is that computers are incapable of dealing with the continuity of real numbers: they live in a discrete world, while numbers live in a continuous one. I will describe the sources of error in a single addition, how these errors add up over a summation, and how to compensate for them.

First, what does it mean to live in a discrete world? Computers store numbers using a fixed number of bits. With 32 bits, 0's and 1's, we have 2^32 possible bit patterns, so 2^32 is the maximum number of unique numbers we can represent with 32 bits, or 4 bytes. Between 0 and 1, however, there are infinitely many numbers, so we can't possibly capture them all using a fixed number of bits. Instead, we approximate: we keep a few significant digits (or bits) and scale them using an exponent in a fixed range. To approximate 123456 using 3 significant digits, we represent it as 1.23 * 10^5. This system of representing numbers is called floating point. The arithmetic operations {+, -, /, *} on floating-point numbers always carry an error value.
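This discreteness is easy to see from any language that uses standard 8-byte doubles; a quick Python check (my example, not from the original post):

```python
# Decimal fractions like 0.1 have no exact binary floating-point
# representation, so even a single addition carries round-off error.
a = 0.1 + 0.2
print(a == 0.3)   # False: both sides are approximations, and they differ
print(repr(a))    # 0.30000000000000004

# Printing 0.1 to 20 decimal places reveals the stored approximation:
print(f"{0.1:.20f}")
```

The exact value stored for 0.1 is the nearest representable binary fraction, which is why the comparison above fails.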

We owe the current standard for binary floating-point numbers to three numerical analysts, Kahan, Coonen, and Stone, and to Intel's floating-point manager at the time, Palmer. They put forward standard formats for representing floating-point numbers in binary using 4, 8 and 10 bytes while designing Intel's 8086 microprocessor chip (circa 1976), the parent chip of all current Intel processor chips. These levels of precision are called single, double and extended. (For more details on floating-point formats, see Art of Assembly, Chapter 4.) With only 4 bytes, we can represent numbers with exponents in the range of 10^±38 and a precision of about 6.5 significant digits.

With limited precision, a single addition can lose significant digits. Figure 1 shows an addition on a machine with a precision of 6 significant digits.

Adding *a + b* gives us *c*, with a round-off error *e*. With a single addition operation, it is perhaps unreasonable to expect a more accurate result than 1234.68, as that would require a machine with higher precision (more bits to represent numbers). With repeated addition, however, errors add up if we are not careful: if we add *b* to *a* ten times, we get a result of 1235.78. On the same machine, at the same precision, if we change the order of our addition, summing the ten copies of *b* first and then adding *a*, we get the more accurate result of 1235.79. As she writes in her letter in 1878, Mrs. La Touche was "not appreciated as an accountant" but she was a "mechanical thinker", literally, for when she added up values in different orders she got different results.
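The same order dependence shows up in ordinary 8-byte (double) arithmetic once the magnitudes differ enough; a small Python sketch (the values are illustrative, not from the post):

```python
# At 1.0e16, consecutive doubles are 2.0 apart, so adding 1.0 to a
# running sum of 1.0e16 is rounded away every single time.
big = 1.0e16
ones = [1.0] * 10

s1 = big
for x in ones:          # big value first, then the small values one by one
    s1 += x             # each 1.0 is lost to rounding

s2 = sum(ones) + big    # small values first: their sum (10.0) survives

print(s1 == big)        # True: the ten 1.0s vanished
print(s2 - big)         # 10.0
```

Summing the small values first lets them accumulate to a magnitude (10.0) that is no longer rounded away when added to the large value.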

How big can round-off error get in a summation? The error grows with the number of values being added. If we have a thousand 4-byte floating-point non-negative numbers that sum to around 390 million, the error is less than 20,000. If we have 260 million such numbers (a gigabyte's worth of data values) that sum to around 390 million, the worst-case error is bounded by 6,000 million! If we increase our computation precision from 4 bytes to 8, the worst-case error drops below 10. If we further increase our precision to 10 bytes, the error drops to 0.005. (For a discussion of how to analyze this error, I recommend reading Chapter 4 of Accuracy and Stability of Numerical Algorithms while drinking coffee.)

Given the discussion above, we could reduce error by sorting the non-negative values from smallest to largest and then adding them up, but no efficiency-obsessed analyst would ever sort data values simply to add them. We could also increase the precision of our floating-point arithmetic; extended (10-byte) precision, however, hurts performance, as it is slower than double (8-byte) arithmetic†. A third option is to be clever about how we sum up values.

Kahan (co-creator of the binary floating-point standard) came up with a way to compensate for errors at every addition step. The Kahan summation algorithm (1965) performs the steps illustrated in Figure 1: it calculates the error value *e* and then adds it to the next value joining the running sum. Kahan's summation has a constant error bound that doesn't grow with the number of values. A clever trick helps a lot. The algorithm, of course, adds a few more steps to the basic summation loop. Here is the pseudocode for Kahan summation (work through an actual example to see how it works):

```
sum = 0
previous_error = 0
for each value in the list {
    temp_sum = sum
    new_value = value + previous_error
    sum = temp_sum + new_value
    previous_error = (temp_sum - sum) + new_value   # the low-order part lost in the addition
}
```
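A direct Python translation of that pseudocode, with a quick demonstration (the test values are mine, not from the post):

```python
def kahan_sum(values):
    """Compensated summation: carries the round-off error of each
    addition into the next one."""
    total = 0.0
    error = 0.0  # running compensation for lost low-order bits
    for value in values:
        corrected = value + error                 # fold the previous error back in
        new_total = total + corrected             # this addition may round
        error = (total - new_total) + corrected   # recover what rounding lost
        total = new_total
    return total

values = [1.0e16] + [1.0] * 10
print(sum(values))        # 1e+16: the naive left-to-right sum loses every 1.0
print(kahan_sum(values))  # 1.000000000000001e+16: the ten 1.0s survive
```

Python's standard library also ships an exactly rounded alternative, `math.fsum`, for cases where the few extra operations per element are acceptable.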

When dealing with real data, we usually have negative as well as positive values. Negative values are trickier to deal with, especially when adding a positive and a negative number with very close absolute values. If we calculate the variance of data values on a 6-digit precision machine using **Var(X) = E[X²] – (E[X])²**, we can get a far less accurate result than with **Var(X) = E[(X – μ)²]**. The culprit behind the first method's inaccuracy is known as 'subtractive cancellation', or 'catastrophic cancellation' for added drama. Take the following numbers: (100002, 100005, 100012, 100015). The true variance is 36.3333. On a 6-digit machine, the sum of squares comes out as 400068 * 10^5 and the squared sum divided by n also comes out as 400068 * 10^5, so the calculated variance is 0. What a catastrophic cancellation! If we use the second formula instead, we reduce the error: the mean **μ** is 100009, the sum of squared differences is 110, and the variance comes out as 36.6667. The absolute error of this technique, 0.3333, is much smaller than an error of -36.3333 (0 – 36.3333). To get this accuracy, though, we used the less efficient *two-pass* technique of calculating variance: we passed over the data twice, once to get the mean and a second time to compute squared differences from it. Performance and accuracy, however, are not always at opposing ends. There are one-pass algorithms that compute variance accurately; they are just not widely known to developers without a background in numerical analysis.
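One such one-pass method is Welford's algorithm (1962), which updates the mean and the sum of squared deviations incrementally as each value arrives; a minimal Python sketch (not code from the post):

```python
def welford_variance(values):
    """One-pass sample variance (Welford's algorithm).

    Avoids catastrophic cancellation by never forming the two large,
    nearly equal terms that the E[X^2] - (E[X])^2 formula subtracts.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the mean both before and after the update
    return m2 / (n - 1)

print(welford_variance([100002, 100005, 100012, 100015]))  # ~36.3333 in one pass
```

Run in ordinary double precision on the four values above, it recovers the true variance of 36.3333 without a second pass over the data.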

Developers rarely think about numerical accuracy until a problem happens. With ever-growing data sizes, accuracy becomes a more pressing concern. Large data-processing paradigms like Google's MapReduce and its open-source Java implementation, Hadoop, help developers write data-analysis scripts over petabytes of data. So much attention is paid to accessing the entire data set during analysis; yet what is the point of such fine-grained analysis if we don't check (or code) for accuracy in even the simplest of all operations, summation? Four and a half millennia after the first abacus, mechanical addition hasn't changed much. The only difference is that we now spend far more resources running exponentially more addition operations. If we care about accuracy, we should keep in mind the inherent limitations of floating-point arithmetic.

† Most processors have 8-byte floating-point units, so they automatically increase precision when adding up 4-byte numbers. We must, however, preserve this precision during intermediate computations even when the data is stored using only 4 bytes: when summing 4-byte data values, use an 8-byte running sum and convert it back to 4 bytes only at the end.