# What to expect with failures?

**Posted:**March 14, 2011

**Filed under:**Databases Leave a comment

*This is the last post in a series comparing MapReduce to Parallel Databases using stochastic models. See earlier posts for an introduction, cluster failure modeling and MapReduce failure modeling.*

Parallel databases or MapReduce, your technology of choice for data processing on clusters will depend on performance. The technology that delivers a response to your information quests first wins. We could argue on the relative importance of performance vs. ease of use, but at the end of the day, faster is stronger. This is the zeitgeist of the 21st century. As Daft Punk sings it: “Work it harder, make it better; *Do it faster, makes us stronger*; More than ever, Hour after; Our work is never over.” So putting this debate to rest by quoting lyrics from a pop-album, lets settle which technology is faster.

Current empirical estimates put parallel databases an order of magnitude faster than MapReduce. These estimates, however, are embarrassingly unfair: we are comparing databases that underwent 30+ years of refinement and optimization to an infant, MapReduce. We can estimate the actual performance difference when both technologies are at the same maturity level by using asymptotic complexity analysis. Lets say we have *N* integers that we would like to sum. We have a fixed number of machines, *k, *in the cluster. *k* is a small constant compared to *N*. Each machine has *N/k *integers. A parallel database will perform O(*N/k*) additions on each machine and a final aggregation step of O(*k*) operations. MapReduce divides data up into very small blocks. Each block has a constant number, *c*, of integers. Each machine has *N/kc* blocks. MapReduce will perform O*(N/k)* additions on each machine that will produce O(*N/kc*) intermediate sums. One machine will produce the final aggregation step of *k* *O(*N/kc*) or O(*N/c*) operations. MapReduce takes O(*N/k + N/c*) operations in comparison to the O(*N/k + k*) operations needed by databases. In other words, MapReduce is at most a factor of two worse due its blocked data model, not an order-of-magnitude (factor of 10) worse, when it comes to queries that aggregate the entire data.

Our pure failure-free estimate of relative performance is: *MapReduce is twice as slow as parallel databases*. What happens when failures enter the equation? In my previous post, I calculated the expected query response time of MapReduce to be *x/(1-λ _{m}x)* for a query that takes

*x*seconds in a failure-free cluster.

Databases have a very different fault-tolerance model. The database aborts and restarts the query from scratch every time a machine fails. τ_{c} is the expected time to failure of the cluster *τ _{c}=1/λ_{c}*, where λ

_{c}is the probability of failure per second of the cluster. In a failure-free cluster, the query takes

*x*seconds to execute. Intuitively, if

*x < τ*

_{c}, then the query will likely finish without a restart. With longer queries, chances of failure and restart increase.

We calculate the expected query time by using a recursion trick: when a failure happens, we add the work wasted up until the failure and the expected query time as we need to restart the query from scratch. For a query to complete, the system has to be failure-free for at least *x* consecutive seconds. This has a probability of *e ^{-λcx}*.

**E[**T_{x}**]** = **E[**T_{x}| failure **]** + **E[**T_{x} | no failure**]**

**E[**T_{x} | no failure**]** = *xe ^{-λcx}*

**E[**T

_{x}| failure

**]**=

_{i=0}∫

^{x}

*(i +*=

**E[**T_{x}**]**)*P(**failure at i***di***)*_{i=0}∫

^{x}

*(i +*

**E[**T_{x}**]**)λ_{c}e^{-λci}di = 1/λ_{c}+**E[**T_{x}**]**– e^{-λcx}(1/λ_{c}+**E[**T_{x}**]**+ x)

**E[**T_{x}**]**=*(e*^{λcx}– 1)/λ_{c}How does the expected query time of MapReduce compare to parallel databases assuming MapReduce is twice as slow? Lets look at a visual comparison with different failure rates.

Figure 1 shows that with small clusters (100 machines) and a machine failure rate of once a week, then parallel databases perform better. The turning point occurs at around 8000 seconds where MapReduce despite its factor of two slowdown performs better. For large clusters and high failure rates, MapReduce is the clear winner. For the remaining combinations, MapReduce will outperform databases as the query time increases. A key point is that MapReduce performance does not depend on the cluster size. It depends on the individual machine failure rate. However databases depend on the collective cluster failure rate.

What happens as the query time increases to week-long queries? Figure 2 shows that for different failure rates, our model for MapReduce, will never complete queries of a certain duration. For all other shorter queries, MapReduce’s expected query time is linearly proportional to the query time without failures.

So which technology will survive the massive cluster era? Only time can tell! Parallel databases need to move towards a blocked data organization instead of a partitioned data organization to survive massive clusters. This move will drastically impact query execution and optimization, changing some of the long-standing database concepts. We might need a complete rewrite.

There are more fundamental questions: Will massive clusters survive or is this just a fad? Clusters are power-guzzlers. Energy is becoming the most expensive resource. So much so that Google is investing in renewable and nuclear power. How much of data is clutter? Could we justify the dollar price and value of each bit stored and processed?

That’s it for this series.