Tag Archive for 'parallelization'

Performance Basics: Scalability

Scalability is a word you hear so much in IT departments that it is sometimes on the verge of being a “buzz word”, you know like “synergy”. The word has its roots in High Performance Computing and became widely used in enterprise environments when big distributed systems started to be used to solve complex problems or serve a great number of users (hundred of thousands or even millions in the case of big websites) .

How it started in HPC (High Performance Computing)

I studied parallel computing in the mid-90s and, at the time, I remember our teachers saying “maybe one day, everything you are learning will be used outside of the realm of High Performance Computing”. That was highly prophetic when the Internet was mainly a research tool and computers with more than one processor were laboratory machines, the very notion of having two cpus in a laptop would have been mind boggling at the time.

And the most important thing to know when it comes to parallel computing is that some problems cannot be parallelized. For instance, iterative calculations can be very tricky because iteration n+1 needs the results of iteration n, etc. On the other hand, processing of 2D images is generally easy to parallelize since you can cut the image in portions and each portion will be processed by a different CPU. I am over simplifying things but this is such an important notion that a guy named Gene Amdhal came up with the Amdahl’s law.

Let me quote Wikipedia here: “it [Amdahl’s law] is used to find the maximum expected improvement to an overall system when only part of the system is improved”. In other words, if you take a program and 25% of it cannot be parallelized then you will see that you will not be able to make it more that 4 times faster whatever the number of cpus you are throwing at it:

Ahmdahl's law

This is the exact same problem that everybody is now experiencing on their home computers equipped with several cores, some programs will just use one core and the other cores will do nothing. In some cases, it might be because the programmer is lazy or has not learned how to parallelize code, in other cases, it is simply because the problem cannot be parallelized. I always find amusing to hear or read people ranting about their favorite program not taking advantage of their shiny new 4 core machines.

Well it is because it can be very hard to parallelize some portions of code and a lot of people have spent their academic lives working on these issues. In the field of HPC, the way to measure scalability is to measure the speedup or “how much is the execution time of my program reduced with regard to the number of processors I throw at it”.

The coming of distributed systems to the Enterprise

In 1995, something called PVM (Parallel Virtual Machine) was all the rage since it allowed scientists to spread calculations over networked machines and these machines could be inexpensive workstations. Of course, Amdahl’s law still applied and it was only worth it if you could parallelize the application you were working on. Since then, other projects like MPI or OpenMP have been developped with the same goal in mind. The convergence of these research projects, although not entirely linked, and the availability of the Internet to a wide audience is quite remarkable.

The first example that comes to mind is the arrival of load balancer appliances in the very late 90s to spread web server load over several machines thus increasing the throughput. Until then, web servers often ran on a single machine on the desk of someone. But when the Internet user population numbered in hundreds of thousands instead of a few thousands this way of doing things did not cut it anymore. So programs but more often specialized appliances were invented to spread the load over more than one web server. This means that if 100 users tried to access your website simultaneously, 50 would be directed to webserver 1 and 50 to webserver 2. This is not that different  a concept from what people had been doing in High Performance Computing using PVM/MPI,etc.  And luckily for us serving static content is very easy to parallelize, there is no interdependency or single bottleneck.

The modern notion of Scalability for Enterprise applications

I will stop here the comparisons between the ultra specialized HPC world and its enterprise counterpart but I just wanted to show that these two worlds might sometimes benefit in looking over each other’s shoulders.

Nowadays, scalability can have multiple meanings but it often boils down to this: if I throw more distributed resources to a IT system, will it be able to serve more customers (throughput) in an acceptable time (latency)? Or, what does it take to increase the capacity of my system?

Scalability in an enterprise environment is indeed about how to handle the growing usage of a given IT system. Back in prehistoric ages, circa 1990, new generations of computer arrived every 18 months like clockwork, offered twice the processing speed and most program benefited from it since they were all mono-threaded. But nowadays, most IT systems are made of different components each with their own scalability issues.

Take a typical 3-tier web environment composed of these tiers:

  1. Web Servers
  2. Application servers
  3. Database servers

The scalability of the whole system depends on the scalability of each tier. In other words, if one tier is a bottleneck, increasing capacity for other systems will not increase your overall capacity. This might seem obvious but what is often not obvious is which tier is actually the bottleneck!

The good news is that this is not exactly a new problem since it pretty much falls under Amdahl’s law. So what you need to ask yourself is :

  • How much of the system (and subsystems) can be improved by throwing more ressources at it? In other words, how parallelized is it already?
  • What does it take to improve the system and its subsystems? Better code? More cpu? more IO throughput? more Memory?
  • What improvement will it yield? What will be the consequences? Will more customers be served? Will they be served faster or as fast? etc.

In the end, it is back to finding the bottleneck in the overall system and solving it which might be easy (e.g. serving static content is very parallelizable) or extremely difficult (e.g. lots of threads waiting on a single resource to be available). Note that IT system should usually be built with scalability in mind, which would avoid any detective work when the time to increase capacity has come, but alas it is not always the case.