How does code get parallelized?

Computer code is a series of statements that get executed. Typically, these statements run one at a time, in order, so if one part of your code takes a long time to run, the rest of your code won't run until that part is finished.

However, it doesn't have to be this way. We can often make the exact same code go much faster through parallelization, which is simply running different parts of the code simultaneously.

Asynchronous Code

The first example of this is asynchronous code. Often your code does things like send a request to another computer, perhaps over the internet, using an API. Normally, your code would simply have to wait for the other computer to send back a response. Asynchronous code, however, can keep going and handle the API response whenever it arrives.

This makes code harder to reason about, because you don't know when the API call will return or what state your code will be in when it does, but it makes your code faster because you don't have to sit idle waiting for the API call to complete.
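This pattern can be sketched in Python with `asyncio`, simulating the remote API call with `asyncio.sleep` (the function names and delays here are invented for illustration):

```python
import asyncio
import time

async def fake_api_call(name, delay):
    # Stands in for a real network request; await yields control so
    # other coroutines can run while this one waits.
    await asyncio.sleep(delay)
    return f"{name} responded"

async def main():
    # Both "calls" wait at the same time, so the total wall-clock time
    # is about 0.3s rather than 0.6s.
    return await asyncio.gather(
        fake_api_call("server-a", 0.3),
        fake_api_call("server-b", 0.3),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, elapsed)
```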

This is also an example of parallelization, since work is happening on the remote computer and on your own computer at the same time.

Parallelism on Your Computer

But it's also possible to have parallel code on just one computer. How is that?

Processes, Threads, and Cores

First we have to discuss processes, threads, and cores.

A process is an executing instance that provides everything needed to execute a program (e.g., address space, open handles to files, environment variables).

A thread is a subset of a process that shares the resources of that process but can execute code independently.

Threading vs. Forking

Processes usually start out with just one thread. However, they can spawn additional threads through threading. These threads can then be used to run multiple parts of the same program.

Forking creates an entirely new process that resembles the old process and can be used to run another part of the same program.

Multiple Cores

In addition to having multiple threads and processes, your computer may have multiple cores. Each core is essentially its own processing unit, able to execute instructions independently of the other cores.

Within each core, the computer will constantly context-switch, going from thread to thread and executing each thread a bit at a time before switching to the next. Thus single-core computing is not truly parallel, since only one thread is executing at any given moment. However, code spread across multiple threads can still be faster than single-threaded code, because the CPU can context-switch away from a thread that is waiting on something (e.g., an API call or a file read) to a thread that has work ready to do.
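This benefit is easy to see with threads waiting on simulated I/O; a minimal sketch, where a 0.3-second sleep stands in for an API call or file read:

```python
import threading
import time

def io_bound_task():
    # Stands in for waiting on an API call or a file read.
    time.sleep(0.3)

start = time.perf_counter()
threads = [threading.Thread(target=io_bound_task) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# The three 0.3s waits overlap, so this takes ~0.3s instead of ~0.9s.
print(elapsed)
```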

Parallel Problems

Race Conditions

The biggest problem with code executing in multiple places at once is that things can go wrong when several threads or processes want access to the same resource. One example is a race condition.

Imagine you have one function, Alpha, that wants to take a Number, add 1 to it, and read it. You also have another function, Beta, that wants to divide Number by two and read it. If Number starts out as 5, what will Alpha and Beta return if Alpha and Beta are run in parallel?

Well it depends on how fast Alpha and Beta are. You can end up with a wide variety of scenarios:

Alpha adds 1 to Number. Alpha reads Number: 6. Beta divides Number by two. Beta reads Number: 3.

...

Alpha adds 1 to Number. Beta divides Number by two. Alpha reads Number: 3. Beta reads Number: 3.

...

Beta divides Number by two. Alpha adds 1 to Number. Alpha reads Number: 3.5. Beta reads Number: 3.5.

...

etc.

Depending on which function gets there first, the same starting value produces different results. This is bad!
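The orderings above can be reproduced deterministically by writing Alpha and Beta as plain functions and applying them in different orders (a sketch of the arithmetic, not an actual race):

```python
def alpha(number):
    return number + 1

def beta(number):
    return number / 2

# If Alpha's update lands first, Beta sees 6 and reads 3.0.
first = beta(alpha(5))
# If Beta's update lands first, Alpha sees 2.5 and reads 3.5.
second = alpha(beta(5))
print(first, second)
```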

Multiple Connections

Another problem is having multiple connections to the same object, such as a file or a database. Concurrent reads and writes can leave the object in an inconsistent state.

Preventing Parallel Problems

Locking

Programs that are designed with parallelization in mind can be smart about race conditions and other resources shared across processes.

One idea is called locking, where a process "owns" an object until it is done and then relinquishes it for anyone else to use. In this example:

Beta divides Number by two. Alpha adds 1 to Number. Alpha reads Number: 3.5. Beta reads Number: 3.5.

...if it were done with locks, it would have happened like this:

Beta divides Number by two. Alpha tries to add 1 to Number, but is denied because Number is locked. Alpha waits. Beta reads Number: 2.5. Alpha tries to add 1 to Number and succeeds. Alpha reads Number: 3.5.

This cuts down on problems like Alpha interfering with Beta's variable before Beta is done, but it still doesn't solve whether Alpha or Beta will get there first.
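Here is a sketch of that scenario using Python's `threading.Lock`; Beta is started and joined first so the ordering matches the example above:

```python
import threading

number = 5
lock = threading.Lock()
reads = []

def alpha():
    global number
    with lock:  # blocks until no one else holds the lock
        number = number + 1
        reads.append(("Alpha", number))

def beta():
    global number
    with lock:
        number = number / 2
        reads.append(("Beta", number))

# Run Beta first, as in the scenario above.
t1 = threading.Thread(target=beta)
t1.start(); t1.join()
t2 = threading.Thread(target=alpha)
t2.start(); t2.join()
print(reads)
```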

Forking vs. Threading Revisited

Forking is safer than threading: each forked process has its own address space, so a crash in one process will not bring down the others, and forked processes are often easier to maintain and debug because nothing is shared implicitly.

Threading is generally faster: creating a thread and context-switching between threads carries less overhead than doing the same with whole processes, and threads can share data directly instead of copying it between processes.

Global Interpreter Locks

To avoid problems with multiple threads, some language implementations (such as CPython and Ruby's MRI) have a global interpreter lock, which allows only one thread to execute in the interpreter at a time, even on multi-core machines.

This is great at preventing bad things from happening, but it makes parallel code much harder to write. The usual way to write parallel code in these languages is to fork the interpreter itself, running the code across multiple interpreter processes.

Functional Programming

Languages with global interpreter locks are frequently written as imperative code (the code is organized as a series of statements) or object-oriented code (the code is organized around objects, which bundle data with functions). However, another style of coding is functional programming, where the code is organized as a series of functions. It turns out that functional code makes it much easier to write parallel code.

This is because functions let you avoid shared state. Thinking back to Alpha and Beta, the shared state was Number, which both functions were modifying. If, instead of modifying a Number that lives outside both functions, Alpha received Number as an argument, it could compute its result independently of Beta.
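A sketch of the contrast in Python (the function names follow the Alpha/Beta example):

```python
number = 5

# Shared-state style: Alpha reaches outside itself and mutates the
# value that Beta also depends on.
def alpha_shared():
    global number
    number = number + 1

# Functional style: each function takes its input as an argument and
# returns a new value, leaving the caller's data untouched.
def alpha(n):
    return n + 1

def beta(n):
    return n / 2

a = alpha(number)  # 6
b = beta(number)   # 2.5, unaffected by alpha
print(a, b, number)
```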

Furthermore, functions are easily vectorizable: the same function can be applied to many inputs independently. If you pass 100 inputs to a function, the work can be broken up into four parallel processes, each handling 25 of those inputs, and then aggregated back together.

Lastly, functions can be pure, which means they are guaranteed not to interact with external state or do anything at all to the outside world (e.g., write to disk, delete from your database). This means pure functions are guaranteed not to cause the parallel problems we talked about earlier.
