gszauer/Alpha Blending Tutorial

## Alpha Blending Tutorial
Everything you ever wanted to know about alpha blending and more.


When implementing alpha blending, few algorithms put in the extra effort for maximum flexibility, because there is always a trade-off between speed, memory and complexity. As we shall see, there are many different ways of doing alpha blending.

By far the most common use of an alpha channel is a semi-transparent image that you want to apply over a static background. The semi-transparent image contains red, green and blue channels as well as an alpha channel that specifies how transparent each pixel is. For 32-bit images, each channel holds a value from 0 to 255 for every pixel.

This means that an alpha value of 255 gives the image at full intensity, and an alpha value of 0 lets the background show through completely. Any other value gives some percentage of the image and the complementary percentage of the background. For example, if the alpha value is at 30%, then you'll see 70% of the background showing through.

The equation for a semi-transparent image applied on a static background is as follows:

pixel = (alpha * image + (255 - alpha) * background)/255

It may not be obvious what is happening here. Most articles on the subject do not even attempt to explain it. First, let's look at how a pixel is actually stored. In most cases, you'll see blue, then green, then red and finally the alpha channel for each and every pixel.

Byte 0 is B (blue).
Byte 1 is G (green).
Byte 2 is R (red).
Byte 3 is A (alpha).

This is for the image. The background is stored the same way, but the alpha channel is ignored. So if we have S for the source image and D for the destination (background), we would get the following algorithm.

D[B] = (S[A] * S[B] + (255 - S[A]) * D[B])/255
D[G] = (S[A] * S[G] + (255 - S[A]) * D[G])/255
D[R] = (S[A] * S[R] + (255 - S[A]) * D[R])/255
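The three equations above can be sketched in a few lines of Python. This is only an illustration, assuming each pixel is a sequence of integers in the B, G, R, A byte order listed earlier; the function name `blend_pixel` is hypothetical.

```python
def blend_pixel(src, dst):
    """Blend one BGRA source pixel over one destination pixel.

    src and dst are sequences of ints in 0..255, ordered B, G, R, A.
    Any destination alpha is ignored, as described in the text.
    """
    alpha = src[3]
    return tuple(
        (alpha * src[c] + (255 - alpha) * dst[c]) // 255
        for c in range(3)  # 0 = blue, 1 = green, 2 = red
    )
```

With an alpha of 255 the result is the source pixel, and with an alpha of 0 it is the untouched background.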

Here, most other articles proceed to shorten the above equations. We'll get to that, but there is something much more important here. Notice that the source is multiplied by the alpha value while the destination is multiplied by its complement, 255 minus alpha. Why is this? Well, let's look at the equation for the blue channel again.

D[B] = (S[A] * S[B] + (255 - S[A]) * D[B])/255

Let's propagate the division by 255 so that we get two independent parts being added together.

D[B] = (S[A] * S[B])/255  +  ((255 - S[A]) * D[B])/255

The left part takes a certain percentage from S[B] (source blue). And the right part takes the leftover percentage from the destination D[B] (dest blue). What this means is that the total percentage is always 100%. Let's look at that.

S[A]/255 + (255 - S[A])/255
= (S[A] + 255 - S[A])/255
= 255/255
= 1

So the reason we multiply the destination by the complement of alpha is that we want to fill in the leftover percentage not used by the source. What's important is that the total percentage is always 100%. This way, the final result will never exceed 100% of the maximum value for that pixel. If the source takes 10%, the dest takes 90%. The maximum will be 100% of 255 which is 255. If the source takes 32%, the dest will take 68%. Again, it adds up to 100%.

Ok, so what does this mean? Let's back up first. In most other articles, the main obstacle is the division by 255. So what they do is show how the equations can be simplified. Let's do that right now. Here are our original equations.

D[B] = (S[A] * S[B] + (255 - S[A]) * D[B])/255
D[G] = (S[A] * S[G] + (255 - S[A]) * D[G])/255
D[R] = (S[A] * S[R] + (255 - S[A]) * D[R])/255

We will expand the inner parentheses.

D[B] = (S[A] * S[B] + 255 * D[B] - S[A] * D[B])/255
D[G] = (S[A] * S[G] + 255 * D[G] - S[A] * D[G])/255
D[R] = (S[A] * S[R] + 255 * D[R] - S[A] * D[R])/255

We factor out S[A].

D[B] = (S[A] * (S[B] - D[B]) + 255 * D[B])/255
D[G] = (S[A] * (S[G] - D[G]) + 255 * D[G])/255
D[R] = (S[A] * (S[R] - D[R]) + 255 * D[R])/255

We can divide the 255 * D term by 255 separately, which leaves D on its own.

D[B] = (S[A] * (S[B] - D[B]))/255 + D[B]
D[G] = (S[A] * (S[G] - D[G]))/255 + D[G]
D[R] = (S[A] * (S[R] - D[R]))/255 + D[R]

That's much shorter, since there is now only one multiplication and one division for each channel. This is usually the shortest form you will find.
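As a quick sanity check, the original and shortened forms can be compared side by side. This is a sketch only; the names `blend_original` and `blend_short` are hypothetical, and each function handles a single channel.

```python
def blend_original(a, s, d):
    """Original form: two multiplications, one division."""
    return (a * s + (255 - a) * d) // 255


def blend_short(a, s, d):
    """Shortened form: one multiplication, one division, one addition."""
    # s - d can be negative. Python's // rounds toward negative infinity,
    # which keeps this identical to blend_original for every input.
    return (a * (s - d)) // 255 + d
```

One caveat when porting this: in languages where integer division truncates toward zero (C, for example), the shortened form can come out one level higher than the original whenever the source value is below the destination value. With alpha 1, source 0 and destination 1, the original form gives 0, while a truncating division in the short form gives 1.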

But there's more. When we paste transparent images onto a background, the source images usually never change, so we can precalculate the parts of the above equations that depend only on the source and store those values with our source images. This saves a lot of computation if the same images are used several times. For this, we need to use an earlier set of equations.

D[B] = (S[A] * S[B])/255  +  ((255 - S[A]) * D[B])/255
D[G] = (S[A] * S[G])/255  +  ((255 - S[A]) * D[G])/255
D[R] = (S[A] * S[R])/255  +  ((255 - S[A]) * D[R])/255

We precompute the following:

S[PB] = (S[A] * S[B])/255
S[PG] = (S[A] * S[G])/255
S[PR] = (S[A] * S[R])/255
S[PA] = 255 - S[A]

We convert all our source images with the above equations. When it comes time to blend them with a background, we can now use the much simplified equations below.

D[B] = S[PB] + (S[PA] * D[B])/255
D[G] = S[PG] + (S[PA] * D[G])/255
D[R] = S[PR] + (S[PA] * D[R])/255

Notice that the source contribution is now just an addition; only the destination still needs a multiplication and a division. This is significantly easier and faster.
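The precompute-then-blend split might look like the following sketch. The function names are hypothetical, and pixels are assumed to be (B, G, R, A) tuples of integers. Storing colour already scaled by alpha is commonly known as premultiplied alpha; here the fourth slot holds the inverted alpha S[PA], as in the equations above.

```python
def premultiply(src):
    """Convert a (B, G, R, A) pixel to (PB, PG, PR, PA). Done once per image."""
    b, g, r, a = src
    return (a * b // 255, a * g // 255, a * r // 255, 255 - a)


def blend_premultiplied(pre, dst):
    """Blend a precomputed source pixel over a destination pixel."""
    pb, pg, pr, pa = pre
    return (
        pb + pa * dst[0] // 255,
        pg + pa * dst[1] // 255,
        pr + pa * dst[2] // 255,
    )
```

Because each half is now truncated separately, the result can be one intensity level lower than the single-division form, but fully opaque and fully transparent pixels still come out exact.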

The only other topic you are likely to find is how to get rid of the division by 255. Divisions are inherently slow. One easy way to speed up this algorithm is to just divide by 256, since dividing by 256 is simply shifting right by 8 bits. It doesn't get any faster than that. The problem with this is that you never get maximum intensity. You'll always get some of the background showing through. No matter how hard you try, you'll always get 254 as your maximum value, because 255 * 255 shifted right by 8 bits is 254. All your images will be a little darker. And if you apply an image several times, it will become noticeable.
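The darkening is easy to reproduce. The sketch below (with the hypothetical name `blend_shift`) swaps the division by 255 for a shift by 8 and leaves everything else unchanged.

```python
def blend_shift(a, s, d):
    """Single-channel blend with /255 replaced by >> 8 (divide by 256)."""
    return (a * s + (255 - a) * d) >> 8


def apply_repeatedly(a, s, d, passes):
    """Blend the same source value over the result several times."""
    for _ in range(passes):
        d = blend_shift(a, s, d)
    return d
```

A fully opaque white source gives `blend_shift(255, 255, 0) == 254`, and even a fully transparent source darkens a white background by one level per pass, so `apply_repeatedly(0, 0, 255, 10)` ends up at 245.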

Another option is to do some bit twiddling. Someone found that the inverse of 255 (i.e. 1/255) is approximately 257 shifted right by 16 bits, that is 257/65536. This means that multiplying your number X by 257 and then shifting the product right by 16 bits should give a result close to X/255. Note, however, that 257 has only two bits set, 8 bits apart, so you can compute ((X>>8) + X)>>8 to get an approximate value for X divided by 255. But this too is a little off. Someone found that if you add 128 to X and then apply the preceding technique, it should give the correct result. I could not get this to work and frankly, it's not supposed to. The problem is that this approximation is just that, an approximation. Multiplying by 257/65536 is not the same as multiplying by the actual inverse of 255. The inverse of 255 is an infinite series, 1/255 = 1/256 + 1/256^2 + 1/256^3 + ..., and 257/65536 keeps only its first two terms. I don't know how good the reader's knowledge of calculus is, but for now it suffices to say that each additional term brings the sum closer to the true value without ever reaching it. Any exact result we do get is only because we throw away the remainder when dividing, so the errors sometimes cancel out.
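To see how far off the shift-and-add approximation is, it can be compared against a real integer division over every value X can take when two 8-bit numbers are multiplied (0 to 255 * 255). This is a sketch for illustration only; the function names are hypothetical.

```python
def div255_exact(x):
    """Truncating integer division by 255."""
    return x // 255


def div255_approx(x):
    """((X >> 8) + X) >> 8, the same as (X * 257) >> 16."""
    return (x + (x >> 8)) >> 8


# Every X in 0..65025 where the approximation disagrees with x // 255.
mismatches = [
    x for x in range(255 * 255 + 1)
    if div255_approx(x) != div255_exact(x)
]
```

Over this range the two disagree only at the non-zero exact multiples of 255 (255 values in total), where the approximation comes out one too low. That includes 255 * 255 itself, which yields 254 instead of 255.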

Now, we can finally go back to what we talked about earlier. We said before that the entire purpose of the equations for alpha channels is to have the source use up a certain percentage and have the destination use up the remaining percentage to make up a total of 100%. So what really matters is to be able to take percentages of the source and destination and have them add up to 100%. The fact that the alpha channel can take values up to 255 makes us think that we have to divide by 255. But this is not so. We can divide by 256 instead with a little tweaking.

Looking at alpha channels with maximum values of 255, we want the source to use up a portion of this 255, say 50, and have the destination use up the rest. So the remainder of 50 from 255 is 205. 205+50 = 255. Basically, we want the total to always be 255 (which is 100%). The techniques used to divide by 255 may well provide an adequate result. They'll just be slightly off some of the time by one intensity level. Not a big deal. But it takes two shifts and two additions. Maybe there's a better way.

Why not use a total of 256 instead of 255? Well, because the maximum value we can store in an 8-bit alpha channel is 255. We'd never be able to have an opaque pixel because we can never store the required 256. So how about adding 1 to all alpha values greater than or equal to 128? An alpha of 255 then becomes 256 and an alpha of 0 stays 0, so the source and destination weights add up to 256. Now, we can simply shift right by 8 to divide by 256 with proper results.

But how do we add 1 for alpha values greater than or equal to 128? Would this not require a comparison? Comparisons are expensive. Well, no, not if you use assembly. In assembly, you can check whether a number is greater than or equal to 128. If it is, the carry bit will be set, and you can then use an add-with-carry operation to get the correctly adjusted alpha value.
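The same adjustment can be sketched without assembly. In the illustration below, which is a substitute for the add-with-carry approach and not the same thing, the top bit of the alpha value (`a >> 7`) plays the role of the carry: it is 1 exactly when alpha is 128 or more. The function names are hypothetical and each call handles a single channel.

```python
def adjust_alpha(a):
    """Map alpha 0..255 onto 0..256 by adding 1 when a >= 128."""
    return a + (a >> 7)  # a >> 7 is 1 for 128..255 and 0 for 0..127


def blend_256(a, s, d):
    """Blend with weights that sum to 256, so >> 8 replaces / 255."""
    w = adjust_alpha(a)
    return (w * s + (256 - w) * d) >> 8
```

With this adjustment a fully opaque pixel (alpha 255, adjusted to 256) reproduces the source exactly, and a fully transparent pixel leaves the destination untouched, so nothing darkens over repeated blends at those two extremes.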