
Image stacking is a collection of related techniques used in astrophotography which all use multiple images taken by a camera to achieve a more detailed photo than would otherwise be possible. I’ve talked about image stacking in the past, but I hand-waved the statistical parts a bit. In this post, I want to dig into those mathematical details so that you understand why image stacking actually works.

But first, I’ll tell a story about thermometers. We’ll come back to image stacking after that.


Let’s imagine you’re a scientist, and you need to know the temperature of the room you’re in to within a tenth of a degree in order to run an experiment. You dutifully order a super expensive NIST-traceable thermometer online, but on the day it arrives, you open the box only to discover that you got a hundred cheap alcohol thermometers instead.

The expensive thermometer would have done the job to within a hundredth of a degree, but now it’s too late to wait for a new one to arrive, and all you have are these cheap bulk thermometers. To your dismay, you discover that these thermometers are only accurate to within two whole degrees. They can’t be calibrated because they’re just dinky little glass tubes with a bit of liquid inside. Besides, what could you calibrate them with?

In desperation, you call up a scientist colleague to ask for help. She tells you there might be a way to get an accurate temperature, even using these inaccurate thermometers. The key is having a bunch of them, she says. She tells you to take each one’s temperature as precisely as possible and write it down.

So you take out all your thermometers and lay them on a table. After they’ve reached room temperature, you write down the temperature indicated on each in a table as precisely as you can. After squinting closely at the thermometers for a while, your table starts to look like this one.

| Thermometer | Temperature (°C) |
|-------------|------------------|
| 1           | 20.23            |
| 2           | 21.39            |
| 3           | 19.98            |
| 4           | 19.93            |

And so on it goes, until you have a hundred measurements. Once that’s done, you call back your colleague, who eyeballs the values and tells you that they look like they’re normally distributed. Reluctant to cede your scientific street cred, you nod your head as if you know what this means.

She then goes on to say you could use a Q-Q plot as a quick test for normality. If the values tend to cluster around a straight line through the plot, then it’s likely you have a normal distribution. Before you can say anything else, she’s created a plot like the one below.

Temperature (°C) distribution

You thoughtfully intone: “Hmmmmm. That grouping of values along the line does look pretty conclusive.” But your colleague isn’t satisfied. She says she can run a Shapiro-Wilk test on the values. It’s not long before she’s produced more figures. Eventually, you hear her say that p is equal to 0.148, which she says is “good.”
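
(If you want to follow along at home, here is roughly what she’s doing, sketched in Python with NumPy, SciPy, and Matplotlib. The `temps` values below are simulated stand-ins for the hundred readings, not her actual data.)

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Simulated stand-in for the hundred thermometer readings
rng = np.random.default_rng(0)
temps = rng.normal(loc=20.7, scale=1.0, size=100)

# Q-Q plot: if the points hug the straight line, a normal distribution is likely
stats.probplot(temps, dist="norm", plot=plt)
plt.show()

# Shapiro-Wilk test: a large p-value means we can't rule out normality
statistic, p_value = stats.shapiro(temps)
print(f"Shapiro-Wilk p = {p_value:.3f}")
```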

At this point, you’re lost, and you have to come clean: you slept through statistics class. What does that actually mean? She tells you it means that you can treat this set of values as a normal distribution. It might make more sense, she adds, if we plot the raw values on a number line. And soon enough, another plot appears.

Distribution of Temperature (°C)

There’s no single value that comes out as the clear winner. This is just a huge range of values, spread over five whole degrees! You feel like you’re even farther away from the answer than ever, but your fellow scientist seems pleased. You ask, “How on Earth does this help us?”

She answers that this helps enormously. “This is a normal distribution, right? So the value you want is right in the middle!” Seeing your perplexed expression, she begins to make a new chart. “By rubbing a little more math on this,” she says, “we can further specify the distribution. In fact, it’s even easier to see if we bin the values into tenths of a degree.”

Distribution of Temperature (°C) binned

With a practiced flair, she reveals the chart above—there is our normal distribution! And indeed you can see what she’s getting at now. The values seem to lump up in the middle.

She tells you that the standard deviation is almost exactly 1°C, the mean is 20.76°C, and the median is 20.66°C. In this scenario, the median is a little more trustworthy because it’s less influenced by stray readings, so you can say the temperature is probably about 20.7°C.
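
(In Python, each of those summary numbers is a one-liner with NumPy. The array below only holds the four readings from the table earlier, standing in for all one hundred.)

```python
import numpy as np

# Stand-in for the hundred readings; only the four from the table are shown
temps = np.array([20.23, 21.39, 19.98, 19.93])

print("sample std dev:", np.std(temps, ddof=1))
print("mean:          ", np.mean(temps))
print("median:        ", np.median(temps))
```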

You are enormously relieved by this result, but your colleague warns you this only means the temperature is probably about 20.7°C. It’s quite likely, she admits. But any of those values along the distribution could be correct. It’s simply much more likely to be that value in the middle, or something very close to it. “After all,” she asks rhetorically, “how likely is it that most of the thermometers are wrong, but in only one direction?”

You consider asking her how likely that could be, but she seems satisfied, so instead, you decide to skip the rest of her statistics class. You thank her for her time and end the call. This is good enough for your experiment. Time to throw all these bulk thermometers in a cabinet somewhere and move on.


Now, with that example in mind, let’s turn back to the topic of image stacking. The point of the thermometer story is about finding signal within noise, and that’s what image stacking is all about. It even functions on a very similar set of statistical techniques.

What does image stacking have in common with using a bunch of cheap thermometers in the first place? They’re both noisy, meaning they’re quite likely to be slightly wrong.

To understand why, we need to review how digital cameras work. Any photo taken by a digital camera is recorded by a sensor. That sensor is made up of a rectangular grid of millions of very small photodetectors. A photodetector records how much light hit it while it was exposed. Each one of these photodetectors corresponds to one pixel in the resulting image. When you take a photo, the camera’s software measures the signal from each photodetector. Once it has done this for the entire grid of photodetectors, the result is a matching grid of pixels which represents the image.

In the daytime, enough light hits the sensor that there’s very little trouble getting a clear signal. But when you’re taking astrophotos, the sensor’s job becomes far more delicate, and noise plays a much bigger role. This is because each individual photodetector may be slightly wrong about how much light hit it, and when you’re taking in very little light, there’s not a lot of room for error.

In other words, a photodetector’s job is usually to report how much light hit it as a number between (for example) zero and 255. If that value is off by one or two, no big deal. In low light, however, the photodetector might only be reporting values between zero and five, and even an error of one becomes a big deal.

You might already see where this is going. Wouldn’t it be nice if you could somehow measure much more precisely how much light hit the sensor? And for those measurements that were way off, wouldn’t it be cool if you could somehow correct or ignore those values automatically?

That’s where image stacking comes in. By taking lots of photos, we can very easily have as many “thermometers” as we like. It’s possible to take hundreds or even thousands of photos, which gives us lots of possible values for each pixel. All we have to do is photograph the same thing, the same way, over and over.

Once all the photos are taken, the rest of the process is typically done in software. You can use a general photo editor like Photoshop or something more purpose-built, but the process is generally the same either way.

First, it’s important to use landmarks in each photo to make sure they’re perfectly aligned with each other. This means that your software can be assured that when it looks at any specific pixel in one photo, that pixel represents the same part of the image as in all the other photos. Typically, stars function as useful landmarks. The software might call this process “alignment” or “registration.”
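
Dedicated stacking software does this by matching star patterns and can correct for rotation and scale as well. As a rough sketch of the idea only (and not how any particular program implements it), here is how you might align two frames that differ by a small translation, using scikit-image’s phase correlation:

```python
import numpy as np
from scipy.ndimage import shift as translate
from skimage.registration import phase_cross_correlation

def align_to_reference(reference, frame):
    """Shift `frame` so its stars line up with those in `reference`.

    Handles translation only; real registration also corrects rotation
    and scale, typically by matching triangles of detected stars.
    """
    offset, _error, _phase = phase_cross_correlation(reference, frame)
    return translate(frame, offset)
```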

Once that’s done, the image stacking itself can begin. The software steps through the image grid, pixel by pixel, gathering up that pixel’s values from every photo. The first pixel is the one in the top-left corner, so the software starts there, looking at the top-left-most pixel from every picture. This usually results in a bunch of possible values for that one pixel, but those values probably won’t be identical (remember: sensors are noisy). Ask yourself: which value is the most likely to be correct?

At this point, let’s think back to the story of the thermometers. Do you remember how concerned our colleague was with “normality”? This is a statistical term of art, referring to a distribution (a collection of observations or measurements) which tends to hover around a central value. In other words, our colleague was really trying to determine how likely it would be for any one measurement to tend toward the real signal. In a normal distribution, this happens in a way that’s possible to predict.

Once she inspected the values, she found that they did indeed tend to clump up together. This was an important clue that the values were probably normally distributed, and she knew that the real signal was likely to be closer to the middle of that clump.

In practice, we can usually assume that the values for a given pixel will be normally distributed without actually testing them for normality. This is due to the central limit theorem, which says that when you add up or average many independent random values, the result tends toward a normal distribution. It also means that we can approximate other kinds of distributions (binomial, Poisson, etc.) with a normal distribution. Convenient!
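
You can watch the central limit theorem do its work with a quick simulation. The individual values below are Poisson-distributed, like raw photon counts, but the per-stack averages pile up into the familiar bell curve:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5,000 simulated "stacks", each averaging 100 Poisson photon counts.
# The individual counts are skewed and discrete -- not normal at all.
averages = rng.poisson(lam=3.0, size=(5_000, 100)).mean(axis=1)

# The per-stack averages cluster tightly around the true rate of 3.0,
# and a histogram of them traces out the familiar bell curve.
print(averages.mean(), averages.std())
counts, edges = np.histogram(averages, bins=20)
print(counts)
```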

Setting aside this math for a moment, the fact that the values tend to clump around the real signal is somewhat intuitive, if you think about it. There will always be noise in the measurements, of course. But all things being equal, that noise is no more likely to push a measurement above the real signal than below it.

Make sure you read that last sentence a couple of times, because it is the most crucial point. This clever statistical fact allows us to use the randomness against itself.

If you remember from the thermometer story, we knew at the outset that each thermometer probably read within a degree or two of the real temperature. However, within that constraint, each value could be anything at all. It could be greater than the real temperature, or it could be less than the real temperature. But once we looked at a hundred thermometers, we could make the assumption that half of them were slightly too warm and half were slightly too cold. By testing for a normal distribution, we justified that assumption.

The median is the value that divides all the slightly-too-warm measurements from the slightly-too-cold ones. Given how many thermometers we had and how widely spread out their measurements were, we could even determine how confident we were that the median represented the true temperature!

This same principle applies to the pixel values. If we took a hundred photos of the exact same thing and compared the top-left-most pixel of every photo, we would likely see a set of values that tend to group themselves around the ideal signal. By taking the median of all these values, we can make a very educated guess about what that ideal signal probably is.

For example, let’s suppose we saw a set of values for that first pixel that look like the following table.

| Photo | Photodetector value |
|-------|---------------------|
| 1     | 4                   |
| 2     | 2                   |
| 3     | 3                   |
| 4     | 2                   |

In this example, the median of those measurements is 2.5. Ideally we’d have several hundred measurements—the more we have, the more accurate our median will be.

Notice also that our median allows for a more precise number than any of the individual measurements. In effect, we’re giving ourselves more room for error.

Then we can repeat this process for the next pixel in the grid, and the next, and so on. Finally, we end up with a grid of median pixel values which form a new image. If we’ve done our job right, this new image is a smoother, cleaner, and more precise one than any of our originals. This allows us to brighten the image (and thereby increase the signal) without exaggerating the noise. And that’s why image stacking works.
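
In code, that whole per-pixel walk collapses into a single call. A minimal NumPy sketch, assuming the aligned frames have already been loaded into one array:

```python
import numpy as np

# Pretend we have 100 aligned photos, each 1024x1024 pixels.
# In practice these would be loaded from raw or FITS files after registration.
rng = np.random.default_rng(0)
frames = rng.normal(loc=3.0, scale=1.0, size=(100, 1024, 1024))

# np.median visits every pixel position and takes the median of that
# position's values across all 100 photos, exactly as described above.
stacked = np.median(frames, axis=0)
print(stacked.shape)  # (1024, 1024)
```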


We can go even further with this idea. So far, we’ve considered the kind of noise that each photodetector might measure across many images. Photodetector noise remains constrained within a single photodetector, so you can treat each pixel’s values separately from the pixels next to it.

But there are other kinds of noise. For example, the atmosphere is constantly moving and causing things in the sky to shimmy just a tiny bit. Over time, this can slightly smear the features of your subject. There are still more sources of noise as well, such as slight imperfections in your camera’s focus.

Conquering sources of noise like atmospheric turbulence or optical errors gets somewhat more complex because you’re no longer considering a simple, linear, one-dimensional range of values. These kinds of noise affect adjacent pixels: right and left, up and down. We can still clean up such images statistically. This time, however, we need to work with two-dimensional distributions.

If you’re having trouble imagining what I mean, consider a point source of light, like a distant star. If all conditions were perfect—perfect optics with perfect focus and no atmosphere—this star would light up a single photodetector, and all the others around it would be perfectly dark. Once we introduce some noise, however, some of the signal from that star would get smeared onto the adjacent photodetectors. With a turbulent atmosphere to photograph through, sometimes the light gets bent slightly to the right or sometimes to the left (or up or down). In the resulting image, instead of having a perfect little pinpoint of light in a single pixel, you would see a kind of imperfect fuzzy blur, brighter in the middle and fainter at the edges.

The important thing to note is that this noise behaves just like the sensor noise we talked about before: it is no more likely to smear in one direction than any other. In other words, the randomness of the noise can help us again.

Moreover, since the randomness is usually uniform across the entire image, we no longer have to consider each pixel’s randomness separately. That means that if we can figure out how smeared one of the pixels is, we can guess that all the others are smeared in practically the same way. The best part is, we only need one image for this process (ideally one which has been cleaned up by image stacking already).

We can describe the way in which each pixel is smeared as a mathematical function called a “point spread function.” This term is just the optical way of saying “two-dimensional distribution.” The best way to estimate this point spread function is to find some part of our image that probably contains a point source of light (like a small star) and measure how it’s been smeared. Then we can mathematically model this smearing by considering how wide and tall the smear is, how bright it is in the middle, and so on.
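
Specialized software handles this step for you (as we’ll see below), but as an illustration of the underlying idea, you could fit a two-dimensional Gaussian to a small cutout around a star using Astropy’s modeling tools. The function here is a hypothetical sketch, not how any particular program does it:

```python
import numpy as np
from astropy.modeling import models, fitting

def fit_star_psf(cutout):
    """Fit a 2D Gaussian to a small image cutout centered on a star."""
    y, x = np.mgrid[: cutout.shape[0], : cutout.shape[1]]
    initial = models.Gaussian2D(
        amplitude=cutout.max(),
        x_mean=cutout.shape[1] / 2,
        y_mean=cutout.shape[0] / 2,
        x_stddev=2.0,
        y_stddev=2.0,
    )
    fitted = fitting.LevMarLSQFitter()(initial, x, y, cutout)
    # The fitted sigmas relate to the FWHM values reported further down
    # by FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.35 * sigma).
    return fitted.x_stddev.value, fitted.y_stddev.value, fitted.theta.value
```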

To become more certain you’ve modeled this point spread function correctly, you can repeat the process on as many stars as you like, measuring how smeared each one is. You’ll end up with a set of measurements describing each star’s roundness, brightness, and so on. Since there’s a bit of noise in this process as well (each star may not have been smeared in exactly the same way as the others), we can average or take the median of these values, too.

This process leaves us with the parameters we can use to model the point spread function. It’s easiest to use specialized software for this task. I used a program called PixInsight to analyze an already stacked image of a quasar I took recently, using its “DynamicPSF” process. Once I gave it three stars to look at, it spit out a table like the following.

Average Gaussian PSF
N ........ 3 stars
B ........   0.074703
A ........   0.229162
sx .......   2.18 px
sy .......   1.78 px
FWHMx ....   5.14 px
FWHMy ....   4.20 px
r ........   0.817
theta .... +16.53 deg
beta .....   2.00
flux ..... 2.989e+00
signal ... 1.255e-01
MAD ...... 2.878e-01

I then asked PixInsight to visualize the point spread function as an image, and it showed me the following image.

PixInsight’s visualization of the point spread function

What this point spread function tells me about my image is that, for any given single pixel, I would actually find its signal mingled with the adjacent pixels in the shape I see above. Where that image is brighter, more of the signal lives. In the darker parts, there is less signal.

This mathematical combination of signal and smear is called “convolution”: the perfect single bright pixel we want has been combined with a mathematical function representing the smearing effect. By measuring some of the stars, we can estimate what that function looks like. Knowing it allows me to create another function with the opposite effect, and applying that new function is called “deconvolution.”
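
To make “convolution” concrete, here is a tiny sketch: a single bright pixel in an otherwise dark frame, convolved with a Gaussian point spread function, becomes exactly the fuzzy blob described earlier.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# A perfect point source: one bright pixel in a dark frame
perfect = np.zeros((21, 21))
perfect[10, 10] = 1.0

# Convolving with a Gaussian PSF smears the point into a blob that is
# brightest in the middle and fades toward the edges
blurred = gaussian_filter(perfect, sigma=2.0)
print(blurred[10, 8:13].round(4))  # values fall off away from the center
```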

“Deconvolution” is the image processing way of saying “undo.” The trick, as we just saw above, is knowing what to undo. Earlier, in the thermometer story, we could think of taking the median as a kind of deconvolution: there was an underlying real temperature (our signal) which got “smeared” across the various readings, and we undid that smear by applying a function (the median) that counteracted the noise.

This time, we’re doing a more sophisticated version of the same thing. Instead of using a simple median, we create another point spread function which does the opposite of the one we estimated earlier. Then we apply that new point spread function to each pixel in the image. This process involves math that goes beyond the scope of this article, but at a high level, it works similarly to the stacking process, but on a single image.

To deconvolve the original point spread function, using special-purpose software like PixInsight, we visit each pixel in the image. Given our educated guess at how the original signal for that pixel would have been smeared, we might do something like the following.

  1. First, we measure that pixel’s value. Imagine it’s fully lit up.
  2. Then we look at the nearby pixels’ values and compare them to the first one.
  3. Knowing how smeared the first pixel is, we can subtract some of its value from the nearby pixels.
    • We can subtract more from the immediately adjacent pixels because we know the smear is concentrated there.
    • From pixels farther away, we can subtract less, because it’s less likely the smear affected them as much.

If the surroundings of the pixel happen to be darker, we will end up brightening the middle pixel and darkening the ones nearby. For a point like a star, this ends up making the star look sharper and more distinct. Once we’ve finished doing this process to every pixel, the final image will be clearer, revealing features that might otherwise have been swamped out by the turbulent atmosphere and imperfect optics.
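
In practice, tools like PixInsight implement this with well-studied algorithms rather than literal pixel-by-pixel subtraction. As one illustrative example (not necessarily what PixInsight uses internally), scikit-image ships Richardson-Lucy deconvolution, which takes the blurred image plus the estimated PSF and iteratively works out the sharper image that best explains what the sensor recorded:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.restoration import richardson_lucy

# Simulate a tiny star field smeared by a Gaussian PSF (as in the earlier sketch)
rng = np.random.default_rng(0)
sharp = np.zeros((64, 64))
sharp[rng.integers(0, 64, 20), rng.integers(0, 64, 20)] = 1.0
blurred = gaussian_filter(sharp, sigma=2.0)

# The same Gaussian PSF, expressed as a small normalized kernel
psf = np.zeros((15, 15))
psf[7, 7] = 1.0
psf = gaussian_filter(psf, sigma=2.0)
psf /= psf.sum()

# Richardson-Lucy iteratively sharpens the image given the estimated PSF
restored = richardson_lucy(blurred, psf, 30)
```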

Image stacking helps this process work because we gather enough images to form a uniformly randomized distortion in the first place. The more images we have, the more likely it is that the random noise will truly be uniform. Most image stacking techniques rely on these processes and similar ones to create astrophotos with less noise and more clarity than would otherwise be possible.
