Notes on a big SAR swath

Introduction

I just had a big mug of coffee and now I have some black tea and here is a slapdash devlog about “A lightly annotated SAR view from Los Angeles to Woss”.

Summary

This is basically a combination of several conversations I’ve had with friends about different aspects of the project.

Okay. It’s an image viewer for a big synthetic aperture radar swath. It has map labels that fade in and out, placed in an algorithmically unusual way. It ended up very much like I imagined it, but it doesn’t seem to be very fun or compelling to most people. Various lessons and reflections are adduced.

I don’t expect anyone would enjoy reading this top to bottom; it doesn’t have a gripping plot. Skip to what looks interesting.

Backstory

The data

Like everyone else, I’m intrigued by synthetic aperture radar. Sar works on principles entirely different from those of any ordinary camera – for example, there is no equivalent of a focal point – and, because it’s based on radar, it can see through clouds and even some solid material. (The usual example: sar of the Sahara showed the traces of unknown ancient river systems buried under the loose, dry, radar-translucent sand.) I think sar would be interesting and useful to far more people than know about it today; I also think that if it’s being used for industrial and military purposes in our names, there is a civic need to understand it. And there is the joy of seeing in new ways. As I described it to a friend while thinking about this, I am in the awkward position of knowing 100× more about sar than most people do, but 1/1000 as much as experts do; I can explain the tentpole ideas, but it’ll be Gell-Mann amnesia bait.

(In fact I have an old project that’s just a lot of big sar images with rambling commentary, but every time I think about finishing it, it’s like: (a) Who is this for?, and (b) Do I want to get e-mails from people pointing out that I neglected to mention some fact that a hypothetical other reader might find useful to know, and also it’s actually spelled SAR, and also why did I write “in orbit” when they prefer “on orbit”?)

The main source of open sar data today is Sentinel-1, in the Copernicus program. (See Vertex for it and other options.) Sentinel-1 satellites are typically in what they call IW mode, collecting a 250 km wide strip, when over non-polar land. EW mode, 400 km wide but less sharp, is more usual near the poles and over oceans. The inclination of the orbit means that the swaths can roughly align with the west coast of the US, but in IW mode, they’re narrow enough that no one swath follows the coast’s undulations for long; you can’t see, say, Oakland and Bakersfield, let alone Mendocino and Vancouver, in one strip. If we filter by EW mode, there are only a few matching swaths in almost a decade of Sentinel-1 data.

But there is one really good one! It’s from Sentinel-1B in late September 2016, two days after it finished calibration and began normal operations. I imagine the EW swath was some kind of test or experiment, but I don’t know. In 2020, I downloaded the frames (image slices), re-assembled them, and resampled them. It looked great and I posted a link to it but didn’t know where to go from there. It stayed in the back of my head, though.

The intended effect

There’s a wonderful feeling in looking at a big image or a big map, and it’s doubled near the image/map borderline. But a big geographical picture alone can be a lot, especially for people who haven’t been where it shows, and especially if it’s in an unfamiliar modality (i.e., not true color). So it seems like too much to just shove a 145 megapixel image under someone’s nose. “Fascinating, isn’t it? No questions, please.”

As well as extremely dry and minimal maps, I do enjoy squirrely animation and cheesy interactivity. We’re in a bit of a rut with best-practice map styling these days, I think. We’re very literal with it. The one kind of style everyone uses is very refined and very good at what it’s intended for, but maybe we could intend a little wider. This project is not a big challenge to cartographic orthodoxy, but it is a small experiment in ways of showing space in space. I want to see more experiments.

Among other things, I want the feeling of being lost in the scroll – of having a big window looking onto a far bigger thing. Part of that is not having zoom, because if you can zoom out, you can navigate more hierarchically. I want more of the sense of looking at something in the distance, where you can’t (easily) get closer; you have to look more carefully. And often you get the feeling of not seeing what you want to see. That’s interesting and fun to play with. And the labels are there to balance this.

Did this come out in the final product? Yes, for me. No, not for most other people, I think. It’s simply not self-explanatory, and I don’t really see how to get there without paragraphs on paragraphs. That’s what’s made this project so interesting for me. (More reflection in the Results section.)

My partner got covid and some very bad geopolitics was happening over the weekend of the eclipse. This project gave me something to do other than sit and fret. I decided to write this as fast as possible over the weekend, without worrying about code quality or correctness beyond what it took to make it work well.

Animation

I experimented with animation on the canvas element briefly in May 2012 and haven’t really thought about it since then. I don’t know how to think in terms of frames or event loops; I don’t even know the vocabulary to look up answers to the most basic questions. I leaned heavily on CSS variables, which seems to have been an accidentally good choice. You set variables, you make CSS expressions depend on them, you alter the variables, and apparently optimized loops somewhere in the very hot and dense core of Firefox make it work.
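To make that concrete, here’s roughly the shape of the trick – not my actual code, and the property and class names are invented – where JS only ever writes one number per label and CSS does the visual interpolation:

```js
// In the stylesheet, something like (CSS, shown here as a comment):
//   .label {
//     opacity: calc(0.25 + 0.75 * var(--excitement, 0));
//     font-size: calc(12px + 6px * var(--excitement, 0));
//   }

// JS side: hand a clamped 0..1 value to CSS via a custom property.
function setExcitement(labelEl, excitement) {
  const e = Math.min(1, Math.max(0, excitement));
  labelEl.style.setProperty('--excitement', e.toFixed(3));
}
```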

So we listen for cursor motion events. From the line of the cursor’s motion, and especially its endpoint, hype radiates. This is implemented as events appearing in the schedule arrays of nearby labels, scaled by functions of distance from the start and end of the motion. In each animation tick, each label looks at the events in its schedule and adds an amount of excitement that ramps up toward the event and down after it. It decays its excitement slightly, clears its schedule of events in the past, and draws its text with various attributes (like weight) proportional, via scaling functions and CSS variables, to its excitement.
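In sketch form – names and constants are made up, and this is a paraphrase of the idea rather than the project’s code – the per-label bookkeeping looks something like:

```js
// Each label: a DOM element, a pixel position, an excitement level, and a
// schedule of upcoming/recent hype events { time, strength }.
const labels = []; // e.g. { el, x, y, excitement: 0, schedule: [] }

const DECAY = 0.97; // per-tick decay of excitement
const RAMP = 400;   // ms over which an event ramps up and back down

function tick(now) {
  for (const label of labels) {
    // Clear events that are fully in the past.
    label.schedule = label.schedule.filter(ev => now - ev.time < RAMP);

    // Each event contributes a bump that ramps up toward it and down after it.
    for (const ev of label.schedule) {
      const dt = Math.abs(now - ev.time);
      label.excitement += ev.strength * Math.max(0, 1 - dt / RAMP);
    }

    // Decay slightly, clamp, and let CSS turn the number into style.
    label.excitement = Math.min(1, label.excitement * DECAY);
    label.el.style.setProperty('--excitement', label.excitement.toFixed(3));
  }
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);
```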

If I knew more math I think it would be better to implement this as more of an ODE and less of a state machine, to better account for dropped frames, but this way does work. There are probably standard optimizations I’m missing – e.g., keeping quiescent labels out of the loop. Also, everything is strictly sequential right now; it could be async and probably in some kind of worker thread? Or do you not want to use those for animation? It’s embarrassing how little I know about this – but that’s the nature of the project.

Label placement

The problem

Say we want to put a label on the sign marking the 45th parallel on Interstate 5 northbound. It’s within a few meters of longitude −122.999° (by coincidence), and of course latitude 45°. Given a georeferenced image, how do we know where to place the text?

Usually, geospatial image metadata specifies two things: a projection, meaning a function between Earth’s (approximately) spherical shape and the plane, and a transform, meaning basically a position, scale, and rotation within the plane. For example, georeferencing metadata might say “I’m in the plane of the Mollweide projection, with my top left pixel at such and such a point, and my pixels extend east and south at such and such a scale”. (I am leaving out a few technically crucial parts, but listing them wouldn’t give a clearer picture of the fundamental idea.) Given this, we have the information needed to say that some pixel in image space (x, y) is at some coordinates in terrestrial space (longitude and latitude), and vice versa.
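As a toy example (made-up numbers, and ignoring rotation along with the other technically crucial parts I just waved away), the transform half is a little affine mapping between pixel and projected coordinates:

```js
// Pixel (col, row) <-> projected coordinates, for an axis-aligned geotransform.
const transform = {
  originX: -13700000, // projected x of the top-left corner
  originY: 6300000,   // projected y of the top-left corner
  pixelWidth: 40,     // projected units per pixel, eastward
  pixelHeight: -40,   // negative because rows run south
};

function pixelToProjected(col, row) {
  return [
    transform.originX + col * transform.pixelWidth,
    transform.originY + row * transform.pixelHeight,
  ];
}

function projectedToPixel(x, y) {
  return [
    (x - transform.originX) / transform.pixelWidth,
    (y - transform.originY) / transform.pixelHeight,
  ];
}
```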

We do not have that for Sentinel-1. It comes unprojected, like a phone photo out a plane window. Its x and y dimensions do not lie in any defined projection. But unlike a phone photo, it has the metadata needed to create a defined projection: ground control points, or GCPs. These are pieces of information that, for a grid of pixel coordinates scattered over each image, give estimates of the longitude and latitude coordinates. Using these, reprojection software can build a function that approximates a continuous transformation from source pixel space to destination pixel space in whatever projection you want. For example, if the top left pixel (x=0, y=0) is at (longitude=−125, latitude=50), the reprojection software “pins down” that source point to that destination point, and as it adds more pins, it tries to bend the image as smoothly as possible to fit the constraints of the GCPs.

(It may even have permission to skip the GCPs that add the most warp to the image. With Sentinel-1, it’s safe to assume the GCPs are very good, but if you were doing your own with KAP, say, you might have made a few typos, and ignoring outliers would be reasonable.)

It seemed fun to place labels, specified in longitude and latitude coordinates, with GCPs alone – not because this is wise but because it is unwise. Also because I wanted to deliver the unprojected swath to the browser. I like that the unprojected data makes the cardinal effect align with the pixel grid, and I like the shear as a reminder that this is not an ordinary image. And I liked the idea of finding a general way of addressing this kind of problem of navigating by landmarks.

(Deleted like three paragraphs getting way into the weeds about how the GCPs are laid out and what kinds of assumptions I was making about them. If you like, we can imagine that I left that in, and you read it, and you thought it was interesting and will let it inform your reading of the following. 🤝.)

(Actually, one little bit of that does matter: that I dumped the GCPs locally using rio info and sidecar them to the browser in – if I haven’t fixed it by the time you read this – inefficiently loaded and extremely wastefully precise GeoJSON. There are about 450 GCPs per source image, and 6 source images for the whole swath.)

Ways it’s not implemented

Some label placement strategies I rejected:

Place labels in pixel coordinates. This makes perfect sense, and the results would be ideal, but its simplicity is against the spirit of the project. It’s the right answer but not the one we want.

Inverse distance weighting. A reasonable intuition here is: Why not approximate a point’s longitude and latitude as an average of nearby points, weighted by how close they are? In geostatistics this is called IDW, and it works for a lot of questions, but not this one. It tends to regress toward the mean. As you get further from a group of points, no matter the spatial trend within the group, your extrapolated value approaches the mean of the group’s values. It’s sort of reverse extrapolation – and again, it makes perfect sense for some kinds of sampling, but not for this. If we pare it down to consider only the 3 nearest enclosing points, we get a more applicable kind of interpolation, but if that’s the only part of IDW we want, we can probably choose a better interpolation.
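For concreteness, here’s a bare-bones IDW in JS (power 2; not anything I actually shipped). The point to notice is that far from every sample the weights flatten out and the estimate heads toward the plain mean of the values, whatever the trend inside the cluster was:

```js
// samples: [{ x, y, value }]; returns the IDW estimate at (x, y).
function idw(samples, x, y, power = 2) {
  let num = 0;
  let den = 0;
  for (const s of samples) {
    const d2 = (s.x - x) ** 2 + (s.y - y) ** 2;
    if (d2 === 0) return s.value;           // exactly on a sample
    const w = 1 / Math.pow(d2, power / 2);  // 1 / distance^power
    num += w * s.value;
    den += w;
  }
  return num / den;
}
```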

Project the data. The problem is: What projection? The obvious choice would be space-oblique Mercator, which tries to follow an orbit, but the JS projection library I tried doesn’t seem to support it properly. More than that, even if you factor the curve of the orbit out of the image, it’s still curved over terrestrial coordinates because – unlike Landsat, say – it’s not nadir-centered: it’s taken to one side and therefore is buffered out from the sine shape of the orbit. Yikes.

Fit polynomials, thin-plate splines, or some other kind of spline. I don’t know how to do that at this scale, and don’t want to try for the first time in a language I don’t know – picking an algorithm, understanding it, implementing it, testing it, optimizing it, etc., would be a weekend in itself. Fitting thousands of GCPs is likely to be slowish, but underfitting means giving up fine-scale information. Fitting only locally (to the GCPs near a given label) makes sense but would get complicated. All in all, this seemed sensible but not particularly fun or interesting.

Fit a neural network. This does sound like fun, but I have never implemented backpropagation from scratch in a language I do know, so I thought it would be safer to leave for another day. (Though as I tried some other ideas, I did implement bits and bobs like naïve matrix multiplication in JS, so I might have more of the pieces of this at hand than I thought I would. Hmm.)

Affine zoom-in. A useful concept in this kind of geometry is the set of operations on shapes that keep their parallel lines parallel: resizing them, reflecting them, rotating them, and so on. Taken together, these are called affine transformations, and they can be expressed neatly and implemented efficiently with linear algebra that even I know. (Well, that I know enough to look up.) We’re trying to think about curves, and affine operations are definitionally unable to curve or uncurve things, but we can still break most of our problem into affine parts.

Methods like this and this give us fast ways to turn any 3 non-collinear GCPs into an affine transformation that represents the mapping they imply between terrestrial and screen coordinates. In other words, if we have a triangle with known pixel and degree locations at its vertexes, we can linearly interpolate, extrapolate, and translate both coordinate systems around that triangle; we have a local estimate of how the two coordinate systems lie across each other. This is great as far as it goes, but the linearity is an important limit. If we pick samples near 3 of the 4 corners of our swath, for example, the mapping will be wrong around the center, where there is no sample to (so to speak) tell the transform that there’s a big overall curve in lon/lat terms. Some triangles will produce better transforms than others, but none of them can make a curve.
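For the record, here’s one way to build that transform, sketched in JS with Cramer’s rule on two small linear systems (this is the general idea, not necessarily either of the linked methods verbatim):

```js
// src and dst are arrays of three [x, y] pairs (e.g. pixel and lon/lat
// coordinates of three GCPs). Returns [a, b, c, d, e, f] such that
// (x, y) -> (a*x + b*y + c, d*x + e*y + f).
function affineFromTriangle(src, dst) {
  const [[x0, y0], [x1, y1], [x2, y2]] = src;
  // Determinant of [[x0,y0,1],[x1,y1,1],[x2,y2,1]]; zero means collinear points.
  const det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1);
  if (Math.abs(det) < 1e-12) throw new Error('collinear points');

  // Solve for one output axis (u0, u1, u2) by Cramer's rule.
  const solve = (u0, u1, u2) => [
    (u0 * (y1 - y2) - y0 * (u1 - u2) + (u1 * y2 - u2 * y1)) / det,
    (x0 * (u1 - u2) - u0 * (x1 - x2) + (x1 * u2 - x2 * u1)) / det,
    (x0 * (y1 * u2 - y2 * u1) - y0 * (x1 * u2 - x2 * u1) + u0 * (x1 * y2 - x2 * y1)) / det,
  ];

  const [a, b, c] = solve(dst[0][0], dst[1][0], dst[2][0]);
  const [d, e, f] = solve(dst[0][1], dst[1][1], dst[2][1]);
  return [a, b, c, d, e, f];
}

function applyAffine([a, b, c, d, e, f], x, y) {
  return [a * x + b * y + c, d * x + e * y + f];
}
```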

What we could do is something like this (there’s a rough code sketch after the list):

  1. Pick 3 GCPs at, say, Portland, Eureka, and Bakersfield; calculate the affine transformation that they collectively imply for (x, y) → (longitude, latitude), and call that the overall transform. (Typically you might say global instead of overall, but one of the joys of working with spatial things is stumbling over the thousands of spatial metaphors in English.) This transform amounts to an origin point, scale, and cardinal directions for the whole image. It’s only an approximation, since it can’t represent the curve of the orbit, topography, etc., but it’s accurate to somewhere near the level that a scale bar and north arrow are on a typical wide-area map. Now we can:

  2. Place the label where the transform says its longitude and latitude should fall.

  3. Choose a new, smaller triangle around the label.

  4. Calculate a new (and more locally accurate) affine transform from the new triangle.

  5. If the label is close enough, stop; if not, take the new transform to step 2.
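In code, the loop might look something like this, reusing affineFromTriangle and applyAffine from the sketch above and waving the interesting part (choosing good triangles) into an assumed pickTriangle helper:

```js
// gcps: [{ x, y, lon, lat }]. overallTriangle: three widely spaced GCPs.
// pickTriangle(gcps, x, y) is assumed: it returns three well-spread GCPs
// near the pixel position (x, y).
function placeLabel(lon, lat, gcps, overallTriangle, pickTriangle, steps = 4) {
  // 1. The coarse, whole-image transform, (lon, lat) -> (x, y).
  let transform = affineFromTriangle(
    overallTriangle.map(g => [g.lon, g.lat]),
    overallTriangle.map(g => [g.x, g.y]),
  );
  // 2. Place the label where that transform says it should fall.
  let [x, y] = applyAffine(transform, lon, lat);

  for (let i = 0; i < steps; i++) {
    // 3-4. A smaller triangle around the guess, and a more locally accurate transform.
    const tri = pickTriangle(gcps, x, y);
    transform = affineFromTriangle(
      tri.map(g => [g.lon, g.lat]),
      tri.map(g => [g.x, g.y]),
    );
    const [nx, ny] = applyAffine(transform, lon, lat);
    // 5. Stop when the guess stops moving (here: by less than half a pixel).
    if (Math.hypot(nx - x, ny - y) < 0.5) return [nx, ny];
    [x, y] = [nx, ny];
  }
  return [x, y];
}
```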

I think this would work well, but there are questions. For example, what happens if a transform (most likely the first one) is wrong enough to put the label outside the GCPs, so there is no triangle that encloses it? (We could clip it back into the well-defined area.) How do we pick triangles efficiently? (Maybe we pre-sort the GCPs on two axes to start with.) What’s a good triangle size reduction per step? (Big, I think; maybe even from the overall transform to the finest scale in one step.) What happens if triangle A projects the label into triangle B, but B puts it in A? Well, this should only be able to happen at few-pixel scales, so we can simply give up after a few steps. But it leads in an interesting direction:

What if we smooth the discontinuities between affine transforms, so we’re interpolating transforms? Technically this is forbidden. In general it is not possible to fairly interpolate an affine transformation between two others. But we can interpolate between the positions that they project a point to! That is, if we’re inside triangle A but near triangle B, we take our final position to be a weighted average of the positions suggested by A and by B. (Or maybe not exactly that, but something like that.) There are a lot of interesting questions here, for example whether this gets easier if seen in terms of barycentric coordinates, or whether it makes sense to triangularize all the GCPs, or whether this might end up resembling a Bézier surface, and I’d like to come back and explore a little one day.

Descending lerped gradients. It’s useful that the GCPs are in neat rectangles within each image, because this makes it easy to bilinearly interpolate between them. (There are generalizations of bilinear interpolation that work for arbitrary convex polygons, but they look like big kegs of headache juice.) Bilinear interpolation in a rectangle is just a few linear interpolations (lerps). While we’re lerping, we can take the slope of the (lon, lat) coordinates across the distances of the rectangle edges and interpolate those too. This means that at any point in – and around – the rectangle, we can get a local estimate of where we are and which directions are east and north and what their scales are, all in terms of pixel space. This amounts to much the same as the triangle affine idea but translated into the tongue of the rectangles.
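Sketched in JS for a single rectangle of GCPs, with the corners as { x, y, lon, lat } objects and the simplifying pretense that the rectangle’s pixel edges are axis-aligned:

```js
function lerp(a, b, t) { return a + (b - a) * t; }

// nw, ne, sw, se: the rectangle's corner GCPs. (x, y): a pixel position.
// Returns the interpolated (lon, lat) there, plus the local slopes:
// how lon and lat change per pixel of x and per pixel of y.
function sampleRect(nw, ne, sw, se, x, y) {
  const u = (x - nw.x) / (ne.x - nw.x); // 0..1 across the rectangle
  const v = (y - nw.y) / (sw.y - nw.y); // 0..1 down the rectangle

  const lon = lerp(lerp(nw.lon, ne.lon, u), lerp(sw.lon, se.lon, u), v);
  const lat = lerp(lerp(nw.lat, ne.lat, u), lerp(sw.lat, se.lat, u), v);

  const dLonDx = lerp(ne.lon - nw.lon, se.lon - sw.lon, v) / (ne.x - nw.x);
  const dLatDx = lerp(ne.lat - nw.lat, se.lat - sw.lat, v) / (ne.x - nw.x);
  const dLonDy = lerp(sw.lon - nw.lon, se.lon - ne.lon, u) / (sw.y - nw.y);
  const dLatDy = lerp(sw.lat - nw.lat, se.lat - ne.lat, u) / (sw.y - nw.y);

  return { lon, lat, dLonDx, dLatDx, dLonDy, dLatDy };
}
```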

One way of looking at our setup now is that we have a tangent space we can use to answer questions on the template: Given that I am here in pixel x and y coordinates, if I want to go there in longitude and latitude coordinates, then based on the correspondences implied by the nearest GCPs, how many pixels right or left, and up or down, should I move? It’s a mouthful but it makes sense if you sketch it out. After a query like this, we can look at the answer and move in that direction, probably by a smaller distance than the gradient suggests, and maybe carrying over some influence from the last few gradients we saw to add momentum. This is gradient descent.

Gradient descent worked pretty well, but as I tuned it I started to understand what someone with more experience in optimization would have seen from the beginning: the problem we’re thinking about here is much smoother (easier, really) than the kind of problem that gradient descent tends to be good at. Our GCPs are low-dimensional and don’t have a lot of sneaky local minima, so a lot of what makes gradient descent good at what it does (for similarly defined but differently textured problems) is wasted effort here. It’s like trying to dice onions with a hand saw. Possible certainly, elegant not really.

It was fun, however, to think about how descending in a landscape is one of the prime metaphors for gradient descent, and here it is actually moving around in a representation of an actual landscape, its gradients containing real mountains where the GCPs are offset by layover. Spatial metaphors skulking and chittering everywhere.

The Euler method. If we tune away the unhelpful-to-this-problem parts of gradient descent, so that it takes full-vector steps and doesn’t use momentum, what that looks like is: We start at a random point, we query the tangent space to ask that point where it thinks we’re supposed to end up, and we go directly there. We iterate until we’re happy. This is roughly analogous to the Euler method [frantically waving a fist full of the good chalk to distract mathematicians], and it works really well. The main thing that bothers me is that as you watch it converge you can see the overall bend in the data throwing everything off in the first step or two. If we belong in the north of the image but land in the south of the image, the first guess will be much too far east, for example. Predictable error should be factored out!
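In sketch form, with queryTangent standing in for the lerped-field lookup above (it returns the local lon/lat and the per-pixel slopes, e.g. from sampleRect):

```js
// One Euler-ish step: invert the little 2x2 "pixels -> degrees" system at
// the current position to ask how far to move to hit (lon, lat), and move.
function stepToward(lon, lat, x, y, queryTangent) {
  const t = queryTangent(x, y); // { lon, lat, dLonDx, dLatDx, dLonDy, dLatDy }
  const dLon = lon - t.lon;
  const dLat = lat - t.lat;
  const det = t.dLonDx * t.dLatDy - t.dLonDy * t.dLatDx;
  const dx = (dLon * t.dLatDy - dLat * t.dLonDy) / det;
  const dy = (dLat * t.dLonDx - dLon * t.dLatDx) / det;
  return [x + dx, y + dy];
}

function placeByEuler(lon, lat, x0, y0, queryTangent, iterations = 5) {
  let [x, y] = [x0, y0];
  for (let i = 0; i < iterations; i++) {
    [x, y] = stepToward(lon, lat, x, y, queryTangent);
  }
  return [x, y];
}
```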

Fourth-order Runge–Kutta, kind of. So let’s keep the lerped field but get fancy with how we sample it. RK4 pulls in far more information per step: it looks at the local lon/lat vector, then takes four carefully placed samples that combine into a kind of cross-calibrated estimate of the true direction. This is a common tool in engineering and simulation because, without noise, and with a reasonably small timestep, it boxes in the solution and cancels out its own errors in a way that gives really astonishing accuracy. But in testing here, it wasn’t that much faster than Euler to get to an acceptably close point, because (best I can tell) the noise in the GCP field is (a) basically random and (b) almost entirely at one scale.
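(For reference, and only to make “four carefully placed samples per step” concrete: this is the textbook RK4 step for dy/dt = f(t, y), not the way I bent it toward label placement.)

```js
// Classic fourth-order Runge-Kutta step for a scalar ODE y' = f(t, y).
function rk4Step(f, t, y, h) {
  const k1 = f(t, y);
  const k2 = f(t + h / 2, y + (h / 2) * k1);
  const k3 = f(t + h / 2, y + (h / 2) * k2);
  const k4 = f(t + h, y + h * k3);
  return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4);
}
```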

Consider what happens if we’re closing in on the final label placement and all the points that RK4 samples are from within the same rectangle of GCPs. Since the vectors at those points are all based on the same information (from the rectangle vertexes), each one after the first can only add information in proportion to the spatial biases of the interpolation. In other words, RK4 is overdoing it for this data.

In the bigger picture, RK4 is not for this kind of problem. (As people with a solid grasp of numerical methods have been yelling at the screen since they saw the names Runge and Kutta.) RK4 is designed around things like initial value problems: If a particle in fluid flow (say) is here and the nearby particles are doing this and the rules for fluid motion are such, then where will the particle be at each timestep from t=0 through t=∞? You use RK4 to efficiently approach the impossibility of tracing your particle in continuous and infinite time. But what we actually want here is almost the mirror image of that. We start with a label at an undefined or arbitrary position and the only state we care about is at t=∞. So, very much unlike the kind of question RK4 is built to answer, there is nothing special about where we start, we don’t have strictly defined rules for how we must move, and we don’t care about continuity in any dimensions – in fact, what we want is to discontinuously teleport to the endpoint. Via sinful ruses like lying about what the timestep is, we can mislead RK4 into serving this purpose, but as with gradient descent, we’re using a good tool wrong. It’s like chopping the onion with a scalpel.

I think of gradient descent and RK4 as bracketing Euler. They are Too Much in roughly opposite ways – very broadly, GD is designed for more noise and RK4 is designed for less. Euler, in the middle, is a bit too basic even for this relatively simple problem. Ideally we’d like something a little to the RK4 side of Euler.

And you might say: It sounds like you gave up on the idea of solving the problem generally, and are thinking about the details of the data you actually have. This is fair. But once I felt comfortable with some general solutions, I wanted to actually do a good job with this data.

The way it is implemented

Second-order Runge–Kutta, kind of. This is also called the (or a) midpoint method, and is itself a kind of midpoint between Euler and RK4. We implement it in a time-free way like this. Call the label’s current position A. We query the tangents here and get a vector pointing to the local best estimate of our endpoint: B. We cut the line AB in half to find its midpoint, C, and query the tangent vector there. Following that produces a second endpoint estimate, D. We double CD, the vector that goes from the midpoint to its estimate of the endpoint, and follow that doubled vector from A. I typed out a whole rephrasing of this but honestly? Just find a good diagram.
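Or, failing a good diagram, here is the same thing sketched in JS, with estimateEndpoint standing in for the tangent-space query (given a target lon/lat and a pixel position, it returns where the local tangents say the label should land):

```js
// One time-free midpoint (RK2-ish) step, as described above.
function rk2Step(lon, lat, ax, ay, estimateEndpoint) {
  // A -> B: the endpoint estimate from where we are now.
  const [bx, by] = estimateEndpoint(lon, lat, ax, ay);
  // C: the midpoint of AB.
  const cx = (ax + bx) / 2;
  const cy = (ay + by) / 2;
  // C -> D: a second endpoint estimate, queried at the midpoint.
  const [dx, dy] = estimateEndpoint(lon, lat, cx, cy);
  // Double CD and follow it from A.
  return [ax + 2 * (dx - cx), ay + 2 * (dy - cy)];
}

function placeByRk2(lon, lat, x0, y0, estimateEndpoint, iterations = 3) {
  let [x, y] = [x0, y0];
  for (let i = 0; i < iterations; i++) {
    [x, y] = rk2Step(lon, lat, x, y, estimateEndpoint);
  }
  return [x, y];
}
```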

RK2 doesn’t build as clear a picture of nearby tangent space as RK4 does, but (as slightly weirdly applied here) it is good at accounting for modest constant curvature. Our data’s great curve isn’t constant (remember, it’s roughly a section of a sine wave), but RK2 is able to factor most of it out of each step. That means the obvious one-sided error of Euler disappears. Is RK2 clearly near-optimal for placing labels efficiently with this data, given the artificial constraints? Nope. It’s still designed for a different task than we’re using it for. And in a serious benchmark, which I have deliberately refrained from carrying out, it wouldn’t surprise me to see Euler win – just because Euler queries closer to the endpoint sooner (and thus handles noise better), even if its curve-induced bias annoys me. But RK2 makes me happy.

Choosing the first point

Right now, the initial point is chosen as the straight mean of extrapolations from two interpolated points, one in the north and the other in the south. This works but is painfully inelegant. (If nothing else, it adds two full queries per label.) One way or another, it seems to me that the tidy way to solve this is to have some kind of simple, spatially coarse model, ideally accounting for that first-order curve, that plops the label in the right general region, and then RK2 or something of that kind can cinch it up. And specifically, if I could go back in time, I’d say: try barycentric coordinates from GCP triangles if you can manage to make it continuous across triangle edges without pre-triangularizing all the GCPs.

Questions

Can we save useful information as we place each label that might speed the next placement? For example, is it worthwhile to do some kind of online learning of a curve? Or just save computed tangents? What about Bézier surfaces? What about full triangulations? What about bicubic interpolation? Maybe bicubic interpolation for the spatially coarse step?

Not that shaggy

The punchline to all this, of course, is that being able to RK2 to centimeter precision is fundamentally silly given that we don’t have anything close to centimeter accuracy. In particular, layover, which you naturally read all about when I linked to it the first time, and which is roughly equivalent to perspective but backwards, means that mountain peaks are all shifted in the −x direction and labels miss them (unless, I suppose, a GCP happened to land on top of them). So for best results, labeling should have been in pixel space anyway, and this was all just for fun.

Results

This project came out almost exactly as I had hoped. Like many people reading this, I suspect, I’ve started about ten times as many of these for-my-own-satisfaction projects as I’ve finished. Usually some barrier appears – turns out you can’t get that data, turns out that’s an O(n²) algorithm and n is huge, turns out it looks too much like a train ticket – or it just stops being fun. Part of tinkering is walking into unclear or open-ended questions. You have to balance a sincere intention to do the thing you thought of with a realistic understanding that it’s probably not possible to do properly and all you’re going to have to show for it is the memories you made along the way. (You can’t “pursue” the grail while explaining to your friends that it’s not actually about the literal grail per se. It only works if you actually want the damn grail.)

Sometimes it works and you make the thing you wanted. It’s great. And in this case, no one else seems anywhere near as entranced. I’m basing this off, like, fediverse replies. I can’t see into the minds of others. Sometimes doing something just for yourself connects with other people, and sometimes it doesn’t. Sometimes you make the personal project public but it actually stays personal. If you will forgive a corny analogy, it is as if the project, like a label with a bad scootching algorithm, made a nice first step but stopped there.

People want to pan and zoom maps by dragging, not scrolling. They see this as a map and not an image. People don’t want to learn the odd things about sar in this oblique way. People don’t happen to have lived in Washington, Oregon, and California for practically their whole lives and thus don’t find the landscapes interesting to stare at. People expect scrollytelling narratives when presented with a bunch of scrolling. People want a why. People don’t get the logic to the label fade-in and -out. These are reasonable things. People are right, or at least not wrong, to want them. I thought a certain feeling might carry viewers past them. Not really. And I don’t have easy ways to solve most of these problems. They’re design problems that are bigger than a weekend.

So for me this whole project is a mix of triumph – in making what I had in mind – and failure – to make it make sense. Some day I will do more experiments in this kind of direction. Maybe more sar, maybe true color, maybe something else. I like swaths.
