vojtamolda/!Differentiable Shallow Water PDE Solver.md

## !Differentiable Shallow Water PDE Solver.md

      
    Raw
  

              !Differentiable Shallow Water PDE Solver.md
            
          
    Differentiable Shallow Water PDE Solver

This repository contains source code of a shallow water partial differential equation (PDE) solver. The solver is written in differentiable Swift which allows end-to-end propagation of derivatives through time. In physics this kind of solution is traditionally called adjoint and it unlocks a new class of algorithms for optimal control of PDEs.
More details about the PDE and how to derive it from general Navier-Stokes equations can found for example on Wikipedia.
The following notebook demonstrates how the differentiability of the solver can be leveraged to solve an otherwise tricky optimal control problem.
Showcase.ipynb 
Motivation

The entire project is inspired by DiffTaichi paper, in particular the wave.py example. Besides exploring and having fun, the main goal is to benchmark differentiable Swift against the DiffTaichi framework. For the time being, the focus is only on single core CPU performance.
DiffTaichi uses an imperative programming language and transforms the Python source code in a way that preserves arithmetic intensity and parallelism. For my personal interests, DiffTaichi provides an excellent benchmark. It uses a very lightweight tape to record operations for the backward pass and focuses on physical simulations. The lightweight tape gives good performance in cases, where the studied algorithm operates on small matrices. This typically happens in physical system simulations. Tight loops and element-wise modification of matrices generally pose a significant performance bottleneck for traditional ML frameworks like PyTorch or TensorFlow 2. These frameworks were designed to run well in situations when the runtime is dominated by linear algebra operations on large matrices.
Swift for TensorFlow aims to solve these kinds of problems by integrating the differentiability directly into the compiler. This will eventually bring a wide palette of benefits, which are in depth described in this manifesto. However, as of version 0.10 the differentiability features are still new and to a large extent experimental. So before implementing a larger project it's worthwhile to spend some time and try to poke hole in the whole thing. First, this helps to better understand the strengths and weaknesses of the framework. And second, one can gain more experience how to implement different algorithms and ideas.
Goals


 Reimplement DiffTaichi's wave.py shallow water PDE solver example using only Foundation (i.e.  [[Float]] or [[Double]]). This will serve two purposes. First, get myself familiar with the differentiable API. And second, serve as a baseline to make sure the other results are not completely off. Optionally, if the results don't make sense, in forward-only mode the Foundation-only implementation should have a similar performance as a C implementation.


 Reimplement DiffTaichi's wave.py with TensorFlow (i.e. Tensor<Float>) data types. I'm curious to see how it fares against the Foundation only version. Chances are it's going to be very slow do to eager mode. Hopefully compiling the time-stepping with XLA will make things better.


 Benchmark the Swift reimplementations against DiffTaichi on a simple problem that doesn't require differentiability. This will give a baseline for the expensiveness of the backward pass relative to the forward one.


 Benchmark solving an optimization problem that also exercises the backward pass. Ideally, the parameters like the the grid size and number of time-steps should stay the same so the runtimes of the backward and forward pass can be compared.


Implementation

Solution of the shallow water hyperbolic PDE is calculated with finite-difference discretization on unit square. Time-stepping uses semi implicit Euler's schema. Time step itself is limited to stay below the Courant–Friedrichs–Lewy numerical stability limit. Values around the edges of the domain are subject to trivial Dirichlet boundary conditions - i.e. they're equal to zero with an arbitrary normal derivative.
The bread and butter of the solver is the Laplace operator. Its evaluation is the bulk of the execution time. This particular implementation approximates the laplacian with five-point stencil finite-differencing.
The repository contains 3 different implementations of the same algorithm. The key difference is the underlying data storage of the discretized Laplace operator.

A - ArrayLoopSolution.swift: Uses [[Float]] for storage. Calculations are done in a tight 2D loop.
B - TensorLoopSolution.swift: Uses Tensor for storage. Calculations are done in a tight 2D loop.
C - TensorSliceSolution.swift: Uses arithmetic operations on slices of Tensor and no loops.
D - TensorConvSolution.swift: Uses 2D convolution with a fixed kernel for calculations, Tensor  and no loops.

A and B are written in an imperative style that is very close to DiffTaichi. These are therefore the two most representative candidates for benchmarking. C was written more or less out of desperation from bugs encountered while working on A and B. It uses arithmetics on tensor slices in place of the 2D loops.
The discretized Laplace operator can be viewed as a convolution or also a special type of a Gaussian blur operation. D uses a fixed 3x3 single channel weight matrix and runs convolution on a 2D greyscale image. Shades of gray in this case represent the height of the water surface.
In some sense, solutions A and B  reimplement convolution from scratch using loops. From the performance perspective, this is a very bad idea. The problem here is that not all discretizations of the laplacian can be written as a convolution. So this benchmark has some relevance for these alternative discretizations that operate on unstructured meshes. Typical examples of this is FVM (finite volume method) or FEM (finite element method).
Conclusion

The following table contains results of the benchmark on macOS S4TF development toolchain 2020-09-16-a. There's no conclusion yet.
name                                        time         std        iterations warmup      
-------------------------------------------------------------------------------------------
Shallow Water PDE Solver.C Loop              355.480 ms ±   4.35 %         10       nan ms
Shallow Water PDE Solver.Array Loop          665.574 ms ±   1.52 %         10  1298.180 ms
Shallow Water PDE Solver.Tensor Slice        536.672 ms ±   4.60 %         10  1201.721 ms
Shallow Water PDE Solver.Tensor Slice (XLA)  640.667 ms ±   1.91 %         10  1470.422 ms
Shallow Water PDE Solver.Tensor Conv        7502.167 ms ±   0.21 %         10 14969.662 ms
Shallow Water PDE Solver.Tensor Conv (XLA)   562.674 ms ±   0.25 %         10  1333.303 ms

More details and the benchmark source code can be found in the following notebook.
Benchmarks.ipynb 
The plan is to revisit the project again when the differentiability of coroutines is working. In particular, at the moment it's not possible to differentiate through indexed assignments like anArray[i] = someVal. Progress on these issues is tracked under the following links:

https://bugs.swift.org/browse/TF-1078
https://bugs.swift.org/browse/TF-1080


## ArrayLoopSolution.swift
import Foundation
import Swim


// MARK: Solution of shallow water equation

/// Differentiable solution of shallow water equation on a unit square.
///
/// Shallow water equation is a type of hyperbolic partial differential equation (PDE). This struct
/// represents its solution calculated with finite-difference discretization on a 2D plane and at a
/// particular point in time.
///
/// More details about the shallow water PDE can found for example on
/// [Wikipedia](https://en.wikipedia.org/wiki/Shallow_water_equations)
///
/// # Domain and Discretization
/// The PDE is solved on a `<0,1>x<0,1>` square discretized with spatial step of size `Δx`.
/// Laplace operator is approximated with five-point stencil finite-differencing.
///
/// Temporal advancing uses semi implicit Euler's schema. Time step `Δt` is calculated from
/// `Δx` to stay below the Courant–Friedrichs–Lewy numerical stability limit.
///
/// # Boundary Conditions
/// Values around the edges of the domain are subject to trivial Dirichlet boundary conditions
/// (i.e. equal to 0 with an arbitrary gradient).
///
/// # Laplace Operator Δ
/// Discretization of the operator is implemented as tight loops over the water height field.
/// This is a very naive but natural implementation that serves as a performance baseline
/// on the CPU.
///
struct ArrayLoopSolution: ShallowWaterEquationSolution {
    /// Water level height
    var waterLevel: [[Float]] { u1 }
    /// Solution time
    var time: Float { t }

    /// Height of the water surface at time `t`
    private var u1: [[Float]]
    /// Height of the water surface at previous time-step `t - Δt`
    private var u0: [[Float]]
    /// Solution time
    @noDerivative private let t: Float
    /// Speed of sound
    @noDerivative private let c: Float = 340.0
    /// Dispersion coefficient
    @noDerivative private let α: Float = 0.001
    /// Number of spatial grid points
    @noDerivative private let resolution: Int = 256
    /// Spatial discretization step
    @noDerivative private var Δx: Float { 1 / Float(resolution) }
    /// Time-step calculated to stay below the CFL stability limit
    @noDerivative private var Δt: Float { (sqrt(α * α + Δx * Δx / 3) - α) / c }

    /// Creates initial solution with water level `u0` at time `t`.
    @differentiable
    init(waterLevel u0: [[Float]], time t: Float = 0.0) {
        self.u0 = u0
        self.u1 = u0
        self.t = t

        precondition(u0.count == resolution)
        precondition(u0.allSatisfy { $0.count == resolution })
    }

    /// Calculates solution stepped forward by one time-step `Δt`.
    ///
    /// - `u0` - Water surface height at previous time step
    /// - `u1` - Water surface height at current time step
    /// - `u2` - Water surface height at next time step (calculated)
    @differentiable
    func evolved() -> ArrayLoopSolution {
        var u2 = u1

        for x in 1 ..< resolution - 1 {
            for y in 1 ..< resolution - 1 {
                // FIXME: Should be u2[x][y] = ...
                u2.update(x, y, to:
                            2 * u1[x][y] +
                            (c * c * Δt * Δt + c * α * Δt) * Δ(u1, x, y) -
                            u0[x][y] - c * α * Δt * Δ(u0, x, y)
                )
            }
        }

        return ArrayLoopSolution(u0: u1, u1: u2, t: t + Δt)
    }

    /// Constructs intermediate solution with previous water level `u0`, current water level `u1` and time `t`.
    @differentiable
    private init(u0: [[Float]], u1: [[Float]], t: Float) {
        self.u0 = u0
        self.u1 = u1
        self.t = t

        precondition(u0.count == self.resolution)
        precondition(u0.allSatisfy { $0.count == self.resolution })
        precondition(u1.count == self.resolution)
        precondition(u1.allSatisfy { $0.count == self.resolution })
    }

    /// Applies discretized Laplace operator to scalar field `u` at grid points `x` and `y`.
    @differentiable
    private func Δ(_ u: [[Float]], _ x: Int, _ y: Int) -> Float {
        (                u[x][y + 1]
        + u[x - 1][y] - (4 * u[x][y]) + u[x + 1][y] +
                         u[x][y - 1]                ) / Δx / Δx
    }
}


// MARK: - Cost calculated as mean L2 distance to a target image

extension ArrayLoopSolution {

    /// Calculates mean squared error loss between the solution and a `target` grayscale image.
    @differentiable
    func meanSquaredError(to target: Swim.Image<Gray, Float>) -> Float {
        precondition(target.width == resolution && target.height == resolution)

        var mse: Float = 0.0
        for x in 0 ..< resolution {
            for y in 0 ..< resolution {
                let error = target[x, y][.gray] - u1[x][y]
                mse += error * error * Δx * Δx
            }
        }
        return mse
    }

}


// MARK: - Workaround for non-differentiable coroutines
// https://bugs.swift.org/browse/TF-1078
// https://bugs.swift.org/browse/TF-1080

fileprivate extension Array where Element == [Float] {

    @differentiable(wrt: (self, value))
    mutating func update(_ x: Int, _ y: Int, to value: Float) {
        let _ = withoutDerivative(at: (value)) { value -> Int? in
            self[x][y] = value
            return nil
        }
    }

    @derivative(of: update, wrt: (self, value))
    mutating func vjpUpdate(_ x: Int, _ y: Int, to value: Float) ->
            (value: (), pullback: (inout Array<[Float]>.TangentVector) -> Float) {

        self.update(x, y, to: value)

        func pullback(`self`: inout Array<[Float]>.TangentVector) -> Float {
            let `value` = `self`[x][y]
            `self`[x][y] = Float(0)
            return `value`
        }
        return ((), pullback)
    }
}

## Benchmarks.ipynb

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              Benchmarks.ipynb
            
          
        Loading

      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.
      
    
## CLoopSolution.c
#include <math.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>


const int n = 256;
const int duration = 512;

const float c = 340.0f;
const float alpha = 0.00001f;


int idx(const int x, const int y) {
    return y * n + x;
}

float laplace(const float* const u, const int x, const int y) {
    const float dx = 1.0f / n;

    return (-4 * u[idx(x,y)] \
            + u[idx(x+1,y)] + u[idx(x-1,y)] \
            + u[idx(x,y+1)] + u[idx(x,y-1)]) / dx / dx;
}

void timestep(float* u, const float* const u1, const float* const u0) {
    const float dx = 1.0f / n;
    const float dt = (sqrt(alpha * alpha + dx * dx / 3.0f) - alpha) / c;

    for (int x = 1; x < n-1; x++) {
        for (int y = 1; y < n-1; y++) {
            u[idx(x,y)] = 2 * u1[idx(x,y)] \
                          + (c * c * dt * dt + c * alpha * dt) * laplace(u1, x, y) \
                          - u0[idx(x,y)] - c * alpha * dt * laplace(u0, x, y);
        }
    }
}

void splash() {
    float* u = (float*) calloc(n*n, sizeof(float));

    float* u1 = (float*) calloc(n*n, sizeof(float));
    u1[idx(n/2, n/2)] = 100.0f;

    float* u0 = (float*) calloc(n*n, sizeof(float));
    u0[idx(n/2, n/2)] = 100.0f;

    for (int t = 0; t < duration; t++) {
        timestep(u, u1, u0);

        float* tmp = u0;
        u0 = u1;
        u1 = u;
        u = tmp;
    }

    free(u);
    free(u1);
    free(u0);
}

int main(int argc, const char * argv[]) {
    printf("Warmup...\n");
    for (int i = 0; i < 2; i++) {
        splash();
    }

    printf("Benchmarks...\n");
    for (int i = 0; i < 10; i++) {
        clock_t start = clock();
        splash();
        long double milisecs = (long double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;

        printf("%Lf ms\n", milisecs);
    }

    return 0;
}

## Optimization.gif

      
    Raw
  

              Optimization.gif
            
          
## Showcase.ipynb

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              Showcase.ipynb
            
          
    Sorry, this is too big to display.
  
 
## Solution.swift

      
    Raw
  

              Solution.swift
            
          
            View raw
              (Sorry about that, but we can’t show files that are this big right now.)
        
    
## Splash.gif

      
    Raw
  

              Splash.gif
            
          
## Target.png

      
    Raw
  

              Target.png
            
          
## TensorConvSolution.swift

      
    Raw
  

              TensorConvSolution.swift
            
          
            View raw
              (Sorry about that, but we can’t show files that are this big right now.)
        
    
## TensorLoopSolution.swift

      
    Raw
  

              TensorLoopSolution.swift
            
          
            View raw
              (Sorry about that, but we can’t show files that are this big right now.)
        
    
## TensorSliceSolution.swift

      
    Raw
  

              TensorSliceSolution.swift
            
          
            View raw
              (Sorry about that, but we can’t show files that are this big right now.)
        
    
## Visualization.swift

      
    Raw
  

              Visualization.swift
            
          
            View raw
              (Sorry about that, but we can’t show files that are this big right now.)
	import Foundation
	import Swim


	// MARK: Solution of shallow water equation

	/// Differentiable solution of shallow water equation on a unit square.
	///
	/// Shallow water equation is a type of hyperbolic partial differential equation (PDE). This struct
	/// represents its solution calculated with finite-difference discretization on a 2D plane and at a
	/// particular point in time.
	///
	/// More details about the shallow water PDE can found for example on
	/// [Wikipedia](https://en.wikipedia.org/wiki/Shallow_water_equations)
	///
	/// # Domain and Discretization
	/// The PDE is solved on a `<0,1>x<0,1>` square discretized with spatial step of size `Δx`.
	/// Laplace operator is approximated with five-point stencil finite-differencing.
	///
	/// Temporal advancing uses semi implicit Euler's schema. Time step `Δt` is calculated from
	/// `Δx` to stay below the Courant–Friedrichs–Lewy numerical stability limit.
	///
	/// # Boundary Conditions
	/// Values around the edges of the domain are subject to trivial Dirichlet boundary conditions
	/// (i.e. equal to 0 with an arbitrary gradient).
	///
	/// # Laplace Operator Δ
	/// Discretization of the operator is implemented as tight loops over the water height field.
	/// This is a very naive but natural implementation that serves as a performance baseline
	/// on the CPU.
	///
	struct ArrayLoopSolution: ShallowWaterEquationSolution {
	/// Water level height
	var waterLevel: [[Float]] { u1 }
	/// Solution time
	var time: Float { t }

	/// Height of the water surface at time `t`
	private var u1: [[Float]]
	/// Height of the water surface at previous time-step `t - Δt`
	private var u0: [[Float]]
	/// Solution time
	@noDerivative private let t: Float
	/// Speed of sound
	@noDerivative private let c: Float = 340.0
	/// Dispersion coefficient
	@noDerivative private let α: Float = 0.001
	/// Number of spatial grid points
	@noDerivative private let resolution: Int = 256
	/// Spatial discretization step
	@noDerivative private var Δx: Float { 1 / Float(resolution) }
	/// Time-step calculated to stay below the CFL stability limit
	@noDerivative private var Δt: Float { (sqrt(α * α + Δx * Δx / 3) - α) / c }

	/// Creates initial solution with water level `u0` at time `t`.
	@differentiable
	init(waterLevel u0: [[Float]], time t: Float = 0.0) {
	self.u0 = u0
	self.u1 = u0
	self.t = t

	precondition(u0.count == resolution)
	precondition(u0.allSatisfy { $0.count == resolution })
	}

	/// Calculates solution stepped forward by one time-step `Δt`.
	///
	/// - `u0` - Water surface height at previous time step
	/// - `u1` - Water surface height at current time step
	/// - `u2` - Water surface height at next time step (calculated)
	@differentiable
	func evolved() -> ArrayLoopSolution {
	var u2 = u1

	for x in 1 ..< resolution - 1 {
	for y in 1 ..< resolution - 1 {
	// FIXME: Should be u2[x][y] = ...
	u2.update(x, y, to:
	2 * u1[x][y] +
	(c * c * Δt * Δt + c * α * Δt) * Δ(u1, x, y) -
	u0[x][y] - c * α * Δt * Δ(u0, x, y)
	)
	}
	}

	return ArrayLoopSolution(u0: u1, u1: u2, t: t + Δt)
	}

	/// Constructs intermediate solution with previous water level `u0`, current water level `u1` and time `t`.
	@differentiable
	private init(u0: [[Float]], u1: [[Float]], t: Float) {
	self.u0 = u0
	self.u1 = u1
	self.t = t

	precondition(u0.count == self.resolution)
	precondition(u0.allSatisfy { $0.count == self.resolution })
	precondition(u1.count == self.resolution)
	precondition(u1.allSatisfy { $0.count == self.resolution })
	}

	/// Applies discretized Laplace operator to scalar field `u` at grid points `x` and `y`.
	@differentiable
	private func Δ(_ u: [[Float]], _ x: Int, _ y: Int) -> Float {
	( u[x][y + 1]
	+ u[x - 1][y] - (4 * u[x][y]) + u[x + 1][y] +
	u[x][y - 1] ) / Δx / Δx
	}
	}


	// MARK: - Cost calculated as mean L2 distance to a target image

	extension ArrayLoopSolution {

	/// Calculates mean squared error loss between the solution and a `target` grayscale image.
	@differentiable
	func meanSquaredError(to target: Swim.Image<Gray, Float>) -> Float {
	precondition(target.width == resolution && target.height == resolution)

	var mse: Float = 0.0
	for x in 0 ..< resolution {
	for y in 0 ..< resolution {
	let error = target[x, y][.gray] - u1[x][y]
	mse += error * error * Δx * Δx
	}
	}
	return mse
	}

	}


	// MARK: - Workaround for non-differentiable coroutines
	// https://bugs.swift.org/browse/TF-1078
	// https://bugs.swift.org/browse/TF-1080

	fileprivate extension Array where Element == [Float] {

	@differentiable(wrt: (self, value))
	mutating func update(_ x: Int, _ y: Int, to value: Float) {
	let _ = withoutDerivative(at: (value)) { value -> Int? in
	self[x][y] = value
	return nil
	}
	}

	@derivative(of: update, wrt: (self, value))
	mutating func vjpUpdate(_ x: Int, _ y: Int, to value: Float) ->
	(value: (), pullback: (inout Array<[Float]>.TangentVector) -> Float) {

	self.update(x, y, to: value)

	func pullback(`self`: inout Array<[Float]>.TangentVector) -> Float {
	let `value` = `self`[x][y]
	`self`[x][y] = Float(0)
	return `value`
	}
	return ((), pullback)
	}
	}
	#include <math.h>
	#include <time.h>
	#include <stdio.h>
	#include <stdlib.h>


	const int n = 256;
	const int duration = 512;

	const float c = 340.0f;
	const float alpha = 0.00001f;


	int idx(const int x, const int y) {
	return y * n + x;
	}

	float laplace(const float* const u, const int x, const int y) {
	const float dx = 1.0f / n;

	return (-4 * u[idx(x,y)] \
	+ u[idx(x+1,y)] + u[idx(x-1,y)] \
	+ u[idx(x,y+1)] + u[idx(x,y-1)]) / dx / dx;
	}

	void timestep(float* u, const float* const u1, const float* const u0) {
	const float dx = 1.0f / n;
	const float dt = (sqrt(alpha * alpha + dx * dx / 3.0f) - alpha) / c;

	for (int x = 1; x < n-1; x++) {
	for (int y = 1; y < n-1; y++) {
	u[idx(x,y)] = 2 * u1[idx(x,y)] \
	+ (c * c * dt * dt + c * alpha * dt) * laplace(u1, x, y) \
	- u0[idx(x,y)] - c * alpha * dt * laplace(u0, x, y);
	}
	}
	}

	void splash() {
	float* u = (float) calloc(nn, sizeof(float));

	float* u1 = (float) calloc(nn, sizeof(float));
	u1[idx(n/2, n/2)] = 100.0f;

	float* u0 = (float) calloc(nn, sizeof(float));
	u0[idx(n/2, n/2)] = 100.0f;

	for (int t = 0; t < duration; t++) {
	timestep(u, u1, u0);

	float* tmp = u0;
	u0 = u1;
	u1 = u;
	u = tmp;
	}

	free(u);
	free(u1);
	free(u0);
	}

	int main(int argc, const char * argv[]) {
	printf("Warmup...\n");
	for (int i = 0; i < 2; i++) {
	splash();
	}

	printf("Benchmarks...\n");
	for (int i = 0; i < 10; i++) {
	clock_t start = clock();
	splash();
	long double milisecs = (long double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;

	printf("%Lf ms\n", milisecs);
	}

	return 0;
	}