Paraphraser/CalculatingBarometricPressureTrend.md

## CalculatingBarometricPressureTrend.md

      
    Raw
  

              CalculatingBarometricPressureTrend.md
            
          
    Calculating barometric pressure trend

This gist is a response to a Discord question. It explains my approach to deciding whether barometric pressure is rising, falling or remaining steady.
I do the trend calculation in Arduino (ESP8266) code. I tried to mimic the idea of setting the reference needle on a real barometer, waiting an hour, then seeing which way the pressure needle had moved with respect to the reference. The wrinkle is that "the last hour" is a moving window that updates every 10 minutes.
The BMP085 sensor is polled every 10 minutes. Until there are six observations (eg after a reboot) the code returns "training".
After that, the code does a least-squares regression on the last six observations (ie the last hour of data) to find the linear equation of best fit. In effect, the algorithm assumes that knowing the time (independent variable on the X axis) at some point in the past hour would permit the pressure at that time (dependent variable on the Y axis) to be estimated with some degree of reliability.
Then the code runs the hypothesis test H1: β₁ ≠ 0 (the slope of the linear equation is not zero). If there is sufficient evidence at α = 5%, the code returns "rising" or "falling" according to the sign of the slope, otherwise returning "steady".
When I first wrote the code I wasn't convinced that "the last six observations" was enough data for this to work but, in practice, it has proven to be both sensitive and reliable.
As well as the trend, my dashboard shows the most-recent observation plus a chart of observations over the last 24 hours. Sometimes the trend will show a value (rising, falling, steady) that doesn't appear to be supported by eyeballing the 24-hour chart. But, when I examine the synoptic chart published by the Bureau of Meteorology, the trend coming out of the algorithm is always right on the money.
Still, "caveat emptor", "use at your own risk" and "don't make life or death decisions based on this code".
Globals referenced by the function

// JSON values wrapped in quotes
const char* PayloadTrendTrainingKey     = "\"training\"";
const char* PayloadTrendFallingKey      = "\"falling\"";
const char* PayloadTrendSteadyKey       = "\"steady\"";
const char* PayloadTrendRisingKey       = "\"rising\"";

/*
 *  Note: the size of the pressure history array and the
 *  Critical value of t are related. If you change the size
 *  of the array, you must recalculate Critical_t_value.
 *  
 *  Given:
 *  
 *      Let α = 0.05    (5%)
 *      Let ν = (PressureHistorySize - 2) = (6 - 2) = 4
 *      
 *  Then use one of the following:
 *  
 *      Excel:      =T.INV(α/2,ν)  
 *      
 *      TINspire:   invt(α/2,ν)
 *      
 *  Given those inputs, the answer is:
 *  
 *      -2.776445105
 *      
 *  Critical_t_value is the absolute value of that.
 *  
 */
const size_t PressureHistorySize = 6;
const double Critical_t_value = 2.776445105;

// the array of pressures
double pressures[PressureHistorySize] = { };

// the number of elements currently in the array
size_t pressureHistoryCount = 0;

function to calculate pressure trend

Notes:

this function has the side-effect of adding a new observation to the pressures array so it can only be called when a new observation becomes available.
there is an underlying assumption that the observations are spaced (roughly) equidistant in time. If that assumption doesn't hold then you'll get an answer but it won't really mean anything.
the code could be made more efficient (eg circular buffer and running calculations for ∑(x) and so-on) but the ESP8266 running this has plenty of grunt, isn't doing anything else, and the code only has to run once every 10 minutes.
I follow the coding convention of always using curly braces even where C permits their omission (a convention now enforced by Swift 🥳).

const char* pressureAnalysisIncluding(double newPressure) {

    // have we filled the array?
    if (pressureHistoryCount < PressureHistorySize) {

        // no! add this observation to the array
        pressures[pressureHistoryCount] = newPressure;

        // bump n
        pressureHistoryCount++;
        
    } else {

        // yes! the array is full so we have to make space
        for (size_t i = 1; i < PressureHistorySize; i++) {

            pressures[i-1] = pressures[i];

        }

        // now we can fill in the last slot
        pressures[PressureHistorySize-1] = newPressure;
        
    }

    // is the array full yet?
    if (pressureHistoryCount < PressureHistorySize) {

        // no! we are still training
        return PayloadTrendTrainingKey;
        
    }

    /*
     * Step 1 : calculate the straight line of best fit
     *          (least-squares linear regression)
     */

    double sum_x = 0.0;     // ∑(x)
    double sum_xx = 0.0;    // ∑(x²)
    double sum_y = 0.0;     // ∑(y)
    double sum_xy = 0.0;    // ∑(xy)
    
    // we need n in lots of places and it's convenient as a double
    double n = 1.0 * PressureHistorySize;

    // iterate to calculate the above values
    for (size_t i = 0; i < PressureHistorySize; i++) {

        double x = 1.0 * i;
        double y = pressures[i];

        sum_x = sum_x + x;
        sum_xx = sum_xx + x * x;
        sum_y = sum_y + y;
        sum_xy = sum_xy + x * y;
        
    }

    // calculate the slope and intercept
    double slope = (sum_x*sum_y - n*sum_xy) / (sum_x*sum_x - n*sum_xx);
    double intercept = (sum_y -slope*sum_x) / n;

    /*
     * Step 2 : Perform an hypothesis test on the equation of the linear
     *          model to see whether, statistically, the available data
     *          contains sufficient evidence to conclude that the slope
     *          is non-zero.
     *          
     *          Let beta1 = the slope of the regression line between
     *          fixed time intervals and pressure observations.
     *          
     *          H0: β₁ = 0    (the slope is zero)
     *          H1: β₁ ≠ 0    (the slope is not zero)
     *          
     *          The level of significance: α is 5% (0.05)
     *          
     *          The test statistic is:
     *          
     *              tObserved = (b₁ - β₁) / s_b₁
     *              
     *          In this context, b₁ is the estimated slope of the linear
     *          model and β₁ the reference value from the hypothesis
     *          being tested. s_b₁ is the standard error of b₁.
     *
     *          From H0, β₁ = 0 so the test statistic simplifies to:
     * 
     *              tObserved = b₁ / s_b₁
     *      
     *          This is a two-tailed test so half of α goes on each side
     *          of the T distribution.
     *          
     *          The degrees-of-freedom, ν, for the test is:
     *          
     *              ν = n-2 = 6 - 2 = 4
     *              
     *          The critical value (calculated externally using Excel or
     *          a graphics calculator) is:
     * 
     *              -tCritical = invt(0.05/2,4) = -2.776445105
     *      
     *          By symmetry:
     * 
     *              +tCritical = abs(-tCritical)
     *              
     *          The decision rule is:
     * 
     *              reject H0 if tObserved < -tCritical or 
     *                           tObserved > +tCritical
     *      
     *          which can be simplified to:
     * 
     *              reject H0 if abs(tObserved) > +tCritical
     * 
     *          Note that the value of +tCritical is carried in the
     *          global variable:
     *          
     *              Critical_t_value
     *              
     *          The next step is to calculate the test statistic but one
     *          of the inputs to that calculation is SSE, so we need
     *          that first.
     *      
     */

    double SSE = 0.0;        // ∑((y-ŷ)²)

    // iterate
    for (size_t i = 0; i < PressureHistorySize; i++) {

        double y = pressures[i];
        double residual = y - (intercept + slope * i);
        SSE = SSE + residual * residual;

    }

    /*    
     *          Now we can calculate the test statistic. Note the use
     *          of the fabs() function below to force the result into
     *          the positive domain for comparison with Critical_t_value
     */

    double tObserved =
        fabs(
           slope/(sqrt(SSE / (n-2.0)) / sqrt(sum_xx - sum_x*sum_x/n))
        );

    /*    
     *          Finally, make the decision and return a string
     *          summarising the conclusion.
     */

    // is tObserved further to the left or right than tCritical?
    if (tObserved > Critical_t_value) {

        // yes! what is the sign of the slope?
        if (slope < 0.0) {

            return PayloadTrendFallingKey;
            
        } else {

            return PayloadTrendRisingKey;
            
        }

    }

    // otherwise, the slope may be zero (statistically)
    return PayloadTrendSteadyKey;

}

Usage example

char payload[255] = { 0 };

// construct pressure payload eg {"localHPa": 982.31, "seaHPa": 1021.72, "trend": "trendkey", "eMask": 0, "upTime": 44113}
sprintf(
    payload,
    "{%s:%0.2f,%s:%0.2f,%s:%s,%s:%d,%s:%ld}",
    PayloadLocalPressureKey,
    bmp_sensor_event.pressure,
    PayloadSeaLevelPressureKey,
    seaLevelPressure,
    PayloadTrendLabelKey,
    pressureAnalysisIncluding(seaLevelPressure),
    PayloadErrorMaskKey,
    sensorErrorMask,
    PayloadUpTimeKey,
    upTime
);

// publish the pressure payload
publishToMQTT(mqttPressureTopic,payload,false);

There is no particular reason for using seaLevelPressure rather than bmp_sensor_event.pressure as the argument to pressureAnalysisIncluding(). It's just how I did it at the time. Either input value should produce the same result.
Sample Data

MQTT messages

2020-07-09T12:11:18+1000 /merle/pressureSensor/internal/pressure {"local":990.75,"sea":1030.94,"trend":"falling","eMask":0,"upTime":874906}
2020-07-09T12:21:18+1000 /merle/pressureSensor/internal/pressure {"local":990.73,"sea":1030.89,"trend":"falling","eMask":0,"upTime":875506}
2020-07-09T12:31:18+1000 /merle/pressureSensor/internal/pressure {"local":990.55,"sea":1030.69,"trend":"falling","eMask":0,"upTime":876106}
2020-07-09T12:41:18+1000 /merle/pressureSensor/internal/pressure {"local":990.23,"sea":1030.33,"trend":"falling","eMask":0,"upTime":876706}
2020-07-09T12:51:19+1000 /merle/pressureSensor/internal/pressure {"local":990.21,"sea":1030.27,"trend":"falling","eMask":0,"upTime":877306}

Influx query

$ influx
> use weather
> select * from pressure where time > '2020-07-09T12:10:10+10:00' tz('Australia/Sydney')
name: pressure
time                                localPressure seaLevelPressure sensor   station        trend
----                                ------------- ---------------- ------   -------        -----
2020-07-09T12:11:18.555247006+10:00 990.75        1030.94          internal pressureSensor falling
2020-07-09T12:21:18.637129389+10:00 990.73        1030.89          internal pressureSensor falling
2020-07-09T12:31:18.707571671+10:00 990.55        1030.69          internal pressureSensor falling
2020-07-09T12:41:18.78782154+10:00  990.23        1030.33          internal pressureSensor falling
2020-07-09T12:51:18.859474815+10:00 990.21        1030.27          internal pressureSensor falling

And, because it may come up…

This is how I convert local barometric pressure from the sensor to sea-level equivalent. I get t from the BMP085 sensor (logged against a different MQTT topic, which is why you don't see it in the data) while h is hard-coded to my local altitude (338m).
The results from this algorithm (see the embedded reference URL) also align with the synoptic chart published by the Bureau of Meteorology so I use it "as is" without trying to understand it.
My BMP085 is located indoors so that affects my results to some extent. Across my local expected outdoor temperature range (roughly -10°C to 45°C), the worst-case bias is under 5hPa so I don't worry about it.
float equivalentPressureAtSeaLevel  (
       float ph,    // pressure at this altitude (hPa)
       float h,     // this altitude (metres)
       float t      // temperature at this altitude (celsius)
) {

    // see https://keisan.casio.com/exec/system/1224575267

    // precalculate altitude, corrected for the temperature lapse rate
    float hl = h * 0.0065;

    return ph / pow(1.0-hl/(t+hl+273.15),5.257);
    
}