@danielhaim1
Last active March 8, 2024 14:14
Advanced Sequence Prediction w/ Linear Regression

Advanced Sequence Prediction

This guide walks through predicting the next value in a numeric sequence with linear regression, provides JavaScript, Python, and C implementations, and closes with notes on refining prediction accuracy through more advanced methods.


Introduction

Linear Regression is an analytical technique employed to discern the relationship between two continuous variables. It does so by fitting a linear equation to observed data. The equation manifests as $y=mx+b$, where $y$ denotes the dependent variable, $x$ the independent variable, $m$ the slope of the line, and $b$ the y-intercept.


Fundamentals

Linear regression's core lies in deriving the slope $m$ and y-intercept $b$ that minimize prediction error over a dataset. They are computed with the following formulas:

$$m = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}, \qquad b = \frac{\sum y - m\sum x}{n}$$

Where:

  • $n$ signifies the quantity of data points.
  • $x$ and $y$ are vectors denoting the coordinates of the data points in the independent and dependent dimensions, respectively.
  • $∑x$ and $∑y$ represent the cumulative sums of the $x$ and $y$ values within the dataset.
  • $∑xy$ is the aggregate of the products of corresponding $x$ and $y$ pairs.
  • $∑x^2$ encapsulates the total of the squares of the $x$ values.
  • $m$ is derived as the slope of the line, calculated per the linear regression slope formula above.
  • $b$, the y-intercept of the line, is determined using the linear regression intercept formula.

For example, applying these formulas to $x = [0, 1, 2, 3, 4]$ and $y = [3, 6, 9, 12, 15]$:

$$\begin{align*} n &= 5 \\ x &= [0, 1, 2, 3, 4] \\ y &= [3, 6, 9, 12, 15] \\ \sum x &= 10 \\ \sum y &= 45 \\ \sum xy &= 120 \\ \sum x^2 &= 30 \\ m &= \frac{5 \cdot 120 - 10 \cdot 45}{5 \cdot 30 - 10^2} = \frac{150}{50} = 3 \\ b &= \frac{45 - 3 \cdot 10}{5} = 3 \end{align*}$$

Hence, the equation of the best-fit line is $y = 3x + 3$.

To predict the next value in the sequence, evaluate $y = mx + b$ at $x = 5$ (the next index): $y = 3 \cdot 5 + 3 = 18$.
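The arithmetic of this worked example can be reproduced in a few lines of Python (variable names here are illustrative):

```python
# Reproduce the worked example: fit y = mx + b to five points.
x = [0, 1, 2, 3, 4]
y = [3, 6, 9, 12, 15]
n = len(x)

sum_x = sum(x)                             # 10
sum_y = sum(y)                             # 45
sum_xy = sum(a * b for a, b in zip(x, y))  # 120
sum_xx = sum(a * a for a in x)             # 30

m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b = (sum_y - m * sum_x) / n
print(m, b, m * 5 + b)  # 3.0 3.0 18.0
```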

Implementations

JavaScript

function forecastNextValue(series) {
  const count = series.length;
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  // Accumulate the sums needed for the least-squares slope and intercept,
  // using the array index as the independent variable x.
  for (let index = 0; index < count; index++) {
    sumX += index;
    sumY += series[index];
    sumXY += index * series[index];
    sumXX += index * index;
  }
  const m = (count * sumXY - sumX * sumY) / (count * sumXX - sumX * sumX);
  const b = (sumY - m * sumX) / count;
  // Predict the value at the next index (x = count).
  return m * count + b;
}

const sequence = [1, 2, 4, 7, 11];
const prediction = forecastNextValue(sequence);
console.log(prediction); // Expected output: 12.5

Python

def predict_next_value(numbers):
    """Fit y = mx + b over indices 0..n-1 and predict the value at x = n."""
    n = len(numbers)
    x_sum = sum(range(n))
    y_sum = sum(numbers)
    xy_sum = sum(x * y for x, y in zip(range(n), numbers))
    xx_sum = sum(x ** 2 for x in range(n))

    slope = (n * xy_sum - x_sum * y_sum) / (n * xx_sum - x_sum ** 2)
    intercept = (y_sum - slope * x_sum) / n
    # Predict the value at the next index (x = n).
    return slope * n + intercept

numbers = [1, 2, 4, 7, 11]
next_number = predict_next_value(numbers)
print(next_number)  # 12.5
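As a quick sanity check (the function is repeated verbatim so this snippet runs on its own), a perfectly linear sequence such as the worked example should be predicted exactly:

```python
def predict_next_value(numbers):
    # Same least-squares fit as above: the index 0..n-1 serves as x.
    n = len(numbers)
    x_sum = sum(range(n))
    y_sum = sum(numbers)
    xy_sum = sum(x * y for x, y in zip(range(n), numbers))
    xx_sum = sum(x ** 2 for x in range(n))
    slope = (n * xy_sum - x_sum * y_sum) / (n * xx_sum - x_sum ** 2)
    intercept = (y_sum - slope * x_sum) / n
    return slope * n + intercept

# The sequence 3, 6, 9, 12, 15 is exactly linear, so the fit is exact.
print(predict_next_value([3, 6, 9, 12, 15]))  # 18.0
```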

C

#include <stdio.h>

double predict_next_value(int *numbers, int n) {
    /* Accumulate the sums for the least-squares fit; the array index
       serves as the independent variable x. Note: these int sums can
       overflow for long sequences or large values. */
    int x_sum = 0, y_sum = 0, xy_sum = 0, xx_sum = 0;
    for (int i = 0; i < n; i++) {
        x_sum += i;
        y_sum += numbers[i];
        xy_sum += i * numbers[i];
        xx_sum += i * i;
    }

    double slope = (n * xy_sum - x_sum * y_sum) / (double)(n * xx_sum - x_sum * x_sum);
    double intercept = (y_sum - slope * x_sum) / (double)n;
    /* Predict the value at the next index (x = n). */
    return slope * n + intercept;
}
}

int main() {
    int numbers[] = {1, 2, 4, 7, 11};
    int n = sizeof(numbers) / sizeof(numbers[0]);
    double next_number = predict_next_value(numbers, n);
    printf("%f\n", next_number);
    return 0;
}

Advanced Analytical Methods

Techniques such as polynomial regression, regularization (lasso and ridge), and moving averages can improve accuracy on non-linear data or help prevent overfitting. Incorporating external variables and retraining the model as new data arrives help it adapt over time.
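As one illustrative sketch (a hypothetical helper, not part of the original implementations), a simple moving average can smooth a noisy sequence before the linear fit is applied:

```python
def moving_average(values, window):
    """Simple moving average: mean of each consecutive window-sized slice."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Smoothing [1, 2, 4, 7, 11] with a window of 2:
print(moving_average([1, 2, 4, 7, 11], 2))  # [1.5, 3.0, 5.5, 9.0]
```

The smoothed series is shorter by `window - 1` points, so indices must be realigned before feeding it to the regression functions above.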

