@ajbouh
Forked from yirenlu92/metered_forecast.md
Last active August 2, 2021 07:26
Forecasting Timeseries with Metered and Prophet

Forecasting timeseries with the Metered Prophet API

Forecasting timeseries

Forecasting timeseries is a common problem in data science/machine learning. It asks, given a set of observations of the past, what the future will look like.

Some real world applications of timeseries forecasting include:

  • Sales/demand forecasting: Say you're an ice cream chain. You might expect that sales will be much higher in the summer and lower in the winter, but trend higher year-over-year overall because you're investing in advertising. The sales forecasts would be useful for things like setting quota for your salespeople, financial disclosure/valuation, and inventory planning.
  • Capacity planning: In a software context, capacity planning refers to ensuring enough compute resources to serve expected traffic. More broadly, capacity planning asks, how many servers, employees, meals, parking spaces, etc., are going to be needed?
  • Observability and Monitoring: Anomalies in timeseries can reflect software outages. For instance, if the number of trips requested is suddenly much lower than expected, that might indicate a bug in the Uber app.

Facebook Prophet

This API exposes Prophet, the open-source timeseries forecasting library developed by Facebook's data science team. Prophet is an additive regression model that combines a piecewise-linear trend with yearly and weekly seasonal components, as well as a user-provided list of holidays. It works particularly well on timeseries with strong seasonality, and is robust to missing data and outliers.


Benefits of using a Metered API

Prophet is already available as a Python and R package, but with Metered, you can access a production-ready version of it via a simple API call.

The benefits of Metered over running the packages yourself include:

  • no need to install anything locally
  • massive parallelism possible with concurrent requests
  • ability to call it from environments that don't run Python or R

Considerations

  • The Metered Prophet API is available only as a GraphQL API. There is no REST API.
  • Pricing for any Metered API is per request.

How this guide works

The code snippets in this guide are intended to be run interactively as you follow along. The snippets are also interrelated, so outputs are automatically propagated from snippet to snippet, as needed. This is an experimental format that we're eager to receive feedback on. Please send us a note with any thoughts you have.

For ease of use, the API keys in this guide are real keys. You can copy and paste them right into your code. The keys have a dynamic rate limit. To access a higher rate limit, create an account and provide your credit card information.


Getting Started

Below, we show how to accomplish a common forecasting flow using the Metered Prophet API. Recall that GraphQL allows you to create services by defining types and fields on those types, then providing functions for each field on each type. The Metered Prophet API is running a GraphQL service that looks like this:

  Query

All GraphQL API queries are made on a single endpoint, which only accepts POST requests:

url
apiKey
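
As a rough sketch of what such a request could look like from Python, the helper below builds (without sending) a POST request that carries a GraphQL query as a JSON body. The endpoint URL and key are placeholders, and passing the key as an Authorization bearer header is an assumption rather than a confirmed detail of the Metered API.

```python
import json
import urllib.request


def build_graphql_request(url: str, api_key: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the key travels as a bearer token; the real header may differ.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder values: substitute your own endpoint url and apiKey.
request = build_graphql_request(
    "https://example.invalid/graphql",
    "YOUR_API_KEY",
    "query { __typename }",
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before spending a metered request.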

1. Loading Timeseries

The first step is to load your raw data into memory.

Your raw data should be in a csv file format with two columns: ds and y, containing the date and numeric value respectively. The ds column should be YYYY-MM-DD for a date, or YYYY-MM-DD HH:MM:SS for a timestamp.
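
Before uploading a file, it can help to check it against this format locally. The following is a minimal sketch, assuming a comma-delimited file with a header row; `validate_history_csv` and `parse_ds` are hypothetical helper names, not part of the Metered API.

```python
import csv
import io
from datetime import datetime

# The two ds formats described above: a date, or a timestamp.
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def parse_ds(value: str) -> datetime:
    """Parse a ds value, trying the date format first and then the timestamp format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized ds value: {value!r}")


def validate_history_csv(text: str) -> list[tuple[datetime, float]]:
    """Return (ds, y) pairs, raising ValueError on missing columns or bad values."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"ds", "y"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(parse_ds(row["ds"]), float(row["y"])) for row in reader]


# Assumption: comma-delimited input with a header row.
sample = "ds,y\n2018-01-01,302828330\n2018-01-02,319485738\n"
rows = validate_history_csv(sample)
```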

As our case study in this guide, we would like to forecast the number of Wikimedia pageviews for observability purposes -- if the number of pageviews is ever much lower than expected, this might indicate a software outage.

We were able to collect, through the Wikimedia API, three years' worth of Wikimedia pageview data at a daily granularity.

This is what the first few rows of the csv look like:

ds y
2018-01-01 302828330
2018-01-02 319485738
2018-01-03 322019675

We use historyFromURL to load the hosted csv file into memory.

query gettingHistory{
  historyFromURL(
    url: "https://raw.githubusercontent.com/yirenlu92/metered/main/prophet/wikimedia-observability.csv"
    schema: { ds: "ds", y: "y" }
  ) {
    records {
      ds
      y
    }
  }
}
with gettingHistory.historyFromURL.records as history
historyPlot

As you can see, this metric has a weekly seasonality that makes Prophet a good fit.
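
If you want to check for a weekly pattern numerically rather than by eye, one simple approach is to average the metric by day of the week. The sketch below assumes records shaped like the ones returned above (a `ds` string and a numeric `y`) and runs on a small synthetic series, not the real pageview data.

```python
from collections import defaultdict
from datetime import date


def mean_by_weekday(records):
    """Average y per weekday (0 = Monday ... 6 = Sunday)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        # Only the date part of ds matters here, so timestamps are truncated.
        weekday = date.fromisoformat(rec["ds"][:10]).weekday()
        totals[weekday] += rec["y"]
        counts[weekday] += 1
    return {day: totals[day] / counts[day] for day in sorted(totals)}


# Synthetic two-week series (made-up values): weekends are lower than weekdays.
records = [
    {"ds": f"2018-01-{day:02d}", "y": 80.0 if date(2018, 1, day).weekday() >= 5 else 100.0}
    for day in range(1, 15)
]
profile = mean_by_weekday(records)
```

A profile whose values differ clearly between weekdays suggests weekly seasonality worth modeling.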

The above process loads data for a single timeseries. In practice, you'll often want to forecast multiple timeseries in parallel; even a single metric often needs to be forecasted several times, for instance once per city or country. For instructions on how to do that, refer to the guide for the Generic Batch Job API. The instructions there import the raw data from a csv file, perform some aggregation and cleaning, then save it in a sqlite database.

2. Fitting the Prophet model

The next step is to fit the Prophet model, which can be done by querying fitProphet on the History type. fitProphet returns a ProphetModelWithHistory type.

3. Forecasting using the Prophet model

The next step is to use the trained model to forecast the data into the future. This can be done by querying forecast on the ProphetModelWithHistory type. forecast takes a futurePeriods parameter, which is the number of units into the future you would like to forecast, and a futureFreq parameter, which denotes the unit. It returns a Forecast type.
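
To make the two parameters concrete, the sketch below lists the timestamps that a given futurePeriods and futureFreq would cover, starting after the last observation. It assumes "D" means one day per period (as in the query later in this guide); the "H" entry and the last-observed date are hypothetical, and the API's actual set of unit codes may differ.

```python
from datetime import datetime, timedelta

# Assumption: unit codes behave like fixed-length steps. "D" is used in this guide;
# "H" (hourly) is a hypothetical extra entry for illustration only.
STEP_BY_FREQ = {"D": timedelta(days=1), "H": timedelta(hours=1)}


def future_timestamps(last_observed: datetime, future_periods: int, future_freq: str):
    """List the timestamps covered by a forecast of future_periods units."""
    step = STEP_BY_FREQ[future_freq]
    return [last_observed + step * n for n in range(1, future_periods + 1)]


# Hypothetical last observation date; five daily periods follow it.
timestamps = future_timestamps(datetime(2020, 12, 31), 5, "D")
```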

4. Retrieve summary statistics and forecast metrics from Forecast

The final step is to retrieve summary statistics and the forecasted data, which can be done starting at the history and future fields on the Forecast type. Recall that a GraphQL query returns only the fields you ask for, so you have to specify the entire path down to each leaf field that you want.

query fullForecast{
  historyFromURL(
    url: "https://raw.githubusercontent.com/yirenlu92/metered/main/prophet/wikimedia-observability.csv"
    schema: { ds: "ds", y: "y" }
  ) {
    fitProphet {
      forecast(futurePeriods: 5, futureFreq:"D") {
        history {
          records {
            ds
            y
            yhat
            yhatLower
            yhatUpper
            error
            relativeError
            boundedRelativeError
          }
          metrics {
            me
            mse
            mpe
            nrmse
          }
        }
        future {
          records {
            ds
            yhat
            yhatLower
            yhatUpper
          }
        }
      }
    }
  }
}
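
Because a GraphQL response mirrors the shape of the query under a top-level data key, reading the results means walking the same path you wrote in the query. Here is a minimal sketch of that in Python; `dig` is a hypothetical helper, and the mock response contains made-up numbers rather than real API output.

```python
from functools import reduce


def dig(response: dict, path: str):
    """Follow a dot-separated field path through a GraphQL response's data."""
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    return reduce(lambda node, key: node[key], path.split("."), response["data"])


# Mock response with made-up values, shaped like the query above.
mock_response = {
    "data": {
        "historyFromURL": {
            "fitProphet": {
                "forecast": {
                    "history": {"metrics": {"mse": 4.0}, "records": []},
                    "future": {
                        "records": [
                            {"ds": "2021-01-01", "yhat": 10.0, "yhatLower": 8.0, "yhatUpper": 12.0}
                        ]
                    },
                }
            }
        }
    }
}

future = dig(mock_response, "historyFromURL.fitProphet.forecast.future.records")
mse = dig(mock_response, "historyFromURL.fitProphet.forecast.history.metrics.mse")
```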

What happens if you run Prophet and you don't like the results?

You can tune the model by changing parameters such as the number of changepoints in the piecewise-linear trend, which determines how flexible the curve is. Refer to the documentation for a full reference on the parameters of a Prophet model.

with fullForecast.historyFromURL.fitProphet.forecast.history.records as history
with fullForecast.historyFromURL.fitProphet.forecast.future.records as future
forecastPlot

Next Steps

When the observed number of Wikimedia pageviews is anomalously low compared to what is forecasted, we would like to send an alert to our on-call engineers telling them that hey, you should check things out, there might be an outage.

To do this, we can use the confidence intervals on the forecast (yhatLower and yhatUpper) to create a set of thresholds. When the observed timeseries crosses the thresholds in either direction, we alert.
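
A minimal sketch of that thresholding logic, assuming forecast records carry the ds, yhatLower and yhatUpper fields from the query earlier in this guide and that observed values arrive as a mapping from ds to value. `find_anomalies` is a hypothetical helper and the numbers are made up.

```python
def find_anomalies(observed: dict, forecast_records: list) -> list:
    """Return one entry per observation outside its [yhatLower, yhatUpper] band."""
    anomalies = []
    for rec in forecast_records:
        value = observed.get(rec["ds"])
        if value is None:
            continue  # no observation yet for this timestamp
        if value < rec["yhatLower"]:
            anomalies.append({"ds": rec["ds"], "observed": value, "direction": "low"})
        elif value > rec["yhatUpper"]:
            anomalies.append({"ds": rec["ds"], "observed": value, "direction": "high"})
    return anomalies


# Made-up forecast band and observations for illustration.
forecast_records = [
    {"ds": f"2021-01-0{day}", "yhatLower": 8.0, "yhatUpper": 12.0} for day in range(1, 5)
]
observed = {"2021-01-01": 7.0, "2021-01-02": 10.0, "2021-01-03": 15.0}
alerts = find_anomalies(observed, forecast_records)
```

For outage detection specifically, you might page on-call only for entries whose direction is "low" and merely log the "high" ones.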
