Forecasting timeseries is a common problem in data science/machine learning. It asks, given a set of observations of the past, what the future will look like.
Some real world applications of timeseries forecasting include:
- Sales/demand forecasting: Say you're an ice cream chain. You might expect sales to be much higher in the summer and lower in the winter, but to trend higher year-over-year overall because you're investing in advertising. The sales forecasts would be useful for things like setting quotas for your salespeople, financial disclosure/valuation, and inventory planning.
- Capacity planning: In a software context, capacity planning refers to ensuring enough compute resources to serve expected traffic. More broadly, capacity planning asks, how many servers, employees, meals, parking spaces, etc., are going to be needed?
- Observability and Monitoring: Anomalies in timeseries can reflect software outages. For instance, if the number of trips requested is suddenly much lower than expected, that might indicate a bug in the Uber app.
This API exposes Prophet, the open-source timeseries forecasting library developed by Facebook's data science team. Prophet is an additive regression model that combines a piece-wise linear trend component with yearly and weekly seasonal components, as well as a user-provided list of holidays. It works particularly well on timeseries that are seasonal, and is robust to missing data and outliers.
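For readers who prefer a formula, the additive structure is usually written as below in the Prophet literature. The symbols are that literature's notation, not names used by the Metered API: g(t) is the trend, s(t) the seasonal components, h(t) the holiday effects, and ε_t the error term.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```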
Prophet is already available as a Python and R package, but with Metered, you can access a production-ready version of it via a simple API call.
The benefits of Metered over running the packages yourself include:
- no need to install anything locally
- massive parallelism possible with concurrent requests
- ability to use it from environments that don't run Python or R
- The Metered Prophet API is available in GraphQL. There is no REST API.
- Pricing for any Metered API is per request.
The code snippets in this guide are intended to be run interactively as you follow along. The snippets are also interrelated, so outputs are automatically propagated from snippet to snippet, as needed. This is an experimental format that we're eager to receive feedback on. Please send us a note with any thoughts you have.
For ease of use, the API keys in this guide are real keys. You can copy-and-paste them right into your code. The keys have a dynamic rate limit. To access a higher rate limit, create an account and provide your credit card information.
Below, we show how to accomplish a common forecasting flow using the Metered Prophet API. Recall that GraphQL allows you to create services by defining types and fields on those types, then providing functions for each field on each type. The Metered Prophet API is running a GraphQL service that looks like this:
Query
All GraphQL API queries are made on a single endpoint, which only accepts POST requests:
url
apiKey
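As a rough sketch of what a POST to that single endpoint could look like from Python, the snippet below builds the request using only the standard library. The endpoint URL, the key value, and the choice of an Authorization header are all placeholders and assumptions, since the real values are not shown here; the JSON body with a "query" field is the usual GraphQL-over-HTTP convention.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the real endpoint and key.
ENDPOINT_URL = "https://example.invalid/graphql"
API_KEY = "YOUR_API_KEY"


def build_graphql_request(query, url=ENDPOINT_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the key travels in an Authorization header.
            "Authorization": api_key,
        },
    )


def run_query(query, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_graphql_request(query, **kwargs)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending a metered request on it.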
The first step is to load your raw data into memory.
Your raw data should be in a CSV file with two columns, ds and y, containing the date and numeric value respectively. The ds column should be formatted as YYYY-MM-DD for a date, or YYYY-MM-DD HH:MM:SS for a timestamp.
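Before uploading a file, it can help to check it locally against that format. The following is a minimal sketch, not part of the Metered API: the function names are made up for illustration, and it only verifies that the two columns exist and that every value parses.

```python
import csv
import io
from datetime import datetime

# The two ds formats described above: a date or a timestamp.
DS_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def parse_ds(value):
    """Return a datetime if value parses under an accepted format, else raise ValueError."""
    for fmt in DS_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized ds value: {value!r}")


def validate_history_csv(text):
    """Check that CSV text has ds and y columns with parseable values.

    Returns a list of (datetime, float) rows.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = {"ds", "y"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(parse_ds(row["ds"]), float(row["y"])) for row in reader]
```

Note that strptime is lenient about zero-padding, so a stricter check would be needed if the service rejects values such as 2018-1-1.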
As our case study in this guide, we would like to forecast the number of Wikimedia pageviews for observability purposes -- if the number of pageviews is ever much lower than expected, this might indicate a software outage.
We were able to collect, through the Wikimedia API, three years' worth of Wikimedia data at a daily granularity.
This is what the first few rows of the CSV look like:
ds | y |
---|---|
2018-01-01 | 302828330 |
2018-01-02 | 319485738 |
2018-01-03 | 322019675 |
We use historyFromURL to load the hosted CSV file into memory.
query gettingHistory {
  historyFromURL(
    url: "https://raw.githubusercontent.com/yirenlu92/metered/main/prophet/wikimedia-observability.csv"
    schema: { ds: "ds", y: "y" }
  ) {
    records {
      ds
      y
    }
  }
}
[Interactive plot historyPlot: the loaded pageview history, using gettingHistory.historyFromURL.records as history]
As you can see, this metric has a weekly seasonality that makes Prophet a good fit.
The above process loads data for a single timeseries. In practice, you'll often want to forecast multiple timeseries in parallel; even a single metric often needs to be forecasted several times, for instance once per city or country. For instructions on how to do that, refer to the guide for the Generic Batch Job API. The instructions there import the raw data from a CSV file, perform some aggregation and cleaning, then save the result in a SQLite database.
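Separately from the batch job guide, here is a minimal sketch of plain client-side concurrency, one request per timeseries. It assumes a caller-supplied run_forecast function (hypothetical here) that sends one forecasting query for one CSV URL and returns the parsed response; nothing in it is specific to the Metered API.

```python
from concurrent.futures import ThreadPoolExecutor


def forecast_many(csv_urls, run_forecast, max_workers=8):
    """Run run_forecast(url) for every URL concurrently.

    run_forecast is a caller-supplied function (hypothetical here) that
    sends one forecasting query and returns the parsed response.
    Returns a dict mapping each URL to its result.
    """
    csv_urls = list(csv_urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = list(pool.map(run_forecast, csv_urls))
    return dict(zip(csv_urls, results))
```

Threads are a reasonable fit because each call spends most of its time waiting on the network. Keep max_workers modest if your key is subject to a rate limit.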
The next step is to fit the Prophet model, which can be done by querying fitProphet on the History type. fitProphet returns a ProphetModelWithHistory type.
After that, use the trained model to forecast the data into the future. This can be done by querying forecast on the ProphetModelWithHistory type. forecast takes a futurePeriods parameter, which is the number of units into the future you would like to forecast, and a futureFreq parameter, which denotes the unit. It returns a Forecast type.
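If you issue these queries from code, a small builder keeps futurePeriods and futureFreq configurable. This is a sketch under assumptions: the function name and the operation name forecastOnly are illustrative, and only fields that appear in this guide are requested.

```python
import json


def build_forecast_query(csv_url, future_periods, future_freq="D"):
    """Return a GraphQL query string that loads, fits, and forecasts in one request."""
    if future_periods <= 0:
        raise ValueError("future_periods must be positive")
    # json.dumps yields double-quoted, escaped strings; for plain ASCII
    # values these are also valid GraphQL string literals.
    return f"""
query forecastOnly {{
  historyFromURL(url: {json.dumps(csv_url)}, schema: {{ ds: "ds", y: "y" }}) {{
    fitProphet {{
      forecast(futurePeriods: {int(future_periods)}, futureFreq: {json.dumps(future_freq)}) {{
        future {{
          records {{ ds yhat yhatLower yhatUpper }}
        }}
      }}
    }}
  }}
}}
""".strip()
```

For example, build_forecast_query(url, 5, "D") asks for five daily steps past the end of the history.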
The final step is to retrieve summary statistics and the forecasted data, which can be done starting at the history and future fields on the Forecast type. Recall that GraphQL returns only the leaves of the graph, and you have to specify the entire path down to any particular leaf that you want.
query fullForecast {
  historyFromURL(
    url: "https://raw.githubusercontent.com/yirenlu92/metered/main/prophet/wikimedia-observability.csv"
    schema: { ds: "ds", y: "y" }
  ) {
    fitProphet {
      forecast(futurePeriods: 5, futureFreq: "D") {
        history {
          records {
            ds
            y
            yhat
            yhatLower
            yhatUpper
            error
            relativeError
            boundedRelativeError
          }
          metrics {
            me
            mse
            mpe
            nrmse
          }
        }
        future {
          records {
            ds
            yhat
            yhatLower
            yhatUpper
          }
        }
      }
    }
  }
}
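Because a GraphQL response mirrors the shape of the query under a top-level "data" key, the result of fullForecast can be unpacked by walking the same path. The helper names below are illustrative, and the sketch assumes the response contains no errors.

```python
def get_path(response, path):
    """Follow a dotted path through a nested GraphQL response dict."""
    node = response["data"]
    for key in path.split("."):
        node = node[key]
    return node


FORECAST_PATH = "historyFromURL.fitProphet.forecast"


def split_forecast(response):
    """Return (history_records, metrics, future_records) from a fullForecast response."""
    forecast = get_path(response, FORECAST_PATH)
    return (
        forecast["history"]["records"],
        forecast["history"]["metrics"],
        forecast["future"]["records"],
    )
```

The history records carry both y and yhat, so they are the ones to use for judging fit; the future records carry only the forecast and its bounds.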
What happens if you run Prophet and you don't like the results?
You can easily tune the model by changing parameters like the number of changepoints in the piece-wise linear model, which determines how flexible the curve is. Refer to the documentation for a full reference on the parameters of a Prophet model.
[Interactive plot forecastPlot: observed history and forecast, using fullForecast.historyFromURL.fitProphet.forecast.history.records as history and fullForecast.historyFromURL.fitProphet.forecast.future.records as future]
When the observed number of Wikimedia pageviews is anomalously low compared to what is forecasted, we would like to send an alert to our on-call engineers telling them that hey, you should check things out, there might be an outage.
To do this, we can use the confidence intervals on the forecast (yhatLower and yhatUpper) to create a set of thresholds. When the observed timeseries crosses the thresholds in either direction, we alert.
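The threshold check itself is a simple comparison. The sketch below assumes each record is a dict with the y, yhatLower, and yhatUpper fields requested under history.records; the function name and the added direction key are illustrative.

```python
def find_anomalies(records):
    """Return records whose observed y lies outside [yhatLower, yhatUpper].

    Each returned record is a copy with an added 'direction' key,
    set to 'low' or 'high'.
    """
    anomalies = []
    for rec in records:
        if rec["y"] < rec["yhatLower"]:
            anomalies.append({**rec, "direction": "low"})
        elif rec["y"] > rec["yhatUpper"]:
            anomalies.append({**rec, "direction": "high"})
    return anomalies
```

For live alerting, you would compare each newly observed value against the future record with the same ds, since future records do not include y. If only outages matter, filter the result for direction equal to "low".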