creyesp/Heuristic_Benchmark.md

## Heuristic_Benchmark.md

      
    Raw
  

              Heuristic_Benchmark.md
            
          
    The below table shows some examples of heuristic benchmarks to compare the performance of a machine learning model when no previous solution exists. The original version of the table can be found in the Machine Learning Design Patterns Book (pattern 28)


Scenario
Heuristic benchmark
Example task
Implementation for example task


Regression problem where features and interactions between features are not well understood by the business.
Mean or median value of the label value over the training data. Choose the median if there are a lot of outliers.
Time interval before a question on Stack Overflow is answered.
Predict that it will take 2,120 seconds always. 2,120 seconds is the median time to first answer over the entire training dataset.


Binary classification problem where features and interactions between features are not well understood by the business.
Overall fraction of positives in the training data.
Whether or not an accepted answer in Stack Overflow will be edited.
Predict 0.36 as the output probability for all answers. 0.36 is the fraction of accepted answers overall that are edited.


Multilabel classification problem where features and interactions between features are not well understood by the business.
Distribution of the label value over the training data.
Country from which a Stack Overflow question will be answered.
Predict 0.03 for France, 0.08 for India, and so on. These are the fractions of answers written by people from France, India, and so on.


Regression problem where there is a single, very important, numeric feature.
Linear regression based on what is, intuitively, the single most important feature.
Predict taxi fare amount given pickup and dropoff locations. The distance between the two points is, intuitively, a key feature.
Fare = $4.64 per kilometer. The $4.64 is computed from the training data over all trips.


Regression problem with one or two important features. The features could be numeric or categorical but should be commonly used heuristics.
Lookup table where the rows and columns correspond to the key features (discretized if necessary) and the prediction for each cell is the average label in that cell estimated over the training data.
Predict duration of bicycle rental. Here, the two key features are the station that the bicycle is being rented from and whether or not it is peak hours for commuting.
Lookup table of average rental duration from each station based on peak hour versus nonpeak hour.


Classification problem with one or two important features. The features could be numeric or categorical.
As above, except that the prediction for each cell is the distribution of labels in that cell. If the goal is to predict a single class, compute the mode of the label in each cell.
Predict whether a Stack Overflow question will get answered within one day. The most important feature here is the primary tag.
For each tag, compute the fraction of questions that are answered within one day.


Regression problem that involves predicting the future value of a time series.
Persistence or linear trend. Take seasonality into account. For annual data, compare against the same day/week/quarter of previous year.
Predict weekly sales volume
Predict that next week’s $$ sales = s_0 $$ where $$s_0$$ is the sales this week. (or) Next week’s $$sales = s_0 + (s_0 - s_{-1})$$ where $$s_{-1}$$ is last week’s sales. (or) Next week’s $$ sales = s_{-1y}$$ where $$s_{-1y}$$ is the sales of the corresponding week last year. Avoid the temptation to combine the three options since the value of the relative weights is not intuitive.


Classification problem currently being solved by human experts. This is common for image, video, and text tasks and includes scenarios where it is cost-prohibitive to routinely solve the problem with human experts.
Performance of human experts.
Detecting eye disease from retinal scans.
Have three or more physicians examine each image. Treat the decision of a majority of physicians as being correct, and look at the percentile ranking of the ML model among human experts.


Preventive or predictive maintenance.
Perform maintenance on a fixed schedule.
Preventive maintenance of a car.
Bring cars in for maintenance once every three months. The three months is the median time to failure of cars from the last service date.


Anomaly detection.
99th percentile value estimated from the training dataset.
Identify a denial of service (DoS) attack from network traffic.
Find the 99th percentile of the number of requests per minute in the historical data. If over any one-minute period, the number of requests exceeds this number, flag it as a DoS attack.


Recommendation model.
Recommend the most popular item in the category of the customer’s last purchase.
Recommend movies to users.
If a user just saw (and liked) Inception (a sci-fi movie), recommend Icarus to them (the most popular sci-fi movie they haven’t yet watched).


The most important features described above are the well known features for the business side, for example the taxi fare is high related with the distance of the trip, so it should be used as the most important feature and not one that was obtained from a model.
Scenario	Heuristic benchmark	Example task	Implementation for example task
Regression problem where features and interactions between features are not well understood by the business.	Mean or median value of the label value over the training data. Choose the median if there are a lot of outliers.	Time interval before a question on Stack Overflow is answered.	Predict that it will take 2,120 seconds always. 2,120 seconds is the median time to first answer over the entire training dataset.
Binary classification problem where features and interactions between features are not well understood by the business.	Overall fraction of positives in the training data.	Whether or not an accepted answer in Stack Overflow will be edited.	Predict 0.36 as the output probability for all answers. 0.36 is the fraction of accepted answers overall that are edited.
Multilabel classification problem where features and interactions between features are not well understood by the business.	Distribution of the label value over the training data.	Country from which a Stack Overflow question will be answered.	Predict 0.03 for France, 0.08 for India, and so on. These are the fractions of answers written by people from France, India, and so on.
Regression problem where there is a single, very important, numeric feature.	Linear regression based on what is, intuitively, the single most important feature.	Predict taxi fare amount given pickup and dropoff locations. The distance between the two points is, intuitively, a key feature.	Fare = $4.64 per kilometer. The $4.64 is computed from the training data over all trips.
Regression problem with one or two important features. The features could be numeric or categorical but should be commonly used heuristics.	Lookup table where the rows and columns correspond to the key features (discretized if necessary) and the prediction for each cell is the average label in that cell estimated over the training data.	Predict duration of bicycle rental. Here, the two key features are the station that the bicycle is being rented from and whether or not it is peak hours for commuting.	Lookup table of average rental duration from each station based on peak hour versus nonpeak hour.
Classification problem with one or two important features. The features could be numeric or categorical.	As above, except that the prediction for each cell is the distribution of labels in that cell. If the goal is to predict a single class, compute the mode of the label in each cell.	Predict whether a Stack Overflow question will get answered within one day. The most important feature here is the primary tag.	For each tag, compute the fraction of questions that are answered within one day.
Regression problem that involves predicting the future value of a time series.	Persistence or linear trend. Take seasonality into account. For annual data, compare against the same day/week/quarter of previous year.	Predict weekly sales volume	Predict that next week’s $$ sales = s_0 $$ where $$s_0$$ is the sales this week. (or) Next week’s $$sales = s_0 + (s_0 - s_{-1})$$ where $$s_{-1}$$ is last week’s sales. (or) Next week’s $$ sales = s_{-1y}$$ where $$s_{-1y}$$ is the sales of the corresponding week last year. Avoid the temptation to combine the three options since the value of the relative weights is not intuitive.
Classification problem currently being solved by human experts. This is common for image, video, and text tasks and includes scenarios where it is cost-prohibitive to routinely solve the problem with human experts.	Performance of human experts.	Detecting eye disease from retinal scans.	Have three or more physicians examine each image. Treat the decision of a majority of physicians as being correct, and look at the percentile ranking of the ML model among human experts.
Preventive or predictive maintenance.	Perform maintenance on a fixed schedule.	Preventive maintenance of a car.	Bring cars in for maintenance once every three months. The three months is the median time to failure of cars from the last service date.
Anomaly detection.	99th percentile value estimated from the training dataset.	Identify a denial of service (DoS) attack from network traffic.	Find the 99th percentile of the number of requests per minute in the historical data. If over any one-minute period, the number of requests exceeds this number, flag it as a DoS attack.
Recommendation model.	Recommend the most popular item in the category of the customer’s last purchase.	Recommend movies to users.	If a user just saw (and liked) Inception (a sci-fi movie), recommend Icarus to them (the most popular sci-fi movie they haven’t yet watched).