Regression is a machine learning task where the goal is to predict a real value from a feature vector. There is a large variety of regression algorithms, including linear regression, gradient boosting and neural networks. During training, each of these algorithms adjusts the parameters of a model to minimize the loss function chosen for optimization.
The choice of a loss function depends on the task and on the metric the model is expected to optimize. Many loss functions (such as MSE, MAE and RMSLE) focus on predicting a central value of a variable given a feature vector: MSE, for instance, targets the conditional mean, while MAE targets the conditional median.
In this article, we will have a look at a special loss function called quantile loss, which is used to predict particular quantiles of a variable. Before diving into the details of quantile loss, let us briefly review the notion of a quantile.
Quantile qₐ is a value that divides a given set of numbers so that α * 100% of the numbers are less than the value and (1 - α) * 100% of the numbers are greater than it.
Quantiles qₐ for α = 0.25, α = 0.5 and α = 0.75 are often utilized in statistics and called quartiles. These quartiles are denoted as Q₁, Q₂ and Q₃ respectively. Three quartiles split data into 4 equal parts.
Similarly, there are percentiles p, which divide a given set of numbers into 100 equal parts. A percentile is denoted as pₐ, where α is the percentage of numbers less than the corresponding value.
Quartiles Q₁, Q₂ and Q₃ correspond to percentiles p₂₅, p₅₀ and p₇₅ respectively.
In the example below, for a given set of numbers, all three quartiles are found.
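As a minimal sketch, quartiles can also be computed in code. The sample below is hypothetical and chosen only for illustration, and NumPy is assumed to be available. Note that NumPy interpolates linearly between neighbouring values by default, so other conventions may give slightly different results on small samples.

```python
import numpy as np

# Hypothetical sample, chosen only for illustration.
data = np.array([7, 1, 9, 3, 5, 2, 8, 4, 6])

# np.quantile takes alpha in [0, 1]; np.percentile takes a percentage in [0, 100].
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])
p25, p50, p75 = np.percentile(data, [25, 50, 75])

print(q1, q2, q3)     # quartiles Q1, Q2, Q3
print(p25, p50, p75)  # the same values expressed as percentiles
```

For this sample the three quartiles are 3, 5 and 7, and they coincide with the percentiles p₂₅, p₅₀ and p₇₅.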
Machine learning algorithms aiming to predict a particular variable quantile use quantile loss as the loss function. Before going to the formulation, let us consider a simple example.
Imagine a problem where the goal is to predict the 75th percentile of a variable. This is equivalent to requiring that prediction errors (true value minus predicted value) be negative in 75% of cases and positive in the other 25%. That is the intuition behind the quantile loss.
The quantile loss formula is illustrated below. The α parameter refers to the quantile which needs to be predicted.
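Written out explicitly, the quantile loss is commonly given in the form below. The notation is an assumption of this sketch: y denotes the true value and ŷ the prediction.

```latex
L_\alpha(y, \hat{y}) =
\begin{cases}
\alpha \, (y - \hat{y}), & \text{if } y \ge \hat{y} \\
(1 - \alpha) \, (\hat{y} - y), & \text{if } y < \hat{y}
\end{cases}
```

The first branch applies to under-estimated predictions and the second to over-estimated ones, so the loss is never negative.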
The value of quantile loss depends on whether a prediction is less than or greater than the true value. To better understand the logic behind it, suppose our objective is to predict the 80th percentile, so the value α = 0.8 is plugged into the equation. As a result, the formula looks like this:
Basically, in such a case, the quantile loss penalizes under-estimated predictions 4 times more than over-estimated ones. This makes the model more sensitive to under-estimation, so it predicts higher values more often. As a result, the fitted model will over-estimate in approximately 80% of cases and under-estimate in the remaining 20%.
Now assume that two predictions were obtained for the same target. The target has a value of 40, while the predictions are 30 and 50. Let us calculate the quantile loss in both cases. Although the absolute error of 10 is the same in both cases, the loss value is different:
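With α = 0.8 substituted, the loss can be written as follows (again assuming y for the true value and ŷ for the prediction):

```latex
L_{0.8}(y, \hat{y}) =
\begin{cases}
0.8 \, (y - \hat{y}), & \text{if } y \ge \hat{y} \\
0.2 \, (\hat{y} - y), & \text{if } y < \hat{y}
\end{cases}
```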
- for 30, the loss value is l = 0.8 * 10 = 8
- for 50, the loss value is l = 0.2 * 10 = 2.
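The same calculation can be sketched in a few lines of Python. The function name `quantile_loss` is chosen here for illustration and is not taken from any library.

```python
def quantile_loss(y_true, y_pred, alpha):
    """Quantile (pinball) loss for a single prediction."""
    diff = y_true - y_pred
    if diff >= 0:
        # Under-estimate (prediction below the target): weight alpha.
        return alpha * diff
    # Over-estimate (prediction above the target): weight 1 - alpha.
    return (1 - alpha) * (-diff)


loss_under = quantile_loss(40, 30, alpha=0.8)  # prediction of 30
loss_over = quantile_loss(40, 50, alpha=0.8)   # prediction of 50

# Rounded to hide floating-point noise.
print(round(loss_under, 6), round(loss_over, 6))
```

Up to floating-point rounding, the two values are 8 and 2, matching the hand calculation above.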
This loss function is illustrated in the diagram below, which shows loss values for different values of α when the true value is 40.
Conversely, if α were 0.2, over-estimated predictions would be penalized 4 times more than under-estimated ones.
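To see why this asymmetry leads to a quantile, here is a small numerical sketch under stated assumptions: the targets are hypothetical draws from a standard normal distribution, and the "model" is just a single constant prediction picked from a grid. The constant with the lowest mean quantile loss for α = 0.8 should land close to the empirical 0.8 quantile of the targets.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=1000)  # hypothetical target values

alpha = 0.8
candidates = np.linspace(y.min(), y.max(), 2001)  # constant predictions to try

# Mean quantile loss of every candidate against all targets.
diff = y[None, :] - candidates[:, None]  # true value minus prediction
losses = np.maximum(alpha * diff, (alpha - 1) * diff).mean(axis=1)

best = candidates[np.argmin(losses)]
share_below = np.mean(y < best)  # share of targets that `best` over-estimates

print(best, np.quantile(y, alpha), share_below)
```

The expression `np.maximum(alpha * diff, (alpha - 1) * diff)` is a compact way to evaluate both branches of the loss at once: for a non-negative difference the first term is larger, and for a negative difference the second one is.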
The problem of predicting a certain variable quantile is called quantile regression.
Let us create a synthetic dataset with 10 000 samples where ratings of players in a video game will be estimated based on the number of playing hours.
Let us split the data into train and test sets in an 80:20 proportion:
For comparison, let us build 3 regression models with different α values: 0.2, 0.5 and 0.8. Each of the regression models will be created with LightGBM, a library with an efficient implementation of gradient boosting.
According to the official documentation, LightGBM supports quantile regression when the objective parameter is set to ‘quantile’ and the corresponding value of alpha is passed.
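A minimal sketch of both steps is shown below. The data-generating process is an assumption made for illustration (ratings grow with playing hours, and the noise grows too), not the one used to produce the results discussed in this article. Only NumPy is used; scikit-learn's `train_test_split` is a common alternative for the split.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000

# Assumed data-generating process, for illustration only:
# rating grows with playing hours, and so does the noise.
hours = rng.uniform(0, 100, size=n_samples)
rating = 1000 + 10 * hours + rng.normal(0, 20 + 2 * hours)

X = hours.reshape(-1, 1)  # feature matrix with a single column
y = rating

# Shuffle indices, keep the first 80% for training and the rest for testing.
indices = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train_idx, test_idx = indices[:n_train], indices[n_train:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```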
After training, the 3 models can be used to obtain predictions.
Let us visualize the predictions via the code snippet below:
From the scatter plot above, it is clear that with greater values of α, models tend to generate more over-estimated results. Additionally, let us compare the predictions of each model with all target values.
This leads to the following output:
The pattern in the output is clear: for any α, predicted values are greater than true values in approximately α * 100% of cases. The experiment therefore suggests that the models behave as expected.
Prediction errors of quantile regression models are negative in approximately α * 100% of cases and positive in (1 - α) * 100% of cases.
We have explored quantile loss, a flexible loss function that can be incorporated into many regression models to predict a chosen quantile of a variable. Using LightGBM as an example, we saw how to configure a model so that it solves a quantile regression problem. Many other popular machine learning libraries also allow setting quantile loss as the loss function.
The code used in this article is available here:
All images unless otherwise noted are by the author.