Does your Machine Learning Model Know What you don’t Know? — ChAI

ChAI
Mar 4, 2020 · 5 min read

Uncertainty

Uncertainty is an inherent part of knowledge, yet in a highly competitive world many shy away from openly communicating what they don't know. The world of machine learning is no different. The pervasiveness of machine learning in our lives, from decisions about medical treatment to investment, means that understanding the uncertainty in predictive models is essential to assessing and managing risk.

The Two Main Types of Uncertainty

The nature of uncertainty and how to deal with it have long been debated by statisticians, scientists and philosophers. Uncertainty is often categorised into two types:

  1. Epistemic Uncertainty: Some things we are uncertain about simply because we lack knowledge. This uncertainty can be reduced by gathering more information.
  2. Aleatory Uncertainty: This is due to the fundamental indeterminacy or randomness in the world, which is unpredictable regardless of how much knowledge we have.

In the context of machine learning, aleatory uncertainty arises due to the stochastic (random) variability inherent in the data-generating process. For example, data generated by an imperfect sensor has irreducible noise.

On the other hand, epistemic uncertainty accounts for uncertainty in the model parameters. This captures our lack of knowledge about the data-generating mechanism, and can be reduced by collecting more data. For example, to predict the price of copper in three months, we can reduce the epistemic uncertainty in our predictions by using not only the historical price of copper, but also extra data such as copper inventory levels and the currency values of copper-exporting countries.
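
As a rough illustration of the difference (a toy sketch of our own, not from ChAI's analysis): readings from a hypothetical noisy sensor always carry the same irreducible noise (aleatory), while averaging more readings steadily shrinks our uncertainty about the underlying true value (epistemic).

```python
# A minimal sketch: a hypothetical noisy sensor measuring a fixed true value.
# More readings shrink our uncertainty about that value (epistemic), but the
# noise on any single reading (aleatory) never goes away.
import numpy as np

rng = np.random.default_rng(0)
true_value = 10.0      # the unknown quantity we want to estimate
sensor_noise = 2.0     # irreducible standard deviation of the sensor (aleatory)

for n in [10, 100, 10_000]:
    readings = true_value + sensor_noise * rng.standard_normal(n)
    estimate = readings.mean()
    epistemic_std = readings.std(ddof=1) / np.sqrt(n)   # shrinks roughly as 1/sqrt(n)
    print(f"n={n:>6}  estimate={estimate:6.3f}  "
          f"epistemic std={epistemic_std:.3f}  aleatory std={readings.std(ddof=1):.3f}")
```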

Frequentist Vs Bayesian: The Two Interpretations of Probability and Types of Statistical Inference

The theory of statistics rests on describing uncertainty using probability. Almost everyone has had their introduction to statistics and probability through examples of aleatory uncertainty, such as rolling a die or flipping a coin. But what does it actually mean to say that the probability of obtaining a head is 0.5 when tossing a fair coin? This is where the fundamental dichotomy between the two principal theories of statistics arises.

The important distinction between frequentist and Bayesian inference is that in Bayesian statistics our knowledge about anything unknown can be described by a probability distribution. Given observed data y and unknown parameters θ, we can posit a model that specifies the likelihood function p(y|θ) to capture the aleatory uncertainty.

  1. Frequentist inference:
  • Data y are regarded as random variables, while the parameters θ are fixed.
  • Inference about θ is conditional not only on the observed data y, but also on what might have been observed under repeated sampling.
  • It is not obvious how epistemic uncertainty is represented.

  2. Bayesian inference:
  • Bayesians consider both y and θ as random variables.
  • Posterior inference about θ is conditional on the particular, actually observed, realisation of y.
  • θ is given a prior probability distribution p(θ), reflecting our knowledge about it before seeing the data y.
  • After seeing y, we update our belief about θ using the posterior distribution obtained from Bayes' theorem, p(θ | y) = p(θ)p(y|θ) / p(y).
  • The posterior distribution encodes the epistemic uncertainty we have about the parameter θ (a small numerical sketch of this update follows below).
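
Here is that sketch, using a Beta-Binomial coin-toss model (an illustrative choice with a closed-form posterior, not something prescribed in this article): a prior Beta(a, b) over the head probability θ becomes the posterior Beta(a + k, b + n - k) after observing k heads in n tosses.

```python
# A minimal sketch of the Bayesian update for a coin-toss model, using the
# conjugate Beta-Binomial pair so the posterior is available in closed form.
from scipy import stats

a, b = 2.0, 2.0          # prior Beta(a, b): mildly favours a fair coin
n, k = 10, 7             # data: 7 heads observed in 10 tosses

# Bayes' theorem: p(theta | y) is proportional to p(theta) * p(y | theta);
# for this conjugate pair the posterior is simply Beta(a + k, b + n - k).
posterior = stats.beta(a + k, b + (n - k))
print("posterior mean:", posterior.mean())                 # about 0.643
print("95% credible interval:", posterior.interval(0.95))  # the epistemic uncertainty
```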

A frequentist would say that if a coin is tossed an infinite number of times and 50% of the time we obtain heads, then the coin is fair. The frequentist interpretation is considered objective as it is based on (potentially) observable events. However, this is also a limitation as the assumption of repeatable experiments is not always practical (e.g. if we need to calculate the true surgical mortality rate of a medical procedure).

The Bayesian interpretation of probability is subjective: the probability of obtaining a head is a measure of someone's degree of belief that a head will occur when the coin is tossed. It also makes it possible to incorporate partial information about the process of interest (by choosing a prior distribution).

In the limit of infinite observed data, the epistemic uncertainty tends to zero and Bayesian inference typically agrees with frequentist inference.
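
Continuing the coin-toss sketch above, this limit can be checked numerically: holding the observed head frequency fixed while increasing n, the posterior mean approaches the frequentist estimate k/n and the posterior standard deviation (the epistemic uncertainty) shrinks towards zero.

```python
# Continuing the Beta-Binomial coin-toss sketch: with the observed head
# frequency fixed at 0.7, more data concentrates the posterior around the
# frequentist estimate and its standard deviation tends towards zero.
from scipy import stats

a, b = 2.0, 2.0                         # same prior as before
for n in [10, 100, 10_000]:
    k = int(0.7 * n)                    # keep the observed frequency at 0.7
    post = stats.beta(a + k, b + n - k)
    print(f"n={n:>6}  posterior mean={post.mean():.4f}  posterior std={post.std():.4f}")
```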

How ChAI Deals with Uncertainty

It is common to model aleatory uncertainty using an assumed distribution family, often a Gaussian distribution. However, this is frequently an oversimplification and may fail to capture features such as skewness, kurtosis and multimodality. In particular, time series data such as the log-returns of commodity spot prices often display asymmetry, heavy tails and heteroscedasticity. On the other hand, many machine learning and deep learning models estimate their parameters in a frequentist way, producing point estimates via maximum likelihood methods, which means that epistemic uncertainty is often not taken into account.
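
To illustrate the first point with simulated data rather than real commodity prices: when "log-returns" are drawn from a heavy-tailed distribution, a fitted Gaussian drastically understates the probability of large moves, while a Student-t fit (one common heavier-tailed alternative) comes much closer to the empirical tail frequency.

```python
# A minimal sketch with simulated heavy-tailed "log-returns" (not real prices):
# compare how often a 5-standard-deviation move occurs empirically with what a
# Gaussian fit and a Student-t fit would each predict.
import numpy as np
from scipy import stats

returns = stats.t.rvs(df=3, scale=0.01, size=5_000, random_state=42)

mu, sigma = stats.norm.fit(returns)       # Gaussian maximum likelihood fit
df, loc, scale = stats.t.fit(returns)     # Student-t maximum likelihood fit

threshold = 5 * returns.std()             # a "5-sigma" move
print("empirical  P(|r| > 5 std):", np.mean(np.abs(returns) > threshold))
print("Gaussian   P(|r| > 5 std):", 2 * stats.norm.sf(threshold, mu, sigma))
print("Student-t  P(|r| > 5 std):", 2 * stats.t.sf(threshold, df, loc, scale))
```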

At ChAI we advocate the use of Bayesian statistics, since it allows us to quantify both aleatory and epistemic uncertainty through the confidence bounds of our predictions. In addition, it enables us to be more explicit about the uncertainty we have in some of our input data. For instance, when we use satellite data, we may place priors on the observations themselves, with a variance that depends on the observation noise (such as cloud cover).
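
A hypothetical, deliberately simplified illustration of that last idea (the numbers and the inverse-variance combination below are our own, not a description of ChAI's pipeline): observations with a larger assumed noise variance, such as a heavily clouded satellite pass, receive proportionally less weight.

```python
# A hypothetical illustration of noise-dependent weighting of observations:
# each "satellite pass" reports an estimate with its own assumed variance, and
# noisier passes (e.g. heavier cloud cover) contribute less to the combination.
import numpy as np

observations = np.array([104.0, 98.0, 120.0])   # e.g. three passes over a stockpile
obs_variance = np.array([4.0, 9.0, 400.0])      # the last pass is heavily clouded

precision = 1.0 / obs_variance                  # inverse-variance weights
combined_mean = np.sum(precision * observations) / np.sum(precision)
combined_var = 1.0 / np.sum(precision)
print(f"combined estimate: {combined_mean:.1f} +/- {np.sqrt(combined_var):.1f}")
```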

So far, our research team have been focusing on:

  • Developing machine learning models that are well equipped to handle heteroscedasticity, asymmetry and heavy tails, with the aim of improving our modelling of aleatory uncertainty.
  • Using Bayesian nonparametric methods such as stochastic processes to better quantify epistemic uncertainty (a small Gaussian process sketch follows below).
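
The sketch below illustrates the stochastic-process idea with a Gaussian process in scikit-learn, on toy data rather than ChAI's models: the predictive standard deviation, which reflects epistemic uncertainty, stays small where training data exist and grows in regions the model has never seen.

```python
# A minimal Gaussian process sketch on toy data: predictive uncertainty is
# small near the training inputs and large far away from them.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 5, size=(20, 1))                        # data only on [0, 5]
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(20)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)   # signal + observation noise
gp = GaussianProcessRegressor(kernel=kernel).fit(X_train, y_train)

X_test = np.array([[2.5], [8.0]])                                # inside vs. outside the data
mean, std = gp.predict(X_test, return_std=True)
print("predictive std at x=2.5 (covered by data):", std[0])      # small
print("predictive std at x=8.0 (no data nearby): ", std[1])      # much larger
```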

Some challenges we face are that:

  • Quantifying epistemic uncertainty in models that can capture complex aleatory uncertainty is non-trivial, as it often involves evaluating posterior distributions that have no analytical form.
  • Models that express epistemic uncertainty well often make oversimplified assumptions about the distribution of the aleatory uncertainty.

We are continuing our research on developing expressive models that can accurately capture the aleatory and epistemic uncertainty jointly. Knowing what we don’t know will help us understand the limitations of our models and to develop targeted solutions that are the most impactful to businesses. It will also enable our clients to make informed decisions using our predictions, helping them to better communicate, assess and manage their risk.

Originally published at https://www.chai-uk.com on March 4, 2020.
