In this article we explore the concepts of location and dispersion, along with some of the most common ways to measure them.
We will also study recurrence relations for the arithmetic mean and the variance, and provide empirical evidence that the resulting algorithms are correct.
Finally, we will compare straightforward batch algorithms with recurrence-based ones for the mean and the variance.
Theoretical background
Location and dispersion are two summary properties used to describe a dataset: where its values are centered and how widely they are spread.
Location
For location, we refer to a NIST article that gives an intuitive but clear definition:
A fundamental task in many statistical analyses is to estimate a location parameter for the distribution; i.e., to find a typical or central value that best describes the data.
There are many ways to define and measure location. For univariate datasets, we will give the most common ones below.
Arithmetic mean
The arithmetic mean is defined as the sum of all the samples in a collection divided by their number. Formally, for samples $x_1, x_2, \dots, x_n$:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
As an example, computing the arithmetic mean in JavaScript is fairly straightforward:
export function mean(values) {
  if (!values.length) // Avoid division by 0
    return 0
  return values.reduce((x, y) => x + y, 0) / values.length
}
Median
The median is defined as the point $\tilde{x}$ such that half the samples are smaller than $\tilde{x}$ and half the samples are larger than $\tilde{x}$.
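Formally, let $X = \{ x_1, x_2, \dots, x_n \}$ be a sequence of samples sorted in ascending order. Then the median is defined as:
$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even} \end{cases}$$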
As above, we give a very simple (and quite inefficient) JavaScript implementation to compute the median:
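export function median(values) {
  const n = values.length
  // Sort a copy in ascending numeric order: the default sort() compares
  // elements as strings and would also mutate the caller's array
  const sorted = [...values].sort((a, b) => a - b)
  return (n % 2 === 0)
    ? (sorted[(n / 2) - 1] + sorted[n / 2]) / 2
    : sorted[((n + 1) / 2) - 1]
}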
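To see why the choice of location measure matters, the following sketch compares the mean and the median on a small dataset containing one outlier. The helper names `meanOf` and `medianOf` and the sample data are illustrative assumptions; the helpers are restated here only so that the snippet runs on its own.

```javascript
// Illustrative helpers (hypothetical names), restated so the snippet is self-contained.
function meanOf(values) {
  if (!values.length) return 0
  return values.reduce((x, y) => x + y, 0) / values.length
}

function medianOf(values) {
  // Sort a copy numerically; the default sort() compares values as strings.
  const sorted = [...values].sort((a, b) => a - b)
  const n = sorted.length
  if (n === 0) return NaN
  return n % 2 === 0
    ? (sorted[n / 2 - 1] + sorted[n / 2]) / 2
    : sorted[(n - 1) / 2]
}

const samples = [1, 2, 3, 4, 100]      // made-up data with one outlier
const sampleMean = meanOf(samples)     // 110 / 5 = 22
const sampleMedian = medianOf(samples) // middle element = 3
console.log(sampleMean, sampleMedian)
```

The single outlier pulls the mean far away from most of the samples, while the median stays at the middle value.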
Mode
The mode is defined as the most frequently occurring sample in a collection. Formally, if $P(x = x_i)$ is the probability (or frequency) distribution of the collection, the mode is defined as follows:
$$\operatorname{mode}(X) = \underset{x_i}{\arg\max} \; P(x = x_i)$$
i.e. it is the value $x_i$ for which the probability distribution $P(x = x_i)$ has the highest value.
Notice that, unlike the arithmetic mean and the median, the mode is not guaranteed to be unique: several values can share the highest frequency (e.g. in a uniform distribution).
Below is a very simple JavaScript function that identifies the mode of a collection and returns null when the mode is not unique:
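export function mode(values) {
  // Count how many times each value occurs
  const counts = new Map()
  for (const x of values) {
    if (!counts.has(x))
      counts.set(x, 0)
    counts.set(x, counts.get(x) + 1)
  }
  // Track the most frequent value and whether it is tied with another one
  let current = { c: 0 }, unique
  for (const [x, c] of counts.entries()) {
    if (c === current.c)
      unique = false
    else if (c > current.c) {
      current = { x, c }
      unique = true
    }
  }
  // No unique mode (a tie, or an empty collection): return null
  return unique ? current.x : null
}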
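Since several values can share the highest frequency, a variant that returns all of them may sometimes be more convenient. The following is a minimal sketch under that assumption; the name `modes` is hypothetical and not part of the code above.

```javascript
// Hypothetical helper: returns every value that attains the highest count.
function modes(values) {
  const counts = new Map()
  for (const x of values)
    counts.set(x, (counts.get(x) ?? 0) + 1)

  let best = 0
  let result = []
  for (const [x, c] of counts) {
    if (c > best) {
      // Strictly higher count: start a new list of candidates
      best = c
      result = [x]
    } else if (c === best) {
      // Tie with the current best: keep both values
      result.push(x)
    }
  }
  return result
}

console.log(modes([1, 2, 2, 3, 3])) // 2 and 3 tie for the highest count
console.log(modes([]))              // no samples, so no modes
```

Returning an array makes the multimodal case explicit instead of collapsing it into a single "no mode" result.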
Dispersion
Dispersion is another important concept in statistics. Intuitively, it measures how spread out a probability or frequency distribution is.
A dispersion measure summarizes this as a single number: the higher the value, the more spread out the distribution.
Range
Perhaps the simplest way to measure dispersion is the range, defined as the difference between the largest and the smallest sample:
$$R = \max_i x_i - \min_i x_i$$
Below, we give a basic JavaScript function to compute the range dispersion of a set of samples:
export function dispersion_range(values) {
  if (!values.length) // The range of an empty collection is undefined
    return NaN
  return Math.max(...values) - Math.min(...values)
}
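Spreading an array into `Math.max` and `Math.min` passes every element as a separate function argument, which can fail on very large arrays because JavaScript engines limit the number of arguments. A single-pass loop avoids this; the sketch below is one possible alternative, and the name `rangeSinglePass` is hypothetical.

```javascript
// Hypothetical alternative: one pass over the data, no spread arguments.
function rangeSinglePass(values) {
  if (values.length === 0) return NaN // the range of an empty collection is undefined
  let min = values[0]
  let max = values[0]
  for (const x of values) {
    if (x < min) min = x
    if (x > max) max = x
  }
  return max - min
}

console.log(rangeSinglePass([3, 7, 1, 9])) // 9 - 1 = 8
```

This version also scans the data once instead of twice.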
Variance
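A very common way to measure dispersion is the variance. For a finite set of samples $x_1, x_2, \dots, x_n$ (as opposed to a theoretical distribution), it is defined as follows, where $\bar{x}$ is the arithmetic mean:
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
The $n - 1$ divisor (Bessel's correction) is the one used by the code in this article; dividing by $n$ instead gives the variance of a complete population.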
The variance can be easily computed in JavaScript:
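export function variance(values) {
  if (values.length < 2) // Avoid division by 0: at least two samples are needed
    return NaN
  const mu = mean(values)
  // Sum of squared deviations from the mean
  const sum = values.reduce((acc, x) => acc + Math.pow(x - mu, 2), 0)
  return sum / (values.length - 1) // n - 1 divisor (sample variance)
}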
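As a worked example, the following sketch computes the variance of a small made-up dataset with both divisors, $n$ and $n - 1$, so the difference is visible. The helper name `sumOfSquaredDeviations` and the data are illustrative assumptions.

```javascript
// Hypothetical helper: sum of squared deviations from the mean.
function sumOfSquaredDeviations(values) {
  const mu = values.reduce((acc, x) => acc + x, 0) / values.length
  return values.reduce((acc, x) => acc + (x - mu) ** 2, 0)
}

const data = [2, 4, 4, 4, 5, 5, 7, 9]          // made-up data, mean = 40 / 8 = 5
const ss = sumOfSquaredDeviations(data)        // 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
const populationVariance = ss / data.length    // 32 / 8 = 4
const sampleVariance = ss / (data.length - 1)  // 32 / 7, about 4.571
console.log(populationVariance, sampleVariance)
```

The two results get closer as the number of samples grows, since the relative difference between $n$ and $n - 1$ shrinks.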
Standard deviation
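The standard deviation follows immediately from the variance, as it is defined as its square root:
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where $s^2$ is the variance and $\bar{x}$ is the arithmetic mean.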
The standard deviation is an important measure in statistics, as it is, for example, a key parameter of the Normal distribution.
Given the variance function above, computing the standard deviation in JavaScript is straightforward:
export function standard_deviation(values) {
  return Math.sqrt(variance(values))
}
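Unlike the variance, the standard deviation is expressed in the same units as the samples, which makes it easier to interpret. The sketch below compares two made-up datasets that share the same mean but differ in spread; the helper name `sampleStd` is hypothetical and assumes the $n - 1$ divisor.

```javascript
// Hypothetical helper: sample standard deviation (n - 1 divisor).
function sampleStd(values) {
  const n = values.length
  if (n < 2) return NaN // not defined for fewer than two samples
  const mu = values.reduce((acc, x) => acc + x, 0) / n
  const ss = values.reduce((acc, x) => acc + (x - mu) ** 2, 0)
  return Math.sqrt(ss / (n - 1))
}

const tight = [9, 10, 11] // made-up data, mean 10
const wide = [0, 10, 20]  // made-up data, mean 10
console.log(sampleStd(tight)) // sqrt((1 + 0 + 1) / 2) = 1
console.log(sampleStd(wide))  // sqrt((100 + 0 + 100) / 2) = 10
```

Both datasets have the same location, yet their standard deviations differ by a factor of ten, which is exactly the information a location measure alone cannot convey.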
Conclusions
We explored the concepts of location and dispersion in statistics, giving some preliminary theoretical notions.
We also gave definitions and JavaScript implementations of the most commonly used location and dispersion measures, such as the mean and the variance.