# Unsupervised Classification of Hyperspectral Images using Latent Dirichlet Allocation

## Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) is a type of probabilistic topic model commonly used in natural language processing to extract topics from large collections of documents in an unsupervised manner. LDA assumes that each document in a corpus (a collection of documents) is associated with a mixture of topics and that the proportions of the topics vary per document. Each topic is represented as a probability distribution over a vocabulary (the set of all allowable words).
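To make the assumptions above concrete, here is a minimal sketch of the generative side of the model: draw per-document topic proportions, then draw a topic and a word for each position. It does not perform inference (recovering topics from documents). The vocabulary, the two topics, the `alpha` values, and the function names are all made up for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, vocabulary, alpha, num_words, rng):
    """Generate one document under the LDA assumptions described above."""
    # Topic proportions differ from document to document.
    proportions = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(num_words):
        # Pick a topic according to this document's proportions...
        topic_index = rng.choices(range(len(topics)), weights=proportions)[0]
        # ...then pick a word from that topic's distribution over the vocabulary.
        word = rng.choices(vocabulary, weights=topics[topic_index])[0]
        words.append(word)
    return proportions, words


rng = random.Random(0)

# Hypothetical six-word vocabulary and two hypothetical topics.
vocabulary = ["pixel", "band", "spectrum", "word", "corpus", "topic"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # an "imaging" topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],  # a "text" topic
]

proportions, words = generate_document(
    topics, vocabulary, alpha=[0.5, 0.5], num_words=10, rng=rng
)
print(proportions)
print(words)
```

Each call to `generate_document` yields a different topic mixture, which is the per-document variation the model assumes.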

# Whitening Characteristics of the Mahalanobis Distance

Mahalanobis distance is a metric used to compare a vector to a multivariate normal distribution with a given mean vector ($\boldsymbol{\mu}$) and covariance matrix ($\boldsymbol{\Sigma}$). It is often used to detect statistical outliers (e.g., in the RX anomaly detector) and also appears in the exponential term of the probability density function for the multivariate normal distribution. Here, we'll show how Mahalanobis distance is equivalent to Euclidean distance measured in the whitened space of the distribution.
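As a quick numerical check of that equivalence, the sketch below computes the Mahalanobis distance directly and then again as a Euclidean distance after whitening. The mean, covariance, and test vector are arbitrary values chosen for illustration, and the whitening matrix is taken to be the symmetric inverse square root of $\boldsymbol{\Sigma}$ (one of several valid choices).

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary mean and a symmetric positive-definite covariance (illustrative values).
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(3, 3))
sigma = A @ A.T + 3.0 * np.eye(3)

x = np.array([2.0, 0.0, -1.0])

# Mahalanobis distance computed directly: sqrt((x - mu)^T Sigma^{-1} (x - mu)).
diff = x - mu
d_mahalanobis = np.sqrt(diff @ np.linalg.solve(sigma, diff))

# Whitening matrix W = Sigma^{-1/2}, built from the eigendecomposition of Sigma.
eigvals, eigvecs = np.linalg.eigh(sigma)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Euclidean distance between the whitened vector and the whitened mean.
d_euclidean = np.linalg.norm(W @ x - W @ mu)

print(d_mahalanobis, d_euclidean)
```

The two printed values should agree to floating-point precision, because $\mathbf{W}^T\mathbf{W} = \boldsymbol{\Sigma}^{-1}$.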

# Visualizing Dirichlet Distributions with Matplotlib

This post describes how I went about visualizing probability density functions of 3-dimensional Dirichlet distributions with matplotlib. If you're already familiar with the Dirichlet distribution, you might want to skip the next section.

## Rolling Dice

To understand what the Dirichlet distribution describes, it is useful to consider how it can characterize the variability of a random multinomial distribution.
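One way to get a feel for this is to sample a few multinomial distributions from a Dirichlet and see how much they vary. The sketch below treats each sample as the face probabilities of a six-sided die; the six-sided assumption, the symmetric concentration values, and the `sample_dice` helper are mine, chosen only for illustration.

```python
import numpy as np


def sample_dice(concentration, num_dice, rng):
    """Draw `num_dice` sets of six face probabilities from a symmetric Dirichlet."""
    alpha = np.full(6, concentration)
    return rng.dirichlet(alpha, size=num_dice)


rng = np.random.default_rng(0)

# A small concentration gives highly variable dice; a large one gives nearly fair dice.
loose = sample_dice(1.0, 1000, rng)
tight = sample_dice(100.0, 1000, rng)

# Average standard deviation of the face probabilities across the sampled dice.
spread_loose = loose.std(axis=0).mean()
spread_tight = tight.std(axis=0).mean()

print("first die, concentration 1:  ", np.round(loose[0], 3))
print("first die, concentration 100:", np.round(tight[0], 3))
print("spread:", spread_loose, spread_tight)
```

Every row sums to one, so each is a valid multinomial distribution; the concentration parameters control how far the rows stray from a fair die.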

# Anomalously Non-Anomalous Anomaly Detection Results

## The RX Anomaly Detector

In image processing, anomaly detectors are algorithms used to detect image pixels that are sufficiently different from other pixels in the same image (or within a local neighborhood of the pixel being evaluated). The RX anomaly detector [1] represents each pixel in an image as a point in an $N$-dimensional space, where $N$ is the number of bands in the image. The image background is assumed to be distributed as an $N$-dimensional Gaussian distribution with mean vector $\boldsymbol{\mu}_{b}$ and covariance matrix $\boldsymbol{\Sigma}_{b}$. Each pixel is then compared to the image background by computing the squared Mahalanobis distance
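For a pixel vector $\mathbf{x}$ (a symbol introduced here for illustration), that squared distance is $(\mathbf{x}-\boldsymbol{\mu}_{b})^{T}\boldsymbol{\Sigma}_{b}^{-1}(\mathbf{x}-\boldsymbol{\mu}_{b})$. Below is a minimal NumPy sketch of the idea, assuming the background statistics are estimated globally from every pixel in the image (rather than from a local neighborhood). The `rx_scores` function and the synthetic test image are hypothetical and not taken from any particular library.

```python
import numpy as np


def rx_scores(image):
    """Squared Mahalanobis distance of each pixel to global background statistics.

    `image` has shape (rows, cols, bands); the result has shape (rows, cols).
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)

    # Background mean vector and covariance matrix estimated from all pixels.
    mu_b = pixels.mean(axis=0)
    sigma_b = np.cov(pixels, rowvar=False)

    # (x - mu_b)^T Sigma_b^{-1} (x - mu_b) for every pixel, without forming the inverse.
    diff = pixels - mu_b
    solved = np.linalg.solve(sigma_b, diff.T).T
    scores = np.einsum("ij,ij->i", diff, solved)
    return scores.reshape(rows, cols)


# Synthetic 5-band image of Gaussian noise with one planted anomalous pixel.
rng = np.random.default_rng(0)
image = rng.normal(size=(20, 20, 5))
image[3, 7] += 10.0

scores = rx_scores(image)
peak = tuple(int(i) for i in np.unravel_index(scores.argmax(), scores.shape))
print(peak)
```

Pixels with large scores are candidates for anomalies; in practice a threshold on the score decides which ones are flagged.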