blog.bogatron.net

A blog mostly about Python, Machine Learning, and Remote Sensing.

Unsupervised Classification of Hyperspectral Images using Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) is a type of probabilistic topic model commonly used in natural language processing to extract topics from large collections of documents in an unsupervised manner. LDA assumes that each document in a corpus (a collection of documents) is associated with a mixture of topics and that the proportions of those topics vary from document to document. Each topic is represented as a probability distribution over a vocabulary (the set of all allowable words).
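To make the "mixture of topics" idea concrete, here is a minimal NumPy sketch of the generative story LDA assumes. The sizes, the symmetric Dirichlet hyperparameters `alpha` and `beta`, and the helper `generate_document` are illustrative assumptions, not values or names from the post. It only samples synthetic documents; it does not fit a model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and priors (assumptions, chosen only for the demo).
n_topics, vocab_size, doc_length = 3, 8, 20
alpha = np.full(n_topics, 0.5)   # Dirichlet prior on per-document topic proportions
beta = np.full(vocab_size, 0.5)  # Dirichlet prior on per-topic word distributions

# Each topic is a probability distribution over the vocabulary.
topics = rng.dirichlet(beta, size=n_topics)  # shape (n_topics, vocab_size)


def generate_document(length):
    """Sample one document (as word indices) from the LDA generative process."""
    theta = rng.dirichlet(alpha)                     # this document's topic mixture
    z = rng.choice(n_topics, size=length, p=theta)   # a topic assignment per word
    words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
    return theta, words


theta, words = generate_document(doc_length)
print("topic proportions:", np.round(theta, 3))
print("word indices:", words)
```

Calling `generate_document` repeatedly gives documents whose topic proportions differ, while the topics themselves (the rows of `topics`) are shared across the whole corpus. Inference in LDA runs this story in reverse, estimating the topics and proportions from observed words.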

Whitening Characteristics of the Mahalanobis Distance

The Mahalanobis distance is a metric used to compare a vector to a multivariate normal distribution with a given mean vector ($\boldsymbol{\mu}$) and covariance matrix ($\boldsymbol{\Sigma}$). It is often used to detect statistical outliers (e.g., in the RX anomaly detector) and also appears in the exponential term of the probability density function for the multivariate normal distribution. Here, we'll show that the Mahalanobis distance is equivalent to the Euclidean distance measured in the whitened space of the distribution.
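As a quick numerical check of that equivalence, the sketch below computes the squared Mahalanobis distance directly and then again as a squared Euclidean distance after whitening. The mean, covariance, and test vector are made-up values, and the symmetric whitening matrix $\mathbf{W} = \boldsymbol{\Sigma}^{-1/2}$ is one assumed choice among several valid whitening transforms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up distribution parameters (assumptions for the demo).
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(3, 3))
sigma = A @ A.T + 3.0 * np.eye(3)  # symmetric positive definite covariance

x = rng.normal(size=3)  # vector to compare against the distribution
d = x - mu

# Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu).
d2_mahalanobis = d @ np.linalg.solve(sigma, d)

# Whitening matrix W = Sigma^{-1/2}, built from the eigendecomposition of Sigma.
evals, evecs = np.linalg.eigh(sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

# Squared Euclidean distance from the origin in the whitened space.
d2_whitened = np.sum((W @ d) ** 2)

print(d2_mahalanobis, d2_whitened)
```

The two printed values agree to floating-point precision because $\mathbf{W}^{T}\mathbf{W} = \boldsymbol{\Sigma}^{-1}$, so the whitened squared length $\lVert \mathbf{W}(\mathbf{x} - \boldsymbol{\mu}) \rVert^{2}$ expands to exactly the Mahalanobis quadratic form.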

Anomalously Non-Anomalous Anomaly Detection Results

The RX Anomaly Detector

In image processing, anomaly detectors are algorithms used to detect image pixels that are sufficiently different from other pixels in the same image (or within a local neighborhood of the pixel being evaluated). The RX anomaly detector [1] represents each pixel in an image as a point in an $N$-dimensional space, where $N$ is the number of bands in the image. The image background is assumed to follow an $N$-dimensional Gaussian distribution with mean vector $\boldsymbol{\mu}_{b}$ and covariance matrix $\boldsymbol{\Sigma}_{b}$. Each pixel is then compared to the image background by computing the squared Mahalanobis distance