Latent Semantic Indexing

Under the hood and without the math


by Walt Stoneburner
Copyright © 2010 by Walt Stoneburner
All Rights Reserved.

This document provides some important insights about LSI that are commonly neglected. It addresses what LSI can do, how it does it, what it can't do, and the important limitations and data assertions that are often hidden by the smoke'n'mirrors of marketing. Best of all, we're not going to use any math.

Everything you see here is backed up by this document from Cambridge.

Latent Semantic Indexing

Latent Semantic Indexing (LSI) works. That's not disputed.

But, like any tool, it can be misused and misunderstood.

Understanding LSI's capabilities, intended purpose, and limitations up front will help determine if LSI is a proper fit.

Clustering Isn't Ranking

Imagine you have four objects sitting in front of you: a plastic orange, a sandwich, an apple core, and a rock.
LSI can place these items into buckets based on context that it's been exposed to. You get to choose how many buckets you want, so to keep our example simple, let's say we have two. Let's see what LSI does with this:
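
If you'd rather poke at the idea than take it on faith, here is a minimal Python sketch. It is not this article's own example, and the context words attached to each object are made up purely for illustration. It treats each latent dimension as a bucket and reports how strongly each object weighs on it, which is one common "soft clustering" reading of LSI. The helper names (term_item_matrix, gram, top_eigenpair, latent_weights) are hypothetical, and the decomposition is done with plain power iteration rather than a real linear algebra library, so treat it as a toy.

```python
import math

# Hypothetical context words for each object (invented for this sketch;
# a real system would derive them from the documents it has been shown).
CONTEXT = {
    "plastic orange": ["fruit", "round", "inedible"],
    "sandwich": ["food", "edible"],
    "apple core": ["fruit", "food", "edible"],
    "rock": ["round", "inedible"],
}


def term_item_matrix(context):
    """Rows are terms, columns are items; each cell is a word count."""
    items = list(context)
    terms = sorted({w for words in context.values() for w in words})
    matrix = [[context[item].count(t) for item in items] for t in terms]
    return items, matrix


def gram(matrix):
    """Item-by-item overlap matrix: how many context words two items share."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]


def top_eigenpair(g, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of g."""
    n = len(g)
    v = [1.0 / (i + 1) for i in range(n)]  # fixed, deterministic start
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
    return value, v


def latent_weights(context, k=2):
    """Weight of every item on each of the k strongest latent dimensions."""
    items, matrix = term_item_matrix(context)
    g = gram(matrix)
    n = len(items)
    singular_values = []
    weights = {item: [] for item in items}
    for _ in range(k):
        value, v = top_eigenpair(g)
        value = max(value, 0.0)
        s = math.sqrt(value)
        singular_values.append(s)
        for idx, item in enumerate(items):
            weights[item].append(s * v[idx])
        # Deflate: subtract the dimension just found so the next pass
        # converges on the next-strongest one.
        g = [[g[i][j] - value * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return singular_values, weights


# k plays the role of "how many buckets you want".
singular_values, weights = latent_weights(CONTEXT, k=2)
for item, w in weights.items():
    print(item, [round(x, 3) for x in w])
```

Each object comes back with two numbers, one per bucket, rather than a single yes-or-no assignment. The sign of a whole column is arbitrary, so only the relative sizes and the agreement or disagreement of signs between objects carry meaning. Change the made-up context words and the weights shift accordingly, which is the point: the buckets reflect whatever context the system was fed.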

Introduction to Information Retrieval. Chapter 18: Matrix decomposition and latent semantic indexing. p. 403
