This website uses LSA with the N-Closest metric on the IBS 70 Corpus.

Latent Semantic Analysis (LSA) is a method that applies Singular Value Decomposition (SVD) to word count vectors. First, each unique word is mapped to its own orthogonal unit vector, so the dimension of the semantic space equals the number of unique words. Each document is then transformed into a vector of word counts and placed in an array alongside the other documents of the corpus. Applying SVD to this array trains the semantic space so that phenomena such as polysemy and synonymy are accounted for.
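
As a rough illustration of the first steps described above, the sketch below builds word count vectors for a tiny made-up corpus. The function names (build_vocabulary, count_vector), the whitespace tokenization, and the three sample sentences are assumptions made for this example and are not taken from the site's implementation. The SVD step itself is not shown; it would be applied to the resulting array.

```python
from collections import Counter

def build_vocabulary(documents):
    """Give each unique word its own index, i.e. its own axis of the space."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def count_vector(doc, vocab):
    """Sum the unit vectors of the words in doc; unknown words are skipped."""
    counts = Counter(doc.lower().split())
    vec = [0] * len(vocab)
    for word, n in counts.items():
        if word in vocab:
            vec[vocab[word]] = n
    return vec

# Hypothetical three-document corpus, for illustration only.
corpus = ["the cat sat", "the dog sat down", "a cat and a dog"]
vocab = build_vocabulary(corpus)

# One row per document: the array that SVD would later be applied to.
matrix = [count_vector(doc, vocab) for doc in corpus]
```

Here the corpus has seven unique words, so every document vector has seven components, matching the statement that the dimension of the space equals the number of unique words.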

After the semantic space has been trained and a suitable value for N has been found, new documents may be compared to the corpus within this semantic space. A new document is classified by finding the N documents in the corpus with the largest cosine of the angle to it, and then assigning it the same classification as the majority of those N closest documents. All of this is explored in much more detail in the doctoral thesis of Dr. Melanie Martin, professor at CSU Stanislaus.
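
The N-closest rule described above can be sketched as follows. This is a minimal example under stated assumptions: the vectors are taken to be already projected into the trained semantic space, and the two-dimensional vectors, the labels "A" and "B", and the function names are hypothetical. Ties in the vote go to whichever tied label appears first among the closest documents, which is a choice made for this sketch and not something the description specifies.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(new_vec, corpus_vecs, labels, n):
    """Return the majority label among the n corpus documents closest to new_vec."""
    # Rank corpus documents by cosine similarity, largest first.
    ranked = sorted(
        range(len(corpus_vecs)),
        key=lambda i: cosine(new_vec, corpus_vecs[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:n])
    return votes.most_common(1)[0][0]

# Hypothetical document vectors in a trained space, with made-up labels.
corpus_vecs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [0.8, 0.3]]
labels = ["A", "A", "B", "B", "A"]

first = classify([1, 0.2], corpus_vecs, labels, n=3)
second = classify([0.1, 1], corpus_vecs, labels, n=3)
```

With N = 3, the first new document lands among three documents labeled "A", while the second has two "B" neighbors and one "A" neighbor, so the majority vote assigns it "B".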