This section will unfortunately be brief, but if it weren't, its contents wouldn't belong in this section.
Weka is a machine learning toolkit used for data mining. As outlined in Dr. Martin's doctoral thesis, a large set of features will be extracted from the webpages in the future. Weka's machine learning algorithms will then be run on these features to find underlying patterns that can be used to further improve the accuracy of the classification process.
In addition, under the inspiration and guidance of Dr. Martin, Jonathan Brown, her research assistant at CSU Stanislaus, is working on ways to augment the LSA with techniques to discover archetypes, empirically identify noise words, and extract meaning from community structure in graphs related to the LSA N-closest algorithm. These are only the projects most likely to be implemented on this site in the (hopefully near) future.
When dealing with a space as high-dimensional, and a beast as complicated, as natural language, one thing is fairly certain: machine learning algorithms and heuristics are, so far, our best avenue.