Flame Recognition Project
We divided into two groups:
Classification using Weka - Juan and Roberto
For next week: come up with a list of possible attributes, building on Smokey and the other papers
Keep in mind the issue of included text in the messages
Also keep in mind preprocessing issues and how to get the data into Weka's ".arff" format
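A minimal Python sketch of that last step: turning per-message feature values into ARFF text that Weka can load. The attribute names (`exclamation_count`, `uppercase_ratio`), the `flame`/`ok` class labels and the helper names `arff_quote` and `to_arff` are hypothetical placeholders for illustration, not the group's actual attribute list.

```python
def arff_quote(value):
    """Return value as an ARFF token, single-quoting it if it contains
    spaces or characters that ARFF treats specially."""
    text = str(value)
    if text == "" or any(c in text for c in " \t,{}'\"%"):
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return text


def to_arff(relation, attributes, rows):
    """Build the text of an ARFF file.

    attributes: list of (name, kind) pairs, where kind is "NUMERIC",
                "STRING", or a list of allowed nominal values.
    rows:       one tuple per message, one value per attribute.
    """
    lines = [f"@RELATION {arff_quote(relation)}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: list the allowed values in braces.
            kind = "{" + ",".join(arff_quote(v) for v in kind) + "}"
        lines.append(f"@ATTRIBUTE {arff_quote(name)} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(arff_quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical example: two made-up Smokey-style features plus a class label.
example = to_arff(
    "flames",
    [("exclamation_count", "NUMERIC"),
     ("uppercase_ratio", "NUMERIC"),
     ("class", ["flame", "ok"])],
    [(3, 0.42, "flame"), (0, 0.05, "ok")],
)
```

Building a string rather than writing the file directly keeps the formatting easy to check; save `example` to a file such as `messages.arff` to open it in Weka.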
Clustering using LSA and Dr. Carter's algorithms - Jesus and Cameron
For next week: get the data ready for input to LSA
Keep the id, but remove the rest of the header and the included text
Each message is one document
Output should be three files:
One with ids (index file)
One with the documents only, each document separated by a blank line
One with ids and documents:
id1
doc1
id2
doc2
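A minimal Python sketch of this preparation step, under assumptions the notes do not state: each raw message has RFC 822-style headers that end at the first blank line, the id lives in a `Message-ID:` header (a hypothetical choice; substitute whatever id field the data actually carries), and included text is any line starting with `>`. The function and file names are hypothetical. Each document is collapsed onto a single line so that blank lines inside a message cannot be mistaken for document separators.

```python
def parse_message(raw, id_header="Message-ID"):
    """Return (message_id, text) for one raw message.

    Assumes headers end at the first blank line, keeps only the header named
    id_header (hypothetical default), drops lines quoted with '>', and
    collapses the remaining body onto a single line.
    """
    raw = raw.replace("\r\n", "\n")
    header_part, _, body = raw.partition("\n\n")
    prefix = id_header.lower() + ":"
    msg_id = ""
    for line in header_part.splitlines():
        if line.lower().startswith(prefix):
            msg_id = line[len(prefix):].strip()
            break
    # Drop included (quoted) text; attribution lines such as "X wrote:" are NOT removed.
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    return msg_id, " ".join(" ".join(kept).split())


def build_lsa_outputs(raw_messages, id_header="Message-ID"):
    """Return the contents of the index file, the documents-only file and the combined file."""
    parsed = [parse_message(raw, id_header) for raw in raw_messages]
    index_text = "".join(f"{msg_id}\n" for msg_id, _ in parsed)
    docs_text = "\n\n".join(doc for _, doc in parsed) + "\n"  # one blank line between documents
    combined_text = "".join(f"{msg_id}\n{doc}\n" for msg_id, doc in parsed)
    return index_text, docs_text, combined_text


def write_lsa_files(raw_messages, prefix):
    """Write prefix.ids, prefix.docs and prefix.combined (hypothetical file names)."""
    for suffix, text in zip(("ids", "docs", "combined"), build_lsa_outputs(raw_messages)):
        with open(f"{prefix}.{suffix}", "w", encoding="utf-8") as out:
            out.write(text)
```

The combined file follows the id1/doc1/id2/doc2 layout above; if the real messages mark included text differently, only the filter in `parse_message` needs to change.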
Reminder: please keep track of what you do, so that we can write a procedure manual once we are done
Papers to read:
Data to look at:
Work Report
Census Data Project
TIGER: Topologically Integrated Geographic Encoding and Referencing system
IVF Project
Papers to look at:
Can we predict IVF outcomes?, Julie Goodside, Leah Passmore, Lutz Hamel, Liliana Gonzalez, Tali Silberstein, Richard Hackett, David L. Keefe and James R. Trimarchi, Abstract presented at the 2004 First Quarterly Meeting of The New England Fertility Society and The Annual Assembly of the New England Fertility Society (NEFS2004), March 12-14, 2004.
Comparing Data Mining and Logistic Regression for Predicting IVF Outcome, J. R. Trimarchi, J. Goodside, L. Passmore, T. Silberstein, L. Hamel, L. Gonzalez, Abstract presented at the 59th Annual Meeting of the American Society for Reproductive Medicine (ASRM 2003), San Antonio, TX, October 11-15, 2003.
Assessing Decision Tree Models for Clinical In-Vitro Fertilization Data, J. R. Trimarchi, J. Goodside, L. Passmore, T. Silberstein, L. Hamel, L. Gonzalez, Technical Report TR03-296, Dept. of Computer Science and Statistics, University of Rhode Island, 2003.
google "predicting implantation outcome from imbalanced IVF dataset"
Data to look at:
Month-long Data Mining
Introduction
Some tools we will use:
Python
Awk
Excel
Weka
Assignments are from KDnuggets
Preliminary materials
slides for 8/17/10
slides for 8/18/10
slides for 8/19/10
Some Python Examples
counts.py
process_counts.py
fixnewlines.py
pull_ids.py
Some awk examples
ls -l | awk '$5 == 0 {print $0}'
this lists the empty (zero-byte) files in the current directory
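For comparison, a rough Python equivalent (a sketch; `empty_files` is a hypothetical name). Rather than parsing the fifth column of `ls -l` output, it checks each file's size directly, and it ignores directories and symbolic links.

```python
import os


def empty_files(directory="."):
    """Return the names of zero-length regular files in directory (not recursive)."""
    with os.scandir(directory) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file(follow_symlinks=False)
            and entry.stat(follow_symlinks=False).st_size == 0
        )
```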
awk -f awk_test/fmt print3 > print4
(print3 is the input file, print4 the output file)
reformats the text into lines of at most 75 characters
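If the awk_test/fmt script is not at hand, Python's standard `textwrap` module can do a similar job. This is a sketch, not the fmt program itself; `refill` is a hypothetical name, and it assumes paragraphs are separated by blank lines.

```python
import re
import textwrap


def refill(text, width=75):
    """Refill blank-line-separated paragraphs to lines of at most `width` characters.

    Words longer than `width` are left whole, so such a word can exceed the limit.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    filled = [
        textwrap.fill(" ".join(p.split()), width=width,
                      break_long_words=False, break_on_hyphens=False)
        for p in paragraphs
    ]
    return "\n\n".join(filled) + "\n"
```

Reading print3, passing its contents to `refill`, and writing the result to print4 mirrors the awk command above.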
awk '$4 ~ /SS-A-.*/ && $2 ~ /DF-A-.*/ {print $0}' Tm05M04codes.comb
awk '$4 ~ /SS-A[-]*.*/ && $2 ~ /DF-A[-]*.*/ {print $0}' Tm05M04codes.comb
awk '$2 ~ /DF-.*E\^.*-*.*/ {print $0}' Tm05M04codes.comb
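some pattern-matching examples: each prints the lines of Tm05M04codes.comb whose fields match the given regular expressions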
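The same field-matching logic in Python, as a sketch for readers more comfortable there (`awk_match` is a hypothetical helper, not a library function). awk splits each line into whitespace-separated fields numbered from $1, and `~` succeeds if the regular expression matches anywhere in the field, which `re.search` reproduces for these patterns.

```python
import re


def awk_match(lines, conditions):
    """Yield the lines whose fields match every (field_number, pattern) pair.

    Field numbers are 1-based like awk's $1, $2, ...; a missing field is
    treated as the empty string, as in awk. re.search gives the unanchored
    matching used by awk's ~ operator.
    """
    compiled = [(n, re.compile(pattern)) for n, pattern in conditions]
    for line in lines:
        fields = line.split()  # awk's default splitting on runs of blanks
        if all(rx.search(fields[n - 1] if n <= len(fields) else "")
               for n, rx in compiled):
            yield line
```

To mirror the first awk command, iterate over `awk_match(open("Tm05M04codes.comb"), [(4, r"SS-A-.*"), (2, r"DF-A-.*")])` and print each line with `end=""`.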
awk link
Week 1