LargeDataSets

Revision as of 10:35, 19 July 2011 by MThomas

AHPCRC main page for 2011-2012

Home for the AHPCRC Research Project at CSU Stanislaus. (Temporary home?? Candy bar to the student who emails Dr Thomas the best name.)

Useful AHPCRC project software

Useful AHPCRC project data sets

DBMS Performance Testing Group

Useful Travel Information For Tapia Conference Trip

LARGE GraphViz examples

Government data analysis (demos) -- the RPI project's main wiki page also links to tutorials that may be useful, such as one on how to get your demo onto the Data.gov web page successfully.

Article on "big data" by Elizabeth Pisani

Visualizing data

Tom's clustering stuff

Hans Rosling links: YouTube, 200 countries 200 years; Gapminder main page; The Joy of Statistics

Student Final Reports, Useful Links

Hints on Technical Report Writing -- Advice on writing a good technical report, aimed at engineering/science students

Blank tech report document, courtesy of the University of Houston -- start your tech report by opening this document and replacing its text with your own. Preserve the given formatting.

ACM blank conference documents -- do not use these for writing your tech report; we include them only for comparison purposes.

Meeting 4/12/2011

We have 5 weeks left in the semester, including this week.

Weeks ending Friday:

4/15

4/22

4/29

5/6 Acceptable (English language) draft of student reports due

5/13 Final draft of student reports due

Meeting 1/26/2011

Cal-Atlas Geospatial Clearinghouse

Geodata.gov, U.S. Maps and Data

Meeting 1/25/2011

Maps of 2008 USA Presidential Election Results, discussed by Dr Carter

Map of Facebook Friendships, sent to mailing list by Dr Martin

Map of Social Networking Site Popularity, by National Geographic (click on the image to see the full map)

Meet in GIS lab with Dr Hauselt tomorrow

Meeting 12/16

Workshop, with GIS lab work, 1/25-1/26.

Tues, 1:30: Carter's group meets, plus a Carter/Thomas meeting with Tony and Carlos on network exploration/stress testing.

Ideas, student/project matching up, 12/7:

Carlos / Tony / Aaron -- load up a database and run tests over a network until the network can't handle the traffic, and/or explore network data.
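The "run tests until the network can't handle the traffic" idea can be sketched as a ramp. Run a command at doubling concurrency (1, 2, 4, ...) and report the first level at which any run fails. This is a minimal sketch, not the project's actual test harness. `ramp_load` is a hypothetical name, and the command you pass in (for example, a `mysql` client query against the remote database server) is an assumption you would fill in.

```shell
#!/bin/sh
# Hypothetical helper: ramp_load CMD MAX
# Runs CMD concurrently at 1, 2, 4, ... copies (up to MAX) and prints the
# first concurrency level at which any copy exits non-zero, or "none".
ramp_load() {
    cmd="$1"
    max="$2"
    n=1
    while [ "$n" -le "$max" ]; do
        pids=""
        i=0
        # Launch n copies of the command in the background.
        while [ "$i" -lt "$n" ]; do
            sh -c "$cmd" >/dev/null 2>&1 &
            pids="$pids $!"
            i=$((i + 1))
        done
        # Wait for every copy and note whether any of them failed.
        failed=0
        for pid in $pids; do
            wait "$pid" || failed=1
        done
        if [ "$failed" -ne 0 ]; then
            echo "$n"
            return 0
        fi
        n=$((n * 2))
    done
    echo "none"
}
```

Here is an example call, with a placeholder host and with credentials assumed to be stored in `~/.my.cnf` so the client does not prompt: `ramp_load "mysql -h dbhost -e 'SELECT 1'" 256`. A failure only tells you that *something* broke. You would still need to check whether the network, the database server, or the client machine was the bottleneck.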

Student | Project idea | Other students working on this | Faculty to help
Juan | Baseball statistics exploration, data mining | - | Martin
Manuel | Analyzing algorithms and networks | - | Dr. Carter
Max | IVF data, pattern discovery | - | Dr Martin
Cameron | Detecting flaming vs. non-flaming posts in newsgroup traffic using stats | - | Martin
Aaron | Analyzing algorithms and networks | - | Dr. Carter
Tony | Check network traffic and ways to overload it | - | Dr. Carter/Thomas/Martin (as needed)
Carlos | Network overloading / network exploration | - | Carter/Thomas
Roberto | Flaming posts topic sounds interesting | Jesus | Martin
Jesus | Census data exploration, flaming posts | - | Martin

Ideas, student/project matching up posts:

Students and professors: put ideas here and express interest in projects, ideas, or data sets.

Meeting on Tuesday, 11/30, Notes

11/30 projects!

12/7 another 3:30 meeting

1/21 Tapia poster abstracts and student scholarship applications due

1/25-26 more workshop and GIS (Dr Hauselt, GIS expert)

1/27 Start of classes for spring semester

3/8 Student Research Competition, CSU Stan, submissions due

3/19 Student Research Competition! (prizes for winners!)

4/3-5 Tapia Conference, San Francisco

Project ideas:

Dr Martin has various sizeable data sets. Plus there is always "crawl the web." Plus the Stanford SNAP data sets. Plus data.gov.

Meeting on Tuesday, 11/30

OK: Martin, Thomas, Cameron, Tony, Carlos, Juan, Jesus, Aaron, Max, Roberto, Manuel

Tapia Conference

Tapia Conference -- the list of Big Name speakers is up, if anyone is interested. Student poster deadline is Jan 21.

Student scholarships to attend Tapia -- some of our students should apply for these! Deadline is Jan 21.

CREW/DREU summer research experience (paid!) for summer 2011 -- for minority students; the student travels to a university for 10 weeks to do research. Dr Thomas has more information if you are interested in applying.

Computing Alliance of Hispanic-Serving Institutions - has internship information

Workshop Day 1, 8/16/2010

Workshop Day 2

http://www.crisp-dm.org/ CRISP-DM (Cross-Industry Standard Process for Data Mining), a useful knowledge discovery web site.

http://snap.stanford.edu/ SNAP Library: Stanford Network Analysis Platform

Data Mining in the news:

('The Economist' magazine articles are available in the Electronic Journals of the CSU Stanislaus library.)

"Data, data everywhere" The Economist. London: Feb 27, 2010. Vol. 394, Iss. 8671; pg. 3. "When the Sloan Digital Sky Survey started work in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. Now, a decade later, ..."

"Needle in a haystack" The Economist. London: Feb 27, 2010. Vol. 394, Iss. 8671; pg. 15. "As data become more abundant, the main problem is no longer finding the information as such but laying one's hands on the relevant bits easily and quickly. What is needed is information about information..."

"Know-alls" The Economist. Sept 27, 2008. Article about government data mining: looking for terrorists, threats to civil liberty...

"Surviving the exaflood" The Economist. Dec 6, 2008. Vol. 389, Iss. 8609; pg. 26.


Workshop Day 3

Slides Dr Thomas used: http://www.cs.csustan.edu/~mthomas/workshopDay3.pdf -- they include an added slide on how to grant a user all privileges on a particular database in MySQL. (The grant didn't work when Dr Thomas tried it during today's group exercise; he figured out why and added the slide.)
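For reference, here is a minimal sketch of the MySQL statements involved. A small shell function generates them so the names are easy to change. `grant_sql`, the database `ahpcrc_db`, the account `student`, and the password are hypothetical placeholders, not the names from the slide.

Two MySQL details are worth knowing:

- The target must be written `db.*` to cover every table in the database. A bare `db` names a *table* called db in the current default database.
- Since MySQL 8.0, `GRANT` no longer creates the account, so `CREATE USER` has to come first.

```shell
#!/bin/sh
# Hypothetical helper: grant_sql DB USER HOST PASSWORD
# Prints SQL that creates a MySQL account (if missing) and gives it all
# privileges on every table in one database.
grant_sql() {
    db="$1"      # database name, e.g. ahpcrc_db (placeholder)
    user="$2"    # account name, e.g. student (placeholder)
    host="$3"    # host the account connects from, e.g. localhost
    pass="$4"    # initial password
    # MySQL 8.0+ does not create accounts implicitly in GRANT.
    printf "CREATE USER IF NOT EXISTS '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
    # "db.*" = all tables in that database.
    printf "GRANT ALL PRIVILEGES ON \`%s\`.* TO '%s'@'%s';\n" "$db" "$user" "$host"
}
```

Usage: `grant_sql ahpcrc_db student localhost 'changeme' | mysql -u root -p`. In MySQL, `'student'@'localhost'` and `'student'@'%'` are different accounts. The host part must therefore match where the user actually connects from.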


Workshop Day 4


Workshop Day 5

WE NEED IDEAS!! Anything from a logo idea to a t-shirt slogan. The sooner the better! Thanks!

Fall Schedule

Dates | Carter | Martin | Thomas
8/30-9/26 | Carlos, Jesus, Manuel | Cameron, Tony, Aaron | Roberto, Juan, Maximino
9/27-10/24 | Cameron, Tony, Aaron | Roberto, Juan, Maximino | Carlos, Jesus, Manuel
10/25-11/21 | Roberto, Juan, Maximino | Carlos, Jesus, Manuel | Cameron, Tony, Aaron
11/29 on | Re-form groups for research projects

Netbook Specific Notes

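(8/18/10)
If your Eee PC boots to the GRUB rescue prompt after an OS update, try this (replace /dev/sda if your boot disk is a different device):
    sudo apt-get install lilo    # install the LILO boot loader
    sudo lilo -M /dev/sda mbr    # write a standard Master Boot Record to /dev/sda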

This should fix the Master Boot Record and allow you to choose Ubuntu or Windows again. (Fix found by Julie G.)
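The two commands above overwrite the disk's boot sector, so a mistyped device is costly. Below is a slightly more defensive sketch, using a hypothetical function named `fix_mbr` that has not been tested on the Eee PC. It checks that the target is a block device and that it is running as root before calling the same `lilo -M <disk> mbr` command.

```shell
#!/bin/sh
# Hypothetical helper: fix_mbr [DISK]
# Reinstalls a standard MBR with LILO, but refuses to run unless DISK is a
# block device and the caller is root. DISK defaults to /dev/sda, the device
# used in the note above; check yours with lsblk before running.
fix_mbr() {
    disk="${1:-/dev/sda}"
    if [ ! -b "$disk" ]; then
        echo "fix_mbr: $disk is not a block device" >&2
        return 1
    fi
    if [ "$(id -u)" -ne 0 ]; then
        echo "fix_mbr: must be run as root (try sudo)" >&2
        return 1
    fi
    # Install LILO only if it is not already present.
    command -v lilo >/dev/null 2>&1 || apt-get install -y lilo || return 1
    # Same command as in the note: write a standard MBR to the disk.
    lilo -M "$disk" mbr
}
```

To use it:

1. Save the function in a file.
2. Source that file from a root shell (`sudo -s`).
3. Run `fix_mbr /dev/sda`.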

(8/24/10)
I discovered an interesting potential problem. If you turn your wireless hardware off (Fn + F2) in Windows and then boot into Ubuntu, you can't turn it back on in Ubuntu; I could only turn it off and on in Windows. If I turned it off in Windows and booted into Ubuntu, I had to boot back into Windows, turn it back on, and then shut down before it would work in either Windows or Ubuntu. -- Julie G.
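When the wireless radio seems stuck off in Ubuntu, `rfkill list` shows how Linux sees it:

- **Soft blocked:** can be cleared with `sudo rfkill unblock wifi`.
- **Hard blocked:** set by a hardware switch or firmware, and software cannot clear it. A hard block would be consistent with the observation above that only Windows could turn the radio back on.

The small function below, a hypothetical helper named `wifi_block_state`, summarizes that output. It has not been tested on the Eee PC. Whether the Fn+F2 setting shows up as a hard or a soft block depends on the driver.

```shell
#!/bin/sh
# Hypothetical helper: reads `rfkill list` output on stdin and prints
# "hard", "soft", or "none" for the Wireless LAN device(s).
wifi_block_state() {
    awk '
        /Wireless LAN/ { in_wlan = 1; next }   # start of a Wi-Fi entry
        /^[0-9]+:/     { in_wlan = 0 }         # any other device entry
        in_wlan && /Hard blocked: yes/ { hard = 1 }
        in_wlan && /Soft blocked: yes/ { soft = 1 }
        END {
            if (hard) print "hard"
            else if (soft) print "soft"
            else print "none"
        }'
}
```

Usage: `rfkill list | wifi_block_state` prints `hard`, `soft`, or `none`.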