LargeDataSets WekaAnalysis 2012 13

From Wiki
Revision as of 09:34, 23 October 2012 by MThomas (Talk | contribs) (10/23/12)

Jump to: navigation, search

WPST data analysis group, Fall 2012

Record here: SQL queries, and the create tables statements used to create intermediate tables.


This query should create a table with only the data from students appear in wpst and also in the cyerss data.

** Create Table InfoWeHave SELECT DISTINCT w.* FROM wpst_data w WHERE IN (SELECT DISTINCT FROM cyerss c) 

This query should join all three tables into one table. Needs some fixes since it provides duplicates of some of the tuples more than one time. For example for student #20 you may notice that there is duplicates for the rows that contain full data and no null fields. This query should work it only outputs less than 90 records with null values, but they can be explain. e.g. id #20 took the test while not being enroll in school and also as a graduate which we do not have record for that.

** create table  WekaFiles SELECT distinct 
W.ID AS identification, c.*, f.year,, f.efc,  f.academic, f.low_income, w.esl, w.db, w.score, w.level, w.lang, w.fresh,
 w.day_taken, w.loc_taken, w.year_taken,   w.spec  
FROM infowehave w 
LEFT JOIN fin_aid f ON w.ID = f.ID AND w.year_taken = f.year
LEFT JOIN CYERSS c ON c.ID=w.ID   and c.schoolyear = w.year_taken 

Note: ** This query statements have the statement to create a table and store the values in the table.


To do: collapse fin_aid table so one row per student (either first for each student, or last for each student) so that can use 'left join' in create wekafiles on just ID, without 'and year=year', and won't lose as many students from resulting table.

update "Spring II 20xx" terms in data to be simply "Spring 20xx". Fix "W/ Spring" semesters, too. (GIGO! GIGO!)

Drop graduate students and students who never enrolled at CSUS from data set.