# LargeDataSets Analysis IR results 2012

Please tag the results and statistics you post on this page, so we know who generated which results.

## WPST Test Results, statistics

Kristi: I created two tables with the new data, one for scores and one for sex.

SELECT avg(score)

FROM scores

NATURAL JOIN sex as s

WHERE s.sex_code = 'f';

Result: 7.5375

SELECT avg(score)

FROM scores

NATURAL JOIN sex as s

WHERE s.sex_code = 'm';

Result: 7.5251
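The two queries above could also be run as one grouped query. The following is only a minimal sketch using Python's built-in `sqlite3` module: the shared `id` column, the exact table layouts, and the sample rows are assumptions made for illustration (only `scores`, `sex`, `score`, and `sex_code` appear in the queries above), and the numbers are not the real WPST data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL);
        CREATE TABLE sex (id INTEGER PRIMARY KEY, sex_code TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [(1, 8.0), (2, 7.0), (3, 6.0), (4, 8.0)],
    )
    conn.executemany(
        "INSERT INTO sex VALUES (?, ?)",
        [(1, "f"), (2, "f"), (3, "m"), (4, "m")],
    )
    return conn


def average_score_by_sex(conn):
    """Return {sex_code: average score} from a single grouped query."""
    rows = conn.execute(
        """
        SELECT s.sex_code, AVG(score)
        FROM scores
        NATURAL JOIN sex AS s   -- joins on the shared column (assumed to be id)
        GROUP BY s.sex_code
        """
    ).fetchall()
    return dict(rows)
```

Calling `average_score_by_sex(build_demo_db())` returns one average per sex code, so both groups come from the same pass over the data.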

I am working on entering the new data into the database.

I cleaned up the WPST data. I deleted about 5 records that were duplicates (same ID and date taken), about 5 for students who passed at another university (the data didn't match), and one whose data was not comprehensible.
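The duplicate check described above (same ID and same date taken) can be sketched as follows. This is an illustration rather than the procedure that was actually used; the field names `student_id` and `date_taken` are hypothetical.

```python
def drop_duplicate_attempts(records):
    """Keep the first record for each (student_id, date_taken) pair.

    `records` is a list of dicts; the key names are assumed for this sketch.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["student_id"], rec["date_taken"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Records from the same student on different dates are kept, so repeat test takers are not removed by this step.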

Here are some results (not accounting for students who took the test more than once):

Average score:

(ESL = yes) 6.9555

(ESL = no) 7.9205

(test location = Turlock) 7.5204

(test location = Stockton) 7.1261

(current grade level = senior) 7.3477

(current grade level = junior) 7.6041

(current grade level = soph) 7.6178

(current grade level = frsh) 7.6327

(fresh_comp = CSU Stanislaus) 7.6452

(fresh_comp = Modesto Junior College) 7.6083

(fresh_comp = Delta College) 7.2529

(fresh_comp = Merced College) 7.1291

(fresh_comp = Columbia College) 8.0870

(fresh_comp = AP) 8.6389

Fresh_comp = CSU Stanislaus: (count)

Total = 775

score < 7 = 115

14.8% fail

score > 6 = 660

85.2% pass

ESL = yes and score < 7 = 94

81.7% of students failing are ESL

ESL = no and score < 7 = 21

18.3% of students failing are not ESL

Fresh_comp = Modesto Junior College: (count)

Total = 457

score < 7 = 67

14.7% fail

score > 6 = 390

85.3% pass

ESL = yes and score < 7 = 47

70.1% of students failing are ESL

ESL = no and score < 7 = 20

29.9% of students failing are not ESL

Fresh_comp = Delta College: (count)

Total = 261

score < 7 = 66

25.3% fail

score > 6 = 195

74.7% pass

ESL = yes and score < 7 = 49

74.2% of students failing are ESL

ESL = no and score < 7 = 17

25.8% of students failing are not ESL

Fresh_comp = Merced College: (count)

Total = 302

score < 7 = 93

30.8% fail

score > 6 = 209

69.2% pass

ESL = yes and score < 7 = 73

78.5% of students failing are ESL

ESL = no and score < 7 = 20

21.5% of students failing are not ESL

Fresh_comp = Columbia College: (count)

Total = 23

score < 7 = 0

score > 6 = 23

100% pass

1 ESL student
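The percentages in the breakdowns above follow from the counts: a score below 7 counts as a fail, and the ESL share is taken out of the failing students only. A minimal sketch of that arithmetic, with hypothetical function and key names:

```python
def failure_breakdown(total, failed, failed_esl):
    """Return pass/fail rates and the ESL split among failing students.

    All values are percentages rounded to one decimal place.
    `failed` is the count with score < 7; `failed_esl` is the ESL subset of it.
    """
    passed = total - failed
    breakdown = {
        "fail_pct": round(100 * failed / total, 1),
        "pass_pct": round(100 * passed / total, 1),
    }
    if failed:  # avoid dividing by zero when nobody failed
        breakdown["esl_share_pct"] = round(100 * failed_esl / failed, 1)
        breakdown["non_esl_share_pct"] = round(
            100 * (failed - failed_esl) / failed, 1
        )
    return breakdown
```

For example, `failure_breakdown(775, 115, 94)` uses the CSU Stanislaus counts listed above.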

## NSSE Test Results, statistics

**Cassie**

**Test 1:**

Attributes: Grades and Commute<2 (Commute<2 means Yes if a student commutes less than 5 hrs a week, No if greater than 5 hrs a week).

Test: Naive Bayes Predict Commute based on grades.

**Results:**

Correctly Classified Instances: 731 or 90.0246 %

Incorrectly Classified Instances: 81 or 9.9754 %

Kappa statistic: 0.4972

Mean absolute error: 0.1635

Root mean squared error: 0.2902

Relative absolute error: 74.78 %

Root relative squared error: 87.9226 %

Total Number of Instances: 812
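For readers unfamiliar with the classifier, here is a simplified sketch of Naive Bayes with a single nominal attribute, as in this test. It is not the implementation that produced the numbers above; the add-one (Laplace) smoothing and all names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(pairs):
    """Train on (attribute_value, class_label) pairs and return a predict function."""
    pairs = list(pairs)
    total = len(pairs)
    class_counts = Counter(label for _, label in pairs)
    value_counts = defaultdict(Counter)  # value_counts[label][value] = count
    for value, label in pairs:
        value_counts[label][value] += 1
    n_values = len({value for value, _ in pairs})

    def predict(value):
        def score(label):
            prior = class_counts[label] / total
            # Add-one smoothing so unseen value/class pairs do not score zero.
            likelihood = (value_counts[label][value] + 1) / (
                class_counts[label] + n_values
            )
            return prior * likelihood

        return max(class_counts, key=score)

    return predict
```

With only one attribute, the prediction for each attribute value is fixed, so the model can do no better than picking the most probable class per value.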

-------------------------------------------------------------------------------------------

**Test 2**

Attributes: Commute and Grades<2 (Grades<2 Yes for students with grades C- and lower)

Test: Naive Bayes predict student grades based on commute times.

**Results:**

Correctly Classified Instances: 729 or 89.7783 %

Incorrectly Classified Instances: 83 or 10.2217 %

Kappa statistic: -0.0072

Mean absolute error: 0.1309

Root mean squared error: 0.2438

Relative absolute error: 73.3311 %

Root relative squared error: 81.8162 %

Total Number of Instances: 812
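The `Grades<k` attributes used from Test 2 onward turn the grade into a Yes/No class by thresholding a numeric grade code. The sketch below shows one way to derive them; the code table is inferred from the attribute descriptions on this page (1 = C- or lower up to 8 = A) and should be treated as an assumption, as should the function name.

```python
# Grade codes inferred from the Grades<k descriptions (an assumption).
GRADE_CODES = {
    "C- or lower": 1,
    "C": 2,
    "C+": 3,
    "B-": 4,
    "B": 5,
    "B+": 6,
    "A-": 7,
    "A": 8,
}


def grades_below(grade, threshold):
    """Return "Yes" if the grade's code is below `threshold`, else "No"."""
    return "Yes" if GRADE_CODES[grade] < threshold else "No"
```

For instance, `grades_below("C", 3)` gives the `Grades<3` value for a student with a C.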

-------------------------------------------------------------------------------------------

**Test 3**

Attributes: Commute and Grades<3 (Grades<3 Yes for students with grades C and lower)

Test: Naive Bayes predict student grades based on commute times.

**Results:**

Correctly Classified Instances: 708 or 87.1921 %

Incorrectly Classified Instances: 104 or 12.8079 %

Kappa statistic: -0.0096

Mean absolute error: 0.1747

Root mean squared error: 0.2878

Relative absolute error: 80.6035 %

Root relative squared error: 87.5789 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 4**

Attributes: Commute and Grades<4 (Grades<4 Yes for students with grades C+ and lower)

Test: Naive Bayes predict student grades based on commute times.

**Results:**

Correctly Classified Instances: 713 or 87.8079 %

Incorrectly Classified Instances: 99 or 12.1921 %

Kappa statistic: 0.4459

Mean absolute error: 0.2456

Root mean squared error: 0.347

Relative absolute error: 85.8924 %

Root relative squared error: 91.8604 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 5**

Attributes: Commute and Grades<5 (Grades<5 Yes for students with grades B- and lower)

Test: Naive Bayes predict student grades based on commute times.

**Results:**

Correctly Classified Instances: 643 or 79.1872 %

Incorrectly Classified Instances: 169 or 20.8128 %

Kappa statistic: 0.3138

Mean absolute error: 0.3469

Root mean squared error: 0.418

Relative absolute error: 89.5639 %

Root relative squared error: 95.014 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 6**

Attributes: Commute and Grades<6 (Grades<6 Yes for students with grades B and lower)

Test: Naive Bayes predict student grades based on commute times.

**Results:**

Correctly Classified Instances: 476 or 58.6207 %

Incorrectly Classified Instances: 336 or 41.3793 %

Kappa statistic: 0.1437

Mean absolute error: 0.4625

Root mean squared error: 0.4849

Relative absolute error: 92.8499 %

Root relative squared error: 97.1543 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 7**

Attributes: Commute and Grades<7 (Grades<7 Yes for students with grades B+ and lower)

Test: Naive Bayes predict student grades based on commute times.

**Results:**

Correctly Classified Instances: 517 or 63.67 %

Incorrectly Classified Instances: 295 or 36.33 %

Kappa statistic: 0

Mean absolute error: 0.4466

Root mean squared error: 0.4723

Relative absolute error: 96.5173 %

Root relative squared error: 98.2064 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 8**

Attributes: Commute and Grades<8 (Grades<8 Yes for students with grades A- and lower)

Test: Naive Bayes predict student grades based on commute times.

**Results:**

Correctly Classified Instances: 644 or 79.3103 %

Incorrectly Classified Instances: 168 or 20.6897 %

Kappa statistic: 0

Mean absolute error: 0.328

Root mean squared error: 0.4019

Relative absolute error: 99.8159 %

Root relative squared error: 99.2075 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 9**

Attributes: Caredep (Students who have dependents) and Grades<2 (Grades<2 Yes for students with grades C- and lower)

Test: Naive Bayes predict student grades based on number of dependents a student cares for.

**Results:**

Correctly Classified Instances: 732 or 90.1478 %

Incorrectly Classified Instances: 80 or 9.8522 %

Kappa statistic: 0

Mean absolute error: 0.1695

Root mean squared error: 0.2914

Relative absolute error: 94.9573 %

Root relative squared error: 97.7692 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 10**

Attributes: Caredep (Students who have dependents) and Grades<3 (Grades<3 Yes for students with grades C and lower)

Test: Naive Bayes predict student grades based on number of dependents a student cares for.

**Results:**

Correctly Classified Instances: 712 or 87.6847 %

Incorrectly Classified Instances: 100 or 12.3153 %

Kappa statistic: 0

Mean absolute error: 0.2094

Root mean squared error: 0.323

Relative absolute error: 96.5972 %

Root relative squared error: 98.2886 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 11**

Attributes: Caredep (Students who have dependents) and Grades<4 (Grades<4 Yes for students with grades C+ and lower)

Test: Naive Bayes predict student grades based on number of dependents a student cares for.

**Results:**

Correctly Classified Instances: 672 or 82.7586 %

Incorrectly Classified Instances: 140 or 17.2414 %

Kappa statistic: 0

Mean absolute error: 0.28

Root mean squared error: 0.3735

Relative absolute error: 97.9319 %

Root relative squared error: 98.8887 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 12**

Attributes: Caredep (Students who have dependents) and Grades<5 (Grades<5 Yes for students with grades B- and lower)

Test: Naive Bayes predict student grades based on number of dependents a student cares for.

**Results:**

Correctly Classified Instances: 599 or 73.7685 %

Incorrectly Classified Instances: 213 or 26.2315 %

Kappa statistic: 0

Mean absolute error: 0.385

Root mean squared error: 0.4381

Relative absolute error: 99.3872 %

Root relative squared error: 99.6021 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 13**

Attributes: Caredep (Students who have dependents) and Grades<6 (Grades<6 Yes for students with grades B and lower)

Test: Naive Bayes predict student grades based on number of dependents a student cares for.

**Results:**

Correctly Classified Instances: 412 or 50.7389 %

Incorrectly Classified Instances: 400 or 49.2611 %

Kappa statistic: -0.0396

Mean absolute error: 0.5012

Root mean squared error: 0.5032

Relative absolute error: 100.6108 %

Root relative squared error: 100.8407 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 14**

Attributes: Caredep (Students who have dependents) and Grades<7 (Grades<7 Yes for students with grades B+ and lower)

Test: Naive Bayes predict student grades based on number of dependents a student cares for.

**Results:**

Correctly Classified Instances: 517 or 63.67 %

Incorrectly Classified Instances: 295 or 36.33 %

Kappa statistic: 0

Mean absolute error: 0.462

Root mean squared error: 0.4819

Relative absolute error: 99.843 %

Root relative squared error: 100.1859 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 15**

Attributes: Caredep (Students who have dependents) and Grades<8 (Grades<8 Yes for students with grades A- and lower)

Test: Naive Bayes predict student grades based on number of dependents a student cares for.

**Results:**

Correctly Classified Instances: 644 or 79.3103 %

Incorrectly Classified Instances: 168 or 20.6897 %

Kappa statistic: 0

Mean absolute error: 0.3297

Root mean squared error: 0.4067

Relative absolute error: 100.3099 %

Root relative squared error: 100.408 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 16**

Attributes: Acadpr01 (Time students spend preparing for class) and Grades<2 (Grades<2 Yes for students with grades C- and lower)

Test: Naive Bayes predict student grades based on the time students prepare for class.

**Results:**

Correctly Classified Instances: 776 or 95.5665 %

Incorrectly Classified Instances: 36 or 4.4335 %

Kappa statistic: 0.6963

Mean absolute error: 0.126

Root mean squared error: 0.2342

Relative absolute error: 70.5637 %

Root relative squared error: 78.5953 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 17**

Attributes: Acadpr01 (Time students spend preparing for class) and Grades<3 (Grades<3 Yes for students with grades C and lower)

Test: Naive Bayes predict student grades based on the time students prepare for class.

**Results:**

Correctly Classified Instances: 712 or 87.6847 %

Incorrectly Classified Instances: 100 or 12.3153 %

Kappa statistic: 0

Mean absolute error: 0.192

Root mean squared error: 0.3096

Relative absolute error: 88.5942 %

Root relative squared error: 94.2095 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 18**

Attributes: Acadpr01 (Time students spend preparing for class) and Grades<4 (Grades<4 Yes for students with grades C+ and lower)

Test: Naive Bayes predict student grades based on the time students prepare for class.

**Results:**

Correctly Classified Instances: 718 or 88.4236 %

Incorrectly Classified Instances: 94 or 11.5764 %

Kappa statistic: 0.4517

Mean absolute error: 0.235

Root mean squared error: 0.3335

Relative absolute error: 82.1648 %

Root relative squared error: 88.2901 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 19**

Attributes: Acadpr01 (Time students spend preparing for class) and Grades<5 (Grades<5 Yes for students with grades B- and lower)

Test: Naive Bayes predict student grades based on the time students prepare for class.

**Results:**

Correctly Classified Instances: 646 or 79.5567 %

Incorrectly Classified Instances: 166 or 20.4433 %

Kappa statistic: 0.2975

Mean absolute error: 0.3476

Root mean squared error: 0.4126

Relative absolute error: 89.7546 %

Root relative squared error: 93.7989 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 20**

Attributes: Acadpr01 (Time students spend preparing for class) and Grades<6 (Grades<6 Yes for students with grades B and lower)

Test: Naive Bayes predict student grades based on the time students prepare for class.

**Results:**

Correctly Classified Instances: 459 or 56.5271 %

Incorrectly Classified Instances: 353 or 43.4729 %

Kappa statistic: 0.1036

Mean absolute error: 0.4769

Root mean squared error: 0.4874

Relative absolute error: 95.7463 %

Root relative squared error: 97.6678 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 21**

Attributes: Acadpr01 (Time students spend preparing for class) and Grades<7 (Grades<7 Yes for students with grades B+ and lower)

Test: Naive Bayes predict student grades based on the time students prepare for class.

**Results:**

Correctly Classified Instances: 517 or 63.67 %

Incorrectly Classified Instances: 295 or 36.33 %

Kappa statistic: 0

Mean absolute error: 0.4515

Root mean squared error: 0.4749

Relative absolute error: 97.5709 %

Root relative squared error: 98.7316 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

**Test 22**

Attributes: Acadpr01 (Time students spend preparing for class) and Grades<8 (Grades<8 Yes for students with grades A- and lower)

Test: Naive Bayes predict student grades based on the time students prepare for class.

**Results:**

Correctly Classified Instances: 644 or 79.3103 %

Incorrectly Classified Instances: 168 or 20.6897 %

Kappa statistic: 0

Mean absolute error: 0.3211

Root mean squared error: 0.4009

Relative absolute error: 97.6982 %

Root relative squared error: 98.9554 %

Total Number of Instances: 812

-------------------------------------------------------------------------------------------

-NOTES- I ran tests on three attributes: commuting time, the number of dependents, and the amount of time students take to prep for classes. Each was compared against student grades. Some results look okay, but most are poor: the relative error rates are high, and many tests have a kappa statistic at or near 0.
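Several tests report a kappa statistic of 0 even though 63 % to 90 % of instances are classified correctly. One possible explanation is that the classifier predicts the majority class for every instance. The confusion matrices were not recorded on this page, so the matrix used below is hypothetical. The sketch computes Cohen's kappa from a confusion matrix to show how that outcome arises.

```python
def cohen_kappa(confusion):
    """Compute Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of instances of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    n = len(confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of actual and predicted class proportions.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(n)
    )
    if expected == 1:
        return 0.0  # degenerate case: only one class present and predicted
    return (observed - expected) / (1 - expected)


# Hypothetical matrix: every one of 812 instances predicted as the majority class.
majority_only = [[644, 0], [168, 0]]
print(cohen_kappa(majority_only))
```

In that hypothetical case the accuracy is 644/812, yet the agreement is exactly what chance would give, so kappa is 0. This suggests accuracy alone overstates how well such a model separates the classes.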