Initial Experiments
Naive Bayes (simple)
Class >50K: P(C) = 0.24893913
Attribute age
Mean: 43.95911028 Standard Deviation: 10.26963284
Attribute workclass
Private 0.64888238
Self-emp-not-inc 0.09513039
Self-emp-inc 0.07996275
Federal-gov 0.04869611
Local-gov 0.08116019
State-gov 0.04590208
Without-pay 0.00013305
Never-worked 0.00013305
Attribute fnlwgt
Mean: 188149.96217368 Standard Deviation: 102821.73711374
Attribute education
Bachelors 0.28269537
Some-college 0.17769803
11th 0.00797448
HS-grad 0.21504519
Prof-school 0.05409357
Assoc-acdm 0.03415736
Assoc-voc 0.04585327
9th 0.00345561
7th-8th 0.00478469
12th 0.00398724
Masters 0.12214248
1st-4th 0.00093036
10th 0.00797448
Doctorate 0.03734716
5th-6th 0.0017278
Preschool 0.00013291
Attribute education-num
Mean: 11.60641982 Standard Deviation: 2.36842295
Attribute marital-status
Married-civ-spouse 0.85163007
Divorced 0.06027944
Never-married 0.06267465
Separated 0.0089155
Widowed 0.01077844
Married-spouse-absent 0.00425815
Married-AF-spouse 0.00146374
Attribute occupation
Tech-support 0.0370912
Craft-repair 0.12084552
Other-service 0.01768147
Sales 0.12908801
Exec-managerial 0.25764424
Prof-specialty 0.24089338
Handlers-cleaners 0.01116724
Machine-op-inspct 0.03270407
Adm-clerical 0.06633874
Farming-fishing 0.01542143
Transport-moving 0.04254188
Priv-house-serv 0.00026589
Protective-serv 0.02805105
Armed-Forces 0.00026589
Attribute relationship
Wife 0.09249401
Own-child 0.00865052
Husband 0.75592228
Not-in-family 0.10966196
Other-relative 0.00479106
Unmarried 0.02848017
Attribute race
White 0.91042194
Asian-Pac-Islander 0.03314255
Amer-Indian-Eskimo 0.00465859
Other 0.00292826
Black 0.04884866
Attribute sex
Female 0.1482024
Male 0.8517976
Attribute capital-gain
Mean: 3937.6798082 Standard Deviation: 14386.06001898
Attribute capital-loss
Mean: 193.75066596 Standard Deviation: 592.82558964
Attribute hours-per-week
Mean: 45.70657965 Standard Deviation: 10.73698663
Attribute native-country
United-States 0.92674526
Cambodia 0.00105974
England 0.0041065
Puerto-Rico 0.00172208
Canada 0.00490131
Germany 0.00596105
Outlying-US(Guam-USVI-etc) 0.00013247
India 0.00543118
Japan 0.00317923
Greece 0.00119221
South 0.00198702
China 0.00278183
Cuba 0.00344416
Iran 0.00251689
Honduras 0.00026494
Philippines 0.00808054
Italy 0.0033117
Poland 0.00158961
Jamaica 0.00145715
Vietnam 0.00079481
Mexico 0.00450391
Portugal 0.00066234
Ireland 0.00079481
France 0.00172208
Dominican-Republic 0.0003974
Laos 0.0003974
Ecuador 0.00066234
Taiwan 0.00264936
Haiti 0.00066234
Columbia 0.0003974
Hungary 0.00052987
Guatemala 0.00052987
Nicaragua 0.0003974
Scotland 0.0003974
Thailand 0.00052987
Yugoslavia 0.00092728
El-Salvador 0.00132468
Trinadad&Tobago 0.0003974
Peru 0.0003974
Hong 0.00092728
Holand-Netherlands 0.00013247
Class <=50K: P(C) = 0.75106087
Attribute age
Mean: 36.60806039 Standard Deviation: 13.46463126
Attribute workclass
Private 0.76829053
Self-emp-not-inc 0.07881034
Self-emp-inc 0.0209602
Federal-gov 0.02554938
Local-gov 0.0643809
State-gov 0.04130262
Without-pay 0.0006619
Never-worked 0.00004413
Attribute fnlwgt
Mean: 190338.64672905 Standard Deviation: 106571.34300494
Attribute education
Bachelors 0.12876048
Some-college 0.23568593
11th 0.04367005
HS-grad 0.36277018
Prof-school 0.00604323
Assoc-acdm 0.0332157
Assoc-voc 0.04252316
9th 0.01901191
7th-8th 0.02307014
12th 0.01539479
Masters 0.03131892
1st-4th 0.00644023
10th 0.0336127
Doctorate 0.00423467
5th-6th 0.01221879
Preschool 0.00202911
Attribute education-num
Mean: 9.62911627 Standard Deviation: 2.41359613
Attribute marital-status
Married-civ-spouse 0.33833458
Divorced 0.16605622
Never-married 0.40849918
Separated 0.03856847
Widowed 0.03300825
Married-spouse-absent 0.01500375
Married-AF-spouse 0.00052954
Attribute occupation
Tech-support 0.02801306
Craft-repair 0.13777131
Other-service 0.13591848
Sales 0.11536086
Exec-managerial 0.09070055
Prof-specialty 0.09828834
Handlers-cleaners 0.05593789
Machine-op-inspct 0.07596612
Adm-clerical 0.14222693
Farming-fishing 0.03860067
Transport-moving 0.05532028
Priv-house-serv 0.00630845
Protective-serv 0.01919005
Armed-Forces 0.00039704
Attribute relationship
Wife 0.03146514
Own-child 0.19430715
Husband 0.2994263
Not-in-family 0.30467785
Other-relative 0.03773169
Unmarried 0.13239188
Attribute race
White 0.84271151
Asian-Pac-Islander 0.02859791
Amer-Indian-Eskimo 0.01116554
Other 0.00931197
Black 0.10821307
Attribute sex
Female 0.38272422
Male 0.61727578
Attribute capital-gain
Mean: 148.89383773 Standard Deviation: 936.39227955
Attribute capital-loss
Mean: 53.44800035 Standard Deviation: 310.27026297
Attribute hours-per-week
Mean: 39.34859186 Standard Deviation: 11.95077414
Attribute native-country
United-States 0.90372329
Cambodia 0.00052875
England 0.00251157
Puerto-Rico 0.00431813
Canada 0.0031725
Germany 0.00374532
Outlying-US(Guam-USVI-etc) 0.00066094
India 0.00268782
Japan 0.00163032
Greece 0.00096938
South 0.00255563
China 0.00215907
Cuba 0.00299625
Iran 0.00110156
Honduras 0.00052875
Philippines 0.00568407
Italy 0.00198282
Poland 0.00202688
Jamaica 0.00312844
Vietnam 0.00264375
Mexico 0.02546816
Portugal 0.00136594
Ireland 0.00088125
France 0.000705
Dominican-Republic 0.00290813
Laos 0.000705
Ecuador 0.0010575
Taiwan 0.0010575
Haiti 0.00171844
Columbia 0.00242344
Hungary 0.00048469
Guatemala 0.00268782
Nicaragua 0.00141
Scotland 0.00044063
Thailand 0.00066094
Yugoslavia 0.00048469
El-Salvador 0.00405376
Trinadad&Tobago 0.00074906
Peru 0.00127781
Hong 0.00061688
Holand-Netherlands 0.00008813
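The parameters listed above are enough to score an instance by hand. The following is a minimal sketch, not Weka's own code: it assumes numeric attributes are modelled with a normal density (as the reported means and standard deviations suggest), that nominal attributes use the listed probabilities directly, and it copies only a handful of attributes from the listing for brevity. The names `MODEL`, `gaussian_pdf` and `posterior` are hypothetical.

```python
import math

# Parameters copied from the listing above. Only a subset of the attributes
# (and of the nominal values) is included to keep the sketch short.
MODEL = {
    ">50K": {
        "prior": 0.24893913,
        "numeric": {  # attribute -> (mean, standard deviation)
            "age": (43.95911028, 10.26963284),
            "education-num": (11.60641982, 2.36842295),
            "hours-per-week": (45.70657965, 10.73698663),
        },
        "nominal": {  # attribute -> {value: P(value | class)}
            "sex": {"Female": 0.1482024, "Male": 0.8517976},
            "marital-status": {"Married-civ-spouse": 0.85163007,
                               "Never-married": 0.06267465},
        },
    },
    "<=50K": {
        "prior": 0.75106087,
        "numeric": {
            "age": (36.60806039, 13.46463126),
            "education-num": (9.62911627, 2.41359613),
            "hours-per-week": (39.34859186, 11.95077414),
        },
        "nominal": {
            "sex": {"Female": 0.38272422, "Male": 0.61727578},
            "marital-status": {"Married-civ-spouse": 0.33833458,
                               "Never-married": 0.40849918},
        },
    },
}


def gaussian_pdf(x, mean, std):
    """Normal density, used here for the numeric attributes."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(instance):
    """Return P(class | instance) under the naive independence assumption."""
    log_scores = {}
    for label, params in MODEL.items():
        score = math.log(params["prior"])
        for attr, (mean, std) in params["numeric"].items():
            score += math.log(gaussian_pdf(instance[attr], mean, std))
        for attr, table in params["nominal"].items():
            score += math.log(table[instance[attr]])
        log_scores[label] = score
    # Normalise in log space to avoid underflow.
    top = max(log_scores.values())
    unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


example = {"age": 50, "education-num": 13, "hours-per-week": 50,
           "sex": "Male", "marital-status": "Married-civ-spouse"}
print(posterior(example))
```

Because only five attributes are used, the probabilities printed here will differ from what the full model would assign to the same person.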
=== Error on training data ===
Correctly Classified Instances 24956 82.7399 %
Incorrectly Classified Instances 5206 17.2601 %
Mean absolute error 0.1795
Root mean squared error 0.38
Relative absolute error 48.0074 %
Root relative squared error 87.8843 %
Total Number of Instances 30162
=== Confusion Matrix ===
    a     b   <-- classified as
 3854  3654 |   a = >50K
 1552 21102 |   b = <=50K
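The training summary can be cross-checked against the confusion matrix. The sketch below recomputes the instance count, the number of correct predictions and the accuracy from the four cells. It also recomputes the class prior for >50K under the assumption of add-one (Laplace) smoothing over the two classes. That smoothing is an inference, not something the output states, but it agrees with the reported 0.24893913 to the digits printed.

```python
# Training confusion matrix from the output above.
# Rows are actual classes, columns are predicted classes, order (>50K, <=50K).
train = [[3854, 3654],
         [1552, 21102]]

total = sum(sum(row) for row in train)   # all training instances
correct = train[0][0] + train[1][1]      # diagonal = correct predictions
accuracy = correct / total

n_high = sum(train[0])                   # actual >50K instances
# Assumption: add-one smoothing over 2 classes, i.e. (count + 1) / (N + 2).
prior_high = (n_high + 1) / (total + 2)

print("instances:", total)
print("correct:  ", correct)
print("accuracy: ", round(100 * accuracy, 4), "%")
print("P(>50K):  ", round(prior_high, 8))
```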
=== Error on test data ===
Correctly Classified Instances 12411 82.4104 %
Incorrectly Classified Instances 2649 17.5896 %
Mean absolute error 0.1818
Root mean squared error 0.3827
Relative absolute error 48.8345 %
Root relative squared error 88.8846 %
Total Number of Instances 15060
=== Confusion Matrix ===
    a     b   <-- classified as
 1867  1833 |   a = >50K
  816 10544 |   b = <=50K
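Overall accuracy hides how differently the two classes are treated. As a rough sketch, the test confusion matrix can be turned into per-class precision and recall, and compared with a baseline that always predicts the majority class <=50K. The helper name `precision_recall` is hypothetical, and these derived figures are not part of the original output.

```python
# Test confusion matrix from the output above.
# Rows are actual classes, columns are predicted classes, order (>50K, <=50K).
test = [[1867, 1833],
        [816, 10544]]


def precision_recall(matrix, k):
    """Precision and recall for class index k of a square confusion matrix."""
    true_pos = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column sum
    actual_k = sum(matrix[k])                    # row sum
    return true_pos / predicted_k, true_pos / actual_k


total = sum(sum(row) for row in test)
accuracy = (test[0][0] + test[1][1]) / total

# Baseline: always predict the larger actual class.
baseline = max(sum(row) for row in test) / total

prec_high, rec_high = precision_recall(test, 0)
prec_low, rec_low = precision_recall(test, 1)

print("accuracy:", round(accuracy, 4), "baseline:", round(baseline, 4))
print(">50K   precision:", round(prec_high, 4), "recall:", round(rec_high, 4))
print("<=50K  precision:", round(prec_low, 4), "recall:", round(rec_low, 4))
```

On these counts the classifier recovers only about half of the actual >50K instances, which the 82.4 % headline figure does not show on its own.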