Initial Experiments
Naive Bayes (simple)
Class >50K: P(C) = 0.24082548
Attribute age
Mean: 44.24984058 Standard Deviation: 10.51902772
Attribute workclass
Private 0.64821102
Self-emp-not-inc 0.09467224
Self-emp-inc 0.08135283
Federal-gov 0.04857665
Local-gov 0.08069992
State-gov 0.04622617
Without-pay 0.00013058
Never-worked 0.00013058
Attribute fnlwgt
Mean: 188005 Standard Deviation: 102541.77547231
Attribute education
Bachelors 0.28280514
Some-college 0.17665776
11th 0.00776378
HS-grad 0.21331297
Prof-school 0.05396462
Assoc-acdm 0.03385516
Assoc-voc 0.04607356
9th 0.0035637
7th-8th 0.00521828
12th 0.00432735
Masters 0.12218404
1st-4th 0.00089093
10th 0.00801833
Doctorate 0.03907344
5th-6th 0.00216368
Preschool 0.00012728
Attribute education-num
Mean: 11.61165668 Standard Deviation: 2.38512863
Attribute marital-status
Married-civ-spouse 0.85282875
Divorced 0.05912334
Never-married 0.06269113
Separated 0.00853721
Widowed 0.01095821
Married-spouse-absent 0.00445973
Married-AF-spouse 0.00140163
Attribute occupation
Tech-support 0.03705637
Craft-repair 0.12134656
Other-service 0.01800626
Sales 0.12839248
Exec-managerial 0.25691545
Prof-specialty 0.24269311
Handlers-cleaners 0.01135177
Machine-op-inspct 0.03275052
Adm-clerical 0.06628392
Farming-fishing 0.0151357
Transport-moving 0.04188413
Priv-house-serv 0.00026096
Protective-serv 0.0276618
Armed-Forces 0.00026096
Attribute relationship
Wife 0.09506818
Own-child 0.00866573
Husband 0.75430101
Not-in-family 0.10921371
Other-relative 0.00484262
Unmarried 0.02790875
Attribute race
White 0.90721387
Asian-Pac-Islander 0.03530461
Amer-Indian-Eskimo 0.00471578
Other 0.00331379
Black 0.04945195
Attribute sex
Female 0.15045263
Male 0.84954737
Attribute capital-gain
Mean: 4006.14245632 Standard Deviation: 14570.37895128
Attribute capital-loss
Mean: 195.00153042 Standard Deviation: 595.48757398
Attribute hours-per-week
Mean: 45.4730264 Standard Deviation: 11.01297093
Attribute native-country
United-States 0.92709411
Cambodia 0.00103413
England 0.00400724
Puerto-Rico 0.00168046
Canada 0.00517063
Germany 0.00581696
Outlying-US(Guam-USVI-etc) 0.00012927
India 0.0052999
Japan 0.00323164
Greece 0.00116339
South 0.00219752
China 0.00271458
Cuba 0.00336091
Iran 0.00245605
Honduras 0.00025853
Philippines 0.00801448
Italy 0.00336091
Poland 0.00168046
Jamaica 0.00142192
Vietnam 0.00077559
Mexico 0.00439504
Portugal 0.00064633
Ireland 0.00077559
France 0.00168046
Dominican-Republic 0.0003878
Laos 0.0003878
Ecuador 0.00064633
Taiwan 0.00271458
Haiti 0.00064633
Columbia 0.0003878
Hungary 0.00051706
Guatemala 0.00051706
Nicaragua 0.0003878
Scotland 0.00051706
Thailand 0.00051706
Yugoslavia 0.00090486
El-Salvador 0.00129266
Trinadad&Tobago 0.0003878
Peru 0.0003878
Hong 0.00090486
Holand-Netherlands 0.00012927
Class <=50K: P(C) = 0.75917452
Attribute age
Mean: 36.78373786 Standard Deviation: 14.02008849
Attribute workclass
Private 0.76827102
Self-emp-not-inc 0.07875926
Self-emp-inc 0.02144435
Federal-gov 0.02555994
Local-gov 0.06398648
State-gov 0.04098254
Without-pay 0.00064983
Never-worked 0.00034658
Attribute fnlwgt
Mean: 190340.8651699 Standard Deviation: 106482.27119468
Attribute education
Bachelors 0.12673836
Some-college 0.23872089
11th 0.04511643
HS-grad 0.35684832
Prof-school 0.00622574
Assoc-acdm 0.03246281
Assoc-voc 0.0413163
9th 0.01972833
7th-8th 0.02453913
12th 0.01621119
Masters 0.03092658
1st-4th 0.00658959
10th 0.03525226
Doctorate 0.00436611
5th-6th 0.01285576
Preschool 0.0021022
Attribute education-num
Mean: 9.59506472 Standard Deviation: 2.43614679
Attribute marital-status
Married-civ-spouse 0.33505884
Divorced 0.1609981
Never-married 0.41222146
Separated 0.03882396
Widowed 0.03676143
Married-spouse-absent 0.01557002
Married-AF-spouse 0.00056618
Attribute occupation
Tech-support 0.02798718
Craft-repair 0.13737978
Other-service 0.13685989
Sales 0.1155879
Exec-managerial 0.09093666
Prof-specialty 0.09886492
Handlers-cleaners 0.05567109
Machine-op-inspct 0.07594663
Adm-clerical 0.14140889
Farming-fishing 0.03812495
Transport-moving 0.05536782
Priv-house-serv 0.00645525
Protective-serv 0.01901915
Armed-Forces 0.00038991
Attribute relationship
Wife 0.03332524
Own-child 0.20229718
Husband 0.29426515
Not-in-family 0.30130227
Other-relative 0.03821888
Unmarried 0.13059128
Attribute race
White 0.8372093
Asian-Pac-Islander 0.0308999
Amer-Indian-Eskimo 0.01116279
Other 0.00998989
Black 0.11073812
Attribute sex
Female 0.38803495
Male 0.61196505
Attribute capital-gain
Mean: 148.75246764 Standard Deviation: 963.13930736
Attribute capital-loss
Mean: 53.14292071 Standard Deviation: 310.7557691
Attribute hours-per-week
Mean: 38.84021036 Standard Deviation: 12.31899464
Attribute native-country
United-States 0.9044565
Cambodia 0.00053445
England 0.00250781
Puerto-Rico 0.0042345
Canada 0.00341227
Germany 0.0038645
Outlying-US(Guam-USVI-etc) 0.00061667
India 0.00250781
Japan 0.00160335
Greece 0.00090446
South 0.00267226
China 0.00230225
Cuba 0.00291893
Iran 0.0010689
Honduras 0.00053445
Philippines 0.00567341
Italy 0.00201447
Poland 0.00201447
Jamaica 0.00296004
Vietnam 0.00259003
Mexico 0.02511922
Portugal 0.0013978
Ireland 0.00082223
France 0.00074001
Dominican-Republic 0.0028367
Laos 0.0006989
Ecuador 0.00102779
Taiwan 0.00131557
Haiti 0.00168558
Columbia 0.00238448
Hungary 0.00045223
Guatemala 0.00254892
Nicaragua 0.00135668
Scotland 0.00041112
Thailand 0.00065779
Yugoslavia 0.00045223
El-Salvador 0.00402894
Trinadad&Tobago 0.00074001
Peru 0.00123335
Hong 0.00061667
Holand-Netherlands 0.00008222
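
The listing above gives, for each class, a prior P(C), a mean and standard deviation for each numeric attribute, and a probability for each value of each nominal attribute. As a minimal sketch of how such parameters are usually combined under the naive Bayes independence assumption, the Python below scores an instance using only a handful of the attributes and values copied from the listing. The function names, the choice of attributes, and the example instances are illustrative assumptions; because most attributes are left out, the resulting posteriors will not match what the full classifier would output.

```python
import math

# Parameters copied from the listing above. Only a subset of the attributes
# (and, for marital-status, a subset of the values) is included for illustration.
PRIORS = {">50K": 0.24082548, "<=50K": 0.75917452}

GAUSSIAN = {  # attribute -> class -> (mean, standard deviation)
    "age": {">50K": (44.24984058, 10.51902772),
            "<=50K": (36.78373786, 14.02008849)},
    "hours-per-week": {">50K": (45.4730264, 11.01297093),
                       "<=50K": (38.84021036, 12.31899464)},
}

NOMINAL = {  # attribute -> class -> value -> P(value | class)
    "sex": {">50K": {"Female": 0.15045263, "Male": 0.84954737},
            "<=50K": {"Female": 0.38803495, "Male": 0.61196505}},
    "marital-status": {
        ">50K": {"Married-civ-spouse": 0.85282875, "Never-married": 0.06269113},
        "<=50K": {"Married-civ-spouse": 0.33505884, "Never-married": 0.41222146},
    },
}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(instance):
    """Return P(class | instance) using only the attributes present in instance."""
    log_scores = {}
    for cls, prior in PRIORS.items():
        score = math.log(prior)
        for attr, value in instance.items():
            if attr in GAUSSIAN:
                mean, sd = GAUSSIAN[attr][cls]
                score += math.log(normal_pdf(value, mean, sd))
            else:
                score += math.log(NOMINAL[attr][cls][value])
        log_scores[cls] = score
    # Normalise in a numerically stable way (subtract the largest log score).
    top = max(log_scores.values())
    unnorm = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(unnorm.values())
    return {cls: u / total for cls, u in unnorm.items()}


# Two hypothetical instances, not taken from the data set.
older_married = {"age": 45, "hours-per-week": 50,
                 "sex": "Male", "marital-status": "Married-civ-spouse"}
young_single = {"age": 22, "hours-per-week": 30,
                "sex": "Female", "marital-status": "Never-married"}

for name, inst in [("older_married", older_married), ("young_single", young_single)]:
    print(name, posterior(inst))
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters once all fourteen attributes are included.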
=== Error on training data ===
Correctly Classified Instances 27134 83.3328 %
Incorrectly Classified Instances 5427 16.6672 %
Mean absolute error 0.1742
Root mean squared error 0.3738
Relative absolute error 47.6362 %
Root relative squared error 87.4192 %
Total Number of Instances 32561
=== Confusion Matrix ===
    a     b   <-- classified as
 4048  3793 |   a = >50K
 1634 23086 |   b = <=50K
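
The training figures can be cross-checked against each other. The sketch below recomputes the instance count and accuracy from the training confusion matrix, and then recomputes the class priors from the row sums. The add-one (Laplace) form of the prior is an assumption inferred from the numbers, not something the output states; it is used here because it appears to reproduce the reported P(C) values.

```python
# Training confusion matrix from the output above: rows are actual classes,
# columns are predicted classes, both in the order (>50K, <=50K).
train = [[4048, 3793],
         [1634, 23086]]

total = sum(sum(row) for row in train)
correct = train[0][0] + train[1][1]
accuracy = correct / total

# Row sums give the number of training instances that actually belong to each class.
class_counts = [sum(row) for row in train]
n_classes = len(class_counts)

# Assumption: the class prior is estimated with add-one (Laplace) smoothing.
priors = [(count + 1) / (total + n_classes) for count in class_counts]

print(f"instances: {total}, correct: {correct}, accuracy: {accuracy:.6f}")
print(f"P(>50K) = {priors[0]:.8f}, P(<=50K) = {priors[1]:.8f}")
```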
=== Error on test data ===
Correctly Classified Instances 13512 82.9924 %
Incorrectly Classified Instances 2769 17.0076 %
Mean absolute error 0.1758
Root mean squared error 0.3759
Relative absolute error 48.3955 %
Root relative squared error 88.4922 %
Total Number of Instances 16281
=== Confusion Matrix ===
    a     b   <-- classified as
 1945  1901 |   a = >50K
  868 11567 |   b = <=50K
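
Overall accuracy hides how differently the two classes are treated. As a rough sketch, the Python below derives per-class precision and recall from the test confusion matrix and compares the accuracy with a majority-class baseline (always predicting the most frequent class). The variable names are illustrative, and the baseline is added here for comparison only; it is not part of the original output.

```python
# Test confusion matrix from the output above: rows are actual classes,
# columns are predicted classes, both in the order (>50K, <=50K).
test = [[1945, 1901],
        [868, 11567]]
labels = [">50K", "<=50K"]

total = sum(sum(row) for row in test)
accuracy = (test[0][0] + test[1][1]) / total

# Majority-class baseline: accuracy obtained by always predicting the
# class with the most actual instances.
baseline = max(sum(row) for row in test) / total

metrics = {}
for i, label in enumerate(labels):
    predicted = sum(row[i] for row in test)  # column sum: predicted as this class
    actual = sum(test[i])                    # row sum: actually in this class
    metrics[label] = {
        "precision": test[i][i] / predicted,
        "recall": test[i][i] / actual,
    }

print(f"accuracy: {accuracy:.4f}  majority baseline: {baseline:.4f}")
for label, m in metrics.items():
    print(f"{label:>6}: precision {m['precision']:.4f}, recall {m['recall']:.4f}")
```

On these figures, recall for the >50K class is roughly one half, so about half of the high-income instances in the test set are classified as <=50K.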