Nataliia Rastoropova NataliiaRastoropova

NataliiaRastoropova / Dataset_10_records
Last active May 14, 2019 07:45
Top 10 records from dataset
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
...................
7            7.3              0.65         0.00             1.2      0.065                 15.0                  21.0   0.9946  3.39       0.47     10.0        7
8            7.8              0.58         0.02             2.0      0.073                  9.0                  18.0   0.9968  3.36
NataliiaRastoropova / Data_statistics
Last active May 13, 2019 13:02
Statistical characteristics
       fixed acidity  volatile acidity  citric acid  residual sugar    chlorides  free sulfur dioxide  total sulfur dioxide      density           pH    sulphates      alcohol      quality
count    1599.000000       1599.000000  1599.000000     1599.000000  1599.000000          1599.000000           1599.000000  1599.000000  1599.000000  1599.000000  1599.000000  1599.000000
mean        8.319637          0.527821     0.270976        2.538806     0.087467            15.874922             46.467792     0.996747     3.311113     0.658149    10.422983     5.636023
std         1.741096          0.179060     0.194801        1.409928     0.047065            10.460157             32.895324     0.001887     0.154386     0.169507     1.065668     0.807569
min         4.600000          0.120000     0.000000        0.900000     0.012000             1.000000              6.000000     0.990070     2.740000     0.330000     8.400000     3.000000
25%         7.100000          0.390000     0.090000        1.900000     0.07000
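A summary of this shape (count, mean, std, min, quartiles) is what pandas' `DataFrame.describe()` prints. The following is a minimal sketch, not the author's code: the variable name `df` is an assumption, and the DataFrame is built from only the three complete rows visible in the "Top 10 records" preview, so its numbers will not match the full 1599-row statistics.

```python
import pandas as pd

columns = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
           'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
           'pH', 'sulphates', 'alcohol', 'quality']

# The three complete rows shown in the dataset preview (not the full dataset).
rows = [
    [7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5],
    [7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5],
    [7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8, 5],
]
df = pd.DataFrame(rows, columns=columns)

# First records (head returns at most the requested number of rows).
print(df.head(10))

# Per-column count, mean, std, min, quartiles and max.
stats = df.describe()
print(stats)
```

With the real data, `df` would instead be loaded from the dataset file, whose name and location are not given here.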
         alcohol  chlorides  citric acid   density  fixed acidity  free sulfur dioxide    pH  residual sugar  sulphates  total sulfur dioxide  volatile acidity
quality
3          9.925     0.0905        0.035  0.997565           7.50                  6.0  3.39             2.1      0.545                  15.0             0.845
4         10.000     0.0800        0.090  0.996500           7.50                 11.0  3.37             2.1      0.560                  26.0             0.670
5          9.700     0.0810        0.230  0.997000           7.80                 15.0  3.30             2.2      0.580                  47.0             0.580
6         10.500     0.0780        0.260  0.996560           7.90                 14.0  3.32             2.2      0.640                  35.0             0.490
7         11.500     0.0730        0.400  0.995770           8.8
# Pivot table: median of each feature for every quality score.
# Assumes df is the wine-quality DataFrame whose preview and statistics appear above.
column_names = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol']
df_pivot_table = df.pivot_table(values=column_names, index='quality', aggfunc='median')
print(df_pivot_table)
alcohol 0.476166
sulphates 0.251397
citric acid 0.226373
fixed acidity 0.124052
residual sugar 0.013732
free sulfur dioxide -0.050656
pH -0.057731
chlorides -0.128907
density -0.174919
total sulfur dioxide -0.185100
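A ranking like this one can be produced by correlating every feature with `quality` and sorting the result. The sketch below assumes Pearson correlation (the pandas default), which the listing does not confirm. It uses only the four complete rows visible in the dataset preview, so the coefficients it prints will differ from the values above.

```python
import pandas as pd

columns = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
           'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
           'pH', 'sulphates', 'alcohol', 'quality']

# Rows 0, 1, 2 and 7 from the dataset preview (a tiny sample, for illustration only).
rows = [
    [7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5],
    [7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5],
    [7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8, 5],
    [7.3, 0.65, 0.00, 1.2, 0.065, 15.0, 21.0, 0.9946, 3.39, 0.47, 10.0, 7],
]
df = pd.DataFrame(rows, columns=columns)

# Correlate all columns, keep the 'quality' column of the matrix,
# drop the trivial self-correlation, and sort from most positive to most negative.
corr_with_quality = (
    df.corr()['quality']
    .drop('quality')
    .sort_values(ascending=False)
)
print(corr_with_quality)
```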
SupportVectorClassifier: 0.873364 (0.024056)
StochasticGradientDecentC: 0.838976 (0.034672)
RandomForestClassifier: 0.884289 (0.020586)
DecisionTreeClassifier: 0.848321 (0.037492)
GaussianNB: 0.826446 (0.025753)
KNeighborsClassifier: 0.860845 (0.026717)
AdaBoostClassifier: 0.866351 (0.039970)
LogisticRegression: 0.871014 (0.028362)
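Each line above has the shape `name: mean (std)`, which is consistent with the mean and standard deviation of cross-validation scores, although the listing does not state the metric, the number of folds, or how the target was encoded. The sketch below shows one way such a comparison could be run with scikit-learn. Everything in it is an assumption for illustration: the data is synthetic (not the wine dataset), only three of the listed model types are included, and 10-fold accuracy is a guess.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the wine dataset): 200 samples, 11 features, binary target.
X, y = make_classification(n_samples=200, n_features=11, random_state=0)

# A subset of the model types named in the listing, with fixed seeds where applicable.
models = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'DecisionTreeClassifier': DecisionTreeClassifier(random_state=0),
    'GaussianNB': GaussianNB(),
}

# Assumed evaluation protocol: 10-fold cross-validated accuracy.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
    results[name] = (scores.mean(), scores.std())
    print('%s: %f (%f)' % (name, scores.mean(), scores.std()))
```

Reusing the same `KFold` object for every model keeps the folds identical, so differences in the means reflect the models rather than the split.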
Confusion matrix
[[285 2]
[ 18 15]]
Classification report
              precision    recall  f1-score   support
           0       0.94      0.99      0.97       287
           1       0.88      0.45      0.60        33
   micro avg       0.94      0.94      0.94       320
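The figures in the classification report follow directly from the confusion matrix. The sketch below recomputes them in plain Python, assuming scikit-learn's convention that rows are true classes and columns are predicted classes; the helper name `class_metrics` is hypothetical.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = [[285, 2],
      [18, 15]]


def class_metrics(cm, k):
    """Return (precision, recall, f1, support) for class k."""
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp   # predicted k, but true class differs
    fn = sum(cm[k]) - tp                  # true class k, but predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    support = tp + fn
    return precision, recall, f1, support


for k in range(len(cm)):
    p, r, f1, n = class_metrics(cm, k)
    print(f'{k}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}  support={n}')

# For single-label classification the micro average equals overall accuracy.
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print(f'micro avg={accuracy:.2f}')
```

The low recall for class 1 (15 of 33) is the 18 positive samples that the model labelled as class 0.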
alcohol,chlorides,citric acid,density,fixed acidity,free sulfur dioxide,pH,residual sugar,sulphates,total sulfur dioxide,volatile acidity
9.955000000000002,0.12250000000000001,0.17099999999999999,0.9974640000000001,8.36,11.0,3.3979999999999997,2.6350000000000002,0.5700000000000001,24.9,0.8845000000000001
10.265094339622639,0.09067924528301885,0.1741509433962264,0.9965424528301886,7.779245283018868,12.264150943396226,3.381509433962264,2.69433962264151,0.5964150943396227,36.24528301886792,0.6939622641509429
9.899706314243753,0.09273568281938328,0.24368575624082198,0.9971036270190888,8.167254038179149,16.983847283406753,3.3049486049926546,2.528854625550658,0.6209691629955947,56.51395007342144,0.5770411160058732
10.629519331243463,0.08495611285266458,0.2738244514106587,0.9966150626959255,8.347178683385575,15.711598746081505,3.3180721003134837,2.477194357366772,0.6753291536050158,40.86990595611285,0.49748432601880965
11.465912897822443,0.07658793969849244,0.37517587939698493,0.9961042713567828,8.872361809045225,14
,alcohol,chlorides,citric acid,density,fixed acidity,free sulfur dioxide,pH,residual sugar,sulphates,total sulfur dioxide,volatile acidity
quality,
3,9.955000000000002,0.12250000000000001,0.17099999999999999,0.9974640000000001,8.36,11.0,3.3979999999999997,2.6350000000000002,0.5700000000000001,24.9,0.8845000000000001
4,10.265094339622639,0.09067924528301885,0.1741509433962264,0.9965424528301886,7.779245283018868,12.264150943396226,3.381509433962264,2.69433962264151,0.5964150943396227,36.24528301886792,0.6939622641509429
5,9.899706314243753,0.09273568281938328,0.24368575624082198,0.9971036270190888,8.167254038179149,16.983847283406753,3.3049486049926546,2.528854625550658,0.6209691629955947,56.51395007342144,0.5770411160058732
6,10.629519331243463,0.08495611285266458,0.2738244514106587,0.9966150626959255,8.347178683385575,15.711598746081505,3.3180721003134837,2.477194357366772,0.6753291536050158,40.86990595611285,0.49748432601880965
7,11.465912897822443,0.07658793969849244,0.37517587939698493,0.9961042713567828,