@kiwidamien
Created August 26, 2019 04:57
Category Encoders companion gist
@marcelkore

Your notebooks/posts are so easy to follow! I have spent the better part of this afternoon reviewing your posts. Thanks for sharing!

@kiwidamien
Author

@marcelkore thanks for taking the time to drop a comment -- it helps to know I am not talking to myself and someone finds this useful! =)

@HaeHwan

HaeHwan commented Mar 4, 2020

I had a problem with the Hashing Encoder, and it seems the same problem may happen with yours, since I used exactly the same code.
Would you mind visiting my GitHub to take a look at the problem?
Here is the URL : https://github.com/HaeHwan/hello-world/blob/master/Hashing(2).ipynb

The main problem is that HashingEncoder doesn't change the columns at all, as you can see at the URL above.

Thanks.

@kiwidamien
Author

Hi @HaeHwan

That's strange... I cannot reproduce your error. If you save the following as a file and run it in the terminal, what do you get?

import pandas as pd
import category_encoders as ce

print(f"""
  Version check:
  --------------
      Pandas version:            {pd.__version__}
      Category Encoders version: {ce.__version__}
""")

df_train = pd.read_csv('https://raw.githubusercontent.com/kiwidamien/StackedTurtles/master/content/preprocessing/simple_loan_example.csv')

encoder_purpose = ce.HashingEncoder(n_components=3, cols=['purpose'])
df_transform = encoder_purpose.fit_transform(df_train)

print(df_transform)

For reference, my output is

  Version check:
  --------------
      Pandas version:            0.24.2
      Category Encoders version: 2.1.0

   col_0  col_1  col_2  annual_income  debt_to_income  loan_amount grade  repaid
0      0      0      1         120000           0.100         3500     A    True
1      0      0      1         130000           0.500        13800     C   False
2      0      0      1         220000           0.400        33500     B   False
3      0      0      1          65000           0.250         2000     B   False
4      0      0      1          60000           0.200         2200     B    True
5      1      0      0          45000           0.312         5500     D    True
6      1      0      0          75000           0.111         2000     B    True
7      0      1      0          24000           0.400          500     C   False
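To make the col_0, col_1, col_2 output easier to read, here is a rough sketch of the hashing trick in plain Python. It is an illustration under assumptions, not the library's implementation: the helper names `hash_bucket` and `hash_encode` are made up, the MD5 choice is an assumption about the default hash, and the category values are hypothetical rather than taken from the CSV. Each value is hashed, the hash is reduced modulo the number of components, and the matching column gets a 1.

```python
import hashlib


def hash_bucket(value, n_components=3):
    """Map a category value to a column index using an MD5 digest (assumed hash)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_components


def hash_encode(values, n_components=3):
    """Return one row of bucket counts per input value."""
    rows = []
    for value in values:
        row = [0] * n_components
        row[hash_bucket(value, n_components)] += 1
        rows.append(row)
    return rows


# Hypothetical category values, not the ones in simple_loan_example.csv.
purposes = ["car", "house", "car", "vacation"]
encoded = hash_encode(purposes)

for purpose, row in zip(purposes, encoded):
    print(purpose, row)
```

Identical values always land in the same column, and with only three components different values can collide, which is why several rows in the output above share the same pattern.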

@HaeHwan

HaeHwan commented Mar 5, 2020

Oh, I finally solved it! Maybe the problem was the number of processes on my laptop. I passed "max_process=1" to the encoder and now it works. Thank you for your kindness.

@ThisIsVenkatesh

Hi @kiwidamien, Thanks for sharing this. It helps me a lot. I'm unable to open the link to "An introduction to pipelines". Can you please look into this?
