Harshvardhan harshvardhaniimi

@harshvardhaniimi
harshvardhaniimi / Apps for Kindle and Calibre.md
Last active May 18, 2024 03:52
Apps for Kindle and Calibre

Calibre App Enhancements

  • Calibre Plugins: Enhance your Calibre app with a variety of plugins available at GitHub - Calibre Plugins by a great developer, or explore the complete directory.
    • K2pdfopt: Converts PDF to a format readable on Kindle. More information can be found on MobileRead Forums.
    • Page Count and Readability Indexes: Adds a page count column and readability indexes to the Calibre app. Details are available on GitHub - Count Pages Wiki.

Kindle Highlights Management

  • Emdash: An AI-powered wisdom indexer that organizes text snippets for better recall and learning, functioning completely offline. Open-source and free. GitHub - Emdash.
  • Remind Kindle: Sends reminder emails from your Kindle highlights using Clippings.t
@harshvardhaniimi
harshvardhaniimi / optuna_lightgbm.py
Last active July 25, 2023 22:17
Optimise Hyperparameters using Optuna
# pip install lightgbm optuna
# Also read:
# 1. https://optuna.org/
# 2. https://archive.is/sYEBT
# 3. https://archive.is/lrcTV
import optuna
import lightgbm as lgb
from sklearn.metrics import mean_squared_error
@harshvardhaniimi
harshvardhaniimi / df_clean_column_names.py
Last active June 22, 2023 00:36
This function takes in a pandas dataframe, and renames its columns by removing special characters and spaces, replacing them with underscores. It also converts all letters to lowercase.
import pandas as pd
import re
def clean_names(df):
    """
    This function takes in a pandas dataframe, and renames its columns by removing special characters
    and spaces, replacing them with underscores. It also converts all letters to lowercase.
    """
    df.columns = df.columns.str.lower() # Convert to lowercase
    df.columns = df.columns.str.replace(' ', '_', regex=False) # Replace spaces with underscores
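The preview cuts off before the special-character handling. One way the behaviour described in the docstring could be approximated is sketched below. The name `clean_names_sketch`, the exact regex, and the choice to return a copy rather than modify the input are assumptions, not the gist's implementation.

```python
import re

import pandas as pd


def clean_names_sketch(df):
    """Hypothetical variant: lowercase names and collapse non-alphanumerics to '_'."""
    cleaned = []
    for name in df.columns:
        lowered = str(name).strip().lower()
        # Replace every run of characters other than a-z and 0-9 with one underscore.
        underscored = re.sub(r"[^0-9a-z]+", "_", lowered)
        # Drop underscores left dangling at either end.
        cleaned.append(underscored.strip("_"))
    out = df.copy()
    out.columns = cleaned
    return out


example = pd.DataFrame({"First Name": ["Ada"], "Total ($)": [10]})
print(list(clean_names_sketch(example).columns))
```

Note that two different original names can collapse to the same cleaned name under this scheme, so a real version may want to check for duplicates.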
@harshvardhaniimi
harshvardhaniimi / get_categorical_uniques.py
Created June 6, 2023 20:04
Returns a DataFrame of unique values for each categorical column in the input DataFrame
import pandas as pd
def get_categorical_uniques(df, cols=None):
    """
    This function identifies categorical columns as those with the 'object' or 'category' data type.
    If a list of columns is provided, it will only consider those columns. The output DataFrame
    has three columns: 'Column Name' for the name of the input column, 'Number of Unique Values'
    for the count of unique values in the input column, and 'Unique Values' for a list of the
    unique values.
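Only the docstring is visible in this preview. The following is a minimal sketch of a function matching that description; the name `categorical_uniques_sketch` and the decision to ignore missing values are assumptions.

```python
import pandas as pd


def categorical_uniques_sketch(df, cols=None):
    """Hypothetical version: summarise unique values of object/category columns."""
    candidates = list(df.select_dtypes(include=["object", "category"]).columns)
    if cols is not None:
        # Keep only the requested columns that are also categorical.
        candidates = [c for c in candidates if c in cols]
    rows = []
    for col in candidates:
        uniques = df[col].dropna().unique().tolist()
        rows.append(
            {
                "Column Name": col,
                "Number of Unique Values": len(uniques),
                "Unique Values": uniques,
            }
        )
    return pd.DataFrame(
        rows, columns=["Column Name", "Number of Unique Values", "Unique Values"]
    )


example = pd.DataFrame(
    {
        "color": pd.Series(["red", "blue", "red"], dtype=object),
        "size": pd.Categorical(["S", "M", "S"]),
        "n": [1, 2, 3],
    }
)
print(categorical_uniques_sketch(example))
```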
@harshvardhaniimi
harshvardhaniimi / describe_df.py
Created May 31, 2023 00:09
Describes a data frame in terms of unique values of categorical variables, range of continuous variables and proportion of missing values.
import pandas as pd
import numpy as np
def describe_dataframe(df):
    """
    Describes a dataframe in terms of unique values of categorical variables,
    range of continuous variables and proportion of missing values.
    """
    results = []
    for column in df.columns:
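The loop body is not shown in the preview. A sketch of how such a summary might be assembled follows; the name `describe_dataframe_sketch` and the output column names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def describe_dataframe_sketch(df):
    """Hypothetical version: one summary row per column of df."""
    results = []
    for column in df.columns:
        s = df[column]
        # Treat booleans as categorical even though pandas counts them as numeric.
        numeric = pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s)
        results.append(
            {
                "column": column,
                "dtype": str(s.dtype),
                "n_unique": np.nan if numeric else s.nunique(),
                "min": s.min() if numeric else np.nan,
                "max": s.max() if numeric else np.nan,
                "missing_prop": s.isna().mean(),
            }
        )
    return pd.DataFrame(results)


example = pd.DataFrame({"x": [1.0, np.nan, 3.0], "g": ["a", "b", "a"]})
print(describe_dataframe_sketch(example))
```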
@harshvardhaniimi
harshvardhaniimi / compare_dataframes.py
Created April 19, 2023 19:54
A function to compare large data frames by comparing their hashes instead of values for efficiency
import pandas as pd
import hashlib
def hash_dataframe(df):
    """
    Generate a hash for a DataFrame using the SHA-256 algorithm.
    This function creates a hash for each row of the DataFrame using pandas' `hash_pandas_object`
    and then hashes the resulting array of row hashes using `hashlib.sha256`.
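The two steps named in the docstring (row hashes via `hash_pandas_object`, then SHA-256 over the resulting array) can be sketched as below. The function names are hypothetical, and including the index in the hash is an assumption.

```python
import hashlib

import pandas as pd


def hash_dataframe_sketch(df):
    """Hypothetical version: SHA-256 digest over the per-row pandas hashes."""
    # One uint64 hash per row; index=True makes the index part of each row hash.
    row_hashes = pd.util.hash_pandas_object(df, index=True).values
    return hashlib.sha256(row_hashes.tobytes()).hexdigest()


def dataframes_match_sketch(left, right):
    """Compare two DataFrames by digest instead of element by element."""
    return hash_dataframe_sketch(left) == hash_dataframe_sketch(right)


a = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
b = a.copy()
print(dataframes_match_sketch(a, b))
```

This sketch hashes values and index only, so two frames that differ solely in column names would produce the same digest. Row order also matters, because the row hashes are concatenated in order.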
@harshvardhaniimi
harshvardhaniimi / downcast.py
Last active May 31, 2023 23:06
Function to downcast a data frame in Python in place
# Citation: https://imgflip.com/i/7im94j
import pandas as pd
import numpy as np
def downcast_dataframe(df):
    """
    Downcasts a pandas DataFrame's columns to the minimum resolution required, modifying it in place.
    It retains categorical columns and prints the changes made along with the reduction in memory consumption.
    Parameters
@harshvardhaniimi
harshvardhaniimi / 2022_11_10_population_collapse.R
Created November 11, 2022 02:16
Ratio of baby diapers sale to adult diapers sale
library(tidyverse)
library(readxl)
library(MetBrewer)
theme_set(ggthemes::theme_clean())
setwd("/Users/harshvardhan/Desktop/Dump/diapers/")
# it's funny how something so simple requires such ninja skills
# 1. Data from Statista is sometimes in millions, sometimes in billions
# 2. You cannot download more than three countries of data at a time
@harshvardhaniimi
harshvardhaniimi / Installing Python Packages via Jupyter.py
Created June 21, 2022 16:40
Installing Python Packages via Jupyter Notebook
# Install a conda package in the current Jupyter kernel
import sys
!conda install --yes --prefix {sys.prefix} numpy
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install numpy
# Learn more: https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
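The `!` lines above are IPython shell escapes, so they only work inside a notebook or IPython session. For a plain Python script, a rough equivalent of the pip variant can be sketched with `subprocess`. The helper names are hypothetical, and nothing is installed until `pip_install` is actually called.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(package))


# Show the command without running it.
print(pip_install_command("numpy"))
```

As in the notebook version, using `sys.executable` ensures the package lands in the environment of the running interpreter rather than whichever `pip` happens to be first on the PATH.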