Feature | Jupyter Notebooks | Databricks Notebooks |
---|---|---|
Platform | Open-source, runs locally or on cloud platforms | Exclusive to the Databricks platform |
Collaboration and Sharing | Limited collaboration features, manual sharing | Built-in collaboration, real-time concurrent editing |
Execution | Relies on local or external servers | Execution on Databricks clusters |
Integration with Big Data | Can be integrated with Spark, requires additional configurations | Native integration with Apache Spark, optimized for big data |
Built-in Features | External tools/extensions for version control, collaboration, and visualization | Integrated with Databricks-specific features like Delta Lake, built-in support for collaboration and analytics tools |
Cost and Scaling | Local installations are often free, cloud-based solutions may have costs | Paid service, costs depend on usage, scales seamlessly with Databricks clusters |
Ease of Use | Familiar and widely used in the data science community |
import pandas as pd
import seaborn as sns
# Load the dataset from Seaborn
diamonds = sns.load_dataset("diamonds")
# Create a Pandas DataFrame
df = pd.DataFrame(diamonds)
# Save the DataFrame directly as a Parquet file
This is a test gist. 13894950uijklakd#$%^&*'\\./
import optuna
import xgboost as xgb
from sklearn.metrics import mean_squared_error  # or any other metric
from sklearn.model_selection import train_test_split
# Load the dataset
X, y = ...  # load your own
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the objective function for Optuna
import pandas as pd
import numpy as np
import string
# Set the desired number of rows and columns
num_rows = 10_000_000
num_cols = 10
chunk_size = 100_000
# Define an empty DataFrame to store the chunks
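The rest of this snippet is cut off, so the following is only a sketch of how chunked generation could proceed. It swaps in a different technique from the one the last comment suggests: chunks are collected in a list and concatenated once, rather than appended to an empty DataFrame, because repeated concatenation copies the accumulated data each time. The function name `build_frame`, the column names, and the mix of numeric and letter columns are assumptions.

```python
import string

import numpy as np
import pandas as pd


def build_frame(num_rows: int, num_cols: int, chunk_size: int, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic DataFrame chunk by chunk to bound peak memory per step."""
    rng = np.random.default_rng(seed)
    columns = [f"col_{i}" for i in range(num_cols)]  # hypothetical column names
    letters = np.array(list(string.ascii_lowercase))
    chunks = []
    for start in range(0, num_rows, chunk_size):
        # The final chunk may be smaller than chunk_size.
        size = min(chunk_size, num_rows - start)
        # All but the last column hold uniform random floats.
        chunk = pd.DataFrame(rng.random((size, num_cols - 1)), columns=columns[:-1])
        # The last column holds random single letters.
        chunk[columns[-1]] = rng.choice(letters, size=size)
        chunks.append(chunk)
    # A single concat at the end avoids re-copying earlier chunks.
    return pd.concat(chunks, ignore_index=True)


if __name__ == "__main__":
    # Small sizes for a quick check; the original uses 10_000_000 rows.
    df = build_frame(num_rows=1_000, num_cols=10, chunk_size=300)
    print(df.shape)
```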
import time
import datatable as dt
import pandas as pd
import polars as pl
# Define a DataFrame to store the results
results_df = pd.DataFrame(
    columns=["Function", "Library", "Runtime (s)"]
)
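The benchmark body itself is not shown. As a rough sketch, a results table like the one above could be filled by a small timing helper. The helper name `record_runtime` is hypothetical, and only pandas is exercised here so the block does not depend on `datatable` or `polars` being installed.

```python
import time

import pandas as pd

results_df = pd.DataFrame(columns=["Function", "Library", "Runtime (s)"])


def record_runtime(results, function_name, library, func, *args, **kwargs):
    """Time one call of func and append a row to the results table in place."""
    start = time.perf_counter()
    func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    # Setting a new integer label with .loc appends a row.
    results.loc[len(results)] = [function_name, library, elapsed]
    return results


if __name__ == "__main__":
    frame = pd.DataFrame({"a": range(100_000)})
    record_runtime(results_df, "sum", "pandas", frame["a"].sum)
    record_runtime(results_df, "sort_values", "pandas", frame.sort_values, "a")
    print(results_df)
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and has higher resolution, which matters for short operations.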
import os
import random
import time
import numpy as np
import pandas as pd
from faker import Faker
# Set seed for reproducibility
random.seed(42)
FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04
# install utilities
RUN apt-get update && \
    apt-get install --no-install-recommends -y curl
ENV CONDA_AUTO_UPDATE_CONDA=false \
    PATH=/opt/miniconda/bin:$PATH
RUN curl -sLo ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py38_4.9.2-Linux-x86_64.sh \
    && chmod +x ~/miniconda.sh \
$ dvc remove dvclive.dvc models.dvc
$ rm -rf dvclive models
$ git add --all
$ git commit -m "Remove all experiments"
$ git tag "cnn32"