2018 - Winter readings

Notes on Deep Learning with Python - François Chollet.md


  1. Define the problem and assemble a dataset
  2. Choose a measure of success (loss function / metric)
  3. Decide on an evaluation protocol
  • K-fold cross-validation (when you have few samples)
  • Iterated k-fold validation (model evaluation with very little data)
  4. Prepare your data
  • Data tensors
  • Normalize
  • Feature engineering
  5. Develop a model that does better than a baseline
  • Choose the last-layer activation and loss function by problem type (see the Keras sketch after this list):

| Problem type               | Last-layer activation | Loss function              |
|----------------------------|-----------------------|----------------------------|
| Binary classification      | sigmoid               | binary crossentropy        |
| Single-label classification| softmax               | categorical crossentropy   |
| Multilabel classification  | sigmoid               | binary crossentropy        |
| Regression                 | none                  | MSE                        |
| Regression to [0, 1]       | sigmoid               | MSE or binary crossentropy |

  • Optimization configuration (default optimizer = rmsprop)
  6. Scale up: develop a model that overfits
  • Optimization vs generalization
  • Underfitting vs overfitting
  • Add layers
  • Make layers bigger
  • Train for more epochs
  7. Regularize your model and tune your hyperparameters
  • Add dropout
  • Add or remove layers
  • Add L1/L2 regularization
  • Try different hyperparameters
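
A minimal Keras sketch of steps 5-7 for binary classification; the layer sizes, input shape, and hyperparameter values here are illustrative, not from the book:

```python
from keras import models, layers, regularizers

# Last-layer activation and loss follow the table above:
# binary classification => sigmoid + binary crossentropy.
model = models.Sequential()
model.add(layers.Dense(16, activation="relu",
                       kernel_regularizer=regularizers.l2(0.001),  # L2 (step 7)
                       input_shape=(10000,)))
model.add(layers.Dropout(0.5))                                     # dropout (step 7)
model.add(layers.Dense(16, activation="relu",
                       kernel_regularizer=regularizers.l2(0.001)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation="sigmoid"))

model.compile(optimizer="rmsprop",            # the default optimizer per the notes
              loss="binary_crossentropy",
              metrics=["accuracy"])
```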

Computer Vision

  • Convolutional neural networks ~ convnets (see the sketch below)
  • Convolution windows: 3x3 / 5x5
    • learn local patterns
    • learn spatial hierarchies of patterns
  • 3D tensors = feature maps
  • stride
  • downsampling
    • use max pooling
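
A small convnet sketch in the spirit of these notes: stacked 3x3 convolution windows with max-pooling downsampling; the input shape is illustrative:

```python
from keras import models, layers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu",   # 3x3 convolution window
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))                   # downsample feature maps
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))
```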

Data augmentation

Use a pretrained convnet
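
A hedged sketch covering both points: data augmentation via ImageDataGenerator feeding a frozen pretrained VGG16 base; the image size, augmentation values, and the directory path are placeholders:

```python
from keras import models, layers
from keras.applications import VGG16
from keras.preprocessing.image import ImageDataGenerator

conv_base = VGG16(weights="imagenet", include_top=False,
                  input_shape=(150, 150, 3))
conv_base.trainable = False                  # freeze the pretrained weights

datagen = ImageDataGenerator(rescale=1. / 255,   # data augmentation
                             rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             horizontal_flip=True)

model = models.Sequential([
    conv_base,                               # pretrained convolutional base
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["acc"])
# train_generator = datagen.flow_from_directory("path/to/train",
#     target_size=(150, 150), batch_size=20, class_mode="binary")
```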

Deep learning for text and sequences

  • recurrent neural networks
  • 1D convnets
  • N-grams => N consecutive items
  • Bags of n-grams
  • One-hot encoding
  • Word embeddings (see the sketch below)
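
A sketch of learning word embeddings on padded integer sequences; the vocabulary size, sequence length, and toy sequences are made up:

```python
from keras import models, layers
from keras.preprocessing.sequence import pad_sequences

max_features, maxlen = 10000, 100            # vocabulary size, sequence length
sequences = [[1, 5, 42], [7, 2, 2, 9]]       # hypothetical word-index lists
x = pad_sequences(sequences, maxlen=maxlen)  # 2D integer tensor (samples, maxlen)

model = models.Sequential()
model.add(layers.Embedding(max_features, 8, input_length=maxlen))  # 8-dim word vectors
model.add(layers.Flatten())
model.add(layers.Dense(1, activation="sigmoid"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
```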

RNN

  • sequence

  • memory

  • Simple RNN

  • LSTM: Long short term memory

  • It is a good idea to increase the capacity of the network until overfitting becomes the primary obstacle (see the sketch below)
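
A minimal sketch of a recurrent classifier; swap layers.SimpleRNN for layers.LSTM to carry "memory" further across the sequence (sizes are illustrative):

```python
from keras import models, layers

model = models.Sequential()
model.add(layers.Embedding(10000, 32))
model.add(layers.LSTM(32))   # or layers.SimpleRNN(32); LSTM keeps longer-term state
model.add(layers.Dense(1, activation="sigmoid"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["acc"])
```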

Best Practices

  • Functional API (see the sketch after this list)
    • multiple inputs
    • acyclic graph
  • Advanced architecture patterns
    • residual connections
    • normalization
    • depthwise separable convolution
  • Automatic hyperparameter optimization
    • hyperopt
    • hyperas
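
A sketch of the functional API building an acyclic graph with multiple inputs, along the lines of the book's two-input question-answering example; the names and sizes are illustrative:

```python
from keras.models import Model
from keras.layers import Input
from keras import layers

text_input = Input(shape=(None,), dtype="int32", name="text")
question_input = Input(shape=(None,), dtype="int32", name="question")

embedded_text = layers.Embedding(10000, 64)(text_input)
embedded_question = layers.Embedding(10000, 32)(question_input)

encoded_text = layers.LSTM(32)(embedded_text)
encoded_question = layers.LSTM(16)(embedded_question)

merged = layers.concatenate([encoded_text, encoded_question])  # two branches join
answer = layers.Dense(500, activation="softmax")(merged)       # single-label => softmax

model = Model([text_input, question_input], answer)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```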

Generative deep learning

  • Augment artistic creation

  • LSTM: 2014-2016

  • Text generation (sampling sketch after this list)

  • Deep dream

  • Style transfer

  • VAE

  • GAN: Generative adversarial network

    • generator vs discriminator
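
A sketch of temperature-based sampling for LSTM text generation: reweight the softmax output, then draw the next token from the new distribution:

```python
import numpy as np

def sample(preds, temperature=1.0):
    """Draw the next token index from a temperature-reweighted distribution."""
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature      # temperature > 1 => more surprising choices
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)    # renormalize to a probability distribution
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
```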

Notes on Pandas cookbook - Theodore Petrou.md


  • DataFrame & Series
  • Columns & Index
  • Missing values: NaN

df.index
df.columns
df.values   # underlying NumPy array
type(...)

df.dtypes

series.to_frame()
s.value_counts()
s.describe()
s.isnull()
s.fillna(0)
s.dropna()

s.value_counts(normalize=True)

s.hasnans   # attribute, not a method
dataframe.isnull()
df.sum()
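
A tiny demo of the Series inspection calls above, on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, 2.0])
s.isnull()                      # boolean mask of missing values
s.fillna(0)                     # copy with NaN replaced by 0
s.value_counts(normalize=True)  # relative frequencies instead of counts
s.hasnans                       # True
```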

pd.read_csv(..., index_col="...")
df.reset_index()
df.rename(index={...}, columns={...})

idx_list = df.index.tolist()
idx_list[1] = ...
df.index = idx_list

df.drop("...", axis="columns")
df.insert(loc=..., column="...", value=[])

Operations

df.filter(like="...")
df.filter(regex="...")

df.count(...)  # counts non-NaN values

df.isnull()
df.sum()
df.head()

df.memory_usage()

df.nunique()
col.astype("category")

df.nlargest()
df.sort_values(...)

df.drop_duplicates()

df.iloc[...]  # select by integer position
df.loc[...]   # select by label

df.columns
df.columns.get_loc(...)  # integer position of a column label

df.col.pct_change()

pd.cut(col, bins)

Tidy data => Hadley Wickham

  • stack & melt (wide to long; see the sketch below)
  • vs unstack & pivot (long to wide)
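
A sketch of reshaping between wide and tidy form with melt and pivot, on a made-up frame:

```python
import pandas as pd

wide = pd.DataFrame({"name": ["a", "b"], "2017": [1, 2], "2018": [3, 4]})
tidy = wide.melt(id_vars="name", var_name="year", value_name="value")  # wide -> long
back = tidy.pivot(index="name", columns="year", values="value")       # long -> wide
```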

The Zen of Python

Combining Pandas Objects

df.loc[len(df)] = {"Age": ...}

pd.concat([df1, df2])
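
A short sketch of both row-combining idioms above, with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Age": [20, 30]})
df.loc[len(df)] = {"Age": 40}                      # append a row at the next integer label
combined = pd.concat([df, df], ignore_index=True)  # stack two frames
```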

Time Series Analysis

  • date
  • time
  • datetime
  • timedelta
  • pd.Timestamp
df.between_time()
df.at_time()

df.resample("W")
df.resample("W").size()   # rows per weekly bin

df.resample("w", on="col1")
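
A sketch of weekly resampling on a hypothetical datetime column col1:

```python
import pandas as pd

df = pd.DataFrame({"col1": pd.date_range("2018-01-01", periods=10, freq="D"),
                   "value": range(10)})
df.resample("W", on="col1")["value"].sum()   # weekly totals, using col1 as the clock
df.resample("W", on="col1").size()           # number of rows per weekly bin
```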