Setting up Miniforge for TensorFlow support

  1. Download Miniforge3-MacOSX-arm64.sh
  2. Run the installer with the following command (you may need to make it executable first with chmod +x Miniforge3-MacOSX-arm64.sh):
    • ./Miniforge3-MacOSX-arm64.sh
  3. This installs Miniforge into a miniforge3 directory inside the current directory. Next, activate it:
    • source miniforge3/bin/activate
  4. You should now see the active environment name (e.g. (base)) prepended to your prompt. To make sure conda is activated automatically when a terminal starts, run:
    • conda init
    • or, if you are using zsh, conda init zsh
  5. Verify the activation with which python; it should print a path ending in .../miniforge3/bin/python. If it doesn't, remove the miniforge3 directory and reinstall from step 2. Also make sure any existing Anaconda environment is deactivated. A quick Python-level check is sketched below.
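
As an additional sanity check (a minimal sketch; sys.executable is standard Python and nothing here is Miniforge-specific), you can ask the interpreter itself where it lives:

import sys

# Should print a path inside miniforge3, e.g. .../miniforge3/bin/python
print(sys.executable)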

Now we'll install TensorFlow and its dependencies.

  1. Install TensorFlow's dependencies with the following command (you can also create and activate a dedicated conda environment first; the troubleshooting paths below assume one named tensorflow):
    • conda install -c apple tensorflow-deps
  2. Install TensorFlow for macOS and the Metal plugin using the following commands (a quick verification snippet follows this list):
    • pip install tensorflow-macos
    • pip install tensorflow-metal
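
To confirm the installation worked and that the Metal-backed GPU is visible, a short check like the one below should be enough (tf.config.list_physical_devices is standard TensorFlow API; the exact device string may vary):

import tensorflow as tf

print(tf.__version__)
# With tensorflow-metal installed, this should list one GPU device on Apple Silicon
print(tf.config.list_physical_devices('GPU'))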

Troubleshooting

  1. 'miniforge3/envs/tensorflow/lib/libcblas.3.dylib' (no such file) or a similar libcblas error.
    Solution: conda install -c conda-forge openblas

  2. /tensorflow/core/framework/tensor.h:880] Check failed: IsAligned() ptr = 0x101511d60
    Solution: I ran into this error with TensorFlow > 2.5.0 in some programs. Pin TensorFlow to version 2.5.0 by reinstalling as follows (a version check is sketched after this list).

    • pip uninstall tensorflow-macos
    • pip uninstall tensorflow-metal
    • conda install -c apple tensorflow-deps==2.5.0 --force-reinstall (optional; try this only if you still get an error)
    • pip install tensorflow-macos==2.5.0
    • pip install tensorflow-metal
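
After reinstalling, you can confirm which wheels actually ended up installed (a minimal sketch using the standard-library importlib.metadata; the package names are the ones used in this guide):

from importlib.metadata import PackageNotFoundError, version

# Print the installed versions of the two wheels used above
for pkg in ('tensorflow-macos', 'tensorflow-metal'):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, 'is not installed')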

Testing (M1 Max, 10-core CPU, 24-core GPU version)

Code:
import tensorflow as tf
import tensorflow_datasets as tfds

DISABLE_GPU = False

if DISABLE_GPU:
    try:
        # Disable all GPUS
        tf.config.set_visible_devices([], 'GPU')
        visible_devices = tf.config.get_visible_devices()
        for device in visible_devices:
            assert device.device_type != 'GPU'
    except (ValueError, RuntimeError):
        # Invalid device or cannot modify virtual devices once initialized.
        pass

print(tf.__version__)

(ds_train, ds_test), ds_info = tfds.load('mnist', split=['train', 'test'], shuffle_files=True, as_supervised=True,
                                         with_info=True)


def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255., label


ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

model.fit(ds_train, epochs=6, validation_data=ds_test)
Batch size = 128
Output (GPU)
462/469 [============================>.] - ETA: 0s - loss: 0.3619 - sparse_categorical_accuracy: 0.9003
469/469 [==============================] - 4s 5ms/step - loss: 0.3595 - sparse_categorical_accuracy: 0.9008 - val_loss: 0.1963 - val_sparse_categorical_accuracy: 0.9432
Epoch 2/6
469/469 [==============================] - 2s 5ms/step - loss: 0.1708 - sparse_categorical_accuracy: 0.9514 - val_loss: 0.1392 - val_sparse_categorical_accuracy: 0.9606
Epoch 3/6
469/469 [==============================] - 2s 5ms/step - loss: 0.1224 - sparse_categorical_accuracy: 0.9651 - val_loss: 0.1233 - val_sparse_categorical_accuracy: 0.9650
Epoch 4/6
469/469 [==============================] - 2s 5ms/step - loss: 0.0956 - sparse_categorical_accuracy: 0.9725 - val_loss: 0.0988 - val_sparse_categorical_accuracy: 0.9696
Epoch 5/6
469/469 [==============================] - 2s 5ms/step - loss: 0.0766 - sparse_categorical_accuracy: 0.9780 - val_loss: 0.0875 - val_sparse_categorical_accuracy: 0.9727
Epoch 6/6
469/469 [==============================] - 2s 5ms/step - loss: 0.0633 - sparse_categorical_accuracy: 0.9813 - val_loss: 0.0842 - val_sparse_categorical_accuracy: 0.9745
Output (without GPU)
469/469 [==============================] - 2s 1ms/step - loss: 0.3598 - sparse_categorical_accuracy: 0.9013 - val_loss: 0.1970 - val_sparse_categorical_accuracy: 0.9427
Epoch 2/6
469/469 [==============================] - 0s 933us/step - loss: 0.1705 - sparse_categorical_accuracy: 0.9511 - val_loss: 0.1449 - val_sparse_categorical_accuracy: 0.9589
Epoch 3/6
469/469 [==============================] - 0s 936us/step - loss: 0.1232 - sparse_categorical_accuracy: 0.9642 - val_loss: 0.1146 - val_sparse_categorical_accuracy: 0.9655
Epoch 4/6
469/469 [==============================] - 0s 925us/step - loss: 0.0955 - sparse_categorical_accuracy: 0.9725 - val_loss: 0.1007 - val_sparse_categorical_accuracy: 0.9690
Epoch 5/6
469/469 [==============================] - 0s 946us/step - loss: 0.0774 - sparse_categorical_accuracy: 0.9781 - val_loss: 0.0890 - val_sparse_categorical_accuracy: 0.9732
Epoch 6/6
469/469 [==============================] - 0s 971us/step - loss: 0.0647 - sparse_categorical_accuracy: 0.9811 - val_loss: 0.0844 - val_sparse_categorical_accuracy: 0.9752

At this point, training on the CPU is noticeably faster (roughly 5x) than on the GPU, but don't be discouraged by that; next we'll run with larger batches.

Batch size = 1024
Output (GPU)
58/59 [============================>.] - ETA: 0s - loss: 0.4862 - sparse_categorical_accuracy: 0.8680
59/59 [==============================] - 2s 11ms/step - loss: 0.4839 - sparse_categorical_accuracy: 0.8686 - val_loss: 0.2269 - val_sparse_categorical_accuracy: 0.9362
Epoch 2/6
59/59 [==============================] - 0s 8ms/step - loss: 0.1964 - sparse_categorical_accuracy: 0.9442 - val_loss: 0.1610 - val_sparse_categorical_accuracy: 0.9543
Epoch 3/6
59/59 [==============================] - 1s 9ms/step - loss: 0.1408 - sparse_categorical_accuracy: 0.9605 - val_loss: 0.1292 - val_sparse_categorical_accuracy: 0.9624
Epoch 4/6
59/59 [==============================] - 1s 9ms/step - loss: 0.1067 - sparse_categorical_accuracy: 0.9707 - val_loss: 0.1055 - val_sparse_categorical_accuracy: 0.9687
Epoch 5/6
59/59 [==============================] - 1s 9ms/step - loss: 0.0845 - sparse_categorical_accuracy: 0.9767 - val_loss: 0.0912 - val_sparse_categorical_accuracy: 0.9723
Epoch 6/6
59/59 [==============================] - 1s 9ms/step - loss: 0.0683 - sparse_categorical_accuracy: 0.9814 - val_loss: 0.0827 - val_sparse_categorical_accuracy: 0.9747
Output (without GPU)
59/59 [==============================] - 2s 15ms/step - loss: 0.4640 - sparse_categorical_accuracy: 0.8739 - val_loss: 0.2280 - val_sparse_categorical_accuracy: 0.9338
Epoch 2/6
59/59 [==============================] - 1s 12ms/step - loss: 0.1962 - sparse_categorical_accuracy: 0.9450 - val_loss: 0.1626 - val_sparse_categorical_accuracy: 0.9537
Epoch 3/6
59/59 [==============================] - 1s 12ms/step - loss: 0.1411 - sparse_categorical_accuracy: 0.9602 - val_loss: 0.1304 - val_sparse_categorical_accuracy: 0.9613
Epoch 4/6
59/59 [==============================] - 1s 12ms/step - loss: 0.1091 - sparse_categorical_accuracy: 0.9700 - val_loss: 0.1020 - val_sparse_categorical_accuracy: 0.9698
Epoch 5/6
59/59 [==============================] - 1s 12ms/step - loss: 0.0864 - sparse_categorical_accuracy: 0.9764 - val_loss: 0.0912 - val_sparse_categorical_accuracy: 0.9716
Epoch 6/6
59/59 [==============================] - 1s 12ms/step - loss: 0.0697 - sparse_categorical_accuracy: 0.9812 - val_loss: 0.0834 - val_sparse_categorical_accuracy: 0.9749

As you can see, at this batch size the GPU is now faster (about 1.3x) than the CPU. Increasing the batch size significantly improves GPU performance.
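
The graph below was generated by timing training runs at different batch sizes. A minimal sketch of such a benchmark (assuming the same MNIST model and input pipeline as the code above; the batch sizes looped over here are illustrative) could look like this:

import time

import tensorflow as tf
import tensorflow_datasets as tfds


def build_datasets(batch_size):
    # Same MNIST pipeline as above, parameterized by batch size
    (ds_train, ds_test), ds_info = tfds.load('mnist', split=['train', 'test'], shuffle_files=True,
                                             as_supervised=True, with_info=True)

    def normalize_img(image, label):
        return tf.cast(image, tf.float32) / 255., label

    ds_train = (ds_train.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
                .cache()
                .shuffle(ds_info.splits['train'].num_examples)
                .batch(batch_size)
                .prefetch(tf.data.AUTOTUNE))
    ds_test = (ds_test.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
               .batch(batch_size)
               .cache()
               .prefetch(tf.data.AUTOTUNE))
    return ds_train, ds_test


def build_model():
    # Same small dense network as above
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
    )
    return model


for batch_size in (128, 256, 512, 1024):
    ds_train, ds_test = build_datasets(batch_size)
    model = build_model()
    start = time.time()
    model.fit(ds_train, epochs=6, validation_data=ds_test, verbose=0)
    print(f'batch_size={batch_size}: {time.time() - start:.1f}s total')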

Fig: Runtime graph for various batch sizes.

