1 Physical GPUs, 1 Logical GPU
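
The line above is the standard TensorFlow GPU-detection printout. The script itself is not part of this gist; a minimal sketch of the kind of setup code that produces such a line (device handling here is an assumption):

import tensorflow as tf

# Enable memory growth on each physical GPU and report what was found.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
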
> Preprocessing Spotify dataset. This process may take several hours.
>> Checking if the Spotify MPD is already preprocessed.
>> Preprocessed version was found. Skipping stage.
> Loading Spotify MPD dataset.
>> Dataset loaded.
> Splitting dataset
>> Sampling 10% of dataset
>> Splitting sample with 20% for testing
>> Creating Vocab.
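
The splitting script is not included in the gist; the three lines above correspond to sampling 10% of the playlists, holding out 20% of them for testing, and building a track vocabulary. A hedged sketch of that stage (file, column, and variable names are assumptions):

import pandas as pd

interactions = pd.read_csv("spotify_mpd_preprocessed.csv")          # assumed preprocessed MPD file

# ">> Sampling 10% of dataset": keep 10% of the playlists.
sampled_ids = interactions["playlist_id"].drop_duplicates().sample(frac=0.10, random_state=42)
sample = interactions[interactions["playlist_id"].isin(sampled_ids)]

# ">> Splitting sample with 20% for testing": hold out 20% of the sampled playlists.
n_test = int(len(sampled_ids) * 0.20)
test_ids = set(sampled_ids.iloc[:n_test])
test = sample[sample["playlist_id"].isin(test_ids)]
train = sample[~sample["playlist_id"].isin(test_ids)]

# ">> Creating Vocab.": one entry per unique track in the sample.
vocab = sorted(sample["track_uri"].unique())
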
> Building tensorflow datasets.
>> Building training and validation dataset. This process may take some time.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 72415/72415 [00:09<00:00, 7815.94it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 72415/72415 [00:09<00:00, 7921.91it/s]
>> Building testing dataset. This process may take a long time.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18104/18104 [00:02<00:00, 8216.06it/s]
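
The two 72415-item bars correspond to the training/validation playlists and the 18104-item bar to the test playlists; 72415 sequences over the 2263 steps per epoch reported below implies a batch size of about 32. A rough sketch of how such sequences are commonly packed into tf.data datasets (padding scheme and names are assumptions):

import numpy as np
import tensorflow as tf
from tqdm import tqdm

MAX_LEN = 200                                            # matches max_seq_length=200 in the CudnnRNN config below
BATCH_SIZE = 32                                          # inferred from 72415 sequences / 2263 steps per epoch
track_to_id = {t: i + 1 for i, t in enumerate(vocab)}    # vocab from the sketch above; 0 is reserved for padding

def to_sequences(df):
    sequences = []
    for _, group in tqdm(df.groupby("playlist_id")):
        ids = [track_to_id[t] for t in group["track_uri"]][:MAX_LEN]
        sequences.append(ids + [0] * (MAX_LEN - len(ids)))   # right-pad every playlist to MAX_LEN
    return np.array(sequences, dtype=np.int32)

train_ds = tf.data.Dataset.from_tensor_slices(to_sequences(train)).batch(BATCH_SIZE)
test_ds = tf.data.Dataset.from_tensor_slices(to_sequences(test)).batch(BATCH_SIZE)
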
Creating and compiling model
> Retrieving candidates. This process takes more time as the number of unique items increases.
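
The traceback at the end of the log points at ../src\models\gru4rec.py and tensorflow_recommenders' factorized_top_k layer, and the CudnnRNN config reports input_size=100 and num_units=100. A rough sketch of a GRU4Rec-style retrieval model and the brute-force candidate index that the "Retrieving candidates" step refers to (layer choices and hyperparameters beyond those two numbers are assumptions):

import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs

EMBEDDING_DIM = 100                                  # input_size=100 in the CudnnRNN config
GRU_UNITS = 100                                      # num_units=100 in the CudnnRNN config
vocab_size = len(track_to_id) + 1                    # track_to_id from the sketch above

# Session tower: embed the track sequence and encode it with a GRU.
query_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, EMBEDDING_DIM, mask_zero=True),
    tf.keras.layers.GRU(GRU_UNITS),
])
# Candidate tower: one embedding per track.
candidate_model = tf.keras.layers.Embedding(vocab_size, GRU_UNITS)

# "Retrieving candidates": embed every unique track and index it for top-k lookup.
candidate_ids = tf.data.Dataset.from_tensor_slices(np.arange(vocab_size, dtype=np.int32))
index = tfrs.layers.factorized_top_k.BruteForce(query_model)
index.index(candidate_ids.batch(1024).map(candidate_model),
            candidate_ids.batch(1024))
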
Fitting model
Epoch 1/10
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
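
The two warnings above appear to come from the streaming FactorizedTopK metric that tensorflow_recommenders attaches to the retrieval task: it keeps a non-trainable int32 "counter" variable, so the optimizer reports missing gradients for it, and the warnings can normally be ignored. One common way to avoid the per-batch metric overhead (and the noise) is to skip metric computation on training batches; a hedged sketch, assuming the gist's gru4rec.py subclasses tfrs.Model with query/candidate towers and a Retrieval task:

def compute_loss(self, features, training=False):
    # Hypothetical feature names; the real input structure is not shown in the gist.
    session_embeddings = self.query_model(features["context"])
    target_embeddings = self.candidate_model(features["target"])
    # compute_metrics is an argument of tfrs.tasks.Retrieval.call; skipping it
    # during training avoids the streaming top-k pass on every batch.
    return self.task(session_embeddings, target_embeddings,
                     compute_metrics=not training)
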
1/2263 [..............................] - ETA: 0s - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.9073 - regularization_loss: 0.0000e+00 - total_loss: 110.9073
WARNING:tensorflow:From C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py:1277: stop (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use `tf.profiler.experimental.stop` instead.
2/2263 [..............................] - ETA: 2:45:22 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.8951 - regularization_loss: 0.0000e+00 - total_loss: 110.8951
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 1.7244s vs `on_train_batch_end` time: 7.0526s). Check your callbacks.
3/2263 [..............................] - ETA: 2:08:43 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.8939 - regularization_loss: 0.0000e+00 - total_loss: 110.8939
4/2263 [..............................] - ETA: 1:46:16 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.8943 - regularization_loss: 0.0000e+00 - total_loss: 110.8943
5/2263 [..............................] - ETA: 1:32:50 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.9021 - regularization_loss: 0.0000e+00 - total_loss: 110.9021
6/2263 [..............................] - ETA: 1:23:52 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.9047 - regularization_loss: 0.0000e+00 - total_loss: 110.9047
7/2263 [..............................] - ETA: 1:17:32 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.9015 - regularization_loss: 0.0000e+00 - total_loss: 110.9015
8/2263 [..............................] - ETA: 1:12:43 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.9008 - regularization_loss: 0.0000e+00 - total_loss: 110.9008
9/2263 [..............................] - ETA: 1:09:00 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.9005 - regularization_loss: 0.0000e+00 - total_loss: 110.9005
10/2263 [..............................] - ETA: 1:05:59 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.8975 - regularization_loss: 0.0000e+00 - total_loss: 110.8975
11/2263 [..............................] - ETA: 1:03:32 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.8981 - regularization_loss: 0.0000e+00 - total_loss: 110.8981
12/2263 [..............................] - ETA: 1:01:30 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.8984 - regularization_loss: 0.0000e+00 - total_loss: 110.8984
13/2263 [..............................] - ETA: 59:45 - top_k_categorical_accuracy: 0.0000e+00 - loss: 110.8984 - regularization_loss: 0.0000e+00 - total_loss: 110.8984
2263/2263 [==============================] - 3782s 2s/step - top_k_categorical_accuracy: 4.0047e-04 - loss: 108.2779 - regularization_loss: 0.0000e+00 - total_loss: 108.2779 - val_top_k_categorical_accuracy: 2.4857e-04 - val_loss: 94.5910 - val_regularization_loss: 0.0000e+00 - val_total_loss: 94.5910
Epoch 2/10
2263/2263 [==============================] - 3780s 2s/step - top_k_categorical_accuracy: 0.0015 - loss: 92.2767 - regularization_loss: 0.0000e+00 - total_loss: 92.2767 - val_top_k_categorical_accuracy: 9.1141e-04 - val_loss: 89.1521 - val_regularization_loss: 0.0000e+00 - val_total_loss: 89.1521
Epoch 3/10
2263/2263 [==============================] - 3791s 2s/step - top_k_categorical_accuracy: 0.0096 - loss: 59.0481 - regularization_loss: 0.0000e+00 - total_loss: 59.0481 - val_top_k_categorical_accuracy: 0.0014 - val_loss: 93.6735 - val_regularization_loss: 0.0000e+00 - val_total_loss: 93.6735
Epoch 4/10
2263/2263 [==============================] - 12196s 5s/step - top_k_categorical_accuracy: 0.0445 - loss: 29.7369 - regularization_loss: 0.0000e+00 - total_loss: 29.7369 - val_top_k_categorical_accuracy: 0.0020 - val_loss: 103.1409 - val_regularization_loss: 0.0000e+00 - val_total_loss: 103.1409
Epoch 5/10
2263/2263 [==============================] - 3757s 2s/step - top_k_categorical_accuracy: 0.1330 - loss: 13.9861 - regularization_loss: 0.0000e+00 - total_loss: 13.9861 - val_top_k_categorical_accuracy: 0.0024 - val_loss: 106.7716 - val_regularization_loss: 0.0000e+00 - val_total_loss: 106.7716
Epoch 6/10
2263/2263 [==============================] - 3762s 2s/step - top_k_categorical_accuracy: 0.2286 - loss: 7.2852 - regularization_loss: 0.0000e+00 - total_loss: 7.2852 - val_top_k_categorical_accuracy: 0.0027 - val_loss: 117.1357 - val_regularization_loss: 0.0000e+00 - val_total_loss: 117.1357
Epoch 7/10
2263/2263 [==============================] - 3758s 2s/step - top_k_categorical_accuracy: 0.2817 - loss: 4.2511 - regularization_loss: 0.0000e+00 - total_loss: 4.2511 - val_top_k_categorical_accuracy: 0.0030 - val_loss: 119.9953 - val_regularization_loss: 0.0000e+00 - val_total_loss: 119.9953
Epoch 8/10
2263/2263 [==============================] - 6142s 3s/step - top_k_categorical_accuracy: 0.3183 - loss: 2.7132 - regularization_loss: 0.0000e+00 - total_loss: 2.7132 - val_top_k_categorical_accuracy: 0.0028 - val_loss: 123.1363 - val_regularization_loss: 0.0000e+00 - val_total_loss: 123.1363
Epoch 9/10
2263/2263 [==============================] - 3781s 2s/step - top_k_categorical_accuracy: 0.3406 - loss: 1.8537 - regularization_loss: 0.0000e+00 - total_loss: 1.8537 - val_top_k_categorical_accuracy: 0.0029 - val_loss: 127.1309 - val_regularization_loss: 0.0000e+00 - val_total_loss: 127.1309
Epoch 10/10
2263/2263 [==============================] - 3799s 2s/step - top_k_categorical_accuracy: 0.3582 - loss: 1.3512 - regularization_loss: 0.0000e+00 - total_loss: 1.3512 - val_top_k_categorical_accuracy: 0.0031 - val_loss: 131.3763 - val_regularization_loss: 0.0000e+00 - val_total_loss: 131.3763
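
Over the ten epochs, training top-k accuracy climbs from roughly 4e-4 to 0.36 while validation loss bottoms out at 89.2 in epoch 2 and then rises steadily to 131.4, a textbook overfitting pattern. Early stopping on the validation loss is one standard mitigation (not part of the original run); a sketch assuming the model and datasets from the earlier sketches:

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=2,
    restore_best_weights=True)       # would roll back to the epoch-2 weights in a run like this one

model.fit(train_ds,
          validation_data=val_ds,    # val_ds is assumed from the earlier split
          epochs=10,
          callbacks=[early_stop])
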
Starting Evaluation
566/566 [==============================] - 357s 631ms/step - top_k_categorical_accuracy: 0.0018 - loss: 131.8116 - regularization_loss: 0.0000e+00 - total_loss: 131.8116
2020-11-03 21:40:28.665119: W tensorflow/core/common_runtime/bfc_allocator.cc:431] Allocator (GPU_0_bfc) ran out of memory trying to allocate 8.09GiB (rounded to 8689920000)requested by op CudnnRNN
Current allocation summary follows.
2020-11-03 21:40:28.671396: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *_*********************************************************************_____________________________
2020-11-03 21:40:28.674661: E tensorflow/stream_executor/dnn.cc:616] OOM when allocating tensor with shape[2172480000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-11-03 21:40:28.677792: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cudnn_rnn_ops.cc:1517 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 100, 100, 1, 200, 18104, 0]
Traceback (most recent call last):
File "run.py", line 86, in <module>
pred = model.predict_batch(x_test, n=100)
File "../src\models\gru4rec.py", line 60, in predict_batch
_, recommended_items = index(user_history, k=n)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 985, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow_recommenders\layers\factorized_top_k.py", line 371, in call
queries = self.query_model(queries)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 985, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\engine\sequential.py", line 372, in call
return super(Sequential, self).call(inputs, training=training, mask=mask)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 386, in call
inputs, training=training, mask=mask)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 508, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 663, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 985, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 441, in call
inputs, initial_state, training, mask, row_lengths)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 496, in _defun_gru_call
last_output, outputs, new_h, runtime = gpu_gru(**gpu_gru_kwargs)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 656, in gpu_gru
rnn_mode='gru')
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\ops\gen_cudnn_rnn_ops.py", line 103, in cudnn_rnn
ctx=_ctx)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\ops\gen_cudnn_rnn_ops.py", line 180, in cudnn_rnn_eager_fallback
attrs=_attrs, ctx=ctx, name=name)
File "C:\Users\lmd-pc-03\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 100, 100, 1, 200, 18104, 0] [Op:CudnnRNN]
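
The failure is not in training but in evaluation-time retrieval: run.py calls model.predict_batch(x_test, n=100), and predict_batch pushes all 18,104 test sessions through the GRU query tower in a single call (batch_size 18104 in the CudnnRNN config above), which needs an ~8.1 GiB workspace the GPU cannot allocate. A hedged workaround sketch that chunks the queries before hitting the index (the chunk size and helper name are assumptions; index and x_test are the objects named in the traceback):

import tensorflow as tf

def predict_in_chunks(index, x_test, n=100, chunk_size=512):
    recommendations = []
    for start in range(0, x_test.shape[0], chunk_size):
        chunk = x_test[start:start + chunk_size]
        _, recommended_items = index(chunk, k=n)     # same call as in gru4rec.predict_batch
        recommendations.append(recommended_items)
    return tf.concat(recommendations, axis=0)
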