@onuryartasi
Last active June 30, 2023 13:36
Google Colab Using GPU with Tensorflow version 1.0.0
@Hoggaan

Hoggaan commented May 12, 2020

I just tried to train on the same dataset that was used in the course and it did well. When it comes to my own chatbot, I prepared a dataset of at least 60 questions and answers and applied the data preprocessing to it. But the models are not being saved when trained. That's my problem, sir. Any help?
[image: pk qaas]

@onuryartasi
Author

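That is your trained model. Use this method to restore it:

import tensorflow as tf  # TensorFlow 1.x API

# Build the model graph before this point so the Saver can find its variables.
checkpoint = "./chatbot_weights.ckpt"
session = tf.InteractiveSession()
session.run(tf.global_variables_initializer())
saver = tf.train.Saver()
# Overwrite the freshly initialized values with the saved weights.
saver.restore(session, checkpoint)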

@Hoggaan

Hoggaan commented May 12, 2020

Sorry dude. It seems we misunderstood each other a bit. I organized my own dataset to train a chatbot. I was following the Udemy course step by step; I already shared its link. Whenever I use the Cornell movie dataset from the course, everything works well. However, when I try to use my own dataset, things do not work properly: the trained models for my dataset are not saved. Can you help with that?

Man, this is a huge problem for me. It's my final-year project and time is running out.

If you can help, I can share the notebook and the dataset so you can check for yourself. Thank you!

@onuryartasi
Author

Because your training set is too small, the training process stops before it saves a checkpoint. Can you share your code or notebook?
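
One way a small dataset can end up with no checkpoint, sketched under assumptions: suppose the training loop only iterates over full batches and only saves the model at a validation check that fires at certain batch indices. Neither detail is confirmed in this thread. The names `full_batches`, `checkpoint_reached`, and `check_every` are hypothetical, and the batch size of 64 is just an example value. With 60 question/answer pairs and a batch larger than that, the loop body never runs, so nothing is saved.

```python
def full_batches(num_examples, batch_size):
    """Number of complete batches when a partial final batch is dropped."""
    return num_examples // batch_size


def checkpoint_reached(num_examples, batch_size, check_every):
    """True if some batch index would trigger the (assumed) validation check.

    Assumption: the loop saves only when batch_index > 0 and
    batch_index % check_every == 0.
    """
    n_batches = full_batches(num_examples, batch_size)
    return any(i > 0 and i % check_every == 0 for i in range(n_batches))


if __name__ == "__main__":
    # 60 pairs with a hypothetical batch size of 64: zero full batches.
    print(full_batches(60, 64), checkpoint_reached(60, 64, check_every=1))
    # Shrinking the batch size gives the loop something to iterate over.
    print(full_batches(60, 8), checkpoint_reached(60, 8, check_every=2))
```

If the loop in your notebook behaves like this, a smaller batch size or an unconditional save at the end of training may be enough, but check the actual loop first.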

@denizOgut

Hello Mr. Onur, I need to train the chatbot model from Udemy but I am getting an error. Could you help? How can I reach you?

@onuryartasi
Author

You can contact me here: https://www.linkedin.com/in/onuryartasi

@daniel-kollanyi

daniel-kollanyi commented Jun 24, 2020

Thank you, you have saved me a lot of time and unnecessary stressing on training the model! I have never used Google Colab before, so maybe it's a stupid question but it seems to be using almost all of the GPU RAM before I can even start to train the network:

Gen RAM Free: 12.3 GB | Proc size: 457.6 MB
GPU RAM Free: 567MB | Used: 10874MB | Util 95% | Total 11441MB

Eventually, the training dies with ResourceExhaustedError. I have tried to restart the session many times and checked the GPU RAM usage and it's around 500MB (used).
Do you have any idea why it behaves this way? Any guess would be appreciated! Thanks!

@daniel-kollanyi

Found the issue, it seems to be a Colab bug: https://stackoverflow.com/questions/48750199/google-colaboratory-misleading-information-about-its-gpu-only-5-ram-available
After some runtime restart, it seems to be working fine. Thanks again @onuryartasi for the Colab config!

@onuryartasi
Author

Hi, some Google Colab users only get access to part of the GPU. Usually that part is about 5% of the whole GPU.
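
The arithmetic behind that readout can be checked with a few lines of Python. This is only a sketch: `parse_gpu_readout` and `percent` are hypothetical helper names, and the line format is assumed from the readout posted earlier in this thread. It shows that the free share in that readout is about 5% of the card, in line with the figure mentioned here.

```python
import re

READOUT = "GPU RAM Free: 567MB | Used: 10874MB | Util 95% | Total 11441MB"


def parse_gpu_readout(line):
    """Extract the Free, Used and Total figures (in MB) from a readout line."""
    stats = {}
    for name in ("Free", "Used", "Total"):
        # The colon is optional because "Total" appears without one.
        match = re.search(name + r":?\s*(\d+)\s*MB", line)
        if match is None:
            raise ValueError("missing field: " + name)
        stats[name.lower()] = int(match.group(1))
    return stats


def percent(part, total):
    """Share of the total, as a percentage."""
    return 100.0 * part / total


if __name__ == "__main__":
    stats = parse_gpu_readout(READOUT)
    print(stats)
    print("used %:", round(percent(stats["used"], stats["total"]), 1))
    print("free %:", round(percent(stats["free"], stats["total"]), 1))
```

Running the same check right after a runtime restart is a quick way to see whether the session really started with most of the memory already taken.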

@joelpelo

I get this error. Is there a different site where I can get a working link for this package?

!wget http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb
!dpkg -i --force-overwrite libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb

--2020-12-25 17:24:16-- http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb
Resolving archive.ubuntu.com (archive.ubuntu.com)... 91.189.88.142, 91.189.88.152, 2001:67c:1360:8001::24, ...
Connecting to archive.ubuntu.com (archive.ubuntu.com)|91.189.88.142|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-12-25 17:24:16 ERROR 404: Not Found.

dpkg: error: cannot access archive 'libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb': No such file or directory

@onuryartasi
Author

Hello,
You can download the package from here: http://launchpadlibrarian.net/373093738/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb
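
If you would rather script the fallback, here is a minimal sketch in Python that tries the archive URL first and then the Launchpad one. `deb_filename` and `download_first_available` are hypothetical helpers, not part of the gist; the `fetch` parameter exists only so the logic can be exercised without network access. Nothing here guarantees that either URL stays online.

```python
import os
from urllib.error import URLError
from urllib.parse import unquote, urlparse
from urllib.request import urlretrieve

# The two locations mentioned in this thread, in the order to try them.
CANDIDATE_URLS = [
    "http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb",
    "http://launchpadlibrarian.net/373093738/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb",
]


def deb_filename(url):
    """Local file name for a package URL (last component of the URL path)."""
    return os.path.basename(unquote(urlparse(url).path))


def download_first_available(urls, fetch=urlretrieve):
    """Try each URL in order; return the saved file name, or None if all fail."""
    for url in urls:
        target = deb_filename(url)
        try:
            fetch(url, target)
            return target
        except (URLError, OSError):
            # A 404 raises HTTPError, which is a subclass of URLError.
            continue
    return None
```

In Colab you would call `download_first_available(CANDIDATE_URLS)` and then install the returned file with the same `!dpkg -i --force-overwrite` command used earlier in this thread.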

@adnvenkatesh

Hi... I followed all the instructions mentioned in the course and started training using the GPU. I have trained for 100 epochs using different learning rates, but the minimum weights are recorded at 10 epochs only, so the weights are not getting updated. Can you share the number of epochs it took for the error to converge for you? And can I find the already trained weights from the course online?

@onuryartasi
Author

Hi, almost 3 years have passed and I don't remember which learning rate or number of epochs I used.

@Hoggaan

Hoggaan commented Nov 10, 2021 via email

@adnvenkatesh

Thank you for the reply. Can you please suggest where the problem might be? It does not look like the error will converge anytime soon. I have checked with different learning rates; this graph is for learning rate 0.001. If I increase the learning rate, the gradient explodes. I tried decreasing it, but the error still does not converge. Should I just keep training for a large number of epochs, like 500, and wait patiently?
[image: WhatsApp Image 2021-11-10 at 23 52 19]
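
As a toy illustration of why a larger learning rate can blow up while a smaller one converges, here is plain gradient descent on f(x) = x². This is not the chatbot model and not the course code; `gradient_descent` is a hypothetical helper and the learning rates are chosen only to show the two regimes. On this function each step multiplies x by (1 - 2·lr), so the iterate shrinks when that factor has magnitude below 1 and grows otherwise.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Run plain gradient descent on f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # standard update rule
    return x


if __name__ == "__main__":
    # Very small rate: barely moves. Moderate rate: converges. Large rate: diverges.
    for lr in (0.001, 0.1, 1.1):
        print(lr, gradient_descent(lr))
```

A real seq2seq loss is far less well behaved, so treat this only as intuition for the learning-rate sensitivity described in the comment, not as a diagnosis.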

@onuryartasi
Author

Yeah. Thank you for the feedback. At that time I was working on my final project. Now I am doing a master's in AI. :)

I am very happy, I wish you success.

@onuryartasi
Author

onuryartasi commented Nov 10, 2021

Sorry, I'm not working with machine learning anymore. Whatever I say will be wrong.

@adnvenkatesh

Thank you for the reply anyhow.

@adnvenkatesh

Can you please help me out with my problem if you can? This is now my final-year project.

@MohammadHarisZia

Hi there, does it still work? I am getting an error that Colab doesn't support TensorFlow 1.0 anymore, despite downgrading to CUDA 8.0.

@adnvenkatesh

I could not make it work in Google Colab. I got the same error as you. I trained it manually on my local machine.

@MohammadHarisZia

@adnvenkatesh can you tell me whether the responses were any good? Also, I have an XPS 15 9570, which has thermal limitations. Can you send me your ckpt files just to test it?

@adnvenkatesh

As I mentioned in the discussion above, the error did not converge at all. The responses were very bad, just repetitions of random strings. I have read in various sources about what the problem might be. Some sources said that the Adam optimizer has convergence problems, so I implemented a gradient descent optimizer, made some changes, and got this to work.

@MohammadHarisZia

Can you kindly share your code? I just need to understand the issues. Anyway, a million thanks. The help means a lot, tbh.

@adnvenkatesh

adnvenkatesh commented Jan 22, 2022

Somewhere on GitHub, if I remember correctly, there are some very good implementations of the gradient descent optimizer in this context, along with bidirectional encoder layers. Try using them to understand it. They are far clearer. My code was too messy for you to understand. 😅

@Sudar88

Sudar88 commented Feb 11, 2022

Hey all,
Has anyone recently managed to train this model on Colab by following the given instructions? https://www.udemy.com/course/chatbot/

@MohammadHarisZia

Hi there @Sudar88, unfortunately I tried a lot but could not make it work.

@Sudar88

Sudar88 commented Feb 11, 2022

@MohammadHarisZia thanks for the answer. After many attempts I was not able to train this model by following the given instructions.
If anyone has found a solution, I hope they will respond.
