@korakot
Last active November 21, 2024 06:30
Use selenium in Colab
# install Google Chrome, chromedriver, and selenium
!apt update
!apt install -y libu2f-udev libvulkan1
!wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
!dpkg -i google-chrome-stable_current_amd64.deb
!wget https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/118.0.5993.70/linux64/chromedriver-linux64.zip
!unzip -j chromedriver-linux64.zip chromedriver-linux64/chromedriver -d /usr/local/bin/
!pip install selenium chromedriver_autoinstaller
# set options to be headless, ..
from selenium import webdriver
import chromedriver_autoinstaller
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
chromedriver_autoinstaller.install()
# open it, go to a website, and get results
wd = webdriver.Chrome(options=options)
wd.get("https://www.website.com")
print(wd.page_source) # results
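# from selenium.webdriver.common.by import By
# divs = wd.find_elements(By.CSS_SELECTOR, 'div')  # find_elements_by_css_selector was removed in Selenium 4.3
# I created my own library to make it even easier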
!pip install kora -q
from kora.selenium import wd
wd.get("https://www.website.com")
print(wd.page_source) # results
# I added a few helpers
divs = wd.select("div") # CSS selector
div = divs[0]
span = div.select1("span") # return the first result
wd # screenshot
@hyunsikhwang

thanks a lot!

@chetan7116desai

When I run the code, it does not open the provided URL after it completes; however, the page source is printed. Can you help me with this?

@Sagargajare

When I run the code, it does not open the provided URL after it completes; however, the page source is printed. Can you help me with this?

This chromedriver is running in headless mode, so it runs in the background. You can disable headless mode.

@murgado

murgado commented Jul 16, 2020

Thank you very much mate! I was mad looking for this

@ashish25ece

After running the above code, it throws this error. Could you please help me with this?

Message: unknown error: Chrome failed to start: crashed.
(unknown error: DevToolsActivePort file doesn't exist)

@hun-park

hun-park commented Sep 3, 2020

I really appreciate your work!

@guiraojpg

You are the best!

@rohanaggarwal45

How can we download files using Selenium in Google Colab notebooks?

@rohanaggarwal45

Please help

@swankyshahir

When I run the code, it does not open the provided URL after it completes; however, the page source is printed. Can you help me with this?

This chromedriver is running in headless mode, so it runs in the background. You can disable headless mode.

Could you show the code to disable headless mode so that it shows the page at the URL? Thanks.

@murgado

murgado commented May 11, 2021

I will reply to this, hope it helps you.

When using Google Colab you must run Selenium in headless mode, because Colab cannot display a new browser window. Therefore, the only way to run it without headless mode is on your own device, creating your webdriver without the headless option.

If you still want to run it in headless mode, either because you are using Colab or because you plan to use it in a server-side environment, there is still a way to print the page's URL:

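# Consider wb as your webdriver (driverPath and chrome_options are defined beforehand)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Selenium 4 takes the driver path through a Service object instead of executable_path
wb = webdriver.Chrome(service=Service(driverPath), options=chrome_options)

# Display the webdriver's current URL
print(wb.current_url)

# Display the title of the page your webdriver is visiting
print(wb.title)

# Display the HTML of the page your webdriver is visiting
print(wb.page_source)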
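
The three properties above can be gathered in one small helper. This is only a sketch: `describe_page` and `fake_driver` are hypothetical names introduced here for illustration, and the stand-in object exists so the snippet runs without a browser. With a real session you would pass `wb` instead.

```python
from types import SimpleNamespace


def describe_page(driver, preview_chars=200):
    """Collect what a headless session can report about the current page.

    `driver` is any object exposing `current_url`, `title`, and
    `page_source`, which Selenium webdriver instances do.
    """
    html = driver.page_source
    return {
        "url": driver.current_url,
        "title": driver.title,
        "html_preview": html[:preview_chars],  # first characters only
        "html_length": len(html),
    }


# Stand-in object (an assumption for this sketch) so the code runs
# without Chrome; in Colab, pass the real webdriver instead.
fake_driver = SimpleNamespace(
    current_url="https://www.website.com/",
    title="Example page",
    page_source="<html><body><div>hello</div></body></html>",
)

info = describe_page(fake_driver)
for key, value in info.items():
    print(f"{key}: {value}")
```

If you need to see the rendered page rather than its HTML, Selenium's `save_screenshot("page.png")` writes an image file that can then be displayed in the notebook.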

@abdoo13

abdoo13 commented Jun 12, 2021

Useful. That really worked for me.

@Shivani29sheth

Thank you, that was really helpful!

@JorgeSantosJ

How can I add other webdriver options when using kora?

@korakot
Author

korakot commented Jul 26, 2021

You can create your own wd

from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# create a webdriver instance, ready to use
wd = webdriver.Chrome(options=options)  # Selenium 4 no longer accepts the driver path as a positional argument

Kora just helps with installation and creates a default wd for you.
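
One way to add further options on top of the three flags above is sketched below. `DEFAULT_ARGS`, `merge_args`, and `make_driver` are hypothetical helper names, not part of kora or Selenium, and `--window-size=1280,800` is just an example of an extra Chrome flag. Selenium is imported inside `make_driver` so that the flag-merging part can be tried without Chrome installed.

```python
# Flags taken from the snippet above.
DEFAULT_ARGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def merge_args(extra_args=()):
    """Return the default flags followed by any extra ones, without duplicates."""
    merged = []
    for arg in [*DEFAULT_ARGS, *extra_args]:
        if arg not in merged:
            merged.append(arg)
    return merged


def make_driver(extra_args=()):
    """Create a Chrome webdriver with the merged flags.

    Requires selenium, Chrome, and a matching chromedriver to be installed.
    """
    from selenium import webdriver  # lazy import: merge_args works without selenium

    options = webdriver.ChromeOptions()
    for arg in merge_args(extra_args):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


# Only the flag list is printed here; call make_driver(...) in Colab
# once Chrome and selenium are installed.
print(merge_args(["--window-size=1280,800", "--no-sandbox"]))
```

Keeping the defaults in one list means every custom driver still gets the flags Colab needs, while extra flags are passed per call.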

@jiahaoli57

jiahaoli57 commented Sep 24, 2021

Nice job, btw, for those who cannot use the code provided by korakot, use:

from IPython.display import Javascript, display
def open_web():
  url = 'https://github.com/jiahaoli57'
  display(Javascript('window.open("{url}");'.format(url=url)))

@VictorFaraon

Hi Korakot!
I'm trying to use Selenium in Google Colab, but I'm having trouble referring to the chromedriver.exe file.
If I put it in the Colaboratory space, I get the message "'chromedriver.exe' executable may have wrong permissions."
And if I refer to the file on my computer, the error is: Message: "'C:\chromedriver\chromedriver.exe' executable needs to be in PATH."
I also failed to use webdriver-manager in the Colab environment.
Can you help me with this?

@Ezra-Cohen

I just want to say thank you, I was trying everything I could for a project I was working on, nothing worked until I came across this, you are genuinely amazing for making this

@korakot
Author

korakot commented Apr 28, 2022

For reference, this method was first discovered here in Dec 2018.

@GColab2023

I'm getting the error below when using Selenium with Chrome in Google Colab.

Code:

# install chromium, its driver, and selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
# set options to be headless, ..
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# create a webdriver instance, ready to use
wd = webdriver.Chrome('chromedriver',options=options)

Error

WebDriverException Traceback (most recent call last)
in
6 options.add_argument('--disable-dev-shm-usage')
7 # create a webdriver instance, ready to use
----> 8 wd = webdriver.Chrome('chromedriver',options=options)

3 frames
/usr/local/lib/python3.8/dist-packages/selenium/webdriver/common/service.py in assert_process_still_running(self)
115 return_code = self.process.poll()
116 if return_code:
--> 117 raise WebDriverException(f"Service {self.path} unexpectedly exited. Status code was: {return_code}")
118
119 def is_connectable(self) -> bool:

WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: 1

Can someone help with this?

@TomekGitHubPrivate

Getting the same error as mentioned above :/

@korakot
Author

korakot commented Mar 26, 2024

I have updated the solution, so it should work again now.
