Selenium is a useful Python library for extracting data from web pages, especially pages that load their content with JavaScript. Many people who try Selenium get stuck during installation. The key thing to remember is that Selenium runs an actual browser in the background (or in the foreground, if you wish) to query a website, so an essential step is to install the browser’s web driver if you haven’t done so already.

Step 1: Locate the right web driver

Since Selenium uses an actual browser, one of the first decisions you’ll need to make is which browser, and therefore which driver, to use. Generally it won’t matter, but the best browser to use is the one that works best with your target website. For example, if your target website works best under Firefox, then use Firefox.

Browser | Supported OS | Maintained by | Download | Issue Tracker
Chromium/Chrome | Windows/macOS/Linux | Google | Downloads | Issues
Firefox | Windows/macOS/Linux | Mozilla | Downloads | Issues
Edge | Windows 10 | Microsoft | Downloads | Issues
Internet Explorer | Windows | Selenium Project | Downloads | Issues
Opera | Windows/macOS/Linux | Opera | Downloads | Issues

Decide which browser to use, then go to its download page. This example uses Firefox. In the table above, the Firefox download link goes to this page: https://github.com/mozilla/geckodriver/releases

On that page, click on the latest release, then scroll down to the bottom of the release page to see the list of driver files.

Right-click the .tar.gz file that matches your operating system and copy the link address.
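If you are unsure which file matches your machine, a small Python check can help. The sketch below is only an illustration: the `platform_tag` helper is hypothetical, and the mapping is an assumption based on how geckodriver archives are commonly named (linux32, linux64, and so on). Always confirm against the files actually listed on the release page.

```python
import platform

# Assumed mapping from (OS, CPU) to the platform tag used in geckodriver
# archive names. Available builds vary by release, so check the release page.
_TAGS = {
    ("Linux", "x86_64"): "linux64",
    ("Linux", "i686"): "linux32",
    ("Darwin", "x86_64"): "macos",
    ("Darwin", "arm64"): "macos-aarch64",
    ("Windows", "AMD64"): "win64",
}


def platform_tag(system=None, machine=None):
    """Return the archive tag for a machine, or None if it is not in the table."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return _TAGS.get((system, machine))


# Show the tag for the machine running this script (None means "look it up manually").
print(platform_tag())
```

If the function prints None, your platform is simply not in this small table, and you will need to pick the file by hand.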

Step 2: Download the web driver

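Next, go to your Linux terminal and create a directory to store the file. This guide uses a directory named gdriver (for example, mkdir gdriver).

Then move into that directory (cd gdriver) and use wget to download the file, pasting the link you copied above. The example below fetches the 32-bit Linux build; on a 64-bit system you would normally copy the link to the linux64 archive instead: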

wget https://github.com/mozilla/geckodriver/releases/download/v0.29.1/geckodriver-v0.29.1-linux32.tar.gz
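If wget is not available, the same download can be done from Python. This is a minimal sketch using only the standard library; `geckodriver_url` and `download` are hypothetical helper names, and the URL pattern is simply inferred from the wget command above, so it may not hold for every release.

```python
import urllib.request


def geckodriver_url(version, tag, ext="tar.gz"):
    """Build a release URL following the pattern of the wget command above."""
    return (
        "https://github.com/mozilla/geckodriver/releases/download/"
        f"v{version}/geckodriver-v{version}-{tag}.{ext}"
    )


def download(url, dest):
    """Save the file at url to the local path dest."""
    urllib.request.urlretrieve(url, dest)


url = geckodriver_url("0.29.1", "linux32")
print(url)
# Uncomment to actually fetch the archive into the current directory:
# download(url, "geckodriver-v0.29.1-linux32.tar.gz")
```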

Step 3: Extract the downloaded web driver

When you list the files (for example with ls), you should now see the .tar.gz file.

You can then decompress it with gzip:

gzip -d geckodriver-v0.29.1-linux32.tar.gz

Finally, untar the resulting .tar file to extract the driver:

tar -xvf geckodriver-v0.29.1-linux32.tar
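The two extraction steps above can also be done in one go from Python with the standard tarfile module. This is a sketch rather than a required step; `extract_driver` is a hypothetical helper, and it assumes the archive is a gzip-compressed tar file like the one downloaded earlier.

```python
import tarfile
from pathlib import Path


def extract_driver(archive, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" handles the gzip layer and the tar layer together.
    # Only extract archives from a source you trust.
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Example, assuming the archive from Step 2 is in the current directory:
# print(extract_driver("geckodriver-v0.29.1-linux32.tar.gz"))
```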

Step 4: Configure PATH

What you are left with is a file called “geckodriver”. This is the driver executable. It needs to be reachable through the PATH environment variable, because Selenium looks for the driver in the directories listed in PATH.

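I simply went to the parent directory and updated the PATH environment variable by taking the existing value ($PATH) and appending the full path of the gdriver folder. Using $(pwd) makes the entry absolute, so it keeps working after you change directories. Note that the change only lasts for the current terminal session:

export PATH="$PATH:$(pwd)/gdriver"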

If you do not do the above, you will get the error:

selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH. 
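To check for this problem before Selenium reports it, you can look the driver up on PATH from Python, and extend PATH for the current process if needed. The sketch below uses only the standard library; `add_to_path` and `driver_location` are hypothetical helper names, and "gdriver" is the directory name assumed throughout this guide.

```python
import os
import shutil


def add_to_path(directory):
    """Append directory to PATH for this Python process and its children."""
    directory = os.path.abspath(directory)
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join(parts + [directory])
    return directory


def driver_location(name="geckodriver"):
    """Return the full path of the driver if it is on PATH, else None."""
    return shutil.which(name)


add_to_path("gdriver")
# Prints the driver's full path, or None if it still cannot be found.
print(driver_location())
```

A change made this way affects only the running Python process, so it does not replace the export command if you want the driver available in your shell.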

Step 5: Test running the web driver

That’s it! If you now run the following code, it should load a web page using a headless Firefox browser running in the background:

# main.py
from selenium import webdriver
from selenium.webdriver import FirefoxOptions

opts = FirefoxOptions()
opts.add_argument("--headless")
browser = webdriver.Firefox(options=opts)


# URL of the page to be scraped
URL = 'https://pythonhowtoprogram.com/'
# Load the page in the headless browser
browser.get(URL)

# Print the page title
print(browser.title)

# Close the browser so the Firefox process does not linger
browser.quit()
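As an alternative to editing PATH, the driver location can be passed to Selenium directly. The sketch below assumes Selenium 4, where the driver path is supplied through a Service object; `make_firefox` is a hypothetical helper, and "gdriver/geckodriver" is a placeholder for wherever you extracted the driver.

```python
def make_firefox(driver_path, headless=True):
    """Start Firefox using an explicit geckodriver path (Selenium 4 style)."""
    # Imported here so the function can be defined even before Selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    opts = Options()
    if headless:
        opts.add_argument("--headless")
    service = Service(executable_path=driver_path)
    return webdriver.Firefox(service=service, options=opts)


# Example usage (requires Selenium, Firefox, and the extracted driver):
# browser = make_firefox("gdriver/geckodriver")
# browser.get("https://pythonhowtoprogram.com/")
# print(browser.title)
# browser.quit()
```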

You will notice that main.py takes a few seconds to run. This is because an instance of the browser has to be started, which takes time. Keep this in mind if you need faster performance; for pages that do not depend on JavaScript, urllib or requests may be a better fit.
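For comparison, here is a rough sketch of fetching a page title without a browser, using only the standard library. It only works when the title is present in the raw HTML, since no JavaScript is executed. `TitleParser`, `extract_title`, and `fetch_title` are hypothetical names used for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text of an HTML string, or "" if none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url):
    """Download a page and return its title (no JavaScript is run)."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_title(html)


print(extract_title("<html><head><title>Hello</title></head></html>"))  # prints Hello
```

Calling fetch_title(URL) on a live page avoids the browser start-up cost, at the price of seeing only what the server sends in the initial response.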

Next Steps

Now that you know how to install a driver, you can move on to the web scraping tutorials we have on offer. You can find them all in our web scraping section: https://pythonhowtoprogram.com/category/web-scraping/
