Step 1: Locate the right web driver
Since Selenium drives an actual browser, one of the first decisions you’ll need to make is which browser driver to use. Generally it won’t matter, but the best browser to use is the one that works best with your target website. For example, if your target website works best in Firefox, then use the Firefox driver.
| Browser | Supported OS | Maintained by | Download | Issue Tracker |
| Internet Explorer | Windows | Selenium Project | Downloads | Issues |
So decide which one, and then go to its download page. For this example we will use Firefox. The Firefox download link goes to this page: https://github.com/mozilla/geckodriver/releases
On that page, click on the latest release, then scroll down to the bottom of the release to see the list of driver files.
Right-click the .tar.gz file that matches your platform and copy the link address.
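If you would rather build the link than copy it by hand, here is a minimal sketch. It assumes the release files keep the naming seen in this tutorial (geckodriver-v0.29.1-linux32.tar.gz) and the usual GitHub release download layout. The function name geckodriver_url is made up for illustration.

```python
# Assumption: GitHub serves release files under /releases/download/<tag>/<file>,
# and geckodriver keeps the file naming used in this tutorial.
BASE_URL = "https://github.com/mozilla/geckodriver/releases/download"


def geckodriver_url(version: str, platform: str) -> str:
    """Build the download URL for one geckodriver release archive."""
    filename = f"geckodriver-v{version}-{platform}.tar.gz"
    return f"{BASE_URL}/v{version}/{filename}"


# The version and platform used in this tutorial
print(geckodriver_url("0.29.1", "linux32"))
```

Check the result against the link shown on the releases page before relying on it, since the naming could change between releases.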
Step 2: Download the web driver
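Next, go to your Linux terminal and create a directory to store this file. This example uses a directory named gdriver:
mkdir gdriver
Then go into that directory and use wget to download the file, pasting the link you copied above in place of the placeholder:
cd gdriver
wget <link you copied above>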
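If wget is not available, the same download can be done from Python. This is a minimal sketch using only the standard library. The function name download is made up for illustration, and the file name is assumed to be the last part of the link.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest_dir: str) -> Path:
    """Download url into dest_dir and return the path of the saved file."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # same role as mkdir in the terminal
    # Assumption: the last part of the URL is the file name to save under
    target = dest / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage: pass the link you copied from the releases page, for example
# download("<link you copied above>", "gdriver")
```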
Step 3: Extract the downloaded web driver
When you list the files, you should see the .tar.gz file.
You can then use gzip to decompress it:
gzip -d geckodriver-v0.29.1-linux32.tar.gz
Finally, untar the file to extract the driver:
tar -xvf geckodriver-v0.29.1-linux32.tar
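The two commands above can also be done in one step from Python with the standard tarfile module. This is a rough sketch. The function name extract_driver is made up for illustration, and the archive is assumed to contain a single file named geckodriver at its top level.

```python
import shutil
import tarfile
from pathlib import Path


def extract_driver(archive: str, dest_dir: str, member: str = "geckodriver") -> Path:
    """Extract one file from a .tar.gz archive and make it executable."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / member
    # "r:gz" decompresses and reads the tar archive in a single pass
    with tarfile.open(archive, "r:gz") as tar:
        source = tar.extractfile(member)  # raises KeyError if member is missing
        if source is None:
            raise ValueError(f"{member} is not a regular file in {archive}")
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    target.chmod(0o755)  # the driver must be executable
    return target


# Usage, with the file name from this tutorial:
# extract_driver("geckodriver-v0.29.1-linux32.tar.gz", ".")
```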
Step 4: Configure PATH
What you will be left with is a file called “geckodriver”. This is the driver file. You will need to make it available via the PATH environment variable, because Selenium looks for the driver file in the directories listed in PATH.
I simply went to the parent directory, then updated the PATH environment variable by taking the existing PATH value ($PATH) and appending the full path of the gdriver folder. For example (this lasts only for the current terminal session):
cd ..
export PATH=$PATH:$(pwd)/gdriver
If you do not do the above, you will get the error:
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
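If you prefer not to change your shell settings, the PATH can also be updated from inside the Python script before the browser is started. This is a minimal sketch, assuming the driver sits in a folder named gdriver relative to where the script is run. The function name add_to_path is made up for illustration.

```python
import os
import shutil


def add_to_path(directory: str) -> None:
    """Append directory to PATH for this Python process only."""
    directory = os.path.abspath(directory)
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join(parts + [directory])


# Assumption: the driver was extracted into ./gdriver as in the steps above
add_to_path("gdriver")

# shutil.which searches PATH the same way the shell does;
# it prints None if geckodriver cannot be found
print(shutil.which("geckodriver"))
```

The change only applies to the running script, so it has to be made before the Firefox driver is created.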
Step 5: Test running the web driver
That’s it! Now if you run the following code, you should be able to load a web page using a Firefox driver running in the background:
# main.py
from selenium import webdriver
from selenium.webdriver import FirefoxOptions

# Run Firefox without opening a visible window
opts = FirefoxOptions()
opts.add_argument("--headless")
browser = webdriver.Firefox(options=opts)

# URL of the page to be scraped
URL = 'https://pythonhowtoprogram.com/'

# Load the page in the browser
browser.get(URL)

# Print the page title
print(browser.title)

# Close the browser and stop the driver process
browser.quit()
You will notice that it takes a few seconds to run the first time. This is because an instance of the browser needs to be loaded. Keep this in mind if you need faster performance, in which case you may want to use urllib or requests instead.
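For comparison, here is a rough sketch of the urllib route, which reads the page title without starting a browser. It uses only the standard library. The names TitleParser, extract_title and fetch_title are made up for illustration. Unlike Selenium, this approach does not run JavaScript, so it only sees what is in the raw HTML.

```python
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the page title from an HTML string ('' if there is none)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url: str) -> str:
    """Download a page with urllib and return its title."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_title(response.read().decode(charset, errors="replace"))


# Usage: print(fetch_title('https://pythonhowtoprogram.com/'))
```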
Now that you know how to install a driver, you can explore the numerous web scraping tutorials we have on offer. You can find them all in our web scraping section: https://pythonhowtoprogram.com/category/web-scraping/