Intermediate

You have probably been there before. You find the perfect website full of data you need, whether it is product prices, job listings, real estate data, or sports statistics. You fire up requests and BeautifulSoup, write a quick script, and run it. The result? An empty page. No data. The HTML source contains nothing but a single <div id="root"></div> and a bunch of JavaScript files. The data you can see in the browser simply does not exist in the raw HTML. Welcome to the world of dynamic websites.

The good news is that Python has a powerful combination of tools to handle exactly this problem. Selenium automates a real web browser, letting it execute JavaScript and render the page just like a human visitor would. Once the content is loaded, BeautifulSoup steps in to parse and extract the data you need. Together, they can scrape virtually any website, no matter how much JavaScript it uses. Both libraries are well-documented, widely used in the industry, and easy to learn.

In this article we will cover everything you need to scrape dynamic websites with confidence. We will start with a quick working example so you can see results in 30 seconds. Then we will walk through the difference between static and dynamic websites, when to use Selenium versus simpler tools, how to install and configure everything, how to wait for content to load properly, how to handle pagination and user interaction, and finally we will build a complete real-life job scraper that exports results to CSV. By the end, you will have a reusable pattern you can adapt to scrape almost any dynamic site.

Scraping a Dynamic Website: Quick Example

Let us start with a complete working example you can copy and run right now. We will scrape quotes.toscrape.com/js/, a practice site that loads famous quotes entirely through JavaScript. If you tried to scrape this page with requests, you would get an empty page because the quotes are injected into the DOM by a script after the page loads. Selenium handles this by running a real browser that executes the JavaScript first.

# quick_scrape.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Start browser (requires Google Chrome installed)
driver = webdriver.Chrome()

try:
    # Navigate to the JS-rendered quotes page
    driver.get("https://quotes.toscrape.com/js/")

    # Wait up to 10 seconds for JavaScript to render the quotes
    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.CLASS_NAME, "quote")))

    # Hand the fully-rendered HTML to BeautifulSoup for parsing
    soup = BeautifulSoup(driver.page_source, "html.parser")

    # Extract each quote's text and author
    quotes = soup.find_all("div", class_="quote")
    for quote in quotes[:5]:
        text = quote.find("span", class_="text").text
        author = quote.find("small", class_="author").text
        print(f"{author}: {text[:60]}...")
finally:
    driver.quit()

Output:

Albert Einstein: “The world as we have created it is a process of our ...
J.K. Rowling: “It is our choices, Harry, that show what we truly are...
Albert Einstein: “There are only two ways to live your life. One is a...
Jane Austen: “The person, be it gentleman or lady, who has not pleas...
Marilyn Monroe: “Imperfection is beauty, madness is genius and it's ...

Here is the HTML structure of each quote on that page, so you can see exactly what the code is targeting. Keep this snippet as a reference in case the site ever changes its layout:

<!-- HTML structure of each quote on quotes.toscrape.com/js/ -->
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
    <span class="text" itemprop="text">"The world as we have..."</span>
    <span>by
        <small class="author" itemprop="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
    </span>
    <div class="tags">
        <a class="tag" href="/tag/change/page/1/">change</a>
        <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
    </div>
</div>

There are three key things happening in this example. First, Selenium opens a real Chrome browser and navigates to the page, which triggers all the JavaScript to execute. Second, WebDriverWait pauses the script until the quote elements actually appear in the DOM, which is critical because the data is injected asynchronously by JavaScript. Third, once the page is fully rendered, we pass driver.page_source (the complete HTML after JavaScript has run) to BeautifulSoup, which gives us the familiar find_all and find methods for extracting exactly what we need. This three-step pattern of load, wait, and parse is the foundation of every dynamic scraper.

Want to go deeper? Below we cover when you actually need Selenium versus simpler tools, how to configure headless mode for speed, advanced waiting strategies, handling pagination, and a complete real-life project.

What Are Dynamic Websites and Why Do They Need Selenium?

To understand why some websites need Selenium, it helps to know the difference between static and dynamic content. A static website sends all its HTML content in the initial server response. When you fetch the page with requests.get(), you get back the complete page with all the text, links, and data already embedded in the HTML. Most older websites and many simple blogs work this way.

A dynamic website, on the other hand, sends back a mostly empty HTML shell along with JavaScript files. Your browser executes that JavaScript, which then makes additional API calls, processes the responses, and builds the page content on the fly. Modern frameworks like React, Angular, and Vue.js all work this way. When you try to scrape a dynamic site with requests, you get back the empty shell because requests does not execute JavaScript.

Here is a simple way to tell the difference. Open the website in Chrome, right-click, and select “View Page Source.” If you can see all the data you want in the source code, it is a static site and you can use requests and BeautifulSoup alone. If the source code is mostly JavaScript and the data is missing, it is a dynamic site and you need Selenium to render the page first.
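This manual check is easy to wrap in a tiny helper. Here is a minimal sketch (the data_missing function is ours, not a library API): look for a marker you saw in the browser's rendered DOM, such as a known CSS class, inside the raw HTML that requests would receive.

```python
def data_missing(html: str, marker: str) -> bool:
    """True if the marker (e.g. a CSS class you saw in the browser's
    rendered DOM) is absent from the raw HTML, which suggests the
    content is injected by JavaScript."""
    return marker not in html

# Usage (requires network access):
# import requests
# html = requests.get("https://quotes.toscrape.com/js/").text
# print("dynamic" if data_missing(html, 'class="quote"') else "static")
```

If the helper reports the data missing, reach for Selenium; otherwise requests alone will do.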

Selenium solves this by automating a real browser. It launches Chrome (or Firefox, or Edge), navigates to the URL, and lets the browser do what browsers do: execute JavaScript, make API calls, render the DOM, and display the content. Once the page is fully rendered, Selenium gives you access to the final HTML, which you can then parse with BeautifulSoup just like any static page.

Sudo Sam at whiteboard explaining static vs dynamic websites
Static sites hand you the data on a silver platter. Dynamic sites make you work for it.

Selenium vs Requests: When to Use Each

Not every scraping job needs Selenium. In fact, using Selenium when you do not need it is a common beginner mistake that makes your scraper 10-50x slower than necessary. The table below will help you choose the right tool for the job.

| Feature            | requests + BeautifulSoup           | Selenium + BeautifulSoup                       |
|--------------------|------------------------------------|------------------------------------------------|
| Speed              | Very fast (milliseconds per page)  | Slow (seconds per page)                        |
| JavaScript support | None                               | Full browser JavaScript engine                 |
| Resource usage     | Minimal (no browser)               | Heavy (launches full browser)                  |
| User interaction   | Cannot click, scroll, or type      | Full interaction (click, scroll, type, drag)   |
| Best for           | Static HTML pages, APIs, RSS feeds | SPAs, JavaScript-rendered content, login walls |
| Setup complexity   | pip install only                   | Needs browser + WebDriver installed            |

The best way to see this difference is to try both approaches on the same data. The site quotes.toscrape.com has two versions: a static version where all quotes are in the HTML, and a JavaScript version where they are injected by a script. Let us try scraping the static version with requests first.

# compare_static.py
# APPROACH 1: requests (fast, for static sites)
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com/")
soup = BeautifulSoup(response.text, "html.parser")
quotes = soup.find_all("div", class_="quote")
print(f"[requests] Found {len(quotes)} quotes on static page")
for q in quotes[:3]:
    author = q.find("small", class_="author").text
    print(f"  - {author}")

Output:

[requests] Found 10 quotes on static page
  - Albert Einstein
  - J.K. Rowling
  - Albert Einstein

That took a fraction of a second. Now try the same approach on the JavaScript version, where the quotes are loaded dynamically:

# compare_dynamic_fail.py
# APPROACH 1 FAILS: requests cannot execute JavaScript
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com/js/")
soup = BeautifulSoup(response.text, "html.parser")
quotes = soup.find_all("div", class_="quote")
print(f"[requests] Found {len(quotes)} quotes on JS page")  # 0!

Output:

[requests] Found 0 quotes on JS page

Zero quotes found because requests fetched the raw HTML before JavaScript ran. Now compare with Selenium, which lets the browser execute the JavaScript first:

# compare_dynamic_success.py
# APPROACH 2: Selenium (slower, but handles JavaScript)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://quotes.toscrape.com/js/")
    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.CLASS_NAME, "quote")))
    quotes = driver.find_elements(By.CLASS_NAME, "quote")
    print(f"[Selenium] Found {len(quotes)} quotes on JS page")
    for q in quotes[:3]:
        author = q.find_element(By.CLASS_NAME, "author").text
        print(f"  - {author}")
finally:
    driver.quit()

Output:

[Selenium] Found 10 quotes on JS page
  - Albert Einstein
  - J.K. Rowling
  - Albert Einstein

Both approaches found the same 10 quotes, but the requests version only works on the static page, while Selenium works on both. The trade-off is speed: requests ran in about 0.2 seconds while Selenium took 3-4 seconds because it had to launch Chrome, navigate to the page, and wait for JavaScript to execute. Always try requests first and only reach for Selenium when you confirm the content is loaded dynamically.

Loop Larry choosing between two doors - requests vs Selenium
requests: 0.2 seconds. Selenium: “hold on, I’m launching an entire browser real quick.”

Installing Selenium and ChromeDriver

Before you can start scraping dynamic websites, you need to install two things: the Selenium Python library and a WebDriver that matches your browser. The WebDriver is a separate executable that Selenium uses to communicate with the browser. We will use Chrome and ChromeDriver since Chrome is the most popular choice, but Selenium also supports Firefox (geckodriver), Edge (msedgedriver), and Safari.

Starting with Selenium 4.6+, you no longer need to manually download ChromeDriver. Selenium Manager handles it automatically. Here is how to get everything set up and verify it works.

# install_and_verify.py
# Step 1: Install the required packages
# Run this in your terminal:
# pip install selenium beautifulsoup4

# Step 2: Verify the installation
from selenium import webdriver

print(f"Selenium version: {webdriver.__version__}")

# Step 3: Test that ChromeDriver works
try:
    driver = webdriver.Chrome()  # Selenium Manager downloads ChromeDriver automatically
    print("Chrome WebDriver is working!")
    print(f"Browser version: {driver.capabilities['browserVersion']}")
    driver.quit()
except Exception as e:
    print(f"Error: {e}")
    print("If ChromeDriver is not found, install Chrome browser first.")

Output:

Selenium version: 4.18.0
Chrome WebDriver is working!
Browser version: 122.0.6261.94

If you see the success message, you are ready to go. If you get an error about ChromeDriver not being found, make sure you have Google Chrome installed on your system. Selenium Manager will handle the rest. For older versions of Selenium (before 4.6), you would need to manually download ChromeDriver from the ChromeDriver website and either place it in your system PATH or specify the path in your code.

Pyro Pete excitedly unboxing ChromeDriver for Selenium setup
pip install selenium — the two most exciting words in web scraping.

Loading Dynamic Pages With Selenium

Once Selenium is installed, the next step is learning how to load pages and configure the browser for scraping. The most important configuration option is headless mode, which runs Chrome without opening a visible window. This is faster, uses less memory, and is essential for running scrapers on servers or in automated pipelines.

The code below demonstrates a complete setup with headless mode, proper error handling, and the key techniques for loading dynamic content. We will use quotes.toscrape.com/js/ again since it is a reliable, publicly available dynamic site that anyone can scrape without restrictions.

# load_dynamic_page.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time

# Configure Chrome for headless mode (no visible window)
chrome_options = Options()
chrome_options.add_argument("--headless")       # Run without GUI
chrome_options.add_argument("--no-sandbox")     # Required for some Linux environments
chrome_options.add_argument("--disable-dev-shm-usage")  # Prevent memory issues

driver = webdriver.Chrome(options=chrome_options)

try:
    # Navigate to the JS-rendered quotes page
    driver.get("https://quotes.toscrape.com/js/")
    print(f"Page title: {driver.title}")

    # Create a reusable wait object (max 10 seconds)
    wait = WebDriverWait(driver, 10)

    # Wait for the quote containers to appear in the DOM
    quotes = wait.until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "quote"))
    )
    print(f"Found {len(quotes)} quotes after JS rendering")

    # Scroll to bottom to check for lazy-loaded content
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # Give lazy-loaded content time to appear

    # Grab the fully rendered HTML for parsing
    html = driver.page_source
    print(f"Page source retrieved ({len(html):,} bytes)")

    # Quick preview of what we got
    first_quote = quotes[0].find_element(By.CLASS_NAME, "text").text
    print(f"First quote: {first_quote[:50]}...")

finally:
    driver.quit()

Output:

Page title: Quotes to Scrape
Found 10 quotes after JS rendering
Page source retrieved (11,254 bytes)
First quote: “The world as we have created it is a process...

There are a few important things to notice here. The --headless argument is what makes Chrome run invisibly in the background. The WebDriverWait object is reusable and takes a maximum timeout in seconds. If the element does not appear within that time, Selenium raises a TimeoutException, which is much better than guessing with time.sleep(). The execute_script call at the end scrolls the page to the bottom, which triggers lazy-loaded content on many modern websites. After scrolling, a brief sleep gives the new content time to render before we grab page_source.

Cache Katie sprinting past ghostly browser windows in headless mode
Headless mode: all the power of a real browser, none of the window dressing.

Parsing Rendered HTML With BeautifulSoup

Once Selenium has loaded and rendered the page, you need to extract the specific data points you care about. This is where BeautifulSoup shines. You pass driver.page_source (the fully rendered HTML) to BeautifulSoup, and then use its familiar find, find_all, and CSS selector methods to navigate the DOM tree and pull out text, attributes, and links.

The key skill here is defensive parsing. Real websites are messy. Elements might be missing on some items, classes might change, or content might be empty. Always check that an element exists before calling .text on it, or you will get AttributeError crashes in production. The example below shows how to safely extract multiple fields from each quote, including tags that may not exist on every entry.

# parse_quotes.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()

try:
    driver.get("https://quotes.toscrape.com/js/")

    # Wait for quote elements to load
    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.CLASS_NAME, "quote")))

    # Parse the fully rendered page with BeautifulSoup
    soup = BeautifulSoup(driver.page_source, "html.parser")

    # Find every quote container on the page
    quotes = soup.find_all("div", class_="quote")
    print(f"Found {len(quotes)} quotes\n")

    # Extract data from each quote with defensive checks
    for quote in quotes:
        text_elem = quote.find("span", class_="text")
        author_elem = quote.find("small", class_="author")
        tags_container = quote.find("div", class_="tags")

        # Use conditional expressions to handle missing elements gracefully
        text = text_elem.text.strip() if text_elem else "Unknown"
        author = author_elem.text.strip() if author_elem else "Unknown"

        # Extract all tag links, default to empty list if container missing
        if tags_container:
            tags = [tag.text for tag in tags_container.find_all("a", class_="tag")]
        else:
            tags = []

        print(f"{author}")
        print(f"  {text[:70]}...")
        print(f"  Tags: {', '.join(tags) if tags else 'none'}")
        print()

finally:
    driver.quit()

Output:

Found 10 quotes

Albert Einstein
  “The world as we have created it is a process of our thinking. It cann...
  Tags: change, deep-thoughts, thinking, world

J.K. Rowling
  “It is our choices, Harry, that show what we truly are, far more than...
  Tags: abilities, choices

Albert Einstein
  “There are only two ways to live your life. One is as though nothing ...
  Tags: inspirational, life, live, miracle, miracles

Jane Austen
  “The person, be it gentleman or lady, who has not pleasure in a good ...
  Tags: aliteracy, books, classic, humor

Notice the defensive pattern on each field: text_elem.text.strip() if text_elem else "Unknown". This ensures your scraper keeps running even when individual items are missing a field. The tags extraction shows a more advanced pattern where we first check if the container exists, then extract all child links from it. In real-world scraping, you will encounter incomplete data constantly, and defensive parsing is what separates a scraper that crashes on page 3 from one that runs reliably across thousands of pages.
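Because the conditional expression repeats for every field, it can be worth factoring into a small helper. A sketch (the safe_text name is ours, not a BeautifulSoup API): it works with any object exposing a .text attribute, which is exactly what BeautifulSoup tags provide.

```python
def safe_text(elem, default="Unknown"):
    """Return elem.text stripped of whitespace, or a default when the
    element is missing (None). Works with any object exposing a .text
    attribute, such as BeautifulSoup tags."""
    if elem is None:
        return default
    return elem.text.strip()

# With BeautifulSoup, each extraction collapses to one call:
# author = safe_text(quote.find("small", class_="author"))
```

This keeps the per-field logic in one place, so if you later decide to log missing fields or change the default, you edit a single function.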

Debug Dee with hard hat repairing code structure - parsing HTML with BeautifulSoup
Defensive parsing: because real-world HTML is a construction site, not a museum.

Advanced: Waiting Strategies for Dynamic Content

The single biggest source of bugs in web scraping is timing. Your script runs faster than the browser can render content, so you need to explicitly tell Selenium to wait for specific conditions before proceeding. Selenium provides several built-in wait conditions through the expected_conditions module (commonly imported as EC). Understanding which condition to use and when is essential for writing reliable scrapers.

There are three types of waits you should know about. Implicit waits set a global timeout that applies to every element lookup. Explicit waits (using WebDriverWait) wait for a specific condition on a specific element. time.sleep() is the brute-force approach that pauses for a fixed number of seconds regardless of whether the element loaded instantly or not. You should almost always prefer explicit waits because they are both faster (they return as soon as the condition is met) and more reliable (they fail clearly with a timeout error if something goes wrong).

# wait_strategies.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

try:
    driver.get("https://quotes.toscrape.com/js/")
    wait = WebDriverWait(driver, 10)

    # Wait for element to exist in the DOM (even if hidden)
    element = wait.until(
        EC.presence_of_element_located((By.CLASS_NAME, "quote"))
    )
    print("Quote element is in DOM")

    # Wait for element to be visible on screen
    element = wait.until(
        EC.visibility_of_element_located((By.CLASS_NAME, "quote"))
    )
    print("Quote element is visible")

    # Wait for the "Next" link to be clickable (visible + enabled)
    next_link = wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "li.next a"))
    )
    print("Next link is clickable")

    # Click to go to page 2; wait for the old quotes to go stale
    # before looking for new ones, so we never read page 1's elements
    next_link.click()
    wait.until(EC.staleness_of(element))
    new_quotes = wait.until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "quote"))
    )
    print(f"Page 2 loaded with {len(new_quotes)} quotes")

    # Verify we are on page 2 by checking the first author
    first_author = new_quotes[0].find_element(By.CLASS_NAME, "author").text
    print(f"First author on page 2: {first_author}")

finally:
    driver.quit()

Output:

Quote element is in DOM
Quote element is visible
Next link is clickable
Page 2 loaded with 10 quotes
First author on page 2: Dr. Seuss

The difference between presence_of_element_located and visibility_of_element_located is subtle but important. Presence means the element exists in the HTML DOM, even if it is hidden with CSS (display: none). Visibility means the element is both present AND visible on screen. For scraping, you usually want presence since you care about the data being in the DOM, not whether it is visually displayed. For interaction (like clicking buttons), use element_to_be_clickable which ensures the element is both visible and enabled.

Handling Pagination and Page Interaction

Many websites split their content across multiple pages. To scrape all the data, you need to navigate through each page, extract the content, and move to the next one. This is where Selenium really shines over requests, because you can click “Next” buttons, scroll through infinite-scroll pages, and interact with filters and search forms just like a human user would.

The site quotes.toscrape.com/js/ has 10 pages of quotes with a “Next” link at the bottom. The example below demonstrates a paginated scraper that clicks through the first three pages, collecting all the quotes from each one into a single list. Pay attention to the error handling on the “next page” link, which gracefully handles the case where there are no more pages to navigate.

# paginated_scraper.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
all_quotes = []

try:
    driver.get("https://quotes.toscrape.com/js/")
    wait = WebDriverWait(driver, 10)

    page_num = 1
    while page_num <= 3:
        print(f"Scraping page {page_num}...")

        # Wait for the quote elements to load on the current page
        wait.until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, "quote"))
        )

        # Parse the current page with BeautifulSoup
        soup = BeautifulSoup(driver.page_source, "html.parser")
        quotes = soup.find_all("div", class_="quote")

        # Extract data from each quote
        for quote in quotes:
            text = quote.find("span", class_="text").text
            author = quote.find("small", class_="author").text
            all_quotes.append({"text": text, "author": author})

        print(f"  Found {len(quotes)} quotes on page {page_num}")

        # Try to click the "Next" link to go to the next page
        try:
            next_link = driver.find_element(By.CSS_SELECTOR, "li.next a")
            next_link.click()

            # Wait for the page to reload with new quotes
            wait.until(EC.staleness_of(
                driver.find_element(By.CLASS_NAME, "quote")
            ))
            wait.until(
                EC.presence_of_element_located((By.CLASS_NAME, "quote"))
            )
            page_num += 1
        except Exception:
            print("  No more pages available")
            break

    # Print summary
    print(f"\nTotal quotes collected: {len(all_quotes)}")
    for item in all_quotes[:3]:
        print(f"  {item['author']}: {item['text'][:50]}...")

finally:
    driver.quit()

Output:

Scraping page 1...
  Found 10 quotes on page 1
Scraping page 2...
  Found 10 quotes on page 2
Scraping page 3...
  Found 10 quotes on page 3

Total quotes collected: 30
  Albert Einstein: “The world as we have created it is a process of...
  J.K. Rowling: “It is our choices, Harry, that show what we truly...
  Albert Einstein: “There are only two ways to live your life. One ...

The try/except block around the "next page" link is crucial. When you reach the last page, the "Next" link disappears, and find_element will raise a NoSuchElementException. By catching this exception, the scraper gracefully exits the loop instead of crashing. Notice the staleness_of wait after clicking: this waits until the old quote element goes stale (meaning the page has started reloading), and then we wait for new quotes to appear. This two-step wait pattern is more reliable than a simple time.sleep() because it handles both fast and slow page loads correctly.

Pyro Pete juggling browser windows - handling pagination with Selenium
Page 1... page 2... page 47... this is fine.

Real-Life Example: Scraping Job Listings to CSV

Now let us put everything together into a production-quality scraper. We will scrape realpython.github.io/fake-jobs/, a static job board created by Real Python for learning purposes. While this particular site does not require Selenium (it is static HTML), the scraper below is written with the full Selenium pattern so you can adapt it to any real dynamic job board like Indeed, LinkedIn, or Glassdoor by changing the URL and CSS selectors. The techniques, class structure, and CSV export are all production-ready.

Loop Larry triumphantly standing on mountain of organized job listing cards
From chaos to CSV: turning 100 job listings into structured data, one scrape at a time.

Here is the HTML structure of each job card on that page, so you know what we are targeting and can adapt the selectors if the site changes:

<!-- HTML structure of each job card on realpython.github.io/fake-jobs/ -->
<div class="card">
  <div class="card-content">
    <div class="media-content">
      <h2 class="title is-5">Senior Python Developer</h2>
      <h3 class="subtitle is-6 company">Payne, Roberts and Davis</h3>
    </div>
    <div class="content">
      <p class="location">Stewartbury, AA</p>
      <footer>
        <a class="card-footer-item" href="...">Apply</a>
      </footer>
    </div>
  </div>
</div>

The JobScraper class below encapsulates all the browser setup, scraping logic, and export functionality into clean methods. The scrape_jobs method handles the Selenium interaction, while save_to_csv handles the data export. This separation makes the code easy to extend and reuse for any job board.

# job_scraper.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
from datetime import datetime
import csv

class JobScraper:
    """Scrapes job listings from a job board using Selenium."""

    def __init__(self, url):
        # Configure headless Chrome for silent operation
        options = Options()
        options.add_argument("--headless")
        options.add_argument("--no-sandbox")

        self.driver = webdriver.Chrome(options=options)
        self.url = url
        self.jobs = []

    def scrape_jobs(self, keyword=""):
        """Navigate to job board, extract listings, optionally filter by keyword."""
        try:
            self.driver.get(self.url)
            print(f"Navigating to {self.url}...")

            wait = WebDriverWait(self.driver, 15)

            # Wait for the job cards to render
            wait.until(
                EC.presence_of_all_elements_located((By.CLASS_NAME, "card-content"))
            )

            # Parse the fully loaded page
            soup = BeautifulSoup(self.driver.page_source, "html.parser")
            job_cards = soup.find_all("div", class_="card-content")
            print(f"Found {len(job_cards)} job listings...")

            # Extract structured data from each job card
            for card in job_cards:
                title_elem = card.find("h2", class_="title")
                company_elem = card.find("h3", class_="company")
                location_elem = card.find("p", class_="location")
                link_elem = card.find("a", string="Apply")

                title = title_elem.text.strip() if title_elem else "N/A"
                company = company_elem.text.strip() if company_elem else "N/A"
                location = location_elem.text.strip() if location_elem else "N/A"
                apply_url = link_elem.get("href") if link_elem else "#"

                # Filter by keyword if provided
                if keyword and keyword.lower() not in title.lower():
                    continue

                self.jobs.append({
                    "title": title,
                    "company": company,
                    "location": location,
                    "apply_url": apply_url,
                    "scraped_at": datetime.now().isoformat()
                })

            return self.jobs

        finally:
            self.driver.quit()

    def save_to_csv(self, filename="jobs.csv"):
        """Export scraped jobs to a CSV file."""
        if not self.jobs:
            print("No jobs to save")
            return

        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(
                f,
                fieldnames=["title", "company", "location", "apply_url", "scraped_at"]
            )
            writer.writeheader()
            writer.writerows(self.jobs)

        print(f"Saved {len(self.jobs)} jobs to {filename}")


# Run the scraper
scraper = JobScraper("https://realpython.github.io/fake-jobs/")
jobs = scraper.scrape_jobs(keyword="Python")

print(f"\n=== Found {len(jobs)} Python Jobs ===")
for job in jobs[:5]:
    print(f"\n{job['title']}")
    print(f"  Company:  {job['company']}")
    print(f"  Location: {job['location']}")

scraper.save_to_csv("python_jobs.csv")

Output:

Navigating to https://realpython.github.io/fake-jobs/...
Found 100 job listings...

=== Found 10 Python Jobs ===

Senior Python Developer
  Company:  Payne, Roberts and Davis
  Location: Stewartbury, AA

Python Programmer (Entry-Level)
  Company:  Richards, Bates and Johnson
  Location: North Tylermouth, AA

Python Developer
  Company:  Wright, Patterson and Thomas
  Location: Lake Marytown, AA

Python Programmer
  Company:  Garcia PLC
  Location: Katherineberg, AA

Software Developer (Python)
  Company:  Villanueva, Sanders and Black
  Location: Browntown, AA

Saved 10 jobs to python_jobs.csv

This scraper demonstrates several production patterns. The class-based design makes it easy to reuse and extend. The keyword filter shows how to narrow results without relying on the site having a search function. The defensive parsing with if elem else default handles missing data gracefully. The timestamp field lets you track when each listing was scraped, which is useful for monitoring job markets over time. To adapt this scraper for a real job board like Indeed or LinkedIn, you would change the URL, update the CSS selectors to match that site's HTML structure, and possibly add pagination logic from the previous section.
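If you want to verify the export format without touching the filesystem, you can round-trip a row through csv.DictWriter and csv.DictReader in memory. A quick self-contained check (the row below is illustrative sample data, not scraped output):

```python
import csv
import io

# One illustrative row in the same shape the scraper produces
rows = [{
    "title": "Senior Python Developer",
    "company": "Payne, Roberts and Davis",
    "location": "Stewartbury, AA",
    "apply_url": "#",
    "scraped_at": "2024-01-01T00:00:00",
}]

# Write to an in-memory buffer exactly as save_to_csv writes to disk
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

# Reading it back gives dictionaries keyed by the header row
buf.seek(0)
loaded = list(csv.DictReader(buf))
print(loaded[0]["title"])
```

The same DictReader call works on the real python_jobs.csv file once the scraper has produced it.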

Frequently Asked Questions

Why is my Selenium scraper so slow?

Selenium is inherently slower than requests because it launches a full web browser for every scraping session. However, there are several ways to speed it up significantly. First, always use headless mode (--headless) since rendering a visible window is unnecessary overhead. Second, disable image loading with --blink-settings=imagesEnabled=false to skip downloading large image files. Third, replace time.sleep() calls with explicit WebDriverWait conditions, which return as soon as the element appears rather than waiting a fixed amount of time. Finally, check whether the data you need is actually loaded via a hidden API call. You can inspect the browser's Network tab to find JSON API endpoints that return the data directly, which would let you use requests instead of Selenium entirely.

How do I handle JavaScript alerts and pop-ups?

JavaScript alerts, confirm dialogs, and cookie consent banners are common obstacles when scraping. For native browser alerts (the ones that pause JavaScript execution), Selenium provides the Alert class. You call Alert(driver) (or equivalently driver.switch_to.alert) to get a reference to the alert, then .accept() to click OK or .dismiss() to click Cancel. For cookie consent banners and other overlay pop-ups that are just HTML elements, you can use regular Selenium selectors to find the "Accept" or "Close" button and click it. If a pop-up is blocking your scraper, wrapping the dismissal code in a try/except block lets you handle cases where the pop-up does not appear.

How do I avoid getting blocked while scraping?

Websites use several techniques to detect and block scrapers. The most effective countermeasure is to behave like a real user. Add random delays between requests using time.sleep(random.uniform(2, 5)) instead of fixed intervals. Rotate your User-Agent header to mimic different browsers. If you are scraping at volume, consider using a proxy rotation service to distribute requests across different IP addresses. Most importantly, always check the website's robots.txt file and terms of service. Respecting rate limits and scraping policies is not just ethical, it also prevents your IP from getting permanently banned.
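The delay and User-Agent ideas can be sketched with nothing but the standard library. The User-Agent strings below are illustrative examples, not an authoritative list; real scrapers should use current browser UA strings:

```python
import random
import time

# A small pool of User-Agent strings to rotate through (examples only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]


def pick_user_agent():
    """Return a random User-Agent string for the next session."""
    return random.choice(USER_AGENTS)


def polite_delay(low=2.0, high=5.0):
    """Sleep a random, human-like interval between requests."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```

With Selenium, you would apply the chosen User-Agent through Chrome options, for example options.add_argument(f"user-agent={pick_user_agent()}") before launching the driver.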

How do I scrape pages that require logging in?

Selenium can automate the login process just like a human user. Navigate to the login page, find the username and password input fields using find_element, type your credentials with send_keys(), and submit the form by pressing Enter or clicking the submit button. After login, the browser session maintains your cookies and authentication state, so subsequent page navigations will be authenticated. For sites with two-factor authentication, you may need to add a manual pause (input("Press Enter after completing 2FA...")) so you can handle the verification step yourself.

What is the difference between find_element and find_elements?

find_element (singular) returns the first matching element on the page. If no match exists, it immediately raises a NoSuchElementException. find_elements (plural) returns a Python list of all matching elements. If no matches exist, it returns an empty list instead of raising an error. Use find_element when you expect exactly one match (like a page title or a specific button). Use find_elements when you expect multiple matches (like all the products on a page) or when you want to check whether an element exists without triggering an exception.

Conclusion

Scraping dynamic websites does not have to be intimidating. The core pattern is the same every time: use Selenium to load the page and execute JavaScript, wait for the content you need to appear, then hand the rendered HTML to BeautifulSoup for parsing. We covered how to distinguish static from dynamic websites, when to use requests versus Selenium, how to configure headless Chrome, how to wait for elements reliably with WebDriverWait and expected_conditions, how to navigate paginated content, and how to build a complete job scraper with CSV export.

The job scraper example is a solid starting point that you can adapt for your own projects. Try extending it to scrape a different website, add database storage with SQLite, or build a scheduled scraper that runs daily and alerts you when new listings match your criteria. The techniques in this article apply to any dynamic website, from e-commerce platforms to social media dashboards to real estate listings.

For more details on the tools we used, check the official Selenium documentation and the BeautifulSoup documentation.