Intermediate

Some websites load their content dynamically with JavaScript, which means a simple HTTP request with requests only gets you an empty shell. Playwright solves this by controlling a real browser — Chromium, Firefox, or WebKit — letting you scrape pages that rely on JavaScript rendering, single-page applications, and content loaded behind user interactions.

Microsoft’s Playwright library for Python provides a clean async and sync API for browser automation. It installs its own browser binaries, handles waits automatically, and runs headless by default. Combined with BeautifulSoup for HTML parsing, it gives you the power to scrape virtually any website.

In this tutorial, you will learn how to install Playwright, launch a browser, navigate to pages, wait for dynamic content, extract data from JavaScript-rendered pages, handle clicks and scrolling, and build a complete scraper for dynamic websites.

Scraping a Dynamic Page: Quick Example

Here is a minimal example that scrapes quotes from a page whose quote content is injected by JavaScript, so a plain HTTP request sees none of it.

# quick_playwright.py
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/")
    page.wait_for_selector("div.quote")

    quotes = page.query_selector_all("div.quote span.text")
    for quote in quotes[:5]:
        print(quote.text_content())
    browser.close()

Output:

"The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
"It is our choices, Harry, that show what we truly are, far more than our abilities."
"There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle."
"The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid."
"Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring."

The key difference from a plain requests-based approach is that Playwright actually executes the page’s JavaScript before you extract data. The wait_for_selector() call ensures the dynamic content has loaded before scraping.

What Is Playwright and When Do You Need It?

Playwright is a browser automation library that controls real browsers programmatically. Unlike requests + BeautifulSoup, which only sees the static HTML the server returns, Playwright can handle any page a human can see in a browser, including single-page applications built with React, Vue, or Angular.

| Feature | requests + BeautifulSoup | Playwright |
| --- | --- | --- |
| JavaScript rendering | No | Yes |
| Speed | Very fast | Slower (browser overhead) |
| Memory usage | Low | Higher |
| Login/cookies | Manual | Automatic |
| Click/scroll | No | Yes |
| Best for | Static HTML pages | Dynamic/JS-heavy pages |

Use Playwright when the page you need to scrape loads content with JavaScript. If the page works with JavaScript disabled, stick with requests and BeautifulSoup — it will be much faster.
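A quick way to tell which tool you need is to check whether the element you care about is already present in the raw HTML a plain HTTP request returns. A minimal sketch; `selector_present` is an illustrative helper, and the canned snippet below stands in for a real `requests.get(url).text` call.

```python
# detect_js_rendering.py
from bs4 import BeautifulSoup

def selector_present(html: str, css_selector: str) -> bool:
    """True if the selector already matches in the unrendered HTML."""
    return BeautifulSoup(html, "html.parser").select_one(css_selector) is not None

# Stand-in for `requests.get("https://quotes.toscrape.com/js/").text`:
# the page shell arrives, but the quote data does not.
raw_html = "<html><body><div class='container'></div></body></html>"

if selector_present(raw_html, "div.quote"):
    print("Static HTML is enough: use requests + BeautifulSoup")
else:
    print("Content is JS-rendered: reach for Playwright")
```

If the selector does match in the raw response, the faster requests + BeautifulSoup stack will do.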

Installing Playwright for Python

Playwright requires two installation steps: the Python package and the browser binaries.

# install_playwright.sh
pip install playwright
playwright install chromium

Output:

Successfully installed playwright-1.42.0
Downloading Chromium 123.0.6312.4 - 140.2 Mb
Chromium downloaded to /home/user/.cache/ms-playwright/

The playwright install chromium command downloads a specific Chromium build. You can also install firefox or webkit if you need cross-browser testing.

Sync vs Async API

Playwright offers both synchronous and asynchronous APIs. The sync API is simpler for scripts and scraping. The async API is better for high-concurrency applications.

# sync_example.py
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/")
    title = page.title()
    print(f"Page title: {title}")
    browser.close()

Output:

Page title: Quotes to Scrape

# async_example.py
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://quotes.toscrape.com/js/")
        title = await page.title()
        print(f"Page title: {title}")
        await browser.close()

asyncio.run(main())

Output:

Page title: Quotes to Scrape

For most scraping projects, the sync API is perfectly fine. Use async only when you need to scrape multiple pages concurrently.
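When concurrency is worth it, a single browser can render several pages at once. A sketch against the same quotes site; `page_urls`, `scrape_titles`, and the semaphore cap of 2 are illustrative choices, and the run line is left commented because it needs the Chromium binary from `playwright install chromium`.

```python
# async_concurrent.py
import asyncio

def page_urls(base: str, count: int) -> list[str]:
    """Build the paginated URLs to visit."""
    return [f"{base}page/{n}/" for n in range(1, count + 1)]

async def scrape_titles(urls, max_concurrency: int = 2):
    # Imported here so page_urls stays usable without Playwright installed.
    from playwright.async_api import async_playwright

    sem = asyncio.Semaphore(max_concurrency)  # cap pages open at once

    async def one(browser, url):
        async with sem:
            page = await browser.new_page()
            try:
                await page.goto(url)
                return await page.title()
            finally:
                await page.close()

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            return await asyncio.gather(*(one(browser, u) for u in urls))
        finally:
            await browser.close()

# To run:
# titles = asyncio.run(scrape_titles(page_urls("https://quotes.toscrape.com/js/", 3)))
```

The semaphore keeps memory bounded: without it, gather() would open every page at once.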

Waiting for Dynamic Content

The most common mistake with browser-based scraping is trying to extract data before it has loaded. Playwright provides several waiting strategies to handle this.

# waiting_strategies.py
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/")

    # Wait for a specific element to appear
    page.wait_for_selector("div.quote", timeout=10000)

    # Wait for network to be idle (all AJAX calls done)
    page.wait_for_load_state("networkidle")

    quotes = page.query_selector_all("div.quote")
    print(f"Found {len(quotes)} quotes after waiting")
    browser.close()

Output:

Found 10 quotes after waiting

The wait_for_selector() method pauses execution until the element appears in the DOM. The timeout parameter (in milliseconds) prevents infinite waiting if the element never appears.
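When the element never appears, wait_for_selector() raises playwright.sync_api.TimeoutError. A common pattern is to catch it and retry or skip the page instead of crashing the whole run; `with_retries` below is an illustrative helper, not part of Playwright.

```python
# retry_on_timeout.py
def with_retries(fn, attempts: int = 3, swallow: tuple = (Exception,)):
    """Call fn until it succeeds or attempts run out; return None on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except swallow as exc:
            print(f"Attempt {attempt} failed: {exc}")
    return None

# Usage against a live Playwright `page`:
# from playwright.sync_api import TimeoutError as PWTimeout
#
# def load_quotes():
#     page.reload()
#     page.wait_for_selector("div.quote", timeout=5000)
#     return page.query_selector_all("div.quote")
#
# quotes = with_retries(load_quotes, attempts=3, swallow=(PWTimeout,)) or []
```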

Extracting Data from the Page

Playwright gives you two approaches for extracting data: using Playwright’s built-in selectors, or passing the rendered HTML to BeautifulSoup.

# extract_with_playwright.py
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/")
    page.wait_for_selector("div.quote")

    # Method 1: Playwright selectors
    first_quote = page.query_selector("span.text")
    print("Playwright:", first_quote.text_content()[:60] + "...")

    # Method 2: Pass HTML to BeautifulSoup
    html = page.content()
    soup = BeautifulSoup(html, "html.parser")
    bs_quote = soup.select_one("span.text")
    print("BeautifulSoup:", bs_quote.text[:60] + "...")

    browser.close()

Output:

Playwright: "The world as we have created it is a process of our thinking...
BeautifulSoup: "The world as we have created it is a process of our thinking...

The BeautifulSoup approach is often better for complex parsing because BeautifulSoup has richer navigation methods. Use page.content() to get the fully rendered HTML and parse it with BeautifulSoup.

Real-Life Example: Scraping a JavaScript-Rendered Quote Page

# js_quote_scraper.py
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
import json

def scrape_js_quotes():
    all_quotes = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        for page_num in range(1, 11):
            url = f"https://quotes.toscrape.com/js/page/{page_num}/"
            page.goto(url)
            page.wait_for_selector("div.quote", timeout=5000)

            html = page.content()
            soup = BeautifulSoup(html, "html.parser")

            for quote_div in soup.find_all("div", class_="quote"):
                text = quote_div.find("span", class_="text")
                author = quote_div.find("small", class_="author")
                tags_div = quote_div.find("div", class_="tags")
                tags = [t.text for t in tags_div.find_all("a")] if tags_div else []

                all_quotes.append({
                    "text": text.text if text else "Unknown",
                    "author": author.text if author else "Unknown",
                    "tags": tags
                })
            print(f"Page {page_num}: {len(soup.find_all('div', class_='quote'))} quotes")
        browser.close()
    return all_quotes

quotes = scrape_js_quotes()
print(f"Total: {len(quotes)} quotes from JS-rendered pages")
with open("js_quotes.json", "w", encoding="utf-8") as f:
    json.dump(quotes, f, indent=2, ensure_ascii=False)
print("Saved to js_quotes.json")

Output:

Page 1: 10 quotes
Page 2: 10 quotes
...
Page 10: 10 quotes
Total: 100 quotes from JS-rendered pages
Saved to js_quotes.json

This scraper combines Playwright for rendering with BeautifulSoup for parsing. Reusing the same browser instance across pages is significantly faster than launching a new browser for each page.

Frequently Asked Questions

What does headless mode mean?

Headless mode means the browser runs without a visible window. This is the default and preferred mode for scraping because it uses less memory and runs faster. Set headless=False during development to see what the browser is doing.

How does Playwright compare to Selenium?

Playwright is newer and generally faster than Selenium. It has better auto-waiting, built-in support for multiple browser contexts, and a cleaner API. Selenium has a larger community and longer track record. For new scraping projects, Playwright is the recommended choice.

Can websites detect Playwright?

Yes, some websites use bot detection that can identify automated browsers. Playwright provides stealth options, but sophisticated anti-bot systems can still detect automation. Always respect website terms of service and consider whether scraping a particular site is appropriate.

How can I make Playwright scraping faster?

Disable images and CSS loading with route interception to speed up page loads. Reuse browser instances across multiple pages. Use networkidle wait state only when necessary. For multiple pages, use async mode with concurrent page objects.
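The image/CSS tip uses Playwright's page.route() interception: register a handler for all URLs and abort requests you do not need. A sketch; the set of blocked resource types is a judgment call for text-only scraping.

```python
# block_heavy_resources.py
BLOCKED_TYPES = {"image", "stylesheet", "font", "media"}

def should_block(resource_type: str) -> bool:
    """Heavy resources that text scraping does not need."""
    return resource_type in BLOCKED_TYPES

def install_blocker(page):
    """Abort requests for blocked types; let everything else through."""
    page.route(
        "**/*",
        lambda route: route.abort()
        if should_block(route.request.resource_type)
        else route.continue_(),
    )

# Usage:
# page = browser.new_page()
# install_blocker(page)
# page.goto("https://quotes.toscrape.com/js/")  # renders without images/CSS
```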

Does Playwright use a lot of memory?

Yes. A single browser instance typically uses 50-200 MB of RAM. For large scraping jobs, process pages sequentially and close browser contexts when done. Monitor memory usage and restart the browser periodically for long-running scrapers.
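The periodic-restart advice can be sketched as a batching loop: launch a fresh browser per batch so its memory is reclaimed between batches. `batched` and the batch size of 20 are illustrative; running the scraper itself requires installed browser binaries.

```python
# batch_restart.py
def batched(items, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def scrape_titles_in_batches(urls, batch_size: int = 20):
    # Imported here so batched() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    results = []
    with sync_playwright() as p:
        for batch in batched(urls, batch_size):
            browser = p.chromium.launch()    # fresh browser per batch
            context = browser.new_context()  # isolated cookies/cache
            page = context.new_page()
            for url in batch:
                page.goto(url)
                results.append(page.title())
            browser.close()                  # frees the batch's memory
    return results
```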

Conclusion

Playwright opens up scraping possibilities that static HTTP libraries cannot touch. We covered installation, sync and async APIs, waiting for dynamic content, extracting data with both Playwright selectors and BeautifulSoup, and building a complete scraper for JavaScript-rendered pages.

For the full Playwright Python documentation, visit playwright.dev/python.