You need to find, extract, or replace text patterns in strings — email addresses in a document, phone numbers in a log file, or URLs scattered across a web page. Python’s built-in re module gives you regular expressions, a powerful pattern-matching language that handles all of these tasks with concise, reusable patterns.
Regular expressions look intimidating at first, but they follow logical rules. Once you learn the core patterns, you will reach for them constantly — for data validation, log parsing, text cleanup, and search-and-replace operations. Python’s re module is part of the standard library, so there is nothing to install.
In this tutorial, you will learn the essential regex patterns; how to use re.search(), re.findall(), re.sub(), and re.compile(); how to work with groups and named groups; how to handle common real-world patterns; and how to build a complete log parser project.
Pattern Matching: Quick Example
Here is a minimal example that extracts all email addresses from a block of text.
```python
# quick_regex.py
import re

text = "Contact us at support@example.com or sales@company.org for help."
pattern = r'[\w.+-]+@[\w-]+\.[\w.]+'

emails = re.findall(pattern, text)
for email in emails:
    print(email)
```

Output:

```text
support@example.com
sales@company.org
```
The r prefix creates a raw string, which prevents Python from interpreting backslashes as escape characters. Always use raw strings for regex patterns.
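To see why raw strings matter, compare how Python itself handles the backslash before the regex engine ever sees the pattern. In a plain string, `\n` is a single newline character and `\b` is a backspace; in a raw string they stay as two literal characters:

```python
import re

# '\n' is one character (a newline); r'\n' is two characters (backslash, n)
print(len('\n'), len(r'\n'))  # 1 2

# r'\b' reaches the regex engine as a word-boundary anchor;
# a non-raw '\b' would be a literal backspace character instead.
print(re.search(r'\bcat\b', 'the cat sat') is not None)  # True
```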

What Are Regular Expressions and Why Use Them?
Regular expressions (regex) are sequences of characters that define search patterns. Think of them as advanced wildcards — where a file search might use *.txt to match all text files, regex gives you the precision to match patterns like “any string that looks like a phone number” or “every word followed by a colon.”
| Pattern | Matches | Example |
|---|---|---|
| `\d` | Any digit (0-9) | `\d{3}` matches "123" |
| `\w` | Word character (letter, digit, underscore) | `\w+` matches "hello_world" |
| `\s` | Whitespace (space, tab, newline) | `\s+` matches " " |
| `.` | Any character except newline | `a.c` matches "abc", "a1c" |
| `+` | One or more of previous | `\d+` matches "42", "1000" |
| `*` | Zero or more of previous | `\d*` matches "", "5", "99" |
| `?` | Zero or one of previous | `colou?r` matches "color", "colour" |
| `[abc]` | Any character in the set | `[aeiou]` matches vowels |
| `^` | Start of string | `^Hello` matches "Hello world" |
| `$` | End of string | `world$` matches "Hello world" |
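A quick sanity check of a few rows from the table, run through re.findall() and re.search():

```python
import re

# \d{3}: exactly three digits (non-overlapping matches)
print(re.findall(r'\d{3}', 'call 123-4567'))       # ['123', '456']

# colou?r: the 'u' is optional
print(re.findall(r'colou?r', 'color and colour'))  # ['color', 'colour']

# ^ and $ anchor the match to the string boundaries
print(bool(re.search(r'^Hello', 'Hello world')))   # True
print(bool(re.search(r'world$', 'Hello world')))   # True
```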
Core Functions: search, match, findall, sub
The re module provides four main functions. Each serves a different use case.
```python
# core_functions.py
import re

text = "Order #12345 was placed on 2024-03-15 for $49.99"

# search() - find first match anywhere in string
match = re.search(r'#(\d+)', text)
if match:
    print(f"Order number: {match.group(1)}")

# findall() - find all matches, return as list
prices = re.findall(r'\$[\d.]+', text)
print(f"Prices found: {prices}")

# sub() - search and replace
cleaned = re.sub(r'\$[\d.]+', '[PRICE]', text)
print(f"Cleaned: {cleaned}")

# match() - match at the START of string only
result = re.match(r'Order', text)
print(f"Starts with Order: {result is not None}")
```

Output:

```text
Order number: 12345
Prices found: ['$49.99']
Cleaned: Order #12345 was placed on 2024-03-15 for [PRICE]
Starts with Order: True
```
Use search() when you want to find a pattern anywhere in the string. Use match() only when you specifically need to check the beginning of the string.
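A third sibling worth knowing is re.fullmatch(), which succeeds only if the entire string matches the pattern. That makes it a natural fit for input validation, where a partial match should count as a failure:

```python
import re

# fullmatch() requires the whole string to match the pattern
print(re.fullmatch(r'\d{4}', '2024') is not None)     # True
print(re.fullmatch(r'\d{4}', '2024-03') is not None)  # False: trailing '-03'
```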
Groups and Named Groups
Parentheses in regex create capture groups that let you extract specific parts of a match. Named groups make your code more readable.
```python
# groups.py
import re

log_line = '2024-03-15 14:30:22 ERROR [auth] Failed login for user admin from 192.168.1.100'

# Numbered groups
pattern = r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) \[(\w+)\] (.+)'
match = re.search(pattern, log_line)
if match:
    print(f"Date: {match.group(1)}")
    print(f"Time: {match.group(2)}")
    print(f"Level: {match.group(3)}")
    print(f"Module: {match.group(4)}")
    print(f"Message: {match.group(5)}")

# Named groups (much more readable)
pattern = r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+)'
match = re.search(pattern, log_line)
if match:
    print(f"\nNamed: {match.group('date')} at {match.group('time')} [{match.group('level')}]")
```

Output:

```text
Date: 2024-03-15
Time: 14:30:22
Level: ERROR
Module: auth
Message: Failed login for user admin from 192.168.1.100

Named: 2024-03-15 at 14:30:22 [ERROR]
```
Named groups use the (?P<name>...) syntax. They are especially valuable in complex patterns where numbered groups become confusing.
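Named groups also unlock match.groupdict(), which returns every named group as a plain dictionary. This is convenient when you want to pass the extracted fields straight to another function or serialize them:

```python
import re

log_line = '2024-03-15 14:30:22 ERROR'
pattern = r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+)'

match = re.search(pattern, log_line)
if match:
    # All named groups as a dict, keyed by group name
    print(match.groupdict())
    # {'date': '2024-03-15', 'time': '14:30:22', 'level': 'ERROR'}
```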

Compiled Patterns with re.compile()
When you use the same regex pattern multiple times, compile it first for better performance and cleaner code.
```python
# compiled.py
import re

# Compile once, use many times
email_pattern = re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+')
phone_pattern = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')

contacts = [
    "Call John at 555-123-4567 or email john@example.com",
    "Reach Jane at jane@company.org or 555.987.6543",
    "No contact info here, sorry!",
]

for text in contacts:
    emails = email_pattern.findall(text)
    phones = phone_pattern.findall(text)
    print(f"Emails: {emails}, Phones: {phones}")
```

Output:

```text
Emails: ['john@example.com'], Phones: ['555-123-4567']
Emails: ['jane@company.org'], Phones: ['555.987.6543']
Emails: [], Phones: []
Compiled patterns have the same methods as the re module itself — search(), findall(), sub(), etc. The performance benefit comes from compiling the regex once instead of recompiling it on every call.
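re.compile() also accepts flags as a second argument, so behavior like case-insensitivity lives alongside the pattern instead of being repeated at every call site:

```python
import re

# re.IGNORECASE makes the pattern case-insensitive
error_pattern = re.compile(r'error', re.IGNORECASE)

lines = ['ERROR: disk full', 'Error: timeout', 'all good']
flagged = [line for line in lines if error_pattern.search(line)]
print(flagged)  # ['ERROR: disk full', 'Error: timeout']
```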
Common Real-World Patterns
Here are battle-tested patterns for common data extraction tasks.
```python
# common_patterns.py
import re

# URL extraction
text = "Visit https://example.com or http://docs.python.org/3/ for details"
urls = re.findall(r'https?://[\w./\-?=&]+', text)
print("URLs:", urls)

# Date extraction (YYYY-MM-DD)
text = "Events on 2024-03-15 and 2024-12-25"
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)
print("Dates:", dates)

# IP address
text = "Server 192.168.1.100 responded, backup at 10.0.0.1"
ips = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', text)
print("IPs:", ips)

# HTML tag stripping
html = "<p>Hello <b>world</b></p>"
clean = re.sub(r'<[^>]+>', '', html)
print("Stripped:", clean)
```

Output:

```text
URLs: ['https://example.com', 'http://docs.python.org/3/']
Dates: ['2024-03-15', '2024-12-25']
IPs: ['192.168.1.100', '10.0.0.1']
Stripped: Hello world
```
These patterns handle the most common extraction tasks. For production use, consider edge cases — the email pattern above works for most addresses but does not cover every valid format defined in the RFC.
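When a production pattern grows complex, the re.VERBOSE flag lets you spread it over multiple lines with comments; whitespace inside the pattern is then ignored. Here is the IP pattern from above rewritten in that style:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments,
# which keeps longer patterns readable and maintainable.
ip_pattern = re.compile(r"""
    \b
    \d{1,3} \. \d{1,3} \. \d{1,3} \. \d{1,3}   # four dot-separated octets
    \b
""", re.VERBOSE)

print(ip_pattern.findall('Server 192.168.1.100, backup at 10.0.0.1'))
# ['192.168.1.100', '10.0.0.1']
```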

Real-Life Example: Building a Log File Parser
```python
# log_parser.py
import re
from collections import Counter

# Sample log data
log_data = """
2024-03-15 14:30:22 ERROR [auth] Failed login for user admin from 192.168.1.100
2024-03-15 14:30:25 INFO [auth] Successful login for user alice from 10.0.0.50
2024-03-15 14:31:00 WARNING [db] Slow query detected: 2.5s
2024-03-15 14:31:05 ERROR [api] Timeout connecting to payment service
2024-03-15 14:31:10 INFO [auth] Successful login for user bob from 10.0.0.51
2024-03-15 14:31:15 ERROR [auth] Failed login for user admin from 192.168.1.100
2024-03-15 14:31:20 INFO [api] Health check passed
2024-03-15 14:31:30 ERROR [db] Connection pool exhausted
""".strip()

log_pattern = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>\w+) '
    r'\[(?P<module>\w+)\] '
    r'(?P<message>.+)'
)
failed_login = re.compile(r'Failed login for user \w+ from (?P<ip>[\d.]+)')

level_counts = Counter()
failed_ips = Counter()

for line in log_data.splitlines():
    match = log_pattern.match(line)
    if not match:
        continue
    level_counts[match.group('level')] += 1
    attempt = failed_login.search(match.group('message'))
    if attempt:
        failed_ips[attempt.group('ip')] += 1

print("Log level counts:")
for level, count in level_counts.most_common():
    print(f"  {level}: {count}")

print(f"Failed login attempts: {sum(failed_ips.values())}")
print("Suspicious IPs:")
for ip, count in failed_ips.most_common():
    print(f"  {ip}: {count} failed attempts")
```

Output:

```text
Log level counts:
  ERROR: 4
  INFO: 3
  WARNING: 1
Failed login attempts: 2
Suspicious IPs:
  192.168.1.100: 2 failed attempts
```
This parser combines compiled regex patterns, named groups, and Python’s Counter class to analyze log files efficiently. You can extend it to read from actual log files, send alerts for suspicious activity, or export metrics to a dashboard.
Frequently Asked Questions
What is the difference between greedy and lazy matching?
By default, quantifiers like + and * are greedy — they match as much as possible. Adding ? makes them lazy, matching as little as possible. For example, <.+> on “<b>bold</b>” matches the entire string, while <.+?> matches just “<b>”.
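The difference is easy to see with re.findall() on a small HTML snippet:

```python
import re

html = '<b>bold</b>'
print(re.findall(r'<.+>', html))   # greedy: ['<b>bold</b>']
print(re.findall(r'<.+?>', html))  # lazy:   ['<b>', '</b>']
```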
How do I match across multiple lines?
Use the re.DOTALL flag to make . match newlines, or re.MULTILINE to make ^ and $ match at line boundaries instead of string boundaries. Pass these as the flags parameter: re.search(pattern, text, flags=re.MULTILINE).
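For example, anchoring at line starts with and without re.MULTILINE:

```python
import re

text = 'alpha: 1\nbeta: 2\ngamma: 3'

# Without MULTILINE, ^ matches only at the start of the whole string
print(re.findall(r'^\w+', text))                      # ['alpha']

# With MULTILINE, ^ matches at the start of every line
print(re.findall(r'^\w+', text, flags=re.MULTILINE))  # ['alpha', 'beta', 'gamma']
```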
When do I need to escape special characters?
Escape regex metacharacters (. * + ? [ ] ( ) { } ^ $ | \) with a backslash when you want to match them literally. Use re.escape() to automatically escape a string: in Python 3.7 and later, re.escape("$9.99") returns "\$9\.99" (only characters with special regex meaning are escaped; plain letters, digits, and spaces are left alone).
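This matters whenever you build a pattern from untrusted or arbitrary text, such as a user-supplied search term:

```python
import re

user_input = '$9.99'            # '$' and '.' are regex metacharacters
escaped = re.escape(user_input)
print(escaped)                  # \$9\.99

# The escaped pattern matches the literal text, not "any char" for '.'
print(re.search(escaped, 'price: $9.99') is not None)  # True
```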
Are regular expressions slow?
For simple pattern matching, string methods like str.startswith(), str.endswith(), and in are faster. Use regex when you need pattern matching that string methods cannot handle. Compile patterns with re.compile() if you use them repeatedly.
When should I NOT use regex?
Do not use regex to parse HTML or XML — use BeautifulSoup or lxml instead. Do not use regex for JSON — use the json module. Regex is best for flat text patterns, not nested structures. If your pattern needs recursive matching, you need a proper parser.
Conclusion
Regular expressions are one of the most powerful tools in a Python developer’s toolkit. We covered the core syntax, the main re module functions, capture groups, compiled patterns, common real-world patterns, and building a complete log parser.
For the full re module documentation, visit docs.python.org/3/library/re. For interactive regex testing, try regex101.com which provides real-time explanations of your patterns.