How To Parse XML with Python Using ElementTree

Last Updated: June 01, 2026

Table of Contents

Parsing XML: Quick Example
What Is XML and How Does ElementTree Represent It?
Parsing XML Files
Searching with XPath Expressions
Handling XML Namespaces
Writing XML Files
Real-Life Example: RSS Feed Parser
Frequently Asked Questions
Conclusion
Related Articles

Beginner

XML is everywhere: RSS feeds, SOAP APIs, Android app manifests, Microsoft Office files (which are zipped XML), and countless configuration formats. When you need to read an RSS feed, parse a config file from a legacy enterprise system, or process government data published in XML format, Python’s built-in xml.etree.ElementTree module handles it without any extra dependencies. It is fast, easy to learn, and sufficient for the vast majority of XML parsing tasks.

Everything you need comes with Python — no pip install required. For more advanced XPath support or HTML-tolerant parsing, the third-party lxml library is an excellent upgrade. The examples in this article use only the standard library.

This article covers: parsing XML from strings and files, navigating the element tree, reading attributes and text, searching with find and findall, basic XPath expressions, handling namespaces, and writing XML back to a file. There is also a real-life RSS feed parser project at the end.

Written by Pubs

Python developer and educator with 15+ years building production systems across data engineering, web APIs, and AI tooling. Founder of Python How To Program — 270+ in-depth tutorials covering the modern Python stack.

View all tutorials by Pubs →

Parsing XML: Quick Example

Here is the fastest way to parse XML and extract values from it:

# xml_quick.py
import xml.etree.ElementTree as ET

xml_data = """

    
        Clean Code
        Robert Martin
        2008
    
    
        The Pragmatic Programmer
        David Thomas
        1999
    

"""

root = ET.fromstring(xml_data)

for book in root.findall("book"):
    book_id = book.get("id")
    title   = book.findtext("title")
    author  = book.findtext("author")
    print(f"[{book_id}] {title} by {author}")

Output:

[1] Clean Code by Robert Martin
[2] The Pragmatic Programmer by David Thomas

ET.fromstring() parses an XML string and returns the root element. findall("book") returns all direct child elements named “book”. .get("id") reads an attribute, and .findtext("title") returns the text of the first matching child element (or None if not found).

What Is XML and How Does ElementTree Represent It?

XML is a text format that represents data as a tree of nested elements. Each element has a tag name, optional attributes (key-value pairs in the opening tag), optional text content, and zero or more child elements. ElementTree maps this structure directly to Python objects.

XML concept	ElementTree object	How to access
Element tag	`element.tag`	`"book"`
Attribute	`element.get("name")`	`"id"`, `"class"`
Text content	`element.text`	Text before first child
Tail text	`element.tail`	Text after closing tag
Children	`list(element)`	Direct child elements
All descendants	`element.iter(tag)`	Recursive search

The root element is just another Element object. The whole document is an ElementTree object that wraps the root. You typically call ET.parse(file) to get an ElementTree, then call .getroot() to get the root Element — or use ET.fromstring(text) to parse a string directly into the root element.

Debug Dee tracing tree branches from root to leaf nodes — XML trees have roots, branches, and leaves. Just like real trees, except less photosynthesis.

Parsing XML Files

For real-world use you will usually parse XML from a file rather than a string. ET.parse() accepts a file path and returns an ElementTree object. Call .getroot() to start traversing.

# xml_parse_file.py
import xml.etree.ElementTree as ET

# Create a sample XML file first
xml_content = """\


    
        Alice Chen
        95000
        
            Python
            Docker
        
    
    
        Bob Rivera
        72000
        
            SEO
            Analytics
        
    

"""
with open("employees.xml", "w") as f:
    f.write(xml_content)

# Now parse it
tree = ET.parse("employees.xml")
root = tree.getroot()

print(f"Root tag: {root.tag}")
print(f"Number of employees: {len(root)}")

for emp in root.findall("employee"):
    name = emp.findtext("name")
    dept = emp.get("department")
    skills = [s.text for s in emp.findall("skills/skill")]
    print(f"{name} ({dept}): {', '.join(skills)}")

Output:

Root tag: employees
Number of employees: 2
Alice Chen (Engineering): Python, Docker
Bob Rivera (Marketing): SEO, Analytics

The emp.findall("skills/skill") call uses a simple XPath path expression to find all <skill> elements inside <skills> inside each employee. We use a list comprehension to extract .text from each skill element. Always use elem.text if elem is not None else "" defensively if an element might be absent.

Searching with XPath Expressions

ElementTree supports a useful subset of XPath for searching documents. These expressions are passed to find(), findall(), and findtext().

# xml_xpath.py
import xml.etree.ElementTree as ET

xml_data = """

    
        Python Cookbook
        true
    
    
        Advanced Python
        false
    
    
        Python Basics
        true
    

"""

root = ET.fromstring(xml_data)

# Find all books
books = root.findall("product[@type='book']")
print("Books:")
for b in books:
    print(f"  {b.findtext('name')} -- ${b.get('price')}")

# Find first in-stock item
in_stock = root.find("product[in_stock='true']")
print("\nFirst in-stock:", in_stock.findtext("name") if in_stock is not None else "None")

# Find all product names
all_names = root.findall("product/name")
print("\nAll products:", [n.text for n in all_names])

Output:

Books:
  Python Cookbook -- $29.99
  Python Basics -- $19.99

First in-stock: Python Cookbook

All products: ['Python Cookbook', 'Advanced Python', 'Python Basics']

The [@attribute='value'] predicate filters elements by attribute value. The [child='text'] predicate filters by child text content. These XPath subsets cover most practical search scenarios without needing the full XPath library.

Cache Katie highlighting specific branch with search beam — find() gets you one element. findall() gets you all of them. XPath gets you a headache.

Handling XML Namespaces

Many real-world XML formats (RSS, SOAP, SVG, Office Open XML) use namespaces — URI prefixes that qualify element names. ElementTree includes the namespace URI in curly braces as part of the tag: {http://example.com/ns}element. You can handle this with a namespace map.

# xml_namespaces.py
import xml.etree.ElementTree as ET

rss_xml = """

    
        
            Python 3.14 Released
            
        
        
            PyCon 2026 Recap
            
        
    

"""

ns = {"media": "http://search.yahoo.com/mrss/"}

root = ET.fromstring(rss_xml)
for item in root.findall("channel/item"):
    title = item.findtext("title")
    media = item.find("media:content", ns)
    url = media.get("url") if media is not None else "no image"
    print(f"{title} -- {url}")

Output:

Python 3.14 Released -- https://example.com/img.jpg
PyCon 2026 Recap -- https://example.com/img2.jpg

Define a dictionary mapping your chosen prefix to the full namespace URI, then pass it as the second argument to find() and findall(). You can choose any prefix you like in your Python code — it does not have to match the prefix used in the XML document itself.

Writing XML Files

ElementTree also lets you build XML documents from scratch using ET.Element, ET.SubElement, and ET.ElementTree. Use ET.indent() (Python 3.9+) to add pretty-print formatting before writing.

# xml_write.py
import xml.etree.ElementTree as ET

root = ET.Element("inventory")

def add_item(parent, name, qty, price):
    item = ET.SubElement(parent, "item")
    item.set("price", str(price))
    ET.SubElement(item, "name").text = name
    ET.SubElement(item, "quantity").text = str(qty)
    return item

add_item(root, "Widget A", 150, 9.99)
add_item(root, "Widget B", 42, 24.99)
add_item(root, "Gadget X", 8, 199.99)

ET.indent(root, space="    ")  # Python 3.9+ pretty-printing
tree = ET.ElementTree(root)
tree.write("inventory.xml", encoding="utf-8", xml_declaration=True)

# Verify by reading it back
with open("inventory.xml") as f:
    print(f.read())

Output:

<?xml version='1.0' encoding='utf-8'?>
<inventory>
    <item price="9.99">
        <name>Widget A</name>
        <quantity>150</quantity>
    </item>
    <item price="24.99">
        <name>Widget B</name>
        <quantity>42</quantity>
    </item>
    <item price="199.99">
        <name>Gadget X</name>
        <quantity>8</quantity>
    </item>
</inventory>

ET.SubElement(parent, tag) creates a child element and attaches it to the parent in one step. Set attributes with .set(key, value) and text with .text = "...". Pass xml_declaration=True to .write() to include the <?xml ...?> header line.

Sudo Sam placing new branches on glowing tree structure — Building XML by hand: therapeutic or torturous, depending on the nesting depth.

Real-Life Example: RSS Feed Parser

This script fetches and parses a real RSS feed using urllib (built-in) and prints the latest 5 articles with titles, links, and publish dates.

# rss_parser.py
import xml.etree.ElementTree as ET
import urllib.request

RSS_URL = "https://realpython.com/atom.xml"

def parse_rss(url: str, limit: int = 5) -> list[dict]:
    with urllib.request.urlopen(url) as response:
        xml_bytes = response.read()

    root = ET.fromstring(xml_bytes)

    # Atom feed namespace
    atom_ns = {"atom": "http://www.w3.org/2005/Atom"}

    entries = root.findall("atom:entry", atom_ns)[:limit]
    results = []
    for entry in entries:
        title   = entry.findtext("atom:title", namespaces=atom_ns) or "No title"
        link_el = entry.find("atom:link", atom_ns)
        link    = link_el.get("href") if link_el is not None else ""
        updated = entry.findtext("atom:updated", namespaces=atom_ns) or ""
        results.append({"title": title, "link": link, "updated": updated[:10]})
    return results

if __name__ == "__main__":
    articles = parse_rss(RSS_URL)
    for i, art in enumerate(articles, 1):
        print(f"{i}. [{art['updated']}] {art['title']}")
        print(f"   {art['link']}")

Output:

1. [2026-04-10] Python 3.14 Feature Highlights
   https://realpython.com/python-314-features/
2. [2026-04-08] How to Use Structural Pattern Matching
   https://realpython.com/structural-pattern-matching/
...

This parser works on any Atom feed — swap the URL for any site that publishes an Atom feed. For RSS 2.0 feeds (which use <item> instead of <entry> and have no namespace), replace the XPath expressions with simple tag names like root.findall("channel/item"). Always wrap urllib.request.urlopen in a try/except for production code that handles network errors gracefully.

API Alice pouring data cubes into funnel producing organized tree — From raw data to structured XML. The funnel does the heavy lifting.

Frequently Asked Questions

When should I use lxml instead of ElementTree?

Use lxml when you need full XPath 1.0 support, XSLT transforms, schema validation (DTD/XSD), or when you are parsing malformed HTML/XML that ElementTree would reject. lxml is also significantly faster for very large documents (hundreds of MB). For most everyday XML parsing tasks — config files, RSS feeds, API responses — xml.etree.ElementTree is sufficient and avoids the extra dependency.

What is the difference between find() and findall()?

find() returns the first matching element or None if nothing matches. findall() always returns a list — an empty list if nothing matches, never None. Use find() when you expect exactly one element (like a single <title> tag). Use findall() when you expect multiple elements (like all <item> tags). The companion findtext() returns the text of the first match or a default value, saving you from checking for None yourself.

How do I parse very large XML files without loading them into memory?

Use ET.iterparse() to process an XML file as a stream, one element at a time. This keeps memory usage constant regardless of file size. The pattern is: call ET.iterparse(filepath, events=("start", "end")), process elements when you see the “end” event, and call elem.clear() after processing to release memory. This is the right approach for XML files in the hundreds of megabytes range.

How do I modify an existing XML file?

Parse the file with ET.parse(), traverse to the element you want to change, modify its .text, .attrib, or call .remove(child) / .append(new_child), then write the modified tree back with tree.write(filepath). Note that ElementTree does not preserve the original XML formatting (whitespace, comments outside the root element are lost). If preserving exact formatting matters, use lxml which has better round-trip fidelity.

Is ElementTree safe to use with untrusted XML input?

ElementTree is vulnerable to certain XML attack vectors including “billion laughs” (exponential entity expansion) and external entity injection (XXE). If you are parsing XML from untrusted sources (user uploads, third-party web responses), use the defusedxml library instead (pip install defusedxml). It is a drop-in replacement that blocks these attack vectors. For trusted sources like your own config files or known APIs, standard ElementTree is fine.

Conclusion

Python’s xml.etree.ElementTree module covers the full lifecycle of XML processing: parsing strings and files, traversing the element tree, reading attributes and text content, filtering with XPath predicates, handling namespaces, and writing XML documents back to disk. The RSS feed parser example shows how these pieces combine into a real, practical tool in under 30 lines.

A good way to extend the RSS parser is to add error handling for network timeouts, cache the parsed results to a local SQLite database to track which articles you have already seen, and send new article notifications via a Telegram bot. For large-scale XML processing, explore lxml for its full XPath support and significantly faster parsing speed.

The full ElementTree API reference is in the official Python documentation, including the complete list of supported XPath syntax.

Continue Learning Python

Tutorials you might also find useful:

Post Views: 60

How To Parse XML with Python Using ElementTree

Parsing XML: Quick Example

What Is XML and How Does ElementTree Represent It?

Parsing XML Files

Searching with XPath Expressions

Handling XML Namespaces

Writing XML Files

Real-Life Example: RSS Feed Parser

Frequently Asked Questions

When should I use lxml instead of ElementTree?

What is the difference between find() and findall()?

How do I parse very large XML files without loading them into memory?

How do I modify an existing XML file?

Is ElementTree safe to use with untrusted XML input?

Conclusion

Related Articles

Continue Learning Python

Submit a Comment Cancel reply