Beginner
XML is everywhere: RSS feeds, SOAP APIs, Android app manifests, Microsoft Office files (which are zipped XML), and countless configuration formats. When you need to read an RSS feed, parse a config file from a legacy enterprise system, or process government data published in XML format, Python’s built-in xml.etree.ElementTree module handles it without any extra dependencies. It is fast, easy to learn, and sufficient for the vast majority of XML parsing tasks.
Everything you need comes with Python — no pip install required. For more advanced XPath support or HTML-tolerant parsing, the third-party lxml library is an excellent upgrade. The examples in this article use only the standard library.
This article covers: parsing XML from strings and files, navigating the element tree, reading attributes and text, searching with find and findall, basic XPath expressions, handling namespaces, and writing XML back to a file. There is also a real-life RSS feed parser project at the end.
Parsing XML: Quick Example
Here is the fastest way to parse XML and extract values from it:
# xml_quick.py
import xml.etree.ElementTree as ET
xml_data = """
Clean Code
Robert Martin
2008
The Pragmatic Programmer
David Thomas
1999
"""
root = ET.fromstring(xml_data)
for book in root.findall("book"):
book_id = book.get("id")
title = book.findtext("title")
author = book.findtext("author")
print(f"[{book_id}] {title} by {author}")
Output:
[1] Clean Code by Robert Martin
[2] The Pragmatic Programmer by David Thomas
ET.fromstring() parses an XML string and returns the root element. findall("book") returns all direct child elements named “book”. .get("id") reads an attribute, and .findtext("title") returns the text of the first matching child element (or None if not found).
What Is XML and How Does ElementTree Represent It?
XML is a text format that represents data as a tree of nested elements. Each element has a tag name, optional attributes (key-value pairs in the opening tag), optional text content, and zero or more child elements. ElementTree maps this structure directly to Python objects.
| XML concept | ElementTree object | How to access |
|---|---|---|
| Element tag | element.tag | "book" |
| Attribute | element.get("name") | "id", "class" |
| Text content | element.text | Text before first child |
| Tail text | element.tail | Text after closing tag |
| Children | list(element) | Direct child elements |
| All descendants | element.iter(tag) | Recursive search |
The root element is just another Element object. The whole document is an ElementTree object that wraps the root. You typically call ET.parse(file) to get an ElementTree, then call .getroot() to get the root Element — or use ET.fromstring(text) to parse a string directly into the root element.
Parsing XML Files
For real-world use you will usually parse XML from a file rather than a string. ET.parse() accepts a file path and returns an ElementTree object. Call .getroot() to start traversing.
# xml_parse_file.py
import xml.etree.ElementTree as ET
# Create a sample XML file first
xml_content = """\
Alice Chen
95000
Python
Docker
Bob Rivera
72000
SEO
Analytics
"""
with open("employees.xml", "w") as f:
f.write(xml_content)
# Now parse it
tree = ET.parse("employees.xml")
root = tree.getroot()
print(f"Root tag: {root.tag}")
print(f"Number of employees: {len(root)}")
for emp in root.findall("employee"):
name = emp.findtext("name")
dept = emp.get("department")
skills = [s.text for s in emp.findall("skills/skill")]
print(f"{name} ({dept}): {', '.join(skills)}")
Output:
Root tag: employees
Number of employees: 2
Alice Chen (Engineering): Python, Docker
Bob Rivera (Marketing): SEO, Analytics
The emp.findall("skills/skill") call uses a simple XPath path expression to find all <skill> elements inside <skills> inside each employee. We use a list comprehension to extract .text from each skill element. Always use elem.text if elem is not None else "" defensively if an element might be absent.
Searching with XPath Expressions
ElementTree supports a useful subset of XPath for searching documents. These expressions are passed to find(), findall(), and findtext().
# xml_xpath.py
import xml.etree.ElementTree as ET
xml_data = """
Python Cookbook
true
Advanced Python
false
Python Basics
true
"""
root = ET.fromstring(xml_data)
# Find all books
books = root.findall("product[@type='book']")
print("Books:")
for b in books:
print(f" {b.findtext('name')} -- ${b.get('price')}")
# Find first in-stock item
in_stock = root.find("product[in_stock='true']")
print("\nFirst in-stock:", in_stock.findtext("name") if in_stock is not None else "None")
# Find all product names
all_names = root.findall("product/name")
print("\nAll products:", [n.text for n in all_names])
Output:
Books:
Python Cookbook -- $29.99
Python Basics -- $19.99
First in-stock: Python Cookbook
All products: ['Python Cookbook', 'Advanced Python', 'Python Basics']
The [@attribute='value'] predicate filters elements by attribute value. The [child='text'] predicate filters by child text content. These XPath subsets cover most practical search scenarios without needing the full XPath library.
Handling XML Namespaces
Many real-world XML formats (RSS, SOAP, SVG, Office Open XML) use namespaces — URI prefixes that qualify element names. ElementTree includes the namespace URI in curly braces as part of the tag: {http://example.com/ns}element. You can handle this with a namespace map.
# xml_namespaces.py
import xml.etree.ElementTree as ET
rss_xml = """
-
Python 3.14 Released
-
PyCon 2026 Recap
"""
ns = {"media": "http://search.yahoo.com/mrss/"}
root = ET.fromstring(rss_xml)
for item in root.findall("channel/item"):
title = item.findtext("title")
media = item.find("media:content", ns)
url = media.get("url") if media is not None else "no image"
print(f"{title} -- {url}")
Output:
Python 3.14 Released -- https://example.com/img.jpg
PyCon 2026 Recap -- https://example.com/img2.jpg
Define a dictionary mapping your chosen prefix to the full namespace URI, then pass it as the second argument to find() and findall(). You can choose any prefix you like in your Python code — it does not have to match the prefix used in the XML document itself.
Writing XML Files
ElementTree also lets you build XML documents from scratch using ET.Element, ET.SubElement, and ET.ElementTree. Use ET.indent() (Python 3.9+) to add pretty-print formatting before writing.
# xml_write.py
import xml.etree.ElementTree as ET
root = ET.Element("inventory")
def add_item(parent, name, qty, price):
item = ET.SubElement(parent, "item")
item.set("price", str(price))
ET.SubElement(item, "name").text = name
ET.SubElement(item, "quantity").text = str(qty)
return item
add_item(root, "Widget A", 150, 9.99)
add_item(root, "Widget B", 42, 24.99)
add_item(root, "Gadget X", 8, 199.99)
ET.indent(root, space=" ") # Python 3.9+ pretty-printing
tree = ET.ElementTree(root)
tree.write("inventory.xml", encoding="utf-8", xml_declaration=True)
# Verify by reading it back
with open("inventory.xml") as f:
print(f.read())
Output:
<?xml version='1.0' encoding='utf-8'?>
<inventory>
<item price="9.99">
<name>Widget A</name>
<quantity>150</quantity>
</item>
<item price="24.99">
<name>Widget B</name>
<quantity>42</quantity>
</item>
<item price="199.99">
<name>Gadget X</name>
<quantity>8</quantity>
</item>
</inventory>
ET.SubElement(parent, tag) creates a child element and attaches it to the parent in one step. Set attributes with .set(key, value) and text with .text = "...". Pass xml_declaration=True to .write() to include the <?xml ...?> header line.
Real-Life Example: RSS Feed Parser
This script fetches and parses a real RSS feed using urllib (built-in) and prints the latest 5 articles with titles, links, and publish dates.
# rss_parser.py
import xml.etree.ElementTree as ET
import urllib.request
RSS_URL = "https://realpython.com/atom.xml"
def parse_rss(url: str, limit: int = 5) -> list[dict]:
with urllib.request.urlopen(url) as response:
xml_bytes = response.read()
root = ET.fromstring(xml_bytes)
# Atom feed namespace
atom_ns = {"atom": "http://www.w3.org/2005/Atom"}
entries = root.findall("atom:entry", atom_ns)[:limit]
results = []
for entry in entries:
title = entry.findtext("atom:title", namespaces=atom_ns) or "No title"
link_el = entry.find("atom:link", atom_ns)
link = link_el.get("href") if link_el is not None else ""
updated = entry.findtext("atom:updated", namespaces=atom_ns) or ""
results.append({"title": title, "link": link, "updated": updated[:10]})
return results
if __name__ == "__main__":
articles = parse_rss(RSS_URL)
for i, art in enumerate(articles, 1):
print(f"{i}. [{art['updated']}] {art['title']}")
print(f" {art['link']}")
Output:
1. [2026-04-10] Python 3.14 Feature Highlights
https://realpython.com/python-314-features/
2. [2026-04-08] How to Use Structural Pattern Matching
https://realpython.com/structural-pattern-matching/
...
This parser works on any Atom feed — swap the URL for any site that publishes an Atom feed. For RSS 2.0 feeds (which use <item> instead of <entry> and have no namespace), replace the XPath expressions with simple tag names like root.findall("channel/item"). Always wrap urllib.request.urlopen in a try/except for production code that handles network errors gracefully.
Frequently Asked Questions
When should I use lxml instead of ElementTree?
Use lxml when you need full XPath 1.0 support, XSLT transforms, schema validation (DTD/XSD), or when you are parsing malformed HTML/XML that ElementTree would reject. lxml is also significantly faster for very large documents (hundreds of MB). For most everyday XML parsing tasks — config files, RSS feeds, API responses — xml.etree.ElementTree is sufficient and avoids the extra dependency.
What is the difference between find() and findall()?
find() returns the first matching element or None if nothing matches. findall() always returns a list — an empty list if nothing matches, never None. Use find() when you expect exactly one element (like a single <title> tag). Use findall() when you expect multiple elements (like all <item> tags). The companion findtext() returns the text of the first match or a default value, saving you from checking for None yourself.
How do I parse very large XML files without loading them into memory?
Use ET.iterparse() to process an XML file as a stream, one element at a time. This keeps memory usage constant regardless of file size. The pattern is: call ET.iterparse(filepath, events=("start", "end")), process elements when you see the “end” event, and call elem.clear() after processing to release memory. This is the right approach for XML files in the hundreds of megabytes range.
How do I modify an existing XML file?
Parse the file with ET.parse(), traverse to the element you want to change, modify its .text, .attrib, or call .remove(child) / .append(new_child), then write the modified tree back with tree.write(filepath). Note that ElementTree does not preserve the original XML formatting (whitespace, comments outside the root element are lost). If preserving exact formatting matters, use lxml which has better round-trip fidelity.
Is ElementTree safe to use with untrusted XML input?
ElementTree is vulnerable to certain XML attack vectors including “billion laughs” (exponential entity expansion) and external entity injection (XXE). If you are parsing XML from untrusted sources (user uploads, third-party web responses), use the defusedxml library instead (pip install defusedxml). It is a drop-in replacement that blocks these attack vectors. For trusted sources like your own config files or known APIs, standard ElementTree is fine.
Conclusion
Python’s xml.etree.ElementTree module covers the full lifecycle of XML processing: parsing strings and files, traversing the element tree, reading attributes and text content, filtering with XPath predicates, handling namespaces, and writing XML documents back to disk. The RSS feed parser example shows how these pieces combine into a real, practical tool in under 30 lines.
A good way to extend the RSS parser is to add error handling for network timeouts, cache the parsed results to a local SQLite database to track which articles you have already seen, and send new article notifications via a Telegram bot. For large-scale XML processing, explore lxml for its full XPath support and significantly faster parsing speed.
The full ElementTree API reference is in the official Python documentation, including the complete list of supported XPath syntax.