
XML is everywhere you do not want it to be: legacy payment gateways, government data feeds, enterprise software exports, and RSS/Atom feeds. Working with Python’s built-in xml.etree.ElementTree means navigating tree nodes, iterating children, and checking for None everywhere — for what is often a simple data extraction task. What if you could treat XML like JSON and access values with dictionary keys instead?

The xmltodict library does exactly that: it parses an XML string or file into a standard Python dictionary using a single function call. Going the other way — from a dictionary back to XML — is equally simple. The library is small (one file), fast, and handles attributes, namespaces, and nested elements gracefully.

This article covers installing xmltodict, parsing XML strings and files, handling attributes and namespaces, dealing with repeated elements that may be lists or single items, converting back to XML with unparse(), and processing real-world RSS feeds. By the end, you will be able to consume any XML data source with the same ease as a JSON API.

xmltodict Quick Example

Here is the core workflow in six lines:

# quick_xmltodict.py
import xmltodict

xml_data = """
<book>
    <title>Python Tricks</title>
    <author>Dan Bader</author>
    <price currency="USD">29.99</price>
</book>
"""

data = xmltodict.parse(xml_data)
print(data["book"]["title"])                    # Python Tricks
print(data["book"]["price"]["#text"])           # 29.99
print(data["book"]["price"]["@currency"])       # USD

Output:

Python Tricks
29.99
USD

That is the entire API for basic use. xmltodict.parse() takes a string, bytes, or a file-like object and returns a dictionary: a plain dict in recent releases on Python 3.7+, an OrderedDict (which behaves the same way) in older ones. XML attributes are prefixed with @, and when an element has both attributes and text, the text is stored under #text. The sections below cover more complex scenarios including lists, namespaces, and real-world feeds.
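The shape of each value depends on what the element contains, which is easy to miss in a one-element example. The sketch below uses a made-up document (the id attribute and the notes element are illustrative, not part of the example above) to show the four cases you are likely to meet:

```python
import xmltodict

# Illustrative document: one attribute-only case, one text-only element,
# one element with both attributes and text, and one empty element.
doc = xmltodict.parse("""
<book id="42">
    <title>Python Tricks</title>
    <price currency="USD">29.99</price>
    <notes/>
</book>
""")

book = doc["book"]
print(book["@id"])      # attribute on <book>, stored under the "@id" key
print(book["title"])    # text-only element: a plain string
print(book["price"])    # attributes + text: a dict with "@currency" and "#text"
print(book["notes"])    # empty element: None
```

Because a text-only element is a string and an element with attributes is a dict, code that reads book["price"]["#text"] only works while the attribute is present.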

What Is xmltodict and How Does It Compare to ElementTree?

XML has two common parsing approaches in Python: tree-based (ElementTree, lxml) and event-based (SAX). The first makes you walk a tree of nodes; the second makes you write handlers for parse events. xmltodict adds a third option: dictionary conversion, which is the simplest approach for small-to-medium XML documents where you just need to extract data.

Library        API Style          Large Files      Namespace Support  Best For
xmltodict      Dict access        Streaming mode   Yes                Simple extraction, quick scripts
ElementTree    Tree nodes         Yes (iterparse)  With prefix        Standard library, tree traversal
lxml           Tree + XPath       Yes              Full               Complex queries, fast parsing
BeautifulSoup  CSS/tag selectors  No               Partial            HTML + XML web scraping

For most data engineering tasks — parsing API responses, reading config files, processing RSS feeds — xmltodict is the simplest solution. Reach for lxml when you need XPath queries or when performance on large documents matters, and for truly massive files (gigabytes), use ElementTree’s iterparse().
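For comparison, here is roughly what the tree-based style looks like with the standard library alone. This is a minimal sketch on a made-up catalog (the second book deliberately has no price) to show where the explicit None checks come from:

```python
import xml.etree.ElementTree as ET

# Illustrative input; the second <book> omits <price> on purpose.
XML = """<catalog>
    <book id="001">
        <title>Learning Python</title>
        <price>49.99</price>
    </book>
    <book id="002">
        <title>Fluent Python</title>
    </book>
</catalog>"""

def extract_books(xml_text):
    """Walk the tree node by node and build plain dicts by hand."""
    root = ET.fromstring(xml_text)
    books = []
    for node in root.findall("book"):
        price_node = node.find("price")  # None when the child is missing
        books.append({
            "id": node.get("id"),
            "title": node.findtext("title", default=""),
            "price": float(price_node.text) if price_node is not None else None,
        })
    return books

for book in extract_books(XML):
    print(book)
```

Nothing here is difficult, but every field needs its own lookup call, which is the overhead xmltodict removes.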


Installation

Install xmltodict with pip. It has no dependencies beyond the Python standard library.

pip install xmltodict

Verify the installation:

python -c "import xmltodict; print(xmltodict.__version__)"

Output:

0.13.0

Parsing XML: Strings and Files

The primary entry point is xmltodict.parse(), which accepts either a string or a file-like object.

Parsing XML Strings

# parsing_strings.py
import xmltodict
import json

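xml = """
<catalog>
    <book id="001">
        <title>Learning Python</title>
        <author>Mark Lutz</author>
        <year>2013</year>
        <price>49.99</price>
    </book>
    <book id="002">
        <title>Fluent Python</title>
        <author>Luciano Ramalho</author>
        <year>2022</year>
        <price>59.99</price>
    </book>
</catalog>
"""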

data = xmltodict.parse(xml)

# Navigate the structure
catalog = data["catalog"]
books = catalog["book"]  # This is a LIST when there are multiple <book> elements

for book in books:
    book_id = book["@id"]          # Attribute (prefixed with @)
    title = book["title"]          # Child element text
    price = float(book["price"])
    print(f"[{book_id}] {title}: ${price:.2f}")

# Pretty-print the full dict as JSON for inspection
print("\nFull structure:")
print(json.dumps(data, indent=2))

Output:

[001] Learning Python: $49.99
[002] Fluent Python: $59.99

Full structure:
{
  "catalog": {
    "book": [
      {
        "@id": "001",
        "title": "Learning Python",
        "author": "Mark Lutz",
        "year": "2013",
        "price": "49.99"
      },
      {
        "@id": "002",
        "title": "Fluent Python",
        "author": "Luciano Ramalho",
        "year": "2022",
        "price": "59.99"
      }
    ]
  }
}

Important: all values from xmltodict.parse() are strings by default, even numbers. Always convert explicitly with float(), int(), or a type-coercion step after parsing. The library deliberately avoids type inference to preserve exact values from the source XML.
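One way to do that conversion step is a small table of per-key converters applied after parsing. The sketch below works on a dict literal copied from the output above, so it runs without any XML; the CONVERTERS mapping and the coerce() helper are hypothetical names, and Decimal is used for the price only as one reasonable choice for money values:

```python
from decimal import Decimal

# Same shape as one entry of data["catalog"]["book"] above: every leaf is a string.
raw_book = {
    "@id": "001",
    "title": "Learning Python",
    "author": "Mark Lutz",
    "year": "2013",
    "price": "49.99",
}

# Hypothetical mapping of key name -> converter; extend it per document type.
CONVERTERS = {"year": int, "price": Decimal}

def coerce(record, converters):
    """Return a copy of record with the listed string fields converted."""
    return {
        key: converters[key](value)
        if key in converters and isinstance(value, str)
        else value
        for key, value in record.items()
    }

book = coerce(raw_book, CONVERTERS)
print(book["year"] + 1)     # arithmetic now works on the year
print(book["price"] * 2)    # Decimal keeps the exact source digits
```

Keys that are not listed, such as "@id", stay as strings, which preserves leading zeros.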

Parsing XML Files

# parsing_files.py
import xmltodict

# Method 1: Pass a file object directly
with open("config.xml", "rb") as f:
    data = xmltodict.parse(f)

# Method 2: Read and parse in one step (for small files)
with open("config.xml", "r", encoding="utf-8") as f:
    data = xmltodict.parse(f.read())

# Method 3: Parse from a URL response (using requests)
import requests
response = requests.get("https://feeds.bbci.co.uk/news/rss.xml")
feed = xmltodict.parse(response.content)
channel = feed["rss"]["channel"]
print(f"Feed title: {channel['title']}")
print(f"Items: {len(channel['item'])}")

Output:

Feed title: BBC News
Items: 20

Use open(file, "rb") (binary mode) when the XML file might have a BOM (byte order mark) or a non-UTF-8 encoding declaration: given bytes, the underlying parser reads the encoding from the document itself. In text mode, pass the correct encoding to open() so the content is decoded properly before it reaches the parser.
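To see the binary-mode behaviour without creating a file, the sketch below builds Latin-1 encoded bytes in memory and parses them through io.BytesIO, which stands in for open("config.xml", "rb"). The city element is made up for the example:

```python
import io
import xmltodict

# Bytes as they might sit on disk: ISO-8859-1 encoded, with a matching declaration.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Málaga</city>'.encode("iso-8859-1")

# No encoding argument: the parser picks it up from the XML declaration.
data = xmltodict.parse(io.BytesIO(raw))
print(data["city"])
```

Decoding the same bytes as UTF-8 in text mode would fail on the accented character, which is why binary mode is the safer default for files you did not produce yourself.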

Handling the List vs Single-Item Problem

The trickiest part of xmltodict is that a repeated element becomes a list, but a single element becomes a dict (or a plain string, if it holds only text). This inconsistency can cause KeyError or TypeError in production when the data volume varies.

# list_problem.py
import xmltodict

# One book: data["catalog"]["book"] is a DICT
one_book_xml = "<catalog><book><title>A</title></book></catalog>"
data_one = xmltodict.parse(one_book_xml)
print(type(data_one["catalog"]["book"]))  # <class 'dict'>

# Two books: data["catalog"]["book"] is a LIST
two_books_xml = """<catalog>
    <book><title>A</title></book>
    <book><title>B</title></book>
</catalog>"""
data_two = xmltodict.parse(two_books_xml)
print(type(data_two["catalog"]["book"]))  # <class 'list'>

# Solution 1: force_list parameter
data_forced = xmltodict.parse(one_book_xml, force_list={"book"})
print(type(data_forced["catalog"]["book"]))  # <class 'list'> -- always a list!

# Solution 2: defensive helper
def ensure_list(value):
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

books = ensure_list(data_one["catalog"].get("book"))
print(f"Books (always iterable): {len(books)}")  # 1

Output:

<class 'dict'>
<class 'list'>
<class 'list'>
Books (always iterable): 1

The force_list={"book"} parameter is the cleanest solution when you know in advance which element names should always be lists. Pass a set of tag names and xmltodict wraps them in a list even when only one is present. This eliminates the type inconsistency at the source and removes the need for defensive isinstance checks throughout your code.
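When the same tag name should be a list in one place but not in another, force_list can also be a callable that receives (path, key, value) and returns True or False. The sketch below is an illustration under that assumption; the tag and meta elements and the force_book_lists name are made up, and path is treated as the list of (tag, attributes) pairs for the element's ancestors:

```python
import xmltodict

# Illustrative document: <tag> appears under both <book> and <meta>.
xml = (
    "<catalog>"
    "<book><title>A</title><tag>x</tag></book>"
    "<meta><tag>y</tag></meta>"
    "</catalog>"
)

def force_book_lists(path, key, value):
    """Force lists for <book>, and for <tag> only when its parent is <book>."""
    parent = path[-1][0] if path else None
    return key == "book" or (key == "tag" and parent == "book")

data = xmltodict.parse(xml, force_list=force_book_lists)
print(data["catalog"]["book"])          # list with one dict; its "tag" is a list too
print(data["catalog"]["meta"]["tag"])   # left as a plain string
```

For most documents the set form is enough; the callable is worth the extra code only when tag names are reused with different meanings.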


Converting Back to XML with unparse()

The reverse operation — dictionary to XML — uses xmltodict.unparse(). This is useful for generating XML config files, building API request bodies, or transforming data between formats.

# unparse_demo.py
import xmltodict

data = {
    "config": {
        "@version": "1.0",
        "database": {
            "host": "localhost",
            "port": "5432",
            "name": "myapp",
            "credentials": {
                "@encrypted": "true",
                "username": "admin",
                "password": "s3cr3t",
            },
        },
        "cache": {
            "backend": "redis",
            "ttl": "3600",
        },
        "features": {
            "feature": ["dark_mode", "beta_api", "experimental_ui"]
        },
    }
}

# Convert to formatted XML
xml_output = xmltodict.unparse(data, pretty=True, indent="    ")
print(xml_output)

Output:

<?xml version="1.0" encoding="utf-8"?>
<config version="1.0">
    <database>
        <host>localhost</host>
        <port>5432</port>
        <name>myapp</name>
        <credentials encrypted="true">
            <username>admin</username>
            <password>s3cr3t</password>
        </credentials>
    </database>
    <cache>
        <backend>redis</backend>
        <ttl>3600</ttl>
    </cache>
    <features>
        <feature>dark_mode</feature>
        <feature>beta_api</feature>
        <feature>experimental_ui</feature>
    </features>
</config>

The list ["dark_mode", "beta_api", "experimental_ui"] is correctly serialised as three repeated <feature> elements. Dictionary keys starting with @ become XML attributes and the #text key becomes the element’s text content — the same conventions as parsing, in reverse.
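The config example does not use #text, so here is a small round-trip sketch that does. The order document is invented for illustration; it combines an attribute, an element with both an attribute and text, and a repeated element, then checks that parsing the generated XML gives back the original dictionary:

```python
import xmltodict

# Illustrative data: "@" keys become attributes, "#text" becomes element text,
# and the list becomes repeated <item> elements.
order = {
    "order": {
        "@id": "A-100",
        "total": {"@currency": "USD", "#text": "29.99"},
        "item": ["book", "bookmark"],
    }
}

xml_text = xmltodict.unparse(order, pretty=True)
print(xml_text)

# Parsing the generated XML should reproduce the dictionary we started from.
round_tripped = xmltodict.parse(xml_text)
print(round_tripped == order)
```

The round trip holds here because every value is already a string; numbers passed to unparse() come back as strings after parsing.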

Real-Life Example: RSS Feed Parser

This project parses a real RSS feed, extracts article metadata, and formats it as a structured report.

# rss_parser.py
import xmltodict
import requests
import re

def parse_rss_feed(url: str) -> list[dict]:
    """Fetch and parse an RSS feed, returning a list of article dicts."""
    response = requests.get(url, timeout=10, headers={"User-Agent": "Mozilla/5.0"})
    response.raise_for_status()
    
    feed = xmltodict.parse(response.content, force_list={"item"})
    channel = feed["rss"]["channel"]
    
    articles = []
    items = channel.get("item") or []
    
    for item in items:
        # Defensive parsing -- not all feeds have all fields
        title = item.get("title", "Untitled")
        link = item.get("link", "")
        description = item.get("description", "")
        pub_date_raw = item.get("pubDate", "")
        
        # Strip HTML from description (an empty <description/> parses to None)
        clean_desc = re.sub(r"<[^>]+>", "", description or "")[:200]
        
        articles.append({
            "title": title,
            "link": link,
            "summary": clean_desc.strip(),
            "published": pub_date_raw,
        })
    
    return articles

def print_feed_report(articles: list[dict], max_items: int = 5) -> None:
    print(f"\nLatest {min(max_items, len(articles))} articles:")
    print("-" * 60)
    for article in articles[:max_items]:
        print(f"TITLE: {article['title'][:70]}")
        print(f"  URL: {article['link'][:70]}")
        if article["summary"]:
            print(f"  SUMMARY: {article['summary'][:100]}...")
        print(f"  PUBLISHED: {article['published']}")
        print()

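# Public feeds for testing. The parser expects an RSS 2.0 document with an
# <rss> root. The second URL is an Atom feed (note atom.xml), whose root is
# <feed>, so it ends up in the except branch below.
FEEDS = {
    "Python Blog": "https://feeds.feedburner.com/PythonInsider",
    "Real Python": "https://realpython.com/atom.xml",
}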

for name, url in FEEDS.items():
    try:
        articles = parse_rss_feed(url)
        print(f"\n=== {name} ({len(articles)} articles) ===")
        print_feed_report(articles)
    except Exception as e:
        print(f"Error fetching {name}: {e}")

Output:

=== Python Blog (10 articles) ===

Latest 5 articles:
------------------------------------------------------------
TITLE: Python 3.14 alpha 7 released
  URL: http://feedproxy.google.com/~r/PythonInsider/~3/...
  SUMMARY: We are pleased to announce the release of Python 3.14...
  PUBLISHED: Tue, 18 Feb 2025 15:30:00 +0000

TITLE: Python 3.13.1 released
  ...

The force_list={"item"} ensures the items loop always works, whether the feed has one article or hundreds. The defensive item.get("description", "") pattern handles feeds that omit optional fields. Real RSS feeds from different publishers vary significantly in which optional fields they include — defensive parsing is what keeps this script running across all of them.
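The parser above keeps pubDate as a raw string. If you want real datetime objects, for sorting or filtering, the standard library can parse the RFC 822 style dates that RSS uses. This is a sketch with a hypothetical helper name, parse_pub_date, that returns None instead of raising when a feed supplies something unparseable:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional

def parse_pub_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse an RSS pubDate string; return None if it is missing or malformed."""
    if not raw:
        return None
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None

published = parse_pub_date("Tue, 18 Feb 2025 15:30:00 +0000")
print(published.isoformat())            # 2025-02-18T15:30:00+00:00
print(parse_pub_date("not a date"))     # None
```

In the RSS parser, this would replace the raw value stored under "published"; Atom feeds use ISO 8601 timestamps instead and need a different parser.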

Frequently Asked Questions

How do I handle XML namespaces with xmltodict?

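By default, xmltodict does no namespace processing: keys keep whatever prefix the document itself uses, such as "atom:link", and xmlns declarations show up as ordinary attributes like "@xmlns:atom". Pass process_namespaces=True to xmltodict.parse() to expand each prefix into its full namespace URI in the key name, and add namespaces={"http://ns.uri": "short"} alongside it to map those URIs to your own short prefixes. For many practical use cases (SOAP APIs, Atom feeds, Office XML) the default approach works fine as long as the producer keeps its prefixes stable. Prefixes are arbitrary labels, so switch to process_namespaces=True when you cannot rely on them.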
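The three modes are easiest to compare side by side. The sketch below parses one small document three ways; the namespace URIs and the ns_a prefix are placeholders chosen for the example:

```python
import xmltodict

# Illustrative document with a default namespace and one prefixed namespace.
xml = """<root xmlns="http://defaultns.com/" xmlns:a="http://a.com/">
    <x>1</x>
    <a:y>2</a:y>
</root>"""

# 1. Default: prefixes are kept exactly as written in the document.
plain = xmltodict.parse(xml)
print(plain["root"]["a:y"])

# 2. process_namespaces=True: keys are expanded to "<namespace URI>:<local name>".
expanded = xmltodict.parse(xml, process_namespaces=True)
print(expanded["http://defaultns.com/:root"]["http://a.com/:y"])

# 3. With a namespaces map: URIs collapse to prefixes you choose (None drops the prefix).
short = xmltodict.parse(
    xml,
    process_namespaces=True,
    namespaces={"http://defaultns.com/": None, "http://a.com/": "ns_a"},
)
print(short["root"]["ns_a:y"])
```

The third form gives keys that stay the same even if the producer renames its prefixes, at the cost of listing the URIs you care about.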

Can xmltodict handle large XML files?

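For files larger than ~10 MB, use the streaming API: xmltodict.parse(f, item_depth=2, item_callback=my_fn). The item_depth parameter is the nesting depth of the elements you want to stream (the root element is depth 1), and item_callback is called with (path, item) for each such element as soon as it has been parsed. Return True from the callback to continue; returning a falsy value stops parsing by raising xmltodict.ParsingInterrupted. In this mode parse() does not build the full dictionary, so memory use is bounded by the size of a single item rather than by the size of the file.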
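A minimal sketch of the streaming mode follows. It uses a small in-memory document in place of a large file, and the stream_titles helper and its limit parameter are hypothetical names for the example. Note that the attributes of the streamed element itself arrive in path, not in item:

```python
import io
import xmltodict

# Small stand-in for a large file; each depth-2 <book> is one streamed item.
XML = b"""<catalog>
    <book id="001"><title>Learning Python</title></book>
    <book id="002"><title>Fluent Python</title></book>
    <book id="003"><title>Python Tricks</title></book>
</catalog>"""

def stream_titles(source, limit=None):
    """Collect (id, title) pairs from <book> elements, optionally stopping early."""
    found = []

    def on_book(path, item):
        # path is a list of (tag, attributes) pairs from the root down to this element.
        _tag, attrs = path[-1]
        found.append((attrs["id"], item["title"]))
        # Returning False asks the parser to stop.
        return limit is None or len(found) < limit

    try:
        xmltodict.parse(source, item_depth=2, item_callback=on_book)
    except xmltodict.ParsingInterrupted:
        pass  # raised when the callback returned False
    return found

all_books = stream_titles(io.BytesIO(XML))
first_two = stream_titles(io.BytesIO(XML), limit=2)
print(all_books)
print(first_two)
```

With a real file, pass the object returned by open(path, "rb") instead of io.BytesIO.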

What encodings does xmltodict support?

xmltodict relies on the underlying expat parser (via Python’s xml.parsers.expat), which supports all standard XML encodings: UTF-8, UTF-16, ISO-8859-1, and more. When reading from a file, open in binary mode ("rb") and let xmltodict detect the encoding from the XML declaration. When parsing a string, ensure it is already decoded to Unicode first.

How do I access CDATA sections?

CDATA sections (<![CDATA[...]]>) are automatically decoded and their content is accessible as the element’s text content, just like regular text. No special handling is needed — xmltodict merges CDATA content with regular text content transparently. The #text key holds the combined value when an element has both attributes and text content (whether that text comes from CDATA or regular character data).
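A short sketch of both cases, using an invented feed document: one element whose text is entirely CDATA, and one that has an attribute plus a mix of regular text and CDATA:

```python
import xmltodict

# Illustrative document; CDATA lets the markup and the bare "&" appear unescaped.
xml = """<feed>
    <summary><![CDATA[<p>Fast & simple</p>]]></summary>
    <body lang="en">Intro: <![CDATA[<b>bold</b>]]></body>
</feed>"""

doc = xmltodict.parse(xml)
print(doc["feed"]["summary"])          # the CDATA content as a plain string
print(doc["feed"]["body"]["#text"])    # regular text and CDATA merged into one value
print(doc["feed"]["body"]["@lang"])
```

The CDATA markers themselves never appear in the result, so you cannot tell afterwards which parts of the text were wrapped.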

xmltodict converts everything to strings — how do I get proper types?

Pass a postprocessor function to xmltodict.parse(). The function receives (path, key, value) for every element and can return a modified (key, value) pair. Use it to convert numeric strings to int or float, parse date strings with datetime.fromisoformat(), or apply any other transformation. Alternatively, use Pydantic to validate and type-coerce the dictionary after parsing — this approach also handles nested structure validation cleanly.
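As a sketch of the postprocessor approach, the function below converts two known keys and leaves everything else alone. The CONVERTERS table and the to_native name are hypothetical; the isinstance check matters because the postprocessor also sees nested dictionaries and attribute values, not just leaf text:

```python
import xmltodict

xml = '<book id="002"><title>Fluent Python</title><year>2022</year><price>59.99</price></book>'

# Hypothetical mapping of element name -> converter.
CONVERTERS = {"year": int, "price": float}

def to_native(path, key, value):
    """Convert known string fields; pass every other key/value through unchanged."""
    convert = CONVERTERS.get(key)
    if convert is not None and isinstance(value, str):
        try:
            return key, convert(value)
        except ValueError:
            pass  # keep the original string if it is not a valid number
    return key, value

data = xmltodict.parse(xml, postprocessor=to_native)
print(data["book"]["year"] + 1)
print(data["book"]["price"])
```

Keying the conversion on the element name keeps identifiers such as "@id" as strings, so a value like "002" does not lose its leading zeros.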

Conclusion

xmltodict makes XML feel like JSON. You have learned how to parse XML strings and files with xmltodict.parse(), access attributes with the @ prefix and mixed content with #text, handle the single-item vs list inconsistency with force_list, convert dictionaries back to XML with xmltodict.unparse(), and apply it all to a real RSS feed parser. The defensive parsing patterns — item.get(), force_list, and stripping HTML from descriptions — are what make the RSS parser work reliably across the varied real-world feeds you will encounter.

From here, extend the RSS parser to write articles to a database (use SQLite via the sqlite3 module), add deduplication based on the article URL, or schedule it to run hourly and alert you to new articles on topics you care about. All of these extensions require only standard Python libraries alongside xmltodict.

See the xmltodict GitHub repository for the full API documentation including streaming mode and postprocessor examples.