Advanced
Once your core application is complete, a plugin architecture lets you extend its functionality with far less risk. You write the core application once and add new behaviour later as separate plugins. Without a plugin architecture, every extension means editing the core code, and the fear of breaking the original functionality makes that difficult.
So why not do this all the time? It takes more planning effort at the beginning in order to reap the rewards later, and most of us (myself included) are often too impatient for that. However, there are libraries that make it easy to build plugin support into an application. Last time we looked at using importlib (see our previous article “A Plugin Architecture using importlib”), and this time we have an even simpler library called pyplugs.
When to use plugin architecture
So when should you use a plugin architecture? Here are several scenarios. They all revolve around separating the core code from its variations:
- Separate Functionality: When you can split the application you’re building into core functionality (the main “engine”) and its variations, e.g. ranking the cheapest flights where the data comes from different websites. The core application/engine is the ranking logic. The data extraction from each website would be a plugin: website 1 = plugin 1, website 2 = plugin 2. When you want to add a new website, you just need to add a new plugin.
- Distribute Development Effort: When you want to work in a team to easily separate the focus from core functionality to variations: e.g. suppose you have an application to do image recognition. Team 1 (e.g. data science team) can work on the core engine of doing the image recognition, while you can have Team 2-4 work on creating different plugins for different image formats (e.g. Team 2: read in JPG files, Team 3: read in PNG files, etc)
- Launch sooner and add functionality in future: When you want to launch an application as quickly as possible. e.g. Suppose you want to create an application to return the number of working days from different countries. To begin with, you can just start by launching this for United States and Australia. Then, you can add more countries in the future. Since you designed the plugin architecture from the start, it’ll be safer to add more countries.
There are many more, but the disadvantage is that you have to plan for it upfront. Invest now in a plugin architecture, and then reap the benefits in the future.

Let’s explore this third example of a public holiday counter application and show how the pyplugs library can help.
Example Problem: Extracting Public Holidays
The application we’d like to create is a command line application that takes a location (country and/or state) and returns the list of public holidays in 2020.
The pseudo-code will be as follows:
1. Get location
2. If data for location not available, then error
3. Get the list of all holidays from the location
4. Return the list of public holidays
As you probably guessed, it’s step 3 that can be converted into a plugin. However, let’s start without a plugin architecture and do this the normal way.
First let’s see where we can get the data from – for UK data you can get this from publicholidays.co.uk:

And then for Singapore data, you can get it from jalanow.com:

In both cases, the data is in an HTML table, with each date inside a <td> tag. We will use regular expressions to extract the data.
Here’s the code for the non-plugin approach:
# pubholiday.py
import argparse
import re

import requests

G_COUNTRIES = ['UK', 'SG']

def get_working_days(args):
    if args.countrycode == 'UK':
        r = requests.get('https://publicholidays.co.uk/2020-dates/')
        # Raw strings keep the regex free of invalid escape sequences
        m = re.findall(r'<tr class.+?><td>(.+?)</td>', r.text)
        return list(set(m))
    elif args.countrycode == 'SG':
        r = requests.get('https://www.jalanow.com/singapore-holidays-2021.htm')
        m = re.findall(r'<td class="crDate">(.+?)</td>', r.text)
        return list(set(m))

def setup_args():
    parser = argparse.ArgumentParser(description='Get list of public holidays in a given year')
    parser.add_argument('-c', '--countrycode', required=True, type=str, choices=G_COUNTRIES, help='Country code')
    return parser

if __name__ == '__main__':
    parser = setup_args()
    args = parser.parse_args()
    print(get_working_days(args))
Running the above with no arguments gives the following. The argparse library makes it very easy to define command line arguments; see our other article, How to use argparse to manage arguments.

Now, when we run the application with either UK or SG, we get the following data:

All of the work happens in the function get_working_days:
def get_working_days(args):
    if args.countrycode == 'UK':
        r = requests.get('https://publicholidays.co.uk/2020-dates/')
        m = re.findall(r'<tr class.+?><td>(.+?)</td>', r.text)
        return list(set(m))
    elif args.countrycode == 'SG':
        r = requests.get('https://www.jalanow.com/singapore-holidays-2021.htm')
        m = re.findall(r'<td class="crDate">(.+?)</td>', r.text)
        return list(set(m))
The code for the UK, for example, works as follows:
1. Fetch the page from the website using requests. The page content will be in r.text
2. Next, run a regular expression to extract the date data from the <td> tag
3. Finally, remove duplicates with the list(set(m)) code
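The steps above can be tried offline against a small hand-written HTML fragment. This is only a sketch: the markup in SAMPLE_HTML is hypothetical rather than copied from either website, extract_dates is an illustrative name, and sorted() is added purely to make the output order repeatable.

```python
import re

# Hypothetical HTML fragment standing in for r.text (not the real page markup)
SAMPLE_HTML = (
    '<tr class="even"><td>1 January</td><td>Wednesday</td></tr>'
    '<tr class="odd"><td>10 April</td><td>Friday</td></tr>'
    '<tr class="even"><td>1 January</td><td>Wednesday</td></tr>'
)

def extract_dates(html: str) -> list[str]:
    # Step 2: capture the first <td> cell of every table row
    cells = re.findall(r'<tr class.+?><td>(.+?)</td>', html)
    # Step 3: drop duplicates; sorted() gives a repeatable order
    return sorted(set(cells))

print(extract_dates(SAMPLE_HTML))
```

Testing the extraction against a fixed string like this also avoids hitting the live website every time the regular expression changes.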
The disadvantage of this code is that as we add more countries, the function get_working_days() will become longer and longer, with increasingly complex if statements. Testing it, either manually or with pytest, will also become quite painful. We could have it call a function chosen dynamically, but then we end up with code that is difficult to read.
What we need is a dynamic way to call a function for each country that remains easy to maintain and extend. This is where a plugin architecture will help.
Extracting Public Holidays with a plugin architecture using pyplugs
What we will do now is separate the core logic from the plugins. The file structure will be as follows:
|--- pubholiday_pi.py
|___ plugins/
|___________ __init__.py
|___________ reader_UK.py
|___________ reader_SG.py
The main functionality will be in pubholiday_pi.py, while all the country readers will be in the plugins package (a subdirectory).
But first, let’s install the pyplugs library
Installing pyplugs
PyPlugs is available at PyPI. You can install it using pip:
python -m pip install pyplugs
Or, using pip directly:
pip install pyplugs
Pyplugs is composed of three levels:
- Plug-in packages: Directories containing files with plug-ins
- Plug-ins: Modules containing registered functions or classes
- Plug-in functions: Several registered functions in the same file
Core logic in plugin architecture
The core logic will be simplified to the following:
# pubholiday_pi.py
import argparse

import plugins

G_COUNTRIES = ['UK', 'SG']

def get_working_days(args):
    # The plugin name matches the module file name, e.g. plugins/reader_UK.py
    return plugins.read('reader_' + args.countrycode)

def setup_args():
    parser = argparse.ArgumentParser(description='Get list of public holidays in a given year')
    parser.add_argument('-c', '--countrycode', required=True, type=str, choices=G_COUNTRIES, help='Country code')
    return parser

if __name__ == '__main__':
    parser = setup_args()
    args = parser.parse_args()
    print(get_working_days(args))
Now the get_working_days() function has been significantly simplified. It calls the “read” function defined in the plugins/__init__.py package file. The string ‘reader_’ + args.countrycode is the plugin name, which here is both the module name and the name of the function registered in it.
Plugin logic
The plugins/__init__.py is set up as follows:
# plugins/__init__.py
# Import the pyplugs libs
import pyplugs
# All function names are going to be stored under names
names = pyplugs.names_factory(__package__)
# When read function is called, it will call a function received as parameter
read = pyplugs.call_factory(__package__)
This “read” is the same “read” that the get_working_days() function calls in the main pubholiday_pi.py file.
The plugin files/functions are each to be stored in files called “reader_<country code>.py”. The following is the UK file:
# plugins/reader_UK.py
import re

import pyplugs
import requests

@pyplugs.register
def reader_UK():
    # Same UK source and pattern as in the non-plugin version
    r = requests.get('https://publicholidays.co.uk/2020-dates/')
    m = re.findall(r'<tr class.+?><td>(.+?)</td>', r.text)
    return list(set(m))
And then finally the SG file:
# plugins/reader_SG.py
import re

import pyplugs
import requests

@pyplugs.register
def reader_SG():
    r = requests.get('https://www.jalanow.com/singapore-holidays-2021.htm')
    m = re.findall(r'<td class="crDate">(.+?)</td>', r.text)
    return list(set(m))
In Conclusion
So there is no change when you run the application – you still get the same output:

However, you have a much more maintainable application.
So we started with a monolithic file, and now we extended this to a plugin architecture where the variations are all stored in the “plugins/” folder. In order to add more country public holidays where the data may come from different websites, all that needs to be done is to: (1) add the country code into variable G_COUNTRIES to ensure the command line argument validation works, and (2) add the new file called reader_<country code>.py in the plugins directory with a function name also called reader_<country code>(). That’s it, everything else will work.
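Step (1) could in principle be automated by deriving the allowed country codes from the plugin names instead of maintaining G_COUNTRIES by hand. The following is a sketch under stated assumptions: country_codes is a hypothetical helper, and the list of names is hard-coded here, whereas in the real application it would have to come from the plugins package (for example via the names helper defined in plugins/__init__.py, which is an assumption about how you would wire it up).

```python
def country_codes(plugin_names: list[str], prefix: str = "reader_") -> list[str]:
    """Turn plugin names such as 'reader_UK' into country codes such as 'UK'."""
    return sorted(
        name[len(prefix):]
        for name in plugin_names
        if name.startswith(prefix) and len(name) > len(prefix)
    )

# Hard-coded so the sketch runs without the plugins package being present
print(country_codes(["reader_UK", "reader_SG", "helpers"]))
```

With something like this in place, dropping a new reader file into the plugins directory would be the only step needed.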
You can also see how we used importlib to achieve a similar outcome as well: A plugin architecture using importlib.
How To Use Python difflib for Comparing Text and Sequences
Intermediate
You need to show what changed between two versions of a config file. Or find the closest matching product name from a misspelled user query. Or detect whether two documents are the same despite minor formatting differences. These are all sequence comparison problems, and Python’s standard library has a dedicated module for them: difflib. It is the same engine that powers many code review tools, fuzzy finders, and spell checkers.
difflib is a pure-Python standard library module (no installation needed) that computes differences between sequences. A sequence can be a list of strings (lines of a file), a list of characters, or any other ordered collection. The module provides several tools: SequenceMatcher for computing similarity ratios, Differ for human-readable line-by-line diffs, HtmlDiff for side-by-side HTML diffs, and get_close_matches() for fuzzy string matching.
This article covers the full toolkit: computing similarity scores with SequenceMatcher, generating diffs in unified and context formats, finding close matches, building an HTML diff viewer, and applying everything in a real-world config file auditor. By the end, you will have practical tools for text comparison tasks that previously required external libraries.
difflib Quick Example
# quick_difflib.py
import difflib
# Compare two versions of a config file
old = ["host = localhost\n", "port = 5432\n", "debug = False\n"]
new = ["host = db.prod.example.com\n", "port = 5432\n", "debug = False\n", "pool_size = 10\n"]
# Generate a unified diff (like git diff)
diff = difflib.unified_diff(old, new, fromfile="config.old", tofile="config.new", n=1)
print("".join(diff))
# Similarity ratio between two strings
matcher = difflib.SequenceMatcher(None, "python", "pytohn")
print(f"\nSimilarity: {matcher.ratio():.2%}") # 0.833 = 83.3%
# Fuzzy close matches
matches = difflib.get_close_matches("pythno", ["python", "java", "ruby", "perl"])
print(f"Close matches: {matches}")
Output:
--- config.old
+++ config.new
@@ -1,3 +1,4 @@
-host = localhost
+host = db.prod.example.com
 port = 5432
 debug = False
+pool_size = 10


Similarity: 83.33%
Close matches: ['python']
Three distinct tools shown in one example: unified diff for change tracking, SequenceMatcher for similarity scoring, and get_close_matches() for fuzzy lookup. Each addresses a different comparison need, and together they cover the majority of text comparison tasks you will encounter.
What Is difflib and What Can It Do?
difflib implements a variant of Ratcliff/Obershelp pattern matching (also known as gestalt pattern matching), which repeatedly finds the longest contiguous matching block in two sequences and then recurses on the pieces to its left and right. It does not compute a minimal edit distance (Levenshtein). Instead, it tends to produce matches that look right to a human reader, which makes it well suited to comparing text files line by line.
| Function / Class | Input | Output | Use Case |
|---|---|---|---|
| SequenceMatcher | Two sequences | Similarity ratio, opcodes | Similarity scoring, change detection |
| unified_diff() | Two line lists | Unified diff lines | git-style change display |
| context_diff() | Two line lists | Context diff lines | Traditional diff format |
| Differ | Two line lists | Human-readable diff | Readable change display |
| HtmlDiff | Two line lists | HTML table | Side-by-side web display |
| get_close_matches() | String + word list | List of close matches | Fuzzy search, spell check |
SequenceMatcher: Similarity Ratios and Change Blocks
SequenceMatcher is the core engine underlying all other difflib tools. Instantiate it with two sequences and call ratio() for a 0.0-to-1.0 similarity score, or get_opcodes() to get a list of edit operations that transforms sequence A into sequence B.
# sequence_matcher.py
from difflib import SequenceMatcher
# Compare two strings character by character
a = "The quick brown fox jumps over the lazy dog"
b = "The quick brown cat jumps over the lazy dog"
matcher = SequenceMatcher(None, a, b)
print(f"Ratio: {matcher.ratio():.4f}") # 0.9302
print(f"Quick ratio: {matcher.quick_ratio():.4f}") # Upper bound, faster to compute
# Get the exact changes
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(f" {tag}: '{a[i1:i2]}' -> '{b[j1:j2]}'")
print()
# Compare lists of lines (more natural for text files)
lines_a = ["def greet(name):", " print(f'Hello {name}')", " return True"]
lines_b = ["def greet(name):", " print(f'Hi {name}!')", " return True"]
matcher2 = SequenceMatcher(None, lines_a, lines_b)
print(f"Line similarity: {matcher2.ratio():.4f}")
for tag, i1, i2, j1, j2 in matcher2.get_opcodes():
    if tag == "replace":
        print(f" Changed line {i1+1}:")
        print(f" FROM: {lines_a[i1:i2]}")
        print(f" TO: {lines_b[j1:j2]}")
Output:
Ratio: 0.9302
Quick ratio: 0.9302
 replace: 'fox' -> 'cat'

Line similarity: 0.6667
 Changed line 2:
 FROM: [" print(f'Hello {name}')"]
 TO: [" print(f'Hi {name}!')"]
The opcodes are a low-level but powerful API. The four operations are "equal" (no change), "insert" (add from B), "delete" (remove from A), and "replace" (swap a block of A for a block of B). You can build a custom diff renderer by iterating over the opcodes and styling each section, which is the same idea behind inline change highlighting in code editors.
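A custom renderer of this kind can be sketched in a few lines. The function name render_inline and the [-...-] / {+...+} markers are illustrative choices, not part of difflib; a real tool would emit colours or HTML instead.

```python
from difflib import SequenceMatcher

def render_inline(a: str, b: str) -> str:
    """Mark deletions as [-text-] and insertions as {+text+}."""
    parts = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            parts.append(a[i1:i2])
        if tag in ("delete", "replace"):
            parts.append(f"[-{a[i1:i2]}-]")
        if tag in ("insert", "replace"):
            parts.append(f"{{+{b[j1:j2]}+}}")
    return "".join(parts)

print(render_inline("The quick brown fox", "The quick brown cat"))
# The quick brown [-fox-]{+cat+}
```

A "replace" opcode is rendered as a deletion followed by an insertion, which is why it appears in both branches.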
The autojunk Parameter
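By default, SequenceMatcher applies an automatic junk heuristic: when the second sequence has at least 200 items, any item that accounts for more than 1% of it is treated as “popular” and skipped when looking for matches. For long files with many repeated lines (blank lines, common boilerplate) this speeds things up considerably, but it can produce surprisingly low similarity scores when the repeated items are the content you care about. Pass autojunk=False to disable the heuristic in those cases.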
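The effect is easiest to see on a sequence that is long enough for the heuristic to apply and that consists almost entirely of repeated items. The two 200-character strings below are a contrived illustration chosen for that purpose, not a realistic workload.

```python
from difflib import SequenceMatcher

# Two 200-character strings made of the same two letters, offset by one position
a = "ab" * 100
b = "ba" * 100

with_heuristic = SequenceMatcher(None, a, b).ratio()
without_heuristic = SequenceMatcher(None, a, b, autojunk=False).ratio()

print(f"autojunk=True:  {with_heuristic:.3f}")
print(f"autojunk=False: {without_heuristic:.3f}")
```

The strings are nearly identical, yet the default setting should report a far lower ratio because every character counts as a popular item.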
Generating Diffs: unified_diff and context_diff
For human-readable change reports, unified_diff() produces the familiar ---/+++/@@ format used by git, patch, and most code review tools.
# unified_diff_demo.py
import difflib

def compare_files(file_a: str, file_b: str, context_lines: int = 3) -> str:
    """Compare two file contents and return a unified diff string."""
    lines_a = file_a.splitlines(keepends=True)
    lines_b = file_b.splitlines(keepends=True)
    diff = difflib.unified_diff(
        lines_a, lines_b,
        fromfile="original.py",
        tofile="modified.py",
        n=context_lines
    )
    return "".join(diff)

original = """def calculate_discount(price, percent):
    discount = price * percent
    return discount
def apply_coupon(order, code):
    if code == "SAVE10":
        return order - 10
    return order
"""
modified = """def calculate_discount(price, percent):
    if percent > 100:
        raise ValueError("Discount cannot exceed 100%")
    discount = price * (percent / 100)
    return discount
def apply_coupon(order, code, user_id=None):
    if code == "SAVE10":
        return order - 10
    if code == "SAVE20":
        return order - 20
    return order
"""
result = compare_files(original, modified)
print(result if result else "No differences found.")
Output:
--- original.py
+++ modified.py
@@ -1,7 +1,11 @@
 def calculate_discount(price, percent):
-    discount = price * percent
+    if percent > 100:
+        raise ValueError("Discount cannot exceed 100%")
+    discount = price * (percent / 100)
     return discount
-def apply_coupon(order, code):
+def apply_coupon(order, code, user_id=None):
     if code == "SAVE10":
         return order - 10
+    if code == "SAVE20":
+        return order - 20
     return order
The n=context_lines parameter controls how many unchanged lines to show around each change. The default is 3. Use n=0 to show only changed lines (like a “what changed” summary), or n=999 to show the full file with changes highlighted.
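The section title also mentions context_diff(), which takes the same arguments but marks changed lines with "!" and prints the before and after blocks separately. A small sketch, reusing config-style lines as sample data:

```python
import difflib

old = ["host = localhost\n", "port = 5432\n", "debug = False\n"]
new = ["host = db.prod.example.com\n", "port = 5432\n", "debug = False\n"]

# context_diff() accepts the same arguments as unified_diff()
diff_lines = list(
    difflib.context_diff(old, new, fromfile="config.old", tofile="config.new", n=1)
)
print("".join(diff_lines))
```

The header uses *** for the original file and --- for the new one, so the two formats are easy to tell apart at a glance.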
get_close_matches: Fuzzy String Lookup
get_close_matches() is the simplest path to fuzzy matching: give it a word and a vocabulary list, and it returns the best matches above a similarity threshold.
# close_matches.py
from difflib import get_close_matches

vocabulary = [
    "python", "javascript", "typescript", "java", "kotlin",
    "swift", "rust", "golang", "ruby", "scala",
]
# Basic usage
print(get_close_matches("pyhton", vocabulary)) # ['python']
print(get_close_matches("jvascript", vocabulary)) # ['javascript', 'typescript'] -- both share "script"
print(get_close_matches("xyz", vocabulary)) # [] -- no close match
# Control sensitivity with n and cutoff
print(get_close_matches("java", vocabulary, n=3, cutoff=0.4))
# ['java', 'javascript', 'scala'] -- lower cutoff finds more results

# Practical use: did-you-mean suggestion
def did_you_mean(word: str, options: list[str]) -> str | None:
    matches = get_close_matches(word, options, n=1, cutoff=0.6)
    return matches[0] if matches else None

commands = ["start", "stop", "restart", "status", "reload"]
user_input = "reestart"
suggestion = did_you_mean(user_input, commands)
if suggestion:
    print(f"Unknown command '{user_input}'. Did you mean '{suggestion}'?")
Output:
['python']
['javascript', 'typescript']
[]
['java', 'javascript', 'scala']
Unknown command 'reestart'. Did you mean 'restart'?
The cutoff parameter (default 0.6) controls how similar a match must be to be included. A lower cutoff (0.4) catches more distant matches but produces more false positives. A higher cutoff (0.8) is stricter but misses matches with more typos. For command-line “did you mean?” suggestions, 0.6 is a reasonable starting point.
Real-Life Example: Config File Auditor
This project compares a deployed config file against a template to detect drift, showing exactly what changed in a human-readable report.
# config_auditor.py
import difflib
from dataclasses import dataclass, field
from typing import List
@dataclass
class ConfigAuditResult:
    similarity: float
    added_lines: List[str] = field(default_factory=list)
    removed_lines: List[str] = field(default_factory=list)
    changed_sections: List[str] = field(default_factory=list)

    @property
    def has_drift(self) -> bool:
        return self.similarity < 1.0

    def summary(self) -> str:
        if not self.has_drift:
            return "No drift detected. Config matches template."
        return (
            f"Config drift detected (similarity: {self.similarity:.1%})\n"
            f" Added lines: {len(self.added_lines)}\n"
            f" Removed lines: {len(self.removed_lines)}"
        )

def audit_config(template: str, deployed: str) -> ConfigAuditResult:
    """Compare deployed config against template and return audit result."""
    template_lines = template.splitlines(keepends=True)
    deployed_lines = deployed.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, template_lines, deployed_lines)
    result = ConfigAuditResult(similarity=matcher.ratio())
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            result.removed_lines.extend(template_lines[i1:i2])
        elif tag == "insert":
            result.added_lines.extend(deployed_lines[j1:j2])
        elif tag == "replace":
            result.removed_lines.extend(template_lines[i1:i2])
            result.added_lines.extend(deployed_lines[j1:j2])
            result.changed_sections.append(template_lines[i1].strip())
    return result

def print_diff(template: str, deployed: str) -> None:
    """Print a unified diff between template and deployed config."""
    diff = difflib.unified_diff(
        template.splitlines(keepends=True),
        deployed.splitlines(keepends=True),
        fromfile="template.conf",
        tofile="deployed.conf",
        n=2
    )
    diff_text = "".join(diff)
    if diff_text:
        print(diff_text)
    else:
        print("Files are identical.")
# Example usage
TEMPLATE = """[server]
host = 0.0.0.0
port = 8080
workers = 4
timeout = 30
[database]
host = db.internal
port = 5432
pool_size = 10
"""
DEPLOYED = """[server]
host = 0.0.0.0
port = 9090
workers = 4
timeout = 30
[database]
host = db.internal
port = 5432
pool_size = 20
max_overflow = 5
"""
result = audit_config(TEMPLATE, DEPLOYED)
print(result.summary())
print()
print("--- Unified Diff ---")
print_diff(TEMPLATE, DEPLOYED)
print()
print(f"Changed sections: {result.changed_sections}")
Output:
Config drift detected (similarity: 73.7%)
 Added lines: 3
 Removed lines: 2

--- Unified Diff ---
--- template.conf
+++ deployed.conf
@@ -1,5 +1,5 @@
 [server]
 host = 0.0.0.0
-port = 8080
+port = 9090
 workers = 4
 timeout = 30
@@ -7,3 +7,4 @@
 host = db.internal
 port = 5432
-pool_size = 10
+pool_size = 20
+max_overflow = 5


Changed sections: ['port = 8080', 'pool_size = 10']
The ConfigAuditResult dataclass separates the raw diff data (added, removed lines) from the derived properties (has_drift, summary()). This structure makes the auditor easy to extend: add a critical_fields list to flag specific settings (like host changes) as high-severity drift.
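The critical_fields extension might look roughly like the sketch below. The names CRITICAL_FIELDS and critical_changes are hypothetical, and the parsing assumes simple key = value lines like those in the example configs.

```python
# Hypothetical list of settings whose drift should be treated as high severity
CRITICAL_FIELDS = ["host", "port"]

def critical_changes(changed_lines: list[str], critical_fields: list[str]) -> list[str]:
    """Return the changed lines whose key (the text before '=') is critical."""
    flagged = []
    for line in changed_lines:
        key, sep, _value = line.partition("=")
        if sep and key.strip() in critical_fields:
            flagged.append(line.strip())
    return flagged

# Lines in the shape that audit_config() collects in removed_lines
removed = ["port = 8080\n", "pool_size = 10\n"]
print(critical_changes(removed, CRITICAL_FIELDS))
# ['port = 8080']
```

The same helper can be run over added_lines to catch critical settings that appear only in the deployed file.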
Frequently Asked Questions
What is the difference between ratio() and quick_ratio()?
ratio() computes the exact similarity ratio by performing the full sequence comparison. quick_ratio() computes a cheaper upper bound on ratio(), so it can overestimate the true value but never underestimate it. real_quick_ratio() is cheaper still and gives a looser upper bound. Use quick_ratio() as a preliminary filter when you have many candidates: if quick_ratio() < threshold, skip the expensive ratio() call. This optimisation is built into get_close_matches() internally.
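The filtering idea can be sketched as follows. The function best_match is a hypothetical helper written for illustration; it is similar in spirit to what get_close_matches() does but is not a copy of it.

```python
from difflib import SequenceMatcher
from typing import List, Optional

def best_match(word: str, candidates: List[str], threshold: float = 0.6) -> Optional[str]:
    """Return the most similar candidate at or above threshold, else None."""
    matcher = SequenceMatcher()
    matcher.set_seq2(word)  # the second sequence is analysed once and reused
    best, best_score = None, threshold
    for candidate in candidates:
        matcher.set_seq1(candidate)
        # The cheap upper bounds can rule a candidate out before ratio() runs
        if matcher.real_quick_ratio() < best_score or matcher.quick_ratio() < best_score:
            continue
        score = matcher.ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best

print(best_match("reestart", ["start", "stop", "restart", "status", "reload"]))
# restart
```

Because both quick ratios are upper bounds, skipping a candidate whose bound is below the current best score can never discard a better match.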
When should I use autojunk=False?
Disable autojunk when repeated items carry meaning and should not be skipped during matching. The heuristic only applies when the second sequence has at least 200 items; in that case any item making up more than 1% of it is treated as junk. A long config or log file in which the same line recurs many times can therefore score much lower than expected, while sequences shorter than 200 items are never affected. Pass SequenceMatcher(isjunk=None, a=a, b=b, autojunk=False) to disable this behaviour.
How do I generate a side-by-side HTML diff?
Use difflib.HtmlDiff(). Call HtmlDiff().make_file(a_lines, b_lines, fromdesc, todesc) to get a complete HTML page with side-by-side tables, highlighting, and legend. Save it to a .html file and open it in a browser. This is useful for generating code review reports: just call make_file() in a loop for each changed file and write the outputs to a folder.
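A minimal sketch of that call is shown below. The sample lines and the output file name config_diff.html are illustrative, and the file is written to a temporary directory only so that the example leaves nothing behind in your project.

```python
import difflib
import tempfile
from pathlib import Path

old = ["host = localhost\n", "port = 5432\n"]
new = ["host = db.prod.example.com\n", "port = 5432\n"]

# make_file() returns a complete HTML document as a string
html = difflib.HtmlDiff().make_file(old, new, fromdesc="config.old", todesc="config.new")

# Hypothetical output location; any writable path works
out_path = Path(tempfile.mkdtemp()) / "config_diff.html"
out_path.write_text(html, encoding="utf-8")
print(f"Wrote {out_path}")
```

If you only need the table to embed in an existing page, HtmlDiff().make_table() takes the same arguments and returns just the table markup.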
How does difflib compare to Levenshtein distance?
Levenshtein distance counts the minimum number of character-level edits (insert, delete, substitute) between two strings. difflib uses a Ratcliff/Obershelp-style algorithm, which finds the longest matching substrings recursively. Levenshtein gives a true minimal edit count, which suits single-word spell-checking, although a swap of two adjacent letters still costs two edits unless you use the Damerau-Levenshtein variant. difflib is a better fit for multi-line text comparison because its matches line up with whole blocks of unchanged text. For production spell-checkers, consider the python-Levenshtein or rapidfuzz libraries, which are implemented in compiled code and significantly faster than difflib.
Is difflib fast enough for large files?
For files up to a few thousand lines, difflib is fast enough for interactive use. For larger files (tens of thousands of lines), keep in mind that SequenceMatcher caches its analysis of the second sequence, so when comparing one text against many others, set it once with set_seq2() and swap candidates in with set_seq1(). For pure string comparison you can also use the C-based python-Levenshtein library. The main performance lever is the autojunk heuristic, which is on by default and significantly speeds up comparison of files with repeated lines (like log files).
Conclusion
Python’s difflib module provides a complete text comparison toolkit without any external dependencies. You have learned how to compute similarity ratios with SequenceMatcher, generate unified and context diffs for change tracking, use get_close_matches() for fuzzy string lookup, and build a config auditor that detects and reports configuration drift. The Ratcliff/Obershelp-style matching at difflib’s core tends to produce diffs that look right to human readers, making it a natural fit for file comparison tasks.
Extend the config auditor by adding an HTML diff output (using HtmlDiff().make_file()), integrating it into a CI pipeline to fail builds when critical settings drift from the template, or adapting the fuzzy matcher into a search autocomplete feature for a command-line tool. All three extensions build directly on what you have learned here.
For the full API reference, see the official difflib documentation.
Further Reading: For more details, see the Python importlib documentation.
Frequently Asked Questions
What is a plugin architecture in Python?
A plugin architecture allows you to extend an application’s functionality by loading external code modules at runtime without modifying the core application. It promotes loose coupling, making your software more flexible and maintainable.
How does PyPlugs work?
PyPlugs provides a simple decorator-based system for registering and discovering plugins. You decorate functions or classes with PyPlugs decorators, and the framework automatically discovers and loads them from specified packages or directories.
What are alternatives to PyPlugs for plugin systems in Python?
Alternatives include pluggy (used by pytest), stevedore (uses setuptools entry points), yapsy, and Python’s built-in importlib for manual plugin loading. Each has different tradeoffs in complexity and features.
When should I use a plugin architecture?
Use a plugin architecture when you need extensibility without modifying core code, when third parties should be able to add features, or when different deployments need different feature sets. Common examples include text editors, web frameworks, and data processing pipelines.
Can I create a simple plugin system without external libraries?
Yes. Use Python’s importlib.import_module() to dynamically load modules from a plugins directory, combined with a registration pattern using decorators or base classes. This gives you a basic but functional plugin system with no dependencies.
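A minimal sketch of that approach is shown below. To keep it runnable on its own, it first writes a throwaway package to a temporary directory; the package name demo_plugins, the read() function, and the placeholder data are all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway plugin package so the sketch is self-contained
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_plugins"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "reader_UK.py").write_text(
    "def read():\n    return ['placeholder UK holiday']\n"
)

sys.path.insert(0, str(root))  # make the temporary package importable
importlib.invalidate_caches()

def load_and_read(country_code: str) -> list:
    """Import demo_plugins.reader_<code> at runtime and call its read() function."""
    module = importlib.import_module(f"demo_plugins.reader_{country_code}")
    return module.read()

print(load_and_read("UK"))
```

In a real project the plugins package would already exist on disk, so only the load_and_read part would be needed.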