Advanced
Making computer-generated text mimic human speech is fascinating. Getting an effect that is sometimes convincing, and always entertaining, is not that difficult. A Markov chain is one way to do this. It generates new text from historical texts, using the original sequence of neighboring words (or groups of words) to build sentences that sound plausible. Read the guide below on how to code a Markov chain text generator in Python, including an explanation of the concept.
What’s really interesting is that you can take historical texts from a person and generate new sentences that sound similar to the way that person speaks. Alternatively, you can combine texts from two different people and get a mixed “voice”.
I played around with this using speeches and scripts from two great presidents: Barack Obama and Jed Bartlet (the fictional president from The West Wing).
Here is what my Markov chain generated after being “trained” on a combination of Obama speeches and Bartlet scripts:
- ‘Can I burn my mother in North Carolina for giving us a great night planned.’
- ‘And so going forward, I believe that we can build a bomb into their church.’
- ‘Charlie, my father had grown up in the Situation Room every time I came in.’
- ‘This campaign must be ballistic.’
What is a Markov chain in the context of text generation?
For a more technical explanation, you can find plenty of resources out there. In simple terms, it is an algorithm that generates a new outcome by picking from a weighted list of possibilities built from historical data. That’s rather abstract. In practical terms, for text generation, it means taking historical texts, chopping them up into individual words (or sets of words), randomly choosing a starting word, and then randomly choosing each next word according to how often it followed the current word in the historical texts. For example, if the source text contains “the cat sat” twice and “the cat ran” once, then whenever the generator reaches “cat” it picks “sat” two times out of three and “ran” one time out of three.
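To make the “weighted list” idea concrete, here is a minimal sketch (my own, not the linked code). It counts which words follow a given word and turns those counts into probabilities. It assumes naive whitespace tokenization, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def next_word_probabilities(text: str, word: str) -> Dict[str, float]:
    """Estimate P(next word | word) from adjacent word pairs in `text`."""
    words = text.split()  # naive tokenization: split on whitespace only
    followers = Counter(nxt for cur, nxt in zip(words, words[1:]) if cur == word)
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()} if total else {}


sample = "the cat sat and the cat sat and the cat ran"
print(next_word_probabilities(sample, "cat"))  # "sat" about 0.67, "ran" about 0.33
print(next_word_probabilities(sample, "the"))  # {'cat': 1.0}
```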

This doesn’t apply only to text, although one of the most popular applications is the predictive text on your smartphone. It can be used in any scenario where historical information defines the likely next step from a given state. For example, you could encode a given stock market pattern (such as the % daily changes over the last 30 days) and then look up what the next day’s outcome historically tended to be. This is only an illustration; I’m very doubtful how effective it would be.
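The stock example can be sketched the same way. This is a simplification. Instead of a 30-day pattern, it sorts each day’s % change into a coarse state: up, down or flat, using an arbitrary ±0.5% threshold. It then estimates the probability of tomorrow’s state given today’s. The daily changes are made-up toy numbers, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def to_state(pct_change: float, threshold: float = 0.5) -> str:
    """Bucket a daily % change into a coarse state (threshold is arbitrary)."""
    if pct_change > threshold:
        return "up"
    if pct_change < -threshold:
        return "down"
    return "flat"


def transition_table(pct_changes: List[float]) -> Dict[str, Dict[str, float]]:
    """Estimate P(tomorrow's state | today's state) from consecutive days."""
    states = [to_state(p) for p in pct_changes]
    counts: Dict[str, Counter] = defaultdict(Counter)
    for today, tomorrow in zip(states, states[1:]):
        counts[today][tomorrow] += 1
    table = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        table[state] = {nxt: n / total for nxt, n in counter.items()}
    return table


# Toy, made-up daily % changes purely for illustration
changes = [1.2, -0.8, 0.1, 0.9, 1.1, -1.5, 0.2, 0.7]
table = transition_table(changes)
print(table["up"])  # roughly 2/3 "down", 1/3 "up" for this toy series
```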
Why are Markov Chain Text Generators so fun?
I’ve always wanted to build a text generator because it’s an awesome way to see how you can mimic intelligence with a very cheap shortcut. You’ll see the algorithm below, and it is super simple. The other draw is that, as in the example above, you can mix the ‘voice’ of two different people and see the result.
How does the Markov Chain Text Generator work?
There are two phases in Markov chain text generation. First comes the ‘dictionary build phase’. You gather the historical texts and build a dictionary whose keys are the words in those texts. The value for each key is the list of words that naturally followed it.
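Here is a minimal sketch of the dictionary build phase. It assumes whitespace tokenization, which conveniently leaves punctuation attached to words. The function name build_dictionary is hypothetical and is not taken from the author’s linked code. Keeping duplicate followers in each list is what makes the later random choice weighted.

```python
from collections import defaultdict
from typing import Dict, List


def build_dictionary(texts: List[str]) -> Dict[str, List[str]]:
    """Map each word to every word that followed it in the source texts.

    Duplicates are kept on purpose: a follower seen twice appears twice,
    so random.choice() later picks it twice as often.
    """
    chain: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        words = text.split()  # keeps punctuation attached, e.g. "cat."
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain)


chain = build_dictionary(["John ate the cat.", "The dog ate the boots."])
print(chain["ate"])  # ['the', 'the']
print(chain["the"])  # ['cat.', 'boots.']
```

The capitalized “The” ends up as a separate key from “the”. Whether to lower-case everything is a design choice, but lower-casing loses the capitals at the start of sentences.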

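The second phase is execution. You start from a given word and use the dictionary to pick the next word at random from that word’s followers. Then you repeat the process from the new word. For example, if the dictionary says “John” was followed by “ate” and “ate” was followed by “the”, then starting from “John” produces “John ate the …”, and the generator keeps going from there.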
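The execution phase is then a loop over that dictionary. This sketch uses a hypothetical walk() function. It starts from a given word and keeps picking a random follower until it reaches a word with no followers, or until it has max_words words. Passing a seeded random.Random makes runs repeatable.

```python
import random
from typing import Dict, List, Optional


def walk(chain: Dict[str, List[str]], start: str, max_words: int = 20,
         rng: Optional[random.Random] = None) -> str:
    """Follow the chain from `start`, choosing each next word at random from its followers."""
    if rng is None:
        rng = random.Random()
    words = [start]
    while len(words) < max_words:
        followers = chain.get(words[-1])
        if not followers:  # dead end: this word never had a follower in the source text
            break
        words.append(rng.choice(followers))  # repeated followers are proportionally likelier
    return " ".join(words)


chain = {"John": ["ate"], "ate": ["the"], "the": ["cat.", "boots."]}
print(walk(chain, "John", rng=random.Random(1)))
```

With this tiny chain, the result is either “John ate the cat.” or “John ate the boots.”. On a real corpus, a walk like this usually stops in mid-sentence when it hits max_words, which is the problem the tricks below address.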

Now, there are some tricks you need to be mindful of (I found these out the hard way):
- Don’t start from just any random word. If you do, you’ll get sentences like “ate the cat.” Instead, keep track of the words that start sentences in the source texts and begin from one of those. That gives you “John ate the cat.”
- Don’t strip punctuation. If you remove it, you’ll get sentences like “The dog barked at John cat.” Keeping punctuation gives you a better chance of a realistic sentence, e.g. “The dog barked at John’s cat.”
- End on a word with a full stop. You could keep moving from one word to the next until you reach a specified length. The problem is that you’ll then stop in mid-sentence, as in “The cat ate John’s”. Instead, simply stop when you reach a word that ends with a full stop, e.g. “The cat ate John’s boots.” This is another reason not to remove punctuation.
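Here’s one way to sketch the three tricks together. It is my own version, not the linked code. It records the words that start sentences while building the dictionary, keeps punctuation attached to words, and stops generating as soon as a word ends a sentence. Treating “!” and “?” as sentence endings, in addition to the full stop, is an assumption. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

SENTENCE_ENDINGS = (".", "!", "?")


def build_chain_with_starts(texts: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Build word -> followers, plus a list of words that begin sentences."""
    chain: Dict[str, List[str]] = defaultdict(list)
    starts: List[str] = []
    for text in texts:
        words = text.split()  # trick 2: punctuation stays attached, e.g. "cat." or "John's"
        for i, word in enumerate(words):
            if i == 0 or words[i - 1].endswith(SENTENCE_ENDINGS):
                starts.append(word)  # trick 1: remember sentence starts
            if i + 1 < len(words):
                chain[word].append(words[i + 1])
    return dict(chain), starts


def generate_sentence(chain: Dict[str, List[str]], starts: List[str],
                      max_words: int = 30, rng: Optional[random.Random] = None) -> str:
    """Begin on a sentence-start word and stop at the first word that ends a sentence."""
    if rng is None:
        rng = random.Random()
    words = [rng.choice(starts)]
    # trick 3: stop at a full stop; max_words is only a safety net
    while not words[-1].endswith(SENTENCE_ENDINGS) and len(words) < max_words:
        followers = chain.get(words[-1])
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)


chain, starts = build_chain_with_starts(["John ate the cat. The dog barked at John's cat."])
print(starts)  # ['John', 'The']
print(generate_sentence(chain, starts, rng=random.Random(0)))  # one of the two source sentences here
```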
Markov Chain Example Code: Source Texts
I played around with different texts, including Eddie Murphy stand-up routines, Donald Trump tweets, Obama speeches, and Jed Bartlet dialogue. You can find the Markov chain example source texts here. It’s great to use one source and generate its dictionary. You can also mix and match two sources (e.g. Obama and Bartlet) into a single dictionary file. When you traverse that dictionary, you get both voices.
It is important to balance the texts. For example, suppose you had eight times as much Obama text as Eddie Murphy text (say, 8,000 words versus 1,000). In that case you would likely see far more of Obama’s words in the output. When you build the dictionary, you can also add some artificial weighting to the smaller source to balance things out.
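Mixing sources amounts to merging their dictionaries. Here is a minimal sketch that assumes each source has already been turned into a word-to-followers dictionary. The weight repeats a source’s followers, so random.choice() favors them more. This repetition scheme is a simple, hypothetical way to add the artificial weighting mentioned above. Matching the weight to the size ratio (for example, a weight of 8 for the smaller source in an 8:1 imbalance) is a rough rule of thumb, not a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_chains(weighted: List[Tuple[Dict[str, List[str]], int]]) -> Dict[str, List[str]]:
    """Combine several word -> followers dictionaries, repeating each source `weight` times."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for chain, weight in weighted:
        for word, followers in chain.items():
            merged[word].extend(followers * weight)  # repetition raises this source's odds
    return dict(merged)


# Toy dictionaries standing in for a large and a small source text
large_source = {"we": ["can", "will"], "can": ["do"]}
small_source = {"we": ["laughed"]}
combined = merge_chains([(large_source, 1), (small_source, 2)])
print(combined["we"])  # ['can', 'will', 'laughed', 'laughed']
```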
Markov Chain Summary
The Markov chain text generator is not perfect. When you build your own, you’ll see that some of the output is just gibberish. The more source text you have, the better. Also, single-word keys are not very helpful in the dictionary; use groups of 2–3 words as keys instead. The right number depends on how much historical text you have. Longer keys give more coherent output, but with too little text the generator just repeats the source verbatim.
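Here is a sketch of using groups of words as keys (an order-n chain). It assumes the same whitespace tokenization as before. Each key is a tuple of consecutive words, and each generation step looks up the most recent `order` words. For brevity, it starts from a random key rather than from a tracked sentence start; the start-word trick above can be combined with it in the same way. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, ...]


def build_ngram_chain(text: str, order: int = 2) -> Dict[Key, List[str]]:
    """Map each run of `order` consecutive words to the words that followed it."""
    words = text.split()
    chain: Dict[Key, List[str]] = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate_ngram_text(chain: Dict[Key, List[str]], max_words: int = 30,
                        rng: Optional[random.Random] = None) -> str:
    """Walk the chain using the last `order` words as the lookup key."""
    if rng is None:
        rng = random.Random()
    key = rng.choice(list(chain))  # simplification: start from any known key
    order = len(key)
    words = list(key)
    while len(words) < max_words and not words[-1].endswith("."):
        followers = chain.get(tuple(words[-order:]))
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)


chain = build_ngram_chain("the cat sat on the mat. the cat ran off.", order=2)
print(chain[("the", "cat")])  # ['sat', 'ran']
print(generate_ngram_text(chain, rng=random.Random(3)))
```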
You can find all the Python code, source texts and Markov chain example code here. Good luck!
How To Use Python Loguru for Simplified Logging
Beginner
Python’s built-in logging module requires you to write at least five lines of boilerplate before you see your first log message: import logging, call basicConfig, create a logger, set a level, set a formatter. By the time you have rotation, colour output, and exception tracebacks working, you have a small library’s worth of configuration. And then you copy it into every new project.
Loguru is a logging library that replaces all of that setup with a single import and a single pre-configured logger object. Install it with pip install loguru. From the first line of code you get coloured output, accurate file and line references, clean exception tracebacks, and a sensible default format. When you need to customise it, the API is a handful of methods rather than a configuration object hierarchy.
This article covers Loguru’s basic usage, log levels, adding file sinks with automatic rotation and retention, structured logging with bound context, exception capturing, filtering log output, and integrating Loguru with code that uses the standard logging module. By the end you will have a logging setup that works in development and production without changes.
Loguru in Python: Quick Example
Zero configuration is the whole point. Import logger and start logging. The pre-configured sink writes coloured, formatted output to stderr immediately.
# quick_loguru.py
from loguru import logger
logger.debug("Debugging the widget factory")
logger.info("Server started on port 8000")
logger.warning("Config file not found, using defaults")
logger.error("Failed to connect to database")
logger.critical("Disk space below 1%")
Output (colours appear in real terminals):
2026-05-21 09:00:01.234 | DEBUG    | __main__:<module>:3 - Debugging the widget factory
2026-05-21 09:00:01.235 | INFO     | __main__:<module>:4 - Server started on port 8000
2026-05-21 09:00:01.235 | WARNING  | __main__:<module>:5 - Config file not found, using defaults
2026-05-21 09:00:01.235 | ERROR    | __main__:<module>:6 - Failed to connect to database
2026-05-21 09:00:01.236 | CRITICAL | __main__:<module>:7 - Disk space below 1%
Every line includes a timestamp, the log level, the module name, the function name (<module> for top-level code), and the line number, all without any setup code. Compare this to the standard library, where you would need logging.basicConfig(format='%(asctime)s | %(levelname)s | %(name)s:%(funcName)s:%(lineno)d - %(message)s', level=logging.DEBUG) just to get a similar format. The rest of the article shows how to extend this with file output, rotation, and structured context.
What Is Loguru and How Does It Differ from logging?
Loguru is a third-party logging library designed around the idea that logging setup should be trivial. It ships with one pre-built logger object — you do not create logger instances per module. Instead of configuring handlers, formatters, and filters as separate objects, Loguru uses a single logger.add() call that accepts a destination (file path, callable, or stream) and all formatting/filtering options as keyword arguments.
| Feature | stdlib logging | Loguru |
|---|---|---|
| Setup lines | 5-15 lines minimum | 0 (works at import) |
| Coloured output | Third-party library | Built-in |
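| Exception traceback | logger.exception() or exc_info=True (no variable values) | logger.exception(), opt(exception=...) or @logger.catch; variable values with diagnose=True |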
| File rotation | RotatingFileHandler | logger.add() rotation= param |
| Structured context | LogRecord extra dict | logger.bind() |
| Non-blocking sinks | QueueHandler + QueueListener | logger.add(enqueue=True) |
The main trade-off is that Loguru is not a drop-in replacement for the standard library in code that expects logging.Logger instances. The section on integration below explains how to bridge this when you use libraries that log through the standard module. For greenfield code and scripts, Loguru is a straightforward upgrade.
Adding File Sinks with Rotation and Retention
The default Loguru sink writes to stderr. To write to a file as well, call logger.add() with a file path. The rotation parameter creates a new log file when a size or time threshold is reached, and retention automatically deletes old files to prevent disk exhaustion.
# file_logging.py
from loguru import logger
import sys
# Remove the default stderr sink
logger.remove()
# Add a clean stderr sink for development (no colours in redirected streams)
logger.add(sys.stderr, level="INFO", format="{time:HH:mm:ss} | {level} | {message}")
# Add a rotating file sink for production
logger.add(
"logs/app_{time:YYYY-MM-DD}.log", # Date is filled in whenever a new file is started
level="DEBUG",
format="{time:YYYY-MM-DD HH:mm:ss} | {level:<8} | {name}:{line} | {message}",
rotation="10 MB", # Rotate when file exceeds 10 MB
retention="30 days", # Keep files for 30 days, then delete
compression="gz", # Compress rotated files
encoding="utf-8",
)
logger.info("Application started")
logger.debug("Loading configuration from /etc/app/config.toml")
logger.warning("Cache miss rate above 20%")
File output (logs/app_2026-05-21.log):
2026-05-21 09:00:01 | INFO     | __main__:18 | Application started
2026-05-21 09:00:01 | DEBUG    | __main__:19 | Loading configuration from /etc/app/config.toml
2026-05-21 09:00:01 | WARNING  | __main__:20 | Cache miss rate above 20%
The rotation parameter accepts size strings ("10 MB", "1 GB"), time strings ("1 day" for an interval, "monday" for a weekday), or a datetime.time object for rotation at a specific time of day. The retention parameter accepts an integer count (5 keeps the five most recent files), a duration string ("2 weeks"), or a callable for custom logic. Calling logger.remove() at the top removes the default stderr sink so you have full control over every output destination.
Structured Logging with bind() and contextualize()
In a web application, every log message for a single request should carry the same request ID and user ID so you can filter the log file by request. Loguru's logger.bind() creates a new logger instance with extra context fields attached to every message it produces. logger.contextualize() does the same thing but for the duration of a context manager block.
# structured_logging.py
from loguru import logger
import uuid
# Every message sent to this sink must carry request_id and user_id in extra
logger.add("logs/structured.log", format="{time} | {level} | {extra[request_id]} | {extra[user_id]} | {message}", level="DEBUG")
def handle_request(user_id: int) -> dict:
    request_id = str(uuid.uuid4())[:8]
    # Create a bound logger that tags every message with these values
    req_logger = logger.bind(request_id=request_id, user_id=user_id)
    req_logger.info("Request received")
    req_logger.debug("Fetching user profile from database")
    result = {"user_id": user_id, "name": "Alice"}
    req_logger.info("Request completed successfully")
    return result
# Simulate two requests (handled one after the other here)
handle_request(user_id=42)
handle_request(user_id=99)
Output in structured.log:
2026-05-21 09:00:01... | INFO | a1b2c3d4 | 42 | Request received
2026-05-21 09:00:01... | DEBUG | a1b2c3d4 | 42 | Fetching user profile from database
2026-05-21 09:00:01... | INFO | a1b2c3d4 | 42 | Request completed successfully
2026-05-21 09:00:01... | INFO | e5f6a7b8 | 99 | Request received
2026-05-21 09:00:01... | DEBUG | e5f6a7b8 | 99 | Fetching user profile from database
2026-05-21 09:00:01... | INFO | e5f6a7b8 | 99 | Request completed successfully
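Every message produced by req_logger carries the request_id and user_id fields. When you search this log for a1b2c3d4, you see every event logged through that request's bound logger, in chronological order. The limitation is that bind() only tags messages sent through the logger it returns, so helper functions need that logger passed to them. logger.contextualize() avoids this. Used as a with block, it attaches the fields to every message logged inside the block, including messages from functions that use the global logger. Because it stores the fields in contextvars, concurrent asyncio tasks each see their own values.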
Capturing Exceptions with Full Tracebacks
Loguru's exception capture adds variable values to tracebacks, showing not just which line raised the exception but what each variable contained at that moment. This turns a cryptic traceback into a self-contained bug report. Use logger.exception() inside an except block, or decorate a function with @logger.catch to capture any unhandled exception it raises.
# exception_capture.py
from loguru import logger
logger.add("logs/errors.log", level="ERROR", backtrace=True, diagnose=True)
@logger.catch
def parse_config(data: dict) -> dict:
    """Parse a config dict -- raises ValueError if 'port' is not an integer."""
    port = int(data["port"])  # Will fail if port is a string like "abc"
    host = data["host"]
    return {"host": host, "port": port}
def safe_divide(a: float, b: float) -> float:
    try:
        return a / b
    except ZeroDivisionError:
        logger.exception("Division by zero: a={}, b={}", a, b)
        return 0.0
result = safe_divide(10, 0)
print(f"safe_divide result: {result}")
# Logs the full traceback with variable values; by default the exception is then suppressed and the call returns None
parse_config({"host": "localhost", "port": "abc_not_a_number"})
Output:
2026-05-21 09:00:01 | ERROR | Division by zero: a=10, b=0
Traceback (most recent call last):
File "exception_capture.py", line 12, in safe_divide
return a / b
^^^^^
ZeroDivisionError: division by zero
safe_divide result: 0.0
2026-05-21 09:00:01 | ERROR | An error has been caught in function 'parse_config'
...
port = int(data["port"])
           └ {'host': 'localhost', 'port': 'abc_not_a_number'}
ValueError: invalid literal for int() with base 10: 'abc_not_a_number'
The diagnose=True parameter on the sink enables Loguru's enhanced traceback output, which shows variable values at each frame. The @logger.catch decorator catches any exception the function raises and logs it with the full enhanced traceback. By default it then suppresses the exception and returns None; pass reraise=True if the caller should still receive the exception. This is especially useful on entry points like Celery task functions or scheduled jobs, where unhandled exceptions would otherwise disappear silently.
Real-Life Example: Application Logger Module
The following module shows a production-ready logging setup that you can drop into any project. It configures stderr for development, a rotating file for production, and provides a function to get a pre-bound logger for each module.
# app_logger.py
import sys
from loguru import logger
from pathlib import Path
def setup_logging(log_dir: str = "logs", level: str = "INFO", debug: bool = False) -> None:
    """Configure application logging for development or production."""
    logger.remove()  # Remove default handler
    console_level = "DEBUG" if debug else level
    logger.add(
        sys.stderr,
        level=console_level,
        format="{time:HH:mm:ss} | {level:<8} | {name}:{line} - {message}",
        colorize=True,
    )
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger.add(
        f"{log_dir}/app_{{time:YYYY-MM-DD}}.log",  # {{ }} escape the f-string so Loguru sees {time:...}
        level="DEBUG",
        format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {level:<8} | {name}:{line} | {extra} | {message}",
        rotation="50 MB",
        retention="14 days",
        compression="gz",
        encoding="utf-8",
        backtrace=True,
        diagnose=True,
    )
    logger.info("Logging configured: console={}, file=DEBUG", console_level)
def get_logger(component: str):
    """Return a logger pre-bound with a component tag."""
    return logger.bind(component=component)
# --- Usage ---
if __name__ == "__main__":
    setup_logging(debug=True)
    db_log = get_logger("database")
    api_log = get_logger("api")
    db_log.info("Connected to PostgreSQL on localhost:5432")
    api_log.info("Listening on 0.0.0.0:8000")
    api_log.warning("Rate limit reached for IP 192.168.1.1")
    try:
        result = 100 / 0
    except ZeroDivisionError:
        db_log.exception("Unexpected divide by zero during query optimisation")
Output:
09:00:01 | INFO     | __main__:27 - Logging configured: console=DEBUG, file=DEBUG
09:00:01 | INFO     | __main__:36 - Connected to PostgreSQL on localhost:5432
09:00:01 | INFO     | __main__:37 - Listening on 0.0.0.0:8000
09:00:01 | WARNING  | __main__:38 - Rate limit reached for IP 192.168.1.1
09:00:01 | ERROR    | __main__:42 - Unexpected divide by zero during query optimisation
Traceback (most recent call last):
...
ZeroDivisionError: division by zero
The setup_logging() function is called once at application startup (in main.py or your WSGI/ASGI entry point). Every module in your application calls get_logger("component_name") to get a logger that tags its messages with a component field. In the file sink, that tag appears through {extra}, which makes filtering the log file by component trivial; the console format above does not include it. You can extend this by reading level and debug from environment variables or a configuration file rather than hard-coding them.
Frequently Asked Questions
How do I use Loguru with libraries that use the standard logging module?
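Most third-party libraries (SQLAlchemy, httpx, FastAPI) log through Python's standard logging module. To route those messages into Loguru, add an InterceptHandler. Create a class that inherits from logging.Handler and override emit(). In emit(), map record.levelname to a Loguru level, work out how many stack frames belong to the logging module, and call logger.opt(depth=depth, exception=record.exc_info).log(level, record.getMessage()). Then install it with logging.basicConfig(handlers=[InterceptHandler()], level=0, force=True). Computing the depth, rather than hard-coding a value such as depth=6, keeps the reported file and line pointing at the real caller. This captures all standard library logging output and routes it through Loguru's sinks.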
Does Loguru work with asyncio and async web frameworks?
Yes. For async code, use logger.add(sink, enqueue=True). Messages then pass through a queue and are written by a background thread, which keeps slow sinks from blocking the event loop; call await logger.complete() before shutdown to let queued messages finish. For per-request context in async frameworks like FastAPI or Starlette, use logger.contextualize(request_id=..., user_id=...) inside a middleware function. It uses contextvars.ContextVar under the hood, which is async-safe.
Can I create custom log levels in Loguru?
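Yes. Loguru already ships a TRACE level (severity 5) below DEBUG, which you can use with logger.trace(). Calling logger.level() on an existing name with a new no= value raises an error, so register a new name instead. For example, logger.level("NOTICE", no=25, color="<cyan>", icon="@") places NOTICE between INFO (20) and WARNING (30). After registering it, log with logger.log("NOTICE", "Cache warmed"). Custom levels appear in the formatted output with the colour and icon you defined, and you can filter on them like any other level with level="NOTICE" in a sink.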
How do I get JSON-formatted log output for log aggregators?
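The simplest option is logger.add(sink, serialize=True), which outputs Loguru's own JSON document per line, containing the formatted text and all record fields. For full control over the JSON shape, use a callable sink that reads message.record and writes json.dumps() of the fields you want. Don't pass a JSON-producing lambda as format=. Loguru treats the string returned by a format function as a template, so the braces in the JSON would be parsed as fields. Combining such a lambda with serialize=True would also wrap the output in a second layer of JSON. Most log aggregators (Datadog, Loki, Elastic) can ingest either format.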
How do I stop Loguru from writing to stderr?
Every sink added by logger.add() returns an integer ID. Pass that ID to logger.remove(sink_id) to remove that specific sink. To remove all sinks including the default stderr one, call logger.remove() with no arguments. This is the first line in most production setups -- remove the default, then add exactly the sinks you want.
Conclusion
Loguru makes Python logging something you set up once and forget. You have seen how to log at different levels with zero configuration, add rotating file sinks with automatic cleanup, attach structured context fields with bind(), capture exceptions with enhanced tracebacks, and build a reusable logging module for multi-file projects. Every pattern here works in scripts, web applications, and background workers without changes.
The next step is to add the InterceptHandler to capture third-party library logs, and to wire setup_logging() to your application's configuration system so that log level and output directory come from environment variables rather than code. From there, piping the JSON-formatted file output into a log aggregator like Grafana Loki or Elastic gives you searchable, queryable logs across your entire stack.
See the official Loguru documentation for the complete sink options, serialisation reference, and async integration guide.
Further Reading: For more details, see the Python random module documentation.
Frequently Asked Questions
What is a Markov chain in simple terms?
A Markov chain is a mathematical model where the next state depends only on the current state, not on the sequence of events that preceded it. In text generation, this means the next word is predicted based only on the current word or phrase.
How does a Markov chain text generator work in Python?
A Python Markov chain text generator builds a dictionary of word transitions from training text. Each word maps to a list of words that follow it. The generator then randomly selects next words based on these observed probabilities to create new text.
What are the limitations of Markov chain text generation?
Markov chains produce text that can be grammatically inconsistent over long passages because they only consider local context (the previous few words). They lack understanding of meaning, coherence, and long-range dependencies that modern language models handle better.
Can I use Markov chains for purposes other than text generation?
Yes. Markov chains are used in weather prediction, stock market modeling, DNA sequence analysis, game AI, PageRank algorithms, and many simulation scenarios. Any system where transitions between states follow probabilistic rules can be modeled with Markov chains.
How do I improve the quality of Markov chain generated text?
Increase the chain order (use pairs or triples of words instead of single words as keys), use larger and higher-quality training data, add post-processing to fix grammar, and filter out nonsensical outputs. Higher-order chains produce more coherent text but require more training data.