Advanced
Making computer generated text mimic human speech is fascinating and actually not that difficult for an effect that is sometimes convincing, but certainly entertaining. Markov Chain’s is one way to do this. It works by generating new text based on historical texts where the original sequencing of neighboring words (or groups of words) is used to generate meaningful sentences. Read the below guide on how to code a Markov Chain text generator (code example in python) including explanation of the concept.
What’s really interesting, is that you can take historical texts of a person, then generate new sentences which can sound similar to the way that person speaks. Alternatively, you can combine texts from two different people and get a mixed “voice”.
I played around this with texts of speeches from two great presidents:
What my Markov Chain generated which was “trained” using the combination of texts from Obama speeches and Bartlet scripts, is as follows:
- ‘Can I burn my mother in North Carolina for giving us a great night planned.’
- ‘And so going forward, I believe that we can build a bomb into their church.’
- ‘’Charlie, my father had grown up in the Situation Room every time I came in.’’
- ‘This campaign must be ballistic.’,
What is a Markov Chain in the context of a text generation?
For a more technical explanation, I think you can find plenty of resources out there. In simple terms, it is an algorithm which is used to generate a new outcome from a weighted list of words based on historical texts. Now that’s rather abstract. In more practical terms, in the scenario for text generation, it is a way to use historical texts, chop it up into individual words (or sets of words), and then randomly chose a given word then randomly chose the next likely words based on historical sequences. For example:

This doesn’t just apply in text as well (although one of the most popular applications is in your smart phone where there’s predictive text), it can be used for any scenario where you use historical information to define next steps for a given state. For example, you could codify a given stock market pattern (such as the % daily changes for the last 30 days), then use that to see historically what was the likely next day outcome (example only.. I’m very doubtful how effective it would be).
Why are Markov Chain Text Generators so fun?
I’ve always wanted to build a text generator as it’s just an awesome way to see how you could mimic intelligence using a very cheap shortcut. You’ll see the algorithm below, and it is super simple. The other fact is that, like above example, you can use it to mix the ‘voice’ from two different persons and see the outcome.
How does the Markov Chain Text Generator work?
There are two phases for text generation with Markov Chains. There’s first the ‘dictionary build phase’ which involves gathering the historical texts, and then generating a dictionary with the key being a given word in a sentence, and then having the resultant being the natural follow-up words.

The second is the execution, where you start from a given word, then use that word to see what the next word would be in a probabilistic way. For example:

Now, there are some tricks which you need to be mindful of ( I found this out the hard way):
- You can’t start from any random word — if you do, then you’ll get sentences like this: “ate the cat.” . You have to keep track of “starting words” to keep things simple — hence you can have: “John ate the cat”.
- Don’t ignore punctuation— if you do remove punctuation, you’ll get sentence like this: “The dog barked at John cat”. Instead keep them there so that you can have a better chance to have a more realistic sentence — i.e. “The dog barked at John’s ca”
- End on a full-stop word. When you go through and start from a word, then find the next word, then find the next word and so on, you can continue until you reach a specified length, but then you’ll end up stopping in mid-sentence such as this: “The cat ate John’s”. Instead, simply end when you have a word that has a full stop (another reason not to remove the punctuation) — i.e. “The cat ate John’s boots.”
Markov Chain Example Code Source texts
I played around with different texts including: Eddie Murphy stand-up routines, Donald Trump tweets, Obama speeches, and Jed Bartlet dialogue. You can find the the markov chain example source text here. It’s great to use one source and then generate the dictionary, but then you can mix and match and use two sources (e.g. Obama and Bartlet) and then create the one dictionary file. Then when you traverse the dictionary you get the both voices.
It is important to make sure that you can balance the text — e.g. if you had a 8000 text from Obama and only 1000 text from Eddie Murphy, it’s likely that you would see more of the Obama words. Of course, when you build the dictionary, you can also add some artificial weighting towards the lighter text source to balance things out.
Markov Chain Summary
The Markov Chain text generator is not perfect — you’ll see when you create your own, that some text is just gibberish. The more text that you have the better. Secondly, using single words is not helpful in the dictionary — you should use groups of 2–3 words. The actual number depends on how much historical text you have.
You can find all the python code, source texts and Markov Chain python example code here. Good luck!
Subscribe to our newsletter
How To Use PyJWT for JSON Web Tokens in Python
Intermediate
Session cookies work fine for traditional web apps, but modern REST APIs and microservices need a stateless authentication mechanism — one where the server does not store session state and any service in the cluster can verify a token independently. JSON Web Tokens (JWTs) are the standard answer. A JWT is a base64-encoded JSON payload with a cryptographic signature that any server can verify without hitting a database.
The PyJWT library is the most widely used Python implementation of the JWT standard (RFC 7519). It is used internally by AWS SDK, GitHub Actions, and dozens of popular frameworks. In this article you will learm to create signed tokens with HS256 and RS256, validate claims like expiry and audience, implement refresh token rotation, and wire JWTs into a FastAPI dependency.
PyJWT Quick Example: Issue and Verify in 10 Lines
# quick_jwt.py
import jwt, datetime
SECRET = "my-256-bit-secret"
payload = {
"sub": "user_42",
"exp": datetime.datetime.now(datetime.UTC) + datetime.timedelta(hours=1)
}
token = jwt.encode(payload, SECRET, algorithm="HS256")
print(f"Token: {token[:50]}...")
decoded = jwt.decode(token, SECRET, algorithms=["HS256"])
print(f"Subject: {decoded['sub']}")
Output:
Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIi...
Subject: user_42
Installing PyJWT
pip install PyJWT
# For RSA/EC algorithm support, install the cryptography extra:
pip install "PyJWT[cryptography]"
python -c "import jwt; print(jwt.__version__)"
Output:
2.8.0
| Algorithm | Key Type | Verification | Use Case |
|---|---|---|---|
| HS256 | Shared secret | Same secret | Single service, internal APIs |
| HS512 | Shared secret | Same secret | Higher security single service |
| RS256 | RSA private key | RSA public key | Microservices, OAuth2, OIDC |
| ES256 | EC private key | EC public key | Mobile, IoT (smaller tokens) |
Standard Claims and Expiry
JWTs have registered claim names defined by RFC 7519. PyJWT validates them automatically during decoding:
# jwt_claims.py
import jwt, datetime
SECRET = "production-secret-min-32-chars-long"
def create_access_token(user_id: str, roles: list) -> str:
now = datetime.datetime.now(datetime.UTC)
payload = {
# Registered claims (RFC 7519)
"iss": "https://api.myapp.com", # Issuer
"sub": user_id, # Subject (user identifier)
"aud": "myapp-frontend", # Audience
"exp": now + datetime.timedelta(minutes=15), # Expiry
"iat": now, # Issued At
"jti": f"{user_id}-{now.timestamp()}", # JWT ID (unique per token)
# Custom claims
"roles": roles,
"tier": "premium"
}
return jwt.encode(payload, SECRET, algorithm="HS256")
def verify_access_token(token: str) -> dict:
return jwt.decode(
token,
SECRET,
algorithms=["HS256"],
audience="myapp-frontend", # Must match aud claim
options={"require": ["exp", "iat", "sub", "iss"]}
)
token = create_access_token("user_42", ["read", "write"])
claims = verify_access_token(token)
print(f"User: {claims['sub']}, Roles: {claims['roles']}, Tier: {claims['tier']}")
# Test expiry
import time
expired_payload = {
"sub": "user_1",
"exp": datetime.datetime.now(datetime.UTC) - datetime.timedelta(seconds=1)
}
expired_token = jwt.encode(expired_payload, SECRET, algorithm="HS256")
try:
jwt.decode(expired_token, SECRET, algorithms=["HS256"])
except jwt.ExpiredSignatureError as e:
print(f"Rejected: {e}")
Output:
User: user_42, Roles: ['read', 'write'], Tier: premium
Rejected: Signature has expired.
RS256: Asymmetric JWT for Microservices
In a microservice architecture, each service needs to verify tokens independently without sharing a secret. RS256 lets a central auth service sign with its private key, and all other services verify with the public key:
# rs256_jwt.py
import jwt
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization
import datetime
# Generate RSA key pair (in production: load from file/secret manager)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()
# Auth service: sign with private key
payload = {
"sub": "service_account_7",
"scope": "billing:read orders:write",
"exp": datetime.datetime.now(datetime.UTC) + datetime.timedelta(hours=1),
"iss": "https://auth.myapp.com"
}
token = jwt.encode(payload, private_key, algorithm="RS256")
print(f"RS256 token length: {len(token)} chars")
# Any microservice: verify with public key only (no private key needed)
decoded = jwt.decode(
token,
public_key,
algorithms=["RS256"],
options={"verify_aud": False}
)
print(f"Verified service: {decoded['sub']}, scope: {decoded['scope']}")
# Serialize public key to PEM for distribution
pem = public_key.public_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PublicFormat.SubjectPublicKeyInfo
)
# Load back from PEM and verify again
loaded_pubkey = serialization.load_pem_public_key(pem)
decoded2 = jwt.decode(token, loaded_pubkey, algorithms=["RS256"], options={"verify_aud": False})
print(f"PEM round-trip OK: {decoded2['sub']}")
Output:
RS256 token length: 472 chars
Verified service: service_account_7, scope: billing:read orders:write
PEM round-trip OK: service_account_7
Access and Refresh Token Pattern
Short-lived access tokens (15 minutes) with long-lived refresh tokens (7 days) are the standard pattern for secure stateless authentication:
# token_pair.py
import jwt, datetime, secrets
SECRET = "your-secret-key-min-32-chars"
REFRESH_SECRET = "different-refresh-secret-min-32-chars"
def issue_token_pair(user_id: str) -> dict:
now = datetime.datetime.now(datetime.UTC)
access_token = jwt.encode({
"sub": user_id,
"type": "access",
"exp": now + datetime.timedelta(minutes=15),
"iat": now
}, SECRET, algorithm="HS256")
refresh_token = jwt.encode({
"sub": user_id,
"type": "refresh",
"jti": secrets.token_hex(16), # Unique ID for revocation
"exp": now + datetime.timedelta(days=7),
"iat": now
}, REFRESH_SECRET, algorithm="HS256")
return {"access_token": access_token, "refresh_token": refresh_token}
def refresh_access_token(refresh_token: str) -> str:
payload = jwt.decode(refresh_token, REFRESH_SECRET, algorithms=["HS256"])
if payload.get("type") != "refresh":
raise ValueError("Not a refresh token")
# In production: check jti against a revocation list in Redis
now = datetime.datetime.now(datetime.UTC)
return jwt.encode({
"sub": payload["sub"],
"type": "access",
"exp": now + datetime.timedelta(minutes=15),
"iat": now
}, SECRET, algorithm="HS256")
tokens = issue_token_pair("user_42")
print(f"Access token (first 40 chars): {tokens['access_token'][:40]}...")
new_access = refresh_access_token(tokens["refresh_token"])
decoded = jwt.decode(new_access, SECRET, algorithms=["HS256"])
print(f"Refreshed token for: {decoded['sub']}, expires in 15 min")
Output:
Access token (first 40 chars): eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIi...
Refreshed token for: user_42, expires in 15 min
Frequently Asked Questions
How long should my HS256 secret be?
At least 32 characters (256 bits) for HS256, and 64 characters for HS512. Use secrets.token_hex(32) to generate a cryptographically random 64-character hex secret. Short or dictionary-word secrets are vulnerable to offline brute-force attacks — an attacker who captures a token can try millions of secrets per second without hitting your server.
Where should I store JWTs on the client side?
For browser apps, httpOnly cookies are more secure than localStorage because JavaScript cannot read httpOnly cookies, preventing XSS attacks from stealing tokens. For mobile apps and server-to-server calls, storing in memory or secure device storage is fine. Never store tokens in localStorage if the site runs any third-party JavaScript. For refresh tokens specifically, httpOnly + Secure + SameSite=Strict cookies are the recommended pattern.
How do I revoke a JWT before it expires?
JWTs are stateless by design — the server does not store them, so you cannot “delete” one. The standard approaches are: keep access tokens very short-lived (15 minutes or less) so revocation is rarely needed; maintain a Redis blocklist of revoked jti values checked on each request; or use opaque tokens for security-critical resources and only use JWTs for low-risk claims. The blocklist approach reintroduces some statefulness but is still much lighter than full session storage.
What is the “none” algorithm attack?
Early JWT libraries accepted tokens with "alg": "none" — meaning no signature — allowing attackers to forge any payload. PyJWT is safe by default: you must explicitly list allowed algorithms in jwt.decode(), and none is never allowed unless you pass algorithms=["none"] explicitly. Always pass the algorithms parameter and never include "none" in it. Similarly, never pass the encoded token’s header algorithm as the allowed algorithm — always hardcode the expected algorithm on the server.
How do I use PyJWT with FastAPI?
Create a dependency that reads the Authorization: Bearer <token> header, calls jwt.decode(), and either returns the claims dict or raises HTTPException(401) on failure. Use Depends(verify_token) in any route that requires authentication. FastAPI’s dependency injection system handles the rest — the dependency runs before the route handler, and any exception it raises short-circuits the request.
Conclusion
PyJWT makes stateless authentication straightforward in Python. For single-service APIs, HS256 with a 32+ character secret and 15-minute expiry is simple and secure. For microservices where multiple independent services need to verify tokens, RS256 lets you distribute the public key freely while keeping the signing key private. In both cases, always specify algorithms explicitly in jwt.decode(), always validate exp and iss, and use short-lived access tokens with refresh token rotation for production systems. The jti claim combined with a Redis blocklist gives you revocation capability when you need it without abandoning the stateless model entirely.
Related Articles
Further Reading: For more details, see the Python random module documentation.
Frequently Asked Questions
What is a Markov chain in simple terms?
A Markov chain is a mathematical model where the next state depends only on the current state, not on the sequence of events that preceded it. In text generation, this means the next word is predicted based only on the current word or phrase.
How does a Markov chain text generator work in Python?
A Python Markov chain text generator builds a dictionary of word transitions from training text. Each word maps to a list of words that follow it. The generator then randomly selects next words based on these observed probabilities to create new text.
What are the limitations of Markov chain text generation?
Markov chains produce text that can be grammatically inconsistent over long passages because they only consider local context (the previous few words). They lack understanding of meaning, coherence, and long-range dependencies that modern language models handle better.
Can I use Markov chains for purposes other than text generation?
Yes. Markov chains are used in weather prediction, stock market modeling, DNA sequence analysis, game AI, PageRank algorithms, and many simulation scenarios. Any system where transitions between states follow probabilistic rules can be modeled with Markov chains.
How do I improve the quality of Markov chain generated text?
Increase the chain order (use pairs or triples of words instead of single words as keys), use larger and higher-quality training data, add post-processing to fix grammar, and filter out nonsensical outputs. Higher-order chains produce more coherent text but require more training data.