Beginner
Twitter Bots can be super useful to help automate some of the interactions on social media in order to build and grow engagement but also automate some tasks. There has been many changes on the twitter developer account and sometimes it’s uncertain how to even create a tweet bot. This article will walk through step bey step on how to create a twitter bot with the latest Twitter API v2 and also provide some code you can copy and paste in your next project. We also end with how to create a more useful bot that can post some articles about python automatically.
In a nutshell, how a twitter bot works is that you will need to run your code for a twitter bot in your own compute that can be triggered from a Twitter webhook (not covered) which is called by twitter based on a given event, or by having your program run periodically to read and send tweets (covered in this article). Either way, there are some commonalities and in this article we will walk through how to read tweets, and then to send tweets which are from google news related to python!
Step 1: Sign up for Developer program
If you haven’t already you will need to either sign in or sign up for a twitter account through twitter.com. Make sure your twitter account has an email address allocated to it (if you’re not aware, you can create a twitter account with just your mobile phone number)

Next go to developer.twitter.com and sign up for the developer program (yes, you need to sign up for a second time). This enables you to create applications.

First you’ll need to answer some questions on purpose of the developer account. You can chose “Make a Bot”

Next you will need to agree to the terms and conditions, and then a verification email will be sent to your email address from your twitter account.
When you click on the email to verify your account, you can then enter your app name. This is an internal name and something that will make it easy for you to reference.

Once you click on keys, you will then be given a set of security token keys like below. Please copy them in a safe place as your python code will need to use them to access your specific bot. If you do lose your keys, or someone gets access to them for some reason, you can generate new keys from your developer.twitter.com console.
There are two keys which you will need to use:
- API Key (think of this like a username)
- API Key Secret (think of this like a password)
- Bearer Token (used for read queries such as getting latest tweets)
There is also a third key, a Bearer Token, but this you can ignore. It is for certain types of requests

At the bottom of the screen you’ll see a “Skip to Dashboard”, when you click on that you’ll then see the overview of your API metrics.
Within this screen you can see the limits of the number of calls per month for example and how much you have already consumed.

Next, click on the project and we have to generate the access tokens. Currently with the previous keys you can only read tweets, you cannot create ones as yet.
After clicking on the project, chose the “keys and tokens” tab and at the bottom you can generate the “Access Tokens”. In this screen you can also re-generate the API Keys and Bearer Token you just created before in case your keys were compromised or you forgot them.

Just like before, generate the keys and copy them.

By now, you have 5 security toknes:
- API Key – also known as the Consumer Key (think of this like a username)
- API Key Secret – also known as the Consumer Secret (think of this like a password)
- Bearer Token (used for read queries such as getting latest tweets)
- Access Token (‘username’ to allow you to create tweets)
- Access Token Secret (‘password’ to allow you to create tweets)
Step 2: Test your twitter API query
Now that you have the API keys, you can do some tests. If you are using a linux based machine you can use the curl command to do a query. Otherwise, you can use a site such as https://reqbin.com/curl to do an online curl request.
Here’s a simple example to get the most recent tweets. It uses the API https://api.twitter.com/2/tweets/search/recent which must include the query keyword which includes a range of parameter options (find out the list in the twitter query documentation).
curl --request GET 'https://api.twitter.com/2/tweets/search/recent?query=from:pythonhowtocode' --header 'Authorization: Bearer <your bearer token from step 1>'
The output is as follows:
{
"data": [{
"id": "1523251860110405633",
"text": "See our latest article on THE complete beginner guide on creating a #discord #bot in #python \n\nEasily add this to your #100DaysOfCode #100daysofcodechallenge #100daysofpython \n\nhttps://t.co/4WKvDVh1g9"
}],
"meta": {
"newest_id": "1523251860110405633",
"oldest_id": "1523251860110405633",
"result_count": 1
}
}
Here’s a much more complex example. This includes the following parameters:
%23– which is the escape characters for#and searches for hashtags. Below example is hashtag#python(case insensitive)%20– this is an escape character for a space and separates different filters with anANDoperation-is:retweet– this excludes retweets. The ‘-‘ sign preceding theisnegates the actual filter-is:reply– this excludes replies. The ‘-‘ sign preceding theisnegates the actual filtermax_results=20– an integer that defines the maximum number of return results and in this case 20 resultsexpansions=author_id– this makes sure to include the username internal twitter id and also the actual username under anincludessection at the bottom of the returned JSONtweet.fields=public_metrics,created_at– returns the interaction metrics such as number of likes, number of retweets, etc as well as the time (in GMT timezone) when the tweet was createduser.fields=created_at,location– this returns when the user account was created and the user self-reported location in their profile.
curl --request GET 'https://api.twitter.com/2/tweets/search/recent?query=%23python%20-is:retweet%20-is:reply&max_results=20&expansions=author_id&tweet.fields=public_metrics,created_at&user.fields=created_at,location' --header 'Authorization: Bearer <Your Bearer Token from Step 1>'
Result of this looks like the following – notice that the username details is in the includes section below where you can link the tweet with the username with the author_id field.
{{
"data": [{
"id": "1523688996676812800",
"text": "NEED a #JOB?\nSign up now https://t.co/o7lVlsl75X\nFREE. NO MIDDLEMEN\n#Jobs #AI #DataAnalytics #MachineLearning #Python #JavaScript #WomenWhoCode #Programming #Coding #100DaysofCode #DEVCommunity #gamedev #gamedevelopment #indiedev #IndieGameDev #Mobile #gamers #RHOP #BTC #ETH #SOL https://t.co/kMYD2417jR",
"author_id": "1332714745871421443",
"public_metrics": {
"retweet_count": 3,
"reply_count": 0,
"like_count": 0,
"quote_count": 0
},
"created_at": "2022-05-09T15:39:00.000Z"
},
....
}],
"includes": {
"users": [{
"name": "Job Preference",
"id": "1332714745871421443",
"username": "JobPreference",
"created_at": "2020-11-28T15:56:01.000Z"
},
....
}
Step 3: Reading tweets with python code
Building on top of the tests conducted on Step 2, it is a simple extra step in order to convert this to python code using the requests module which we’ll show first and after show a simpler way with the library tweepy. You can simply use the library to convert the curl command into a bit of python code. Here’s a structured version of this code where the logic is encapsulated in a class.
import requests, json
from urllib.parse import quote
from pprint import pprint
class TwitterBot():
URL_SEARCH_RECENT = 'https://api.twitter.com/2/tweets/search/recent'
def __init__(self, bearer_key):
self.bearer_key = bearer_key
def search_recent(self, query, include_retweets=False, include_replies=False):
url = self.URL_SEARCH_RECENT + "?query=" + quote(query)
if not include_retweets: url += quote(' ')+'-is:retweet'
if not include_replies: url += quote(' ')+'-is:reply'
url += '&max_results=20&expansions=author_id&tweet.fields=public_metrics,created_at&user.fields=created_at,location'
headers = {'Authorization': 'Bearer ' + self.bearer_key }
r = requests.get(url, headers = headers)
r.encoding = r.apparent_encoding. #Ensure to use UTF-8 if unicode characters
return json.loads(r.text)
#create an instance and pass in your Bearer Token
t = TwitterBot('<Insert your Bearer Token from Step 1>')
pprint( t.search_recent( '#python') )
The above code is fairly straightforward and does the following:
TwitterBot class– this class encapsulates the logic to send the API requestsTwitterBot.search_recent– this method takes in the query string, then escapes any special characters, then calls therequests.get()to call thehttps://api.twitter.com/2/tweets/search/recentAPI callpprint()– this simply prints the output in a more readable format
This is the output:


However, there is a simpler way which is to use tweepy.
pip install tweepy
Next you can use the tweepy module to search recent tweets:
import tweepy
client = tweepy.Client(bearer_token='<insert your token here from previous step>')
query = '#python -is:retweet -is:reply' #exclude retweets and replies with '-'
tweets = client.search_recent_tweets( query=query,
tweet_fields=['public_metrics', 'context_annotations', 'created_at'],
user_fields=['username','created_at','location'],
expansions=['entities.mentions.username','author_id'],
max_results=10)
#The details of the users is in the 'includes' list
user_data = {}
for raw_user in tweets.includes['users']:
user_data[ raw_user.id ] = raw_user
for index, tweet in enumerate(tweets.data):
print(f"[{index}]::@{user_data[tweet.author_id]['username']}::{tweet.created_at}::{tweet.text.strip()}\n")
print("------------------------------------------------------------------------------")
Output as follows:

Please note, that after calling the API a few times your number of tweets consumed will have increased and may have hit the limit. You can always visit the dashboard at https://developer.twitter.com/en/portal/dashboard to see how many requests have been consumed. Notice, that this does not count the number of actual API calls but the actual number of tweets. So it can get consumed pretty quickly.

Step 4: Sending out a tweet
So far we’ve only been reading tweets. In order to send a tweet you can use the create_tweet() function of tweepy.
client = tweepy.Client( consumer_key= "<API key from above - see step 1>",
consumer_secret= "<API Key secret - see step 1>",
access_token= "<Access Token - see step 1>",
access_token_secret= "<Access Token Secret - see step 1>")
# Replace the text with whatever you want to Tweet about
response = client.create_tweet(text='A little girl walks into a pet shop and asks for a bunny. The worker says” the fluffy white one or the fluffy brown one”? The girl then says, I don’t think my python really cares.')
print(response)
Output from Console:

Output from Twitter:

How to Send Automated Tweets About the Latest News
To make this a bit more of a useful bot rather than simply tweet out static text, we’ll make it tweet about the latest things happened in the news about python.
In order to search for news information, you can use the python library pygooglenews
pip install pygooglenews
The library searches Google news RSS feed and was developed by Artem Bugara. You can see the full article of he developed the Google News library. You can put in a keyword and also time horizon to make it work. Here’s an example to find the latest python articles in last 24 hours.
from pygooglenews import GoogleNews
gn = GoogleNews()
search = gn.search('python programming', when = '12h')
for article in search['entries']:
print(article.title)
print(article.published)
print(article.source.title)
print('-'*80) #string multiplier - show '-' 80 times
Here’s the output:
So, the idea would be to show a random article on the twitter bot which is related to python programming. The gn.search() functions returns a list of all the articles under the entries dictionary item which has a list of those articles. We will simply pick a random one and construct the tweet with the article title and the link to the article.
import tweepy
from pygooglenews import GoogleNews
from random import randint
client = tweepy.Client( consumer_key= "<your consumer/API key - see step 1>",
consumer_secret= "<your consumer/API secret - see step 1>",
access_token= "<your access token key - see step 1>",
access_token_secret= "<your access token secret - see step 1>")
gn = GoogleNews()
search = gn.search('python programming', when = '24h')
#Find random article in last 24 hours using randint between index 0 and the last index
article = search['entries'][ randint( 0, len( search['entries'])-1 ) ]
#construct the tweet text
tweet_text = f"In python news: {article.title}. See full article: {article.link}. #python #pythonprogramming"
#Fire off the tweet!
response = client.create_tweet( tweet_text )
print(response)
Output from the console on the return result:

And, most importantly, here’s the tweet from our @pythonhowtocode! Twitter automatically pulled the article image

This has currently been scheduled as a daily background job!
How To Use Python Joblib for Parallel Computing and Caching
Intermediate
You have a data processing loop that runs one item at a time — checking each file, scoring each user, training each model configuration. Your machine has eight cores and only one of them is working. The loop that takes twenty minutes could finish in three if you could just split the work across all available processors.
Joblib is a Python library that makes parallel computing and result caching easy to add to existing code. Its Parallel and delayed utilities turn a regular Python loop into a parallel job with one wrapper. Its Memory class caches function results to disk so that the second call with the same arguments returns instantly. Install it with pip install joblib. Scikit-learn uses Joblib internally for its own parallelism, so if you have scikit-learn installed, Joblib is already there.
This article covers parallelising loops with Parallel and delayed, choosing the right backend (loky, threading, multiprocessing), caching expensive computations with Memory, integrating with scikit-learn pipelines, and diagnosing performance with verbosity settings. By the end you will have both parallel execution and disk caching working in a realistic data pipeline.
Joblib Parallel: Quick Example
The quickest way to see Joblib’s effect is to replace a for loop with a Parallel call. The structure is almost identical — the main change is wrapping the function call with delayed().
# quick_joblib.py
import time
from joblib import Parallel, delayed
def slow_square(n: int) -> int:
"""Simulate a slow computation."""
time.sleep(0.5)
return n * n
numbers = list(range(8))
# Sequential -- takes 8 * 0.5 = 4 seconds
start = time.perf_counter()
sequential = [slow_square(n) for n in numbers]
seq_time = time.perf_counter() - start
print(f"Sequential: {sequential} in {seq_time:.2f}s")
# Parallel -- uses all available CPU cores
start = time.perf_counter()
parallel = Parallel(n_jobs=-1)(delayed(slow_square)(n) for n in numbers)
par_time = time.perf_counter() - start
print(f"Parallel: {parallel} in {par_time:.2f}s")
print(f"Speedup: {seq_time / par_time:.1f}x")
Output (on an 8-core machine):
Sequential: [0, 1, 4, 9, 16, 25, 36, 49] in 4.01s
Parallel: [0, 1, 4, 9, 16, 25, 36, 49] in 0.56s
Speedup: 7.2x
The n_jobs=-1 argument tells Joblib to use all available CPU cores. n_jobs=4 would use exactly four. The delayed(func)(args) pattern creates a lazy description of the function call without executing it — Joblib collects these descriptions and distributes them across workers. The return values are collected in the same order as the input, so parallel[3] is always the result of slow_square(3) regardless of which worker finished first.
What Is Joblib and When Should You Use It?
Joblib provides two things: easy parallelism through a process pool, and persistent disk caching of function results. These two features are independent — you can use either without the other. The parallelism is built on top of the loky process pool by default (a robust reimplementation of multiprocessing.Pool) with fallback to Python’s threading or the original multiprocessing pool.
| Tool | Best for | Overhead |
|---|---|---|
| Joblib Parallel (loky) | CPU-bound tasks, data processing | ~100ms startup |
| Joblib Parallel (threading) | IO-bound tasks, numpy releases GIL | ~5ms startup |
| concurrent.futures | Simple async IO, process pools | ~50ms startup |
| multiprocessing.Pool | CPU-bound, full control needed | ~100ms startup |
| asyncio | High-concurrency network IO | Near zero |
Joblib excels when your loop body is CPU-bound (model training, file parsing, image processing) and each iteration takes at least a few milliseconds — enough to justify the inter-process communication cost. For very fast operations (microsecond loops), parallelism overhead outweighs the benefit. The caching feature is valuable for any function with expensive deterministic computations: feature extraction, data loading, hyperparameter search.
Choosing the Right Backend
Joblib supports three execution backends, each suited to different workloads. Understanding when to use each prevents a common trap: the default process-based backend actually slows down IO-bound work because of serialisation overhead.
# backends.py
import time
import numpy as np
from joblib import Parallel, delayed
def cpu_task(size: int) -> float:
"""CPU-bound: pure Python computation."""
data = list(range(size))
return sum(x * x for x in data) / len(data)
def numpy_task(size: int) -> float:
"""Numpy releases the GIL -- threading backend works well here."""
arr = np.random.rand(size)
return float(np.sqrt(np.sum(arr ** 2)))
items = [100_000] * 8
# Default loky backend (separate processes, best for pure Python CPU work)
start = time.perf_counter()
results_loky = Parallel(n_jobs=4, backend="loky")(
delayed(cpu_task)(n) for n in items
)
print(f"loky (CPU work): {time.perf_counter() - start:.2f}s")
# Threading backend (shares memory, good when GIL is released by C extensions)
start = time.perf_counter()
results_thread = Parallel(n_jobs=4, backend="threading")(
delayed(numpy_task)(n) for n in items
)
print(f"threading (NumPy): {time.perf_counter() - start:.2f}s")
# Sequential for comparison
start = time.perf_counter()
results_seq = [numpy_task(n) for n in items]
print(f"sequential: {time.perf_counter() - start:.2f}s")
Output:
loky (CPU work): 0.48s
threading (NumPy): 0.31s
sequential: 1.12s
The loky backend spawns separate Python processes, each with their own memory space and GIL. This is the right choice for pure Python CPU work because it truly runs in parallel. The threading backend runs in threads within the same process. Because Python’s GIL prevents true parallel execution of pure Python code, threading only helps when the task calls into a C extension that releases the GIL — like NumPy, Pandas, or scikit-learn. The multiprocessing backend is the original process pool; prefer loky unless you have a specific compatibility reason to use it.
Caching Expensive Results with Memory
Joblib’s Memory class caches a function’s return value to disk, keyed by the function’s source code and its arguments. The second call with the same arguments reads from the cache instead of recomputing. This is useful for data loading, feature extraction, or any expensive deterministic step that you run repeatedly during development.
# caching.py
import time
import numpy as np
from joblib import Memory
# Create a cache directory
cache = Memory("./joblib_cache", verbose=1)
@cache.cache
def load_and_process(filepath: str, scale: float = 1.0) -> np.ndarray:
"""Simulate expensive data loading and processing."""
print(f" [COMPUTING] Loading {filepath} with scale={scale}")
time.sleep(2) # Simulate a 2-second load
data = np.random.rand(1000) * scale
return data
print("First call (cold cache):")
start = time.perf_counter()
result1 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result1.mean():.4f}")
print("\nSecond call (cache hit):")
start = time.perf_counter()
result2 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.4f}s, mean={result2.mean():.4f}")
print("\nDifferent args (cache miss):")
start = time.perf_counter()
result3 = load_and_process("data/features.npy", scale=3.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result3.mean():.4f}")
Output:
First call (cold cache):
[COMPUTING] Loading data/features.npy with scale=2.0
Took: 2.01s, mean=0.9987
Second call (cache hit):
Took: 0.0031s, mean=0.9987
Different args (cache miss):
[COMPUTING] Loading data/features.npy with scale=3.0
Took: 2.01s, mean=1.4991
The cache is stored as compressed pickle files in the directory you specify. It is keyed on the function’s source code hash and all arguments — if you change the function body, Joblib invalidates the cache automatically on the next call. To clear the cache manually, call cache.clear() or delete the cache directory. The verbose=1 argument makes Joblib print whether it computed or loaded from cache; set it to 0 to silence this output in production.
Joblib with scikit-learn Pipelines
Scikit-learn uses Joblib internally for all its n_jobs parameters — cross-validation, grid search, random forests, and more all use the same Joblib infrastructure. You can control the backend and number of jobs globally using Joblib’s parallel_backend context manager, or pass n_jobs directly to estimators.
# sklearn_parallel.py
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV
from joblib import parallel_backend
# Generate a sample dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
# Train a random forest using all CPU cores
print("Training RandomForest with n_jobs=-1...")
start = time.perf_counter()
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X, y)
print(f" Fit time: {time.perf_counter() - start:.2f}s")
# Cross-validation in parallel
start = time.perf_counter()
scores = cross_val_score(rf, X, y, cv=5, n_jobs=-1, scoring="accuracy")
print(f" CV scores: {scores.round(3)}, mean={scores.mean():.3f}, time={time.perf_counter() - start:.2f}s")
# Hyperparameter search -- each combo evaluated in parallel
param_grid = {
"n_estimators": [50, 100],
"max_depth": [5, 10, None],
}
start = time.perf_counter()
grid = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid,
cv=3,
n_jobs=-1,
verbose=0,
)
grid.fit(X, y)
elapsed = time.perf_counter() - start
print(f" Best params: {grid.best_params_}, score={grid.best_score_:.3f}, time={elapsed:.2f}s")
Output:
Training RandomForest with n_jobs=-1...
Fit time: 0.31s
CV scores: [0.934 0.931 0.929 0.927 0.932], mean=0.931, time=0.48s
Best params: {'max_depth': None, 'n_estimators': 100}, score=0.931, time=1.24s
The n_jobs=-1 parameter on scikit-learn estimators and model-selection utilities goes directly to Joblib. Setting it uses all available cores for that operation. For nested parallelism (a parallel grid search that itself trains parallel random forests), Joblib automatically avoids over-subscribing the CPU — the inner jobs run sequentially when the outer jobs already fill all cores.
Real-Life Example: Parallel Feature Extraction Pipeline
The following pipeline processes a directory of text files, extracts word-frequency features from each, and caches the results. Combining Parallel with Memory gives you both speed and resilience — if the pipeline is interrupted, the cached results mean you do not repeat work already done.
# feature_pipeline.py
import os
import time
import re
from collections import Counter
from pathlib import Path
from joblib import Parallel, delayed, Memory
cache = Memory("./feature_cache", verbose=0)
# --- Create sample text files ---
SAMPLE_DIR = Path("sample_texts")
SAMPLE_DIR.mkdir(exist_ok=True)
sample_texts = {
"python.txt": "Python is a high-level programming language. Python emphasises readability.",
"data.txt": "Data science uses statistics and programming. Data analysis reveals patterns.",
"web.txt": "Web development creates websites and applications. The web uses HTML CSS JavaScript.",
"ai.txt": "Artificial intelligence mimics human thinking. Machine learning trains models on data.",
"cloud.txt": "Cloud computing provides on-demand resources. Cloud services scale automatically.",
}
for fname, text in sample_texts.items():
(SAMPLE_DIR / fname).write_text(text * 50) # Make files large enough to matter
@cache.cache
def extract_features(filepath: str) -> dict:
"""Extract word frequency features from a text file (cached)."""
text = Path(filepath).read_text().lower()
words = re.findall(r'\b[a-z]{3,}\b', text)
top_words = dict(Counter(words).most_common(10))
time.sleep(0.3) # Simulate expensive NLP processing
return {"file": Path(filepath).name, "word_count": len(words), "top_words": top_words}
def run_pipeline(data_dir: Path) -> list[dict]:
files = [str(f) for f in data_dir.glob("*.txt")]
print(f"Processing {len(files)} files in parallel...")
start = time.perf_counter()
results = Parallel(n_jobs=-1, verbose=10)(
delayed(extract_features)(f) for f in files
)
elapsed = time.perf_counter() - start
print(f"Done in {elapsed:.2f}s")
return results
features = run_pipeline(SAMPLE_DIR)
for feat in features:
top3 = list(feat["top_words"].keys())[:3]
print(f" {feat['file']:15s} words={feat['word_count']:,} top={top3}")
Output (first run — cold cache):
Processing 5 files in parallel...
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 0.4s finished
Done in 0.41s
python.txt words=350 top=['python', 'language', 'high']
data.txt words=350 top=['data', 'science', 'analysis']
web.txt words=350 top=['web', 'development', 'html']
ai.txt words=350 top=['learning', 'machine', 'data']
cloud.txt words=350 top=['cloud', 'computing', 'services']
s from a file or a database, the cache becomes stale when that data changes. You are responsible for clearing the cache when upstream data changes, either by calling memory.clear(), by passing a version argument to the function, or by using a time-based expiry implemented in the function body.
How do I track progress in a long Parallel job?
Set verbose=10 (the maximum) in Parallel() to print a status line after each completed job, including elapsed time, estimated remaining time, and memory usage. For a progress bar, use the tqdm library: wrap the generator with tqdm(delayed(func)(x) for x in items, total=len(items)) -- Joblib will pull items from the tqdm-wrapped iterator and tqdm updates the bar as items are consumed.
Are there memory issues with Joblib on long-running jobs?
When using the loky backend with large return values, worker memory can accumulate if workers are reused across many batches. Set max_nbytes="10M" in Parallel() to use memory-mapped files for return values above 10 MB instead of pickle serialisation. To prevent worker memory from growing across restarts, set Parallel(n_jobs=4, max_nbytes=None) combined with periodic worker recycling using loky.get_reusable_executor(max_workers=4, reuse="kill_workers").
Conclusion
Joblib makes two of the most common performance problems in data pipelines trivially easy to solve: parallelising embarrassingly parallel loops with Parallel and delayed, and caching expensive deterministic computations with Memory. You have seen how to replace a for loop with a parallel equivalent in four lines, choose the right backend for CPU-bound versus IO-bound work, cache results to disk, and integrate both patterns with scikit-learn.
The natural extension of the feature extraction pipeline is to add a cache validation step that checks file modification timestamps, and to feed the extracted features directly into a scikit-learn pipeline with n_jobs=-1 cross-validation -- so both the feature extraction and the model evaluation run in parallel with full caching.
For the full Joblib reference including memory-mapped arrays, batch processing, and custom backends, see the official Joblib documentation.
Related Articles
Further Reading: For more details, see the Python HTTP client documentation.
Pro Tips for Building a Better Twitter Bot
1. Respect Rate Limits with Exponential Backoff
The Twitter API enforces strict rate limits. Instead of crashing when you hit one, implement exponential backoff to retry gracefully. Wrap your API calls in a retry function that doubles the wait time after each failed attempt, starting from 1 second up to a maximum of 64 seconds. This keeps your bot running reliably without getting your credentials revoked.
# rate_limit_handler.py
import time
import requests
def api_call_with_backoff(url, headers, max_retries=5):
wait_time = 1
for attempt in range(max_retries):
response = requests.get(url, headers=headers)
if response.status_code == 200:
return response.json()
elif response.status_code == 429:
print(f"Rate limited. Waiting {wait_time}s...")
time.sleep(wait_time)
wait_time = min(wait_time * 2, 64)
else:
response.raise_for_status()
raise Exception("Max retries exceeded")
Output:
Rate limited. Waiting 1s...
Rate limited. Waiting 2s...
{'data': [{'id': '1234567890', 'text': 'Hello world'}]}
2. Never Hardcode API Keys
Store your API credentials in environment variables or a .env file, never in your source code. If you accidentally push hardcoded keys to a public GitHub repo, bots will find and abuse them within minutes. Use the python-dotenv library to load credentials from a .env file that you add to your .gitignore.
# secure_credentials.py
import os
from dotenv import load_dotenv
load_dotenv()
BEARER_TOKEN = os.getenv("TWITTER_BEARER_TOKEN")
API_KEY = os.getenv("TWITTER_API_KEY")
API_SECRET = os.getenv("TWITTER_API_SECRET")
if not BEARER_TOKEN:
raise ValueError("TWITTER_BEARER_TOKEN not set in .env file")
3. Add Logging Instead of Print Statements
Replace print() calls with Python’s built-in logging module. Logging gives you timestamps, severity levels, and the ability to write to files — essential for debugging a bot that runs unattended. When your bot tweets something unexpected at 3 AM, logs are the only way to figure out what happened.
# bot_with_logging.py
import logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s",
handlers=[
logging.FileHandler("bot.log"),
logging.StreamHandler()
]
)
logger = logging.getLogger(__name__)
logger.info("Bot started successfully")
logger.warning("Approaching rate limit: 14/15 requests used")
logger.error("Failed to post tweet: 403 Forbidden")
Output:
2026-03-26 10:15:30 [INFO] Bot started successfully
2026-03-26 10:15:31 [WARNING] Approaching rate limit: 14/15 requests used
2026-03-26 10:15:32 [ERROR] Failed to post tweet: 403 Forbidden
4. Track Posted Content to Avoid Duplicates
Bots that post the same content repeatedly get flagged and suspended. Keep a simple record of what you have already tweeted using a JSON file or SQLite database. Before posting, check if the content has been posted before. This is especially important for news bots that might encounter the same story from multiple sources.
5. Use a Scheduler for Consistent Posting
Instead of running your bot in a loop with time.sleep(), use a proper scheduler like schedule or APScheduler. Schedulers handle timing more reliably, support cron-like expressions, and make it easy to run different tasks at different intervals. For production bots, consider using system-level scheduling with cron (Linux) or Task Scheduler (Windows).
Frequently Asked Questions
Can I still build a Twitter bot with the API?
Yes, but access has changed. The free tier of the X (formerly Twitter) API v2 allows basic posting. For reading tweets or higher volume, you need a paid plan. Check current pricing at developer.x.com.
What Python library should I use for the Twitter/X API?
Use tweepy for the most mature Python wrapper with v2 API support. It handles OAuth 2.0 authentication, rate limiting, and provides clean methods for posting, searching, and streaming.
How do I authenticate with the Twitter API v2?
Use OAuth 2.0 Bearer Token for read-only access or OAuth 1.0a for posting. Generate credentials in the X Developer Portal, then pass them to tweepy.Client().
What are the rate limits for the Twitter API?
Rate limits vary by endpoint and plan. The free tier allows 1,500 tweets per month. Always implement rate limit handling with tweepy’s wait_on_rate_limit=True.
What can a Twitter bot do?
Bots can auto-post content, reply to mentions, retweet by keyword, track hashtags, analyze sentiment, and provide automated responses. Always follow the X API terms of service.
Hey,
Thank you so much! I have tried sample codes from other tutorials, including twitter API documentation and none of that really worked. Your code works nice, thank you really.
David
Thanks for the feedback, glad it was helpful.