Last Updated: June 01, 2026
Beginner
Twitter bots can help automate interactions on social media, both to build and grow engagement and to take care of repetitive tasks. There have been many changes to the Twitter developer account, and it is not always clear how to even create a tweet bot. This article walks through, step by step, how to create a Twitter bot with the Twitter API v2 and provides code you can copy and paste into your next project. It ends with a more useful bot that automatically posts articles about Python.
In a nutshell, a Twitter bot is code that runs on your own compute. It can either be triggered by a Twitter webhook, which Twitter calls when a given event occurs (not covered here), or run periodically to read and send tweets (covered in this article). Either way, the basics are the same, and in this article we will walk through how to read tweets and then how to send tweets built from Google News results related to Python.
Python developer and educator with 15+ years building production systems across data engineering, web APIs, and AI tooling. Founder of Python How To Program — 270+ in-depth tutorials covering the modern Python stack.
Step 1: Sign up for Developer program
If you haven’t already, you will need to either sign in or sign up for a Twitter account through twitter.com. Make sure your Twitter account has an email address associated with it (a Twitter account can be created with just a mobile phone number, so the email address may be missing).

Next go to developer.twitter.com and sign up for the developer program (yes, you need to sign up for a second time). This enables you to create applications.

First you’ll need to answer some questions about the purpose of the developer account. You can choose “Make a Bot”.

Next you will need to agree to the terms and conditions, and then a verification email will be sent to your email address from your twitter account.
When you click on the email to verify your account, you can then enter your app name. This is an internal name and something that will make it easy for you to reference.

Once you click on keys, you will then be given a set of security token keys like below. Please copy them in a safe place as your python code will need to use them to access your specific bot. If you do lose your keys, or someone gets access to them for some reason, you can generate new keys from your developer.twitter.com console.
You will be given three credentials:
- API Key (think of this like a username)
- API Key Secret (think of this like a password)
- Bearer Token (used for read queries such as getting latest tweets)

At the bottom of the screen you’ll see a “Skip to Dashboard”, when you click on that you’ll then see the overview of your API metrics.
Within this screen you can see the limits of the number of calls per month for example and how much you have already consumed.

Next, click on the project so that we can generate the access tokens. With the previous keys you can only read tweets; you cannot create them yet.
After clicking on the project, choose the “keys and tokens” tab and at the bottom you can generate the “Access Tokens”. In this screen you can also re-generate the API Keys and Bearer Token you created before in case your keys were compromised or you forgot them.

Just like before, generate the keys and copy them.

By now, you have 5 security tokens:
- API Key – also known as the Consumer Key (think of this like a username)
- API Key Secret – also known as the Consumer Secret (think of this like a password)
- Bearer Token (used for read queries such as getting latest tweets)
- Access Token (‘username’ to allow you to create tweets)
- Access Token Secret (‘password’ to allow you to create tweets)
Step 2: Test your twitter API query
Now that you have the API keys, you can do some tests. If you are using a linux based machine you can use the curl command to do a query. Otherwise, you can use a site such as https://reqbin.com/curl to do an online curl request.
Here’s a simple example to get the most recent tweets. It uses the API https://api.twitter.com/2/tweets/search/recent, which requires a query parameter. The query supports a range of operators (the full list is in the Twitter query documentation).
curl --request GET 'https://api.twitter.com/2/tweets/search/recent?query=from:pythonhowtocode' --header 'Authorization: Bearer <your bearer token from step 1>'
The output is as follows:
{
"data": [{
"id": "1523251860110405633",
"text": "See our latest article on THE complete beginner guide on creating a #discord #bot in #python \n\nEasily add this to your #100DaysOfCode #100daysofcodechallenge #100daysofpython \n\nhttps://t.co/4WKvDVh1g9"
}],
"meta": {
"newest_id": "1523251860110405633",
"oldest_id": "1523251860110405633",
"result_count": 1
}
}
Here’s a much more complex example. This includes the following parameters:
- %23 – the escape sequence for # and searches for hashtags. The example below searches for the hashtag #python (case insensitive)
- %20 – the escape sequence for a space, which separates different filters with an AND operation
- -is:retweet – this excludes retweets. The ‘-‘ sign preceding the is negates the actual filter
- -is:reply – this excludes replies. The ‘-‘ sign preceding the is negates the actual filter
- max_results=20 – an integer that defines the maximum number of returned results, in this case 20
- expansions=author_id – this makes sure to include the internal Twitter id of the user and also the actual username under an includes section at the bottom of the returned JSON
- tweet.fields=public_metrics,created_at – returns the interaction metrics such as number of likes, number of retweets, etc. as well as the time (in GMT timezone) when the tweet was created
- user.fields=created_at,location – this returns when the user account was created and the user's self-reported location in their profile
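The same parameters can also be assembled in Python rather than escaped by hand. The following is a minimal sketch using only the standard library; the helper name build_search_url is hypothetical, and it is written to reproduce the URL used in the curl command that follows.

```python
from urllib.parse import urlencode, quote

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_search_url(hashtag, max_results=20):
    """Hypothetical helper: build the recent-search URL for one hashtag."""
    # Query operators are separated by spaces, and a space means AND.
    query = f"#{hashtag} -is:retweet -is:reply"
    params = {
        "query": query,
        "max_results": max_results,
        "expansions": "author_id",
        "tweet.fields": "public_metrics,created_at",
        "user.fields": "created_at,location",
    }
    # quote_via=quote encodes a space as %20 (the default would emit '+').
    # safe=":," leaves ':' and ',' unescaped, matching the curl command.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote, safe=":,")

print(build_search_url("python"))
```

Letting urlencode do the escaping avoids having to remember that # becomes %23 and a space becomes %20.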
curl --request GET 'https://api.twitter.com/2/tweets/search/recent?query=%23python%20-is:retweet%20-is:reply&max_results=20&expansions=author_id&tweet.fields=public_metrics,created_at&user.fields=created_at,location' --header 'Authorization: Bearer <Your Bearer Token from Step 1>'
The result of this request looks like the following. Notice that the user details are in the includes section at the bottom, and you can link each tweet to its user through the author_id field.
{
"data": [{
"id": "1523688996676812800",
"text": "NEED a #JOB?\nSign up now https://t.co/o7lVlsl75X\nFREE. NO MIDDLEMEN\n#Jobs #AI #DataAnalytics #MachineLearning #Python #JavaScript #WomenWhoCode #Programming #Coding #100DaysofCode #DEVCommunity #gamedev #gamedevelopment #indiedev #IndieGameDev #Mobile #gamers #RHOP #BTC #ETH #SOL https://t.co/kMYD2417jR",
"author_id": "1332714745871421443",
"public_metrics": {
"retweet_count": 3,
"reply_count": 0,
"like_count": 0,
"quote_count": 0
},
"created_at": "2022-05-09T15:39:00.000Z"
},
....
}],
"includes": {
"users": [{
"name": "Job Preference",
"id": "1332714745871421443",
"username": "JobPreference",
"created_at": "2020-11-28T15:56:01.000Z"
},
....
}
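Linking each tweet to its author through author_id can be sketched as follows. This is an illustration only: attach_usernames is a hypothetical helper, and the sample payload is a trimmed copy of the response shown above with the tweet text shortened.

```python
def attach_usernames(payload):
    """Hypothetical helper: pair each tweet with the username of its author."""
    # Index the users from the 'includes' section by their id.
    users = {u["id"]: u for u in payload.get("includes", {}).get("users", [])}
    rows = []
    for tweet in payload.get("data", []):
        author = users.get(tweet.get("author_id"), {})
        rows.append({
            "id": tweet["id"],
            "username": author.get("username"),  # None if the user is missing
            "text": tweet["text"],
        })
    return rows

# Trimmed sample based on the response above.
sample = {
    "data": [{
        "id": "1523688996676812800",
        "text": "NEED a #JOB?",
        "author_id": "1332714745871421443",
    }],
    "includes": {
        "users": [{
            "name": "Job Preference",
            "id": "1332714745871421443",
            "username": "JobPreference",
        }]
    },
}

for row in attach_usernames(sample):
    print(f"@{row['username']}: {row['text']}")
```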
Step 3: Reading tweets with python code
Building on the tests conducted in Step 2, it is a small extra step to convert the request into Python code. We’ll first show how to do it with the requests module, and afterwards show a simpler way with the tweepy library. Here’s a structured version of the code where the logic is encapsulated in a class.
import requests, json
from urllib.parse import quote
from pprint import pprint
class TwitterBot():
    URL_SEARCH_RECENT = 'https://api.twitter.com/2/tweets/search/recent'
    def __init__(self, bearer_key):
        self.bearer_key = bearer_key
    def search_recent(self, query, include_retweets=False, include_replies=False):
        # Escape special characters such as '#' and spaces in the query
        url = self.URL_SEARCH_RECENT + "?query=" + quote(query)
        if not include_retweets: url += quote(' ')+'-is:retweet'
        if not include_replies: url += quote(' ')+'-is:reply'
        url += '&max_results=20&expansions=author_id&tweet.fields=public_metrics,created_at&user.fields=created_at,location'
        headers = {'Authorization': 'Bearer ' + self.bearer_key }
        r = requests.get(url, headers = headers)
        r.encoding = r.apparent_encoding  #Ensure to use UTF-8 if unicode characters
        return json.loads(r.text)
#create an instance and pass in your Bearer Token
t = TwitterBot('<Insert your Bearer Token from Step 1>')
pprint( t.search_recent( '#python') )
The above code is fairly straightforward and does the following:
- TwitterBot class – this class encapsulates the logic to send the API requests
- TwitterBot.search_recent – this method takes in the query string, escapes any special characters, then calls requests.get() to call the https://api.twitter.com/2/tweets/search/recent API
- pprint() – this simply prints the output in a more readable format
This is the output:


However, there is a simpler way which is to use tweepy.
pip install tweepy
Next you can use the tweepy module to search recent tweets:
import tweepy
client = tweepy.Client(bearer_token='<insert your token here from previous step>')
query = '#python -is:retweet -is:reply' #exclude retweets and replies with '-'
tweets = client.search_recent_tweets( query=query,
    tweet_fields=['public_metrics', 'context_annotations', 'created_at'],
    user_fields=['username','created_at','location'],
    expansions=['entities.mentions.username','author_id'],
    max_results=10)
#The details of the users are in the 'includes' list
user_data = {}
for raw_user in tweets.includes['users']:
    user_data[ raw_user.id ] = raw_user
for index, tweet in enumerate(tweets.data):
    print(f"[{index}]::@{user_data[tweet.author_id]['username']}::{tweet.created_at}::{tweet.text.strip()}\n")
    print("------------------------------------------------------------------------------")
Output as follows:

Please note that after calling the API a few times, your number of tweets consumed will have increased and you may have hit the limit. You can always visit the dashboard at https://developer.twitter.com/en/portal/dashboard to see how much has been consumed. Note that the quota counts the number of tweets returned rather than the number of API calls, so it can get consumed pretty quickly.

Step 4: Sending out a tweet
So far we’ve only been reading tweets. In order to send a tweet you can use the create_tweet() function of tweepy.
client = tweepy.Client( consumer_key= "<API key from above - see step 1>",
consumer_secret= "<API Key secret - see step 1>",
access_token= "<Access Token - see step 1>",
access_token_secret= "<Access Token Secret - see step 1>")
# Replace the text with whatever you want to Tweet about
response = client.create_tweet(text='A little girl walks into a pet shop and asks for a bunny. The worker says, "The fluffy white one or the fluffy brown one?" The girl then says, "I don’t think my python really cares."')
print(response)
Output from Console:

Output from Twitter:

How to Send Automated Tweets About the Latest News
To make this a more useful bot rather than one that simply tweets static text, we’ll make it tweet about the latest news on Python.
To search for news, you can use the Python library pygooglenews.
pip install pygooglenews
The library searches the Google News RSS feed and was developed by Artem Bugara. You can read his full article on how he developed the Google News library. You pass in a keyword and a time horizon. Here’s an example that finds the latest Python articles from the last 12 hours.
from pygooglenews import GoogleNews
gn = GoogleNews()
search = gn.search('python programming', when = '12h')
for article in search['entries']:
    print(article.title)
    print(article.published)
    print(article.source.title)
    print('-'*80) #string multiplier - show '-' 80 times
Here’s the output:
The idea is to have the bot tweet a random article related to Python programming. The gn.search() function returns a dictionary whose entries item holds the list of articles. We will simply pick a random one and construct the tweet from the article title and the link to the article.
import tweepy
from pygooglenews import GoogleNews
from random import randint
client = tweepy.Client( consumer_key= "<your consumer/API key - see step 1>",
consumer_secret= "<your consumer/API secret - see step 1>",
access_token= "<your access token key - see step 1>",
access_token_secret= "<your access token secret - see step 1>")
gn = GoogleNews()
search = gn.search('python programming', when = '24h')
#Find a random article from the last 24 hours using randint between index 0 and the last index
article = search['entries'][ randint( 0, len( search['entries'])-1 ) ]
#construct the tweet text
tweet_text = f"In python news: {article.title}. See full article: {article.link}. #python #pythonprogramming"
#Fire off the tweet! create_tweet() only accepts keyword arguments
response = client.create_tweet( text=tweet_text )
print(response)
Output from the console on the return result:

And, most importantly, here’s the tweet from our @pythonhowtocode! Twitter automatically pulled the article image

This has currently been scheduled as a daily background job!
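Because article titles vary in length, an unattended daily job can eventually build a tweet that is too long. The sketch below trims the title so the text fits. It rests on two assumptions that are not stated in this article: a 280-character limit for a standard tweet, and links counting as a fixed 23 characters after shortening. The helper name build_tweet is hypothetical.

```python
TWEET_LIMIT = 280   # assumption: limit for a standard tweet
URL_LENGTH = 23     # assumption: every link counts as a fixed length

def build_tweet(title, link, hashtags="#python #pythonprogramming"):
    """Hypothetical helper: build the tweet text, trimming the title if needed."""
    prefix = "In python news: "
    middle = ". See full article: "
    # Characters used by everything except the title (the link counts as URL_LENGTH).
    fixed = len(prefix) + len(middle) + URL_LENGTH + 1 + len(hashtags)
    room = TWEET_LIMIT - fixed
    if len(title) > room:
        # Leave space for the trailing ellipsis.
        title = title[: room - 3].rstrip() + "..."
    return f"{prefix}{title}{middle}{link} {hashtags}"

print(build_tweet("A short headline", "https://example.com/article"))
```

The result can be passed to client.create_tweet(text=...) in place of the f-string used above.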
How To Use Python memray for Memory Profiling
Intermediate
Your Python service passes all tests and runs fine in development — then hits production and balloons to 4 GB of RAM. You restart it, it climbs again, and now you have a memory leak you cannot locate. You add a few print(sys.getsizeof(...)) calls, but they only measure individual objects, not the full allocation tree. You try the standard tracemalloc module and get a list of the top 10 allocations with no call-stack context. The problem could be anywhere across 50 modules and 300 functions.
memray is a memory profiler for Python developed by Bloomberg Engineering. It instruments every memory allocation and deallocation in your program — including C extensions and native code — and records a full call stack for each one. After a run you get a flame graph showing exactly which call path is responsible for each byte: not just which object, but which function called which function that eventually caused the allocation. It supports command-line profiling, pytest integration, and a live tracking mode so you can watch allocations happen in real time. Install it with pip install memray.
In this article we will cover how memray works and how it differs from tracemalloc, how to profile a script from the command line, how to read the flame graph and table reports, how to track live allocations, how to use the pytest-memray plugin to add memory limits to tests, how to profile specific code blocks with the Python API, and how to interpret results to find and fix real leaks. By the end you will have a complete toolkit for diagnosing memory problems in any Python application.
memray Quick Example: Finding a Memory Hog in 30 Seconds
Here is the minimum setup to profile a script and open the flame graph. First, create a script that has an obvious memory problem:
# leaky_script.py
def load_big_list():
    return [i * 2 for i in range(5_000_000)]
def process(data):
    # Creates a second copy -- doubles memory
    return [str(x) for x in data]
if __name__ == "__main__":
    nums = load_big_list()
    strs = process(nums)
    print(f"Processed {len(strs)} items")
Run it under memray from the terminal:
# run_memray.sh
python -m memray run leaky_script.py
Writing profile results into memray-leaky_script.py.1234.bin
[memray] Successfully generated profile results.
Run id: 1234
Command line: leaky_script.py
Start time: 2026-06-11 09:00:00.000
End time: 2026-06-11 09:00:02.345
Duration: 2.345 seconds
Total allocations: 5,000,132
Total memory allocated: 312.4 MB
Peak memory usage: 271.2 MB
Then generate the flame graph report and open it in a browser:
# generate_flamegraph.sh
python -m memray flamegraph memray-leaky_script.py.1234.bin
Wrote flamegraph to memray-flamegraph-leaky_script.py.1234.html
The flame graph shows two large bars: load_big_list responsible for ~150 MB (the integer list) and process responsible for ~162 MB (the string list). The call path is clear — both functions allocate heavily and neither releases until the script exits. The sections below cover the full memray toolkit.
What Is memray and How Does It Work?
memray is a deterministic memory profiler. Unlike sampling profilers that check memory usage at intervals, memray intercepts every call to the Python allocator (and to malloc/free in C extensions) and records the exact call stack at the moment of each allocation. This gives complete, lossless data rather than a statistical sample.
Python already ships with tracemalloc, which also tracks allocations. The key differences are scope and output. tracemalloc only tracks Python-level allocations and presents flat top-N lists. memray tracks both Python and native (C) allocations, records the full call chain, and produces interactive flame graphs, timeline views, and summary tables that show not just what allocated memory but the entire path through your code that led to the allocation.
| Feature | memray | tracemalloc | memory_profiler |
|---|---|---|---|
| Tracks C extension allocations | Yes | No | No |
| Full call stack per allocation | Yes | Partial | No |
| Flame graph output | Yes (HTML) | No | No |
| Live tracking mode | Yes | No | No |
| pytest integration | Yes (pytest-memray) | No | No |
| Performance overhead | Moderate (2-5x) | Low | High (line-by-line) |
| Platform support | Linux, macOS | All | All |
memray hooks the allocator at the C level using platform-specific mechanisms on Linux and macOS. This is why it works on Linux and macOS but not Windows. It writes a compact binary trace file that you then convert to reports using the memray CLI.
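For contrast, here is roughly what the tracemalloc side of the comparison looks like. This is a minimal sketch using only the standard library, with a smaller list than the quick example so it runs quickly; it prints the flat, per-line ranking described above.

```python
import tracemalloc

def load_big_list(n=200_000):
    return [i * 2 for i in range(n)]

tracemalloc.start()          # begin tracking Python-level allocations
nums = load_big_list()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group live allocations by source line and show the three largest.
top = snapshot.statistics("lineno")[:3]
for stat in top:
    print(stat)
```

Each entry names a file and line with a size and count, which is useful for a first look but does not show the chain of callers that a flame graph does.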

Installing memray
Install memray with pip. It requires Python 3.7+ and works on Linux and macOS (not Windows):
# install_memray.sh
pip install memray
Successfully installed memray-1.13.0
Verify the installation:
# verify_memray.sh
python -m memray --version
memray, version 1.13.0
To use the pytest integration, also install the plugin:
# install_pytest_memray.sh
pip install pytest-memray
Successfully installed pytest-memray-1.6.0
Command-Line Profiling
The simplest way to profile a script is the memray run subcommand. It runs your script and writes a binary trace to a .bin file in the current directory:
# profile_script.sh
python -m memray run my_script.py
Writing profile results into memray-my_script.py.9821.bin
You can also profile a module with -m, exactly like running Python normally:
# profile_module.sh
python -m memray run -m pytest tests/
Writing profile results into memray-run-pytest.9955.bin
Key flags for memray run:
# memray_run_flags.sh
# Custom output file
python -m memray run -o my_profile.bin leaky_script.py
# Also record native (C/C++) stack frames
python -m memray run --native leaky_script.py
# Compress the output file
python -m memray run --compress-on-exit leaky_script.py
# Also track allocations in child processes created by fork
python -m memray run --follow-fork leaky_script.py
The --native flag adds native (C/C++) stack frames to the trace. Use it when you suspect NumPy, Pandas, or other native extensions are the source of a leak: without it, memray attributes those allocations to the Python line that called into the extension rather than to the C function that made them.

Reading the Reports: Flame Graph and Stats Table
memray produces several report types from the same .bin trace file. The flame graph is the most useful for finding the source of large allocations:
# generate_reports.sh
# Flame graph (HTML -- open in browser)
python -m memray flamegraph memray-leaky_script.py.1234.bin
# Output: memray-flamegraph-leaky_script.py.1234.html
# Summary table (terminal output)
python -m memray stats memray-leaky_script.py.1234.bin
# Tree view (allocations as a call tree)
python -m memray tree memray-leaky_script.py.1234.bin
The stats command gives a quick summary directly in the terminal without opening a browser:
# stats_output.txt
---- Top 10 allocations by size ----
1) size=162.4 MB, allocated in process (leaky_script.py:7)
-> process (leaky_script.py:8)
2) size=150.1 MB, allocated in load_big_list (leaky_script.py:2)
-> __main__ (leaky_script.py:11)
3) size=1.2 MB, allocated in _bootstrap (importlib._bootstrap:1)
-> ...
Total allocations: 5,000,132
Total memory allocated: 312.4 MB
Peak memory: 271.2 MB
In the flame graph HTML, each box represents a function. The width of the box is proportional to the amount of memory allocated by that function and all its callees. Click a box to zoom in. The call path reads top-to-bottom: the widest box at the bottom is usually your entry point (__main__), and the widest box at the top is the function doing the most allocation. Use the “Show only allocations” toggle to filter out functions that only pass memory through without allocating.
Live Tracking Mode
Instead of profiling a full run and analyzing afterward, live mode streams allocations to a terminal UI in real time. This is especially useful for long-running servers or scripts where you want to watch memory grow and correlate it with specific operations:
# live_tracking.sh
python -m memray run --live leaky_script.py
Allocation Location Size Count
---------------------------------------------------------------------------
leaky_script.py:2 148.2 MB 5,000,000
leaky_script.py:7 122.1 MB 3,892,451
list_to_str leaky_script.py:8 8.4 MB 108,441
...
Peak memory: 271.2 MB Current: 249.8 MB [q]uit [r]eset
The live view updates every 0.1 seconds. Press q to stop the run early. The --live-remote flag, combined with --live-port, lets you connect a second terminal to the same live stream, which is useful for profiling a server process without interrupting it:
# live_remote.sh
# In terminal 1 -- start the server with live tracking
python -m memray run --live-remote --live-port 5001 server.py
# In terminal 2 -- attach the viewer
python -m memray live 5001

Using the Python API for Targeted Profiling
If you only want to profile a specific section of a larger application — not the whole program — use memray’s Python context manager. This avoids noise from startup, shutdown, and unrelated code paths:
# targeted_profiling.py
import memray
def build_index(documents):
    """Build an inverted index from a list of documents."""
    index = {}
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            if word not in index:
                index[word] = []
            index[word].append(doc_id)
    return index
def search(index, query):
    """Return document IDs matching all query terms."""
    terms = query.lower().split()
    results = set(index.get(terms[0], []))
    for term in terms[1:]:
        results &= set(index.get(term, []))
    return list(results)
# Sample data
docs = [
    "Python memory profiling with memray",
    "How to find memory leaks in Python",
    "memray flame graph tutorial",
] * 10_000
# Profile only the index build -- not the search or data setup
with memray.Tracker("index_build.bin"):
    index = build_index(docs)
# Analysis later:
# python -m memray flamegraph index_build.bin
print(f"Index built: {len(index)} unique terms")
print(f"Search results: {search(index, 'python memory')}")
Index built: 13 unique terms
Search results: [0, 1, 3, 4, ...]
The memray.Tracker context manager starts recording on entry and writes the .bin file on exit. You can also pass native_traces=True to record native stack frames: memray.Tracker("profile.bin", native_traces=True). Use the targeted profiling approach in services where you cannot afford to instrument the entire process, and wrap only the suspicious function or request handler.
Testing Memory Usage with pytest-memray
pytest-memray integrates memray into your test suite. Run your tests with memory profiling and optionally enforce per-test memory limits:
# test_memory_limits.py
import pytest
def build_report(n_rows):
    """Build a report dict with n_rows entries."""
    return {f"row_{i}": {"value": i, "label": f"Item {i}"} for i in range(n_rows)}
# This test will fail if it allocates more than 50 MB
@pytest.mark.limit_memory("50 MB")
def test_small_report_memory():
    report = build_report(100_000)
    assert len(report) == 100_000
# This test passes -- no limit set, just profiled
def test_large_report_memory():
    report = build_report(1_000_000)
    assert len(report) == 1_000_000
Run with the --memray flag to enable profiling:
# run_memray_tests.sh
pytest tests/test_memory_limits.py --memray
FAILED test_memory_limits.py::test_small_report_memory - Failed: Test was
limited to 50.0MB but allocated 89.3MB
PASSED test_memory_limits.py::test_large_report_memory
- Total memory allocated: 421.7MB
========== 1 failed, 1 passed in 2.34s ==========
The @pytest.mark.limit_memory("50 MB") decorator sets a hard ceiling. If the test allocates more than the limit, it fails with a clear message showing the actual allocation. Add this marker to any function that processes large data structures in a tight loop — it turns memory regressions into CI failures instead of production surprises. You can also pass --memray-bin-path=./profiles/ to save trace files from all tests for post-run analysis.

Identifying and Fixing Memory Leaks
A genuine memory leak in Python is usually one of three things: a growing container that is never cleared, a reference cycle that the garbage collector cannot break (often involving __del__ methods), or a native extension that leaks memory at the C level. memray’s flame graph makes all three visible.
Here is a realistic example of a container-based leak and how memray exposes it:
# cache_leak.py
import memray
# Global cache that is never evicted
_query_cache = {}
def expensive_query(key):
    """Simulates a database query with a result cache."""
    if key not in _query_cache:
        # Caches a 10KB result for every unique key -- forever
        _query_cache[key] = b"x" * 10_240
    return _query_cache[key]
def handle_requests(n):
    """Simulates n incoming requests with unique keys."""
    for i in range(n):
        result = expensive_query(f"user:{i}:profile")
    return len(_query_cache)
with memray.Tracker("cache_leak.bin"):
    total = handle_requests(5_000)
print(f"Cache size after run: {total} entries")
# python -m memray flamegraph cache_leak.bin
# Flame graph will show _query_cache holding ~50 MB with no deallocation path
Cache size after run: 5000 entries
When you open the flame graph, expensive_query will show a wide bar for the line that builds the cached value and stores it in _query_cache. Since there is no eviction, all 50 MB stays live until the process exits. The fix is to bound the cache with functools.lru_cache or cachetools.LRUCache. After fixing, run the same profile and verify that the peak memory drops.
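A bounded version of the cache might look like the following sketch. It swaps the global dict for functools.lru_cache. The maxsize of 1,000 is an arbitrary value chosen for illustration, and the memray.Tracker wrapper is left out so the snippet runs without memray installed.

```python
from functools import lru_cache

@lru_cache(maxsize=1_000)  # illustrative bound: keep at most 1,000 results
def expensive_query(key):
    """Simulates a database query; results are cached up to maxsize."""
    return b"x" * 10_240

def handle_requests(n):
    """Simulates n incoming requests with unique keys."""
    for i in range(n):
        expensive_query(f"user:{i}:profile")
    # cache_info() reports how many entries are currently held.
    return expensive_query.cache_info().currsize

cached = handle_requests(5_000)
print(f"Cache size after run: {cached} entries")
```

With the bound in place the cache holds about 10 MB at most (1,000 entries of 10 KB) instead of growing with every unique key.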
Real-Life Example: Profiling a Data Processing Pipeline
Here is a realistic data pipeline that reads a large dataset and produces an aggregate report. We will use memray to identify which stage uses the most memory and then refactor to reduce the peak.
# data_pipeline.py
import memray
import csv
import io
import random
# --- Generate sample CSV data in memory ---
def make_sample_csv(n_rows=500_000):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["user_id", "product_id", "amount", "category"])
    categories = ["electronics", "clothing", "food", "books", "sports"]
    for i in range(n_rows):
        writer.writerow([
            f"user_{i % 10_000}",
            f"prod_{random.randint(1, 1000)}",
            round(random.uniform(1, 500), 2),
            random.choice(categories),
        ])
    buf.seek(0)
    return buf
# --- Stage 1: Load everything into memory (naive approach) ---
def load_all_rows(csv_buf):
    reader = csv.DictReader(csv_buf)
    return list(reader) # entire dataset in a list of dicts
# --- Stage 2: Aggregate totals by category ---
def aggregate_by_category(rows):
    totals = {}
    for row in rows:
        cat = row["category"]
        amt = float(row["amount"])
        if cat not in totals:
            totals[cat] = {"count": 0, "total": 0.0}
        totals[cat]["count"] += 1
        totals[cat]["total"] += amt
    return totals
with memray.Tracker("pipeline.bin"):
    csv_data = make_sample_csv()
    rows = load_all_rows(csv_data) # Stage 1
    report = aggregate_by_category(rows) # Stage 2
for cat, stats in sorted(report.items()):
    avg = stats["total"] / stats["count"]
    print(f"{cat:12s}: {stats['count']:6d} orders, avg ${avg:.2f}")
books : 99812 orders, avg $250.33
clothing : 100203 orders, avg $249.88
electronics : 99876 orders, avg $250.14
food : 100442 orders, avg $249.71
sports : 99667 orders, avg $249.92
Running python -m memray flamegraph pipeline.bin will show load_all_rows responsible for the majority of peak memory — all 500,000 rows are held in a list of dicts at the same time. The fix is to stream the CSV row-by-row instead of loading it all at once. Replace load_all_rows with a streaming aggregator and the peak memory drops from ~200 MB to ~2 MB, because only one row is ever in memory at a time. This is the memray workflow in practice: profile, identify the stage, refactor, re-profile to confirm the improvement.
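The streaming aggregator could be sketched like this. The function name aggregate_streaming is hypothetical, and the three-row sample stands in for the generated CSV so the snippet runs on its own; the Tracker wrapper is left out for the same reason.

```python
import csv
import io

def aggregate_streaming(csv_buf):
    """Aggregate totals by category while reading one row at a time."""
    totals = {}
    # DictReader yields rows lazily, so no list of all rows is ever built.
    for row in csv.DictReader(csv_buf):
        bucket = totals.setdefault(row["category"], {"count": 0, "total": 0.0})
        bucket["count"] += 1
        bucket["total"] += float(row["amount"])
    return totals

# Tiny stand-in for the output of make_sample_csv()
sample = io.StringIO(
    "user_id,product_id,amount,category\n"
    "user_1,prod_1,10.0,books\n"
    "user_2,prod_2,30.0,books\n"
    "user_3,prod_3,5.5,food\n"
)
report = aggregate_streaming(sample)
for cat, stats in sorted(report.items()):
    print(f"{cat}: {stats['count']} orders, total {stats['total']:.2f}")
```

In the pipeline, this one function replaces both load_all_rows and aggregate_by_category, so the rows list never exists.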

Frequently Asked Questions
Does memray work on Windows?
No. memray hooks the allocator using mechanisms specific to Linux and macOS, and it does not support Windows. On Windows, consider using tracemalloc for Python allocations. If you develop on Windows and deploy to Linux, you can run memray via WSL2 or in a Docker container for profiling purposes while keeping your main development on Windows.
How much does memray slow down my code?
Expect 2x to 5x slowdown in programs that allocate heavily. Programs that allocate infrequently (mostly numeric computation on pre-allocated arrays) may see only 10-20% overhead. memray is not designed for production use — run it in a staging or development environment. If you need in-production memory monitoring, use a metrics approach (periodic psutil.Process().memory_info().rss readings) rather than a deterministic profiler.
Does memray track garbage-collected objects?
memray tracks allocations and deallocations at the allocator level, which includes objects collected by Python’s cyclic garbage collector. When gc.collect() frees a cycle, memray records those deallocations. You can see “temporary” allocations (objects allocated and freed within the profiled window) by using the --temporary-allocations flag with the flame graph command. This is useful for diagnosing churn: code that creates and throws away millions of short-lived objects, driving CPU time in the allocator even if peak memory looks normal.
Does memray work with async code and FastAPI/aiohttp?
Yes. Since memray hooks the allocator at the C level, it is transparent to Python’s async machinery. Wrap your ASGI/WSGI app with memray.Tracker for a fixed profiling window, or use memray run --live to watch allocations as requests come in. For per-request profiling in FastAPI, add a middleware that starts a Tracker context at request start and stops it at response end, writing one trace file per request to a temp directory.
Why does memray show less memory than the operating system's process monitor for my NumPy script?
The resident memory reported by the operating system covers the whole process: the interpreter itself, loaded shared libraries, and memory the allocator has reserved but not yet handed out. None of that appears as an allocation in a memray report, so the two numbers rarely match. memray does record the allocations NumPy makes for array data; add the --native flag (python -m memray run --native script.py) to see which C-level function made them instead of only the Python line that called into NumPy.
Can I use memray inside Docker?
Yes. Profiling with memray run or memray.Tracker happens inside the profiled process and generally needs no extra privileges. Attaching to an already-running process relies on a debugger, which typically requires the SYS_PTRACE capability: add --cap-add SYS_PTRACE to your docker run command or add cap_add: [SYS_PTRACE] to your docker-compose.yml service. For Kubernetes deployments, add capabilities.add: ["SYS_PTRACE"] to the container’s securityContext.
Conclusion
memray turns memory debugging from a guessing game into a structured investigation. Run python -m memray run script.py to capture the full allocation trace, generate a flame graph with python -m memray flamegraph *.bin, and follow the widest call paths down to the function doing the actual allocating. The Python API’s memray.Tracker context manager lets you surgically profile one subsystem without the noise of a full run, and pytest-memray prevents memory regressions from reaching production by turning allocation spikes into CI failures.
The real-life pipeline example shows the workflow end to end: profile, read the flame graph, refactor the offending stage, re-profile to confirm the improvement. Try extending it by adding a streaming version of load_all_rows using a generator, re-running the profile, and comparing the two flame graphs side by side. The official documentation at bloomberg.github.io/memray covers advanced topics including attaching to running processes, the timeline view, and custom reporters.
Further Reading: For more details, see the Python HTTP client documentation.
Pro Tips for Building a Better Twitter Bot
1. Respect Rate Limits with Exponential Backoff
The Twitter API enforces strict rate limits. Instead of crashing when you hit one, implement exponential backoff to retry gracefully. Wrap your API calls in a retry function that doubles the wait time after each failed attempt, starting from 1 second up to a maximum of 64 seconds. This keeps your bot running reliably without getting your credentials revoked.
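# rate_limit_handler.py
import time
import requests

def api_call_with_backoff(url, headers, max_retries=5):
    """GET a URL, retrying with exponential backoff on HTTP 429."""
    wait_time = 1
    for attempt in range(max_retries):
        # A timeout keeps an unattended bot from hanging on a stalled connection
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
            wait_time = min(wait_time * 2, 64)  # double the delay, capped at 64s
        else:
            # Any other error status is raised immediately instead of retried
            response.raise_for_status()
    raise Exception("Max retries exceeded")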
Output:
Rate limited. Waiting 1s...
Rate limited. Waiting 2s...
{'data': [{'id': '1234567890', 'text': 'Hello world'}]}
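Rate-limit windows on the X API are typically longer than a few seconds, so a fixed backoff can burn through its retries before the window resets. Rate-limited responses include an x-rate-limit-reset header holding the Unix time at which the window resets. The sketch below shows one way to turn that header into a wait time; the helper name seconds_until_reset, the 60-second fallback, and the one-second slack are illustrative assumptions, not part of the original handler.

```python
import time


def seconds_until_reset(headers, now=None, fallback=60.0):
    """Return how many seconds to wait before retrying a rate-limited call.

    `headers` is any mapping of response headers. A plain dict needs the
    lowercase key used here; requests' response.headers is case-insensitive.
    """
    now = time.time() if now is None else now
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        # Header missing: fall back to a fixed wait (assumed default).
        return fallback
    try:
        reset_at = float(reset)
    except ValueError:
        # Header present but not a number: use the same fallback.
        return fallback
    # Never return a negative wait; add 1s of slack for clock skew.
    return max(reset_at - now, 0.0) + 1.0
```

Inside api_call_with_backoff, the 429 branch could call time.sleep(seconds_until_reset(response.headers)) instead of sleeping for wait_time.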
2. Never Hardcode API Keys
Store your API credentials in environment variables or a .env file, never in your source code. If you accidentally push hardcoded keys to a public GitHub repo, bots will find and abuse them within minutes. Use the python-dotenv library to load credentials from a .env file that you add to your .gitignore.
# secure_credentials.py
import os
from dotenv import load_dotenv

load_dotenv()
BEARER_TOKEN = os.getenv("TWITTER_BEARER_TOKEN")
API_KEY = os.getenv("TWITTER_API_KEY")
API_SECRET = os.getenv("TWITTER_API_SECRET")
if not BEARER_TOKEN:
    raise ValueError("TWITTER_BEARER_TOKEN not set in .env file")
3. Add Logging Instead of Print Statements
Replace print() calls with Python’s built-in logging module. Logging gives you timestamps, severity levels, and the ability to write to files — essential for debugging a bot that runs unattended. When your bot tweets something unexpected at 3 AM, logs are the only way to figure out what happened.
# bot_with_logging.py
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler("bot.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)
logger.info("Bot started successfully")
logger.warning("Approaching rate limit: 14/15 requests used")
logger.error("Failed to post tweet: 403 Forbidden")
Output:
2026-03-26 10:15:30 [INFO] Bot started successfully
2026-03-26 10:15:31 [WARNING] Approaching rate limit: 14/15 requests used
2026-03-26 10:15:32 [ERROR] Failed to post tweet: 403 Forbidden
4. Track Posted Content to Avoid Duplicates
Bots that post the same content repeatedly get flagged and suspended. Keep a simple record of what you have already tweeted using a JSON file or SQLite database. Before posting, check if the content has been posted before. This is especially important for news bots that might encounter the same story from multiple sources.
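As a rough illustration of the JSON-file approach, the sketch below stores a hash of each posted text and checks new candidates against it. The class name PostedLog, the file layout, and the normalization rule (lowercase, collapsed whitespace) are assumptions made for this example; a real news bot might key on the article URL instead.

```python
import hashlib
import json
from pathlib import Path


class PostedLog:
    """Remembers which texts have been posted, persisted as a JSON list."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            # Load fingerprints saved by an earlier run.
            self.seen = set(json.loads(self.path.read_text(encoding="utf-8")))
        else:
            self.seen = set()

    @staticmethod
    def fingerprint(text):
        # Normalize case and whitespace so trivially different copies match.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def already_posted(self, text):
        return self.fingerprint(text) in self.seen

    def mark_posted(self, text):
        # Record the fingerprint and rewrite the file so it survives restarts.
        self.seen.add(self.fingerprint(text))
        self.path.write_text(json.dumps(sorted(self.seen)), encoding="utf-8")
```

A bot would call already_posted(headline) before tweeting and mark_posted(headline) only after the API call succeeds, so a failed post can be retried later.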
5. Use a Scheduler for Consistent Posting
Instead of running your bot in a loop with time.sleep(), use a proper scheduler like schedule or APScheduler. Schedulers handle timing more reliably and make it easy to run different tasks at different intervals; APScheduler also supports cron-style triggers. For production bots, consider system-level scheduling with cron (Linux) or Task Scheduler (Windows).
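To show the idea without installing anything, here is a minimal sketch that uses the standard library's sched module as a stand-in for schedule or APScheduler. It is not how either of those libraries is called, and the function name run_every and its parameters are made up for this example. The point it demonstrates is scheduling against absolute times, so a slow job does not push every later run back the way a sleep loop does.

```python
import sched
import time


def run_every(interval_seconds, job, repeats,
              timefunc=time.monotonic, delayfunc=time.sleep):
    """Run job() `repeats` times, spaced `interval_seconds` apart."""
    scheduler = sched.scheduler(timefunc, delayfunc)
    start = timefunc()
    for i in range(repeats):
        # Absolute run times avoid the drift that builds up when each
        # iteration sleeps for a fixed interval after the job finishes.
        scheduler.enterabs(start + i * interval_seconds, 1, job)
    # Blocks until every queued run has executed.
    scheduler.run()
```

For example, run_every(3600, post_news, repeats=24) would call a hypothetical post_news function once an hour for a day. The timefunc and delayfunc parameters exist only so the timing can be exercised with a fake clock.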
Frequently Asked Questions
Can I still build a Twitter bot with the API?
Yes, but access has changed. The free tier of the X (formerly Twitter) API v2 allows basic posting. For reading tweets or higher volume, you need a paid plan. Check current pricing at developer.x.com.
What Python library should I use for the Twitter/X API?
Use tweepy for the most mature Python wrapper with v2 API support. It handles OAuth 2.0 authentication, rate limiting, and provides clean methods for posting, searching, and streaming.
How do I authenticate with the Twitter API v2?
Use OAuth 2.0 Bearer Token for read-only access or OAuth 1.0a for posting. Generate credentials in the X Developer Portal, then pass them to tweepy.Client().
What are the rate limits for the Twitter API?
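Rate limits vary by endpoint and plan. The free tier has allowed about 1,500 posts per month, but quotas change, so confirm the current figure in the X Developer Portal. If you use tweepy, pass wait_on_rate_limit=True to tweepy.Client so it sleeps until the limit resets instead of raising an error.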
What can a Twitter bot do?
Bots can auto-post content, reply to mentions, retweet by keyword, track hashtags, analyze sentiment, and provide automated responses. Always follow the X API terms of service.
Related Articles
- How To Build a Discord Bot with Python
- How To Handle API Rate Limits in Python
- How To Use Python Requests for REST APIs
Hey,
Thank you so much! I have tried sample codes from other tutorials, including twitter API documentation and none of that really worked. Your code works nice, thank you really.
David
Thanks for the feedback, glad it was helpful.