Beginner
Introduction: Unlocking AI with Python
The OpenAI API brings powerful language models directly into your Python applications. Whether you're building chatbots, automating content creation, analyzing text, or generating embeddings, the official OpenAI Python SDK makes integration straightforward. In this guide, we'll cover everything from basic chat completions to advanced features like function calling and vision capabilities, with practical examples you can adapt for production.
The modern AI landscape has made sophisticated language models widely accessible: what once required significant ML expertise now takes a few lines of Python. The v1.0+ Python SDK is also cleaner and more Pythonic than earlier releases. You'll learn to use models like GPT-4o, GPT-4o-mini, and GPT-3.5-turbo in your projects.
By the end of this tutorial, you’ll understand how to initialize the client, handle authentication, construct effective prompts, stream responses for real-time interaction, invoke external tools through function calling, process images, generate embeddings, and implement robust error handling. We’ll also examine a complete CLI chatbot implementation that demonstrates conversation history management.
Quick Example: Your First API Call
Let’s get straight to it. Here’s a minimal example that demonstrates the power of the OpenAI API. This script creates a single chat completion request and displays the model’s response. It assumes your OPENAI_API_KEY environment variable is set:
# quick_chat.py
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Explain recursion in 20 words."}]
)
print(response.choices[0].message.content)
Output:
Recursion is a function calling itself to solve smaller versions of the same problem until reaching a base case.
OpenAI() reads your API key from the OPENAI_API_KEY environment variable. create() sends your list of messages to the model and returns a structured response object: choices is a list of completions (one unless you request more), and choices[0].message.content holds the reply text. The exact wording of the reply varies from run to run.
What Is the OpenAI API?
The OpenAI API is a REST interface that gives you programmatic access to OpenAI’s language models. Rather than using the web interface, you call the API from your application. The official Python SDK wraps this REST API, handling authentication, request formatting, and response parsing automatically.
OpenAI offers multiple models optimized for different use cases:
| Model | Best For | Context Window | Relative Cost | Speed |
|---|---|---|---|---|
| gpt-4o | Complex reasoning, multimodal, production | 128K tokens | Higher | Moderate |
| gpt-4o-mini | Fast, cost-effective, high volume | 128K tokens | Lowest | Fast |
| gpt-3.5-turbo | Legacy applications | 16K tokens | Low (pricier than gpt-4o-mini) | Fastest |
For most new projects, we recommend gpt-4o-mini as your starting point. The API also supports embeddings, audio transcription, image generation, and fine-tuning.

Installing the OpenAI Python SDK
The official OpenAI Python SDK is available on PyPI. We recommend installing within a virtual environment:
# install_openai.sh
$ python3 -m venv openai_env
$ source openai_env/bin/activate
$ pip install openai
Output:
Successfully installed openai-1.30.0
Verify the installation:
# verify_openai.py
import openai
print(f"OpenAI SDK version: {openai.__version__}")
Output:
OpenAI SDK version: 1.30.0
Current releases of the SDK require Python 3.8 or newer; check the package's PyPI page for the exact minimum of the version you install.
Setting Up Your API Key
Every request requires authentication via an API key. Create one at platform.openai.com/api-keys. Never hardcode your key in source code. Use environment variables instead:
# setup_env.sh
$ export OPENAI_API_KEY="sk-proj-your-actual-key-here"
The OpenAI() client automatically reads this environment variable:
# client_init.py
from openai import OpenAI
client = OpenAI() # Reads OPENAI_API_KEY from environment
print("Client initialized successfully")
Output:
Client initialized successfully
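If OPENAI_API_KEY is missing, OpenAI() raises openai.OpenAIError when the client is constructed. A small fail-fast check at startup can turn that into a message that tells whoever deploys the app exactly what to fix. This is a minimal sketch; require_api_key is a hypothetical helper, not part of the SDK:

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: return the API key from the environment or fail clearly."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fail at startup with an actionable message instead of deep inside a request
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it in your shell or load it "
            "from your secrets manager before starting the app."
        )
    return key
```

You would then pass the result explicitly, e.g. `client = OpenAI(api_key=require_api_key())`. Accepting the environment as a parameter keeps the check testable without touching your real shell variables.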

Chat Completions: The Core API
Chat completions are the foundation of most OpenAI applications. You send a list of messages and the model generates a completion:
# chat_basic.py
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "user", "content": "What are three benefits of Python for data science?"}
],
max_tokens=200,
temperature=0.7
)
print(response.choices[0].message.content)
Output:
1. Rich Ecosystem: Libraries like pandas, NumPy, and scikit-learn provide comprehensive tools.
2. Ease of Learning: Python's readable syntax lets data scientists focus on algorithms.
3. Community and Integration: Strong community support and seamless production integration.
Key parameters: model selects the model, messages is the conversation so far, max_tokens caps the length of the reply, and temperature controls randomness (the API accepts 0 to 2 and defaults to 1; 0.7 is a common middle ground).
System Messages and Conversation Roles
System messages set the assistant's behavior and personality. Most conversations benefit from starting with one:
# system_messages.py
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful Python tutor. Keep responses under 150 words."},
{"role": "user", "content": "What is a list comprehension?"}
],
temperature=0.5
)
print(response.choices[0].message.content)
Output:
A list comprehension is a concise way to create lists in Python:
squares = [x ** 2 for x in range(5)] # [0, 1, 4, 9, 16]
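Messages use three core roles: system (instructions), user (human input), and assistant (model responses); a fourth role, tool, carries function results back to the model. The API does not remember earlier requests, so to keep multi-turn context you store the messages in a list and send the whole list with every request.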
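A minimal sketch of that pattern. ask and complete are hypothetical names: complete is any function you supply that sends the message list to the model and returns the reply text, which keeps the history logic testable without network access.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ask(messages: List[Message], user_text: str,
        complete: Callable[[List[Message]], str]) -> str:
    """Run one conversational turn and record both sides in `messages`."""
    messages.append({"role": "user", "content": user_text})
    try:
        reply = complete(messages)  # the model sees the full history every time
    except Exception:
        messages.pop()  # drop the unanswered user turn so history stays consistent
        raise
    messages.append({"role": "assistant", "content": reply})
    return reply
```

With a real client, complete could be `lambda msgs: client.chat.completions.create(model="gpt-4o-mini", messages=msgs).choices[0].message.content`.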
Streaming Responses
Streaming sends tokens as they’re generated, creating a real-time effect:
# streaming_response.py
from openai import OpenAI
client = OpenAI()
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Write a haiku about Python."}],
stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
Output:
Code flows like rivers
Functions call within themselves
Logic pure and clean
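With stream=True, create() returns a Stream object that yields chunks as the model generates them, so users see text immediately; this suits chat and web UIs. Each chunk carries a small delta, and some deltas carry no text (the final chunk, for example, only reports finish_reason), which is why the loop checks delta.content before printing.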
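Often you want to display tokens as they arrive and also keep the full reply, for example to append it to the conversation history. The sketch below assumes the chunk shape used above; it also skips chunks with an empty choices list, which the API sends as the final chunk when you pass stream_options={"include_usage": True}. collect_stream is a hypothetical helper name.

```python
from typing import Any, Callable, Iterable, List, Optional


def collect_stream(chunks: Iterable[Any],
                   on_text: Optional[Callable[[str], None]] = None) -> str:
    """Consume a chat-completions stream and return the full reply text."""
    parts: List[str] = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. the trailing usage-only chunk
        text = chunk.choices[0].delta.content
        if text:  # skip None/empty deltas (role-only or finish chunks)
            parts.append(text)
            if on_text is not None:
                on_text(text)  # e.g. print to the terminal or push to a websocket
    return "".join(parts)
```

Usage: `reply = collect_stream(stream, lambda t: print(t, end="", flush=True))`.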

Function Calling and Tool Use
Function calling lets the model request your application invoke specific functions:
# function_calling.py
import json
from openai import OpenAI
client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
}]
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What's the weather in New York?"}],
tools=tools,
tool_choice="auto"
)
if response.choices[0].message.tool_calls:
    call = response.choices[0].message.tool_calls[0]
    print(f"Function: {call.function.name}")
    print(f"Arguments: {call.function.arguments}")
Output:
Function: get_weather
Arguments: {"city": "New York", "unit": "fahrenheit"}
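The model only chooses the function and fills in its arguments; it never runs anything itself. call.function.arguments is a JSON string, so parse it with json.loads (and validate it, since the model can produce unexpected values), execute your own function, and send the result back in a role="tool" message so the model can compose its final answer.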
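One way to handle that round trip is to map advertised function names to local callables and turn each tool call into a tool message. This is a sketch under assumptions: get_weather returns canned data, and TOOL_REGISTRY and run_tool_calls are hypothetical names.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Optional


def get_weather(city: str, unit: str = "fahrenheit") -> Dict[str, Any]:
    """Hypothetical stand-in for a real weather lookup; returns canned data."""
    temperature = 72 if unit == "fahrenheit" else 22
    return {"city": city, "unit": unit, "temperature": temperature}


# Map the function names advertised in `tools` to local Python callables
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: Optional[Iterable[Any]],
                   registry: Dict[str, Callable[..., Any]] = TOOL_REGISTRY) -> List[Dict[str, str]]:
    """Execute each requested tool call and build the matching role="tool" messages."""
    tool_messages = []
    for call in tool_calls or []:
        func = registry.get(call.function.name)
        if func is None:
            output: Any = {"error": f"unknown tool: {call.function.name}"}
        else:
            try:
                args = json.loads(call.function.arguments or "{}")
                output = func(**args)
            except (json.JSONDecodeError, TypeError) as exc:
                # Malformed or unexpected arguments: report the problem to the model
                output = {"error": str(exc)}
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # must match the id of the call it answers
            "content": json.dumps(output),
        })
    return tool_messages
```

To finish the loop, append the assistant message that requested the calls (response.choices[0].message) to your messages list, extend the list with `run_tool_calls(response.choices[0].message.tool_calls)`, and call client.chat.completions.create again with the same tools; the model then answers using the tool results.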
Generating Embeddings
Embeddings are numerical representations of text for semantic search and similarity:
# embeddings_example.py
from openai import OpenAI
client = OpenAI()
texts = ["The cat sat on the mat.", "A feline rests on the rug.", "The dog ran through the park."]
response = client.embeddings.create(model="text-embedding-3-small", input=texts)
for i, item in enumerate(response.data):
    print(f"Text {i}: {len(item.embedding)} dimensions, first 3: {item.embedding[:3]}")
Output:
Text 0: 1536 dimensions, first 3: [-0.0234, 0.0891, -0.0123]
Text 1: 1536 dimensions, first 3: [-0.0245, 0.0885, -0.0115]
Text 2: 1536 dimensions, first 3: [0.0123, 0.0342, 0.0789]
Semantically similar texts produce similar embeddings. Store them in vector databases like ChromaDB for powerful search.
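"Similar" is usually measured with cosine similarity. The sketch below ranks stored texts against a query vector in pure Python; the tiny 2-D vectors in the test are made up for illustration, whereas real ones would come from something like `{text: item.embedding for text, item in zip(texts, response.data)}`. top_k is a hypothetical helper, and a vector database does the same ranking at scale.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], docs: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored texts by similarity to the query vector, best match first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

OpenAI embeddings are normalized to length 1, so for them the plain dot product gives the same ranking; the explicit normalization here keeps the helper correct for arbitrary vectors.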
Error Handling and Rate Limits
Production applications must handle errors gracefully:
# error_handling.py
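from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError
client = OpenAI()
try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=10
    )
    print(response.choices[0].message.content)
except RateLimitError:  # HTTP 429; a subclass of APIStatusError, so catch it first
    print("Rate limit exceeded. Wait before retrying.")
except APIConnectionError as e:  # network failure or timeout: there is no HTTP status
    print(f"Could not reach the API: {e}")
except APIStatusError as e:  # any other non-2xx response
    print(f"API error: {e.status_code} - {e.message}")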
Output:
Hi there! How can I help you today?
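The SDK already retries connection errors and 408, 409, 429, and 5xx responses twice by default with a short exponential backoff, so a RateLimitError here means those retries were exhausted; you can tune this with OpenAI(max_retries=...). Beyond that, implement your own exponential backoff: wait progressively longer between retries.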
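A minimal backoff helper for retrying at your own call site, for example when you want longer waits than the client applies. It doubles the delay after each failure and adds random jitter so many clients do not retry in lockstep. with_backoff is a hypothetical name; the tenacity library offers the same pattern ready-made.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 50% random jitter
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

Usage: `with_backoff(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=msgs), retry_on=(RateLimitError,))`. The injectable sleep parameter exists so tests can record delays instead of waiting.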

Real-Life Example: Interactive CLI Chatbot
Here’s a complete chatbot with conversation history:
# chatbot.py
from openai import OpenAI, OpenAIError
class Chatbot:
    def __init__(self, system_prompt="You are a helpful assistant."):
        self.client = OpenAI()
        self.messages = [{"role": "system", "content": system_prompt}]
    def chat(self, user_input):
        self.messages.append({"role": "user", "content": user_input})
        try:
            response = self.client.chat.completions.create(
                model="gpt-4o-mini",
                messages=self.messages,
                temperature=0.7,
                max_tokens=500
            )
        except OpenAIError as e:
            # Drop the unanswered user turn so the history stays consistent
            self.messages.pop()
            return f"Error: {e}"
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply
    def save_history(self, filename="chat_history.txt"):
        with open(filename, "w", encoding="utf-8") as f:
            for msg in self.messages:
                f.write(f"{msg['role'].upper()}:\n{msg['content']}\n\n")
    def run(self):
        print("Chatbot ready. Type 'quit' to exit, 'save' to save history.\n")
        while True:
            try:
                user_input = input("You: ").strip()
            except (EOFError, KeyboardInterrupt):  # Ctrl-D / Ctrl-C exits cleanly
                print()
                break
            if not user_input:
                continue
            if user_input.lower() == "quit":
                break
            if user_input.lower() == "save":
                self.save_history()
                print("History saved.")
                continue
            print(f"Assistant: {self.chat(user_input)}\n")
if __name__ == "__main__":
    Chatbot("You are a knowledgeable Python expert.").run()
Usage:
$ python chatbot.py
Chatbot ready. Type 'quit' to exit, 'save' to save history.
You: What's the difference between lists and tuples?
Assistant: Lists are mutable, tuples are immutable...
You: save
History saved.
This demonstrates conversation history management, error handling, persistent storage, and an interactive loop.
Frequently Asked Questions
How much does the OpenAI API cost?
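OpenAI uses pay-per-token pricing, with separate rates for input and output tokens. At the time of writing, gpt-4o-mini costs about $0.15 per million input tokens and $0.60 per million output tokens; check OpenAI's pricing page for current rates. Set spending limits in your account settings.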
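Because pricing is per token, cost is simple arithmetic: tokens divided by one million, times the per-million price, summed over input and output. The default prices below are assumed gpt-4o-mini rates at the time of writing ($0.15 input, $0.60 output per million tokens), so verify them before relying on the numbers; the token counts would come from response.usage.prompt_tokens and response.usage.completion_tokens. estimate_cost_usd is a hypothetical helper.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_price_per_m: float = 0.15,
                      output_price_per_m: float = 0.60) -> float:
    """Estimate the dollar cost of one request from its token usage.

    Default prices are assumed gpt-4o-mini rates; check the current pricing page.
    """
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000
```

For a 2,000-token prompt with a 500-token reply this gives (300 + 300) / 1,000,000 = $0.0006.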
What’s the difference between temperature and top_p?
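temperature rescales the model's token probabilities (0 = nearly deterministic, 2 = very random). top_p uses nucleus sampling: the model samples only from the smallest set of tokens whose combined probability reaches top_p. For most apps, adjust temperature and leave top_p at 1.0.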
How long can a conversation be?
Limited by the context window: 128K tokens for gpt-4o/gpt-4o-mini. Monitor response.usage to track consumption.
Can I fine-tune the models?
Yes, OpenAI supports fine-tuning for specific models. Start with prompt engineering first — it’s usually sufficient and cheaper.
How do I handle sensitive data?
Never send PII (SSNs, credit cards) to the API. Use data scrubbing and anonymization. Review OpenAI’s privacy policy for compliance.
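As an illustration of scrubbing, the sketch below replaces US-style Social Security numbers and 13 to 16 digit card numbers with placeholders before text leaves your system. The patterns are deliberately simple assumptions and will miss many forms of PII; treat this as a starting point, not a compliance tool. scrub is a hypothetical helper.

```python
import re

# Deliberately simple, assumed patterns: US-style SSNs (123-45-6789) and
# 13-16 digit card numbers with optional spaces or dashes between digits.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def scrub(text: str) -> str:
    """Replace obvious SSNs and card numbers with placeholders."""
    text = SSN_RE.sub("[SSN]", text)
    return CARD_RE.sub("[CARD]", text)
```

Usage: call `scrub(user_text)` before putting the text into a message's content.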
Conclusion
You now have a comprehensive foundation for building with the OpenAI API: chat completions, system messages, streaming, function calling, vision, embeddings, and error handling. The Python SDK makes integration elegant. Start with a simple chatbot and extend from there.
Visit the official documentation at platform.openai.com/docs for advanced features like fine-tuning and batch processing.
Setting Up the OpenAI Client
The official Python SDK is openai. Authentication via the OPENAI_API_KEY environment variable is the simplest and safest path:
# pip install openai
import os
from openai import OpenAI
# Reads from OPENAI_API_KEY env var
client = OpenAI()
# Or pass explicitly (never hardcode in production)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Test
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Say hello in one sentence"}],
)
print(resp.choices[0].message.content)
For local development, put your key in a .env file and load it with python-dotenv — never commit keys to git. In production, use your platform’s secrets manager (AWS Secrets Manager, Vault, GCP Secret Manager).
Chat Completions: The Workhorse Endpoint
Chat completions cover most real-world use cases. The model takes a list of messages with roles (system, user, assistant) and returns the next assistant message:
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a Python expert. Be concise."},
{"role": "user", "content": "How do I read a CSV?"},
],
temperature=0.2, # range 0-2: lower = more focused, higher = more varied
max_tokens=200, # cap response length
)
print(resp.choices[0].message.content)
print(resp.usage.total_tokens) # prompt + completion tokens (billed at separate input/output rates)
The system message shapes the model's behavior across the conversation. temperature is usually the most impactful sampling parameter: drop it to 0 for code generation or structured outputs, and raise it for creative writing.
Streaming Responses
For chat UIs and long completions, streaming the response gives users immediate feedback. Iterate over chunks as they arrive:
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Write a haiku about Python"}],
stream=True,
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()
Streaming reduces perceived latency from “5 seconds of waiting” to “instant first word”. Build it in from the start for any user-facing feature.
Function Calling / Tools
For agents that need to call code (lookup data, run calculations, fetch URLs), use the tools/function-calling feature. You describe the functions; the model decides when to call them and with what arguments:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
tools=tools,
tool_choice="auto",
)
message = resp.choices[0].message
if message.tool_calls: # None when the model answers in plain text instead
    tool_call = message.tool_calls[0]
    print(tool_call.function.name) # 'get_weather'
    print(tool_call.function.arguments) # '{"city":"Tokyo","unit":"celsius"}'
# Your code calls the actual function, then sends the result back as a message
# with role="tool" and tool_call_id matching the call
Structured Outputs (JSON Mode)
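When you need the model to return parseable JSON, use the response_format parameter: {"type": "json_object"} (JSON mode) guarantees syntactically valid JSON, while structured outputs constrain the reply to a specific schema: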
from pydantic import BaseModel
class UserProfile(BaseModel):
name: str
age: int
interests: list[str]
# Older 1.x SDK releases expose this method as client.beta.chat.completions.parse
resp = client.chat.completions.parse(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Give me a sample user profile"}],
response_format=UserProfile,
)
profile = resp.choices[0].message.parsed
print(profile.name, profile.age, profile.interests)
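This is the modern way to do structured extraction: the SDK turns your pydantic model into a JSON schema, the API constrains the reply to match it, and message.parsed holds a validated UserProfile instance. If the model refuses the request, parsed is None and message.refusal explains why, so check for that before using the result.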
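With plain JSON mode (response_format={"type": "json_object"}), the API only guarantees valid JSON, not your schema, and it requires the word "JSON" to appear somewhere in your messages. You then validate the reply yourself. A stdlib-only sketch, assuming the same name, age, and interests fields as UserProfile; Profile and parse_profile are hypothetical names:

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    name: str
    age: int
    interests: List[str]


def parse_profile(raw: str) -> Profile:
    """Parse and type-check a JSON-mode reply; raise ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age, interests = data.get("name"), data.get("age"), data.get("interests")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("'age' must be an integer")
    if not isinstance(interests, list) or not all(isinstance(i, str) for i in interests):
        raise ValueError("'interests' must be a list of strings")
    return Profile(name, age, interests)
```

The raw string would be resp.choices[0].message.content. This manual checking is exactly what parse() with a pydantic model does for you.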
Embeddings for Search and Similarity
Embeddings turn text into vectors. Cosine similarity between vectors approximates semantic similarity — the foundation of semantic search, RAG, and clustering:
resp = client.embeddings.create(
model="text-embedding-3-small",
input=[
"Python is a programming language",
"JavaScript runs in browsers",
"I love coding in Python every day",
],
)
for item in resp.data:
    print(len(item.embedding)) # 1536 dimensions
# OpenAI embeddings are normalized to length 1, so the dot product equals cosine similarity
import numpy as np
vecs = np.array([d.embedding for d in resp.data])
sim = vecs @ vecs.T # 3x3 matrix: sim[i][j] compares text i with text j
print(sim)
Common Pitfalls
- Hardcoding keys. Hardcoded API keys end up on GitHub, can be scraped within hours, and get revoked. Always use environment variables or a secrets manager.
- Not setting max_tokens. Unbounded responses can rack up costs fast. Set max_tokens on every call (reasoning models such as o1 take max_completion_tokens instead).
- Building chat history forever. Sending the whole conversation every turn makes total token usage grow quadratically with the number of turns. Truncate or summarize old messages once the context approaches the model's window.
- Ignoring rate limits. OpenAI returns 429 errors when you hit RPM / TPM limits. Wrap calls with exponential backoff (the tenacity library works well) or use the async client with concurrency limits.
- Treating model output as code. Never exec() or eval() generated code. Treat outputs as untrusted user input: validate, sanitize, sandbox.
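For the chat-history pitfall, one simple policy (an assumption; summarizing old turns is another) is to keep the system message plus the most recent messages. This sketch counts messages rather than tokens and assumes history entries are plain dicts; a token-accurate version would count with a tokenizer such as tiktoken. truncate_history is a hypothetical helper.

```python
from typing import Any, Dict, List


def truncate_history(messages: List[Dict[str, Any]], max_recent: int = 10) -> List[Dict[str, Any]]:
    """Keep the system message(s) plus the last `max_recent` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_recent:] if max_recent > 0 else []
    # A tool result is only valid right after the assistant message that
    # requested it, so drop orphaned tool messages at the start of the window
    while recent and recent[0]["role"] == "tool":
        recent.pop(0)
    return system + recent
```

Call it just before each request, e.g. `messages=truncate_history(history)`, so the stored history stays complete while the request stays bounded.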
FAQ
Q: Which model should I use?
A: gpt-4o-mini for cost-effective everyday work — fast and cheap. gpt-4o for harder tasks. o1 / o3 for deep reasoning. Use the smallest model that solves your problem.
Q: How do I control cost?
A: Three levers: pick a smaller model, set max_tokens, truncate conversation history. Monitor with the OpenAI dashboard — set alerts at 50% and 80% of your monthly cap.
Q: How do I handle long documents?
A: Split into chunks (1500-2000 tokens each), embed each chunk, retrieve relevant chunks for each query (RAG), and only send those to the model. LangChain and LlamaIndex automate this pattern.
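A minimal chunker, assuming whitespace-separated words as a rough stand-in for tokens (a common rule of thumb is about 0.75 English words per token, so 1,500 words is roughly 2,000 tokens; use a tokenizer such as tiktoken for exact counts). Overlapping chunks keep a sentence that straddles a boundary retrievable from either side. chunk_words is a hypothetical helper.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 1500, overlap: int = 200) -> List[str]:
    """Split text into chunks of up to chunk_size words, overlapping by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each chunk can then be embedded in one call, e.g. `client.embeddings.create(model="text-embedding-3-small", input=chunk_words(document))`.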
Q: Is there an async client?
A: Yes — from openai import AsyncOpenAI. Same API, all methods are coroutines. Use it inside FastAPI handlers and async scrapers.
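Firing many async requests at once is the quickest way to hit 429 errors, so cap how many are in flight. A minimal sketch using asyncio.Semaphore; map_limited and worker are hypothetical names, and in practice worker would be a coroutine function that awaits client.chat.completions.create(...) on an AsyncOpenAI client.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_limited(worker: Callable[[T], Awaitable[R]], items: Iterable[T],
                      limit: int = 5) -> List[R]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:  # waits here while `limit` calls are already running
            return await worker(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(run_one(item) for item in items))
```

Usage: `results = asyncio.run(map_limited(summarize, documents, limit=5))`, where summarize is your own async function.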
Q: What about local LLMs?
A: Run open-weight models via Ollama, llama.cpp, or LM Studio. They expose an OpenAI-compatible API — change base_url in the client and the rest of your code keeps working.
Wrapping Up
The OpenAI Python SDK is one of the rare SDKs whose surface area maps cleanly onto real-world tasks: chat completions, streaming, tool/function calling, structured outputs, and embeddings cover almost everything. Pick the smallest model that does the job, set max_tokens, use environment variables for keys, and validate model output before acting on it. Those four habits prevent the most common production incidents.