WhatsApp-controlled agent with web search
I’ve been spending some of my free time tinkering with a proof of concept of a small agentic application that integrates WhatsApp with a chatbot. It’s not a polished production system, just a fun experiment I threw together to see how these pieces could work in tandem.
I’m running everything using a simple FastAPI server. It listens for webhooks from WhatsApp (so it can get messages from users), decides what to do with those messages, and then sends back a response. The part that turned out to be the most interesting was getting audio messages, transcribing them with OpenAI’s Whisper, and figuring out when to run a web search using DuckDuckGo.
Why This Setup?
The basic idea is straightforward: I want a custom solution that automates my daily tasks. I thought it would be interesting to ask my agent to do things by either writing to it through WhatsApp or sending a voice message. The agent checks whether it can handle the question on its own or whether it needs extra information from the web. If it needs to search, it uses DuckDuckGo to get results, summarizes them, and returns a short answer back to WhatsApp.
Prerequisites and Setting Up
Before we jump into the code, make sure you have:
• A recent Python 3 installation (earlier versions may work).
• A WhatsApp Cloud API setup with a phone number, access token, and verification token.
• An OpenAI API key for Whisper (or any other transcription model you wish to use).
• The DuckDuckGo Search Python package installed.
• (Optional) A local or remote AI model that decides whether or not to search the web (in the snippets, we’ll use ollama with a model named deepseek-r1:7b as an example). You can modify this part as you see fit.
Dependencies
Let's start by installing the required dependencies:
```shell
pip install fastapi uvicorn requests python-dotenv openai duckduckgo_search ollama
```
We’ll need to create a .env file in our project directory (the same location as our Python files) and add the lines below, adjusting the values as needed. This lets us store our secrets safely outside the codebase. The file should also be added to .gitignore.
```
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
WHATSAPP_PHONE_NUMBER_ID=YOUR_WHATSAPP_PHONE_NUMBER_ID
WHATSAPP_API_TOKEN=YOUR_WHATSAPP_API_TOKEN
WEBHOOK_VERIFY_TOKEN=YOUR_WEBHOOK_VERIFY_TOKEN
```
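A forgotten or misspelled variable in .env tends to surface only later as a confusing API error. As a minimal sketch (the helper name `check_required_env` is my own, not part of the post's code), you could fail fast at startup like this:

```python
import os

REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "WHATSAPP_PHONE_NUMBER_ID",
    "WHATSAPP_API_TOKEN",
    "WEBHOOK_VERIFY_TOKEN",
]

def check_required_env(env=os.environ):
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling this right after `load_dotenv()` and raising if the returned list is non-empty gives a much clearer error than a failed request later on.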
Project Structure
```
project/
├── .env
├── main.py
└── requirements.txt   (optional, for listing dependencies)
```
Creating the FastAPI Application
We’ll start with the basic layout of our FastAPI app. We’ll import everything we need, load the environment variables, and set up logging and CORS. This part is mostly boilerplate.
Here, we’re importing ollama (which is optional if you have a different way to parse the prompt for “search needed or not”). We also import openai, requests, fastapi, and so on.
Basic Imports and Setup:
```python
# main.py

import os
import re
import json
import logging
import requests
import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from openai import OpenAI
from duckduckgo_search import DDGS
import ollama

# 1. Load environment variables
load_dotenv(verbose=True, override=True)

# 2. Set up the OpenAI client with your API key
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# 3. Initialize the FastAPI app
app = FastAPI()

# 4. Add CORS middleware (optional but helpful for local testing)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# 5. Configure logging for debugging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
```
Environment Variables and WhatsApp Configuration
We need certain variables to be available: the WhatsApp phone number ID, the API token, and the verification token for the webhook.
The @app.get("/") route is just a simple endpoint you can test to make sure the server is running.
Let’s define them:
```python
# main.py (continuing)

# WhatsApp API configuration
WHATSAPP_PHONE_NUMBER_ID = os.getenv("WHATSAPP_PHONE_NUMBER_ID")
WHATSAPP_API_URL = f"https://graph.facebook.com/v17.0/{WHATSAPP_PHONE_NUMBER_ID}/messages"
WHATSAPP_API_TOKEN = os.getenv("WHATSAPP_API_TOKEN")
WEBHOOK_VERIFY_TOKEN = os.getenv("WEBHOOK_VERIFY_TOKEN")

@app.get("/")
async def root():
    return {"message": "Hello World"}
```
Handling the Webhook Verification
WhatsApp requires a verification step. When you set up your webhook, WhatsApp sends a GET request to the /callback endpoint with a specific hub.verify_token and hub.challenge. We just need to match the token against WEBHOOK_VERIFY_TOKEN from our .env.
```python
# main.py (continuing)

from fastapi.responses import PlainTextResponse

@app.get("/callback")
async def callback(request: Request):
    """
    This GET endpoint is used by WhatsApp to verify your webhook.
    """
    mode = request.query_params.get("hub.mode")
    token = request.query_params.get("hub.verify_token")
    challenge = request.query_params.get("hub.challenge")

    if mode and token:
        if mode == "subscribe" and token == WEBHOOK_VERIFY_TOKEN:
            # WhatsApp expects the challenge echoed back
            return PlainTextResponse(content=challenge)
        # Returning a bare tuple would not set the status code in FastAPI,
        # so we build explicit responses instead
        return PlainTextResponse(content="Verification failed", status_code=403)
    return PlainTextResponse(content="Missing parameters", status_code=400)
```
If the token matches, we send back the challenge value, and WhatsApp knows we’re good to go.
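The matching logic itself is pure and easy to unit-test without a running server. Here's a minimal sketch that mirrors the endpoint's decisions (the helper `verify_webhook` is hypothetical, not part of the post's code):

```python
def verify_webhook(mode, token, challenge, expected_token):
    """Mirror the /callback GET logic: echo the challenge on a valid
    subscribe request, otherwise return an error body and status code."""
    if not (mode and token):
        return "Missing parameters", 400
    if mode == "subscribe" and token == expected_token:
        return challenge, 200
    return "Verification failed", 403
```

Keeping the decision in a plain function like this makes it trivial to assert the three outcomes (echo, 403, 400) before wiring it into FastAPI.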
Downloading WhatsApp Media
When a user sends an audio message, WhatsApp provides a media ID. We need to exchange that ID for a direct download URL and then download the actual file:
```python
def download_whatsapp_media(media_id):
    """
    Get the direct download URL for a WhatsApp media file by ID,
    then download the file.
    """
    try:
        media_url = f"https://graph.facebook.com/v17.0/{media_id}"
        headers = {"Authorization": f"Bearer {WHATSAPP_API_TOKEN}"}

        # Request the media download URL
        response = requests.get(media_url, headers=headers)
        response.raise_for_status()

        media_data = response.json()
        if "url" not in media_data:
            raise ValueError("Media URL not found in response.")

        media_download_url = media_data["url"]
        media_response = requests.get(media_download_url, headers=headers)
        media_response.raise_for_status()

        return media_response.content
    except Exception as e:
        logger.error(f"Error downloading media: {str(e)}")
        raise
```
We’ll write the content to a temporary file before transcribing.
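Rather than a fixed filename, a unique temporary file avoids two concurrent messages clobbering each other. A minimal sketch, assuming you keep the caller responsible for cleanup (the helper `write_temp_audio` is my own, not part of the post's code):

```python
import os
import tempfile

def write_temp_audio(content: bytes, suffix: str = ".ogg") -> str:
    """Write downloaded audio bytes to a unique temporary file and
    return its path; the caller deletes it after transcription."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(content)
        return f.name
```

The `.ogg` suffix matters because the transcription API infers the format from the filename.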
Transcribing Audio with Whisper
Our chatbot supports audio. We’ll need a function that can take in a file path (to the downloaded audio) and transcribe it via OpenAI’s Whisper model.
```python
def transcribe_audio_with_openai(audio_path):
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    return transcription.text
```
This uses the OpenAI client instance we set up earlier.
Deciding If We Need a Web Search
The next piece is a function that asks our model whether the user’s query requires a web search. If it does, we expect it to return JSON with search_required=True. Otherwise, we get an answer directly. This allows us to parse the answer and have a consistent response from the model.
```python
def decide_if_search_needed(user_query):
    """
    Ask a model if the query requires a web search.
    The model returns JSON indicating whether a search is needed,
    along with the search query if needed.
    """
    prompt = f"""
    You are an AI assistant. If you can confidently answer the question, do so.
    Always include JSON in your response.
    The JSON should have a key "answer" and a key "search_required".
    If you need to search the web, set search_required to true.
    If you can answer the question confidently, set search_required to false.
    If you need to search the web, include "search_query" in the JSON
    that contains the user prompt converted to a search query.
    If you don't need to search the web, set "search_query" to null.

    Question: {user_query}
    """

    response = ollama.chat(
        model="deepseek-r1:7b",  # Or any model you'd like
        messages=[{"role": "user", "content": prompt}]
    )

    # The 'message' key contains our raw response text
    answer = response["message"]["content"]
    logger.info(f"Raw response from model: {answer}")

    # Default fallback data if JSON extraction fails
    json_data = {
        "answer": None,
        "search_required": True,
        "search_query": None
    }

    # Attempt to extract JSON from the response
    json_match = re.search(r'```json(.*?)```', answer, re.DOTALL)
    if json_match:
        try:
            extracted_data = json.loads(json_match.group(1))
            json_data.update(extracted_data)
            logger.info(f"Successfully extracted JSON: {json_data}")
            # Return the merged dict so the default keys are always present,
            # even if the model omits one of them
            return json_data
        except json.JSONDecodeError:
            logger.error("Error decoding JSON from the model's response.")
            return json_data

    return json_data
```
Feel free to replace ollama with any LLM or logic you have on hand to classify queries.
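The fence-extraction step is worth isolating so you can test it against real model transcripts. A minimal sketch (the helper `extract_json_block` is hypothetical, not part of the post's code):

```python
import json
import re

def extract_json_block(text):
    """Pull the first ```json ... ``` fenced block out of a model
    response and parse it; return None if absent or invalid."""
    match = re.search(r"```json(.*?)```", text, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```

Reasoning models like deepseek-r1 often wrap the JSON in a lot of thinking text, so matching only the fenced block is more robust than trying to parse the whole reply.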
Searching the Web with DuckDuckGo
If the model decides a search is needed, we can use the duckduckgo_search library to fetch results. We’ll just return the first five.
```python
def search_duckduckgo(query):
    """
    Perform a web search using DuckDuckGo and return the top results.
    """
    results = DDGS().text(query, max_results=5)
    logger.info(f"DuckDuckGo Results: {results}")
    return results
```
Sending a WhatsApp Message Back
After we generate an answer—whether it’s from the model or from a summarized web search—we need to send it back via the WhatsApp API:
```python
def send_whatsapp_message(to, text):
    """
    Send a text message back to the user via WhatsApp Cloud API.
    """
    try:
        headers = {
            "Authorization": f"Bearer {WHATSAPP_API_TOKEN}",
            "Content-Type": "application/json",
        }

        payload = {
            "messaging_product": "whatsapp",
            "recipient_type": "individual",
            "to": to,
            "type": "text",
            "text": {
                "preview_url": False,
                "body": text
            }
        }

        logger.info(f"Sending message to WhatsApp API:\n{json.dumps(payload, indent=2)}")
        response = requests.post(WHATSAPP_API_URL, json=payload, headers=headers)
        logger.info(f"WhatsApp API Response Status: {response.status_code}")
        logger.info(f"WhatsApp API Response: {response.text}")

        response.raise_for_status()
        logger.info(f"Message sent successfully to {to}")
    except requests.exceptions.RequestException as e:
        logger.error(f"WhatsApp API error: {str(e)}")
        if hasattr(e, 'response') and e.response is not None:
            logger.error(f"Response content: {e.response.text}")
        raise
```
Summarizing Search Results
```python
def summarize_search_results(query, results):
    """
    Summarize the DuckDuckGo search results using a local or remote model.
    """
    if not results:
        return "I couldn't find any relevant search results."

    prompt = f"Summarize the following search results for: {query}\n\n{results}"

    response = ollama.chat(
        model="deepseek-r1:7b",
        messages=[{"role": "user", "content": prompt}]
    )
    return response['message']['content']
```
We can feed the search results back into our model to generate a concise answer for the user.
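Interpolating the raw list of result dicts into the prompt works, but a flattened text form is easier for the model to digest. A minimal sketch, assuming the `title`/`href`/`body` keys that duckduckgo_search's text results typically carry (the helper `format_results_for_prompt` is my own):

```python
def format_results_for_prompt(results, limit=5):
    """Flatten search result dicts (assumed keys: 'title', 'href',
    'body') into one bullet line per result for the prompt."""
    lines = []
    for r in results[:limit]:
        lines.append(f"- {r.get('title', '')} ({r.get('href', '')}): {r.get('body', '')}")
    return "\n".join(lines)
```

You could then pass `format_results_for_prompt(results)` into the summarization prompt instead of the raw `results` object.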
Putting It All Together: Handling Incoming Messages
Steps in the Workflow
Here’s a quick rundown of how messages get handled:
- Receive WhatsApp Webhook: WhatsApp sends a POST request to my endpoint whenever a new message arrives.
- Check Message Type: If it’s a text message, I capture the text. If it’s an audio file, I download it from WhatsApp, store it temporarily, and pass it through OpenAI’s Whisper for transcription.
- Decide if a Web Search Is Needed: I’ve hooked in a model (DeepSeek R1 7B) that analyzes the user’s question. If it can be answered without external data, the code just sends back the model’s response. If it needs more context, it proceeds with a DuckDuckGo search.
- Summarize Search Results: Another step with the AI model - feed in the results to generate a concise answer.
- Send Back the Reply via WhatsApp: Finally, I craft a JSON payload that WhatsApp’s API accepts and fire it off. Users get the text response in their chat.
```python
@app.post("/callback")
async def whatsapp_webhook(request: Request):
    """
    This POST endpoint handles new messages from WhatsApp.
    It checks message type, transcribes audio if needed,
    decides if a web search is required, and responds back.
    """
    try:
        data = await request.json()

        # Check if there's a message
        if "messages" in data["entry"][0]["changes"][0]["value"]:
            message = data["entry"][0]["changes"][0]["value"]["messages"][0]
            sender_id = message["from"]
            message_type = message["type"]
            user_text = None
        else:
            logger.info("No messages found in the webhook data.")
            return {"status": "success", "message": "No messages found"}

        logger.info(f"Processing {message_type} message from {sender_id}")

        # Handle text messages
        if message_type == "text":
            user_text = message["text"]["body"].strip()
            logger.info(f"Text message received: {user_text}")

        # Handle audio messages
        elif message_type == "audio":
            try:
                media_id = message["audio"]["id"]
                logger.info(f"Downloading audio with ID: {media_id}")

                audio_content = download_whatsapp_media(media_id)
                audio_path = "temp_audio.ogg"
                with open(audio_path, "wb") as f:
                    f.write(audio_content)

                logger.info("Transcribing audio...")
                user_text = transcribe_audio_with_openai(audio_path)
                logger.info(f"Transcription: {user_text}")

                os.remove(audio_path)  # Clean up
            except Exception as e:
                logger.error(f"Error processing audio: {str(e)}")
                user_text = "Sorry, I couldn't process the audio message."
                # Attempt cleanup even if there's an error
                if os.path.exists("temp_audio.ogg"):
                    os.remove("temp_audio.ogg")

        else:
            # Fallback for unsupported message types
            user_text = "I can only process text and audio messages at the moment."

        # Decide if a search is needed
        logger.info(f"Deciding if search is needed for: {user_text}")
        deepseek_answer = decide_if_search_needed(user_text)
        logger.info(f"Decision from model: {deepseek_answer}")

        if deepseek_answer["search_required"]:
            # Perform a web search
            search_query = deepseek_answer["search_query"] or user_text
            logger.info(f"Performing web search for: {search_query}")
            results = search_duckduckgo(search_query)
            summary = summarize_search_results(user_text, results) if results else "No relevant results found."
            send_whatsapp_message(sender_id, summary)
            return {"status": "success", "message": "Answered using web search"}

        else:
            # No search needed, send the direct answer
            bot_answer = deepseek_answer["answer"] or "I couldn't generate an answer."
            send_whatsapp_message(sender_id, bot_answer)
            return {"status": "success", "message": "Answered using AI"}

    except Exception as e:
        logger.error(f"Error processing webhook: {str(e)}")
        return {"status": "error", "message": "Internal server error"}

if __name__ == "__main__":
    # Entry point so `python main.py` starts the server
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
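The handler above depends on WhatsApp's deeply nested webhook payload shape. As a minimal sketch (the helper name `extract_message` is my own, not part of the post's code), the extraction step can be isolated and unit-tested without a running server:

```python
def extract_message(payload):
    """Safely pull the first message (if any) out of a WhatsApp
    webhook payload; returns None when the event carries no message,
    e.g. for delivery-status updates."""
    try:
        value = payload["entry"][0]["changes"][0]["value"]
        return value["messages"][0]
    except (KeyError, IndexError, TypeError):
        return None
```

WhatsApp also posts status events (delivered/read receipts) to the same endpoint, which is why the no-message branch matters.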
Running the Application
Once you have your file (main.py) with the snippets combined in the correct order, you can run:
```shell
python main.py
```
Your FastAPI server will be available at http://localhost:8000.
If you need to expose it to the internet for WhatsApp to reach, use ngrok or a similar tunneling tool:
```shell
ngrok http 8000
```
Copy the HTTPS URL (something like https://<random-string>.ngrok.io) and use it as your webhook URL in the WhatsApp Cloud API dashboard.
Testing Everything
- Verify your webhook by using the GET /callback endpoint (WhatsApp will do this automatically).
- Send a Text Message from your phone to your bot’s WhatsApp number. Check your local logs to see if the message was received, and watch for a reply.
- Send an Audio Message (voice note) to test Whisper transcription. If everything is working, you’ll see the transcribed text in your logs, and the bot will respond appropriately.
Conclusion
By walking through these steps, you now have a WhatsApp chatbot capable of handling both text and audio messages. It can transcribe audio using OpenAI’s Whisper, decide when to search the web for extra info, summarize search results, and send a neat response back to the user—all with a relatively small amount of Python code.
Feel free to experiment further, tweak the model or prompts, add caching, or handle more message types. This is just a starting point for combining FastAPI, AI-driven transcription, and web-search capabilities into a single chatbot interface.
Happy coding! ⌨️

Michał Winiarski
Fullstack Software Developer