VOOZH about

URL: https://dev.to/jeremy_longshore/debugging-slack-integration-from-6-duplicate-responses-to-instant-acknowledgment-36ij

⇱ Debugging Slack Integration: From 6 Duplicate Responses to Instant Acknowledgment - DEV Community


The Problem: Bob Responded 6 Times to Every Message

I integrated my AI agent (Bob's Brain) with Slack, and it worked—sort of. Every time I sent a message, Bob responded six times with the exact same answer. The Cloudflare Tunnel logs showed constant timeout errors:

2025-10-09T08:12:20Z ERR Request failed error="Incoming request ended abruptly: context canceled"

This wasn't a "minor bug"—this was a production-breaking issue that made the integration unusable.

The Journey: What Actually Happened

Starting Point: Unstable Tunnels

Before we even got to Slack, we had tunnel stability issues:

localhost.run kept changing URLs:

  • cf011aadb6f85d.lhr.life
  • 0ca4fddc58e906.lhr.life
  • 7aa0d045663613.lhr.life

Every URL change required updating Slack Event Subscriptions. Not sustainable.

Solution: Switched to Cloudflare Tunnel (cloudflared)

  • Free, no account required for testing
  • Stable URL: https://editor-steering-width-innovation.trycloudflare.com
  • Persists as long as the process runs
# Install cloudflared
curl -sLO https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared-linux-amd64.deb

# Start tunnel in background
nohup cloudflared tunnel --url http://localhost:8080 > /tmp/cloudflared.log 2>&1 &

Side Quest: LlamaIndex API Migration

While setting up Slack, Bob's Knowledge Orchestrator was throwing deprecation warnings:

# OLD (deprecated)
from llama_index.core import ServiceContext, set_global_service_context
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)
set_global_service_context(service_context)

# NEW (Settings API)
from llama_index.core import Settings
Settings.llm = llm
Settings.chunk_size = 512

Why this mattered: Bob integrates three knowledge sources (653MB Knowledge DB, Analytics DB, Research index). The deprecation was blocking clean initialization.

Result after fix:

✅ Knowledge orchestrator initialized successfully

The Main Problem: Slack's 3-Second Timeout

Slack verified the webhook URL successfully. Bob started responding to messages. But every message triggered 6 duplicate responses.

Initial code flow:

  1. Slack sends webhook event
  2. Bob processes entire LLM query (10-60 seconds)
  3. Bob sends Slack message
  4. Bob returns HTTP 200

Slack's behavior:

  • Waits 3 seconds for HTTP 200
  • No response? Retry the event
  • Keeps retrying until it gets acknowledgment
  • Result: 4-6 duplicate event deliveries

The Debugging Process

First attempt: "Maybe it's the tunnel?"

  • Checked tunnel logs: Connection stable
  • Tested endpoint locally: curl http://localhost:8080/slack/events → Works fine

Second attempt: "Maybe it's LLM response time?"

  • Ollama (local): 5-15 seconds
  • Groq (cloud): 2-8 seconds
  • Even fastest responses exceeded Slack's 3-second window

Root cause identified:

@app.post("/slack/events")
def slack_events():
 # ... validation ...

 # ❌ PROBLEM: This takes 10-60 seconds
 answer = llm(prompt)
 slack_client.chat_postMessage(channel=channel, text=answer)

 # By the time we return HTTP 200, Slack has retried 4-6 times
 return jsonify({"ok": True})

The Solution: Immediate Acknowledgment + Background Processing

Key insight: Slack doesn't need to wait for the LLM response. It just needs to know we received the event.

Implementation

1. Create background processing function:

_slack_event_cache = {} # Deduplication cache

def _process_slack_message(text, channel, user, event_id):
 """Background processing - can take as long as needed"""
 try:
 # 1. Check cache
 cached = get_cached_llm_response(text)
 if cached:
 slack_client.chat_postMessage(channel=channel, text=cached['answer'])
 return

 # 2. Get conversation history
 history = get_conversation_history(user, limit=10)

 # 3. Route to optimal LLM
 routing = ROUTER.route(text)

 # 4. Query knowledge bases if complex
 knowledge_context = ""
 if routing['complexity'] > 0.3:
 knowledge_context = KNOWLEDGE.query(text, mode='auto')

 # 5. Generate answer
 llm = llm_client()
 prompt = build_conversation_prompt(history, text, knowledge_context)
 answer = llm(prompt)

 # 6. Send to Slack (no rush, we're in background)
 slack_client.chat_postMessage(
 channel=channel,
 text=f"{answer}\n\n_[via {routing['provider']}]_"
 )

 # 7. Cache and learn
 cache_llm_response(text, answer, ttl=3600)
 add_to_conversation(user, "user", text)
 add_to_conversation(user, "assistant", answer)
 COL.run_once([{"type": "slack_message", ...}])

 finally:
 # Cleanup dedup cache after 60 seconds
 threading.Timer(60, lambda: _slack_event_cache.pop(event_id, None)).start()

2. Modify webhook handler to return immediately:

@app.post("/slack/events")
def slack_events():
 payload = request.get_json(silent=True) or {}

 # Handle URL verification
 if payload.get("type") == "url_verification":
 return jsonify({"challenge": payload.get("challenge")})

 event = payload.get("event", {})
 event_id = payload.get("event_id", "")

 # ✅ CRITICAL: Deduplicate retries
 if event_id and event_id in _slack_event_cache:
 log.info(f"Ignoring duplicate event: {event_id}")
 return jsonify({"ok": True})

 if event_id:
 _slack_event_cache[event_id] = True

 # Validate event
 if event.get("bot_id") or event.get("type") not in ["message", "app_mention"]:
 return jsonify({"ok": True})

 text = event.get("text", "")
 channel = event.get("channel")
 user = event.get("user")

 if not text or not channel:
 return jsonify({"ok": True})

 # ✅ SOLUTION: Spawn background thread
 thread = threading.Thread(
 target=_process_slack_message,
 args=(text, channel, user, event_id),
 daemon=True
 )
 thread.start()

 # ✅ Return HTTP 200 immediately (< 100ms)
 log.info(f"Queued Slack message for background processing")
 return jsonify({"ok": True})

Why This Works

Before:

  • Slack → Webhook → Process (10-60s) → HTTP 200
  • Slack timeout → Retry → Process again → HTTP 200
  • Result: 6 responses

After:

  • Slack → Webhook → HTTP 200 (< 100ms)
  • Background: Process → Send Slack message
  • Deduplication: Retries ignored via event_id cache
  • Result: 1 response

Results

Performance:

  • HTTP 200 acknowledgment: < 100ms (was 10-60 seconds)
  • No more Cloudflare timeout errors
  • One message in → One response out

Testing:

# Before fix
User: "Hey Bob"
Bob: [response 1]
Bob: [response 2]
Bob: [response 3]
Bob: [response 4]
Bob: [response 5]
Bob: [response 6]

# After fix
User: "Hey Bob"
Bob: [response] ✓

Bonus: DiagPro Training

While debugging, I also trained Bob on a 19,000-word DiagPro customer avatar document using the /learn endpoint:

curl -X POST http://localhost:8080/learn \
 -H "Content-Type: application/json" \
 -H "X-API-Key: $BB_API_KEY" \
 -d '{
 "correction": "DiagPro is a $4.99 AI-powered automotive diagnostic platform targeting drivers aged 25-60 who fear being overcharged by mechanics..."
 }'

Bob's Circle of Life learning system processes this knowledge and makes it available for queries through the Knowledge Orchestrator.

Key Lessons

  1. Webhook timeout limits are real - Slack's 3-second timeout isn't negotiable
  2. Background processing is essential - Don't make the HTTP client wait for slow operations
  3. Deduplication is critical - Retries WILL happen; handle them gracefully
  4. Event IDs exist for a reason - Use them to detect duplicate deliveries
  5. Tunnel stability matters - Cloudflare Tunnel >>> localhost.run for production use

Related Posts

Tech Stack

  • Python 3.12 with Flask
  • Slack SDK for Python
  • Cloudflare Tunnel for public HTTPS
  • LlamaIndex for knowledge integration
  • Ollama (local), Groq, Google Gemini (cloud LLMs)
  • Redis for caching and conversation memory

Author: Jeremy Longshore
Email: jeremy@intentsolutions.io
GitHub: @jeremylongshore

Building production-grade AI agents with real-world integration lessons learned the hard way.