We are building a support ticket intelligence pipeline that classifies incoming requests and drafts contextual replies by combining Oxlo.ai embeddings, a local random forest classifier, and an LLM. It helps support teams cut response time by automating triage and surfacing relevant historical resolutions. The entire stack runs through Oxlo.ai's OpenAI-compatible API, so we can standardize on one provider for both embedding and text generation workloads.
What you'll need
- Python 3.10+
- An Oxlo.ai API key from https://portal.oxlo.ai
- The OpenAI SDK, scikit-learn, and NumPy:
pip install openai scikit-learn numpy
Step 1: Configure the Oxlo.ai client
I start by instantiating the client. Oxlo.ai exposes a fully OpenAI-compatible endpoint, so the standard SDK works without wrappers.
from openai import OpenAI
import os
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"],
)
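If `OXLO_API_KEY` is not set, `os.environ["OXLO_API_KEY"]` fails with a bare `KeyError`. Here is a minimal sketch of a friendlier check. `load_api_key` is a hypothetical helper written for this post, not part of the OpenAI SDK or Oxlo.ai.

```python
import os


def load_api_key(env=None, name="OXLO_API_KEY"):
    """Return the API key from the environment or raise a readable error.

    Hypothetical helper for this tutorial; `env` defaults to os.environ
    and can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before running the pipeline."
        )
    return key
```

You could then pass `api_key=load_api_key()` when constructing the client.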
Step 2: Generate embeddings for historical tickets
We need a knowledge base of resolved tickets. I embed them using Oxlo.ai's BGE-Large model so the same vectors can serve two purposes: retrieving similar tickets later and training a classifier.
import numpy as np
historical_tickets = [
    {"id": 1, "text": "Cannot connect to database after latest deploy.", "priority": "high", "resolution": "Rollback the migration and re-run with SSL enabled."},
    {"id": 2, "text": "Logo is misaligned on the billing page in Safari.", "priority": "low", "resolution": "Add flex-center to the container CSS."},
    {"id": 3, "text": "Intermittent 500 errors on checkout API.", "priority": "high", "resolution": "Increase the connection pool size in Redis."},
    {"id": 4, "text": "Dark mode toggle missing in settings.", "priority": "low", "resolution": "Shipped in v2.4.1 under Appearance."},
    {"id": 5, "text": "Webhook signatures fail verification.", "priority": "high", "resolution": "Document the HMAC-SHA256 format and provide a code sample."},
]

def embed_text(text):
    response = client.embeddings.create(
        input=text,
        model="bge-large",
    )
    return response.data[0].embedding

for t in historical_tickets:
    t["embedding"] = embed_text(t["text"])

print(f"Embedded {len(historical_tickets)} tickets.")
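The loop above sends one request per ticket. The OpenAI embeddings API also accepts a list of strings as `input`, so a larger knowledge base could be embedded in chunks. This is a sketch under two assumptions: that Oxlo.ai's endpoint accepts list input the same way, and that a batch size of 32 is acceptable (an arbitrary value, not a documented limit). `embed_batch` is a hypothetical helper name.

```python
def embed_batch(client, texts, model="bge-large", batch_size=32):
    """Embed texts in chunks and return one vector per input, in input order.

    Assumes the endpoint accepts a list for `input`, as the OpenAI
    embeddings API does. batch_size=32 is an illustrative value only.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        response = client.embeddings.create(input=chunk, model=model)
        # Sort by the `index` field so vectors line up with the inputs.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

With the client from Step 1, the call would look like `embed_batch(client, [t["text"] for t in historical_tickets])`, and the returned vectors can be zipped back onto the ticket dicts.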
Step 3: Train a local priority classifier
Now we treat the embeddings as feature vectors and train a lightweight random forest to predict priority. This is where traditional ML meets the LLM stack.
from sklearn.ensemble import RandomForestClassifier
X = np.array([t["embedding"] for t in historical_tickets])
y = [t["priority"] for t in historical_tickets]
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)
print("Classifier trained on embedding features.")
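A predicted label alone hides how sure the forest is. scikit-learn's `predict_proba` exposes the class probabilities, which can be used to route uncertain tickets to a human. The sketch below uses toy 2-D vectors in place of real embeddings so it runs without an API call. The helper name `predict_with_confidence` and the 0.7 threshold are assumptions for illustration, not tuned values.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy 2-D stand-ins for embeddings; real vectors would come from embed_text.
X_toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_toy = ["low", "low", "low", "high", "high", "high"]

toy_clf = RandomForestClassifier(n_estimators=100, random_state=42)
toy_clf.fit(X_toy, y_toy)


def predict_with_confidence(model, vector, threshold=0.7):
    """Return (label, confidence, needs_review) for one feature vector.

    The threshold is an illustrative assumption, not a tuned value.
    """
    probs = model.predict_proba([vector])[0]
    best = probs.argmax()
    label = str(model.classes_[best])
    confidence = float(probs[best])
    return label, confidence, confidence < threshold


label, confidence, needs_review = predict_with_confidence(toy_clf, [10.5, 10.5])
print(label, round(confidence, 2), needs_review)
```

The same helper should work with the `clf` trained above by passing a real embedding as `vector`.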
Step 4: Retrieve similar tickets
When a new ticket arrives, we find the closest historical matches by cosine similarity over the same embedding space.
from sklearn.metrics.pairwise import cosine_similarity
def retrieve_similar(embedding, tickets, top_k=2):
    embeddings = np.array([t["embedding"] for t in tickets])
    sims = cosine_similarity([embedding], embeddings)[0]
    top_indices = np.argsort(sims)[-top_k:][::-1]
    return [tickets[i] for i in top_indices]
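To make the ranking concrete, here is a dependency-free sketch of the same idea on toy 2-D vectors: score every stored vector against the query by cosine similarity, then keep the `top_k` highest scores, best first. The names `cosine`, `top_k_ids`, and `toy_vectors` are hypothetical and the vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_ids(query, vectors, top_k=2):
    """Return the ids of the top_k vectors most similar to query, best first."""
    scored = sorted(
        vectors.items(),
        key=lambda kv: cosine(query, kv[1]),
        reverse=True,
    )
    return [ticket_id for ticket_id, _ in scored[:top_k]]


toy_vectors = {
    "db": [1.0, 0.0],
    "css": [0.0, 1.0],
    "api": [0.8, 0.6],
}
print(top_k_ids([1.0, 0.1], toy_vectors))  # ['db', 'api']
```

The query points almost along the "db" axis, so "db" ranks first, "api" second, and "css" is dropped.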
Step 5: Define the system prompt
The system prompt tells the LLM how to behave, referencing the classification and retrieved context we will inject.
SYSTEM_PROMPT = """You are a senior support engineer assistant.
Your job is to:
1. Acknowledge the user's issue in one sentence.
2. State the predicted priority and explain why based on similar tickets.
3. Draft a concise, actionable reply that references the resolution of the most relevant historical ticket.
Keep the tone technical and direct. Do not ask the user to verify information already provided."""
Step 6: Assemble the inference pipeline
This function ties the pieces together: embed the new ticket, predict priority, retrieve neighbors, and call Oxlo.ai's Llama 3.3 70B to generate the response.
def process_ticket(ticket_text):
    # Embed the incoming ticket
    emb = embed_text(ticket_text)

    # Predict priority using the local classifier
    priority = clf.predict([emb])[0]

    # Retrieve similar historical tickets
    neighbors = retrieve_similar(emb, historical_tickets, top_k=2)

    # Build the context block
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )

    user_message = (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )

    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return priority, response.choices[0].message.content
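The prompt assembly inside `process_ticket` is the part most worth inspecting, because it determines what the LLM actually sees. One option is to pull it into a pure function that can be printed or unit-tested without any API call. This is a sketch: `build_user_message` is a hypothetical name, and it mirrors the formatting used above.

```python
def build_user_message(ticket_text, priority, neighbors):
    """Format the user message the same way process_ticket does, with no I/O."""
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )
    return (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )


# One neighbor copied from the historical tickets in Step 2.
sample_neighbors = [
    {
        "text": "Webhook signatures fail verification.",
        "priority": "high",
        "resolution": "Document the HMAC-SHA256 format and provide a code sample.",
    }
]
message = build_user_message("Requests return 401.", "high", sample_neighbors)
print(message)
```

Printing the message before sending it makes it easier to spot cases where the retrieved neighbors are unrelated to the new ticket.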
Run it
Here is a realistic incoming ticket and the output from the pipeline.
ticket = "API keys rotated this morning and now all requests return 401 Unauthorized."
priority, reply = process_ticket(ticket)
print(f"Predicted priority: {priority}")
print("---")
print(reply)
Example output:
Predicted priority: high
---
We see you are hitting authentication failures after a key rotation.
Priority: high. This matches previous high-priority incidents involving verification and access issues.
Action: Please verify that your integration is using the newly issued key format. Refer to the resolution for webhook signatures fail verification to adjust your header calculation and ensure the HMAC-SHA256 code sample is applied correctly.
Wrap-up
This pipeline shows how to treat an LLM as one component in a broader ML system rather than a monolithic black box. Oxlo.ai simplifies the architecture because embeddings and chat completions share one endpoint and one request-based pricing model, which keeps costs predictable when you scale to thousands of tickets. A concrete next step is to replace the random forest with a fine-tuned classifier hosted on Oxlo.ai, or stream the generated replies into a Slack webhook for real-time triage.
