URL: https://dev.to/shashank_ms_6a35baa4be138/integrating-llm-with-other-machine-learning-models-c0o

Integrating LLM with Other Machine Learning Models


We are building a support ticket intelligence pipeline that classifies incoming requests and drafts contextual replies by combining Oxlo.ai embeddings, a local random forest classifier, and an LLM. It helps support teams cut response time by automating triage and surfacing relevant historical resolutions. The embedding and text generation calls both go through Oxlo.ai's OpenAI-compatible API, so we can standardize on one provider for those workloads, while the classifier runs locally with scikit-learn.

What you'll need

  • Python 3.10+
  • An Oxlo.ai API key from https://portal.oxlo.ai
  • The OpenAI SDK, scikit-learn, and NumPy: pip install openai scikit-learn numpy

Step 1: Configure the Oxlo.ai client

I start by instantiating the client. Oxlo.ai exposes a fully OpenAI-compatible endpoint, so the standard SDK works without wrappers.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"],
)

Step 2: Generate embeddings for historical tickets

We need a knowledge base of resolved tickets. I embed them using Oxlo.ai's BGE-Large model so that the same vectors can serve both similarity retrieval and classifier training.

import numpy as np

historical_tickets = [
 {"id": 1, "text": "Cannot connect to database after latest deploy.", "priority": "high", "resolution": "Rollback the migration and re-run with SSL enabled."},
 {"id": 2, "text": "Logo is misaligned on the billing page in Safari.", "priority": "low", "resolution": "Add flex-center to the container CSS."},
 {"id": 3, "text": "Intermittent 500 errors on checkout API.", "priority": "high", "resolution": "Increase the connection pool size in Redis."},
 {"id": 4, "text": "Dark mode toggle missing in settings.", "priority": "low", "resolution": "Shipped in v2.4.1 under Appearance."},
 {"id": 5, "text": "Webhook signatures fail verification.", "priority": "high", "resolution": "Document the HMAC-SHA256 format and provide a code sample."},
]

def embed_text(text):
    response = client.embeddings.create(
        input=text,
        model="bge-large",
    )
    return response.data[0].embedding

for t in historical_tickets:
    t["embedding"] = embed_text(t["text"])

print(f"Embedded {len(historical_tickets)} tickets.")
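
Embedding tickets one request at a time is fine for five rows. For a larger archive, the OpenAI SDK's embeddings.create also accepts a list of strings, so a batched variant might look like the sketch below. It assumes Oxlo.ai's endpoint honors list inputs the same way the OpenAI API does, which this article does not confirm. The names embed_batch and batch_size are illustrative, not part of the original pipeline.

```python
def embed_batch(client, texts, model="bge-large", batch_size=64):
    """Embed many texts with one request per batch instead of one per text.

    Assumption: the endpoint accepts a list for `input`, as the OpenAI
    embeddings API does. `batch_size` is an arbitrary illustrative value.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        response = client.embeddings.create(input=chunk, model=model)
        # Sort by `index` so the vectors line up with the input order.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

Usage would then be vectors = embed_batch(client, [t["text"] for t in historical_tickets]), followed by assigning each vector back to its ticket.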

Step 3: Train a local priority classifier

Now we treat the embeddings as feature vectors and train a lightweight random forest to predict priority. This is where traditional ML meets the LLM stack.

from sklearn.ensemble import RandomForestClassifier

X = np.array([t["embedding"] for t in historical_tickets])
y = [t["priority"] for t in historical_tickets]

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

print("Classifier trained on embedding features.")
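
Five labeled tickets are enough to demonstrate the wiring but not to trust the predictions. Before relying on the classifier, it is worth measuring held-out accuracy. The sketch below uses scikit-learn's leave-one-out cross-validation on synthetic stand-in vectors. The random data, the 8-dimensional features, and the helper name loo_accuracy are all assumptions for illustration. In the real pipeline, X and y from above would be passed in instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score


def loo_accuracy(X, y, random_state=42):
    """Return per-sample leave-one-out scores (1.0 = correct, 0.0 = wrong)."""
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    # Each fold trains on all samples but one and tests on the one left out.
    return cross_val_score(clf, X, y, cv=LeaveOneOut())


# Synthetic stand-in for ticket embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=-3.0, scale=1.0, size=(10, 8)),  # "low" priority
    rng.normal(loc=3.0, scale=1.0, size=(10, 8)),   # "high" priority
])
y_demo = np.array(["low"] * 10 + ["high"] * 10)

scores = loo_accuracy(X_demo, y_demo)
print(f"Leave-one-out accuracy: {scores.mean():.2f} over {len(scores)} samples")
```

Leave-one-out suits very small datasets because every sample is used for testing exactly once. With more data, a k-fold split would be cheaper.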

Step 4: Retrieve similar tickets

When a new ticket arrives, we find the closest historical matches by cosine similarity over the same embedding space.

from sklearn.metrics.pairwise import cosine_similarity

def retrieve_similar(embedding, tickets, top_k=2):
    embeddings = np.array([t["embedding"] for t in tickets])
    sims = cosine_similarity([embedding], embeddings)[0]
    top_indices = np.argsort(sims)[-top_k:][::-1]
    return [tickets[i] for i in top_indices]
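
To make the ranking concrete, here is a dependency-free sketch of what cosine similarity plus top-k selection computes. The three-dimensional vectors and the names cosine and top_k_indices are made up for illustration. Real embeddings have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_indices(query, vectors, top_k=2):
    """Indices of the top_k vectors most similar to query, best first."""
    sims = [cosine(query, v) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: sims[i], reverse=True)
    return ranked[:top_k]


# Toy vectors, not real embeddings.
stored = [
    [1.0, 0.0, 0.0],  # index 0: points along x
    [0.0, 1.0, 0.0],  # index 1: points along y
    [0.9, 0.1, 0.0],  # index 2: nearly along x
]
query = [1.0, 0.05, 0.0]

print(top_k_indices(query, stored))  # [0, 2]
```

Only direction matters here, not magnitude: scaling the query by any positive constant leaves the ranking unchanged.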

Step 5: Define the system prompt

The system prompt tells the LLM how to behave, referencing the classification and retrieved context we will inject.

SYSTEM_PROMPT = """You are a senior support engineer assistant.

Your job is to:
1. Acknowledge the user's issue in one sentence.
2. State the predicted priority and explain why based on similar tickets.
3. Draft a concise, actionable reply that references the resolution of the most relevant historical ticket.

Keep the tone technical and direct. Do not ask the user to verify information already provided."""

Step 6: Assemble the inference pipeline

This function ties the pieces together: embed the new ticket, predict priority, retrieve neighbors, and call Oxlo.ai's Llama 3.3 70B to generate the response.

def process_ticket(ticket_text):
    # Embed the incoming ticket
    emb = embed_text(ticket_text)

    # Predict priority using the local classifier
    priority = clf.predict([emb])[0]

    # Retrieve similar historical tickets
    neighbors = retrieve_similar(emb, historical_tickets, top_k=2)

    # Build the context block
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )

    user_message = (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )

    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )

    return priority, response.choices[0].message.content
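
Because the user message is the only place the LLM sees the classifier output and the retrieved tickets, it helps to inspect that string without spending an API call. One way, sketched below, is to factor the string building out of process_ticket into a pure function. The name build_user_message is hypothetical, and the formatting simply mirrors the function above.

```python
def build_user_message(ticket_text, priority, neighbors):
    """Format the new ticket, predicted priority, and retrieved context."""
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )
    return (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )


# One neighbor, shaped like the entries in historical_tickets.
demo_neighbors = [
    {
        "text": "Webhook signatures fail verification.",
        "priority": "high",
        "resolution": "Document the HMAC-SHA256 format and provide a code sample.",
    },
]

demo_message = build_user_message("All requests return 401.", "high", demo_neighbors)
print(demo_message)
```

A function like this can be unit tested offline, which makes prompt changes easier to review than edits buried inside the API call.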

Run it

Here is a realistic incoming ticket and the output from the pipeline.

ticket = "API keys rotated this morning and now all requests return 401 Unauthorized."

priority, reply = process_ticket(ticket)

print(f"Predicted priority: {priority}")
print("---")
print(reply)

Example output:

Predicted priority: high
---

We see you are hitting authentication failures after a key rotation.

Priority: high. This matches previous high-priority incidents involving verification and access issues.

Action: Please verify that your integration is using the newly issued key format. Refer to the resolution for webhook signatures fail verification to adjust your header calculation and ensure the HMAC-SHA256 code sample is applied correctly.

Wrap-up

This pipeline shows how to treat an LLM as one component in a broader ML system rather than a monolithic black box. Oxlo.ai simplifies the architecture because embeddings and chat completions share one endpoint and one request-based pricing model, which keeps costs predictable when you scale to thousands of tickets. A concrete next step is to replace the random forest with a fine-tuned classifier hosted on Oxlo.ai, or stream the generated replies into a Slack webhook for real-time triage.
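
For the Slack idea, a minimal sketch: Slack incoming webhooks accept an HTTP POST with a JSON body containing a text field. The helper below only builds the request so it can be inspected offline, and it posts a complete reply rather than streaming tokens. The webhook URL is a placeholder, build_slack_request is an illustrative name, and the message layout is an assumption rather than part of the pipeline above.

```python
import json
import urllib.request


def build_slack_request(webhook_url, priority, reply):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = {"text": f"*Predicted priority:* {priority}\n{reply}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real one comes from your Slack app configuration.
req = build_slack_request(
    "https://hooks.slack.com/services/PLACEHOLDER",
    "high",
    "We see you are hitting authentication failures after a key rotation.",
)
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req, timeout=10)
```

In practice the priority and reply arguments would come straight from process_ticket(ticket).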