VOOZH about

URL: https://dev.to/shashank_ms_6a35baa4be138/code-analysis-with-llm-2l9

⇱ Code Analysis with LLM - DEV Community


We are building a lightweight CLI tool that feeds Python source files to an LLM and returns structured bug reports, security flags, and refactoring hints. It helps developers catch issues before opening pull requests or running heavy static analysis suites. Because Oxlo.ai charges a flat rate per request, you can pass entire files or long stack traces without worrying about token meter creep.

What you'll need

Step 1: Scaffold the project and read source files

Create a new directory and a file named analyzer.py. We will use the standard library ast module to extract function and class bodies so the LLM receives logical chunks instead of one undifferentiated wall of text.

import ast
import os
from pathlib import Path

def get_python_files(root: str):
 return [p for p in Path(root).rglob("*.py") if ".venv" not in p.parts]

def extract_chunks(source: str, filename: str):
 tree = ast.parse(source)
 chunks = []
 for node in ast.walk(tree):
 if isinstance(node, (ast.FunctionDef, ast.ClassDef, ast.AsyncFunctionDef)):
 start = node.lineno
 end = node.end_lineno or start
 snippet = "\n".join(source.splitlines()[start - 1:end])
 chunks.append({"name": node.name, "lines": (start, end), "code": snippet})
 if not chunks:
 chunks.append({"name": "module", "lines": (1, len(source.splitlines())), "code": source})
 return chunks

Step 2: Lock down the system prompt

The system prompt forces the model to emit strict JSON and prevents vague rambling. I keep it in its own variable so I can tweak severity thresholds without touching the API logic.

SYSTEM_PROMPT = """You are a senior staff engineer performing code review.
Analyze the provided Python code snippet and report findings as a JSON object with this exact schema:
{
 "findings": [
 {
 "line": integer or null,
 "severity": "low" | "medium" | "high",
 "category": "bug" | "security" | "performance" | "style",
 "message": "concise explanation",
 "fix": "concrete code suggestion or null"
 }
 ]
}
Rules:
- Return ONLY the JSON object, no markdown fences.
- If there are no issues, return {"findings": []}.
- Prioritize correctness and security over style nits."""

Step 3: Call Oxlo.ai for analysis

We initialize the OpenAI client pointing at Oxlo.ai and send each chunk. I use DeepSeek V3.2 here because it handles coding and reasoning well, and Oxlo.ai serves it with no cold starts. The flat per-request pricing means a 300-line file costs the same as a 20-line file, so we do not need to truncate context to save money.

from openai import OpenAI
import json

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=os.environ.get("OXLO_API_KEY"))

def analyze_chunk(chunk: dict, filename: str):
 user_message = f"File: {filename}\nFunction/Class: {chunk['name']}\nLines: {chunk['lines'][0]}-{chunk['lines'][1]}\n\n

```python\n{chunk['code']}\n```

"
 response = client.chat.completions.create(
 model="deepseek-v3.2",
 messages=[
 {"role": "system", "content": SYSTEM_PROMPT},
 {"role": "user", "content": user_message},
 ],
 temperature=0.2,
 )
 raw = response.choices[0].message.content.strip()
 try:
 return json.loads(raw)
 except json.JSONDecodeError:
 return {"findings": [{"line": None, "severity": "low", "category": "style", "message": "Model returned non-JSON. Raw output logged.", "fix": None}], "raw": raw}

Step 4: Render readable reports

Raw JSON is not friendly in a terminal. This formatter collapses duplicates and prints severity-colored lines. It also aggregates findings across an entire directory.

def print_report(filename: str, findings: list):
 if not findings:
 return
 print(f"\n{filename}")
 print("-" * len(filename))
 for f in findings:
 line = f.get("line") or "?"
 sev = f.get("severity", "low").upper()
 cat = f.get("category", "general")
 msg = f.get("message", "No details")
 fix = f.get("fix")
 print(f" [{sev}] Line {line} ({cat}): {msg}")
 if fix:
 print(f" Suggested fix: {fix}")

def analyze_file(path: Path):
 source = path.read_text(encoding="utf-8")
 chunks = extract_chunks(source, path.name)
 all_findings = []
 for chunk in chunks:
 result = analyze_chunk(chunk, str(path))
 for finding in result.get("findings", []):
 if finding.get("line") is None:
 finding["line"] = chunk["lines"][0]
 all_findings.append(finding)
 print_report(str(path), all_findings)

Step 5: Batch analyze a directory

The entry point accepts a path, discovers Python files while skipping virtual environments, and runs the analyzer on each file. This is the glue that turns the script into a daily tool.

if __name__ == "__main__":
 import sys
 target = sys.argv[1] if len(sys.argv) > 1 else "."
 if Path(target).is_file():
 analyze_file(Path(target))
 else:
 for py_file in get_python_files(target):
 analyze_file(py_file)

Run it

Create a deliberately buggy file named example.py and point the script at it.

# example.py
import os

def fetch_user_data(user_id):
 query = f"SELECT * FROM users WHERE id = {user_id}"
 os.system(f"echo {query}")
 return {"id": user_id, "admin": True}

def divide(a, b):
 return a / b

Run the analyzer from your terminal.

export OXLO_API_KEY="YOUR_OXLO_API_KEY"
python analyzer.py example.py

Expected output looks something like this. Exact wording will vary by model temperature, but the structure should stay consistent.

example.py
------------
 [HIGH] Line 4 (security): SQL injection risk from unsanitized user_id interpolation.
 Suggested fix: Use parameterized queries, e.g., cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
 [HIGH] Line 5 (security): os.system call with untrusted input enables command injection.
 Suggested fix: Use subprocess.run with a list of arguments or avoid shell execution.
 [MEDIUM] Line 9 (bug): Potential ZeroDivisionError when b is zero.
 Suggested fix: Add a guard clause, e.g., if b == 0: raise ValueError("division by zero")

Wrap-up

You now have a working LLM-powered code analyzer that runs on flat per-request pricing through Oxlo.ai. Two concrete next steps: wire the script into a Git pre-commit hook so it blocks commits on high-severity findings, or swap the model to qwen-3-32b on Oxlo.ai if you are analyzing multilingual codebases that mix Python with documentation in other languages. For pricing details, see https://oxlo.ai/pricing.