CodeLens-7B

A fine-tuned Qwen2.5-7B-Instruct model specialized for code review, bug detection, and programming assistance. It analyzes code snippets, identifies issues, suggests improvements, and writes clean solutions across multiple programming languages.

Key Details


Base model	Qwen/Qwen2.5-7B-Instruct
Method	QLoRA (4-bit NF4, rank 16, alpha 16)
Library	Unsloth + TRL SFTTrainer
Dataset	sahil2801/CodeAlpaca-20k (10K examples)
Hardware	NVIDIA RTX A5000 (24GB VRAM) on RunPod
Training time	~2.65 hours (500 steps)
Final loss	0.450
Parameters trained	40.4M of 7.66B (0.53%)
Format	ChatML
Output	Merged 16-bit safetensors
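
The trainable-parameter figure above can be reproduced with a short calculation. The sketch below assumes rank-16 adapters on all seven linear projections of each decoder layer (the card does not list the target modules) and uses layer shapes from the published Qwen2.5-7B configuration. Each adapted projection of shape in × out adds r × (in + out) parameters.

```python
# Back-of-the-envelope LoRA parameter count for Qwen2.5-7B at rank 16.
# Assumptions (not stated in this card): adapters on the q/k/v/o/gate/up/down
# projections, and layer shapes taken from the Qwen2.5-7B config.
RANK = 16
NUM_LAYERS = 28
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128

# (in_features, out_features) of each adapted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, projections, num_layers):
    """A is (rank x in) and B is (out x rank), so each projection adds rank * (in + out)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in projections.values())
    return per_layer * num_layers


trainable = lora_params(RANK, PROJECTIONS, NUM_LAYERS)
total = 7.66e9  # total parameter count as reported in the table above
print(f"{trainable / 1e6:.1f}M trainable, {100 * trainable / total:.2f}% of total")
```

Under these assumptions the count comes to 40,370,176 parameters, which agrees with the 40.4M (0.53%) reported above.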

Dataset

Trained on 10,000 examples from sahil2801/CodeAlpaca-20k, a code instruction-following dataset covering code generation, debugging, explanation, and review tasks across Python, JavaScript, Java, C, SQL, and more.
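
To make the data format concrete, the sketch below turns one Alpaca-style record (the instruction, input, and output fields that CodeAlpaca-20k uses) into ChatML text. The helper names, the shortened system prompt, the made-up example record, and the way the optional input is appended are all assumptions for illustration; the actual preprocessing lives in the training scripts and may differ.

```python
# Shortened system prompt, assumed for illustration only.
SYSTEM_PROMPT = "You are an expert code reviewer and programmer."


def record_to_messages(record):
    """Map an Alpaca-style record to chat messages (hypothetical helper)."""
    user = record["instruction"]
    if record.get("input"):
        # Append the optional context below the instruction.
        user += "\n\n" + record["input"]
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": record["output"]},
    ]


def to_chatml(messages):
    """Render messages with ChatML delimiters: <|im_start|>role ... <|im_end|>."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


# Made-up record in the dataset's field layout, not an actual dataset row.
example = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "",
    "output": "def square(x):\n    return x * x",
}
print(to_chatml(record_to_messages(example)))
```

In practice the tokenizer's own chat template (tokenizer.apply_chat_template) should produce this layout, so a hand-rolled renderer like this is mainly useful for inspecting what the model sees.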

Usage

Transformers

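from transformers import AutoModelForCausalLM, AutoTokenizer

# dtype="auto" loads the stored 16-bit weights (older Transformers releases call this torch_dtype);
# device_map="auto" places them on a GPU when one is available (requires accelerate).
model = AutoModelForCausalLM.from_pretrained("sriksven/CodeLens-7B", dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sriksven/CodeLens-7B")

messages = [
    {
        "role": "system",
        "content": "You are an expert code reviewer and programmer. Analyze code, find bugs, suggest improvements, and write clean efficient solutions.",
    },
    {
        "role": "user",
        "content": "Review this Python function for bugs and improvements:\n\ndef find_duplicates(lst):\n    seen = []\n    dupes = []\n    for i in lst:\n        if i in seen:\n            dupes.append(i)\n        seen.append(i)\n    return dupes",
    },
]

# Render the ChatML prompt and tokenize it; return_dict=True yields input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))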
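
For reference when judging the model's answer to the prompt above: the sample function tests membership against a list, which makes it quadratic, and it reports a value once for every repeat occurrence. The hand-written revision below (not model output) is one reasonable target, assuming the elements are hashable and each duplicate should be reported once, in order of first repeat.

```python
def find_duplicates(items):
    """Return each value that occurs more than once, in order of first repeat."""
    seen = set()      # values encountered so far (O(1) membership tests)
    reported = set()  # duplicates already added to the result
    dupes = []
    for item in items:
        if item in seen and item not in reported:
            dupes.append(item)
            reported.add(item)
        seen.add(item)
    return dupes


print(find_duplicates([1, 2, 2, 3, 3, 3]))  # [2, 3]
```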

Unsloth (faster inference)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sriksven/CodeLens-7B",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Switch the model to Unsloth's inference mode before calling generate().
FastLanguageModel.for_inference(model)

Capabilities

Code review — analyze code for bugs, anti-patterns, and style issues
Bug detection — identify logical errors, off-by-one mistakes, edge cases
Code generation — write functions, classes, and scripts from descriptions
Code explanation — explain what a piece of code does step by step
Refactoring suggestions — propose cleaner, more efficient alternatives
Multi-language — Python, JavaScript, Java, C/C++, SQL, HTML/CSS, and more

Intended Use

Local code review assistant
Programming tutoring and education
Code quality tooling in CI/CD pipelines
Prototyping developer tools with local LLMs

Limitations

Trained on instruction-following code data, not real code review conversations from PRs
May not catch security vulnerabilities that require deep context
Code suggestions should be tested before use in production
Best with shorter code snippets (functions/classes) rather than full files
No execution or testing capability — suggestions are pattern-based
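
Because the model works best on function- or class-sized snippets, a full Python file can be split before review. The sketch below uses the standard-library ast module to pull out top-level functions and classes; the helper name is hypothetical, and other languages would need their own parser.

```python
import ast


def split_into_snippets(source):
    """Return (name, source_text) for each top-level function or class.

    Decorator lines are not included, because a node's recorded position
    starts at its def/class line.
    """
    tree = ast.parse(source)
    snippets = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            snippets.append((node.name, ast.get_source_segment(source, node)))
    return snippets


sample = '''
import os

def add(a, b):
    return a + b

class Counter:
    def __init__(self):
        self.n = 0
'''

for name, text in split_into_snippets(sample):
    print(f"--- {name} ---")
    print(text)
```

Each snippet can then be placed in its own user message, which keeps prompts well within the 2048-token sequence length used in the Unsloth example.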

Training Metrics

Training loss fell steadily from 2.17 at step 10 to 0.28 at step 500 (~13 epochs), showing that the model fit the code instruction data closely.

Step	Loss	Epoch
10	2.168	0.26
100	0.503	2.05
250	0.430	6.41
400	0.310	10.26
500	0.278	12.83
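
The step and epoch columns also hint at how much packing shortened the dataset. Using the final row and the effective batch size of 16 from the Training Infrastructure table, the arithmetic below estimates the number of packed sequences per epoch. The result is an approximation derived from rounded logged values, not a figure reported by the training run.

```python
final_step = 500
final_epoch = 12.83
effective_batch = 16   # 4 per device x 4 accumulation steps
num_examples = 10_000  # examples sampled from CodeAlpaca-20k

# Optimizer steps needed to pass over the packed dataset once.
steps_per_epoch = final_step / final_epoch
# Each optimizer step consumes `effective_batch` packed sequences.
packed_sequences = steps_per_epoch * effective_batch
# Average number of original examples that share one packed sequence.
examples_per_sequence = num_examples / packed_sequences

print(f"{steps_per_epoch:.1f} steps per epoch")
print(f"~{packed_sequences:.0f} packed sequences per epoch")
print(f"~{examples_per_sequence:.0f} examples per packed sequence")
```

This works out to roughly 39 steps per epoch, or about 624 packed sequences holding around 16 short examples each.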

Training Infrastructure


GPU	NVIDIA RTX A5000 24GB
Cloud	RunPod ($0.27/hr)
Framework	Unsloth 2026.5.2 + TRL + Transformers 5.5.0
Precision	BF16 training, 4-bit NF4 base quantization
Optimizer	AdamW 8-bit
Learning rate	2e-4, linear decay
Batch size	16 effective (4 per device × 4 accumulation)
Packing	Enabled
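
The settings in this table imply a few derived quantities. The sketch below computes the effective batch size, a linear-decay learning rate, and the approximate rental cost of the run, using the training time and step count from Key Details. It assumes no warmup and decay to zero at step 500, neither of which is stated in this card.

```python
PEAK_LR = 2e-4
TOTAL_STEPS = 500
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
HOURLY_RATE_USD = 0.27
TRAINING_HOURS = 2.65


def linear_decay_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS):
    """Learning rate after `step` optimizer steps, decaying linearly to 0.

    Assumes no warmup phase (an assumption; the card does not say).
    """
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / total_steps


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
cost = HOURLY_RATE_USD * TRAINING_HOURS
seconds_per_step = TRAINING_HOURS * 3600 / TOTAL_STEPS

print(f"effective batch size: {effective_batch}")
print(f"LR at step 250: {linear_decay_lr(250):.1e}")
print(f"estimated GPU cost: ${cost:.2f}")
print(f"{seconds_per_step:.1f} s per optimizer step")
```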

Source Code

Training scripts: github.com/sriksven/LLM-FineTune-Suite

License

Apache 2.0

URL: https://huggingface.co/sriksven/ExtractIQ-7B
