AUTOMOTIVE
Production-ready domain-adapted variant of Qwen2.5-3B-Instruct, fine-tuned on automotive instruction-following data using QLoRA with Unsloth optimization. Features comprehensive evaluation pipeline, dataset engineering, and experiment tracking for enterprise-grade LLM development.
Specialized toward automotive question answering, diagnostic explanations, vehicle maintenance assistance, and technical guidance.
Fine-tuned on a curated subset of 20,000 samples from BAAI/IndustryInstruction_Automobiles, prepared with a data engineering pipeline that applies automated quality controls and dataset versioning.
Duplicate detection and removal (exact + near-duplicate) · Quality scoring with flagging system · Token length filtering (10–512 tokens) · Malformed structure detection · Automated quality reporting · Dataset versioning with quality tracking
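The filtering steps listed above can be illustrated with a small, self-contained sketch. It is not the pipeline used for this model: the field names `instruction` and `output`, the whitespace-based `count_tokens` stand-in, and the helper `clean_samples` are all assumptions made for illustration. It covers malformed-structure checks, the 10–512 token window, and duplicates that match after normalization; true near-duplicate detection (for example MinHash) is left out.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def count_tokens(text: str) -> int:
    """Stand-in counter (whitespace split). Assumption: the real pipeline
    would count tokens with the model tokenizer instead."""
    return len(text.split())


def clean_samples(samples, min_tokens=10, max_tokens=512):
    """Drop malformed, out-of-range, and duplicate samples.

    Returns (kept, report), where report counts why samples were dropped.
    The 'instruction'/'output' field names are hypothetical.
    """
    seen = set()
    kept = []
    report = {"malformed": 0, "length": 0, "duplicate": 0}
    for sample in samples:
        instruction = sample.get("instruction")
        output = sample.get("output")
        # Malformed: a field is missing, not a string, or blank.
        fields_ok = all(
            isinstance(value, str) and value.strip()
            for value in (instruction, output)
        )
        if not fields_ok:
            report["malformed"] += 1
            continue
        # Length filter on the combined instruction + output size.
        n_tokens = count_tokens(instruction) + count_tokens(output)
        if n_tokens < min_tokens or n_tokens > max_tokens:
            report["length"] += 1
            continue
        # Duplicate check on a hash of the normalized text.
        digest = hashlib.sha256(
            normalize(instruction + "\n" + output).encode("utf-8")
        ).hexdigest()
        if digest in seen:
            report["duplicate"] += 1
            continue
        seen.add(digest)
        kept.append(sample)
    return kept, report
```

The returned `report` dictionary is one simple way to feed an automated quality report: each dropped sample is attributed to exactly one reason, so the counts add up to the number of removed rows.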
Validation-based early stopping · Overfitting detection (threshold: 2.0) · Best checkpoint selection via eval_loss · Gradient checkpointing for memory efficiency · MLflow experiment tracking with DagsHub · Target modules: all attention + MLP projections
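As a rough illustration of how validation-based early stopping, best-checkpoint selection, and an overfitting flag can fit together, here is a minimal sketch in plain Python. The card does not say what the 2.0 threshold measures; the sketch assumes it is the ratio of `eval_loss` to `train_loss`, which may differ from the actual training code. The function name `select_checkpoint`, the `patience` parameter, and the record layout are likewise hypothetical.

```python
def select_checkpoint(history, patience=3, overfit_threshold=2.0):
    """Scan evaluation records in order and decide where training would stop.

    history: list of dicts with 'step', 'train_loss', and 'eval_loss'.
    Assumption: overfitting is flagged when eval_loss / train_loss
    exceeds overfit_threshold.
    """
    best_step = None
    best_eval = float("inf")
    evals_without_improvement = 0
    stopped_at = None
    overfit_steps = []

    for record in history:
        # Flag steps where validation loss is far above training loss.
        if record["train_loss"] > 0:
            ratio = record["eval_loss"] / record["train_loss"]
            if ratio > overfit_threshold:
                overfit_steps.append(record["step"])

        if record["eval_loss"] < best_eval:
            # New best checkpoint: remember it and reset the patience counter.
            best_eval = record["eval_loss"]
            best_step = record["step"]
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                stopped_at = record["step"]
                break

    return {
        "best_step": best_step,
        "best_eval_loss": best_eval,
        "stopped_at": stopped_at,
        "overfit_steps": overfit_steps,
    }
```

In a Hugging Face `Trainer` setup, comparable behavior is usually obtained with `EarlyStoppingCallback` together with `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"`; the sketch above only makes the decision logic explicit.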
MULTI-METRIC EVALUATION SUITE · 20 SAMPLES · LLM-AS-A-JUDGE: GROQ LLAMA 3.3-70B
Automated dataset quality analysis · Duplicate detection and removal · Quality scoring and filtering · Dataset versioning system · Comprehensive quality reporting
# Load model from Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Nasim435/Qwen-3B-Automotive-20K"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

prompt = "Explain symptoms of a failing alternator and diagnostic steps."
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- Research model — experimental fine-tune, not intended for production safety systems
- Hallucination risk — may generate inaccurate automotive advice (8.4/10 risk score)
- Safety critical — not suitable for safety-critical or professional mechanical decision-making
- Domain scope — trained on 20K samples; generalization beyond automotive may be limited
- Quality assurance — 96.6% data quality score with 263 flagged samples requiring review
Real-time GPU utilization tracking · Memory usage profiling · Training throughput monitoring · Automated performance benchmarking
MIT License · English · 2026
Production-Ready Automotive AI Assistant
