URL: https://huggingface.co/LARK-Lab/CodeScaler-1.7B

LARK-Lab/CodeScaler-1.7B · Hugging Face


CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

GitHub Code
GitHub Page
Datasets on Hugging Face
CodeScaler on Hugging Face

Overview

We propose CodeScaler, an execution-free reward model designed to scale both reinforcement learning training and test-time inference for code generation. CodeScaler is trained on carefully curated preference data derived from verified code problems and incorporates syntax-aware code extraction and validity-preserving reward shaping to ensure stable and robust optimization.

This model is the official CodeScaler-1.7B, trained from Skywork/Skywork-Reward-V2-Qwen3-1.7B on LARK-Lab/CodeScalerPair-51K.
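The overview mentions syntax-aware code extraction and validity-preserving reward shaping without spelling out either procedure. The sketch below is one plausible reading, not the authors' implementation: it pulls the last fenced block out of a model response, checks that it parses as Python without executing it, and falls back to a fixed reward when it does not. The helper names (`extract_code`, `is_valid_python`, `shaped_reward`) and the `invalid_reward` default are assumptions made for illustration.

```python
import ast
import re

# Matches a fenced block, optionally tagged python/py.
# The fence is written as `{3} (a backtick repeated three times) in the pattern.
_FENCE_RE = re.compile(r"`{3}(?:python|py)?[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_code(response: str) -> str:
    """Return the last fenced code block, or the whole response if none is fenced."""
    blocks = _FENCE_RE.findall(response)
    return blocks[-1].strip() if blocks else response.strip()


def is_valid_python(code: str) -> bool:
    """True if the code is non-empty and parses; nothing is executed."""
    if not code:
        return False
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def shaped_reward(response: str, rm_score: float, invalid_reward: float = -1.0) -> float:
    """Hypothetical shaping: keep the RM score only when the extracted code parses."""
    return rm_score if is_valid_python(extract_code(response)) else invalid_reward
```

A check like this keeps the pipeline execution-free: `ast.parse` only builds a syntax tree, so no candidate program is ever run.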

Performance on RM-Bench

| Model | Code | Chat | Math | Safety | Easy | Normal | Hard | Avg |
|---|---|---|---|---|---|---|---|---|
| Skywork/Skywork-Reward-Llama-3.1-8B | 54.5 | 69.5 | 60.6 | 95.7 | 89 | 74.7 | 46.6 | 70.1 |
| TIGER-Lab/AceCodeRM-7B | 66.9 | 66.7 | 65.3 | 89.9 | 79.9 | 74.4 | 62.2 | 72.2 |
| TIGER-Lab/AceCoder-RM-32B | 72.1 | 73.7 | 70.5 | 88 | 84.5 | 78.3 | 65.5 | 76.1 |
| Skywork/Skywork-Reward-V2-Qwen3-1.7B | 72.3 | 69.6 | 71.4 | 92.9 | 92.8 | 82.3 | 54.5 | 76.6 |
| Skywork/Skywork-Reward-V2-Qwen3-4B | 74.4 | 78.2 | 73.6 | 95.7 | 92.1 | 85 | 64.4 | 80.5 |
| Skywork/Skywork-Reward-V2-Qwen3-8B | 73.6 | 80.6 | 75 | 96.5 | 91.8 | 85.5 | 67 | 80.5 |
| CodeScaler-1.7B (this model) | 73.1 | 74.4 | 74.7 | 93.1 | 91.7 | 83.2 | 61.5 | 78.8 |
| CodeScaler-4B | 76.3 | 80.4 | 79 | 95.8 | 92.9 | 86.5 | 69.2 | 82.9 |
| CodeScaler-8B | 76.9 | 83 | 79.9 | 96.4 | 92.5 | 87.9 | 71.8 | 84.1 |
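The table does not define the Avg column. Going by the numbers alone, for the three CodeScaler rows it agrees, to rounding, with the unweighted mean of the four domain scores (Code, Chat, Math, Safety). The short check below reproduces that arithmetic; treating Avg as a domain mean is an inference from the table, not a definition stated in the source.

```python
from statistics import mean

# (Code, Chat, Math, Safety) scores and the reported Avg, copied from the table.
rows = {
    "CodeScaler-1.7B": ([73.1, 74.4, 74.7, 93.1], 78.8),
    "CodeScaler-4B": ([76.3, 80.4, 79.0, 95.8], 82.9),
    "CodeScaler-8B": ([76.9, 83.0, 79.9, 96.4], 84.1),
}


def domain_mean(domain_scores):
    """Unweighted mean of the four domain scores."""
    return mean(domain_scores)


for name, (domains, reported) in rows.items():
    print(f"{name}: domain mean = {domain_mean(domains):.3f}, reported Avg = {reported}")
```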

Usage

RM Scoring

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification



device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = 'LARK-Lab/CodeScaler-1.7B'

tokenizer = AutoTokenizer.from_pretrained(model_path)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
reward_model.eval()

question = """\
Given an integer array nums and an integer k, return the total number of continuous subarrays whose sum equals k.
A subarray is a contiguous part of the array.
For example:
```
Input:
nums = [1, 1, 1], k = 2

Output:
2
```
"""

program_correct = """\
from collections import defaultdict

def subarraySum(nums, k):
    prefix = 0
    count = 0
    freq = defaultdict(int)
    freq[0] = 1 # Important: subarray starting from index 0

    for num in nums:
        prefix += num

        if prefix - k in freq:
            count += freq[prefix - k]

        freq[prefix] += 1

    return count
"""

program_wrong = """\
def subarraySum(nums, k):
    left = 0
    curr_sum = 0
    count = 0

    for right in range(len(nums)):
        curr_sum += nums[right]

        while curr_sum > k and left <= right:
            curr_sum -= nums[left]
            left += 1

        if curr_sum == k:
            count += 1

    return count
"""


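# Build one user/assistant conversation per candidate program.
convs = [
    [
        {
            "content": question,
            "role": "user",
        },
        {
            "role": "assistant",
            "content": program
        }
    ] for program in [program_correct, program_wrong]
]


# Render each conversation to a single string with the model's chat template.
texts = [
    tokenizer.apply_chat_template(conv, tokenize=False)
    for conv in convs
]

# Tokenize both conversations as one padded batch.
toks = tokenizer(
    texts,
    truncation=True,
    padding=True,
    max_length=2048,
    return_tensors="pt",
)

# Score without tracking gradients; the model emits one scalar logit per conversation.
with torch.no_grad():
    outputs = reward_model(
        input_ids=toks["input_ids"].to(device),
        attention_mask=toks["attention_mask"].to(device),
    )
    scores = outputs.logits.squeeze(-1).cpu().tolist()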


print("RM Scores:", scores)
# RM Scores: [12.513851165771484, -0.46548914909362793]
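A common way to use scores like these at test time is best-of-N selection: sample several candidate programs, score each one, and keep the highest-scoring candidate. The sketch below assumes that setup; `best_of_n` is a hypothetical helper, and the exact selection procedure used for CodeScaler is not described in this card.

```python
from typing import Sequence


def best_of_n(candidates: Sequence[str], scores: Sequence[float]) -> str:
    """Return the candidate with the highest reward-model score."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and equal in length")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index]


# Placeholder labels stand in for the two programs scored above.
candidates = ["program_correct", "program_wrong"]
scores = [12.513851165771484, -0.46548914909362793]
print(best_of_n(candidates, scores))  # prints "program_correct"
```

Because selection only compares scores within one prompt, the absolute scale of the logits does not matter here, only their ordering.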

RL Training

Please refer to https://github.com/LARK-AI-Lab/CodeScaler for RL training details.

Citation

If you find our work helpful, please consider citing:

@misc{zhu2026codescalerscalingcodellm,
 title={CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models}, 
 author={Xiao Zhu and Xinyu Zhou and Boyu Zhu and Hanxu Hu and Mingzhe Du and Haotian Zhang and Huiming Wang and Zhijiang Guo},
 year={2026},
 eprint={2602.17684},
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2602.17684}, 
}
Safetensors
Model size: 2B params
Tensor type: BF16

Model tree for LARK-Lab/CodeScaler-1.7B

Finetuned: Qwen/Qwen3-1.7B
Finetuned (1): this model
