Samantha Qwen2 7B AWQ
Trained on 2x4090 using QLoRA and FSDP
Launch Using vLLM
python -m vllm.entrypoints.openai.api_server \
  --model macadeliccc/Samantha-Qwen2-7B-AWQ \
  --chat-template ./examples/template_chatml.jinja \
  --quantization awq
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="macadeliccc/Samantha-Qwen2-7B-AWQ",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
)
print("Chat response:", chat_response)
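The script above prints the whole response object. To print only the reply text, a minimal sketch follows. It assumes the usual chat-completion layout returned by the `openai` client (`choices[0].message.content`). The helper name `first_reply_text` and the stand-in response object are hypothetical and exist only so the snippet runs without a server.

```python
from types import SimpleNamespace


def first_reply_text(chat_response):
    """Return the assistant text from the first choice of a chat completion."""
    return chat_response.choices[0].message.content


# Hypothetical stand-in with the same attribute layout as a real response,
# so the helper can be tried without a running vLLM server.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Hello!")
        )
    ]
)

print(first_reply_text(fake_response))
```

With a live server, passing the `chat_response` from the script above in place of `fake_response` should work the same way.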
Prompt Template
<|im_start|>system
You are a friendly assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.
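For raw completion requests it can help to see how a message list maps onto this layout. The following is a hand-written sketch, not the Jinja template the server loads: the function name `format_chatml` and its `add_generation_prompt` flag are assumptions for illustration, and the real template may differ in whitespace.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml(
    [
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ]
)
print(prompt)
```

When using the chat completions endpoint, the server applies its own chat template, so this manual formatting is only relevant when sending a raw prompt string.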
Quants
Safetensors · Model size: 8B params · Tensor type: I32, F16
Model tree for macadeliccc/Samantha-Qwen2-7B-AWQ
Base model
Qwen/Qwen2-7B