
URL: https://huggingface.co/upstage/llama-30b-instruct-2048/discussions/6



Can it run faster than 2 tokens/second on one A100?

#6
by aibarito-ua - opened

Hello!
I am trying to run this model on one A100, but it is quite slow: 2 tokens/sec. Does anybody know how to make it faster?
I have tried 8-bit mode, which uses about half as much GPU memory, but the speed does not improve.
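
For reference, here is a minimal sketch of how the two figures above (tokens/sec and the memory halving) could be checked. The helper names `weight_memory_gb` and `tokens_per_second` are hypothetical, and the parameter count is an assumed round figure of 30e9 taken from the model name, not the exact size. The estimate covers weights only, not activations or the KV cache.

```python
import time


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def tokens_per_second(generate_fn, num_new_tokens):
    """Time one call to generate_fn and return generated tokens per second.

    generate_fn is any zero-argument callable that produces num_new_tokens
    tokens; the caller is responsible for knowing that count.
    """
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return num_new_tokens / elapsed


if __name__ == "__main__":
    params = 30e9  # assumption: round figure from the model name
    print("16-bit weights (GB):", weight_memory_gb(params, 2))
    print("8-bit weights (GB):", weight_memory_gb(params, 1))
```

In a real measurement, `generate_fn` would wrap the actual generation call (for example a `model.generate(...)` call in transformers) and `num_new_tokens` would be the number of tokens it was asked to produce. Going from 2 bytes to 1 byte per parameter halves the weight memory, which matches what I see in 8-bit mode, but it says nothing about speed.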
