Prometheus and Grafana
Source: https://github.com/vllm-project/vllm/tree/main/examples/observability/prometheus_grafana
This is a simple example that shows you how to connect vLLM metric logging to the Prometheus/Grafana stack. For this example, we launch Prometheus and Grafana via Docker. You can check out other installation methods on the Prometheus and Grafana websites.
Install docker and docker compose.
Launch
Prometheus metric logging is enabled by default in the OpenAI-compatible server. Launch the server via the vllm serve entrypoint:
vllm serve mistralai/Mistral-7B-v0.1 \
    --max-model-len 2048
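Loading the model can take a while, and traffic sent before the server is ready will simply fail. A minimal sketch of a readiness check, assuming curl is installed and the server listens on the default port 8000; wait_for_url is a hypothetical helper (not part of vLLM) that polls a URL until it answers with a successful HTTP status or the attempt budget runs out.

```shell
# wait_for_url is a hypothetical helper, not part of vLLM.
# Poll a URL until curl gets a successful (non-error) HTTP response.
# $1 = URL, $2 = max attempts (default 60), $3 = seconds between attempts (default 5)
# Returns 0 on success, 1 if every attempt failed.
wait_for_url() {
  wfu_url=$1
  wfu_attempts=${2:-60}
  wfu_delay=${3:-5}
  wfu_i=0
  while :; do
    # -f: non-zero exit on HTTP errors; -s: no progress output; cap each try at 10 s
    if curl -sf --max-time 10 -o /dev/null "$wfu_url"; then
      return 0
    fi
    wfu_i=$((wfu_i + 1))
    if [ "$wfu_i" -ge "$wfu_attempts" ]; then
      return 1
    fi
    sleep "$wfu_delay"
  done
}
```

For example, `wait_for_url http://localhost:8000/metrics && echo "vLLM is serving metrics"` blocks until the metrics endpoint responds, which also confirms that metric logging is active.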
Launch Prometheus and Grafana servers with docker compose:
docker compose up
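Once the containers are up, Prometheus reports the state of each scrape target through its HTTP API at /api/v1/targets. A minimal sketch, assuming the compose setup publishes Prometheus on localhost:9090 and that the response is compact JSON; target_health is a hypothetical helper that pattern-matches with grep and cut instead of using a real JSON parser, so treat it as a quick check only.

```shell
# target_health is a hypothetical helper: it reads a /api/v1/targets response
# on stdin and prints the "health" field of each target ("up", "down" or "unknown"),
# one per line. It matches compact JSON text rather than parsing it.
target_health() {
  grep -o '"health":"[a-z]*"' | cut -d'"' -f4
}
```

Usage: `curl -s http://localhost:9090/api/v1/targets | target_health`. A target reported as "down" means its last scrape failed, so Prometheus is not yet collecting vLLM's metrics.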
Submit some sample requests to the server:
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
vllm bench serve \
    --model mistralai/Mistral-7B-v0.1 \
    --tokenizer mistralai/Mistral-7B-v0.1 \
    --endpoint /v1/completions \
    --dataset-name sharegpt \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 3.0
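For a quick smoke test that skips the ShareGPT download, a single request to the OpenAI-compatible /v1/completions endpoint should be enough to make the request counters move. A minimal sketch, assuming the server launched above is on localhost:8000; completion_payload and send_completion are hypothetical helpers, and because the JSON body is built with printf, the prompt must not contain double quotes or backslashes.

```shell
# Hypothetical helpers, not part of vLLM.
# completion_payload builds the JSON body for POST /v1/completions.
# $1 = model, $2 = prompt, $3 = max_tokens (default 32).
# No JSON escaping is done: keep quotes and backslashes out of the prompt.
completion_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %s}' "$1" "$2" "${3:-32}"
}

# send_completion posts one completion request to the local vLLM server.
send_completion() {
  curl -s http://localhost:8000/v1/completions \
    -H 'Content-Type: application/json' \
    -d "$(completion_payload "$@")"
}
```

Usage: `send_completion mistralai/Mistral-7B-v0.1 "San Francisco is a" 16`. The model argument has to match the name the server was launched with.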
Navigating to http://localhost:8000/metrics will show the raw Prometheus metrics being exposed by vLLM.
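The endpoint serves the Prometheus text exposition format: "# HELP" and "# TYPE" metadata lines, then one sample per line in the form name{labels} value. vLLM's own series use the vllm: prefix; the endpoint may also expose metrics from other libraries. A minimal sketch for reading them from the shell; vllm_series and metric_value are hypothetical helpers, and metric_value assumes samples carry no trailing timestamp.

```shell
# Hypothetical helpers for the Prometheus text exposition format.
# vllm_series keeps only sample lines whose metric name starts with "vllm:";
# "# HELP" / "# TYPE" metadata lines start with "#" and are dropped.
vllm_series() {
  grep -E '^vllm:'
}

# metric_value prints the value of every sample of metric $1, with or without labels.
# It prints the last field of each line, which is the value when no timestamp follows.
metric_value() {
  awk -v name="$1" '$1 == name || index($1, name "{") == 1 { print $NF }'
}
```

Usage: `curl -s http://localhost:8000/metrics | vllm_series` lists all vLLM samples, and `curl -s http://localhost:8000/metrics | metric_value vllm:num_requests_running` prints the current number of running requests for each label set.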
Grafana Dashboard
Navigate to http://localhost:3000. Log in with the default username (admin) and password (admin).
Add Prometheus Data Source
Navigate to http://localhost:3000/connections/datasources/new and select Prometheus.
On the Prometheus configuration page, add the Prometheus server URL under Connection. In this setup, Grafana and Prometheus run in separate containers, but Docker creates a DNS name for each container, so you can use http://prometheus:9090.
Click Save & Test. You should see a green check with the message "Successfully queried the Prometheus API."
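The same data source can be created without the UI through Grafana's HTTP API (POST /api/datasources). A minimal sketch, assuming the default admin/admin credentials and Grafana on localhost:3000; the helper names and the optional GRAFANA_URL variable are hypothetical. The URL inside the payload is resolved by the Grafana server in its container, which is why it uses the prometheus service name rather than localhost.

```shell
# Hypothetical helpers; the endpoint is Grafana's POST /api/datasources.
# datasource_payload builds the request body. "proxy" access means the Grafana
# server (not the browser) sends the queries, so the URL must be reachable from
# the Grafana container. $1 = Prometheus URL (default http://prometheus:9090).
datasource_payload() {
  printf '{"name": "Prometheus", "type": "prometheus", "url": "%s", "access": "proxy"}' "${1:-http://prometheus:9090}"
}

# add_prometheus_datasource registers the data source using basic auth.
add_prometheus_datasource() {
  curl -s -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "${GRAFANA_URL:-http://localhost:3000}/api/datasources" \
    -d "$(datasource_payload "$1")"
}
```

Running `add_prometheus_datasource` once should be equivalent to the manual steps; Grafana may reject a second data source with the same name, so skip it if you already added one through the UI.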
Import Dashboard
Navigate to http://localhost:3000/dashboard/import, upload grafana.json, and select the Prometheus data source. You should see a screen that looks like the following:
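Dashboards can also be uploaded through Grafana's HTTP API (POST /api/dashboards/db), which expects the dashboard JSON wrapped in an object. This is only a sketch under assumptions: default admin/admin credentials, Grafana on localhost:3000, and hypothetical helper names. Unlike the UI import, this endpoint does not prompt for a data source, so if grafana.json relies on import-time data source placeholders, the UI import is the safer path; a non-null top-level id in the file may also need to be set to null before uploading.

```shell
# Hypothetical helpers; the endpoint is Grafana's POST /api/dashboards/db.
# dashboard_payload wraps the dashboard JSON in the {"dashboard": ..., "overwrite": ...}
# envelope the endpoint expects. $1 = path to the dashboard file (e.g. grafana.json).
# overwrite=true replaces an existing dashboard with the same title or uid.
dashboard_payload() {
  printf '{"dashboard": %s, "overwrite": true}' "$(cat "$1")"
}

# import_dashboard uploads the wrapped dashboard using basic auth.
# The body is streamed on stdin (--data-binary @-) so a large file is not passed
# as a single command-line argument.
import_dashboard() {
  dashboard_payload "$1" | curl -s -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "${GRAFANA_URL:-http://localhost:3000}/api/dashboards/db" \
    --data-binary @-
}
```

Usage: `import_dashboard grafana.json`, run from the directory that contains the file.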
