Configuration¶
Plugin Setup¶
To load the plugin, set the VLLM_PLUGINS environment variable before running vLLM:
Usage¶
You can then use vLLM as usual:
from vllm import LLM
llm = LLM(
model="ibm-ai-platform/micro-g3.3-8b-instruct-1b",
max_model_len=128,
max_num_seqs=2,
)
See the Examples page for more usage patterns.
pyproject.toml Reference¶
The pyproject.toml includes several key build configurations:
Build Configuration¶
[tool.uv]
build-constraint-dependencies = ["torch==2.11.0"]
extra-build-variables = { vllm = { VLLM_TARGET_DEVICE = "cpu" } }
These settings ensure:
- All packages are built with the same PyTorch version (2.11.0)
- vLLM is built specifically for the CPU backend
Source Repositories¶
The plugin pulls dependencies from specific Git repositories:
[tool.uv.sources]
vllm = { git = "https://github.com/vllm-project/vllm", rev = "..." }
torch-spyre = { git = "https://github.com/torch-spyre/torch-spyre", rev = "..." }
This ensures that both torch-spyre and vllm are compiled from source, instead of pulling pre-compiled wheels from PyPI.
PyTorch CPU Index¶
This ensures the CPU flavor of PyTorch is installed, as CUDA support is not required.