Configuration¶

Plugin Setup¶

To load the plugin, set the VLLM_PLUGINS environment variable before running vLLM:

export VLLM_PLUGINS=spyre_inference

Usage¶

You can then use vLLM as usual:

from vllm import LLM

llm = LLM(
    model="ibm-ai-platform/micro-g3.3-8b-instruct-1b",
    max_model_len=128,
    max_num_seqs=2,
)

See the Examples page for more usage patterns.

pyproject.toml Reference¶

The pyproject.toml includes several key build configurations:

Build Configuration¶

[tool.uv]
build-constraint-dependencies = ["torch==2.11.0"]
extra-build-variables = { vllm = { VLLM_TARGET_DEVICE = "cpu" } }

These settings ensure:

All packages are built with the same PyTorch version (2.11.0)
vLLM is built specifically for the CPU backend

Source Repositories¶

The plugin pulls dependencies from specific Git repositories:

[tool.uv.sources]
vllm = { git = "https://github.com/vllm-project/vllm", rev = "..." }
torch-spyre = { git = "https://github.com/torch-spyre/torch-spyre", rev = "..." }

This ensures that both torch-spyre and vllm are compiled from source, instead of pulling pre-compiled wheels from PyPI.

PyTorch CPU Index¶

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

This ensures the CPU flavor of PyTorch is installed, as CUDA support is not required.