WebLLM

WebLLM is a web application that runs large language models entirely in the browser using WebGPU. No server required — the model runs locally on your GPU, keeping your conversations private and eliminating API costs.

  • Browser-Native — Runs entirely client-side using WebGPU acceleration
  • Model Selection — Choose from multiple available models
  • Streaming Responses — See tokens in real time as the model produces output
  • Resizable Interface — Adjustable response area for comfortable reading
  • Privacy — All processing happens locally, no data leaves your browser
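The streaming behavior described above can be sketched as a consumer of an asynchronous token stream. This is an illustrative sketch, not the project's actual code: the `AsyncIterable<string>` token source and the `renderStream` helper are assumptions about how a streaming UI like this is typically wired up.

```typescript
// Hypothetical sketch: render a streaming response by accumulating
// tokens from an async iterable and updating the UI on each one.
async function renderStream(
  tokens: AsyncIterable<string>,
  onUpdate: (textSoFar: string) => void
): Promise<string> {
  let output = "";
  for await (const token of tokens) {
    output += token;   // accumulate the full response
    onUpdate(output);  // repaint the response area as tokens arrive
  }
  return output;
}

// A fake token stream standing in for the model, for demonstration.
async function* fakeTokens(): AsyncIterable<string> {
  for (const t of ["Hello", ", ", "world", "!"]) {
    yield t;
  }
}
```

In the real application the token source would come from the model's inference loop rather than a hard-coded generator, but the consuming pattern is the same.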

WebLLM requires a browser with WebGPU support (Chrome 113+, Edge 113+). A GPU with sufficient VRAM is recommended for reasonable inference speeds.
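Before loading a model, an application like this typically checks for WebGPU support. The sketch below uses the standard `navigator.gpu` entry point; the `NavigatorLike` parameter and `hasWebGPU` helper are illustrative assumptions that make the check testable outside a browser.

```typescript
// Minimal WebGPU capability check. In the browser this would be called
// with the global `navigator`; the parameter is for testability.
interface NavigatorLike {
  gpu?: unknown;
}

function hasWebGPU(nav: NavigatorLike): boolean {
  // WebGPU exposes itself as `navigator.gpu` in supporting browsers
  // (Chrome 113+, Edge 113+); its absence means no support.
  return nav.gpu !== undefined;
}
```

A fuller check would also call `navigator.gpu.requestAdapter()` and treat a `null` adapter as unsupported, since `navigator.gpu` can be present on a machine with no usable GPU.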

Language: TypeScript (89.5%)
API: WebGPU
Runtime: Bun

The source code is available on the project’s GitHub repository.