WebLLM
Overview
WebLLM is a web application that runs large language models entirely in the browser using WebGPU. No server required — the model runs locally on your GPU, keeping your conversations private and eliminating API costs.
Features
- Browser-Native — Runs entirely client-side using WebGPU acceleration
- Model Selection — Choose from multiple available models
- Streaming Responses — See tokens generated in real-time as the model produces output
- Resizable Interface — Adjustable response area for comfortable reading
- Privacy — All processing happens locally, no data leaves your browser
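The streaming behavior above can be sketched as a small loop over an OpenAI-style chat API, which is the interface the WebLLM engine exposes. The structural types below are a minimal assumption of that surface for illustration, not the package's actual definitions:

```typescript
// Hedged sketch: stream tokens from an in-browser LLM engine as they are
// generated. The engine shape is an assumed, minimal mirror of an
// OpenAI-style `chat.completions.create` with `stream: true`.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatChunk {
  // Each chunk carries the newly generated text in `delta.content`.
  choices: { delta: { content?: string } }[];
}

interface ChatEngine {
  chat: {
    completions: {
      create(req: {
        messages: ChatMessage[];
        stream: true;
      }): Promise<AsyncIterable<ChatChunk>>;
    };
  };
}

// Feed each token to a callback the moment it arrives — this is what
// lets the UI render the response in real time instead of waiting for
// the full completion.
export async function streamChat(
  engine: ChatEngine,
  prompt: string,
  onToken: (token: string) => void,
): Promise<void> {
  const chunks = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  for await (const chunk of chunks) {
    onToken(chunk.choices[0]?.delta?.content ?? "");
  }
}
```

Accumulating tokens through a callback keeps the UI layer decoupled from the engine: the same loop can append to a DOM node, a React state setter, or a test buffer.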
Requirements
WebLLM requires a browser with WebGPU support (Chrome 113+, Edge 113+). A GPU with sufficient VRAM is recommended for reasonable inference speeds.
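A feature check along these lines can confirm the requirement before any model download begins. `navigator.gpu` is the standard WebGPU entry point per the spec; the narrow parameter type is an assumption made here so the check can also run outside a browser:

```typescript
// Hedged sketch: detect WebGPU support before attempting to load a model.

type NavigatorLike = {
  gpu?: { requestAdapter(): Promise<object | null> };
};

// True when the browser exposes the WebGPU API at all.
export function supportsWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// Confirm a usable GPU adapter exists: requestAdapter() resolving to
// null means the browser supports WebGPU but found no suitable GPU.
export async function requireAdapter(nav: NavigatorLike): Promise<object> {
  if (!supportsWebGPU(nav)) {
    throw new Error("WebGPU is unavailable; use Chrome 113+ or Edge 113+.");
  }
  const adapter = await nav.gpu!.requestAdapter();
  if (adapter === null) {
    throw new Error("No suitable GPU adapter found.");
  }
  return adapter;
}
```

In the browser you would call `requireAdapter(navigator)` once at startup and surface the error message to the user instead of failing mid-download.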
Tech Stack
| Language | TypeScript (89.5%) |
| API | WebGPU |
| Runtime | Bun |
Source Code
The source code is available on the project’s GitHub repository.