Petals
Run large language models at home, BitTorrent‑style
- Generate text with Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+), or BLOOM (176B) and fine-tune them for your tasks, using a consumer-grade GPU or Google Colab (see the sketches after this list).
- You load a small part of the model, then join a network of people serving its other parts. Single-batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B), which is enough for chatbots and interactive apps.
- Beyond classic LLM APIs: you can employ any fine-tuning and sampling methods, execute custom paths through the model, or inspect its hidden states. You get the comforts of an API with the flexibility of PyTorch and 🤗 Transformers.
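As a concrete illustration, here is a minimal client-side sketch using the public `petals` Python API (`AutoDistributedModelForCausalLM`). The model name and prompt are placeholders; the sketch assumes `petals` and `transformers` are installed and that swarm peers are currently serving the chosen model:

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Placeholder model name: any model served by the public swarm works the same way.
model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Downloads only a small fraction of the weights; the remaining transformer
# blocks are executed remotely by peers in the network.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```

On the serving side, a peer contributes GPU capacity by hosting a share of the model's blocks, e.g. `python -m petals.cli.run_server <model_name>` with the Petals CLI.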
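For chatbots and other interactive apps, Petals offers inference sessions that keep attention caches on the serving peers between calls, so each new token is generated without re-running the whole prefix. A sketch reusing `model` and `tokenizer` from above (prompt and lengths are illustrative):

```python
prompt = tokenizer("Human: What is Petals?\nAI:", return_tensors="pt")["input_ids"]

# The session pins server-side attention caches for its whole lifetime,
# so tokens can be streamed out one at a time.
with model.inference_session(max_length=128) as session:
    for _ in range(30):
        outputs = model.generate(prompt, max_new_tokens=1, session=session)
        print(tokenizer.decode(outputs[0, -1:]), end="", flush=True)
        prompt = None  # after the first step, the session continues from its cache
```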
Follow development on Discord or via email updates, sent once every few months (no spam).
This project is part of the BigScience research workshop.