Replicate
by Replicate, Inc.
Run AI with an API
Cloud platform to run, fine-tune, and deploy open-source and custom machine learning models through a simple API, without managing GPU infrastructure.
At a Glance
- Category
- Machine Learning Infrastructure
- Pricing
- Usage-based, Enterprise
- Target Market
- Enterprise, Startups, Developers
- Founded
- 2019
- Headquarters
- San Francisco, California, United States
Key Features
- ✓Run models via API
Access thousands of community-contributed, production-ready models (image, video, speech, music, and LLMs) and call them with as little as one line of code.
- ✓Cog
Open-source tool for packaging machine learning models into standard, production-ready containers. It automatically generates an HTTP prediction API server, and packaged models can be pushed to Replicate to run on its cloud infrastructure.
- ✓Fine-tuning
Customize existing models on your own data to create specialized versions for particular tasks.
- ✓Deployments & auto-scaling
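Deploy your own models on dedicated hardware that scales up and down automatically with traffic. Because dedicated hardware is billed for setup, idle, and active time, scaling down during quiet periods keeps costs proportional to actual usage.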
- ✓Client SDKs
Official Python and JavaScript/Node.js client libraries for interacting with the platform programmatically.
- ✓Per-second usage billing
Transparent per-second GPU and CPU pricing (plus per-token/per-image rates for certain models) so you are billed only for what you use.
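As a rough illustration of what calling a model through the API involves at the HTTP level, the sketch below builds (but does not send) a request that creates a prediction for a public model. It assumes Replicate's HTTP endpoint `POST https://api.replicate.com/v1/models/{owner}/{name}/predictions`, bearer-token authentication, and the optional `Prefer: wait` header for synchronous responses. The helper `build_prediction_request`, the example model slug, and the prompt are illustrative assumptions. The official Python client wraps this kind of call behind `replicate.run(...)`, so treat this as a sketch of the protocol rather than the recommended integration path.

```python
import json
import urllib.request

# Assumed base URL of Replicate's HTTP API; verify against the official API reference.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(
    model: str, inputs: dict, token: str, wait: bool = True
) -> urllib.request.Request:
    """Hypothetical helper: build (without sending) a POST request that creates
    a prediction for a model addressed as "owner/name"."""
    if model.count("/") != 1:
        raise ValueError('model must look like "owner/name"')
    owner, name = model.split("/")
    url = f"{API_BASE}/models/{owner}/{name}/predictions"
    # The model's inputs go under the "input" key of the JSON body.
    body = json.dumps({"input": inputs}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",  # API token, e.g. from REPLICATE_API_TOKEN
        "Content-Type": "application/json",
    }
    if wait:
        # Assumed header asking the API to hold the connection until output is ready.
        headers["Prefer"] = "wait"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_prediction_request(
        "black-forest-labs/flux-1.1-pro",  # example model slug (assumption)
        {"prompt": "a lighthouse at dusk"},
        token="YOUR_API_TOKEN",
    )
    print(req.get_method(), req.full_url)
    # To actually run it (requires a valid token and network access):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the `Request` separately from sending it keeps the sketch testable offline. If the request is sent with a valid token, the response body is JSON describing the prediction.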
Capabilities
Use Cases
- •Add AI features to applications
Integrate image generation, video, speech, music, and language models into products through a single API without building ML infrastructure.
- •Deploy custom and fine-tuned models
Package models with Cog and deploy them to production with managed, auto-scaling GPU infrastructure.
- •Prototype and experiment with ML
Quickly test and iterate on open-source models in the cloud without provisioning or managing GPUs.
Ideal For
Best For
- ✓Developers integrating AI models into apps via API
- ✓Running open-source models without managing GPU infrastructure
- ✓Fine-tuning and deploying custom or proprietary models to production
Not Ideal For
- ✗Teams that require fully on-premises or air-gapped model hosting
Pricing
Pay-as-you-go (usage-based)
Per-second hardware billing, e.g. CPU $0.000100/sec (~$0.36/hr), Nvidia T4 $0.000225/sec (~$0.81/hr), Nvidia A100 80GB $0.001400/sec (~$5.04/hr), Nvidia H100 $0.001525/sec (~$5.49/hr); some models billed per token/per image (e.g. FLUX 1.1 Pro $0.04/image)
- ✓Billed by processing time on public models
- ✓Per-second GPU rates that scale proportionally with multi-GPU configurations (2x/4x/8x)
- ✓Private/custom models billed for setup, idle, and active time
- ✓Official Python and JavaScript SDKs
Enterprise
Custom
- ✓Dedicated account manager
- ✓Priority support
- ✓Higher GPU limits
- ✓Performance SLAs
- ✓Volume discounts
Pricing is usage-based. Most public models are billed by processing time at per-second hardware rates that vary by GPU/CPU tier (e.g. CPU ~$0.36/hr, Nvidia T4 ~$0.81/hr, L40S ~$3.51/hr, A100 80GB ~$5.04/hr, H100 ~$5.49/hr), with multi-GPU options scaling proportionally. Certain models (notably large language models and some image models) are billed per input/output token or per output image instead of by time. Private/custom models run on dedicated hardware and are billed for setup, idle, and active processing time. Enterprise adds a dedicated account manager, priority support, higher GPU limits, performance SLAs, and volume discounts. Exact per-model rates are listed on each model page and on the pricing page.
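The per-second rates quoted above turn into cost estimates with simple multiplication. The sketch below is a hypothetical calculator that hard-codes the example rates from this profile (the L40S rate of $0.000975/sec is derived from the quoted ~$3.51/hr). It assumes multi-GPU configurations cost a whole multiple of the single-GPU rate and ignores any rounding or minimum-billing rules, which are not described here. The hardware key names and function names are illustrative, and rates change, so check the pricing page before relying on the numbers.

```python
from decimal import Decimal

# Example per-second rates (USD) quoted in this profile; key names are hypothetical labels.
RATES_PER_SECOND = {
    "cpu": Decimal("0.000100"),
    "nvidia-t4": Decimal("0.000225"),
    "nvidia-l40s": Decimal("0.000975"),  # derived from ~$3.51/hr
    "nvidia-a100-80gb": Decimal("0.001400"),
    "nvidia-h100": Decimal("0.001525"),
}
MULTI_GPU_COUNTS = (1, 2, 4, 8)


def _rate(hardware: str, gpu_count: int) -> Decimal:
    """Per-second rate, assuming multi-GPU configs scale proportionally."""
    if gpu_count not in MULTI_GPU_COUNTS:
        raise ValueError(f"unsupported GPU count: {gpu_count}")
    if hardware == "cpu" and gpu_count != 1:
        raise ValueError("multi-GPU multipliers do not apply to CPU")
    return RATES_PER_SECOND[hardware] * gpu_count


def hourly_rate(hardware: str, gpu_count: int = 1) -> Decimal:
    """Convert a per-second rate to an hourly rate."""
    return _rate(hardware, gpu_count) * 3600


def time_billed_cost(hardware: str, seconds: float, gpu_count: int = 1) -> Decimal:
    """Public models: billed only for processing time."""
    return _rate(hardware, gpu_count) * Decimal(str(seconds))


def private_model_cost(
    hardware: str, setup_s: float, idle_s: float, active_s: float, gpu_count: int = 1
) -> Decimal:
    """Private/custom models: setup, idle, and active time are all billed."""
    return time_billed_cost(hardware, setup_s + idle_s + active_s, gpu_count)


def per_image_cost(images: int, price_per_image: Decimal = Decimal("0.04")) -> Decimal:
    """Models billed per output image (default: the quoted FLUX 1.1 Pro price)."""
    return price_per_image * images
```

For example, 30 seconds of processing on a T4 costs 30 × $0.000225 = $0.00675, while a private model that spends 60 s in setup, 240 s idle, and 300 s active on an L40S is billed for all 600 s, or $0.585. Using `Decimal` avoids floating-point drift when multiplying these small per-second rates.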