
Replicate

by Replicate, Inc.

Machine Learning Infrastructure · Model Inference API · MLOps

Run AI with an API

Usage-based · Enterprise · Added July 2, 2026 · Updated July 2, 2026
THE DAILY BRIEF

Cloud platform to run, fine-tune, and deploy open-source and custom machine learning models through a simple API, without managing GPU infrastructure.

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

beri.net

Subscribe at beri.net/subscribe for twice-weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

At a Glance

Category: Machine Learning Infrastructure
Pricing: Usage-based, Enterprise
Target Market: Enterprise, Startups, Developers
Founded: 2019
Headquarters: San Francisco, California, United States

Key Features

  • Run models via API

    Access thousands of community-contributed, production-ready models (image, video, speech, music, and LLMs) and call them with as little as one line of code.

  • Cog

    Open-source tool for packaging machine learning models into standard containers; it automatically generates an API server for each model and handles the infrastructure setup needed to run it in the cloud.

  • Fine-tuning

    Customize existing models on your own data to create specialized versions for particular tasks.

  • Deployments & auto-scaling

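    Deploy your own models on dedicated hardware that scales up and down automatically with traffic. Because dedicated hardware is billed while instances are online (setup, idle, and active time), scaling down during quiet periods keeps costs in line with demand.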

  • Client SDKs

    Official Python and JavaScript/Node.js client libraries for interacting with the platform programmatically.

  • Per-second usage billing

    Transparent per-second GPU and CPU pricing (plus per-token/per-image rates for certain models) so you are billed only for what you use.
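To show what "run models via API" and fine-tuning look like at the wire level, the sketch below builds (but does not send) the HTTP requests for running a prediction and starting a training job, using only Python's standard library. The endpoint paths, the Bearer-token header, and the `black-forest-labs/flux-1.1-pro` slug are assumptions based on Replicate's public HTTP API and should be checked against the current API reference; `prediction_request` and `training_request` are hypothetical helper names, and the fields inside a training `input` depend on the specific trainer. In practice, the official Python and JavaScript/Node.js clients wrap these calls.

```python
# Sketch: raw HTTP requests for running a model and starting a fine-tune.
# Assumed (verify against Replicate's HTTP API reference): the endpoint paths
# below and Bearer-token auth. Requests are built, not sent, so this runs offline.
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def _json_post(url: str, payload: dict, token: str) -> urllib.request.Request:
    """Build a JSON POST request with bearer-token auth (not sent)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def prediction_request(model: str, model_input: dict, token: str) -> urllib.request.Request:
    """Hypothetical helper: run an "owner/name" model on `model_input`."""
    owner, name = model.split("/", 1)
    url = f"{API_BASE}/models/{owner}/{name}/predictions"
    return _json_post(url, {"input": model_input}, token)


def training_request(model: str, version: str, destination: str,
                     training_input: dict, token: str) -> urllib.request.Request:
    """Hypothetical helper: fine-tune `model` at `version`, saving to `destination`."""
    owner, name = model.split("/", 1)
    url = f"{API_BASE}/models/{owner}/{name}/versions/{version}/trainings"
    return _json_post(url, {"destination": destination, "input": training_input}, token)


if __name__ == "__main__":
    # Example slug and placeholder token; load a real token from an environment variable.
    req = prediction_request("black-forest-labs/flux-1.1-pro",
                             {"prompt": "a watercolor lighthouse at dusk"}, "YOUR_API_TOKEN")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

Passing either request to `urllib.request.urlopen` would send it; the response format and how to wait for a result are covered in the API reference.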

Capabilities

  • API access
  • Fine-tuning
  • Model deployment
  • Auto-scaling
  • SDK available
  • Custom model hosting
  • Usage-based billing

Use Cases

  • Add AI features to applications

    Integrate image generation, video, speech, music, and language models into products through a single API without building ML infrastructure.

  • Deploy custom and fine-tuned models

    Package models with Cog and deploy them to production with managed, auto-scaling GPU infrastructure.

  • Prototype and experiment with ML

    Quickly test and iterate on open-source models in the cloud without provisioning or managing GPUs.

Ideal For

Best For

  • Developers integrating AI models into apps via API
  • Running open-source models without managing GPU infrastructure
  • Fine-tuning and deploying custom or proprietary models to production

Not Ideal For

  • Teams that require fully on-premise or air-gapped model hosting

Integrations

API Support
SDK Available
SDK: Python, JavaScript/Node.js

Deployment

Cloud-hosted API

Pricing

Pay-as-you-go (usage-based)

Per-second hardware billing, for example:
  • CPU: $0.000100/sec (~$0.36/hr)
  • Nvidia T4: $0.000225/sec (~$0.81/hr)
  • Nvidia A100 80GB: $0.001400/sec (~$5.04/hr)
  • Nvidia H100: $0.001525/sec (~$5.49/hr)
Some models are billed per token or per image instead (e.g. FLUX 1.1 Pro at $0.04/image).

  • Billed by processing time on public models
  • Per-second GPU/CPU rates that scale with multi-GPU configs (2x/4x/8x)
  • Private/custom models billed for setup, idle, and active time
  • Official Python and JavaScript SDKs
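The arithmetic behind these rates is simple enough to sketch. The snippet below converts the example per-second figures from this listing into hourly rates and per-job costs, treats multi-GPU configurations as the single-GPU rate times the GPU count (matching the proportional scaling described here), and compares that with per-image billing using the FLUX 1.1 Pro example. The rate table and function names are illustrative placeholders, not an official price list; the L40S entry is derived from its ~$3.51/hr figure.

```python
# Sketch: estimating public-model costs from the example rates in this listing.
# Rates are illustrative placeholders; real per-model rates change over time.

PER_SECOND_RATES = {  # USD per second for a single unit of each hardware type
    "cpu": 0.000100,
    "t4": 0.000225,
    "l40s": 0.000975,  # derived from ~$3.51/hr
    "a100-80gb": 0.001400,
    "h100": 0.001525,
}


def hourly_rate(hardware: str, gpus: int = 1) -> float:
    """Hourly rate, assuming multi-GPU configs scale the per-GPU rate linearly."""
    return PER_SECOND_RATES[hardware] * gpus * 3600


def time_billed_cost(hardware: str, seconds: float, gpus: int = 1) -> float:
    """Cost of a prediction billed by processing time."""
    return PER_SECOND_RATES[hardware] * gpus * seconds


def per_image_cost(images: int, price_per_image: float = 0.04) -> float:
    """Cost of a model billed per output image (default: the FLUX 1.1 Pro example)."""
    return images * price_per_image


if __name__ == "__main__":
    print(f"H100: ~${hourly_rate('h100'):.2f}/hr")  # ~$5.49/hr
    print(f"8x H100: ~${hourly_rate('h100', gpus=8):.2f}/hr")  # ~$43.92/hr
    print(f"30 s on an A100 80GB: ${time_billed_cost('a100-80gb', 30):.3f}")  # $0.042
    print(f"10 FLUX 1.1 Pro images: ${per_image_cost(10):.2f}")  # $0.40
```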

Enterprise

Custom

  • Dedicated account manager
  • Priority support
  • Higher GPU limits
  • Performance SLAs
  • Volume discounts

Pricing is usage-based. Most public models are billed by processing time at per-second hardware rates that vary by GPU/CPU tier (e.g. CPU ~$0.36/hr, Nvidia T4 ~$0.81/hr, L40S ~$3.51/hr, A100 80GB ~$5.04/hr, H100 ~$5.49/hr), with multi-GPU options scaling proportionally. Certain models (notably large language models and some image models) are billed per input/output token or per output image instead of by time. Private/custom models run on dedicated hardware and are billed for setup, idle, and active processing time. Enterprise adds a dedicated account manager, priority support, higher GPU limits, performance SLAs, and volume discounts. Exact per-model rates are listed on each model page and on the pricing page.
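Because private models are billed for every second an instance is online, idle capacity can dominate the bill. The rough comparison below models one hypothetical day on an A100 80GB (one always-on instance plus a second instance kept up for a two-hour peak) and contrasts the dedicated-hardware charge with what the same active seconds would cost under processing-time billing. The traffic figures, the `InstanceTime` type, and the function names are invented for illustration; real bills depend on the model, the hardware, and current rates.

```python
# Sketch: private (dedicated) vs processing-time-only billing for one hypothetical day.
# Per this listing, private models are billed for setup, idle, and active time;
# most public models are billed for processing (active) time. Numbers are illustrative.
from dataclasses import dataclass

A100_80GB_PER_SECOND = 0.001400  # USD/sec, example rate from the listing


@dataclass
class InstanceTime:
    """Seconds one dedicated instance spends in each billable state (hypothetical)."""
    setup: float
    idle: float
    active: float


def private_cost(instances: list[InstanceTime], rate: float) -> float:
    """Dedicated hardware: every online second (setup + idle + active) is billed."""
    return sum((i.setup + i.idle + i.active) * rate for i in instances)


def active_only_cost(instances: list[InstanceTime], rate: float) -> float:
    """Processing-time billing: only active seconds are charged."""
    return sum(i.active * rate for i in instances)


if __name__ == "__main__":
    day = [
        # Always-on instance: 2 min setup, 4 h of work, idle for the rest of the day.
        InstanceTime(setup=120, idle=86_400 - 120 - 14_400, active=14_400),
        # Extra instance that scaling keeps online for a 2 h peak.
        InstanceTime(setup=120, idle=1_800, active=5_280),
    ]
    print(f"private deployment: ${private_cost(day, A100_80GB_PER_SECOND):.2f}")  # $131.04
    print(f"active time only: ${active_only_cost(day, A100_80GB_PER_SECOND):.2f}")  # $27.55
```

In this toy model, cutting idle seconds is the only lever that moves the private figure toward the active-only one, which is why auto-scaling matters for dedicated deployments.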
