Groq

Name: Groq
Author: Groq, Inc.

by Groq, Inc.

InferenceAI InfrastructureLLM API

Ultra-fast LPU inference for open-weight LLMs

Usage-based · Free tier · Enterprise·Added July 2, 2026·Updated July 2, 2026

THE DAILY BRIEF

Groq

by Groq, Inc.

InferenceAI InfrastructureLLM API

Ultra-fast LPU inference for open-weight LLMs

Usage-based · Free tier · Enterprise

Groq is an AI inference platform powered by its custom LPU (Language Processing Unit) chip, delivering ultra-low-latency, high-throughput token generation for open-weight LLMs through GroqCloud's OpenAI-compatible API.

At a Glance

Category: Inference
Pricing: Usage-based, Free tier, Enterprise
Target Market: Developers, Startups, Enterprise
Founded: 2016
Headquarters: San Jose, California, USA

Key Features

✓LPU inference chip
✓GroqCloud API
✓Built-in agentic tools
✓Batch API
✓Prompt caching
✓LoRA adapter serving

Capabilities

✓api access

✓text generation

✓code generation

✓speech to text

✓text to speech

✓agent orchestration

Use Cases

•Real-time conversational AI and voice agents
•High-throughput agentic applications
•Speech transcription at scale
•Sovereign and regulated on-prem inference

Ideal For

Best For

✓Ultra-low-latency LLM inference
✓Real-time voice and chat assistants
✓High-volume, cost-sensitive token workloads
✓Migrating off OpenAI with minimal code change
✓On-premise or sovereign inference deployments

Not Ideal For

✗Teams needing proprietary frontier models (GPT-4/Claude/Gemini)
✗Image-generation workloads
✗Managed model training or full fine-tuning as a service

Pricing

Free

✓Free API key with rate limits
✓OpenAI-compatible endpoints

Pay-as-you-go

Usage-based per token

✓Per-token pricing (e.g., Llama 3.1 8B Instant $0.05/$0.08 per 1M in/out; Llama 3.3 70B $0.59/$0.79)
✓Batch API 50% discount
✓Prompt caching 50% discount

Enterprise

Custom

✓Dedicated capacity
✓LoRA adapter serving
✓On-prem GroqRack / GroqNode

Usage-based per-token pricing published at groq.com/pricing (verified 2026-07). Free developer tier with rate limits, then pay-as-you-go. Speech-to-text billed per hour transcribed (Whisper V3 Large ~$0.111/hr; Turbo ~$0.04/hr); text-to-speech per 1M characters. Built-in tools priced separately. Prices change as the model roster updates.

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

beri.net

Subscribe at beri.net/subscribe for twice-weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi | X: x.com/rajeshberi

Visit Website

At a Glance

Category: Inference
Pricing: Usage-based, Free tier, Enterprise
Target Market: Developers, Startups, Enterprise
Founded: 2016
Headquarters: San Jose, California, USA

Key Features

✓
LPU inference chip
Custom silicon purpose-built for AI inference, delivering very high tokens/sec at low cost.
✓
GroqCloud API
OpenAI-compatible, tokens-as-a-service inference for open-weight models.
✓
Built-in agentic tools
Web search, website visit, code execution, and browser automation callable from the API.
✓
Batch API
50% lower cost for large-scale asynchronous inference jobs.
✓
Prompt caching
50% discount on cached input tokens for repeated context.
✓
LoRA adapter serving
Deploy multiple custom LoRA fine-tunes at base-model speed (enterprise tier).

Capabilities

✓api access

✓text generation

✓code generation

✓speech to text

✓text to speech

✓agent orchestration

Use Cases

•
Real-time conversational AI and voice agents
Low-latency token streaming plus Whisper speech-to-text and Orpheus text-to-speech power responsive chat and voice assistants.
•
High-throughput agentic applications
Agents that make many fast LLM calls with built-in web search and code execution via Groq Compound.
•
Speech transcription at scale
Whisper Large v3 Turbo transcription at very high real-time factors and low cost.
•
Sovereign and regulated on-prem inference
GroqRack clusters for data-residency and air-gapped enterprise deployments.

Ideal For

Best For

✓Ultra-low-latency LLM inference
✓Real-time voice and chat assistants
✓High-volume, cost-sensitive token workloads
✓Migrating off OpenAI with minimal code change
✓On-premise or sovereign inference deployments

Not Ideal For

✗Teams needing proprietary frontier models (GPT-4/Claude/Gemini)
✗Image-generation workloads
✗Managed model training or full fine-tuning as a service

Integrations

✓API Support

✓SDK Available

SDK:PythonJavaScript/TypeScript

Deployment

✓Self-Hosted

✓Cloud-Hosted

✓On-Premise

GroqCloud (hosted API)GroqRack / GroqNode on-premise clusters

Pricing

✓Free Trial Available

Free

✓Free API key with rate limits
✓OpenAI-compatible endpoints

Pay-as-you-go

Usage-based per token

✓Per-token pricing (e.g., Llama 3.1 8B Instant $0.05/$0.08 per 1M in/out; Llama 3.3 70B $0.59/$0.79)
✓Batch API 50% discount
✓Prompt caching 50% discount

Enterprise

Custom

✓Dedicated capacity
✓LoRA adapter serving
✓On-prem GroqRack / GroqNode

Connect

Newsletter

Stay Ahead of the Curve

Weekly enterprise AI insights for technology leaders. No spam, no vendor pitches—unsubscribe anytime.

Latest Articles

View All →

Groq

At a Glance

Key Features

Capabilities

Use Cases

Ideal For

Best For

Not Ideal For

Pricing

Free

Pay-as-you-go

Enterprise

THE DAILY BRIEF

At a Glance

Key Features

Capabilities

Use Cases

Ideal For

Best For

Not Ideal For

Integrations

Deployment

Pricing

Free

Pay-as-you-go

Enterprise

Connect

Stay Ahead of the Curve

Related Products

Cerebras

Fireworks AI

Baseten

Latest Articles

Microsoft's $2.5B Bet: AI Can't Deploy Itself

19 Days Dark: How a Shutdown Broke Enterprise AI's Vendor Myth

Microsoft and AWS Bet $3.5B That AI Deployment Is Broken

$145B Cloud War: Meta's Move That Wiped $12B in One Day