Architecture

ModelBeam provides a unified REST API that dispatches inference jobs to GPU workers running open-source AI models.

System Overview

Client → api.modelbeam.ai → Job Queue → GPU Workers → Storage

Status updates flow back to the client via polling, webhooks, or WebSockets.

Components

API Gateway (api.modelbeam.ai)

  • Handles authentication, rate limiting, and request validation
  • Creates job records and dispatches them to GPU workers
  • Manages billing (balance checks, price calculation, deductions)
  • Serves job status via a polling endpoint
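
The billing and job-creation responsibilities above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Account and Job types, the flat per-unit pricing rule, and the function names are hypothetical and do not describe ModelBeam's actual implementation. Only the "pending" status and the UUID request ID come from this page.

```python
import uuid
from dataclasses import dataclass
from decimal import Decimal


class InsufficientBalance(Exception):
    """Raised when the account cannot cover the job price."""


@dataclass
class Account:
    balance: Decimal  # hypothetical account model


@dataclass
class Job:
    request_id: str
    price: Decimal
    status: str = "pending"  # initial status named in the Job Flow


def calculate_price(units: int, unit_price: Decimal) -> Decimal:
    # Hypothetical pricing rule: a flat price per billable unit.
    return unit_price * units


def create_job(account: Account, units: int, unit_price: Decimal) -> Job:
    """Check the balance, deduct the price, then create a pending job."""
    price = calculate_price(units, unit_price)
    if account.balance < price:
        # Reject before any deduction so the balance is left untouched.
        raise InsufficientBalance(f"need {price}, have {account.balance}")
    account.balance -= price
    return Job(request_id=str(uuid.uuid4()), price=price)
```

Decimal is used instead of float so that repeated deductions do not accumulate rounding error.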

GPU Workers

  • Serverless GPU instances running AI models
  • Auto-scale based on demand
  • Send status callbacks as jobs progress
  • Upload results to object storage

Storage (storage.modelbeam.ai)

  • Stores generated files (images, audio, video)
  • Results available via direct download URLs
  • Temporary storage with configurable retention

Real-time Layer (soketi.modelbeam.ai)

  • Pusher-compatible WebSocket server
  • Pushes real-time status updates to connected clients
  • Private channels per client for security
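
In the Pusher protocol, a client may subscribe to a private channel only after a backend signs the subscription with the app secret. The sketch below shows that signing step in general terms. The key, secret, socket ID, and channel name are placeholders, and whether ModelBeam issues signatures exactly this way is an assumption.

```python
import hashlib
import hmac


def sign_private_channel(app_key: str, app_secret: str,
                         socket_id: str, channel: str) -> str:
    """Return a Pusher-style auth string for a private channel subscription."""
    if not channel.startswith("private-"):
        raise ValueError("private channel names start with 'private-'")
    # The Pusher protocol signs "<socket_id>:<channel_name>" with HMAC-SHA256.
    string_to_sign = f"{socket_id}:{channel}"
    signature = hmac.new(
        app_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # The auth string is "<app_key>:<hex signature>".
    return f"{app_key}:{signature}"


# Placeholder values for illustration only.
auth = sign_private_channel("app-key", "app-secret", "1234.5678",
                            "private-client.42")
```

Because the signature covers both the socket ID and the channel name, an auth string issued for one connection cannot be reused to subscribe to another client's channel.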

Integration Points

Integration   URL                            Protocol
REST API      https://api.modelbeam.ai       HTTPS
WebSockets    wss://soketi.modelbeam.ai      WSS
MCP Server    https://mcp.modelbeam.ai/mcp   HTTPS (Streamed)
Storage       https://storage.modelbeam.ai   HTTPS
Status Page   https://status.modelbeam.ai    HTTPS

Authentication Flow

  1. The user registers at modelbeam.ai and receives $5 in free credits
  2. The user creates an API key from the dashboard
  3. The API key is sent as a Bearer token in the Authorization header
  4. The API validates the key, checks rate limits, and processes the request
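
Step 3 can be sketched with the standard library alone. The endpoint path and request payload below are hypothetical placeholders, and sending the body as JSON is an assumption. Only the base URL and the Bearer scheme come from this page. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.modelbeam.ai"


def build_headers(api_key: str) -> dict[str, str]:
    """Build request headers carrying the API key as a Bearer token."""
    if not api_key or api_key != api_key.strip():
        raise ValueError("API key must be non-empty with no surrounding whitespace")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumption: JSON request bodies
    }


def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )


# Hypothetical path and payload, for illustration only.
req = build_request("YOUR_API_KEY", "/hypothetical/endpoint", {"prompt": "hello"})
```

Passing req to urllib.request.urlopen would send it. Keeping construction separate from sending makes the header logic easy to test without network access.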

Job Flow

  1. The client sends a POST request to a generation/analysis endpoint
  2. The API validates parameters, checks the balance, and calculates the price
  3. The balance is deducted and a job record is created (status: pending)
  4. The job is dispatched to a GPU worker
  5. The API returns {"data": {"request_id": "UUID"}}
  6. The worker processes the job and sends progress updates
  7. On completion, results are uploaded to storage
  8. The client receives results via polling, webhook, or WebSocket
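
From the client's side, steps 5 through 8 with polling might look like the sketch below. The status payload shape, the terminal status names ("completed", "failed"), and the fetch_status callable are assumptions. This page confirms only the {"data": {"request_id": ...}} response and the "pending" status. The HTTP call is injected as a function so the loop can be exercised without network access.

```python
import time
from typing import Callable

# Hypothetical terminal statuses; only "pending" is named on this page.
TERMINAL_STATUSES = {"completed", "failed"}


def extract_request_id(response: dict) -> str:
    """Pull the request ID out of the job-creation response (step 5)."""
    return response["data"]["request_id"]


def poll_until_done(
    request_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(request_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {request_id} did not finish in {timeout}s")
        sleep(interval)


# Usage with a stand-in for the real status endpoint.
_fake_updates = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "completed", "result_url": "https://storage.modelbeam.ai/..."},
])
request_id = extract_request_id({"data": {"request_id": "UUID"}})
final = poll_until_done(request_id, lambda _id: next(_fake_updates),
                        sleep=lambda _s: None)
```

Webhooks or WebSockets avoid the repeated requests that polling requires, so a loop like this is best kept as the simplest fallback.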