# Architecture

ModelBeam provides a unified REST API that dispatches inference jobs to GPU workers running open-source AI models.

## System Overview
### Components
#### API Gateway (api.modelbeam.ai)
- Handles authentication, rate limiting, and request validation
- Creates job records and dispatches to GPU workers
- Manages billing (balance checks, price calculation, deductions)
- Serves job status via polling endpoint
#### GPU Workers
- Serverless GPU instances running AI models
- Auto-scale based on demand
- Send status callbacks as jobs progress
- Upload results to object storage
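As a rough illustration of the callbacks described above, the sketch below builds the kind of status payload a worker might POST back to the API gateway as a job progresses. The field names and status values here are assumptions for illustration, not ModelBeam's documented callback schema.

```python
import json
from datetime import datetime, timezone

def build_status_callback(request_id: str, status: str, progress: int) -> str:
    """Build a JSON body a GPU worker might send as a status callback.

    Field names ("request_id", "status", "progress", "timestamp") are
    illustrative assumptions; the real schema may differ.
    """
    payload = {
        "request_id": request_id,
        "status": status,      # e.g. "pending", "processing", "completed"
        "progress": progress,  # percentage, 0-100
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)

body = build_status_callback("123e4567-e89b-12d3-a456-426614174000", "processing", 40)
```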
#### Storage (storage.modelbeam.ai)
- Stores generated files (images, audio, video)
- Results available via direct download URLs
- Temporary storage with configurable retention
#### Real-time Layer (soketi.modelbeam.ai)
- Pusher-compatible WebSocket server
- Pushes real-time status updates to connected clients
- Private channels per client for security
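Because the real-time layer is Pusher-compatible, private-channel subscriptions follow the standard Pusher auth scheme: the server signs `socket_id:channel_name` with the app secret using HMAC-SHA256. The sketch below shows that signing step; the app key, secret, socket ID, and channel name are placeholders, and the `private-client-…` naming is an assumption.

```python
import hashlib
import hmac

def pusher_channel_auth(app_key: str, app_secret: str, socket_id: str, channel: str) -> str:
    """Sign a private-channel subscription per the Pusher protocol,
    which Soketi implements. Returns the "auth" string a client presents
    when subscribing to a private channel."""
    message = f"{socket_id}:{channel}".encode()
    signature = hmac.new(app_secret.encode(), message, hashlib.sha256).hexdigest()
    return f"{app_key}:{signature}"

# Placeholder credentials; real values would come from the ModelBeam dashboard.
auth = pusher_channel_auth("app-key", "app-secret", "1234.5678", "private-client-42")
```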
## Integration Points
| Integration | URL | Protocol |
|---|---|---|
| REST API | https://api.modelbeam.ai | HTTPS |
| WebSockets | wss://soketi.modelbeam.ai | WSS |
| MCP Server | https://mcp.modelbeam.ai/mcp | HTTPS (Streamed) |
| Storage | https://storage.modelbeam.ai | HTTPS |
| Status Page | https://status.modelbeam.ai | HTTPS |
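For client code that needs these integration points programmatically, the table can be captured as a simple constant. The dictionary key names are our own choice, not part of the API.

```python
# ModelBeam integration endpoints, taken from the table above.
# Key names are illustrative; only the URLs come from the docs.
ENDPOINTS = {
    "rest_api": "https://api.modelbeam.ai",
    "websockets": "wss://soketi.modelbeam.ai",
    "mcp_server": "https://mcp.modelbeam.ai/mcp",
    "storage": "https://storage.modelbeam.ai",
    "status_page": "https://status.modelbeam.ai",
}
```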
## Authentication Flow
- User registers at modelbeam.ai and receives $5 in free credits
- User creates an API key from the dashboard
- API key is sent as a `Bearer` token in the `Authorization` header
- API validates the key, checks rate limits, and processes the request
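The bearer-token step can be sketched with the Python standard library alone. The request below is only constructed, never sent; the API key value and the `/v1/jobs` path are placeholders invented for illustration, since the flow above does not name a specific endpoint.

```python
import urllib.request

API_KEY = "mb_live_xxxxxxxx"  # placeholder; the real key format may differ

# Build (but do not send) an authenticated request to the API gateway.
# The "/v1/jobs" path is hypothetical, used only to show the header.
req = urllib.request.Request(
    "https://api.modelbeam.ai/v1/jobs",
    headers={"Authorization": f"Bearer {API_KEY}"},
    method="GET",
)
```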
## Job Flow
- Client sends a POST request to a generation/analysis endpoint
- API validates parameters, checks balance, calculates price
- Balance is deducted and a job record is created (status: `pending`)
- Job is dispatched to a GPU worker
- API returns `{"data": {"request_id": "UUID"}}`
- Worker processes the job, sends progress updates
- On completion, results are uploaded to storage
- Client receives results via polling, webhook, or WebSocket
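The polling branch of the flow above can be sketched as a small loop. The status-fetching function is injected so the logic works without network access; the `"completed"`/`"failed"` terminal values and the response shape are assumptions about the API, not documented behavior.

```python
import time

def poll_job(fetch_status, request_id: str, interval: float = 0.0, max_attempts: int = 10):
    """Poll a job until it reaches a terminal state.

    fetch_status(request_id) should return a dict with at least a "status"
    key. The terminal values "completed" and "failed" are assumptions.
    """
    for _ in range(max_attempts):
        job = fetch_status(request_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {request_id} did not finish in {max_attempts} polls")

# Fake fetcher simulating a worker progressing through states.
_states = iter(["pending", "processing", "completed"])

def fake_fetch(request_id):
    return {"request_id": request_id, "status": next(_states)}

result = poll_job(fake_fetch, "UUID")
# result["status"] == "completed"
```

In production the fetcher would call the job status endpoint over HTTPS with the same bearer token used to create the job.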