Model Discovery
ModelBeam provides a dynamic model registry. Never hardcode model names; use the models endpoint to discover available models at runtime.
GET /api/v1/client/models
Returns all available models with their capabilities, limits, and defaults.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| filter[inference_types] | string | Comma-separated inference types to filter by |
| per_page | integer | Models per page (default: 25) |
| page | integer | Page number |
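
The query parameters above can be combined into a request URL. The following is a minimal sketch using only the Python standard library. The host `https://api.example.com` is a placeholder assumption, and `build_models_url` is a hypothetical helper, not part of ModelBeam.

```python
from urllib.parse import urlencode

# Placeholder host (assumption); substitute the real ModelBeam base URL.
BASE_URL = "https://api.example.com"
MODELS_PATH = "/api/v1/client/models"


def build_models_url(inference_types=None, per_page=25, page=1):
    """Build the models endpoint URL from the documented query parameters."""
    params = {"per_page": per_page, "page": page}
    if inference_types:
        # The filter takes a comma-separated list of inference types.
        params["filter[inference_types]"] = ",".join(inference_types)
    # urlencode percent-encodes the brackets and commas in the query string.
    return f"{BASE_URL}{MODELS_PATH}?{urlencode(params)}"


print(build_models_url(["txt2img", "img2img"], per_page=50))
```

Omitting `inference_types` leaves the filter out entirely, which should request models of every type.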
Inference Types
| Type | Description |
|---|---|
| txt2img | Text to Image |
| img2img | Image to Image |
| txt2audio | Text to Speech |
| txt2video | Text to Video |
| img2video | Image to Video |
| aud2video | Audio to Video |
| txt2music | Text to Music |
| txt2embedding | Text to Embedding |
| vid2txt | Video URL to Text |
| aud2txt | Audio URL to Text |
| videofile2txt | Video File to Text |
| audiofile2txt | Audio File to Text |
| transcribe | Unified Transcription |
| img2txt | Image to Text (OCR) |
| img_upscale | Image Upscale |
| img_rmbg | Background Removal |
| videos_replace | Video Replace (Animate) |
Example Request
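
A request against the models endpoint might look like the sketch below, which walks through all pages. Several details are assumptions rather than documented behavior: the placeholder host, Bearer-token authentication in the `Authorization` header, and a JSON response that lists models under a `data` key. `fetch_json` and `iter_models` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def fetch_json(url, token):
    """GET a URL and decode the JSON body. Bearer auth is an assumption."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_models(fetch, inference_types=None, per_page=25):
    """Yield models page by page until a short or empty page is returned.

    `fetch` is any callable that takes a URL and returns decoded JSON.
    The `data` key is an assumed response envelope, not confirmed here.
    """
    page = 1
    while True:
        params = {"per_page": per_page, "page": page}
        if inference_types:
            params["filter[inference_types]"] = ",".join(inference_types)
        payload = fetch(f"{BASE_URL}/api/v1/client/models?{urlencode(params)}")
        models = payload.get("data", [])
        yield from models
        if len(models) < per_page:
            return  # last page reached
        page += 1


# Usage (requires network access and a valid token):
# models = list(iter_models(lambda url: fetch_json(url, "YOUR_API_TOKEN"), ["txt2img"]))
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP client, so it can be exercised with a stub before pointing it at the real API.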
Model Schema
Schema Notes
- `loras`: `null` for models without LoRA support, or an array of `{"display_name", "name"}` objects. Use in generation requests as `loras: [{"name": "LoraSlug", "weight": 0.8}]`.
- `features`: Varies by model type. Image models have `supports_guidance`, `supports_steps`, `supports_negative_prompt`. Video models add `supports_last_frame`. TTS models have `supports_voice_clone`, `supports_custom_voice`, `supports_voice_design`.
- `limits`: Varies by model type. Image models have width/height/steps limits. Music models have `min_caption`/`max_caption`, `min_duration`/`max_duration`, `min_bpm`/`max_bpm`. TTS models have `min_text`/`max_text`, `min_speed`/`max_speed`. Embedding models have `max_input_tokens`/`max_total_tokens`.
- `languages`: For TTS models, contains supported languages with voice presets.
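
The schema fields described above can drive request construction at runtime. The sketch below assumes `features` and `limits` are flat JSON objects keyed by the names listed in the notes; the helper names and the sample model entries are hypothetical and for illustration only.

```python
def build_loras(model, requested):
    """Return a `loras` request value, keeping only LoRAs the model lists.

    `requested` maps LoRA name to weight. Returns None when the model
    reports `loras: null`, meaning it has no LoRA support.
    """
    available = model.get("loras")
    if available is None:
        return None
    known = {entry["name"] for entry in available}
    return [
        {"name": name, "weight": weight}
        for name, weight in requested.items()
        if name in known
    ]


def clamp_to_limits(model, field, value):
    """Clamp `value` into [min_<field>, max_<field>] when the model defines them."""
    limits = model.get("limits") or {}
    low = limits.get(f"min_{field}", value)
    high = limits.get(f"max_{field}", value)
    return max(low, min(high, value))


def supports(model, feature):
    """Check a `supports_<feature>` flag, treating a missing flag as False."""
    return bool((model.get("features") or {}).get(f"supports_{feature}"))


# Illustrative entries only; real values come from the models endpoint.
image_model = {
    "loras": [{"display_name": "Example LoRA", "name": "LoraSlug"}],
    "features": {"supports_negative_prompt": False},
}
music_model = {"loras": None, "limits": {"min_duration": 10, "max_duration": 600}}

print(build_loras(image_model, {"LoraSlug": 0.8, "Unknown": 1.0}))
print(clamp_to_limits(music_model, "duration", 900))
print(supports(image_model, "negative_prompt"))
```

Checking flags and limits from the discovered model entry, rather than from constants in client code, keeps requests valid when the registry changes.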
Available Models
Text to Image
| Model | Slug | Max Resolution | Max Steps |
|---|---|---|---|
| FLUX.1 Schnell 12B NF4 | Flux1schnell | 2048x2048 | 10 |
| FLUX.2 Klein 4B BF16 | Flux_2_Klein_4B_BF16 | 1536x1536 | 10 |
| Z-Image-Turbo INT8 | ZImageTurbo_INT8 | 1536x1536 | 8 |
Image to Image
| Model | Slug | Features |
|---|---|---|
| FLUX.2 Klein 4B BF16 | Flux_2_Klein_4B_BF16 | Steps, guidance, negative prompt |
| Qwen Image Edit Plus NF4 | QwenImageEdit_Plus_NF4 | Prompt-only editing |
Text to Speech
| Model | Slug | Features |
|---|---|---|
| Kokoro | Kokoro | 11 languages, 40+ voices |
| Qwen3 TTS 12Hz 1.7B CustomVoice | Qwen3_TTS_12Hz_1_7B_CustomVoice | Custom voice, voice clone, voice design |
| Qwen3 TTS 12Hz 1.7B VoiceDesign | Qwen3_TTS_12Hz_1_7B_VoiceDesign | Voice design from instructions |
| Qwen3 TTS 12Hz 1.7B Base | Qwen3_TTS_12Hz_1_7B_Base | Clone voice from reference audio |
| Chatterbox | Chatterbox | Voice cloning |
Text to Video
| Model | Slug | Max Resolution | Max Frames |
|---|---|---|---|
| LTX-Video 13B Distilled FP8 | Ltxv_13B_0_9_8_Distilled_FP8 | 1280x1280 | 120 |
| LTX Video 2.3 22B Distilled INT8 | LTX_2_3_22B_Dist_INT8 | 1280x1280 | 120 |
Image to Video
| Model | Slug | Max Resolution | Max Frames |
|---|---|---|---|
| LTX-2.3 22B Distilled INT8 | Ltx2_3_22B_Dist_INT8 | 1280x1280 | 120 |
| LTX Video 2.0 19B Distilled FP8 | LTX_2_19B_Dist_FP8 | 1280x1280 | 120 |
Audio to Video
| Model | Slug | Max Resolution | Max Frames |
|---|---|---|---|
| LTX Video 2.0 19B Distilled FP8 | Ltx2_19B_Dist_FP8 | 1280x1280 | 120 |
Transcription
| Model | Slug | Types |
|---|---|---|
| Whisper Large V3 | WhisperLargeV3 | vid2txt, aud2txt, transcribe, audiofile2txt, videofile2txt |
OCR
| Model | Slug |
|---|---|
| Nanonets OCR S F16 | Nanonets_Ocr_S_F16 |
Embeddings
| Model | Slug | Max Tokens |
|---|---|---|
| BGE M3 FP16 | Bge_M3_FP16 | 8192 per input, 300K total |
Music
| Model | Slug | Duration |
|---|---|---|
| ACE-Step 1.5 Turbo | AceStep_1_5_Turbo | 10-600s |
| ACE-Step 1.5 Base | AceStep_1_5_Base | 10-600s |
| ACE-Step 1.5 XL Turbo INT8 | AceStep_1_5_XL_Turbo_INT8 | 10-600s |
Background Removal
| Model | Slug |
|---|---|
| BEN2 | Ben2 |
Image Upscale
| Model | Slug |
|---|---|
| Real-ESRGAN x4 | RealESRGAN_x4 |
Video Replace
| Model | Slug |
|---|---|
| Wan 2.2 Animate | Wan2_2_Animate |