Supported AI models — Opten catalog

FLUX.1 Pro / FLUX.1.1 Pro Ultra

FLUX.1 is Black Forest Labs' flagship image model (schnell, dev, pro, 1.1 pro Ultra). Its dual CLIP + T5-XXL encoder interprets long cohe…

Image

FLUX Kontext Pro / Max / Multi

FLUX Kontext is Black Forest Labs' image-to-image editing model (Pro, Max, Multi). It accepts an input image plus a change instruction. T…

Image

GPT Image 1

GPT Image 1 is an OpenAI image model with natural-language prompting and strong in-image text rendering. It runs in ChatGPT and via API,…

Image

GPT Image 1.5

GPT Image 1.5 is OpenAI's image model with improved photorealism, identity preservation during editing, and multi-image input. It support…

Image

GPT Image 2

GPT Image 2 is OpenAI's image model with SOTA in-image text rendering and a thinking mode. It treats prompts as design briefs, processes…

Image

Grok Imagine (Aurora)

Grok Imagine (Aurora) is xAI's image model with an autoregressive MoE Transformer architecture, not diffusion. It excels at photorealisti…

Video

Happy Horse 1.0

Happy Horse 1.0 (快乐小马) is Alibaba ATH AI Innovation Unit's video model — 15B parameters, unified single-stream Transformer. It generates…

Video

Higgsfield Soul 2.0 / Soul Cinema / DoP

Higgsfield is a platform with proprietary models — Soul 2.0 (image, up to 4K), Soul Cinema (era-aware image), and DoP (image-to-video, 5…

Image

Imagen 4

Imagen 4 is Google's next-generation image model with upgraded typography and ultra-photorealism. It works on natural language, is optimi…

Image

Imagen 4 Ultra

Imagen 4 Ultra is Google's premium Imagen 4 with maximum detail and prompt fidelity. It rewards long, detailed descriptions (100–400 word…

Video

Kling 2.6 Pro

Kling 2.6 Pro is Kuaishou's video model on klingai.com. It generates clips up to 10 seconds at 1080p and supports T2V, I2V, Elements (up…

Video

Kling 3.0

Kling 3.0 is Kuaishou's flagship video model on klingai.com. Duration up to 15 seconds, Multi-shot with up to 6 shots in one generation,…

Video

Kling Motion Control

Kling Motion Control is a Kuaishou Kling mode for transferring motion from a reference video onto a character from an image. Duration 5–1…

Video

Kling O1

Kling O1 is Kuaishou's reasoning video model on klingai.com. Duration up to 10 seconds, resolution up to 1080p, four specialized modes: I…

Video

LTX 2 (Fast / Pro)

LTX 2 is Lightricks' open-source video model at ltx.io. It comes in two versions: Fast (up to 20 seconds, 2x faster) and Pro (up to 10 se…

Video

Luma Ray 2

Luma Ray 2 is Luma's large-scale video model in Dream Machine, trained directly on video data. It understands natural motion, realistic l…

Video

Luma Ray 3 / Ray 3.14 / Ray 3 Reasoning

Luma Ray 3 is the Ray 3 lineup: Ray 3.14 (the workhorse, default for 90% of tasks) and Ray 3 Reasoning (the multimodal "thinking" model).…

Image

Luma Uni-1

Luma Uni-1 is a Luma Labs image model with a unique architecture: decoder-only autoregressive transformer (NOT diffusion), generating pix…

Image

Midjourney V7

Midjourney V7 is Midjourney's flagship image model released on April 3, 2025. V7 fundamentally changed prompt writing — the model now und…

Image

Midjourney V8 (Alpha)

Midjourney V8 Alpha is the new model, available since March 17, 2026 only on alpha.midjourney.com (Discord is not supported). V8 is not a…

Image

Midjourney V8.1 (Alpha)

Midjourney V8.1 Alpha is the V8 upgrade, available only on alpha.midjourney.com (no Discord). Main shifts: HD is now default (the --hd fl…

Image

Midjourney Niji (5/6/7)

Midjourney Niji is the specialized Midjourney model for anime, manga, and Eastern illustration. The current recommended version is Niji 7…

Video

Midjourney Video

Midjourney Video is Midjourney's Image-to-Video model for short animations of still images. Pure Text-to-Video isn't supported: a referen…

Video

MiniMax Hailuo 02

MiniMax Hailuo 02 is the predecessor of Hailuo 2.3, still relevant for its unique FL2V (First-and-Last-Frame-to-Video) mode and strong ph…

Video

MiniMax Hailuo 2.3

MiniMax Hailuo 2.3 is the flagship of MiniMax video models: T2V and I2V up to 1080p, 25fps, with bracket camera syntax `[Push in]` suppor…

Video

MiniMax I2V-01-Live

MiniMax I2V-01-Live is a specialized Image-to-Video model for animating 2D illustrations: anime, manga, digital portraits, concept art. U…

Image

Mystic 2.5

Mystic 2.5 is Freepik's proprietary image model in the Pikaso platform. Text-to-image up to 2K, with Style and Character reference types…

Image

Nano Banana 2

Nano Banana 2 is Google's second-generation image model in the Gemini API, with up to 2K resolution, basic thinking mode, and support for…

Image

Nano Banana Pro

Nano Banana Pro is Google's flagship in Gemini 3 Pro Image: 4K, up to 14 references (6 high fidelity), full thinking mode, and SOTA text…

Video

OmniHuman 1.5

OmniHuman 1.5 is ByteDance's specialized video model for animating people via Image + Audio → Video. 1024×1024 at 30fps, up to 30 seconds…

Video

PixVerse V6 (V5.5)

PixVerse V6 is a video model from PixVerse with native audio generation, multi-shot mode, and 20+ cinematic lens controls. It supports T2…

Image

Qwen Image (V1 / V2.0)

Qwen Image is the Alibaba Qwen team's image model with leading text rendering: commercial-grade English and Chinese, multi-line layouts,…

Image

Recraft V4 / V4 Pro

Recraft V4 is the only AI model that produces true editable SVG with structured layers. Accurate text rendering, real design taste, two p…

Image

Reve Image 1.0

Reve Image 1.0 is an image model from Reve AI with 12 billion parameters, native 2048×2048, and 4K upscaling. It's #1 on Artificial Analy…

Video

Runway Act-Two

Runway Act-Two is a performance transfer model, not text-to-video. You feed it a driving video with an actor's performance and a characte…

Video

Runway Gen-4

Runway Gen-4 is an image-to-video model from Runway with native 720p (upscale to 4K) and a fixed duration of 5 or 10 seconds. Generation…

Video

Runway Gen-4.5

Runway Gen-4.5 is Runway's first model with full text-to-video alongside image-to-video. The Autoregressive-to-Diffusion architecture imp…

Video

Seedance 1.0 Lite

Seedance 1.0 Lite is the lightweight Seedance variant from ByteDance. Fixed duration of 5 or 10 seconds, resolution 480p or 720p, text-on…

Video

Seedance 1.0 Pro

Seedance 1.0 Pro is the full-featured first-generation video model from ByteDance on the 即梦 (Jimeng) platform. It produces 5 or 10 second…

Video

Seedance 1.5 Pro

Seedance 1.5 Pro is ByteDance's intermediate video model between generations 1.0 and 2.0. It produces 5 or 10 second clips up to 1080p an…

Video

Seedance 2.0

Seedance 2.0 is ByteDance's flagship video model on the 即梦 (Jimeng) platform. It generates 4–15 second clips up to 2K, accepts up to 9 im…

Video

Seedance New

Seedance New is the latest iteration of ByteDance's video model, successor to Seedance 2.0 with experimental refinements. It makes 4–15 s…

Image

Seedream 4.0

Seedream 4.0 is the baseline image model from ByteDance and the first generation of the family. Text-to-image up to 2K, optimal prompt le…

Image

Seedream 4.5

Seedream 4.5 is the mainstream version of ByteDance's image model. Text-to-image, image-to-image, and multi-image blending up to 4K. Opti…

Image

Seedream 5 Lite

Seedream 5 Lite is the latest version of ByteDance's image model. Text-to-image, image-to-image, multi-image blending, inpainting, and ou…

Video

Sora 2 / Sora 2 Pro

Sora 2 is OpenAI's video model with native audio, support for up to two characters via the Characters API, and clips of 4–20 seconds. The…

Video

Veed Fabric 1.0

Veed Fabric 1.0 is a specialized lip-sync model, not a general video generator. The input is an image plus audio (or a TTS speech script)…

Video

Google Veo 3

Veo 3 is the first Google DeepMind model to generate audio natively together with video: dialogue, background sounds, music, SFX. Clips a…

Video

Google Veo 3.1 (incl. Veo 3.1 Fast and Veo 3.1 Fast Relax)

Veo 3.1 is Google DeepMind's updated video model with stronger prompt adherence, native 1080p, vertical 9:16 format, and image-to-video.…

Image

Wan (General — 2.5 / 2.6)

Wan is Alibaba's open-source T2I model, available via fal.ai, Replicate, and for local execution. It accepts natural language prompts wit…

Image

Z-Image (Base / Turbo)

Z-Image is Alibaba Tongyi-MAI's compact 6B image model with open Apache 2.0 weights. Its key features are bilingual text rendering (Engli…