01 / Overview

ERNIE-Image at a glance.

01

Baidu shipped ERNIE 5.0 on January 22, 2026 as a 2.4-trillion-parameter mixture-of-experts model with under three percent of parameters active per token, a 128K-token context window, and native omni-modal input and output across text, image, audio, and video. The release pairs the frontier text tier with GenFlow 3.0 agents and a 200 million monthly active user footprint inside Baidu's consumer surface, and it sits at 1460 on LMArena (measured January 15, 2026) for rank one in China and rank eight globally. On the Artificial Analysis Intelligence Index the model scores 29, which lands below the current frontier aggregate of 57 but keeps pace with the top tier on Chinese reading, math, and code categories. You reach ERNIE 5.0 text and omni-modal generation through Baidu's Qianfan API at roughly $0.60 per one million input tokens and $2.10 per one million output tokens, and that is where every text deep dive on this site points its code samples.
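The per-token rates above make cost estimates simple arithmetic. A minimal sketch, assuming the quoted Qianfan rates ($0.60 input, $2.10 output per million tokens) as constants:

```typescript
// Back-of-envelope Qianfan cost estimate for an ERNIE 5.0 call.
// Rates are the ones quoted above and may change; treat them as assumptions.
const ERNIE5_INPUT_PER_M = 0.6; // USD per 1M input tokens
const ERNIE5_OUTPUT_PER_M = 2.1; // USD per 1M output tokens

function ernie5CostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * ERNIE5_INPUT_PER_M +
    (outputTokens / 1_000_000) * ERNIE5_OUTPUT_PER_M
  );
}

// A 50K-token prompt with a 10K-token response:
// 0.05 * 0.60 + 0.01 * 2.10 = $0.051
console.log(ernie5CostUSD(50_000, 10_000).toFixed(4));
```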

02

ERNIE-Image is the image generation sibling and the endpoint you drive from the playground on this site. It is an 8 billion parameter Diffusion Transformer released under Apache 2.0 with open weights, and it is the reason this blog exists as a working surface rather than a pure reading room. On GenEval with the prompt enhancer enabled it posts 0.8728 and on LongTextBench it lands at 0.9733, both of which place it at the top of the open-weight field for typography density and CJK glyph fidelity. fal hosts five endpoints you can call today. fal-ai/ernie-image is the 50 step standard model at $0.03 per megapixel, fal-ai/ernie-image/turbo is the 8 step fast path at $0.01 per megapixel, fal-ai/ernie-image/lora and fal-ai/ernie-image/lora/turbo layer custom adapters on top of either base, and fal-ai/ernie-image-trainer lets you fit your own LoRA from a zip of reference frames.
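Because the fal endpoints bill per megapixel, cost scales with pixel area rather than image count. A quick sketch of the math, using the rates quoted above ($0.03 standard, $0.01 Turbo) as assumptions:

```typescript
// Per-megapixel billing: megapixels are raw pixel count divided by 1,000,000.
function imageCostUSD(width: number, height: number, ratePerMP: number): number {
  const megapixels = (width * height) / 1_000_000;
  return megapixels * ratePerMP;
}

// 1024x1024 is ~1.05 MP, so the standard endpoint bills ~$0.0315
// and Turbo bills ~$0.0105 for the same frame.
console.log(imageCostUSD(1024, 1024, 0.03).toFixed(4));
console.log(imageCostUSD(1024, 1024, 0.01).toFixed(4));
```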

03

The split matters because ERNIE 5.0 and ERNIE-Image solve different problems and bill on different meters. You use ERNIE 5.0 when you want long-context reasoning, Chinese-first knowledge, or omni-modal output from a single call, and you pay Baidu Qianfan per token for it. You use ERNIE-Image when you need a poster, a menu, a multi-panel comic, a dense keynote slide, or anything with legible Chinese type at scale, and you pay fal per megapixel for it. The editorial side of this blog covers ERNIE 5.0 benchmarks, agent workflows, and Qianfan setup. The playground side ships live pixels from fal-ai/ernie-image. Read a post for intelligence, hit the playground for images, and the pricing page tells you which meter you are about to spend on.

01 / Who it's for
  • 01 Chinese-market product teams shipping bilingual posters, packaging, and social
  • 02 Agencies building comic strips, dense menu cards, and multi-panel visual assets
  • 03 Research teams evaluating frontier omni-modal models against GPT-5 and Gemini 3.1
  • 04 Indie devs who want an open-weight image model with a hosted fal endpoint
  • 05 Localization teams migrating from DALL-E 3 or Imagen 4 to a CJK-native pipeline
02 / When to pick
  • 01 Your output has to render legible Chinese, Japanese, or Korean glyphs at poster scale
  • 02 You need dense text layouts, aligned columns, or multi-panel comics with consistent type
  • 03 You want an Apache 2.0 open-weight image model you can also self-host or fine-tune
  • 04 You want frontier-grade reasoning in a 128K window at a lower price than GPT-5
  • 05 You need one model for text, image, audio, and video in a single omni-modal call
03 / Infrastructure

fal hosts ERNIE-Image with both the 50 step standard endpoint and the 8 step Turbo, plus LoRA variants and a trainer, all on a single API key with async queues and webhooks, so you never touch a GPU to ship Chinese-first typography at production scale.
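The async queue workflow can be sketched as a payload you hand to the fal client. The field names below follow the fal client convention, and the webhook URL is a placeholder for your own receiver; this is a sketch, not the canonical integration:

```typescript
// Queue submission payload for fal's async API, built as a pure function
// so the shape is easy to test without touching the network.
type ErnieImageJob = {
  input: {
    prompt: string;
    aspect_ratio: string;
    num_inference_steps: number;
  };
  webhookUrl: string;
};

function posterJob(prompt: string, webhookUrl: string): ErnieImageJob {
  return {
    input: { prompt, aspect_ratio: "3:4", num_inference_steps: 50 },
    webhookUrl,
  };
}

// With @fal-ai/client this payload goes straight into the queue, roughly:
//   const { request_id } = await fal.queue.submit("fal-ai/ernie-image", posterJob(p, url));
//   const status = await fal.queue.status("fal-ai/ernie-image", { requestId: request_id, logs: true });
// fal POSTs the finished payload to webhookUrl, so bursts never block your process.

console.log(posterJob("Bilingual cafe poster", "https://example.com/fal-webhook").input.aspect_ratio);
```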

02 / Integration

Call ERNIE-Image in under 20 lines.

TypeScript · fal-ai/ernie-image
import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

const result = await fal.subscribe("fal-ai/ernie-image", {
  input: {
    prompt:
      "A vertical coffee shop poster. Large centered headline reads 'MORNING POUR' in bold serif, Chinese subline 手冲咖啡 八点开门 in elegant brush script beneath. Three espresso cups line the bottom with prices 38元 48元 58元. Warm cream background, deep espresso brown type, thin gold accent rule across the middle.",
    aspect_ratio: "3:4",
    num_images: 1,
    enable_prompt_enhancer: true,
    num_inference_steps: 50,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs?.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.images[0].url);
console.log(`Seed: ${result.data.seed}`);
Expected output
{ images: [{ url: "https://v3.fal.media/files/ernie/..." }], seed: 412839 }
Full API reference
03 / Pricing

What ERNIE-Image costs on fal.ai.

01 fal-ai/ernie-image
$0.03 per megapixel

1 image at 1024x1024 (1.05 MP)

$0.0315
02 fal-ai/ernie-image/turbo
$0.01 per megapixel

1 image at 1024x1024 (8 steps)

$0.0105
03 fal-ai/ernie-image/lora
$0.03 per megapixel

1 image at 1024x1024 with custom LoRA

$0.0315
04 fal-ai/ernie-image/lora/turbo
$0.01 per megapixel

1 image at 1024x1024 with LoRA (8 steps)

$0.0105
05 Qianfan API: ERNIE 5.0 input
$0.60 per 1M tokens

50K input token prompt

$0.03
06 Qianfan API: ERNIE 5.0 output
$2.10 per 1M tokens

10K output token response

$0.021

ERNIE 5.0 text and omni-modal runs via Baidu Qianfan; ERNIE-Image image gen runs on fal.ai.

Official pricing page
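A real pipeline often hits both meters in one pass: an ERNIE 5.0 call drafts the layout, then ERNIE-Image renders it. A combined estimate, treating the rates in the table above as assumptions:

```typescript
// One pipeline, two meters: per-token Qianfan text plus per-megapixel fal image.
// Rates are the ones quoted on this page; swap in current prices before relying on this.
function pipelineCostUSD(
  inputTokens: number,
  outputTokens: number,
  imageWidth: number,
  imageHeight: number,
  imageRatePerMP: number,
): number {
  const textCost = (inputTokens / 1e6) * 0.6 + (outputTokens / 1e6) * 2.1;
  const imageCost = ((imageWidth * imageHeight) / 1e6) * imageRatePerMP;
  return textCost + imageCost;
}

// 2K-token prompt, 500-token layout draft, one 1024x1024 standard render:
// 0.002 * 0.60 + 0.0005 * 2.10 + 1.048576 * 0.03 ≈ $0.0337
console.log(pipelineCostUSD(2000, 500, 1024, 1024, 0.03).toFixed(4));
```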
04 / Comparison

ERNIE-Image vs the field.

01 · PRIMARY · fal-ai/ernie-image
ERNIE-Image
Res 2K (tiled) · Dur n/a · Price $0.03/MP · Score GenEval 0.87

Chinese-first typography, dense text, multi-panel comics

02 · fal-ai/flux-2-pro
Flux 2 Pro
Res 2K · Dur n/a · Price $0.06/MP · Score GenEval 0.85

Photoreal hero stills and general prompt fidelity

03 · -
GPT Image 2
Res 4096px · Dur n/a · Price $0.19/image · Score GenEval 0.83

ChatGPT workflow integration, safety-tuned output

04 · fal-ai/ideogram/v3
Ideogram 3
Res 2048px · Dur n/a · Price $0.08/image · Score GenEval 0.79

English typography and logo-ready posters

05 · fal-ai/imagen4
Imagen 4
Res 2K · Dur n/a · Price $0.04/image · Score GenEval 0.84

Photography realism and Google ecosystem integration

If your output has to carry legible Chinese characters, dense columns of type, or multi-panel comic layouts, ERNIE-Image wins on both quality and price. For English-only photoreal hero shots, Flux 2 Pro or Imagen 4 still sets the bar.
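The table mixes per-megapixel and per-image prices, so a fair comparison needs a common unit. A small normalizer, assuming a 1024x1024 output as the reference size:

```typescript
// Normalize a per-image price to an effective per-megapixel rate at a given
// output size. Prices are the ones listed in the comparison above; the result
// is only as meaningful as the assumed resolution.
function perImageToPerMP(pricePerImage: number, width: number, height: number): number {
  return pricePerImage / ((width * height) / 1_000_000);
}

// GPT Image 2 at $0.19/image rendered at 1024x1024 works out to ~$0.18/MP,
// roughly six times ERNIE-Image's $0.03/MP on the standard endpoint.
console.log(perImageToPerMP(0.19, 1024, 1024).toFixed(3));
```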

By the numbers

What this publication is and isn't, in numbers.

01 / Published posts
10

Each one is dated, second-person, and opinionated.

02 / Topic categories
7

Filter by the constraint you care about.

03 / Total reading time
67 min

Total length of every post in the archive.

04 / Em-dashes tolerated
0

Not a single U+2014 survives our ship check.

05 / Featured picks
1

Editor-selected cover stories.

06 / Posts illustrated
100%

Custom covers on every featured post.

05 / FAQ

Frequently asked.

01 How does ERNIE 5.0 score on benchmarks versus GPT-5 and Claude Opus 4.7?

ERNIE 5.0 posts 1460 on LMArena as of January 15, 2026, which puts it at rank one inside China and rank eight globally. On the Artificial Analysis Intelligence Index it scores 29 against a frontier aggregate of 57, so GPT-5, Claude Opus 4.7, and Gemini 3.1 Pro still lead on composite reasoning. ERNIE 5.0 leads on Chinese reading comprehension, long-context Chinese code tasks, and omni-modal generation from a single call. You reach the text tier through Baidu's Qianfan API at https://qianfan.cloud.baidu.com for roughly $0.60 input and $2.10 output per million tokens, which lands it below GPT-5 on price while keeping pace on most CJK evaluations.

02 How does ERNIE-Image quality compare on Chinese and CJK glyphs?

ERNIE-Image is the current open-weight leader on dense CJK typography. On LongTextBench it scores 0.9733, the highest published number for open weights, and on GenEval with the prompt enhancer it lands at 0.8728. You can drive it live from fal-ai/ernie-image and ask for multi-panel menus, bilingual posters, or dense keynote slides and it will render every character legibly, even at small sizes. Flux 2 Pro and Ideogram 3 both ship good English typography but trip on simplified and traditional Chinese glyphs beyond ten characters. If your layout is Chinese-first or bilingual, ERNIE-Image is the endpoint you want.

03 Why do text and image billing split between Qianfan and fal.ai?

They are different products on different infrastructure. ERNIE 5.0 is Baidu's frontier text and omni-modal tier served from Baidu Qianfan at https://qianfan.cloud.baidu.com with per-token billing at roughly $0.60 input and $2.10 output per million tokens. ERNIE-Image is the 8 billion parameter open-weight DiT hosted on fal at fal-ai/ernie-image, fal-ai/ernie-image/turbo, and the LoRA variants, all billed per megapixel at $0.03 standard or $0.01 on Turbo. The editorial side of this blog covers ERNIE 5.0 reasoning and agent workflows. The playground you see here runs fal-ai/ernie-image so you can ship pixels without a Baidu account.

04 Can I access ERNIE from outside China?

Yes for both tiers, with different paths. ERNIE-Image runs on fal at fal-ai/ernie-image with a single FAL_KEY, so you reach it from anywhere fal is reachable. ERNIE 5.0 text runs on Baidu Qianfan at https://qianfan.cloud.baidu.com, which now supports international sign-ups and non-CN billing for developers outside mainland China. If your stack is global-first and you only need the image tier, stay on fal. If you need ERNIE 5.0 text reasoning, create a Qianfan account, generate an API key, and call the chat completions endpoint like any OpenAI-compatible service.

05 How do I train a LoRA on ERNIE-Image?

Use fal-ai/ernie-image-trainer. You upload a zip of 15 to 50 reference frames, set the subject or style name, and the trainer fits an adapter you can then plug into either fal-ai/ernie-image/lora (50 step standard at $0.03 per megapixel) or fal-ai/ernie-image/lora/turbo (8 step fast at $0.01 per megapixel). The open-weight Apache 2.0 base means you can also pull the weights from Hugging Face and train locally on a single H100, but the fal trainer is the one-click path if you want a hosted adapter in under an hour.
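The train-then-serve flow above can be sketched as a pre-flight check plus the call shape. The 15-to-50 frame window comes from this section; the trainer's exact input field names (images_data_url, trigger_word, lora_url, loras) are assumptions for illustration, not the confirmed schema:

```typescript
// Pre-flight: enforce the reference-frame window this section describes.
function validateTrainingSet(frameCount: number): void {
  if (frameCount < 15 || frameCount > 50) {
    throw new Error(`expected 15-50 reference frames, got ${frameCount}`);
  }
}

// With @fal-ai/client, training then serving looks roughly like this
// (field names are placeholders; check the endpoint schema on fal):
//   const trained = await fal.subscribe("fal-ai/ernie-image-trainer", {
//     input: { images_data_url: "https://your.host/frames.zip", trigger_word: "brandmark" },
//   });
//   const image = await fal.subscribe("fal-ai/ernie-image/lora", {
//     input: { prompt: "brandmark on a poster", loras: [{ path: trained.data.lora_url }] },
//   });

validateTrainingSet(32); // passes; 10 or 60 would throw
console.log("training set ok");
```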

06 When does ERNIE-Image beat Flux 2 Pro or GPT Image 2?

Three scenarios. One, any layout that has to carry legible simplified or traditional Chinese characters at any scale beyond a short caption. ERNIE-Image posts 0.9733 on LongTextBench where Flux 2 Pro trips past ten CJK glyphs. Two, multi-panel comic strips and dense menu cards where consistent alignment across ten plus text blocks matters more than photoreal surface detail. Three, cost-sensitive production where you want to ship thousands of variants a day. fal-ai/ernie-image is $0.03 per megapixel versus Flux 2 Pro at $0.06. For English-only photoreal hero stills, Flux 2 Pro and Imagen 4 still win on skin, hair, and subtle lighting.

07 How do I migrate a DALL-E 3 pipeline to ERNIE-Image?

Swap the client and rewrite three parameters. DALL-E 3 takes size strings like '1024x1024'; fal-ai/ernie-image takes aspect_ratio as '1:1', '16:9', '9:16', '4:3', '3:4', '3:2', '2:3', or '21:9'. DALL-E 3's quality flag becomes num_inference_steps (50 on the standard endpoint, 8 on Turbo). DALL-E 3's style parameter maps to enable_prompt_enhancer; leave it true unless you want strict literal prompts. Keep your prompt text as-is, install @fal-ai/client, set FAL_KEY, and call fal.subscribe('fal-ai/ernie-image'). You drop from $0.04 per image on DALL-E 3 to $0.03 per megapixel on ERNIE-Image and pick up CJK typography as a bonus.
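The size-string swap is the only mapping that needs code. A migration shim, using the aspect-ratio list this FAQ quotes; picking the closest ratio by absolute difference is my choice, not a documented rule:

```typescript
// Map a DALL-E 3 size string to the nearest ERNIE-Image aspect_ratio.
const RATIOS: Array<[string, number]> = [
  ["1:1", 1 / 1], ["16:9", 16 / 9], ["9:16", 9 / 16], ["4:3", 4 / 3],
  ["3:4", 3 / 4], ["3:2", 3 / 2], ["2:3", 2 / 3], ["21:9", 21 / 9],
];

function toAspectRatio(size: string): string {
  const [w, h] = size.split("x").map(Number);
  const target = w / h;
  let best = RATIOS[0];
  for (const candidate of RATIOS) {
    if (Math.abs(candidate[1] - target) < Math.abs(best[1] - target)) best = candidate;
  }
  return best[0];
}

// The three DALL-E 3 sizes map to:
console.log(toAspectRatio("1024x1024")); // "1:1"
console.log(toAspectRatio("1792x1024")); // "16:9"
console.log(toAspectRatio("1024x1792")); // "9:16"
```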

08 How do I produce dense text layouts and multi-panel comics?

Three levers on fal-ai/ernie-image. One, keep enable_prompt_enhancer set to true. It bumps GenEval to 0.8728 and cleans up layout instructions. Two, write prompts as structured blocks. State the panel grid, the text in each panel, and the exact position of every headline and subline. Three, pick an aspect_ratio that matches the layout. Use '3:4' for vertical posters, '4:3' for menu cards, and '1:1' for 2x2 comic grids. For four-panel comics with consistent type, describe panel 1 through panel 4 explicitly and put the speech bubble text in quotes so the model treats it as glyphs rather than decoration.
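The "structured blocks" lever can be captured in a small prompt builder that states the grid, then each panel's quoted speech text. The phrasing template is mine, not an official prompt format:

```typescript
// Build a multi-panel comic prompt: grid first, then one block per panel,
// with speech text quoted so the model treats it as glyphs.
function comicPrompt(
  cols: number,
  rows: number,
  panels: Array<{ scene: string; speech: string }>,
): string {
  const lines = [`A ${cols}x${rows} comic grid, consistent type across all panels.`];
  panels.forEach((p, i) => {
    lines.push(`Panel ${i + 1}: ${p.scene}. Speech bubble reads '${p.speech}'.`);
  });
  return lines.join(" ");
}

console.log(
  comicPrompt(2, 2, [
    { scene: "Barista greets the morning rush", speech: "早上好" },
    { scene: "Pour-over close-up", speech: "手冲咖啡" },
    { scene: "Customer takes the first sip", speech: "太香了" },
    { scene: "Shop sign at dusk", speech: "明天见" },
  ]),
);
```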

09 How do I set up Baidu Qianfan for ERNIE 5.0 text calls?

Create a Qianfan account at https://qianfan.cloud.baidu.com, verify your identity, and create an application in the console. Grab the API key and secret key from the application detail page. Qianfan exposes an OpenAI-compatible chat completions endpoint, so you can use the official openai SDK by pointing base_url at the Qianfan endpoint and passing your API key. Call the ERNIE 5.0 model id as listed in the console. Billing is usage-based at roughly $0.60 per million input tokens and $2.10 per million output tokens, with a free tier for initial testing. For omni-modal requests pass image, audio, or video blocks in the messages array.
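The OpenAI-compatible setup above can be sketched over plain fetch, no SDK required. The base URL path and the model id below are placeholders; copy the exact values from your Qianfan console before shipping this:

```typescript
const QIANFAN_BASE = "https://qianfan.cloud.baidu.com/v2"; // placeholder: check your console

// Pure request-body builder, so the payload shape is testable offline.
function chatBody(model: string, userText: string) {
  return { model, messages: [{ role: "user", content: userText }] };
}

// The actual call, OpenAI-compatible chat completions over fetch (Node 18+).
async function askErnie(prompt: string): Promise<string | undefined> {
  const res = await fetch(`${QIANFAN_BASE}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.QIANFAN_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(chatBody("ernie-5.0", prompt)), // model id: placeholder
  });
  const data: any = await res.json();
  return data.choices?.[0]?.message?.content;
}

// Only hit the network when a key is configured.
if (process.env.QIANFAN_API_KEY) {
  void askErnie("Summarize ERNIE 5.0 in one sentence.").then(console.log);
}
```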

10 Why run ERNIE-Image on fal.ai?

Eight reasons stack up. One, single FAL_KEY covers fal-ai/ernie-image, the Turbo variant, both LoRA endpoints, and the trainer. Two, async queues with webhooks handle bursts without cold starts. Three, per-megapixel billing at $0.03 standard and $0.01 Turbo beats hosted GPU rentals once you pass a few hundred images a day. Four, LoRA training and serving run on the same API key and URL scheme, so fine-tune to production is one line. Five, fal auto-scales, no instance warmup. Six, the endpoint sits alongside 600 plus other models, so you can chain ERNIE-Image with upscalers, video models, or LLMs in one pipeline. Seven, Apache 2.0 open weights means you can always self-host if fal pricing ever stops working. Eight, logs and queue status are first class, so you get observability without wiring Prometheus yourself.

Further reading