Everything you need to ship Chinese-first typography and posters with Baidu ERNIE.
10 guides covering the ERNIE family end to end. ERNIE 5.0 omni-modal reasoning, ERNIE-Image 3.0 DiT for dense CJK text and multi-panel layouts. Real numbers, real code, real pipelines.
Dated, opinionated, written once and kept current. Every entry is one subject, answered. No filler, just signal.
ERNIE-Image at a glance.
Baidu shipped ERNIE 5.0 on January 22, 2026 as a 2.4 trillion parameter mixture of experts model with under three percent active parameters per token, a 128K token context window, and native omni-modal input and output across text, image, audio, and video. The release pairs the frontier text tier with GenFlow 3.0 agents and a 200 million monthly active user footprint inside Baidu's consumer surface, and it sits at 1460 on LMArena (measured January 15, 2026) for rank one in China and rank eight globally. On the Artificial Analysis Intelligence Index the model scores 29, which lands below the current frontier aggregate of 57 but keeps pace with the top tier on Chinese reading, math, and code categories. You reach ERNIE 5.0 text and omni-modal generation through Baidu's Qianfan API at roughly $0.60 per one million input tokens and $2.10 per one million output tokens, and that is where every text deep dive on this site points its code samples.
ERNIE-Image is the image generation sibling and the endpoint you drive from the playground on this site. It is an 8 billion parameter Diffusion Transformer released under Apache 2.0 with open weights, and it is the reason this blog exists as a working surface rather than a pure reading room. On GenEval with the prompt enhancer enabled it posts 0.8728 and on LongTextBench it lands at 0.9733, both of which place it at the top of the open-weight field for typography density and CJK glyph fidelity. fal hosts five endpoints you can call today. fal-ai/ernie-image is the 50 step standard model at $0.03 per megapixel, fal-ai/ernie-image/turbo is the 8 step fast path at $0.01 per megapixel, fal-ai/ernie-image/lora and fal-ai/ernie-image/lora/turbo layer custom adapters on top of either base, and fal-ai/ernie-image-trainer lets you fit your own LoRA from a zip of reference frames.
The split matters because ERNIE 5.0 and ERNIE-Image solve different problems and bill on different meters. You use ERNIE 5.0 when you want long-context reasoning, Chinese-first knowledge, or omni-modal output from a single call, and you pay Baidu Qianfan per token for it. You use ERNIE-Image when you need a poster, a menu, a multi-panel comic, a dense keynote slide, or anything with legible Chinese type at scale, and you pay fal per megapixel for it. The editorial side of this blog covers ERNIE 5.0 benchmarks, agent workflows, and Qianfan setup. The playground side ships live pixels from fal-ai/ernie-image. Read a post for intelligence, hit the playground for images, and the pricing page tells you which meter you are about to spend on.
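That routing rule fits in a few lines. The sketch below is ours, not an official SDK helper; the job shape and return fields are hypothetical, while the platforms and meters come straight from the split described above.

```typescript
// Hypothetical routing helper: decide which ERNIE product and meter a job hits.
type Job = {
  needsLegibleCjkType: boolean; // posters, menus, comics with Chinese text
  needsLongContextReasoning: boolean; // 128K-window text or omni-modal calls
};

function routeErnie(job: Job): { platform: string; target: string; meter: string } {
  if (job.needsLegibleCjkType) {
    // Image work: fal, billed per megapixel.
    return { platform: "fal", target: "fal-ai/ernie-image", meter: "per megapixel" };
  }
  // Text, long-context, and omni-modal work: Baidu Qianfan, billed per token.
  return { platform: "qianfan", target: "ernie-5.0", meter: "per token" };
}
```

A poster job routes to fal and spends megapixels; a reasoning job routes to Qianfan and spends tokens.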
- 01 Chinese-market product teams shipping bilingual posters, packaging, and social
- 02 Agencies building comic strips, dense menu cards, and multi-panel visual assets
- 03 Research teams evaluating frontier omni-modal models against GPT-5 and Gemini 3.1
- 04 Indie devs who want an open-weight image model with a hosted fal endpoint
- 05 Localization teams migrating from DALL-E 3 or Imagen 4 to a CJK-native pipeline
- 01 Your output has to render legible Chinese, Japanese, or Korean glyphs at poster scale
- 02 You need dense text layouts, aligned columns, or multi-panel comics with consistent type
- 03 You want an Apache 2.0 open-weight image model you can also self-host or fine-tune
- 04 You want frontier-grade reasoning in a 128K window at a lower price than GPT-5
- 05 You need one model for text, image, audio, and video in a single omni-modal call
fal hosts ERNIE-Image with both the 50 step standard endpoint and the 8 step Turbo, plus LoRA variants and a trainer, all on a single API key with async queues and webhooks, so you never touch a GPU to ship Chinese-first typography at production scale.
The posts we point people at when they ask where to start with Baidu ERNIE.
Three to read first.
Baidu Qianfan vs fal Endpoints: When to Use Each
The ERNIE family splits across two platforms. Qianfan serves ERNIE 5.0 text and multimodal from China. fal serves ERNIE-Image globally. Here is the exact routing call.
CJK Text Rendering: Where ERNIE-Image Beats Flux 2 Pro and GPT Image 2
Debugging ERNIE-Image: Why Your English Posters Look Off
Every topic we cover.
Comparison
- CJK Text Rendering: Where ERNIE-Image Beats Flux 2 Pro and GPT Image 2
- ERNIE 5.0 vs 4.5: The Omni-Modal Leap
- ERNIE 5.0 vs GPT-5 vs Claude Opus 4.7: Real Benchmark Reads
Integration
Debugging
Workflow
Technique
Use case
Prompting
The category with the most coverage. 3 posts in this thread.
All 3 in Comparison. More on Comparison.
Call ERNIE-Image in under 20 lines.
import { fal } from "@fal-ai/client";
fal.config({ credentials: process.env.FAL_KEY });
const result = await fal.subscribe("fal-ai/ernie-image", {
  input: {
    prompt:
      "A vertical coffee shop poster. Large centered headline reads 'MORNING POUR' in bold serif, Chinese subline 手冲咖啡 八点开门 in elegant brush script beneath. Three espresso cups line the bottom with prices 38元 48元 58元. Warm cream background, deep espresso brown type, thin gold accent rule across the middle.",
    aspect_ratio: "3:4",
    num_images: 1,
    enable_prompt_enhancer: true,
    num_inference_steps: 50,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs?.forEach((log) => console.log(log.message));
    }
  },
});
console.log(result.data.images[0].url);
console.log(`Seed: ${result.data.seed}`);

Example response: { images: [{ url: "https://v3.fal.media/files/ernie/..." }], seed: 412839 }

What ERNIE-Image costs on fal.ai.
1 image at 1024x1024 (1.05 MP): fal-ai/ernie-image at $0.03 per megapixel, about $0.03
1 image at 1024x1024 (8 steps): fal-ai/ernie-image/turbo at $0.01 per megapixel, about $0.01
1 image at 1024x1024 with custom LoRA: fal-ai/ernie-image/lora at $0.03 per megapixel, about $0.03
1 image at 1024x1024 with LoRA (8 steps): fal-ai/ernie-image/lora/turbo at $0.01 per megapixel, about $0.01
50K input token prompt: Qianfan at roughly $0.60 per million input tokens, about $0.03
10K output token response: Qianfan at roughly $2.10 per million output tokens, about $0.02
ERNIE 5.0 text and omni-modal runs via Baidu Qianfan; ERNIE-Image image gen runs on fal.ai.
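Per-megapixel billing makes image cost a one-line function of pixel dimensions. A quick sketch (our helper, not part of any SDK), using the fal rates quoted above:

```typescript
// Cost of one image at a per-megapixel rate ($0.03 standard, $0.01 Turbo on fal).
function imageCostUsd(width: number, height: number, ratePerMegapixel: number): number {
  return ((width * height) / 1_000_000) * ratePerMegapixel;
}

// 1024x1024 is ~1.05 megapixels, so a standard-endpoint image runs about $0.031,
// and the same frame on Turbo about $0.010.
```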
Official pricing page
Latest posts.
ERNIE 5.0 vs GPT-5 vs Claude Opus 4.7: Real Benchmark Reads
Two headline numbers tell you where ERNIE 5.0 sits: LMArena 1460 at rank 8, AA Intelligence Index 29. Here is what both actually mean for your product.
ERNIE-Image vs Turbo: The 50 vs 8 Step Tradeoff
Integrating ERNIE-Image LoRAs Into Your Brand System
LongTextBench and GenEval: Reading ERNIE-Image's Scores
Multi-Panel Comics with ERNIE-Image: Step by Step
Prompting ERNIE-Image for Dense Typography
ERNIE-Image is an 8B DiT that was trained with posters, CJK signage, and multi-panel comics in the corpus. Here is how you write prompts that actually render the words you asked for.
ERNIE-Image vs the field.
- ERNIE-Image: Chinese-first typography, dense text, multi-panel comics
- Flux 2 Pro: photoreal hero stills and general prompt fidelity
- GPT Image 2: ChatGPT workflow integration, safety-tuned output
- Ideogram 3: English typography and logo-ready posters
- Imagen 4: photography realism and Google ecosystem integration
If your output has to carry legible Chinese characters, dense columns of type, or multi-panel comic layouts, ERNIE-Image wins on both quality and price. For English-only photoreal hero shots, Flux 2 Pro or Imagen 4 still sets the bar.
The numbers.
What this publication is and isn't, in numbers.
Each one is dated, second-person, and opinionated.
Filter by the constraint you care about.
Total length of every post in the archive.
Not a single U+2014 survives our ship check.
Editor-selected cover stories.
Custom covers on every featured post.
What we write about most.
Keyword frequency across every post. The bigger the word, the more often we come back to it.
Frequently asked.
01 How does ERNIE 5.0 score on benchmarks versus GPT-5 and Claude Opus 4.7?
ERNIE 5.0 posts 1460 on LMArena as of January 15, 2026, which puts it at rank one inside China and rank eight globally. On the Artificial Analysis Intelligence Index it scores 29 against a frontier aggregate of 57, so GPT-5, Claude Opus 4.7, and Gemini 3.1 Pro still lead on composite reasoning. ERNIE 5.0 leads on Chinese reading comprehension, long-context Chinese code tasks, and omni-modal generation from a single call. You reach the text tier through Baidu's Qianfan API at https://qianfan.cloud.baidu.com for roughly $0.60 input and $2.10 output per million tokens, which lands it below GPT-5 on price while keeping pace on most CJK evaluations.
02 How does ERNIE-Image quality compare on Chinese and CJK glyphs?
ERNIE-Image is the current open-weight leader on dense CJK typography. On LongTextBench it scores 0.9733, the highest published number for open weights, and on GenEval with the prompt enhancer it lands at 0.8728. You can drive it live from fal-ai/ernie-image and ask for multi-panel menus, bilingual posters, or dense keynote slides and it will render every character legibly, even at small sizes. Flux 2 Pro and Ideogram 3 both ship good English typography but trip on simplified and traditional Chinese glyphs beyond ten characters. If your layout is Chinese-first or bilingual, ERNIE-Image is the endpoint you want.
03 Why do text and image billing split between Qianfan and fal.ai?
They are different products on different infrastructure. ERNIE 5.0 is Baidu's frontier text and omni-modal tier served from Baidu Qianfan at https://qianfan.cloud.baidu.com with per-token billing at roughly $0.60 input and $2.10 output per million tokens. ERNIE-Image is the 8 billion parameter open-weight DiT hosted on fal at fal-ai/ernie-image, fal-ai/ernie-image/turbo, and the LoRA variants, all billed per megapixel at $0.03 standard or $0.01 on Turbo. The editorial side of this blog covers ERNIE 5.0 reasoning and agent workflows. The playground you see here runs fal-ai/ernie-image so you can ship pixels without a Baidu account.
04 Can I access ERNIE from outside China?
Yes for both tiers, with different paths. ERNIE-Image runs on fal at fal-ai/ernie-image with a single FAL_KEY, so you reach it from anywhere fal is reachable. ERNIE 5.0 text runs on Baidu Qianfan at https://qianfan.cloud.baidu.com, which now supports international sign-ups and non-CN billing for developers outside mainland China. If your stack is global-first and you only need the image tier, stay on fal. If you need ERNIE 5.0 text reasoning, create a Qianfan account, generate an API key, and call the chat completions endpoint like any OpenAI-compatible service.
05 How do I train a LoRA on ERNIE-Image?
Use fal-ai/ernie-image-trainer. You upload a zip of 15 to 50 reference frames, set the subject or style name, and the trainer fits an adapter you can then plug into either fal-ai/ernie-image/lora (50 step standard at $0.03 per megapixel) or fal-ai/ernie-image/lora/turbo (8 step fast at $0.01 per megapixel). The open-weight Apache 2.0 base means you can also pull the weights from Hugging Face and train locally on a single H100, but the fal trainer is the one-click path if you want a hosted adapter in under an hour.
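A minimal sketch of preparing a trainer call. The input field names below are illustrative assumptions, not the confirmed schema, so check the fal model page for fal-ai/ernie-image-trainer before wiring this up; the 15-to-50 frame range comes from the guidance above.

```typescript
// Hypothetical payload builder for the LoRA trainer. Enforces the suggested
// 15-50 reference-frame range; field names are illustrative, not confirmed.
function buildTrainerInput(zipUrl: string, triggerWord: string, frameCount: number) {
  if (frameCount < 15 || frameCount > 50) {
    throw new Error("Provide 15 to 50 reference frames");
  }
  return {
    images_data_url: zipUrl, // zip of reference frames at a URL fal can fetch
    trigger_word: triggerWord, // the subject or style name baked into the adapter
  };
}
```

Pass the resulting object as `input` to a `fal.subscribe` call on the trainer endpoint, then point the returned adapter at either LoRA endpoint.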
06 When does ERNIE-Image beat Flux 2 Pro or GPT Image 2?
Three scenarios. One, any layout that has to carry legible simplified or traditional Chinese characters at any scale beyond a short caption. ERNIE-Image posts 0.9733 on LongTextBench where Flux 2 Pro trips past ten CJK glyphs. Two, multi-panel comic strips and dense menu cards where consistent alignment across ten plus text blocks matters more than photoreal surface detail. Three, cost-sensitive production where you want to ship thousands of variants a day. fal-ai/ernie-image is $0.03 per megapixel versus Flux 2 Pro at $0.06. For English-only photoreal hero stills, Flux 2 Pro and Imagen 4 still win on skin, hair, and subtle lighting.
07 How do I migrate a DALL-E 3 pipeline to ERNIE-Image?
Swap the client and rewrite two parameters. DALL-E 3 takes size strings like '1024x1024'; fal-ai/ernie-image takes aspect_ratio as '1:1', '16:9', '9:16', '4:3', '3:4', '3:2', '2:3', or '21:9'. DALL-E 3's quality flag becomes num_inference_steps (50 on the standard endpoint, 8 on Turbo). DALL-E 3's style parameter maps to enable_prompt_enhancer; leave it true unless you want strict literal prompts. Keep your prompt text as-is, install @fal-ai/client, set FAL_KEY, and call fal.subscribe('fal-ai/ernie-image'). You drop from $0.04 per image on DALL-E 3 to $0.03 per megapixel on ERNIE-Image and pick up CJK typography as a bonus.
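The parameter swap fits in one small mapper. This is a sketch under two assumptions: the three DALL-E 3 size strings shown below are the ones your pipeline uses, and 1792x1024 is close enough to 16:9 to round; the hd-to-50-steps mapping is our reading of the quality flag described above.

```typescript
// Hypothetical mapper from DALL-E 3 request fields to ERNIE-Image input fields.
type DalleRequest = {
  size: "1024x1024" | "1792x1024" | "1024x1792";
  quality?: "standard" | "hd";
  style?: "vivid" | "natural";
};

const SIZE_TO_RATIO: Record<DalleRequest["size"], string> = {
  "1024x1024": "1:1",
  "1792x1024": "16:9", // nearest supported aspect_ratio
  "1024x1792": "9:16",
};

function toErnieInput(req: DalleRequest) {
  return {
    aspect_ratio: SIZE_TO_RATIO[req.size],
    // "hd" keeps the 50-step standard path; otherwise ride Turbo's 8 steps.
    num_inference_steps: req.quality === "hd" ? 50 : 8,
    // DALL-E's style hint roughly corresponds to the prompt enhancer toggle.
    enable_prompt_enhancer: req.style !== "natural",
  };
}
```

Spread the result into the `input` object of your `fal.subscribe` call alongside the unchanged prompt text.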
08 How do I produce dense text layouts and multi-panel comics?
Three levers on fal-ai/ernie-image. One, keep enable_prompt_enhancer set to true. It bumps GenEval to 0.8728 and cleans up layout instructions. Two, write prompts as structured blocks. State the panel grid, the text in each panel, and the exact position of every headline and subline. Three, pick aspect_ratio that matches the layout. Use '3:4' for vertical posters, '4:3' for menu cards, and '1:1' for 2x2 comic grids. For four-panel comics with consistent type, describe panel 1 through panel 4 explicitly and note the speech bubble text in quotes so the model treats it as glyph rather than decoration.
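The structured-block style reads naturally as a small prompt builder. A sketch with hypothetical helper names, following the rules above: state the grid first, then each panel's scene and speech, with bubble text in quotes so it renders as glyphs.

```typescript
// Hypothetical prompt builder for multi-panel comics.
type Panel = { scene: string; speech: string };

function buildComicPrompt(panels: Panel[]): string {
  const grid = panels.length === 4 ? "a 2x2 grid" : `a ${panels.length}-panel strip`;
  const header = `A ${panels.length}-panel comic laid out as ${grid}, consistent type across panels.`;
  const body = panels.map(
    (p, i) => `Panel ${i + 1}: ${p.scene}. Speech bubble reads "${p.speech}".`
  );
  return [header, ...body].join(" ");
}
```

For the four-panel case, pair the output with aspect_ratio '1:1' so the 2x2 grid fills the frame.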
09 How do I set up Baidu Qianfan for ERNIE 5.0 text calls?
Create a Qianfan account at https://qianfan.cloud.baidu.com, verify your identity, and create an application in the console. Grab the API key and secret key from the application detail page. Qianfan exposes an OpenAI-compatible chat completions endpoint, so you can use the official openai SDK by pointing base_url at the Qianfan endpoint and passing your API key. Call the ERNIE 5.0 model id as listed in the console. Billing is usage-based at roughly $0.60 per million input tokens and $2.10 per million output tokens, with a free tier for initial testing. For omni-modal requests pass image, audio, or video blocks in the messages array.
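Since the endpoint is OpenAI-compatible, the request is a plain chat-completions payload. A sketch: the model id string is a placeholder for whatever the Qianfan console lists, and the URL path in the uncalled request function is an assumption built from the base URL above, not a confirmed route.

```typescript
// Hypothetical builder for an OpenAI-compatible chat completions body.
function buildQianfanChatBody(modelId: string, userText: string) {
  return {
    model: modelId, // e.g. the ERNIE 5.0 id shown in the Qianfan console
    messages: [{ role: "user", content: userText }],
  };
}

// Uncalled sketch of the actual request; the exact path is an assumption,
// so take it from the Qianfan console or docs before shipping.
async function callQianfan(apiKey: string, body: object): Promise<unknown> {
  const res = await fetch("https://qianfan.cloud.baidu.com/v2/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```

The same body shape works through the official openai SDK once you point base_url at Qianfan.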
10 Why run ERNIE-Image on fal.ai?
Eight reasons stack up. One, single FAL_KEY covers fal-ai/ernie-image, the Turbo variant, both LoRA endpoints, and the trainer. Two, async queues with webhooks handle bursts without cold starts. Three, per-megapixel billing at $0.03 standard and $0.01 Turbo beats hosted GPU rentals once you pass a few hundred images a day. Four, LoRA training and serving run on the same API key and URL scheme, so fine-tune to production is one line. Five, fal auto-scales, no instance warmup. Six, the endpoint sits alongside 600 plus other models, so you can chain ERNIE-Image with upscalers, video models, or LLMs in one pipeline. Seven, Apache 2.0 open weights means you can always self-host if fal pricing ever stops working. Eight, logs and queue status are first class, so you get observability without wiring Prometheus yourself.
Keep reading. The full blog is open.
No gates, no sign-up, no newsletter. Just 10 dated posts on Baidu ERNIE.
Browse the full blog
Sort by date, filter by category, search by keyword.