
Multi-Panel Comics with ERNIE-Image: Step by Step

Build a four-panel comic with ERNIE-Image from a single scene layout prompt. Keep your character consistent, land the text bubbles, and see why this beats Flux or SDXL for comic work.

By ernie-api editorial · 6 min read

Comics are where most diffusion models fall apart. You want four panels that read left to right, the same character in every frame, and speech bubbles whose glyphs actually spell the words you wrote. SDXL hands you mush text. Flux gives you gorgeous single frames that never match each other. ERNIE-Image, the 8B DiT Baidu released under Apache 2.0, was trained on dense multi-panel layouts and glyph-heavy posters from day one.

The workflow in one breath

You pass ERNIE a scene layout prompt that treats the image as a comic page, describe each panel as a sub-scene with identical character cues, and let the model render art and bubble glyphs in one pass. No typography layer, no compositing. The 50-step base endpoint handles this at 1024x1024.

Four-panel comic workflow

Step 1: lock your character sheet

Before you render a single panel, write a character block you paste verbatim into every prompt. Three visual hooks that survive compression:

  • Silver trench coat with torn left sleeve
  • Round wire glasses, cracked right lens
  • Red scarf, always visible

Ask for nine properties and the model drops half. Three is the sweet spot.
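One way to enforce the three-hook limit is a small helper that builds the sheet string and refuses longer lists. This is a minimal sketch: `buildCharacterSheet` and its `maxHooks` parameter are hypothetical names for illustration, not part of any ERNIE-Image or fal API.

```javascript
// Hypothetical helper: joins visual hooks into the character block that gets
// pasted into every prompt. The default limit of three mirrors the advice above.
function buildCharacterSheet(hooks, maxHooks = 3) {
  const cleaned = hooks.map((h) => h.trim()).filter((h) => h.length > 0);
  if (cleaned.length === 0) {
    throw new Error("Character sheet needs at least one visual hook");
  }
  if (cleaned.length > maxHooks) {
    throw new Error(`Too many hooks (${cleaned.length}); keep it to ${maxHooks}`);
  }
  return cleaned.join(", ");
}

const sheet = buildCharacterSheet([
  "silver trench coat with torn left sleeve",
  "round wire glasses with cracked right lens",
  "red scarf",
]);
```

Failing loudly on a fourth hook is a design choice of this sketch; a warning would work just as well if you want to experiment with longer sheets.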

Step 2: the scene layout prompt

ERNIE reads panel language natively:

JAVASCRIPT
import { fal } from "@fal-ai/client";

// Read the API key from the environment rather than hard-coding it.
fal.config({ credentials: process.env.FAL_KEY });

// Character sheet from Step 1, repeated verbatim inside every panel.
const sheet = "silver trench coat with torn left sleeve, round wire glasses with cracked right lens, red scarf";

const prompt = `Four-panel comic page, 2x2 grid layout, hand-inked line art, noir palette with teal accents.
Panel 1 top-left: detective (${sheet}) stands in rain outside a neon ramen shop, speech bubble reads "She was here an hour ago."
Panel 2 top-right: close-up of the detective (${sheet}), glasses reflecting neon, thought bubble reads "The napkin was still warm."
Panel 3 bottom-left: detective (${sheet}) pushes through the ramen shop door, owner at the counter, speech bubble reads "You again?"
Panel 4 bottom-right: detective (${sheet}) slides a photograph across the counter, speech bubble reads "Tell me everything."
Clean gutters, consistent character across all four frames.`;

// One call renders the whole page: art, layout, and bubble text.
const result = await fal.subscribe("fal-ai/ernie-image", {
  input: { prompt, image_size: "square_hd", num_inference_steps: 50, guidance_scale: 4.5, seed: 42 },
  logs: true,
});

console.log(result.data.images[0].url);

The character sheet repeats inside every panel, so the model has four chances to bind the same visual. Guidance 4.5 is the sweet spot; higher values bleed panels together.
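The per-panel repetition can be generated rather than typed by hand. The following is a sketch under stated assumptions: `buildComicPrompt`, `POSITIONS`, and the panel fields `action`, `bubbleType`, and `text` are hypothetical names invented for this example, and the wording simply mirrors the prompt above.

```javascript
// Hypothetical prompt builder: assembles a 2x2 page prompt from panel objects
// and inserts the same character sheet into every panel line.
const POSITIONS = ["top-left", "top-right", "bottom-left", "bottom-right"];

function buildComicPrompt({ style, character, sheet, panels }) {
  if (panels.length !== POSITIONS.length) {
    throw new Error("A 2x2 page needs exactly four panels");
  }
  const header = `Four-panel comic page, 2x2 grid layout, ${style}.`;
  const lines = panels.map((panel, i) => {
    const bubble = `${panel.bubbleType} bubble reads "${panel.text}"`;
    return `Panel ${i + 1} ${POSITIONS[i]}: ${character} (${sheet}) ${panel.action}, ${bubble}`;
  });
  const footer = "Clean gutters, consistent character across all four frames.";
  return [header, ...lines, footer].join("\n");
}
```

Building the prompt from data keeps the sheet identical in all four panels, so an edit to one hook cannot drift out of sync between frames.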

Step 3: bubble glyphs inside the prompt

ERNIE's typography head wins here. Short bubbles of four to eight words land clean nine times out of ten. CJK glyphs render at higher fidelity than Latin ones; a Chinese-language noir strip is the best demo this model has.
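A quick pre-flight check can catch bubbles that run long before you spend a render on them. This is a minimal sketch: `findLongBubbles` is a hypothetical helper, it assumes bubbles are written as `bubble reads "..."` like the Step 2 prompt, and treating eight words as a hard ceiling is an assumption of the example rather than a documented model limit.

```javascript
// Pulls every quoted bubble out of a prompt and returns the ones that exceed
// the word budget. Words are counted by splitting on whitespace.
function findLongBubbles(prompt, maxWords = 8) {
  const bubbles = [...prompt.matchAll(/bubble reads "([^"]+)"/g)].map((m) => m[1]);
  return bubbles
    .map((text) => ({ text, words: text.trim().split(/\s+/).length }))
    .filter((bubble) => bubble.words > maxWords);
}
```

An empty array means every bubble fits the budget. Whitespace splitting only makes sense for Latin text, so a CJK strip would need a character-based count instead.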

Character consistency across panels

Step 4: iterate on seed, not prompt

If panel three looks off, rerun with a new seed. The character sheet stays anchored. Run five in parallel:

JAVASCRIPT
// Reuses `fal` and `prompt` from Step 2; only the seed changes between runs.
const seeds = [42, 117, 220, 555, 901];
const results = await Promise.all(
  seeds.map((seed) =>
    fal.subscribe("fal-ai/ernie-image", {
      input: { prompt, image_size: "square_hd", num_inference_steps: 50, guidance_scale: 4.5, seed },
    })
  )
);
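`Promise.all` rejects as soon as any single request fails, which discards the renders that did succeed. A more forgiving variant, sketched here, swaps in `Promise.allSettled` and keeps whatever came back. `pairSeedsWithUrls` is a hypothetical helper, and the `data.images[0].url` result shape is taken from the Step 2 example.

```javascript
// Pairs each seed with the image URL from its run and skips runs that failed.
// `settled` is the array returned by Promise.allSettled, in the same order
// as `seeds`.
function pairSeedsWithUrls(seeds, settled) {
  return seeds.flatMap((seed, i) => {
    const outcome = settled[i];
    if (!outcome || outcome.status !== "fulfilled") return [];
    const url = outcome.value?.data?.images?.[0]?.url;
    return url ? [{ seed, url }] : [];
  });
}
```

To use it, replace `Promise.all` with `Promise.allSettled` in the sweep and pass the settled array to `pairSeedsWithUrls(seeds, settled)`; the returned list tells you which seed produced which image when you pick a winner.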

At $0.03 per megapixel, a 1024x1024 render (roughly one megapixel) costs about $0.03. Five parallel runs cost about $0.15, and you pick the best. For drafts, drop to /turbo at 8 steps: at $0.01 per megapixel, a full strip costs about a cent.
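The arithmetic is simple enough to script when you are budgeting a larger batch. The sketch below assumes cost scales linearly with megapixels and run count, and it treats a 1024x1024 page as one megapixel, as the figures above do. `estimateCost` and `RATES_PER_MEGAPIXEL` are hypothetical names, and real billing may round differently.

```javascript
// Per-megapixel rates quoted in this article, in US dollars.
const RATES_PER_MEGAPIXEL = { base: 0.03, turbo: 0.01 };

// Rough estimate only: megapixels * rate * number of runs.
function estimateCost({ megapixels = 1, runs = 1, tier = "base" } = {}) {
  const rate = RATES_PER_MEGAPIXEL[tier];
  if (rate === undefined) {
    throw new Error(`Unknown tier: ${tier}`);
  }
  return megapixels * rate * runs;
}

const sweepCost = estimateCost({ runs: 5 }); // five base renders
const draftCost = estimateCost({ tier: "turbo" }); // one turbo draft
console.log(`sweep: $${sweepCost.toFixed(2)}, draft: $${draftCost.toFixed(2)}`);
```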

Why this beats Flux and SDXL

Flux 2 Pro is sharper on single-subject photorealism. Ask it for a four-panel page and one panel lands, the other three smear. SDXL gives you four distinct panels but every character looks like a different person. ERNIE renders the whole page in one call, respects your grid, keeps the character anchored, and draws the bubbles. One call, one file, $0.03.

Try it with the character you have been sketching. Four panels, fifty steps, one seed. Three cents.

