Inline Processing

The second pass of pulldown-cmark's parsing process handles inline elements like emphasis, links, and code spans.

Overview

Inline processing happens during event iteration rather than as a separate full-document pass. When the parser encounters a block that can contain inlines, it processes the inline elements on demand.

The main inline elements handled are:

Emphasis and strong emphasis (* and _)
Code spans (`)
Links and images
HTML tags and entities
Autolinks
Extension elements like strikethrough and math

Processing Model

The inline processor:

Scans text for special characters
Identifies potential inline markers
Resolves matched pairs (like * for emphasis)
Handles nested elements
Processes escapes and entities
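
The first two steps above can be sketched as a single scan that splits text into plain runs and candidate markers. This is a minimal illustration, not pulldown-cmark's implementation: `is_special`, `Piece`, and `scan` are hypothetical names, and the set of special bytes is an assumption (the real set depends on which extensions are enabled).

```rust
/// Bytes that may start or end an inline construct in this sketch.
/// (Assumed set; chosen for illustration only.)
fn is_special(b: u8) -> bool {
    matches!(b, b'*' | b'_' | b'`' | b'[' | b']' | b'!' | b'<' | b'&' | b'~' | b'$')
}

#[derive(Debug, PartialEq)]
enum Piece<'a> {
    /// A run of ordinary text.
    Text(&'a str),
    /// A single byte that might belong to an inline construct.
    Marker(char),
}

/// Splits `text` into plain-text runs and potential inline markers.
/// A backslash before ASCII punctuation is dropped and the escaped
/// character is kept as ordinary text.
fn scan(text: &str) -> Vec<Piece<'_>> {
    let bytes = text.as_bytes();
    let mut pieces = Vec::new();
    let mut start = 0; // start of the pending plain-text run
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b == b'\\' && i + 1 < bytes.len() && bytes[i + 1].is_ascii_punctuation() {
            // Flush the text before the backslash, then skip the backslash.
            if start < i {
                pieces.push(Piece::Text(&text[start..i]));
            }
            start = i + 1; // the escaped character begins the next text run
            i += 2;
        } else if is_special(b) {
            if start < i {
                pieces.push(Piece::Text(&text[start..i]));
            }
            pieces.push(Piece::Marker(b as char));
            i += 1;
            start = i;
        } else {
            i += 1;
        }
    }
    if start < bytes.len() {
        pieces.push(Piece::Text(&text[start..]));
    }
    pieces
}
```

Calling `scan("a*b")` yields a text run, a `*` marker, and another text run; later stages would decide whether each marker really opens or closes anything.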

Delimiter Handling

Emphasis-type elements are resolved through delimiter runs, following the CommonMark emphasis rules:

Identify delimiter runs (consecutive *, _, etc.)
Determine if they can open and/or close
Match pairs according to CommonMark rules
Handle nested cases correctly

The InlineStack struct manages this:

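struct InlineStack {
    // Delimiter runs still waiting for a match, innermost last
    stack: Vec<InlineEl>,
    // Per delimiter class: stack index below which no opener can match
    lower_bounds: [usize; 9],
}

struct InlineEl {
    start: TreeIndex,  // Tree node where the run begins
    count: usize,      // Number of delimiters available for matching
    run_length: usize, // Length of the run these delimiters came from
    c: u8,             // Delimiter character
    both: bool,        // Can both open and close
}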
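
The "can open and/or close" step can be sketched directly from the CommonMark flanking rules. This is a simplified illustration rather than pulldown-cmark's code: `delimiter_flags` is a hypothetical helper, it only considers ASCII punctuation (the spec uses Unicode punctuation), and it treats the start and end of the input as whitespace.

```rust
/// Returns `(can_open, can_close)` for the run of `delim` characters
/// that begins at byte offset `start` in `text`.
/// `start` must lie on a character boundary.
fn delimiter_flags(text: &str, start: usize, delim: char) -> (bool, bool) {
    // Measure the run, then look at the characters on either side of it.
    let run_len = text[start..].chars().take_while(|&c| c == delim).count();
    let end = start + run_len * delim.len_utf8();
    let before = text[..start].chars().next_back().unwrap_or(' ');
    let after = text[end..].chars().next().unwrap_or(' ');

    let ws = |c: char| c.is_whitespace();
    // Assumption: ASCII punctuation only, as a simplification.
    let punct = |c: char| c.is_ascii_punctuation();

    // Left-flanking: not followed by whitespace, and if followed by
    // punctuation then preceded by whitespace or punctuation.
    let left = !ws(after) && (!punct(after) || ws(before) || punct(before));
    // Right-flanking: the mirror image.
    let right = !ws(before) && (!punct(before) || ws(after) || punct(after));

    match delim {
        // Underscores are stricter so that intraword `_` is not emphasis.
        '_' => (
            left && (!right || punct(before)),
            right && (!left || punct(after)),
        ),
        _ => (left, right),
    }
}
```

A run for which both flags are true corresponds to the `both` field above; for example the `*` in `foo*bar` can open and close, while the `_` in `foo_bar` can do neither.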

Link Processing

Link processing involves:

Finding link text in brackets
Handling different link types:
- Inline [text](url)
- Reference [text][ref]
- Collapsed [ref][]
- Shortcut [ref]
Resolving reference labels against link reference definitions
Processing link destinations and titles
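
The four link types listed above differ only in what follows the closing bracket of the link text. The sketch below classifies that tail; it is deliberately naive and not how pulldown-cmark parses links. `LinkKind` and `classify` are hypothetical names, and titles, nested parentheses, escapes, and the check that a reference is actually defined are all left out.

```rust
#[derive(Debug, PartialEq)]
enum LinkKind<'a> {
    /// `[text](url)`
    Inline { dest: &'a str },
    /// `[text][ref]`
    Reference { label: &'a str },
    /// `[ref][]`
    Collapsed,
    /// `[ref]`
    Shortcut,
}

/// Classifies a link from `rest`, the input that follows the `]`
/// closing the link text.
fn classify(rest: &str) -> LinkKind<'_> {
    if let Some(tail) = rest.strip_prefix('(') {
        if let Some(close) = tail.find(')') {
            return LinkKind::Inline { dest: tail[..close].trim() };
        }
    }
    if let Some(tail) = rest.strip_prefix('[') {
        if let Some(close) = tail.find(']') {
            return if close == 0 {
                LinkKind::Collapsed
            } else {
                LinkKind::Reference { label: &tail[..close] }
            };
        }
    }
    // Nothing link-like follows, so the bracketed text itself is the label.
    LinkKind::Shortcut
}
```

For example, `classify("(https://example.com)")` gives an inline link, while `classify(" and more text")` falls through to the shortcut form.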

The link processor maintains a stack to handle nested links and images:

struct LinkStackEl {
    node: TreeIndex, // Tree node of the opening bracket
    ty: LinkStackTy,
}

enum LinkStackTy {
    Link,
    Image,
    Disabled, // Link opener deactivated, since links cannot contain other links
}
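
The role of the `Disabled` variant can be shown with a small stand-alone stack. This is a sketch under assumptions: `usize` stands in for `TreeIndex`, the `LinkStack` wrapper and its methods are written for illustration, and the real crate may track disabled openers differently.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum LinkStackTy {
    Link,
    Image,
    Disabled,
}

struct LinkStackEl {
    node: usize, // stand-in for TreeIndex
    ty: LinkStackTy,
}

#[derive(Default)]
struct LinkStack {
    inner: Vec<LinkStackEl>,
}

impl LinkStack {
    /// Records an opening `[` (Link) or `![` (Image).
    fn push(&mut self, node: usize, ty: LinkStackTy) {
        self.inner.push(LinkStackEl { node, ty });
    }

    /// Takes the innermost opener when a `]` is seen.
    fn pop(&mut self) -> Option<LinkStackEl> {
        self.inner.pop()
    }

    /// After a link has been completed, no enclosing `[` may become a
    /// link as well, so every remaining link opener is deactivated.
    /// Image openers stay active because images may contain links.
    fn disable_all_links(&mut self) {
        for el in &mut self.inner {
            if el.ty == LinkStackTy::Link {
                el.ty = LinkStackTy::Disabled;
            }
        }
    }
}
```

With openers for a link, an image, and an inner link on the stack, completing the inner link and calling `disable_all_links` leaves the image opener usable and turns the outer link opener into `Disabled`.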

Code Spans

Code span processing has special rules:

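Match an opening backtick run with a closing run of equal length
Treat backslashes literally, since escapes have no effect inside a code span
Strip one leading and one trailing space when both are present and the content is not all spaces
Treat backtick runs of any other length as literal content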
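
These rules fit in a short function. The following is a minimal sketch based on the CommonMark description of code spans, not on pulldown-cmark's source; `code_span` and `normalize` are hypothetical names.

```rust
/// Tries to read a code span at the start of `input`.
/// Returns the normalized content and the number of bytes consumed.
fn code_span(input: &str) -> Option<(String, usize)> {
    let bytes = input.as_bytes();
    let open = bytes.iter().take_while(|&&b| b == b'`').count();
    if open == 0 {
        return None;
    }
    let mut i = open;
    while i < bytes.len() {
        if bytes[i] == b'`' {
            let run = bytes[i..].iter().take_while(|&&b| b == b'`').count();
            if run == open {
                // Closing run of equal length found.
                return Some((normalize(&input[open..i]), i + run));
            }
            i += run; // a run of another length is literal content
        } else {
            i += 1; // backslashes get no special treatment here
        }
    }
    None // no closing run: the backticks are literal text
}

/// Converts line endings to spaces, then strips one space from each
/// side if both are present and the content is not only spaces.
fn normalize(raw: &str) -> String {
    let s = raw.replace('\n', " ");
    let all_spaces = s.bytes().all(|b| b == b' ');
    if !all_spaces && s.len() >= 2 && s.starts_with(' ') && s.ends_with(' ') {
        s[1..s.len() - 1].to_string()
    } else {
        s
    }
}
```

The double-backtick form shows why equal-length matching matters: in ``` `` a`b `` ``` the single inner backtick is content, and the padding spaces are stripped.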

HTML Processing

HTML blocks have already been recognized by the block parser. What remains is inline HTML tags between normal text. Handling this involves:

Identifying HTML constructs:
- Tags
- Comments
- CDATA sections
- Processing instructions
Validating structure
Preserving content exactly
Handling entities

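The inline HTML scanner uses a small guard struct to avoid rescanning. Each field records how far a search for the closing delimiter of that construct has already gone without success, so a later attempt in the same region can fail immediately:

struct HtmlScanGuard {
    cdata: usize,       // CDATA sections
    processing: usize,  // Processing instructions
    declaration: usize, // Declarations
    comment: usize,     // Comments
}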
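
One way to read the struct above is as a set of high-water marks that keep repeated unterminated openers from causing quadratic rescans. The sketch below shows the idea for comments only. It is an assumption-laden illustration: `scan_comment_end` is a hypothetical function, the struct is redeclared so the example stands alone, and scans are assumed to be issued at non-decreasing positions, as a left-to-right parser would do.

```rust
#[derive(Default)]
struct HtmlScanGuard {
    cdata: usize,
    processing: usize,
    declaration: usize,
    comment: usize,
}

/// Looks for the `-->` that ends a comment whose body starts at `start`.
/// Returns the offset just past the closer.
fn scan_comment_end(text: &str, start: usize, guard: &mut HtmlScanGuard) -> Option<usize> {
    if start < guard.comment {
        // An earlier scan already ran from before `start` to the end of
        // the input without finding a closer, so this one must fail too.
        return None;
    }
    match text[start..].find("-->") {
        Some(off) => Some(start + off + 3),
        None => {
            // Remember that everything up to the end has been searched.
            guard.comment = text.len();
            None
        }
    }
}
```

In an input such as `<!-- a <!-- b <!-- c`, only the first opener triggers a full search; the later ones are rejected by the guard without touching the text again.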

String Handling

Inline processing needs efficient string handling:

Copy-on-write strings to avoid allocation
Smart handling of escaped characters
Entity resolution
UTF-8 awareness

The CowStr type provides this and is documented in detail separately.
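
The copy-on-write idea can be illustrated with the standard library's `Cow`, used here as a stand-in for `CowStr` (which is pulldown-cmark's own type with a different definition). `unescape` is a hypothetical helper: it borrows the input when there is nothing to change and allocates only when a backslash escape has to be removed.

```rust
use std::borrow::Cow;

/// Removes backslash escapes that precede ASCII punctuation.
/// Input without any backslash is returned borrowed, with no allocation.
fn unescape(input: &str) -> Cow<'_, str> {
    if !input.contains('\\') {
        return Cow::Borrowed(input);
    }
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next.is_ascii_punctuation() {
                    // Keep the escaped character and drop the backslash.
                    out.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        out.push(c);
    }
    Cow::Owned(out)
}
```

Most text in a typical document contains no escapes, so the borrowed path is the common case and the output can point straight into the source buffer.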

Event Generation

As inline elements are processed, they generate events:

Start/end events for container elements
Text events for content
Specialized events for atomic elements
Source position tracking

Events are yielded in document order:

enum Event<'a> {
    Start(Tag<'a>),   // Opens a container element
    End(TagEnd),      // Closes the matching container element
    Text(CowStr<'a>), // Plain text content
    Code(CowStr<'a>), // Code span content
    Html(CowStr<'a>),
    // ...
}
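
To show what "document order" means for a consumer, the sketch below renders a flat event sequence to HTML. It uses a deliberately reduced stand-in for the enum above, not the crate's types: the local `Tag` has only two variants, `End` reuses `Tag` instead of `TagEnd`, strings are plain `&str`, and no HTML escaping is performed.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Emphasis,
    Strong,
}

#[derive(Debug, Clone, PartialEq)]
enum Event<'a> {
    Start(Tag),
    End(Tag),
    Text(&'a str),
    Code(&'a str),
}

/// Renders events in the order they arrive. Container elements come as
/// Start/End pairs; Text and Code are atomic.
fn render(events: &[Event<'_>]) -> String {
    let mut out = String::new();
    for ev in events {
        match ev {
            Event::Start(Tag::Emphasis) => out.push_str("<em>"),
            Event::End(Tag::Emphasis) => out.push_str("</em>"),
            Event::Start(Tag::Strong) => out.push_str("<strong>"),
            Event::End(Tag::Strong) => out.push_str("</strong>"),
            Event::Text(t) => out.push_str(t),
            Event::Code(c) => {
                out.push_str("<code>");
                out.push_str(c);
                out.push_str("</code>");
            }
        }
    }
    out
}
```

Because nesting is expressed purely by the order of Start and End events, the renderer needs no tree: a single pass over the sequence is enough.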
