Inline Processing
The second pass of pulldown-cmark's parsing process handles inline elements like emphasis, links, and code spans.
Overview
Inline processing happens during event iteration rather than as a separate full-document pass. When the parser encounters a block that can contain inlines, it processes the inline elements on demand.
The main inline elements handled are:
- Emphasis and strong emphasis (* and _)
- Code spans (`)
- Links and images
- HTML tags and entities
- Autolinks
- Extension elements like strikethrough and math
Processing Model
The inline processor:
- Scans text for special characters
- Identifies potential inline markers
- Resolves matched pairs (like * for emphasis)
- Handles nested elements
- Processes escapes and entities
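The first step of the list above, scanning for special characters, can be sketched as follows. This is a minimal illustration only: the set of bytes in `is_special` and both function names are assumptions made for this example, not the library's actual lookup table.

```rust
/// Bytes that may begin an inline construct.
/// Illustrative set only; not taken from the library.
fn is_special(b: u8) -> bool {
    matches!(
        b,
        b'*' | b'_' | b'`' | b'[' | b']' | b'!' | b'<' | b'&' | b'\\'
    )
}

/// Return the byte offsets of candidate inline markers in `text`.
/// All markers are ASCII, so every offset falls on a char boundary.
fn special_offsets(text: &str) -> Vec<usize> {
    text.bytes()
        .enumerate()
        .filter(|&(_, b)| is_special(b))
        .map(|(i, _)| i)
        .collect()
}
```

Everything between two consecutive offsets is plain text, so a scanner of this kind only has to look closely at the marked positions.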
Delimiter Handling
Emphasis-type elements are resolved through delimiter runs and a stack of pending openers:
- Identify delimiter runs (consecutive *, _, etc.)
- Determine if they can open and/or close
- Match pairs according to CommonMark rules
- Handle nested cases correctly
The InlineStack struct manages this:
    struct InlineStack {
        stack: Vec<InlineEl>,
        lower_bounds: [usize; 9],
    }

    struct InlineEl {
        start: TreeIndex,
        count: usize,      // Number of delimiters
        run_length: usize, // Full run length
        c: u8,             // Delimiter character
        both: bool,        // Can both open and close
    }
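The "can open and/or close" decision described above follows the CommonMark flanking rules. The sketch below is a standalone approximation: `RunKind`, `is_punct`, and `classify_run` are hypothetical names, and punctuation is limited to ASCII, whereas the spec uses Unicode categories.

```rust
/// Whether a delimiter run may open and/or close emphasis.
#[derive(Debug, PartialEq)]
struct RunKind {
    can_open: bool,
    can_close: bool,
}

/// Simplification: ASCII punctuation only.
fn is_punct(c: char) -> bool {
    c.is_ascii_punctuation()
}

/// Classify the run of `delim` characters starting at byte offset `start`.
/// The start and end of the input count as whitespace.
fn classify_run(text: &str, start: usize, delim: char) -> RunKind {
    let run_len = text[start..].chars().take_while(|&c| c == delim).count();
    let end = start + run_len * delim.len_utf8();
    let prev = text[..start].chars().next_back().unwrap_or(' ');
    let next = text[end..].chars().next().unwrap_or(' ');

    let left_flanking = !next.is_whitespace()
        && (!is_punct(next) || prev.is_whitespace() || is_punct(prev));
    let right_flanking = !prev.is_whitespace()
        && (!is_punct(prev) || next.is_whitespace() || is_punct(next));

    if delim == '_' {
        // Underscores are stricter so that snake_case is not emphasis.
        RunKind {
            can_open: left_flanking && (!right_flanking || is_punct(prev)),
            can_close: right_flanking && (!left_flanking || is_punct(next)),
        }
    } else {
        RunKind {
            can_open: left_flanking,
            can_close: right_flanking,
        }
    }
}
```

A run for which both flags are true corresponds to the `both` field in `InlineEl`: it can act as either side of a pair, which is why the struct records it separately.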
Link Processing
Link processing involves:
- Finding link text in brackets
- Handling different link types:
  - Inline [text](url)
  - Reference [text][ref]
  - Collapsed [ref][]
  - Shortcut [ref]
- Resolving references against link definitions
- Processing link destinations and titles
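The four link types differ only in what follows the closing bracket of the link text. The sketch below makes that first dispatch step concrete. `LinkKind` and `classify_link` are hypothetical names for this example, and a real parser must also validate the destination or label and fall back when validation fails, which is omitted here.

```rust
#[derive(Debug, PartialEq)]
enum LinkKind {
    Inline,
    Reference,
    Collapsed,
    Shortcut,
}

/// Classify a link by the text immediately after the `]` that closes
/// its link text. No validation of the destination or label is done.
fn classify_link(rest: &str) -> LinkKind {
    if rest.starts_with('(') {
        LinkKind::Inline
    } else if rest.starts_with("[]") {
        LinkKind::Collapsed
    } else if rest.starts_with('[') {
        LinkKind::Reference
    } else {
        LinkKind::Shortcut
    }
}
```

The order of the checks matters: the collapsed form `[]` has to be tested before the general reference form, since both begin with `[`.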
The link processor maintains a stack to handle nested links and images:
    struct LinkStackEl {
        node: TreeIndex,
        ty: LinkStackTy,
    }

    enum LinkStackTy {
        Link,
        Image,
        Disabled, // For nested links
    }
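The `Disabled` variant reflects the CommonMark rule that links may not contain other links: once a link is completed, earlier `[` openers can no longer start one, while image openers stay active. The sketch below shows one way such a stack could behave. The `LinkStack` wrapper and its methods are assumptions for illustration, and `TreeIndex` is replaced by a plain `usize`.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum LinkStackTy {
    Link,
    Image,
    Disabled,
}

struct LinkStackEl {
    node: usize, // stand-in for TreeIndex
    ty: LinkStackTy,
}

/// Hypothetical wrapper around the stack of open brackets.
#[derive(Default)]
struct LinkStack {
    inner: Vec<LinkStackEl>,
}

impl LinkStack {
    /// Record an opening `[` or `![`.
    fn push(&mut self, node: usize, ty: LinkStackTy) {
        self.inner.push(LinkStackEl { node, ty });
    }

    /// Take the most recent opener when a `]` is seen.
    fn pop(&mut self) -> Option<LinkStackEl> {
        self.inner.pop()
    }

    /// After a link is completed, deactivate every remaining link opener.
    /// Image openers are left untouched.
    fn disable_all_links(&mut self) {
        for el in &mut self.inner {
            if el.ty == LinkStackTy::Link {
                el.ty = LinkStackTy::Disabled;
            }
        }
    }
}
```

For input shaped like `![a [b [c](u)`, the innermost link completes first, the opener for `[b` becomes `Disabled`, and the image opener can still match later.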
Code Spans
Code span processing has special rules:
- Match backtick sequences of equal length
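- Handle backslash escapes: an escaped backtick cannot open a span, and backslashes inside a span are literal
- Strip one leading and one trailing space when both are present and the content is not all spaces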
- Prevent misinterpreting internal backticks
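The matching rules above can be sketched in a few lines. This is a simplified standalone version, not the library's scanner: `scan_code_span` and `strip_one_space` are hypothetical names, and the caller is assumed to pass an offset that points at an unescaped backtick.

```rust
/// Try to parse a code span whose opening backtick run starts at `start`.
/// Returns the span content and the byte offset just past the closing run.
fn scan_code_span(text: &str, start: usize) -> Option<(String, usize)> {
    let bytes = text.as_bytes();
    let open_len = bytes[start..].iter().take_while(|&&b| b == b'`').count();
    let content_start = start + open_len;
    let mut i = content_start;
    while i < bytes.len() {
        if bytes[i] == b'`' {
            let run = bytes[i..].iter().take_while(|&&b| b == b'`').count();
            if run == open_len {
                // Line endings inside a code span become spaces.
                let raw = text[content_start..i].replace('\n', " ");
                return Some((strip_one_space(&raw).to_string(), i + run));
            }
            i += run; // a run of a different length is literal content
        } else {
            i += 1;
        }
    }
    None // no closing run: the opening backticks are literal text
}

/// Strip one space from each side if both are present and the
/// content does not consist only of spaces.
fn strip_one_space(s: &str) -> &str {
    let all_spaces = s.bytes().all(|b| b == b' ');
    if !all_spaces && s.len() >= 2 && s.starts_with(' ') && s.ends_with(' ') {
        &s[1..s.len() - 1]
    } else {
        s
    }
}
```

Skipping a whole run at a time is what keeps an internal backtick from being mistaken for a closer: a single backtick inside a double-backtick span is never compared on its own.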
HTML Processing
HTML blocks have already been recognized by the block parser. What remains is inline HTML that appears within normal text. Handling this involves:
- Identifying HTML constructs:
  - Tags
  - Comments
  - CDATA sections
  - Processing instructions
- Validating structure
- Preserving content exactly
- Handling entities
The HTML scanner keeps per-construct state in a small guard struct:
    struct HtmlScanGuard {
        cdata: usize,
        processing: usize,
        declaration: usize,
        comment: usize,
    }
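A guard of this shape can be used to avoid rescanning the same text for a terminator that is known to be missing. The sketch below illustrates that idea for comments only. The `ScanGuard` type, its `comment_failed_from` field, and the exact semantics are assumptions for this example and are not taken from the library.

```rust
/// Remembers where a search for `-->` last failed (hypothetical type).
#[derive(Default)]
struct ScanGuard {
    comment_failed_from: Option<usize>,
}

/// Look for the end of an HTML comment whose body starts at byte `ix`.
/// Returns the offset just past `-->`, or None if the comment is unclosed.
fn scan_comment_end(text: &str, ix: usize, guard: &mut ScanGuard) -> Option<usize> {
    if let Some(failed_from) = guard.comment_failed_from {
        if ix >= failed_from {
            // An earlier scan already proved there is no `-->` from here on.
            return None;
        }
    }
    match text[ix..].find("-->") {
        Some(pos) => Some(ix + pos + 3),
        None => {
            guard.comment_failed_from = Some(ix);
            None
        }
    }
}
```

Without such a guard, input containing many unclosed `<!--` openers would trigger one full scan per opener, which is quadratic in the worst case.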
String Handling
Inline processing needs efficient string handling:
- Copy-on-write strings to avoid allocation
- Smart handling of escaped characters
- Entity resolution
- UTF-8 awareness
The CowStr type provides this and is documented in detail here.
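The copy-on-write idea can be illustrated with the standard library's `Cow`, used here as a stand-in for `CowStr`. The `unescape` function is a hypothetical helper written for this example: it borrows the input when no backslash is present and allocates only when an escape actually has to be removed.

```rust
use std::borrow::Cow;

/// Remove backslash escapes before ASCII punctuation.
/// Borrows the input unchanged when it contains no backslash.
fn unescape(input: &str) -> Cow<'_, str> {
    if !input.contains('\\') {
        return Cow::Borrowed(input); // fast path: no allocation
    }
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next.is_ascii_punctuation() {
                    // Drop the backslash and keep the escaped character.
                    out.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        out.push(c);
    }
    Cow::Owned(out)
}
```

Most text in a typical document contains no escapes, so the borrowed path is the common case and the output can point straight into the source buffer.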
Event Generation
As inline elements are processed, they generate events:
- Start/end events for container elements
- Text events for content
- Specialized events for atomic elements
- Source position tracking
Events are yielded in document order:
    enum Event<'a> {
        Start(Tag<'a>),
        End(TagEnd),
        Text(CowStr<'a>),
        Code(CowStr<'a>),
        Html(CowStr<'a>),
        // ...
    }
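To show how a consumer might walk such a stream, the sketch below renders a few inline events to HTML. It uses simplified stand-in types so that it is self-contained: owned `String`s instead of `CowStr`, a two-variant `Tag`, and `End(Tag)` instead of `End(TagEnd)`. HTML escaping is omitted.

```rust
/// Simplified stand-ins for the event types; not the library's definitions.
enum Tag {
    Emphasis,
    Strong,
}

enum Event {
    Start(Tag),
    End(Tag),
    Text(String),
    Code(String),
}

/// Render events in document order. Text is not HTML-escaped here.
fn render(events: &[Event]) -> String {
    let mut out = String::new();
    for ev in events {
        match ev {
            Event::Start(Tag::Emphasis) => out.push_str("<em>"),
            Event::Start(Tag::Strong) => out.push_str("<strong>"),
            Event::End(Tag::Emphasis) => out.push_str("</em>"),
            Event::End(Tag::Strong) => out.push_str("</strong>"),
            Event::Text(t) => out.push_str(t),
            Event::Code(c) => {
                // Atomic element: one event carries the whole span.
                out.push_str("<code>");
                out.push_str(c);
                out.push_str("</code>");
            }
        }
    }
    out
}
```

Container elements arrive as a start/end pair around their children, while a code span arrives as a single event, matching the distinction drawn in the list above.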