Performance Optimizations

This chapter covers the key performance optimizations implemented in pulldown-cmark. The library uses several techniques to achieve fast Markdown parsing while maintaining standards compliance and a clean architecture.

SIMD-Accelerated Character Scanning

One of the most performance-critical operations in Markdown parsing is scanning text for special characters that may indicate inline markup. pulldown-cmark uses SIMD (Single Instruction Multiple Data) instructions on x86_64 platforms to accelerate this scanning.

The SIMD optimization is implemented in scanners.rs and operates by:

Creating a lookup table of special characters (like *, _, etc.)
Loading 16 bytes at a time into a SIMD register
Performing parallel lookups to identify special characters
Generating a bitmask indicating which bytes matched

#![allow(unused)]
fn main() {
// Example from firstpass.rs showing the core scanning logic
#[target_feature(enable = "ssse3")]
unsafe fn compute_mask(lut: &[u8; 16], bytes: &[u8], ix: usize) -> i32 {
    let bitmap = _mm_loadu_si128(lut.as_ptr() as *const __m128i);
    let input = _mm_loadu_si128(bytes.as_ptr().add(ix) as *const __m128i);
    let bitset = _mm_shuffle_epi8(bitmap, input);
    let higher_nibbles = _mm_and_si128(_mm_srli_epi16(input, 4), _mm_set1_epi8(0x0f));
    let bitmask = _mm_shuffle_epi8(bitmask_lookup, higher_nibbles);
    let tmp = _mm_and_si128(bitset, bitmask);
    let result = _mm_cmpeq_epi8(tmp, bitmask);
    _mm_movemask_epi8(result)
}
}

This SIMD optimization can provide significant speedups when processing large documents, since character scanning is such a common operation. The code falls back to scalar processing when SIMD is not available.

Memory-Efficient String Storage

The library uses a custom string type CowStr that can represent strings. Refer to the string handling documentation for more details on the performance optimizations inherent to this type.

Tree Structure Optimization

The AST (Abstract Syntax Tree) is stored in a vec-based tree structure that provides:

Fast node creation during parsing
Efficient tree traversal
Memory locality from vector storage

#![allow(unused)]
fn main() {
pub(crate) struct Tree<T> {
    nodes: Vec<Node<T>>,
    spine: Vec<TreeIndex>,
    cur: Option<TreeIndex>, 
}

pub(crate) struct Node<T> {
    pub child: Option<TreeIndex>,
    pub next: Option<TreeIndex>,
    pub item: T,
}
}

Key optimizations in the tree structure include:

Using indices instead of pointers for node references
Maintaining a "spine" for fast access to ancestor nodes
Storing nodes contiguously in a vector for better cache usage
Using non-zero indices to save space in option types

Protection Against Pathological Input

It is important that the parser performance remain linear with respect to the input length, otherwise the parser would find itself vulnerable to potential DOS attacks. This may not be important for all consumers, but for anyone depending on the library to handle user generated content this is critical.

Several protections are in place to prevent quadratic time or memory usage on malicious input:

Link nesting depth is limited:

#![allow(unused)]
fn main() {
pub(crate) const LINK_MAX_NESTED_PARENS: usize = 32;
}

Table column expansion is bounded:

#![allow(unused)]
fn main() {
// Limit to prevent quadratic growth from empty cells
const MAX_AUTOCOMPLETED_CELLS: usize = 1 << 18;
}

Link reference expansion tracking:

#![allow(unused)]
fn main() {
// Track expansion to prevent quadratic growth from reference definitions
let mut link_ref_expansion_limit: usize = text.len().max(100_000);
}

Key Performance Considerations

When using the library, keep in mind:

SIMD optimizations require the simd feature and x86_64 platform
Large documents benefit most from SIMD scanning
The parser is designed for streaming, allowing incremental processing
Pathological input protection may limit processing of extremely nested or repetitive content

Benchmarking

The library includes benchmarks to measure performance of key operations:

String handling with different storage strategies
Tree operations
Full document parsing
Pathological input cases

When making changes that could affect performance, run the benchmarks to ensure optimizations are effective:

cargo bench

Note that some optimizations (like SIMD) are platform-specific, so testing on multiple platforms may be necessary.

pulldown-cmark guide