Adding Extensions
This guide explains how to add new extensions to pulldown-cmark. Extensions allow you to parse additional Markdown syntax beyond the CommonMark specification.
If you are looking to get your extension merged upstream, it's a good idea to discuss it with the maintainers before getting to work.
Overview
Adding an extension typically requires:
- Adding a feature flag in the Options bitflags
- Adding any new data structures needed to represent the extension's AST nodes
- Implementing block parsing in firstpass.rs if the extension adds block-level elements
- Implementing inline parsing in parse.rs if the extension adds inline elements
- Adding HTML rendering support in html.rs
- Adding tests to verify the extension works correctly
Let's walk through each of these steps in detail.
Adding the Feature Flag
Extensions are controlled via the Options bitflags defined in lib.rs. Add a new constant using the next available bit:
    bitflags::bitflags! {
        pub struct Options: u32 {
            // Existing options...
            const ENABLE_MY_EXTENSION = 1 << N; // N is the next available bit
        }
    }
This allows users to enable your extension with:
    let mut options = Options::empty();
    options.insert(Options::ENABLE_MY_EXTENSION);
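To make the "next available bit" rule concrete, here is a minimal sketch that uses plain u32 constants in place of the bitflags macro, so it compiles without any dependency. The existing flag names, the bit positions, and the my_extension_enabled helper are assumptions made for illustration and do not reflect the actual layout in lib.rs.

```rust
// Stand-in for the `Options` bitflags type, written with plain `u32`
// constants. Names and bit positions are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Options(u32);

impl Options {
    pub const ENABLE_EXISTING_A: Options = Options(1 << 0);
    pub const ENABLE_EXISTING_B: Options = Options(1 << 1);
    // The new extension takes the next unused bit.
    pub const ENABLE_MY_EXTENSION: Options = Options(1 << 2);

    pub fn empty() -> Self {
        Options(0)
    }

    pub fn insert(&mut self, other: Options) {
        self.0 |= other.0;
    }

    pub fn contains(&self, other: Options) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Hypothetical helper: parser code would gate extension syntax on a
/// check like this, so the extension stays inert unless requested.
pub fn my_extension_enabled(options: Options) -> bool {
    options.contains(Options::ENABLE_MY_EXTENSION)
}
```

Because every flag occupies its own bit, enabling the new extension cannot change the behavior of documents parsed with other option sets.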
Adding AST Data Structures
Extensions often need new AST node types to represent their syntax. These are defined in several places:
- Tag enum in lib.rs for container elements
- TagEnd enum in lib.rs for end tags
- Event enum in lib.rs for new event types
- ItemBody enum in parse.rs for internal AST nodes
For example, the tables extension defines:
    // In lib.rs
    pub enum Tag<'a> {
        // ...
        Table(Vec<Alignment>),
        TableHead,
        TableRow,
        TableCell,
    }

    // In parse.rs
    pub(crate) enum ItemBody {
        // ...
        Table(AlignmentIndex),
        TableHead,
        TableRow,
        TableCell,
    }
Follow existing patterns for naming and make sure to implement all the necessary traits (Debug, Clone, etc.).
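The relationship between a start tag and its end tag can be sketched as follows. These are simplified stand-ins for the public enums, not the crate's definitions: the variant list is trimmed, MyExtension is hypothetical, and the to_end method here is written for illustration.

```rust
// Simplified stand-ins for the public `Tag` and `TagEnd` enums.
// `MyExtension` is a hypothetical variant used for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Tag<'a> {
    Paragraph,
    Link { dest_url: &'a str },
    MyExtension,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TagEnd {
    Paragraph,
    Link,
    MyExtension,
}

impl<'a> Tag<'a> {
    /// Maps a start tag to its end tag. Payload such as the link
    /// destination is dropped because an end tag does not need it.
    pub fn to_end(&self) -> TagEnd {
        match self {
            Tag::Paragraph => TagEnd::Paragraph,
            Tag::Link { .. } => TagEnd::Link,
            Tag::MyExtension => TagEnd::MyExtension,
        }
    }
}
```

Keeping the match exhaustive (no wildcard arm) means the compiler points at every place that must be updated when a new variant is added.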
Implementing Block Parsing
If your extension adds block-level elements (like tables, footnotes, etc.), you'll need to:
- Add scanning functions in scanners.rs to detect your syntax
- Add parsing logic in firstpass.rs to build the block structure
- Update the scan_containers() function if your blocks can be nested
For example, the tables extension adds:
    // In scanners.rs
    pub(crate) fn scan_table_head(data: &[u8]) -> (usize, Vec<Alignment>) {
        // Scan table header row syntax...
    }

    // In firstpass.rs
    impl<'a, 'b> FirstPass<'a, 'b> {
        fn parse_table(&mut self, ...) -> Option<usize> {
            // Parse table structure...
        }
    }
Follow these guidelines when implementing block parsing:
- Use the scan_ prefix for low-level scanning functions
- Make scanning functions return the number of bytes consumed
- Handle edge cases like empty lines and indentation
- Properly integrate with the container block structure
- Follow the parsing strategies used by existing extensions
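As an illustration of these guidelines, the sketch below scans the opening line of a made-up block syntax: three or more colons, optionally indented by up to three spaces. The syntax and the function name scan_colon_fence are hypothetical. The sketch signals "no match" with None; the existing functions in scanners.rs may follow a different convention, so check them before settling on a return type.

```rust
/// Hypothetical scanner for a block that opens with three or more colons,
/// optionally indented by up to three spaces. Returns the number of bytes
/// consumed (indentation plus colons), or `None` if the line does not match.
pub fn scan_colon_fence(data: &[u8]) -> Option<usize> {
    // Count leading spaces; four or more would make this indented code
    // under CommonMark rules, so the scanner refuses to match.
    let indent = data.iter().take_while(|&&b| b == b' ').count();
    if indent > 3 {
        return None;
    }
    let colons = data[indent..].iter().take_while(|&&b| b == b':').count();
    if colons < 3 {
        return None;
    }
    Some(indent + colons)
}
```

Returning the consumed byte count lets the caller advance its cursor past the marker and continue parsing the rest of the line from there.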
Implementing Inline Parsing
If your extension adds inline elements (like strikethrough, math, etc.), you'll need to:
- Add marker detection in parse_line() in firstpass.rs
- Add opener/closer matching logic in handle_inline()
- Add conversion from internal AST to events
For example, the strikethrough extension adds:
    // In firstpass.rs
    impl<'a, 'b> FirstPass<'a, 'b> {
        fn parse_line(&mut self, ..) -> (usize, Option<Item>) {
            match byte {
                b'~' => {
                    // Handle tilde markers...
                }
            }
        }
    }
Inline parsing tips:
- Use the MaybeX pattern for markers that need matching
- Handle backslash escaping correctly
- Support nested inline elements
- Follow CommonMark rules for flanking conditions
- Reuse existing inline parsing infrastructure
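The idea behind tentative markers can be sketched in isolation: a marker is first recorded as a "maybe" and only becomes part of a span once a partner is found. The code below is a deliberately reduced model, not the parser's implementation. MaybeMarker and match_tilde_pairs are hypothetical names, flanking rules and nesting are left out, and a backslash is assumed to escape whatever byte follows it.

```rust
/// Tentative marker recorded during scanning; it only becomes part of a
/// span once it is paired with a partner. (Hypothetical simplification of
/// the "maybe" item pattern.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MaybeMarker {
    offset: usize,
}

/// Returns `(open, close)` byte offsets of paired `~` markers in `text`.
/// A `~` preceded by a backslash is treated as literal text.
pub fn match_tilde_pairs(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut pending: Option<MaybeMarker> = None;
    let mut pairs = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Skip the escaped byte so `\~` never becomes a marker.
            b'\\' => i += 1,
            b'~' => match pending.take() {
                Some(opener) => pairs.push((opener.offset, i)),
                None => pending = Some(MaybeMarker { offset: i }),
            },
            _ => {}
        }
        i += 1;
    }
    // An opener left in `pending` has no closer and stays literal text.
    pairs
}
```

Deferring the decision this way means an unmatched marker costs nothing: it simply falls back to plain text instead of forcing the parser to backtrack.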
Adding HTML Rendering
HTML rendering is handled in html.rs. You'll need to:
- Add HTML tag generation for your new elements
- Update the body_to_tag_end() and item_to_event() functions
- Handle any special rendering requirements
For example:
    // In html.rs
    impl<'a, I, W> HtmlWriter<'a, I, W> {
        fn start_tag(&mut self, tag: Tag<'a>) -> Result<(), W::Error> {
            match tag {
                Tag::MyExtension => {
                    self.write("<my-extension>")
                }
                // ...
            }
        }
    }
HTML rendering tips:
- Follow HTML5 standards
- Handle escaping properly
- Consider accessibility
- Test in different contexts
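The escaping tip matters most when an element carries user-controlled text in an attribute. The sketch below shows the idea with a hand-written escape_html function standing in for the crate's own escaping helpers. The my-extension element, its data-label attribute, and render_my_extension are hypothetical.

```rust
/// Escapes the characters that are unsafe in HTML text and in
/// double-quoted attribute values. Stand-in for the crate's own helpers.
pub fn escape_html(input: &str, out: &mut String) {
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
}

/// Renders a hypothetical extension element whose `label` comes from
/// user-controlled Markdown and therefore must be escaped.
pub fn render_my_extension(label: &str, body: &str) -> String {
    let mut out = String::new();
    out.push_str("<my-extension data-label=\"");
    escape_html(label, &mut out);
    out.push_str("\">");
    escape_html(body, &mut out);
    out.push_str("</my-extension>");
    out
}
```

Only the fixed markup is written verbatim; everything that originates in the document goes through the escaper, which keeps a stray quote or angle bracket from breaking out of the attribute or element.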
Testing
Add tests to verify your extension works correctly. pulldown-cmark is principally tested with spec documents, which are Markdown files containing test cases. Each extension should have a file under specs/ explaining how the feature works along with test cases. Have a look at the existing specs for inspiration.
Other kinds of testing you should consider:
- Unit tests alongside implementation
- Integration tests in tests/
- Round-trip tests
- Edge case tests
- Interaction tests with other extensions
For example:
    #[test]
    fn test_my_extension() {
        let input = "Test my extension syntax";
        let mut options = Options::empty();
        options.insert(Options::ENABLE_MY_EXTENSION);
        let parser = Parser::new_ext(input, options);
        // Test parsing result...
    }
Testing tips:
- Test both positive and negative cases
- Test interactions with other syntax
- Test error conditions
- Test HTML output
- Test with different options enabled
- Run the different fuzzers to find crashes (fuzz/ parse target) and performance issues (dos-fuzzer/)
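One way to cover positive cases, negative cases, and option states together is a table of cases that a single test walks through. The sketch below shows the shape of such a table. The render function is a toy stand-in for "parse with options, then render HTML", not the crate's API, and Case, CASES, and failing_cases are hypothetical names.

```rust
/// Toy renderer standing in for "parse with options, then render HTML".
/// It wraps the whole input in <sub> when it has the form ~text~ and the
/// (hypothetical) extension is enabled; otherwise it returns the input.
pub fn render(input: &str, extension_enabled: bool) -> String {
    let is_wrapped = input.len() > 2 && input.starts_with('~') && input.ends_with('~');
    if extension_enabled && is_wrapped {
        format!("<sub>{}</sub>", &input[1..input.len() - 1])
    } else {
        input.to_string()
    }
}

/// One row of the test table: input, option state, expected output.
pub struct Case {
    pub input: &'static str,
    pub enabled: bool,
    pub expected: &'static str,
}

pub const CASES: &[Case] = &[
    Case { input: "~x~", enabled: true, expected: "<sub>x</sub>" }, // positive
    Case { input: "~x", enabled: true, expected: "~x" },            // unclosed marker
    Case { input: "~~", enabled: true, expected: "~~" },            // empty content
    Case { input: "~x~", enabled: false, expected: "~x~" },         // option disabled
];

/// Returns the inputs of all failing cases so a test can report them together.
pub fn failing_cases() -> Vec<&'static str> {
    CASES
        .iter()
        .filter(|c| render(c.input, c.enabled) != c.expected)
        .map(|c| c.input)
        .collect()
}
```

Collecting failures instead of stopping at the first one makes it easier to see whether a change broke a single edge case or a whole category.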
Example: Adding Subscript Extension
Here's an outline of the changes needed for a hypothetical subscript extension that uses ~text~ for subscript:
    // In lib.rs
    bitflags::bitflags! {
        pub struct Options: u32 {
            // Existing options...
            const ENABLE_SUBSCRIPT = 1 << 15;
        }
    }

    pub enum Tag<'a> {
        // ...
        Subscript,
    }

    // In parse.rs
    pub(crate) enum ItemBody {
        // ...
        MaybeSubscript(usize), // For opener/closer matching
        Subscript,
    }

    // In firstpass.rs
    impl<'a, 'b> FirstPass<'a, 'b> {
        fn parse_line(&mut self, ..) -> (usize, Option<Item>) {
            match byte {
                b'~' => {
                    // Handle subscript markers...
                }
            }
        }
    }

    // In html.rs
    impl<'a, I, W> HtmlWriter<'a, I, W> {
        fn start_tag(&mut self, tag: Tag<'a>) -> Result<(), W::Error> {
            match tag {
                Tag::Subscript => self.write("<sub>"),
                // ...
            }
        }
    }
Tips and Best Practices
- Study existing extensions for patterns to follow
- Keep parsing efficient
- Handle edge cases gracefully
- Document your extension thoroughly
- Consider adding feature flags for subfeatures
- Follow CommonMark principles where possible
- Test extensively
- Consider compatibility with other extensions
Common Pitfalls
- Not handling nested elements correctly
- Improper escaping in HTML output
- Not following CommonMark precedence rules
- Inefficient parsing of large documents
- Poor error recovery
- Not handling edge cases
- Breaking existing syntax
- Not documenting limitations