Merge pull request #276 from gjtorikian/ast
Reintroduce AST parse/walk
gjtorikian authored Apr 30, 2024
2 parents 62248b3 + 9a8b57d commit c021fff
Showing 36 changed files with 16,615 additions and 73 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -51,6 +51,6 @@ build/
actual.txt
test.txt
test/progit
test/benchinput.md
test/benchmark/large.md

*.orig
8 changes: 8 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Gemfile
@@ -20,6 +20,7 @@ end

group :benchmark do
gem "benchmark-ips"
gem "markly"
gem "kramdown"
gem "kramdown-parser-gfm"
gem "redcarpet"
192 changes: 150 additions & 42 deletions README.md
@@ -31,10 +31,108 @@ require 'commonmarker'
Commonmarker.to_html('"Hi *there*"', options: {
parse: { smart: true }
})
# <p>“Hi <em>there</em>”</p>\n
# => <p>“Hi <em>there</em>”</p>\n
```

The second argument is optional--[see below](#options) for more information.
(The second argument is optional--[see below](#options-and-plugins) for more information.)

### Generating a document

You can also parse a string to receive a `:document` node. You can then print that node to HTML, iterate over the children, and do other fun node stuff. For example:

```ruby
require 'commonmarker'

doc = Commonmarker.parse("*Hello* world", options: {
parse: { smart: true }
})
puts(doc.to_html) # => <p><em>Hello</em> world</p>\n

doc.walk do |node|
puts node.type # => [:document, :paragraph, :emph, :text, :text]
end
```

(The second argument is optional--[see below](#options-and-plugins) for more information.)

When it comes to modifying the document, you can perform the following operations:

- `insert_before`
- `insert_after`
- `prepend_child`
- `append_child`
- `delete`

You can also get the source position of a node by calling `source_position`:

```ruby
doc = Commonmarker.parse("*Hello* world")
puts doc.first_child.first_child.source_position
# => {:start_line=>1, :start_column=>1, :end_line=>1, :end_column=>7}
```

You can also modify the following attributes:

- `url`
- `title`
- `header_level`
- `list_type`
- `list_start`
- `list_tight`
- `fence_info`

#### Example: Walking the AST

You can use `walk` or `each` to iterate over nodes:

- `walk` will iterate on a node and recursively iterate on a node's children.
- `each` will iterate on a node and its children, but no further.

```ruby
require 'commonmarker'

# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

# Walk tree and print out URLs for links
doc.walk do |node|
if node.type == :link
printf("URL = %s\n", node.url)
end
end
# => URL = https://www.github.com

# Transform links to regular text
doc.walk do |node|
if node.type == :link
node.insert_before(node.first_child)
node.delete
end
end
# => <h1><a href=\"#the-site\"></a>The site</h1>\n<p>GitHub</p>\n
```
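
The difference between `walk` and `each` described above can be sketched without the gem. The `TinyNode` class below is a hypothetical stand-in, not `Commonmarker::Node`, and it assumes that `each` yields only a node's direct children while `walk` yields the node itself and then every descendant, depth-first:

```ruby
# Hypothetical stand-in for a parsed node; this is NOT Commonmarker::Node.
class TinyNode
  attr_reader :type, :children

  def initialize(type, children = [])
    @type = type
    @children = children
  end

  # Assumption: `each` visits direct children only, one level deep.
  def each(&block)
    children.each(&block)
  end

  # Assumption: `walk` visits this node first, then recurses into each child.
  def walk(&block)
    yield self
    each { |child| child.walk(&block) }
  end
end

# A tree shaped like the parse of "*Hello* world".
doc = TinyNode.new(:document, [
  TinyNode.new(:paragraph, [
    TinyNode.new(:emph, [TinyNode.new(:text)]),
    TinyNode.new(:text)
  ])
])

walked = []
doc.walk { |node| walked << node.type }

direct = []
doc.each { |node| direct << node.type }

p walked # [:document, :paragraph, :emph, :text, :text]
p direct # [:paragraph]
```

Under these assumptions, `walk` reproduces the node order listed in the "Generating a document" example, while `each` on the document stops at its single paragraph child.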

#### Example: Converting a document back into raw CommonMark

You can use `to_commonmark` on a node to render it as raw text:

```ruby
require 'commonmarker'

# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

# Transform links to regular text
doc.walk do |node|
if node.type == :link
node.insert_before(node.first_child)
node.delete
end
end

doc.to_commonmark
# => # The site\n\nGitHub\n
```

## Options and plugins

@@ -53,21 +151,23 @@ Note that there is a distinction in comrak for "parse" options and "render" opti

### Parse options

| Name | Description | Default |
| --------------------- | ------------------------------------------------------------------------------------ | ------- |
| `smart` | Punctuation (quotes, full-stops and hyphens) are converted into 'smart' punctuation. | `false` |
| `default_info_string` | The default info string for fenced code blocks. | `""` |
| Name | Description | Default |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ------- |
| `smart` | Punctuation (quotes, full-stops and hyphens) are converted into 'smart' punctuation. | `false` |
| `default_info_string` | The default info string for fenced code blocks. | `""` |
| `relaxed_autolinks` | Enable relaxing of the autolink extension parsing, allowing links to be recognized when in brackets, as well as permitting any url scheme. | `false` |

### Render options

| Name | Description | Default |
| ----------------- | ------------------------------------------------------------------------------------------------------ | ------- |
| `hardbreaks` | [Soft line breaks](http://spec.commonmark.org/0.27/#soft-line-breaks) translate into hard line breaks. | `true` |
| `github_pre_lang` | GitHub-style `<pre lang="xyz">` is used for fenced code blocks with info tags. | `true` |
| `width` | The wrap column when outputting CommonMark. | `80` |
| `unsafe` | Allow rendering of raw HTML and potentially dangerous links. | `false` |
| `escape` | Escape raw HTML instead of clobbering it. | `false` |
| `sourcepos` | Include source position attribute in HTML and XML output. | `false` |
| Name | Description | Default |
| -------------------- | ------------------------------------------------------------------------------------------------------ | ------- |
| `hardbreaks` | [Soft line breaks](http://spec.commonmark.org/0.27/#soft-line-breaks) translate into hard line breaks. | `true` |
| `github_pre_lang` | GitHub-style `<pre lang="xyz">` is used for fenced code blocks with info tags. | `true` |
| `width` | The wrap column when outputting CommonMark. | `80` |
| `unsafe` | Allow rendering of raw HTML and potentially dangerous links. | `false` |
| `escape` | Escape raw HTML instead of clobbering it. | `false` |
| `sourcepos` | Include source position attribute in HTML and XML output. | `false` |
| `escaped_char_spans` | Wrap escaped characters in span tags. | `true` |

As well, there are several extensions which you can toggle in the same manner:

@@ -80,19 +180,21 @@ Commonmarker.to_html('"Hi *there*"', options: {

### Extension options

| Name | Description | Default |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------- | ------- |
| `strikethrough` | Enables the [strikethrough extension](https://github.github.com/gfm/#strikethrough-extension-) from the GFM spec. | `true` |
| `tagfilter` | Enables the [tagfilter extension](https://github.github.com/gfm/#disallowed-raw-html-extension-) from the GFM spec. | `true` |
| `table` | Enables the [table extension](https://github.github.com/gfm/#tables-extension-) from the GFM spec. | `true` |
| `autolink` | Enables the [autolink extension](https://github.github.com/gfm/#autolinks-extension-) from the GFM spec. | `true` |
| `tasklist` | Enables the [task list extension](https://github.github.com/gfm/#task-list-items-extension-) from the GFM spec. | `true` |
| `superscript` | Enables the superscript Comrak extension. | `false` |
| `header_ids` | Enables the header IDs Comrak extension. | `""` |
| `footnotes` | Enables the footnotes extension per `cmark-gfm`. | `false` |
| `description_lists` | Enables the description lists extension. | `false` |
| `front_matter_delimiter` | Enables the front matter extension. | `""` |
| `shortcodes` | Enables the shortcodes extension. | `true` |
| Name | Description | Default |
| --------------------------- | ------------------------------------------------------------------------------------------------------------------- | ------- |
| `strikethrough` | Enables the [strikethrough extension](https://github.github.com/gfm/#strikethrough-extension-) from the GFM spec. | `true` |
| `tagfilter` | Enables the [tagfilter extension](https://github.github.com/gfm/#disallowed-raw-html-extension-) from the GFM spec. | `true` |
| `table` | Enables the [table extension](https://github.github.com/gfm/#tables-extension-) from the GFM spec. | `true` |
| `autolink` | Enables the [autolink extension](https://github.github.com/gfm/#autolinks-extension-) from the GFM spec. | `true` |
| `tasklist` | Enables the [task list extension](https://github.github.com/gfm/#task-list-items-extension-) from the GFM spec. | `true` |
| `superscript` | Enables the superscript Comrak extension. | `false` |
| `header_ids` | Enables the header IDs Comrak extension. | `""` |
| `footnotes` | Enables the footnotes extension per `cmark-gfm`. | `false` |
| `description_lists` | Enables the description lists extension. | `false` |
| `front_matter_delimiter` | Enables the front matter extension. | `""` |
| `shortcodes` | Enables the shortcodes extension. | `true` |
| `multiline_block_quotes` | Enables the multiline block quotes extension. | `false` |
| `math_dollars`, `math_code` | Enables the math extension. | `false` |

For more information on these options, see [the comrak documentation](https://github.com/kivikakk/comrak#usage).

@@ -202,26 +304,32 @@ If there were no errors, you're done! Otherwise, make sure to follow the comrak

## Benchmarks

Some rough benchmarks:

```
$ bundle exec rake benchmark
❯ bundle exec rake benchmark
input size = 11064832 bytes
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
redcarpet 2.000 i/100ms
commonmarker with to_html
1.000 i/100ms
kramdown 1.000 i/100ms
Markly.render_html 1.000 i/100ms
Markly::Node#to_html 1.000 i/100ms
Commonmarker.to_html 1.000 i/100ms
Commonmarker::Node.to_html
1.000 i/100ms
Kramdown::Document#to_html
1.000 i/100ms
Calculating -------------------------------------
redcarpet 22.317 (± 4.5%) i/s - 112.000 in 5.036374s
commonmarker with to_html
5.815 (± 0.0%) i/s - 30.000 in 5.168869s
kramdown 0.327 (± 0.0%) i/s - 2.000 in 6.121486s
Markly.render_html 15.606 (±25.6%) i/s - 71.000 in 5.047132s
Markly::Node#to_html 15.692 (±25.5%) i/s - 72.000 in 5.095810s
Commonmarker.to_html 4.482 (± 0.0%) i/s - 23.000 in 5.137680s
Commonmarker::Node.to_html
5.092 (±19.6%) i/s - 25.000 in 5.072220s
Kramdown::Document#to_html
0.379 (± 0.0%) i/s - 2.000 in 5.277770s
Comparison:
redcarpet: 22.3 i/s
commonmarker with to_html: 5.8 i/s - 3.84x (± 0.00) slower
kramdown: 0.3 i/s - 68.30x (± 0.00) slower
Markly::Node#to_html: 15.7 i/s
Markly.render_html: 15.6 i/s - same-ish: difference falls within error
Commonmarker::Node.to_html: 5.1 i/s - 3.08x slower
Commonmarker.to_html: 4.5 i/s - 3.50x slower
Kramdown::Document#to_html: 0.4 i/s - 41.40x slower
```
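
The "x slower" factors in the comparison can be checked by hand: each is the fastest entry's iterations per second divided by the entry's own. The sketch below redoes that arithmetic with the Markly, Commonmarker, and Kramdown figures copied from the output above. It is only a plausibility check, not how `benchmark-ips` itself is implemented, and it ignores the error margins that lead the tool to report `Markly.render_html` as "same-ish".

```ruby
# Iterations per second, copied from the "Calculating" section above.
ips = {
  "Markly::Node#to_html" => 15.692,
  "Markly.render_html" => 15.606,
  "Commonmarker::Node.to_html" => 5.092,
  "Commonmarker.to_html" => 4.482,
  "Kramdown::Document#to_html" => 0.379
}

# The fastest entry is the baseline every other entry is compared against.
fastest_name, fastest_ips = ips.max_by { |_name, value| value }

# Slowdown factor = baseline i/s divided by the entry's own i/s.
slowdowns = ips.to_h { |name, value| [name, (fastest_ips / value).round(2)] }

puts "fastest: #{fastest_name}"
slowdowns.each do |name, factor|
  puts format("%-28s %.2fx slower", name, factor)
end
```

The computed factors for `Commonmarker::Node.to_html`, `Commonmarker.to_html`, and `Kramdown::Document#to_html` come out at 3.08, 3.50, and 41.40, matching the comparison lines.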
2 changes: 2 additions & 0 deletions ext/commonmarker/Cargo.toml
@@ -9,6 +9,8 @@ publish = false
magnus = "0.6"
comrak = { version = "0.23", features = ["shortcodes"] }
syntect = { version = "5.2", features = ["plist-load"] }
typed-arena = "2.0"
rctree = "0.6"

[lib]
name = "commonmarker"
36 changes: 33 additions & 3 deletions ext/commonmarker/src/lib.rs
@@ -5,13 +5,14 @@ use std::path::PathBuf;
use ::syntect::highlighting::ThemeSet;
use comrak::{
adapters::SyntaxHighlighterAdapter,
markdown_to_html, markdown_to_html_with_plugins,
markdown_to_html, markdown_to_html_with_plugins, parse_document,
plugins::syntect::{SyntectAdapter, SyntectAdapterBuilder},
ComrakOptions, ComrakPlugins,
};
use magnus::{
define_module, exception, function, r_hash::ForEach, scan_args, Error, RHash, Symbol, Value,
};
use node::CommonmarkerNode;

mod options;
use options::iterate_options_hash;
@@ -21,11 +22,36 @@ use plugins::{
syntax_highlighting::{fetch_syntax_highlighter_path, fetch_syntax_highlighter_theme},
SYNTAX_HIGHLIGHTER_PLUGIN,
};
use typed_arena::Arena;

mod node;
mod utils;

pub const EMPTY_STR: &str = "";

fn commonmark_parse(args: &[Value]) -> Result<CommonmarkerNode, magnus::Error> {
let args = scan_args::scan_args::<_, (), (), (), _, ()>(args)?;
let (rb_commonmark,): (String,) = args.required;

let kwargs =
scan_args::get_kwargs::<_, (), (Option<RHash>,), ()>(args.keywords, &[], &["options"])?;
let (rb_options,) = kwargs.optional;

let mut comrak_options = ComrakOptions::default();

if let Some(rb_options) = rb_options {
rb_options.foreach(|key: Symbol, value: RHash| {
iterate_options_hash(&mut comrak_options, key, value)?;
Ok(ForEach::Continue)
})?;
}

let arena = Arena::new();
let root = parse_document(&arena, &rb_commonmark, &comrak_options);

CommonmarkerNode::new_from_comrak_node(root)
}

fn commonmark_to_html(args: &[Value]) -> Result<String, magnus::Error> {
let args = scan_args::scan_args::<_, (), (), (), _, ()>(args)?;
let (rb_commonmark,): (String,) = args.required;
@@ -145,9 +171,13 @@ fn commonmark_to_html(args: &[Value]) -> Result<String, magnus::Error> {

#[magnus::init]
fn init() -> Result<(), Error> {
let module = define_module("Commonmarker")?;
let m_commonmarker = define_module("Commonmarker")?;

m_commonmarker.define_module_function("commonmark_parse", function!(commonmark_parse, -1))?;
m_commonmarker
.define_module_function("commonmark_to_html", function!(commonmark_to_html, -1))?;

module.define_module_function("commonmark_to_html", function!(commonmark_to_html, -1))?;
node::init(m_commonmarker).expect("cannot define Commonmarker::Node class");

Ok(())
}