Markdown writer: fully customizable list markers #10479

Open
jraygauthier opened this issue Dec 20, 2024 · 9 comments

jraygauthier commented Dec 20, 2024

Describe your proposed improvement and the problem it solves.

I would like to be able to fully customize the produced ordered and unordered list markers in Markdown output.

The UL bullet marker appears to have a solution via PR #1826. There seems to be no similar issue/PR for OL marker customization, though.

The method of configuration seems to be in discussion at #5584.

Here, I am mostly interested in the spaces around the markers. Could pandoc provide a way to customize those? A couple of potential approaches:

  • bullet_before, bullet_total (applicable to both UL and OL, potential fallback if UL/OL specific option not provided)
  • ul_bullet_before, ul_bullet_total, ol_bullet_before, ol_bullet_total (preferred)
  • ul_bullet_before, ul_bullet_after, ol_bullet_before, ol_bullet_after
  • *_single_* vs *_multi_* variants useful? Like https://github.com/DavidAnson/markdownlint/blob/v0.34.0/doc/md030.md.
  • Could the existing --tab-stop option (which currently does not appear to have any bearing on list style) be reused alongside *bullet_before?
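
To make the `*_before` / `*_total` idea concrete, here is a minimal Python sketch (not pandoc code) of how the two numbers could combine into a marker prefix. The names `render_marker` and `render_item` and their parameters are hypothetical, chosen only for illustration; they do not correspond to any existing pandoc option or API.

```python
def render_marker(marker: str, before: int, total: int) -> str:
    """Pad `marker` with `before` leading spaces, then fill to `total` columns.

    Hypothetical helper: `before` and `total` mirror the proposed
    *_bullet_before / *_bullet_total options, which do not exist in pandoc.
    At least one space is always kept after the marker so the result is
    still a valid list item.
    """
    prefix = " " * before + marker
    # Fill up to the total width, but never drop the separating space.
    return prefix.ljust(max(total, len(prefix) + 1))


def render_item(marker: str, lines: list[str], before: int = 1, total: int = 4) -> str:
    """Render one list item; continuation lines are indented to the content column."""
    head = render_marker(marker, before, total)
    indent = " " * len(head)
    out = [head + lines[0]]
    # Blank lines stay empty; other continuation lines align with the content.
    out += [indent + line if line else "" for line in lines[1:]]
    return "\n".join(out)


print(render_item("-", ["Unordered A.", "", "Text block."]))
print(render_item("1.", ["Ordered A.", "", "Text block."]))
```

With `before=1, total=4` this yields the layout requested in this issue, while `before=0, total=4` corresponds to the spacing pandoc's markdown writer currently emits.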

For the OL bullet marker, I assume one could pick between 1. and #. (extension needed).

This would allow me to use pandoc to format / prettify the markdown files I work with. The current hardcoded choice does not match my preferred style. As I use lists a lot, this is annoying enough to prevent me from using pandoc as my auto-formatter (or any other tool, by the way, as none appears to provide this feature).

Desired result (Test.md, 1 space before markers, within a total width of 4 spaces):

# Test

 -  Unordered A.

    Text block.

 -  Unordered B.

    Text block.

 1. Ordered A.

    Text block.

 2. Ordered B.

    Text block.

Current pandoc markdown output (close to desired, but not quite there, no space before the marker):

$ pandoc Test.md --to markdown
# Test

-   Unordered A.

    Text block.

-   Unordered B.

    Text block.

1.  Ordered A.

    Text block.

2.  Ordered B.

    Text block.

Current commonmark markdown output (this one is oddly inconsistent between OL and UL, and doesn't appear to be influenced by --tab-stop):

$ pandoc Test.md --to commonmark
# Test

- Unordered A.

  Text block.

- Unordered B.

  Text block.

1.  Ordered A.

    Text block.

2.  Ordered B.

    Text block.

jgm (Owner) commented Dec 20, 2024

I'm reluctant to go down this path. There are ever so many choices that might be made in rendering markdown, and everyone has their own stylistic preferences. Allowing this to be configurable would add a lot of code complexity (hence also bugs and maintenance time).

jgm (Owner) commented Dec 20, 2024

For some other examples of similar requests:

jgm (Owner) commented Dec 20, 2024

Either we ignore all such requests, or we provide a way to address them all. This would probably involve a config file as in #5584. But then I can imagine similar requests for other formats, e.g. RST. The complexity starts to get out of hand.

jraygauthier (Author) commented

I can understand your reluctance; pandoc is already a pretty large and complex piece of software. At least the issue has been filed. I wanted to make sure this aspect was mentioned for completeness, should you go forward with #5584 😃.

0xdevalias commented Jan 13, 2025

@jraygauthier While I've only just started looking into it more deeply myself, I wonder whether you'd find --lua-filter and/or Custom Writers sufficient for your needs here?

With an example from djot-writer.lua (Ref: 1, 2):

Inlines.Emph = function(el)
  return concat{ "_", inlines(el.content), "_" }
end
Blocks.BulletList = function(el)
  local attr = render_attributes(el, true)
  local result = {attr, cr}
  for i=1,#el.content do
    result[#result + 1] = hang(blocks(el.content[i], blankline), 2, concat{"-",space})
  end
  local sep = blankline
  if is_tight_list(el) then
    sep = cr
  end
  return concat(result, sep)
end

Originally posted by @0xdevalias in #10527 (comment)

Combining that with a pattern for reading values from --metadata, similar to the one I used in this gist, could maybe handle the CLI args aspect of it:

  • https://gist.github.com/0xdevalias/794d1aa03c357425c4c9583d9edc0303#original-poc---lua-filter-script
  • https://gist.github.com/0xdevalias/794d1aa03c357425c4c9583d9edc0303#file-2_pandoc-markdown-devalias-lua

local bullet_marker = '-'
local bullet_marker_indent = string.rep(" ", #bullet_marker)
local emphasis_marker = '_'

-- ..snip..

-- Helper function to safely extract values from Pandoc's metadata
-- Handles boolean values directly and converts other types appropriately
--
-- @param meta: The metadata table from Pandoc
-- @param key: The key to look up in the metadata
-- @param default: Value to return if key is not found
-- @return: Value from metadata (preserving boolean type) or default
function get_metadata_value(meta, key, default)
    local value = meta[key]
    if value == nil then
        return default
    end

    if type(value) == "boolean" then
        return value
    end

    if type(value) == "table" and value.text ~= nil then
        return value.text
    end

    return tostring(value)
end

-- Processes document metadata to configure the filter
-- Called by Pandoc at the start of document processing
--
-- @param meta: The document's metadata table
-- @return: The (possibly modified) metadata table
function Meta(meta)
    -- Read configuration from metadata, falling back to defaults if not specified
    debug_mode = get_metadata_value(meta, 'debug', debug_mode) == true
    bullet_marker = get_metadata_value(meta, 'bullet-marker', bullet_marker)
    bullet_marker_indent = string.rep(" ", #bullet_marker)
    emphasis_marker = get_metadata_value(meta, 'emphasis-marker', emphasis_marker)

    debug("Initialized with FORMAT: %s", FORMAT)
    debug("bullet-marker: %s", bullet_marker)
    debug("bullet-marker-indent: '%s' (without quotes)", bullet_marker_indent)
    debug("emphasis-marker: %s", emphasis_marker)
    debug("debug-mode: %s", debug_mode)

    return meta
end

See also:


Allowing this to be configurable would add a lot of code complexity (hence also bugs and maintenance time).

Either we ignore all such requests, or we provide a way to address them all. This would probably involve a config file as in #5584.

I'm not sure whether it would make things any less complex/buggy/error-prone, but a random thought I just had: even if this were only customisable at the 'power user' level, by passing options to the Writer from a Lua script (i.e. not exposed via normal CLI args or a config file), that would be useful. At the very least, it would reduce the scope needed to land the core feature, without first having to settle on the best way to implement a config file.

Currently the main option I see for handling this on our own is to create a new Custom Writer that delegates any blocks we don't care about to one of the default markdown writers, and overrides only what we do want to change. For something like Emph, the implementation is trivial; but for many of the other customisations I feel it would start to get overly complex and duplicate functionality by re-implementing all of those handlers in Lua.

jraygauthier (Author) commented

@0xdevalias :

While I've only just started looking into it more deeply myself, I wonder whether you'd find --lua-filter and/or Custom Writers sufficient for your needs here?

Thanks for the ref, I wasn't aware this feature existed. It looks feasible although most likely quite a complex workaround for what's needed (where piping to sed might just do the trick). I will have to play with this a bit more to be able to say if that would do. Clearly looks nice and brings about a lot of possibilities for doing a great deal more.
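
For comparison, the "pipe through sed" route could look roughly like the Python sketch below. It assumes the layout that `pandoc --to markdown` produced in the issue description for top-level items (marker in column 0, followed by at least two spaces). The function name `restyle` and its `before` parameter are hypothetical. It is deliberately naive: nested lists and fenced code blocks are not tracked, and markers followed by a single space are left untouched.

```python
import re

# Top-level item as emitted by `pandoc --to markdown`: a bullet or an
# ordered marker starting in column 0, followed by at least two spaces.
ITEM = re.compile(r"^(?P<marker>[-*+]|\d+[.)])(?P<gap> {2,})(?=\S)")


def restyle(text: str, before: int = 1) -> str:
    """Move `before` spaces from after each top-level marker to in front of it.

    The content column is unchanged, so continuation paragraphs stay aligned.
    Naive by design: nested lists and fenced code blocks are not handled.
    """
    def shift(m):
        gap = len(m.group("gap"))
        moved = min(before, gap - 1)  # always keep one space after the marker
        return " " * moved + m.group("marker") + " " * (gap - moved)

    return "\n".join(ITEM.sub(shift, line) for line in text.split("\n"))


sample = "-   Unordered A.\n\n    Text block.\n\n1.  Ordered A.\n"
print(restyle(sample))
```

Because only whitespace moves from one side of the marker to the other, the item text stays in the same column. The edge cases this ignores are the kind that a structured approach such as a custom writer would handle more reliably.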

0xdevalias commented Jan 16, 2025

It looks feasible although most likely quite a complex workaround for what's needed (where piping to sed might just do the trick)

Yeah.. I guess it depends on your usecase/complexity of the processing. Quick hacks are great until you have to start tweaking and reasoning about all the edge cases that come up on something more complex; and in those cases I then find myself wishing I had gone with the more structured parser/replacement in the beginning. Hard to say without knowing the specifics of your usecase though.

Another alternative I considered (though somewhat inefficient) was using pandoc for X -> markdown, and then running the result through some other markdown tool to do markdown -> the flavour of markdown I want. But at that stage, unless that second tool already implements the specifics I want as built-in options, I'd probably end up writing the same sort of thing in that tool that I could have written in a pandoc custom writer from the beginning.

In any case, here's a bit of a boilerplate starter for the custom writer, care of ChatGPT and an afternoon procrastination hyperfocus:

-- Usage: pbpaste | pandoc -f gfm -t gfm_devalias_writer.lua --metadata emphasis_char="_" | subl

function Writer(doc, opts)
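  -- Retrieve metadata directly from the document, falling back to "_"
  -- when emphasis_char is not set (avoids passing nil to stringify)
  local meta_value = doc.meta.emphasis_char
  local emphasis_char = meta_value and pandoc.utils.stringify(meta_value) or "_"

  -- Debug logging goes to stderr so it does not mix with the written document
  io.stderr:write("Metadata emphasis_char: ", emphasis_char, "\n")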

  -- Define filters
  local filter = {
    Emph = function(elem)
      -- Apply custom emphasis characters
      local content = pandoc.utils.stringify(elem.content)
      return pandoc.RawInline('markdown', emphasis_char .. content .. emphasis_char)
    end,
  }

  -- Apply the filter and write the document in GFM format
  return pandoc.write(doc:walk(filter), 'gfm', opts)
end

As always, the Emph case is easy, and then I wasted way too many hours going in circles (perhaps too tired brain that late in the day) trying to implement BulletList in a nice way.

0xdevalias commented

And if you want to skim through the notes I've been making along this journey of discovery around custom writers, this might save you some time/pain along the way:

jgm (Owner) commented Jan 17, 2025

By the way, the Emph case isn't quite right:

    Emph = function(elem)
      -- Apply custom emphasis characters
      local content = pandoc.utils.stringify(elem.content)
      return pandoc.RawInline('markdown', emphasis_char .. content .. emphasis_char)
    end,

Instead of stringify, just include the original so it can be rendered properly as markdown.

    Emph = function(el)
      local delim = pandoc.RawInline('markdown', emphasis_char)
      table.insert(el.content, delim) -- insert delim at end
      table.insert(el.content, 1, delim)  -- insert delim at beginning
      return el.content -- return the content with the added delims (a list)
    end,
