HTML in output (proof-of-concept) #5172
Comments
@PerBothner The first question coming to my mind is indeed about security - can we get this secure enough within a browser env? Linked to this:
Currently I have my doubts about both, JS execution and asset loading. We might get somewhere here with putting the snippets into iframes and applying very strict security policies; re-routing of asset sources can be achieved with a service worker. Still, that type of "isolation" is subject to many browser bugs, so I feel a bit uneasy about the whole concept. To not only argue with FUD from my side - the main attack vector I see here is breaking out of any secluded area and gaining access to the terminal's JS context and thus direct shell access. Such a break could happen through any weak security setup in our countermeasures or browser bugs themselves, which means that we need perfect knowledge/control here (eww what a burden). So any ideas, how not to fall into these traps?
Something I tried and gave up on, as it was a bigger change than I wanted at the time, was to add the concept of monaco editor-style "view zones" to the terminal. Basically the embedder would register a view zone that inserts a chunk of empty space in the renderer and manages a DOM element that is put there. The embedder could dispose of this view zone whenever it was done with it and maybe also change properties on it (like height). The great thing about this approach is that it works similar to decorations (which were also inspired by monaco), it's entirely up to the embedder to implement their sequence handler/feature (eg. clicking a link could open the image inline) and it also doesn't get into the mess of touching how the buffer works and making it more complex by having multiple buffer line types. The main challenges are scrolling as you've called out in your example and having the renderer support the gap properly. I found the WIP branch I was working on master...Tyriar:xterm.js:zone_wip. I think the conclusion I came to was we must work out smooth scrolling and then pixel-based scrolling, where the scroll bar would be able to land in between buffer rows, instead of always having the top of the terminal show the top of some row.
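To make the view zone idea concrete, here is a minimal layout sketch. It is not taken from the zone_wip branch or from monaco; the names `ViewZone`, `ZoneLayout`, `addZone`, `setHeight` and `lineTop` are hypothetical, and the sketch only models the geometry (where a buffer line ends up once gaps are inserted), not the DOM element or the renderer.

```typescript
// Hypothetical sketch: none of these names are part of the xterm.js API.
interface ViewZone {
  id: number;
  afterLine: number; // the gap is rendered below this buffer line (0-based)
  heightPx: number;  // controlled by the embedder, may change later
}

class ZoneLayout {
  private zones: ViewZone[] = [];
  private nextId = 1;

  constructor(private readonly rowHeightPx: number) {}

  // Register a zone; the returned disposer removes it again.
  addZone(afterLine: number, heightPx: number): { id: number; dispose: () => void } {
    const id = this.nextId++;
    this.zones.push({ id, afterLine, heightPx });
    return {
      id,
      dispose: () => {
        this.zones = this.zones.filter(z => z.id !== id);
      }
    };
  }

  // Let the embedder resize an existing zone; unknown ids are ignored.
  setHeight(id: number, heightPx: number): void {
    this.zones.forEach(z => {
      if (z.id === id) z.heightPx = heightPx;
    });
  }

  // Pixel offset of the top of a buffer line: the ordinary rows above it
  // plus the height of every zone inserted above it.
  lineTop(line: number): number {
    let top = line * this.rowHeightPx;
    this.zones.forEach(z => {
      if (z.afterLine < line) top += z.heightPx;
    });
    return top;
  }
}
```

With 20 px rows and a 50 px zone registered after line 1, line 1 still starts at 20 px while line 2 moves from 40 px to 90 px. A renderer would only need these gap positions and sizes, which is the property that makes the approach attractive.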
Idk if inline placing of HTML snippets is a good idea. It basically can be arbitrary in height/width - how is that supposed to work inline? How will too wide content be handled (typically we only have a top-down scrollbar)? What about overprinting with text content? How about this - always render complex content into a separate buffer, thus make it a full-viewport view. This way the content size does not matter, as we can place scrollbars as needed for both directions. This "other buffer" could be accessible through a link-like annotation in the original REPL context, and even could be held as long as the marker in the original buffer stays active (thus gets auto-evicted on scrolling off). |
About security, there are various approaches. I suggest we support some level of configurability, since different applications may prefer different approaches. I would focus on what should be default (standard) for a general-purpose terminal-emulator, either a stand-alone application or embedded in an IDE. One approach is to remove "bad stuff" using a blacklist or whitelist (preferable) of allowed HTML. DomTerm uses this. Not allowing JavaScript has worked fine for the "REPL" use cases I'm most interested in. However, JavaScript might be useful in some applications - for example you might want animated or interactive output. One possibility is to allow the terminal to install "extensions" under explicit user control, but that may be too limited. Another approach is to wrap all HTML inside an A possible problem using Another possibility is that emitted HTML needs to include a session-specific randomly-generated passkey, passed in via an environment variable. I haven't really thought about this, but if I remember correctly this is the approach used by GraphTerm.
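As an illustration of the whitelist approach, here is a small sketch that scrubs an already-parsed element tree. It is not DomTerm's implementation: the `HtmlNode` shape, the `scrub` function and both allow-lists are assumptions chosen for the example, text nodes are not modeled, and a real scrubber would also have to vet attribute values such as URLs.

```typescript
// Hypothetical sketch of allow-list scrubbing; not DomTerm's actual code.
interface HtmlNode {
  tag: string;
  attrs: { [name: string]: string };
  children: HtmlNode[];
}

// Example allow-lists only; a real default set would need careful review.
const ALLOWED_TAGS = ["div", "span", "p", "b", "i", "table", "tr", "td"];
const ALLOWED_ATTRS = ["class", "title"];

// Returns a cleaned copy, or undefined when the element itself is not allowed.
// Disallowed elements are dropped together with their whole subtree.
function scrub(node: HtmlNode): HtmlNode | undefined {
  if (ALLOWED_TAGS.indexOf(node.tag.toLowerCase()) === -1) return undefined;

  // Keep only allow-listed attributes (this drops onclick, onload, ...).
  const attrs: { [name: string]: string } = {};
  Object.keys(node.attrs).forEach(name => {
    if (ALLOWED_ATTRS.indexOf(name.toLowerCase()) !== -1) {
      attrs[name] = node.attrs[name];
    }
  });

  // Recurse; children that are rejected simply disappear.
  const children: HtmlNode[] = [];
  node.children.forEach(child => {
    const clean = scrub(child);
    if (clean) children.push(clean);
  });

  return { tag: node.tag, attrs, children };
}
```

An allow-list fails closed: anything the list does not mention, including `script` elements and event-handler attributes, is removed without having to be enumerated.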
By the way, a very interesting related project was GraphTerm, written by R. Saravanan. He also wrote the even older XMLTerm, which was a big inspiration to me. |
@Tyriar The "view zone" approach might work. However, I don't know anything about Monaco view zones, nor have I looked into the decorator API and implementation. I'm unclear if it saves us anything in terms of implementation complexity - we still need to deal with scrolling, and (preferably) zones that are a non-integral number of rows high. Furthermore, it seems like it would be helpful to have this extra content be part of the buffer. I think it may simplify some of the logic - and I think it ties in with being able to support lines with different heights and fonts (which I think at some point we should). The user model should be that "rich output" is part of the buffer. Serialization is probably simpler if the DOM element is directly accessible from the buffer structure.
@jerch _Idk if inline placing of HTML snippets is a good idea._ If by "inline" you mean in the CSS If you mean vertical interleaving of user input lines, fixed-width output lines, and "rich" HTML output lines, I disagree: I think that is very useful. It's the paradigm of REPL updated to allow rich text, graphics, images, math etc in the "print" part. That said, having the option to individually show commands (or just their output) in a separate window, delete commands, "fold" command output etc is useful. That can be built on a "shell integration" protocol such that the terminal can understand the concepts of commands as consisting of prompt, input, and output (with possible nesting).
Yes, with "inline" I mean within the normal text buffer progression. Plz note, that my argument was not about usefulness - I also think thats quite useful. My argument is about breaking the UI so badly, that ppl will get confused, how to properly interact with it. Especially for stuff, that needs a proper width/height to show up correctly. |
Another demo/screenshot - the The
This assumes the patch in issue #5173 is applied. Without it, literal newlines in the |
@jerch My argument is about breaking the UI so badly, that ppl will get confused, how to properly interact with it. The use-case I'm focusing on (for now) is a REPL (including a shell) with rich output. This uses the normal scrollable screen buffer, with full-width output. The cursor never moves backwards, except within the prompt+input area. I believe this would be straightforward and intuitive. Mixing regular column-based output with rich text using the alternate screen buffer, where the cursor can jump between column-based and rich output - that is more complicated. Not necessarily for the user, but for the application programmer. We would have to define the cursor semantics of rich output (see below). I think if an application wants to combine rich text with interactive full-screen alternate-buffer use, the preferred way to do that would be to clear the screen and output HTML to cover the entire screen (scrollable as needed). Instead of using row+column addressing to navigate to sections to update, one should use Of course we still want to define how row+column addressing is defined when moving through rich-text HTML blocks - even if that's not the recommended way to do things. The simplest would be to define each HTML block as a single large character-cell: It might be an extra-tall line containing a single extra-wide character. This model is more-or-less what the prototype does.
This is really interesting. I've been thinking a lot about this problem as well. I think trying to make it "pixel perfect" or out of band with the current row x col model is not the right approach, as it will cause a lot of problems with scrolling, cursor positioning, resizing, etc. I would propose a very simple change. Just allow the terminal to reserve a number of "rows" as a portal to 3rd-party content. These rows/portals would participate in scrolling. If any of the cells inside of the reserved rows/portal are modified (e.g. a terminal command wants to write text into that zone or clears that zone, etc.) then we delete the portal, and convert the reserved area back to just regular cells. Then we just need an API to access that portal -- maybe it can be as simple as providing a reference to an HTML div, with a dispose event (resize and scrolling events are not even necessary as the client can use a ResizeObserver, IntersectionObserver, or getBoundingClientRect to manage the div). This could work great for the image addon, and for @PerBothner's HTML content (and for a bunch of ideas that I'm trying to implement as well). The key though is not linking this core functionality to what gets put inside these portals. That can be handled outside of core xterm code through addons or 3rd party code. By dealing with this at the "row" level, we also remove all of the complexity around terminal resizing. I think the implementation of this could be straight-forward. I've been thinking about trying to make this change, but as I'm not as familiar with the codebase, I'm worried about some edge cases (especially with the webgl renderer).
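The portal lifecycle described here (reserve rows, dispose as soon as any reserved cell is written) could be sketched as follows. This is only an illustration under stated assumptions: `Portal`, `PortalRegistry`, `reserve` and `notifyCellWrite` are hypothetical names, not xterm.js APIs, and the sketch tracks row ranges only, leaving the div and the renderer out.

```typescript
// Hypothetical sketch: row-level "portals" that vanish when overwritten.
interface Portal {
  startRow: number;      // first reserved buffer row
  rowCount: number;      // number of reserved rows
  onDispose: () => void; // embedder callback, e.g. to remove its div
}

class PortalRegistry {
  private portals: Portal[] = [];

  // Reserve a block of rows for third-party content.
  reserve(startRow: number, rowCount: number, onDispose: () => void): Portal {
    const portal: Portal = { startRow, rowCount, onDispose };
    this.portals.push(portal);
    return portal;
  }

  // To be called whenever the terminal writes to or clears a cell in `row`.
  // Every portal covering that row is deleted, so the area reverts to
  // regular cells, and its owner is notified.
  notifyCellWrite(row: number): void {
    const hit = this.portals.filter(
      p => row >= p.startRow && row < p.startRow + p.rowCount
    );
    this.portals = this.portals.filter(p => hit.indexOf(p) === -1);
    hit.forEach(p => p.onDispose());
  }

  activeCount(): number {
    return this.portals.length;
  }
}
```

Because the portal occupies whole rows, a resize or scroll needs no special handling in this model; the only event the embedder must react to is disposal.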
This has come up a few times recently for me wrt notebooks/python REPLs, where we could show a webview inline in the terminal in a view zone (making room for the element between 2 lines). Where I'm at currently with that though is that there are 2 types of users this stuff is mainly targeting; programmers and data scientists and the rich UI is primarily for the data science crowd that would usually opt for a full notebook interface, of which VS Code has a fully featured one. So it doesn't feel worth the effort on my end to invest in richer terminal UI for that. This is similar to the case of printing images to the console beyond the simple image sequence support we already have. For that you could instead run Also view zones were originally considered for the AI chat view, and for that I'm pretty happy with where I landed where I shift the terminal depending on the position of the cursor or overlay: Overlay: Shifted (cursor on last line): TLDR I'm not sure it's worth myself investing in this stuff further. A generic view zone implementation would be a great addition to the project though imo. |
We all agree that the VT100-terminal model has been very flexible and useful but we would like to support images, math, rich text, and complex scripts (which do not work well with fixed columns). One solution is the notebook interface, which can be very nice. However, I believe it would be preferable if we can combine or integrate the terminal interface and the notebook interface into a single tool:
If we want to add rich output (and maybe notebook functionality) to the terminal, then I believe limiting ourselves to fixed-size cells is not acceptable:
To fix these problems, we can still work with conceptual "cells" but cells may have different sizes. A cell may consist of one or more combined characters, or some image. A "line" is a number of cells in one or more rows. The terminal divides lines into rows. The primary cursor addressing is by line plus cell or character (depending on context) within line. Wrapped lines are considered a single line in this mode. To support cursor movement beyond the nominal terminal height we can allow relative movement above the home position. Alternatively, we can have a command to disable moving the home position automatically on scrolling. The "legacy" (row, column) addressing will of course be supported using traditional escape sequences. It is possible to fix some of the problems mentioned above without modifying the application using a special shell-integration mode, but I'll leave out the details.
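A rough sketch of the line/cell model may help: one logical line holds cells of differing widths, the terminal splits it into rows, and a line-relative cell index is translated to a (row, column) pair. The `Cell` type and the functions `wrapLine` and `locateCell` are hypothetical and only illustrate the idea; they are not taken from the buffer-cell-cursor branch.

```typescript
// Hypothetical sketch of variable-width cells wrapped into rows.
interface Cell {
  text: string;    // one or more combined characters, or an image placeholder
  widthPx: number; // cells no longer share one fixed width
}

// Greedily split one logical line into rows no wider than maxWidthPx.
function wrapLine(cells: Cell[], maxWidthPx: number): Cell[][] {
  const rows: Cell[][] = [];
  let current: Cell[] = [];
  let used = 0;
  for (let i = 0; i < cells.length; i++) {
    const cell = cells[i];
    // Start a new row when the cell does not fit, unless the row is empty
    // (an over-wide cell still gets a row of its own).
    if (current.length > 0 && used + cell.widthPx > maxWidthPx) {
      rows.push(current);
      current = [];
      used = 0;
    }
    current.push(cell);
    used += cell.widthPx;
  }
  if (current.length > 0) rows.push(current);
  return rows;
}

// Translate "cell N within the line" into (row, column-within-row).
function locateCell(
  rows: Cell[][],
  cellIndex: number
): { row: number; col: number } | undefined {
  let remaining = cellIndex;
  for (let r = 0; r < rows.length; r++) {
    if (remaining < rows[r].length) return { row: r, col: remaining };
    remaining -= rows[r].length;
  }
  return undefined; // index is past the end of the line
}
```

The point of addressing by line plus cell is that the index stays valid when the terminal width changes; only the derived (row, column) pair is recomputed after re-wrapping.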
Of course a reasonable response is "this sounds interesting, but it is too researchy and quite beyond the scope of the xterm.js project". I agree - but I believe xterm.js provides a good and performant base for working on these problems, at least if you're open to the BufferLine re-write (PR #4928) when it gets polished, passes the tests, and assuming performance isn't hurt. (I'm still working on it, on and off, but various things have slowed me down.) |
@PerBothner to be clear, the view zone concept isn't fixed size, you as an xterm.js embedder would be able to create as many as you want, as large as you want and afterwards resize them to whatever size you want. This is a view zone in monaco for reference: So instead of having different types of buffer lines, all buffer lines are standard monospaced and don't really change at all, other than the fact that they can have gaps in between them where a HTML container is rendered and positioned for you via the API. Then the contents of the container is up to the embedder. Another very important place where this just works will be the webgl renderer where all that it needs to know about is the gap position(s) and size(s).
Some of the ideas in #4928 are quite interesting, for reflow especially not needing to jump through hoops to wrap/unwrap lines like we do now by having something own the whole unwrapped line would be great. However, the PR is enormous and extremely scary as it changes a lot of core functionality. When I consider the possibility of shipping such a large refactor in VS Code; the time to study the PR, verify it, deal with inevitable regressions, complexities that come from suddenly cells may no longer be the same size, etc. it's hard for me to reconcile. Especially when I weigh it against both the impact it has for VS Code and other work I'm involved in right now which has much more impact (native terminal completions, GPU renderer in VS Code's editor, AI terminal integration, improving shell integration). So at least for the state that PR is in right now, I can't see myself having the time any time soon to be able to review and merge it unfortunately. What the zones idea does provide in contrast is:
Zones: If I'm understanding "zones" correctly, it's basically a 100%-width HTMLElement attached to a marker. (However, if that is all it is, it doesn't explain the amount of WebGL code, so I assume I'm missing something.) It looks like IImageSpec in the image addon is also an element attached to a marker. Would it make sense to re-factor the image addon to use something like a Zone concept? (I haven't looked enough at either.) If it makes sense, I can take a look. There is a lot of WebGL code (which I don't understand), but I don't see any DomRenderer code. Is that handled automagically, or is it just not implemented yet? Why does IZoneWidget duplicate IWidgett? Is this a TypeScript thing I'm missing? For me too it is important to be able to serialize images and embedded HTML. I see issue #4470 - is that likely to happen? I don't know WASM (yet) but I might be able to help out.
The BufferLine re-write: Some of the complexity is because of the desire to be able to switch at run-time between the old and the new implementation (defaulting to the old) until we are sufficiently comfortable with the new implementation. That reduces the risk, but it makes the code harder to understand and review. (Though it's been a while since I tested if the code for the old implementation still works.) I am certainly not asking for a review until the code is more polished, the tests pass, and performance has been benchmarked. (Though early feedback is certainly welcome.) Plus some concrete benefit or feature that it enables. (This Issue was supposed to be an example of such a feature.)
@Tyriar Appreciate your insight (and the link to the zone_wip branch)! The arbitrarily sized view zones idea is really interesting. I'll noodle on that a bit, and maybe see if I can make any progress on it over the holidays. My biggest worry is around how that will mess with the terminal size. If the terminal was 25 rows, and then I opened up a zone which was approximately 5 rows... I think we'd have to send a SIGWINCH saying the terminal now only has 20 rows. As the zone scrolls off the top, the terminal would re-gain rows again (and get resized). Also laughing a bit at what it would look like to run something like "vi" with a zone opened up in the middle (although I guess that is one of the use-cases). |
@PerBothner any change to rendering will need a bunch of webgl code as it's so verbose. The branch isn't doing much particularly complicated, just translating the gaps so we know what rows to render.
💯 it makes a lot of sense to do this as it would make the image addon a lot simpler. I don't think it would mess with any hard requirements of images taking up actual room in the buffer?
I think the DOM code isn't done yet as it was just a hacked up prototype and I was tackling the harder webgl side first.
I guess I was thinking maybe there would be more widgets, I think I'd remove that if I was to clean up the PR.
That would be great if you could investigate this. If you do find the time to look at improving the idea in the branch, feel free to ignore the webgl side. Once it's in a good state I should be able to implement that part without too much effort.
I just want to set expectations so you don't waste your time; if the PR is really large, unfortunately it's going to have a hard time getting merged due to my time available vs the impact in VS Code.
I like how simple the serialize addon is right now. Another benefit of the view zone approach, it puts the onus on to embedders to restore the view zones. So when you call into the serialize addon, you would also record all the view zones and then recreate them when you restore the serialized content. This also aligns with the philosophy I know we've talked about before in some issue about the purpose of xterm.js which is to provide a good baseline terminal with features that all terminals expect, with the capability to extend it to add more modern non-standard functionality.
@sawka view zones would be purely front end, there shouldn't be any interaction between them and the pty size. Using a view zone in the viewport would mean that parts of the viewport according to the pty would maybe be offscreen, this is just a consequence of the feature. Related, if there's a pty resize that renders the view zone/
The alt buffer wasn't really in mind when I was conceiving the feature. I think you could do it, it would just need custom resize logic on the embedder side which may get complicated.
My html-blocks fork allows an application to "print" output lines containing HTML to an xterm.js terminal. This is a generalization of existing extensions to "print" images, such as using Sixel. However, using HTML is often preferable, as the output can be scaled, selected (copied), can include clickable links and buttons, is usually more compact, and more easily serialized. The output can also re-flow (based on terminal width and zoom), and can also react to style changes, such as light vs dark mode.
The current implementation is restricted to HTML blocks that extend the full width of the screen. It is also limited to output that gets appended to the end of the buffer. While limited, this is basically what you need to support a REPL with "rich" output. For example a graphing program that displays plots using SVG. Emitting nicer-looking and copyable tables. A symbolic math program that emits formulas using MathML.
Replacing/updating previously-printed HTML blocks would be a straight-forward extension. Another natural extension would be a protocol for buttons that when clicked sends a string to the application.
This is a very preliminary proof-of-concept, not usable for real use. It is based on my buffer-cell-cursor fork, which is mostly-usable (though there are still bugs to fix). I think of the html-blocks branch as an example and motivation for the buffer-cell-cursor branch: The former adds a new class ElementBufferLine that extends BufferLine.
Screenshots and examples
Gnuplot is a plotting program that can emit plots in a number of formats, including SVG and "domterm" (which is just SVG wrapped in an escape sequence). Gnuplot defaults to "domterm" output when the DOMTERM environment variable is set. The following shows an example, running gnuplot in batch (rather than interactive) mode.
More examples later.
Issues
Before polishing and finishing the html-blocks branch we need to polish and finish the buffer-cell-cursor branch that it depends on.
Scrolling in xterm.js is done as multiples of rows. This doesn't work well when lines are different heights. A work-around is to treat an HTML block as multiple rows, rounding up the height divided by standard row height. However, this leads to ugly excess space. It is also not good long-term. For example one might want to allow plain-text lines with a mix of font sizes:
We don't want each line to be some multiple of the "standard row size" depending on the font used.
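The rounding work-around mentioned above is simple arithmetic, and a tiny sketch shows where the excess space comes from. The function name `rowsForBlock` and the pixel values are made up for illustration.

```typescript
// Hypothetical helper: treat an HTML block as a whole number of standard rows.
function rowsForBlock(
  blockHeightPx: number,
  rowHeightPx: number
): { rows: number; excessPx: number } {
  // Round up so the block always fits inside the reserved rows.
  const rows = Math.ceil(blockHeightPx / rowHeightPx);
  // Whatever the block does not use shows up as blank padding.
  return { rows, excessPx: rows * rowHeightPx - blockHeightPx };
}
```

For example, a 137 px block with 20 px rows occupies 7 rows and leaves 3 px unused, while a 141 px block needs 8 rows and leaves 19 px unused, almost a full blank row.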
Probably the best solution is to change the scrolling API and implementation to work in terms of pixels rather than rows.
This is not inherently complicated, but it is extensive. I added a scrollPartialLines option to enable scrolling by fractional rows, but it does not do much yet. Getting it working should be a separate issue and PR.
If lines no longer are a fixed height, mapping between pixel offsets and (row, char) offsets is no longer a simple multiplication or division. Linear or binary search may be needed, augmented with caching. However, note that while (for example) mapping a mouse click to a (row, char) offset may require a linear or binary search, the constant factors are small, since we are restricted to the visible screen.
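As a sketch of the search described above, assuming the heights of the visible rows are known: cache the cumulative top offset of each row, then binary-search that array for the row containing a given pixel offset. The names `buildRowTops` and `rowAtPixel` are hypothetical and not part of the prototype.

```typescript
// Cache: tops[i] is the pixel offset of the top of row i,
// and tops[rowCount] is the total height of all rows.
function buildRowTops(rowHeights: number[]): number[] {
  const tops: number[] = [0];
  for (let i = 0; i < rowHeights.length; i++) {
    tops.push(tops[i] + rowHeights[i]);
  }
  return tops;
}

// Binary search for the row whose vertical extent contains pixel offset y.
// Returns -1 when y lies outside the rows.
function rowAtPixel(tops: number[], y: number): number {
  const rowCount = tops.length - 1;
  if (rowCount <= 0 || y < 0 || y >= tops[rowCount]) return -1;
  let lo = 0;
  let hi = rowCount - 1;
  while (lo < hi) {
    // Bias the midpoint upward so the loop always makes progress.
    const mid = (lo + hi + 1) >> 1;
    if (tops[mid] <= y) {
      lo = mid;     // row `mid` starts at or above y
    } else {
      hi = mid - 1; // row `mid` starts below y
    }
  }
  return lo;
}
```

The cached array only needs rebuilding when a row height changes, and since it covers just the visible screen it stays small, which matches the observation that the constant factors are low.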
Only the DOM renderer has the needed support, but I see no reason the WebGL renderer should be a problem.
Selection is not implemented. Ideally, one would want selection to extend across both regular rows and parts of HTML blocks.
Re-flow on screen resize is not implemented.
Truncation of output is not implemented. (Most people who want rich HTML output will probably want infinite scrollback, so it is a lesser priority.)
Serialization of HTML segments has not been implemented, though no complicated issues are foreseen.
Trying it out
Let me know if you want to try it out.
My current test-bed uses xterm.js embedded in DomTerm. DomTerm provides "safety-scrubbing" of the HTML that most people will want, and some other features to make the feature easier. I can provide instructions, if requested.
The next step is to make this feature not depend on DomTerm. Specifically, it should be accessible from the Demo. This would probably involve a new addon (which we might call addon-html-blocks). This would include customizable safety-scrubbing.