Skip to content
This repository has been archived by the owner on Feb 15, 2023. It is now read-only.

Don't crash when memory is exhausted #389

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DemiMarie
Copy link

@DemiMarie DemiMarie commented Jul 23, 2017

Previously, returning NULL from gumbo_parser_allocate() was usually
unchecked. Instead, bail out with longjmp() and return NULL from
gumbo_parse_with_options().

Fixes #388.

Previously, returning NULL from gumbo_parser_allocate was usually
unchecked.  Instead, bail out with longjmp() and return NULL from
gumbo_parse_with_options().

if (SETJMP(parser._oom_buf)) {
if (parser._parser_state)
parser_state_destroy(&parser);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually deallocate all the used memory? What about all the allocations for nodes and errors?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn’t, even if I add a call to gumbo_destroy_output at the end whenever there would be something non-NULL to pass to it.

One option that is guaranteed to work is to place all allocations in a linked list and on error destroy the list. On the other hand, the user can always do that themselves, since Gumbo can call user-provided callbacks.

Copy link
Contributor

@craigbarnes craigbarnes Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering using the arena pull request to solve this problem for lua-gumbo. Unfortunately, the state of that pull request as it stands still contains dozens of unchecked calls to malloc and has a few other problems I can't recall off the top of my head.

Gumbo is essentially an abandoned project at this point though. It seems unlikely that any of the newer bug reports will be fixed in this repo.

@@ -1954,6 +1954,32 @@ TEST_F(GumboParserTest, FragmentWithNamespace) {
EXPECT_EQ(0, GetChildCount(div));
}

TEST_F(GumboParserTest, OutOfMemory) {
GumboOptions options;
memcpy((void*)&options, (void*)&kGumboDefaultOptions, sizeof options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for memcpy() here. You can just use:

GumboOptions options = kGumboDefaultOptions;

@kevinhendricks
Copy link
Contributor

kevinhendricks commented Oct 9, 2017 via email

@kevinhendricks
Copy link
Contributor

kevinhendricks commented Oct 9, 2017 via email

@craigbarnes
Copy link
Contributor

@kevinhendricks

I'm maintaining a fork of libgumbo within the lua-gumbo repo. If you want to set up a GitHub org, I'd be glad to contribute some of the changes I've made back and help with maintenance.

I've noticed a few other forks around GitHub that may be interested in merging efforts too. There's also a Python html5-parser library that seems to be based on a fork of gumbo.

@DemiMarie
Copy link
Author

DemiMarie commented Oct 9, 2017 via email

@craigbarnes
Copy link
Contributor

craigbarnes commented Oct 9, 2017

Right now the only actively maintained HTML parser that is meant for
standalone use is Servo's html5ever. It has a LOT going for it (for one,
it is immune to most code execution exploits by virtue of being written in
safe Rust; it is also faster). However, it isn't exactly easy to bind from
anything other than Rust.

In the case of this particular bug, we could get the same behaviour as in Rust by just calling abort if malloc fails. I already do that in lua-gumbo via a custom allocator. It would be nicer (for certain use cases) to recover and return NULL from gumbo_parse though.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants