Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed malloc with concurrent executions of memory.grow #404

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions dlmalloc/src/malloc.c
Original file line number Diff line number Diff line change
Expand Up @@ -4105,7 +4105,15 @@ static void* sys_alloc(mstate m, size_t nb) {
msegmentptr ss = (m->top == 0)? 0 : segment_holding(m, (char*)m->top);
ACQUIRE_MALLOC_GLOBAL_LOCK();

#ifdef __wasilibc_unmodified_upstream
if (ss == 0) { /* First time through or recovery */
#else
// In wasi-libc, this branch should never be taken as CALL_MORECORE(0) may
// lead to incorrect allocations. See try_init_allocator() for more
// information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is inside if (MORECORE_CONTIGUOUS && ... block.
if we are using MORECORE_CONTIGUOUS=1, it's the bug to fix i guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would say setting MORECORE_CONTIGUOUS to 1 is correct. The comments in malloc.c say this:

MORECORE_CONTIGUOUS       default: 1 (true) if HAVE_MORECORE
  If true, take advantage of fact that consecutive calls to MORECORE
  with positive arguments always return contiguous increasing
  addresses.  This is true of unix sbrk. It does not hurt too much to
  set it true anyway, since malloc copes with non-contiguities.
  Setting it false when definitely non-contiguous saves time
  and possibly wasted space it would take to discover this though.

So, in general I think we definitely want to benefit from the fact that the WebAssembly memory is contiguous and only fall back to the non-contiguous malloc if some other allocator uses memory.grow in the same program, as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. it makes sense.
i vaguely concern that discontiguous fallback logic will never be tested well though.

assert(ss != 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but ss can be 0 here if there was not enough space in the initial heap, can't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MoritzS Would you be able to comment on this question?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, however we still cannot take the if-branch because using CALL_MORECORE(0) will almost always be incorrect (and that's the whole point of why try_init_allocator() exists). So, maybe we should make sure that try_init_allocator() either allocates at least one page if the initial heap is too small, or maybe even directly abort?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong with assuming the discontiguous case if ss == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I initially assumed that ss can only be 0 in cases where the memory allocation failed, but that is not the case. I will change the code so that it falls back to the non-contiguous case if ss is 0.

if (0) {
#endif
char* base = (char*)CALL_MORECORE(0);
if (base != CMFAIL) {
size_t fp;
Expand Down Expand Up @@ -4139,13 +4147,23 @@ static void* sys_alloc(mstate m, size_t nb) {
ssize < nb + SYS_ALLOC_PADDING) {
size_t esize = granularity_align(nb + SYS_ALLOC_PADDING - ssize);
if (esize < HALF_MAX_SIZE_T) {
#ifdef __wasilibc_unmodified_upstream
char* end = (char*)CALL_MORECORE(esize);
if (end != CMFAIL)
ssize += esize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like an upstream bug to assume contiguity.
does dlmalloc still have a maintained upstream?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dmalloc upstream has not been updated since 2012.

The comments at the top of dlmalloc say that it should handle the case of non-contiguous memory, though it will opportunistically detect contiguous memory and make use of it. If it has a bug, it'd be good to have a comment here mentioning that we're fixing an upstream bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a bug, as it works correctly if dlmalloc is the only part of the program that uses sbrk (or MORECORE). That's just not necessarily the case in wasm where we want to allow other functions to grow the memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just not necessarily the case in wasm where we want to allow other functions to grow the memory.

Can you remind me where is this requirement coming from? Would it not simplify thing considerably if were to declare it undefined behavior the call sbkr() or memory.grow on a memory that was owned/used/controlled by malloc? i.e. can we not consider malloc as owning the entire region of memory from __heap_base to the end of the memory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with giving malloc all the memory between __heap_base and __heap_end (which is not the end of the entire memory!), but I don't think we should prohibit other functions from growing the memory.

The use-case for me is to allow programs compiled with wasi-libc to have their own memory allocators in addition to malloc. When an application has very specific memory allocation patterns, such as all allocations having the same size, it is often more efficient to write a specialized allocator for a specific part of the program. On a regular linux target, you would then use mmap to allocate entire pages, so you don't need to mess with sbrk. However, in WebAssembly, sbrk/memory.grow is the only way to allocate new memory, so custom allocators need to able to use it without breaking malloc.

In any case, all memory pages allocated by malloc using memory.grow will continue to be used exclusively by malloc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if its worth it (or possible) to support that kind of thing.

In that first example wouldn't make more sense if mypage was just statically allocated in the assembly file? e.g:

.section .data.mypage,"",@ 
mypage:                                                                                                                                             
    .size mypage, 65536

no. because the linear memory is shared with another module.

Then the linker could place it and it would no runtime cost to allocate.

I'm not sure what the second example is used for, but I wonder if it couldn't depend on the malloc symbol existing? Or at least weakly depend on it so its used when it exists?

it's basically same as the first example.

i don't think it's possible unless malloc is exported by the module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a. do not support it (memory.grow outside of this malloc) at all
b. support it only if it's done at the very beginning of the module lifetime (eg. around wasm start function)
c. support it only if it's done before the first call of malloc
d. support it only if it's serialized with normal malloc/free calls
e. support it anytime, even from another thread

@MoritzS
can't you make your use case backed by malloc as @sbc100 suggested? why not?
if you can't, which of the above do you need for your use case?

@alexcrichton @sunfishcode
which does the preview1 adapter need?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if its worth it (or possible) to support that kind of thing.
In that first example wouldn't make more sense if mypage was just statically allocated in the assembly file? e.g:

.section .data.mypage,"",@ 
mypage:                                                                                                                                             
    .size mypage, 65536

no. because the linear memory is shared with another module.

If you have two modules sharing the same memory, isn't its reasonable to assume that one of them will export a malloc function and the other one will use it. The need to coordinate somehow, and having them both call memory.grow and while assume that other can just deal with it seems.. fragile. How do you know the allocator in other other module is ok with having a non-contiguous heap? Isn't that fairly big assumption? Why not just mandate that the other module export an allocator for you to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if its worth it (or possible) to support that kind of thing.
In that first example wouldn't make more sense if mypage was just statically allocated in the assembly file? e.g:

.section .data.mypage,"",@ 
mypage:                                                                                                                                             
    .size mypage, 65536

no. because the linear memory is shared with another module.

If you have two modules sharing the same memory, isn't its reasonable to assume that one of them will export a malloc function and the other one will use it. The need to coordinate somehow, and having them both call memory.grow and while assume that other can just deal with it seems.. fragile. How do you know the allocator in other other module is ok with having a non-contiguous heap? Isn't that fairly big assumption? Why not just mandate that the other module export an allocator for you to use?

it isn't possible to assume proper export of malloc functions when you want to deal with the existing wasi modules.
i totally agree it's a fragile hack.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which does the preview1 adapter need?

Sorry I haven't been keeping up with the rest of the context here, but the preview1 adapter won't ever be used with threads so it's ok if something odd goes wrong. Ideally wasi-libc would allow other modules to use memory.grow instructions themselves, however, and continue to work with that.

else { /* Can't use; try to release */
(void) CALL_MORECORE(-ssize);
br = CMFAIL;
}
#else
// After calling CALL_MORECORE(ssize) above to try to extend the
// current contiguous space another thread could have called
// memory.grow. If we then took the result CALL_MORECORE(esize) as
// the end of the region, we would assume that all space allocated
// by the call to memory.grow from the other thread belongs to us.
// This is a similar issue as described in try_init_allocator().
br = CMFAIL;
#endif
}
}
}
Expand Down Expand Up @@ -4175,7 +4193,16 @@ static void* sys_alloc(mstate m, size_t nb) {
char* end = CMFAIL;
ACQUIRE_MALLOC_GLOBAL_LOCK();
br = (char*)(CALL_MORECORE(asize));
#ifdef __wasilibc_unmodified_upstream
end = (char*)(CALL_MORECORE(0));
#else
// It can happen that between two calls to CALL_MORECORE another thread
// executes memory.grow. This then usually leads to corrupted memory.
// We know that asize must be a multiple of the WebAssembly page size,
// otherwise CALL_MORECORE would have failed. So, end must just be br +
// asize.
end = br + asize;
#endif
RELEASE_MALLOC_GLOBAL_LOCK();
if (br != CMFAIL && end != CMFAIL && br < end) {
size_t ssize = end - br;
Expand Down