Cannot limit memory usage; OOM when creating many small files. #1576

Open
rhaps0dy opened this issue Nov 25, 2024 · 5 comments
@rhaps0dy

I run a Kubernetes cluster with Blobfuse CSI and we're having lots of problems from Blobfuse2 dying when too many commands are sent to it. I've managed to reproduce this with a few-liner. With any account and an existing (perhaps empty) container, run:

AZURE_STORAGE_ACCOUNT=test-account
AZURE_STORAGE_ACCOUNT_CONTAINER=test-container
read -s AZURE_STORAGE_ACCESS_KEY  # paste your account's key

docker run --memory=1G --privileged \
  -e AZURE_STORAGE_ACCOUNT="$AZURE_STORAGE_ACCOUNT" \
  -e AZURE_STORAGE_ACCOUNT_CONTAINER="$AZURE_STORAGE_ACCOUNT_CONTAINER" \
  -e AZURE_STORAGE_ACCESS_KEY="$AZURE_STORAGE_ACCESS_KEY" \
  -it ghcr.io/alignmentresearch/public/blobfuse2:2.3.2-ubuntu-22.04 \
  /bin/bash -c 'blobfuse2 mount --tmp-path=/bf /mnt; for i in {1..10000}; do echo $i > /mnt/$i & done; wait'

That is, with a 1 GB memory limit, create 10000 tiny files simultaneously. The container will crash after a bit with a lot of errors like:

/bin/bash: line 1: /mnt/8948: Transport endpoint is not connected

which indicates blobfuse2 has died, in this case due to OOMKill.
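
To confirm that a process was OOM-killed rather than dying for some other reason, the kernel's cgroup counters can be checked. A minimal sketch, assuming cgroup v2, where `memory.events` contains an `oom_kill` line; the helper name `oom_kill_count` is made up for illustration, and the file location differs under cgroup v1:

```shell
# Print the number of OOM kills recorded in a cgroup v2 memory.events file.
# Assumption: cgroup v2 layout; inside the container the file is usually
# /sys/fs/cgroup/memory.events.
oom_kill_count() {
  events_file="${1:-/sys/fs/cgroup/memory.events}"
  # Lines look like "oom_kill 3"; print the second field of that line.
  awk '$1 == "oom_kill" { print $2 }' "$events_file"
}

# Example: run inside the container after the errors appear.
# oom_kill_count
```

A non-zero count means at least one process in the container's cgroup was killed for exceeding the memory limit. It does not say which process that was.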

It's true that the amount of memory here is pretty small (1 GB), but the same crash happens at larger amounts of memory with many concurrent files (perhaps larger files as well). I have also reproduced the crash with the following config file:

An alternative config file which also crashes, and does no caching.
allow-other: true

# Pipeline configuration. Choose components to be engaged. The order below is the priority order that needs to be followed.
components:
  - libfuse
  - file_cache
  # IMPORTANT disable `attr_cache`, see its config below.
  # - attr_cache
  - azstorage

libfuse:
  default-permission: "0644"

  # IMPORTANT: disable the kernel cache and use blobfuse's cache only. The kernel doesn't know that it should re-read a file that
  # has been modified in other remotes, so the read doesn't even hit Blobfuse and it is stale.
  direct-io: true
  # This should have no effect because Direct I/O bypasses the page cache entirely
  disable-writeback-cache: true

  # Cache file sizes, etc. in kernel cache for this many seconds. This does not impact file_cache staleness, so we can
  # safely set it a bit higher, to make repeated `cd` `ls` nicer.
  attribute-expiration-sec: 10
  entry-expiration-sec: 10
  negative-entry-expiration-sec: 10

file_cache:
  # This is a guideline, not an absolute, because of `hard-limit: false`
  max-size-mb: 10000

  # Sync actually forces an upload of the file, if it is not already uploaded. That's closer to correct POSIX semantics,
  # so we enable it. This is what it actually does:
  # https://github.com/Azure/azure-storage-fuse/blob/e5f6bd7fc2d18b4d6988c741bf08f07d3d10549c/component/file_cache/file_cache.go#L1230-L1236
  sync-to-flush: true
  ignore-sync: false

  # After this many seconds, blobfuse will compare the modified time of the local file and the remote file, and fetch it
  # if it has been modified. Makes reads stale by `refresh-sec`, added to the `attr-cache.timeout-sec`.
  refresh-sec: 0

  # Do not error operations to files which exceed `max-size-mb`. Just download them instead (and try to evict the rest
  # of the cache).
  hard-limit: false

  # Error if the --tmp-path is not empty at startup
  allow-non-empty-temp: false


attr_cache:
  # `attr_cache` causes reads to be stale by `timeout-sec` seconds. This is because the last modified time of the file
  # is compared to the cloud's last modified time to decide whether to download. BUT the cloud modified time is cached
  # by `attr_cache`, so we need to wait for `timeout-sec` for that to expire.
  #
  # On the other hand, enabling this may speed up `ls -l` or `du -h` or other listings that use attributes, but that's
  # not clear.
  #
  # For now we disable it.
  timeout-sec: 1
  no-symlinks: false  # enable symlinks, with a performance penalty

azstorage:
  type: adls

How can I limit the memory usage of blobfuse2 to prevent it from randomly crashing?

@vibhansa-msft
Member

Can you try the config below and see whether it helps:

allow-other: true

components:
  - libfuse
  - file_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 120

file_cache:
  timeout-sec: 0

azstorage:
  type: adls

OR

allow-other: true

components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 120

block_cache:
  block-size-mb: 8
  mem-size-mb: 800
  prefetch: 12
  parallelism: 50

azstorage:
  type: adls

@vibhansa-msft vibhansa-msft self-assigned this Nov 26, 2024
@vibhansa-msft vibhansa-msft added this to the v2-2.4.1 milestone Nov 26, 2024
@rhaps0dy
Author

Still the same problem:

cat > bf2.yaml   # create either of the files that you indicated

docker run --memory=1G --privileged \
  -e AZURE_STORAGE_ACCOUNT="$AZURE_STORAGE_ACCOUNT" \
  -e AZURE_STORAGE_ACCOUNT_CONTAINER="$AZURE_STORAGE_ACCOUNT_CONTAINER" \
  -e AZURE_STORAGE_ACCESS_KEY="$AZURE_STORAGE_ACCESS_KEY" -v $(pwd)/bf2.yaml:/etc/bf2.yaml \
  -it ghcr.io/alignmentresearch/public/blobfuse2:2.3.2-ubuntu-22.04 \
  /bin/bash -c 'blobfuse2 mount --tmp-path=/bf --config-file=/etc/bf2.yaml /mnt; for i in {1..10000}; do echo $i > /mnt/$i & done; wait'

Sometimes I don't get any errors printed, but the program ends very quickly and I can check the Azure portal and see that only a few files (~20 or so) have been created, which is also incorrect behavior.
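
To tell whether the failure depends on the number of simultaneous writers, the loop can be rewritten with a cap on concurrency, followed by a count of the files that actually exist. A rough sketch, assuming an `xargs` that supports `-P` (the GNU and BSD versions do); the function name `write_files` and the limit of 64 are arbitrary choices for illustration, not values taken from blobfuse2:

```shell
# Create files 1..count in dir, with at most `parallel` writers at a time.
write_files() {
  dir="$1"; count="$2"; parallel="$3"
  # Each input line becomes $1 of a small sh script; the directory is $2.
  seq 1 "$count" | xargs -P "$parallel" -I{} sh -c 'echo "$1" > "$2/$1"' _ {} "$dir"
}

# Example against the mount point from the repro:
# write_files /mnt 10000 64
# ls /mnt | wc -l   # compare against the expected number of files
```

If the crash disappears at low concurrency and returns as the cap is raised, that would suggest memory use grows with the number of files in flight.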

@vibhansa-msft
Member

How much memory and how many CPU cores does your pod get? I suspect low resource availability is causing the crash here.

@rhaps0dy
Author

As stated in my reproducing snippet, the container gets 1G of memory and unlimited cores.
The actual memory usage of the mount command above is 4GiB, so any container below that dies.

In my actual deployment there is no reserved amount of memory, but the host has 1TiB of RAM, so it is unlikely that the final amount is as low as 1GiB.

Is there some way I can guarantee there will be enough memory for blobfuse2?

@vibhansa-msft
Member

I guess 1 GB might be too low a limit, as the last blobfuse2 config we shared requests 800 MB just for block-cache. On top of this, other components such as attr-cache also need memory, and other processes are running on the pod as well. Creating a pod with more memory, such as 4 GB, might help. If that is not possible, then file-cache might be the only way. You need to analyze how much free memory your pod has after blobfuse starts: do not run any test, just create the pod and log in to check the current memory utilization without any load on the system.
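
The baseline measurement suggested here could look like the following. A minimal sketch, assuming a Linux pod with `/proc` mounted and `pgrep` available; `rss_mib` is a hypothetical helper name:

```shell
# Print the resident set size (RSS) of a process in MiB, read from /proc.
rss_mib() {
  pid="$1"
  # VmRSS is reported in kB, e.g. "VmRSS:     12345 kB".
  awk '/^VmRSS:/ { printf "%d\n", $2 / 1024 }' "/proc/$pid/status"
}

# Example: right after mounting, before running any load.
# rss_mib "$(pgrep -o blobfuse2)"   # blobfuse2's own footprint
# free -m                           # overall memory picture
```

Note that `free` inside a container generally reports the host's memory rather than the container's cgroup limit, so the per-process figure is usually the more useful of the two.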
