No space left on device #1560

Open
sandip094 opened this issue Nov 7, 2024 · 19 comments

@sandip094

Which version of blobfuse was used?

  • blobfuse2 version 2.3.2

Which OS distribution and version are you using?

  • Red Hat Enterprise Linux release 8.7 (Ootpa)

What was the issue encountered?

The RMAN backup fails with the error below after running for a few minutes:
released channel: C1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup plus archivelog command at 11/07/2024 13:09:32
ORA-19502: write error on file "/rman-backup/step/1/2024-10-20_0115/STEP_1736_1_m839hd9q_20241107.incr1c", block number 91441152 (block size=8192)
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 28: No space left on device
Additional information: 4294967295
Additional information: 1048576

Configuration file (/etc/blobfuse/blobfuseconfig.yaml):
`logging:
  type: base
  level: log_info
  max-file-size-mb: 32
  file-count: 10
  track-time: true
max-concurrency: 40
components:
  - libfuse
  - file_cache
  - azstorage

libfuse:
  default-permission: 0644
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240
  ignore-open-flags: true

file_cache:
  path: /mnt/blobfusetmp
  timeout-sec: 20
  max-size-mb: 30720
  allow-non-empty-temp: true
  cleanup-on-start: true

azstorage:
  type: block
  account-name: xxxxx
  account-key: xxxxx
  mode: key
  container: xxxxx`

Service file (/etc/systemd/system/blobfuse2.service):
`[Unit]
Description=A virtual file system adapter for Azure Blob storage.
After=network-online.target
Requires=network-online.target

[Service]
User=oracle
Group=dba
Environment=BlobMountingPoint=/rman-backup
Environment=BlobConfigFile=/etc/blobfuse/blobfuseconfig.yaml
Environment=BlobCacheTmpPath=/mnt/blobfusetmp
Environment=BlobLogPath=/var/log/blobfuse
Type=forking
ExecStart=/usr/bin/blobfuse2 mount ${BlobMountingPoint} --config-file=${BlobConfigFile}
ExecStop=/usr/bin/blobfuse2 unmount ${BlobMountingPoint}
ExecStartPre=+/usr/bin/install -d -o oracle -g dba ${BlobCacheTmpPath}
ExecStartPre=+/usr/bin/install -d -o oracle -g dba ${BlobLogPath}
ExecStartPre=+/usr/bin/install -d -o oracle -g dba ${BlobMountingPoint}

[Install]
WantedBy=multi-user.target`

Backup file sizes are as follows:
28M control01.ctl
8.1G stepsysblob_step_1.dbf
743G stepsysdata_step_1.dbf
4.6G sysaux_step_1.dbf
801M system_step_1.dbf
20G temp_step_1.dbf
80G undo_t1_step_1.dbf
101M users_step_1.dbf

@vibhansa-msft
Member

"ORA-27061: waiting for async I/Os failed Linux-x86_64 Error: 28: No space left on device Additional information: 4294967295 Additional information: 1048576": Kindly check the disk usage of "/mnt/blobfusetmp". The logs indicate the disk may be running out of space. I see you have configured a 20-second cache timeout and ~30 GB of cache space. If your application (RMAN in your case) generates more data than this limit within that time frame, the disk may simply fill up.

@vibhansa-msft vibhansa-msft self-assigned this Nov 11, 2024
@vibhansa-msft vibhansa-msft added this to the v2-2.4.1 milestone Nov 11, 2024
@sandip094
Author

Hello @vibhansa-msft ,
I have this much temp space available. What is your recommendation? How are these values calculated, so that I know how to change them?
"20 seconds as disk timeout and ~30GB disk space."
Image

@vibhansa-msft
Member

timeout-sec: 20
max-size-mb: 30720

The 30 GB of space and the 20-second timeout are values you configured in the .yaml file. If you have 600+ GB of disk space available, you can raise the limit from 30 GB to perhaps 100 GB and reduce the timeout from 20 to 0 or 2 seconds. The timeout is useful only when your application reads the same file again and again. If a process is going to read a file only once, setting the timeout to 0 saves disk space.

Also, Blobfuse deletes a file from the local cache only once all open handles for that file are closed. If your application does not close the handle, the file remains in the cache until you unmount. In that case too, you will see the disk fill up. If you suspect this, you can enforce a hard limit, so that file open calls start to fail once the disk reaches the configured capacity.

@sandip094
Author

Hello @vibhansa-msft ,
After changing the values as suggested, the backup still failed with the no-space error.
Observations:

  1. /mnt reaches 100% in no time
  2. /rman-backup becomes 100G, which it shouldn't be
    Image
    Image

@vibhansa-msft
Member

How big is the backup you are trying to take?
The 100G that 'df' shows for /rman-backup is not your container size or the amount of data uploaded. It is the size configured for the temp cache disk and its current usage. According to this output, your temp cache is 100% full, which means either RMAN is not closing the files or it is generating too much data in a short span of time. Can you enable debug logs and share the log file with us? That will make it easier to rule out the possibility that the files are not being closed.
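
The "too much data in a short span of time" case can be sketched as a back-of-the-envelope check. This is a simplified model, not Blobfuse2's eviction logic: it assumes every byte written stays in the cache for at least `timeout-sec`, and the write rate used in the example is a hypothetical figure, not a measurement from this thread.

```python
def cache_can_overflow(write_rate_mb_per_sec: float, timeout_sec: int, max_size_mb: int) -> bool:
    """Return True if data written during one timeout window exceeds the cache limit.

    Simplified model: every byte written stays cached for at least timeout_sec.
    Files that are still open stay cached longer, so this is a lower bound on usage.
    """
    resident_mb = write_rate_mb_per_sec * timeout_sec
    return resident_mb > max_size_mb


# Values from the originally posted config: timeout-sec: 20, max-size-mb: 30720.
# The 2000 MB/s write rate is a hypothetical example.
print(cache_can_overflow(2000, 20, 30720))  # 2000 * 20 = 40000 MB, above 30720 MB
```

Because open files are never evicted in this model, a result of False does not prove the cache is safe; it only means the timeout alone would not fill it.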

@sandip094
Author

For some reason the debug log file is not being generated:
[root@asose2e798c623453573167ad8162-db-1 bin]# cd /var/log/blobfuse/
[root@asose2e798c6273167ad8162-db-1 blobfuse]# ls -ltr
total 0
[root@asose2e798c623453573167ad8162-db-1 blobfuse]# cat /etc/blobfuse/blobfuseconfig.yaml | grep level
level: LOG_DEBUG

@vibhansa-msft
Member

If you have syslog filters installed, the logs will be in the '/var/log/blobfuse2.log' file; otherwise they go to '/var/log/messages' by default. If you are using AKS, the logs might be directed to the pod directory created on the node.

@sandip094
Author

Hello @vibhansa-msft ,
Please find the logs attached.

blobfuse.log

Regards
Sandeep

@vibhansa-msft
Member

Nov 12 11:36:15 asose2e798c6273167ad8162-db-1 blobfuse2[3964550]: Error: fusermount3: entry for /rman-backup not found in /etc/mtab
Nov 12 11:36:15 asose2e798c6273167ad8162-db-1 blobfuse2[3964550]: exit status 1
Nov 12 11:36:31 asose2e798c6273167ad8162-db-1 blobfuse2[3964568]: [/rman-backup] LOG_CRIT [mount.go (432)]: Starting Blobfuse2 Mount : 2.3.2 on [Oracle Linux Server 8.7]
Nov 12 11:36:31 asose2e798c6273167ad8162-db-1 blobfuse2[3964568]: [/rman-backup] LOG_CRIT [mount.go (434)]: Logging level set to : LOG_WARNING
Nov 12 11:36:31 asose2e798c6273167ad8162-db-1 blobfuse2[3964568]: [/rman-backup] LOG_ERR [file_cache.go (239)]: FileCache: config error [tmp-path not set]
Nov 12 11:36:31 asose2e798c6273167ad8162-db-1 blobfuse2[3964568]: [/rman-backup] LOG_ERR [pipeline.go (69)]: Pipeline: error creating pipeline component file_cache [config error in file_cache error [tmp-path not set]]
Nov 12 11:36:31 asose2e798c6273167ad8162-db-1 blobfuse2[3964568]: [/rman-backup] LOG_ERR [mount.go (442)]: mount : failed to initialize new pipeline [config error in file_cache error [tmp-path not set]]
Nov 12 11:39:38 asose2e798c6273167ad8162-db-1 blobfuse2[3964744]: Error: directory is already mounted

This is a syslog file and contains many logs from sources other than blobfuse. The last few blobfuse entries I can see here are only about the mount failing due to an invalid path.

@sandip094
Author

sandip094 commented Nov 20, 2024

Hello @vibhansa-msft ,
My bad, I attached the wrong file earlier.

Got some more information around this one:

  • We have one big file of around 800 GB, and the temp mount has around 738 GB, so that might be one reason we are getting the "no space left" error. If that is the case, what is the solution for large files, other than increasing the size of the temp mount?

@sandip094
Author

sandip094 commented Nov 20, 2024

Attached the latest logs
blobfuse3.zip

@vibhansa-msft
Member

If you are dealing with files as large as 800 GB, file-cache is not advised. Kindly migrate to the block-cache model and then try your workflow again.
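
A rough pre-flight check for this situation might look like the following. It assumes, as the discussion above suggests, that file-cache needs to hold an entire file on local disk while it is being written. The helper names, and the choice to compare against both `max-size-mb` and the free space on the cache path, are illustrative assumptions and not part of Blobfuse2.

```python
import shutil

GIB = 1024 ** 3
MIB = 1024 ** 2


def fits_in_file_cache(file_size_bytes: int, max_size_mb: int, free_bytes: int) -> bool:
    """Return True if one file fits within both the configured cache limit
    and the free space on the cache disk."""
    limit_bytes = min(max_size_mb * MIB, free_bytes)
    return file_size_bytes <= limit_bytes


def free_space(path: str) -> int:
    """Free bytes on the filesystem that holds path (e.g. the cache directory)."""
    return shutil.disk_usage(path).free


# Figures from this thread: a 743G data file, max-size-mb: 30720, ~738 GB temp disk.
print(fits_in_file_cache(743 * GIB, 30720, 738 * GIB))  # False: larger than both limits
```

On a real host, `free_space("/mnt/blobfusetmp")` would supply the third argument; when the check fails for the largest file, that is a hint to look at block-cache instead.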

@sandip094
Author

Hello @vibhansa-msft ,
Thanks for the suggestion. I have switched to the block-cache model and am getting a different error now.
Attached the debug log: blobfuse2-block.log

Here is my config file:
`[oracle@asose2e798c6273167ad8162-db-1 .blobfuse2]$ cat /etc/blobfuse/blobfuseconfig.yaml

# Refer ./setup/baseConfig.yaml for full set of config parameters

#allow-other: false

logging:
  type: base
  level: log_debug

components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240

block_cache:
  block-size-mb: 32
  mem-size-mb: 4096
  prefetch: 80
  parallelism: 128

attr_cache:
  timeout-sec: 7200

azstorage:
  type: block
  account-name: xx
  account-key: xx
  mode: key
  container: xx
[oracle@asose2e798c6273167ad8162-db-1 .blobfuse2]$ nproc
20
`

@vibhansa-msft
Member

How did you upload the files to your storage account?

Thu Nov 21 07:07:42 UTC 2024 : blobfuse2[288824] : [/rman-backup] LOG_ERR [block_cache.go (384)]: BlockCache::validateBlockList : Block size mismatch for step/archivelog/2024-11-21_0707_STEP_1788_1_ns3alqpj_20241121.arc [block: KWq2zj+jR0BVj1jYTi4BwQ==, size: 512]
Thu Nov 21 07:07:42 UTC 2024 : blobfuse2[288824] : [/rman-backup] LOG_ERR [libfuse_handler.go (712)]: Libfuse::libfuse_open : Failed to open step/archivelog/2024-11-21_0707_STEP_1788_1_ns3alqpj_20241121.arc [block size mismatch for step/archivelog/2024-11-21_0707_STEP_1788_1_ns3alqpj_20241121.arc]

Block-cache cannot open this file because the block-size in your config file is set to 32 MB and this particular file has a smaller block size. For now, block-cache only works with files whose blocks on the backend are exactly the configured size. If your workflow only needs to read the file, mount Blobfuse in read-only mode and it will skip this strict check. If you want to overwrite the file, this may not work with block-cache for now unless the file was originally created with block-cache.

@sandip094
Author

Hello @vibhansa-msft ,
Currently my block size for the oracle files is 8192
SQL> SELECT TABLESPACE_NAME, BLOCK_SIZE
  2  FROM DBA_TABLESPACES;

TABLESPACE_NAME  BLOCK_SIZE
---------------  ----------
STEPSYSBLOB      8192
STEPSYSDATA      8192
SYSAUX           8192
SYSTEM           8192
TEMP             8192
UNDO_T1          8192
USERS            8192

And CPU is
[oracle@asose2e798c6273167ad8162-db-1 .blobfuse2]$ nproc
20

[oracle@asose2e798c6273167ad8162-db-1 .blobfuse2]$ free -h
      total  used  free  shared  buff/cache  available
Mem:  157Gi  60Gi  92Gi  121Mi   4.5Gi       95Gi

So with this, what should my config look like?

@vibhansa-msft
Member

As per the log below, there is a block in your file whose size is 512. If this were the last block, Blobfuse2 would have allowed it and the file open would have succeeded. But either it is a block in the middle of the file or all the blocks that follow it are also smaller, so the open fails. You need to check how this file was created in the first place.

[block: KWq2zj+jR0BVj1jYTi4BwQ==, size: 512]
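
The rule described here can be sketched as follows. This is an illustration of the check as explained in this thread, not the actual `validateBlockList` code in Blobfuse2: the function name is hypothetical, block sizes are assumed to be in bytes, and the last block is assumed to be valid as long as it does not exceed the configured size.

```python
from typing import Optional, Sequence

MIB = 1024 * 1024


def first_mismatched_block(block_sizes: Sequence[int], block_size_mb: int) -> Optional[int]:
    """Return the index of the first block that breaks the rule, or None.

    Rule as described in this thread: every block except the last must be
    exactly block_size_mb; the last block may be smaller.
    """
    expected = block_size_mb * MIB
    last = len(block_sizes) - 1
    for index, size in enumerate(block_sizes):
        if index == last:
            if size > expected:
                return index
        elif size != expected:
            return index
    return None


# A block of size 512 in the middle fails; the same block at the end is accepted.
print(first_mismatched_block([32 * MIB, 512, 32 * MIB], 32))  # 1
print(first_mismatched_block([32 * MIB, 32 * MIB, 512], 32))  # None
```

A file made entirely of small blocks, as may be the case for the archived log above, fails at index 0 under this rule.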

@mortenjoenby

@vibhansa-msft , what if different files are using different block sizes?
We are using Oracle RMAN to do the backups, and I believe the blocksize of the archived redo logs is 512 bytes, but on other files it's 8KB.

@mortenjoenby

We (I am working with @sandip094) have been using file-cache for quite some time now, but I am wondering when you would suggest using the streaming block-cache mode instead.
We would like to use the same mode for ALL setups (we have quite a few), regardless of the size of the database. I was looking at your "decision tree" here - https://github.com/Azure/azure-storage-fuse?tab=readme-ov-file#config-guide - and it seems that block-cache mode is the right choice for very large files, but I am not sure ...

@vibhansa-msft
Member

The block size causing trouble here does not depend on the block size RMAN uses; it is the block size Blobfuse2 uses. With file-cache, the block size is determined dynamically from the file size, while with block-cache the block size is fixed (default 8MB) and can be configured by the user.
This can cause problems if you created files using file-cache and later edit them using block-cache, or if the files were written by some other tool that used a different block size.
If you use block-cache for all your workflows, from creating files to modifying them, everything should work fine.
