Write file failed when using block-cache #1537

Open
Binyang2014 opened this issue Oct 9, 2024 · 1 comment

Comments

@Binyang2014

Which version of blobfuse was used?

2.3.2

Which OS distribution and version are you using?

Ubuntu 22.04

If relevant, please share your mount command.

Mounted via the k8s blob-csi-driver. Mount options:

  mountOptions:
  - --allow-other
  - --attr-timeout=3600
  - --entry-timeout=3600
  - --attr-cache-timeout=7200
  - --block-cache
  - --block-cache-pool-size=81920
  - --block-cache-block-size=8
  - --block-cache-path=/mnt/blobfusecache
  - --block-cache-disk-timeout=7200
  - --block-cache-prefetch=12
  - --block-cache-prefetch-on-open=false

What was the issue encountered?

Writing a file failed. The file was not persisted to remote storage.

Have you found a mitigation/solution?

No

Please share logs if available.

Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [libfuse_handler.go (792)]: Libfuse::libfuse_write : error writing file iter_0020000/mp_rank_00/model_optim_rng.pt, handle: 162696 [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1491)]: BlockCache::getBlockIDList : Staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1445)]: BlockCache::commitBlocks : Failed to get block id list for iter_0020000/mp_rank_00/model_optim_rng.pt [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1076)]: BlockCache::getOrCreateBlock : Failed to commit blocks for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1006)]: BlockCache::WriteFile : Unable to allocate block for iter_0020000/mp_rank_00/model_optim_rng.pt [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [libfuse_handler.go (792)]: Libfuse::libfuse_write : error writing file iter_0020000/mp_rank_00/model_optim_rng.pt, handle: 162696 [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1491)]: BlockCache::getBlockIDList : Staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1445)]: BlockCache::commitBlocks : Failed to get block id list for iter_0020000/mp_rank_00/model_optim_rng.pt [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1076)]: BlockCache::getOrCreateBlock : Failed to commit blocks for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [block_cache.go (1006)]: BlockCache::WriteFile : Unable to allocate block for iter_0020000/mp_rank_00/model_optim_rng.pt [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
                                                   If you see this message, contact [email protected] or create a GitHub issue. We're working on a fix. More details: https://aka.ms/blobfuse2warnings.]
Oct 09 06:03:58 openpai-000003 blobfuse2[3214717]: [/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/7457115dfa3ef0cc6da5c97af5af6ebd71d363cd1962130e7ae3f0564f920c21/globalmount] LOG_ERR [libfuse_handler.go (792)]: Libfuse::libfuse_write : error writing file iter_0020000/mp_rank_00/model_optim_rng.pt, handle: 162696 [staged block 52 has less data 3145728 for 162696=>iter_0020000/mp_rank_00/model_optim_rng.pt
                                                   Notice: The random write flow using block cache is temporarily blocked due to potential data integrity issues. This is a precautionary measure.
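For anyone decoding the error text, the numbers in the log can be checked with a little arithmetic. This is only an illustrative sketch: it assumes `--block-cache-block-size=8` means 8 MB counted as 8 × 1024 × 1024 bytes, and `describe_staged_block` is a made-up helper name, not part of blobfuse2.

```python
# Values copied from the mount options and the log lines above.
BLOCK_SIZE_MB = 8               # --block-cache-block-size=8 (assumed unit: MB)
STAGED_BLOCK_INDEX = 52         # "staged block 52"
STAGED_BLOCK_BYTES = 3145728    # "has less data 3145728"

MIB = 1024 * 1024               # assumption: 1 MB is treated as 1024 * 1024 bytes


def describe_staged_block(index, staged_bytes, block_size_mb):
    """Hypothetical helper: compare a staged block against the full block size."""
    block_bytes = block_size_mb * MIB
    return {
        "block_bytes": block_bytes,                   # size of a full block
        "staged_mib": staged_bytes / MIB,             # data actually staged, in MiB
        "missing_bytes": block_bytes - staged_bytes,  # shortfall versus a full block
        "block_start_offset": index * block_bytes,    # file offset where the block starts
    }


info = describe_staged_block(STAGED_BLOCK_INDEX, STAGED_BLOCK_BYTES, BLOCK_SIZE_MB)
for key, value in info.items():
    print(f"{key}: {value}")
```

Under that assumption, block 52 holds 3 MiB where a full block would be 8 MiB. That would match the "has less data" wording if the block was staged before it was completely filled, which is what a non-sequential write pattern could produce.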
@ashruti-msft ashruti-msft self-assigned this Oct 11, 2024
@ashruti-msft
Collaborator

Hi, we found some issues with random writes when using block-cache and are currently working on fixing them. In the meantime, we have blocked random write operations with block-cache. We suggest using file-cache for random writes, or waiting for our next release. Thanks!
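Not something suggested in this thread, but the same idea as file-cache (stage the file locally, then upload it in order) can be sketched at the application level. The sketch below is a hedged illustration, not a confirmed fix: `save_via_local_staging`, `copy_sequentially`, and `demo_writer` are hypothetical names, and whether the checkpoint writer in this issue actually seeks is an assumption based on the "random write" notice in the logs.

```python
import os
import tempfile

CHUNK_BYTES = 8 * 1024 * 1024  # hypothetical chunk size, chosen to mirror the 8 MB block size


def copy_sequentially(src_path, dst_path, chunk_bytes=CHUNK_BYTES):
    """Copy src to dst using only in-order writes (no seeks on the destination)."""
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written


def save_via_local_staging(write_fn, final_path, staging_dir=None):
    """Let write_fn write to a local temp file (seeks allowed), then copy it in order."""
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    os.close(fd)
    try:
        write_fn(tmp_path)
        return copy_sequentially(tmp_path, final_path)
    finally:
        os.remove(tmp_path)


def demo_writer(path):
    """Example writer that seeks backwards, i.e. a random write pattern."""
    with open(path, "wb") as f:
        f.write(b"\x00" * 8)   # placeholder for a header
        f.write(b"payload!")   # body written after the placeholder
        f.seek(0)              # jump back to the start
        f.write(b"HEADER--")   # overwrite the placeholder
```

Usage would look like `save_via_local_staging(demo_writer, "/path/on/mount/file.bin")`, where the destination path is a placeholder. The local staging directory needs enough free space for the whole file, which is the same trade-off file-cache makes.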

3 participants