Fix HAMi bug when vgpu-number is not set #3867

Open
wants to merge 1 commit into
base: master

@weapons97 weapons97 commented Dec 10, 2024

In the HAMi project's volcano-vgpu-device-plugin, if the user does not set volcano.sh/vgpu-number, the plugin does not respond to the container's resource mount requests. The check in checkVGPUResourcesInPod therefore needs to be modified.
Related issue:
Project-HAMi/volcano-vgpu-device-plugin#40
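
As an illustration of the check described above, here is a minimal sketch that treats a pod as requesting vGPU resources when any container sets a limit for either volcano.sh/vgpu-memory or volcano.sh/vgpu-number. The `container` type and the `requestsVGPU` function are hypothetical stand-ins (the real code works on Kubernetes pod objects), so this shows the intended logic under those assumptions rather than the actual body of checkVGPUResourcesInPod.

```go
package main

import "fmt"

// Resource names as they appear in this discussion.
const (
	VolcanoVGPUNumber = "volcano.sh/vgpu-number"
	VolcanoVGPUMemory = "volcano.sh/vgpu-memory"
)

// container is a hypothetical stand-in for a Kubernetes container spec:
// only the resource limits matter here, modeled as name -> quantity string.
type container struct {
	Limits map[string]string
}

// requestsVGPU reports whether any container sets a limit for either vGPU
// resource. This is an assumed shape of the modified check, not the real
// implementation of checkVGPUResourcesInPod.
func requestsVGPU(containers []container) bool {
	for _, c := range containers {
		if _, ok := c.Limits[VolcanoVGPUMemory]; ok {
			return true
		}
		if _, ok := c.Limits[VolcanoVGPUNumber]; ok {
			return true
		}
	}
	return false
}

func main() {
	// A container that sets only vgpu-memory should still be recognized.
	memoryOnly := []container{{Limits: map[string]string{VolcanoVGPUMemory: "1024"}}}
	fmt.Println(requestsVGPU(memoryOnly))
}
```

With this shape, a pod that sets only volcano.sh/vgpu-memory is no longer skipped.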

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign k82cn
You can assign the PR to them by writing /assign @k82cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 10, 2024
Signed-off-by: weapons97 <[email protected]>
Signed-off-by: weipeng <[email protected]>
@weapons97
Author

/assign @k82cn

@archlitchi
Contributor

Yes, I think that sooner or later we will need to add a MutatingWebhookConfiguration that adds volcano.sh/vgpu-number: 1 to tasks that set only volcano.sh/vgpu-memory.

Since that doesn't conflict with this fix, I'll vote to approve.
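
The defaulting that such a webhook might apply can be sketched as follows. This is only an illustration under assumptions: `defaultVGPUNumber` is a hypothetical helper, limits are modeled as a plain string map rather than a Kubernetes resource list, and no admission webhook plumbing is shown.

```go
package main

import "fmt"

// Resource names as they appear in this discussion.
const (
	VolcanoVGPUNumber = "volcano.sh/vgpu-number"
	VolcanoVGPUMemory = "volcano.sh/vgpu-memory"
)

// defaultVGPUNumber is a hypothetical helper. If the limits request
// vgpu-memory but not vgpu-number, it sets vgpu-number to "1" in place
// and reports that it changed the map. Otherwise it leaves the map alone.
func defaultVGPUNumber(limits map[string]string) bool {
	if _, hasMemory := limits[VolcanoVGPUMemory]; !hasMemory {
		return false
	}
	if _, hasNumber := limits[VolcanoVGPUNumber]; hasNumber {
		return false
	}
	limits[VolcanoVGPUNumber] = "1"
	return true
}

func main() {
	limits := map[string]string{VolcanoVGPUMemory: "1024"}
	changed := defaultVGPUNumber(limits)
	fmt.Println(changed, limits[VolcanoVGPUNumber])
}
```

An explicit vgpu-number set by the user is never overwritten, so the defaulting only affects memory-only tasks.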

@archlitchi
Contributor

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 18, 2024
return true
}
_, ok = container.Resources.Limits[VolcanoVGPUNumber]
_, ok := container.Resources.Limits[VolcanoVGPUNumber]
Member

What if the user sets only VolcanoVGPUNumber and not VolcanoVGPUMemory?

Contributor

It will try to find an unused GPU and use it exclusively. If none can be found, it will be stuck in the Pending state.

@Monokaix
Member

Please add unit tests (UT) to cover all possible cases.
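
One way to enumerate the cases such a test could cover is a table of limit combinations: neither resource set, only vgpu-memory, only vgpu-number, and both. The sketch below assumes the check returns true when either limit is present. `hasVGPULimit`, `testCase`, and `cases` are hypothetical names, and a real test in the repository would use Go's `testing` package against the actual function.

```go
package main

import "fmt"

// Resource names as they appear in this discussion.
const (
	VolcanoVGPUNumber = "volcano.sh/vgpu-number"
	VolcanoVGPUMemory = "volcano.sh/vgpu-memory"
)

// hasVGPULimit is a hypothetical stand-in for the per-container check:
// true when either vGPU limit is present.
func hasVGPULimit(limits map[string]string) bool {
	_, hasMemory := limits[VolcanoVGPUMemory]
	_, hasNumber := limits[VolcanoVGPUNumber]
	return hasMemory || hasNumber
}

// testCase pairs a set of container limits with the expected result.
type testCase struct {
	name   string
	limits map[string]string
	want   bool
}

// cases lists every combination of the two vGPU limits.
func cases() []testCase {
	return []testCase{
		{"no vGPU limits", map[string]string{"cpu": "1"}, false},
		{"vgpu-memory only", map[string]string{VolcanoVGPUMemory: "1024"}, true},
		{"vgpu-number only", map[string]string{VolcanoVGPUNumber: "1"}, true},
		{"both set", map[string]string{VolcanoVGPUMemory: "1024", VolcanoVGPUNumber: "1"}, true},
	}
}

func main() {
	for _, tc := range cases() {
		got := hasVGPULimit(tc.limits)
		fmt.Printf("%s: got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```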

5 participants