Post refactor changes needed #71

Open · 3 of 6 tasks
vsoch opened this issue Apr 9, 2024 · 6 comments · Fixed by #69
Comments

@vsoch (Member) commented Apr 9, 2024

  • Update fluence to go 1.20 or 1.21: We are going to hit issues using fluence (go 1.19) with other integrations like rainbow (go 1.20) and on our systems (go 1.20); after "refactor: testing idea to wrap coscheduling" #69 we should consider updating.
  • Take in the entire group to calculate resources: right now we use one representative pod, and we should be taking in the entire group instead.
  • containment / JGF format issues: I think the Name needs to reference the index of the resource relative to others of the same type, and then paths -> containment needs to be the path to it plus that name (see the sketch after this list).
  • Carefully review resources generation for Jobspec (I have only looked at this superficially)
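
For reference, here is a rough sketch of the vertex shape I have in mind (reconstructed from memory of the flux-sched JGF conventions, so the exact fields should be verified against fluxion; the values are illustrative):

{
  "id": "42",
  "metadata": {
    "type": "core",
    "basename": "core",
    "name": "core1",
    "id": 1,
    "uniq_id": 42,
    "size": 1,
    "paths": {
      "containment": "/cluster0/subnet0/node3/core1"
    }
  }
}

The point is that "name" should be the basename plus the per-type index (core1), not the global vertex id (42), and the containment path should end with that same name.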
vsoch changed the title from "Update fluence to go 1.20 or 1.21" to "Post refactor changes needed" on May 14, 2024
vsoch referenced this issue May 22, 2024
This set of changes includes the following:

1. Renaming short variable names to be longer and more understandable.
2. Not using the Status.ScheduleStartTime for the pod start time, but instead
adding a new field. This previous field was there for a different purpose.
3. Creating named identifiers for resource types that can be shared in the
jgf module along with others that use the same relations / vertex types.
4. Removing comments that are not necessary.
5. Changing JGF types that warrant it from int to int64.
6. Fixing spelling mistakes, etc.
7. Removing the need to write the jobspec to a temporary file (we just need the string).

The JGF and utils modules need some additional review - specifically, I am worried
that paths->containment is not set, and that sometimes the name reflects the index
in the overall graph (global) while other times it reflects the index within the
resource type. I think we likely want the latter for the inner name, but I am not
sure fluxion actually uses it (internally) in practice. I am pushing these changes
to assess testing, etc., and will update the PR as needed. There could also have
been upstream changes since the PR was opened that warrant additional fixes.

Signed-off-by: vsoch <[email protected]>
@milroy (Member) commented May 22, 2024

I agree with these items. Here's more detail on the containment / JGF format issues from PR #69: #69 (comment)

vsoch reopened this May 24, 2024
@vsoch (Member Author) commented May 28, 2024

Adding a note for myself:

Here:

// Here we build the subnet according to the topology.kubernetes.io/zone label
subnetName := node.Labels["topology.kubernetes.io/zone"]
subnet := fluxgraph.MakeSubnet(sdnCount, subnetName)
sdnCount = sdnCount + 1
fluxgraph.MakeEdge(cluster, subnet, jgf.ContainsRelation)
fluxgraph.MakeEdge(subnet, cluster, jgf.InRelation)
This is where we create a subnet, and it happens within a listing of nodes. My understanding of subnet zones (which I need to verify with a node listing) is that multiple nodes can live under the same subnet. So for this part of the code, we likely need a lookup of subnets by name, and we should only create a new subnet if we haven't created it already. Or we could create them all in a separate outer loop first - the two approaches would be equivalent. I don't think we do anything with this in the current implementation (so likely no issue), but if we had multiple subnet zones I think we might get duplicate subnet vertices in the graph with different indices, or something like that. Wanted to put a note here so I don't forget (I have a lot in my head at the moment). A sketch of the lookup idea is below.
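
Something like this, maybe (a hypothetical sketch only - subnetVertex stands in for whatever MakeSubnet actually returns, and the calls are reused from the snippet above):

// Cache subnet vertices by zone label so that multiple nodes in the same
// zone share one subnet vertex instead of creating duplicates.
// subnetVertex is a placeholder for MakeSubnet's real return type.
subnets := map[string]subnetVertex{}

// Inside the loop over nodes:
subnetName := node.Labels["topology.kubernetes.io/zone"]
subnet, seen := subnets[subnetName]
if !seen {
    subnet = fluxgraph.MakeSubnet(sdnCount, subnetName)
    sdnCount = sdnCount + 1
    fluxgraph.MakeEdge(cluster, subnet, jgf.ContainsRelation)
    fluxgraph.MakeEdge(subnet, cluster, jgf.InRelation)
    subnets[subnetName] = subnet
}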

OK going to try this:

[image attachment]

@vsoch (Member Author) commented Jun 29, 2024

@cmisale @milroy I'm working on the second bullet above, and I wanted to have a discussion about the format we want. We currently do something like this (please correct me if I'm wrong - I get this confused with jobspec nextgen):

version: 1
resources:
  - type: slot
    label: default
    count: 2
    with:
      - type: core
        count: 16
tasks:
  - command: ["app"]
    slot: default
    count:
      per_slot: 1

With memory / GPU added if it's defined for the pod. That is done by parsing one representative container and then having the slot -> count be the number of nodes (I think). If we parse each container individually (which assumes the containers might differ), what should that look like? The only thing that made sense to me was to move the count down to the node, so we can say how many of each node type is requested:

version: 1
resources:
  - type: slot
    label: default
    with:
      - type: node
        count: 1
        with:
          - type: core
            count: 4
          - type: gpu
            count: 1
      - type: node
        count: 4
        with:
          - type: core
            count: 16
tasks:
  - command: ["app"]
    slot: default
    count:
      per_slot: 1

But I remember there was a reason for having the slot directly above the core (and not including the nodes), so I think that might be wrong. That said, I don't know how to enforce a design here with nodes of different types, because the approach that asks for "this many cores across whatever resources you need" doesn't capture multiple (different) containers well.

If possible, let's have some discussion on the above! I have the next PR well underway but I paused here because I wasn't sure.

@cmisale (Collaborator) commented Jul 2, 2024

Hm, I have to say I don't remember that well how to define jobspecs... I was much better at this before, lol.
That said, I'm not convinced we need to count all pods in a group. Do pods in a group belong to the same controller id? If so, they all have the same requirements, and we can just ask for n slots, where n is the number of pods.
This assumes I am understanding the question correctly, which might not be the case :D

@vsoch (Member Author) commented Jul 2, 2024

I think if we want to ask fluxion for the right resources, and the pods vary slightly, we might need to customize the request for the pods that we see. For example, say the group has two pods that request 32 cores and 1 GPU each (some ML thing), and then 2 more pods that just need 16 cores and no GPU (some service). Up until this point we have used a "representative pod" and multiplied it by the count (maybe all 4 pods require 32 cores), and in practice that is the most likely use case (homogeneous clusters). But we could very easily have this "potpourri of pods" that need different resources for the applications running within. That use case is more of an orchestrated idea - maybe 2 pods running an application and 2 running a service - while the homogeneous use case is more for different ranks of an MPI application.
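
For concreteness, here is one hypothetical way to write that heterogeneous group with a labeled slot per pod type (whether fluxion's jobspec reader actually accepts multiple slots like this is part of what I'd like to confirm; the commands are made up):

version: 1
resources:
  - type: slot
    label: ml
    count: 2
    with:
      - type: core
        count: 32
      - type: gpu
        count: 1
  - type: slot
    label: service
    count: 2
    with:
      - type: core
        count: 16
tasks:
  - command: ["ml-app"]
    slot: ml
    count:
      per_slot: 1
  - command: ["service-app"]
    slot: service
    count:
      per_slot: 1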

At least that is my understanding - I think @milroy probably can better comment!

@vsoch (Member Author) commented Jul 2, 2024

Hm, I have to say I don't remember that well how to define jobspecs... I was much better at this before, lol.

@cmisale I forget between almost every period of working on them and have to remind myself what the heck a "slot" is... 😆

This is me reading in my vast library of books about flux trying to answer that question...

[image attachment]

I started reading in my late 30s, and I'm still not sure what "total" vs "per_slot" is, but likely someone will figure it out eventually. I humbly request it on my tombstone - "Here lies Vanessa, she wanted a "per_slot" task for her test run of lammps."
