Post refactor changes needed #71

Open · 3 of 6 tasks
vsoch opened this issue Apr 9, 2024 · 6 comments · Fixed by #69
Comments

@vsoch (Member) commented Apr 9, 2024

  • Update fluence to go 1.20 or 1.21: We are going to hit issues using fluence (go 1.19) with other integrations like rainbow (go 1.20) and on our systems (go 1.20); after "refactor: testing idea to wrap coscheduling" #69 we should consider updating.
  • Take in the entire group to calculate resources: right now we use one representative pod, and we should be taking in the entire group instead.
  • containment / JGF format issues: I think the Name needs to reference the index of the resource relative to others of the same type, and then paths -> containment needs to be the path to it plus that name (see the sketch after this list).
  • Carefully review resources generation for Jobspec (I have only looked at this superficially)
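
For reference, here is a rough sketch of the vertex shape I have in mind (reconstructed from memory of the flux-sched JGF conventions, so the exact fields should be verified against fluxion; the values are illustrative):

{
  "id": "42",
  "metadata": {
    "type": "core",
    "basename": "core",
    "name": "core1",
    "id": 1,
    "uniq_id": 42,
    "size": 1,
    "paths": {
      "containment": "/cluster0/subnet0/node3/core1"
    }
  }
}

The point is that "name" should be the basename plus the per-type index (core1), not the global vertex id (42), and the containment path should end with that same name.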
vsoch changed the title from "Update fluence to go 1.20 or 1.21" to "Post refactor changes needed" on May 14, 2024
vsoch referenced this issue May 22, 2024
This set of changes includes the following:

1. Renaming short variable names to be longer and more understandable.
2. Not using the Status.ScheduleStartTime for the pod start time, but instead
adding a new field. This previous field was there for a different purpose.
3. Creating named identifiers for resource types that can be shared in the
jgf module along with others that use the same relations / vertex types.
4. Removing comments that are not necessary.
5. Changing JGF types that warrant it from int to int64.
6. Fixing spelling mistakes, etc.
7. Removing the need to write the jobspec to a temporary file (we just need the string).

The JGF and utils modules need some additional review - specifically, I am worried
that paths->containment is not set, and that sometimes the name reflects the index
in the overall graph (global) while other times it reflects the index within the
resource type. I think we likely want the latter for the inner name, but I am not
sure fluxion actually uses it (internally) in practice. I am pushing these changes
to assess testing, etc., and will update the PR as needed. There could also have
been upstream changes since the PR was opened that warrant additional fixes.

Signed-off-by: vsoch <[email protected]>
@milroy (Member) commented May 22, 2024

I agree with these items. Here's more detail on the containment / JGF format issues from PR #69: #69 (comment)

vsoch reopened this May 24, 2024
@vsoch (Member Author) commented May 28, 2024

Adding a note for myself:

Here:

// Here we build the subnet according to the topology.kubernetes.io/zone label
subnetName := node.Labels["topology.kubernetes.io/zone"]
subnet := fluxgraph.MakeSubnet(sdnCount, subnetName)
sdnCount = sdnCount + 1
fluxgraph.MakeEdge(cluster, subnet, jgf.ContainsRelation)
fluxgraph.MakeEdge(subnet, cluster, jgf.InRelation)
This is where we create a subnet, and it happens within a listing of nodes. My understanding of subnet zones (which I need to verify with a node listing) is that multiple nodes can live under the same subnet. So for this part of the code, we likely need a lookup of subnets by name, and we should only create a new subnet if we haven't created it already. Or we could create them all in a separate outer loop first - the two approaches would be equivalent. I don't think we do anything with this in the current implementation (so likely no issue), but if we had multiple subnet zones I think we might get duplicate subnet vertices in the graph with different indices, or something like that. Wanted to put a note here so I don't forget (I have a lot in my head at the moment). A sketch of the lookup idea is below.
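
Something like this, maybe (a hypothetical sketch only - subnetVertex stands in for whatever MakeSubnet actually returns, and the calls are reused from the snippet above):

// Cache subnet vertices by zone label so that multiple nodes in the same
// zone share one subnet vertex instead of creating duplicates.
// subnetVertex is a placeholder for MakeSubnet's real return type.
subnets := map[string]subnetVertex{}

// Inside the loop over nodes:
subnetName := node.Labels["topology.kubernetes.io/zone"]
subnet, seen := subnets[subnetName]
if !seen {
    subnet = fluxgraph.MakeSubnet(sdnCount, subnetName)
    sdnCount = sdnCount + 1
    fluxgraph.MakeEdge(cluster, subnet, jgf.ContainsRelation)
    fluxgraph.MakeEdge(subnet, cluster, jgf.InRelation)
    subnets[subnetName] = subnet
}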

OK going to try this:

[image attachment]

@vsoch (Member Author) commented Jun 29, 2024

@cmisale @milroy I'm working on the second bullet above, and I wanted to have a discussion about the format we want. We currently do something like this (please correct me if I'm wrong - I get this confused with jobspec nextgen):

version: 1
resources:
  - type: slot
    label: default
    count: 2
    with:
      - type: core
        count: 16
tasks:
  - command: ["app"]
    slot: default
    count:
      per_slot: 1

With memory / GPU added if it's defined for the pod. That is done by parsing one representative container and then having the slot -> count be the number of nodes (I think). If we parse each container individually (which assumes the containers might differ), what should that look like? The only thing that made sense to me was to move the count down to the node, so we can say how many of each node type is requested:

version: 1
resources:
  - type: slot
    label: default
    with:
      - type: node
        count: 1
        with:
          - type: core
            count: 4
          - type: gpu
            count: 1
      - type: node
        count: 4
        with:
          - type: core
            count: 16
tasks:
  - command: ["app"]
    slot: default
    count:
      per_slot: 1

But I remember there was a reason for having the slot directly above the core (and not including the nodes), so I think that might be wrong. That said, I don't know how to enforce a design here with nodes of different types, because the approach that asks for "this many cores across whatever resources you need" doesn't capture multiple (different) containers well.

If possible, let's have some discussion on the above! I have the next PR well underway but I paused here because I wasn't sure.

@cmisale (Collaborator) commented Jul 2, 2024

Hm, I have to say I don't remember that well how to define jobspecs... I was much better at this before, lol.
That said, I'm not convinced we need to count all pods in a group. Do pods in a group belong to the same controller id? If so, they all have the same requirements, and we can just ask for n slots, where n is the number of pods.
This assumes I am understanding the question correctly, which might not be the case :D

@vsoch (Member Author) commented Jul 2, 2024

I think if we want to ask fluxion for the right resources, and the pods vary slightly, we might need to customize the request for the pods that we see. For example, say the group has two pods that request 32 cores and 1 GPU each (some ML thing), and then 2 more pods that just need 16 cores and no GPU (some service). Up until this point we have used a "representative pod" and multiplied it by the count (maybe all 4 pods require 32 cores), and in practice that is the most likely use case (homogeneous clusters). But we could very easily have this "potpourri of pods" that need different resources for the applications running within. That use case is more of an orchestrated idea - maybe 2 pods running an application and 2 running a service - while the homogeneous use case is more for different ranks of an MPI application.
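
For concreteness, here is one hypothetical way to write that heterogeneous group with a labeled slot per pod type (whether fluxion's jobspec reader actually accepts multiple slots like this is part of what I'd like to confirm; the commands are made up):

version: 1
resources:
  - type: slot
    label: ml
    count: 2
    with:
      - type: core
        count: 32
      - type: gpu
        count: 1
  - type: slot
    label: service
    count: 2
    with:
      - type: core
        count: 16
tasks:
  - command: ["ml-app"]
    slot: ml
    count:
      per_slot: 1
  - command: ["service-app"]
    slot: service
    count:
      per_slot: 1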

At least that is my understanding - I think @milroy probably can better comment!

@vsoch (Member Author) commented Jul 2, 2024

Hm, I have to say I don't remember that well how to define jobspecs... I was much better at this before, lol.

@cmisale I forget between almost every period of working on them and have to remind myself what the heck a "slot" is... 😆

This is me reading in my vast library of books about flux trying to answer that question...

[image attachment]

I started reading in my late 30s, and I'm still not sure what "total" vs "per_slot" is, but likely someone will figure it out eventually. I humbly request it on my tombstone - "Here lies Vanessa, she wanted a "per_slot" task for her test run of lammps."
