
Modifying stack soft limit #17

Open
Klaas- opened this issue Mar 23, 2020 · 3 comments

Comments

@Klaas-

Klaas- commented Mar 23, 2020

Hi,
I've noticed that CycleCloud recently changed its behavior for stack size limits.
It now adds this:

$ cat /etc/security/limits.conf |grep stack
#        - stack - max stack size (KB)
*               hard    stack           unlimited
*               soft    stack           unlimited

However, I am not sure where this comes from: I can't find it in this repo, and as far as I can tell it is not from the CentOS HPC image (https://github.com/openlogic/AzureBuildCentOS).
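To narrow down where such an entry comes from, it can help to list every active stack entry that pam_limits reads, both in /etc/security/limits.conf and in the drop-in files under /etc/security/limits.d/. This is a minimal sketch, not a CycleCloud tool: the function name find_stack_limits and its optional root-directory argument (which only exists so it can be pointed at a copy of the files) are hypothetical.

```shell
#!/bin/bash
# Sketch: print every non-commented "stack" entry in a pam_limits config tree.
# find_stack_limits and its optional root argument are hypothetical helpers;
# the root argument defaults to the live /etc/security directory.
find_stack_limits() {
  local root="${1:-/etc/security}"
  local files=("$root/limits.conf")
  local f

  # Drop-in files (e.g. limits.d/cyclecloud.conf) are read in addition to limits.conf.
  for f in "$root"/limits.d/*.conf; do
    [ -e "$f" ] && files+=("$f")
  done

  for f in "${files[@]}"; do
    [ -r "$f" ] || continue
    # Match "<domain> <soft|hard|-> stack ..." while skipping commented-out lines.
    # Each match is printed as file:line:entry.
    grep -Hn -E '^[[:space:]]*[^#[:space:]]+[[:space:]]+(soft|hard|-)[[:space:]]+stack([[:space:]]|$)' "$f"
  done
  return 0
}
```

Calling find_stack_limits with no argument inspects the live /etc/security tree; the file:line prefix on each match shows whether the setting lives in the main file or in a drop-in, which is a first hint as to what wrote it.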

In any case, if anyone else runs into this: Abaqus, at least, does not accept unlimited as a soft stack limit.

Greetings
Klaas

@anhoward
Contributor

Hi Klaas,
I just double-checked and I can't find anywhere in our code that's making that change. I also checked on a vanilla Slurm cluster deployed with CycleCloud 7.9.3 and don't see the stack size change that you're seeing. When we do modify limits, we put those changes in /etc/security/limits.d/cyclecloud.conf, but the only modifications there are increasing the number of open files. Nothing to do with stack size. Could another package you're installing either via a cluster-init project or via a custom image be adding that?

Thanks,
-Andy

@anhoward
Contributor

Hi Klaas,
I just realized after my last comment that this is the PBSpro repo, not Slurm (which is where I've spent most of my time lately). Sure enough, I can reproduce this with a fresh CycleCloud PBSpro cluster. I'll look through our recipes more closely, but I'm not aware of any changes we made to limits recently.

When you say the behavior "recently changed", do you know which version you upgraded from? If you were previously on a version with an older PBSpro installation, it's possible that the PBSpro packages changed to raise the stack limit. The other possibility is that one of the dependency packages was updated to make this change.

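One thing you could do as a workaround would be to set the stack limit explicitly in your job script. Running ulimit -s <int> will lower the stack size below the unlimited hard limit; note that in bash a plain ulimit -s sets both the soft and hard limits, while ulimit -S -s <int> changes only the soft limit. That may get your Abaqus jobs working again.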
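A minimal sketch of that job-script workaround, assuming the job script runs under bash: it sets only the soft stack limit and leaves the hard limit alone. The STACK_KB variable is hypothetical, and 8192 KB (8 MB, a common Linux default) is an assumed value rather than a documented Abaqus requirement. The actual Abaqus command line is deliberately omitted.

```shell
#!/bin/bash
# Sketch of a job-script fragment: give the solver a finite soft stack limit.
# STACK_KB is a hypothetical variable; 8192 KB (8 MB) is an assumed value.
STACK_KB="${STACK_KB:-8192}"

hard_kb=$(ulimit -H -s)
# A soft limit may not exceed the hard limit, so only set it when that is allowed.
if [ "$hard_kb" = "unlimited" ] || [ "$hard_kb" -ge "$STACK_KB" ]; then
  # -S changes only the soft limit; a plain "ulimit -s" in bash would set both.
  ulimit -S -s "$STACK_KB"
else
  echo "hard stack limit ${hard_kb} KB is below ${STACK_KB} KB; soft limit left unchanged" >&2
fi
echo "soft stack limit: $(ulimit -S -s) KB"

# ... start Abaqus here; processes launched from this shell inherit the new limit.
```

Because ulimit only affects the current shell and the processes it starts, this has to run inside the job script itself, before the solver is launched.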

@Klaas-
Author

Klaas- commented Mar 25, 2020

@anhoward I have not updated CycleCloud during the last ~2 months, which is why I think this comes from some content that is downloaded on the fly.

My last cyclecloud update:
Name : cyclecloud
Version : 7.9.2
Install Date: Thu 23 Jan 2020 10:25:08 AM UTC

I know how to work around the problem; my concern is that this change seems to have happened silently. I am fairly sure my master install worked after the 7.9.2 update, and it stopped working a couple of days ago when I tried out the HB120v2 machines. At first this led me to believe the issue was related to the machine type, until I figured out that Abaqus is so stupid it can't deal with an unlimited soft stack limit....
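For anyone who would rather fix the nodes than every job script, one option is to rewrite the wildcard soft stack entry in limits.conf, for example from a cluster-init script. This is a sketch under stated assumptions: the function name set_soft_stack_kb is hypothetical, it only rewrites a line of the exact form "* soft stack unlimited", it leaves the hard limit untouched, and if whatever adds the entry runs again later it may restore it.

```shell
#!/bin/bash
# Sketch: replace an "unlimited" wildcard soft stack entry in a limits.conf-style
# file with a finite value in KB. set_soft_stack_kb is a hypothetical helper;
# editing /etc/security/limits.conf requires root.
set_soft_stack_kb() {
  local file="$1" kb="$2" tmp
  tmp=$(mktemp) || return 1
  # Only the "*  soft  stack  unlimited" line is rewritten; the hard entry is kept.
  sed -E "s/^([[:space:]]*\*[[:space:]]+soft[[:space:]]+stack[[:space:]]+)unlimited[[:space:]]*\$/\1${kb}/" \
    "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back over the original so its ownership and permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, set_soft_stack_kb /etc/security/limits.conf 8192 (the value is again an assumption). pam_limits applies these files when a new PAM session starts, such as an SSH login, so processes that are already running keep their current limits.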

In general, I would be interested in where the modification comes from. I could not find it in the installation here or in the OS image, which would be the first places I'd look. Are your 'common' Chef modules also hosted on GitHub?

Greetings
Klaas


2 participants