PBS jobs queued are not running #841

Closed
xpillons opened this issue Apr 27, 2022 · 3 comments · Fixed by #842 or #1052

Comments

@xpillons
Collaborator

The jobs are queued but not running. I tried to terminate the execute nodes through the portal and manually, but they did not stop.
PBS Version = 19.1.1

The CycleCloud service behaves abnormally: it does not start and stop the VMs properly.

Initially, 40 VMs were shown in the ready state in the Azure CycleCloud monitoring portal, but none of the jobs were running; they remained queued.
After the VMs were stopped manually, CycleCloud allocated new VMs to the queued jobs, and after some time the jobs started running.

The jobs completed after some time, but the VMs were not terminated automatically.

[root@scheduler bin]# source /opt/cycle/pbspro/venv/bin/activate
(venv) [root@scheduler bin]# pip list | grep pbspro
cyclecloud-pbspro   2.0.9

@xpillons added the kind/bug and customer-requirement labels on Apr 27, 2022
@xpillons
Collaborator Author

Should be fixed by Azure/cyclecloud-pbspro@9d32c0f

@xpillons
Collaborator Author

xpillons commented May 9, 2022

Reopening, as 2.0.13 introduces new issues.

@xpillons
Collaborator Author

xpillons commented Jun 27, 2022

The root cause is badly formatted JSON output from OpenPBS, which causes the autoscaler library to fail and interrupts the calculation engine.
The malformed JSON comes from an environment variable that the job inherits from the loaded modules and that is passed to the qsub command through the -V option. The problem is that such jobs can sometimes start, especially if resources are already available to run them, but they then affect any subsequent jobs for which new resources need to be provisioned.
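
One way to check for this condition is to test whether the scheduler's JSON output parses at all. The sketch below is not taken from the autoscaler; `json_is_valid` is a hypothetical helper name, it assumes `python3` is available on the scheduler node, and it assumes the JSON in question is what `qstat -f -F json` prints.

```shell
# Hypothetical helper: succeeds (exit 0) only if stdin contains valid JSON.
# Assumes python3 is installed; it is used purely as a JSON parser here.
json_is_valid() {
    python3 -c 'import json, sys; json.load(sys.stdin)' >/dev/null 2>&1
}

# Example use on the scheduler node (assumes qstat is on PATH):
#   qstat -f -F json | json_is_valid || echo "qstat produced malformed JSON"
```

If the check fails while jobs submitted with -V are queued, those jobs are the first candidates to inspect.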

Workaround: do not submit with qsub -V; instead, pass only the environment variables the job needs.
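
A minimal sketch of this workaround, using the qsub `-v` option, which takes a comma-separated list of variables (a bare name takes its value from the submitting environment). The helper name `build_var_list`, the variable names, and `job.sh` are illustrative assumptions, not part of the original report.

```shell
# Hypothetical helper: print a comma-separated list of the named variables
# that are set in the current shell, suitable for passing to `qsub -v`.
# Note: qsub only sees variables that are also exported.
build_var_list() {
    list=""
    for name in "$@"; do
        # ${VAR+x} expands to "x" only when VAR is set
        eval "is_set=\${$name+x}"
        if [ -n "$is_set" ]; then
            list="${list:+$list,}$name"
        fi
    done
    printf '%s\n' "$list"
}

# Example use (job.sh and the variable names are placeholders):
#   qsub -v "$(build_var_list PATH LD_LIBRARY_PATH)" job.sh
# instead of:
#   qsub -V job.sh
```

Listing variables explicitly keeps anything exported by loaded modules out of the job's environment unless it is named on purpose.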

Azure/cyclecloud-pbspro#43
