Add support for GPU instances on AWS #27

Open
willprice opened this issue Dec 11, 2020 · 5 comments
Labels: AWS, documentation

@willprice

Currently, launching GPU instances on AWS does not provision the VMs with the drivers needed to use the GPUs. It would be good to have some documentation for people who wish to use CitC in this way. I plan to work on this today and will hopefully submit some PRs with instructions.

@milliams added the AWS and documentation labels on Dec 11, 2020
@colinsauze

For my first attempt at a GPU image build, I added the following to compute_image_extra.sh. The image has just built, but nvidia-smi is complaining about the drivers.

    sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
    sudo dnf clean all
    sudo dnf -y module install nvidia-driver:latest-dkms
    sudo dnf -y install cuda
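When nvidia-smi reports a driver problem like this, a few standard commands can help narrow down whether the nvidia kernel module was ever built and loaded. This is a rough checklist to run on an instance launched from the new image, not something CitC itself provides; it assumes a RHEL 8-compatible system with the DKMS driver packages installed as above.

    # Is the nvidia kernel module loaded?
    lsmod | grep nvidia

    # Has DKMS built the nvidia module, and for which kernel?
    dkms status

    # Are the development headers for the running kernel installed?
    # (DKMS needs them to compile the module.)
    rpm -q "kernel-devel-$(uname -r)"

    # Once the module is loaded, this should list the GPU(s).
    nvidia-smi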

@willprice (author)

As noted by @colinsauze, it is also necessary to increase the size of the image. This can be done by adding the following

    launch_block_device_mappings {
        device_name = "/dev/sda1"
        volume_size = 40
    }

to the end of the source "amazon-ebs" "aws" block in /etc/citc/packer/all.pkr.hcl. Since volume_size is given in GiB, this makes the root volume 40 GiB.
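The /dev/sda1 value assumes that this is the root device name of the base AMI the amazon-ebs source starts from. If the name does not match, EC2 attaches an additional empty volume instead of enlarging the root one. One way to check is with the AWS CLI; the AMI ID below is a placeholder for whichever base AMI the Packer build actually uses.

    # Print the root device name of the base AMI (replace the placeholder ID).
    aws ec2 describe-images \
        --image-ids ami-0123456789abcdef0 \
        --query 'Images[0].RootDeviceName' \
        --output text

If this prints something other than /dev/sda1 (for example /dev/xvda), use that value for device_name instead.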

@willprice (author)

It is also necessary to install kernel-devel before installing the NVIDIA drivers so that the DKMS module can be built; without it, the build fails.
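Putting these steps together in the right order, the additions to compute_image_extra.sh might look like the sketch below. It assumes the same RHEL 8-compatible base image as the commands earlier in this thread. Pinning kernel-devel to the running kernel with $(uname -r) follows the NVIDIA CUDA installation guide for RHEL 8 and is an assumption about what the CitC image build needs; a plain kernel-devel install, as described above, may be sufficient.

    # kernel-devel must come first: the DKMS driver package compiles the
    # nvidia kernel module against these headers when it is installed.
    sudo dnf -y install "kernel-devel-$(uname -r)"

    # NVIDIA's CUDA repository for RHEL 8, then the driver and toolkit.
    sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
    sudo dnf clean all
    sudo dnf -y module install nvidia-driver:latest-dkms
    sudo dnf -y install cuda

The larger root volume described above is still needed alongside these changes.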

@willprice (author)

Once clusterinthecloud/docs#17 is merged, this can be closed.

3 participants