Ch9 - training deep neural network - how to attach GPU? #165
When you launch the notebook instance, make sure to specify a machine type
with a GPU. You can also stop the instance, add a GPU, and then restart it.
thanks,
Lak
|
Still no quicker, I'm afraid: one epoch is taking about 20 minutes. My notebook instance now comes with a GPU, and when starting the notebook I selected Tensorflow 2 (Local) as my kernel; previously it was Python (local). I can't see any other options for specifying that my GPU should be used... |
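One quick way to tell whether the kernel can actually use the attached GPU is to ask TensorFlow which devices it sees. The following is only a minimal sketch, not something taken from the book's notebook; the helper name `visible_gpus` is hypothetical, and it assumes the kernel in question is the one running this cell.

```python
def visible_gpus():
    """Return the GPU device names TensorFlow can see.

    Returns None if TensorFlow is not installed in this kernel.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices("GPU") returns an empty list when no GPU is usable.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this kernel.")
elif gpus:
    print("TensorFlow sees these GPUs:", gpus)
else:
    print("No GPU visible to TensorFlow; training will run on the CPU.")
```

If the list is empty even though the instance has a GPU, the likely suspects are the kernel choice or the driver installation rather than the training code itself.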
Also - I have a really dumb question, but it's come up before in this book so I may as well ask it now... I'd like to see the CPU/GPU usage of the VM that my notebook is running on. In other cloud platforms (e.g. Azure) you have to connect the notebook to a VM manually every time, which makes this easy to do. But in GCP, everything seems to happen in the background and it's not clear how to inspect your VM. If I go to the VM Instances API in the console, it looks like I don't have any. Please could you advise? Sorry if this is a stupid question, but I'm guessing it's not just me who is confused! |
(1) If you didn't change the line that says DEVELOP=True in the notebook,
each epoch should take less than a minute. By any chance, are the notebook's
compute and the bucket in different regions?
(2) You can look at notebook GPU/CPU usage etc. by clicking on the notebook
name (in the Vertex Workbench area) and selecting the "Monitoring" tab.
thanks
Lak
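The region question in (1) can be checked by comparing the notebook's zone with the bucket's location. The sketch below is illustrative only: the helper `same_region` and the example zone and location strings are hypothetical, and it assumes you have already looked up both values yourself (for example, the bucket's "Location constraint" as printed by `gsutil ls -L -b gs://BUCKET`).

```python
def same_region(notebook_zone: str, bucket_location: str) -> bool:
    """Return True if a Compute Engine zone lies in the bucket's region.

    A zone such as "us-central1-a" belongs to the region "us-central1".
    Multi-region bucket locations such as "US" will not match here and
    need to be judged separately.
    """
    region = notebook_zone.lower().rsplit("-", 1)[0]
    return region == bucket_location.lower()


# Hypothetical example values, not taken from the thread.
print(same_region("us-central1-a", "US-CENTRAL1"))   # True
print(same_region("europe-west2-b", "US-CENTRAL1"))  # False
```

When the two differ, every training batch is read across regions, which can be slow enough to dominate the epoch time.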
|
Thanks for these quick responses, by the way... I'll be sure to mention them in the glowing Amazon review I give of the book once I'm done with it! |
Yeah, you don't need to run on the full dataset. You can just try it out
on a small sample.
In later chapters, I'll have you copy over my model that was trained over
the whole thing.
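The notebook's actual DEVELOP logic is not shown in this thread, so the following is only a hypothetical sketch of how such a flag might cap the amount of training data. The names `DEVELOP_MAX_ROWS` and `training_rows` are assumptions made for illustration.

```python
from itertools import islice

DEVELOP = True
DEVELOP_MAX_ROWS = 1000  # hypothetical cap; pick whatever trains in about a minute


def training_rows(rows, develop=DEVELOP, max_rows=DEVELOP_MAX_ROWS):
    """Yield only the first max_rows rows in develop mode, else all rows."""
    return islice(rows, max_rows) if develop else iter(rows)


all_rows = range(1_000_000)  # stand-in for the real training examples
print(sum(1 for _ in training_rows(all_rows)))                 # 1000
print(sum(1 for _ in training_rows(all_rows, develop=False)))  # 1000000
```

Leaving the flag on while working through the chapter keeps each epoch short; the full dataset is only needed for the final model.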
Re: monitoring: you are using managed notebooks rather than the
user-managed notebooks that I was using:
https://cloud.google.com/vertex-ai/docs/workbench/managed/introduction
Part of the control you give up when you ask Vertex AI to manage the
notebook lifecycle is that it runs the notebook in a tenant project, so
your ability to monitor it is limited.
Think of managed notebooks as being like Google Colab.
Lak
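Where the console offers no Monitoring tab, GPU usage can sometimes still be read from inside the notebook itself. This is a minimal sketch, assuming the `nvidia-smi` tool is on the PATH of the notebook environment (which this thread does not confirm); the helper names `parse_gpu_stats` and `gpu_stats` are hypothetical.

```python
import shutil
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse 'util, used, total' CSV lines (one per GPU) into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (field.strip() for field in line.split(","))
        stats.append({
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return stats


def gpu_stats():
    """Return per-GPU stats, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        return parse_gpu_stats(result.stdout)
    except (subprocess.CalledProcessError, OSError, ValueError):
        return None


print(gpu_stats())
```

Running this in a separate cell while training is in progress gives a rough picture: a utilization near zero suggests the model is not using the GPU at all.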
…On Fri, Feb 10, 2023 at 10:06 AM James ***@***.***> wrote:
1. Oh I changed DEVELOP=True to DEVELOP=False after successfully running
2 epochs. The flow of the Jupyter notebook is somewhat different to the
textbook chapter so I thought that was what I was supposed to do - maybe
not!
2. Unfortunately I can't see any Monitoring tab, only Logs:
[image: image]
<https://user-images.githubusercontent.com/8484188/218164225-ad6f5059-dd10-4932-8684-687c1ac75053.png>
|
I see! Thank you Lak. |
@lakshmanok - regarding my earlier issue (#164), I've ended up manually exporting the data from BQ to cloud storage using the GUI.
The rest of the notebook is working fine, but now I'm training the deep neural network it's awfully slow (I'm still on the first of the 10 epochs and it's not even half way through it after 10 minutes!).
I'm guessing that the problem is that I'm using CPUs rather than a GPU...on p.322 of the book you state "Making sure that the Vertex AI Workbench notebook that I’m working on has a GPU attached to it, I can now launch off the training job..." but if I'm not mistaken it's not covered in the textbook or notebook how to do this?
I've already set up my fully-managed notebook to enable an NVIDIA T4 GPU, but I believe that it won't be attached automatically without me doing something else.
The GC docs refer to creating a separate CustomJob to achieve this - is that what you did or is there a quicker way?