no import change notebooks updates #830
…ripts, doc updates
Signed-off-by: Erik Ordentlich
build
Awesome new feature! Minor comments that are mostly stylistic.
Great suggestions. Will incorporate in revision.
Looks good to me!
build
@eordentlich I have a question: would it be beneficial to move "no-code-change" into its own folder for better organization and future reference? Since this is an important feature, a centralized location or a pointer to it could be helpful. Do you have any thoughts on how cuDF or cuML direct users to "no-code-change"?
cuDF's no-code-change mode for pandas is called cudf-pandas (https://rapids.ai/cudf-pandas/), and the code lives in cudf/pandas. The analogue in our case could be spark_rapids_ml/pyspark, which seems a little strange to me. Open to suggestions.
Agree. Also, it seems nontrivial to maintain a repository or a folder. We can revisit this in the future when needed.
--bucket ${GCS_BUCKET} \
--enable-component-gateway \
--subnet=default \
--no-shielded-secure-boot
```
**Note**: the `properties` settings are for demonstration purposes only. Additional tuning may be required for optimal performance.
Somehow the properties got left out of the shell block here.
Good catch. Looks like indenting to the `--` fixes it and preserves no-spaces between the entries on copy.
notebooks/aws-emr/README.md
Outdated
- In the [AWS EMR console](https://console.aws.amazon.com/emr/), click "Clusters", you can find the cluster id of the created cluster. Wait until all the instances have the Status turned to "Running".
- In the [AWS EMR console](https://console.aws.amazon.com/emr/), click "Workspace(Notebooks)", then create a workspace. Wait until the status becomes ready and a JupyterLab webpage will pop up.
- In the [AWS EMR console](https://console.aws.amazon.com/emr/), click "Clusters", you can find the cluster id of the created cluster. Wait until the cluster has the "Waiting" status.
- To use notebooks on EMR you will need an EMR Studio and an associated Workspace. If you don't already have these, in the [AWS EMR console](https://console.aws.amazon.com/emr/), on the left, in the "EMR Studio" section, click the respective "Studio" and "Workspace (Notebooks)" links and follow instructions. Please check EMR documentation for further instructions. Note that the Studio VPC should match the VPC of the subnet used for the cluster. Select "\*Default\*" for all security group prompts and drop downs.
I believe the user must set up an internet gateway for the VPC along with a routing table. Do we want to make those steps explicit?
I will clarify that the subnet should have internet access. We actually need it to install things in the init scripts, even for the EMR cluster VPC. I am also adding a few other clarifications.
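For readers who want to check the internet-access requirement before launching a cluster, the following is a minimal sketch and not part of this PR. It assumes the route tables for the subnet's VPC have already been fetched in the shape returned by the EC2 `DescribeRouteTables` API (for example via boto3's `describe_route_tables`). The helper name `has_default_internet_route`, the sample IDs, and the sample data are all hypothetical.

```python
from typing import Any, Dict, List


def has_default_internet_route(
    route_tables: List[Dict[str, Any]], subnet_id: str
) -> bool:
    """Return True if the subnet's route table has an active 0.0.0.0/0 route
    through an internet gateway (igw-*) or a NAT gateway (nat-*).

    Assumption: route_tables is the "RouteTables" list for the subnet's VPC.
    """

    def associated(rt: Dict[str, Any], key: str, value: Any) -> bool:
        return any(a.get(key) == value for a in rt.get("Associations", []))

    # A subnet without an explicit association uses the VPC's main route table.
    explicit = [rt for rt in route_tables if associated(rt, "SubnetId", subnet_id)]
    main = [rt for rt in route_tables if associated(rt, "Main", True)]

    for rt in explicit or main:
        for route in rt.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            if route.get("State", "active") != "active":
                continue  # e.g. a "blackhole" route whose target was deleted
            gateway = route.get("GatewayId") or ""
            nat = route.get("NatGatewayId") or ""
            if gateway.startswith("igw-") or nat.startswith("nat-"):
                return True
    return False


# Hypothetical sample data, shaped like a DescribeRouteTables response.
LOCAL = {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"}
SAMPLE_ROUTE_TABLES = [
    {"Associations": [{"Main": True}], "Routes": [LOCAL]},
    {
        "Associations": [{"SubnetId": "subnet-public", "Main": False}],
        "Routes": [
            LOCAL,
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc", "State": "active"},
        ],
    },
    {
        "Associations": [{"SubnetId": "subnet-private", "Main": False}],
        "Routes": [
            LOCAL,
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc", "State": "blackhole"},
        ],
    },
]

if __name__ == "__main__":
    for subnet in ("subnet-public", "subnet-private", "subnet-unassociated"):
        print(subnet, has_default_internet_route(SAMPLE_ROUTE_TABLES, subnet))
```

This only inspects routing. Security groups and network ACLs can still block outbound traffic, so a passing check is a necessary condition for the init scripts to download packages, not a guarantee.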
build
build
LGTM