[GSoC] Project 4: Hyperparameter Optimization API in Katib for LLMs #2339
Motivation
The rapid advancements and growing popularity of Large Language Models (LLMs) have driven an increased need for effective LLMOps in Kubernetes environments. To address this, we developed a train API within the Training Python SDK, simplifying the process of fine-tuning LLMs using distributed PyTorchJob workers. However, hyperparameter optimization remains a crucial yet labor-intensive task for enhancing model performance.
Goal
This project aims to develop a high-level API for tuning hyperparameters of LLMs that automates the process of hyperparameter optimization in Kubernetes.
By leveraging Katib and the Training Operator, this API lets users either define a custom objective function or import pretrained models and datasets from external platforms such as HuggingFace and Amazon S3. Users then specify the objective metric, optimization algorithm, optimization goal, resource configuration, and so on. From there, the API automates the creation and execution of the Experiment and its Trials to find the best hyperparameters. Because it abstracts away the complexity of the underlying Kubernetes infrastructure, data scientists can optimize hyperparameters efficiently and effectively.
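The loop that this API automates can be illustrated with a minimal, self-contained sketch. It is NOT Katib's API. In Katib, each Trial runs as a Kubernetes workload (for LLMs, a PyTorchJob), and the search is driven by the chosen algorithm service. Here, each Trial is simply an in-process call to an objective function, and plain random search stands in for the optimization algorithm. The names `tune_sketch` and `sample`, the tuple-based search-space format, and the example parameters (`learning_rate`, `lora_r`, `optimizer`) are all hypothetical and chosen for illustration only.

```python
import random
from typing import Callable, Dict, List, Tuple, Union

Value = Union[float, int, str]
# Hypothetical search-space format (not Katib's):
#   name -> ("double", low, high) | ("int", low, high) | ("categorical", [choices])
SearchSpace = Dict[str, Tuple]


def sample(space: SearchSpace, rng: random.Random) -> Dict[str, Value]:
    """Draw one hyperparameter assignment (one "Trial") by random search."""
    params: Dict[str, Value] = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "double":
            params[name] = rng.uniform(spec[1], spec[2])
        elif kind == "int":
            params[name] = rng.randint(spec[1], spec[2])  # inclusive bounds
        elif kind == "categorical":
            params[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown parameter type: {kind}")
    return params


def tune_sketch(
    objective: Callable[[Dict[str, Value]], Dict[str, float]],
    space: SearchSpace,
    metric_name: str,
    goal: str = "maximize",
    max_trial_count: int = 10,
    seed: int = 0,
) -> Tuple[Tuple[Dict[str, Value], float], List[Tuple[Dict[str, Value], float]]]:
    """Run an "Experiment": sample Trials, record the objective metric, return the best."""
    if goal not in ("maximize", "minimize"):
        raise ValueError("goal must be 'maximize' or 'minimize'")
    rng = random.Random(seed)
    trials: List[Tuple[Dict[str, Value], float]] = []
    for _ in range(max_trial_count):
        params = sample(space, rng)
        metrics = objective(params)  # in Katib this would be a training job reporting metrics
        trials.append((params, metrics[metric_name]))
    pick = max if goal == "maximize" else min
    best = pick(trials, key=lambda trial: trial[1])
    return best, trials


if __name__ == "__main__":
    # Toy objective standing in for "fine-tune the model and report accuracy".
    def toy_objective(p: Dict[str, Value]) -> Dict[str, float]:
        return {"accuracy": 1.0 - abs(p["learning_rate"] - 0.01) - 0.001 * p["lora_r"]}

    space = {
        "learning_rate": ("double", 1e-5, 5e-2),
        "lora_r": ("int", 4, 32),
        "optimizer": ("categorical", ["adamw", "sgd"]),
    }
    best, _ = tune_sketch(toy_objective, space, "accuracy", max_trial_count=20, seed=42)
    print("best params:", best[0], "accuracy:", best[1])
```

The design mirrors the separation described above. The user supplies only the objective, the search space, the metric name, the goal, and a trial budget. The loop takes care of creating Trials and selecting the best one.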
What I Did in the GSoC Project & Ongoing Work
- `tune` API for LLM hyperparameters optimization #2393
- `tune` API #2423
- `tune` api with LLM hyperparameter optimization #2420
- `huggingface_hub` Version in the storage initializer to fix ImportError training-operator#2180
- `train_args` and `lora_config` training-operator#2181
- `train` API training-operator#2187
What I Learned from This Project
This was my first experience with open source, and as a beginner with Docker and Kubernetes, I gained significant knowledge throughout this project. Beyond understanding containers, Kubernetes, API development, and CI/CD pipelines, I learned valuable lessons that will benefit my future studies and work:
Think from the User's Perspective: One key lesson was the importance of considering the user’s needs. Discussing API design with my mentors taught me to focus on what functionalities users need and how they prefer to use them. Listening to users’ feedback is crucial for effective product design.
Don't Fear Bugs: I used to be flustered by bugs and unsure how to address them. My mentor guided me through the debugging process, showing me how to understand and trace bugs. The key is to approach debugging methodically and think through the problem.
Communication is Important: Communication is essential to collaboration, especially in open-source projects, which use many channels such as GitHub issues and PRs, Slack, and community meetings. I’m grateful to my mentor for discussing my challenges during our weekly meetings and providing invaluable guidance.
Every Contribution Counts: Initially, I thought contributing to open source was complex. I learned that every contribution, no matter how small, is valuable and appreciated. For example, contributing to documentation is crucial, especially for newcomers.
In The End
Thank you to Google for this invaluable opportunity. I’m deeply grateful to everyone who supported me throughout this project: @andreyvelich @johnugeorge @deepanker13 @tenzen-y @nsingl00 @Electronic-Waste. Your suggestions, advice, and help were essential to completing my work.
I also want to say a huge thank-you to my mentor @andreyvelich. I'm impressed by your deep knowledge of the project and the industry, and by your willingness to help. Your encouragement during our first meeting, when you shared that you also found Kubernetes challenging at first, gave me great confidence. I appreciate the time and effort you invested in guiding me through this project, from the overall design of the API to the details of code formatting. I’ve learned a lot from your guidance.
I believe that anyone contributing to open source in their spare time has a passion for coding, and I’m glad to have worked with such a dedicated group. I will continue contributing and hope to support other beginners in the future.