Fix mountPath to use /tmp instead of /data #1584

Open · wants to merge 2 commits into main

Conversation

alvarobartt (Contributor)

Description

This PR updates the mountPath so that it points to /tmp instead of /data, since the HF_HOME environment variable in the TGI DLCs is set to /tmp rather than the default /data used in the upstream TGI image (see the Dockerfile). That difference has always existed in the TGI DLC, but when the container URI, i.e. the image in the Kubernetes manifests, was switched from the default TGI image (hosted on GitHub) to the TGI DLC in GoogleCloudPlatform/ai-on-gke#816, the mountPath was not updated and still pointed to /data, which was causing issues when running the examples.
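
For illustration, the relevant part of each manifest ends up looking roughly like this (a minimal sketch, not an exact excerpt: the volume names and the emptyDir backing are assumptions based on the samples, and only the mountPath line is what this PR changes):

    containers:
      - name: llm
        image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311
        volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /tmp   # was /data; HF_HOME inside the TGI DLC points to /tmp
            name: data
    volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      - name: data
        emptyDir: {}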

This has already been fixed within this repository in this PR, and also within GoogleCloudPlatform/ai-on-gke#852.

cc @annapendleton @raushan2016

Closes #1581

Tasks

  • The contributing guide has been read and followed.
  • The samples added / modified have been fully tested.
  • Workflow files have been added / modified, if applicable.
  • Region tags have been properly added, if new samples.
  • Editable variables have been used, where applicable.
  • All dependencies are set to up-to-date versions, as applicable.
  • Merge this pull request for me once it is approved.

@code-review-assist (bot) left a comment

Hi team,

I'm currently reviewing this pull request and will provide feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly understand the changes:

This pull request, authored by alvarobartt, addresses an issue where the mountPath in several Kubernetes YAML files was incorrectly set to /data instead of /tmp. The issue stems from a mismatch between the upstream TGI image and the TGI Deep Learning Container (DLC) image, in which the HF_HOME environment variable points to /tmp.

Intent: To correct the mountPath to /tmp for consistency with the TGI DLCs and resolve issues encountered when using the TGI DLCs in Kubernetes manifests.

Changes:

The pull request modifies four YAML files, updating the mountPath from /data to /tmp within the volumeMounts section:

  • ai-ml/llm-multiple-gpus/falcon-40b/text-generation-inference.yaml (line 54)
  • ai-ml/llm-multiple-gpus/llama2-70b/text-generation-inference.yaml (line 59)
  • ai-ml/llm-multiple-gpus/llama3-70b/text-generation-inference.yaml (line 61)
  • ai-ml/llm-multiple-gpus/mixtral-8x7b/text-generation-inference.yaml (line 59)

Activity: The PR description mentions that this fix has already been implemented in another repository and references related pull requests (GoogleCloudPlatform/ai-on-gke#816 and GoogleCloudPlatform/ai-on-gke#852).

This change ensures consistency across different environments and resolves potential runtime issues.


A little haiku to lighten the mood:

/data path is wrong,
/tmp now holds the secrets,
Code flows smoothly on.

@code-review-assist (bot) left a comment

This PR updates the mountPath for several LLM deployments to use /tmp instead of /data, aligning with the HF_HOME environment variable setting in the TGI DLC images. This change is necessary to avoid issues when using TGI DLC images as the container URI in Kubernetes manifests. The PR description clearly explains the problem, its cause, and the solution. The references to related PRs provide helpful context. Overall, the changes are well-justified and effectively address the identified issue.


@code-review-assist (bot) left a comment

This PR makes a necessary change to ensure compatibility between the Kubernetes manifests and the TGI DLC image. The change is well-explained and the provided context is helpful. A minor suggestion would be to standardize the comments across all affected files for better consistency.

raushan2016 (Member) · Jan 8, 2025

There's still some issue in this sample. It fails with:

"Traceback (most recent call last):
  File ""/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py"", line 1269, in warmup
    _, batch, _ = self.generate_token(batch)
  File ""/opt/conda/lib/python3.11/contextlib.py"", line 81, in inner
    return func(*args, **kwds)
  File ""/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py"", line 1730, in generate_token
    prefill_logprobs_tensor = torch.log_softmax(out, -1)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 514.00 MiB. GPU 0 has a total capacity of 21.96 GiB of which 393.12 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 20.87 GiB is allocated by PyTorch, and 382.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"

[screenshot attached]
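
As an aside, if anyone wants to try the mitigation the error message itself suggests (PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True), it would be set via the container env in the manifest; this is a sketch only, and whether it actually avoids this particular OOM is untested:

    containers:
      - name: llm
        env:
          - name: PYTORCH_CUDA_ALLOC_CONF
            value: expandable_segments:True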

raushan2016 (Member)

I even tried with g2-standard-48 and it still keeps on crashing. Can you please run some tests to validate?

raushan2016 (Member)

For all logs, feel free to request access. But the error above should be enough to find the issue and diff between the TGI default and TGI DLC images, as the only difference is the image and mountPath:
https://docs.google.com/spreadsheets/d/1hKZP9X2ueP-Zvnb9zIXMLk6LeGXvfr-mGR6z-NGfA3s/edit?gid=1556804789#gid=1556804789

raushan2016 (Member) · Jan 9, 2025

It worked with a2-highgpu-2g, which has 40 GB (A100) of GPU memory. Which means there is something not right with the DLC image, and it breaks running LLAMA3-70B on L4 GPUs.
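
In manifest terms, that workaround amounts to scheduling onto A100 nodes instead of L4, e.g. something along these lines (a sketch only; the exact nodeSelector value and GPU count used by the llama3-70b sample are assumptions here):

    nodeSelector:
      cloud.google.com/gke-accelerator: nvidia-tesla-a100
    containers:
      - name: llm
        resources:
          limits:
            nvidia.com/gpu: "2"   # a2-highgpu-2g exposes 2 x A100 40GB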

alvarobartt (Contributor, Author)

Thanks for the detailed report @raushan2016, let me run some tests on our end to investigate and I'll ping you as soon as those are completed!

alvarobartt (Contributor, Author)

Sure, sounds fair. I'll try to investigate the issue today! Thanks for your time!

raushan2016 (Member)

@alvarobartt Somehow things are working now. Maybe the issue was fixed in the image, since the image is pointing to latest?

alvarobartt (Contributor, Author)

Thanks for the confirmation, but yes, that is odd indeed, because AFAIK us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311 was released some time ago (not sure if they rolled an update over that image, but it doesn't look like it, as per https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face).

raushan2016 (Member)

Sorry, for LLAMA3 this image works: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-1.ubuntu2204.py310

But the current image, us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311, doesn't work.

So it's something related to the changed TGI or CUDA version. It would really be helpful if you can look into this.
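
In the manifest that is just a matter of pinning the image field to the older DLC tag, e.g. (a sketch; it assumes the container is named llm as in the samples, and everything else stays the same):

    containers:
      - name: llm
        image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-1.ubuntu2204.py310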

raushan2016 (Member)

Here is the PR with the right images, which are working: #1591

@@ -51,7 +51,10 @@ spec:
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /data
raushan2016 (Member)

@alvarobartt The Falcon model has this error. Tried a couple of different versions > 2; all of them failed.

2025-01-16T23:04:59.120688Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 714, in get_model
    return FlashRWSharded(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_rw.py", line 77, in __init__
    model = FlashRWForCausalLM(config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 659, in __init__
    self.transformer = FlashRWModel(config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 593, in __init__
    [
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 594, in <listcomp>
    FlashRWLargeLayer(layer_id, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 525, in __init__
    self.ln_layer = FlashRWLayerNorm(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 484, in __init__
    self.num_ln = config.num_ln_in_parallel_attn
  File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 264, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'RWConfig' object has no attribute 'num_ln_in_parallel_attn'
2025-01-16T23:04:59.160221Z ERROR text_generation_launcher: Error when initializing model

raushan2016 (Member)

Found an issue, but it was resolved > 6 months back: huggingface/text-generation-inference#2349

Labels: None yet
Projects: None yet
2 participants