Evaluate on Jamba-1.5-Mini #69
Open
coranholmes opened this issue Sep 24, 2024 · 19 comments
@coranholmes commented Sep 24, 2024

The script gets stuck at:
[nltk_data] Downloading package punkt to /root/nltk_data...

But it works fine when I evaluate llama3-8b-instruct. I am wondering whether there is any setting I need to configure for Jamba. I have already added the model in MODEL_SELECT:

        jamba_mini)
            MODEL_PATH="${MODEL_DIR}/AI21-Jamba-1.5-mini"
            MODEL_TEMPLATE_TYPE="Jamba"
            MODEL_FRAMEWORK="hf"
            TOKENIZER_PATH=$MODEL_PATH
            ;;

template.py

'Jamba': "<|startoftext|><|bom|><|system|> You are a helpful assistant.<|eom|><|bom|><|user|> {task_template}<|eom|><|bom|><|assistant|> "
@coranholmes (Author)

I managed to make it run, but I am getting the following error:

usage: prepare.py [-h] --save_dir SAVE_DIR [--benchmark BENCHMARK] --task TASK [--subset SUBSET] --tokenizer_path TOKENIZER_PATH [--tokenizer_type TOKENIZER_TYPE] --max_seq_length MAX_SEQ_LENGTH
                  [--num_samples NUM_SAMPLES] [--random_seed RANDOM_SEED] [--model_template_type MODEL_TEMPLATE_TYPE] [--remove_newline_tab] [--chunk_idx CHUNK_IDX] [--chunk_amount CHUNK_AMOUNT]
prepare.py: error: argument --tokenizer_type: expected one argument
[NeMo W 2024-09-24 15:27:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
      warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

Predict niah_single_1
from benchmark_root/jamba_13k_v1/synthetic/4096/data/niah_single_1/validation.jsonl
to benchmark_root/jamba_13k_v1/synthetic/4096/pred/niah_single_1.jsonl
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/utils/manifest_utils.py", line 476, in read_manifest
    f = open(manifest.get(), 'r', encoding='utf-8')
FileNotFoundError: [Errno 2] No such file or directory: 'benchmark_root/jamba_13k_v1/synthetic/4096/data/niah_single_1/validation.jsonl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/afs/xxx/Codes/RULER/scripts/pred/call_api.py", line 333, in <module>
    main()
  File "/mnt/afs/xxx/Codes/RULER/scripts/pred/call_api.py", line 238, in main
    data = read_manifest(task_file)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/utils/manifest_utils.py", line 478, in read_manifest
    raise Exception(f"Manifest file could not be opened: {manifest}")
Exception: Manifest file could not be opened: <class 'nemo.utils.data_utils.DataStoreObject'>: store_path=benchmark_root/jamba_13k_v1/synthetic/4096/data/niah_single_1/validation.jsonl, local_path=benchmark_root/jamba_13k_v1/synthetic/4096/data/niah_single_1/validation.jsonl
usage: prepare.py [-h] --save_dir SAVE_DIR [--benchmark BENCHMARK] --task TASK [--subset SUBSET] --tokenizer_path TOKENIZER_PATH [--tokenizer_type TOKENIZER_TYPE] --max_seq_length MAX_SEQ_LENGTH
                  [--num_samples NUM_SAMPLES] [--random_seed RANDOM_SEED] [--model_template_type MODEL_TEMPLATE_TYPE] [--remove_newline_tab] [--chunk_idx CHUNK_IDX] [--chunk_amount CHUNK_AMOUNT]
prepare.py: error: argument --tokenizer_type: expected one argument
[NeMo W 2024-09-24 15:27:10 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
      warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

@hsiehjackson (Collaborator)

If you set TOKENIZER_PATH, then you should also set TOKENIZER_TYPE. You can set TOKENIZER_TYPE=hf. I set a default here for anyone who doesn't have TOKENIZER_PATH.
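
For example, a sketch of the config_models.sh entry from the first comment with the tokenizer type added (paths copied from that snippet; adjust MODEL_DIR to your layout):

        jamba_mini)
            MODEL_PATH="${MODEL_DIR}/AI21-Jamba-1.5-mini"
            MODEL_TEMPLATE_TYPE="Jamba"
            MODEL_FRAMEWORK="hf"
            TOKENIZER_PATH=$MODEL_PATH
            TOKENIZER_TYPE="hf"
            ;;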

@coranholmes (Author)

I removed TOKENIZER_PATH=$MODEL_PATH in config_models.sh and finally managed to evaluate Jamba with RULER. I'll share some tips here. If you are using the image cphsieh/ruler:0.1.0 provided by the author, you need to make several changes to the environment (commands sketched after the list):

  1. reinstall transformers>=4.44.2 and huggingface_hub==0.23.2
  2. reinstall mamba-ssm and causal-conv1d>=1.2.0 so that they match your CUDA version; otherwise it will cause the following error:
     undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
  3. install bitsandbytes
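
A rough sketch of the corresponding commands (run inside the container; only the pins listed above are from my setup, anything else should be adjusted to your CUDA/PyTorch build):

    pip install "transformers>=4.44.2" huggingface_hub==0.23.2
    # rebuild the Mamba CUDA kernels against the container's PyTorch/CUDA
    pip install --no-build-isolation "causal-conv1d>=1.2.0" mamba-ssm
    pip install bitsandbytes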

I managed to run with MODEL_FRAMEWORK="hf" but not vllm; I still haven't figured out the right versions of the different packages to run with vllm. I've attached a complete pip list in case anyone needs it.
ruler_pip_list.txt

@hsiehjackson (Collaborator)

@coranholmes Can you try vllm with the latest container cphsieh/ruler:0.2.0?

@dawenxi-007 commented Oct 9, 2024

@hsiehjackson, I tried cphsieh/ruler:0.2.0 and got the following error:
ValueError: Fast Mamba kernels are not available. Make sure to they are installed and that the mamba module is on a CUDA device

I tried different versions of transformers, but it didn't help. Any suggestions? These are my current versions:
causal-conv1d==1.4.0
mamba-ssm==2.2.2

@hsiehjackson (Collaborator)

@dawenxi-007 Can you try pulling the docker image again? I updated it to be compatible with both HF and vLLM.

@dawenxi-007

@hsiehjackson
I built the docker image from the latest code repo:

    cd docker/
    DOCKER_BUILDKIT=1 docker build -f Dockerfile -t cphsieh/ruler:0.2.0 .

If I set MODEL_FRAMEWORK="hf" in config_models.sh, I still get the following error:
ValueError: Fast Mamba kernels are not available. Make sure to they are installed and that the mamba module is on a CUDA device

If I set MODEL_FRAMEWORK="vllm" in config_models.sh, I get the following error:
ValueError: The number of required GPUs exceeds the total number of available GPUs in the placement group.
It seems vLLM mode does not support TP (tensor parallelism).

@hsiehjackson (Collaborator)

@dawenxi-007 Can you check whether you can see the GPUs inside the docker container?
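
For a quick check, something like this should list the GPUs (a sketch; it assumes nvidia-smi is present in the image and that the host has the NVIDIA container toolkit installed):

    docker run --rm --gpus all cphsieh/ruler:0.2.0 nvidia-smi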

@dawenxi-007

@hsiehjackson, yes, I forgot to enable the --gpus argument. Now I can see the model loading onto the GPUs; however, I got the following OOM error:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 39.38 GiB of which 1.07 GiB is free. Process 340734 has 38.30 GiB memory in use. Of the allocated memory 34.95 GiB is allocated by PyTorch, and 2.86 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Jamba 1.5 Mini is a 52B model; in BF16 it should occupy roughly 100 GB plus some for the KV cache. I have 4xA100s (40 GB each), so I'm not sure why it runs out of memory.

@hsiehjackson (Collaborator)

Are you using HF or vLLM? If you are using vLLM, maybe you can first reduce max_position_embeddings in the config.json, since vLLM will reserve enough memory to run at that length.
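
A minimal sketch of lowering it (the 65536 value and the path are only illustrative; point it at your local model directory):

    python -c "import json; p='${MODEL_DIR}/AI21-Jamba-1.5-mini/config.json'; c=json.load(open(p)); c['max_position_embeddings']=65536; json.dump(c, open(p, 'w'), indent=2)"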

@dawenxi-007

Thanks @hsiehjackson!

I set it to HF and was now able to run it with 4xH100s. The results are a bit confusing to me. In config_tasks.sh, I have all 13 tasks enabled as follows:

synthetic=(
    "niah_single_1"
    "niah_single_2"
    "niah_single_3"
    "niah_multikey_1"
    "niah_multikey_2"
    "niah_multikey_3"
    "niah_multivalue"
    "niah_multiquery"
    "vt"
    "cwe"
    "fwe"
    "qa_1"
    "qa_2"
)

However, the results only show 8 of them. For example, for 128K:

0,1,2,3,4,5,6,7,8
Tasks,niah_single_1,niah_multikey_2,niah_multikey_3,vt,cwe,fwe,qa_1,qa_2
Score,100.0,86.2,85.0,72.68,99.44,99.93,79.8,53.0
Nulls,0/500,0/500,0/500,0/500,0/500,0/500,0/500,0/500

This made the average score of the result worse than the official number you posted. Do you have any idea what could be the issue?

@hsiehjackson (Collaborator)

@dawenxi-007 Can you check whether you have all 13 prediction jsonl files under the folder ${RESULTS_DIR}/pred? I think some of your runs may not have finished successfully.

@dawenxi-007

Only 8.

root@node060:/workspace/benchmark/RULER/scripts/benchmark_root/jamba1.5-mini/synthetic/131072/pred# ls
cwe.jsonl  fwe.jsonl  niah_multikey_2.jsonl  niah_multikey_3.jsonl  niah_single_1.jsonl  qa_1.jsonl  qa_2.jsonl  submission.csv  summary.csv  vt.jsonl

@hsiehjackson (Collaborator)

You can also check that /workspace/benchmark/RULER/scripts/benchmark_root/jamba1.5-mini/synthetic/131072/data has 13 files.
If it does, then the issue is that the runs for those 5 missing tasks crashed.
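
A quick way to compare the two folders (a sketch; adjust the base path to your results directory):

    BASE=/workspace/benchmark/RULER/scripts/benchmark_root/jamba1.5-mini/synthetic/131072
    ls $BASE/data | wc -l            # should be 13 entries, one per task
    ls $BASE/pred/*.jsonl | wc -l    # should be 13 prediction files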

@dawenxi-007

Oh, yes, there are 13 files under the data folder.
I am curious what caused the crash. I checked all the other context lengths; all of them only have 8 results, even for 4k.

@dawenxi-007

It seems that nltk errors stopped some of those tasks:

Traceback (most recent call last):
  File "/workspace/benchmark/RULER/scripts/data/synthetic/niah.py", line 271, in <module>
    main()
  File "/workspace/benchmark/RULER/scripts/data/synthetic/niah.py", line 262, in main
    write_jsons = generate_samples(
  File "/workspace/benchmark/RULER/scripts/data/synthetic/niah.py", line 215, in generate_samples
    input_text, answer = generate_input_output(num_haystack)
  File "/workspace/benchmark/RULER/scripts/data/synthetic/niah.py", line 143, in generate_input_output
    document_sents = sent_tokenize(text.strip())
  File "/usr/local/lib/python3.10/dist-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
    tokenizer = _get_punkt_tokenizer(language)
  File "/usr/local/lib/python3.10/dist-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
    return PunktTokenizer(language)
  File "/usr/local/lib/python3.10/dist-packages/nltk/tokenize/punkt.py", line 1744, in __init__
    self.load_lang(lang)
  File "/usr/local/lib/python3.10/dist-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
    lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
  File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/root/nltk_data'
    - '/usr/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


Prepare niah_single_3 with lines: 500 to benchmark_root/jamba1.5-mini/synthetic/4096/data/niah_single_3/validation.jsonl
Used time: 0.2 minutes


@dawenxi-007

The root cause is that some tasks rely on the nltk data packages, which are missing from the docker image. After I downloaded the packages, it seems to be running okay now. I will update here if any new issues pop up. Thanks for your help! @hsiehjackson
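
For reference, a minimal sketch of the download step inside the container (punkt_tab is the resource the traceback above asks for; punkt is the one the script originally tried to fetch):

    python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"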

@dawenxi-007

@hsiehjackson I was able to get all the results. One more question: are there detailed descriptions for all 13 tasks? I noticed that the paper only shows 8 tasks with examples in Table 2. Among the tasks, QA_2 gave much worse results than the others, and I'd like to understand a bit more about why.

@hsiehjackson (Collaborator)

You can find the descriptions of all 13 tasks in Appendix B, Table 5 :)
