Commit

Hipify blog post
---------

Co-authored-by: Suyash Tandon <[email protected]>
Co-authored-by: Gina Sitaraman <[email protected]>
Co-authored-by: Sriranjani Sitaraman <[email protected]>
Co-authored-by: Maria Ruiz Varela <[email protected]>
Co-authored-by: bobrobey <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
7 people committed Apr 26, 2024
1 parent 1ae2a97 commit e29e1ac
Showing 10 changed files with 832 additions and 24 deletions.
19 changes: 0 additions & 19 deletions .markdownlint-cli2.yaml

This file was deleted.

18 changes: 18 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,18 @@
default: true
MD010:
  code_blocks: false
MD013: false
MD026:
  punctuation: '.,;:'
MD029:
  style: ordered
MD033: false
MD034: false
MD041: false
MD046: false # Allow indented code blocks (which are output by nbconvert)
MD051: false
ignores:
  - CHANGELOG.md
  - docs/CHANGELOG.md
  - "{,docs/}{RELEASE,release}.md"
  - tools/autotag/templates/**/*.md
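This commit swaps the old `.markdownlint-cli2.yaml` for the `.markdownlint.yaml` shown above. The diff does not show how CI invokes the linter, but as a rough local check, assuming `markdownlint-cli2` is installed from npm (both `markdownlint-cli` and `markdownlint-cli2` can read a root-level `.markdownlint.yaml`), something like the following would lint the repository's Markdown against these rules:

```sh
# Hypothetical local lint run, not taken from this repository's CI.
# markdownlint-cli2 discovers .markdownlint.yaml in the repository root;
# globs prefixed with "#" are treated as ignore patterns.
npm install -g markdownlint-cli2
markdownlint-cli2 "**/*.md" "#node_modules"
```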
@@ -92,16 +92,18 @@
you'll use for pre-training. The cell blocks with `%%sh` represent Linux commands

```sh
%%sh
#Install DeepSpeed
python -m pip install --upgrade pip

-cd /home/aac
+#Install DeepSpeed
+home_dir=$PWD
+cd $home_dir
git clone --recursive https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
pip install .[dev,1bit,autotuning]

# Clone Megatron-DeepSpeed

-cd /home/aac
+cd $home_dir
git clone https://github.com/microsoft/Megatron-DeepSpeed.git
cd Megatron-DeepSpeed
pip3 install pybind11 nltk transformers
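After the install block above completes, a quick way to confirm that the DeepSpeed build picked up the local toolchain is DeepSpeed's bundled `ds_report` utility, shown here as an optional check that is not part of the commit:

```sh
# Optional sanity check (not in the original post): ds_report ships with the
# deepspeed package and prints the detected accelerator plus the
# compatibility/installed status of each DeepSpeed op.
ds_report
```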
@@ -120,15 +122,15 @@
bash download_vocab.sh

# Pre-process data for oscar dataset

-export BASE_SRC_PATH=/home/aac/Megatron-DeepSpeed
+export BASE_SRC_PATH=$home_dir/Megatron-DeepSpeed
export BASE_DATA_PATH=${BASE_SRC_PATH}/dataset
python3 ${BASE_SRC_PATH}/tools/preprocess_data.py --input ${BASE_DATA_PATH}/oscar-1GB.jsonl --output-prefix ${BASE_DATA_PATH}/my-gpt2 --vocab-file ${BASE_DATA_PATH}/gpt2-vocab.json --dataset-impl mmap --tokenizer-type GPT2BPETokenizer --merge-file ${BASE_DATA_PATH}/gpt2-merges.txt --append-eod --workers 8

# Install FlashAttention (optional). FlashAttention delivers a rapid and memory-efficient
# solution for attention mechanisms. If you don't want to use FlashAttention, remove
# the '--use-flash-attn' flag in the script.

-cd /home/aac
+cd $home_dir
git clone --recursive https://github.com/ROCmSoftwarePlatform/flash-attention.git
cd flash-attention
py_version=$(python -V | grep -oP '(?<=[.])\w+(?=[.])')
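The remaining lines of the FlashAttention install are truncated in this view. Before moving on to pre-training, a minimal sanity check, assuming the preprocessing and clone steps above finished and reusing the same `BASE_DATA_PATH` and `home_dir` values, might look like this:

```sh
# Hypothetical checks, not part of the original blog post.
# The preprocessed dataset files share the --output-prefix passed to
# preprocess_data.py, and the three repositories live under $home_dir.
ls -lh ${BASE_DATA_PATH}/my-gpt2*
ls -d ${home_dir}/DeepSpeed ${home_dir}/Megatron-DeepSpeed ${home_dir}/flash-attention
```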
