
V1.2.0 cicl #214

Closed
wants to merge 37 commits into from

Conversation

reymondzzzz
Member

No description provided.

mitya52 and others added 30 commits November 1, 2023 10:02
* Print statements for debugging and initial support for Code Llama

* Added multiple print statements for debugging fine tuning
* Added support for Code Llama 7b
* Depending on the training parameters I set, I either get a GPU out-of-memory error or ValueError("optimizer got an empty parameter list")

* Code Llama fine-tuning but fails on checkpoint

* commenting out print statements

* updating default config behavior

* Begin adding encoding for Code Llama

* adding BOS and EOS tokens for Code Llama, model running properly (see the sketch after this list)

* getting rid of #?
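
A minimal sketch of the BOS/EOS change described above, assuming the standard Hugging Face tokenizer API; the tokenizer name and the `encode_sample` helper are illustrative, not this PR's actual code.

```python
# Hedged sketch: wrap each training sample in explicit BOS/EOS tokens so the
# model sees well-formed sequence boundaries during fine-tuning.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

def encode_sample(text: str) -> list[int]:
    # Encode without special tokens, then add BOS/EOS ids ourselves.
    ids = tokenizer.encode(text, add_special_tokens=False)
    return [tokenizer.bos_token_id] + ids + [tokenizer.eos_token_id]
```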

saving in safe_tensors format
TOKENIZERS_PARALLELISM=false while finetuning (both changes sketched below)
add inference fixes for codellama
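
A hedged sketch of the two fine-tuning changes above; the stand-in model and output path are illustrative, not the PR's actual code.

```python
import os

# Disabling tokenizer parallelism before any tokenizer use avoids the
# fork-related warnings/deadlocks from huggingface/tokenizers when
# DataLoader workers fork the training process.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import torch
from safetensors.torch import save_file

# Stand-in model; in the PR this would be the fine-tuned checkpoint.
model = torch.nn.Linear(8, 8)

# safetensors requires contiguous tensors and stores no pickled code,
# making checkpoints safer to load than torch.save .bin files.
state_dict = {k: v.contiguous() for k, v in model.state_dict().items()}
save_file(state_dict, "model.safetensors")
```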
* add deepseek inference and finetuning

* no extra kwargs

* add deepseek-ai/deepseek-coder-5.7bmqa-base
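
A sketch of loading the newly added model for inference, assuming the stock transformers `AutoModelForCausalLM` path works for this checkpoint; the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/deepseek-coder-5.7bmqa-base"  # model id from the commit above
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```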
reymondzzzz closed this Nov 8, 2023
klink linked an issue Nov 9, 2023 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

add check for the minimum number of files for fine-tuning job