Environment

composer_collect_env
Collecting system information...
---------------------------------
System Environment Report
Created: 2025-02-02 12:32:57 CST
---------------------------------
PyTorch information
-------------------
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.2 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.6)
CMake version: version 3.31.5
Libc version: N/A
Python version: 3.12.5 (main, Aug 14 2024, 04:32:18) [Clang 18.1.8 ] (64-bit runtime)
Python platform: macOS-15.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1
Versions of relevant libraries:
[pip3] numpy==2.1.3
[pip3] onnx==1.17.0
[pip3] onnxruntime==1.20.1
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.5.1
[pip3] torch-optimizer==0.3.0
[pip3] torchmetrics==1.6.0
[pip3] torchvision==0.20.1
[conda] Could not collect
Composer information
--------------------
Composer Version: 0.28.0
Composer Commit Hash: None
CPU Model: Apple M1
CPU Count: 8
Number of Nodes: 1
GPU Model: N/A
GPUs per Node: 0
GPU Count: 1
CUDA Device Count: 0
To reproduce
Steps to reproduce the behavior:

composer train/train.py \
train/yamls/pretrain/mpt-125m.yaml \
variables.data_local=my-copy-c4 \
train_loader.dataset.split=train_small \
eval_loader.dataset.split=val_small \
max_duration=10ba \
eval_interval=0 \
save_folder=mpt-125m \
model.attn_config.attn_impl=torch model.loss_fn=torch_crossentropy precision=fp32
2025-02-02 11:16:09,562: rank0[2908][MainThread]: DEBUG: llmfoundry.command_utils.train: Initializing dist with device...
2025-02-02 11:16:09,566: rank0[2908][MainThread]: DEBUG: llmfoundry.command_utils.train: Testing barrier with device...
2025-02-02 11:16:09,566: rank0[2908][MainThread]: DEBUG: llmfoundry.command_utils.train: Barrier test passed with device.
/Users/devworks/github.com/uv-llm/llmfoundry/command_utils/train.py:351: UserWarning: FSDP is not applicable for single-GPU training. Reverting to DDP.
warnings.warn(
/Users/devworks/github.com/uv-llm/llmfoundry/utils/config_utils.py:525: UserWarning: Using `cfg.model.init_device='meta'` is only valid when using FSDP! Reverting to `cfg.model.init_device='cpu'`.
warnings.warn(
2025-02-02 11:16:09,567: rank0[2908][MainThread]: INFO: llmfoundry.command_utils.train: Building tokenizer...
2025-02-02 11:16:09,922: rank0[2908][MainThread]: INFO: llmfoundry.command_utils.train: Building train loader...
2025-02-02 11:16:09,922: rank0[2908][MainThread]: INFO: streaming.base.dataset: Because `predownload` was not specified, it will default to 8*batch_size if batch_size is not None, otherwise 64.
2025-02-02 11:16:09,933: rank0[2908][MainThread]: INFO: llmfoundry.command_utils.train: Building eval loader...
2025-02-02 11:16:09,933: rank0[2908][MainThread]: INFO: streaming.base.dataset: Because `predownload` was not specified, it will default to 8*batch_size if batch_size is not None, otherwise 64.
2025-02-02 11:16:09,935: rank0[2908][MainThread]: INFO: llmfoundry.command_utils.train: Initializing model...
MPTForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
2025-02-02 11:16:09,936: rank0[2908][MainThread]: INFO: llmfoundry.models.mpt.modeling_mpt: Instantiating an MPTForCausalLM model from /Users/devworks/github.com/uv-llm/llmfoundry/models/mpt/modeling_mpt.py
2025-02-02 11:16:10,611: rank0[2908][MainThread]: INFO: llmfoundry.models.mpt.modeling_mpt: We recommend using config.init_device="meta" with Composer + FSDP for faster initialization.
2025-02-02 11:16:12,196: rank0[2908][MainThread]: DEBUG: llmfoundry.models.mpt.modeling_mpt: MPTModel(
(wte): SharedEmbedding(50368, 768)
(wpe): Embedding(2048, 768)
(emb_drop): Dropout(p=0.0, inplace=False)
(blocks): ModuleList(
(0-11): 12 x MPTBlock(
(norm_1): LPLayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): MultiheadAttention(
(Wqkv): Linear(in_features=768, out_features=2304, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(norm_2): LPLayerNorm((768,), eps=1e-05, elementwise_affine=True)
(ffn): MPTMLP(
(up_proj): Linear(in_features=768, out_features=3072, bias=True)
(down_proj): Linear(in_features=3072, out_features=768, bias=True)
)
(resid_attn_dropout): Dropout(p=0.0, inplace=False)
(resid_ffn_dropout): Dropout(p=0.0, inplace=False)
)
)
(norm_f): LPLayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
2025-02-02 11:16:12,197: rank0[2908][MainThread]: DEBUG: llmfoundry.models.mpt.modeling_mpt: Using kaiming_normal_ initialization.
2025-02-02 11:16:12,298: rank0[2908][MainThread]: INFO: llmfoundry.command_utils.train: Building trainer...
2025-02-02 11:16:12,299: rank0[2908][MainThread]: INFO: composer.utils.reproducibility: Setting seed to 17
2025-02-02 11:16:12,301: rank0[2908][MainThread]: INFO: composer.trainer.trainer: Run name: 1738516572-ginger-mushroom
/Users/devworks/github.com/uv-llm/.venv/lib/python3.12/site-packages/composer/callbacks/memory_monitor.py:137: UserWarning: The memory monitor only works on CUDA devices, but the model is on cpu.
warnings.warn(f'The memory monitor only works on CUDA devices, but the model is on {model_device.type}.')
2025-02-02 11:16:12,380: rank0[2908][MainThread]: INFO: composer.trainer.trainer: Stepping schedulers every batch. To step schedulers every epoch, set `step_schedulers_every_batch=False`.
2025-02-02 11:16:12,381: rank0[2908][MainThread]: INFO: composer.trainer.trainer: Setting seed to 17
2025-02-02 11:16:12,381: rank0[2908][MainThread]: INFO: composer.utils.reproducibility: Setting seed to 17
2025-02-02 11:16:12,381: rank0[2908][MainThread]: INFO: llmfoundry.command_utils.train: Logging config
variables:
data_local: my-copy-c4
data_remote: null
max_seq_len: 2048
global_seed: 17
run_name: null
max_seq_len: 2048
run_name: null
model:
name: mpt_causal_lm
init_device: meta
d_model: 768
n_heads: 12
n_layers: 12
expansion_ratio: 4
max_seq_len: 2048
vocab_size: 50368
attn_config:
attn_impl: torch
loss_fn: torch_crossentropy
tokenizer:
name: EleutherAI/gpt-neox-20b
kwargs:
model_max_length: 2048
train_loader:
name: text
dataset:
local: my-copy-c4
remote: null
split: train_small
shuffle: true
max_seq_len: 2048
shuffle_seed: 17
drop_last: true
num_workers: 8
eval_loader:
name: text
dataset:
local: my-copy-c4
remote: null
split: val_small
shuffle: false
max_seq_len: 2048
shuffle_seed: 17
drop_last: false
num_workers: 8
scheduler:
name: cosine_with_warmup
t_warmup: 100ba
alpha_f: 0.1
optimizer:
name: decoupled_adamw
lr: 0.0006
betas:
- 0.9
- 0.95
eps: 1.0e-08
weight_decay: 0.0
algorithms:
gradient_clipping:
clipping_type: norm
clipping_threshold: 1.0
max_duration: 10ba
eval_interval: 0
eval_first: false
eval_subset_num_batches: -1
global_train_batch_size: 256
seed: 17
device_eval_batch_size: 16
device_train_microbatch_size: 16
precision: fp32
fsdp_config: null
progress_bar: false
log_to_console: true
console_log_interval: 1ba
callbacks:
speed_monitor:
window_size: 10
lr_monitor: {}
memory_monitor: {}
runtime_estimator: {}
save_folder: mpt-125m
n_gpus: 1
device_train_batch_size: 256
device_train_grad_accum: 16
merge: true
tp_config: null
n_params: 125311488
n_active_params: 125311488
n_trainable_params: 125311488
2025-02-02 11:16:12,495: rank0[2908][MainThread]: INFO: llmfoundry.command_utils.train: Starting training...
2025-02-02 11:16:12,495: rank0[2908][MainThread]: INFO: composer.trainer.trainer: Using precision Precision.FP32
******************************
Config:
algorithms:
gradient_clipping:
clipping_threshold: 1.0
clipping_type: norm
callbacks:
lr_monitor: {}
memory_monitor: {}
runtime_estimator: {}
speed_monitor:
window_size: 10
composer_commit_hash: None
composer_version: 0.28.0
console_log_interval: 1ba
device_eval_batch_size: 16
device_train_batch_size: 256
device_train_grad_accum: 16
device_train_microbatch_size: 16
enabled_algorithms/GradientClipping: true
eval_first: false
eval_interval: 0
eval_loader:
dataset:
local: my-copy-c4
max_seq_len: 2048
remote: null
shuffle: false
shuffle_seed: 17
split: val_small
drop_last: false
name: text
num_workers: 8
eval_subset_num_batches: -1
fsdp_config: null
global_train_batch_size: 256
log_to_console: true
max_duration: 10ba
max_seq_len: 2048
merge: true
model:
attn_config:
attn_impl: torch
d_model: 768
expansion_ratio: 4
init_device: meta
loss_fn: torch_crossentropy
max_seq_len: 2048
n_heads: 12
n_layers: 12
name: mpt_causal_lm
vocab_size: 50368
n_active_params: 125311488
n_gpus: 1
n_params: 125311488
n_trainable_params: 125311488
node_name: unknown because NODENAME environment variable not set
num_cpus_per_node: 1
num_nodes: 1
optimizer:
betas:
- 0.9
- 0.95
eps: 1.0e-08
lr: 0.0006
name: decoupled_adamw
weight_decay: 0.0
precision: fp32
progress_bar: false
rank_zero_seed: 17
run_name: null
save_folder: mpt-125m
scheduler:
alpha_f: 0.1
name: cosine_with_warmup
t_warmup: 100ba
seed: 17
time/remaining_estimate_unit: hours
tokenizer:
kwargs:
model_max_length: 2048
name: EleutherAI/gpt-neox-20b
tp_config: null
train_loader:
dataset:
local: my-copy-c4
max_seq_len: 2048
remote: null
shuffle: true
shuffle_seed: 17
split: train_small
drop_last: true
name: text
num_workers: 8
variables:
data_local: my-copy-c4
data_remote: null
global_seed: 17
max_seq_len: 2048
run_name: null
******************************
2025-02-02 11:16:12,497: rank0[2908][MainThread]: DEBUG: composer.trainer.trainer: Spinning the dataloaders
[rank0]: Traceback (most recent call last):
[rank0]: File "/Users/devworks/github.com/uv-llm/scripts/train/train.py", line 9, in <module>
[rank0]: train_from_yaml(yaml_path, args_list)
[rank0]: File "/Users/devworks/github.com/uv-llm/llmfoundry/command_utils/train.py", line 662, in train_from_yaml
[rank0]: return train(yaml_cfg)
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "/Users/devworks/github.com/uv-llm/llmfoundry/command_utils/train.py", line 643, in train
[rank0]: trainer.fit()
[rank0]: File "/Users/devworks/github.com/uv-llm/.venv/lib/python3.12/site-packages/composer/trainer/trainer.py", line 2297, in fit
[rank0]: self._train_loop()
[rank0]: File "/Users/devworks/github.com/uv-llm/.venv/lib/python3.12/site-packages/composer/trainer/trainer.py", line 2447, in _train_loop
[rank0]: self._spin_dataloaders_to_cur_epoch()
[rank0]: File "/Users/devworks/github.com/uv-llm/.venv/lib/python3.12/site-packages/composer/trainer/trainer.py", line 2381, in _spin_dataloaders_to_cur_epoch
[rank0]: for _ in dataloader:
[rank0]: ^^^^^^^^^^
[rank0]: File "/Users/devworks/github.com/uv-llm/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 479, in __iter__
[rank0]: self._iterator = self._get_iterator()
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/Users/devworks/github.com/uv-llm/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 415, in _get_iterator
[rank0]: return _MultiProcessingDataLoaderIter(self)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/Users/devworks/github.com/uv-llm/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1138, in __init__
[rank0]: w.start()
[rank0]: File "/Users/devworks/.local/share/mise/installs/python/3.12.5/lib/python3.12/multiprocessing/process.py", line 121, in start
[rank0]: self._popen = self._Popen(self)
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/Users/devworks/.local/share/mise/installs/python/3.12.5/lib/python3.12/multiprocessing/context.py", line 224, in _Popen
[rank0]: return _default_context.get_context().Process._Popen(process_obj)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/Users/devworks/.local/share/mise/installs/python/3.12.5/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
[rank0]: return Popen(process_obj)
[rank0]: ^^^^^^^^^^^^^^^^^^
[rank0]: File "/Users/devworks/.local/share/mise/installs/python/3.12.5/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
[rank0]: super().__init__(process_obj)
[rank0]: File "/Users/devworks/.local/share/mise/installs/python/3.12.5/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
[rank0]: self._launch(process_obj)
[rank0]: File "/Users/devworks/.local/share/mise/installs/python/3.12.5/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
[rank0]: reduction.dump(process_obj, fp)
[rank0]: File "/Users/devworks/.local/share/mise/installs/python/3.12.5/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
[rank0]: ForkingPickler(file, protocol).dump(obj)
[rank0]: AttributeError: Can't get local object 'get_tokens_per_batch_func.<locals>.get_num_tokens_in_batch'
2025-02-02 11:16:12,512: rank0[2908][MainThread]: DEBUG: composer.core.engine: Closing the engine.
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Closing callback ConsoleLogger
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Closing callback SpeedMonitor
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Closing callback LRMonitor
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Closing callback MemoryMonitor
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Closing callback RuntimeEstimator
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Closing callback CheckpointSaver
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Post-closing callback ConsoleLogger
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Post-closing callback SpeedMonitor
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Post-closing callback LRMonitor
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Post-closing callback MemoryMonitor
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Post-closing callback RuntimeEstimator
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Post-closing callback CheckpointSaver
2025-02-02 11:16:12,513: rank0[2908][MainThread]: DEBUG: composer.core.engine: Engine closed.
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
Global rank 0 (PID 2908) exited with code 1
ERROR:composer.cli.launcher:Global rank 0 (PID 2908) exited with code 1
Expected behavior
The quickstart step should complete successfully.
Additional context
The overrides added to the composer training command, model.attn_config.attn_impl=torch model.loss_fn=torch_crossentropy precision=fp32, are there to allow it to run on the M1 CPU.
Not sure how to go about this error, since everything above it in the log looks OK:
[rank0]: ForkingPickler(file, protocol).dump(obj)
[rank0]: AttributeError: Can't get local object 'get_tokens_per_batch_func.<locals>.get_num_tokens_in_batch'
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
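My guess (an assumption, not verified against the llm-foundry source) is that this is a pickling issue: macOS defaults to the spawn multiprocessing start method, so with train_loader.num_workers: 8 the DataLoader has to pickle the callables it hands to its worker processes, and a function defined inside another function (get_num_tokens_in_batch inside get_tokens_per_batch_func, as named in the traceback) cannot be pickled. A minimal sketch of that mechanism, using a hypothetical stand-in for the nested helper:

```python
# Minimal sketch of the suspected mechanism (not llm-foundry code): a function
# defined inside another function cannot be pickled, and the "spawn" start
# method used on macOS pickles everything handed to DataLoader worker processes.
import pickle


def get_tokens_per_batch_func():
    # Hypothetical stand-in for the nested helper named in the traceback.
    def get_num_tokens_in_batch(batch):
        return len(batch)
    return get_num_tokens_in_batch


fn = get_tokens_per_batch_func()
try:
    pickle.dumps(fn)
except (AttributeError, pickle.PicklingError) as e:
    # Fails with an error naming the '<locals>' path, like the one in the
    # traceback above.
    print(type(e).__name__, e)
```

If that is the cause, overriding train_loader.num_workers=0 and eval_loader.num_workers=0 on the command line (so the DataLoader iterates in the main process and nothing needs to be pickled for workers) might get the quickstart past this point on the M1, assuming those dotted overrides are accepted the same way as the others in the command.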