Add important details about external memory to docs #11113

Merged: 3 commits, Dec 19, 2024

david-cortes
Contributor

Not 100% sure that I'm getting all the details right, so @trivialfis please take a careful look.

Comment on lines 28 to 29
(unless specified by the ``extmem_single_page``) . Instead, it caches all batches in the
external memory and fetch them on-demand. Go to the end of the document to see a
comparison between :py:class:`~xgboost.QuantileDMatrix` and the external memory version of
:py:class:`~xgboost.ExtMemQuantileDMatrix`.
external memory to disk (in a compressed format) and fetches them on-demand. Go to the
Member

@trivialfis trivialfis Dec 18, 2024

We use external memory to refer to disk. For the CPU, the "main memory" is the memory we usually talk about, and external memory can refer to anything else from local disk to network storage. This is not an XGBoost convention; it's used elsewhere as well. Computation using "main memory" data is sometimes called "in-core" computing.

Only a disk is supported as an external memory for XGBoost. As a result, from the perspective of a CPU-based algorithm, "external memory" and "disk" are the same thing.

For GPU, the main memory is the device memory, whereas CPU memory and disk are "external" to GPU.
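
To make the terminology concrete, here is a minimal sketch of the "cache batches in external memory, fetch them on demand" idea for the CPU case, where external memory means disk. It is not XGBoost's implementation: the helper names (`cache_batches`, `iter_cached`, `out_of_core_mean`, `demo`) and the pickle file layout are hypothetical and chosen only for illustration.

```python
import os
import pickle
import tempfile


def cache_batches(batches, cache_dir):
    """Write each batch to its own file on disk and return the file paths."""
    paths = []
    for i, batch in enumerate(batches):
        path = os.path.join(cache_dir, f"batch-{i}.pkl")
        with open(path, "wb") as fd:
            pickle.dump(batch, fd)
        paths.append(path)
    return paths


def iter_cached(paths):
    """Load one batch at a time, so only one batch is in main memory."""
    for path in paths:
        with open(path, "rb") as fd:
            yield pickle.load(fd)


def out_of_core_mean(paths):
    """Compute a mean by streaming the cached batches from disk."""
    total, count = 0.0, 0
    for batch in iter_cached(paths):
        total += sum(batch)
        count += len(batch)
    return total / count


def demo():
    # Hypothetical toy data: three small batches standing in for large ones.
    batches = ([1.0, 2.0, 3.0], [4.0, 5.0], [6.0])
    with tempfile.TemporaryDirectory() as cache_dir:
        paths = cache_batches(batches, cache_dir)
        return out_of_core_mean(paths)
```

The computation never holds more than one batch at a time. That is the sense in which the data is "out-of-core" relative to main memory. In the GPU case described above, the same role could be played by host memory rather than disk.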

Contributor Author

Thanks. I'm not very familiar with libraries offering larger-than-memory compute, and this is the first time I've heard on-disk caches referred to as "external memory", though I do see there's a whole Wikipedia article about it. "Out-of-core" is, I think, quite common and typically understood to mean from disk. Perhaps the .rst file could include that explanation too.

Member

Maybe a brief intro with a few sentences. But for this PR, could you please help revise or revert the change?

Contributor Author

Undid the changes to the .rst file.

python-package/xgboost/core.py (outdated review thread, resolved)
@trivialfis trivialfis merged commit 7b818e1 into dmlc:master Dec 19, 2024
57 of 59 checks passed
2 participants