-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version support and automatic CI build #2214
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Putnam, Kevin <[email protected]>
Signed-off-by: Putnam, Kevin <[email protected]>
Signed-off-by: Putnam, Kevin <[email protected]>
Signed-off-by: Putnam, Kevin <[email protected]>
@intelkevinputnam I'm seeing several misrendered things in the docs from your branch. |
make html | ||
|
||
source ./set_version.sh | ||
export SPHINXPROJ=scikit-learn-intelex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have daal4py
with docs under doc/daal4py
.
import subprocess | ||
import os | ||
|
||
def get_version(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@icfaust Could you please look into whether this captures the version number correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would prefer if we got rid of this file and set_version.sh entirely and replaced it with
DOC_VERSION=$(python -m pip freeze | grep -oP "(?<=scikit-learn-intelex==)\d*\.\d*")
doc/sources/conf.py
Outdated
# -- Project information ----------------------------------------------------- | ||
|
||
project = "Intel(R) Extension for Scikit-learn*" | ||
copyright = "Intel" | ||
author = "Intel" | ||
|
||
# The short X.Y version | ||
version = "2025.0.0" | ||
version = os.environ.get('DOC_VERSION','') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing the version to "latest" here when it doesn't find the env. variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there isn't a version, the build should fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean: would be ideal to have a version "latest" building the docs directly from the master branch, for unreleased versions. So in that case, the release versions would have the env. variable whereas the master branch wouldn't.
@@ -0,0 +1,19 @@ | |||
[ | |||
{ | |||
"name": "2025.0 (latest)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this file auto-generated from the releases that we have as github tags instead? It's BTW missing some version numbers, such as 2024.7.0:
https://github.com/uxlfoundation/scikit-learn-intelex/releases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file will need to be manually curated, since it is the list of versions that populates the switcher for all published versions. You'll need to remove old versions and stipulate which one is latest. The build script shouldn't know which is latest, since you could possible publish an older version (that isn't latest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list of releases can be obtained programmatically. This repository keeps branches with "rls" in the name for all released versions, which can be obtained with a command like this from the root of the repository:
git branch -r | grep rls
And since it follows date-based versioning, the most up-to-date one will always be the last one in alphabetical order, as returned by that command. So it should be possible to auto-generate this file with all the releases that exist, and keep it auto-updated to point to the last when a new branch comes in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the issue would be that some versions are missing docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to agree with @david-cortes-intel on this one, we could technically have the (latest)
tag be updated on a github create
trigger when it has rls
in it.
git config --global user.email "${GITHUB_ACTOR}@github.com" | ||
git add . | ||
git commit -m "latest HTML output" | ||
git push -f https://${GITHUB_ACTOR}:${{secrets.GITHUB_TOKEN}}@github.com/${GITHUB_REPOSITORY}.git HEAD:gh-pages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: what happens if something goes wrong during the build? For example, we might run into errors such as failing to import the modules from which autodoc generates documentation, which would manifest as a "warning" being issued but some docs still being built.
Could there be a check for build warnings that would make this script fail (avoiding pushing anything to GH) if something goes wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any errors in the github workflow will stop the docs from publishing. We could turn on "error on warning" in the sphinx build, if we wanted to gate against any issue in the Sphinx build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that'd be ideal, so that we don't accidentally end up with empty docs. But would be better if it could ignore some warnings selectively - currently there are some warnings when building daal4py
which I'm not sure how to solve and which do not impact the resulting docs that get built.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example benign warnings when building daal4py
:
/home/dcortes/repos/scikit-learn-intelex/doc/daal4py/docstring of daal4py._daal4py.classifier_prediction_result:1: WARNING: duplicate object description of daal4py.classifier_prediction_result, other instance in algorithms, use :noindex: for one of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#example-1-passing-the-github_token-as-an-input let me know if i am missing something, sentence just before this link: "Commits pushed by a GitHub Actions workflow that uses the GITHUB_TOKEN do not trigger a GitHub Pages build."
@@ -0,0 +1,19 @@ | |||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "stable" version tag in addition to latest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this file needs to be manually curated, we can add any tags we want to it.
Perhaps this PR could be a good opportunity to address one topic around the version numbers: in open source projects, version "latest" or "dev" is typically meant for docs from "latest commit of the master branch", while the "last released version" is typically "stable". This is the case for the docs of the scikit-learn package which this repository builds upon, and for most other machine-learning-related libraries such as gradient boosting frameworks and similar. Could we make the switch here so that the freshest release would be "stable" and have "latest" as something built periodically from the main branch? |
Since the build script is dependent on the latest released version of the scikit-learn-intelex package, the changes must not have propagated to it as of the first build. The changes you mention appear to have propagated to the released package now, so the you'll see what you expect if you check again. |
@@ -25,6 +25,8 @@ | |||
|
|||
# -- Path setup -------------------------------------------------------------- | |||
|
|||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this import is not used.
Verified that we can indeed push changes to this PR (CC @yuejiaointel ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my initial review, we probably need someone to review the js, since I have no idea on that.
- name: install dependencies | ||
run: | | ||
sudo apt update | ||
sudo apt install -y pandoc git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sudo apt install -y pandoc git | |
sudo apt install -y pandoc | |
How come git is necessary?
cd doc | ||
source set_version.sh | ||
echo Generating version $DOC_VERSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cd doc | |
source set_version.sh | |
echo Generating version $DOC_VERSION | |
DOC_VERSION=$(python -m pip freeze | grep -oP "(?<=scikit-learn-intelex==)\d*\.\d*") | |
echo Generating version $DOC_VERSION | |
echo "DOC_VERSION=${DOC_VERSION}" >> "$GITHUB_ENV" |
Add it to the github environment to save some work/lines going forward.
cd doc | ||
source set_version.sh | ||
cd .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cd doc | |
source set_version.sh | |
cd .. |
With the previous suggestion these 3 lines are unnecessary as DOC_VERSION
will be propagated in the environment between steps.
cd doc | ||
chmod +x build-doc.sh | ||
./build-doc.sh | ||
mv _build/scikit-learn-intelex ../../output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cd doc | |
chmod +x build-doc.sh | |
./build-doc.sh | |
mv _build/scikit-learn-intelex ../../output | |
doc/build-doc.sh | |
mv doc/_build/scikit-learn-intelex ../output |
Sorry since I am not so knowledgeable on the docs side. If build-doc.sh doesn't have the right permissions, we can set those in the repo to what is necessary (does it hurt to store build-doc.sh as 755 instead)?
mv -f ../output/$DOC_VERSION . | ||
mv ../output/versions.json . | ||
mv ../output/index.html . | ||
rm -rf doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this rm
wipe the git checkout
of the build-doc.sh?
sudo apt update | ||
sudo apt install -y pandoc git | ||
pip3 install -r requirements-doc.txt | ||
pip3 install scikit-learn-intelex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had forgotten to comment here but: we'd probably want to change this logic towards either building from source, or downloading artifacts from github - otherwise this wouldn't work out when pushing to release branches (would build docs for the wrong version), nor when trying to build docs for unreleased versions.
Description
replaces #1785
Version of docs built using CI: https://intelkevinputnam.github.io/scikit-learn-intelex/
There are two major changes in this PR:
Version Switching Support
All published versions must be tracked in doc/versions.json as this will allow all subsequent versions of the docs to automatically be able to switch to the latest without needing to rebuild the docs.
The version is supplied by the installed version of scikit-learn-intelex (latest by default in the CI script). This way building the docs is not dependent on the oneDAL environment with the goal of making it easier to contribute to and edit the docs. It should also make builds faster.
Since the move to the UXL Foundation, version switching is broken in the latest version of the docs because of there is a hard coded path to the versions.json that no longer exists (it was intel.github.com/scikit-learn-intelex). It also is now settable in the conf.py (html_context)
CI Deployment to GitHub Pages via GitHub Action
GitHub pages will be updated automatically. A new directory will be added for each version of scikit-learn-intelex. When the same version is built twice, it is overwritten. See
.github/workflows/publish.yml
for the script. The branch will need to be changed tomain
before it will work on the main branch. I'm not sure it is necessary to have the docs built on every merge intomain
. Addingworkflow_dispatch:
to the list of conditions for running the action can provide a handy "deploy whenever you feel like it" option.To build the docs locally, run
./build-docs.sh
. Docs can be previewed locally by starting a web server (python3 -m http.server
) in the_build
directory and usinglocalhost:8000/scikit-learn-intelex
as the URL.Checklist to comply with before moving PR from draft:
PR completeness and readability