Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to UV #2282

Merged
merged 29 commits into from
Nov 5, 2024
Merged

Migrate to UV #2282

merged 29 commits into from
Nov 5, 2024

Conversation

chris48s
Copy link
Member

@chris48s chris48s commented Oct 24, 2024

Refs

The core point of this PR is to migrate us from pip to uv.

However, I have scope-creeped (scope-crept?) this a bit to cover several other related tasks which made sense to address at the same time. So in reality, the things happening in this PR are:

1. UV

Migrate from pip to UV, including related CI and deployment changes.

2. Audit packages

Instead of just blindly re-creating our existing pip setup, I've taken the opportunity to do a bit of an audit of our packages. This includes a bit of fiddling with the optional groups we have but more importantly I have removed a bunch of packages that we no longer use, no longer need or were there to ensure we install a compatible version of a second-order package (which is now better achieved by uv's resolver/lockfile).

As we go through this process on other repos I would propose we do the same. Writing a new manifest seems like a good opportunity to review what we're packages we are pulling in and why.

Most of the action on this is in 6d5d0d1 and 7ad181a

3. Dependabot

I've done a bit of a review of our dependabot config

Because dependabot doesn't support uv.lock, I've written a GitHub action which will run uv lock and push the lockfile on any PRs submitted by dependabot.

I don't really know how this is going to work out yet.

I do know this will come with the following tradeoffs:

  1. @dependabot rebase won't work because every PR will contain additional commits not by dependabot. I'm hoping we can rely on @dependabot recreate
  2. Dependabot will only know about our first order dependencies, but not our transitive dependencies. This means that Dependabot can bump things that appear in our manifest but not things that only appear in our lockfile, so it won't be able to offer us bumps for security issues affecting transitive dependencies.

I don't really know how this is going to work out in practice - are there also other problems I haven't thought of?

I think my proposal for this is: I've intentionally left some packages out of date in this PR. I reckon we merge this, and then I'll try updating them with dependabot. We'll see if what I've done works, what the workflow looks like, where the pain points are, and find out what I haven't considered. Ideally I would quite like it if we could avoid having to migrate to renovate immediately on every repo where we adopt this. Renovate might solve several other problems for us, but being able to keep the scope of each PR to just "migrate from one package manager to another" to start with would be nice.


Final remarks:
Although one of the things we're doing here is doing this work for EE, this also hopefully lays down a template for how we're going to do it elsewhere. So it seems like a good place to bikeshed the details. Lets nail the blueprint here before we do it everywhere.

I've tried to break this down into sensible commits to review so you can see what I'm doing at each stage. There's a little bit of going round the houses on how to install UV where I did one thing and then another. In general, going commit by commit is probably helpful though.

some of these are packages we don't use any more

some of them were pins we put in requirements.txt to force
a package we directly depend on to use a specific version of
a transitive dependency (workaround for no lockfile)

none of them are needed any more
mock has been part of the standard library for many years
we no longer need this compatibility shim
By default the curl installer installs into $HOME/.cargo/bin
this makes it a bit awakward to install as one user and then
call it as another.

You can work around this, but given uv has zero python
dependencies, installing it into the global env with pip won't
conflict with anything and saves some faffing about with paths
and permissions.

Installing it as root into the system env makes it available if
we log in with either the ubuntu or every_election user account
@coveralls
Copy link

coveralls commented Oct 24, 2024

Coverage Status

coverage: 70.678%. remained the same
when pulling 165b0fd on uv2
into 0decd2d on master.

. .venv/bin/activate
pip install -r requirements/cdk.txt
uv sync --extra cdk --no-dev
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, uv sync --extra cdk --no-dev here will install all the main packages as well as the cdk extras.
Nothing bad will happen. We're just installing more stuff than we really need to.

Upstream feature request: astral-sh/uv#8292

Copy link
Member Author

@chris48s chris48s Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: We should now be able to upgrade to UV 0.4.27 which adds the --only-group flag and fix this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so there's a lot going on in c5b78be .

As of UV 0.4.27
https://github.com/astral-sh/uv/releases/tag/0.4.27

UV now has dependency-groups: https://docs.astral.sh/uv/concepts/dependencies/#dependency-groups
This is a distinct feature from optional-dependencies: https://docs.astral.sh/uv/concepts/dependencies/#optional-dependencies

..although there is obviously a lot of overlap between these two concepts.
I think if you're using uv as an application package manager, they basically just reduce to 2 overlapping ways to accomplish almost the same thing.
If you're using uv to build packages, optional-dependencies allow you to define optional groups that are "public" to your user so you can pip install package[optional], whereas dependency-groups are "private".

Switching from optional-dependencies to dependency-groups means we can now use the --only-group flag to just install CDK dependencies, which is good. There's no equivilent of the --all-extras flag though, so if we want everything we have to either do uv sync --group cdk --group production or add everything to our default groups and exclude the ones we don't want when we don't want them. I think what I've done here is probably reasonable.

The dev-dependencies feature that was current when I wrote this PR ~a week ago is now "legacy"
https://docs.astral.sh/uv/concepts/dependencies/#legacy-dev-dependencies

I think this is a pretty good microcosm of what we're getting involved in here. Both in terms of the rough and the smooth (or the ruff and the smooth, if you prefer :badpundog: ).

On the one hand, uv is moving quickly 🚀 A feature we wanted ended up in a release within a week or so and it was already something they were looking at before they triaged the issue.
On the other hand, uv is moving quickly 😱 A feature that was current a week ago is now considered legacy as of a patch release and there are several overlapping ways to do quite similar things.

I suspect this will all shake out by the time they're offering any sort of stability but I reckon there will be some more churn between now and then. I will also note that this whole thing is basically something poetry went through over a longer timeframe before they really got it right.

..anyhooo, in c5b78be I have

  • said we need uv>=0.4.27
  • converted our optional-dependencies and dev-dependencies
  • updated the relevant uv sync commands

@@ -85,6 +79,7 @@ jobs:
name: Submit coverage
command: |
. .venv/bin/activate
uv pip install coveralls
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered moving this to a package in our manifest, but I can't really be bothered. I'm not sure how to meaningfully review a bump, for example.
Lets just stick with installing the latest version in CI and deal with it if it breaks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be collapsed into one line with uv run --with coveralls coverage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well uv run --with coveralls coveralls worked, but first I copied and pasted what you said and then spent ages being confused about why the coverage report wasn't submitting :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INSTALL.md Show resolved Hide resolved
Comment on lines 1 to 10
#!/usr/bin/env bash
set -xeE

UV_CONSTRAINT=">=0.4.0,<0.5.0"

if [ "$CI" = "true" ]; then
pip install uv"$UV_CONSTRAINT"
else
sudo PIP_BREAK_SYSTEM_PACKAGES=1 pip install uv"$UV_CONSTRAINT"
fi
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of things to talk about in this file:

  1. Based on my experience with ruff, I think we probably don't want to just always install the latest uv, at least at this stage. Pinning a constraint and accepting we're going to need to bump it is probably a better call while this is still zero-versioned and offers no stability guarantees IMO. >=0.4.0,<0.5.0 seems pragmatic at this stage. Simultaneously, we'll probably want to bump it every so often. That's one of the costs we absorb of being early adopters. I abstracted this into this slightly clumsy script which I am calling in lots of places so that we've only got one place we have to bump the constraint. This is the reason why I'm not using the astral-sh/setup-uv action in .github/workflows/uv-lock.yml.

  2. One of the problems I was trying to solve here is that on the EC2 instances I wanted to install uv in such a way that regardless of whether we aws ec2-instance-connect ssh --os-user every_election or aws ec2-instance-connect ssh --os-user ubuntu, we've got uv installed and on the path. The simplest way to do that was smoosh it into the global python env with pip, rather than curl-bash it. There's more detail on this in the commit message for ea89bad

@chris48s chris48s changed the title WIP Migrate to UV Migrate to UV Oct 24, 2024
@chris48s chris48s requested a review from symroe October 24, 2024 14:30
Copy link
Member

@symroe symroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented on commits as I went so missed some refactors that went on later. I've left these comments in case they're useful, but I think you can ignore some due to you having had the same thought as me :)

@@ -23,7 +23,7 @@ jobs:
node-version: '18'

- restore_cache:
key: v3-dependencies-{{ checksum "requirements/base.txt" }}-{{ checksum "requirements/testing.txt" }}-{{ checksum "requirements/cdk.txt" }}
key: v3-dependencies-{{ checksum "pyproject.toml" }}-{{ checksum "uv.lock" }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I know we're pinning exact versions in pyproject.toml now, but wouldn't the lock file be updated in this case too? What's your thinking in using both files for the cache key here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good question. I went back and forth on this in my own head.

My thinking on this was that because dependabot can't bump uv.lock it will try to push commits that change pyproject.toml but don't change the lockfile this might be sensible. Maybe it is irrelevant though. uv lock --locked should ensure we just fail the build early doors on those commits anyway.

I've removed it in 46d6de1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only other consideration here is the python version. At the moment we tend to pump the v[n] prefix when we bump python versions, but including pyproject.toml would mean we bust the cash a lot more often. Not a big deal, I think what you currently have is reasonable 👍

@@ -35,22 +35,17 @@ jobs:
name: Install app dependencies
command: |
sudo apt update && sudo apt install -y gdal-bin python3-gdal python3-dev
pyenv local $(ls ~/.pyenv/versions/)
python -m venv .venv
curl -LsSf https://astral.sh/uv/0.4.23/install.sh | sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can set UV_INSTALL_DIR=$HOME/.local/bin/ before sh (after the pipe) here, and that saves modifying the profile below.

Two more things here:

  1. It would be nice to cache this to save a download each time. This is partly for speed of build, but also to reduce the amount of things that can break and stop our builds (although I know it redirects to GitHub at the moment, so it's not a major problem).

    I think something like this might do it?

    if [ ! -f "$HOME/.local/bin/uv" ]; then
          curl -LsSf https://astral.sh/uv/0.4.23/install.sh | UV_INSTALL_DIR=$HOME/.local/bin/uv sh
    fi

    This would also need save_cache / restore_cache for the binary.

  2. Can we parameterize the version at all? Thinking for projects with more than one step where we might want to have this block a few times. This might not be needed if we can cache the binary and restore it for each job.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is irrelevant given the changes I made in subsequent commits 1e57ca1 and ea89bad.
Do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so too, but I kept it in on the off-chance you'd not need some of the env vars / flags I pointed out 👍

curl -LsSf https://astral.sh/uv/0.4.23/install.sh | sh
echo 'export PATH=$HOME/.cargo/bin:$PATH' >> $BASH_ENV
source $BASH_ENV
uv venv --python 3.12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. This will download and install the python version if it's not found on the system. Neat locally, but not what we want in CI (easy to miss that UV is doing this for us!). Can you add --no-python-downloads or export UV_PYTHON_DOWNLOADS=never?
  2. Do we actually need to specify the python version here, if we have it in pyproject.toml / .python-version? And we're already using the cimg for the same version?

Copy link
Member Author

@chris48s chris48s Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually have opposite objectives here.

My thinking here was: Previously we were doing all sorts of pyenv faffing to make sure we were using the right global python version:

I did at one point work out what we were trying to do there, but I've forgotten again.

My thinking here is: Lets bin all that and invoke uv venv --python 3.12. If uv can use a version of python that is already installed, cool. If it needs to go and download one, also fine. Either way, we're running the right version in our venv. That is one of the core value propositions of UV imo. Tbh we could even consider switching the base image for the build to one that doesn't have python on it at all.

I don't mind trying what you suggested, but also I'd like to understand what advantage that gives us, or what problem what I've suggested causes.

Happy to just talk about this one on a call or whatever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically want to ensure we don't drift between [Circle]CI image version and the Python version in the venv.

At the moment the build isn't going to fail if we use cimg/python3.8 and uv venv --python 3.12, but it will download a python (maybe for every CI run, depending on caching). If someone doesn't know this they might see the image version and assume that's what we're running. When we move to Python3.13 we might just bump the image, etc, etc.

I would like to be in a position where we either only specify the python version in one place (uv) or where we prevent uv from installing a version of Python that's not installed.

The pyenv faffing was due to Pyenv being too picky. If 3.12 is in python-version but you have 3.12.4 installed, it complains. You can't (or couldn't at the time) specify 3.12.* or anything like that. I don't think uv has this "feature" so it's moot here.

.circleci/config.yml Show resolved Hide resolved
@@ -85,6 +79,7 @@ jobs:
name: Submit coverage
command: |
. .venv/bin/activate
uv pip install coveralls
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be collapsed into one line with uv run --with coveralls coverage

@@ -60,7 +55,7 @@ jobs:
paths:
- ./.venv
- ./node_modules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding $HOME/.cache/uv here will mean that uv doesn't need to re-download packages between runs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 9d49161

@@ -1,7 +1,7 @@
name: AddUser
description: Adds a system user
schemaVersion: 1.0
component_version: 0.0.4
component_version: 0.0.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing.

When I pushed stuff to the dev AWS account to test it, the image builder threw errors saying there was already a different 0.0.4 of cdk_stacks/components/add_user.yml and already a different 0.0.2 of cdk_stacks/components/instance_connect.yml, so that's why I've bumped those two despite there being no other changes in those files.

I don't really know what a good pattern is for managing this stuff, but given I don't think there's really any downside to bumping a version even if nothing changed, keeping them aligned across AWS accounts seems sensible to me.

@chris48s
Copy link
Member Author

Right. I've pushed some commits and replied to some comments.. back to you

@symroe
Copy link
Member

symroe commented Nov 3, 2024

Great, I've replied where I think it might be useful, but basically this is good to go.

I can see we're going to need to stay on top of uv changes quite a bit, especially when this pattern is in more than one project.

Would you mind writing a "how to even UV at DC" document, where you pull out a load of these decisions? I'd like a document we can use when working across DC that explains the state of play and how we're using it. Does that sound useful to you?

@chris48s chris48s force-pushed the uv2 branch 3 times, most recently from 710694c to 165b0fd Compare November 4, 2024 15:54
@chris48s chris48s merged commit b0db01b into master Nov 5, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants