Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP, but please review]: per-doc-access-control #4139

Closed
wants to merge 32 commits into from
Closed

Conversation

janl
Copy link
Member

@janl janl commented Aug 6, 2022

This is a 2022 rebase of #3038 with many comments on that PR addressed.

This is still a WIP

Overview

This PR introduces the long-awaited first iteration of the per-doc-access-control feature (_access) for short.

The goal of this feature is to make the db-per-user pattern obsolete and allow mutually untrusting users access to the same database.

To recap, the downsides of the db-per-user pattern are:

  • additional management is needed to create, secure and remove database(s) corresponding to a user’s lifecycle. the peruser add-on helps with this, but it is not always sufficient.
  • in order to support the very common requirement of running a query across all per-user databases, all those databases need to be replicating into a central database (or other target). This adds significant CPU and IOPS resource usage to a CouchDB.
  • in large installations, most per-user databases will be very small (<100kb) leading to an unfavourable file-descriptor to data-accessed ratio.
  • additional tuning is needed to make compaction behave safely in this scenario.
  • there’s probably more…

An _access-enabled database has these properties.

  • each user can only read and write their own documents
  • each user gets _changes and _all_docs responses scoped to only their own documents
  • admins can still access all documents, and create views across all documents for analysis

In future iterations, the _access feature could allow documents to be owned by multiple users, and groups, plus differentiation of readonly and rewrite permissions for each, and per-user-views, but that’s TBD for the scope of this initial PR, however provisions have been made to make this possible.

Implementation Notes

The fundamental addition to the CouchDB API is threefold.

First, a database can be access-enabled at creation time. It is not possible to make an existing database access-capable. You can create a an access-enabled database with PUT /db?access=true (final option name to be bikeshet).

Second, the introduction of a new top-level document property _access: ["username"]. The current implementation requires non-admins to write docs that have this property. Docs without it are rejected (unless an admin writes them). The username in the first array element MUST match the user_ctx of any document CRUD request.

Third, for access-enabled databases, both _changes and _all_docs now include a switch:

  1. admin users go straight to the existing by-seq and by-id indexes, just as before.
  2. non-admin users are directed to a new internal view that includes two sections, one corresponding to by-seq and one to by-id, but with the username from the _access property as a prefix. These indexes then are queried with the user property from the request’s user_ctx as hardcoded startkey/endkey.

This new internal view is implemented by way of a new query server that is somewhat modelled after the mango query server (reusing couch_index/couch_mrview as much as possible).

The consequence here is that a non-access-enabled databases should behave no different than before in all aspects API and operational.

Each user also now gets a new internal role _user automatically appended to their list of roles to simplify access control setup. If a databases has this role in its _security object, it means: each authenticated user can access the database.

The replicator has been expanded to create replication checkpoints with an _access property as well. External replication clients like PouchDB will have to be updated accordingly.

I tried to make the PR easy to follow with logical commits to each section of CouchDB + some cleanup at the end. I suggest once satisfactory, this should be squashed into a single commit with an updated version of this PR text as the commit message.

The access feature can be globally disabled by server config.

This feature should work well with partitioned databases (in fact the combination should be a great benefit), but this has not yet been verified.

Implementation State

This PR comes with extensive tests covering all desired behaviour and it works and passes all tests.

There are a few cosmetic things to be discussed (all are already comments to this PR).

There is a performance regression in this PR that still has to be investigated, but it might be addressed by changing the current PR-behaviour in one detail. See this comment for more details about handling conflicted docs.

Future Work

Depending on how hard it is to do this correctly, we may or may not include expanding the _access property. It is designed to hold a list of users and groups that have access to a document. We will have to think through the consequences a bit more, but it might be that the only two things that are needed are:

  1. expand the access query server to index the whole _access property and add a row for the doc for each entry
  2. expand the doc CRUD validation to look at all entries in the whole _access property.

But if this proves to complicated, I’m okay with merging this PR with out multi-user and group support.

We might also want to add readonly and readwrite tags to _access entries for even more fine-grained access control.

This PR does not yet include “users can create views on their share of the database”, but I’d like to add this feature eventually.


There is a corresponding RFC that still needs updating and has comments that have not been addressed yet. My goal is to produce and up-to-date RFC by the time this PR is ready to merge.

Next Steps

  1. I’d like a wieder review of this to see if there are any obvious places that need addressing.
  2. I’d like to nail down the remaining design question about conflicted docs.
  3. I’d like to find the performance regression.

@janl janl self-assigned this Aug 6, 2022
@janl
Copy link
Member Author

janl commented Aug 6, 2022

Original comment

This is great, gratz!

One question: it’s not clear to me how /_security response will look for buckets having access restrictions, is there a special field for those restrictions? I mean I want to explicitly mark buckets of the kind in Photon, so how can I detect _access restricted buckets reading /_security endpoint?

RFC states admins can grant individual users and groups access to a database using the database’s _security object, no details.

#3038 (comment)

case chttpd:qs_value(Req, "rev") of
% fetch the old doc revision, so we can compare access control
% in send_update_doc() later.
Doc0 = couch_doc_open(Db, DocId, nil, [{user_ctx, Req#httpd.user_ctx}]),
Copy link
Member Author

@janl janl Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original Comment:

If this fails (due to access restrictions) how does the 403 bubble back up to the user? I followed chttpd_db:couch_doc_open/4 to fabric:open_doc/3 to fabric_doc_open:go/3 through to couch_db:get_doc_info/2 but I couldn't work out where the access restriction is enforced.

I presume we end up at some point in couch_db:validate_access or check_access which throws a {forbidden, "something"} but I couldn't see how this translates into a 403.

#3038 (comment)

src/couch/src/couch_db.erl Outdated Show resolved Hide resolved
% to proceed normally, only if they disagree should this become admin-only
case is_admin(Db) of
true -> ok;
_Else2 -> throw({forbidden, <<"document is in conflict">>})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original comment #3038 (comment)

why can't users see conflicted docs in "access" enabled dbs?

Original answer:

I might have an implementation gap here. The intention is: if leaves disagree on _access, an admin must resolve before a user can proceed. But if leaves agree on _access, a user should just normally be able to use this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on further inspection: we should make it so that a user can’t produce a conflicted doc with different _access in the first place. Only an admin could do it, but then they are on the hook for fixing it themselves and we shouldn’t worry. Am I missing something?

Copy link
Member Author

@janl janl Aug 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, I think this was meant as a safeguard for a cluster in split-brain, where two users create the same doc _id with an _access field set to them. On cluster reconciliation, the doc will be in conflict and have one of the users as the winning rev. This could mean that someone write a doc into the db and it becomes unaccessible later if retrieved only with the _id. We could make it that a retrieval with _id/_rev could be allowed for the other user, but it wouldn’t help with _changes and _all_docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to define what a conflicted _access document means. There are a few options;

  1. the winning revision is the only one we considered. (side-effect is you lose access if a winner appears via replication / split-brain, as you noted)
  2. we block all non-admin access until conflicts are resolved.
  3. access is the combination of all the conflicts.

There's good and bad in each of those. Of the three listed, 1 seems least surprising / most reasonable.

Roles = couch_util:get_value(<<"roles">>, UserProps, []),
case lists:member(<<"_admin">>, Roles) of
true -> Roles;
_ -> [<<"_users">> | Roles]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnewson I switched to this from Roles ++ <<"_users">> by your request, but this prepends the new role to the list. I’d rather append it as not to break any existing scripts (I had to update a few tests to match for this change already). Would you be okay going back to ++ or is there a better option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi. going back to ++ is fine if we're confident the list stays quite short, which seems highly likely.

@janl janl mentioned this pull request Aug 6, 2022
@janl janl force-pushed the feat/access-2022 branch from 6c256d5 to c4d4373 Compare August 20, 2022 09:23
@janl janl changed the title WIP: feat/access 2022 rebase WIP: per-access-control Aug 20, 2022
@janl janl changed the title WIP: per-access-control [WIP, but please review]: per-doc-access-control Aug 20, 2022
@janl
Copy link
Member Author

janl commented Nov 11, 2022

From the PR description:

Each user also now gets a new internal role _user automatically appended to their list of roles to simplify access control setup. If a databases has this role in its _security object, it means: each authenticated user can access the database.

This needs rethinking/clarifying: We still would want to allow two databases to have a non-overlapping set of users (as per _users) to access one but not the other database

@janl
Copy link
Member Author

janl commented Jul 12, 2023

superseded by #4673 — will port remaining relevant comments over

@janl janl closed this Jul 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants