feat: Public collections #2271

Open: wants to merge 12 commits into base: main


@SuaYoo (Member) commented Jan 6, 2025

Resolves #1051

Notes on the PR itself:

@SuaYoo SuaYoo force-pushed the public-collections-feature branch 2 times, most recently from 84a69c7 to ddf6fd4 Compare January 7, 2025 20:02
@SuaYoo SuaYoo marked this pull request as ready for review January 7, 2025 21:51

socket-security bot commented Jan 8, 2025

New and removed dependencies detected. Learn more about Socket for GitHub ↗︎

| Package | New capabilities | Transitives | Size | Publisher |
| --- | --- | --- | --- | --- |
| npm/@shoelace-style/[email protected] | None | 0 | 8.39 MB | claviska |

🚮 Removed packages: npm/@shoelace-style/[email protected]


    page_size: int = DEFAULT_PAGE_SIZE,
    page: int = 1,
) -> Tuple[List[PageUrlCount], int]:
    """List all URLs in collection sorted desc by snapshot count"""

Do we mean list all pages in the collection?

@tw4l (Member) commented Jan 8, 2025

It's actually URLs, not pages. The "pages" stored in our database are actually snapshots (a URL + timestamp combination specific to a crawl), so we may have multiple pages/snapshots for the same URL in a collection.
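
To make the URL vs. snapshot distinction concrete, here is a minimal sketch of grouping snapshots by URL and paginating the result sorted by snapshot count. The `Snapshot` tuple, the `list_urls_by_snapshot_count` name, and the in-memory list are illustrative assumptions; the PR's actual models and database query may look quite different.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Snapshot(NamedTuple):
    """Hypothetical stand-in for a stored "page": one URL captured at one time in one crawl."""

    url: str
    ts: str
    crawl_id: str


def list_urls_by_snapshot_count(
    snapshots: List[Snapshot], page_size: int = 10, page: int = 1
) -> Tuple[List[Tuple[str, int]], int]:
    """Return one page of (url, snapshot_count) pairs plus the number of distinct URLs."""
    counts = Counter(s.url for s in snapshots)
    # Sort by count descending, then by URL so ties come back in a stable order
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    start = (page - 1) * page_size
    return ordered[start : start + page_size], len(ordered)
```

Two snapshots of the same URL from different crawls count as one entry with a count of 2, which is why the total returned here is a number of URLs rather than a number of pages.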

raise HTTPException(status_code=400, detail="upload_failed")
raise HTTPException(
    status_code=400,
    detail="Upload failed: maximum thumbnail size (2 MB) exceeded",

It would be better to have this be an ID, so the frontend can localize it.


Yes, fair. I think we're already validating this in the frontend before submission but can write it so it's useful for API users as well.
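
As a rough illustration of the suggestion, the sketch below enforces a size limit while reading a streamed upload and reports failure through a machine-readable detail ID. `UploadError`, `read_thumbnail`, and the `max_thumbnail_size_2_mb_exceeded` ID are hypothetical names chosen for this example, and a plain exception stands in for FastAPI's `HTTPException` to keep the snippet dependency-free.

```python
from typing import Iterable

MAX_THUMBNAIL_BYTES = 2 * 1024 * 1024  # the 2 MB limit mentioned in the diff


class UploadError(Exception):
    """Hypothetical error carrying an HTTP status and a machine-readable detail ID."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def read_thumbnail(
    chunks: Iterable[bytes], max_bytes: int = MAX_THUMBNAIL_BYTES
) -> bytes:
    """Collect streamed chunks, failing as soon as the size limit is exceeded."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Illustrative ID only; the frontend would map it to a localized message
            raise UploadError(400, "max_thumbnail_size_2_mb_exceeded")
    return bytes(buf)
```

Checking the size inside the loop means an oversized upload is rejected without buffering the whole stream, which also protects API users who bypass the frontend validation.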

tw4l and others added 12 commits January 8, 2025 16:05
…2164)

Fixes #2158 

- Adds `Organization.listPublicCollections` field and API endpoint to
update it
- Replaces `Collection.isPublic` boolean with `Collection.access`
(values: `private`, `unlisted`, `public`) and adds a database migration
- Updates frontend to use `Collection.access` instead of `isPublic`,
without otherwise changing current behavior
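
A minimal sketch of what such a boolean-to-enum migration could look like on a single collection document. The `CollAccessType` name, the `migrate_collection` helper, and in particular the choice to map a legacy `isPublic: true` to `unlisted` are assumptions made for illustration; the commit message does not state which value the real migration uses.

```python
from enum import Enum
from typing import Any, Dict


class CollAccessType(str, Enum):
    """The three access values named in the commit message."""

    PRIVATE = "private"
    UNLISTED = "unlisted"
    PUBLIC = "public"


def migrate_collection(
    doc: Dict[str, Any],
    public_value: CollAccessType = CollAccessType.UNLISTED,  # assumed mapping
) -> Dict[str, Any]:
    """Return a copy of doc with `isPublic` replaced by an `access` value."""
    migrated = {key: value for key, value in doc.items() if key != "isPublic"}
    was_public = bool(doc.get("isPublic", False))
    access = public_value if was_public else CollAccessType.PRIVATE
    migrated["access"] = access.value
    return migrated
```

Documents that never had an `isPublic` field are treated as private in this sketch.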

---------

Co-authored-by: sua yoo <[email protected]>
Fixes #1051 

If the org with the provided slug doesn't exist, or no public collections
exist for that org, return the same 404 response with a detail of
"public_profile_not_found" to prevent people from using the public
endpoint to determine whether an org exists.

Endpoint is `GET /api/public-collections/<org-slug>` (no auth needed) to
avoid collisions with existing org and collection endpoints.
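
The identical-404 behavior can be sketched as follows. The `NotFound` exception, the `get_public_collections` helper, and the dictionary of orgs are hypothetical stand-ins for the real endpoint and database lookups; only the `public_profile_not_found` detail and the `access` values come from the text above.

```python
from typing import Any, Dict, List


class NotFound(Exception):
    """Hypothetical 404 error with a machine-readable detail."""

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.status_code = 404
        self.detail = detail


def get_public_collections(
    org_slug: str, orgs: Dict[str, List[Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Return an org's public collections, or raise one uniform 404."""
    collections = orgs.get(org_slug) or []
    public = [c for c in collections if c.get("access") == "public"]
    if not public:
        # Same response whether the org is missing or simply has nothing public
        raise NotFound("public_profile_not_found")
    return public
```

Because both failure cases go through the same branch, a caller cannot tell a missing org from an org with no public collections.
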
- Enables creating a public org profile page with description and
website at `/profile/<org slug>`
- Updates current "Overview" page to be "Dashboard", found under
`/dashboard`
- Organizes org "General" settings tab by "General", "Profile", and
"Developer Tools"
- Adds sign up banner to log in page for consistent CTA banners
- Updates copy and docs to support changes
- Allows user to set collection to private, public, or unlisted
- Adds route for public collection page with basic page layout
- Refactors copy button to abstract clipboard functionality
---------

Co-authored-by: Henry Wilkinson <[email protected]>
Co-authored-by: emma <[email protected]>
Fixes #2182 

This rather large PR adds the rest of what should be needed for public
collections work in the frontend.

New API endpoints include:

- Public collections endpoints: GET, streaming download
- Paginated list of URLs in collection with snapshot (page) info for
each
- Collection endpoint to set home URL
- Collection endpoint to upload thumbnail as stream
- DELETE endpoint to remove collection thumbnail

Changes to existing API endpoints include:

- Paginating public collection list results
- Several `pages` endpoints that previously only supported `/crawls/` in
their path, e.g. `/orgs/{oid}/crawls/all/pages/reAdd`, now support
`/uploads/` and `/all-crawls/` namespaces as well. This is necessitated
by adding pages for uploads to the database (see below). For
`/orgs/{oid}/namespace/all/pages/reAdd`, `crawls` or `uploads` will
serve as a filter to only affect crawls of that given type. Other
endpoints are more liberal at this point, and will perform the same
action regardless of the namespace used in the route (we'll likely want
to change this in a follow-up to be more consistent).
- `/orgs/{oid}/namespace/all/pages/reAdd` now kicks off a background job
rather than doing all of the computation in an asyncio task in the
backend container. The background job additionally updates collection
date ranges, page/size counts, and tags for each collection in the org
after pages have been (re)added.
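
The namespace-as-filter behavior described for the `reAdd` route might be sketched like this. The function names and the `crawl` / `upload` type values are assumptions for illustration, not the PR's actual identifiers.

```python
from typing import Any, Dict, List, Optional

# Route namespace -> type filter; None means "do not filter by type".
# The "crawl" / "upload" type values are assumed for this sketch.
NAMESPACE_FILTERS: Dict[str, Optional[str]] = {
    "crawls": "crawl",
    "uploads": "upload",
    "all-crawls": None,
}


def type_filter_for_namespace(namespace: str) -> Optional[str]:
    """Translate the namespace in the route into an optional type filter."""
    if namespace not in NAMESPACE_FILTERS:
        raise ValueError(f"unsupported namespace: {namespace}")
    return NAMESPACE_FILTERS[namespace]


def select_items(items: List[Dict[str, Any]], namespace: str) -> List[Dict[str, Any]]:
    """Keep only the archived items that the given namespace should affect."""
    type_filter = type_filter_for_namespace(namespace)
    if type_filter is None:
        return list(items)
    return [item for item in items if item.get("type") == type_filter]
```

Centralizing the mapping in one table would also make it easier to apply the same filter to the other `pages` endpoints in the follow-up the commit message mentions.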

Other big changes:

- New uploads will now have their pages read into the database!
Collection page counts now also include uploads
- A migration was added to start a background job for each org that will
add the pages for previously-uploaded WACZ files to the database and
update collections accordingly
- Adds a new `ImageFile` subclass of `BaseFile` for thumbnails that we
can use for other user-uploaded image files moving forward, with
separate output models for authenticated and public endpoints
- Allows user to choose collection replay home page and collection
thumbnail (resolves #2182)
- Displays collection thumbnails on org dashboard and public page
- Enables downloading public collection (resolves #2233)
- Adds caption as "Summary" to metadata dialog
- Moves description editor to "About" tab

---------

Co-authored-by: Emma Segal-Grossman <[email protected]>
- Renames `inject_analytics` to `inject_extra` and updates docs
- Manually tracks page views to enable passing custom props
- Tracks copying collection share link and downloading a public
collection

---------

Co-authored-by: emma <[email protected]>
- Refactors all page headers to use new `pageHeader`
- Removes border under org name/title in the org dashboard
- Removes share link from the dialog footer
- Removes stickied collection navigation, replaces with improved
viewport-based scaling!
- Adds a max-width for the collection description in the logged in view.
- Moves the markdown editor buttons to below the editor
- Controls are now in line with how we handle dialog options
elsewhere, which fixes a minor responsive design issue.
- Minor copy changes

---------

Co-authored-by: emma <[email protected]>
Co-authored-by: sua yoo <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Feature]: Public org collections page
4 participants