Feature Request - Exclude Documents from Search Results #5

Open
JGtHb opened this issue Jun 9, 2024 · 1 comment

@JGtHb

JGtHb commented Jun 9, 2024

As a user, I sometimes need to crawl pages that are required to map an entire site, but the indexed documents are not relevant for searching and should be excluded from search results.

Below are two initial approach ideas:

Search Parameter Defaults

  1. In the Administration screen, a new button titled 'Default Search Params' could be added. This could link to a new page that allowed users to specify parameters enabled by default for all searches.
  2. Optionally, users could set per-user default search parameters. I work in a single-user environment and am not familiar with how multi-user setups work. Admins would need to set the default params for unauthenticated searches, if those are allowed.

Advantage: Simple implementation, allows users to quickly adjust parameters when searching if they want to temporarily add a page to search results.
Disadvantage: Time-consuming to exclude new pages from search results or to do ad-hoc page exclusions.
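
A minimal sketch of how the defaults in this approach might be layered, assuming search parameters are plain key/value pairs. The function name `effective_search_params` and the `exclude_url` parameter are hypothetical and used only for illustration; they are not taken from the project.

```python
from typing import Mapping, Optional


def effective_search_params(
    admin_defaults: Mapping[str, str],
    user_defaults: Optional[Mapping[str, str]],
    request_params: Mapping[str, str],
) -> dict[str, str]:
    """Layer the parameters: admin defaults, then per-user defaults, then the query.

    All names here are hypothetical; this only illustrates the precedence order.
    """
    merged = dict(admin_defaults)
    # Unauthenticated searches have no per-user defaults (user_defaults is None).
    if user_defaults is not None:
        merged.update(user_defaults)
    # Parameters typed at search time win, so a default exclusion can be
    # overridden for a single search.
    merged.update(request_params)
    return merged


admin = {"exclude_url": r"/sitemap/"}           # hypothetical parameter name
user = {"exclude_url": r"/(sitemap|tags)/"}
params = effective_search_params(admin, user, {"q": "install"})
```

With this ordering, a user can still override a default exclusion for a single search by passing the same parameter explicitly.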

Document-Level Setting

  1. In the Crawl Policies > Main page, a box for 'Exclude from Search Regex' could be added. In this box, admins could specify a regex that, when matched, would mark the document as 'Excluded from Search' when crawled.
  2. In the Documents > [Selected Document] > Main page, a field titled 'Excluded from Search' would be displayed with a true/false value and a button to toggle the saved value.
  3. In the Documents page, users could select existing documents and initiate an 'Exclude from Search' action. This would mark all selected documents as excluded from search, and the documents would not be returned when searching. A filter button on the right-hand side of the screen would allow users to quickly see documents included or excluded from search results for all users.
  4. Optionally, authenticated admins could see a button inline with search results (next to 'Cached') called 'Exclude from Search' to quickly remove a document from future searches.

Advantage: Allows easy removal of existing documents that have already been crawled, and configuration of future exclusions when setting up a crawl job.
Disadvantage: More time-consuming to re-add a page to search results if it was incorrectly excluded. Likely more complex to implement.
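
The document-level idea could be sketched roughly as follows, using plain Python rather than the project's actual models. Everything named here (`CrawlPolicy`, `Document`, `exclude_from_search_regex`, `excluded_from_search`, `crawl`, `search`) is hypothetical, and matching the regex against the URL is an assumption; it could equally be matched against something else.

```python
import re
from dataclasses import dataclass


@dataclass
class CrawlPolicy:
    # Hypothetical field backing the 'Exclude from Search Regex' box.
    exclude_from_search_regex: str = ""

    def marks_excluded(self, url: str) -> bool:
        # An empty regex means the policy excludes nothing.
        if not self.exclude_from_search_regex:
            return False
        # Assumption: the regex is matched against the document URL.
        return re.search(self.exclude_from_search_regex, url) is not None


@dataclass
class Document:
    url: str
    # Hypothetical flag backing the 'Excluded from Search' true/false field.
    excluded_from_search: bool = False


def crawl(url: str, policy: CrawlPolicy) -> Document:
    """Set the flag at crawl time, based on the policy regex."""
    return Document(url=url, excluded_from_search=policy.marks_excluded(url))


def search(documents: list[Document], term: str) -> list[Document]:
    """Return matching documents, skipping any that are flagged as excluded."""
    return [d for d in documents if not d.excluded_from_search and term in d.url]


policy = CrawlPolicy(exclude_from_search_regex=r"/tags/")
docs = [
    crawl("https://example.com/tags/python", policy),
    crawl("https://example.com/docs/python", policy),
]
results = search(docs, "python")

# Toggling the stored flag re-adds a document that was excluded by mistake.
docs[0].excluded_from_search = False
results_after_toggle = search(docs, "python")
```

Because the flag is stored on the document, the toggle button, the bulk action, and the inline search-result button would all just flip the same value.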

@biolds
Owner

biolds commented Jun 10, 2024

Thanks for the high-quality ticket and suggestions. I like the Document-level approach: it's feature-rich and not too hard to implement. I'll look into it when I find the time.
