Feature Request - Exclude Documents from Search Results #5

JGtHb · 2024-06-09T20:33:50Z

As a user, I sometimes need to crawl pages that are required to map an entire site, but the indexed documents are not relevant for searching and should be excluded from search results.

Below are two initial ideas for an approach:

Search Parameter Defaults

In the Administration screen, a new button titled 'Default Search Params' could be added. This could link to a new page that allows users to specify parameters enabled by default for all searches.
Optionally, users could set per-user default search parameters. I work in a single-user environment and am not familiar with how multi-user setups work. Admins would need to set the default params for unauthenticated searches, if those are allowed.

Advantage: Simple to implement, and users can quickly adjust parameters when searching if they want to temporarily add a page to search results.
Disadvantage: Time-consuming to exclude new pages from search results or to do ad-hoc page exclusions.
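
To illustrate the layering described above, here is a minimal sketch of how defaults might be combined with the parameters of a single search. It is not taken from the project's code: the function name `effective_params` and the parameter keys (`exclude_url_regex`, `lang`) are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical admin-configured defaults; the key names are illustrative only.
ADMIN_DEFAULTS = {"exclude_url_regex": r"/tag/", "lang": "en"}


def effective_params(
    request_params: dict,
    user_defaults: Optional[dict] = None,
    admin_defaults: Optional[dict] = None,
) -> dict:
    """Layer parameters: admin defaults < per-user defaults < this request."""
    merged = dict(ADMIN_DEFAULTS if admin_defaults is None else admin_defaults)
    merged.update(user_defaults or {})  # per-user defaults override admin ones
    merged.update(request_params)       # explicit request values win
    return merged


# An unauthenticated search simply gets the admin defaults.
anonymous = effective_params({})

# A user temporarily re-includes excluded pages by overriding the default.
temporary = effective_params({"exclude_url_regex": ""})
```

Because the request values are applied last, a user could drop a default exclusion for a single search without changing the saved defaults.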

Document-Level Setting

In the Crawl Policies > Main page, a box for 'Exclude from Search Regex' could be added. In this box, admins could specify a regex that, when matched, would mark the document as 'Excluded from Search' when crawled.
In the Documents > [Selected Document] > Main page, a field titled 'Excluded from Search' would be displayed with a true/false value and a button to toggle the saved value.
In the Documents page, users could select existing documents and initiate an 'Exclude from Search' action. This would mark all selected documents as excluded from search, and the documents would not be returned when searching. A filter button on the right-hand side of the screen would allow users to quickly see documents included or excluded from search results for all users.
Optionally, authenticated admins could see a button inline with search results (next to 'Cached') called 'Exclude from Search' to quickly remove a document from future searches.

Advantage: Allows easy removal of existing documents that have already been crawled, and configuration of future exclusions when setting up a crawl job.
Disadvantage: More time-consuming to re-add a page to search results if it was incorrectly excluded. Likely more complex to implement.
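
As a rough sketch of the document-level idea, the flow could look something like the following. This is only an illustration under stated assumptions: the class and field names (`Document`, `CrawlPolicy`, `excluded_from_search`, `exclude_from_search_regex`) are hypothetical, and the regex is assumed to be matched against the document URL, which the proposal above does not specify.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Document:
    url: str
    excluded_from_search: bool = False  # hypothetical per-document flag


@dataclass
class CrawlPolicy:
    # Hypothetical field backing the proposed 'Exclude from Search Regex' box.
    exclude_from_search_regex: Optional[str] = None


def mark_on_crawl(doc: Document, policy: CrawlPolicy) -> Document:
    """Set the flag at crawl time when the policy regex matches the URL.

    A non-matching crawl leaves the flag untouched, so a manual toggle
    is not silently reset by a later recrawl.
    """
    pattern = policy.exclude_from_search_regex
    if pattern and re.search(pattern, doc.url):
        doc.excluded_from_search = True
    return doc


def toggle_exclusion(doc: Document) -> Document:
    """Flip the saved value, as the proposed toggle button would."""
    doc.excluded_from_search = not doc.excluded_from_search
    return doc


def search_results(docs: Iterable[Document], include_excluded: bool = False) -> List[Document]:
    """Return documents visible in search; excluded ones are hidden by default."""
    return [d for d in docs if include_excluded or not d.excluded_from_search]


policy = CrawlPolicy(exclude_from_search_regex=r"/tag/")
docs = [
    mark_on_crawl(Document(url), policy)
    for url in ("https://example.com/tag/a", "https://example.com/post/1")
]
visible = search_results(docs)
```

In this sketch the crawl policy only ever sets the flag, so re-adding an incorrectly excluded page would go through `toggle_exclusion`, which mirrors the disadvantage noted above.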

biolds · 2024-06-10T18:39:48Z

Thanks for the high-quality ticket and suggestions. I like the Document-level approach: it's feature-rich and not too hard to implement. I'll look into it when I find the time.

