Search test #4471

Open
wants to merge 123 commits into
base: development
0192e32
Add Elasticsearch config/gem, init rake task
worleydl Sep 10, 2024
a1dc8df
Cleanup logging, add es_reset task
worleydl Sep 10, 2024
2d109ce
Cleanup env config, move ES client to lib module
worleydl Sep 13, 2024
29dfc2c
Export method
worleydl Sep 13, 2024
7ffd45b
es_reindex task
worleydl Sep 13, 2024
3b2feeb
Cleanup id field management for bulk
worleydl Sep 13, 2024
fd7c919
collection/user/work indexing.
worleydl Sep 17, 2024
4fc54b5
Add collection_id/is_public to page schema
worleydl Sep 23, 2024
5a36647
Handle null collections
worleydl Sep 23, 2024
2bd568f
More null collection
worleydl Sep 23, 2024
85608b6
Multithreaded indexing
worleydl Sep 23, 2024
edde6ce
ES search impl for collection level search
worleydl Sep 24, 2024
2ef33ef
Cleanup query
worleydl Sep 24, 2024
53eae57
ES pagination
worleydl Sep 24, 2024
94aeaa8
Cleanup manual offset code
worleydl Sep 24, 2024
c675f38
Map text operators to symbol versions for SQS
worleydl Sep 24, 2024
6ebaf88
Fix phrase queries
worleydl Sep 24, 2024
b6346d3
Cleanup SQS replacements
worleydl Sep 24, 2024
2c10d61
Work level query
worleydl Nov 12, 2024
133b034
Populate page::work_id for elastic
worleydl Nov 12, 2024
7aa7693
Add docset to schema/indexer
worleydl Nov 15, 2024
85352a6
ES backing for findaproject
worleydl Nov 15, 2024
7a44c5f
Trying out partial setup
worleydl Dec 9, 2024
529254c
Tap into tabs style
worleydl Dec 9, 2024
c310c25
Building out logic for new serp
worleydl Dec 9, 2024
aaafddb
Building out object inflation
worleydl Dec 9, 2024
8be5b5a
Fix index filters for cross collection search
worleydl Dec 9, 2024
b5bda81
Early search result rendering
worleydl Dec 9, 2024
25e754c
Fix issue with inflation
worleydl Dec 9, 2024
be3404c
Dynamic tabs/page count
worleydl Dec 9, 2024
d68631a
General weighting for federated searches
worleydl Dec 9, 2024
1b7173b
html safe render
worleydl Dec 9, 2024
89b8159
Pagination test
worleydl Dec 9, 2024
218ed15
Hookup pagination
worleydl Dec 9, 2024
4c823d7
Link dev
worleydl Dec 10, 2024
4c965bd
Building out filters
worleydl Dec 10, 2024
e8251b0
Show active tab for filter
worleydl Dec 10, 2024
f860a14
Fix all filter count
worleydl Dec 10, 2024
9bb57b9
Fix issues with initial state
worleydl Dec 10, 2024
283fc14
Cap pagination at 10k
worleydl Dec 16, 2024
0a9cc66
Sanitize instead of using raw HTML for snippets
worleydl Dec 16, 2024
8f69e0b
Modify user indexing: only owners and no staff users
worleydl Dec 16, 2024
9f58a09
Link up user results
worleydl Dec 16, 2024
94f58b1
Link up collection results
worleydl Dec 16, 2024
a1150bc
Link up work results
worleydl Dec 16, 2024
175b916
Link up page results
worleydl Dec 16, 2024
4c231cd
Setup page partial for search results
worleydl Dec 16, 2024
1128280
Simplify page partial
worleydl Dec 16, 2024
5210fb4
Tap into existing highlighting code
worleydl Dec 16, 2024
7db9b9f
Setup work partial
worleydl Dec 16, 2024
698076d
WIP elastic translations
worleydl Dec 16, 2024
b93b71c
Cleanup escaped collection footer
worleydl Dec 17, 2024
f8617d1
Fix table issue with page partial
worleydl Dec 17, 2024
a29e4c3
Exclude works missing a collection from indexing
worleydl Dec 17, 2024
de2d604
Use to_snippet for user/collection snippets
worleydl Dec 17, 2024
4b21534
Fold docsets into collections for index/search
worleydl Jan 5, 2025
8c9daf1
Hookup docset partial render
worleydl Jan 5, 2025
1be890d
Early collection access
worleydl Jan 5, 2025
1298c89
Expand collection access
worleydl Jan 5, 2025
8dd1e29
Add docsets to collection filter
worleydl Jan 5, 2025
7d73355
Expand works indexing in prep for access queries
worleydl Jan 5, 2025
acae346
Setup filtering for work type
worleydl Jan 5, 2025
5563fa7
Prep page schema for filters
worleydl Jan 5, 2025
e083237
Update page filters
worleydl Jan 6, 2025
9459fcb
Add Elastic delta concern and attach to required models
worleydl Jan 6, 2025
b4edc0b
Cleanup lingering issues with work model
worleydl Jan 6, 2025
879eeb5
Fix delta delete
worleydl Jan 6, 2025
9958f0d
Don't setup hooks if elastic isn't enabled
worleydl Jan 6, 2025
dcf0c18
all of Dan's changes for ElasticSearch
saracarl Jan 7, 2025
f0b4e41
gemfile update
saracarl Jan 7, 2025
ed90c47
Improve layout of tabbed search results
worleydl Jan 7, 2025
88222f7
No results template
worleydl Jan 13, 2025
08be6b5
Exception handling for findaproject query
worleydl Jan 13, 2025
9df1786
Translations cleanup
worleydl Jan 13, 2025
2e8ebaa
Reuse query generation
worleydl Jan 13, 2025
c6944d1
Add blocked collections to filters
worleydl Jan 13, 2025
e421437
Cleanup dead code
worleydl Jan 14, 2025
3d434dc
Consistent partials in main search results
worleydl Jan 14, 2025
755d63d
Merge remote-tracking branch 'dans-work/development' into search_test
saracarl Jan 16, 2025
7e32795
Merge branch 'development' into search_test
saracarl Jan 16, 2025
5dbe79d
Prep for delta permission updates
worleydl Jan 16, 2025
da38a18
Early version of permissions indexer
worleydl Jan 16, 2025
0b4758f
Setup delete by query optimization for bulk page deletes
worleydl Jan 16, 2025
734324c
Finish up permission delta indexer
worleydl Jan 16, 2025
baf1db3
Must reindex works on coll/docset permission changes
worleydl Jan 16, 2025
05583d1
added elasticsearch query logging
saracarl Jan 16, 2025
08e9b2e
Add ES query logging
benwbrum Jan 16, 2025
54ef4ce
ES: Setup whitespace analyzer for searchable metadata on works
worleydl Jan 22, 2025
27038e9
ES: Basic query intent for ID searches in work metadata
worleydl Jan 22, 2025
968c5ca
ES: Add more weight to phrase matches on pages
worleydl Jan 22, 2025
be6c9b5
ES: Apply ES relevance to search_attempt searches
worleydl Jan 22, 2025
7c0f169
ES: Fold Page updates into delta indexer
worleydl Jan 22, 2025
0dbc037
Cleanup work partial title in findaproject results
worleydl Jan 22, 2025
84fa958
Make page results consistent with other types
worleydl Jan 22, 2025
71a0180
Add work breadcrumb on page results
worleydl Jan 22, 2025
9368a01
Prep for org level search
worleydl Jan 22, 2025
5d05e88
Org search
worleydl Jan 23, 2025
149b733
Fix org search type filters
worleydl Jan 23, 2025
45f2ca6
Route nested searches to tabbed results
worleydl Jan 23, 2025
4aa5932
Update search heading for different types
worleydl Jan 23, 2025
2674804
Improved breadcrumbs for Pages/Works
worleydl Jan 27, 2025
97f9388
Schema tweaks for improved identifier search
worleydl Jan 27, 2025
beb8539
Use work style on page partial for consistent margins
worleydl Jan 27, 2025
97a50e6
Handle exceptions for client facing ES requests
worleydl Jan 27, 2025
42075d5
Basic exception handling for sync task
worleydl Jan 27, 2025
35dc0de
Smaller breadcrumbs, moved to footer.
worleydl Feb 3, 2025
353b028
Drop recent activity, sticky search modes
worleydl Feb 3, 2025
03e3a80
Improvements to identifier search
worleydl Feb 3, 2025
01fd96c
Lock search results to viewport width, allow side scroll
worleydl Feb 3, 2025
e58c1c1
Don't add params to searchbox if not set
worleydl Feb 3, 2025
753f42e
Remove double padding from page results, consistent breadcrumbs
worleydl Feb 3, 2025
efa047f
Cleanup phrase query generator
worleydl Feb 4, 2025
f4288ca
WIP alias management
worleydl Feb 3, 2025
a89ff7d
Build out environment support
worleydl Feb 3, 2025
156f43a
Fix index boosts
worleydl Feb 3, 2025
98bc106
Merge dev
worleydl Feb 4, 2025
05ecdf0
Rollover fixes
worleydl Feb 4, 2025
a229e0b
merge dev in
saracarl Feb 5, 2025
2dbd3a8
merging dev and dans code
saracarl Feb 5, 2025
194633a
more dans code
saracarl Feb 6, 2025
d9c11d0
Checkpoint
benwbrum Feb 6, 2025
6903932
Checkpoint on search integration -- work search
benwbrum Feb 6, 2025
40e8624
Document sets and collection search work
benwbrum Feb 7, 2025
3 changes: 3 additions & 0 deletions Gemfile
@@ -150,3 +150,6 @@ gem 'open3'
gem 'clipboard-rails'

gem 'ajax-datatables-rails', '~> 1.0.0'

# Elasticsearch client
gem 'elasticsearch', '8.15.0'
9 changes: 9 additions & 0 deletions Gemfile.lock
@@ -225,6 +225,14 @@ GEM
activesupport (>= 4)
edtf (>= 2.3, < 4)
roman (~> 0.2.0)
elastic-transport (8.3.5)
faraday (< 3)
multi_json
elasticsearch (8.15.0)
elastic-transport (~> 8.3)
elasticsearch-api (= 8.15.0)
elasticsearch-api (8.15.0)
multi_json
errbase (0.2.2)
erubi (1.12.0)
event_stream_parser (1.0.0)
@@ -698,6 +706,7 @@ DEPENDENCIES
easy_translate
edtf
edtf-humanize
elasticsearch (= 8.15.0)
factory_bot_rails
flamegraph
forty_facets
6 changes: 5 additions & 1 deletion app/assets/stylesheets/components/shared.scss
@@ -378,6 +378,10 @@
margin: 0 0 $gapSize / 2 0;
font-size: $fontSizeStrong;
font-family: $fontFamilyHead;
&.smaller {
font-size: $fontSizeSmall;
margin: 10px 0 0 0;
}
li {
margin: 0;
display: inline-block;
@@ -1440,4 +1444,4 @@ button, .button, .dropdown dd a {
color: $fgHover;
}
}
}
}
3 changes: 3 additions & 0 deletions app/assets/stylesheets/sections/collection.scss
@@ -128,6 +128,9 @@
}
.progress { max-width: 300px; }
}
&.search_result {
max-width: 90vw;
}
}

.hidden {
6 changes: 6 additions & 0 deletions app/assets/stylesheets/sections/work.scss
@@ -46,6 +46,9 @@
padding: $gapSize 0;
display: inline-block;
border-bottom: 1px dotted $borderColor;
&.search_result {
padding: 0;
}
&_thumbnail {
float: left;
margin-right: $gapSize;
@@ -71,6 +74,9 @@
padding-right: 0.5em;
padding-bottom: 0.10em;
padding-left: 0.5em;
&.search_result{
overflow-x: auto;
}
}
&_text p {
margin: 0.3em 0;
62 changes: 61 additions & 1 deletion app/controllers/collection_controller.rb
@@ -3,6 +3,7 @@ class CollectionController < ApplicationController
  include ContributorHelper
  include AddWorkHelper
  include CollectionHelper
  include ElasticSearchable

  DEFAULT_WORKS_PER_PAGE = 15

@@ -139,12 +140,65 @@ def update_buttons

  end

  def search # ElasticSearch version
    search_page = (search_params[:page] || 1).to_i
    @term = search_params[:term]

    page_size = 10

    if @collection.is_a?(Collection)
      query_config = {
        type: 'collection',
        coll_id: @collection.id
      }
      @collection_filter = @collection
    else
      query_config = {
        type: 'docset',
        docset_id: @collection.id
      }
      @docset_filter = @collection
    end

    search_data = elastic_search_results(
      @term,
      search_page,
      page_size,
      search_params[:filter],
      query_config
    )

    if search_data
      inflated_results = search_data[:inflated]
      @full_count = search_data[:full_count] # Used by All tab
      @type_counts = search_data[:type_counts]

      # Used for pagination, currently capped at 10k
      #
      # TODO: ES requires a scroll/search_after query for result sets larger
      #       than 10k.
      #
      #       To setup support we just need to add a composite tiebreaker field
      #       to the schemas
      @filtered_count = [ 10000, search_data[:filtered_count] ].min

      # Inspired by display controller search
      @search_string = "\"#{@term || ""}\""
      @search_results = WillPaginate::Collection.create(
        search_page,
        page_size,
        @filtered_count) do |pager|
        pager.replace(inflated_results)
      end
    end
  end

  def facets
    collection = Collection.find(params[:collection_id])
    @metadata_coverages = collection.metadata_coverages
  end

-  def search
+  def facet_search
    mc = @collection.metadata_coverages.where(key: params['facet_search']['label']).first
    first_year = params['facet_search']['date'].split.first.to_i
    last_year = params['facet_search']['date'].split.last.to_i
@@ -819,4 +873,10 @@ def filtered_works
    works_scope = works_scope.distinct.paginate(page: params[:page], per_page: DEFAULT_WORKS_PER_PAGE)
    @works = works_scope
  end

  def search_params
    params.permit(:term, :page, :filter, :collection_id, :user_id)
  end


end
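
Note on the 10k TODO in `search`: the sketch below shows roughly what a `search_after` request body could look like once a tiebreaker field exists. It is only an illustration. `build_search_after_body`, the `tiebreaker` field name, and the sample sort values are hypothetical and not part of this PR; `sort`, `size`, and `search_after` are the standard Elasticsearch search body keys.

```ruby
# Hypothetical helper (not in this PR): builds a search body that pages with
# search_after instead of from/size. The `tiebreaker` field name is an
# assumption standing in for the composite tiebreaker field the TODO mentions.
def build_search_after_body(query, page_size, last_sort = nil)
  body = {
    query: query,
    size: page_size,
    # Sort by relevance first, then by a unique field so the order is stable.
    sort: [
      { _score: 'desc' },
      { tiebreaker: 'asc' }
    ]
  }
  # last_sort is the `sort` array returned with the final hit of the previous page.
  body[:search_after] = last_sort if last_sort
  body
end

first_page = build_search_after_body({ match_all: {} }, 10)
next_page  = build_search_after_body({ match_all: {} }, 10, [1.0, 'page-42'])
```

Because `search_after` only moves forward from the last hit, it suits "next page" navigation better than jumping to an arbitrary page number, so the numbered pager built with `WillPaginate::Collection` would likely need to change as well.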
83 changes: 83 additions & 0 deletions app/controllers/concerns/elastic_searchable.rb
@@ -0,0 +1,83 @@
module ElasticSearchable
  require 'elastic_util'
  extend ActiveSupport::Concern

  def elastic_search_results(query, page, page_size, filter, query_config)
    return nil if query.nil?

    search_types = ['collection', 'page', 'user', 'work']
    # Narrow down types based on query_config
    if query_config.present?
      case query_config[:type]
      when "org"
        search_types = ['collection', 'page', 'work']
      when "collection", "docset"
        search_types = ['page', 'work']
      when "work"
        search_types = ['page']
      end
    end

    if filter
      count_query = ElasticUtil.gen_query(
        current_user,
        query,
        search_types,
        query_config,
        page, page_size, true
      )

      # Need to run a count query for all types
      # TODO: Could use msearch for one call to ES
      resp = ElasticUtil.safe_search(
        index: count_query[:indexes],
        body: count_query[:query_body]
      )

      # No real inflation happens here but we get counts back
      inflated_resp = ElasticUtil.inflate_response(resp)

      full_count = inflated_resp[:full_count]
      type_counts = inflated_resp[:type_counts]

      filtered_query = ElasticUtil.gen_query(
        current_user,
        query,
        [filter],
        query_config,
        page, page_size
      )

      filtered_resp = ElasticUtil.safe_search(
        index: filtered_query[:indexes],
        body: filtered_query[:query_body]
      )

      # Actual object inflation for the filtered set
      inflated_resp = ElasticUtil.inflate_response(filtered_resp)

      # Blend all/filtered for display
      return {
        inflated: inflated_resp[:inflated],
        full_count: full_count,
        filtered_count: inflated_resp[:filtered_count],
        type_counts: type_counts
      }
    else
      generated_query = ElasticUtil.gen_query(
        current_user,
        query,
        ['collection', 'page', 'user', 'work'],
        query_config,
        page, page_size
      )

      resp = ElasticUtil.safe_search(
        index: generated_query[:indexes],
        body: generated_query[:query_body]
      )

      return ElasticUtil.inflate_response(resp)
    end
  end
end
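
Note on the `msearch` TODO in the filter branch: the count query and the filtered query could probably be sent in one round trip. The sketch below only builds the request body and assumes that `ElasticUtil.gen_query` returns a hash with `:indexes` and `:query_body`, as the calls in this concern suggest. `build_msearch_body` and the sample index names are hypothetical.

```ruby
# Hypothetical helper (not in this PR): flattens generated queries into the
# alternating header/body list that the Elasticsearch multi-search API expects.
def build_msearch_body(generated_queries)
  generated_queries.flat_map do |generated|
    [
      { index: generated[:indexes] }, # header entry: which indexes to search
      generated[:query_body]          # body entry: the search request itself
    ]
  end
end

# Stand-ins for the hashes ElasticUtil.gen_query appears to return;
# the index names here are made up for the example.
count_query    = { indexes: ['pages', 'works'], query_body: { size: 0 } }
filtered_query = { indexes: ['pages'], query_body: { size: 10 } }

msearch_body = build_msearch_body([count_query, filtered_query])
```

The resulting array could then be passed as `body:` to the client's `msearch` call. The response carries one entry per request, in order, so the counts would come from the first entry and the inflated hits from the second. Whether `ElasticUtil.safe_search` can wrap that call is not shown in this diff.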