Is your feature request related to a problem? Please describe.
This is a new feature request. We would like Data Prepper to be able to crawl websites and ingest the resulting web pages into OpenSearch.
Describe the solution you'd like
Introduce a "webcrawler source" that can crawl a public website either on demand or on a schedule. It should respect the website's configuration, rate-limit its requests, and support filtering (including and excluding pages), among other controls. The content of the acquired pages should then be stored in OpenSearch for search and discovery.
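To make the requested behavior more concrete, here is a minimal Python sketch of the per-URL checks described above. It is not Data Prepper code and does not reflect any existing Data Prepper API. It assumes that "the configuration of the website" refers to `robots.txt`, and the `CrawlPolicy` class, its parameters, and the `example-crawler` user agent are hypothetical names chosen for illustration. Fetching `robots.txt` and the pages themselves is left out.

```python
import re
import time
from urllib.robotparser import RobotFileParser


class CrawlPolicy:
    """Hypothetical helper: decides whether and when a URL may be fetched."""

    def __init__(self, robots_lines, user_agent, include, exclude, min_delay_s):
        self.robots = RobotFileParser()
        # robots_lines is the robots.txt content, already split into lines.
        self.robots.parse(robots_lines)
        self.user_agent = user_agent
        self.include = [re.compile(p) for p in include]
        self.exclude = [re.compile(p) for p in exclude]
        self.min_delay_s = min_delay_s
        self._last_fetch = None

    def allowed(self, url):
        # 1. Respect the site's robots.txt rules.
        if not self.robots.can_fetch(self.user_agent, url):
            return False
        # 2. Exclude patterns take precedence over include patterns.
        if any(p.search(url) for p in self.exclude):
            return False
        # 3. An empty include list means "include everything".
        return not self.include or any(p.search(url) for p in self.include)

    def wait_turn(self, now=time.monotonic, sleep=time.sleep):
        # Simple rate limit: keep at least min_delay_s between fetches.
        if self._last_fetch is not None:
            remaining = self.min_delay_s - (now() - self._last_fetch)
            if remaining > 0:
                sleep(remaining)
        self._last_fetch = now()


policy = CrawlPolicy(
    robots_lines=["User-agent: *", "Disallow: /private/"],
    user_agent="example-crawler",   # hypothetical user agent
    include=[r"/docs/"],            # hypothetical include filter
    exclude=[r"\.pdf$"],            # hypothetical exclude filter
    min_delay_s=1.0,
)
```

A crawler loop would call `policy.allowed(url)` before queuing a page and `policy.wait_turn()` before each request. The actual source would likely expose equivalent settings through pipeline configuration rather than code.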
Describe alternatives you've considered (Optional)
Use a Selenium web crawler with a Chromium driver, then filter and enrich the content and store it in OpenSearch.
Additional context
N/A