docs: restructure documentation (#243)
Co-authored-by: Pierre Slamich <[email protected]>
alexgarel and teolemon authored Sep 27, 2024
1 parent 51a2450 commit 952a839
Showing 28 changed files with 494 additions and 265 deletions.
8 changes: 8 additions & 0 deletions .env
@@ -3,11 +3,16 @@ COMPOSE_PROJECT_NAME=search
# unify separator with windows style
COMPOSE_PATH_SEPARATOR=;
# dev is default target
# in prod, you should use docker-compose.yml;docker/prod.yml;docker/monitor.yml
COMPOSE_FILE=docker-compose.yml;docker/dev.yml

# Version of Elastic products
STACK_VERSION=8.3.3

# Set TAG to sha-<commit sha> of the version you want to use
# if you want to use a docker image from our repository
# TAG=sha-

# Set the cluster name
CLUSTER_NAME=docker-cluster

@@ -32,12 +37,14 @@ ES_EXPOSE=127.0.0.1:9200
NGINX_BASIC_AUTH_USER_PASSWD=

# by default on dev desktop, no restart
# set to always for production
RESTART_POLICY=no

# Increase or decrease based on the available host memory (in bytes)
# 1GB works well, 2GB and above leads to lower latency
MEM_LIMIT=4294967296

# This is the name of a network possibly shared with other containers
# on dev connect to the same network as off-server
COMMON_NET_NAME=po_default

@@ -51,4 +58,5 @@ LOG_LEVEL=DEBUG
# This envvar is **required**
CONFIG_PATH=


ALLOWED_ORIGINS='http://localhost,http://127.0.0.1,https://*.openfoodfacts.org,https://*.openfoodfacts.net'
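`ALLOWED_ORIGINS` holds a comma-separated list of origins, some with a `*` wildcard in the host. As an illustration only, here is a minimal Python sketch of how such a value could be parsed and matched against a request's `Origin` header with `fnmatch`; `parse_allowed_origins` and `origin_is_allowed` are hypothetical helpers, not the application's own CORS handling.

```python
from fnmatch import fnmatchcase

# Value copied from .env above; the helpers below are hypothetical,
# not the app's actual CORS implementation.
ALLOWED_ORIGINS = (
    "http://localhost,http://127.0.0.1,"
    "https://*.openfoodfacts.org,https://*.openfoodfacts.net"
)


def parse_allowed_origins(value: str) -> list[str]:
    """Split the comma-separated value, dropping blanks and stray quotes."""
    items = (item.strip().strip("'\"") for item in value.split(","))
    return [item for item in items if item]


def origin_is_allowed(origin: str, patterns: list[str]) -> bool:
    """Return True if the origin matches one of the (possibly wildcard) patterns."""
    return any(fnmatchcase(origin, pattern) for pattern in patterns)


patterns = parse_allowed_origins(ALLOWED_ORIGINS)
print(origin_is_allowed("https://world.openfoodfacts.org", patterns))
print(origin_is_allowed("https://example.com", patterns))
```

Note that `fnmatch`'s `*` also matches `.` and `/`, so `https://*.openfoodfacts.org` would accept a crafted value such as `https://evil.example/x.openfoodfacts.org`; a real implementation should use a stricter, host-anchored check.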
22 changes: 7 additions & 15 deletions .github/labeler.yml
@@ -3,27 +3,19 @@
# Pull requests that update GitHub Actions code. If you navigate to the folder, you will have a README of what it does
📚 Documentation:
- changed-files:
- any-glob-to-any-file: 'docs/how-to-install.md'
- any-glob-to-any-file: 'docs/sphinx/_static/.empty'
- any-glob-to-any-file: 'docs/sphinx/_templates/.empty'
- any-glob-to-any-file: 'docs/sphinx/api.rst'
- any-glob-to-any-file: 'docs/sphinx/cli.rst'
- any-glob-to-any-file: 'docs/sphinx/conf.py'
- any-glob-to-any-file: 'docs/sphinx/config.rst'
- any-glob-to-any-file: 'docs/sphinx/index.rst'
- any-glob-to-any-file: 'docs/sphinx/misc.rst'
- any-glob-to-any-file: 'docs/sphinx/searching.rst'
- any-glob-to-any-file: 'docs/sphinx/types.rst'
- any-glob-to-any-file: 'docs/users/ref-python.md'

- any-glob-to-any-file:
- 'docs/**/*'
- 'docs/*'
API:
- changed-files:
- any-glob-to-any-file: 'docs/sphinx/api.rst'
- any-glob-to-any-file:
- 'docs/sphinx/api.rst'
- 'app/api.py'

Build Scripts:
- changed-files:
- any-glob-to-any-file: 'scripts/Dockerfile.sphinx'
- any-glob-to-any-file: 'scripts/build_sphinx.sh'
- any-glob-to-any-file: 'scripts/build_*.sh'
- any-glob-to-any-file: 'scripts/generate_doc.sh'
- any-glob-to-any-file: 'scripts/sphinx/Makefile'
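
To preview which labels a set of changed files would receive, the globs above can be approximated in Python with `fnmatch`. This is only a rough sketch: `LABEL_GLOBS` copies a subset of the patterns by hand, `labels_for` is a hypothetical helper, and `fnmatch`'s `*` also matches `/`, unlike the minimatch-style globs that `actions/labeler` uses, so edge cases can differ.

```python
from fnmatch import fnmatchcase

# Subset of the patterns from .github/labeler.yml, copied by hand (illustrative).
LABEL_GLOBS = {
    "📚 Documentation": ["docs/**/*", "docs/*"],
    "API": ["docs/sphinx/api.rst", "app/api.py"],
    "Build Scripts": [
        "scripts/Dockerfile.sphinx",
        "scripts/build_*.sh",
        "scripts/generate_doc.sh",
        "scripts/sphinx/Makefile",
    ],
}


def labels_for(changed_files: list[str]) -> set[str]:
    """Approximate the labels a PR touching these files would get.

    fnmatch's "*" also matches "/", unlike minimatch, so this is only a preview.
    """
    return {
        label
        for label, globs in LABEL_GLOBS.items()
        if any(fnmatchcase(path, glob) for path in changed_files for glob in globs)
    }


print(sorted(labels_for(["app/api.py", "docs/users/tutorial.md"])))
```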

3 changes: 0 additions & 3 deletions .gitignore
@@ -102,9 +102,6 @@ venv.bak/
# mkdocs documentation
/gh_pages

# github pages
gh_pages/

# mypy
.mypy_cache/
.dmypy.json
152 changes: 8 additions & 144 deletions README.md
@@ -1,159 +1,23 @@
# ![Search-a-licious](./assets/RVB_HORIZONTAL_WHITE_BG_SEARCH-A-LICIOUS-50.png "Search-a-licious logo")

Search-a-licious unlocks the full potential of large data collections by transforming them into easily searchable content. Users can quickly and efficiently find exactly what they need.

**NOTE:** This is a prototype that is evolving heavily to become more generic and more robust, and to offer many more features.
With powerful text queries, facet exploration, and intuitive visualizations, Search-a-licious empowers your users to dive deep into data effortlessly.

This API is currently in development. Read [Search-a-licious roadmap architecture notes](https://docs.google.com/document/d/1mibE8nACcmen6paSrqT9JQk5VbuvlFUXI1S93yHCK2I/edit) to understand where we are headed.
Developers can rapidly build and deploy new applications based on existing data collections in just hours. The platform offers reusable components that adapt to various contexts, all built on best-in-class open-source tools.

### Organization
Search-a-licious was originally developed to power the [Open Food Facts](https://world.openfoodfacts.org/) project, helping consumers make informed choices for their health and the planet.

This repository contains a [Lit/JS frontend](frontend/README.md) and a Python (FastAPI) backend (described in this README).
Its versatile architecture makes it ideal for a wide range of applications: from exposing large data collections to the public, to building decision support systems and exploratory tools. Search-a-licious is the key to unlocking the value in your data.

### Backend
Ready to use it? Jump to the [documentation](https://openfoodfacts.github.io/search-a-licious/).

The main file is `api.py`, and the schema is in `models/product.py`.
This is an Open Source project and [contributions are very welcome](https://openfoodfacts.github.io/search-a-licious/#contributing)!

A CLI is available to perform common tasks.

### Running the project on your machine

Note: the Makefile will align the user id with your own uid for a smooth editing experience.

Before running the services, you need to make sure that your [system mmap count is high enough for Elasticsearch to run](https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html). You can do this by running:

```console
sudo sysctl -w vm.max_map_count=262144
```

Then build the services with:

```console
make build
```

Start Docker:

```console
docker compose up -d
```

> [!NOTE]
> You may encounter a permission error if your user is not part of the `docker` group, in which case you should either [add it](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user) or modify the Makefile to prefix `sudo` to all docker and docker compose commands.
> The update container crashes because it is not connected to any Redis instance.

Docker spins up:
- Two elasticsearch nodes
- [Elasticvue](https://elasticvue.com/)
- The search service on port 8000
- Redis on port 6379

You will then need to import from a JSONL dump (see instructions below).

### Development

#### Pre-requisites
##### Installing Docker
- First of all, you need to have Docker installed on your machine. You can download it [here](https://www.docker.com/products/docker-desktop).
- Be sure you can [run docker without sudo](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user)

##### Installing Direnv
For Linux and macOS users, you can follow our tutorial to install [direnv](https://openfoodfacts.github.io/openfoodfacts-server/dev/how-to-use-direnv/).[^winEnvrc]

Get your user id and group id by running `id -u` and `id -g` in your terminal.
Add a `.envrc` file at the root of the project with the following content:
```shell
export USER_GID=<your_user_gid>
export USER_UID=<your_user_uid>

export CONFIG_PATH=data/config/openfoodfacts.yml
export OFF_API_URL=https://world.openfoodfacts.org
export ALLOWED_ORIGINS='http://localhost,http://127.0.0.1,https://*.openfoodfacts.org,https://*.openfoodfacts.net'
```
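
These values are expected in your shell environment, so a quick way to confirm that direnv actually loaded `.envrc` is to check them from Python. A small sketch: `REQUIRED_VARS` simply mirrors the example above and is an assumption (the project itself only marks `CONFIG_PATH` as required in `.env`), and `missing_vars` is a hypothetical helper.

```python
import os
from collections.abc import Mapping

# Assumption: mirrors the .envrc example above; adjust to your setup.
REQUIRED_VARS = ("USER_UID", "USER_GID", "CONFIG_PATH", "OFF_API_URL", "ALLOWED_ORIGINS")


def missing_vars(env: Mapping[str, str], names: tuple[str, ...] = REQUIRED_VARS) -> list[str]:
    """Return the variable names that are unset or empty in the given environment."""
    return [name for name in names if not env.get(name)]


missing = missing_vars(os.environ)
print("Missing:", ", ".join(missing) if missing else "none")
```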

[^winEnvrc]: For Windows users, the .envrc is only taken into account by the `make` commands.

##### Installing Pre-commit
Follow this [tutorial](https://pre-commit.com/#install) to install pre-commit on your machine.

##### Installing mmap
Be sure that your [system mmap count is high enough for Elasticsearch to run](https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html). You can do this by running:
```shell
sudo sysctl -w vm.max_map_count=262144
```
To make the change permanent, you need to add a line `vm.max_map_count=262144` to the `/etc/sysctl.conf` file and run the command `sudo sysctl -p` to apply the changes.
This will ensure that the modified value of `vm.max_map_count` is retained even after a system reboot. Without this step, the value will be reset to its default value after a reboot.
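
On Linux, the current value can also be read from `/proc/sys/vm/max_map_count`, which is what `sysctl vm.max_map_count` reports. A minimal Python sketch that checks it against the 262144 minimum used above; `read_max_map_count` and `map_count_is_sufficient` are hypothetical helpers, and the path only exists on Linux hosts.

```python
from pathlib import Path
from typing import Optional

# Minimum value used in the sysctl commands above.
MIN_MAP_COUNT = 262144
# Linux only: this file does not exist on macOS or Windows.
SYSCTL_PATH = Path("/proc/sys/vm/max_map_count")


def read_max_map_count(path: Path = SYSCTL_PATH) -> Optional[int]:
    """Return the current vm.max_map_count, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def map_count_is_sufficient(value: Optional[int], minimum: int = MIN_MAP_COUNT) -> bool:
    """True when the value is known and at least the required minimum."""
    return value is not None and value >= minimum


current = read_max_map_count()
if map_count_is_sufficient(current):
    print(f"vm.max_map_count={current}: OK")
else:
    print(f"vm.max_map_count={current}: run 'sudo sysctl -w vm.max_map_count={MIN_MAP_COUNT}'")
```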

#### Running your local instance using Docker
Now you can run the project with Docker: `docker compose up`.
Then, in another shell, run the following command to compile the project: `make tsc_watch`.
Keep both running for the next installation steps and whenever you run the project.

#### Exploring Elasticsearch data

- Go to http://127.0.0.1:8080/welcome
- Click on "Add Elasticsearch cluster"
- Change the cluster name to "docker-cluster"
- Click on "Connect"
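
You can also query the cluster without Elasticvue through the Elasticsearch `_cluster/health` API, exposed on `127.0.0.1:9200` by default (see `ES_EXPOSE` in `.env`). A standard-library sketch, assuming no authentication is configured in front of Elasticsearch in development; `cluster_health` and `summarize` are hypothetical helpers.

```python
import json
import urllib.request

# Default ES_EXPOSE value from .env; assumes no auth in the dev setup.
ES_URL = "http://127.0.0.1:9200"


def cluster_health(base_url: str = ES_URL, timeout: float = 5.0) -> dict:
    """Call GET /_cluster/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def summarize(health: dict) -> str:
    """One-line summary built from fields of the cluster health response."""
    return f"{health['cluster_name']}: {health['status']} ({health['number_of_nodes']} nodes)"


# Usage, with the stack running:
# print(summarize(cluster_health()))
```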

#### Importing data into your development environment
- Import Taxonomies: `make import-taxonomies`
- Import products:
```shell
# get some sample data
curl https://world.openfoodfacts.org/data/exports/products.random-modulo-10000.jsonl.gz --output data/products.random-modulo-10000.jsonl.gz
gzip -d data/products.random-modulo-10000.jsonl.gz
# we skip updates because we are not connected to any Redis
make import-dataset filepath='products.random-modulo-10000.jsonl' args='--skip-updates'
```

#### Pages
Now you can go to:
- http://localhost:8000 for a simple search page that does not use Lit components, or
- http://localhost:8000/static/off.html for the search page built with Lit components.

To look into the data, you may use Elasticvue: go to http://127.0.0.1:8080/ and connect to the http://127.0.0.1:9200 cluster, `docker-cluster` (unless you changed the environment variables).

#### Pre-Commit

This repo uses [pre-commit](https://pre-commit.com/) to enforce code styling, etc. To use it:
```console
pre-commit install
```
To run the hooks without committing:

```console
pre-commit run
```

#### Debugging the backend app
To debug the backend app:
* stop the API instance: `docker compose stop api`
* add `pdb.set_trace()` where you want execution to stop,
* then launch `docker compose run --rm --use-aliases api uvicorn app.api:app --proxy-headers --host 0.0.0.0 --port 8000 --reload`[^use_aliases]

### Running the full import (45-60 min)
To import data from the [JSONL export](https://world.openfoodfacts.org/data), download the dataset in the `data` folder, then run:

`make import-dataset filepath='products.jsonl.gz'`

If you get errors, try adding more RAM (12GB works well if you can spare it), or slow down the indexing process by setting `num_processes` to 1 in the command above.

Typical import time is 45-60 minutes.

If you want to skip updates (e.g. because you don't have Redis installed),
use `make import-dataset filepath='products.jsonl.gz' args="--skip-updates"`.
You should also import taxonomies:
`make import-taxonomies`
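
Because the full import takes close to an hour, it can be worth checking that the downloaded file is readable before starting it. A minimal sketch that streams through a `.jsonl` or `.jsonl.gz` file with the standard library and counts the records, failing on lines that are not JSON objects; `iter_jsonl` and `count_records` are hypothetical helpers, not part of the project's CLI.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line of a .jsonl or .jsonl.gz file."""
    # Decompress on the fly for .gz files, so the archive never needs unpacking.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{line_no} is not a JSON object")
            yield record


def count_records(path: Path) -> int:
    """Stream through the file once and count its records."""
    return sum(1 for _ in iter_jsonl(path))


# Example, from the project root:
# print(count_records(Path("data/products.jsonl.gz")))
```

Streaming line by line keeps memory usage flat, which matters for the full export.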
### Using sort script
See [How to use scripts](./docs/users/how-to-use-scripts.md)

## Thank you to our sponsors!

This project has received financial support from the NGI Search (New Generation Internet) program, funded by the 🇪🇺 European Commission. Thank you for supporting Open-Source, Open Data, and the Commons.

<img src="./assets/NGISearch_logo_tag_icon.svg" alt="NGI-search logo" title="NGI-search logo" height="100" />
<img src="./assets/europa-flag.jpg" alt="European flag" title="European flag" height="100" />
17 changes: 17 additions & 0 deletions app/api.py
@@ -106,6 +106,12 @@ def get_document(

@app.post("/search")
def search(search_parameters: Annotated[SearchParameters, Body()]):
"""This is the main search endpoint.
It uses a POST request to ensure privacy.
Under the hood, it calls the :py:func:`app.search.search` function
"""
return app_search.search(search_parameters)


@@ -138,6 +144,10 @@ def search_get(
charts: GetSearchParamsTypes.charts = None,
index_id: GetSearchParamsTypes.index_id = None,
) -> SearchResponse:
"""This is the main search endpoint when using a GET request.
Under the hood, it calls the :py:func:`app.search.search` function
"""
# str to lists
langs_list = langs.split(",") if langs else ["en"]
fields_list = fields.split(",") if fields else None
@@ -183,6 +193,7 @@ def taxonomy_autocomplete(
] = None,
index_id: Annotated[str | None, INDEX_ID_QUERY_PARAM] = None,
):
"""API endpoint for autocompletion using taxonomies"""
check_config_is_defined()
global_config = cast(config.Config, config.CONFIG)
check_index_id_is_defined_or_400(index_id, global_config)
@@ -216,6 +227,7 @@ def taxonomy_autocomplete(

@app.get("/", response_class=HTMLResponse)
def off_demo():
"""Redirects to the off.html page"""
return RedirectResponse(url="/static/off.html", status_code=status.HTTP_302_FOUND)


@@ -231,6 +243,7 @@ def html_search(
# Display debug information in the HTML response
display_debug: bool = False,
):
"""A demo page to test the search endpoint directly"""
if not q:
return templates.TemplateResponse("search.html", {"request": request})

@@ -282,6 +295,10 @@ def robots_txt():

@app.get("/health")
def healthcheck():
"""API endpoint to check the health of the application
It uses :py:mod:`app.health`.
"""
from app.health import health

message, status, _ = health.run()
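`search_get` receives list parameters such as `langs` and `fields` as comma-separated strings and splits them itself, defaulting `langs` to `["en"]`. A client-side sketch that builds such a GET URL; the `/search` path and the `q` parameter are assumptions, since the `search_get` route decorator is not visible in this diff, and `build_search_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

# The search service listens on port 8000 in the docker compose setup.
API_URL = "http://localhost:8000"


def build_search_url(
    q: str,
    langs: Optional[list[str]] = None,
    fields: Optional[list[str]] = None,
    base_url: str = API_URL,
) -> str:
    """Build a GET search URL, joining list parameters with commas.

    Assumption: the GET route is /search and the query text is passed as `q`.
    """
    params = {"q": q}
    if langs:
        params["langs"] = ",".join(langs)  # the server does langs.split(",")
    if fields:
        params["fields"] = ",".join(fields)  # the server does fields.split(",")
    return f"{base_url}/search?{urlencode(params)}"


print(build_search_url("chocolate", langs=["en", "fr"], fields=["code", "product_name"]))
```

`urlencode` percent-encodes the commas as `%2C`; the server decodes them before splitting.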
9 changes: 9 additions & 0 deletions app/health.py
@@ -1,3 +1,10 @@
"""This module contains the health check functions for the application.
It is based upon the `py-healthcheck`_ library.
.. _py-healthcheck: https://github.com/klen/py-healthcheck
"""

from healthcheck import HealthCheck

from app.utils import connection, get_logger
@@ -8,6 +15,7 @@


def test_connect_redis():
"""Test connection to REDIS."""
logger.debug("health: testing redis connection")
client = connection.get_redis_client(socket_connect_timeout=5)
if client.ping():
@@ -16,6 +24,7 @@ def test_connect_redis():


def test_connect_es():
"""Test connection to ElasticSearch."""
logger.debug("health: testing es connection")
es = connection.get_es_client(timeout=5)
if es.ping():
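`app/health.py` relies on py-healthcheck: the `/health` endpoint calls `health.run()`, which returns a message, a status and headers. To make that flow concrete without the library, here is a dependency-free sketch of the same idea, where each check returns a `(passed, output)` pair and any failure turns the overall status into 500; `run_checks`, the fake checks and the JSON layout are hypothetical stand-ins, not py-healthcheck's actual implementation or output format.

```python
import json
from typing import Callable

# In this sketch, a check returns (passed, output).
Check = Callable[[], tuple[bool, str]]


def run_checks(checks: list[Check]) -> tuple[str, int]:
    """Run every check and return (JSON message, HTTP status).

    A failing or crashing check makes the overall status 500.
    """
    results = []
    for check in checks:
        try:
            passed, output = check()
        except Exception as exc:  # e.g. connection refused
            passed, output = False, str(exc)
        results.append({"checker": check.__name__, "passed": passed, "output": output})
    status = 200 if all(r["passed"] for r in results) else 500
    body = {"status": "success" if status == 200 else "failure", "results": results}
    return json.dumps(body), status


# Hypothetical stand-ins for test_connect_redis / test_connect_es.
def fake_redis_check() -> tuple[bool, str]:
    return True, "Redis connection check succeeded"


def fake_es_check() -> tuple[bool, str]:
    raise ConnectionError("Elasticsearch is unreachable")


message, status = run_checks([fake_redis_check, fake_es_check])
print(status)
```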
1 change: 1 addition & 0 deletions docs/.pages
@@ -1,3 +1,4 @@
nav:
- README.md
- users
- devs
36 changes: 25 additions & 11 deletions docs/README.md
@@ -19,23 +19,29 @@ It provides a ready to use component to:
* build powerful in-app features thanks to a powerful API

On a technical level, you can use:
* web components to quickly build your UI using any javascript framework, or plain HTML
* [web components](./users/tutorial.md#building-a-search-interface) to quickly build your UI using any JavaScript framework, or plain HTML
* sensible defaults to provide a good search experience
* an easy to setup, one stop, file configuration to describe your content
* a ready to deploy docker compose file including all needed services
* a one command initial data import from a jsonl data export
* continuous update through a stream of events
* an easy-to-set-up, [one-stop configuration file](./users/tutorial.md#create-a-configuration-file) to describe your content
* a [ready-to-deploy Docker Compose file](./users/how-to-install.md) including all needed services
* a [one command initial data import](./users/tutorial.md#import-the-data) from a JSONL data export
* [continuous update](./users/how-to-update-index.md) through a stream of events

It leverage existing components:
* [Elasticsearch](https://www.elastic.co/elasticsearch) for the search engine[^OpenSearchWanted]
It leverages existing components:
* [Elasticsearch](https://www.elastic.co/elasticsearch) for the search engine
* [Web Components](https://developer.mozilla.org/en-US/docs/Web/API/Web_Components) (built thanks to [Lit framework](https://lit.dev/))
* [Vega](https://vega.github.io/) for the charts
* [Redis] for event stream[^AltRedisWanted]
* [Redis] for event stream

[^OpenSearchWanted]: [Open Search](https://opensearch.org/) is also a desirable target, contribution to verify compatibility and provide it as default would be appreciated.
[Read our tutorial](./users/tutorial.md) to get started!

[^AltRedisWanted]: an alternative to Redis for event stream would also be a desirable target.
## Contributing

This is an Open Source project and contributions are really welcome!

See our [developer introduction to get started](./devs/introduction.md).

Every contribution, such as bug reports, documentation or UX design, is also really welcome!
See our [Search-a-licious page on the Open Food Facts wiki](https://wiki.openfoodfacts.org/Search-a-licious).

## Documentation organization

@@ -47,4 +53,12 @@ Pages title should start with:
* *tutorial on* - tutorials aimed at learning
* *how to…* - how to guides to reach a specific goal
* *explain…* - explanation to understand a topic
* *reference…* - providing detailed information


## Thank you to our sponsors!

This project has received financial support from the NGI Search (New Generation Internet) program, funded by the 🇪🇺 European Commission. Thank you for supporting Open-Source, Open Data, and the Commons.

<img src="./assets/NGISearch_logo_tag_icon.svg" alt="NGI-search logo" title="NGI-search logo" height="100" />
<img src="./assets/europa-flag.jpg" alt="European flag" title="The European Union flag" height="100" />
1 change: 1 addition & 0 deletions docs/assets/architecture-diagram.drawio
@@ -0,0 +1 @@
<mxfile host="Electron" modified="2023-01-31T11:13:22.819Z" agent="5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/20.8.10 Chrome/106.0.5249.199 Electron/21.3.5 Safari/537.36" etag="L3cBmnxst6NFtYVHYlf1" version="20.8.10" type="device"><diagram name="Page-1" id="r1JxjbiHpN2shwPAHvJh">7Vxbc5s4FP41nuk+bAYQYPyYa7e7aZpOOtP2aQeDMLQYUSHH9v76lUCYi2Qb21zstslMgg5Cls4539G5CI/A7Xz1Ftux/x65MBxpirsagbuRppkWoH8ZYZ0RdEXNCDMcuBmpRHgJ/oOcqHDqInBhUulIEApJEFeJDooi6JAKzcYYLavdPBRWPzW2Z1AgvDh2KFI/By7xM6qljQv6XzCY+fknq+YkuzO38858JYlvu2hZIoH7EbjFCJHsar66hSHjXc6X7LmHLXc3E8MwIk0eWH9cfrm+s9SnZOFPnkwLOfef/tT5bF/tcMFXzGdL1jkLMFpELmSjKCNws/QDAl9i22F3l1TmlOaTeUhbKr30gjC8RSHCtB2hiHa6SQhG32GNKM6eL+gVYgJXJRJfzVuI5pDgNe3C7wLOWK5ZY95cFmIyLE7zSyIycoHYXDVmm5EL7tELzsADmKkaAu+gS7WJNxEmPpqhyA7vC+pNlbtFn0eEYs7Tb5CQNYeGvSCoynHKQLz+wp9PG19Z48rIm3er8s27NW8lxMbkmkGkLCpKewjYstM+W6WUoAV24C5ecKzaeAbJrn5c3xijdgodw9AmwWsVljIRpo/SddnrUocYBRFJSiM/M0KhS5pZVSagG1Uw7elvKEpNfbIZFMq0WcoJ+iVg9c4mNnsqmsGEBCg6Dbt2GMwieh1Cj4rshiExoJbwmpOniBA0r2F8pAHXgJarC0CndyxtCkyzHbTrVYbL4L7ZNMpwV42u4A4Ecdy/wlRJXgiG9rw9Q0pZ6Xme5jgyJrvm1DRaYrJRY7IlMtmU8NjsjMXibvTLWlS9oUUF6llYVJVb0C0WtRsLqW+zkB4kjg9xq5Dsx+6p2n5MGr1iEvzGZM4L4zcm9+qLIWDyg+eFAZVEa47LWQATGAMD0xoWlwUUv1aQKMclXAUke0wzePNrPiS9Lp5ijfyh44E6bgrUtnF6kkTHAnSuI/bpbrapxXQOHsInOpp7vH7CFEFAmA0tT+qNmo4Fp14nLr9k49t0qXj8oCuENUiWwMjNdxsntJMkcKrcroomR4FaggAHUgcgAA1BoDcEwR5Dl9MO29OETQtMqmqg1wO6bN38qR0xu17TJ12pDZQxRhiotfB9PISBrjhAhUqWfaCaHdcqhlzdY8iPV8c89bM/R3ReVjmfd8kMJNDGjs+mGM1Sz2aANEw/GQJQy7rqEqs8kRgDvSubrE6GdntS1jbGS9p6hjigy4d4JIlStoP0BKw1tf1qU+PfE9bEJNsGa9fP7wYBWj/hBVAnV9VoTZrytJS8Vz9JT1VMsWQ+qRMGkO+bP6ErKkpD5o2acmF05o/mA19GyNdxwclsaOHaz8WcBihTABSxVyhC8wAmlP4GxSxBYod/tIMth3KW7TsCuuaB66YKMgjA1JqXrxqGgC9ZNbc7v8LavvN8DlymYwPJo59ym1WThy7au7Es6DM6ksdmCeXg2wzZPuGhNMxzNpwwfyzYqQrKB2Bo7LdMMmfs/9Pjc5p/pLyPceqJ8cHo5LLxsn6nCflw0bl24qej7/T3DpBjvbAtyVL2WtLLcyYlOT5Q9mVp6wtLAYOJeTWuOWnS8gyLqXtkseg0PyzSx+i6GFJyQ3Z5DK9Fn4MXqDXRHU7PwVHSDEYQ24Sy4QL5PDk3qyFWkf5++fDUZhR4po6Rud8v2lSi+nGMBgk6
Os+QND1ekWvimYQPmhg+iOI5qiZR1CH2F+aOrc7vV5zjBdo4IGycXu6n3iGcdFNqLnXTegeoDWSYtYG21DsOPjtZ35R1Zfe8lNP693JqQRNrrzO+sf/0OX69QY6/11hcG+Row+GVs4MKZ3I7q+y0s8dbQ63pCQit9QLAlmqsWg2c9ElNexqXdc3J1XhS+qmNW8++dlzl1cRDApu80cvdP+2YjTNNGtXDhsGTRkBMGr0rjpxlCZ8pznM9IZpRhF9etFbPnQJVjBH6PSJqDmGvjzXE3dYlQNPA4rziCrCtzsfPnrmLeUz/vWFR+GNLpYnzK/tZtb1EYs+kwXd375yIeZAY4oThhMLCHRW5PYV9ENUWO/oeRLMLtGq6oVaYL831Tfq0amJIcHoFIgsmko28wmCKbZyW/jqoR3QXdrRdtzC0M4tBwO8YJGtUJC0oBNf2k3bMxqFKN29VCBGGMW4zVDkuPDk0KWPU8sb5qdmtU9tyyrbb17U0iUW9vI2qlmYDoqUCEkvV2Wvoulj+++QHbIuJMfrGviSB7VTzmHq6UZtl112v+9c2nha4Pt7PdVX2dkB3bBc9Zsp27GYLJWzq7bywMRzP2QG3mv3a2Jl9LnF3fBdd4naNSC+RhpbnjXbwtV8rIhaW7ldBQtJgQkExjIq9/GewJ7qq7BWApshPcnQnBFnI0aJu9/TOgNXIbHSl3v+Op9G7p3vzh+n52mdt9vHHzVLyXRqnxHLvH1lKBFMXU5JW5K+gwwgHjj9PD0nXu5zv4TNByhJd2B7EgWoI3+cxEqnUZf7m8VJn4dqcBhaOKNCYRdaRVNYeP+2mJDCkzpg0E/2L6EOfx7ek+iB6ypfOYqHmP7CdFZ3iUxDnLcgCM39jifD3zkFysANykuTqAY3MsW4poKHN4nvnsmC++PI+cP8/</diagram></mxfile>