
First Release of SEAL (2023-05-16)

@alexander-schranz alexander-schranz released this 15 May 23:51
be611c1
Schranz Search Logo with a Seal on it with a magnifying glass

Schranz Search - First Release of SEAL

Monorepository for SEAL, a Search Engine Abstraction Layer with support for different search engines
Documentation | Packages

Elasticsearch | Opensearch | Meilisearch | Algolia | Solr | Redisearch | Typesense
PHP | Symfony | Laravel | Spiral | Mezzio | Yii



Hello and welcome 👋,

About six months ago, at the beginning of December 2022, I started the "Schranz-Search" project, out of which SEAL was later born. At first the project started more as research into the different search engines which are around. At that time, with very limited knowledge about alternatives to Elasticsearch, I was very curious what exists beyond my own horizon. With the support of different communities around Twitter, Reddit, meetups and more, I could create a list of different search engines; the list was bigger than expected and still grows.

My personal experience as a Core Developer at Sulu, a Symfony based CMS, was limited to Elasticsearch. After having a look at the different search engines which exist, I had to sort out which ones make sense to add to such an abstraction and are mostly used by the PHP community. Besides Opensearch, which as a fork of Elasticsearch should be easy to support, I had a look at Algolia and Meilisearch, and so had the first bunch of search engines together that I wanted to support. And so the start was made for SEAL, the Search Engine Abstraction Layer.

Avoiding bringing complexity and search jargon to the end user

Search engines can be complex and they all have their own terms for different things. The target of the project was to hide the complexity of the different search engines behind an easily understandable interface and so be very beginner friendly. The important part here was how the definition of the data to be added to the search engine needs to be structured. Different search engines have different terms to define their mappings, fields, options, ... In the search engine abstraction layer I wanted to avoid terms like doc_values: true, index: true, TAGS, keyword or other special terms of the different search engines. During the research I stumbled over the Meilisearch definitions and really liked how they are targeting this issue. Instead of using special search jargon, in Meilisearch you just tell what you want to do with the data fields you are indexing / saving. So a simple configuration inspired by Meilisearch was shipped with SEAL, using simple, understandable words like searchable, filterable and sortable. So the following Schema definition was born:

<?php

use Schranz\Search\SEAL\Schema\Field;
use Schranz\Search\SEAL\Schema\Index;
use Schranz\Search\SEAL\Schema\Schema;

$schema = new Schema([
    'blog' => new Index('blog', [
        'id' => new Field\IdentifierField('id'),
        'title' => new Field\TextField('title', sortable: true),
        'description' => new Field\TextField('description'),
        'tags' => new Field\TextField('tags', multiple: true, filterable: true),
        'published' => new Field\DateTimeField('published', sortable: true),
        'comments' => new Field\ObjectField('comments', [
            'text' => new Field\TextField('text', searchable: false),
            'author' => new Field\IntegerField('author'),
        ], multiple: true),
    ]),
]);

To be as near as possible to PHP with the definitions, the following types were supported: Text, Integer, Float, Boolean and DateTime. This way all kinds of PHP types are represented, and with the multiple flag every type can also be an array of data. With a special type called Object even associative arrays can be added.

Strict vs. Dynamic Schema

There was nearly no discussion for me about going with a dynamic schema; I always wanted to go with a strict schema like it is defined for databases. The first reason was that not all search engines support dynamic schemas. If you are new to search engines: a dynamic schema means that you can push any data to the engine and by some kind of magic it puts each field into a specific type and configuration, e.g. a string will be a text type in Elasticsearch and so on, but if the first inputted string looks like a date the field becomes a date type and additional text will fail. My experience with this kind of mechanism was really bad and I only recommend it for quick prototyping. By going with a fixed and strict schema I wanted to prevent unwanted magic and add support for a wider range of search engines which do not support that kind of magic.
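To make the idea of a strict schema a bit more concrete, here is a minimal sketch of checking a document against declared field types before it is indexed. The function validateDocument and the type map are hypothetical and only for illustration; they are not part of SEAL.

```php
<?php

/**
 * Hypothetical helper, not part of SEAL: checks a document against a
 * strict field => type map before it would be sent to a search engine.
 *
 * @param array<string, string> $fieldTypes expected type per field ('string', 'int', 'float', 'bool')
 * @param array<string, mixed>  $document
 *
 * @return list<string> human readable violations, empty when the document is valid
 */
function validateDocument(array $fieldTypes, array $document): array
{
    $errors = [];

    foreach ($document as $name => $value) {
        if (!isset($fieldTypes[$name])) {
            $errors[] = sprintf('Unknown field "%s".', $name);
            continue;
        }

        // get_debug_type() returns 'string', 'int', 'float', 'bool', ...
        $actualType = get_debug_type($value);
        if ($actualType !== $fieldTypes[$name]) {
            $errors[] = sprintf('Field "%s" must be %s, got %s.', $name, $fieldTypes[$name], $actualType);
        }
    }

    return $errors;
}

$fieldTypes = ['id' => 'string', 'title' => 'string', 'rating' => 'float'];

// A title that happens to look like a date stays a plain string.
$valid = validateDocument($fieldTypes, ['id' => '1', 'title' => '2023-05-16', 'rating' => 4.5]);

// A wrongly typed field and an undeclared field are both rejected up front.
$invalid = validateDocument($fieldTypes, ['id' => '2', 'rating' => 'good', 'mood' => 'happy']);
```

With a declared schema the date-looking title is no surprise, and mistakes show up before anything reaches the search engine.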

Creating a single interface to communicate with the search engine

After defining the definitions of the fields, the next and most important part was how to create the interface for the user of the library to communicate with the search engines. I'm really a big fan of @frankdejonge's work on Flysystem, an abstraction for local and remote filesystems. It uses a single class and the Adapter Design Pattern to communicate with the different systems. That was a pattern we could definitely reuse for our abstraction. Another library which also had an impact on the architecture is @doctrine: in the first implementation of SEAL I went with a SchemaManager and a Connection, which is very similar to how Doctrine/DBAL works. After implementing some of the different Adapters I decided to split the Connection class into two separate classes, the Indexer and the Searcher. Thanks here to @wachterjohannes and @Toflar, who helped me find a good way of splitting read and write and so made things like a ReadWriteAdapter a lot easier. But back to the more important class, the Engine, which is responsible for providing a single interface for the end user of the library to communicate with the different search engines. For this we added the following methods to it:

interface EngineInterface
{
    public function saveDocument(string $index, array $document): void;

    public function deleteDocument(string $index, string $identifier): void;

    /**
     * @throws DocumentNotFoundException
     *
     * @return array<string, mixed>
     */
    public function getDocument(string $index, string $identifier): array;

    public function createSearchBuilder(): SearchBuilder;

    public function createIndex(string $index): void;

    public function dropIndex(string $index): ?TaskInterface;

    public function existIndex(string $index): bool;

    public function createSchema(): void;

    public function dropSchema(): void;
}

Using a string representation of the index makes it easier for the end user: without any imports or loading they are able, with an instance of the Engine, to add, delete, search and manage their search engine's indexes. Internally the Engine forwards the Index instance, and so the configured fields, to the Adapter so that the adapter can work with it.
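The idea of one Engine class in front of exchangeable adapters can be sketched in a few lines. All names below (ToyAdapterInterface, InMemoryToyAdapter, ToyEngine) are hypothetical and heavily simplified; this is not SEAL's real code, only an illustration of the Adapter Design Pattern described above.

```php
<?php

// Hypothetical, heavily simplified names: this is not SEAL's real code,
// only an illustration of an engine delegating to an exchangeable adapter.
interface ToyAdapterInterface
{
    /** @param array<string, mixed> $document */
    public function save(string $index, array $document): void;

    public function delete(string $index, string $identifier): void;

    /** @return array<string, mixed>|null */
    public function find(string $index, string $identifier): ?array;
}

final class InMemoryToyAdapter implements ToyAdapterInterface
{
    /** @var array<string, array<string, array<string, mixed>>> */
    private array $documents = [];

    public function save(string $index, array $document): void
    {
        $this->documents[$index][(string) $document['id']] = $document;
    }

    public function delete(string $index, string $identifier): void
    {
        unset($this->documents[$index][$identifier]);
    }

    public function find(string $index, string $identifier): ?array
    {
        return $this->documents[$index][$identifier] ?? null;
    }
}

final class ToyEngine
{
    public function __construct(private ToyAdapterInterface $adapter)
    {
    }

    /** @param array<string, mixed> $document */
    public function saveDocument(string $index, array $document): void
    {
        $this->adapter->save($index, $document);
    }

    public function deleteDocument(string $index, string $identifier): void
    {
        $this->adapter->delete($index, $identifier);
    }

    /** @return array<string, mixed> */
    public function getDocument(string $index, string $identifier): array
    {
        return $this->adapter->find($index, $identifier)
            ?? throw new RuntimeException(sprintf('Document "%s" not found in "%s".', $identifier, $index));
    }
}

// Swapping the search engine means swapping only the adapter passed in here.
$engine = new ToyEngine(new InMemoryToyAdapter());
$engine->saveDocument('blog', ['id' => '1', 'title' => 'First Release of SEAL']);
$document = $engine->getDocument('blog', '1');
$engine->deleteDocument('blog', '1');
```

The calling code only ever sees the engine and plain index names, which is the property the string based interface is aiming for.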


Fighting the search engines

The main difficulty was to fight the different search engines' mappings, schemas and field definitions so that they match the defined Field with its searchable, filterable and sortable options. For example, to make a field only filterable and not searchable, I first thought it would be enough to index it in Elasticsearch as a Keyword. But if you searched for the whole word, the document still showed up in the result. After some deep diving into Elasticsearch and Lucene I found out that I could achieve it by configuring the field with index: false but doc_values: true. This was the only solution I found that made Elasticsearch behave the expected way for this combination of options. The easiest engine to support was Meilisearch, as our own mapping works the same way and it uses nearly the same type of configuration. For Algolia I first thought it would be the same, but sorting in Algolia requires additional replica indexes. This is also why a strict schema is required for the search engine abstraction: we need to know at creation time of the indexes which indexes we need to create. So when creating the indexes for Algolia, the AlgoliaSchemaManager creates additional replicas which have the specific sorting defined. At search time we use that replica and it returns the result in the expected order.
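As a rough illustration of the Elasticsearch case, the three flags could be translated into mapping options like this. The function keywordMapping is hypothetical and is not SEAL's actual schema manager; it only shows the index: false / doc_values: true combination described above.

```php
<?php

/**
 * Hypothetical helper, not SEAL's actual schema manager: derives an
 * Elasticsearch keyword mapping from the three flags.
 *
 * @return array{type: string, index: bool, doc_values: bool}
 */
function keywordMapping(bool $searchable, bool $filterable, bool $sortable): array
{
    return [
        'type' => 'keyword',
        // "index" controls whether the value is put into the index used for searching.
        'index' => $searchable,
        // "doc_values" are kept whenever the field should be filterable or sortable,
        // following the combination described in the text above.
        'doc_values' => $filterable || $sortable,
    ];
}

// A field which is only filterable: not indexed for search, doc values enabled.
$mapping = keywordMapping(searchable: false, filterable: true, sortable: false);
```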

Besides Elasticsearch, Opensearch, Algolia and Meilisearch I later also added support for Solr (because it is widely used in the @typo3 community), RediSearch (personally a big fan of @redis) and Typesense (which came up several times in my research on Reddit). With some community help from the different search engines I could implement Solr via its Cloud mode and Typesense via some changes in the core mapping. The thing I could not solve for a long time was the support for RediSearch. The problem there was that different DIALECTs exist, and using the latest DIALECT I was not able to make a field searchable and filterable at the same time. Another big issue was that filters did not work on fields containing a "-", which for example every UUID contains. As there had been two open issues at the RediSearch repository for 2-3 years, I thought I would probably need to cancel the support for RediSearch. After a Twitter conversation about the developer experience of search with Redis CEO Rowan Trollope:

[Screenshot of the Twitter conversation, 2023-05-16]

Some nice people from the RediSearch team pushed me in the right direction on how we could still achieve what was needed in the current state. Instead of using the same field for searchable and filterable, we duplicate the field, so we have for example a searchable category field and a filterable category__raw field. This is even similar to how Elasticsearch handles the Text and Keyword combination of a field. This was great and we could close the RediSearch issue.
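The field duplication can be pictured with a small sketch. The helper addRawFields is hypothetical and not SEAL's actual RediSearch adapter; it only shows the idea of a searchable field with a filterable "__raw" twin.

```php
<?php

/**
 * Hypothetical helper, not SEAL's actual RediSearch adapter: copies every
 * filterable field into a "<name>__raw" twin before indexing.
 *
 * @param array<string, mixed> $document
 * @param list<string>         $filterableFields
 *
 * @return array<string, mixed>
 */
function addRawFields(array $document, array $filterableFields): array
{
    foreach ($filterableFields as $name) {
        if (array_key_exists($name, $document)) {
            $document[$name . '__raw'] = $document[$name];
        }
    }

    return $document;
}

// "category" stays available for full text search, "category__raw" is used for filters.
$indexed = addRawFields(
    ['id' => '1', 'title' => 'Hello', 'category' => 'news'],
    ['category'],
);
```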

Being dependency transparent

SEAL is split into different packages, and the SEAL core in its current state only depends on PHP 8.1 or greater. The different adapter packages can define their own supported dependencies like here. This way the abstraction is not hiding any dependency of specific adapters: if you want to use a specific adapter like Elasticsearch, the Elasticsearch adapter package already installs all of its required dependencies, and they are not hidden behind optional dependencies. Besides that, it was very important for me to get things started quickly, so the Getting Started documentation also shows how you can quickly get your favorite search engine running with docker compose.

Implementing Filters

A search today can in my opinion not exist without support for some kind of filters. While you maybe just want to filter blogs by categories, a search today can get very complex, especially in e-commerce systems. So it was a minimum requirement for the search engines supported by the abstraction that they support at least some basic filters. This makes it possible to not only create a nice search on a website but also great overview pages with nice filters. So different Conditions were added to make filtering possible:

new EqualCondition('some_field', 'some_value');
new NotEqualCondition('some_field', 'some_value');
new GreaterThanCondition('some_field', 2.5);
new GreaterThanEqualCondition('some_field', 2.5);
new LessThanCondition('some_field', 2.5);
new LessThanEqualCondition('some_field', 2.5);

These filters can then be used via the SearchBuilder, which can easily be created via the Engine instance:

<?php

use Schranz\Search\SEAL\Search\Condition;

$result = $engine->createSearchBuilder()
    ->addIndex('blog')
    ->addFilter(new Condition\SearchCondition('Create a webspace with Sulu'))
    ->addFilter(new Condition\EqualCondition('tags', 'development'))
    ->getResult();

foreach ($result as $document) {
    // do something with the document
}

$total = $result->total();

This way we got everything together to communicate with the different search engines and to support all kinds of pages with searches.
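On the adapter side, such conditions have to be translated into the filter language of each engine. As a sketch of that idea, the following turns simple conditions into a Meilisearch style filter expression. ToyCondition and toFilterExpression are hypothetical stand-ins for illustration; SEAL's real condition classes and adapters are not reproduced here.

```php
<?php

// Hypothetical value object standing in for conditions like EqualCondition
// or GreaterThanEqualCondition; SEAL's real classes are not reproduced here.
final class ToyCondition
{
    public function __construct(
        public readonly string $field,
        public readonly string $operator, // one of: =, !=, >, >=, <, <=
        public readonly string|int|float $value,
    ) {
    }
}

/**
 * Builds a Meilisearch style filter expression from a list of conditions.
 *
 * @param list<ToyCondition> $conditions
 */
function toFilterExpression(array $conditions): string
{
    $parts = [];
    foreach ($conditions as $condition) {
        // Strings are quoted and escaped, numbers are written as they are.
        $value = is_string($condition->value)
            ? '"' . str_replace(['\\', '"'], ['\\\\', '\\"'], $condition->value) . '"'
            : (string) $condition->value;

        $parts[] = $condition->field . ' ' . $condition->operator . ' ' . $value;
    }

    // All filters must match, so they are joined with AND.
    return implode(' AND ', $parts);
}

$filter = toFilterExpression([
    new ToyCondition('tags', '=', 'development'),
    new ToyCondition('rating', '>=', 2.5),
]);
```

Each engine needs its own translation of this kind, which is why the conditions themselves carry no engine specific syntax.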

Frameworks support

A standalone library is nice, but today I think it is very important that we also provide an easy way to use such a library inside different frameworks. As a core developer at Sulu CMS, my first choice of framework integration was providing a Bundle for the @symfony ecosystem. This was also one of the easiest for me to implement, even though I think Symfony Bundles are the most complex compared to other frameworks. But the new AbstractBundle class introduced in Symfony 6.1 made it a lot easier, as a Bundle is no longer split into 3 different classes (Bundle, Extension, Configuration). The next framework I chose to provide an integration for was the @laravel ecosystem. As I already had experience with this kind of implementation via my Schranz Templating, I could also provide the specific services for Laravel via an own ServiceProvider. My next framework of choice was @spiral, maybe not that widely used but a very modern framework, for me a nice mix of Symfony and Laravel, with a very helpful small community. Via a Spiral Bootloader we were able to provide the configuration and services of our library to the Spiral ecosystem.

For an easy configuration I went for a DSN-like configuration of the Adapters, as it is for example already used in Symfony for Doctrine databases or Symfony Messenger buses. So by changing a single environment variable another Adapter can be used:

ENGINE_DSN=meilisearch://127.0.0.1:7700
ENGINE_DSN=algolia://%env(ALGOLIA_APPLICATION_ID)%:%env(ALGOLIA_ADMIN_API_KEY)%
ENGINE_DSN=elasticsearch://127.0.0.1:9200
ENGINE_DSN=opensearch://127.0.0.1:9200
ENGINE_DSN=redis://127.0.0.1:6379
ENGINE_DSN=solr://127.0.0.1:8983
ENGINE_DSN=typesense://[email protected]:8108
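How such a DSN could be taken apart is sketched below with PHP's built-in parse_url(). The function parseEngineDsn is hypothetical and not SEAL's actual factory, and the API key in the second example is a made-up placeholder.

```php
<?php

/**
 * Hypothetical helper, not SEAL's actual factory: splits an engine DSN
 * into the parts an adapter factory would need.
 *
 * @return array{adapter: string, host: string, port: int|null, key: string|null}
 */
function parseEngineDsn(string $dsn): array
{
    $parts = parse_url($dsn);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException(sprintf('Invalid engine DSN "%s".', $dsn));
    }

    return [
        'adapter' => $parts['scheme'],   // e.g. "meilisearch" selects the Meilisearch adapter
        'host' => $parts['host'],
        'port' => $parts['port'] ?? null,
        'key' => $parts['user'] ?? null, // API key in the user part, when present
    ];
}

$meilisearch = parseEngineDsn('meilisearch://127.0.0.1:7700');

// The key below is a made-up placeholder, not a value from the project.
$typesense = parseEngineDsn('typesense://placeholder-key@127.0.0.1:8108');
```

The scheme is what decides which adapter is created, so switching engines stays a one line change in the environment.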

After writing the first parts of the documentation I was looking for some people to test it out. One of the first to give a lot of feedback was @froschdesign, who comes from @laminas and the Mezzio framework. At that point I did not yet have an integration for the Laminas Mezzio framework, but with the help of @froschdesign I was able to provide one. I'm personally a fan of the very simple and understandable architecture of the Mezzio framework. But the integration was a little more difficult than I expected, as there was no easy way to provide services based on configuration. With the help of @froschdesign we found a solution by going over a custom PSR-11 container, which solved all our problems, and so we could provide the integration. With better knowledge about this kind of integration and some help from @samdark I was also able to provide another integration, into the @yiisoft framework ecosystem.

This way the library can now easily be used in different kinds of frameworks. For all frameworks CLI/console commands were also created, so users can manage their indexes via the framework's CLI tools. For the Laravel integration even Facades were created, so the services provided by the integration layer are also available via these Facades.

Writing documentation

One of the most important parts of every published library, I think, is writing the documentation. As I had some special use cases for the documentation, I stuck with rst and the Python tool Sphinx. The most important part of the documentation is the Getting Started documentation; the target while writing it was that everybody should be able to get into the library as fast as possible. So that documentation should already cover the different framework integrations. Via the special sphinx-tabs extension, I think I was able to provide a good user experience for the [Getting Started documentation](https://schranz-search.github.io/schranz-search/getting-started/index.html). Even a developer not reading the rest of the documentation should be able to figure out all the other functionality.

Still, an Introduction documentation was added to show the basic structure of the project and explain some basic terms used inside the documentation to avoid confusion. These parts were already rewritten once to avoid confusion; still, feedback about the documentation is especially welcome.

The last part of writing the documentation was then to document all the features which the core library provides. This covers the Schema Definitions, Indexing Operations and the different kinds of Search & Filter Conditions. The links collected during the whole research were also added to an own Research documentation, which should in the future be extended with any kind of interesting links about search; even UI & UX should be listed there.

Target of the project?

The target of the project is that this library should be the way to go to implement any searchable content in PHP. As Core Developer of @sulu CMS, I'm looking forward to making this library the way to integrate search not only in Sulu but also in other content management systems. As we already have a close connection to some people at @contao and @typo3, I'm looking forward to cooperating with them to make this library an abstraction which every kind of CMS, system or PHP application can easily use. This means that even more integration layers should exist in the future; if you know a system or framework you would like to have an integration for, let us know via a Github Issue. The same applies if you know a not yet supported search engine you would like to see supported by the abstraction. But the library is not the only target of the project: the Research Documentation should also be a source for everybody working with search engines and should be a living collection of search engines and interesting links around them. If you have anything to share about search engines, I'm very happy to add the link to this documentation.

ODM and Datamapper Support?

One of the most frequently asked questions is whether there will be support for directly storing and reading objects built on own classes. SEAL itself is designed to be the lower level library working with array data, like an SQL insert. The ODM or datamapper package will be its own package, built on top of the SEAL package. Building an ODM based on SEAL is currently too early; in the current state we instead recommend using a Serializer or Normalizer like the Symfony Serializer, which makes the normalization and denormalization from array to object and back easy. Still, if you want to join the discussion about the ODM/Datamapper package, have a look at this issue.
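What normalization and denormalization mean in practice can be shown without any library. The BlogPost class below is hypothetical, and its hand-written toDocument() and fromDocument() methods only stand in for what a serializer such as the Symfony Serializer would do for you.

```php
<?php

// Hypothetical domain class; the hand-written methods stand in for what a
// serializer/normalizer library would otherwise take care of.
final class BlogPost
{
    /** @param list<string> $tags */
    public function __construct(
        public readonly string $id,
        public readonly string $title,
        public readonly array $tags = [],
    ) {
    }

    /** @return array<string, mixed> */
    public function toDocument(): array
    {
        return ['id' => $this->id, 'title' => $this->title, 'tags' => $this->tags];
    }

    /** @param array<string, mixed> $document */
    public static function fromDocument(array $document): self
    {
        return new self($document['id'], $document['title'], $document['tags'] ?? []);
    }
}

$post = new BlogPost('1', 'First Release of SEAL', ['release']);

// Object -> array: this is what would be passed to saveDocument().
$document = $post->toDocument();

// Array -> object: this is what would wrap the result of getDocument().
$restored = BlogPost::fromDocument($document);
```

Keeping this step outside of SEAL is what allows the core to stay a small, array based library.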

Packages

With this release the following packages are now provided by the Schranz Search project:

Similar projects

In the past there were some similar projects around trying to target the same issues. nresni/Ariadne is an abstraction around Solr, Elasticsearch and ZendSearch, but it has been outdated for 12 years. The massiveart/MassiveSearchBundle is an abstraction around ZendSearch and Elasticsearch, currently maintained by @sulu but not actively developed. laravel/scout officially supports Algolia and Meilisearch, works with a dynamic optional schema, is deeply connected to Laravel and its usage inside Eloquent, and also has different community adapters available for other search engines. It could be that SEAL will in the future provide its own Laravel Scout adapter to make the transition easier, see the following issue here. If you know about other similar libraries and projects, let us know; we are happy to add them to the similar projects part of our README.

What is coming next?

The next task is mostly about testing and getting feedback. With this first tagged version I want to get more people to test out the library and so get more feedback about the current implementation: to find out if there are still things which are confusing, and to check where things can maybe be improved and what kinds of errors can maybe be avoided with clear documentation and naming. So the focus here is to provide the best possible developer experience we can and make this library beginner friendly even for very inexperienced developers who have not yet been in contact with any kind of search engine. If anything in the documentation or library confuses you, please feel free to open a Github Issue or Github Discussion for it.

I'm really looking forward to all kinds of feedback. At this point I really want to thank all of you who took the time to test the not yet released version.

I already want to thank @froschdesign, @butschster, @samdark, @vjik, @Toflar, @wachterjohannes, the Reddit PHP Community, my Twitter followers and all others who already gave feedback or helped with the different framework integrations.

Sincerely looking forward to your feedback,
Alex


Join the discussion in one of the following channels about the release:

Sharing very welcome :)