
@irevoire
Last active July 27, 2021 11:02
meilisearch lib.rs

MeiliSearch

Hello there, future contributors. If you are reading this code, it's probably because you want to add a fancy new feature to MeiliSearch or fix a bug. First of all, thank you for that!

To help you in this task, here is a short overview of the project.

Milli is the core library of MeiliSearch; it's where we actually index documents and perform searches. Its aim is to do these two tasks as fast as possible. You can give an update to milli, and it will use as many cores as provided to process it as quickly as it can. Nothing more. You can perform searches at the same time (a search only uses one core). As you can see, quite a lot of features are missing here: milli does not handle multiple indexes, it can't queue updates, it doesn't provide any web / API frontend, it doesn't implement dumps or snapshots, etc.

The index module is what encapsulates one milli index. It abstracts over its transaction and isolates a task that can be run in a thread. This is the unit of interaction with milli. If you add a feature to milli, you'll probably need to add it to this module too before you can expose it to the rest of MeiliSearch.

To handle multiple indexes, we created an index_controller. It is in charge of creating new indexes, keeping references to all of its indexes, forwarding asynchronous updates to them, and providing an API to search in them synchronously. To achieve this goal, we use an actor model.

The actor model

Every actor is composed of at least four files:

  • mod.rs declares and imports all the files used by the actor. It also describes the interface (= all the methods) used to interact with the actor. If you are not modifying anything inside an actor, this is usually all you need to look at.
  • handle_impl.rs implements the interface described in mod.rs. There is no real logic in this file: every method only wraps its parameters in a structure that is sent to the actor. This is useful for tests and future-proofing.
  • message.rs contains an enum that describes all the interactions you can have with the actor.
  • actor.rs is used to create and run the actor. It's where we write the loop that waits for new messages and actually performs the tasks.
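To make the file layout above more concrete, here is a minimal sketch of the pattern, squeezed into a single file. It is not the MeiliSearch code: the names (`ResolverMsg`, `ResolverHandle`, `ResolverHandleImpl`, `ResolverActor`, `spawn_resolver`) are hypothetical, the ids are plain `u64` counters rather than real uuids, and it uses a standard-library thread and `std::sync::mpsc` channels so that it runs without any dependency, which is an assumption and not necessarily what the project uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

// message.rs: one variant per interaction. `ret` is the channel on which
// the actor sends its answer back to the caller.
enum ResolverMsg {
    GetOrCreate { name: String, ret: Sender<u64> },
    Get { name: String, ret: Sender<Option<u64>> },
}

// mod.rs: the interface the rest of the code programs against.
trait ResolverHandle {
    fn get_or_create(&self, name: &str) -> u64;
    fn get(&self, name: &str) -> Option<u64>;
}

// handle_impl.rs: no logic, each method only wraps its arguments in a
// message, sends it to the actor and waits for the answer.
#[derive(Clone)]
struct ResolverHandleImpl {
    sender: Sender<ResolverMsg>,
}

impl ResolverHandle for ResolverHandleImpl {
    fn get_or_create(&self, name: &str) -> u64 {
        let (ret, answer) = mpsc::channel();
        let msg = ResolverMsg::GetOrCreate { name: name.to_string(), ret };
        self.sender.send(msg).expect("actor stopped");
        answer.recv().expect("actor stopped")
    }

    fn get(&self, name: &str) -> Option<u64> {
        let (ret, answer) = mpsc::channel();
        let msg = ResolverMsg::Get { name: name.to_string(), ret };
        self.sender.send(msg).expect("actor stopped");
        answer.recv().expect("actor stopped")
    }
}

// actor.rs: owns the state and runs the message loop.
struct ResolverActor {
    ids: HashMap<String, u64>,
    next_id: u64,
}

impl ResolverActor {
    fn run(mut self, inbox: Receiver<ResolverMsg>) {
        // The loop ends once every handle (sender) has been dropped.
        for msg in inbox {
            match msg {
                ResolverMsg::GetOrCreate { name, ret } => {
                    let id = match self.ids.get(&name) {
                        Some(id) => *id,
                        None => {
                            let id = self.next_id;
                            self.next_id += 1;
                            self.ids.insert(name, id);
                            id
                        }
                    };
                    // The caller may have gone away; ignore a failed send.
                    let _ = ret.send(id);
                }
                ResolverMsg::Get { name, ret } => {
                    let _ = ret.send(self.ids.get(&name).copied());
                }
            }
        }
    }
}

// Creates the actor on its own thread and returns a handle to it.
fn spawn_resolver() -> ResolverHandleImpl {
    let (sender, inbox) = mpsc::channel();
    let actor = ResolverActor { ids: HashMap::new(), next_id: 0 };
    thread::spawn(move || actor.run(inbox));
    ResolverHandleImpl { sender }
}

fn main() {
    let resolver = spawn_resolver();
    let movies = resolver.get_or_create("movies");
    assert_eq!(resolver.get("movies"), Some(movies));
    assert_eq!(resolver.get("books"), None);
}
```

Because callers only ever see the trait, a test could swap `ResolverHandleImpl` for a stub that answers without any thread or channel, which is the testing benefit mentioned for handle_impl.rs.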

MeiliSearch currently uses four actors:

  • uuid_resolver holds the association between the user-provided index names and the internal uuid representation we use.
  • index_actor is our representation of multiple indexes. Any request made to MeiliSearch that needs to talk to milli passes through this actor.
  • update_actor is in charge of index updates. Since updates can take a long time to receive and process, we need to:
    1. Store them as fast as possible, so we can keep receiving other updates even if nothing has been processed yet.
    2. Feed the index_actor with a new update every time it finishes its current job.
  • dump_actor handles the dumps. It needs to contact all the other actors and create a dump of everything that is currently happening.
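The two requirements listed for the update_actor (store quickly, then feed one update at a time) can be sketched as a small queue. This is only an illustration under stated assumptions: `Update`, `UpdateQueue` and their methods are hypothetical names, the payload is reduced to an index name, and everything lives in memory here, whereas a real store would presumably persist updates.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for a pending update; the real payload is not
// described in the text above.
#[derive(Debug, PartialEq)]
struct Update {
    id: u64,
    index: String,
}

#[derive(Default)]
struct UpdateQueue {
    next_id: u64,
    pending: VecDeque<Update>,
    // Id of the update currently being processed, if any.
    processing: Option<u64>,
}

impl UpdateQueue {
    // Step 1: store the update right away and return its id, without
    // waiting for any earlier update to be processed.
    fn register(&mut self, index: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.push_back(Update { id, index: index.to_string() });
        id
    }

    // Step 2: hand out the oldest pending update, but only when no other
    // update is currently being processed.
    fn next_update(&mut self) -> Option<Update> {
        if self.processing.is_some() {
            return None;
        }
        let update = self.pending.pop_front()?;
        self.processing = Some(update.id);
        Some(update)
    }

    // Called once the current job is done, so the next one can be fed.
    fn finish(&mut self, id: u64) {
        if self.processing == Some(id) {
            self.processing = None;
        }
    }
}

fn main() {
    let mut queue = UpdateQueue::default();
    let first = queue.register("movies");
    let second = queue.register("books");

    // Both updates were accepted immediately; only one is handed out.
    let current = queue.next_update().expect("one update is pending");
    assert_eq!(current.id, first);
    assert_eq!(current.index, "movies");
    assert!(queue.next_update().is_none());

    // Finishing the current job unlocks the next update.
    queue.finish(first);
    assert_eq!(queue.next_update().map(|u| u.id), Some(second));
}
```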

Data module

The data module provides a unified interface to communicate with the index controller and other services (snapshots, dumps, ...), and initializes the MeiliSearch instance.

Http server

To handle the web and API part, we use actix-web; you can find all routes in the [routes] module. Currently, actix-web is configured in lib.rs. Most of the routes use [extractors] to handle authentication.

@MarinPostma

To help you in this task, we'll try to do a little overview of the project. MeiliSearch is frontend of milli. To handle the web and API part, we are using actix-web; you can find everything in the [routes] module. Since MeiliSearch encapsulates milli, which is too low-level for us, we also wrote a little wrapper around in the [index] module. If you added a feature to milli, you might want to check this module.

This part needs to be elaborated a bit more: for example, what is milli? If this is going to be the entry point to the project, let's describe its components in a bit more depth.

Here are the sections I am thinking about:

Milli

Explains what milli is, what it does (index documents, perform searches...) and what it doesn't do (handle multiple indexes, queue and perform updates asynchronously...), and redirects to the milli repo for more info.

Index module

Moving upwards, this is what wraps a milli index. It abstracts over the transaction and isolates a task that can be run in a thread. This is the unit of interaction with milli.

IndexController module

Handles multiple instances of a milli index. Based on the actor model, it allows asynchronous writes to the indexes (UpdateStore) and synchronous reads. It handles the resolution of index names to actual indexes.

Architecture note

All actors are connected through a *Handle interface, which allows them to be replaced with something else: useful for tests and future-proofing.
4 actors... bla bla
modules organization blabla

Uuid Resolver Actor

Resolves the indirection between index uid and index uuid, etc.

Other modules of the index controller

Data module

Unified interface to communicate with the index controller and other services (snapshots, dumps...); initializes the MeiliSearch instance, blabla
As agnostic as possible to the overlying tech used to communicate with it.

Http server

-> actix web
-> all routes are defined in routes
-> configuration in lib.rs (will move)
-> client to the Data module
