naviji/gsoc-2020-work-product.md

## gsoc-2020-work-product.md

      
Naveen M V - GSoC 2020

What got done

The project consisted of three parts:

Make search better by introducing additional search filters (e.g., tags, notebook, type).
Make the ranking of search results better by implementing the Okapi BM25 relevance function.
Make fuzzy search possible.

Of these, the first two have been merged. The third is nearing completion.
The documentation for the new search filters can be found here.
Discussions and weekly project reports can be found here.

What's left to do

The main problem left to solve is deploying fuzzy search on all supported platforms.
Fuzzy search relies on SQLite's Spellfix extension, which has to be compiled from source, so packaging and shipping the feature on every platform is difficult.
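
One common way to cope with an optional native extension is to check at startup whether it can be loaded and to fall back to exact search if it cannot. The sketch below illustrates that idea with Python's standard `sqlite3` module; Joplin itself is a JavaScript application, so this is not its code. The function name `try_load_extension` and the extension path are hypothetical.

```python
import sqlite3


def try_load_extension(conn: sqlite3.Connection, path: str) -> bool:
    """Return True if the SQLite extension at `path` could be loaded."""
    # Some Python builds are compiled without extension-loading support.
    if not hasattr(conn, "enable_load_extension"):
        return False
    try:
        conn.enable_load_extension(True)
    except sqlite3.Error:
        return False
    try:
        conn.load_extension(path)
        return True
    except sqlite3.Error:
        # Missing file, wrong architecture, and similar packaging problems.
        return False
    finally:
        # Do not leave extension loading enabled longer than needed.
        conn.enable_load_extension(False)


conn = sqlite3.connect(":memory:")
# "./spellfix" is a placeholder path for a locally compiled build.
fuzzy_available = try_load_extension(conn, "./spellfix")
```

With a check like this, the application could enable fuzzy search only where `fuzzy_available` is true and keep the regular search everywhere else.
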

Code contributions


All: Add search filters

Joplin's search previously used SQLite's full-text search (FTS) almost directly, which left little flexibility. For example, a search could not be restricted to a particular notebook or filtered by tag.
The current implementation fixes most of these problems by adding a better abstraction over FTS. It supports many new filters and stays flexible enough that more can be added easily in the future.
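
To give a feel for this kind of abstraction, here is a minimal Python sketch that splits a query into filter terms and plain text terms before anything is sent to FTS. It is not Joplin's implementation (Joplin is written in JavaScript). The `Term` type, the `parse_query` function, and the exact query syntax are assumptions made for illustration; only the filter names tag, notebook, and type come from this report.

```python
from dataclasses import dataclass
from typing import List

# Filter names assumed for this sketch (the report mentions tags, notebook, type).
KNOWN_FILTERS = {"tag", "notebook", "type"}


@dataclass(frozen=True)
class Term:
    name: str            # a filter name, or "text" for a plain search word
    value: str
    negated: bool = False


def parse_query(query: str) -> List[Term]:
    """Split a query string into structured terms."""
    terms = []
    for token in query.split():
        # A leading "-" marks a term that should be excluded.
        negated = token.startswith("-") and len(token) > 1
        body = token[1:] if negated else token
        name, sep, value = body.partition(":")
        if sep and name in KNOWN_FILTERS and value:
            terms.append(Term(name, value, negated))
        else:
            # Anything that is not a known filter is treated as plain text.
            terms.append(Term("text", body, negated))
    return terms


terms = parse_query("tag:work notebook:gsoc -draft bm25")
```

In a design like this, supporting a new filter means adding its name to `KNOWN_FILTERS` and teaching the query builder how to translate it; the translation to SQL is left out of the sketch.
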
Known limitations:

A notebook can't be excluded from the search directly. For example, -notebook: archive doesn't work.
The search bar is not interactive. Showing suggestions or listing recent searches could be useful.


All: Weigh notes using Okapi BM25 score

Okapi BM25 is a popular ranking function used by search engines to estimate a document's relevance to a given search query. It ranks a set of documents based on the query terms appearing in each document, regardless of their proximity.
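
As a rough illustration of the scoring, the Python sketch below computes BM25 over tokenized in-memory documents. It is a textbook-style version, not the code merged into Joplin, which works on top of SQLite FTS. The parameter values `k1 = 1.2` and `b = 0.75` are commonly used defaults and are assumed here, as is the smoothed IDF variant that keeps scores non-negative.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query terms."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs

    # Number of documents that contain each term at least once.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization: long documents are penalized via b.
            norm = 1.0 - b + b * len(d) / avg_len
            score += idf * f * (k1 + 1.0) / (f + k1 * norm)
        scores.append(score)
    return scores


docs = [
    ["okapi", "bm25", "ranking"],
    ["search", "filters"],
    ["bm25", "bm25", "search", "ranking", "notes"],
]
scores = bm25_scores(["bm25"], docs)
```

Because only term counts and document lengths enter the score, shuffling the words inside a document leaves its score unchanged, which is the "regardless of proximity" property mentioned above.
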


Desktop: Fuzzy search

Fuzzy search lets a query match notes even when it contains typos or spelling variants, which gives a better user experience.
Known limitations:

The spellfix extension has to be built from source.
Fuzzy search is not supported on mobile.
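
The general idea behind fuzzy matching can be illustrated with plain Levenshtein edit distance, as in the Python sketch below. This is only a conceptual example and does not reproduce how the spellfix extension works internally. The `fuzzy_match` helper and its `max_distance` parameter are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(word: str, vocabulary: List[str],
                max_distance: int = 1) -> List[str]:
    """Return vocabulary words within max_distance edits of word."""
    return [v for v in vocabulary
            if edit_distance(word.lower(), v.lower()) <= max_distance]


matches = fuzzy_match("serch", ["search", "notebook", "tags"])
```

A misspelled query word can then be expanded to its close vocabulary matches before the regular search runs.
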


Challenges and learnings


Managing complexity and creating abstractions were the main challenges. Mentor feedback was helpful here.


Another learning was the value of test-driven development. It saved me hours of frustration when refactoring the code while implementing the search filters. (It was also quite satisfying to see all those tests pass.)


I'm also going to spend more time studying software architecture and design patterns. These become impossible to ignore as the code base grows larger. I have a lot to learn here.