@maneeshpm
Last active November 30, 2021 04:10
In Search Of A Better Search Solution: A GSoC Journey


"And when you search for something, all the kiwix libraries conspire in helping you to retrieve it"
- Definitely Not Paulo Coelho, The Alchemist

About Me and the project

My name is Maneesh P M. I am a senior UG in the Dept. of Chemical Engineering at IIT Kanpur. I enjoy building backend technologies, both online and offline.

My project focused on improving the search functionality of openzim and kiwix, covering both full-text search and suggestion search. The major objectives were:

  • Drop wrapper structures from kiwix for performance and usability enhancements
  • Improve relevant suggestion results and their snippets
  • Introduce a versatile suggestion API that is able to work even in the absence of a Xapian index
  • Make suggestion handling more stable in kiwix-serve and other sub-projects

The projects that involved me were

The journey!

I started on the project early. The first stage consisted of usability improvements to zim-tools, a utility that helps with local testing and command-line usage.

zim-tools

The PRs fixing them: openzim/zim-tools PRs


libzim

The work on zim-tools set me up for actually working on the library. I started off with some maintenance work like

Related PRs: openzim/libzim#479, openzim/libzim#503, openzim/libzim#515

Xapian is essentially at the core of our search infrastructure. Several modifications were made to its implementation.

These improvements and fixes made the suggestion results much more relevant for the user.

Related PRs: openzim/libzim#492, openzim/libzim#501, openzim/libzim#520, openzim/libzim#526, openzim/libzim#534, openzim/libzim#528

With these fixes improving "relevancy" in place, we could move on to user experience: ease of use of the library, snippets, and general maintenance.

Related PRs: openzim/libzim#545, openzim/libzim#559, openzim/libzim#547


libkiwix

For a while, we shifted our attention to libkiwix to make some fixes based on the work in libzim.

Related PRs: kiwix/libkiwix#508, kiwix/libkiwix#510, kiwix/libkiwix#505, kiwix/libkiwix#528


kiwix-tools & kiwix-desktop

Since these projects depend on libkiwix, any change in libkiwix has to be reflected here as well. The addition of a SuggestionItems class for iteration was one such change.

Related PRs: kiwix/kiwix-desktop#628, kiwix/kiwix-desktop#648, kiwix/kiwix-tools#461

Most of the issues in kiwix-tools were either moved to libkiwix/libzim or fixed via upstream patches.


All this work in itself improved the usage aspect of the library considerably 🎉

BUT the real ball game was yet to begin: work on the architecture and design! I was completely new to this area and spent a considerable amount of time picking it up and building the huge upcoming PRs.

Back to libkiwix!

Dropping Wrappers from libkiwix kiwix/libkiwix#430
Essentially, we were redeclaring all the libzim structures inside libkiwix, which was unnecessary and complicated the code. The wrappers were therefore dropped in three stages.

Some smaller bugs encountered and fixed during this change:

Back to libzim! Again!

A major backward-compatibility problem in libzim was that suggestions did not work in the absence of a title/full-text index. In that case, one had to use Archive methods to get suggestions manually. To fix this,

Add new Suggestion API to libzim openzim/libzim#564
This was undeniably the longest-running PR: it took 100+ discussion comments, 1500+ lines of code, about 19 commits and a lot of reviews from my mentors Matthieu and Emmanuel. The changes were

  • Adding SuggestionSearcher & SuggestionSearch
  • Enhancing Archive::iterator methods
  • Introducing SuggestionIterator and SuggestionResultSet
  • Introducing SuggestionDataBase
  • Fixing the compilation with and without Xapian dependency

These features were so interrelated that they had to be done in one go within a single PR for coherence, and therefore the PR grew in size. After these changes, the new Suggestion API can be used with new and old zim files alike, with libzim handling the intricacies of the interactions.
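To make the "works with or without an index" idea concrete, here is a rough, self-contained sketch. It is not libzim code: `ToyArchive`, `suggestFromToyIndex`, `suggestByTitlePrefix` and `suggest` are hypothetical names made up for illustration, and the fallback strategy (a prefix scan over title-sorted entries) is my assumption about one way such a fallback could work, not a description of what the PR does internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an archive: only its titles, kept sorted.
struct ToyArchive {
    std::vector<std::string> sortedTitles;  // must be sorted lexicographically
    bool hasTitleIndex;                     // pretend flag: is a title index available?
};

// Toy "indexed" path: matches the query at the start of any word in a title.
// A real index-backed search would query Xapian instead of scanning.
std::vector<std::string> suggestFromToyIndex(const ToyArchive& archive,
                                             const std::string& query,
                                             std::size_t maxResults) {
    std::vector<std::string> results;
    for (const auto& title : archive.sortedTitles) {
        if (results.size() >= maxResults) break;
        std::size_t pos = 0;
        while (pos != std::string::npos) {
            if (title.compare(pos, query.size(), query) == 0) {
                results.push_back(title);
                break;
            }
            pos = title.find(' ', pos);             // jump to the next word
            if (pos != std::string::npos) ++pos;    // skip the space itself
        }
    }
    return results;
}

// Fallback path (assumed): binary-search the sorted titles for the first
// candidate, then collect titles while they still start with the query.
std::vector<std::string> suggestByTitlePrefix(const ToyArchive& archive,
                                              const std::string& query,
                                              std::size_t maxResults) {
    std::vector<std::string> results;
    auto it = std::lower_bound(archive.sortedTitles.begin(),
                               archive.sortedTitles.end(), query);
    for (; it != archive.sortedTitles.end() && results.size() < maxResults; ++it) {
        if (it->compare(0, query.size(), query) != 0) break;  // left the prefix range
        results.push_back(*it);
    }
    return results;
}

// Single entry point: the caller never needs to know which path was taken.
std::vector<std::string> suggest(const ToyArchive& archive,
                                 const std::string& query,
                                 std::size_t maxResults) {
    return archive.hasTitleIndex
        ? suggestFromToyIndex(archive, query, maxResults)
        : suggestByTitlePrefix(archive, query, maxResults);
}
```

Calling `suggest(archive, "Apple", 10)` on an archive without an index would only return titles beginning with "Apple", while the toy indexed path can also match a word in the middle of a title. The point of the sketch is the shape of the design: one entry point, with the choice of backend hidden from the caller. Both paths here are case-sensitive, which a real implementation would likely not want.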

This was finally merged in openzim/libzim#574.

With the new suggestions API in place, we added the additional enhancements to libkiwix as well.


Epilogue

Now we can say 🎉.

Some numbers encapsulating the work in GSoC period:

  • 52 Issues fixed
  • 41 PRs merged
  • 6 Projects
  • 7K+ lines of code
  • 3 Months of fun and learning!

What's next for the project?


Thanks to Matthieu and Emmanuel for their help and encouragement, and thanks to GSoC for providing this awesome opportunity! Kudos 🎊
