GSoC 2021 Summary Report - Implement Thread support for JMAP for Apache James

Date of report: 19/08/2021

About me

My name is Tran Hong Quan. I am a final-year student at Hanoi University of Science and Technology, majoring in Computer Engineering.

If you are interested in this project or in Google Summer of Code, you can contact me by email: quan.tranhong1999@gmail.com

Brief introduction

JMAP is an email application protocol that modernises IMAP. It runs on top of HTTP and uses JSON. JMAP is designed to make efficient use of limited network resources and to scale horizontally to a very large number of users. Apache James is one of the first implementations of this new standard.

Mail user agents generally display emails grouped by conversation (replies, forwards, etc.). JMAP's RFC 8621 has a dedicated concept for this: the Thread. Apache James already implements JMAP Threads, but in a rather naive way: each email is a thread of its own. This naive implementation complies with the specification but defeats the purpose of threads: emails related to the same topic should belong to the same thread.

James’s data models, storage APIs, and some JMAP methods at the HTTP level need to change so that threads fulfil this purpose.
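To make the Thread concept concrete, here is a minimal sketch of what a non-naive Thread/get has to return. RFC 8621 describes a Thread as an id plus the ids of its emails, sorted by receivedAt with the oldest first. The in-memory `EMAILS` store and the `thread_get` function are hypothetical names for illustration only; this is not the Apache James implementation.

```python
from datetime import datetime, timezone

# Hypothetical in-memory store: email id -> (thread id, receivedAt).
EMAILS = {
    "e1": ("t1", datetime(2021, 8, 1, 9, 0, tzinfo=timezone.utc)),
    "e2": ("t1", datetime(2021, 8, 1, 10, 0, tzinfo=timezone.utc)),
    "e3": ("t2", datetime(2021, 8, 2, 8, 0, tzinfo=timezone.utc)),
}


def thread_get(thread_ids):
    """Build the "list" and "notFound" parts of a Thread/get response."""
    found, not_found = [], []
    for thread_id in thread_ids:
        members = [
            (received_at, email_id)
            for email_id, (tid, received_at) in EMAILS.items()
            if tid == thread_id
        ]
        if not members:
            not_found.append(thread_id)
            continue
        # Python's sort is stable: emails are ordered by receivedAt, oldest first.
        members.sort(key=lambda member: member[0])
        found.append({"id": thread_id, "emailIds": [eid for _, eid in members]})
    return {"list": found, "notFound": not_found}
```

With the naive approach, every thread in "list" would contain exactly one email id. Here, emails that share a threadId come back together.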

How did I do it?

First, we need to know that messages belong to the same thread if they have an identical thread identifier (threadId). My work revolves around this threadId.

We need a dedicated module to guess a new message's threadId. I call it ThreadIdGuessingAlgorithm.

My idea is to first add a threadId property to James's Message model so that I can query the threadId of a message.

When a new message arrives, I query that user's old messages to see whether any of them is related to the new one. If the new message relates to an old message, the new message gets the same threadId as the old message. Otherwise, the new message gets a newly generated threadId. I implemented two ways to query old messages:

  • First way: use the search engine (ElasticSearch, Lucene).

    This is mostly for experimental purposes. Every time a new message arrives, we need to query the search engine, and that is expensive. That is why I need the second way for production environments.

  • Second way: implement a dedicated Cassandra table that stores the thread data of old messages, and use that data to decide whether a new message is related.

    Cassandra answers queries that are known in advance very quickly, which makes this approach good enough for production.
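The second way can be pictured as a key lookup. The sketch below is an in-memory stand-in, not the actual Cassandra schema: the class name `ThreadLookupTable`, its row layout, and the use of a random UUID as a new threadId are all assumptions made for illustration. It expects the subject to have been normalized already (prefixes stripped).

```python
import uuid


class ThreadLookupTable:
    """Hypothetical in-memory stand-in for a dedicated thread table."""

    def __init__(self):
        # (username, mime message id) -> list of (base subject, thread id)
        self._rows = {}

    def guess_thread_id(self, username, mime_message_ids, base_subject):
        """Reuse the threadId of a related old message, else create a new one."""
        thread_id = None
        for message_id in mime_message_ids:
            for subject, known_thread_id in self._rows.get((username, message_id), []):
                if subject == base_subject:
                    thread_id = known_thread_id
                    break
            if thread_id is not None:
                break
        if thread_id is None:
            # No related old message: start a new thread (assumed id format).
            thread_id = str(uuid.uuid4())
        # Record the new message so that later messages can find it by key.
        for message_id in mime_message_ids:
            self._rows.setdefault((username, message_id), []).append(
                (base_subject, thread_id)
            )
        return thread_id


table = ThreadLookupTable()
first = table.guess_thread_id("bob", {"<a@example.org>"}, "Hello")
reply = table.guess_thread_id("bob", {"<b@example.org>", "<a@example.org>"}, "Hello")
print(first == reply)
```

Each lookup touches only the rows keyed by the user and one message id, so there is no scan over all old messages. That is the property that makes a key-based table attractive compared with a search query.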

Next, I need to determine whether two messages are related to each other. To do so, I read the JMAP specification carefully and followed it. The specification defines the conditions for this case:

The exact algorithm for determining whether two Emails belong to the same Thread is not mandated in this spec to allow for compatibility with different existing systems. For new implementations, it is suggested that two messages belong in the same Thread if both of the following conditions apply:

1. An identical message id [RFC5322] appears in both messages in any of the Message-Id, In-Reply-To, and References header fields.
2. After stripping automatically added prefixes such as “Fwd:”, “Re:”, “[List-Tag]”, etc., and ignoring white space, the subjects are the same. This avoids the situation where a person replies to an old message as a convenient way of finding the right recipient to send to but changes the subject and starts a new conversation.

My idea is to get the header fields above from the old message and the new message, strip the subjects, and then check whether they satisfy both conditions. James already has a piece of code that strips subjects, so I can leverage it.

With threadId guessing in place, we can build on that work to implement further thread-related tasks.
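The two conditions can be sketched as a small predicate. This is an illustration under simplifying assumptions, not the subject-stripping code that already exists in James: headers are plain dictionaries with exact-case keys, and the prefix pattern only covers "Re:", "Fw:", "Fwd:" and bracketed list tags. The function names are hypothetical.

```python
import re

# Assumed prefix pattern: "Re:", "Fw:", "Fwd:" or a bracketed list tag.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:|\[[^\]]*\])", re.IGNORECASE)


def base_subject(subject):
    """Strip leading reply/forward/list prefixes, then drop all white space."""
    previous = None
    while previous != subject:
        previous = subject
        subject = _PREFIX.sub("", subject, count=1)
    return re.sub(r"\s+", "", subject)


def message_ids(headers):
    """Collect message ids from Message-Id, In-Reply-To and References."""
    ids = set()
    for name in ("Message-Id", "In-Reply-To", "References"):
        ids.update(re.findall(r"<[^<>\s]+>", headers.get(name, "")))
    return ids


def same_thread(old, new):
    """Both conditions must hold: a shared message id and equal base subjects."""
    shares_id = bool(message_ids(old) & message_ids(new))
    same_subject = base_subject(old.get("Subject", "")) == base_subject(
        new.get("Subject", "")
    )
    return shares_id and same_subject


original = {"Message-Id": "<a@example.org>", "Subject": "Lunch on Friday"}
reply = {
    "Message-Id": "<b@example.org>",
    "In-Reply-To": "<a@example.org>",
    "References": "<a@example.org>",
    "Subject": "Re: [team] Lunch on Friday",
}
print(same_thread(original, reply))
```

Requiring both conditions means that a reply whose subject was changed starts a new thread even though it still references the old message id, which is the situation the specification wants to avoid.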

What work was done?

By the end of Google Summer of Code, I had finished the "Implement Thread/get method" task, the last task in my proposal's schedule. All of my code was reviewed carefully by my mentors and merged into the Apache James master branch.

Here is a summary of my work with the related pull requests:

Change James data models

Implement ThreadId guessing logic

Implement non-naive Thread/get method

What is left to do?

The original objective of the GSoC project (use of Thread/get) is fully achieved.

While GSoC enabled a basic user experience with threads on top of JMAP, several enhancements could not be covered within the GSoC schedule. I will continue to contribute the following thread-related work to Apache James after Google Summer of Code:

  • Implement Thread/changes based on previous table
  • Push state changes for threads
  • Investigate ElasticSearch aggregation for collapse thread
  • Implement collapse thread option on top of the message search index
  • Implement collapse thread for memory
  • Email/query should expose thread options

Conclusion

Based on the work described above, I consider that I finished my Google Summer of Code 2021 project successfully.

First, I want to thank my mentors (Rene Cordier and Benoit Tellier), Google, and The Apache Software Foundation for offering me this opportunity. I want to thank my mentors once more: they guided me enthusiastically throughout the project, and I learned a lot from them.

Of course, I met a lot of challenges throughout the project. Here are the main ones:

  • Solving unknown and complicated problems
  • Working with a large codebase and a complicated system (sometimes I had to take a bottom-up approach: reading code and tests first to understand how things work)
  • Writing clean and working code
  • Learning new system design, new technology stacks in a short time

But to me, facing these challenges is a great opportunity to learn and grow quickly. Here are the main things I learned from this project:

  • Problem solving
  • Improve my system design view
  • Develop a good mindset for writing clean, performance-oriented code
  • Learn new technology stacks: Cassandra, ElasticSearch, functional reactive programming
  • Learn how to interact with open source community

In conclusion, I learned a lot from my GSoC project. I am looking forward to learning and working on more challenging problems in the future.
