- Name: Jakub Mačina
- Nickname: @dmacjam
- Email: jakub.macina@gmail.com
- Country & timezone: Slovakia, GMT+2
- School Name & Study: Slovak University of Technology, Information Systems
- Personal Website: https://dmacjam.github.io
- Phone number: +421 910 926 484
The project proposes new filters for advanced search and the extraction of the rake task for running unit tests into a separate gem. The advanced search enhancements consist of 1) searching for posts that contain all of the entered tags and 2) searching for posts with images (with an extension for searching by any filetype). The benefit for the Discourse community is better filtering of posts with an emphasis on search performance, which will be useful for searching in bigger forums. Extracting the custom way of running unit tests into a gem separates concerns, because the test-running library does not depend on Discourse: the Discourse code base gets smaller and the gem will be beneficial to the whole Rails community.
I am a member of the Discourse meta community with the username @dmacjam and I have earned the member and contributor badges. I have already contributed to Discourse core by adding a new feature for appending tags to topics as a bulk action (meta discussion). I am passionate about online communities, which are my research topic during my Master's studies. I implemented my Master's thesis by extending an open-source university CQA system (Ruby on Rails, PostgreSQL, Python), which is closely related to Discourse in terms of idea and technologies used. Moreover, I have software development experience from a part-time job, an internship, and side and school projects. I have built my own Ruby on Rails web application (Domased) and worked with the frontend framework Angular.js in two projects (Find the perfect school, work project at ArcGeo). What’s more, I try to have an impact on the community by contributing to StackOverflow in the Ruby on Rails, PostgreSQL and Python categories. I regularly attend local Ruby meetups and share my knowledge by reviewing books within this community.
Part 1: Core: Tag logical AND search
Discourse currently supports advanced search by tags, which returns posts containing any of the specified tags. If a user wants to search for posts containing all of the tags, they must specify the tag keyword before every tag, such as `tag:bug tag:pr-welcome`. Moreover, this logical AND search is not well optimized, because it performs a sequential search over all topics once for each searched tag.
User story: As a user, I want to search for posts that contain all of the tags I entered, in order to find posts more precisely.
Use case:
1. User inputs search keywords into the Advanced search input field.
2. User specifies tags in the With tags input field.
3. User checks the checkbox Contains all provided tags and clicks on the search icon.
4. Search results are posts that contain all tags provided in step 2.
Task breakdown:
- Add checkbox Contains all provided tags (`search-advanced-options.hbs`).
- Modify relevant controller logic (`search-advanced-options.js.es6`). If the checkbox is checked, separate the tags with '&' (represents logical AND) instead of ',' (represents logical OR).
- Modify the `Search` class:
  - when the separator is ',', use the logical OR tag search, which is already implemented
  - when the separator is '&', use the logical AND tag search
Performance notes: My idea is to use PostgreSQL built-in tools for full-text search and boolean operations: `to_tsvector` and `to_tsquery`. With this design, it is possible to do only one sequential database scan on the `tags` and `topic_tags` tables, which should speed up logical AND search by multiple tags compared with one scan per tag.
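The separator handling described in the task breakdown can be sketched in plain Ruby. This is a minimal illustration, not the Discourse implementation: `parse_tag_filter`, `tags_to_tsquery` and `topic_matches?` are hypothetical helper names, and the generated expression only shows the shape of a `to_tsquery` input (real tag names would still need escaping before being sent to PostgreSQL).

```ruby
# Hypothetical helper (not part of Discourse): splits the value of the tag
# filter into a match mode and a list of tag names.
# '&' between tags means "all tags", ',' means "any tag".
def parse_tag_filter(value)
  mode = value.include?("&") ? :all : :any
  tags = value.split(/[&,]/).map(&:strip).reject(&:empty?).uniq
  { mode: mode, tags: tags }
end

# Builds a boolean expression in the style accepted by to_tsquery:
# '&' joins terms for logical AND, '|' joins terms for logical OR.
# Assumption: tag names are already safe to embed; no escaping is done here.
def tags_to_tsquery(filter)
  operator = filter[:mode] == :all ? " & " : " | "
  filter[:tags].join(operator)
end

# In-memory reference for the expected search semantics, useful in tests.
def topic_matches?(topic_tags, filter)
  if filter[:mode] == :all
    (filter[:tags] - topic_tags).empty?
  else
    !(filter[:tags] & topic_tags).empty?
  end
end

filter = parse_tag_filter("bug&pr-welcome")
puts tags_to_tsquery(filter)
puts topic_matches?(%w[bug pr-welcome ux], filter)
```

Keeping the parsing separate from the query building would let the existing logical OR path stay untouched while the AND path is added and tested on its own.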
Part 2: Core: Search posts with images
User story: As a user, I want to restrict Advanced search results to posts that contain image(s), in order to narrow down the results.
Use case:
- User inputs keywords in Advanced search, marks the checkbox Includes images and clicks on the search button.
- All posts in the search results are relevant to the keywords and contain images.
Task breakdown:
- Images can be either uploaded from the device or pasted as a direct link.
- Add a new column `filetype` of type string(5) to the `topic_links` table.
- Extend the Sidekiq job `jobs/regular/process_post.rb` and the method it calls, `extract_from(post)` in the `TopicLink` class, with a method that parses each link for filetypes allowed by the admin or any common filetypes.
  - If a link contains an allowed filetype, persist the filetype name to the `filetype` column of the `topic_links` table.
- Add a new checkbox in the section Only return topics/posts that… with the name Includes images.
- Add logic to call the appropriate function from `search.rb`.
- Extend the `search.rb` code to filter posts by joining to `topic_links` and keeping only rows where `filetype` is equal to any common image filetype.
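The link-parsing step from the task breakdown could look roughly like the sketch below. It is only an illustration under stated assumptions: `filetype_from_url` and `IMAGE_FILETYPES` are hypothetical names, the list of image extensions is an example, and the real list would come from the filetypes allowed by the admin.

```ruby
require "uri"

# Assumed example list; in Discourse this would come from site settings.
IMAGE_FILETYPES = %w[jpg jpeg png gif].freeze

# Hypothetical helper: returns the lowercase file extension of a link
# if it is in the allowed list, otherwise nil.
def filetype_from_url(url, allowed = IMAGE_FILETYPES)
  # Only the path is inspected, so query strings do not confuse the check.
  path = URI.parse(url).path.to_s
  ext = File.extname(path).delete_prefix(".").downcase
  allowed.include?(ext) ? ext : nil
rescue URI::InvalidURIError
  # Malformed links simply have no detectable filetype.
  nil
end

puts filetype_from_url("https://example.com/uploads/photo.JPG?size=large").inspect
puts filetype_from_url("https://example.com/docs/manual.pdf").inspect
```

The returned value is what would be persisted to the `filetype` column, and the search filter would then compare that column against the image filetypes.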
Extension 1: Search by specifying filetype, e.g. keyword filetype:pdf
- Extend `search.rb` to parse the new filter keyword and search by the `filetype` field of the `topic_links` table.
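Parsing the `filetype:` keyword out of the search term could be sketched as follows. `extract_filetype_filter` is a hypothetical helper and does not reflect how `search.rb` registers its filters; the limit of five characters mirrors the proposed string(5) column.

```ruby
# Hypothetical parser: pulls a "filetype:xyz" keyword out of the search
# term and returns the remaining term together with the requested filetype.
def extract_filetype_filter(term)
  filetype = nil
  remaining = term.split.reject do |word|
    match = word.match(/\Afiletype:([a-z0-9]{1,5})\z/i)
    filetype = match[1].downcase if match
    match # a truthy value removes the keyword from the remaining term
  end
  { term: remaining.join(" "), filetype: filetype }
end

puts extract_filetype_filter("release notes filetype:PDF").inspect
```

When no keyword is present, `filetype` stays `nil` and the search would run without the extra join.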
Extension 2: Search posts with any uploaded attachment.
- Add checkbox Includes uploaded attachment to `search-advanced-options.hbs`.
- Extend `search.rb` to filter posts by the `domain` field of the `topic_links` table. If it is the Discourse site domain, then the link is an uploaded attachment.
Create a new gem repository, define dependencies, write tests for the autospec and write documentation on how to use the autospec gem.
Extension: Listening for JavaScript file changes, which automatically runs the tests for the changed file.
I have not identified any major risks. To mitigate any challenges, I plan to use TDD and write proper tests according to the specs, communicate clearly with my mentor and with the community on the meta forum, and submit smaller pull requests at every milestone.
I have already identified the classes and modules where code needs to be added or changed. I have experience with PostgreSQL full-text search from my other project (Find the perfect school). Moreover, in my Master's thesis research I analysed the database model of the StackOverflow CQA system, and I have experience with the development of a university CQA system.
Start: May 15
End: August 29
Working hours per week: 38 hours
Other commitments:
- Master thesis defense - 1 day in June
- Graduation ceremony - 1 day in July
- May 15 Communicating details of my tasks to the Discourse meta community. Learning Ember.js and familiarizing myself with the relevant Discourse codebase. Setting the scope and the final architecture of the project with my mentor.
- May 29 (official coding start) Create checkbox and controller logic for tag logical AND advanced search.
- June 5 Write tests and implement tag search that returns posts containing all provided tags.
- Milestone 1: Logical AND tag search feature is delivered.
- June 19 Create migration for the `TopicLink` class with the new column. Write tests and extend the method `extract_from(post)` in the `TopicLink` class to parse filetypes in links.
- June 26 (first official evaluation) Adjust advanced search UI for searching posts with images. Write tests for searching by images.
- July 10 Implement search for posts that include images.
- Milestone 2: Searching posts with images is delivered.
- July 17 Write tests and implement Extension 1 - searching by custom filetype in advanced search.
- Milestone 3: Searching posts by custom filetype in advanced search is delivered.
- July 24 (second official evaluation) Extract rake autospec into a gem.
- July 31 Write tests and documentation for the autospec gem.
- August 7 Start working on rake autospec Extension: Listening for JavaScript file changes.
- Milestone 4: Autospec is extracted into a standalone gem.
- August 21 Writing documentation for new advanced search features.
I am a CS Master's student interested in web development, with software development experience from more than 8 projects. I am a full-stack developer, more interested in the backend, with skills in Ruby on Rails, Python, Java and Angular.js. Besides web development, I am very interested in machine learning and data science. I like startups and I co-founded a project which monitors the quality and length of a user’s sitting in front of a computer. I am a fan of open source and the KISS approach. I am passionate about movement (soccer and biking in particular) and travelling. More information is available on my LinkedIn and my blog.
I already have experience working remotely, both on the side project I co-founded and during my Master's thesis cooperation with a researcher from Harvard University (5 hours time difference). My internship in Switzerland and my study exchange program in Belgium made me confident communicating in technical English. I believe that my flexibility, eagerness to learn and experience with the Angular.js frontend framework will help me learn Ember.js easily.
Software Engineer internship @ PSI Switzerland (2016) Software proposal and development of a desktop application for managing on-call responsibilities for employees using Java and PostgreSQL. Communication language was English.
Software developer @ ArcGeo (2015-2016) Building web applications and backend APIs as geographic information system (GIS) solutions for customers using Angular.js and JavaScript.
Co-founder @ Spine Hero (2015-) Because of my passion for a healthy lifestyle, I co-founded a project which monitors the quality and length of a user’s sitting in front of the computer. It’s a computer vision desktop application. We won a local startup competition and now we are running it as a side project (in the past year mostly remotely). Most important are the skills that I have gained in project management, teamwork, communication, conflict resolution, TDD and the git flow development model.
Here are the most interesting relevant projects from my GitHub portfolio:
Domased My first web project, built in Ruby on Rails, for finding interesting events by location. Domased automatically parses new events from popular local ticket-selling sites, Facebook pages and city homepages. Users can add new events and set their interests after logging in.
Find the perfect school I used open data released by the UK government to help parents find a suitable school for their children according to several options, e.g. closeness to the desired location, type of school, level of criminality nearby or presence of undesired spots. It is implemented in Node.js and Angular.js and uses a PostgreSQL database. Since I used a lot of data, such as a map of the UK with places from OpenStreetMap and public information about schools and criminality in the UK, I optimized my SQL queries through proper database design with column indices and full-text search, so that searches run in reasonable time.
Degree: Master's degree in Information Systems with expected graduation in June 2017.
Online communities research Discourse as a discussion platform is part of my research topic. I am focusing on online learning communities within CQA (Community Question Answering) systems and discussion boards. In my Master's thesis, I proposed the recommendation of new questions in online student communities to support collaboration among students. I implemented question recommendation as a machine learning task in the university CQA system Askalot. My implementation added new Ruby on Rails modules for displaying recommendations and an A/B experimental structure to the Askalot CQA system, as well as Python modules for text processing and machine learning. The source code is in the branch question-routing-mooc-feature. In cooperation with Harvard University, I deployed this branch and evaluated my question recommendation online in an A/B experiment on the online course Quantum Cryptography, with more than 4500 students, at the well-known MOOC platform EdX.
Degree: Artificial Intelligence exchange program
Study language: English
Degree: Bachelor's degree in Informatics.
Ranked among the top 5% of students according to GPA. Received the Dean’s Award for an excellent bachelor thesis on the topic Detection of wrong sitting posture using neural networks.