@dmacjam
Last active April 3, 2017 14:52
Jakub Macina @dmacjam Discourse GSoC17

Advanced search enhancements & Autospec gem

  • Name: Jakub Mačina
  • Nickname: @dmacjam
  • Email: jakub.macina@gmail.com
  • Country & timezone: Slovakia, GMT+2
  • School Name & Study: Slovak University of Technology, Information Systems
  • Personal Website: https://dmacjam.github.io
  • Phone number: +421 910 926 484

Synopsis

The project proposes new filters for advanced search and the extraction of the rake task for running unit tests (autospec) into a separate gem. The advanced search enhancements comprise 1) searching for posts that contain all of the entered tags and 2) searching for posts with images (with an extension for searching by any filetype). For the Discourse community, the benefit is better filtering of posts with an emphasis on search performance, which will be useful when searching bigger forums. Extracting the custom way of running unit tests into a gem separates concerns, since the test-running library does not depend on Discourse. The Discourse code base gets smaller, and the gem will benefit the whole Rails community.

I am a member of the Discourse meta community with the username @dmacjam, and I have earned the member and contributor badges. I have already contributed to Discourse core by adding a new feature for appending tags to topics as a bulk action (meta discussion). I am passionate about online communities, which are my research topic during my Master's studies. I implemented my Master thesis by extending an open-source university CQA system (Ruby on Rails, PostgreSQL, Python), which is closely related to Discourse in both idea and technologies. Moreover, I have software development experience from a part-time job, an internship, and side and school projects. I have built my own Ruby on Rails web application (Domased) and worked with the frontend framework Angular.js in two projects (Find the perfect school, a work project at ArcGeo). I also try to have an impact on the community by contributing to StackOverflow in the Ruby on Rails, PostgreSQL and Python categories. I regularly attend local Ruby meetups and share my knowledge by reviewing books within this community.

Project Details

Specs & Scope

Task 1: Advanced search enhancements

Discourse currently supports advanced search by tags, returning posts that contain any of the specified tags. If users want posts that carry all of the given tags (their intersection), they must type the tag keyword before every tag, such as tag:bug tag:pr-welcome. Moreover, this logical AND search is not well optimized, because it performs a separate sequential scan over all topics for each searched tag.

User story: As a user, I want to search for posts that contain all of the tags I entered in order to find posts in a more detailed way.

Use case:

  1. User enters search keywords in the Advanced search input field.
  2. User specifies tags in the With tags input field.
  3. User checks the Contains all provided tags checkbox and clicks the search icon.
  4. Search results are posts that contain all tags provided in step 2.

Task breakdown:

  • Add a Contains all provided tags checkbox (search-advanced-options.hbs)
  • Modify the relevant controller logic (search-advanced-options.js.es6). If the checkbox is checked, join the tags with the separator '&' (logical AND) instead of ',' (logical OR)
  • Modify Search class:
    • when separator is ',', use logical OR tag search which is already implemented
    • when separator is '&', use logical AND tag search
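The separator handling described above can be sketched as a small Ruby helper. This is a minimal illustration, not Discourse's actual Search code. The method name parse_tag_filter and the :all/:any return values are hypothetical, and the helper does not handle a value that mixes both separators.

```ruby
# Hypothetical helper: split the tags filter value into a match mode and a
# list of tag names. '&' means "all tags" (logical AND); ',' means "any tag"
# (logical OR, the behaviour Discourse already implements).
def parse_tag_filter(value)
  if value.include?("&")
    [:all, value.split("&").map(&:strip).reject(&:empty?)]
  else
    [:any, value.split(",").map(&:strip).reject(&:empty?)]
  end
end

mode, tags = parse_tag_filter("bug&pr-welcome")
# mode is :all, tags is ["bug", "pr-welcome"]
```

Search would then branch on the mode. :any keeps the existing OR query and :all switches to the new AND query.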

Performance notes: My idea is to use PostgreSQL's built-in full-text search functions, to_tsvector and to_tsquery, which support boolean operators. With this design, a single sequential scan over the tags and topic_tags tables is enough. This should make logical AND search over multiple tags faster than running one scan per tag.
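One way to realize this single-pass idea is sketched below. It is an assumption-laden illustration, not a tested Discourse query. The sketch swaps in array_to_tsvector (PostgreSQL 9.6+) for to_tsvector, so that tag names such as pr-welcome are not stemmed or split at the hyphen. Each tag is quoted inside the tsquery literal, with embedded quotes and backslashes doubled. The names all_tags_tsquery and ALL_TAGS_SQL are hypothetical, and the column names follow the tags/topic_tags tables mentioned above.

```ruby
# Build a tsquery literal requiring every tag, e.g. "'bug' & 'pr-welcome'".
# Quoting keeps punctuation inside the lexeme; quotes and backslashes are
# doubled as the tsquery input syntax requires.
def all_tags_tsquery(tags)
  tags.map { |tag| "'#{tag.gsub(/['\\]/) { |c| c * 2 }}'" }.join(" & ")
end

# Hypothetical query: one pass over topic_tags, aggregating each topic's tag
# names into a tsvector and matching it against the AND query. The '?' is a
# bind placeholder for the string returned by all_tags_tsquery.
ALL_TAGS_SQL = <<~SQL
  SELECT tt.topic_id
  FROM topic_tags tt
  JOIN tags t ON t.id = tt.tag_id
  GROUP BY tt.topic_id
  HAVING array_to_tsvector(array_agg(t.name::text)) @@ CAST(? AS tsquery)
SQL
```

A plain GROUP BY with HAVING COUNT(DISTINCT t.name) equal to the number of tags is a simpler alternative that would be worth benchmarking against this version.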

User story: As a user, I want to restrict Advanced search results to posts that contain image(s) in order to narrow down the results.

Use case:

  1. User enters keywords in Advanced search, checks the Includes images checkbox and clicks the search button.
  2. All posts in the search results are relevant to the keywords and contain images.

Task breakdown:

  • Images can either be uploaded from the device or pasted as a direct link.
  • Add a new column filetype of type string(5) to the topic_links table.
  • Extend the Sidekiq job jobs/regular/process_post.rb, and the extract_from(post) method of the TopicLink class that it calls, with a method that parses each link for a filetype allowed by the admin or any common filetype.
    • If a link contains an allowed filetype, persist the filetype name to the filetype column of the topic_links table.
  • Add a new checkbox named Includes images to the section Only return topics/posts that…
  • Add logic to call the appropriate filter in search.rb.
  • Extend search.rb to filter posts by joining topic_links and keeping only rows whose filetype is a common image filetype.
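The link-parsing step and the image filter could look roughly like the sketch below. It is a minimal illustration under stated assumptions. The method filetype_for and the constants IMAGE_FILETYPES and IMAGE_FILTER_SQL are hypothetical, and the extension list is a placeholder for whatever the admin allows. The SQL also assumes topic_links has a post_id column.

```ruby
require "uri"

# Placeholder list of "common image filetypes"; in practice this might come
# from a site setting.
IMAGE_FILETYPES = %w[jpg jpeg png gif svg webp].freeze

# Return the lower-cased extension of the link's path if it is allowed and
# fits the string(5) column; otherwise nil. Query strings are ignored.
def filetype_for(url, allowed)
  ext = File.extname(URI.parse(url).path.to_s).delete_prefix(".").downcase
  return nil if ext.empty? || ext.length > 5
  allowed.include?(ext) ? ext : nil
rescue URI::InvalidURIError
  nil
end

# Hypothetical condition for search.rb, bound with filetypes: IMAGE_FILETYPES.
IMAGE_FILTER_SQL = <<~SQL
  posts.id IN (
    SELECT tl.post_id FROM topic_links tl
    WHERE tl.filetype IN (:filetypes)
  )
SQL
```

For example, filetype_for("https://example.com/cat.PNG?size=1", IMAGE_FILETYPES) yields "png", while a link to report.pdf yields nil.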

Extension 1: Search by specifying filetype, e.g. keyword filetype:pdf

  • Extend search.rb to parse the new filter keyword and filter by the filetype field of the topic_links table.
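Parsing the new keyword can be sketched in plain Ruby, independent of how search.rb actually registers its advanced filters. The method name extract_filetype_filters is hypothetical. The 1 to 5 character limit mirrors the proposed string(5) column.

```ruby
# Hypothetical sketch: pull "filetype:xxx" tokens out of a raw search term and
# return the remaining keywords plus the requested filetypes (lower-cased).
def extract_filetype_filters(term)
  filetypes = []
  rest = term.split.reject { |word|
    m = word.match(/\Afiletype:([a-z0-9]{1,5})\z/i)
    filetypes << m[1].downcase if m
    !m.nil?
  }
  [rest.join(" "), filetypes]
end

keywords, types = extract_filetype_filters("budget filetype:PDF report")
# keywords is "budget report", types is ["pdf"]
```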

Extension 2: Search posts with any uploaded attachment.

  • Add an Includes uploaded attachment checkbox to search-advanced-options.hbs.
  • Extend search.rb to filter posts by the domain field of the topic_links table. If the domain is the Discourse site's own domain, the link is an uploaded attachment.
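A domain match alone would also count links to other topics on the same forum. The sketch below therefore additionally requires an /uploads/ path, which is an assumption about where uploads are served. The method name uploaded_attachment? and the host forum.example.com are hypothetical placeholders, and CDN-hosted uploads are not covered.

```ruby
require "uri"

# Treat a link as an uploaded attachment when it points at the site's own host
# (or is relative) and its path lies under /uploads/.
def uploaded_attachment?(url, site_host)
  uri = URI.parse(url)
  same_site = uri.host.nil? || uri.host.casecmp?(site_host)
  same_site && uri.path.to_s.start_with?("/uploads/")
rescue URI::InvalidURIError
  false
end
```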

Task 2: Extract rake autospec into a gem

Create a new gem repository, define dependencies, write tests for autospec and write documentation on how to use the autospec gem.
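The gem definition might start from a gemspec like the sketch below. The gem name, version, summary and version constraints are hypothetical placeholders, not decisions made in this proposal. Using the listen gem for file watching is also an assumption.

```ruby
require "rubygems"

# Hypothetical gemspec for the extracted autospec gem.
SPEC = Gem::Specification.new do |s|
  s.name    = "discourse-autospec"
  s.version = "0.1.0"
  s.summary = "Runs the relevant specs automatically when files change"
  s.files   = Dir["lib/**/*.rb"]
  s.add_dependency "listen", "~> 3.0"              # file-system watcher (assumed)
  s.add_development_dependency "rspec", "~> 3.5"   # the gem's own test suite
end
```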

Extension: Listen for JavaScript file changes and automatically run the tests for the changed file.
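The core of this extension is a mapping from a changed JavaScript file to the test file to run. The sketch below is illustrative only: both directory layouts are assumptions, and js_test_for is a hypothetical name.

```ruby
# Hypothetical mapping from a changed Ember source file to its QUnit test file.
# A changed test file maps to itself; unrelated files map to nil.
def js_test_for(path)
  return path if path.start_with?("test/javascripts/") && path.end_with?("-test.js.es6")
  m = path.match(%r{\Aapp/assets/javascripts/discourse/(.+)\.js\.es6\z})
  m && "test/javascripts/#{m[1]}-test.js.es6"
end
```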

Anticipated challenges

I do not foresee any major risks. To mitigate potential challenges, I plan to use TDD and write proper tests according to the specs, communicate clearly with my mentor and with the community on the meta forum, and submit smaller pull requests at every milestone.

Groundwork

I have already identified the classes and modules where code needs to be added or changed. I have experience with PostgreSQL full-text search from another of my projects (Find the perfect school). Moreover, in my Master thesis research I analyzed the database model of the StackOverflow CQA system, and I have experience developing a university CQA system.

Project Schedule

Start: May 15

End: August 29

Working hours per week: 38 hours

Other commitments:

  • Master thesis defense - 1 day in June
  • Graduation ceremony - 1 day in July

Schedule:

  • May 15 Communicating the details of my tasks to the Discourse meta community. Learning Ember.js and familiarizing myself with the relevant parts of the Discourse codebase. Agreeing on the scope and final architecture of the project with my mentor.
  • May 29 (official coding start) Create checkbox and controller logic for tag logical AND advanced search.
  • June 5 Write tests and implement tag search which contains all provided tags.
  • Milestone 1: Logical AND tag search feature is delivered.
  • June 19 Create a migration adding the new filetype column to the topic_links table. Write tests and extend the extract_from(post) method in the TopicLink class to parse filetypes in links.
  • June 26 (first official evaluation) Adjust the advanced search UI for searching posts with images. Write tests for searching by images.
  • July 10 Implement search for posts that include images.
  • Milestone 2: Searching posts with images is delivered.
  • July 17 Write tests and implement Extension 1, searching by a custom filetype in advanced search.
  • Milestone 3: Searching posts by a custom filetype in advanced search is delivered.
  • July 24 (second official evaluation) Extract rake autospec into a gem.
  • July 31 Write tests and documentation for the autospec gem.
  • August 7 Start working on rake autospec Extension: Listening for Javascript files changes.
  • Milestone 4: Autospec is extracted into a standalone gem.
  • August 21 Writing documentation for new advanced search features.

Experience

I am a CS Master's student interested in web development, with software development experience from more than 8 projects. I am a full-stack developer, more interested in the backend, with skills in Ruby on Rails, Python, Java and Angular.js. Besides web development, I am very interested in machine learning and data science. I like startups, and I co-founded a project that monitors the quality and length of a user's sitting in front of a computer. I am a fan of open source and the KISS approach. I am passionate about sports (soccer and biking in particular) and travelling. More information is available on my LinkedIn and my blog.

I already have experience working remotely, both on the side project I co-founded and in my Master thesis cooperation with a researcher from Harvard University (5 hours time difference). My internship in Switzerland and my study exchange program in Belgium made me confident communicating in technical English. I believe that my flexibility, eagerness to learn and experience with the Angular.js frontend framework will help me learn Ember.js easily.

Work Experience

Software Engineer internship @ PSI Switzerland (2016): Designed and developed a desktop application for managing employees' on-call responsibilities using Java and PostgreSQL. The working language was English.

Software developer @ ArcGeo (2015-2016): Built web applications and backend APIs as geographic information system (GIS) solutions for customers using Angular.js and JavaScript.

Co-founder @ Spine Hero (2015-): Because of my passion for a healthy lifestyle, I co-founded a project that monitors the quality and length of a user's sitting in front of the computer. It is a computer vision desktop application. We won a local startup competition, and we now run it as a side project (in the past year mostly remotely). Most importantly, I have gained skills in project management, teamwork, communication, conflict resolution, TDD and the git flow development model.

Github projects

Here are the most interesting relevant projects from my GitHub portfolio:

Domased: My first web project, built in Ruby on Rails, for finding interesting events by location. Domased automatically parses new events from popular local ticket-selling sites, Facebook pages and city homepages. After logging in, users can add new events and set their interests.

Find the perfect school: I used open data released by the UK government to help parents find a suitable school for their children according to several options, e.g. closeness to the desired location, type of school, level of criminality nearby or presence of undesired spots. It is implemented in Node.js and Angular.js and uses a PostgreSQL database. Because I used a lot of data, such as a map of the UK with places from OpenStreetMap and public information about schools and criminality in the UK, I optimized my SQL queries through proper database design with column indices and full-text search, so that searches complete in reasonable time.

Academic experience

2015-2017 Slovak University of Technology in Bratislava

Degree: Master's degree in Information Systems, with expected graduation in June 2017.

Online communities research: Discourse as a discussion platform is part of my research topic. I focus on online learning communities within CQA (Community Question Answering) systems and discussion boards. In my Master thesis, I proposed recommending new questions in online student communities to support collaboration among students. I implemented question recommendation as a machine learning task in the university CQA system Askalot. My implementation added new Ruby on Rails modules for displaying recommendations and an A/B experiment structure to the Askalot CQA system, as well as Python modules for text processing and machine learning. The source code is in the branch question-routing-mooc-feature. In cooperation with Harvard University, I deployed this branch and evaluated my question recommendation online through an A/B experiment in the online course Quantum Cryptography, with more than 4,500 students, on the well-known MOOC platform edX.

2016 KU Leuven, Belgium

Degree: Artificial Intelligence exchange program

Study language: English

2012-2015 Slovak University of Technology in Bratislava

Degree: Bachelor's degree in Informatics.

Ranked among the top 5% of students by GPA. Received the Dean's Award for an excellent bachelor thesis on the topic Detection of wrong sitting posture using neural networks.
