

@nikhilrayaprolu
Last active March 12, 2017 06:15
Requirements_document for Susper Search Appliance

FUNCTIONAL AND TECHNICAL REQUIREMENTS DOCUMENT

Project: Develop the front end for a Susper Search Appliance

Introduction to the Project:

To develop a web front end for a Search Appliance based on the next version of the grid search engine.

The Susper Search Appliance uses Yacy-Grid. Currently only the backend exists, and it can be used only from a terminal. The goal of this project is to offer users a simple UI for a Susper Search Appliance that allows them to run a local distributed search engine.

Background Info (given in the project description)

A good way to start is to read up on open source and proprietary search appliances such as the discontinued Google Search Appliance. A good starting point is the Wikipedia article: https://en.wikipedia.org/wiki/Google_Search_Appliance

For background on the first version of the YaCy Search Appliance, please have a look at the following link: http://yacy.net/en/Applications.html

Search Appliance: Topic-Oriented Search and Search Engine for Projects

You can operate as a single search appliance without networking with other peers. Such a search instance can be used for:

  • Search for projects - a combination of wikis, forums and websites
  • Topic-oriented search engine - combine a search for several web pages from different domains into a single search portal.
  • Companies can host their own search engine in order to help guard company secrets (a company may give away secrets with the search words employees submit to a search portal).
  • The search engine helps to preserve your anonymity when searching

Your own search portal!

If you run a YaCy peer, you have your own search engine. You can use it either to provide search functionality for your own search portal, or you can join a community of search engine peers to share your web index with the web index of other YaCy peer owners. If you search with YaCy, your search requests are anonymous because they go to your computer and not to a central search portal server. No central server can log your searches.

Integration of a search window in your own web pages

YaCy provides simple code snippets to support the integration of YaCy search into your own web pages. There is also a search widget (which pops up on your web pages, if you like), and demonstration snippets are provided by the YaCy administration interface. You only need to copy and paste the code into your pages.

Privacy and Security

Your private search requests are never stored, monitored or evaluated for commercial purposes. If you are searching for terms related to product development and innovation, you can potentially give away information about your company activities. To maintain your business secrets, you need your own search engine (which can easily be created with YaCy).

Basic Analysis regarding Search Appliances

What is a Search Appliance?

A search appliance is a device attached to a corporate network for the purpose of indexing the content shared across that network, in a way similar to a web search engine.
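What "indexing" means in practice can be made concrete with an inverted index, the core data structure behind full-text search: it maps each term to the documents that contain it. The sketch below is a minimal in-memory illustration under stated assumptions (ASCII-only tokenisation, AND semantics for multi-term queries, no ranking or persistence); `InvertedIndex` is a hypothetical name, and a real appliance would rely on its search backend for this.

```typescript
// Minimal in-memory inverted index: each term maps to the set of document IDs containing it.
class InvertedIndex {
  private readonly postings = new Map<string, Set<string>>();

  // Naive tokeniser (an assumption, not a real analysis chain):
  // lowercase, then split on anything that is not an ASCII letter or digit.
  private tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  }

  // Record every term of the document under its ID.
  add(docId: string, text: string): void {
    for (const term of this.tokenize(text)) {
      let docs = this.postings.get(term);
      if (docs === undefined) {
        docs = new Set<string>();
        this.postings.set(term, docs);
      }
      docs.add(docId);
    }
  }

  // IDs of documents containing every query term (AND semantics), sorted for stable output.
  search(query: string): string[] {
    const terms = this.tokenize(query);
    if (terms.length === 0) return [];
    const first = this.postings.get(terms[0]);
    let result = first ? Array.from(first) : [];
    for (const term of terms.slice(1)) {
      const docs = this.postings.get(term);
      result = docs ? result.filter((id) => docs.has(id)) : [];
    }
    return result.sort();
  }
}
```

For example, after indexing "Susper search appliance" as `a`, "YaCy grid search" as `b` and "Grid computing" as `c`, the query `grid search` returns `["b"]`.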

What are the prominent search appliances today?

Some of the prominent search appliances I came across include:

i) Google Search Appliance

ii) Mindbreeze

iii) LucidWorks

What are the possible features we could incorporate now in the project?

After analysing the search appliances above, I identified the following features that could be incorporated into the Susper-YaCy Search Appliance.

1) Search Features

i) Search specific websites named by the user. For example, if company 'A' wants to search only its own site 'A.com', we could provide an option to enter the sites it wants to search, and index the data from the links specified.

ii) The site-specific search above is also useful for embedded site-search options that could be shipped with the Appliance.

iii) Search databases: either take access tokens or passwords from the user and index the data on a separate machine, or directly use the databases' native search options or commands whenever the user wants specific information.

iv) Search specific public mailing lists of the user/company, or private mail (using access tokens or passwords).

v) Search files on network storage (supporting as many file formats as possible).

vi) Search other data sources such as CRMs.

vii) Full-text search.

viii) Support regex-based search and SQL-like queries.

ix) Different sorting options.

x) Autocomplete.

xi) Spell check similar to Google ("Do you mean that?").

xii) Manual boosting and blocking.

xiii) Machine learning and data mining techniques, where possible, to surface the most relevant data.

xiv) Entity recognition: structuring content by pulling attributes such as date, author or product type from documents.

xv) Translation of documents into other languages.

xvi) User-added results: users can help improve the search experience by adding search results for a specific keyword.

xvii) Document preview: a document can be previewed before it is opened.


2) Access to the search appliance from desktops, smartphones (progressive web apps) and remote networks

3) Encrypt all connections

4) Authentication and authorisation, access control, and support for the major security protocols

5) Dynamic queries: applying synonyms and taxonomies

6) API support for the Susper Search Appliance: a REST API initially, and separate API support for all major programming languages

7) Cross-platform support for the whole system

8) Journaling, access logs and error logs

9) Protect user privacy in every possible way

10) Admin console
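Manual boosting and blocking (item xii) could be applied as a post-processing step on the results the backend returns, before they are rendered. A minimal sketch, assuming each result carries a URL and a numeric score, and that the admin configures per-domain score multipliers and a list of blocked domains; `SearchResult`, `RankingRules` and `applyRankingRules` are hypothetical names, not an existing Susper API:

```typescript
// A single hit as returned by the backend (field names are assumptions).
interface SearchResult {
  url: string;
  score: number;
}

// Admin-configured rules (hypothetical format).
interface RankingRules {
  boosts: Record<string, number>; // domain -> score multiplier, e.g. { "docs.example.com": 2 }
  blocked: string[]; // domains whose results are never shown
}

// Extract the lowercase host from an absolute URL with a simple regex (no DOM/Node dependency).
function hostOf(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase() : "";
}

// True if the host is the domain itself or one of its subdomains.
function inDomain(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

// Remove blocked results, multiply the scores of boosted domains, then sort by score (highest first).
function applyRankingRules(results: SearchResult[], rules: RankingRules): SearchResult[] {
  return results
    .filter((r) => !rules.blocked.some((d) => inDomain(hostOf(r.url), d)))
    .map((r) => {
      const host = hostOf(r.url);
      let factor = 1;
      for (const domain of Object.keys(rules.boosts)) {
        if (inDomain(host, domain)) factor *= rules.boosts[domain];
      }
      return { ...r, score: r.score * factor };
    })
    .sort((a, b) => b.score - a.score);
}
```

Doing this in the frontend keeps the sketch self-contained; applying boosts inside the backend query instead would let them influence which results are retrieved at all, not only their order.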

Screenshots of the interfaces of some of these search appliances, along with explanations of their features, are added in this gist.

What is the present state of the project?

Regarding the Susper search engine, a lot remains to be done (this has also been listed as a separate GSoC project):

i) Automated tests should be written.

ii) Extra features such as synonym support and error correction in search could be incorporated.

iii) Design changes are needed to make it user-friendly.

iv) Responsive design changes are in progress.

v) Additional search features should be added.

vi) Internationalisation features remain to be implemented.
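Error correction in search (item ii), like the "Do you mean that?" spell check in the feature list, can be approximated by suggesting, for each unknown query word, the known term with the smallest edit distance. The sketch below uses Levenshtein distance against a vocabulary that is assumed to be available to the frontend (for example, terms taken from the index); `levenshtein` and `didYouMean` are hypothetical helpers, and this is one common technique rather than necessarily the one Susper will adopt:

```typescript
// Levenshtein distance via dynamic programming, keeping only the previous row.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Replace each unknown query word with the closest vocabulary term within maxDistance edits.
// Returns null when nothing was changed, so the UI can hide the "Did you mean" hint.
function didYouMean(query: string, vocabulary: string[], maxDistance = 2): string | null {
  const vocab = vocabulary.map((t) => t.toLowerCase());
  const known = new Set(vocab);
  let changed = false;
  const corrected = query
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((word) => {
      if (known.has(word)) return word;
      let best = word;
      let bestDistance = maxDistance + 1;
      for (const term of vocab) {
        const d = levenshtein(word, term);
        if (d < bestDistance) {
          best = term;
          bestDistance = d;
        }
      }
      if (best !== word) changed = true;
      return best;
    });
  return changed ? corrected.join(" ") : null;
}
```

With the vocabulary `["search", "appliance", "grid"]`, the query `serch aplliance` yields `search appliance`. Scanning the whole vocabulary per word is fine for a sketch; Solr also ships a spell-check component that the backend could use instead.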

What is my timeline for the project?

Implementation:

Stages of implementation and requirements from the backend side, along with the timeline:

Since yacy-grid is not yet complete, either work must proceed on the backend in parallel to support the frontend functionality, or the frontend can be built without backend support using mock data and connected to the backend once it is complete.
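The mock-data option can be structured so that the UI is written against a small interface, with a mock implementation now and a yacy-grid client later. A sketch under that assumption; `Hit`, `SearchBackend`, `MockBackend` and `firstPageTitles` are hypothetical names, and the paging parameters (`start`, `rows`) are assumed rather than taken from yacy-grid:

```typescript
// One search hit as the UI expects it (hypothetical fields).
interface Hit {
  title: string;
  url: string;
}

// The only contract the UI depends on; a real yacy-grid client could implement it later.
interface SearchBackend {
  search(query: string, start: number, rows: number): Promise<Hit[]>;
}

// In-memory stand-in that filters canned data by title and pages through it.
class MockBackend implements SearchBackend {
  private readonly data: Hit[];

  constructor(data: Hit[]) {
    this.data = data;
  }

  async search(query: string, start: number, rows: number): Promise<Hit[]> {
    const q = query.toLowerCase();
    return this.data.filter((h) => h.title.toLowerCase().includes(q)).slice(start, start + rows);
  }
}

// UI-side code is written against the interface, not against a concrete backend.
async function firstPageTitles(backend: SearchBackend, query: string): Promise<string[]> {
  const hits = await backend.search(query, 0, 10);
  return hits.map((h) => h.title);
}
```

Because `firstPageTitles` only sees `SearchBackend`, replacing `MockBackend` with a real client should not require changes to UI code, as long as the real client returns the same shape.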

I will also write tests for each enhancement as soon as a new component or feature is added (I have had the experience of breaking my own code, and writing tests in parallel with the code reduces the effect of such breakage).

Stage 1:

Implement all remaining features of the susper.com search engine:

Since we will be building on the present susper.com by extending it, in the initial weeks we have to make sure that susper.com is fully functional and has a complete test suite. Extra features such as synonym support and error correction in search could be incorporated, and keyword auto-suggestion should be implemented.
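Keyword auto-suggestion could start as a prefix lookup over a sorted list of known terms or past queries. A minimal sketch using binary search, assuming the term list is supplied by the backend; `Autocomplete` is a hypothetical class, and Solr's suggester component or a trie are common alternatives for larger vocabularies:

```typescript
// Prefix-based autocomplete over a sorted, de-duplicated, lowercase term list.
class Autocomplete {
  private readonly terms: string[];

  constructor(terms: string[]) {
    // Normalise to lowercase, remove duplicates, and sort (same code-unit order as the < operator).
    this.terms = Array.from(new Set(terms.map((t) => t.toLowerCase()))).sort();
  }

  // Index of the first term >= prefix (lower bound via binary search).
  private lowerBound(prefix: string): number {
    let lo = 0;
    let hi = this.terms.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.terms[mid] < prefix) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Up to `limit` terms starting with `prefix`, in lexicographic order.
  suggest(prefix: string, limit = 5): string[] {
    const p = prefix.toLowerCase();
    if (p.length === 0) return [];
    const out: string[] = [];
    for (let i = this.lowerBound(p); i < this.terms.length && out.length < limit; i++) {
      if (!this.terms[i].startsWith(p)) break; // sorted order: no later term can match
      out.push(this.terms[i]);
    }
    return out;
  }
}
```

All terms sharing a prefix are contiguous in sorted order, so each lookup costs one binary search plus the number of suggestions returned.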

Stage 2:

Separate the Susper Search Appliance from susper.com:

From here on, actual progress towards building the search appliance begins.

| No. | Feature | Description | Frontend work | Backend requirements |
|---|---|---|---|---|
| 1 | Authentication and authorisation modules | Basic authentication for normal users and for admins. | A normal user can search for results but cannot access the admin UI. | |
| 2 | Custom indexing | Search specific websites named by the user (for example, if company 'A' wants to search only its own site 'A.com', it can enter the sites it wants to search, and the data is indexed from the links specified). This is also useful for embedded site search shipped with the Appliance: a widget is generated which users embed in their own sites to get custom site search. | In the admin UI, an option where the admin can enter a sequence of URLs of any form, including regex; this sequence of URLs is sent to the backend. Generating the embeddable widget is also required. | Once the sequence of URLs is received from the frontend, index the corresponding site data. |
| 3 | Search databases | Either take access tokens or passwords from the user and index the data on a separate machine, or directly use the databases' native search options or commands whenever the user wants specific information. | In the admin UI, the admin chooses the database and enters its address and access credentials; all of this is sent to the backend. | Once the credentials are received, index the information from the database. |
| 4 | Search mailing lists, or work as an archiver for mailing lists | The details of each mailing list are collected, and the server works as an archiver for the mailing lists. | Take the required inputs from the user. | Work as an archiver and index the mailing lists. |
| 5 | Search CRMs | Searching CRMs should be supported. | Take the required inputs from the user. | Index the CRMs. |
| 6 | Regex-based search and SQL-like queries | Supporting these features lets users search more efficiently. | | Since the backend is in Solr, and Solr supports regex and SQL-style queries, this can be incorporated. |
| 7 | Search features | Sorting, autocomplete, and spell check similar to Google ("Do you mean that?"). | These features do not need backend support and can be implemented in the frontend. | |
| 8 | Manual boosting and blocking; user-added search results | This will allow the search appliance to learn on its own how to rank search results. | User-added results: users can help improve the search experience by adding search results for a specific keyword. | Apply machine learning and data mining techniques, where possible, to surface the most relevant data. |
| 9 | Entity recognition | Structures content by pulling attributes such as date, author or product type from documents. | | |
| 10 | Translation | Translation of documents into other languages. | | |
| 11 | Document preview | A document can be previewed before it is opened. | Entirely frontend work. | |
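Feature 2 has the admin enter a sequence of URLs "of any form, including regex" in the admin UI. A sketch of how the frontend could parse and validate those entries before sending them, and how the same rules decide whether a URL is in scope for indexing. It assumes a hypothetical convention in which an entry wrapped in slashes (`/pattern/`) is a regular expression and any other entry is a URL prefix; the real format would have to be agreed with the backend, and `UrlRule`, `parseRules` and `inScope` are hypothetical names:

```typescript
// A scope rule entered by the admin: either a plain URL prefix or a regular expression.
type UrlRule = { kind: "prefix"; value: string } | { kind: "regex"; value: RegExp };

// Parse admin input, one rule per line. Hypothetical convention: "/.../" is a regex,
// anything else is a URL prefix. new RegExp throws a SyntaxError on an invalid pattern,
// so the admin UI can report the error before anything is sent to the backend.
// Regexes are built without the "g" flag, so repeated test() calls carry no lastIndex state.
function parseRules(lines: string[]): UrlRule[] {
  return lines
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line): UrlRule =>
      line.length > 2 && line.startsWith("/") && line.endsWith("/")
        ? { kind: "regex", value: new RegExp(line.slice(1, -1)) }
        : { kind: "prefix", value: line },
    );
}

// A URL is in scope for indexing if at least one rule matches it.
function inScope(url: string, rules: UrlRule[]): boolean {
  return rules.some((rule) =>
    rule.kind === "prefix" ? url.startsWith(rule.value) : rule.value.test(url),
  );
}
```

Validating in the admin UI gives immediate feedback, but the backend should still re-check the rules it receives.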

Some other extra features include:

11) Access to the search appliance from desktops, smartphones (progressive web apps) and remote networks

12) Encrypt all connections

13) Dynamic queries: applying synonyms and taxonomies

14) API support for the Susper Search Appliance: a REST API initially, and separate API support for all major programming languages

15) Cross-platform support for the whole system (files on a Windows or Linux server should be indexed; backend work)

16) Journaling, access logs and error logs

17) Protect user privacy in every possible way

18) Admin console
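Dynamic queries that apply synonyms (item 13) can be sketched as query expansion: each query term becomes an OR-group of the term and its synonyms, and the groups are joined with AND. The sketch below builds a Lucene/Solr-style boolean query string from a hypothetical, hand-maintained synonym map; `expandQuery` and `quote` are hypothetical helpers, and Solr can alternatively apply synonyms at analysis time with its synonym filters, which may be preferable in the backend:

```typescript
// Hypothetical synonym map (term -> synonyms); it could come from a taxonomy set in the admin console.
type Synonyms = Map<string, string[]>;

// Quote a term so characters with special meaning in Lucene query syntax are taken literally.
function quote(term: string): string {
  return '"' + term.replace(/(["\\])/g, "\\$1") + '"';
}

// Expand each whitespace-separated query term into ("term" OR "synonym" ...), joined with AND.
function expandQuery(query: string, synonyms: Synonyms): string {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return terms
    .map((t) => {
      const alternatives = [t, ...(synonyms.get(t) ?? [])];
      return alternatives.length === 1 ? quote(t) : "(" + alternatives.map(quote).join(" OR ") + ")";
    })
    .join(" AND ");
}
```

For example, `expandQuery("Laptop price", new Map([["laptop", ["notebook"]]]))` returns `("laptop" OR "notebook") AND "price"`. Synonyms here are one-directional: a query for `notebook` is not expanded unless the map also lists it.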

Timeline:

Every feature pull request includes coding, testing, responsive design and security (wherever possible: encrypted connections, authentication and authorisation, and access controls).
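For access control, feature 1 in the table above separates a normal user, who can search, from an admin, who can also reach the admin UI. A minimal role-based sketch of that rule, usable behind a frontend route guard; the roles, permission names and route prefixes are hypothetical, and the backend must enforce the same checks because a client-side guard alone is not a security boundary:

```typescript
type Role = "user" | "admin";

// Hypothetical permission names for the two areas described in feature 1.
type Permission = "search" | "adminConsole";

const rolePermissions: Record<Role, Permission[]> = {
  user: ["search"],
  admin: ["search", "adminConsole"],
};

// Hypothetical mapping from route prefixes to the permission each one requires.
const routePermissions: Array<[string, Permission]> = [
  ["/admin", "adminConsole"],
  ["/search", "search"],
];

function can(role: Role, permission: Permission): boolean {
  return rolePermissions[role].indexOf(permission) !== -1;
}

// Deny by default: unauthenticated users and routes without a rule are rejected.
function canActivate(role: Role | null, path: string): boolean {
  if (role === null) return false;
  const rule = routePermissions.find(([prefix]) => path === prefix || path.startsWith(prefix + "/"));
  return rule !== undefined && can(role, rule[1]);
}
```

Matching on whole path segments (`prefix + "/"`) keeps a route such as `/administrator` from being treated as part of `/admin`.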

| Period | Work to be done (along with coding, testing and responsive design) |
|---|---|
| May 4 - May 30 (community bonding period) | Implement all remaining features of the susper.com search engine, so that work on the Susper Search Appliance can start |
| May 4 - 11 | Add a test suite for susper.com |
| May 11 - 18 | Incorporate extra features such as synonym support and error correction in search |
| May 18 - 25 | Implement keyword auto-suggestion |
| May 25 - June 1 | Build the basic framework for authentication and authorisation, along with a design for the admin console |
| June 1 - June 8 | Build the basic framework for journaling, access logs and error logs |
| June 8 - June 15 | Implement the frontend for feature no. 2 |
| June 15 - June 22 | Implement the frontend for feature no. 3 |
| June 22 - June 29 | Implement the frontend for feature no. 4 |
| June 26, 16:00 UTC | Mentors and students can begin submitting Phase 1 evaluations |
| June 30, 16:00 UTC | Phase 1 evaluation deadline; Google begins issuing student payments |
| June 29 - July 6 | Implement the frontend for feature no. 5 |
| July 6 - July 13 | Implement the frontend for feature no. 6 |
| July 13 - July 20 | Implement the frontend for feature no. 7 |
| July 20 - July 27 | Implement the frontend for feature no. 8 |
| July 24, 16:00 UTC | Mentors and students can begin submitting Phase 2 evaluations |
| July 28, 16:00 UTC | Phase 2 evaluation deadline |
| July 27 - August 3 | Implement the frontend for feature no. 9 |
| August 3 - August 10 | Implement the frontend for feature no. 10 |
| August 10 - August 21 | Implement the frontend for feature no. 11; add HTTPS and make all connections encrypted |
| August 21 - 29, 16:00 UTC | Final week: students submit their final work product and their final mentor evaluation |
| August 29 - September 5, 16:00 UTC | Mentors submit final student evaluations |

Feature left out of the timeline above:

14) API support for the Susper Search Appliance: a REST API initially, and separate API support for all major programming languages.
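As an illustration of what a typed client for such a REST API might look like, the sketch below builds a query URL and checks the top-level shape of a JSON response. The endpoint path `/api/search`, the parameter names, the response fields and the helper names `buildSearchUrl` and `parseSearchResponse` are all hypothetical; the actual API would have to be defined together with the yacy-grid backend:

```typescript
// Hypothetical request/response types for a future Susper Search Appliance REST API.
interface SearchRequest {
  query: string;
  start?: number; // offset of the first result
  rows?: number; // page size
}

interface SearchResponse {
  total: number;
  results: { title: string; url: string }[];
}

// Build a GET URL for the hypothetical endpoint, percent-encoding every parameter.
function buildSearchUrl(baseUrl: string, req: SearchRequest): string {
  const params: Array<[string, string]> = [["query", req.query]];
  if (req.start !== undefined) params.push(["start", String(req.start)]);
  if (req.rows !== undefined) params.push(["rows", String(req.rows)]);
  const qs = params.map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v)).join("&");
  return baseUrl.replace(/\/+$/, "") + "/api/search?" + qs;
}

// Check the top-level shape of untyped JSON before the UI relies on it (items are not checked).
function parseSearchResponse(json: string): SearchResponse {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) throw new Error("response is not an object");
  const obj = data as { total?: unknown; results?: unknown };
  const total = obj.total;
  const results = obj.results;
  if (typeof total !== "number" || !Array.isArray(results)) {
    throw new Error("unexpected response shape");
  }
  return { total, results: results as SearchResponse["results"] };
}
```

Keeping URL building and response checking in plain functions like these would make them straightforward to mirror in the per-language client libraries mentioned above.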
