nikhilrayaprolu/README.md

## README.md

      
            
          
FUNCTIONAL AND TECHNICAL REQUIREMENTS DOCUMENT
Project: Develop the front end for a Susper Search Appliance
Introduction to the Project:
The aim is to develop a web front end, packaged as a search appliance, for the next version of the grid search engine.
The Susper Search Appliance uses Yacy-Grid. Currently only the backend exists, and it can be used only via the terminal. The goal of this project is to offer users a simple UI for a Susper Search Appliance that lets them run a local distributed search engine.
Background Info (given in the project description)

A good way to start is to read up on open source and proprietary search appliances such as the discontinued Google Search Appliance. A good starting point is the Wikipedia article: https://en.wikipedia.org/wiki/Google_Search_Appliance
For background info on the first version of the Yacy Search Appliance please have a look at the following link: http://yacy.net/en/Applications.html
Search Appliance:
Topic-Oriented Search and Search Engine for Projects
You can operate as a single search appliance without networking with other peers. Such a search instance can be used for:

Search for projects - a combination of wikis, forums and websites
Topic-oriented search engine - combine a search for several web pages from different domains into a single search portal.
Companies can host their own search engine in order to help guard company secrets (a company may give away secrets with the search words employees submit to a search portal).
The search engine helps to preserve your anonymity when searching

Your own search portal!
If you run a YaCy peer, you have your own search engine. You can use it either to provide search functionality for your own search portal, or you can join a community of search engine peers to share your web index with the web index of other YaCy peer owners. If you search with YaCy your search requests are anonymous because they go to your computer and not to a central search portal server. No central server can log your searches.
Integration of a search window in your own web pages
YaCy provides simple code snippets to support the integration of YaCy search into your own web pages. There is also a search widget (pops up on your web pages if you like that) and demonstration snippets are provided by the YaCy administration interface. You only need to copy-paste the code into your pages.
Privacy and Security
Your private search requests are never stored, monitored or evaluated for commercial purposes.
If you are searching for terms related to product development and innovation, you can potentially give away information about your company activities. To maintain your business secrets, you need your own search engine (which can easily be created with YaCy).
Basic Analysis regarding Search Appliances
What is a Search Appliance?
A search appliance is a device attached to a corporate network for the purpose of indexing the content shared across that network, in a way that is similar to a web search engine.
What are the prominent search appliances available now?
Some of the prominent search appliances I came across include:
i) Google Search Appliance
ii) MindBreeze
iii) LucidWorks
What features could we incorporate in the project now?
After analysing the search appliances above, I have identified the following features that could be incorporated into the Susper-Yacy Search Appliance.
1) Search features
i) Search only the websites the user specifies. For example, if company 'A' wants to search only its own site 'A.com', there could be an option to enter the sites to search, and the data would be indexed from the links specified.
ii) This site-specific search is also useful for the embedded site-search options that could be offered with the appliance.
iii) Search databases: either take access tokens or passwords from the user and index the data on a separate machine, or use the databases' native search options or commands directly whenever the user wants specific information.
iv) Search specific public mailing lists of the user or company, or private mail (using access tokens or passwords).
v) Search files on network storage (supporting as many formats as possible).
vi) Search other data sources such as CRMs.
vii) Full-text search.
viii) Regex-based search and SQL-like queries.
ix) Different types of sorting.
x) Autocomplete search.
xi) Spell check similar to Google's "Did you mean ...?".
xii) Manual boosting and blocking.
xiii) Machine learning and data mining techniques, where feasible, to return the most relevant results.
xiv) Entity recognition: structures content by pulling attributes such as date, author or product type from documents.
xv) Translation of documents into other languages.
xvi) User-added results: users can help improve the search experience by adding search results for a specific keyword.
xvii) Document preview.

2) Access to the search appliance from desktops, smartphones (progressive web apps) and remote networks.
3) Encryption of all connections.
4) Authentication and authorisation, access control, and support for the major security protocols.
5) Dynamic queries: "applying synonyms and taxonomies".
6) API support for the Susper Search Appliance, with a REST API initially and separate API support for all major languages.
7) Cross-platform support for the whole system.
8) Journaling, access logs and error logs.
9) Protection of user privacy in all possible ways.
10) Admin console.
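
The spell-check item in the list above ("Did you mean ...?") could be prototyped entirely in the frontend. The following is only a minimal sketch under stated assumptions: it assumes a small vocabulary of known terms is available to the UI, and the names `editDistance` and `suggest` are hypothetical, not part of Susper or YaCy.

```typescript
// Hypothetical helper: Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the closest vocabulary term within maxDistance, or null when the
// query is already a known term or nothing is close enough.
function suggest(query: string, vocabulary: string[], maxDistance: number = 2): string | null {
  const q = query.toLowerCase();
  let best: string | null = null;
  let bestDistance = maxDistance + 1;
  for (const term of vocabulary) {
    const d = editDistance(q, term.toLowerCase());
    if (d === 0) {
      return null; // exact match, no suggestion needed
    }
    if (d < bestDistance) {
      best = term;
      bestDistance = d;
    }
  }
  return best;
}
```

With a vocabulary of `["search", "engine"]`, the query `"serch"` is one edit away from `"search"`, so `suggest` would offer that term. A production version would more likely take its vocabulary from the index than from a fixed list.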
Interfaces of some search appliances, with explanations of their features
Interface screenshots and some images are included in this gist.
What is the present state of the project?
This concerns the Susper search engine, where a lot remains to be done at present (as also mentioned in another GSoC project):
i) Automated tests should be written.
ii) Extra features such as synonym support and error correction in search could be incorporated.
iii) Design changes should be made to make it user-friendly.
iv) Responsive design changes are in progress.
v) Extra search features should be added.
vi) Internationalisation features are yet to be implemented.
What is my timeline for the project?
Implementation:
Stages of implementation and requirements from the backend, along with the timeline:
Since yacy-grid is not complete, either backend work has to proceed in parallel to support the frontend functionality, or the frontend can be designed against mock data without backend support and connected to the backend once it is complete.
I would write tests for each enhancement individually as soon as a new component or feature is added. (I have had the experience of breaking my own code, and writing tests in parallel with the code reduces the impact of that.)
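
The mock-data approach described above could look roughly like the sketch below. It is an illustration only: the `SearchResult` fields, the `SearchBackend` interface and the `MockSearchBackend` class are assumed names, and the real yacy-grid response format may differ. The UI would code against the interface, so the mock can later be swapped for a real backend client.

```typescript
// Shape of one search hit; the field names are assumptions for illustration.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Contract the UI codes against. A real implementation would call the
// backend asynchronously; this sketch stays synchronous to keep it short.
interface SearchBackend {
  search(query: string): SearchResult[];
}

// Mock backend that filters a fixed in-memory list of documents.
class MockSearchBackend implements SearchBackend {
  private documents: SearchResult[];

  constructor(documents: SearchResult[]) {
    this.documents = documents;
  }

  search(query: string): SearchResult[] {
    const q = query.trim().toLowerCase();
    if (q === "") {
      return [];
    }
    return this.documents.filter(function (doc) {
      return (doc.title + " " + doc.snippet).toLowerCase().indexOf(q) !== -1;
    });
  }
}

// Made-up sample data for development and tests.
const mockDocuments: SearchResult[] = [
  { title: "YaCy Grid", url: "https://example.org/yacy-grid", snippet: "Distributed crawler and indexer" },
  { title: "Susper", url: "https://example.org/susper", snippet: "Web front end for search" }
];

const backend: SearchBackend = new MockSearchBackend(mockDocuments);
```

Because components only depend on `SearchBackend`, the tests written alongside each feature can keep using the mock even after the real backend is connected.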
Stage 1:
Implement all remaining features for the susper.com search engine:
Since we will be building on and extending the present susper.com, in the initial weeks we have to check that susper.com has complete functionality and a complete test suite. Extra features such as synonym support and error correction in search could be incorporated, and auto-suggestion of keywords should be implemented.
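
As an illustration of the keyword auto-suggestion mentioned for Stage 1, here is a minimal sketch that ranks previously seen queries by how often they were used. The `QueryCount` type and the `autoSuggest` function are hypothetical, and where the query history comes from is left open.

```typescript
// Hypothetical record of a previously seen query and how often it was used.
interface QueryCount {
  query: string;
  count: number;
}

// Returns up to `limit` past queries that start with the typed prefix,
// most frequently used first. The input array is not modified.
function autoSuggest(prefix: string, history: QueryCount[], limit: number = 5): string[] {
  const p = prefix.toLowerCase();
  if (p === "") {
    return [];
  }
  return history
    .filter(function (h) { return h.query.toLowerCase().indexOf(p) === 0; })
    .sort(function (a, b) { return b.count - a.count; })
    .slice(0, limit)
    .map(function (h) { return h.query; });
}
```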
Stage 2:
Separate the Susper Search Appliance from susper.com:
From here, actual progress towards building the search appliance begins.


| Feature number | Feature | Description | Frontend work | Backend requirements |
|---|---|---|---|---|
| 1 | Authentication and authorisation modules | Implement basic authentication for normal users and for the admin. **A normal user can search for results but cannot access the admin UI.** | | |
| 2 | Custom indexing | Search only the websites the user specifies (for example, if company 'A' wants to search only its own site 'A.com', there could be an option to enter the sites to search, and the data would be indexed from the links specified). This site-specific search is also useful for the embedded site-search options that could be offered with the appliance, and the corresponding widgets to embed in their sites will be generated. By embedding this widget in their site, they could have custom site search. | In the frontend, mostly in the admin UI, there will be an option for the admin to enter a sequence of URLs in any form, including regex. This sequence of URLs will then be sent to the backend. Generating the widget is also required. | Once the sequence of URLs is received from the frontend, the corresponding site data should be indexed in the backend. |
| 3 | Search databases | Either take access tokens or passwords from the user and index the data on a separate machine, or use the databases' native search options or commands directly whenever the user wants specific information. | In the admin UI, the admin will be asked to choose the database, its address and the credentials to access it; all of this data will be sent to the backend. | Once the credentials are received in the backend, the information from the database will be indexed. |
| 4 | Search mailing lists, or work as an archiver for mailing lists | The relevant details of the mailing lists will be collected, and the server works as an archiver for the mailing lists. | Take the required inputs from the user. | Work as an archiver and index the mailing lists. |
| 5 | Search CRMs | Searching CRMs should be incorporated. | Take the required inputs from the user. | Index the CRMs. |
| 6 | Regex-based search and SQL-like queries | Supporting these features will allow users to search more efficiently. | Since the backend uses Solr, and Solr supports regex and SQL-style queries, this could be incorporated. | |
| 7 | Search features | Search features such as sorting, autocomplete and spell check similar to Google's "Did you mean ...?". | None of these features need backend support; they can be incorporated in the frontend. | |
| 8 | Manual boosting and blocking, user-added search results | This will allow our search appliance to learn on its own to rank search results. User-added results: users can help improve the search experience by adding search results for a specific keyword. | | Implement machine learning and data mining techniques, where feasible, to return the most relevant results. |
| 9 | Entity recognition | Entity recognition structures content by pulling attributes such as date, author or product type from documents. | | |
| 10 | Translation | Translation of documents into other languages. | | |
| 11 | Document preview | | This is purely frontend work: a document can be previewed before opening it. | |
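
For feature 2 (custom indexing), the admin UI has to accept a sequence of URLs or regex patterns and pass them to the backend. The sketch below shows one way the frontend might validate that input before sending it. The payload shape (`IndexingRequest`) and the function name are assumptions for illustration, not an existing Susper or yacy-grid API.

```typescript
// Hypothetical payload the admin UI would send to the backend.
interface IndexingRequest {
  patterns: string[];
}

// Accepted patterns plus the entries that failed validation.
interface ValidationResult {
  request: IndexingRequest;
  rejected: string[];
}

// Splits the admin's input into lines, drops blank lines, and keeps only
// entries that compile as regular expressions. A plain URL also compiles,
// so both forms are accepted.
function buildIndexingRequest(input: string): ValidationResult {
  const patterns: string[] = [];
  const rejected: string[] = [];
  const lines = input.split("\n");
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") {
      continue;
    }
    try {
      new RegExp(line); // throws a SyntaxError for malformed patterns
      patterns.push(line);
    } catch (e) {
      rejected.push(line);
    }
  }
  return { request: { patterns: patterns }, rejected: rejected };
}
```

Rejected entries can be shown back to the admin for correction, so that only well-formed patterns reach the backend.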


Some other extra features include:
11) Access to the search appliance from desktops, smartphones (progressive web apps) and remote networks.
12) Encryption of all connections.
13) Dynamic queries: "applying synonyms and taxonomies".
14) API support for the Susper Search Appliance, with a REST API initially and separate API support for all major languages.
15) Cross-platform support for the whole system (files on a Windows server or a Linux server should be indexed; backend).
16) Journaling, access logs and error logs.
17) Protection of user privacy in all possible ways.
18) Admin console.
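
The dynamic queries item in the list above ("applying synonyms and taxonomies") could be approached by expanding the user's query before it is sent to the backend. This is only a sketch under assumptions: the `SynonymTable` type and `expandQuery` function are hypothetical, and the `AND`/`OR` output syntax is an assumed boolean query form that would have to be adapted to whatever the backend actually accepts.

```typescript
// Hypothetical synonym table: each key maps to terms treated as equivalent.
type SynonymTable = { [term: string]: string[] };

// Rewrites each query word that has synonyms into an OR group and joins all
// words with AND. Words without synonyms are passed through unchanged.
function expandQuery(query: string, synonyms: SynonymTable): string {
  const words = query
    .toLowerCase()
    .split(/\s+/)
    .filter(function (w) { return w !== ""; });
  const groups = words.map(function (word) {
    const alternatives: string[] =
      Object.prototype.hasOwnProperty.call(synonyms, word) ? synonyms[word] : [];
    if (alternatives.length === 0) {
      return word;
    }
    return "(" + [word].concat(alternatives).join(" OR ") + ")";
  });
  return groups.join(" AND ");
}
```

For example, with the table `{ car: ["automobile", "vehicle"] }`, the query `"car rental"` becomes `(car OR automobile OR vehicle) AND rental`.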
Timeline:
**Every feature pull request includes coding, testing, responsive design and security (wherever possible, encrypt connections and add authentication, authorisation and access controls).**


| Period | Work to be done (along with coding, testing and responsive design) |
|---|---|
| May 4 - May 30 (community bonding period) | Implement all remaining features for the susper.com search engine so that work on the Susper search appliance can start. |
| May 4 - May 11 | Incorporate a test suite for susper.com. |
| May 11 - May 18 | Extra features such as synonym support and error correction in search could be incorporated. |
| May 18 - May 25 | Auto-suggestion of keywords should be implemented. |
| May 25 - June 1 | Build a basic framework for authentication and authorisation, along with a design for the admin console. |
| June 1 - June 8 | Build a basic framework for journaling, access logs and error logs. |
| June 8 - June 15 | Implement frontend for feature no. 2. |
| June 15 - June 22 | Implement frontend for feature no. 3. |
| June 22 - June 29 | Implement frontend for feature no. 4. |
| June 26 16:00 UTC | Mentors and students can begin submitting Phase 1 evaluations. |
| **June 30 16:00 UTC** | **Phase 1 evaluation deadline; Google begins issuing student payments.** |
| June 29 - July 6 | Implement frontend for feature no. 5. |
| July 6 - July 13 | Implement frontend for feature no. 6. |
| July 13 - July 20 | Implement frontend for feature no. 7. |
| July 20 - July 27 | Implement frontend for feature no. 8. |
| July 24 16:00 UTC | **Mentors and students can begin submitting Phase 2 evaluations.** |
| July 28 16:00 UTC | **Phase 2 evaluation deadline.** |
| July 27 - August 3 | Implement frontend for feature no. 9. |
| August 3 - August 10 | Implement frontend for feature no. 10. |
| August 10 - August 21 | Implement frontend for feature no. 11; add HTTPS and make all connections encrypted. |
| August 21 - August 29 16:00 UTC (final week) | Students submit their final work product and their final mentor evaluation. |
| August 29 - September 5 16:00 UTC | Mentors submit final student evaluations. |


Feature left out of the timeline above:
14) API support for the Susper Search Appliance, with a REST API initially and separate API support for all major programming languages.

  
## Screenshot from 2017-03-05 17-44-17.png

## Screenshot from 2017-03-05 17-45-17.png

## Screenshot from 2017-03-05 17-46-25.png

## Screenshot from 2017-03-05 17-47-04.png

## Screenshot from 2017-03-05 17-48-08.png

## Screenshot from 2017-03-05 17-51-27.png

## Screenshot from 2017-03-05 17-52-09.png

## Screenshot from 2017-03-05 17-52-32.png

## Screenshot from 2017-03-05 17-52-56.png

## Screenshot from 2017-03-05 17-53-26.png

## Screenshot from 2017-03-05 17-53-50.png

## Screenshot from 2017-03-05 17-54-54.png