A search engine is a software system designed to carry out web searches. It searches the web systematically for the information specified in a textual search query.
In this project, we will build a search engine that shows search results fetched from a few selected websites.
A search engine performs four basic processes:
- Crawling
- Indexing
- Searching
- Ranking
Web search engines get their information by crawling from site to site. The crawler is given an entry point, from which it starts collecting links and text data and storing them in the database.
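The crawling step can be sketched as follows. This is a minimal illustration, not the project's actual crawler: it uses only the standard library, and the `fetch` callable, the `FAKE_SITE` pages, and the `example.test` URLs are assumptions made up so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text data from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Remember the target of every <a href="..."> tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep non-empty text chunks found between tags.
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(entry_point, fetch, max_pages=10):
    """Breadth-first crawl starting at entry_point.

    `fetch` is a hypothetical callable mapping a URL to its HTML (or None).
    Returns {url: extracted_text} for each page visited.
    """
    seen = {entry_point}
    queue = deque([entry_point])
    store = {}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        store[url] = " ".join(parser.text_parts)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store


# Stand-in for the network: two tiny pages that link to each other.
FAKE_SITE = {
    "http://example.test/": '<a href="/about">About</a> Welcome home',
    "http://example.test/about": '<a href="/">Home</a> About this site',
}

pages = crawl("http://example.test/", FAKE_SITE.get)
```

In a real crawler, `fetch` would download the page over HTTP and `store` would be written to the database instead of kept in a dictionary.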
Indexing means associating the data found on web pages with the domain and the HTML fields it was found in. How the data is stored in the database largely determines how efficient the search engine is.
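One way to picture indexing is an inverted index that maps each term to the places where it occurs. The sketch below is an assumption for illustration only: the `pages` structure (URL to HTML field to text) and the `build_index` helper are hypothetical, and the project itself stores its data in MongoDB rather than in a Python dictionary.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def build_index(pages):
    """Map each term to the (domain, url, field) places where it occurs.

    `pages` is a hypothetical structure: {url: {html_field: text}}.
    """
    index = defaultdict(set)
    for url, fields in pages.items():
        domain = urlparse(url).netloc
        for field, text in fields.items():
            # Lowercase and split into alphanumeric terms.
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                index[term].add((domain, url, field))
    return index


pages = {
    "http://example.test/about": {
        "title": "About us",
        "p": "We build a small search engine",
    },
}
index = build_index(pages)
```

Looking up a term is then a single dictionary access, which is why the storage layout matters so much for search speed.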
As the name implies, searching means looking through the database for results relevant to the search query.
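A minimal sketch of the searching step, assuming the crawled text is available as a plain `{url: text}` mapping (a simplification; the names `documents`, `tokenize`, and `search` are hypothetical). It returns every document that contains all of the query terms.

```python
import re


def tokenize(text):
    """Return the set of lowercase alphanumeric terms in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(documents, query):
    """Return URLs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    return [url for url, text in documents.items() if terms <= tokenize(text)]


documents = {
    "a": "Python web crawler",
    "b": "Flask web framework",
}
web_hits = search(documents, "web")
flask_hits = search(documents, "flask web")
```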
Ranking means ordering the search results from the previous step by their relevance to the user. A better ranking system results in a better search experience.
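As a rough illustration of ranking, the sketch below scores each document by how often the query terms appear in it and sorts by that score. Term frequency is only one simple relevance measure chosen for this example; it is not necessarily the scheme this project ends up using, and `rank` and `documents` are hypothetical names.

```python
import re
from collections import Counter


def rank(documents, query):
    """Order matching URLs by total query-term frequency, highest first."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    scored = []
    for url, text in documents.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


documents = {
    "a": "search engines search the web",
    "b": "a web search",
    "c": "cooking recipes",
}
ranked = rank(documents, "search")
```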
The project is built with:
- Python programming language
- MongoDB (text-based searching is very easy in MongoDB)
- Flask framework
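To give a sense of why text search is convenient in MongoDB, here is a hedged sketch built on MongoDB's text index and `$text` query operator. It assumes the PyMongo driver and a field named `content`, neither of which is stated in this guide; the helpers `ensure_text_index` and `text_search` are hypothetical and simply take a PyMongo collection as an argument.

```python
def ensure_text_index(collection, field="content"):
    """Create a text index on `field` (hypothetical field name)."""
    collection.create_index([(field, "text")])


def text_search(collection, query, limit=10):
    """Run a $text query and sort by MongoDB's relevance score."""
    cursor = collection.find(
        {"$text": {"$search": query}},
        {"score": {"$meta": "textScore"}},  # expose the relevance score
    )
    return cursor.sort([("score", {"$meta": "textScore"})]).limit(limit)
```

With a running MongoDB server, `collection` would come from something like `pymongo.MongoClient()["search_engine"]["pages"]`, where the database and collection names are placeholders.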
The basic requirements for this project are:
- Git
- A Text Editor
- Python
- MongoDB
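Once the tools are installed, a quick way to confirm they are reachable from the command line is to look them up on the PATH. The sketch below assumes the usual executable names (`git`, `python`, `python3`, `mongod`); these may differ on your system, and `find_tools` is a hypothetical helper.

```python
import shutil


def find_tools(names=("git", "python", "python3", "mongod")):
    """Return {executable_name: full_path_or_None} for each name."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```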
- Go to the Git downloads page.
- Download the installer for Windows and run it.
- Keep every option at its default and finish the installation.
A text editor is needed to write the code. There are many good text editors and IDEs, such as Notepad++, Visual Studio Code, PyCharm, Atom, and Sublime Text. For this project, we will use Visual Studio Code.
- Go to the Visual Studio Code download page.
- Select the correct version for your operating system and download the installer.
- Run the installer and install Visual Studio Code with the default options.
- In Visual Studio Code, install the Python extension from the extension marketplace.
- Go to the Python downloads page.
- Download the installer for the latest version of Python available on the site and run it.
- Check the 'ADD PYTHON 3.10 TO PATH' option.
- Click on 'Install Now'.
- Go to the MongoDB website and, under the Products tab, go to the Community Server section.
- Select the latest version, your platform (Windows), and the file type (.msi), then download it.
- Run the installer. Keep the default settings and complete the installation.
- Go to System Properties and select 'Environment Variables'. Select 'Path' and click on 'Edit'.
- Click on 'New' and add the path to the 'bin' folder of your MongoDB installation. If the default settings were kept, the path should be 'C:\Program Files\MongoDB\Server\5.0\bin'.
- Save it, and the MongoDB installation is complete.
We are going to install and run MongoDB on our local system. To do that, follow the instructions below.
- Head over to https://docs.mongodb.com/manual/installation/
- Click on Install MongoDB Community Edition on Ubuntu.
- Follow the instructions given on the page to install MongoDB on your computer.
- Start the installed MongoDB by following the run instructions in the MongoDB documentation.
Check whether Python is already installed:
python3 --version
If you get output like:
Python 3.x.x
skip the next part. Otherwise, install Python with the commands below.
On Ubuntu or Debian:
sudo apt-get update && sudo apt-get upgrade -y
sudo apt-get install software-properties-common
sudo apt-get install python3
On Fedora:
sudo dnf install python3 python3-devel