hack3r-0m/GSOC_PROPOSAL_SNARE.md

## GSOC_PROPOSAL_SNARE.md

      
Candidate Introduction


OM PARIKH
InfoSec practitioner (specializing in fuzzing, cryptography and binary exploitation)
Sophomore, Part-time Blockchain developer & InfoSec intern @ Matic
Member @ Appwrite, contributed to 10+ OSS projects
Github  Stackoverflow Ethereum-StackExchange  LinkedIn
Creator & Maintainer of NFTminter (15+ stars on GitHub and 500+ daily visits on the website)

Task Statement (SNARE 3.1)


Re-write Cloner
Upgrade the SNARE codebase to be compatible with the latest aiohttp version
Fix the CI pipeline
Improve Test Coverage
Work on critical issues (#236, #7 and #284)
Package Publishing & Documentation

Task Analysis


Solving the tasks above requires intermediate knowledge of Python and familiarity with InfoSec fundamentals, Git, client-server HTTP communication, unit and E2E testing, and packaging and publishing.
Tasks should be tackled in the order listed in the Task Statement section for a smooth workflow.
#5 in the Task Statement may require extensive debugging (the codebase is lightweight, so this should not be a problem).

What I have done so far


Set up a local TANNER + SNARE development environment and ran the test suite
Tested the core components (Tanner handler, dorks and cloner)
Studied the breaking changes from aiohttp 3.4 to aiohttp 3.7 (changelog)
Examined the SNARE and TANNER logs after exposing them to artificial traffic

1. Re-write Cloner


Currently, the cloner relies on BeautifulSoup for all the heavy lifting of parsing each response, and uses an asyncio Queue to follow the hyperlinks found in the fetched response.
This is very inefficient for multiple reasons (as discussed on Slack) and leaves a wide range of corner cases to be covered.
pywebcopy is a strong fit here: it makes the user experience smooth and has less overhead than something like Selenium (which loads a headless browser into memory).
pywebcopy is stable, reliable and has good test coverage.
It is backed by lxml, requests, beautifulsoup4, pyquery and requests_html (most of which are already used natively for the same purpose) and supports authentication, bypass_robots and cookies.
This package can be integrated smoothly by adding helper functions that wrap selected core pywebcopy methods.
ETA: 22-25 Hours
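The helper-function idea above could look roughly like the sketch below. It is only an illustration: the names project_name_for, build_clone_options and clone_site are hypothetical, and the keyword arguments passed to pywebcopy's save_webpage (url, project_folder, project_name, bypass_robots, open_in_browser) are assumed from the pywebcopy README and should be checked against the version that gets pinned.

```python
import os
from urllib.parse import urlparse


def project_name_for(url):
    """Derive a folder-safe project name from the target URL's host."""
    host = urlparse(url).netloc or url
    # Replace characters that are awkward in directory names (e.g. the port colon).
    return "".join(c if c.isalnum() or c in "-." else "_" for c in host)


def build_clone_options(url, pages_root, bypass_robots=True):
    """Collect the keyword arguments handed to pywebcopy in one place."""
    return {
        "url": url,
        "project_folder": os.path.abspath(pages_root),
        "project_name": project_name_for(url),
        "bypass_robots": bypass_robots,
        "open_in_browser": False,  # a honeypot host has no desktop browser
    }


def clone_site(url, pages_root):
    """Clone one page and its assets; returns the options that were used."""
    # Imported lazily so the helpers above stay usable without pywebcopy installed.
    # Assumption: save_webpage accepts exactly these keyword arguments.
    from pywebcopy import save_webpage

    options = build_clone_options(url, pages_root)
    save_webpage(**options)
    return options
```

Keeping the option building separate from the actual pywebcopy call means the wrapper can be unit-tested without network access, which also helps the test coverage task.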

2. Upgrade the SNARE codebase to be compatible with the latest aiohttp version


An old version of aiohttp (an almost 3-year-old release) and aiohttp_jinja2 are used in tanner_handler, html_handler, server, middleware and their respective test cases (the cloner will already have been updated by then).
There are very few breaking changes; however, the response error-handling functions (such as handle 400 and handle 500) cause trouble with TANNER and need to be fixed.
The failed CI builds from Dependabot's automatic dependency upgrades give a lot of hints.
ETA: 17-20 Hours
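For the error-handling functions mentioned above, one possible shape after the upgrade is a new-style aiohttp middleware (the @web.middleware decorator). The sketch below is not SNARE's actual middleware: create_error_middleware and the body of handle_404 are placeholders used for illustration, and the real handlers would serve the cloned error pages.

```python
from aiohttp import web


def create_error_middleware(overrides):
    """Build a new-style middleware mapping status codes to handler coroutines."""

    @web.middleware
    async def error_middleware(request, handler):
        try:
            response = await handler(request)
        except web.HTTPException as exc:
            # aiohttp signals HTTP errors by raising; look up a custom handler.
            override = overrides.get(exc.status)
            if override is None:
                raise
            return await override(request)
        # A handler may also return an error status without raising.
        override = overrides.get(response.status)
        if override is not None:
            return await override(request)
        return response

    return error_middleware


async def handle_404(request):
    # Hypothetical placeholder; SNARE would return its cloned 404 page here.
    return web.Response(status=404, text="Not Found")
```

It would be registered with something like web.Application(middlewares=[create_error_middleware({404: handle_404})]).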

3. Fix the CI pipeline


The Travis config still uses Python 3.5 as the primary version and should be changed ASAP!
The config calls pip install directly; it should use flags such as --no-cache-dir for more deterministic builds.
A few other minor improvements
ETA: 5-7 Hours

4. Improve Test Coverage


Current test coverage is around 64% (although snare/ has 90% coverage); this is comparatively low and needs to be increased for a more robust architecture.
Replace the test cases for the new cloner and update the tests affected by the aiohttp upgrade.
Add more tests for the middlewares and server (as they are the ones lowering overall test coverage)
ETA: 20-25 Hours

5. Work on critical issues


Solving the open SNARE issues mentioned above (#236, #7 and #284) would help end users a lot.
I will start with #284, in which SNARE records the proxy's IP as the attacker's IP; this can be solved by debugging the headers dict and checking for the X-Forwarded-For header.
SSL support for the server can be added with Python's ssl library once aiohttp is upgraded. For example:

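import ssl
from aiohttp import web

app = web.Application()
# Route definitions are elided here; register the real handlers in their place.
app.add_routes([..., ...])

# CLIENT_AUTH builds a context suitable for the server side of a TLS connection.
ssl_context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
# Load the server certificate and its private key.
ssl_context.load_cert_chain('path/to/domain.crt', 'path/to/domain.key')

# Serve the application over HTTPS.
web.run_app(app, ssl_context=ssl_context)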


Work on other issues after discussing priorities with the community
ETA: after completing all the tasks above and reserving 15 hours for the task below, I will spend the remaining time on this.
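For issue #284, the X-Forwarded-For check described above might be sketched as follows. This is an assumption about how the fix could look, not the final patch: resolve_client_ip and trusted_proxies are hypothetical names, and the header is only honoured when the direct peer is a known proxy, since an attacker can otherwise send a forged value.

```python
import ipaddress


def resolve_client_ip(headers, peer_ip, trusted_proxies=("127.0.0.1",)):
    """Return the client IP, honouring X-Forwarded-For only from trusted proxies."""
    if peer_ip not in trusted_proxies:
        # Direct connection: the header, if present, cannot be trusted.
        return peer_ip
    # Header names are case-insensitive; normalise so plain dicts work too.
    lowered = {key.lower(): value for key, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    # The value is a comma-separated chain: client, proxy1, proxy2, ...
    # Walk it right to left and return the first hop that is not a trusted proxy.
    for hop in reversed([part.strip() for part in forwarded.split(",")]):
        if not hop or hop in trusted_proxies:
            continue
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry; do not trust anything to its left either
        return hop
    return peer_ip
```

In the handlers, peer_ip would come from the connection itself (for example the transport's peername) and headers from request.headers.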

6. Package Publishing & Documentation


This includes building setup.py after the major tasks above are complete and ensuring the metadata is correct.
After publishing, check installation of the package via different methods (pip, egg, wheel) and ensure stability.
Improve the current documentation to cover the minimal basic details and serve as a usage walkthrough (not including co-existence with TANNER, as that will be done in GSoD)
ETA: 15-17 Hours
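The post-publishing installation check could be automated with a small smoke test run after each install method. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); smoke_check is a hypothetical helper, and the distribution and module names passed to it are assumptions until the published package name is settled.

```python
import importlib
from importlib import metadata


def smoke_check(dist_name, module_name):
    """Return the installed version if the package is installed and importable, else None."""
    try:
        # Fails if the distribution metadata is missing (package not installed).
        version = metadata.version(dist_name)
        # Fails if the package installed but its top-level module is broken.
        importlib.import_module(module_name)
    except (metadata.PackageNotFoundError, ImportError):
        return None
    return version
```

Running it in a fresh virtual environment after a pip install, and again after installing the built wheel, gives a quick signal that each method produces a usable package.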

Community Bonding


Read the codebase once more, thoroughly and quickly (1-2 days)
Discuss the implementation specific details and tweak the work flow according to changes suggested
Start contributing as quickly as possible

Others


I will be able to devote at least 45 hours a week
I was not able to contribute actively before GSoC due to minor health issues and an exam (glad both are completely resolved now!)
#1, #2 and #3 will be finished before the first evaluation
I am open to changes to the proposal and to new ideas if I have missed anything