Task Statement (SNARE 3.1)
- Re-write Cloner
- Upgrade the SNARE codebase to be compatible with the latest aiohttp version
- Fix the CI pipeline
- Improve Test Coverage
- Work on critical issues (#236, #7 and #284)
- Package Publishing & Documentation
What I have done so far
- Solving the tasks above requires an intermediate knowledge of Python and familiarity with InfoSec fundamentals, Git, client-server HTTP communication, unit and E2E testing, and packaging and publishing.
- Tasks should be prioritized in the same order as in the Task Statement section for a smooth workflow.
- #5 in the Task Statement may require extensive debugging (the codebase is lightweight, so this should not be a problem).
1. Re-write Cloner
- Set up the TANNER + SNARE local development environment and ran the test suite
- Tested core components (Tanner handler, dorks and cloner)
- Studied the breaking changes from aiohttp 3.4 to aiohttp 3.7 (changelog)
- Examined the SNARE and TANNER logs after exposing them to artificial traffic
2. Upgrade the SNARE codebase to be compatible with the latest aiohttp version
- Currently, the cloner relies on BeautifulSoup for all the heavy lifting of parsing the response, and then uses asyncio's Queue to process the other hyperlinks found in the fetched response.
- This is very inefficient (as per the discussion on Slack) for multiple reasons and leaves a wide range of corner cases to be covered.
- pywebcopy is a good fit for this: it makes the user experience smooth and has less overhead than something like Selenium (which loads a headless browser into memory).
- pywebcopy is stable, reliable and has good test coverage
- It is backed by lxml, requests, beautifulsoup4, pyquery and requests_html (most of which are already used natively for the same purpose) and supports authentication, bypass_robots and cookies.
- The package can be integrated smoothly by adding helper functions that wrap selected core pywebcopy methods.
- ETA: 22-25 Hours
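The helper-function idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: `build_clone_kwargs`, `clone_page` and the default `base_dir` are hypothetical names chosen for this example, and the keyword names passed to `save_webpage` follow pywebcopy's documented signature as I understand it, so they should be checked against the installed version.

```python
from urllib.parse import urlparse


def build_clone_kwargs(url, base_dir="/opt/snare/pages"):
    """Build keyword arguments for pywebcopy.save_webpage.

    Hypothetical helper: the name and the default base_dir are assumptions
    made for this sketch, not part of the existing SNARE codebase.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("expected an absolute http(s) URL, got %r" % url)
    return {
        "url": url,
        "project_folder": base_dir,
        # One folder per host; ':' is replaced so host:port is a safe name.
        "project_name": parsed.netloc.replace(":", "_"),
        "bypass_robots": True,
        "open_in_browser": False,
    }


def clone_page(url, base_dir="/opt/snare/pages"):
    """Clone a single page to disk (hypothetical wrapper around pywebcopy)."""
    # Imported lazily so the argument-building logic above can be unit
    # tested on machines where pywebcopy is not installed.
    from pywebcopy import save_webpage

    save_webpage(**build_clone_kwargs(url, base_dir))
```

Keeping the argument building separate from the actual download means the URL validation and folder naming can be covered by fast unit tests, while the network-dependent part stays in one thin function.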
3. Fix the CI pipeline
- An older version of aiohttp (an almost 3-year-old package) and aiohttp_jinja2 are used in tanner_handler, html_handler, server, middleware and their respective test cases (the cloner will be fixed by then).
- There were very few breaking changes; however, the response error handling functions (such as handle 400 and handle 500) are causing trouble with TANNER and need to be fixed.
- Observing the failed CI builds from dependabot's automatic dependency upgrades gives a lot of hints.
- ETA: 17-20 Hours
4. Improve Test Coverage
- The Travis config still uses Python 3.5 as the primary version and should be changed ASAP!
- The config runs pip install directly, while it should use flags such as --no-cache-dir for more deterministic builds.
- A few other minor improvements
- ETA: 5-7 Hours
5. Work on critical issues
- Current test coverage is around 64% (although snare/ has 90% coverage); this is comparatively low and needs to be increased for a more robust architecture.
- Replace the test cases to match the new cloner version and the upgraded aiohttp.
- Add more tests for the middlewares and server (as they are the ones decreasing overall test coverage)
- ETA: 20-25 Hours
- Solving the open SNARE issues mentioned above (#236, #7 and #284) would help end users a lot.
- I will start with #284, in which SNARE reports the proxy's IP as the attacker's IP; this can be solved by debugging the headers dict and checking for
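- To add SSL support to the server, Python's ssl library can be used together with the upgraded aiohttp. For example:
import ssl
from aiohttp import web
app = web.Application()
ssl_context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
ssl_context.load_cert_chain("cert.pem", "key.pem")  # hypothetical certificate and key paths
web.run_app(app, ssl_context=ssl_context)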
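For #284, one possible direction is sketched below. It assumes the reverse proxy in front of SNARE sets the `X-Forwarded-For` header; that assumption, the function name `client_ip` and the `trusted_proxies` parameter are all hypothetical and would need to be confirmed while debugging the real headers.

```python
import ipaddress


def client_ip(headers, peer_ip, trusted_proxies=("127.0.0.1",)):
    """Return the most likely attacker IP (hypothetical helper).

    headers: mapping of request headers (aiohttp's request.headers is
             case-insensitive; a plain dict needs the exact key).
    peer_ip: address of the TCP peer that connected to SNARE.
    """
    # Only believe forwarding headers when the direct peer is a known
    # proxy; otherwise an attacker could simply send a forged header.
    if peer_ip not in trusted_proxies:
        return peer_ip

    forwarded = headers.get("X-Forwarded-For", "")
    for candidate in (part.strip() for part in forwarded.split(",")):
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip empty or malformed entries
        return candidate  # first valid entry, conventionally the client

    # No usable header: fall back to the peer address.
    return peer_ip
```

For example, `client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.2"}, "127.0.0.1")` returns `"203.0.113.7"`. The first entry is only as trustworthy as the proxy configuration, since a proxy that appends to an existing header will pass along whatever the client sent.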
6. Package Publishing & Documentation
- Work on other issues, discussing priority with the community
- ETA: After completing all the above tasks and leaving 15 hours for the task below, I will spend the remaining time on this.
- This includes building setup.py after completing the major tasks above and ensuring the metadata is correct.
- After publishing, check installation of the package via different methods (pip, egg, wheel) and ensure stability.
- Improve the current documentation to include the minimal basic details, serving as a usage walkthrough (not including co-existence with TANNER, as that will be done in GSoD).
- ETA: 15-17 Hours
- Read the codebase once more, thoroughly and quickly (1-2 days)
- Discuss the implementation-specific details and tweak the workflow according to the changes suggested
- Start contributing as quickly as possible
- I will be able to devote at least 45 hours a week
- I was not able to contribute actively before GSoC due to some health issues and an exam (glad both are completely resolved now!)
- #1, #2 and #3 will be finished before the first evaluation
- I am open to changes to the proposal and to new ideas if I have missed anything