sgtpepperpt/gsoc2019-report.md

Project Report

Organisation: The Honeynet Project
Project: Cowrie
Title: SSH Proxy for Cowrie
Mentor: Michel Oosterhof
Goals

The proposal focuses on the development of an SSH proxy for Cowrie, a medium-interaction
SSH and Telnet honeypot. This proxy is complemented by a set of backend virtual
machines, managed by our prototype, which provide a secure environment for attackers
to explore while limiting the damage they can do. Later we added a Telnet proxy
to complement the SSH version.
Contributions

We can divide the contributions of this GSoC project into three main modules that
have been added to the Cowrie project:

SSH proxy (PR)
Telnet proxy (PR)
Backend pool (PR)

There were also some minor PRs, but each of these three adds a new module to Cowrie, and together
they contain the main body of our work.
SSH Proxy

We started with the SSH proxy, which was the most important part of the project
and the subject of the original proposal. The concept was simple: an SSH connection from the attacker to Cowrie,
another one from Cowrie to a backend (a simple Docker container
at the time), and some logging in the middle. It isn't as simple as a TCP proxy, because SSH traffic
is encrypted and, even if it weren't, you need to be aware of the protocol to produce any meaningful
logging. Fortunately, I had a great example for this part: HonSSH,
a honeypot forked from Kippo, as was Cowrie.
I started working on this early and had a working prototype within a week. After some fine-tuning,
it was merged into Cowrie's main repository in late June.
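
To illustrate why the proxy has to be protocol-aware, here is a minimal sketch of turning one already-decrypted SSH payload into a log event. It is not the code from Cowrie or HonSSH: `describe_payload`, `read_string` and `ssh_string` are hypothetical names, only a handful of the message numbers from RFC 4250 are listed, and key exchange and decryption are assumed to have happened elsewhere.

```python
import struct

# A few message numbers from RFC 4250; the protocol defines many more.
SSH_MESSAGES = {
    21: "SSH_MSG_NEWKEYS",
    50: "SSH_MSG_USERAUTH_REQUEST",
    52: "SSH_MSG_USERAUTH_SUCCESS",
    94: "SSH_MSG_CHANNEL_DATA",
}


def read_string(buf, offset):
    """Read an SSH 'string' (uint32 length + bytes); return (value, new_offset)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length


def describe_payload(payload):
    """Turn a decrypted SSH payload into a small dict suitable for logging."""
    msg_num = payload[0]  # the first byte is always the message number
    event = {"message": SSH_MESSAGES.get(msg_num, "UNKNOWN(%d)" % msg_num)}
    if msg_num == 50:  # layout: user name, service name, method name, method fields
        user, off = read_string(payload, 1)
        _service, off = read_string(payload, off)
        method, off = read_string(payload, off)
        event["username"] = user.decode("utf-8", "replace")
        event["method"] = method.decode("ascii", "replace")
        if method == b"password":
            off += 1  # skip the boolean "changing password" flag
            password, off = read_string(payload, off)
            event["password"] = password.decode("utf-8", "replace")
    return event


def ssh_string(value):
    """Encode bytes as an SSH 'string' (used here only to build a sample)."""
    return struct.pack(">I", len(value)) + value


# Hand-built sample of a password authentication request.
sample = (bytes([50]) + ssh_string(b"root") + ssh_string(b"ssh-connection")
          + ssh_string(b"password") + b"\x00" + ssh_string(b"123456"))
print(describe_payload(sample))
```

A plain TCP proxy would only see opaque bytes here; parsing per message type is what lets the proxy log credentials and, later, commands.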
Telnet Proxy

Soon after the SSH proxy took form, I also started working on a Telnet one. The concept is the same,
but Telnet is far less standardised than SSH: we only have lines going back and forth, and
it's up to us to work out which parts deal with authentication and which with commands, for example
(in SSH, messages and message codes tell you that).
I started by adapting the SSH classes to deal with the TCP transports to and from attackers,
and to and from backends. Then it was time to interpret the communication, and I used the same idea
as in the SSH proxy (originally from HonSSH): create a "handler" class to which messages from attackers
are sent, processed, and then forwarded to the backend (with responses following the same path back).
In this handler we decided to use regex patterns, configurable by users, to detect prompts that need
spoofing (authentication, for example). Different Telnet servers have different kinds of prompts,
which makes it impossible to provide a universal solution.
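
The prompt-detection idea can be sketched roughly as follows. This is an illustration under simplifying assumptions, not Cowrie's handler: `TelnetPromptHandler` and its method names are hypothetical, the two patterns are examples rather than Cowrie's configuration defaults, and Telnet option negotiation and character echo are ignored.

```python
import re


class TelnetPromptHandler:
    """Tracks where a Telnet login dialogue is, using configurable patterns."""

    def __init__(self, login_pattern, password_pattern):
        # Patterns come from the user because prompts differ between servers.
        self.login_re = re.compile(login_pattern)
        self.password_re = re.compile(password_pattern)
        self.state = "idle"
        self.username = None
        self.password = None

    def from_backend(self, data):
        """Inspect backend output and remember which prompt was just shown."""
        if self.login_re.search(data):
            self.state = "login"
        elif self.password_re.search(data):
            self.state = "password"
        return data  # forwarded to the attacker unchanged in this sketch

    def from_attacker(self, line):
        """Classify one attacker line based on the last prompt seen."""
        text = line.rstrip(b"\r\n").decode("utf-8", "replace")
        state, self.state = self.state, "idle"
        if state == "login":
            self.username = text
            return ("username", text)
        if state == "password":
            self.password = text
            return ("password", text)
        return ("command", text)


handler = TelnetPromptHandler(rb"login: ?$", rb"[Pp]assword: ?$")
handler.from_backend(b"Ubuntu 18.04 LTS\r\nhost login: ")
print(handler.from_attacker(b"root\r\n"))
handler.from_backend(b"Password: ")
print(handler.from_attacker(b"hunter2\r\n"))
print(handler.from_attacker(b"uname -a\r\n"))
```

A server that prints, say, `Username>` instead of `login:` would need a different pattern, which is why the patterns are left to the user.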
Backend Pool

After both proxies were done and merged, we started work on the backend pool. The concept is
to continuously run a set of virtual machines in the background and, as attackers connect, create a
connection to a chosen VM, which is then presented to the attacker via Cowrie's proxy. There are some
rules for assigning VMs. One is serving the same VM to the same IP, so that an attacker who reconnects sees
whichever side effects they left on the system. Another is timing out VMs eventually: we can't keep dozens
of VMs running after they have been used (and possibly ruined!), so we throw them away after some time
if no one has reconnected.
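
The two assignment rules described above (same IP gets the same VM, and idle VMs time out) can be sketched as below. This is a simplified illustration, not the pool's actual code: `VMAssigner`, the VM names and the timeout value are all hypothetical, and VMs are represented by plain strings.

```python
import time


class VMAssigner:
    """Hands out VMs from a fixed list, reusing a VM for a returning IP."""

    def __init__(self, vm_names, timeout_seconds, clock=time.monotonic):
        self.free = list(vm_names)
        self.timeout = timeout_seconds
        self.clock = clock   # injectable so the logic can be exercised without waiting
        self.by_ip = {}      # attacker IP -> (vm name, time last seen)

    def expire(self):
        """Discard VMs whose attacker has not reconnected within the timeout."""
        now = self.clock()
        expired = [ip for ip, (_vm, seen) in self.by_ip.items()
                   if now - seen > self.timeout]
        # A real pool would destroy the VM here. It is not put back in `free`,
        # since the attacker may have modified it.
        return [self.by_ip.pop(ip)[0] for ip in expired]

    def assign(self, ip):
        """Return the VM for this IP, or None if the pool is exhausted."""
        self.expire()
        if ip in self.by_ip:
            vm, _seen = self.by_ip[ip]   # returning attacker: same VM
        elif self.free:
            vm = self.free.pop(0)        # new attacker: next unused VM
        else:
            return None
        self.by_ip[ip] = (vm, self.clock())
        return vm


now = [0.0]  # fake clock so the example runs instantly
pool = VMAssigner(["vm-a", "vm-b"], timeout_seconds=300, clock=lambda: now[0])
print(pool.assign("203.0.113.5"))   # first visit
now[0] = 60
print(pool.assign("203.0.113.5"))   # reconnects within the timeout
now[0] = 1000
print(pool.assign("203.0.113.5"))   # reconnects long after the timeout
```

In the last call the original VM has already been discarded, so the returning attacker is given a different one.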
Early on we decided to use QEMU, which can run a wide variety of VMs (anything you could imagine).
Handling QEMU from Python is not easy, so we settled on libvirt to ease this,
as it provides a clean API and is well maintained.
The backend pool is composed of a server module (with a Twisted TCP server at the front, and a producer-consumer
to handle VM creation and deletion) and a client module that allows the proxy to talk to the pool. We made the
pool as decoupled as possible from the rest of Cowrie, so that users can choose to deploy the pool on a different
machine from that of "main" Cowrie (as in my case, with Cowrie living on a Raspberry Pi, where no VM would really launch).
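
As a rough picture of the producer-consumer part of the server module, the sketch below keeps a fixed number of guests alive and deletes the ones that have been used. It is an assumption-laden simplification: `PoolLoop`, its method names and the state names are hypothetical, guests are plain integers, and the two steps are called by hand here, whereas a real service would run them periodically and would boot and destroy actual VMs.

```python
import itertools


class PoolLoop:
    """Keeps `max_vms` guests alive and deletes the ones flagged as used."""

    def __init__(self, max_vms):
        self.max_vms = max_vms
        self.guests = {}                 # guest id -> state
        self._ids = itertools.count(1)

    def producer_step(self):
        """Create guests until the pool is full again."""
        created = []
        while len(self.guests) < self.max_vms:
            guest_id = next(self._ids)
            self.guests[guest_id] = "available"   # a real pool boots a VM here
            created.append(guest_id)
        return created

    def consumer_step(self):
        """Delete every guest that has been marked as used."""
        used = [g for g, state in self.guests.items() if state == "used"]
        for guest_id in used:
            del self.guests[guest_id]             # a real pool destroys the VM here
        return used

    def request(self):
        """Give an available guest to the proxy, or None if none is ready."""
        for guest_id, state in self.guests.items():
            if state == "available":
                self.guests[guest_id] = "using"
                return guest_id
        return None

    def release(self, guest_id):
        """Mark a guest as finished with, so the consumer can delete it."""
        self.guests[guest_id] = "used"


loop = PoolLoop(max_vms=2)
print(loop.producer_step())   # pool is filled
guest = loop.request()        # the proxy asks for a guest
loop.release(guest)           # the attacker's session is over
print(loop.consumer_step())   # the used guest is deleted
print(loop.producer_step())   # and a fresh one replaces it
```

Separating creation from deletion this way means a request from the proxy never has to wait for a VM to boot, as long as the producer keeps up.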
As VM images we created an Ubuntu 18.04 one (let's say that's the traditional approach), and an OpenWRT one, to simulate
an ARM device such as a router (this was the IoT approach).
As of now the pool is working, but VMs sometimes become unreliable and need to be destroyed. We
haven't found the cause, or a way to detect that behaviour automatically, so for now we implemented
a refresh mechanism that periodically cleans all VMs, even if they haven't been used.
Testing and Documentation

A lot was added to Cowrie, and we needed to test and document these features. I wrote a test
that compares the output of a server with the output of the proxy (which is in turn connected to
that server). As the server we used Cowrie's shell backend (the part of Cowrie that already existed).
The next step is testing the backend pool itself, which poses many challenges
of its own and would extend beyond our current scope.
As documentation, beyond the story told in this report, we wrote three help/tutorial pages documenting the
features and configuration options of our components:

Guide on using the proxy
Backend pool guide
Analysing VM contents after an attack

Some Side Contributions

As side contributions I'd like to mention two clients for executing commands on remote machines via
SSH and Telnet, which can be used as standalone components in other projects: ssh_exec.py and telnet_exec.py.
Final Remarks

I'd like to thank my mentor Michel for his input, ideas, and readiness to help
with any doubt (however minor) I had! Working on Cowrie during GSoC was a fun experience
in which I gained a lot of knowledge, from network protocols (the SSH wire protocol) and their
intricacies (Telnet is tricky) to hypervisors and virtual machines (QEMU was totally new
to me)! I also learned a new framework (Python's Twisted) which, although twisted at times,
does make you think in a new way and is a real challenge.
This prototype should help us better understand and prevent attacks
against unprotected machines, by letting attackers do their thing on a "real" machine
instead of being constrained to a subset of bash commands. All in all, this was a great
experience, and I could not possibly have learned as much as I did without it.