Python script to connect to Tor via Stem and Privoxy, requesting a new connection (hence a new IP as well) as desired.

Crawling Anonymously with Tor in Python

adapted from the article "Crawling anonymously with Tor in Python" by S. Acharya, Nov 2, 2013.

The most common use case is hiding one's identity with Tor, or changing identities programmatically, for example when you are crawling a website such as Google and don't want to be rate-limited or blocked by IP address.

Tor

Install Tor.

sudo apt-get update
sudo apt-get install tor
sudo /etc/init.d/tor restart

Notice that the socks listener is on port 9050.
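Several "Connection refused" reports come down to Tor not running or not listening where expected. A minimal sketch for checking a local port from Python 3 follows; the helper name is_port_open is illustrative and not part of Tor or Stem.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port is accepted."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # 9050 is the SOCKS port mentioned above.
    state = "listening" if is_port_open("127.0.0.1", 9050) else "not reachable"
    print("Tor SOCKS port 9050 is %s" % state)
```

The same helper can be pointed at any other local port used in this guide.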

Next, do the following:

  • Enable the ControlPort listener on port 9051. This is the port on which Tor listens for commands from applications talking to the Tor controller.
  • Hash a password so that outside agents cannot access the port at will.
  • Enable cookie authentication as well.

You can create a hashed password out of your password using:

tor --hash-password my_password
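For readers curious about what the "16:..." string contains, here is a rough Python 3 sketch of the salted, iterated SHA-1 scheme (OpenPGP S2K) that, to my understanding, tor --hash-password uses. The function name hash_control_password is hypothetical, and the constants reflect my reading of the format rather than an official reference, so prefer the tor command for real configurations.

```python
import binascii
import hashlib
import os


def hash_control_password(password, salt=None):
    """Build a '16:<salt><indicator><sha1>' string (assumed Tor S2K layout)."""
    if salt is None:
        salt = os.urandom(8)          # 8 random bytes, so output differs per run
    indicator = 0x60                  # assumed iteration-count specifier
    # Number of bytes to feed into SHA-1: (16 + 0) << (6 + 6) = 65536
    count = (16 + (indicator & 15)) << ((indicator >> 4) + 6)
    material = salt + password.encode("utf-8")
    digest = hashlib.sha1()
    while count > 0:
        chunk = material[:count]      # last chunk may be truncated
        digest.update(chunk)
        count -= len(chunk)
    head = binascii.hexlify(salt + bytes([indicator])).decode("ascii")
    return "16:" + (head + digest.hexdigest()).upper()


if __name__ == "__main__":
    print(hash_control_password("my_password"))
```

Because the salt is random, hashing the same password twice yields different strings; either one should be usable as a HashedControlPassword value if the assumptions above hold.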

Then, update the /etc/tor/torrc with the port, hashed password, and cookie authentication.

sudo gedit /etc/tor/torrc
ControlPort 9051
# hashed password below is obtained via `tor --hash-password my_password`
HashedControlPassword 16:E600ADC1B52C80BB6022A0E999A7734571A451EB6AE50FED489B72E3DF
CookieAuthentication 1

Restart Tor again so that the configuration changes are applied.

sudo /etc/init.d/tor restart

python-stem

Next, install python-stem, a Python module for interacting with the Tor controller. It lets us send commands to, and receive replies from, the Tor control port programmatically.

sudo apt-get install python-stem

privoxy

Tor itself is not an HTTP proxy. To reach the Tor network from an HTTP client, use privoxy as an HTTP proxy that forwards through SOCKS5.

Install privoxy via the following command:

sudo apt-get install privoxy

Now, tell privoxy to use Tor by routing all traffic through the SOCKS server at localhost port 9050.

sudo gedit /etc/privoxy/config

and enable forward-socks5 as follows:

forward-socks5 / localhost:9050 .

Restart privoxy after making the change to the configuration file.

sudo /etc/init.d/privoxy restart
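Privoxy's forward-socks5 directive takes three arguments: a target pattern, the SOCKS proxy, and an HTTP parent proxy, where a single dot means "no parent". A missing final argument is an easy mistake to make, so a small Python 3 check of the config text can help. The function names below are illustrative only, and the comment handling is deliberately simplistic.

```python
def parse_forward_socks5(config_text):
    """Return the arguments of the first active forward-socks5 line, or None."""
    for raw_line in config_text.splitlines():
        # Drop anything after '#', which starts a comment in the config file.
        fields = raw_line.split("#", 1)[0].split()
        if fields and fields[0] == "forward-socks5":
            return fields[1:]
    return None


def is_complete_forward(fields):
    """True when all three arguments (pattern, SOCKS proxy, parent) are present."""
    return fields is not None and len(fields) == 3


if __name__ == "__main__":
    sample = "# forward-socks5 / localhost:9050 .\nforward-socks5 / localhost:9050 .\n"
    fields = parse_forward_socks5(sample)
    print(fields, is_complete_forward(fields))
```

To check a real installation, read /etc/privoxy/config into a string and pass it to parse_forward_socks5.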

Python Script

In the script below, urllib2 sends its requests through the proxy. privoxy listens on port 8118 by default and forwards the traffic to port 9050, where the Tor SOCKS listener is waiting.
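urllib2 exists only in Python 2. As a hedged sketch, the same proxy setup on Python 3 could look like the following, using urllib.request. The names build_tor_opener and fetch are made up for illustration, and registering the proxy for both "http" and "https" is my assumption about what most crawlers want.

```python
import urllib.request

PRIVOXY_URL = "http://127.0.0.1:8118"  # privoxy's default listen address


def build_tor_opener(proxy_url=PRIVOXY_URL):
    """Return an opener that sends HTTP and HTTPS requests via privoxy."""
    proxies = {"http": proxy_url, "https": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def fetch(url, opener, user_agent="Mozilla/5.0", timeout=30):
    """Fetch a URL through the given opener and return the body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace").strip()
```

With Tor and privoxy running, fetch("http://icanhazip.com/", build_tor_opener()) should return the exit relay's address. Using a dedicated opener instead of install_opener keeps the proxy from silently applying to every other urllib call in the process.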

Additionally, the renew_connection() function sends a signal to the Tor controller to change the identity, so you get new identities without restarting Tor. This comes in handy when crawling a website and you don't want to be blocked based on IP address.
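Tor rate-limits NEWNYM requests, so signals sent in quick succession may be delayed. A minimal Python 3 sketch that waits first is shown below. It assumes the controller object is an authenticated stem Controller and that your Stem version provides get_newnym_wait() (I believe it was added in Stem 1.5); renew_identity is a hypothetical helper name.

```python
import time


def renew_identity(controller, signal="NEWNYM", sleep=time.sleep):
    """Wait out Tor's NEWNYM rate limit, then request a new identity.

    `controller` is assumed to behave like an authenticated
    stem.control.Controller. Returns the number of seconds waited.
    """
    wait = controller.get_newnym_wait()
    if wait > 0:
        sleep(wait)
    # The plain string is assumed equivalent to stem.Signal.NEWNYM.
    controller.signal(signal)
    return wait


# Intended usage (requires Stem and a running Tor, so left as a comment):
#
#   from stem.control import Controller
#   with Controller.from_port(port=9051) as controller:
#       controller.authenticate(password="my_password")
#       renew_identity(controller)
```

Passing sleep as a parameter is only there to make the helper easy to exercise without real delays.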

PyTorStemPrivoxy.py

'''
Python script to connect to Tor via Stem and Privoxy, requesting a new connection (hence a new IP as well) as desired.
'''

import stem
import stem.connection

import time
import urllib2

from stem import Signal
from stem.control import Controller

# initialize some HTTP headers
# for later usage in URL requests
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers={'User-Agent':user_agent}

# initialize some
# holding variables
oldIP = "0.0.0.0"
newIP = "0.0.0.0"

# how many IP addresses
# through which to iterate?
nbrOfIpAddresses = 3

# seconds between
# IP address checks
secondsBetweenChecks = 2

# request a URL 
def request(url):
    # communicate with TOR via a local proxy (privoxy)
    def _set_urlproxy():
        proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
        opener = urllib2.build_opener(proxy_support)
        urllib2.install_opener(opener)

    # request a URL
    # via the proxy
    _set_urlproxy()
    request=urllib2.Request(url, None, headers)
    return urllib2.urlopen(request).read()

# signal TOR for a new connection 
def renew_connection():
    # the "with" block closes the controller connection on exit
    with Controller.from_port(port = 9051) as controller:
        controller.authenticate(password = 'my_password')
        controller.signal(Signal.NEWNYM)

# cycle through
# the specified number
# of IP addresses via TOR 
for i in range(0, nbrOfIpAddresses):

    # if it's the first pass
    if newIP == "0.0.0.0":
        # renew the TOR connection
        renew_connection()
        # obtain the "new" IP address
        newIP = request("http://icanhazip.com/")
    # otherwise
    else:
        # remember the
        # "new" IP address
        # as the "old" IP address
        oldIP = newIP
        # refresh the TOR connection
        renew_connection()
        # obtain the "new" IP address
        newIP = request("http://icanhazip.com/")

    # zero the 
    # elapsed seconds    
    seconds = 0

    # loop until the "new" IP address
    # is different than the "old" IP address,
    # as it may take the TOR network some
    # time to effect a different IP address
    while oldIP == newIP:
        # sleep this thread
        # for the specified duration
        time.sleep(secondsBetweenChecks)
        # track the elapsed seconds
        seconds += secondsBetweenChecks
        # obtain the current IP address
        newIP = request("http://icanhazip.com/")
        # signal that the program is still awaiting a different IP address
        print ("%d seconds elapsed awaiting a different IP address." % seconds)
    # output the
    # new IP address
    print ("")
    print ("newIP: %s" % newIP)

Execute the Python 2.7 script above via the following command:

python PyTorStemPrivoxy.py

When the above script is executed, one should see that the IP address is changing every few seconds.
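A new circuit does not guarantee a new exit address, since Tor may pick the same exit relay again, so a polling loop without an upper bound can run for a long time. Here is a sketch of a bounded wait in Python 3. The name wait_for_new_ip and its parameters are illustrative, and fetch_ip stands for any zero-argument callable that returns the current address as a string.

```python
import time


def wait_for_new_ip(fetch_ip, old_ip, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_ip() until it differs from old_ip or timeout seconds pass.

    Returns the new address, or None if the address never changed in time.
    """
    waited = 0.0
    while True:
        current = fetch_ip()
        if current != old_ip:
            return current
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval
```

When None comes back, a caller could send another NEWNYM signal and try again rather than keep polling.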

Adaptations to the original article

  • tweaks of grammar.
  • the use of python-stem instead of pytorctl.
  • a slight difference of settings within the /etc/tor/torrc file.
  • the use of a different hashed password for the Tor controller, in this case my_password.
  • some modifications in the sample program to accommodate the use of python-stem, cleaner logic, and more comprehensive commentary.
@niharsawant

commented Feb 23, 2016

Is this snippet still working? I'm running into Connection Refused errors.

@testingflexsin

commented Jun 19, 2017

Hello Sir,

I am getting error "stem.connection.UnreadableCookieFile: Authentication failed: '/var/run/tor/control.authcookie' doesn't exist".

Regards,
Pavan Sharma

@rohitjnv2

commented Jun 21, 2017

Remove hashtag '#' from CookieAuthentication 1

open torrc
sudo gedit /etc/tor/torrc

search ControlPort

Make it look like this:
ControlPort 9051
HashedControlPassword .Your hashed password.
CookieAuthentication 1

@mark-alfonso

commented Jul 13, 2017

following the guide, everything checks out but still can't get new IP. help?

EDIT:
so I was able to make it work by:
changing
proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
to
proxy_support = urllib2.ProxyHandler({"https" : "https://127.0.0.1:8118"})
reference: https://stackoverflow.com/questions/23220494/tor-doesnt-work-with-urllib2

then in renew_connection, I added:
controller.signal(Signal.HUP)
-- this is to tell privoxy that the IP was forced to change

@zeffon

commented Aug 21, 2017

Hi Sir,
I use this code in my Scrapy project, but there is an error and I don't know how to solve it (macOS).
scrapy.core.downloader.handlers.http11.TunnelError: Could not open CONNECT tunnel with proxy 127.0.0.1:8118 [{'status': 503, 'reason': b'Forwarding failure'}]

@dythe

commented Sep 10, 2017

Encountering the same issue as mark-alfonso, but I was unable to solve it with his steps, even after making the changes he mentioned.

@ttpro1995

commented Oct 18, 2017

Add a dot after
forward-socks5 / localhost:9050
=>
forward-socks5 / localhost:9050 . #dot is important at the end

Refers to https://stackoverflow.com/questions/9887505/how-to-change-tor-identity-in-python

@shiyang1983

commented Nov 30, 2017

I tried all the methods mentioned above, but I still get the following "Connection refused" error:

File "PyTorStemPrivoxy.py", line 61, in <module>
renew_connection()
File "PyTorStemPrivoxy.py", line 48, in renew_connection
with Controller.from_port(port = 9050) as controller:
File "/usr/lib/python2.7/dist-packages/stem/control.py", line 915, in from_port
control_port = stem.socket.ControlPort(address, port)
File "/usr/lib/python2.7/dist-packages/stem/socket.py", line 368, in __init__
self.connect()
File "/usr/lib/python2.7/dist-packages/stem/socket.py", line 239, in connect
self._socket = self._make_socket()
File "/usr/lib/python2.7/dist-packages/stem/socket.py", line 397, in _make_socket
raise stem.SocketError(exc)
stem.SocketError: [Errno 111] Connection refused

@shiyang1983

commented Nov 30, 2017

I can't find tor running. Using sudo netstat -anp | grep LISTEN

@alejandrohdo

commented May 3, 2018

I tried it in python 3.6, and it does not work, some scope about it? Thanks

@tigefa4u

commented May 9, 2018

use tor as proxy in simple setup 😍

https://www.marcus-povey.co.uk/2016/03/24/using-tor-as-a-http-proxy/

from

export http_proxy="http://127.0.0.1:9050"
export https_proxy="http://127.0.0.1:9050"

live

export http_proxy="http://127.0.0.1:8123"
export https_proxy="http://127.0.0.1:8123"
@JoeUnsung

commented Apr 3, 2019

I tried it in python 3.6, and it does not work, some scope about it? Thanks

You should change "import urllib2" to "import urllib.request as urllib2", because Python 3 no longer provides urllib2. Its contents were split into urllib.request and urllib.error.
