@tariqmislam
Created March 22, 2012 15:58
Setting Up Hadoop 0.20.2 on Windows 7 With Cygwin
=================================================================
SETTING UP SSHD AS A SERVICE FOR RUNNING HADOOP DAEMONS ON WINDOWS 7
=================================================================
Steps:
1. Download 'setup.exe' from the Cygwin website
2. Right-click 'setup.exe' and run it
3. Leave the settings as they are and click through until you come to the package selection window
3.1 - Make sure that the installation directory is 'C:\cygwin'
4. In the package selection window, under 'Net', click on 'openssh' to mark it for installation
5. Click 'Next', then go do something productive while installation runs its course.
6. Once installed, go to Start -> All Programs -> Cygwin, right-click the Cygwin shortcut, and select 'Run as administrator'
7. In your cygwin window, type in the following commands:
$ chmod +r /etc/passwd
$ chmod u+w /etc/passwd
$ chmod +r /etc/group
$ chmod u+w /etc/group
$ chmod 755 /var
$ touch /var/log/sshd.log
$ chmod 664 /var/log/sshd.log
These permissions let the ssh-host-config script update the user and group databases and let sshd write its log file; passphrase-less ssh logins are required for Hadoop to work.
8. In the same cygwin window, type 'ssh-host-config' and hit Enter, then answer the prompts as follows:
9. Should privilege separation be used? NO
10. Name of service to install: sshd
11. Do you want to install sshd as a service? YES
12. Enter the value of CYGWIN for the daemon: <LEAVE BLANK, JUST HIT ENTER>
13. Do you want to use a different name? (default is 'cyg_server'): NO
14. Please enter the password for user 'cyg_server': <LEAVE BLANK, JUST HIT ENTER>
15. Reenter: <LEAVE BLANK, JUST HIT ENTER>
(Note: some newer Cygwin versions refuse a blank password here and keep re-prompting; as noted in the comments below, entering a simple password instead lets you move on.)
At this point the sshd service should be installed and configured to run under the 'cyg_server' account. Don't worry; this is all handled under the hood.
To start the ssh service, type 'net start sshd' in your cygwin window. From now on the service will also start automatically, so you will not need to repeat this after a reboot.
To test, type 'ssh localhost' in your cygwin window. You should not be prompted for a password (you may be asked once to confirm the host key).
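If 'ssh localhost' still prompts for a password, one commonly used fix (see also the 'Setup authorization keys' tutorial linked in the comments below) is to generate a passphrase-less key and authorize it yourself. This is just a sketch, and the dsa key type is one workable choice among several:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
After that, 'ssh localhost' should log you in without a password prompt.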
=================================================================
INSTALLING AND CONFIGURING HADOOP
=================================================================
This guide assumes version 0.20.2 of Hadoop. Newer releases (e.g. 0.20.20x.x) do not get along with Windows 7: their TaskTracker daemon requires permissions to be set that Windows 7 inherently does not allow.
1. Download the stable 0.20.2 release of Hadoop (if the mirrors no longer carry it, it is available from the Apache archive: http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/, as noted in the comments below)
2. Using 7-Zip (download it if you have not already; it should be your default archive browser), open the archive, then copy its top-level directory and paste it into your cygwin home directory, usually something like C:/cygwin/home/{username}
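Alternatively, if you would rather skip 7-Zip, you can extract the archive from inside cygwin itself; this is a sketch, assuming you saved the download into your cygwin home directory as hadoop-0.20.2.tar.gz:
$ cd ~
$ tar xzf hadoop-0.20.2.tar.gz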
3. Once copied into your cygwin home directory, navigate to {hadoop-home}/conf. Open the following files for editing in your favorite editor (I strongly suggest Notepad++ ... why would you use anything else):
* core-site.xml
* hdfs-site.xml
* mapred-site.xml
* hadoop-env.sh
4. Make the following additions to the corresponding files:
* core-site.xml (inside the configuration tags)
  <property>
    <name>fs.default.name</name>
    <value>localhost:9100</value>
  </property>
* mapred-site.xml (inside the configuration tags)
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
* hdfs-site.xml (inside the configuration tags)
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
* hadoop-env.sh
* uncomment the JAVA_HOME export line and set it to the path of your Java home (typically under C:/Program Files/Java/{java-home}); see the sketch below this list
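As a rough sketch of what that line might look like (the JDK folder name below is a placeholder; adjust it to your own install), note that the space in 'Program Files' can break the shell scripts, so the 8.3 short path 'Progra~1' is a common workaround:
# in conf/hadoop-env.sh
export JAVA_HOME=/cygdrive/c/Progra~1/Java/jdk1.6.0_26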
5. In a cygwin window, inside your top-level hadoop directory, it's time to format your Hadoop file system. Type in 'bin/hadoop namenode -format' and hit enter. This will create and format the HDFS.
6. Now start all of the hadoop daemons that simulate a distributed system: type 'bin/start-all.sh' and hit Enter.
You should not receive any errors (there may be some messages about not being able to change to the home directory, but this is ok).
Double-check that your HDFS (NameNode) and JobTracker are up and running by visiting http://localhost:50070 and http://localhost:50030, respectively.
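For a quick command-line sanity check as well (assuming the JDK's bin directory is on your PATH), the jps tool should list all five daemons:
$ jps
Expect NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker to appear in its output.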
To make sure everything is up and running properly, let's try a regex example.
7. From the top level hadoop directory, type in the following set of commands:
$ bin/hadoop dfs -mkdir input
$ bin/hadoop dfs -put conf input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ bin/hadoop dfs -cat output/*
This should display the output of the job: every string matching the regex above, along with how many times it appeared.
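If you would rather read the results from the local file system, the same pattern used in the standard Hadoop quickstart also works:
$ bin/hadoop dfs -get output output
$ cat output/*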
8. Assuming no errors, you are all set to set up your Eclipse environment.
FYI, you can stop all of the daemons by typing 'bin/stop-all.sh', but keep them running for now as we move on to the next step.
=================================================================
CONFIGURING AND USING THE HADOOP PLUGIN FOR ECLIPSE
=================================================================
1. Download Eclipse Indigo
2. Download the hadoop plugin jar located at: https://issues.apache.org/jira/browse/MAPREDUCE-1280
The file name is 'hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar'
Normally you could use the plugin provided in the contrib folder that ships with Hadoop 0.20.2; however, that plugin is out of date.
3. Copy the downloaded jar and paste it into your Eclipse plugins directory (e.g. C:/eclipse/plugins)
4. In a regular command prompt, navigate to your eclipse folder (e.g. 'cd C:/eclipse')
5. Type in 'eclipse -clean' and hit enter
6. Once Eclipse is open, open a new perspective (button in the top right corner) and select 'Other...'. From the list, select 'Map/Reduce'.
7. Go to Window -> Show View and select Map/Reduce. This opens the Map/Reduce Locations view.
8. Now you are ready to tie in Eclipse with your existing HDFS that you formatted and configured earlier. Right click in the Map/Reduce Locations view and select 'New Hadoop Location'
9. In the window that appears, type 'localhost' for the Location name. Under Map/Reduce Master, enter localhost for the Host and 9101 for the Port (this matches the mapred.job.tracker value you set earlier). For DFS Master, make sure the 'Use M/R Master host' checkbox is selected and enter 9100 for the Port (this matches fs.default.name). For the User name, type 'User'. Click 'Finish'.
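If Eclipse later reports a connection error on one of these ports (a few commenters below ran into this), one quick check from a cygwin prompt is to confirm that the daemons are actually listening:
$ netstat -an | grep 9100
$ netstat -an | grep 9101
Each should show a line in the LISTENING state; if not, recheck the config files and restart the daemons with bin/stop-all.sh and bin/start-all.sh.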
10. In the Project Explorer window on the left, you should now be able to expand the DFS Locations tree and see your new location. Continue to expand it and you should see something like the following file structure:
DFS Locations
  -> (1)
    -> tmp (1)
      -> hadoop-{username} (1)
        -> mapred (1)
          -> system (1)
            -> jobtracker.info
At this point you can create directories and files and upload them to the HDFS from Eclipse, or you can create them through the cygwin window as you did in step 7 in the previous section.
=================================================================
CREATING YOUR FIRST HADOOP PROJECT IN ECLIPSE
=================================================================
1. Open up the Java perspective
2. In the Project Explorer window, select New -> Project...
3. From the list that appears, select Map/Reduce Project
4. Provide a project name, and then click on the link that says 'Configure Hadoop install directory'
4.1 Browse to the top-level hadoop directory that is located in cygwin (e.g. C:\cygwin\home\{username}\{hadoop directory})
4.2 Click 'OK'
5. Click 'Finish'
6. You will notice that Hadoop projects in Eclipse are simple Java projects in terms of file structure. Now let's add a class.
7. Right-click on the project and select New -> Other
8. From the Map/Reduce folder in the list, select 'MapReduce Driver'. This will generate a class for you.
At this point you are all set. Now it's time to learn all about MapReduce, which is outside the scope of this documentation. Enjoy and have fun.
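Once you have filled in the generated driver, one way to run it against your single-node setup is to export the project as a JAR from Eclipse (File -> Export -> JAR file) into your top-level hadoop directory and launch it from cygwin. The jar name, class name, and paths below are placeholders, not anything the plugin generates for you:
$ bin/hadoop jar myproject.jar MyDriver input output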
@loveneet29

Hi buddy,
Nice tutorial. Can you tell me which version of Eclipse you are using? I am trying the same thing on Indigo and it is not able to create a new HDFS location at your point number 8, i.e.:
8. Now you are ready to tie in Eclipse with your existing HDFS that you formatted and configured earlier. Right click in the Map/Reduce Locations view and select 'New Hadoop Location'
The dialog does not open. I also tried Helios 3.5.5; it works there, but it shows the following error:
"Cannot connect to the Map/Reduce location: localhost
Call to localhost/127.0.0.1:9100 failed on local exception: java.io.EOFException"

Can you help me out with this issue?
Thanks in advance.

@tariqmislam
Author

Did you follow the directions in step #9 properly? Make sure that the port numbers are correct and are in the correct fields.

I am not sure why you are getting the EOFException.

@uzairfarooq

Thanks, your tutorial helped me a lot

@anbu2392

I tried to set up ssh-host-config without using a passphrase, but it does not seem possible. Please help.

I pasted the output from my window below:

Anbu@Hydra /
$ chmod +r /etc/passwd

Anbu@Hydra /
$ chmod u+w /etc/passwd

Anbu@Hydra /
$ chmod +r /etc/group

Anbu@Hydra /
$ chmod u+w /etc/group

Anbu@Hydra /
$ chmod 755 /var

Anbu@Hydra /
$ touch /var/log/sshd.log

Anbu@Hydra /
$ chmod 664 /var/log/sshd.log

Anbu@Hydra /
$ ssh-host-config

*** Warning: Running this script typically requires administrator privileges!
*** Warning: However, it seems your account does not have these privileges.
*** Warning: Here's the list of groups in your user token:

None
Users
HomeUsers

*** Warning: This usually means you're running this script from a non-admin
*** Warning: desktop session, or in a non-elevated shell under UAC control.

*** Warning: Make sure you have the appropriate privileges right now,
*** Warning: otherwise parts of this script will probably fail!

*** Query: Are you sure you want to continue? (Say "no" if you're not sure
*** Query: you have the required privileges) (yes/no) yes

*** Query: Overwrite existing /etc/ssh_config file? (yes/no) yes
*** Info: Creating default /etc/ssh_config file
*** Query: Overwrite existing /etc/sshd_config file? (yes/no) yes
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file

*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: []
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.

*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).

*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.

*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.

*** Info: No privileged account could be found.

*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) no
*** Query: Create new privileged user account 'cyg_server'? (yes/no) yes
*** Info: Please enter a password for new user cyg_server. Please be sure
*** Info: that this password matches the password rules given on your system.
*** Info: Entering no password will exit the configuration.
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Reenter:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
[1]+ Stopped ssh-host-config

system details: cygwin 1.21
hadoop 1.04

@pranavkaushik9

Hello. Before I explain the issue I am having, I need to let you know that I am totally new to Cygwin and things like this.
My objective in installing ssh via Cygwin is to set up Hadoop on a Windows 7 x64 machine. I am trying to execute the steps given above; however, I am not able to provide a blank password. Below is the log. Any help will be greatly appreciated.

$ chmod +r /etc/passwd

$ chmod u+w /etc/passwd

$ chmod +r /etc/group

$ chmod u+w /etc/group

$ chmod 755 /var

$ touch /var/log/sshd.log

$ chmod 664 /var/log/sshd.log

$ ssh-host-config

*** Query: Overwrite existing /etc/ssh_config file? (yes/no) yes
*** Info: Creating default /etc/ssh_config file
*** Query: Overwrite existing /etc/sshd_config file? (yes/no) yes
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file

*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: []
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.

*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).

*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.

*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.

*** Info: No privileged account could be found.

*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) no
*** Query: Create new privileged user account 'cyg_server'? (yes/no) yes
*** Info: Please enter a password for new user cyg_server. Please be sure
*** Info: that this password matches the password rules given on your system.
*** Info: Entering no password will exit the configuration.
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:
*** Query: Please enter the password:

@mazda11

mazda11 commented Feb 13, 2013

Where can I find Hadoop version 0.20.2? It is not available on any of the mirror sites listed on apache.org.

Thanks,
-Mazda

@mazda11

mazda11 commented Feb 13, 2013

I did find this version in the archive: http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/

Thanks,
-Mazda

@dkyr

dkyr commented May 1, 2013

In step "Please enter the password for user 'cyg_server': <LEAVE BLANK, JUST HIT ENTER>", I press Enter but it keeps asking for a password.

@Ho3ein-Boka

Can you tell me which version of Eclipse supports Hadoop?

@mumfyness

For those of you having a problem getting past the 'cyg_server' entry with no password, you can just enter a simple password like 'cyg' and move on. After you have completed that step, you can set up authorization keys using ssh-keygen. Take a look at this other tutorial and it should resolve your issue; it did for me. Fold it into your tasks from the tutorial above and look down the page for "Setup authorization keys":
http://ebiquity.umbc.edu/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
I am using Windows 7 with Hadoop-1.1.2 and hadoop-eclipse-plugin-1.1.2.jar, on Eclipse Indigo SR2 (build 20120216-1857). It is not smooth yet; I get a ConnectionException in Eclipse when I create the Map/Reduce location ("Connection Error in call to localhost/127.0.0.1:9100"). But it is actually moving along well: my Cygwin setup is working nicely and both localhost:50070 and localhost:50030 are serving web pages.
Have fun, don't get stressed.

@shadab-shah

I was following along fine until this step: hadoop-env.sh

  • uncomment the JAVA_HOME export command, and set the path to your Java home (typically C:/Program Files/Java/{java-home}

Can you please elaborate on what JAVA_HOME is?

I searched for it and it says JAVA_HOME is the path to your JRE folder. I installed Java directly on the C drive, so my JRE path is something like C:/Java/jre1.7.

It is still not working. Please help me with this.

@kkashyap1707

At the step "Once the prompt window opens up, type 'ssh-host-config' and hit Enter", it says: -bash: ssh-host-config: command not found.
Please reply with the solution to kkashyap1707@gmail.com

@kkashyap1707

Please reply ASAP with the steps for installing ssh on Windows XP.
