@tejasri-v
Last active January 11, 2022 16:53
Lab6-Part2
Part 2: Web page downloader
---------------------------
There is a very useful program called "wget". It's a command line
tool that you can use to download a web page like this:

    wget http://www.gnu.org/software/make/manual/make.html

which will download the make manual page, make.html, and save it in
the current directory. wget can do much more (downloading a whole web
site, for example); see man wget for more info.
Your job is to write a limited version of wget, which we will call
http-client.c, that can download a single file. You use it like this:

    ./http-client www.gnu.org 80 /software/make/manual/make.html

So you give the components of the URL separately in the command line:
the host, the port number, and the file path. The program will
download the given file and save it in the current directory. So
in the case above, it should produce make.html in the current
directory. It should overwrite an existing file.
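
The name of the output file is the last component of the file path
argument. A minimal sketch of that step, assuming a hypothetical helper
named file_name_of (the name is not part of the assignment) that falls
back to the whole string when the path contains no slash:

```c
#include <string.h>

/* Return the part of path after the last '/', or path itself if it
 * contains no '/'. The result points into path; nothing is copied.
 * file_name_of is a hypothetical name used only for illustration. */
static const char *file_name_of(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

Note that a path ending in '/' yields an empty name, which a caller
would have to reject before trying to open the output file.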
Hints:
- The program should open a socket connection to the host and port
number specified on the command line, and then request the given
file using the HTTP 1.0 protocol. (See
http://www.jmarshall.com/easy/http/ for an introduction to HTTP 1.0.)
An HTTP GET request looks like this:

      GET /path/file.html HTTP/1.0
      [zero or more headers ...]
      [a blank line]

- Include the following header in your request:

      Host: the.host.name.you.are.connecting.to:<port_number>

Some web sites require it.
- Use "\r\n" rather than "\n" as newline when you send your request.
It's required by the HTTP protocol.
- Then the program reads the response from the web server, which looks
like this:

      HTTP/1.0 200 OK
      Date: Fri, 31 Dec 1999 23:59:59 GMT
      Content-Type: text/html
      Content-Length: 1354

      <html>
      <body>
      <h1>Happy New Millennium!</h1>
      (more file contents)
        .
        .
        .
      </body>
      </html>

Just like in part 1, you can use fdopen() to wrap the socket with a
FILE*, which will make reading the lines much easier.
- The "200" in the 1st line indicates that the request was successful.
If it's not 200, the program should print the 1st line and exit.
- After the 1st line, a bunch of headers will come, then comes a blank
line, and then the actual file content starts. Your program should
skip over all headers and just receive the file content.
- Note that the program should be able to download any type of file, not
just HTML files.
- The server will terminate the socket connection when it's done
sending the file.
- You will need to pick out the file name part of a file path
(make.html from /software/make/manual/make.html for example). Check
out strrchr().
- You will need to convert a host name into an IP address. Here is one way
to convert a host name into an IP address in dotted-quad notation:

      struct hostent *he;
      char *serverName = argv[1];

      // get server ip from server name
      if ((he = gethostbyname(serverName)) == NULL) {
          die("gethostbyname failed");
      }
      char *serverIP = inet_ntoa(*(struct in_addr *)he->h_addr);

The man pages of the functions will tell you which header files need to be
included.
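
Current man pages mark gethostbyname() as obsolete and point to
getaddrinfo() instead. A hedged sketch of resolving the host and
connecting in one step with getaddrinfo(), assuming a hypothetical
helper named connect_to (not part of the assignment) that takes the
host and port strings as they appear on the command line:

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect a TCP socket to host:port. Returns the socket descriptor, or
 * -1 if name resolution fails or no returned address accepts the
 * connection. connect_to is a hypothetical name for this sketch. */
static int connect_to(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int sock = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 addresses */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        sock = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sock < 0)
            continue;
        if (connect(sock, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(sock);
        sock = -1;
    }
    freeaddrinfo(res);
    return sock;
}
```

With this helper, a call such as connect_to(argv[1], argv[2]) would
stand in for the gethostbyname(), socket(), and connect() sequence; a
negative return value means the connection could not be made.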
--
Good luck!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

static void die(const char *s) { perror(s); exit(1); }

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <host> <port> <file-path>\n", argv[0]);
        exit(1);
    }

    /* Resolve the host name. gethostbyname() reports failures through
       h_errno rather than errno, so herror() is used instead of die(). */
    struct hostent *he;
    char *serverName = argv[1];
    if ((he = gethostbyname(serverName)) == NULL) {
        herror("gethostbyname failed");
        exit(1);
    }
    char *serverIP = inet_ntoa(*(struct in_addr *)he->h_addr);

    /* Reject ports outside 1..65535 instead of truncating them. */
    int port = atoi(argv[2]);
    if (port <= 0 || port > 65535) {
        fprintf(stderr, "invalid port: %s\n", argv[2]);
        exit(1);
    }

    int sock;
    if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        die("socket failed");

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = inet_addr(serverIP);
    servaddr.sin_port = htons((unsigned short)port);

    if (connect(sock, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
        die("connect failed");

    /* Build the whole request in one buffer: request line, Host header,
       and the blank line that ends the headers. Several send() calls
       would also work, since TCP is a byte stream, but one buffer is
       simpler to check. */
    char request[1000];
    int reqlen = snprintf(request, sizeof(request),
                          "GET %s HTTP/1.0\r\n"
                          "Host: %s:%s\r\n"
                          "\r\n",
                          argv[3], argv[1], argv[2]);
    if (reqlen < 0 || (size_t)reqlen >= sizeof(request)) {
        fprintf(stderr, "request too long\n");
        exit(1);
    }

    /* send() may accept fewer bytes than requested, so loop until done. */
    size_t sent = 0;
    while (sent < (size_t)reqlen) {
        ssize_t k = send(sock, request + sent, (size_t)reqlen - sent, 0);
        if (k < 0)
            die("send failed");
        sent += (size_t)k;
    }

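    /* Wrap the socket in a FILE* so the response can be read line by line. */
    FILE *response = fdopen(sock, "r");
    if (response == NULL)
        die("fdopen failed");

    /* The status line looks like "HTTP/1.0 200 OK". */
    char line[1000];
    if (fgets(line, sizeof(line), response) == NULL) {
        fprintf(stderr, "no response from server\n");
        exit(1);
    }
    char *code = strchr(line, ' ');
    if (strncmp(line, "HTTP/", 5) != 0 || code == NULL ||
        strncmp(code + 1, "200", 3) != 0) {
        printf("%s", line);
        exit(1);
    }

    /* Skip the headers; the first empty line ends them. */
    do {
        if (fgets(line, sizeof(line), response) == NULL) {
            fprintf(stderr, "connection closed before end of headers\n");
            exit(1);
        }
    } while (strcmp(line, "\r\n") != 0 && strcmp(line, "\n") != 0);

    /* Save under the last path component, e.g. make.html. */
    char *name = strrchr(argv[3], '/');
    name = (name != NULL) ? name + 1 : argv[3];
    FILE *fp = fopen(name, "wb");
    if (fp == NULL)
        die("fopen failed");

    /* Copy the body in blocks until the server closes the connection. */
    char data[4096];
    size_t n;
    while ((n = fread(data, 1, sizeof(data), response)) > 0) {
        if (fwrite(data, 1, n, fp) != n)
            die("fwrite failed");
    }
    if (ferror(response))
        die("fread failed");

    if (fclose(fp) != 0)
        die("fclose failed");
    fclose(response);   /* also closes the underlying socket */
    return 0;
}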
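
The request formatting, status check, and header skipping in the
program above can also be written as small functions that run without
a network connection, which makes them easier to test. A sketch under
that assumption; build_request, status_is_200, and copy_body are
hypothetical names chosen for illustration, not part of the lab:

```c
#include <stdio.h>
#include <string.h>

/* Write the request line, a Host header, and the terminating blank
 * line into buf. Returns the request length, or -1 if it does not fit. */
static int build_request(char *buf, size_t size, const char *host,
                         const char *port, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s:%s\r\n"
                     "\r\n",
                     path, host, port);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Return 1 if line looks like "HTTP/<version> 200 ...", otherwise 0. */
static int status_is_200(const char *line)
{
    const char *sp = strchr(line, ' ');
    return strncmp(line, "HTTP/", 5) == 0 && sp != NULL &&
           strncmp(sp + 1, "200", 3) == 0;
}

/* Skip header lines up to and including the blank line, then copy the
 * remaining bytes from in to out. Returns 0 on success, -1 on error. */
static int copy_body(FILE *in, FILE *out)
{
    char buf[4096];
    size_t n;
    int at_line_start = 1;  /* guards against long lines split by fgets */

    for (;;) {
        if (fgets(buf, sizeof(buf), in) == NULL)
            return -1;      /* stream ended before the blank line */
        if (at_line_start &&
            (strcmp(buf, "\r\n") == 0 || strcmp(buf, "\n") == 0))
            break;
        at_line_start = (buf[strlen(buf) - 1] == '\n');
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Because copy_body takes plain FILE* arguments, it can be exercised with
tmpfile() streams holding a canned response before it is ever pointed
at a socket wrapped by fdopen().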
CC = gcc
CXX = g++

CFLAGS = -g -Wall $(INCLUDES)
CXXFLAGS = -g -Wall $(INCLUDES)

.PHONY: default
default: http-client

.PHONY: clean
clean:
	rm -f *.o *~ a.out core http-client

.PHONY: all
all: clean default