cURL, which stands for "Client URL", is a command line tool that can make requests to servers, just like browsers can. You may have been using cURL in order to test your web server implementation.
If you've never played around with cURL, open up a terminal window and type in `curl -D - www.google.com`. When that command gets executed, you'll see that you get back an HTTP response with a whole bunch of HTML in the body. You just requested Google's home page, but since cURL is just a command line tool, it isn't capable of taking the HTML in the response and rendering it.
For this sprint challenge, you'll be implementing a barebones client that will run from the command line. In other words, a stripped down version of cURL that can only make GET requests. Your MVP implementation will need to be able to accept a URL as input, make a GET request, receive the response, and print it all to `stdout`.
As part of the web server sprint, you had to construct the server's response to a given request, and those responses took the form:

```
HTTP/1.1 200 OK
Date: Wed Dec 20 13:05:11 PST 2017
Connection: close
Content-Length: 41749
Content-Type: text/html

<!DOCTYPE html><html><head><title>Lambda School ...
```

(Note the blank line separating the headers from the body.)
Now that we're implementing a client, our client will instead need to construct requests.
For this sprint challenge, all your code should be implemented in the `client.c` file. Whenever you update your code, rerun `make` in order to compile a new executable. The steps that your client will need to execute are the following:
- Parse the input URL.
  - Your client should be able to handle URLs such as `localhost:3490/d20` and `www.google.com:80/`. Input URLs need to be broken down into `hostname`, `port`, and `path`. The `hostname` is everything before the colon (but doesn't include `http://` or `https://` if either is present), the `port` is the number between the colon and the first slash, and the `path` is everything after that slash.
  - Implement the `parse_url()` function, which receives the input URL and tokenizes it into `hostname`, `port`, and `path` strings. Assign each of these to the appropriate field in the `urlinfo_t` struct and return it from the `parse_url()` function.
  - You can use the `strchr` function to look for specific characters in a string. You can also use the `strstr` function to look for specific substrings in a string.
- Construct the HTTP request.
  - Just like in the web server, use `sprintf` in order to construct the request from the `hostname`, `port`, and `path`. Requests should look like the following:

    ```
    GET /path HTTP/1.1
    Host: hostname:port
    Connection: close
    ```

  - The connection should be closed, otherwise some servers will simply hang and not return a response, since they're expecting more data from our client.
- Connect to the server.
  - All of the networking logic that you'll need to connect to an arbitrary server is provided in the `lib.h` and `lib.c` files. All you have to do is call the `get_socket()` function in order to get a socket that you can then send and receive data on using the `send` and `recv` system calls.
  - Make sure that your web server implementation (built during project days 1 & 2 of Web Server I) is running in another terminal window when testing local requests.
- Send the request string down the socket.
  - Hopefully that's pretty self-explanatory.
- Receive the response from the server and print it to `stdout`.
  - The main hurdle to overcome when receiving data from a server is that we have no idea how large a response we're going to get back. To deal with this, we'll just keep calling `recv`, which returns data from the server up to a maximum specified byte length on each call. We'll continue doing this in a loop until `recv` reports that the server has no more data to send:

    ```c
    while ((numbytes = recv(sockfd, buf, BUFSIZE - 1, 0)) > 0) {
      // print the data we got back to stdout
      fwrite(buf, 1, numbytes, stdout);
    }
    ```
- Clean up.
  - Don't forget to `free` any allocated memory and `close` any open file descriptors.
Your cURL client will receive a score of 2 when it satisfies the following:
- Your client can successfully request any resource that your web server implementation (built during project days 1 & 2 of Web Server I) is capable of serving, i.e., it can successfully execute `./client localhost:3490/d20`, `./client localhost:3490/index.html`, and any other URL that your web server implementation is capable of serving up. Don't forget to start up your web server implementation in another terminal window. Your client should print out the correct response to `stdout`, something like:
```
HTTP/1.1 200 OK
Date: Tue Oct 2 11:41:43 2018
Connection: close
Content-Length: 3
Content-Type: text/plain

17
```
- Your client can successfully make a request to a non-local host, such as Google, Facebook, Reddit, etc. It doesn't necessarily need to successfully get back the HTML contents of the page, but your client should receive back a header response with some sort of HTTP status code and other metadata. For example, executing `./client www.google.com:80/` should return back a 200 status code with all the HTML that makes up Google's homepage and print it all to `stdout`. The response header will look something like this:
```
HTTP/1.1 200 OK
Date: Tue, 02 Oct 2018 18:44:13 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2018-10-02-18; expires=Thu, 01-Nov-2018 18:44:13 GMT; path=/; domain=.google.com
Set-Cookie: NID=140=xQnQZhdVuKxdbMlSwuwPo-3Ii375x3h2c936Kcyk_JA8HAZTunEFW2L5F93UcSqDI-JtnHgl3r_qwZVxyJMFvMKYDKYZf4ab25QjziB5iFRNuNpjDEPKa8bn7ICeWNsH; expires=Wed, 03-Apr-2019 18:44:13 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
```
In order to earn a score of 3, complete at least one of the following stretch goals:
- Make the URL parsing logic more robust.
  - The specified URL parsing logic is really brittle. The most glaring hole is that URLs often don't include a port number; in such cases, clients just assume a default port of 80. Improve the URL parsing logic so that it can handle being passed a URL without a port number, such as `www.google.com/`.
  - Also improve the parsing logic so that it can receive URLs prepended with `http://` or `https://`. Such URLs should not be treated any differently by the client; you'll just need to strip the prefix off the input URL so that it doesn't become part of the hostname.
- Implement the ability for the client to follow redirects.
  - If you execute `./client google.com:80/`, you'll get back a response with a `301 Moved Permanently` status. There's a `Location` field in the header, as well as an `href` tag in the body, specifying where the client needs to be redirected. Augment your client such that when it encounters a 301 status, it automatically follows the redirect link and issues another request for the correct location.
- Don't have the client print out the header.
  - Let's make the printing of the response header optional. Implement functionality such that the client can accept a `-h` flag, and only when this flag is present do we print the response header as well. Otherwise, when printing a response, your client should just print the body of the response to `stdout`.