Learn about HTTP protocol sequences, in particular those HTTP sequences used to access information. Learn how HTTPS sequences and HTTPS offloading are used for content inspection.
- [Voiceover] The effectiveness of a web application tester is determined by his or her knowledge of the protocols used by the application being tested. The key protocol to learn about is HTTP. HTTP is a stateless client-server protocol. In other words, each HTTP message stands on its own and has no implicit knowledge of any previous messages. We're pretty familiar with using a software application called a browser to access web applications, but HTTP messages can also be sent programmatically and from appliance firmware.
In fact, an HTTP message can even be hand-constructed and sent across a telnet connection. Assuming we're using a browser, we'll put in an address known as a URL, a Uniform Resource Locator, whose host name can be looked up in a directory, typically a domain name server, to find the internet address of the web server. The URL will start with either http:// or https://. By default, an http:// URL will establish a TCP connection on port 80, and an https:// URL will establish one on port 443.
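As a rough illustration of the default-port rule, the sketch below uses Python's standard `urllib.parse` module to work out which TCP port a URL would connect to. The function name `effective_port` and the `example.test` URLs are made up for this example; a port written explicitly after the host takes precedence over the scheme's default.

```python
from urllib.parse import urlsplit

# Default TCP ports for the two URL schemes discussed in the transcript.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the TCP port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit ":port" after the host overrides the default.
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    for url in ("http://example.test/",
                "https://example.test/login",
                "http://example.test:3000/"):
        print(url, "->", effective_port(url))
```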
We can override this by placing a colon and then a port number after the host name in the URL. It's not unusual to see a web server set up on a different port. The web browser will then create an HTTP request message and send it to the web server. The web server will then reply with an HTTP response message. Let's take a look at a typical HTTP exchange. I recorded a browsing session in the Hacme Casino using Wireshark. It's open, and we can see at line 91 the start of the three-way handshake for the connection to port 3000 on 10.0.2.10.
With a randomly assigned browser source port of 54472, I'll right-click and select Follow TCP Stream to get a trace of this session. The first packet we see is an HTTP GET request. It starts with the line GET / HTTP/1.1. This instructs the server to send back the default startup page in the root directory of the web server. GET is one of eight standard request commands, or methods, defined in HTTP 1.1.
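To make the request line concrete, here is a minimal sketch of hand-constructing a GET request as raw bytes, the same text one could type into a telnet session. The helper name `build_get_request` and the host `example.test` are assumptions for illustration; the sketch only builds the message and does not send it anywhere.

```python
def build_get_request(host, port, path="/"):
    """Build the raw bytes of a minimal HTTP/1.1 GET request.

    Each line ends with CRLF, and an empty line marks the end of the headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",      # request line: method, target, version
        f"Host: {host}:{port}",      # HTTP/1.1 requires a Host header
        "Connection: keep-alive",
        "",                          # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # Hypothetical host; port 3000 mirrors the port seen in the capture.
    print(build_get_request("example.test", 3000).decode("ascii"))
```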
The GET command allows a user to download a web resource from the server. This will often be a webpage specified by a file name. Omitting the file name will cause the default first page to be returned. The HEAD request is used in the same way as a GET, but will download just the header, or metadata. This is commonly done to check the last modified date of the page against a locally cached copy. The POST command allows a user to upload items to the server. This is commonly used to send form data to the server and is the key message that's of interest to pen testers.
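The difference between GET and HEAD can be seen with a small experiment. The sketch below starts a throwaway server on the loopback interface using Python's standard `http.server` module and queries it with `http.client`. The handler `DemoHandler`, the helper `fetch`, and the page content are all invented for this example and are not part of the Hacme Casino application.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"  # stand-in page content


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        # GET returns the headers followed by the body.
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        # HEAD returns the same headers but no body.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(method):
    """Send one request to a temporary local server; return (status, length header, body)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request(method, "/")
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Content-Length"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("GET :", fetch("GET"))
    print("HEAD:", fetch("HEAD"))
```

Both responses carry the same Content-Length header, but only the GET response includes the page itself.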
The TRACE command requests a diagnostic trace of the actions taken by the server. The OPTIONS command asks the server to return the list of request methods it supports. The CONNECT command causes a proxy to connect to another host. This is often used to make an SSL connection through the proxy. The protocol also allows for the PUT command to send data to the server to be stored and the DELETE command to delete the data. These are often not enabled in production systems for security reasons. Back to Wireshark, and the next line indicates the host name of the web server and is a repeat of what we entered into the browser: Host: 10.0.2.10:3000.
Following this we see the user agent which has been used to send the GET request. In our case, this is a Mozilla Gecko Firefox browser running on Linux. The browser then indicates what it accepts as a response: the format of the message, the language, and the encoding options. The final line says Connection: keep-alive. This enables the connection to be kept open for multiple request-response pairs. But at the content level it's still stateless unless we have a way of maintaining state, usually a cookie.
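The request headers walked through above can also be pulled apart programmatically. The following is a sketch only: the sample request is made up to resemble the fields described (the real capture's values are not reproduced here), and it leans on the standard `email` package, which parses the same "Name: value" header format that HTTP uses.

```python
from email import message_from_string

# Illustrative request; header values are placeholders, not the captured ones.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: example.test:3000\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Accept: text/html\r\n"
    "Accept-Language: en-US\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw request into method, target, version, and a header mapping."""
    head, _, _body = raw.partition("\r\n\r\n")        # headers end at the blank line
    request_line, _, header_block = head.partition("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = message_from_string(header_block)       # case-insensitive lookup
    return method, target, version, headers


if __name__ == "__main__":
    method, target, version, headers = parse_request(RAW_REQUEST)
    print(method, target, version)
    for name, value in headers.items():
        print(f"{name}: {value}")
```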
Keep-alive is the default in HTTP 1.1, but in this case it's being explicitly stated. The web server returns an HTTP response, which starts with the line HTTP/1.1 200 OK. There are six common status codes. 200 indicates that the request was processed without any problems. 400 is a bad request. 401 is unauthorized. 403 means access is forbidden, and 404 means the page was not found.
500 is an internal server error. The next line indicates that no caching is being used. Caching is an optional feature for a web server, which can be used to improve performance. As with the request, the response includes a connection line, which indicates the connection should be kept alive, allowing another request to be sent by the user. The content type is shown as text/html, which indicates that readable HTML data follows. The server line provides information on the server's configuration, in this case telling us that the HTTP server is built using WEBrick and Ruby and showing their versions.
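The status codes mentioned here are all available in Python's standard `http.HTTPStatus` enumeration, which makes it easy to interpret a status line. The sketch below is illustrative; the helper names `parse_status_line` and `status_class` are invented for this example.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into version and status."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    return version, HTTPStatus(code)


def status_class(code):
    """Classify a status code by its first digit."""
    classes = {2: "success", 4: "client error", 5: "server error"}
    return classes.get(int(code) // 100, "other")


if __name__ == "__main__":
    # The six codes listed in the transcript.
    for code in (200, 400, 401, 403, 404, 500):
        status = HTTPStatus(code)
        print(code, status.phrase, "-", status_class(code))
```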
The next line in the header is Content-Length, which indicates the length of the HTML code that follows. We then have a line starting with Set-Cookie, which provides the session ID. This is being used to provide session information for the web application. By using the cookie, I'll maintain authenticated access into the web application throughout the session. Following this is a series of requests and responses which access the various resources from the server: the style sheets, images, scripts, and so on.
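As a sketch of how a client turns a Set-Cookie response header into the Cookie header it sends on later requests, the example below uses the standard `http.cookies` module. The cookie name `session_id`, its value, and the attributes shown are placeholders, not the values from the recorded session.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value):
    """Return the Cookie header value a client would send back to the server."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs are returned to the server; attributes such as
    # Path or HttpOnly are instructions to the client and are not echoed.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


if __name__ == "__main__":
    # Hypothetical Set-Cookie value for illustration.
    received = "session_id=abc123; Path=/; HttpOnly"
    print("Cookie:", cookie_header_from_set_cookie(received))
```

Because HTTP itself is stateless, resending this value on every request is what ties the separate requests together into one authenticated session.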
In this we can see a request header line for the referrer, which HTTP spells Referer. Referrer logging is used to allow websites and web servers to identify where people are visiting them from, for marketing or statistical purposes. Using the referrer field can be a privacy issue, and most web browsers will omit it, particularly when using HTTPS. We can see the user login request, and we can see that the credentials are being sent in plain text. This is a problem, as it enables anyone intercepting the traffic to log in to our account.
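To show why plain-text credentials matter, here is a minimal sketch of decoding a URL-encoded form body of the kind a login POST carries. The field names `user_login` and `user_password` and their values are hypothetical, not taken from the capture; the point is that no key or secret is needed to read them.

```python
from urllib.parse import parse_qs


def extract_form_fields(body):
    """Decode a URL-encoded form body into a simple name -> value mapping."""
    return {name: values[0] for name, values in parse_qs(body).items()}


if __name__ == "__main__":
    # Hypothetical intercepted POST body.
    intercepted = "user_login=jdoe&user_password=s3cret%21"
    for name, value in extract_form_fields(intercepted).items():
        print(f"{name} = {value}")
```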
We can see below this a GET request to obtain our John Doe picture. Given that we don't have one loaded, we see the server returns error code 404 to indicate the resource does not exist. While websites often use just a few, there are a lot of header fields defined in the HTTP standard. The webpage shown here is an excellent quick reference for all the HTTP header fields that you might come across during testing.
Note: The topics in this course will prepare you for key objectives on the Certified Ethical Hacker exam. Find an overview of the certification and the exam handbook at https://www.eccouncil.org/programs/certified-ethical-hacker-ceh/.
- Dissecting HTTP/HTTPS protocol
- Working with WebSockets
- Understanding cookies
- Installing testing tools such as Hacme Casino and the Vega Scanner
- Running web application tests
- Practicing your skills