# The Art Of Scripting HTTP Requests Using Curl

## Background

 This document assumes that you are familiar with HTML and general networking.

 The increasing number of applications moving to the web has made "HTTP
 Scripting" more frequently requested and wanted. Being able to automatically
 extract information from the web, to fake users, and to post or upload data
 to web servers are all important tasks today.

 Curl is a command line tool for doing all sorts of URL manipulations and
 transfers, but this particular document focuses on how to use it when
 doing HTTP requests for fun and profit. This document assumes that you know
 how to invoke `curl --help` or `curl --manual` to get basic information about
 it.

 Curl is not written to do everything for you. It makes the requests, it gets
 the data, it sends data and it retrieves the information. You probably need
 to glue everything together using some kind of script language or repeated
 manual invocations.

## The HTTP Protocol

 HTTP is the protocol used to fetch data from web servers. It is a simple
 protocol that is built upon TCP/IP. The protocol also allows information to
 get sent to the server from the client using a few different methods, as will
 be shown here.

 HTTP (in its 1.x versions) is plain ASCII text lines being sent by the client
 to a server to request a particular action, and then the server replies with
 a few text lines before the actual requested content is sent to the client.

 The client, curl, sends an HTTP request. The request contains a method (like
 GET, POST, HEAD etc), a number of request headers and sometimes a request
 body. The HTTP server responds with a status line (indicating if things went
 well), response headers and most often also a response body. The "body" part
 is the plain data you requested, like the actual HTML or the image etc.
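
 To make the request and response layout concrete, here is a minimal Python
 sketch that builds an HTTP/1.1 request as text and splits a response into
 status, headers and body. The helper names and the sample response are made
 up for illustration; this is not how curl is implemented.

```python
def build_request(method, host, path, headers=None, body=""):
    """Return the text of an HTTP/1.1 request (illustrative helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line separates the headers from the (optional) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a raw response into (status_code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_request("GET", "www.example.com", "/", {"Accept": "*/*"})

# A made-up response, only for illustration.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)
status, headers, body = parse_response(sample_response)
print(request)
print(status, headers["content-type"], body)
```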

## See the Protocol

 Using curl's option [`--verbose`](https://curl.se/docs/manpage.html#-v)
 (`-v` as a short option) will display what kind of commands curl sends to the
 server, as well as a few other informational texts.

 `--verbose` is the single most useful option when it comes to debugging or
 even understanding the curl<->server interaction.

 Sometimes even `--verbose` is not enough. Then
 [`--trace`](https://curl.se/docs/manpage.html#--trace) and
 [`--trace-ascii`](https://curl.se/docs/manpage.html#--trace-ascii)
 offer even more details as they show **everything** curl sends and
 receives. Use it like this:

    curl --trace-ascii debugdump.txt http://www.example.com/

## See the Timing

 Many times you may wonder what exactly is taking all the time, or you just
 want to know the number of milliseconds between two points in a transfer. For
 those, and other similar situations, the
 [`--trace-time`](https://curl.se/docs/manpage.html#--trace-time) option
 is what you need. It will prepend the time to each trace output line:

    curl --trace-ascii d.txt --trace-time http://example.com/

## See which Transfer

 When doing parallel transfers, it is relevant to see which transfer is
 doing what. When response headers are received (and logged) you need to
 know which transfer these are for. The
 [`--trace-ids`](https://curl.se/docs/manpage.html#--trace-ids) option
 is what you need. It will prepend the transfer and connection identifier
 to each trace output line:

    curl --trace-ascii d.txt --trace-ids http://example.com/

## See the Response

 By default curl sends the response to stdout. You need to redirect it
 somewhere to avoid that; most often that is done with `-o` or `-O`.

# URL

## Spec

 The Uniform Resource Locator format is how you specify the address of a
 particular resource on the Internet. You know these, you have seen URLs like
 https://curl.se or https://example.com a million times. RFC 3986 is the
 canonical spec. The formal name is not URL, it is **URI**.

## Host

 The hostname is usually resolved using DNS or your /etc/hosts file to an IP
 address and that is what curl will communicate with. Alternatively you specify
 the IP address directly in the URL instead of a name.

 For development and other testing situations, you can point to a different
 IP address for a hostname than what would otherwise be used, by using curl's
 [`--resolve`](https://curl.se/docs/manpage.html#--resolve) option:

    curl --resolve www.example.org:80:127.0.0.1 http://www.example.org/

## Port number

 Each protocol curl supports operates on a default port number, be it over TCP
 or in some cases UDP. Normally you do not have to take that into
 consideration, but at times you run test servers on other ports or
 similar. Then you can specify the port number in the URL with a colon and a
 number immediately following the hostname. Like when doing HTTP to port
 1234:

    curl http://www.example.org:1234/

 The port number you specify in the URL is the number that the server uses to
 offer its services. Sometimes you may use a proxy, and then you may
 need to specify that proxy's port number separately from what curl needs to
 connect to the server. Like when using an HTTP proxy on port 4321:

    curl --proxy http://proxy.example.org:4321 http://remote.example.org/

## User name and password

 Some services are set up to require HTTP authentication and then you need to
 provide name and password which is then transferred to the remote site in
 various ways depending on the exact authentication protocol used.

 You can opt to either insert the user and password in the URL or you can
 provide them separately:

    curl http://user:password@example.org/

 or

    curl -u user:password http://example.org/

 Note that this kind of HTTP authentication is not what is usually done and
 requested by user-oriented websites these days. They tend to use forms and
 cookies instead.

## Path part

 The path part is just sent off to the server to request that it sends back
 the associated response. The path is what is to the right side of the slash
 that follows the hostname and possibly port number.
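
 As a rough illustration of how the URL pieces described in this chapter fit
 together, this Python sketch splits a URL with the standard library. The URL
 itself is a made-up combination of the earlier examples.

```python
from urllib.parse import urlsplit

# Hypothetical URL combining user, password, host, port and path.
url = "http://user:password@www.example.org:1234/some/path?x=1"
parts = urlsplit(url)

print(parts.scheme)                    # protocol part before "://"
print(parts.username, parts.password)  # credentials before the "@"
print(parts.hostname, parts.port)      # host and explicit port number
print(parts.path, parts.query)         # path and what follows the "?"
```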

# Fetch a page

## GET

 The simplest and most common request/operation made using HTTP is to GET a
 URL. The URL could itself refer to a webpage, an image or a file. The client
 issues a GET request to the server and receives the document it asked for.
 If you issue the command line

    curl https://curl.se

 you get a webpage returned in your terminal window: the entire HTML document
 that the URL holds.

 All HTTP replies contain a set of response headers that are normally hidden.
 Use curl's [`--include`](https://curl.se/docs/manpage.html#-i) (`-i`)
 option to display them as well as the rest of the document.

## HEAD

 You can ask the remote server for ONLY the headers by using the
 [`--head`](https://curl.se/docs/manpage.html#-I) (`-I`) option which
 will make curl issue a HEAD request. In some special cases servers deny the
 HEAD method while others still work, which is a particular kind of annoyance.

 The HEAD method is defined and made so that the server returns the headers
 exactly the way it would do for a GET, but without a body. It means that you
 may see a `Content-Length:` in the response headers, but there must not be an
 actual body in the HEAD response.

## Multiple URLs in a single command line

 A single curl command line may involve one or many URLs. The most common case
 is probably to just use one, but you can specify any number of URLs. Yes
 any. No limits. You will then get requests repeated over and over for all the
 given URLs.

 Example, send two GET requests:

    curl http://url1.example.com http://url2.example.com

 If you use [`--data`](https://curl.se/docs/manpage.html#-d) to POST to
 the URL, using multiple URLs means that you send that same POST to all the
 given URLs.

 Example, send two POSTs:

    curl --data name=curl http://url1.example.com http://url2.example.com

## Multiple HTTP methods in a single command line

 Sometimes you need to operate on several URLs in a single command line and do
 different HTTP methods on each. For this, you will enjoy the
 [`--next`](https://curl.se/docs/manpage.html#-:) option. It is basically
 a separator that separates a bunch of options from the next. All the URLs
 before `--next` will get the same method and will get all the POST data
 merged into one.

 When curl reaches the `--next` on the command line, it resets the method and
 the POST data and allows a new set.

 Perhaps this is best shown with a few examples. To send first a HEAD and then
 a GET:

    curl -I http://example.com --next http://example.com

 To first send a POST and then a GET:

    curl -d score=10 http://example.com/post.cgi --next http://example.com/results.html

# HTML forms

## Forms explained

 Forms are the general way a website can present an HTML page with fields for
 the user to enter data in, and then press some kind of 'OK' or 'Submit'
 button to get that data sent to the server. The server then typically uses
 the posted data to decide how to act. For example, it may use the entered
 words to search in a database, add the info to a bug tracking system, display
 the entered address on a map, or treat the info as a login prompt and verify
 that the user is allowed to see what they are about to see.

 Of course there has to be some kind of program on the server end to receive
 the data you send. You cannot just invent something out of the air.

## GET

 A GET-form uses the method GET, as specified in HTML like:

```html
<form method="GET" action="junk.cgi">
  <input type=text name="birthyear">
  <input type=submit name=press value="OK">
</form>
```

 In your favorite browser, this form will appear with a text box to fill in
 and a press-button labeled "OK". If you fill in '1905' and press the OK
 button, your browser will then create a new URL to get for you. The URL will
 get `junk.cgi?birthyear=1905&press=OK` appended to the directory part of the
 previous URL's path.

 If the original form was seen on the page `www.example.com/when/birth.html`,
 the second page you get will be
 `www.example.com/when/junk.cgi?birthyear=1905&press=OK`.

 Most search engines work this way.

 To make curl do the GET form post for you, just enter the expected created
 URL:

    curl "http://www.example.com/when/junk.cgi?birthyear=1905&press=OK"
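
 If you would rather compute the URL in a script than type it by hand, a
 small Python sketch along these lines mirrors what the browser does with the
 GET form above. The variable names are only illustrative.

```python
from urllib.parse import urlencode, urljoin

page_url = "http://www.example.com/when/birth.html"  # page holding the form
action = "junk.cgi"                                   # the form's action
fields = {"birthyear": "1905", "press": "OK"}         # filled-in fields

# Resolve the action relative to the page, then add the encoded fields.
target = urljoin(page_url, action) + "?" + urlencode(fields)
print(target)
```

 The resulting string can then be passed to curl as the URL, quoted so that
 the shell does not interpret the `&`.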

## POST

 The GET method makes all input field names and values get displayed in the
 URL field of your browser. That is generally a good thing when you want to be
 able to bookmark that page with your given data, but it is an obvious
 disadvantage if you entered secret information in one of the fields or if
 there are a large number of fields creating a long and unreadable URL.

 The HTTP protocol then offers the POST method. This way the client sends the
 data separated from the URL and thus you will not see any of it in the URL
 address field.

 The form would look similar to the previous one:

```html
<form method="POST" action="junk.cgi">
  <input type=text name="birthyear">
  <input type=submit name=press value=" OK ">
</form>
```

 And to use curl to post this form with the same data filled in as before, we
 could do it like:

    curl --data "birthyear=1905&press=%20OK%20" http://www.example.com/when/junk.cgi

 This kind of POST will use the Content-Type
 `application/x-www-form-urlencoded` and is the most widely used POST kind.

 The data you send to the server with `--data` MUST already be properly
 encoded; curl will not do that for you. For example, if you want the data to
 contain a space, you need to replace that space with `%20`, etc. Failing to
 comply with this will most likely cause your data to be received wrongly and
 messed up.

 Recent curl versions can in fact url-encode POST data for you, like this:

    curl --data-urlencode "name=I am Daniel" http://www.example.com

 If you repeat `--data` several times on the command line, curl will
 concatenate all the given data pieces and put a `&` symbol between each
 data segment.
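
 When a script has to prepare the encoded string for `--data` itself, the
 percent-encoding can be done with a few lines of Python. This is only a
 sketch of the encoding step, using the field values from the form above; it
 does not claim to reproduce curl's own encoder byte for byte.

```python
from urllib.parse import quote, urlencode

fields = [("birthyear", "1905"), ("press", " OK ")]

# quote_via=quote encodes a space as %20; the default (quote_plus) would
# produce "+" instead, which is also accepted in form-encoded data.
body = urlencode(fields, quote_via=quote)
print(body)

# Encoding each piece separately and joining with "&" gives the same string,
# similar in spirit to repeating --data on the command line.
pieces = ["birthyear=1905", "press=" + quote(" OK ")]
joined = "&".join(pieces)
print(joined)
```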

## File Upload POST

 Back in late 1995 they defined an additional way to post data over HTTP. It
 is documented in RFC 1867, which is why this method is sometimes referred to
 as RFC 1867-posting.

 This method is mainly designed to better support file uploads. A form that
 allows a user to upload a file could be written like this in HTML:

    <form method="POST" enctype='multipart/form-data' action="upload.cgi">
      <input name=upload type=file>
      <input type=submit name=press value="OK">
    </form>

 This clearly shows that the Content-Type about to be sent is
 `multipart/form-data`.

 To post to a form like this with curl, you enter a command line like:

    curl --form upload=@localfilename --form press=OK [URL]
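
 To give an idea of what a `multipart/form-data` body looks like on the wire,
 the following Python sketch assembles one by hand for the form above. The
 `build_multipart` helper, the file contents and the fixed
 `application/octet-stream` part type are assumptions made for illustration;
 curl picks its own boundary string and part headers.

```python
import uuid


def build_multipart(fields, files):
    """Return (content_type, body) for a multipart/form-data POST.

    fields: {name: text value}
    files:  {name: (filename, bytes content)}
    """
    boundary = uuid.uuid4().hex  # random separator unlikely to occur in data
    body = b""
    for name, value in fields.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        body += head.encode() + value.encode() + b"\r\n"
    for name, (filename, content) in files.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        body += head.encode() + content + b"\r\n"
    # The final boundary carries two extra dashes to mark the end.
    body += f"--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart(
    fields={"press": "OK"},
    files={"upload": ("localfilename", b"file contents here")},
)
print(content_type)
print(body.decode())
```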

## Hidden Fields

 A common way for HTML based applications to pass state information between
 pages is to add hidden fields to the forms. Hidden fields are already filled
 in, they are not displayed to the user and they get passed along just as all
 the other fields.

 A similar example form with one visible field, one hidden field and one
 submit button could look like:

```html
<form method="POST" action="foobar.cgi">
  <input type=text name="birthyear">
  <input type=hidden name="person" value="daniel">
  <input type=submit name="press" value="OK">
</form>
```

 To POST this with curl, you do not have to think about whether the fields
 are hidden or not. To curl they are all the same:

    curl --data "birthyear=1905&press=OK&person=daniel" [URL]

## Figure Out What A POST Looks Like

 When you are about to fill in a form and send it to a server by using curl
 instead of a browser, you are of course interested in sending a POST exactly
 the way your browser does.

 An easy way to see this is to save the HTML page with the form on
 your local disk, modify the 'method' to a GET, and press the submit button
 (you could also change the action URL if you want to).

 You will then clearly see the data get appended to the URL, separated with a
 `?` character as GET forms are supposed to do.

# HTTP upload

## PUT

 Perhaps the best way to upload data to an HTTP server is to use PUT. Then
 again, this of course requires that someone put a program or script on the
 server end that knows how to receive an HTTP PUT stream.

 Put a file to an HTTP server with curl:

    curl --upload-file uploadfile http://www.example.com/receive.cgi
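
 For comparison, the same kind of request can be described in Python's
 standard library as shown below. This sketch only constructs the request
 object and sends nothing; the payload is a stand-in for the bytes of
 `uploadfile`.

```python
from urllib.request import Request

payload = b"contents of uploadfile"  # stand-in for the file's bytes

# Build (but do not send) a PUT request carrying the payload as its body.
req = Request(
    "http://www.example.com/receive.cgi",
    data=payload,
    method="PUT",
)
print(req.get_method(), req.full_url, len(req.data))
```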

# HTTP Authentication

## Basic Authentication

 HTTP Authentication is the ability to tell the server your username and
 password so that it can verify that you are allowed to do the request you are
 doing. The Basic authentication used in HTTP (which is the type curl uses by
 default) is **plain text** based, which means it sends username and password
 only slightly obfuscated (base64 encoded), but still fully readable by anyone
 that sniffs on the network between you and the remote server.

 To tell curl to use a user and password for authentication:

    curl --user name:password http://www.example.com
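
 To see why Basic authentication offers no secrecy, this short Python sketch
 builds the `Authorization:` header value for the credentials above and then
 reverses it again. The function name is made up for the example.

```python
import base64


def basic_auth_header(user, password):
    """Return the value of an Authorization header for Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("name", "password")
print("Authorization:", header)

# Anyone who sees the header can reverse it without knowing any key.
recovered = base64.b64decode(header.split(" ", 1)[1]).decode()
print(recovered)
```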

## Other Authentication

 The site might require a different authentication method (check the headers
 returned by the server), and then
 [`--ntlm`](https://curl.se/docs/manpage.html#--ntlm),
 [`--digest`](https://curl.se/docs/manpage.html#--digest),
 [`--negotiate`](https://curl.se/docs/manpage.html#--negotiate) or even
 [`--anyauth`](https://curl.se/docs/manpage.html#--anyauth) might be
 options that suit you.

## Proxy Authentication

 Sometimes your HTTP access is only available through the use of an HTTP
 proxy. This seems to be especially common at various companies. An HTTP proxy
 may require its own user and password to allow the client to get through to
 the Internet. To specify those with curl, run something like:

    curl --proxy-user proxyuser:proxypassword curl.se

 If your proxy requires the authentication to be done using the NTLM method,
 use [`--proxy-ntlm`](https://curl.se/docs/manpage.html#--proxy-ntlm). If
 it requires Digest, use
 [`--proxy-digest`](https://curl.se/docs/manpage.html#--proxy-digest).

 If you use any one of these user+password options but leave out the password
 part, curl will prompt for the password interactively.

## Hiding credentials

 Do note that when a program is run, its parameters might be possible to see
 when listing the running processes of the system. Thus, other users may be
 able to watch your passwords if you pass them as plain command line
 options. There are ways to circumvent this, such as keeping the credentials
 in a config file read with `--config` (`-K`) or in a `.netrc` file used with
 `--netrc`.

 It is worth noting that while this is how HTTP Authentication works, many
 websites will not use this concept when they provide logins etc. See the Web
 Login chapter further below for more details on that.

# More HTTP Headers

## Referer

 An HTTP request may include a 'referer' field (yes it is misspelled), which
 can be used to tell from which URL the client got to this particular
 resource. Some programs/scripts check the referer field of requests to verify
 that this was not arriving from an external site or an unknown page. While
 this is a stupid way to check something so easily forged, many scripts still
 do it. Using curl, you can put anything you want in the referer-field and
 thus more easily be able to fool the server into serving your request.

 Use curl to set the referer field with:

    curl --referer http://www.example.com http://www.example.com

## User Agent

 Similar to the referer field, all HTTP requests may set the User-Agent
 field. It names what user agent (client) is being used. Many
 applications use this information to decide how to display pages. Silly web
 programmers try to make different pages for users of different browsers to
 make them look the best possible for their particular browsers. They usually
 also do different kinds of JavaScript etc.

 At times, you will see that getting a page with curl will not return the same
 page that you see when getting the page with your browser. Then you know it
 is time to set the User Agent field to fool the server into thinking you are
 one of those browsers.

 To make curl look like Internet Explorer 5 on a Windows 2000 box:

    curl --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]

 Or why not look like you are using Netscape 4.73 on an old Linux box:

    curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
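
 Both fields are ordinary request headers that the client is free to set. As
 a hedged illustration, this Python sketch builds (without sending) a request
 carrying the two headers; the referer URL is a hypothetical page.

```python
from urllib.request import Request

req = Request(
    "http://www.example.com/",
    headers={
        "Referer": "http://www.example.com/start.html",  # hypothetical page
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
    },
)

# urllib stores header names capitalized, as "Referer" and "User-agent".
for name, value in req.header_items():
    print(f"{name}: {value}")
```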

# Redirects

## Location header

 When a resource is requested from a server, the reply from the server may
 include a hint about where the browser should go next to find this page, or a
 new page keeping newly generated output. The header that tells the browser to
 redirect is `Location:`.

 Curl does not follow `Location:` headers by default, but will simply display
 such pages in the same manner it displays all HTTP replies. It does however
 feature an option that will make it attempt to follow the `Location:`
 pointers.

 To tell curl to follow a Location:

    curl --location http://www.example.com

 If you use curl to POST to a site that immediately redirects you to another
 page, you can safely use
 [`--location`](https://curl.se/docs/manpage.html#-L) (`-L`) and
 `--data`/`--form` together. Curl will only use POST in the first request, and
 then revert to GET in the following operations.
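
 The decision a client makes when it follows a redirect can be sketched in a
 few lines of Python. The `next_request` helper is hypothetical and
 simplified: it treats 301, 302 and 303 alike and keeps the method for 307
 and 308, which is a common convention rather than a description of curl's
 exact internals.

```python
from urllib.parse import urljoin


def next_request(method, url, status, location):
    """Return (method, url) for the follow-up request, or None."""
    if status not in (301, 302, 303, 307, 308) or not location:
        return None
    # Location may be relative; resolve it against the URL just requested.
    new_url = urljoin(url, location)
    # 307 and 308 ask the client to keep the method; for the other codes a
    # POST is commonly turned into a GET.
    if status in (307, 308):
        return method, new_url
    return ("GET" if method == "POST" else method), new_url


step = next_request("POST", "http://www.example.com/post.cgi", 302, "/results.html")
print(step)
kept = next_request("POST", "http://www.example.com/post.cgi", 307, "other.cgi")
print(kept)
```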

## Other redirects

 Browsers typically support at least two other ways of redirecting that curl
 does not: the HTML may contain a meta refresh tag that asks the browser
 to load a specific URL after a set number of seconds, or the page may use
 JavaScript to do it.
49313498266Sopenharmony_ci
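 Since curl does not act on a meta refresh tag, a script has to find the
 target itself. This is a rough sketch that assumes the tag is written on one
 line in the common `content="N; url=TARGET"` form; the helper name
 `meta_refresh_url` and the sample HTML are invented for the example, and real
 pages may need a proper HTML parser:

```shell
#!/bin/sh
# Sketch: pull the target URL out of a meta refresh tag in saved HTML.

meta_refresh_url() {
  # Reads HTML on stdin and prints the first refresh target found, if any.
  sed -n 's/.*http-equiv="refresh"[^>]*content="[0-9]*; *[Uu][Rr][Ll]=\([^"]*\)".*/\1/p' |
    head -n 1
}

html='<html><head><meta http-equiv="refresh" content="5; url=http://www.example.com/next"></head></html>'

next_url=$(printf '%s\n' "$html" | meta_refresh_url)
echo "$next_url"
# A script could then request the page with: curl "$next_url"
```
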
# Cookies

## Cookie Basics

 The way the web browsers do "client side state control" is by using
 cookies. Cookies are just names with associated contents. The cookies are
 sent to the client by the server. The server tells the client for what path
 and hostname it wants the cookie sent back, and it also sends an expiration
 date and a few more properties.

 When a client communicates with a server with a name and path as previously
 specified in a received cookie, the client sends back the cookies and their
 contents to the server, unless of course they are expired.

 Many applications and servers use this method to connect a series of requests
 into a single logical session. To be able to use curl on such occasions, we
 must be able to record and send back cookies the way the web application
 expects them, the same way browsers deal with them.

## Cookie options

 The simplest way to send a few cookies to the server when getting a page with
 curl is to add them on the command line like:

    curl --cookie "name=Daniel" http://www.example.com

 Cookies are sent as common HTTP headers. This is practical as it allows curl
 to record cookies simply by recording headers. Record cookies with curl by
 using the [`--dump-header`](https://curl.se/docs/manpage.html#-D) (`-D`)
 option like:

    curl --dump-header headers_and_cookies http://www.example.com

 (Take note that the
 [`--cookie-jar`](https://curl.se/docs/manpage.html#-c) option described
 below is a better way to store cookies.)

 Curl has a full-blown cookie parsing engine built in that comes in handy if
 you want to reconnect to a server and use cookies that were stored from a
 previous connection (or handcrafted to fool the server into believing you
 had a previous connection). To use previously stored cookies, you run curl
 like:

    curl --cookie stored_cookies_in_file http://www.example.com

 Curl's "cookie engine" gets enabled when you use the
 [`--cookie`](https://curl.se/docs/manpage.html#-b) option. If you only
 want curl to understand received cookies, use `--cookie` with a file that
 does not exist. For example, if you want to let curl understand cookies from
 a page and follow a location (and thus possibly send back cookies it
 received), you can invoke it like:

    curl --cookie nada --location http://www.example.com

 Curl has the ability to read and write cookie files that use the same file
 format that Netscape and Mozilla once used. It is a convenient way to share
 cookies between scripts or invocations. The `--cookie` (`-b`) switch
 automatically detects if a given file is such a cookie file and parses it,
 and by using the `--cookie-jar` (`-c`) option you will make curl write a new
 cookie file at the end of an operation:

    curl --cookie cookies.txt --cookie-jar newcookies.txt \
      http://www.example.com

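 To give an idea of what such a cookie file looks like, here is a small
 sketch that writes one by hand and lists its contents. Each cookie line holds
 seven tab-separated fields: domain, include-subdomains flag, path, secure
 flag, expiry time (0 for a session cookie), name and value. The cookies and
 the `list_cookies` helper are invented for illustration:

```shell
#!/bin/sh
# Sketch: a hand-written cookie file in the Netscape format and a way to list it.

jar=$(mktemp)
{
  printf '# Netscape HTTP Cookie File\n'
  printf 'www.example.com\tFALSE\t/\tFALSE\t0\tname\tDaniel\n'
  printf 'www.example.com\tFALSE\t/shop\tTRUE\t0\tsession\tabc123\n'
} > "$jar"

list_cookies() {
  # Skip comment lines and print name=value for every seven-field line.
  # Note: lines curl marks with a "#HttpOnly_" prefix are skipped here too.
  awk -F '\t' '!/^#/ && NF == 7 { print $6 "=" $7 }' "$1"
}

cookies=$(list_cookies "$jar")
echo "$cookies"
# The same file could be handed to curl: curl --cookie "$jar" http://www.example.com
rm -f "$jar"
```
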
# HTTPS

## HTTPS is HTTP secure

 There are a few ways to do secure HTTP transfers. By far the most common
 protocol for doing this is what is generally known as HTTPS, HTTP over
 SSL. SSL encrypts all the data that is sent and received over the network and
 thus makes it harder for attackers to spy on sensitive information.

 SSL (or TLS as the current version of the standard is called) offers a set of
 advanced features to do secure transfers over HTTP.

 Curl supports encrypted fetches when built to use a TLS library and it can be
 built to use one out of a fairly large set of libraries - `curl -V` will show
 which one your curl was built to use (if any!). To get a page from an HTTPS
 server, simply run curl like:

    curl https://secure.example.com

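 A script that depends on HTTPS may want to check up front that the installed
 curl supports it. A minimal sketch: the `Protocols:` line in `curl -V` output
 lists the supported schemes, so a helper (here called `supports_https`, a
 name made up for this example) can look for `https` there. The sample line
 below only stands in for real output:

```shell
#!/bin/sh
# Sketch: check whether `curl -V` output lists https among the protocols.

supports_https() {
  # Reads `curl -V` output on stdin; succeeds if the Protocols line has https.
  grep -i '^Protocols:' | grep -qw 'https'
}

# Stand-in for the relevant line of real `curl -V` output.
sample='Protocols: dict file ftp ftps http https'

if printf '%s\n' "$sample" | supports_https; then
  https_ok=yes
else
  https_ok=no
fi
echo "https supported: $https_ok"
# Against an installed curl: curl -V | supports_https
```
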
## Certificates

 In the HTTPS world, you use certificates to validate that you are the one
 you claim to be, as an addition to normal passwords. Curl supports
 client-side certificates. The private key of a certificate is often locked
 with a pass phrase, which you need to enter before the certificate can be
 used by curl. The pass phrase can be specified on the command line or, if
 not, entered interactively when curl queries for it. Use a certificate with
 curl on an HTTPS server like:

    curl --cert mycert.pem https://secure.example.com

 Curl also tries to verify that the server is who it claims to be, by
 verifying the server's certificate against a locally stored CA cert
 bundle. Failing the verification will cause curl to deny the connection. You
 must then use [`--insecure`](https://curl.se/docs/manpage.html#-k)
 (`-k`) in case you want to tell curl to ignore that the server cannot be
 verified.

 More about server certificate verification and CA cert bundles can be read in
 the [`SSLCERTS` document](https://curl.se/docs/sslcerts.html).

 At times you may end up with your own CA cert store and then you can tell
 curl to use that to verify the server's certificate:

    curl --cacert ca-bundle.pem https://example.com/

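 The client certificate and CA options can be combined in one request. The
 sketch below assumes the private key lives in a separate file (passed with
 `--key`); all file names are hypothetical. The `CURL` variable is not a curl
 feature, just a convention of this example that allows a dry run which prints
 the command line instead of contacting a server:

```shell
#!/bin/sh
# Sketch only: file names are hypothetical. Set CURL=echo for a dry run.
CURL=${CURL:-curl}

fetch_with_client_cert() {
  # $1 = client certificate (PEM), $2 = private key (PEM),
  # $3 = CA bundle used to verify the server, $4 = URL
  "$CURL" --cert "$1" --key "$2" --cacert "$3" "$4"
}

# Dry run: show the arguments that would be passed to curl.
cmdline=$(CURL=echo; fetch_with_client_cert mycert.pem mykey.pem ca-bundle.pem https://secure.example.com)
echo "$cmdline"
```
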
# Custom Request Elements

## Modify method and headers

 Doing fancy stuff, you may need to add or change elements of a single curl
 request.

 For example, you can change the POST method to `PROPFIND` and send the data
 as `Content-Type: text/xml` (instead of the default `Content-Type`) like
 this:

    curl --data "<xml>" --header "Content-Type: text/xml" \
      --request PROPFIND example.com

 You can delete a default header by providing one without content. For
 example, you can ruin the request by chopping off the `Host:` header:

    curl --header "Host:" http://www.example.com

 You can add headers the same way. Your server may want a `Destination:`
 header, and you can add it:

    curl --header "Destination: http://nowhere" http://example.com

## More on changed methods

 It should be noted that curl selects which methods to use on its own
 depending on what action to ask for. `-d` will do POST, `-I` will do HEAD and
 so on. If you use the [`--request`](https://curl.se/docs/manpage.html#-X) /
 `-X` option you can change the method keyword curl selects, but you will not
 modify curl's behavior. This means that if you for example use `-d "data"` to
 do a POST, you can modify the method to a `PROPFIND` with `-X` and curl will
 still think it sends a POST. You can change the normal GET to a POST method
 by simply adding `-X POST` in a command line like:

    curl -X POST http://example.org/

 Curl will however still act as if it sent a GET, so it will not send any
 request body etc.

# Web Login

## Some login tricks

 While not strictly just HTTP related, it still causes a lot of people
 problems so here's the executive run-down of how the vast majority of all
 login forms work and how to log in to them using curl.

 It can also be noted that to do this properly in an automated fashion, you
 will most certainly need to script things and do multiple curl invocations.

 First, servers mostly use cookies to track the logged-in status of the
 client, so you will need to capture the cookies you receive in the
 responses. Then, many sites also set a special cookie on the login page (to
 make sure you got there through their login page) so you should make a habit
 of first getting the login-form page to capture the cookies set there.

 Some web-based login systems feature various amounts of JavaScript, and
 sometimes they use such code to set or modify cookie contents. Possibly they
 do that to prevent programmed logins, like the ones this manual describes.
 Anyway, if reading the code is not enough to let you repeat the behavior
 manually, capturing the HTTP requests done by your browser and analyzing the
 sent cookies is usually a working method to figure out how to get by without
 the JavaScript.

 In the actual `<form>` tag for the login, lots of sites fill in
 random/session or otherwise secretly generated hidden fields and you may need
 to first capture the HTML code for the login form and extract all the hidden
 fields to be able to do a proper login POST. Remember that the contents need
 to be URL encoded when sent in a normal POST.

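 The steps above could be scripted roughly as follows. This is only a sketch:
 the `/login` path, the field names (`user`, `pass`, `token`), the file names
 and both helper functions are hypothetical, and the hidden-field extraction
 assumes the attributes appear in the order `type`, `name`, `value` on a single
 line. `--data-urlencode` takes care of the URL encoding. As a convention of
 this example only, setting `CURL=echo` turns the requests into a dry run:

```shell
#!/bin/sh
# Sketch of a scripted login: fetch the form, extract a hidden field, POST.
CURL=${CURL:-curl}

hidden_value() {
  # $1 = field name; reads HTML on stdin and prints that hidden input's value.
  sed -n 's/.*<input type="hidden" name="'"$1"'" value="\([^"]*\)".*/\1/p' |
    head -n 1
}

site_login() {
  # $1 = base URL, $2 = user name, $3 = password
  # 1. Get the login form and save the cookies it sets.
  "$CURL" --silent --cookie-jar cookies.txt --output login.html "$1/login" || return 1
  # 2. Pull out the hidden token the form expects back.
  form_token=$(hidden_value token < login.html)
  # 3. POST the fields URL-encoded, sending and updating the cookies.
  "$CURL" --silent --cookie cookies.txt --cookie-jar cookies.txt \
    --data-urlencode "user=$2" --data-urlencode "pass=$3" \
    --data-urlencode "token=$form_token" "$1/login"
}

# Offline demonstration of the extraction step on a made-up form.
form='<form action="/login" method="post"><input type="hidden" name="token" value="a1 b2&c3"></form>'
token=$(printf '%s\n' "$form" | hidden_value token)
echo "$token"
```
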
# Debug

## Some debug tricks

 Many times when you run curl on a site, you will notice that the site does not
 seem to respond the same way to your curl requests as it does to your
 browser's.

 Then you need to start making your curl requests more similar to your
 browser's requests:

 - Use the `--trace-ascii` option to store fully detailed logs of the requests
   for easier analyzing and better understanding

 - Make sure you check for and use cookies when needed (both reading with
   `--cookie` and writing with `--cookie-jar`)

 - Set user-agent (with [`-A`](https://curl.se/docs/manpage.html#-A)) to
   one that a recent popular browser uses

 - Set referer (with [`-e`](https://curl.se/docs/manpage.html#-e)) like
   it is set by the browser

 - If you use POST, make sure you send all the fields and in the same order as
   the browser does it.

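 Put together, the checklist above might translate into a command line like
 the one in this sketch. The user-agent string, referer and file names are
 placeholders to be replaced with values copied from a real browser session,
 and the `CURL=echo` dry run is just a convention of this example:

```shell
#!/bin/sh
# Sketch: one request dressed up to resemble a browser's. Set CURL=echo for a dry run.
CURL=${CURL:-curl}

browser_like_get() {
  # $1 = URL to fetch
  "$CURL" --trace-ascii trace.txt \
    --cookie cookies.txt --cookie-jar cookies.txt \
    --user-agent "Mozilla/5.0 (placeholder)" \
    --referer "http://www.example.com/" \
    "$1"
}

# Dry run that prints the arguments instead of making a request.
args=$(CURL=echo; browser_like_get http://www.example.com/page)
echo "$args"
```
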
## Check what the browsers do

 A good helper to make sure you do this right is the web browsers' developer
 tools, which let you view all headers you send and receive (even when using
 HTTPS).

 A more raw approach is to capture the HTTP traffic on the network with tools
 such as Wireshark or tcpdump and check which headers were sent and
 received by the browser. (HTTPS forces you to use `SSLKEYLOGFILE` to do
 that.)