cURL for SEO
Long live the command line!
By Julien on November 14, 2019
I’ve been quite fond of the command line since my debut on Linux, some 15 years ago.
It makes a lot of sense for SEOs to learn how to use it, for many different types of tasks.
And it’s often quicker than using any other tool.
Today, let’s talk about cURL.
This command line tool aims at transferring data using URLs. It’s more than 20 years old, fast and robust.
As an SEO, I use it daily for quick checks. Here are some.
Fetching a file
Simply download a file. For instance, an HTML page:
$ curl https://www.example.com/my-page.html
It will print the source code of my-page.html in your terminal.
To save the file on your disk, simply redirect it:
$ curl https://www.example.com/my-page.html > my-file.html
You could also use a pipe (|) to chain it with another command. For example, let’s see if our page contains a <title> tag:
$ curl https://www.example.com/my-page.html | grep "<title>"
This will print any line containing a <title> tag, or nothing if there isn’t one.
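As a rough extension of the same pipeline — assuming the <title> tag and its content sit on a single line, which isn’t guaranteed on every page — you can strip the tags and keep just the title text:

```shell
# Fetch the page silently (-s hides the progress meter), keep only the
# <title>…</title> match, then strip the opening and closing tags
curl -s https://www.example.com/my-page.html \
  | grep -o '<title>[^<]*</title>' \
  | sed -e 's/<title>//' -e 's|</title>||'
```

Handy for spot checks, though for anything serious a real HTML parser beats grep.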
Printing the headers only
I often just want to check the HTTP headers for a URL. It’s very simple with cURL and the -I argument:
$ curl -I https://www.example.com/
This will print the headers in the terminal, allowing you to check for X-Robots-Tag and other funky stuff.
To get both the headers and the body, use a lowercase -i instead:
$ curl -i https://www.example.com/
Customising User-agent
You might want to use a specific User-agent to test some pages. And that’s quite easy with cURL too. Let’s use the standard Googlebot-Desktop UA:
$ curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://www.example.com/
And that’s it!
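To double-check what the server actually receives, httpbin.org has an endpoint that simply echoes the User-agent back:

```shell
# httpbin.org/user-agent replies with a small JSON body
# containing the User-agent header we sent
curl -s -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://httpbin.org/user-agent
```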
Following redirects
Let’s try some more complex arguments. Imagine you want to follow all the steps of a redirect chain.
Here’s how to do it with cURL:
$ curl -sLD - -o /dev/null -w "%{url_effective}" https://httpbin.org/redirect/3
Try this in your terminal and you will get 3 successive 302 redirects, then the final https://httpbin.org/get URL.
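If you only care about how many hops there were and where you landed, rather than every intermediate header, the -w variables can be combined — a sketch using curl’s %{num_redirects} and %{url_effective} variables:

```shell
# -L follows redirects, -o /dev/null discards the bodies,
# -w prints a one-line summary once the chain is resolved
curl -sL -o /dev/null \
  -w "%{num_redirects} redirects, landed on %{url_effective}\n" \
  https://httpbin.org/redirect/3
```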
Creating aliases
As we just saw, some cURL arguments can be quite long and complex.
But we can create aliases to replace these complex commands with shortcuts.
Here are my most used cURL aliases:
alias curlm='curl -A "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.103 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
alias curld='curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
alias curlr='curl -sLD - -o /dev/null -w "%{url_effective}"'
Simply paste these lines into your .bash_profile file (usually at the root of your home directory) and restart your terminal, or run source ~/.bash_profile.
You will then be able to just use curlm https://www.example.com/ to fetch a URL using Googlebot’s mobile User-agent, and so on.
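One caveat: aliases only expand in interactive shells. Inside a script, a shell function does the same job — here a function version of the curlr alias above:

```shell
# Same behaviour as the curlr alias, but usable in scripts too:
# follow redirects silently and print the final URL
curlr() {
  curl -sLD - -o /dev/null -w "%{url_effective}" "$@"
}

curlr https://httpbin.org/redirect/3
```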
cURL has a lot of other options. Check out the man page for more.
Cheers!