Crawling is one of the most common tasks in technical SEO. However, analyzing a crawl can take quite a long time, especially when working on high-volume websites.
Good tools help you work faster and make more advanced analysis possible. Speaking of advanced analysis, I believe Python is one of the best tools I have at my disposal.
In this GitLab project, you’ll find a Jupyter notebook that walks through some basic data analysis using Pandas, a very useful Python library that provides data structures and analysis tools.
Using Pandas, it’s quite easy to filter data, generate charts or include additional data, which are common tasks for SEOs.
For example, in this notebook, with a few lines of code, we’ll:
- count URLs per response code,
- generate and save charts,
- categorize URLs,
- export data to CSV.
Moreover, you’ll be able to automate these steps, which saves a lot of time.
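To give a feel for what those few lines can look like, here is a minimal sketch rather than the notebook's actual code. It assumes a crawl export with `Address` and `Status Code` columns; the inline sample data, the `categorize` helper and its path rules are all hypothetical, made up for illustration. Chart generation is left out to keep the example dependent on Pandas only.

```python
import io

import pandas as pd

# Hypothetical stand-in for a crawl export (column names are assumptions).
SAMPLE_CSV = """Address,Status Code
https://example.com/,200
https://example.com/blog/post-1,200
https://example.com/blog/post-2,301
https://example.com/products/shoes,200
https://example.com/products/old,404
"""


def categorize(url: str) -> str:
    """Assign a URL to a site section using illustrative path rules."""
    if "/blog/" in url:
        return "blog"
    if "/products/" in url:
        return "product"
    return "other"


# Load the crawl; with a real export you would pass a file path instead.
crawl = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Count URLs per response code, ordered by status code.
status_counts = crawl["Status Code"].value_counts().sort_index()

# Categorize URLs by adding a new column.
crawl["Category"] = crawl["Address"].apply(categorize)

# Export to CSV; without a path, to_csv returns the CSV as a string.
csv_text = crawl.to_csv(index=False)

print(status_counts)
print(crawl.groupby("Category").size())
```

Passing a file name to `to_csv` writes the result to disk instead of returning a string. Because every step is plain code, the whole sequence can be rerun on a fresh crawl without any manual work.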
How to use this notebook
Download or clone the repository, then open a Terminal (Mac/Linux) or Command Prompt (Windows) in the
You’ll need to install the requirements first:
pip install -r requirements.txt
Once everything is OK, simply run:
This should open a new tab in your browser. You can also access your notebooks at http://localhost:8888.
You can run the code with your own data by replacing the Screaming Frog exports in the
About Jupyter notebooks
Notebooks contain both live code and text elements. They are very useful for testing, exploring data, or explaining code.
Learn how to use them with this notebooks basics guide.
What’s next?
This example notebook mostly shows basic use of the Pandas library. More advanced users might be disappointed, but I hope it suits beginners.
However, I urge you to try these techniques yourself, as this is, in my opinion, one of the best ways to progress to more advanced Python usage.
There are loads of Python libraries you can use to go further, whether you’re into NLP, graph analysis, machine learning, or something else.