I’ve been tinkering with code for a long time now, notably to automate tasks I do a lot as an SEO consultant.
In this post, I’ll quickly explain why and how I use Python and the other tools in my setup.
Why choose Python?
Python is one of the most popular programming languages these days, and the most used according to IEEE.
I can’t tell you precisely why it is better or worse than any other language. But after a loooong time using PHP, I can tell you a few reasons why I’ve chosen to code in Python.
Python is easy to learn
Once you’ve understood that Python 2 is a thing of the past and that you only need to worry about 3.x, Python is quite easy to learn. Its explicit, English-like syntax makes code readable and understandable, there are plenty of places online and offline to learn how to use it, and since a ton of people use it, nearly all of your problems have already been solved.
It has a ton of data science libraries
Pandas, NumPy, scikit-learn, … Like R, Python is very popular amongst data scientists, which is good: you’ll have plenty of libraries, how-tos and help to cover your needs and solve your problems.
And a lot of other very helpful stuff
Package manager, libraries, virtual environments, notebooks, APIs, … The Python ecosystem is vast and rich. We’ll see some examples of how some tools will make your experience much more enjoyable.
At least one version of Python is very likely already installed on your computer. You can tell by opening a command line and typing:
However, if any of these commands gives you an error, feel free to check this page and download the latest release ;)
As said earlier, I mostly use Python 3: only a few older scripts aren’t compatible with it yet, while most modern code isn’t backward-compatible with Python 2. And with what’s coming next, you’ll never worry about versions anymore!
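On the command line, the usual check is `python --version` or `python3 --version`. The same information is available from inside Python; the sketch below is only an illustration, and the helper name `describe_interpreter` is made up for this example.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Return a short summary of a Python version tuple."""
    major, minor = version_info[0], version_info[1]
    label = f"Python {major}.{minor}"
    if major < 3:
        return f"{label} (legacy Python 2, consider upgrading)"
    return f"{label} (Python 3, good to go)"


# Summarize the interpreter that is running this script.
print(describe_interpreter())
```

Passing a tuple such as `(2, 7, 18)` lets you see what the helper would say about an older interpreter without having one installed.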
You really should use virtual environments: they enable you to have specific versions of Python core and libraries for each project you work with.
This means you do not need to worry anymore about having the right Python version, or having to uninstall and reinstall an obscure library because v3.4.117 isn’t compatible with what you’re doing…
Instead, you’ll be able to set up an environment with the right version of every requirement for your project, without worrying about your other projects because they’ll use other virtual environments. In other words, it’s like having a blank computer for each repository.
Afterwards, it’s just a matter of getting used to creating a virtual env for each new project:
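To make the idea concrete, here is a minimal sketch that uses the standard library’s `venv` module to create an environment in a temporary folder, and then checks whether the running interpreter itself lives inside one. It is not necessarily the tool or command used in this post; the helper names and the `demo-env` folder are assumptions for the example.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment in env_dir and return its config file path."""
    # with_pip=False keeps the sketch fast and offline; pass True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"


def in_virtual_env():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo-env")
    created = cfg.exists()  # every venv gets a pyvenv.cfg at its root
    print("environment created:", created)
    print("currently inside a virtual env:", in_virtual_env())
```

In day-to-day use you would create the environment once per project from the command line and activate it; the `sys.prefix` comparison is simply a way to confirm from a script that the activation worked.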
If everything works, the last line should get you this:
And your local environment should now show up at the beginning of your prompt, just like this:
You can now use pip (Python’s package manager) to install any requirement you need. Most existing projects will ask you to run this command:
The requirements.txt file lists all dependencies of a project. pip will read each line and try to find the required library, then download and install it in your virtual environment.
This means you can have different versions of the same library in different environments if needed, as well as a global version.
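To give a feel for what a requirements file contains, the sketch below parses a few lines into package names and pinned versions. It is a deliberately simplified illustration, not pip’s actual parser: it only recognises exact `name==version` pins, and the sample content is hypothetical.

```python
import re

# A package name, optionally followed by an exact "==version" pin.
REQUIREMENT = re.compile(r"([A-Za-z0-9._-]+)\s*(?:==\s*(\S+))?")


def parse_requirements(text):
    """Map package names to pinned versions (None when there is no exact pin)."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):      # skip blanks and pip options
            continue
        match = REQUIREMENT.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


sample = """
# hypothetical requirements.txt content
pandas==1.5.3
requests
numpy==1.24.2  # pinned for reproducibility
"""
print(parse_requirements(sample))
```

Lines without an exact pin, such as `requests` here, map to `None`, which is a reminder that pip will pick whatever version it finds suitable at install time.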
Another helpful command is to install a specific library. For instance, let’s install
And if you need to update your requirements.txt file with the exact version of each library, simply run:
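On the command line this is typically done with `pip freeze > requirements.txt`. As a rough illustration of what that produces, the sketch below builds similar `name==version` lines with the standard library’s `importlib.metadata` (Python 3.8+). It approximates the output and is not a replacement for pip; the function name `freeze` is just chosen for the example.

```python
from importlib import metadata


def freeze():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


frozen = freeze()
print("\n".join(frozen))
```

Run inside an activated virtual environment, this lists only what that environment contains, which is exactly why pinned requirements and virtual environments work so well together.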
Jupyter Notebooks are documents containing text, live code and visualizations.
They are very helpful to learn new methods and to share documented code with colleagues:
- rich text with Markdown and even mathematical equations
- step by step blocks of code and output display
- plots and advanced visualizations
And you can use them with languages other than Python if you want!
I mostly use them for two things: early-stage prototypes for new scripts, and exploring data.
Once I’m happy with a script, I export it to a .py file, wrap it up a bit, and use it for real!
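Jupyter can do this export for you (from the notebook menu, or with `jupyter nbconvert --to script`). Since a notebook file is just JSON, the idea can also be sketched in a few lines. The example below assumes the usual notebook layout (a `cells` list whose entries have a `cell_type` and a `source`); the tiny inlined notebook is hypothetical.

```python
import json


def notebook_to_script(notebook):
    """Join the code cells of a parsed notebook into one Python source string."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Markdown and raw cells are left out
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook, inlined as JSON so the sketch is self-contained.
demo = json.loads("""
{"cells": [
  {"cell_type": "markdown", "source": ["# Title"]},
  {"cell_type": "code", "source": ["import math\\n", "print(math.pi)"]}
], "nbformat": 4, "nbformat_minor": 5, "metadata": {}}
""")
print(notebook_to_script(demo))
```

For a real notebook, you would load the .ipynb file with `json.load` and write the returned string to a .py file.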
This doesn’t have much to do with Python itself, but I urge you to use Git (or any other version control software).
Besides helping you keep track of what you do and work with a team, it gives you access to GitHub and GitLab, which are full of open source projects you can clone and fork.
I recommend using GitLab as its free plan allows unlimited private repositories (amongst other advantages).
My workflow is inspired by this article:
- a master branch with the main stable version of my project,
- a new branch for any bug or new feature I work on, which is merged into master and then deleted when complete.
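As a memo, one cycle of this workflow boils down to four Git commands. The sketch below only builds and prints them; the branch name is hypothetical, and the helper `feature_branch_commands` is made up for this example.

```python
def feature_branch_commands(branch, main="master"):
    """Return the Git commands for one feature-branch cycle, in order."""
    return [
        ["git", "checkout", "-b", branch],  # start the branch from the current commit
        # ... edit, `git add` and `git commit` as usual ...
        ["git", "checkout", main],          # go back to the stable branch
        ["git", "merge", branch],           # bring the finished work in
        ["git", "branch", "-d", branch],    # delete the merged branch
    ]


for command in feature_branch_commands("fix-sitemap-parser"):
    print(" ".join(command))
```

Each entry is a list of arguments, so it could be passed to `subprocess.run(command, check=True)` if you ever wanted to script the cycle instead of typing it.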
I recommend using a .gitignore file to list any files you don’t want Git to track (passwords and credentials, data, system-specific files, …). You’ll find the syntax here: https://git-scm.com/docs/gitignore.
Getting used to committing your code once in a while is just a matter of time, like saving your work in Excel or any other software.
Which IDE do I use?
This is a question I get asked a lot. And the short answer is:
I’ve tried some in the past, for Python and other languages, but IDEs don’t suit me. I feel restricted when I use them.
I really prefer using a good text editor (Sublime Text has been my choice for a few years now), and a command line.
But this is really a matter of finding what suits you best. Some IDEs are very good, and can really help you learn a language and its best practices.
If you’re looking for a Python-specific IDE, I’ve been told PyCharm is a good choice.
Where to begin?
There are many ways to start learning Python, or any programming language.
From books to online tutorials, it really is up to you.
A popular option is to follow MOOCs, and I recommend trying DataCamp or Coursera, as both also offer data science courses.
My own method is to work on a small real project and get a little bit further every time. Stack Overflow is very helpful.
With all this you should be good to go!
In future posts, we’ll see some practical examples of Python scripts for SEO analysis. Stay tuned!