Categorize data using RegEx
A fast and easy way to categorize data in Python.
By Julien on November 15, 2017
Now that you know how to set Python up for SEO purposes, it is time to make real use of it!
Let’s start with something really useful: this little script to categorize data.
TL;DR: how to use it?
Follow the instructions on the GitLab repository to install and use the script.
Under the hood
The magic happens in the categorize.py file, at line 34.
First, we use the wonderful argparse library to get arguments from the command line. The required arguments are the names of the files containing the items to categorize and the set of rules to use.
argparse will do a few checks on the arguments, such as verifying that the required arguments were passed and that the correct types of data were used.
But we still have some tests to run: do those files really exist? That’s the purpose of lines 50 and 56.
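To make this step concrete, here is a minimal sketch of argument parsing followed by a file check. It is not the code from categorize.py: the argument names items_file and rules_file and the helpers parse_args, missing_files and main are made up for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse two required file names (argument names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Categorize items using RegEx rules.")
    parser.add_argument("items_file", help="file containing the items to categorize")
    parser.add_argument("rules_file", help="file containing the set of rules to use")
    return parser.parse_args(argv)


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


def main(argv=None):
    """Return 0 if both files exist, 1 otherwise."""
    args = parse_args(argv)
    missing = missing_files([args.items_file, args.rules_file])
    if missing:
        print("File not found: " + ", ".join(missing))
        return 1
    return 0
```

argparse exits on its own when a required argument is missing, so the only check left to write by hand is whether the files exist.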
If everything is alright, we read the files, extract their contents, then execute the write_csv function.
This function simply tries to find the first rule matching each item we send it, and writes the results to a CSV file.
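The "first matching rule wins" logic could look roughly like the sketch below. The signature of write_csv, the (category, pattern) rule format, the first_match helper and the "uncategorized" fallback are all assumptions made for this example. The real function in the repository may differ.

```python
import csv
import re


def first_match(item, rules, default="uncategorized"):
    """Return the category of the first rule whose pattern matches the item.

    `rules` is assumed to be an ordered list of (category, pattern) pairs.
    """
    for category, pattern in rules:
        if re.search(pattern, item):
            return category
    return default  # hypothetical fallback when no rule matches


def write_csv(items, rules, output_path):
    """Write one (item, category) row per item to a CSV file."""
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["item", "category"])
        for item in items:
            writer.writerow([item, first_match(item, rules)])
```

Because the loop stops at the first match, the order of the rules matters: specific patterns should come before generic ones.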
About Regular Expressions
This script uses Regular Expressions to define patterns for our categories.
I won’t teach you how to use them, but if you want to learn more there are plenty of courses and tools available, like this one.
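To give a feel for what category patterns can look like, here is a small sketch using Python's re module. The category names and patterns are invented for illustration and are not taken from the script's rule files.

```python
import re

# Hypothetical category patterns, for illustration only.
patterns = {
    "blog": re.compile(r"^/blog/"),       # path starts with /blog/
    "product": re.compile(r"/p/\d+$"),    # path ends with /p/<digits>
    "shoes": re.compile(r"\b(shoes?|sneakers?)\b", re.IGNORECASE),  # keyword match
}


def matching_categories(item):
    """Return every category whose pattern matches the item, in rule order."""
    return [name for name, pattern in patterns.items() if pattern.search(item)]
```

An item such as "/blog/best-shoes" matches both the "blog" and "shoes" patterns, which is why a script that keeps only the first match depends on the order of its rules.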
This is a very basic script, but it definitely works with any kind of string (URLs, keywords, …).
Feel free to try it, modify it and contribute upgrades!