How To Clean Data with Python Pandas — Vehicles registered in Poland
5 min read · Mar 19, 2023
Thanks to the Open Data project, data sets published by Polish public entities are freely available. In this article, we will prepare and clean the "Vehicles registered in Poland by province" data set using Python and Pandas.
Data
The data is available in csv and xlsx (Excel) formats. A first look reveals the following problems:
- Inconsistent names of voivodeships — upper and lower case letters
- Inconsistent number of columns — TERYT column not present everywhere
- Inconsistent column names regarding months — mixed data forms + typos
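To make these three problems concrete, here is a minimal sketch on two toy frames. The column names, the "januray" typo, and all values are invented for illustration and are not taken from the real files, and this is not necessarily the approach used later in the article. It lower-cases headers and voivodeship names, maps one hypothetical misspelled month to a canonical name, and lets `pd.concat` fill the missing TERYT column with NaN.

```python
import pandas as pd

# Toy frames standing in for two source files (all names and values are made up)
df_a = pd.DataFrame({
    "Voivodeship": ["MAZOWIECKIE", "Lubelskie"],
    "TERYT": ["14", "06"],
    "JANUARY": [10, 20],
})
df_b = pd.DataFrame({
    "voivodeship": ["mazowieckie", "LUBELSKIE"],
    "Januray": [11, 21],  # hypothetical typo in a month column; no TERYT column
})

# Hypothetical mapping from misspelled headers to canonical ones
HEADER_FIXES = {"januray": "january"}


def normalize(df):
    """Return a copy with consistent headers and voivodeship spelling."""
    out = df.copy()
    # Same case and no stray whitespace in column names
    out.columns = out.columns.str.strip().str.lower()
    # Repair known header typos
    out = out.rename(columns=HEADER_FIXES)
    # Same case for voivodeship names
    out["voivodeship"] = out["voivodeship"].str.strip().str.lower()
    return out


# Columns missing from one frame (here: teryt) are filled with NaN by concat
combined = pd.concat([normalize(df_a), normalize(df_b)], ignore_index=True)
print(combined)
```

In practice the list of header fixes would be built by inspecting the actual column names across all files first.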
Data Loading
Of the two available formats, I chose xlsx, for no specific reason. The following code snippet builds a list of paths to all files of this type.
import os

xlsx_paths = []
# Raw string, so the backslash in the Windows-style path is not read as an escape
for path, subdirs, files in os.walk(r'..\data'):
    for name in files…
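The snippet above is cut off, so here is a self-contained sketch of one way to collect the paths with `os.walk`. The helper name `find_xlsx_paths` and the case-insensitive extension check are my assumptions, not necessarily what the original loop does.

```python
import os


def find_xlsx_paths(root):
    """Return a sorted list of paths to all .xlsx files under root (recursive)."""
    xlsx_paths = []
    for path, subdirs, files in os.walk(root):
        for name in files:
            # Case-insensitive check, so 'REPORT.XLSX' is picked up as well
            if name.lower().endswith('.xlsx'):
                xlsx_paths.append(os.path.join(path, name))
    return sorted(xlsx_paths)
```

Calling `find_xlsx_paths(os.path.join('..', 'data'))` would cover the same directory as above while staying portable across operating systems. If the directory does not exist, `os.walk` yields nothing and the result is an empty list.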