Analysis of Warsaw Public Transport Data in Kibana and Elasticsearch

Maciej Szymczyk
7 min read · Jun 14, 2020

In the previous posts I documented how I built a data flow with Kafka, Kafka Streams, Logstash and Elasticsearch. After a few days of work, I have enough data to explore what urban transport data analysis looks like in Elasticsearch and Kibana.

Data

The first record has the timestamp 2020-06-02 20:24:58 (a Tuesday). I took the screenshot on Sunday evening, by which point the data amounted to about 4.5 GB for 5 days. An API request is sent every 10 seconds. At the time of writing, there are 16 578 668 records under the “ztm” alias.
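As a rough sanity check on these figures, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it assumes exactly 5 full days of polling, treats "4.5 GB" as decimal gigabytes, and ignores that the stored size includes Elasticsearch's own overhead. All variable names are illustrative.

```python
# Figures quoted in the article
records = 16_578_668        # documents under the "ztm" alias
days = 5                    # rounded collection period (assumption: full days)
poll_interval_s = 10        # one API request every 10 seconds
stored_bytes = 4.5e9        # assumption: "4.5 GB" means decimal gigabytes

# Number of API requests sent over the whole period
polls = days * 24 * 3600 // poll_interval_s

# Average number of records returned per request
records_per_poll = records / polls

# Average stored size per record, including any index overhead
bytes_per_record = stored_bytes / records

print(f"polls: {polls}")
print(f"records per poll: {records_per_poll:.0f}")
print(f"bytes per record: {bytes_per_record:.0f}")
```

Under these assumptions each request yields a few hundred records on average, and each record takes a few hundred bytes on disk.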

Cardinality

Using the cardinality aggregation, we can check how many distinct lines and vehicles the data set contains. There are about 309 bus lines and 1828 vehicles.

You are probably wondering why I wrote “about”. The answer is in the documentation: this query (like many others in Elasticsearch) returns only approximate values. I will explain this on the blog sometime.

POST ztm/_search
{
  "size": 0,
  "aggs": {
    "cardinality_lines": {
      "cardinality": {
        "field": "lines"
      }…
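
The cardinality aggregation accepts a `precision_threshold` option that trades memory for accuracy. According to the Elasticsearch documentation it defaults to 3000, and counts below the threshold are expected to be close to exact, which would cover both figures quoted above. The following is a minimal sketch of building such a request body in Python, not the query used in this article. The `lines` field comes from the query above, while `vehicle_number` is a hypothetical field name and `cardinality_request` is an illustrative helper.

```python
import json


def cardinality_request(fields, precision_threshold=3000):
    """Build a size-0 search body with one cardinality aggregation per field."""
    aggs = {
        f"cardinality_{field}": {
            "cardinality": {
                "field": field,
                # Counts below this threshold are expected to be near-exact
                "precision_threshold": precision_threshold,
            }
        }
        for field in fields
    }
    # size 0 skips the hits and returns only the aggregation results
    return {"size": 0, "aggs": aggs}


# "lines" is taken from the article; "vehicle_number" is a hypothetical name
body = cardinality_request(["lines", "vehicle_number"])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `POST ztm/_search` in the Kibana console, once the hypothetical field name is replaced with the real one from the index mapping.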

