Analysis of Warsaw Public Transport Data in Kibana and Elasticsearch

7 min read · Jun 14, 2020

In the previous posts I documented building a data flow with Kafka, Kafka Streams, Logstash and Elasticsearch. After a few days, I have enough data to explore what urban transport data analysis looks like in Elasticsearch and Kibana.

Data

The first record has the timestamp 2020-06-02 20:24:58 (a Tuesday). I took the screenshot below on Sunday evening, so the data amounts to about 4.5 GB for 5 days. The API request is sent every 10 seconds. At the time of writing, there are 16 578 668 records behind the “ztm” alias.
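To put these numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the figures quoted above and assumes that "4.5 GB" means decimal gigabytes and that the record count and the index size were read at roughly the same moment, so treat the results as ballpark values rather than measurements.

```python
# Figures quoted in the text; everything below is derived from them.
RECORDS = 16_578_668      # documents behind the "ztm" alias
SIZE_BYTES = 4.5e9        # "about 4.5 GB" (assumption: decimal gigabytes)
DAYS = 5                  # Tuesday evening to Sunday evening
POLL_INTERVAL_S = 10      # one API request every 10 seconds

# Number of API requests sent over the whole collection period.
polls = DAYS * 24 * 3600 // POLL_INTERVAL_S

# Average number of stored records per request, per day, and bytes per record.
records_per_poll = RECORDS / polls
records_per_day = RECORDS / DAYS
bytes_per_record = SIZE_BYTES / RECORDS

print(f"API requests in {DAYS} days: {polls}")
print(f"stored records per request: {records_per_poll:.1f}")
print(f"stored records per day:     {records_per_day:,.0f}")
print(f"bytes per record:           {bytes_per_record:.1f}")
```

The per-request figure is an average of what ended up in the index, not the size of a single API response.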

Cardinality

Using the cardinality aggregation, we can check how many lines and vehicles the dataset contains. There are about 309 bus lines and 1828 vehicles.

You probably wonder why I wrote “about”. The answer is in the documentation: this query (like many others in Elasticsearch) returns only an estimate. I will explain it in a separate blog post sometime.

POST ztm/_search
{
  "size": 0,
  "aggs": {
    "cardinality_lines": {
      "cardinality": {
        "field": "lines"
      }…
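
As a side note on the "about": the cardinality aggregation is based on the HyperLogLog++ algorithm, and its accuracy can be tuned with the `precision_threshold` parameter (default 3000, maximum 40000). Below that threshold the counts are expected to be close to accurate, which should be the case for a few hundred lines. The sketch below only builds the request body in Python and prints it. The helper `cardinality_query` is a hypothetical name, and only the `lines` field from the query above is used.

```python
import json


def cardinality_query(field, precision_threshold=3000):
    """Build a size-0 search body with a single cardinality aggregation.

    `cardinality_query` is an illustrative helper, not part of any client
    library. `precision_threshold` is the documented accuracy knob of the
    cardinality aggregation.
    """
    return {
        "size": 0,  # aggregation only, do not return any hits
        "aggs": {
            f"cardinality_{field}": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


# Body that could be sent to POST ztm/_search
body = cardinality_query("lines")
print(json.dumps(body, indent=2))
```

Raising `precision_threshold` trades memory for accuracy, so it is worth setting only when the expected number of distinct values is higher than the default.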

Written by Maciej Szymczyk