I see that the problem was solved with a symbolic link. I didn't make any updates to the Dockerfile. Currently I'm busy with other projects (not Kafka Streams related).


Have you ever thought about creating a stream from database operations? In this story, you will learn what Change Data Capture is and how to use it while planning your system architecture. In the practical part, we will see Debezium in action.

What is Change Data Capture?

Change Data Capture is a process of…


It is said that Apache Airflow is CRON on steroids. It is gaining popularity among tools for ETL orchestration (scheduling, managing, and monitoring tasks). The tasks are defined as a Directed Acyclic Graph (DAG), in which they exchange information. …


Jupyter and Apache Zeppelin are good places to experiment with data. Unfortunately, the nature of notebooks does not encourage organizing code, including its decomposition and readability. We can copy cells to IntelliJ IDEA and build a JAR, but the effect will not be stunning. …


Twitter data can be obtained in many ways, but who wants to write the code 😉, especially code that has to work 24/7? In Elastic Stack, you can easily collect and analyze data from Twitter. Logstash has an input to collect tweets. …


Kafka Connect is part of the Apache Kafka platform. It is used to connect Kafka with external services such as file systems and databases. In this story you will learn what problem it solves and how to run it.

Why Kafka Connect?

Apache Kafka is used in microservices architecture, log aggregation, Change data…


I recorded a video in which I talk about the advantages of NoSQL databases. The response was interesting, but I had the impression that not everyone sees both sides of the coin. The fact is that they can also cause us a lot of problems 😉.

Schema Management

Each NoSQL database approaches…


In Apache Spark/PySpark, we use abstractions, and the actual processing is done only when we want to materialize the result of the operation. To connect to different databases and file systems, we mostly use ready-made libraries. …


This is a continuation of the previous story. This time we will look at the Detections tab in Elastic SIEM. Our goal is to automate IOC detection using proven rules. To recap: we installed Elasticsearch + Kibana on one of the VMs. …


IT environments are becoming increasingly large, distributed and difficult to manage. All system components must be protected and monitored against cyber threats. You need a scalable platform that can store and analyze logs, metrics and events. SIEM solutions can cost a lot of money. …

Maciej Szymczyk

Software Developer, Big Data Engineer, Blogger (https://wiadrodanych.pl), Amateur Cyclist & Triathlete, @maciej_szymczyk
