
Efficient SIEM and Detection Engineering in 10 steps
SIEM systems and detection engineering are not just about data and detection rules. Planning and processes are becoming increasingly important over time. In 10 steps, you will learn how to approach detection in cybersecurity efficiently.
1. Just start
If you have ever done any programming, you will certainly be familiar with software engineering. We can follow various methodologies in projects. The waterfall model used to be the most popular: first the plan, then the analysis, then the implementation… and at the end it turned out that the customer had ordered something else. The answer to this problem was agile development, with Scrum as a popular example.
But why do I mention agile methodologies in an article about SIEM and detection engineering? Because they can be applied, in their own way, to the cybersecurity world. Many people ask me: How do I get started? How many servers should I order? Which drives? How many nodes? And so on. The answer is simple: it depends.
It all depends on the context. It is best to start as soon as possible. Set up a simple cluster and start collecting logs. What will we gain?
- First iteration/sprint 🙃
- We will recognise which sources we have available
- If there are too many sources, we will find out which logs are most relevant to us…
- …and which logs clog up our disks and add little value.
- Log volume (events per second and size per day)
- We will find out how little we know about our monitored environment
- Basic knowledge for planning the next iteration and licence/hardware purchases.
I happen to be a fan of the Elastic Stack (psst! I even have an online course 😇). I think you can get very far without going beyond the free version. There are many things to do that will give more value than the premium features. Unless you need EDR right from the start; in that case you need the Platinum subscription.

Psst! You can have several different clusters 🧅😊.
2. Normalize data model
Data normalisation is a key concern in a SIEM. Why? Imagine a situation where 4 log sources are parsed by 4 people without any guidance. This is what you would get:
- Adam used the src_ip and dst_ip convention
- John preferred camelCase, i.e. srcIp and dstIp
- Sara used source.ip and destination.ip, and even used a special IP data type (which allows searching by subnet mask)
- Tom worked a lot in MySQL, so the IP is int(4) 😅
As you can see we have a problem… It’s difficult to find your way around, not to mention write a detection rule.
The solution is to adopt a convention for field names and their types. A universal example is OSSEM, while in the Elastic Stack world the natural choice is the Elastic Common Schema (ECS). It is also worth using special types such as IP, Geopoint, Geoshape, Histogram or time in nanoseconds (depending on the SIEM).
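As a minimal sketch of what normalisation means in practice (the raw field names below are hypothetical, not taken from any real product), it can be as simple as renaming vendor-specific fields to ECS-style names before the events are indexed:

```python
# Minimal sketch: rename vendor-specific fields to ECS-style names.
# The raw field names on the left are hypothetical examples.
ECS_FIELD_MAP = {
    "src_ip": "source.ip",
    "srcIp": "source.ip",
    "dst_ip": "destination.ip",
    "dstIp": "destination.ip",
}

def normalize(event: dict) -> dict:
    """Return a copy of the event with known fields renamed to ECS."""
    normalized = {}
    for field, value in event.items():
        normalized[ECS_FIELD_MAP.get(field, field)] = value
    return normalized

print(normalize({"srcIp": "10.0.0.5", "dstIp": "192.168.1.10"}))
# {'source.ip': '10.0.0.5', 'destination.ip': '192.168.1.10'}
```

In a real deployment this mapping would live in your ingest pipeline (e.g. Logstash or an Elasticsearch ingest processor) rather than in ad-hoc scripts, but the principle is the same: one schema, enforced at ingestion.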
3. Plan data retention
Sooner or later you will encounter a situation where you run out of disk space. When planning a SIEM solution, it is worth setting a target data retention period. The smaller it is, the more you will save on hardware or cloud services. On the other hand, we often find out about an intrusion long after it has occurred; without logs, analysis will be difficult. There are also formal requirements from standards and regulations that we have to meet. If you are just planning a SIEM, consider that few people say "Oh no! I bought too many servers". It is usually the opposite.
It is worth mentioning data temperature at this point. Hot data is the data most commonly used by analysts and detection rules (often 1 week to 1 month old). Warm data is used occasionally, mainly by analysts (up to a few months). Cold data is rarely used and resides on slower (cheaper) servers. This is called the Hot-Warm-Cold architecture.
What if we need to keep the data for a year but cannot afford such a large SIEM? We can dump it, compressed, onto NFS/AWS S3/Azure Blob Storage/Google Cloud Storage. You just need to plan how to extract and process it later (e.g. with AWS Athena).
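As an illustration only (the bucket name, key and file name below are made up), archiving a day of logs to S3 can be as simple as compressing the file and uploading it; a tool like AWS Athena can query such archives later:

```python
import gzip
import shutil
import boto3  # pip install boto3

def archive_logs(local_path: str, bucket: str, key: str) -> None:
    """Compress a local log file and upload it to S3 for long-term retention."""
    compressed_path = local_path + ".gz"
    with open(local_path, "rb") as src, gzip.open(compressed_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    boto3.client("s3").upload_file(compressed_path, bucket, key)

# Hypothetical bucket and key, for illustration only.
archive_logs("firewall-2024-01-31.ndjson", "siem-archive", "firewall/2024/01/31.ndjson.gz")
```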
4. Monitor your data streams
You have a SIEM, data sources, detection rules and… suddenly you discover that the most important data source stopped logging a month ago. We need to remember that every new element of our architecture is our responsibility. Have you set up a Kafka cluster? Great! Remember to monitor it properly.
Within the SPEED SIEM Use Case Framework, you will notice a special category of rules called Self-Monitoring. Why is it so important?
There are no detection rules without data. They simply don’t work because they have nothing to work on.
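A minimal sketch of such a self-monitoring check (the source names and the six-hour threshold are assumptions, not a real configuration) simply looks at the newest event per data source and alerts when a source goes quiet:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical example: last-seen timestamps per data source,
# as they might be returned by a "max(@timestamp) per source" aggregation.
last_event_per_source = {
    "windows-dc": datetime.now(timezone.utc) - timedelta(minutes=5),
    "firewall": datetime.now(timezone.utc) - timedelta(days=31),
}

MAX_SILENCE = timedelta(hours=6)  # assumed threshold per source

for source, last_seen in last_event_per_source.items():
    if datetime.now(timezone.utc) - last_seen > MAX_SILENCE:
        print(f"ALERT: '{source}' has not sent logs since {last_seen:%Y-%m-%d %H:%M}")
```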

5. Plan the process of developing detection rules
There are a lot of ideas and needs for new rules, and this process has to be organised. I can recommend the article written by Alex; you can implement it in your own organisation. It is worth using even the simplest issue tracker and organising your work around it. Here are some categories that might be useful:
- Backlog — this is where all new ideas and demands go
- Doing/Development — 🏗️
- Documentation — this will be discussed below
- Waiting — it is not uncommon to rely on the work of other teams
- Testing — new rules usually generate a lot of false positives (FP)
- Done — 😎
6. Identify assets
According to Wikipedia, triage is a procedure that allows medical services to sort the injured according to the severity of their injuries and their prognosis. It is used not only in hospitals, but also in Incident Response.
Implementing a SIEM is a process. Creating detection rules is also a process. First we need to secure the most important elements of the system, e.g. the domain controller, the mail server, the main database server of our application. We need to know, piece by piece, the elements of our system, their owners, the security systems installed and the events we want to collect. If I don't know about a server… then I am probably not monitoring it properly.
The information we collect will be useful in estimating coverage by logs and detection rules. You can use the Detect Tactics, Techniques & Combat Threats (DeTT&CT) project for this. In this way we can estimate whether we are collecting enough logs and events from a given system.
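As a purely illustrative sketch (this is not the actual DeTT&CT file format, and the asset names are invented), keeping the inventory as structured data makes coverage gaps easy to spot:

```python
# Hypothetical asset inventory, for illustration; DeTT&CT uses its own YAML schema.
assets = [
    {"name": "dc01", "owner": "IT", "criticality": "high",
     "log_sources": ["windows-security", "sysmon"]},
    {"name": "mail01", "owner": "IT", "criticality": "high",
     "log_sources": []},
]

# Flag critical systems from which we collect no logs at all.
for asset in assets:
    if asset["criticality"] == "high" and not asset["log_sources"]:
        print(f"Coverage gap: no logs collected from critical asset '{asset['name']}'")
```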
7. Document
In Data Engineering, you may come across Data Silos. These are isolated, separate and often unrelated data sets in different systems. Such a silo could be your colleague's Excel file. It is hard to manage and hard to access, which has a negative impact on data mining.
Such a 'flesh-and-blood' data silo could be… your colleague. He has been working in the company for years. You ask him about an IP address and he answers in detail what it is and who manages it. It's nice to have such a colleague. Unfortunately, he is impossible to scale, and there's a risk that he'll change jobs in the future.
How do we solve this problem? I don't know yet! I recommend using a wiki and keeping the documentation in Markdown files. Unfortunately, I already know that this approach also scales poorly beyond a certain point. I am currently working on generating documentation from YAML files with a specific schema. We will see what comes out of it.
Well-produced documentation will certainly make the process of onboarding new people to the team easier and quicker. Hint: Put the identified assets from point 6 in the wiki.
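A minimal sketch of the YAML-to-docs idea (the schema below is my own invention for illustration, not a finished design) renders one Markdown page per asset:

```python
import yaml  # pip install pyyaml

# Hypothetical schema: one asset per YAML document.
ASSET_YAML = """
name: dc01
owner: IT team
description: Primary domain controller
log_sources:
  - windows-security
  - sysmon
"""

asset = yaml.safe_load(ASSET_YAML)
lines = [f"# {asset['name']}", "", asset["description"], "",
         f"**Owner:** {asset['owner']}", "", "## Log sources"]
lines += [f"- {source}" for source in asset["log_sources"]]
print("\n".join(lines))  # in practice, write this out to the wiki / a .md file
```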
8. Automate IoC detection
You read a report about a new threat, create a rule, upload the IoCs from the report. You read another report, create another rule, upload more IoCs… After a while, you get bored, and the old rules can be discarded because the adversary has long since changed IP addresses and domains (Pyramid of Pain).
IoC-based detection is the first type of rule to automate. You can manage IoCs in MISP, and the Elastic Stack has ways to pull data from MISP. A few generic rules should get the job done.
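As a hedged sketch of pulling indicators with PyMISP (the URL and API key are placeholders, and the attribute types and filters will depend on your MISP instance), the feed for a generic rule could look like this:

```python
from pymisp import PyMISP  # pip install pymisp

# Placeholders: use your own MISP URL and API key.
misp = PyMISP("https://misp.example.local", "YOUR_API_KEY", ssl=False)

# Fetch destination-IP attributes flagged for detection (to_ids=True).
attributes = misp.search(controller="attributes", type_attribute="ip-dst",
                         to_ids=True, pythonify=True)

iocs = sorted({attr.value for attr in attributes})
print(f"{len(iocs)} IP indicators ready to feed into a generic SIEM rule")
```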

9. Define metrics
In programming, it is fairly easy to define a goal and measure progress. We have a set of functionalities and corresponding tasks. With each sprint, the application looks better and better and the customer is more and more satisfied (at least in theory 🙃).
In detection engineering, the issue is a bit more complicated. We need to define where we are and where we want to be. We can measure progress in terms such as maturity (team, processes) and coverage (data sources, rules).
Detection Engineering Maturity Matrix
Kyle Bailey has defined a Detection Engineering Maturity Matrix, described in his article and in the accompanying project on GitHub. Be sure to take a look at it!
MITRE
MITRE is a not-for-profit organisation working on cybersecurity issues. It is known for its publicly available products such as MITRE ATT&CK (a knowledge base of adversarial tactics and techniques), the MITRE ATT&CK Navigator, MITRE D3FEND and MITRE Engage. Every organisation will have a different threat model and therefore different KPIs. Measuring coverage with the above-mentioned tools allows you to identify your strengths and weaknesses and make informed decisions.
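For example, if each detection rule is tagged with the ATT&CK techniques it covers (the rule names, technique IDs and threat-model subset below are made up for illustration), a rough coverage metric is only a few lines:

```python
# Hypothetical rules tagged with MITRE ATT&CK technique IDs.
rules = {
    "Suspicious PowerShell": ["T1059.001"],
    "Kerberoasting": ["T1558.003"],
    "DNS tunnelling": ["T1071.004"],
}

# Techniques your threat model says you care about (illustrative subset).
relevant_techniques = {"T1059.001", "T1558.003", "T1071.004", "T1003.001"}

covered = {t for techniques in rules.values() for t in techniques}
coverage = len(covered & relevant_techniques) / len(relevant_techniques)
print(f"Technique coverage: {coverage:.0%}")  # 75%
```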
10. Don’t bother with ML
Machine Learning, Deep Learning, Blockchain, Big Data, NFT… these are buzzwords that vendors use to persuade us to buy their product. I am not saying that these are not cool topics. They are very cool and encouraging!
My point is that Machine Learning is the cherry on the cake. There is no point in reaching for it without working through points 1–9. Many times companies have asked me about Machine Learning without really knowing what they want to achieve. Don't go down that road. A simple IF is easier and faster to implement, explain and use 😁. It's all about the ratio of time spent to value gained. Advanced players can certainly afford ML, because they have a solid foundation.

Summary
This article is an attempt to organise the experience and knowledge I have gained. If you have a comment or suggestion, I would be happy to hear it.