CYBER THREAT DETECTION BY SIEM TOOLS

Mrigakshi Goel
Feb 21, 2021
  • Summary/Abstract

Cyber threat detection is the practice of identifying potential attacks on an organization that could compromise its systems for the malicious benefit of hackers.

These cyber attacks have been growing more and more sophisticated since 2007, steadily reducing our safety in the cyber world.

As cyber-attacks grow, we are concerned with providing online businesses and corporations a safer solution, and thus arises the need for frequent analysis of data logs. This resulted in the creation of tools that can manage events and incidents in organizations. A SIEM (Security Information and Event Management) solution consists of a number of components that together provide Security Information Management (SIM) and Security Event Management (SEM) and can perform activities like data aggregation, threat intelligence, security event correlation, advanced analytics, SOC automation, dashboards, threat hunting, and forensics. In this project, we have integrated a SIEM solution in our home lab with several devices and performed artificially intelligent analytics to detect cyber-attacks. The final report will focus more on automation and orchestration and will deliver outcomes showing that the ultimate motive of cyber threat detection is vulnerability assessment and mitigation to reduce the volume of attacks.

  • Introduction

SIEM tools do the work of both SEM and SIM: they index the data forwarded by different devices through forwarders and collectors and then analyze it after indexing.

This analysis is becoming more precise and accurate every day, which creates the need for intelligent SIEM tools that can learn to stop alerting once they learn that a particular criterion is not a security incident. At the same time, this may raise the chances of false negatives.
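To make that tradeoff concrete, here is a minimal, self-contained sketch of learned alert suppression. The signature names and feedback function are illustrative assumptions, not a real SIEM API; note how every suppression the tool learns is also a potential false negative.

```python
# Minimal sketch of learned alert suppression (illustrative only, not a real
# SIEM API): the tool remembers criteria an analyst marked benign and stops
# alerting on them, saving triage time at the risk of false negatives.

benign_criteria = set()  # signatures an analyst has confirmed are not incidents

def analyst_feedback(signature: str, is_incident: bool) -> None:
    """Record analyst triage so the tool can learn what to suppress."""
    if not is_incident:
        benign_criteria.add(signature)

def should_alert(signature: str) -> bool:
    """Suppress alerts whose signature was previously judged benign."""
    return signature not in benign_criteria

analyst_feedback("nightly_backup_burst", is_incident=False)
print(should_alert("nightly_backup_burst"))  # False: suppressed from now on
print(should_alert("new_admin_created"))     # True: still alerts
```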

Following is the flow of data into a SIEM tool:

Figure 1: Flow of data into a SIEM tool

Deploying and monitoring the industry's SIEM tools can be costly, as it involves product licenses, supporting software such as universal forwarders, and human resources as well.

SPLUNK AS A TOOL:

Splunk is a software tool mostly used to monitor, search, analyze, and visualize machine-generated data in real time. It captures, indexes, and correlates real-time data in a searchable repository and produces graphs, alerts, dashboards, and visualizations.

Important features of Splunk are:

  • Accelerates development and testing
  • Generates ROI faster
  • Allows you to build real-time data applications
  • Agile analytics and reporting on a real-time architecture
  • Offers search, analysis, and visualization capabilities to empower users of all types

Figure 2: Splunk Architecture

The main components of the Splunk architecture are as follows:

Universal Forwarder (UF):

The universal forwarder is a lightweight component that pushes data to the Splunk heavy forwarder. You can install the universal forwarder on the client side/application server. Its only job is to forward the log data.

Load Balancer (LB):

Splunk ships with a default load balancer, and it also allows you to use your own personalized load balancer.

Heavy Forwarder (HF):

The heavy forwarder is a heavier component that allows you to filter the data as it is collected, for example keeping only the error logs.

Indexer (IDX):

The indexer helps you store and index the data, which improves Splunk search performance. By default, Splunk performs the indexing automatically, adding metadata such as host, source, and date & time.

Search head (SH):

Search heads are used to obtain intelligence & generate reports.

Deployment Server (DS):

The deployment server is used to deploy configurations, e.g., updating the universal forwarder's configuration files. We can also use the deployment server to share configurations between components.

License manager (LM):

The license is based on volume & usage, for example 50 GB per day. Splunk regularly checks the licensing details.

Working of Splunk:

FORWARDER >>> INDEXER >>> SEARCH HEAD

Forwarder:

The forwarder collects data from remote machines and forwards it to the indexer in real time.

Indexer:

The indexer processes incoming data in real time and stores & indexes it on disk.

Search Head:

End users interact with Splunk through the search head, which allows them to search, analyze, & visualize the data.
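To make the forwarder-to-indexer flow concrete, the sketch below pushes one event into Splunk over the HTTP Event Collector (HEC), Splunk's token-authenticated ingestion endpoint. The host, token, sourcetype, and index are placeholders for your own deployment.

```python
# Minimal sketch: send one event to Splunk's HTTP Event Collector (HEC).
# Host, token, sourcetype, and index are placeholders for your deployment;
# this also assumes the Splunk server presents a trusted TLS certificate.
import json
import urllib.request

HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # your HEC token

event = {
    "event": {"action": "failed_login", "user": "alice", "src_ip": "10.0.0.5"},
    "sourcetype": "homelab:auth",
    "index": "main",
}

req = urllib.request.Request(
    HEC_URL,
    data=json.dumps(event).encode("utf-8"),
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
)
with urllib.request.urlopen(req) as resp:  # expects {"text":"Success","code":0}
    print(resp.read().decode())
```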

  • Problem statement

It is really challenging for organizations to prevent threats and malicious behavior on their networks. As malicious attempts to attack a company's network, whether to steal important information or to bring the company down, have increased exponentially over the last decade, it has become crucial to find an automated solution for defense.

An anomaly-based method is used to detect network intrusions and will help in detecting unknown cyber threats, but its execution raises another issue: it can cause a high false-alert rate. It is difficult for experts to go through a huge list of false alerts and process them in real time. Another difficulty is labelling the data that would support evaluation of the generated learning models; it is not easy to get labelled data for every switch or device on the network. The main problem, as many recent studies suggest, is that attackers continuously change their behavior, increasing and decreasing their attacks, thus making it hard to detect anomalies with existing systems.
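To illustrate why anomaly-based detection floods analysts with false alerts, here is a minimal, self-contained sketch that flags intervals whose event counts deviate from the mean. The data and z-score threshold are toy assumptions; tuning that threshold is exactly the false-positive/false-negative tradeoff described above.

```python
# Minimal sketch of anomaly-based detection: flag any interval whose event
# count deviates too far from the mean. A crude threshold like this is what
# produces the high false-alert rates discussed above.
from statistics import mean, stdev

counts = [102, 98, 110, 95, 104, 99, 350, 101]  # events per minute (toy data)
mu, sigma = mean(counts), stdev(counts)

THRESHOLD = 2.0  # z-score cutoff; tuning it trades false positives for misses
for minute, c in enumerate(counts):
    z = (c - mu) / sigma
    if abs(z) > THRESHOLD:
        print(f"minute {minute}: count={c}, z={z:.1f} -> anomaly alert")
```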

This project aims to discover the best possible way to detect cyber threats on networks, in real time or on historical data, by making use of neural networks and artificially intelligent Security Information and Event Management (SIEM) solutions.

  • Review of Related Work

In general, an office environment topology looks like this:

We have been part of big corporate networks for decades now; the bigger the corporate network, the more it earns. Chances are that adversaries will attack any firm where they see an opportunity to make money.

Figure 3: General Corporate Network

Cyber threat detection is the process of finding threats on a network; the main idea behind the project is to detect threats before they escalate into attacks.

The primary methods for detecting cyber and network threats on enterprise networks are intrusion prevention systems (IPS) and intrusion detection systems (IDS), which examine network protocols using signature-based methods. These programs produce intrusion alerts, also called security events, which in turn help us generate alerts.

SIEM tools, on the other hand, were developed to help us sift through security events and alerts and pinpoint security breaches and related anomalies. Security information and event management (SIEM) has focused on collecting and managing the alerts of IPSs. It is difficult to recognize and detect strong network attacks because of the false alerts and the huge amount of data. So, in our project, we are focusing on machine learning and artificial intelligence for detecting the attacks.

An intrusion detection system and an intrusion prevention system alone are not enough to prevent malware. This project therefore focuses on cyber threat detection over collected logs, live or historical, using correlation rules, artificial neural networks, and artificially intelligent SIEM tools, capabilities that are not possible with an IPS or IDS alone.

SIEM tools focus more on real-time detection and on reducing the risk of false positives and false negatives. To improve the working of a security operations center, it is very important to monitor the logs 24 hours a day and prepare a report every month.

The originality of our approach comes from the fact that we are examining NextGen SIEM tools, which support the integration of several devices and comply with industry standards like PCI DSS, HIPAA, and SOX. There is a wide range of research and development opportunities in this space, as artificially intelligent solutions in cybersecurity are still being developed. Hackers are becoming more sophisticated every day, and thus the scope for improvement is endless.

In this project, we will focus on implementing the AI-efficient SIEM solutions available in the market today and try to bring out their artificial intelligence by reducing false-positive and false-negative alerts.

  • Proposed Solution

Our proposed solution follows a series of stages:

  1. Collection of data
  2. Normalization of collected data
  3. Expansion, where we collect other high-fidelity data
  4. Enrichment, which augments real-time data with intelligence sources
  5. Automation and Orchestration
  6. Advanced Detection

We will follow a discrete process to bring out the best artificially intelligent solution from our SIEM tools to better detect cybercriminals and build on earlier work.

Figure 4: Stages of SIEM tools

  • Methodology
  1. Experiment/Simulation Detail

We will start by breaking the problem into smaller parts and then completing the parts of the problem.

Our home lab topology:

Figure 5: Our home lab

Our home lab consists of three operating-system components that send different event logs to our SIEM tool through the forwarder, which collects the event logs from them.

The second part of our home lab is AWS or cloud data, generated by the cloud on the go and sent to our SIEM tool through the forwarder that collects the event logs from it.

The third part of our home lab is secondary data, gigabytes of previously collected logs; operating on this kind of data is analogous to a forensic investigation, which analyzes past events rather than real-time events.
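For the cloud part, a script along these lines could pull recent AWS CloudTrail events before handing them to the forwarder. This is only a sketch: the region, the lookup attribute, and the credentials (resolved through the usual boto3 credential chain) are assumptions about the account.

```python
# Sketch: pull recent AWS CloudTrail events with boto3 so they can be handed
# to the SIEM forwarder. Region and lookup attribute are placeholders.
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

resp = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}
    ],
    MaxResults=50,
)

for ev in resp["Events"]:
    # Each event carries the username, time, and raw CloudTrail JSON payload.
    print(ev["EventTime"], ev.get("Username", "?"), ev["EventName"])
```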

  2. Test Procedure Design (Flowchart)

The test process flow chart has been made in consideration of a large organization that has different teams involved.

For instance, there is a team called the GSOC, or Global Security Operations Center, that analyses all SIEM events and incidents and periodically prepares a report on all the logs.

There is another team that is responsible for remediating or mitigating the vulnerabilities and acting on any possible attack scenarios.

There is a board team that needs to be informed in case of a possible attack, the detection of an attack, or the resolution of any such attack.

There is another team called the Network Operations Centre, or NOC; this team is not shown in the following flowchart but is very important, as it is responsible for supporting the network of the entire organization.

Figure 6: Test Flowchart

  3. Data/Analysis of the Test Procedure Performed

While performing the data analysis in our SIEM tool we completed the following steps:

Collection

This stage focuses on how we collect the machine data generated by the foundational components of our security infrastructure: security devices, operating systems, the cloud, etc.

After successfully onboarding data from these categories, the forwarding part is done. We move the critical activity logs to a separate system where an attacker cannot access or easily tamper with them.

Now that we have the data, we can perform the basic investigations.
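A basic investigation at this point can be run programmatically. The sketch below uses the Splunk Python SDK (pip install splunk-sdk) to count failed logins by source IP; the host, credentials, index, and field names are placeholder assumptions about the lab.

```python
# Sketch: run a basic investigative search with the Splunk Python SDK.
# Host, credentials, index, and field names are placeholders for the lab.
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="splunk.example.com", port=8089,
    username="admin", password="changeme",
)

# Count failed logins by source IP over the last 24 hours.
query = ('search index=main action=failed_login earliest=-24h '
         '| stats count by src_ip | sort -count')
reader = results.JSONResultsReader(
    service.jobs.oneshot(query, output_mode="json")
)
for row in reader:
    if isinstance(row, dict):  # skip diagnostic Message objects
        print(row["src_ip"], row["count"])
```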

Normalization

This stage focuses on applying a standard security taxonomy and adding assets and identity data.

The implementation of a Security Operations Center (SOC) starts here; the aim of a SOC is to track systems and users on the network and to analyze a larger selection of detection mechanisms from vendors and the community. Normalized data streamlines investigations and improves an analyst's effectiveness.

Now, the data needs to be mapped to the Common Information Model (CIM). Search performance improves dramatically when using the accelerated data models associated with the CIM.

Asset and user details are correlated to events in your security log platform.
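The effect of a standard taxonomy can be shown in a few lines. In this sketch, the vendor sourcetypes and the CIM-style target names (src, dest, user) are illustrative assumptions; a real Splunk deployment would do this with CIM field aliases rather than custom code.

```python
# Sketch: normalize vendor-specific field names to one common taxonomy so a
# single search or correlation rule works across every data source.
FIELD_ALIASES = {
    "firewall_x": {"source_address": "src", "dest_address": "dest"},
    "linux_auth": {"rhost": "src", "acct": "user"},
}

def normalize(sourcetype: str, event: dict) -> dict:
    """Rename known vendor fields to the common schema, keep the rest."""
    aliases = FIELD_ALIASES.get(sourcetype, {})
    return {aliases.get(k, k): v for k, v in event.items()}

raw = {"rhost": "10.0.0.5", "acct": "alice", "msg": "authentication failure"}
print(normalize("linux_auth", raw))
# {'src': '10.0.0.5', 'user': 'alice', 'msg': 'authentication failure'}
```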

CHALLENGES AT THIS STAGE:

Now that we have the data and can search it in a standard way, we start to understand that some of the most effective security detections come from wire data and deeper endpoint visibility.

DATA SOURCES

At this stage, we need to ensure that the data is compliant with a standard security taxonomy. This means that the fields representing common values like source IP address, port, username, etc. can now have common names regardless of the devices that created those events. This critical investment allows us to start analyzing detection mechanisms from many sources to scale our capabilities as a security team.

Additionally, we can also gather reference information about our IT assets (systems, networks, devices, applications) and our user identities from Active Directory, LDAP, other IAM/SSO systems, etc. This data provides valuable enrichment to aid investigations and day-to-day analysis work.

Expansion

This stage is responsible for collecting additional high-fidelity data sources like endpoint activity and network metadata to drive advanced attack detection. The data sources in this stage will unlock a very rich set of detection capabilities. The smartest of the threat hunters rely on DNS and advanced endpoint data to uncover and track adversaries dwelling in the network.

CHALLENGES AT THIS STAGE:

The network and endpoint data we are collecting is rich in detail, but it lacks context and might contain indicators of compromise that are known to your peer organizations but lie undetected in your environment.

Enrichment

At this stage, we can augment security data with intelligence sources to better understand the context and impact of an event. Machine data is important, but high-performing security teams enrich their data with other internal and external sources. A wealth of contextual and investigative knowledge, including threat intelligence feeds, open-source intelligence (OSINT) sources, and internally sourced information, allows us to extract more value from the data we collected and to detect security events and incidents sooner.

We can also understand the urgency of an alert based on the criticality of the asset at this stage.

As a team, we were able to quickly enrich alerts discovered in our environment by matching against threat intelligence feeds, pivoting to other systems, and initiating additional context gathering activities.
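As an illustration of this enrichment step, the sketch below tags an alert with a threat-intel match and asset criticality. The feed contents, asset inventory, and urgency rule are toy assumptions standing in for real blocklists and CMDB data.

```python
# Sketch: enrich an alert with threat-intel and asset context. The feed and
# asset inventory here are stand-ins for real blocklists and CMDB data.
THREAT_INTEL = {"203.0.113.7", "198.51.100.23"}          # known-bad IPs
ASSET_CRITICALITY = {"10.0.0.5": "high", "10.0.0.42": "low"}

def enrich(alert: dict) -> dict:
    alert["intel_match"] = alert["src"] in THREAT_INTEL
    alert["asset_criticality"] = ASSET_CRITICALITY.get(alert["dest"], "unknown")
    # Urgency rises when a known-bad source touches a critical asset.
    alert["urgency"] = ("critical" if alert["intel_match"]
                        and alert["asset_criticality"] == "high" else "normal")
    return alert

print(enrich({"src": "203.0.113.7", "dest": "10.0.0.5", "sig": "beaconing"}))
```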

CHALLENGES AT THIS STAGE:

We have now reached significant detection capabilities, but our team was operating in an ad-hoc fashion: requests were not tracked, performance was not measured, collaboration was ad-hoc, and lessons learned were not stored and leveraged for future use.

DATA SOURCES

  • Local IP/URL blocklists
  • Open-source threat intel feeds
  • Commercial threat intel feeds

Automation and Orchestration

At this stage, we establish a consistent and repeatable security operations capability.

We need to continuously monitor the environment for alerts, triage them, and respond to threats in a consistent, repeatable, and measurable way. We now need to acquire the ability to track incidents and to measure human-analyst effectiveness versus artificial intelligence.

We must act according to prescribed playbooks which can be different for different organizations.

We learnt to automate simple response actions and combine them into more sophisticated orchestration.
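A hedged sketch of what combining simple response actions into an orchestration can look like follows. Here block_ip and notify_analyst are hypothetical stubs, not any vendor's API; a real playbook would call the organization's own firewall and ticketing systems.

```python
# Sketch of orchestration: chain simple automated actions into one playbook.
# block_ip() and notify_analyst() are hypothetical stubs; a real deployment
# would call its own firewall API and ticketing system here.
def block_ip(ip: str) -> None:
    print(f"[firewall] blocking {ip}")          # stub for a real API call

def notify_analyst(message: str) -> None:
    print(f"[ticket] {message}")                # stub for a ticketing system

def respond(alert: dict) -> None:
    """Consistent, repeatable response: contain first, then hand to a human."""
    if alert.get("urgency") == "critical":
        block_ip(alert["src"])                  # simple automated action
    notify_analyst(f"{alert['sig']} from {alert['src']} "
                   f"(urgency={alert.get('urgency')})")

respond({"src": "203.0.113.7", "sig": "beaconing", "urgency": "critical"})
```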

CHALLENGES AT THIS STAGE:

We are functioning as a security organization and are ready to adopt the most advanced techniques to detect threats.

DATA SOURCES

This stage focuses more on what you do with the data you already have or are onboarding from new sources.

  4. Discussion of Project and Results (Including Future Development and Improvement)

Remediation of event and incident anomalies is a whole other study or domain, which includes vulnerability management.

A vulnerability data feed is the early-warning system that needs to be explored to get ahead of cyber-attacks, and it plays a crucial role in making an organization cybersafe.

Network scanning tools like Qualys and OpenVAS (a free vulnerability scanner) can help smaller organizations detect and anticipate cyber-attacks before they even happen.

Figure 7: Vulnerability Assessment Management

Security Orchestration, Automation and Response (SOAR)

SOAR technology helps execute, automate, and coordinate work with the users/analysts so they can respond quickly to security attacks, and it also improves the security operations processes around incident response.

  1. SOAR tools use playbooks to automate workflows, which improves the processes of the SOC.
  2. Intelligent automation combines security orchestration, interactive investigations, and incident management into a single solution.
  3. It also helps with team collaboration and enables automatic security actions.
  4. It also brings all the platforms into a single, centralised console.

SOAR follows the steps below to automate the SOC (a minimal sketch wiring them together appears after the list):

  1. Ingest
  2. Extract
  3. Detonation
  4. Display Report
  5. Check Malice
  6. Update Database
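Here is that sketch: the six steps as Python stubs wired into one pipeline. The sandbox detonation and reputation database are assumptions, not a real SOAR product's API.

```python
# Sketch: the six SOAR steps wired together as stubs. The sandbox and
# reputation database are assumptions, not a real SOAR product's API.
def ingest():            return {"attachment": "invoice.pdf", "src": "mail"}
def extract(alert):      return alert["attachment"]           # pull artifact
def detonate(artifact):  return {"verdict": "malicious"}       # sandbox stub
def display_report(r):   print("report:", r)
def check_malice(r):     return r["verdict"] == "malicious"
def update_database(artifact, r):
    print(f"db: {artifact} -> {r['verdict']}")                 # store verdict

alert = ingest()
artifact = extract(alert)
report = detonate(artifact)
display_report(report)
if check_malice(report):
    update_database(artifact, report)
```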

Playbooks

In this project, we would like to use playbooks for cyber threats, which are further classified into three steps:

  1. Detection
  2. Analysis
  3. Remediation

Again, each of the steps contains a number of sub-steps that require step-by-step actions using various tools.

All the steps are a continuous chain of playbooks that together solve a bigger problem.

It is hard for any one organization to design a fully comprehensive guide, hence the need for open-source standard playbooks that can be easily used and combined by everyone; at present, such standard playbooks are largely unavailable.

Playbooks, as discussed, are process steps, so they can be defined, designed, and documented in a workflow chart, using a workflow designer to build the process steps graphically for documentation. This is what we used to standardize the playbooks so that they could serve as a handbook shared across teams.

Use case 1: Malware event

The malware incident response playbook follows these steps:

  1. Detect
  2. Analyse
  3. Contain
  4. Eradicate

Use case 2: Playbook for DDoS

The DDoS incident response playbook follows these steps:

  1. Detect
  2. Analyze
  3. Contain
  4. Eradicate

Sample output of OSINT after automation

  • Conclusion

As of now, we have done the indexing, connected all the data logs from different sources to Amazon Web Services, generated reports, and created dashboards; going further, we will work on automating the process and the alerts.

In this project, we learned how to connect different data logs from different sources/systems to Amazon Web Services and Splunk, the two platforms this project depends on most heavily; we are also learning the ins and outs of security information and event management as we use Splunk.

We are getting to know how cybersecurity firms work in real time and how to grow as cybersecurity analysts, and it will also help us prepare as security experts by analyzing different use cases that occur in an attack scenario.

