Network intrusion is a growing threat that can damage network infrastructures and digital and intellectual assets in cyberspace in multiple ways. The approach most commonly employed to combat network intrusion is the development of attack detection systems based on machine learning and data mining techniques. These systems can identify and disconnect malicious network traffic, thereby helping to protect networks. This chapter systematically reviews two groups of common intrusion detection systems, those using fuzzy logic and those using artificial neural networks, and evaluates them on the widely used KDD 99 benchmark dataset.
Cybersecurity is supported by a set of techniques that protect cyberspace and ensure the integrity, confidentiality, and availability of networks, applications, and data. These techniques also have the potential to defend against and recover from attacks of any type. As more devices, notably Internet of Things (IoT) devices, are connected to cyberspace, cybersecurity has become an elevated concern for governments, businesses, other organizations, and individuals. The scope of cybersecurity is broad and can be grouped into five areas: critical infrastructure, network security, cloud security, application security, and IoT security.
Network intrusion detection systems (NIDS) attempt to identify unauthorized, illicit, and anomalous behavior based solely on network traffic to support decision-making in network preventative actions by network administrators.
Machine Learning Based Approach
Traditional network intrusion detection systems are mainly developed using available knowledge bases, which comprise the specific patterns or strings that correspond to already known network behaviors, i.e., normal traffic and abnormal traffic. Monitored network traffic is checked against these patterns to recognize possible threats. Typically, the knowledge bases of such systems are defined based on expert knowledge, and the patterns must be updated continually to cover new threats. As a result, the detection performance of traditional network intrusion detection systems depends highly on the quality of the knowledge base. From a theoretical point of view, network intrusion detection systems mainly aim to classify the monitored traffic as either “legitimate” or “malicious.” Machine learning approaches are well suited to such problems, and they have recently been widely applied to network intrusion detection.
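The dependence of a knowledge-base system on its patterns can be illustrated with a minimal sketch. The signature table, the threat labels, and the function name `match_signatures` below are all hypothetical and chosen only for illustration; real systems use far richer rule languages.

```python
# Hypothetical knowledge base: each byte pattern maps to a known threat label.
# These two entries are illustrative only, not taken from any real rule set.
SIGNATURES = {
    b"/etc/passwd": "path-traversal attempt",
    b"' OR '1'='1": "SQL injection attempt",
}


def match_signatures(payload: bytes, signatures=SIGNATURES):
    """Return the labels of every signature that occurs in the payload."""
    return [label for pattern, label in signatures.items() if pattern in payload]


# A payload containing a known pattern is flagged.
known = match_signatures(b"GET /../../etc/passwd HTTP/1.1")

# The same kind of injection, URL-encoded, matches no stored pattern and
# is missed until the knowledge base is updated.
unknown = match_signatures(b"GET /index.html?q=%27%20OR%20%271%27%3D%271 HTTP/1.1")
```

In this sketch the second request goes undetected simply because its encoding differs from the stored string, which is the kind of coverage gap that motivates learning patterns from data instead.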
Machine learning (ML) is a field of artificial intelligence that refers to a set of techniques that give computer systems the ability to “learn.” Typically, machine learning algorithms, such as artificial neural networks, learn from data samples to categorize or find patterns in the data, and enable computer systems to make predictions on new or unseen data instances based on the discovered patterns. Depending on how learning is performed, machine learning can be further grouped into two main categories: supervised learning and unsupervised learning. Supervised learning discovers patterns that map an input to an output based on labeled input-output pairs of data samples. Classification is a typical supervised learning problem and has commonly been used to formulate NIDS problems. The goal of unsupervised learning is to find a mapping that describes a hidden structure in unlabeled data samples, which makes it a powerful tool when no labels are available. Because unsupervised learning relaxes the requirement for labeled training data, various unsupervised approaches have also been widely applied to NIDS problems, such as clustering-based NIDS and self-organizing map-based NIDS.
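The difference between the two categories can be sketched on a toy dataset. The feature values, the labels, and the helper names below are made up for illustration; a 1-nearest-neighbor rule stands in for supervised learning and a basic k-means loop (Lloyd's algorithm) stands in for unsupervised learning.

```python
import math

# Made-up samples: (connection duration in seconds, kilobytes sent).
samples = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2), (8.0, 9.0), (8.5, 9.5), (7.8, 8.7)]
# Labels are only available in the supervised setting.
labels = ["legitimate"] * 3 + ["malicious"] * 3


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_predict(x, train_x, train_y):
    """Supervised: return the label of the closest labeled sample."""
    distances = [euclidean(x, t) for t in train_x]
    return train_y[distances.index(min(distances))]


def closest(point, centroids):
    """Index of the centroid nearest to the point."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))


def k_means(points, centroids, iterations=10):
    """Unsupervised: group points around centroids without using labels."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, [closest(p, centroids) for p in points]


prediction = nearest_neighbor_predict((7.5, 9.2), samples, labels)
centroids, assignments = k_means(samples, [samples[0], samples[3]])
```

The supervised rule outputs a named class, whereas k-means only returns anonymous group indices; deciding which group corresponds to malicious traffic still requires some outside knowledge.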
Machine Learning Network Intrusion Detection System Architecture
Machine learning and data mining techniques work by establishing an explicit or implicit model that enables the analyzed patterns to be categorized. In general, machine learning techniques are able to deal with three common problems: classification, regression, and clustering. Network intrusion detection can be considered a typical classification problem, so a labeled training dataset is usually required for system modeling. A number of machine learning approaches have been used to solve network intrusion detection problems, and all of them consist of three general phases:
- Preprocessing: the data instances collected from the network environment are structured so that they can be fed directly into the machine learning algorithm. Feature extraction and feature selection are also applied in this phase.
- Training: a machine learning algorithm is adopted to characterize the patterns of various types of data, and build a corresponding system model.
- Detection: once the system model is built, the monitored traffic data are used as system input and compared against the generated model. If the pattern of an observation matches a known threat, an alarm is triggered.
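The three phases above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a reference design: the records, the feature set, and all function names are hypothetical, and a nearest-centroid rule is used only as a stand-in for whichever learning algorithm a real system adopts.

```python
import math

# Hypothetical raw records: (protocol, duration in seconds, bytes sent, label).
RAW_TRAINING = [
    ("tcp", 2.0, 300.0, "normal"),
    ("tcp", 3.0, 450.0, "normal"),
    ("udp", 1.0, 200.0, "normal"),
    ("icmp", 0.1, 9000.0, "attack"),
    ("icmp", 0.2, 8500.0, "attack"),
    ("tcp", 0.1, 9500.0, "attack"),
]
PROTOCOLS = ["tcp", "udp", "icmp"]


def extract_features(protocol, duration, n_bytes):
    """Preprocessing: one-hot encode the protocol, keep the numeric fields."""
    return [1.0 if protocol == p else 0.0 for p in PROTOCOLS] + [duration, n_bytes]


def fit_scaler(rows):
    """Preprocessing: record per-column (min, max) for min-max scaling."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Map each value into [0, 1] using the training bounds."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def train(rows, labels):
    """Training: build one mean vector (centroid) per class."""
    model = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        model[label] = [sum(col) / len(members) for col in zip(*members)]
    return model


def detect(row, model):
    """Detection: pick the closest class centroid; alarm if it is 'attack'."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    label = min(model, key=lambda l: distance(row, model[l]))
    return label, label == "attack"


features = [extract_features(p, d, b) for p, d, b, _ in RAW_TRAINING]
labels = [label for *_, label in RAW_TRAINING]
bounds = fit_scaler(features)
model = train([scale(f, bounds) for f in features], labels)

# A newly monitored connection goes through the same preprocessing.
observed = scale(extract_features("icmp", 0.15, 8800.0), bounds)
predicted, alarm = detect(observed, model)
```

Note that the scaling bounds are fitted on the training data and reused unchanged at detection time, so monitored traffic is represented in the same feature space as the model.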
Both supervised and unsupervised machine learning approaches have already been utilized to solve network intrusion detection problems. For instance, supervised learning-based classifiers, such as k-nearest neighbor (k-NN), support vector machine (SVM), decision tree, naïve Bayes network, random forests, and artificial neural networks (ANN), have been successfully employed to detect unauthorized access. In addition, unsupervised learning algorithms, including k-means clustering and self-organizing maps (SOM), have also been applied to network intrusion detection problems with good results.
For various reasons, such as the imbalance of training datasets and high computational cost, it is currently very difficult to design a single machine learning approach that outperforms the existing ones. Therefore, hybrid machine learning approaches, such as clustering combined with a classifier and hierarchical classifiers, have attracted a lot of attention in recent years. In addition, some data mining approaches have also been successfully utilized to solve intrusion detection problems.
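One way to see why imbalanced training data makes the problem hard is to score a trivial model on a skewed label distribution. The 990-to-10 split below is made up for illustration and is not drawn from any benchmark; the function names are likewise hypothetical.

```python
# Made-up distribution: 990 normal connections and 10 attacks.
y_true = ["normal"] * 990 + ["attack"] * 10
# A trivial model that never raises an alarm.
y_pred = ["normal"] * 1000


def accuracy(truth, predicted):
    """Fraction of all samples that were labeled correctly."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def recall(truth, predicted, positive="attack"):
    """Fraction of true positive-class samples that were detected."""
    hits = [p == positive for t, p in zip(truth, predicted) if t == positive]
    return sum(hits) / len(hits)


acc = accuracy(y_true, y_pred)  # 990 of 1000 correct
rec = recall(y_true, y_pred)    # 0 of 10 attacks detected
```

The model scores 99% accuracy while detecting no attacks at all, so a learner optimized for overall accuracy can appear strong on imbalanced data while being useless for the minority class that matters most.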