Network Intrusion Detection
In this case study we need to predict anomalies and attacks in the network.
Business Problem:
The task is to build a network intrusion detection system that detects anomalies and attacks in the network.
There are two problems:
- Binomial (binary) classification: Activity is normal or attack.
- Multinomial classification: Activity is normal, DOS, PROBE, R2L, or U2R.
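The two targets can be derived from the same raw label column. The following is a minimal sketch, not the method used later in this case study: the names ATTACK_CATEGORY, binary_target and multiclass_target are hypothetical, and the grouping of attack names into DOS, PROBE, R2L and U2R is the commonly cited one for the KDD Cup '99 training set, assumed here rather than taken from this document. Labels are assumed to be strings such as "normal" or "neptune", optionally with a trailing dot.

```python
# Assumed grouping of KDD Cup '99 training-set attack names into the four
# attack categories; check it against the label documentation of your copy.
ATTACK_CATEGORY = {
    "back": "DOS", "land": "DOS", "neptune": "DOS",
    "pod": "DOS", "smurf": "DOS", "teardrop": "DOS",
    "ipsweep": "PROBE", "nmap": "PROBE", "portsweep": "PROBE", "satan": "PROBE",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def _clean(label: str) -> str:
    # Some copies of the data write labels with a trailing dot ("normal.").
    return label.strip().rstrip(".").lower()


def binary_target(label: str) -> str:
    """Binomial problem: every non-normal label counts as an attack."""
    return "normal" if _clean(label) == "normal" else "attack"


def multiclass_target(label: str) -> str:
    """Multinomial problem: normal, DOS, PROBE, R2L or U2R."""
    name = _clean(label)
    if name == "normal":
        return "normal"
    # Attack names missing from the table are surfaced, not silently guessed.
    return ATTACK_CATEGORY.get(name, "UNKNOWN")


if __name__ == "__main__":
    for raw in ["normal.", "neptune", "satan", "guess_passwd", "rootkit"]:
        print(raw, binary_target(raw), multiclass_target(raw))
```

Returning "UNKNOWN" for unlisted names is a deliberate choice in this sketch: test sets can contain attack names that never appear in training, and those need an explicit decision rather than a default class.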
Data Availability:
The data is the KDDCUP’99 data set, which is widely used as one of the few publicly available data sets for network-based anomaly detection systems.
For more about the data, visit http://www.unb.ca/cic/datasets/nsl.html
BASIC FEATURES OF EACH NETWORK CONNECTION VECTOR
- Duration: Length of time duration of the connection
- Protocol_type: Protocol used in the connection
- Service: Destination network service used
- Flag: Status of the connection – Normal or Error
- Src_bytes: Number of data bytes transferred from source to destination in a single connection
- Dst_bytes: Number of data bytes transferred from destination to source in a single connection
- Land: 1 if source and destination IP addresses and port numbers are equal; 0 otherwise.
- Wrong_fragment: Total number of wrong fragments in this connection
- Urgent: Number of urgent packets in this connection. Urgent packets are packets with the urgent bit activated.
- Hot: Number of “hot” indicators in the content, such as entering a system directory, creating programs and executing programs.
- Num_failed_logins: Count of failed login attempts.
- Logged_in: Login status: 1 if successfully logged in; 0 otherwise.
- Num_compromised: Number of “compromised” conditions.
- Root_shell: 1 if root shell is obtained; 0 otherwise.
- Su_attempted: 1 if “su root” command attempted or used; 0 otherwise.
- Num_root: Number of “root” accesses or number of operations performed as a root in the connection.
- Num_file_creations: Number of file creation operations in the connection.
- Num_shells: Number of shell prompts.
- Num_access_files: Number of operations on access control files.
- Num_outbound_cmds: Number of outbound commands in an FTP session.
- Is_hot_login: 1 if the login belongs to the “hot” list i.e., root or admin; else 0.
- Is_guest_login: 1 if the login is a “guest” login; 0 otherwise.
- Count: Number of connections to the same destination host as the current connection in the past two seconds
- Srv_count: Number of connections to the same service (port number) as the current connection in the past two seconds.
- Serror_rate: The percentage of connections with Flag (feature 4) set to S0, S1, S2 or S3, among the connections aggregated in Count (feature 23).
- Srv_serror_rate: The percentage of connections with Flag (feature 4) set to S0, S1, S2 or S3, among the connections aggregated in Srv_count (feature 24).
- Rerror_rate: The percentage of connections with Flag (feature 4) set to REJ, among the connections aggregated in Count (feature 23).
- Srv_rerror_rate: The percentage of connections with Flag (feature 4) set to REJ, among the connections aggregated in Srv_count (feature 24).
- Same_srv_rate: The percentage of connections that were to the same service, among the connections aggregated in Count (feature 23).
- Diff_srv_rate: The percentage of connections that were to different services, among the connections aggregated in Count (feature 23).
- Srv_diff_host_rate: The percentage of connections that were to different destination machines, among the connections aggregated in Srv_count (feature 24).
- Dst_host_count: Number of connections having the same destination host IP address.
- Dst_host_srv_count: Number of connections having the same port number.
- Dst_host_same_srv_rate: The percentage of connections that were to the same service, among the connections aggregated in Dst_host_count (feature 32).
- Dst_host_diff_srv_rate: The percentage of connections that were to different services, among the connections aggregated in Dst_host_count (feature 32).
- Dst_host_same_src_port_rate: The percentage of connections that were to the same source port, among the connections aggregated in Dst_host_srv_count (feature 33).
- Dst_host_srv_diff_host_rate: The percentage of connections that were to different destination machines, among the connections aggregated in Dst_host_srv_count (feature 33).
- Dst_host_serror_rate: The percentage of connections with Flag (feature 4) set to S0, S1, S2 or S3, among the connections aggregated in Dst_host_count (feature 32).
- Dst_host_srv_serror_rate: The percentage of connections with Flag (feature 4) set to S0, S1, S2 or S3, among the connections aggregated in Dst_host_srv_count (feature 33).
- Dst_host_rerror_rate: The percentage of connections with Flag (feature 4) set to REJ, among the connections aggregated in Dst_host_count (feature 32).
- Dst_host_srv_rerror_rate: The percentage of connections with Flag (feature 4) set to REJ, among the connections aggregated in Dst_host_srv_count (feature 33).
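To make the time-based features in the list above concrete, here is a minimal sketch of how Count, Serror_rate, Rerror_rate, Same_srv_rate and Diff_srv_rate relate to one another for a single connection. It is an illustration under stated assumptions, not the procedure that produced the data set: the Connection record and the time_window_features function are hypothetical, the current connection is excluded from its own window, and rates are returned as fractions between 0 and 1 rather than percentages.

```python
from dataclasses import dataclass

# Flag values named in the Serror_rate description.
SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}


@dataclass
class Connection:
    # Hypothetical minimal record; the real data has many more fields.
    time: float     # timestamp of the connection, in seconds
    dst_host: str   # destination host address
    service: str    # destination network service
    flag: str       # connection status, e.g. "SF", "S0", "REJ"


def time_window_features(history, current, window=2.0):
    """Features for `current` computed over the past `window` seconds."""
    # Count: earlier connections to the same destination host in the window.
    same_host = [
        c for c in history
        if 0 <= current.time - c.time <= window
        and c.dst_host == current.dst_host
    ]
    count = len(same_host)
    if count == 0:
        # No earlier connections: all rates are defined as 0 in this sketch.
        return {"count": 0, "serror_rate": 0.0, "rerror_rate": 0.0,
                "same_srv_rate": 0.0, "diff_srv_rate": 0.0}
    serror = sum(c.flag in SYN_ERROR_FLAGS for c in same_host)
    rerror = sum(c.flag == "REJ" for c in same_host)
    same_srv = sum(c.service == current.service for c in same_host)
    return {
        "count": count,
        "serror_rate": serror / count,
        "rerror_rate": rerror / count,
        "same_srv_rate": same_srv / count,
        "diff_srv_rate": (count - same_srv) / count,
    }


history = [
    Connection(8.2, "10.0.0.5", "http", "SF"),
    Connection(8.9, "10.0.0.5", "http", "S0"),
    Connection(9.4, "10.0.0.5", "ftp", "REJ"),
    Connection(9.6, "10.0.0.9", "http", "SF"),  # different host: not counted
    Connection(5.0, "10.0.0.5", "http", "S0"),  # older than two seconds
]
current = Connection(10.0, "10.0.0.5", "http", "SF")
features = time_window_features(history, current)

if __name__ == "__main__":
    print(features)
```

In this toy window three earlier connections reach the same host, so each rate is a share of those three. The Srv_ variants follow the same pattern with the filter on service instead of destination host.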
Attack Class:
Let’s develop a machine learning model for further analysis.