The increasing sophistication and frequency of cyber threats have driven the evolution of Security Operations Centers (SOCs) into highly automated, intelligence-driven hubs. However, current SOCs still face several challenges, including alert fatigue, false positives, and limited scalability. To address these challenges, researchers have explored machine learning (ML) and deep learning (DL) approaches that improve threat detection, shorten response times, and support decision-making. This literature survey summarizes key contributions related to the four core components of the project: web-based injection attack detection, spear-phishing email detection, Trojan detection, and DDoS attack detection.
1. Web-Based Injection Attack Detection (SQLi & XSS)
Traditional approaches such as static signature matching or regular-expression rules often fail to capture the contextual nature and variability of injection attacks, because the same SQL injection (SQLi) or cross-site scripting (XSS) payload can be rewritten with encodings, inline comments, or alternative syntax. Studies have shown that transformer-based models, particularly BERT, RoBERTa, and XLNet, offer significant improvements by modeling the semantic and sequential structure of attack payloads.
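To illustrate why static signatures are brittle, the following minimal Python sketch applies a deliberately naive, hypothetical signature (`SIGNATURE`, matching only `UNION SELECT` and `<script`) to a few payload variants. It is not taken from any of the cited studies; production rule sets are far larger, but they face the same problem of having to anticipate every rewrite in advance.

```python
import re
import urllib.parse

# Hypothetical, deliberately naive signature: "UNION <whitespace> SELECT"
# or an opening "<script" tag, matched case-insensitively.
SIGNATURE = re.compile(r"union\s+select|<script", re.IGNORECASE)


def signature_match(payload: str) -> bool:
    """Return True if the payload matches the static signature."""
    return SIGNATURE.search(payload) is not None


payloads = {
    "plain SQLi": "1 UNION SELECT username, password FROM users",
    # An inline SQL comment replaces the whitespace the regex expects.
    "comment-split SQLi": "1 UNION/**/SELECT username, password FROM users",
    "plain XSS": "<script>alert(1)</script>",
    # Executes script without ever using a <script> tag.
    "event-handler XSS": "<img src=x onerror=alert(1)>",
    "URL-encoded XSS": "%3Cscript%3Ealert(1)%3C%2Fscript%3E",
}

for name, payload in payloads.items():
    raw = signature_match(payload)
    # Percent-decoding first catches the encoded variant, but not the others.
    decoded = signature_match(urllib.parse.unquote(payload))
    print(f"{name:<20} raw={raw!s:<5} decoded={decoded}")
```

Only the plain variants match on the raw input; decoding recovers the URL-encoded payload, while the comment-split and event-handler payloads pass either way. Contextual models aim to generalize across such rewrites instead of enumerating them.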
Salam et al. (2023) proposed hybrid transformer models for industrial web threat detection, demonstrating the superiority of contextual embeddings.
Vaswani et al. (2017) introduced the Transformer architecture, laying the foundation for models like BERT and RoBERTa that can effectively understand complex textual patterns in payloads.
Ullah et al. (2022) highlighted the application of deep sparse autoencoders to anomaly detection in network traffic.
Despite their effectiveness, individual transformer models incur substantial computational overhead. Research into hybrid models therefore aims to balance detection performance with scalability.
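Much of this overhead comes from the scaled dot-product attention introduced by Vaswani et al. (2017). For a payload of n tokens, each attention head computes the following, where Q and K are n × d_k query and key matrices and V is the n × d_v value matrix:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad Q K^{\top} \in \mathbb{R}^{n \times n}
```

Because the score matrix QK^T is n × n, computing it takes on the order of n^2 · d_k operations per head, and its memory footprint grows quadratically with payload length. This is one reason long payloads make per-request transformer inference expensive at SOC scale.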
2. Spear-Phishing Email Detection
Spear-phishing emails are tailored to specific recipients and therefore often evade traditional filtering techniques. Existing ML-based filters frequently rely on superficial features (e.g., header information), which leads to high false-positive and false-negative rates.
Do et al. (2022) conducted a comprehensive survey on deep learning for phishing detection, identifying dataset imbalance and feature dependency as critical challenges.
Al-Hamar et al. (2021) proposed organization-specific phishing detection by analyzing email semantics rather than metadata alone.
Qi et al. (2023) demonstrated the effectiveness of ensemble learning and undersampling for phishing classification tasks.
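Qi et al. (2023) combined ensemble learning with undersampling. The sketch below is not their implementation. It is a minimal, stdlib-only Python illustration of the two generic ideas: random undersampling of the majority class, and a hard (majority) voting ensemble. The message IDs, feature names, and rule-based voters are hypothetical stand-ins for real emails and trained models.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Randomly discard samples from larger classes so that every class
    keeps as many samples as the smallest (minority) class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n_min = min(len(samples) for samples in by_class.values())
    X_bal, y_bal = [], []
    for label, samples in by_class.items():
        for xi in rng.sample(samples, n_min):  # sampling without replacement
            X_bal.append(xi)
            y_bal.append(label)
    return X_bal, y_bal


def majority_vote(classifiers, x):
    """Hard-voting ensemble: each base classifier predicts a label for x and
    the most frequent label wins (an odd number of binary voters avoids ties)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy imbalanced corpus: 90 legitimate (0) vs. 10 phishing (1) message IDs.
X = list(range(100))
y = [0] * 90 + [1] * 10
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))  # → Counter({0: 10, 1: 10})

# Three hypothetical rule-based voters over hand-made boolean features.
voters = [
    lambda e: int(e["urgent_language"]),
    lambda e: int(e["link_text_mismatch"]),
    lambda e: int(e["external_sender"]),
]
email = {"urgent_language": True, "link_text_mismatch": True, "external_sender": False}
print(majority_vote(voters, email))  # → 1
```

Undersampling discards data, so in practice the base models of an ensemble are often trained on different balanced subsets (different seeds) to recover some of the lost information.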
Recent work emphasizes Bi-LSTM and BERT models, which capture sequential patterns and contextual cues in email content. Integrating domain similarity analysis (Levenshtein distance, homoglyph detection) to flag lookalike sender domains further improves precision.
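The domain similarity check can be sketched as follows, assuming a small, hypothetical homoglyph map (`HOMOGLYPHS`) and an illustrative edit-distance cut-off of 2. A real deployment would use a complete confusables table, such as the data published with Unicode Technical Standard #39, and tune the cut-off against its own list of trusted domains.

```python
# Hypothetical, partial homoglyph map: characters often substituted in
# lookalike domains, mapped to the Latin letter they imitate.
HOMOGLYPHS = {
    "0": "o", "1": "l", "3": "e", "5": "s",
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalize(domain: str) -> str:
    """Lower-case the domain and map known homoglyphs to Latin letters."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in domain.lower())


def lookalike_of(sender_domain, trusted_domains, max_distance=2):
    """Return the trusted domain that sender_domain appears to imitate,
    or None if it is exactly a trusted domain or close to none of them."""
    domain = sender_domain.lower()
    if domain in trusted_domains:
        return None  # exact match: treated as legitimate
    candidate = normalize(domain)
    for trusted in trusted_domains:
        if levenshtein(candidate, normalize(trusted)) <= max_distance:
            return trusted
    return None


TRUSTED = ("paypal.com", "example.com")
for d in ["paypal.com", "paypa1.com", "p\u0430ypal.com", "paypall.com", "github.com"]:
    print(d, "->", lookalike_of(d, TRUSTED))
```

Homoglyph normalization catches substitutions that look identical on screen (distance 0 after mapping), while the edit distance catches typo-style variants such as an extra letter. Short domains need a smaller cut-off, since two edits can turn one short legitimate name into another.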
3. Trojan Malware Detection
Most Trojan detection systems depend on signature- or heuristic-based engines, which are ineffective against polymorphic or zero-day Trojans. The literature therefore suggests a shift toward anomaly detection using unsupervised learning.
Faizal et al. (2022) proposed an ML-based detection approach using more than 80 traffic-based features and reported high detection accuracy.
Xie et al. (2020) utilized hierarchical spatio-temporal models for HTTP-based Trojan analysis.
Huang et al. (2020) explored hardware-level Trojan detection and discussed how deep learning can strengthen resilience against malware.
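Faizal et al. (2022) relied on more than 80 traffic-based features. The sketch below computes only a handful of generic per-flow statistics to show the kind of feature extraction involved. The `Packet` record, the feature names, and the unidirectional 5-tuple grouping are assumptions for illustration, not the authors' feature set. As an example of why such statistics are useful, a near-zero deviation in inter-arrival time can indicate the periodic check-ins ("beaconing") that some Trojans make to a command-and-control server.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ts: float    # capture timestamp in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int    # bytes on the wire


def flow_features(packets):
    """Group packets into unidirectional flows keyed by 5-tuple and compute
    a few per-flow statistics (a small subset of typical flow features)."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.src, p.dst, p.sport, p.dport, p.proto)].append(p)
    features = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p.ts)
        sizes = [p.size for p in pkts]
        gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]  # inter-arrival times
        features[key] = {
            "duration": pkts[-1].ts - pkts[0].ts,
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_size": statistics.fmean(sizes),
            "std_size": statistics.pstdev(sizes),
            "mean_iat": statistics.fmean(gaps) if gaps else 0.0,
            "std_iat": statistics.pstdev(gaps) if gaps else 0.0,
        }
    return features


# A hypothetical beacon: identical 120-byte packets sent every 60 seconds.
beacon = [
    Packet(ts=t, src="10.0.0.5", dst="203.0.113.7", sport=49152, dport=443, proto="tcp", size=120)
    for t in (0.0, 60.0, 120.0)
]
feats = flow_features(beacon)
print(feats[("10.0.0.5", "203.0.113.7", 49152, 443, "tcp")])
```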
Advanced approaches combine deep autoencoders for unsupervised feature extraction with random forest classifiers for binary (benign vs. Trojan) classification. Emphasis is also placed on reducing false positives and improving scalability across large datasets.
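In autoencoder-based anomaly detection, a model trained only on benign traffic tends to reconstruct benign samples well and anomalous ones poorly, so a threshold on reconstruction error separates the two. The sketch below covers only that decision step. The trained autoencoder is assumed to exist elsewhere, and both the mean + k·σ thresholding rule and the error values are illustrative assumptions. The parameter `k` is where the false-positive trade-off mentioned above would be tuned.

```python
import statistics


def reconstruction_error(x, x_hat):
    """Mean squared error between a feature vector and the autoencoder's
    reconstruction of it (x_hat is assumed to come from a trained model)."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def calibrate_threshold(benign_errors, k=3.0):
    """Threshold = mean + k * (population) standard deviation of the
    reconstruction errors observed on held-out benign traffic.
    Raising k lowers the false-positive rate but misses more anomalies."""
    mu = statistics.fmean(benign_errors)
    sigma = statistics.pstdev(benign_errors)
    return mu + k * sigma


def flag_anomalies(errors, threshold):
    """Mark each sample whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Illustrative reconstruction errors on benign validation flows.
benign_errors = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.011, 0.012]
threshold = calibrate_threshold(benign_errors)  # roughly 0.0147
print(flag_anomalies([0.011, 0.090, 0.014], threshold))  # → [False, True, False]
```

Flagged samples would then be passed to the supervised classifier (for example, a random forest) for the final benign/Trojan decision.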
4. DDoS Attack Detection
DDoS attacks remain a significant threat because of their evolving nature and sheer traffic volume. Traditional signature-based and threshold-based detection methods are reactive and inflexible.
Adedeji et al. (2023) provided a conceptual framework and research roadmap for DDoS detection, advocating ML-based anomaly detection.
Kumari & Mrunalini (2022) explored supervised learning algorithms like Logistic Regression and Random Forest for DDoS classification.
Tiwari et al. (2024) demonstrated a real-time, signature-based defense mechanism, although it is limited to known threats.
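Kumari & Mrunalini (2022) applied supervised learners such as Logistic Regression. As a minimal, dependency-free sketch of what that model computes, the code below trains logistic regression by batch gradient descent on a hypothetical two-feature toy set: normalized packet rate and fraction of SYN packets, both assumed to be already scaled to [0, 1]. It does not reproduce the authors' features or data. In practice a library implementation such as scikit-learn's `LogisticRegression` would be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.
    The gradient for each sample is (sigmoid(w.x + b) - y) * x."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that flow x is DDoS traffic."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: (normalized packet rate, fraction of SYN packets).
X = [(0.10, 0.10), (0.20, 0.15), (0.15, 0.05), (0.05, 0.20),   # benign
     (0.90, 0.80), (0.80, 0.90), (0.95, 0.70), (0.85, 0.95)]   # DDoS
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logreg(X, y)
print(predict_proba(w, b, (0.12, 0.10)), predict_proba(w, b, (0.88, 0.85)))
```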
Recent trends emphasize hybrid detection pipelines that combine autoencoders for anomaly detection with supervised classifiers that label flagged traffic to guide the response. Predictive remediation and contextual reporting have also gained attention as means of real-time mitigation.
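A hybrid pipeline of this kind can be sketched as a two-stage cascade. An unsupervised anomaly scorer (for example, an autoencoder's reconstruction error) screens every flow, and a supervised classifier labels only the flows that exceed the threshold so that a response can be chosen. Both models are passed in as plain functions here; the toy stand-ins, feature names, and attack labels below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Features = Dict[str, float]


@dataclass
class Verdict:
    flow_id: str
    anomaly_score: float
    label: Optional[str]  # None when the flow was not escalated to stage 2


def hybrid_detect(
    flows: Dict[str, Features],
    anomaly_score: Callable[[Features], float],
    classify: Callable[[Features], str],
    threshold: float,
) -> List[Verdict]:
    """Stage 1 scores every flow; stage 2 classifies only anomalous flows."""
    verdicts = []
    for flow_id, feats in flows.items():
        score = anomaly_score(feats)
        label = classify(feats) if score > threshold else None
        verdicts.append(Verdict(flow_id, score, label))
    return verdicts


# Hypothetical stand-ins for a trained autoencoder and a trained classifier.
def toy_score(f: Features) -> float:
    return f["pps"] / 1000.0  # packets per second, scaled


def toy_classify(f: Features) -> str:
    return "syn_flood" if f["syn_ratio"] > 0.8 else "other_flood"


flows = {
    "f1": {"pps": 50.0, "syn_ratio": 0.10},
    "f2": {"pps": 9000.0, "syn_ratio": 0.95},
    "f3": {"pps": 7000.0, "syn_ratio": 0.05},
}
for v in hybrid_detect(flows, toy_score, toy_classify, threshold=1.0):
    print(v)
```

Running the classifier only on escalated flows keeps its cost proportional to suspicious traffic, while the unsupervised stage can still surface patterns the classifier was never trained on; the resulting label and score are what a remediation or reporting step would consume.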