Methodology (Combined Approach)
This research adopts a modular framework that combines machine learning (ML) and deep learning (DL) to detect and classify multiple types of cyberattacks. Each attack vector (SQLi/XSS, Spear-Phishing, Trojans, and DDoS) has its own model pipeline, and all pipelines feed a centralized Security Operations Center (SOC) system for reporting and visualization.
Step-by-Step Methodology
1. Data Collection
Web Injection (SQLi, XSS): Collected 60,120 samples from public datasets.
Spear-Phishing Emails: Drawn from public email corpora, including the Enron corpus (legitimate mail) and the Nazario phishing corpus.
Trojan Detection: Network traffic datasets from Kaggle and CIC-IDS2017.
DDoS Detection: CIC-DDoS2019 dataset for volumetric attack patterns.
2. Data Preprocessing
Duplicate and null value removal.
Tokenization using:
BERT: WordPiece
RoBERTa: byte-level BPE
XLNet: SentencePiece
Normalization using StandardScaler for numerical data (Trojan/DDoS).
Class balancing with RandomUnderSampler and custom class weights.
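The cleaning, scaling, and balancing steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's pipeline: the function names are hypothetical, `standardize` mimics what StandardScaler computes (z-scores with population standard deviation), and `random_undersample` mimics the default behavior of RandomUnderSampler (every class is cut down to the size of the smallest class).

```python
import math
import random
from collections import defaultdict


def drop_nulls_and_duplicates(rows):
    """Drop rows that contain None and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row):
            continue
        key = tuple(row)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def standardize(rows):
    """Z-score every feature column using the population standard deviation."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        variance = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # Constant columns have zero spread; divide by 1.0 instead of 0.0.
        stds.append(math.sqrt(variance) or 1.0)
    return [[(r[j] - means[j]) / stds[j] for j in range(n_features)] for r in rows]


def random_undersample(features, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    target = min(len(ix) for ix in indices_by_class.values())
    kept = []
    for ix in indices_by_class.values():
        kept.extend(rng.sample(ix, target))
    kept.sort()  # preserve the original row order
    return [features[i] for i in kept], [labels[i] for i in kept]
```

In this sketch each row is a list whose last element is the label, so the features and labels are split after cleaning and before scaling. In practice the scaler statistics would be fitted on the training split only and reused on the test split.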
3. Feature Extraction
For NLP-based models: Embedding generation using Transformers.
For network traffic: Flow-based, timing, and packet-level features (more than 80 features).
Domain analysis for spear-phishing emails using:
Levenshtein Distance
Homoglyph detection
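The two domain checks listed above can be illustrated as follows. This is a sketch under stated assumptions: the `HOMOGLYPHS` table is a deliberately tiny, hypothetical sample (a real deployment would use a fuller confusables list), and the function names and the `max_distance` threshold are illustrative choices, not values taken from this study.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical sample of look-alike characters mapped to their Latin targets.
HOMOGLYPHS = {
    "0": "o", "1": "l", "3": "e", "5": "s",
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
}


def skeleton(domain: str) -> str:
    """Map a domain to a canonical form in which look-alikes collapse together."""
    text = domain.lower().replace("rn", "m")  # "rn" renders much like "m"
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def check_domain(domain: str, trusted: list, max_distance: int = 2):
    """Return (trusted_domain, reason) if the domain imitates a trusted one, else None."""
    name = domain.lower()
    if name in trusted:
        return None
    for candidate in trusted:
        if skeleton(name) == skeleton(candidate):
            return candidate, "homoglyph"
    for candidate in trusted:
        if 0 < levenshtein(name, candidate) <= max_distance:
            return candidate, "edit-distance"
    return None
```

For example, `check_domain("paypa1.com", ["paypal.com"])` is flagged by the homoglyph rule, while `check_domain("paypall.com", ["paypal.com"])` is caught only by the edit-distance rule. The resulting flag could then serve as one of the domain-similarity inputs to the spear-phishing model.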
4. Model Architecture
Web Injection:
BERT, RoBERTa, XLNet, and a hybrid RoBERTa+XLNet transformer model.
Spear-Phishing:
Bi-LSTM + BERT with domain similarity and pattern analysis.
Trojan Detection:
Autoencoder (256→16 bottleneck) + Random Forest Classifier.
DDoS Detection:
Logistic Regression + Autoencoder for anomaly-based classification.
5. Training and Optimization
Loss Function: CrossEntropyLoss for classification.
Optimizer: AdamW with learning rate scheduling.
Techniques used:
Gradient Clipping
Dropout Regularization
Frozen transformer layers to reduce computation.
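To make the gradient clipping step concrete, the sketch below shows clipping by global norm, the variant that deep learning frameworks commonly provide. The text above does not state whether clipping is norm-based or value-based, so the norm-based choice, the function name, and the plain nested-list representation of gradients are all assumptions made for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their combined L2 norm is at most max_norm.

    grads is a list of per-layer gradient lists. Returns (clipped_grads, total_norm),
    where total_norm is the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    if total_norm <= max_norm:
        # Already within bounds: return an unchanged copy.
        return [list(layer) for layer in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in layer] for layer in grads], total_norm
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its magnitude is limited, which helps keep fine-tuning stable when a batch produces an unusually large gradient.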
6. Evaluation
Accuracy, Precision, Recall, F1-Score
Confusion Matrix
Feature importance analysis
Risk scoring for anomalous traffic
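The classification metrics listed above all derive from the four cells of a binary confusion matrix. The following sketch shows that relationship; the function names are hypothetical, and the multi-class case (one-vs-rest averaging) is omitted for brevity.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    counts = Counter()
    for truth, prediction in zip(y_true, y_pred):
        if prediction == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1, returning 0.0 where undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For attack detection, recall on the attack class is usually the figure to watch most closely, since a false negative is a missed attack, whereas precision governs how many false alarms analysts must triage.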
7. Remediation & Reporting Module
Automatically generates:
Attack Type
Payload/Source
Reason for flagging
Suggested remediation steps
Dashboards are built with Seaborn and Matplotlib.
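The four generated report fields (attack type, payload/source, reason for flagging, and suggested remediation steps) map naturally onto a small record type. The sketch below is one possible shape, not the module's actual schema: the class name, field names, and the entries in `REMEDIATION_PLAYBOOK` are hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AlertReport:
    attack_type: str
    payload_or_source: str
    reason: str
    remediation_steps: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the report so a dashboard or case-management UI can consume it."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical lookup table; real remediation guidance would come from the SOC team.
REMEDIATION_PLAYBOOK = {
    "SQLi": ["Use parameterized queries.", "Review input validation on the endpoint."],
    "DDoS": ["Rate-limit the offending sources.", "Notify the upstream provider."],
}


def build_report(attack_type: str, payload_or_source: str, reason: str) -> AlertReport:
    """Assemble a report, falling back to manual review for unknown attack types."""
    steps = REMEDIATION_PLAYBOOK.get(
        attack_type, ["Escalate to an analyst for manual review."]
    )
    return AlertReport(attack_type, payload_or_source, reason, list(steps))
```

A call such as `build_report("SQLi", "' OR 1=1 --", "Tautology pattern in query parameter").to_json()` yields a JSON document with the four fields, which keeps the detectors decoupled from whatever front end renders the alert.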
8. Integration into SOC Framework
All components feed into a central UI.
Real-time alert visualization, case management, and analyst feedback loop.
Modular deployment support (containerized services).