A Comparative Evaluation of a Multimodal Approach for Spam Email Classification Using DistilBERT and Structural Features

Asliyuksek, Halim; Tonkal, Özgür; Kocaoglu, Ramazan

doi:10.3390/electronics14193855

Asliyuksek H., Tonkal Ö., Kocaoglu R.

Electronics (Switzerland), vol. 14, no. 19, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 14 Issue: 19
Publication Date: 2025
DOI: 10.3390/electronics14193855
Journal Name: Electronics (Switzerland)
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
Keywords: email filtering, machine learning, natural language processing, spam detection, text classification
Affiliated with Samsun University: Yes

Abstract

This study aims to improve the automatic detection of unwanted emails using advanced machine learning and deep learning methods. Following a review of research from the past five years, a combined dataset of 81,586 email samples was assembled from seven different spam datasets. Class imbalance was addressed with random oversampling and a class-weighted loss, and the decision threshold was subsequently tuned for deployment. Among classical machine learning methods, Random Forest (RF) was the most successful, while Transformer-based deep learning models, such as Distilled Bidirectional Encoder Representations from Transformers (DistilBERT) and Robustly Optimized BERT Pretraining Approach (RoBERTa), demonstrated superior performance. The highest test score (99.62%) on the combined static dataset was achieved with a multimodal architecture that combines deep semantic text representations from DistilBERT with structural text features. Beyond this static benchmark, the study investigates the challenge of concept drift through a temporal analysis on datasets from different eras. The results reveal significant performance degradation in all models when tested on modern spam, highlighting a critical vulnerability of statically trained systems. Notably, the Transformer-based model was more robust to this temporal decay than traditional methods. The study thus offers an effective classification solution and provides empirical evidence for the necessity of adaptive, continually learning systems for robust spam detection.
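
The imbalance handling named in the abstract (random oversampling, class weighting, and decision-threshold tuning) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names are hypothetical, the weights use the common inverse-frequency heuristic n_samples / (n_classes * class_count), and choosing the threshold by F1 on held-out scores is an assumption, since the abstract does not state the tuning criterion.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement from the class to fill the gap to the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def balanced_class_weights(labels):
    """Inverse-frequency weights (assumed heuristic): n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def f1_at_threshold(scores, labels, threshold):
    """F1 for the positive class (label 1) when predicting 1 if score >= threshold."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest F1 (assumed criterion)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
```

In use, `random_oversample` would be applied to the training split only, the weights from `balanced_class_weights` would scale the per-class loss terms, and `tune_threshold` would be run on validation-set spam probabilities before deployment.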