Multimodal Learning Using EfficientNetV2 and DistilBERT for Illegal Website Detection and Its Implementation in a Browser Extension
Abstract
In Indonesia, domain blocking of illegal content such as online gambling, pornography, and piracy is often ineffective because users bypass it with VPNs and operators move to new domains. To address this, the study proposes a Chromium browser extension based on multimodal learning. The model combines visual features from EfficientNetV2-M, textual features from multilingual DistilBERT, and HTML structural features. Early fusion concatenates them into a single 2,187-dimensional vector, and a multilayer perceptron (MLP) classifies that vector into four categories: normal, online gambling, pornography, and piracy. On a newly constructed dataset of 16,224 samples, the text-and-image combination performed best, with an accuracy of 88.74%, a macro F1 score of 0.8275, and a macro recall of 0.812. For real-world robustness, the extension uses all three modalities (text, image, and HTML) and correctly classified 36 of 40 unseen websites. All misclassifications occurred in the digital piracy category. The likely causes are limited training data and the strong visual similarity between piracy sites and legitimate websites.
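The early-fusion and MLP classification pipeline described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the 2,187 fused dimensions split as 1,280 image features (the pooled output width of EfficientNetV2-M), 768 text features (the DistilBERT hidden size), and 139 HTML features (simply the remainder). It also assumes a single hidden layer of 256 units, with random weights standing in for trained parameters. The helper names `early_fusion`, `mlp_predict`, and `macro_scores` are illustrative only.

The sketch also computes macro recall and macro F1 using their standard definitions. Both average per-class scores with equal weight, so a weak minority class such as piracy counts as much as the dominant normal class.

```python
import math
import random

# Hypothetical split of the reported 2,187-dim fused vector (assumption):
# 1280 image + 768 text + 139 HTML features.
IMAGE_DIM, TEXT_DIM, HTML_DIM = 1280, 768, 139
FUSED_DIM = IMAGE_DIM + TEXT_DIM + HTML_DIM  # 2187
HIDDEN_DIM = 256  # hypothetical hidden-layer width
CLASSES = ["normal", "online_gambling", "pornography", "piracy"]


def early_fusion(image_vec, text_vec, html_vec):
    """Early fusion: concatenate per-modality features into one vector."""
    if (len(image_vec), len(text_vec), len(html_vec)) != (IMAGE_DIM, TEXT_DIM, HTML_DIM):
        raise ValueError("unexpected feature dimensions")
    return list(image_vec) + list(text_vec) + list(html_vec)


def init_layer(n_in, n_out, rng):
    """Random weights stand in for trained parameters (illustration only)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, bias):
    """Fully connected layer: y_j = sum_i W[j][i] * x_i + b_j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_predict(x, layers):
    """Forward pass: ReLU on hidden layers, softmax over the four classes."""
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU
    probs = softmax(h)
    return CLASSES[probs.index(max(probs))], probs


def macro_scores(y_true, y_pred, labels):
    """Return (macro recall, macro F1): unweighted means over classes."""
    pairs = list(zip(y_true, y_pred))
    recalls, f1s = [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


rng = random.Random(0)
layers = [init_layer(FUSED_DIM, HIDDEN_DIM, rng), init_layer(HIDDEN_DIM, len(CLASSES), rng)]
x = early_fusion([rng.random() for _ in range(IMAGE_DIM)],
                 [rng.random() for _ in range(TEXT_DIM)],
                 [rng.random() for _ in range(HTML_DIM)])
label, probs = mlp_predict(x, layers)
print(len(x), label)  # 2187 and an arbitrary class, since weights are untrained

# Toy evaluation with one under-represented class (hypothetical labels).
y_true = ["normal"] * 4 + ["piracy"] * 2
y_pred = ["normal"] * 5 + ["piracy"]
print(macro_scores(y_true, y_pred, ["normal", "piracy"]))
# accuracy is 5/6 (~0.833); macro recall 0.75, macro F1 ~0.778
```

In the toy evaluation, one missed piracy sample barely moves accuracy but halves that class's recall, which pulls the macro average down. A similar effect may help explain why the reported macro recall (0.812) sits below the reported accuracy (88.74%).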
Copyright (c) 2026 Sahal Maghfud, Ulfa Khaira, Akhiyar Waladi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Universitas Harapan Medan