Handling Data Imbalance in Measles Classification Using a Combination of SMOTE and XGBoost
Keywords:
measles, classification, SMOTE, XGBoost, imbalanced data
Abstract
Data imbalance is one of the main challenges in developing disease classification models, because it biases algorithms toward the majority class and weakens their ability to detect positive cases. This study analyzes the combination of the Synthetic Minority Over-sampling Technique (SMOTE) and XGBoost for measles classification. The data consisted of 1,000 records with the clinical features age, immunization history, fever, cough, runny nose, conjunctivitis, and skin rash, together with a measles status label. The data were divided into two subsets: 80% for model training and 20% for testing. SMOTE was applied to the training data to address the class imbalance, and XGBoost was used to build the classification model. Model performance was evaluated using a confusion matrix and the metrics accuracy, precision, recall, and F1-score. XGBoost without SMOTE achieved an accuracy of 94.0%, precision of 83.3%, recall of 50.0%, and F1-score of 62.5%. After applying SMOTE, accuracy rose to 97.0%, recall to 95.0%, and F1-score to 86.4%, while precision fell slightly to 79.2%. These results indicate that the combination of SMOTE and XGBoost is more effective at detecting positive measles cases in imbalanced data.
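The reported F1-scores can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The short Python sketch below is illustrative only and is not the authors' evaluation code; the function name `f1_score` and the dictionary layout are assumptions made for this example, while the precision and recall values are taken from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, converted to fractions.
# The dictionary structure is a hypothetical convenience for this sketch.
reported = {
    "XGBoost without SMOTE": {"precision": 0.833, "recall": 0.500},
    "XGBoost with SMOTE": {"precision": 0.792, "recall": 0.950},
}

# F1 expressed as a percentage, rounded to one decimal place.
f1_percent = {
    name: round(100 * f1_score(m["precision"], m["recall"]), 1)
    for name, m in reported.items()
}

for name, value in f1_percent.items():
    print(f"{name}: F1 = {value}%")
```

Both results match the abstract (62.5% and 86.4%). The comparison also shows why accuracy alone is a weak guide on imbalanced data: accuracy differs by only three points between the two models, while recall on the positive class nearly doubles.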
License
Copyright (c) 2026 Novita Ranti Muntiari, Kharis Hudaiby Hanif, Muliyadi, Mufida

This work is licensed under a Creative Commons Attribution 4.0 International License.
Universitas Harapan Medan





