Comparison of the performance of the Naive Bayes Classifier and K-Nearest Neighbor algorithms in sentiment analysis on social media X with the Vader Lexicon

Thiang, Steven (2025) Komparasi kinerja algoritme Naive Bayes Classifier dan K-Nearest Neighbor dalam analisis sentimen pada media sosial X dengan Vader Lexicon. Bachelor thesis, Universitas Pelita Harapan.

Files: Title.pdf (234kB), Abstract.pdf (419kB), ToC.pdf (759kB), Chapter1.pdf (886kB), Chapter2.pdf (3MB), Chapter3.pdf (7MB), Chapter4.pdf (4MB), Chapter5.pdf (352kB), Bibliography.pdf (501kB), Appendices.pdf (7MB)
Access: Restricted to Registered users only, except Appendices.pdf, which is Restricted to Repository staff only.
License: Creative Commons Attribution Non-commercial Share Alike.

Abstract

The growing use of social media for expressing public opinion has made platform X (formerly Twitter) an important data source for sentiment analysis. However, the ever-growing volume of data makes manual analysis inefficient, so accurate and efficient automated methods are needed. This study compares the performance of the Naïve Bayes Classifier and K-Nearest Neighbor (KNN) algorithms in classifying sentiment on the topic of the Value Added Tax (VAT) increase on social media platform X. To support classification accuracy, sentiment labels are assigned automatically using the Vader Lexicon. The research methodology comprises scraping data from X, automatic sentiment labeling, implementing and training the classification models, and evaluating performance using a confusion matrix and ROC curve. The results show that KNN with k = 1 performed best, with an accuracy of 93.19%, precision of 94.07%, recall of 92.96%, misclassification error of 6.81%, and AUC of 0.95. By comparison, the Naïve Bayes Classifier achieved an accuracy of 88.29%, precision of 87.43%, recall of 86.67%, misclassification error of 11.71%, and AUC of 0.93. The study therefore concludes that KNN classifies sentiment more accurately and efficiently than the Naïve Bayes Classifier.
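
The metrics quoted in the abstract are all derived from a confusion matrix, and the reported accuracy and misclassification error are complements (93.19% + 6.81% = 100%, and 88.29% + 11.71% = 100%). The sketch below shows the usual definitions for the binary case. It is an illustration only: the function name `binary_metrics` and the counts are hypothetical and not taken from the thesis, and the abstract does not say how precision and recall were averaged across sentiment classes.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. All results are returned as fractions in [0, 1].
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        # Misclassification error is the complement of accuracy.
        "misclassification_error": 1.0 - accuracy,
    }


# Hypothetical counts for illustration; NOT results from the thesis.
example = binary_metrics(tp=45, fp=5, fn=10, tn=40)
for name, value in example.items():
    print(f"{name}: {value:.2%}")
```

With more than two sentiment classes, the same per-class quantities would typically be averaged (for example, macro-averaged) to obtain a single precision and recall figure.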
Item Type: Thesis (Bachelor)
Creators: Thiang, Steven (NIM: NIM03082210020; Email: steventhiang3@gmail.com; ORCID: UNSPECIFIED)
Contributors: Thesis advisor: Chandra, Wenripin (NIDN/NIDK: NIDN0116088001; Email: wenripin@lecturer.uph.edu)
Uncontrolled Keywords: Sentiment Analysis; Social Media X; Naïve Bayes Classifier; K-Nearest Neighbor; Vader Lexicon; Text Classification
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: Steven Thiang
Date Deposited: 18 Jul 2025 07:38
Last Modified: 18 Jul 2025 07:38
URI: http://repository.uph.edu/id/eprint/69722
