Hoax detection in YouTube comments using Cosine Similarity and Naive Bayes

Hiroshi, Ryan (2020) Pendeteksi hoax pada komentar YouTube menggunakan Cosine Similarity dan Naive Bayes [Hoax detection in YouTube comments using Cosine Similarity and Naive Bayes]. Bachelor thesis, Universitas Pelita Harapan.

Files (all available under License Creative Commons Attribution Non-commercial Share Alike):
Text (Title): SKRIPSI RYAN MERGED-1-4_watermark.pdf, 16MB, restricted to registered users only
Text (Abstract): SKRIPSI RYAN MERGED-5-6.pdf, 150kB
Text (ToC): SKRIPSI RYAN MERGED-7-13.pdf, 201kB
Text (Chapter1): SKRIPSI RYAN MERGED-14-19.pdf, 160kB
Text (Chapter2): SKRIPSI RYAN MERGED-20-28.pdf, 284kB, restricted to registered users only
Text (Chapter3): SKRIPSI RYAN MERGED-29-40.pdf, 258kB, restricted to registered users only
Text (Chapter4): SKRIPSI RYAN MERGED-41-63.pdf, 366kB, restricted to registered users only
Text (Chapter5): SKRIPSI RYAN MERGED-64-65.pdf, 81kB, restricted to registered users only
Text (Bibliography): SKRIPSI RYAN MERGED-66-67.pdf, 143kB

Abstract

YouTube is a video-sharing platform where a great many people watch videos and share their opinions in the comment section. Distinguishing true from false claims in YouTube comments is difficult, given how much fake news circulates around us. A system that can detect YouTube comments containing false information is therefore needed. The methods used are Latent Dirichlet Allocation (LDA), Term Frequency-Inverse Document Frequency (TF-IDF), cosine similarity, and Naïve Bayes.

The first step is to collect YouTube comments and a dataset of fake news. The second step is data cleaning: empty or incomplete records are removed. The third step is to determine the topic of each comment using LDA. The fourth step is to compute the weight of every word, separating relevant from irrelevant words, using TF-IDF. The fifth step is to compare the fake news data with the YouTube comments using cosine similarity. The sixth step is to feed the data from the previous step into a Naïve Bayes classifier for prediction.

This research successfully identified the topics of comments indicated as hoaxes and the topics most often used as hoaxes, compared prediction models trained on 80%, 70%, and 60% of the data, and evaluated the prediction models using a confusion matrix. The best performance was achieved with 70% training data and 2-grams, giving an accuracy of 99.989%, a precision of 0.88, and a recall of 0.518.
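
The abstract names the techniques for the cleaning, weighting, and comparison steps but not their exact configuration. As a rough illustration only, the following minimal Python sketch (standard library only) drops empty records, builds TF-IDF vectors over word n-grams, and scores each comment by its highest cosine similarity, u·v / (|u| |v|), to any hoax article. The function names, the record schema (dicts with a "text" key), the regex tokenizer, the plain idf = log(N / df) weighting, and fitting the IDF on comments and hoax articles together are all assumptions, not the thesis's code; the LDA topic step is omitted.

```python
import math
import re
from collections import Counter


def clean_records(records):
    """Step 2: drop empty or incomplete records (hypothetical schema: dicts with a 'text' key)."""
    return [r for r in records if isinstance(r.get("text"), str) and r["text"].strip()]


def ngrams(text, n=2):
    """Lowercase, split on non-word characters, and join consecutive tokens into n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Step 4: one sparse TF-IDF vector per document, tf = count / length, idf = log(N / df)."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(g for doc in counts for g in doc)  # number of documents containing each n-gram
    total = len(docs)
    vectors = []
    for doc in counts:
        length = sum(doc.values()) or 1
        vectors.append({g: (c / length) * math.log(total / df[g]) for g, c in doc.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity u·v / (|u| |v|) of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def max_hoax_similarity(comments, hoaxes, n=2):
    """Step 5: for each comment, the highest cosine similarity to any hoax article."""
    vectors = tfidf_vectors(comments + hoaxes, n)  # IDF fitted on both corpora (assumption)
    comment_vecs, hoax_vecs = vectors[:len(comments)], vectors[len(comments):]
    return [max((cosine(c, h) for h in hoax_vecs), default=0.0) for c in comment_vecs]
```

For example, with the comments "Vaccines contain tracking chips" and "Great video, thanks for sharing" compared on bigrams against the hoax articles "Vaccines contain tracking chips made in secret!" and "The earth is flat, scientists say", `max_hoax_similarity` returns roughly `[0.447, 0.0]`. Such a score is one plausible feature for the later classification step; the abstract does not say how the thesis passes it on.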

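The prediction and evaluation steps can be sketched the same way. The abstract does not state which features reach the classifier, how the data were split, or which Naive Bayes variant was used, so this minimal sketch assumes a multinomial Naive Bayes over word-bigram counts (matching the "2-gram" of the best model) with add-one smoothing, a seeded random split at a training ratio such as 0.8, 0.7, or 0.6, and label 1 for hoax. It does not use the cosine-similarity scores, and all names (`bigrams`, `NaiveBayes`, `split`, `evaluate`) are hypothetical.

```python
import math
import random
import re
from collections import Counter, defaultdict


def bigrams(text):
    """Word 2-grams of a lowercased text."""
    tokens = re.findall(r"\w+", text.lower())
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over bigram counts with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(bigrams(text))
        self.vocab_size = len({g for counts in self.gram_counts.values() for g in counts})
        self.totals = {c: sum(self.gram_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the class
            denom = (self.totals[label] + self.vocab_size) or 1
            for g in bigrams(text):
                # add-one smoothed log likelihood; bigrams unseen in this class count as 0
                score += math.log((self.gram_counts[label][g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def split(texts, labels, train_ratio, seed=0):
    """Shuffle once, then put the first train_ratio of the items into the training set."""
    order = list(range(len(texts)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_ratio)

    def take(ids):
        return [texts[i] for i in ids], [labels[i] for i in ids]

    return take(order[:cut]), take(order[cut:])


def evaluate(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, and recall for the positive (hoax) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

One run of the 80/70/60 comparison would then be `(train_x, train_y), (test_x, test_y) = split(texts, labels, 0.7)`, `model = NaiveBayes().fit(train_x, train_y)`, and `evaluate(test_y, [model.predict(t) for t in test_x])`. The same confusion-matrix arithmetic shows how a 99.989% accuracy can sit beside a recall of 0.518: the total error rate is at most 0.011%, and missed hoaxes alone amount to 1 - 0.518 = 48.2% of the hoax class, so hoax comments can make up at most about 0.011 / 0.482 ≈ 0.023% of the test data.
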
Item Type: Thesis (Bachelor)
Creators:
Creators | NIM | Email
Hiroshi, Ryan | NIM00000022116 | ryanhiroshi@gmail.com
Contributors:
Contribution | Contributors | NIDN/NIDK | Email
Thesis advisor | Murwantara, I Made | NIDN0302057304 | UNSPECIFIED
Thesis advisor | Panduwinata, Frans | NIDN0306028201 | UNSPECIFIED
Additional Information: SK 82-16 HIR p
Uncontrolled Keywords: machine learning; fake news detection; natural language processing
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: University Subject > Current > Faculty/School - UPH Karawaci > School of Information Science and Technology > Informatics
Current > Faculty/School - UPH Karawaci > School of Information Science and Technology > Informatics
Depositing User: Ryan Hiroshi
Date Deposited: 03 Aug 2020 01:35
Last Modified: 08 Sep 2021 07:54
URI: http://repository.uph.edu/id/eprint/9586