Christian, Dennis (2022) Penggunaan machine learning untuk prediksi hasil pemilu dan membandingkan merek menggunakan analisis sentimen data Twitter = Implementation of machine learning for election results prediction and comparing brands using sentiment analysis of Twitter data. Bachelor thesis, Universitas Pelita Harapan.
Title.pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (157kB) | Preview
Abstract(1).pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (371kB) | Preview
ToC.pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (227kB) | Preview
Chapter1.pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (251kB) | Preview
Chapter2.pdf
Restricted to Registered users only
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (454kB)
Chapter3.pdf
Restricted to Registered users only
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (270kB)
Chapter4.pdf
Restricted to Registered users only
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (478kB)
Chapter5.pdf
Restricted to Registered users only
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (597kB)
Chapter6.pdf
Restricted to Registered users only
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (233kB)
Bibliography.pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (216kB) | Preview
Appendices.pdf
Restricted to Repository staff only
Available under License Creative Commons Attribution Non-commercial Share Alike.
Download (2MB)
Abstract
The ease of data collection in the digital era allows machine learning models to be trained on very large datasets, which in turn drives the technology forward. For example, Twitter data can be used for sentiment analysis, that is, classifying posts as positive, negative, and optionally neutral. The machine learning models tested are Naive Bayes and two types of deep learning model, namely the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). Deep learning is the part of machine learning that uses neural network algorithms. The three models work in different ways: Naive Bayes predicts the category of a sentence by calculating probabilities with Bayes' theorem, a CNN convolves the input data before passing it to the neural network, and an RNN has a memory that lets it retain previous inputs and process them together with later inputs.
To compare the performance of the three models, training was carried out on an English-language dataset with positive and negative categories, downloaded from thinknook. Of the three models, the author chose the RNN because it had the highest accuracy. To improve its accuracy further, the author filtered the words in the training and testing datasets using Stanford NER, removing words on topics that carry little sentiment. The author also shifted the output threshold. After these optimization procedures, the model reached an accuracy of 78.11% when tested on Twitter data that the author had labeled manually.
The improved model was then used to analyze Twitter sentiment towards the brands Coca-Cola, Pepsi, Steam, Epic Games, and Nokia, but the results show that the percentage of positive sentiment towards a brand is not strongly correlated with that brand's performance. The model was then used to analyze Twitter sentiment towards presidential candidates in the United States and Indonesia. This sentiment analysis was found to predict which candidate received the most votes.
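The abstract describes Naive Bayes as scoring a sentence with Bayes' theorem, and mentions shifting the output threshold. The following is a minimal sketch of both ideas, not the thesis's implementation, which is not disclosed on this page. The toy tweets, the `pos`/`neg` labels, the function names, and the add-one smoothing are all assumptions made for illustration. The thesis applies threshold shifting to the RNN output; here it is applied to the Naive Bayes posterior only because that keeps the example self-contained.

```python
import math
from collections import Counter

# Hypothetical toy data; the thesis trains on a much larger English dataset.
TOY_TWEETS = [
    ("i love this phone", "pos"),
    ("great game love it", "pos"),
    ("i hate this drink", "neg"),
    ("terrible game hate it", "neg"),
]

def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def positive_probability(text, model):
    """Return P(pos | text) using Bayes' theorem with add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        # Log prior: log P(class)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Log likelihood: log P(word | class), smoothed to avoid zeros
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Normalise the joint scores into a posterior over the classes
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["pos"] / sum(weights.values())

def classify(text, model, threshold=0.5):
    """Label text as positive only if P(pos | text) reaches the threshold."""
    return "pos" if positive_probability(text, model) >= threshold else "neg"

model = train_naive_bayes(TOY_TWEETS)
print(positive_probability("love this", model))
print(classify("love this", model))
print(classify("love this", model, threshold=0.8))
```

Raising `threshold` above 0.5 makes the classifier more reluctant to call a tweet positive, which is one way to trade false positives against false negatives on manually labeled test data.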
Item Type: | Thesis (Bachelor) |
---|---|
Creators: | Christian, Dennis (NIM: NIM01032180010; Email: denyz3001@gmail.com; ORCID: UNSPECIFIED) |
Contributors: | Thesis advisor: Martoyo, Ihan (NIDN/NIDK: NIDN0318057301; Email: ihan.martoyo@uph.edu) |
Uncontrolled Keywords: | Sentiment analysis; Machine learning; RNN; Twitter |
Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
Divisions: | University Subject > Current > Faculty/School - UPH Karawaci > Faculty of Science and Technology > Electrical Engineering; Current > Faculty/School - UPH Karawaci > Faculty of Science and Technology > Electrical Engineering |
Date Deposited: | 21 Feb 2022 07:19 |
Last Modified: | 21 Feb 2022 07:19 |
URI: | http://repository.uph.edu/id/eprint/46517 |