Implementasi machine learning dengan algoritma logistic regression dan random forest untuk prediksi performa calon mahasiswa baru = Implementation of machine learning with logistic regression and random forest algorithm to predict performance of prospective students

Chandra, Gerry (2020) Implementasi machine learning dengan algoritma logistic regression dan random forest untuk prediksi performa calon mahasiswa baru = Implementation of machine learning with logistic regression and random forest algorithm to predict performance of prospective students. Bachelor thesis, Universitas Pelita Harapan.

Files (all available under License Creative Commons Attribution Non-commercial Share Alike):
Title.pdf (Title), 1MB, restricted to registered users only
Abstract.pdf (Abstract), 90kB
ToC.pdf (ToC), 568kB
Chapter1.pdf (Chapter1), 53kB
Chapter2.pdf (Chapter2), 180kB, restricted to registered users only
Chapter3.pdf (Chapter3), 419kB, restricted to registered users only
Chapter4.pdf (Chapter4), 690kB, restricted to registered users only
Chapter5.pdf (Chapter5), 361kB, restricted to registered users only
Chapter6.pdf (Chapter6), 41kB, restricted to registered users only
Bibliography.pdf (Bibliography), 113kB
Appendices.pdf (Appendices), 2MB, restricted to repository staff only

Abstract

The growth of the internet in industrial applications has affected educational institutions. One visible effect is the digitalization of most university academic processes. This makes processing large volumes of data crucial for helping the parties concerned make better decisions. One problem that can be addressed with the available educational data is the low percentage of students who graduate on time, which affects the accreditation of universities in Indonesia. This research aims to build a model that can predict the performance of prospective students. The main model is a machine learning classifier using the Logistic Regression and Random Forest algorithms, with school information, high school scores, and parent or guardian information as input variables. The models are built using data on Universitas Pelita Harapan students from the 2018-2019 academic year. Model performance is then measured using classification evaluation metrics: accuracy, precision, recall, F1-score, and AUC. Both main models achieve an accuracy and recall of 0.74, a precision of 0.22, and an F1-score of 0.34. The AUC is 0.82 for the Logistic Regression model and 0.79 for the Random Forest model. Model exploration is also carried out to build a model that is more balanced across class labels, as well as a simpler model that uses only two predictors, the high school English and Mathematics scores. These additional models do not achieve metric values as good as those of the main models.
Feature importance analysis from the Random Forest algorithm shows that the high school English and Mathematics scores are the two most important variables for determining the performance of prospective students.
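
The precision, recall, and F1-score reported in the abstract can be checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below is only an arithmetic illustration using the rounded values from the abstract; the function name `f1_score` is a hypothetical helper defined here, not code from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the abstract for both main models.
precision = 0.22
recall = 0.74

f1 = f1_score(precision, recall)
print(round(f1, 2))  # prints 0.34, matching the reported F1-score
```

The low precision next to the much higher recall is what pulls the F1-score down to 0.34, even though accuracy is 0.74.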

Item Type: Thesis (Bachelor)
Creators:
Creators | NIM | Email
Chandra, Gerry | NIM00000024282 | gerry.chandra@ymail.com
Contributors:
Contribution | Contributors | NIDN/NIDK | Email
Thesis advisor | Martoyo, Ihan | NIDN0318057301 | UNSPECIFIED
Uncontrolled Keywords: educational data mining; classification; logistic regression; machine learning; random forest
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: University Subject > Current > Faculty/School - UPH Karawaci > Faculty of Science and Technology > Electrical Engineering
Current > Faculty/School - UPH Karawaci > Faculty of Science and Technology > Electrical Engineering
Date Deposited: 20 Feb 2020 03:07
Last Modified: 15 Jul 2020 06:58
URI: http://repository.uph.edu/id/eprint/7628
