Prediksi harga rumah dengan metode regresi linear berganda dan random forest = Predicting housing prices with multiple linear regression and random forest

Natasha, Zerlina (2024) Prediksi harga rumah dengan metode regresi linear berganda dan random forest = Predicting housing prices with multiple linear regression and random forest. Bachelor thesis, Universitas Pelita Harapan.

Files (available under License Creative Commons Attribution Non-commercial Share Alike; restricted to registered users only unless noted):
Title.pdf (382kB)
Abstract.pdf (902kB)
ToC.pdf (776kB)
Chapter1.pdf (671kB)
Chapter2.pdf (858kB)
Chapter3.pdf (821kB)
Chapter4.pdf (1MB)
Chapter5.pdf (699kB)
Bibliography.pdf (669kB)
Appendices.pdf (21MB, restricted to repository staff only)

Abstract

Data mining is the process of finding useful information and patterns in large amounts of data. It covers data collection, data extraction, data analysis, and statistics, and it can be applied to problems in many fields, one of which is house price prediction. Owning a home is a life goal for many young adults who have just started working. However, houses have also become an investment instrument for many investors, which has driven house prices up. The decision to buy a house is influenced by its location, its facilities, its distance from the business center, and its year of construction. This study aims to predict house prices using Random Forest (RF) and multiple linear regression. The research method is simulation in RStudio with data from kaggle.com.
Model accuracy is evaluated with error measures: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE). The results show that RF performs better than multiple linear regression in predicting house prices. The RF model with 300 trees performs best, with a test-data MSE of 4.58 × 10^10 and a test-data RMSE of 2.14 × 10^5, compared with a test-data MSE of 1.32 × 10^11 and a test-data RMSE of 3.64 × 10^5 for linear regression. Increasing the number of trees in the random forest model from 300 to 500 does not significantly affect performance: the RMSE on the test data differs by less than 0.5%, which indicates that adding trees is unnecessary beyond a certain point.
Further analysis shows that in the multiple regression method, the most influential variables for predicting house prices are CouncilArea, Rooms, Type, Distance, Bathroom, YearBuilt, BuildingArea, Car, Regionname, Landsize, and Propertycount. In the RF method, the most influential variables are Distance, BuildingArea, and YearBuilt. The study concludes that RF outperforms multiple linear regression in predicting house prices, although attention should be paid to possible overfitting, and that adding trees to the RF model is not always necessary beyond the optimal point of 300 trees. This research not only provides knowledge about the selection and optimization of models for house price prediction but also identifies the important factors that influence house prices in both methods. These findings can benefit various parties in the property industry, from home buyers to developers and investors.
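
As an illustration of the four error measures named in the abstract, the following is a minimal sketch in plain Python. It is not the thesis's code, which was written for RStudio. The function name `error_metrics` and the sample prices are hypothetical and chosen only for illustration.

```python
import math


def error_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; prices are assumed positive.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Hypothetical prices for illustration only, not data from the thesis.
actual = [500_000.0, 750_000.0, 1_000_000.0, 1_250_000.0]
predicted = [550_000.0, 700_000.0, 900_000.0, 1_300_000.0]
metrics = error_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4g}")
```

Because RMSE is the square root of MSE, the reported RF figures can be checked against each other: the square root of 4.58 × 10^10 is about 2.14 × 10^5.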

Item Type: Thesis (Bachelor)
Creators:
Creators | NIM | Email
Natasha, Zerlina | NIM01112190029 | zerlina.natasha@gmail.com
Contributors:
Contribution | Contributors | NIDN/NIDK | Email
Thesis advisor | Cahyadi, Lina | NIDN0328077701 | lina.cahyadi@uph.edu
Thesis advisor | Seleky, Jacob Stevy | NIDN0307117005 | jacob.seleky@lecturer.uph.edu
Uncontrolled Keywords: multiple linear regression; random forest; regression; housing price.
Subjects: Q Science > QA Mathematics
Divisions: University Subject > Current > Faculty/School - UPH Karawaci > Faculty of Science and Technology > Mathematics
Current > Faculty/School - UPH Karawaci > Faculty of Science and Technology > Mathematics
Depositing User: Zerlina Natasha
Date Deposited: 16 Jul 2024 02:08
Last Modified: 16 Jul 2024 02:08
URI: http://repository.uph.edu/id/eprint/63960
