Imputation Using Statistical and Machine Learning Methods in Forecasting Life Expectancy

Hayashi, Sergius Tadao (2019) Imputation Using Statistical and Machine Learning Methods in Forecasting Life Expectancy. Bachelor thesis, Universitas Pelita Harapan.

Title.pdf (Text, 1MB). Restricted to registered users only.
Abstract.pdf (Text, 274kB).
ToC.pdf (Text, 303kB).
Chapter1.pdf (Text, 270kB).
Chapter2.pdf (Text, 381kB). Restricted to registered users only.
Chapter3.pdf (Text, 581kB). Restricted to registered users only.
Chapter4.pdf (Text, 339kB). Restricted to registered users only.
Chapter5.pdf (Text, 1MB). Restricted to registered users only.
Chapter6.pdf (Text, 267kB). Restricted to registered users only.
Bibliography.pdf (Text, 317kB).
Appendices.pdf (Text, 836kB). Restricted to repository staff only.
All files are available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Collecting, processing and reporting data are essential to decision-making. Yet even in a well-designed and controlled study, missing data can still occur. Missing data reduce the statistical power of a dataset and its usefulness for training machine learning models. This thesis aims to compare six imputation methods, three statistical and three machine learning, on life expectancy data in order to determine the optimal method for data with this type of missingness pattern. The life expectancy data consist of 22 social, economic and health variables for 194 countries, collected from the World Health Organization and World Bank databases. An artificial dataset was built to simulate the missingness of the original dataset, so that the performance of each method could be measured with error metrics. The artificial dataset mimics the original dataset's missingness patterns and the nullity correlation between variables. The imputed artificial datasets were evaluated by their mean squared error, mean absolute error and mean absolute percentage error, while the original dataset was evaluated by the changes in its mean and variance. Surprisingly, given that the multi-layer perceptron was run for 10 iterations, the Hot-Deck and K-Nearest Neighbor (KNN) methods showed the best results among the statistical and machine learning methods, respectively, with Hot-Deck slightly outperforming KNN.
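The evaluation idea described in the abstract (hide known values, impute them, then score the imputations with MSE, MAE and MAPE) can be sketched in Python. This is a minimal sketch under several assumptions and not the thesis's implementation: the hidden cells are hand-picked rather than generated to reproduce the original dataset's missingness pattern and nullity correlation; hot-deck is the random-donor variant; KNN uses an unweighted mean of k = 3 neighbours with a root-mean-square distance over commonly observed columns; the toy values are made up. Multiple imputation, the multi-layer perceptron, the self-organizing map and the mean/variance check on the original dataset are not shown.

```python
import math
import random
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Table = List[List[Optional[float]]]  # None marks a missing cell
Cell = Tuple[int, int]  # (row index, column index)


def apply_mask(data: Table, cells: Sequence[Cell]) -> Table:
    """Return a copy of `data` with the given cells set to None (missing)."""
    masked = [row[:] for row in data]
    for i, j in cells:
        masked[i][j] = None
    return masked


def column_means(data: Table) -> List[float]:
    """Mean of the observed (non-None) values in each column."""
    return [mean(v for v in col if v is not None) for col in zip(*data)]


def mean_impute(data: Table) -> Table:
    """Replace each missing cell with its column mean."""
    means = column_means(data)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def hot_deck_impute(data: Table, seed: int = 0) -> Table:
    """Random hot-deck: copy a randomly chosen observed value (donor) from the same column."""
    rng = random.Random(seed)
    donors = [[v for v in col if v is not None] for col in zip(*data)]
    return [[rng.choice(donors[j]) if v is None else v for j, v in enumerate(row)] for row in data]


def knn_impute(data: Table, k: int = 3) -> Table:
    """Fill each missing cell with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns both rows observe.
    Columns are not standardized here, so large-scale variables dominate the distance.
    """
    fallback = column_means(data)
    out = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue  # a donor row must observe the target column
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            out[i][j] = mean(nearest) if nearest else fallback[j]
    return out


def error_metrics(truth: Table, imputed: Table, cells: Sequence[Cell]) -> dict:
    """MSE, MAE and MAPE (%) computed over the artificially hidden cells only."""
    pairs = [(truth[i][j], imputed[i][j]) for i, j in cells]
    return {
        "MSE": mean((t - p) ** 2 for t, p in pairs),
        "MAE": mean(abs(t - p) for t, p in pairs),
        "MAPE": 100 * mean(abs((t - p) / t) for t, p in pairs),  # undefined if a true value is 0
    }


# Toy complete data (rows = countries, columns = variables); the values are made up.
complete = [
    [71.2, 5.1, 88.0],
    [65.4, 3.9, 72.5],
    [80.1, 7.8, 95.2],
    [58.7, 2.6, 60.3],
    [76.3, 6.4, 90.1],
    [69.0, 4.8, 81.7],
]
hidden = [(0, 1), (3, 2), (5, 0)]  # hand-picked cells to hide, then impute and score
masked = apply_mask(complete, hidden)

for name, impute in [("mean", mean_impute), ("hot-deck", hot_deck_impute), ("knn", knn_impute)]:
    print(name, error_metrics(complete, impute(masked), hidden))
```

MSE squares each error, so it penalizes a single large miss more heavily than MAE does, while MAPE expresses errors relative to the true values and is undefined when a true value is zero.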
Item Type: Thesis (Bachelor)
Creators: Hayashi, Sergius Tadao (NIM: NIM00000013162; Email: UNSPECIFIED; ORCID: UNSPECIFIED)
Contributors:
Thesis advisor: Saputra, Kie Van Ivanky (NIDN/NIDK: NIDN0401038203; Email: UNSPECIFIED)
Thesis advisor: Ferdinand, Ferry Vincenttius (NIDN/NIDK: NIDN0323059001; Email: UNSPECIFIED)
Additional Information: SK 112-15 HAY i 2019; 31001000244419
Uncontrolled Keywords: Missing Data; Imputation; Statistical Imputation; Machine Learning Imputation; Mean Imputation; Hot-Deck Imputation; Multiple Imputation; Multi-Layer Perceptron; Self-Organizing Map; K-Nearest Neighbor
Subjects: Q Science > QA Mathematics
Divisions: University Subject > Current > Faculty/School - UPH Karawaci > Faculty of Science and Technology > Mathematics
Depositing User: Nicholas Sio Pradiva
Date Deposited: 09 Nov 2021 08:10
Last Modified: 09 Nov 2021 08:10
URI: http://repository.uph.edu/id/eprint/42926
