ROSELIAN, AUDREY (2024) Analisis komparatif metode peringkasan abstraktif dan ekstraktif dengan model flan-t5 pada teks berita = Comparative analysis of abstractive and extractive summarization methods with the flan-t5 model on news texts. Bachelor thesis, Universitas Pelita Harapan.
Files (all available under License Creative Commons Attribution Non-commercial Share Alike):
Title.pdf (106kB)
Abstract.pdf (310kB)
ToC.pdf (279kB)
Chapter1.pdf (364kB)
Chapter2.pdf (571kB), restricted to registered users only
Chapter3.pdf (1MB), restricted to registered users only
Chapter4.pdf (2MB), restricted to registered users only
Chapter5.pdf (174kB), restricted to registered users only
Bibliography.pdf (437kB)
Appendices.pdf (1MB), restricted to repository staff only
Abstract
In the digital era, news has become an integral part of daily life, with the
internet serving as the primary source of information. According to data from the
Ministry of Communication and Informatics (Kominfo) in January 2024,
79.5% of Indonesians, or about 221 million people, are internet users. As the
flow of information accelerates, the number of circulating news articles continues
to increase, making it difficult for readers to filter and comprehend relevant
information within a limited time. Therefore, methods that can efficiently
summarize news while preserving essential information are necessary.
To determine the more effective approach for generating accurate and
easily understandable news summaries through both quantitative and qualitative
analysis, this study compares two summarization methods: abstractive and
extractive. The abstractive method generates summaries by reformulating the
original text using natural language processing models, whereas the extractive
method selects important sentences directly from the original text. The
implementation of these methods involves tokenization, modeling with grid
search for hyperparameter tuning in FLAN-T5 for abstractive summarization, as
well as sentence embedding, segmentation using Stanza, and clustering with
KMeans based on cosine similarity for FLAN-T5 extractive summarization.
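
The extractive steps named above (sentence segmentation, sentence embedding, cosine-based clustering, sentence selection) can be illustrated with a minimal sketch. This is not the thesis implementation: the regex splitter stands in for Stanza, the bag-of-words vectors stand in for FLAN-T5 sentence embeddings, and the small clustering loop stands in for a library KMeans. All function names (`split_sentences`, `embed`, `kmeans_cosine`, `extractive_summary`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive regex splitter; an assumption standing in for Stanza segmentation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def unit(vec):
    # Scale a vector to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def embed(sentence, vocab):
    # Toy bag-of-words vector; an assumption standing in for FLAN-T5 embeddings.
    counts = Counter(re.findall(r"\w+", sentence.lower()))
    return unit([float(counts[w]) for w in vocab])


def cosine(a, b):
    # Inputs are unit-length (or all-zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors, k, iters=20):
    # Deterministic farthest-first initialisation, then standard k-means
    # updates with cosine similarity as the assignment criterion.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            clusters[best].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = unit(mean)
    return centroids


def extractive_summary(text, k=2):
    # Pick the sentence nearest each centroid and keep the original order.
    sentences = split_sentences(text)
    k = min(k, len(sentences))
    if k <= 0:
        return []
    vocab = sorted({w for s in sentences for w in re.findall(r"\w+", s.lower())})
    vectors = [embed(s, vocab) for s in sentences]
    centroids = kmeans_cosine(vectors, k)
    chosen = {
        max(range(len(vectors)), key=lambda i: cosine(vectors[i], c))
        for c in centroids
    }
    return [sentences[i] for i in sorted(chosen)]
```

Joining the returned list with spaces gives the summary text. Normalising vectors to unit length is one common way to make a centroid-based clustering behave as if it were driven by cosine similarity; whether the thesis does the same is not stated in the abstract.
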
This study successfully fine-tuned the FLAN-T5 model for both
abstractive and extractive summarization and compared their performance. For
abstractive summarization, the model was fine-tuned using a learning rate of 2e-5
for 9 epochs with a dropout of 0.2. Extractive summarization utilized sentence
embeddings with FLAN-T5, segmentation using Stanza, and clustering with
KMeans. The quality of the summarization was evaluated using ROUGE-1,
ROUGE-2, ROUGE-L, and BERTScore metrics. The results indicate that the
extractive method outperformed the abstractive method, achieving ROUGE-1,
ROUGE-2, and ROUGE-L scores of 78.35, 70.64, and 77.98, respectively, along
with a BERTScore of 86.71. In contrast, abstractive summarization recorded
lower ROUGE-1, ROUGE-2, and ROUGE-L scores of 56.69, 43.88, and 52.79, with a
BERTScore of 81.13. Although the abstractive approach has the potential for
creativity, the extractive method proved to be more effective in terms of
quality, time efficiency, and accuracy.
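
For readers unfamiliar with the ROUGE metrics reported above, the following is a simplified sketch of how ROUGE-N and ROUGE-L F1 scores are commonly computed. It assumes lowercase whitespace tokenization with no stemming, so its numbers would not match a full ROUGE package or the thesis's evaluation setup exactly. It returns values in the range 0 to 1, while the abstract's figures appear to be on a 0 to 100 scale. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    # Harmonic mean of precision (overlap / candidate) and recall (overlap / reference).
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    # ROUGE-N: clipped n-gram overlap between candidate and reference.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence via dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    # ROUGE-L: F1 based on the longest common subsequence of tokens.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

Because extractive summaries reuse source sentences verbatim, they tend to share long n-grams with references drawn from the same text, which is one plausible reason n-gram overlap metrics favor them; the abstract itself does not state this explanation.
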
Item Type: | Thesis (Bachelor) |
---|---|
Creators: | ROSELIAN, AUDREY (NIM: NIM01082210018; Email: roselianaudrey@gmail.com; ORCID: UNSPECIFIED) |
Contributors: | Samosir, Feliks Victor Parningotan (Contribution: Contributor; NIDN/NIDK: NIDN0319049302; Email: feliks.parningotan@uph.edu) |
Subjects: | T Technology > T Technology (General) |
Divisions: | University Subject > Current > Faculty/School - UPH Karawaci > School of Information Science and Technology > Informatics |
Depositing User: | Magang Input |
Date Deposited: | 17 May 2025 02:14 |
Last Modified: | 17 May 2025 02:14 |
URI: | http://repository.uph.edu/id/eprint/68412 |