TASP: Topic-based abstractive summarization of Facebook text posts
Trevisan, Martino
2024-01-01
Abstract
Summarizing trending topics in large collections of Facebook posts is particularly relevant for profiling social users' activities and interests. However, automatically generating these summaries poses significant challenges due to the high heterogeneity of the input data, the limited fluency of extractive summaries, and the absence of abstractive summarization methods capable of handling multiple posts simultaneously. Existing abstractive models are either not suited to handle large post collections or disregard topic-level text relations. In this work, we present TASP, a novel tool for trending topic detection and summarization from English-written Facebook posts. It trains abstractive summarization models on multi-post collections by leveraging a shortlist of authoritative posts published by renowned newspapers. At inference time, TASP first creates clusters of semantically similar social posts, each one representing a distinct topic, using pre-trained transformer-based language models. Then, it generates abstractive summaries of the clusters for which authoritative information is missing. To the best of our knowledge, TASP is the first available tool suited to abstractive multi-post summarization. We test our approach on a large-scale dataset of real Facebook posts. The results show (1) the higher effectiveness of transformer-based approaches in generating topic-specific post clusters compared to traditional methods, and (2) the importance of attending to long pieces of text in multi-post abstractive summary generation.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
1-s2.0-S0957417424014349-main.pdf | Closed access | Publisher's version | Publisher copyright | 1.74 MB | Adobe PDF
1-s2.0-S0957417424014349-main-Post_print.pdf | Embargoed until 04/04/2025 | Final post-refereeing draft (post-print) | Creative Commons | 2.25 MB | Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.