SuDocu

SuDocu is an example-based personalized document summarization system: users provide example summaries, the system learns the summarization intent from those examples, and it then produces summaries for new documents that reflect that intent.

Project Summary

Text document summarization refers to the task of producing a brief representation of a document for easy human consumption. Existing text summarization techniques mostly focus on generic summarization, but users often require personalized summarization that targets their specific preferences and needs. However, precisely expressing preferences is challenging, and current methods are often ambiguous, outside the user's control, or require costly training data. We propose a novel and effective way to express summarization intent (preferences) via examples: the user provides a few example summaries for a small number of documents in a collection, and the system summarizes the rest. We demonstrate SuDocu, an example-based personalized Document Summarization system. Through a simple interface, SuDocu lets users provide example summaries, learns the summarization intent from the examples, and produces summaries for new documents that reflect the user's summarization intent. SuDocu further explains the captured summarization intent in the form of a package query, an extension of a traditional SQL query that handles complex constraints and preferences over answer sets. SuDocu combines topic modeling, semantic similarity discovery, and in-database optimization in a novel way to achieve example-driven document summarization. We demonstrate how SuDocu can detect complex summarization intents from a few example summaries and produce accurate summaries for new documents effectively and efficiently.
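To make the idea of "summarization intent as constraints over a set of sentences" more concrete, here is a minimal toy sketch in Python. It is not SuDocu's implementation: the function names (`learn_bounds`, `summarize`), the representation of a sentence as a tuple of topic scores, the fixed summary size, and the objective are all assumptions made for illustration. A brute-force search stands in for the package query and in-database optimization described above, and the topic scores are assumed to come from some topic model that is not shown.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for floating-point comparisons


def learn_bounds(example_summaries):
    """Return one (low, high) range per topic, taken from the example summaries.

    Each summary is a list of sentences; each sentence is a tuple of topic
    scores (hypothetical representation, e.g. output of a topic model).
    """
    n_topics = len(example_summaries[0][0])
    totals = [
        [sum(sentence[t] for sentence in summary) for t in range(n_topics)]
        for summary in example_summaries
    ]
    return [
        (min(tot[t] for tot in totals), max(tot[t] for tot in totals))
        for t in range(n_topics)
    ]


def summarize(sentences, bounds, size):
    """Pick `size` sentence indices whose per-topic sums fall inside `bounds`.

    Brute-force search, suitable for toy inputs only. Among feasible subsets,
    the one with the largest total topic score wins (an arbitrary objective
    chosen for illustration). Returns None if no subset is feasible.
    """
    best, best_score = None, float("-inf")
    for combo in combinations(range(len(sentences)), size):
        sums = [sum(sentences[i][t] for i in combo) for t in range(len(bounds))]
        feasible = all(
            lo - EPS <= s <= hi + EPS for s, (lo, hi) in zip(sums, bounds)
        )
        if feasible and sum(sums) > best_score:
            best, best_score = combo, sum(sums)
    return best
```

For instance, calling `learn_bounds` on two example summaries and passing the resulting bounds to `summarize` together with the sentences of a new document returns the indices of the sentences whose combined topic coverage matches the examples. A real system would replace the exhaustive search with a constrained optimizer, since the number of candidate subsets grows combinatorially with document length.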

VLDB 2020 Talk

System Architecture

People

Publications

  • Nishant Yadav, Oscar Youngquist, Anna Fariha, Matteo Brucato, Julian Killingback, Peter J. Haas, and Alexandra Meliou. (submission under review). 2021.
  • Anna Fariha, Matteo Brucato, Peter J. Haas, and Alexandra Meliou. SuDocu: Summarizing Documents by Example. PVLDB, 13(12). 2020.
    Won the best demonstration runner-up award at VLDB 2020.

Acknowledgement

This material is based upon work supported by the NSF under grants IIS-1453543, IIS-1943971, and CCF-1763423.