A news feature from the New York Times has a different tone than the average Reddit post. Indeed, the variety of writing styles and grammatical structures makes automatic text summarization a major challenge. For this reason, researchers from Pittsburgh and Microsoft Research's Future Social Experiences (FUSE) laboratory, which focuses on real-time and media-rich experiences, have developed an AI system that takes the beginning of the documents being summarized into account. The team says this approach improved performance in experiments, both on web forum content and on more general forms of text data.
The study follows the publication of a Microsoft Research study describing a "flexible" AI system that reasons about relationships in "weakly structured" texts. The co-authors of that study claim it could outperform traditional natural language processing models on a number of text summarization tasks.
As the researchers point out, forum threads typically begin with a post that seeks knowledge or help, followed by comments that tend to respond to the original post by providing additional information or opinions. This opening text often contains important topical information that could be useful for the summary.
The proposed AI exploits this dependency between original posts and responses, but also tries to filter out irrelevant or superficial responses so that they do not degrade the summary.
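The idea of weighing replies against the opening post can be illustrated with a much simpler stand-in. The sketch below is not the researchers' model, which is described as using bidirectional attention; it only assumes that a reply sharing more vocabulary with the opening post is more likely to be relevant. The function names (`bag_of_words`, `cosine`, `rank_replies`) and the sample thread are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_replies(opening_post, replies, top_k=2):
    """Return the top_k replies most similar to the opening post.

    Word overlap is an illustrative assumption, standing in for the
    learned attention described in the article.
    """
    opening_vec = bag_of_words(opening_post)
    scored = [(cosine(opening_vec, bag_of_words(r)), r) for r in replies]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [reply for _, reply in scored[:top_k]]


# Hypothetical forum thread, invented for illustration only.
opening = "Which hotel near the Eiffel Tower is good for families?"
replies = [
    "We stayed at a family hotel near the Eiffel Tower and loved it.",
    "Thanks for asking!",
    "Try the hotel on Rue Cler, it is good for families with kids.",
]
selected = rank_replies(opening, replies, top_k=2)
```

With this toy thread, the superficial "Thanks for asking!" reply scores lowest and is left out of `selected`, while the two replies that actually discuss hotels are kept.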
The researchers trained and evaluated their model on two summarization corpora: one from a TripAdvisor forum with 700 threads (500 of which were used for training and 200 for validation and testing) and one with 532 Microsoft Word documents subjects ( of which 266, 1
In the future, the researchers plan to integrate more general data sets into the training and testing phases to further validate their approach. They also plan to vary the number of sentences from the first part of generic documents that is fed into the model.
"We take advantage of the tendency to introduce important information early in the text by attending to the first few sentences in generic text data," they wrote in a paper describing their work. "Evaluations showed that attending to introductory sentences with bidirectional attention improved the performance of extractive summarization models [even when] applied to more general form[s] of text data."