A Study of Human Summaries of Scientific Articles
1. Introduction
What are the differences between human summaries and automatic summaries?
Human summaries offer deeper insights, which can be used to improve existing automatic summarization systems and adapt them to the domain of scientific papers.
Human summaries tend to be long and detailed, and they contain headlines and figures from the original papers.
Automatic summarization, by contrast:
- focuses on generating relatively short summaries (150-200 words);
- produces an abstract-like structure, lacking other summarization constructs used by humans such as headlines and figures;
- mostly relies on citations to pinpoint the important parts of a scientific paper, but newly published papers do not have enough citations to support such an analysis.
Datasets for automatic scientific summarization include Scisumm and ScisummNet.
To address these problems, the paper studies a dataset for scientific summarization based on long human summaries authored by ShortScience.org users. The goal is to study the characteristics of human scientific summaries and to propose summaries published on blogs as a potential benchmark for automatic summarization.
2. Dataset
ShortScience is an open platform for publishing summaries of scientific papers in the domains of Computer Science, Physics and Biology. The website provides minimal instructions on how to write a summary and there is a large variation in summary length and structure.
How is the data processed?
- Fetch 561 summaries associated with 491 papers;
- Papers are from Arxiv, NeurIPS, ACL, Springer;
- Utilize NLTK for word tokenization and sentence segmentation;
- Use Science-Parse to extract the PDF text; it outputs a JSON record containing the abstract text, metadata (such as authors and year), and a flat list of the article sections;
- Disregard sentences shorter than 20 characters, to minimize the effect of parsing errors;
- The mean summary length is 447 words, and the median is 312 words;
- The average number of sentences per summary is 22.
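The filtering and length statistics above can be sketched as follows. This is only an illustration: the regex splitter is a naive stand-in for NLTK's sentence segmentation, word counts use whitespace splitting rather than NLTK tokenization, and the function names are hypothetical.

```python
import re
import statistics

MIN_SENTENCE_CHARS = 20  # threshold stated in the notes above


def split_sentences(text):
    """Naive stand-in for NLTK sentence segmentation (assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def clean_sentences(text):
    """Drop sentences shorter than 20 characters to limit parsing noise."""
    return [s for s in split_sentences(text) if len(s) >= MIN_SENTENCE_CHARS]


def length_stats(summaries):
    """Mean and median summary length in whitespace-separated words."""
    lengths = [len(s.split()) for s in summaries]
    return statistics.mean(lengths), statistics.median(lengths)
```

Short fragments such as "Fig." or "2." that a PDF parser tends to produce are dropped by `clean_sentences`, which is the purpose of the 20-character threshold.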
3. Human Summaries Analysis
3.1 Summary subjectivity
To assess to what extent the summaries represent a subjective account of the original scientific work:
- extract all sentences containing the terms “I” or “my”;
- 130 summaries out of 561 summaries include such sentences;
- 5 cases were erroneous; the sentiment breakdown was 53% neutral, 32% positive and 15% negative.
When people decide to publicly express their opinion on scientific work, they tend to present a positive or balanced view rather than criticize. People choose to summarize papers they deem valuable.
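The extraction step for first-person sentences could look roughly like this. It is a minimal sketch under the assumption that a whole-word, case-insensitive match is used; the names `FIRST_PERSON`, `first_person_sentences` and `is_subjective` are hypothetical, and the sentiment labels themselves are not reproduced here.

```python
import re

# Whole-word, case-insensitive match for "I" or "my".
# Note: this also fires on the "i" in "i.e.", a possible false positive.
FIRST_PERSON = re.compile(r"\b(i|my)\b", re.IGNORECASE)


def first_person_sentences(sentences):
    """Return the sentences that contain "I" or "my" as a whole word."""
    return [s for s in sentences if FIRST_PERSON.search(s)]


def is_subjective(sentences):
    """A summary counts as subjective if any sentence is first-person."""
    return bool(first_person_sentences(sentences))
```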
3.2 Summary coverage
To assess to what extent human summaries cover the logical aspects of the papers:
- align each summary sentence with the most similar sentence in the original paper and assign it the category of that sentence;
- restore the hierarchy of paper sections and merge sub-sections into their containing high-level section;
- the high-level sections are as follows: Introduction, Related work, Method, Results, Experiments, Discussion, Conclusions, Future work, Unknown (2051 out of 3421 article sections were assigned a category, while the rest were classified as Unknown);
- section sentences inherit the title of their containing section;
- experiment with three similarity methods:
  - ROUGE-L
  - average of the F1 scores of ROUGE-1, ROUGE-2 and ROUGE-L
  - cosine similarity over word vectors
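Mapping section titles to the high-level categories listed above might be done with simple keyword matching. The notes do not say how the matching was actually performed, so the keyword table and the `categorize` helper below are purely hypothetical, and the sketch covers only the title-to-category step, not the restoration of the section hierarchy.

```python
import re

# Hypothetical keyword map; the real matching rules are not given in the notes.
CATEGORY_KEYWORDS = [
    ("Introduction", ("introduction",)),
    ("Related work", ("related work",)),
    ("Method", ("method", "approach", "model")),
    ("Results", ("result",)),
    ("Experiments", ("experiment",)),
    ("Discussion", ("discussion",)),
    ("Conclusions", ("conclusion",)),
    ("Future work", ("future work",)),
]


def categorize(title):
    """Return the first category whose keyword occurs in the section title."""
    # Strip leading numbering such as "3.1 " before matching.
    cleaned = re.sub(r"^[\d.\s]+", "", title).lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in cleaned for keyword in keywords):
            return category
    return "Unknown"
```

Titles that match no keyword fall back to Unknown, mirroring the Unknown category in the list above.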
The resulting section weights are quite stable across the different similarity methods. A summarization algorithm can aim to assign higher focus to the more salient logical sections, reflecting how humans attend to different sections in their summaries.
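The alignment and the per-section weights can be sketched with a simplified ROUGE-L. This is an illustration only: it uses lowercased whitespace tokens and a plain LCS-based F1 rather than an official ROUGE implementation, and all function names are hypothetical.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 over lowercased whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def align(summary_sentences, paper_sentences):
    """paper_sentences is a list of (sentence, category) pairs.

    Each summary sentence gets the category of its most similar paper sentence.
    """
    return [
        max(paper_sentences, key=lambda pair: rouge_l_f1(s, pair[0]))[1]
        for s in summary_sentences
    ]


def section_weights(categories):
    """Share of summary sentences aligned to each category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

Swapping `rouge_l_f1` for another similarity function is enough to compare how the weights change between methods.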
3.3 Summary style
3.3.1 Figures inclusion
Some human summaries include figures from the original paper including image captures of equations or tables. About 31% of the summaries include at least one such figure, with an average of 2 figures per summary.
This suggests that multi-modal summarization needs to be considered, and there is no existing work on it so far.
3.3.2 Summary Itemization
Almost half of the summaries use some form of itemized structure (i.e., bullets or numbering):
- the average is 15 items per summary;
- the average size of an item is 2 sentences.
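Counting items in a summary could be approximated as below. The set of markers is an assumption ("-", "*", "+" bullets, or "1." / "1)" numbering at the start of a line); the notes do not state which markers were actually counted, and `count_items` is a hypothetical helper.

```python
import re

# Assumed item markers: a bullet character or a number followed by "." or ")".
ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+\S")


def count_items(summary):
    """Number of lines in the summary that look like list items."""
    return sum(1 for line in summary.splitlines() if ITEM.match(line))
```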
3.3.3 Headlines
About 35% of the summaries contain lines that start with “#”, which act as summary “headlines”.
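A check for such headline lines might look like this, assuming the summaries are available as raw text; the helper names are hypothetical, and allowing leading whitespace before the “#” is an added assumption.

```python
def headline_lines(summary):
    """Lines that start with "#" (leading whitespace ignored)."""
    return [
        line.strip()
        for line in summary.splitlines()
        if line.lstrip().startswith("#")
    ]


def has_headlines(summary):
    """True if the summary contains at least one headline line."""
    return bool(headline_lines(summary))
```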
- Author: 鱼咸滚酱
- Link: https://github.com/WangMeng2018/WangMeng2018.github.io/tree/master/2020/02/16/Report-A-Study-of-Human-Summaries-of-Scientific-Articles/
- Copyright notice: Unless otherwise stated, all articles on this blog are licensed under the Apache License 2.0. Please credit the source when reposting!