@srush
Created January 11, 2017 16:06
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 5
Clarity: 5
Originality: 3
Soundness / Correctness: 4
Impact of Ideas / Results: 4
Meaningful Comparison: 4
Substance: 4
Replicability: 4
Recommendation: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper conducts sentence-level abstractive summarization/rewriting with a
neural network approach. In principle the method can handle not just deletion
but also rewriting (using words not in the original sentences), while relying
only minimally on linguistic analysis. The paper is easy to follow and clearly
written. The advantages of the proposed models are supported by the
experimental results. I recommend it for publication.
I have a couple of suggestions. First (perhaps I missed something), would a
good neural-network-based deletion model be enough to capture the benefit seen
in this paper? The proposed approach can generate summaries with words not
seen in the original sentences, but how much benefit does that bring? (The
COMPRESS model discussed in Section 7.2 uses a very different approach, and
some gap remains.) From a summarization-evaluation viewpoint, ROUGE may not
reward "unseen" words as much as other metrics such as Pyramid do, so the
benefit of the proposed model may not be fully shown. While Section 5 tries to
compromise toward ROUGE, some discussion may help readers think about the
issue in another way.
The paper uses a trick (Section 5) to tune the model toward ROUGE. A little
more discussion here may help readers understand why directly tuning toward an
objective such as ROUGE is not feasible. Is it because ROUGE may not be
reliable on a small data set (like BLEU on individual sentences), because of
computational concerns, or for other reasons?
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 5
Clarity: 4
Originality: 3
Soundness / Correctness: 4
Impact of Ideas / Results: 4
Meaningful Comparison: 4
Substance: 4
Replicability: 3
Recommendation: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper addresses abstractive sentence summarization, specifically headline
generation using a neural language model. The work was evaluated on the DUC
2004 dataset. The paper is well written, and the work is interesting and has
been carefully evaluated. However, while the quantitative evaluation seems
reasonable, the actual summaries (from the examples) seem to have major
grammatical and repetition issues and do not look quite as good as the true
headlines. Having said that, the idea is promising, but it needs more work on
the soundness of the generated sentences.
A few questions/comments:
Why was the model trained using only the first line of the text? What is the
intuition for this? Could the last line, which summarizes the text, be used as
well? It would be nice if this were discussed in the paper.
In Section 7.2 the authors mention a capped ROUGE score, but they do not
explain how it is computed. Was this used in the DUC 2004 task? If so, please
state that; if not, please provide the exact formula for reproducibility.
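For concreteness, one plausible reading of "capped" (my guess, assuming the
75-byte headline limit used in DUC 2004; the paper does not confirm this) is
to truncate the candidate to the byte budget before computing recall:

    from collections import Counter

    def capped_rouge1_recall(candidate, reference, byte_cap=75):
        # Truncate the candidate to a fixed byte budget (the assumed
        # DUC 2004 cap of 75 bytes) before computing ROUGE-1 recall.
        truncated = candidate.encode("utf-8")[:byte_cap].decode("utf-8",
                                                                "ignore")
        cand = Counter(truncated.lower().split())
        ref = Counter(reference.lower().split())
        overlap = sum(min(cand[w], n) for w, n in ref.items())
        return overlap / max(sum(ref.values()), 1)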
It seems the authors tried to fit more content than the page limit allows, as
the bottom margin is completely off. Please fix this and make the writing more
concise.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 4
Clarity: 3
Originality: 4
Soundness / Correctness: 4
Impact of Ideas / Results: 4
Meaningful Comparison: 3
Substance: 4
Replicability: 3
Recommendation: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper uses neural language models to generate sentence summaries word by
word, going beyond previous sentence-based extractive methods and phrase-based
abstractive approaches to sentence summarization. More specifically, their
Attention-Based Summarization (ABS) approach couples an attention-based
encoder with a beam-search decoder augmented with extractive features, which
can be seen as a tradeoff between abstractive and extractive methods.
For the encoder, they present four models step by step, of which two consider
only the input word information, while the other two also incorporate embedded
information about the current context. The latter two encoders, which jointly
learn embeddings for the input and a distribution conditioned on the current
context, are thus able to show an interpretable alignment between the summary
and the input sentence.
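As I understand it, the context-dependent encoding works roughly as in the
following sketch (my own simplification in numpy; the shapes, the single
context vector, and the smoothing window are assumptions, not the authors'
exact formulation):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def attention_encoder(x_emb, ctx_emb, P, Q=2):
        # x_emb: (M, d) input word embeddings; ctx_emb: (d,) embedding of
        # the current summary context; P: (d, d) learned interaction matrix.
        # p is the interpretable alignment over input positions noted above.
        p = softmax(x_emb @ P @ ctx_emb)                    # (M,)
        M = x_emb.shape[0]
        # Smooth each input embedding over a local window of half-width Q.
        x_bar = np.stack([x_emb[max(0, i - Q):i + Q + 1].mean(axis=0)
                          for i in range(M)])               # (M, d)
        return p @ x_bar  # context-weighted input representation, (d,)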
The authors conducted extensive experiments against several strong and
well-known baselines, achieving promising results. In particular, their tuned
model ABS+, which leverages extractive features for fluency, performs
significantly best on the tasks. While they describe how the weight vector
alpha is tuned, they do not report the actual values of alpha in the final
best-performing model. Those values would be useful for examining the
importance of the extractive features. I therefore have some mild reservations
about the analysis of how much the attention-based neural models themselves
contribute.
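If the tuned alpha were reported, the weight on each extractive feature could
be read off directly, e.g. as in this sketch (the feature names and all
numbers are illustrative placeholders, not the paper's actual values):

    import numpy as np

    # Log-linear score alpha . f(x, y) over hypothetical features.
    features = np.array([-12.3,   # neural model log-probability
                           5.0,   # unigram overlap with the input
                           2.0,   # bigram overlap
                           1.0])  # trigram overlap
    alpha = np.array([1.0, 0.5, 0.5, 0.5])  # placeholder weights
    print(alpha @ features)   # candidate score
    print(alpha * features)   # per-feature contributions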
I am just wondering how grammaticality can be ensured with the proposed
approach.