Distributional Semantic Models (DSMs) approximate the meaning of words with vectors summarizing their patterns of co-occurrence in corpora. Recently, several compositional extensions of DSMs (Compositional DSMs, or CDSMs) have been proposed to represent the meaning of phrases and sentences by composing the distributional representations of the words they contain. SICK (Sentences Involving Compositional Knowledge) provides a benchmark for testing CDSMs. It includes a large number of sentence pairs that are rich in the lexical, syntactic and semantic phenomena that CDSMs are expected to account for (e.g., contextual synonymy and other lexical variation phenomena, active/passive and other syntactic alternations, impact of negation, determiners and other grammatical elements), while not requiring systems to deal with other aspects of existing sentential data sets (e.g., STS, RTE) that fall outside the scope of compositional distributional semantics.
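To make the idea of composition concrete, here is a minimal sketch of one of the simplest CDSMs, additive composition: a sentence vector is the sum of its word vectors, and the relatedness of two sentences is estimated as the cosine of their vectors. The toy vectors and the function names `compose_additive` and `cosine` are hypothetical, chosen for illustration only; SICK itself does not prescribe any particular composition function.

```python
import math

def compose_additive(words, vectors):
    """Compose a sentence vector by summing the vectors of its in-vocabulary words."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for w in words:
        vec = vectors.get(w)
        if vec is None:
            continue  # skip out-of-vocabulary words
        result = [r + v for r, v in zip(result, vec)]
    return result

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional word vectors (made-up values, not from any real DSM).
toy_vectors = {
    "a": [0.1, 0.1, 0.1],
    "man": [0.9, 0.2, 0.0],
    "boy": [0.8, 0.3, 0.1],
    "is": [0.1, 0.1, 0.1],
    "playing": [0.2, 0.9, 0.1],
    "guitar": [0.0, 0.7, 0.8],
    "cooking": [0.1, 0.2, 0.9],
}

s1 = compose_additive("a man is playing guitar".split(), toy_vectors)
s2 = compose_additive("a boy is playing guitar".split(), toy_vectors)
s3 = compose_additive("a man is cooking".split(), toy_vectors)
print(round(cosine(s1, s2), 3), round(cosine(s1, s3), 3))
```

Vector addition ignores word order, so it assigns the same vector to "a man bites a dog" and "a dog bites a man"; this is a limitation worth keeping in mind when evaluating on sentence pairs that differ mainly in syntax.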
The SICK data set consists of about 10,000 English sentence pairs, generated from two existing sets: the 8K ImageFlickr data set and the SemEval 2012 STS MSR-Video Description data set. We randomly selected a subset of sentence pairs from each of these sources and applied a three-step generation process: first, the original sentences were normalized to remove unwanted linguistic phenomena; second, the normalized sentences were expanded to obtain up to three new sentences with specific characteristics suitable for CDSM evaluation; finally, all the sentences generated in the expansion phase were paired with the normalized sentences to obtain the final data set.
Each sentence pair was annotated for relatedness and entailment by means of crowdsourcing. The relatedness score (on a 5-point rating scale) provides a direct way to evaluate CDSMs, insofar as their outputs are meant to quantify the degree of semantic relatedness between sentences. The categorization of the entailment relation between the two sentences (with entailment, contradiction, and neutral as gold labels) is also crucial, since detecting entailment is one of the traditional benchmarks of a successful semantic system.
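As a sketch of how system outputs might be compared against these gold annotations, the code below computes Pearson's correlation between system scores and gold relatedness scores, and accuracy for entailment labels. We assume these two measures here because they are common choices for such tasks; the helper names and all the example values are hypothetical and do not come from SICK.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between system scores and gold relatedness scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def accuracy(predicted, gold):
    """Fraction of pairs whose predicted entailment label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two equal-length, non-empty label sequences")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Hypothetical outputs for four sentence pairs (not real SICK data).
gold_relatedness = [4.5, 3.6, 2.1, 1.2]
system_scores = [0.92, 0.75, 0.40, 0.35]  # e.g. cosines; the scale need not match the gold one
gold_labels = ["ENTAILMENT", "NEUTRAL", "NEUTRAL", "CONTRADICTION"]
predicted_labels = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "CONTRADICTION"]

print(round(pearson(system_scores, gold_relatedness), 3))
print(accuracy(predicted_labels, gold_labels))  # 3 of 4 labels match: 0.75
```

Because Pearson's r is invariant to linear rescaling, a CDSM can output cosines in [0, 1] without mapping them onto the 1 to 5 gold scale first.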
In the final set (9,840 pairs), gold scores for relatedness and entailment were distributed as follows: for relatedness, 923 pairs fall within the [1,2) range, 1373 within the [2,3) range, 3872 within the [3,4) range, and 3672 within the [4,5] range; for entailment, there are 5595 neutral pairs, 1424 contradiction pairs, and 2821 entailment pairs.
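The short sketch below shows how a relatedness score maps onto the four ranges above, and checks that the reported counts for both annotations cover the same number of pairs. The function name `relatedness_bin` and the label strings are our own illustrative choices; the counts are the ones reported above.

```python
from collections import Counter

def relatedness_bin(score):
    """Map a gold relatedness score in [1, 5] to one of the ranges used above."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score out of range: {score}")
    if score < 2.0:
        return "[1,2)"
    if score < 3.0:
        return "[2,3)"
    if score < 4.0:
        return "[3,4)"
    return "[4,5]"  # the last range is closed, so 5.0 belongs here

# Counts as reported above for the final SICK set.
relatedness_counts = {"[1,2)": 923, "[2,3)": 1373, "[3,4)": 3872, "[4,5]": 3672}
entailment_counts = {"NEUTRAL": 5595, "CONTRADICTION": 1424, "ENTAILMENT": 2821}

total_rel = sum(relatedness_counts.values())
total_ent = sum(entailment_counts.values())
assert total_rel == total_ent == 9840  # both annotations cover the same pairs

for label, n in entailment_counts.items():
    print(f"{label}: {n} pairs ({100 * n / total_ent:.1f}%)")

# Tallying ranges for a few hypothetical scores.
print(Counter(relatedness_bin(s) for s in [1.0, 2.5, 3.9, 4.0, 5.0]))
```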
The SICK data set was presented at LREC 2014. A detailed description of the resource can be found in the associated paper.
SICK constituted the basis of a shared task at SemEval 2014 (Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment). Two subtasks were proposed: (i) predicting the degree of relatedness between two sentences and (ii) detecting the entailment relation holding between them. We received submissions from 21 teams. Results are presented on the SemEval website and discussed in this paper.
Indexes specifying further classifications can be found here.
The resulting subsets are described and analyzed in detail in the following journal publication:
L. Bentivogli, R. Bernardi, M. Marelli, S. Menini, M. Baroni and R. Zamparelli (2016). SICK Through the SemEval Glasses. Lesson learned from the evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. Language Resources and Evaluation, 50(1), 95-124.
The full data set can be downloaded here, distributed under a Creative Commons Attribution-NonCommercial-ShareAlike license.
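For readers who want to load the data programmatically, here is a minimal sketch of a parser. It assumes the distributed file is tab-separated with a header row containing (at least) the columns `pair_ID`, `sentence_A`, `sentence_B`, `relatedness_score` and `entailment_label`; these column names are an assumption and should be checked against the actual download. The `SickPair` class, the `load_sick` function and the sample rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class SickPair:
    pair_id: str
    sentence_a: str
    sentence_b: str
    relatedness: float
    entailment: str

def load_sick(lines: Iterable[str]) -> List[SickPair]:
    """Parse tab-separated lines with an ASSUMED SICK-style header; extra columns are ignored."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        pairs.append(SickPair(
            pair_id=row["pair_ID"],
            sentence_a=row["sentence_A"],
            sentence_b=row["sentence_B"],
            relatedness=float(row["relatedness_score"]),
            entailment=row["entailment_label"],
        ))
    return pairs

# Two made-up rows in the assumed format, for illustration only.
sample = (
    "pair_ID\tsentence_A\tsentence_B\trelatedness_score\tentailment_label\n"
    "1\tA man is playing a guitar\tA man is playing an instrument\t4.5\tENTAILMENT\n"
    "2\tA woman is cooking\tNobody is cooking\t3.2\tCONTRADICTION\n"
)
pairs = load_sick(io.StringIO(sample))
print(len(pairs), pairs[0].entailment, pairs[1].relatedness)
```

With a local copy of the data, the same function can be applied to an open file object, e.g. `with open(path, encoding="utf-8", newline="") as f: pairs = load_sick(f)`, where `path` is wherever the downloaded file was saved.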
A version of the data set annotated with the expansion rules applied to generate each sentence can be found here.
When using SICK in published research, please cite:
M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi and R. Zamparelli (2014). A SICK cure for the evaluation of compositional distributional semantic models. Proceedings of LREC 2014, Reykjavik (Iceland): ELRA, 216-223.
A list of entailment rules extracted from the SICK data is available here. Thanks to Islam Beltagy and his colleagues!
Further manual analyses of the SICK data set, in particular of the contradiction pairs, have been carried out in the papers below.
For further information, please write to marco marelli AT unitn DOT it
The following people have contributed to the creation of SICK:
The creation of SICK has been partially funded by ERC Starting Independent Research Grant nr. 283554 to the COMPOSES project (2011-2016).