COMPOSES
Follow this link for
information about the end-of-project workshop (including slides from
some talks)!
Project outline
Pink dogs are rare. You understood this sentence even if
you've never read it before, because you know the meanings of
thousands of words (including pink, dogs
and rare) and how to construct the meaning of a novel sentence
from the meanings of its parts. The ability to construct new meanings
by combining words into larger constituents is one of the fundamental
and peculiarly human characteristics of language. For decades,
scientists in different fields have tried to develop computational
systems that understand sentences as humans do. So far, however, they
have fallen short on either coverage (acquiring the meaning of
thousands of words) or compositionality (putting together the
parts to reconstruct the meaning of new sentences).
COMPOSES tackles the meaning induction and composition problem from
a new perspective that brings together corpus-based distributional
semantics (which is very successful at inducing the meaning of single
content words, but ignores functional elements and compositionality)
and formal semantics (which focuses on functional elements and
composition, but largely ignores lexical aspects of meaning and lacks
methods to learn the proposed structures from data). As in
distributional semantics, we represent some content words (such as
nouns) by vectors recording their corpus contexts. Implementing ideas
from formal semantics, functional elements (such as determiners) are
represented by functions mapping from expressions of one type onto
composite expressions of the same or other types. These composition
functions are induced from corpus data by statistical learning of
mappings from observed context vectors of input arguments to observed
context vectors of composite structures. We model a number of
compositional processes in this way, developing a coherent fragment of
the semantics of English in a large-scale data-driven fashion.
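As a rough illustration of the learning step described above, the sketch below estimates a composition function for one functional element (say, a hypothetical adjective) as a matrix, fitted by ridge regression from the vectors of its arguments to the vectors of the corresponding phrases. The function names, the toy data and the choice of ridge regression as estimator are assumptions made for illustration, not the COMPOSES implementation; the vectors here are randomly generated rather than extracted from a corpus.

```python
import numpy as np


def learn_composition_matrix(arg_vectors, phrase_vectors, ridge=1.0):
    """Estimate a matrix W such that W @ arg approximates the phrase vector.

    arg_vectors:    (n_examples, d_in)  observed vectors of the arguments (e.g. nouns)
    phrase_vectors: (n_examples, d_out) observed vectors of the composite phrases
    ridge:          L2 regularization strength (hypothetical default)
    """
    X = np.asarray(arg_vectors, dtype=float)
    Y = np.asarray(phrase_vectors, dtype=float)
    d_in = X.shape[1]
    # Closed-form ridge solution of Y ~ X B:  B = (X^T X + ridge * I)^-1 X^T Y
    B = np.linalg.solve(X.T @ X + ridge * np.eye(d_in), X.T @ Y)
    # Transpose so that the learned function applies as W @ x
    return B.T


def compose(W, arg_vector):
    """Apply a learned composition function to a (possibly unseen) argument."""
    return W @ np.asarray(arg_vector, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for corpus-extracted vectors (assumption: 5 dimensions)
    nouns = rng.normal(size=(200, 5))
    hidden = rng.normal(size=(5, 5))  # the "true" mapping we pretend to observe
    phrases = nouns @ hidden.T + 0.01 * rng.normal(size=(200, 5))

    W = learn_composition_matrix(nouns, phrases, ridge=0.1)
    unseen_noun = rng.normal(size=5)
    # Difference between the predicted and the "true" phrase vector
    print(np.round(compose(W, unseen_noun) - hidden @ unseen_noun, 2))
```

Once estimated, the same matrix can be applied to nouns that never occurred with the functional element in the training data, which is the point of learning a function rather than storing phrase vectors.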
Given the novelty of the approach, we also propose several new
evaluation frameworks: On the one hand, we take inspiration from
cognitive science and psycholinguistics in designing elicitation
methods to measure the perceived similarity and plausibility of
sentences (such data will be elicited on a large scale by
crowdsourcing). On the other, specialized entailment tests assess the
semantic inference properties of our corpus-induced system.
The following article sketches the approach we are implementing in
COMPOSES in some detail:
M. Baroni, R. Bernardi and
R. Zamparelli. 2014. Frege in space: A program for compositional
distributional semantics. Linguistic Issues in Language
Technologies 9(6): 5-110.
Team
COMPOSES is carried out at the CLIC lab, a unit of the
University of
Trento's Center for Mind/Brain Sciences
(CIMeC), in
collaboration with the Departments of Computer Science
(DISI) and
Cognitive Science (DiPSCo).
Senior researchers
Post docs
PhD Students
Project manager
Milestones
- April 2014: First global evaluation of COMPOSES system
- January 2015: Release of semantic space models
- October 2015: Semantic norm data set release
- July 2016: COMPOSES code toolkit release
- October 2016: Second global evaluation of COMPOSES system
Publications and associated data
- A. Lazaridou, M. Marelli and M. Baroni. To
appear. Multimodal
word meaning induction from minimal exposure to natural text.
Cognitive
Science. The data sets
described in this study.
- E. Vecchi, M. Marelli, R. Zamparelli and M. Baroni. 2017. Spicy
adjectives and nominal donkeys: Capturing semantic deviance using
compositionality in distributional spaces. Cognitive Science 41(1): 102-136. The adjective-noun
plausibility ratings described in this study.
- G. Kruszewski, D. Paperno, R. Bernardi and
M. Baroni. 2016. There
is no logical negation here, but there are alternatives: modeling
conversational negation with distributional semantics.
Computational Linguistics 42(4):
637-660. The data
set described in this article.
- D. Paperno, G. Kruszewski, A. Lazaridou, Q. Pham, R. Bernardi,
S. Pezzelle, M. Baroni, G. Boleda and R. Fernandez. 2016. The LAMBADA dataset: Word prediction requiring a broad
discourse context. Proceedings of ACL 2016 (54th Annual Meeting of
the Association for Computational Linguistics), East Stroudsburg PA:
ACL, 1525-1534. The LAMBADA page, with the dataset.
- S. Pezzelle, R. Shekhar and R. Bernardi. 2016. Building a
bagpipe with a bag and a pipe: Exploring conceptual combination in
vision. Proceedings of the 5th Workshop on Vision and Language,
60-64.
- I. Sorodoc, A. Lazaridou, G. Boleda, A. Herbelot, S. Pezzelle and R. Bernardi. 2016.
Look, some green circles! Learning to quantify from images. Proceedings of the 5th Workshop on Vision and Language, 75-79.
- A. Lazaridou, N. Pham and M. Baroni. 2016. The
red one! On learning to refer to things based on their discriminative
properties. Proceedings of ACL 2016 (54th Annual Meeting of the
Association for Computational Linguistics), East Stroudsburg PA:
ACL, 213-218.
- G. Boleda, S. Pado and M. Baroni. 2016. Show me the cup: Reference with continuous representations. arXiv e-print 1606.08777.
- D. Ryzhova, M. Kyuseva, and D. Paperno. 2016. Typology of adjectives benchmark for compositional distributional models. Proceedings of the 10th Language Resources and Evaluation Conference, 1253-1257.
- D. Paperno and M. Baroni. 2016. When
the whole is less than the sum of its parts: How composition affects
PMI values in distributional semantic vectors. Computational
Linguistics 42(2): 345-350.
- L. Bentivogli, R. Bernardi, M. Marelli, S. Menini, M. Baroni and R. Zamparelli. 2016. SICK through the SemEval glasses: Lessons learned from the evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. Journal of Language Resources and Evaluation 50(1): 95-124.
- M. Marelli and
M. Baroni. 2015. Affixation in semantic space: Modeling morpheme
meanings with compositional distributional semantics.
Psychological Review 122(3): 485-515.
- G. Kruszewski, D. Paperno and
M. Baroni. 2015. Deriving
Boolean structures from distributional vectors. Transactions of
the Association for Computational Linguistics 3: 375-388. An example of a derived boolean space.
- A. Gupta, G. Boleda, M. Baroni and S. Pado. 2015. Distributional
vectors encode referential attributes. Proceedings of EMNLP 2015
(Conference on Empirical Methods in Natural Language Processing), East
Stroudsburg PA: ACL, 12-21.
- R. Bernardi, G. Boleda, R. Fernandez and
D. Paperno. 2015. Distributional
semantics in use. Proceedings of LSD 2015 (First Workshop on
Linking Computational Models of Lexical, Sentential and
Discourse-level Semantics), East Stroudsburg PA: ACL, 95-101.
- N. Pham, G. Kruszewski, A. Lazaridou and M. Baroni. 2015. Jointly
optimizing word representations for lexical and sentential tasks with
the C-PHRASE model. Proceedings of ACL 2015 (53rd Annual Meeting of
the Association for Computational Linguistics), East Stroudsburg PA:
ACL, 971-981.
The C-PHRASE vectors.
- N. Pham, A. Lazaridou and M. Baroni. 2015. A multitask objective to inject lexical contrast into distributional semantics. Proceedings of ACL 2015 (53rd Annual Meeting of
the Association for Computational Linguistics) Volume 2: Short Papers,
East Stroudsburg PA: ACL, 21-26.
- A. Lazaridou, G. Dinu and M. Baroni. 2015. Hubness
and pollution: Delving into cross-space mapping for zero-shot
learning. Proceedings of ACL 2015 (53rd Annual Meeting of the
Association for Computational Linguistics), East Stroudsburg PA:
ACL, 270-280.
- A. Lazaridou, N. Pham and M. Baroni. 2015. Combining
language and vision with a multimodal skip-gram model. Proceedings
of NAACL HLT 2015 (2015 Conference of the North American Chapter of
the Association for Computational Linguistics - Human Language
Technologies), East Stroudsburg PA: ACL, 153-163.
- G. Kruszewski and M. Baroni. 2015. So
similar and yet incompatible: Toward automated identification of
semantically compatible words. Proceedings of NAACL HLT 2015
(2015 Conference of the North American Chapter of the Association for
Computational Linguistics - Human Language Technologies), East
Stroudsburg PA:
ACL, 964-969. The compatibility
data set described in this article.
- S. Ritter, C. Long, D. Paperno, M. Baroni, M. Botvinick and
A. Goldberg. 2015. Leveraging
preposition ambiguity to assess compositional distributional models of
semantics. Proceedings of *SEM 2015 (Fourth Joint Conference on
Lexical and Computational Semantics), East Stroudsburg PA: ACL, 199-204.
- G. Dinu, A. Lazaridou and
M. Baroni. 2015.
Improving zero-shot learning by mitigating the hubness problem.
Proceedings of ICLR 2015 (International Conference on Learning
Representations), workshop track, online
at http://www.iclr.cc/doku.php?id=iclr2015:main. Code and data for the bilingual lexicon induction
experiments.
- A. Lazaridou, G. Dinu, A. Liska and
M. Baroni. 2015. From
visual attributes to adjectives through decompositional distributional
semantics. Transactions of the Association for Computational
Linguistics 3: 183-196.
- F.M. Zanzotto, L. Ferrone and M. Baroni. 2015. When
the whole is not greater than the combination of its parts: A
decompositional look at compositional distributional semantics.
Computational Linguistics 41(1): 165-173.
- M. Marelli, G. Dinu, R. Zamparelli and M. Baroni. 2015. Picking
buttercups and eating butter cups: Spelling alternations, semantic
relatedness and their consequences for compound
processing. Applied Psycholinguistics 36(6): 1421-1439.
- M. Hurlimann, R. Bernardi and D. Paperno. 2014. Nominal
coercion in space: Mass/count nouns and distributional semantics.
Proceedings of CLIC-IT 2014, Pisa (Italy): Pisa University Press,
208-212.
- G. Kruszewski and
M. Baroni. 2014. Dead
parrots make bad pets: Exploring modifier effects in noun phrases.
Proceedings of *SEM 2014 (Third Joint Conference on Lexical and
Computational Semantics), East Stroudsburg PA: ACL, 171-181.
The Norwegian
Blue Parrot data set described in this article.
- M. Marelli, L. Bentivogli, M. Baroni, R. Bernardi, S. Menini and
R. Zamparelli. 2014. SemEval-2014
Task 1: Evaluation of compositional distributional semantic models on
full sentences through semantic relatedness and textual
entailment. Proceedings of SemEval 2014 (International Workshop on
Semantic Evaluation), East Stroudsburg PA: ACL, 1-8.
- D.T. Nguyen, A. Lazaridou and
R. Bernardi. 2014. Coloring
objects: Adjective-noun visual semantic compositionality.
Proceedings of VL (Third Workshop on Vision and Language), East
Stroudsburg PA: ACL, 112-114.
- D. Paperno, M. Marelli, K. Tentori and M. Baroni. 2014. Corpus-based
estimates of word association predict biases in judgment of word
co-occurrence likelihood. Cognitive Psychology 74: 66-83.
- M. Baroni, R. Bernardi and
R. Zamparelli. 2014. Frege in space: A program for compositional
distributional semantics. Linguistic Issues in Language
Technologies 9(6): 5-110.
- G. Dinu and
M. Baroni. 2014. How
to make words with vectors: Phrase generation in distributional
semantics. Proceedings of ACL 2014 (52nd Annual Meeting of the
Association for Computational Linguistics), East Stroudsburg PA: ACL,
624-633. The AN-to-NPN
data set from this
study. Code and data for the monolingual generation
experiments.
- D. Paperno, N. Pham and M. Baroni. 2014. A
practical and linguistically-motivated approach to compositional
distributional semantics. Proceedings of ACL 2014 (52nd Annual
Meeting of the Association for Computational Linguistics), East
Stroudsburg PA: ACL, 90-99. Code for the PLF model.
- M. Baroni, G. Dinu and
G. Kruszewski. 2014. Don't
count, predict! A systematic comparison of context-counting
vs. context-predicting semantic vectors. Proceedings of ACL 2014
(52nd Annual Meeting of the Association for Computational
Linguistics), East Stroudsburg PA: ACL,
238-247. An archive
with results with further models and parameter settings on the same
benchmarks. The
best count and predict semantic vectors from this study.
- A. Lazaridou, E. Bruni and M. Baroni. 2014. Is
this a wampimuk? Cross-modal mapping between distributional semantics
and the visual world. Proceedings of ACL 2014 (52nd Annual Meeting
of the Association for Computational Linguistics), East Stroudsburg
PA: ACL, 1403-1414.
- M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi and
R. Zamparelli. 2014. A
SICK cure for the evaluation of compositional distributional semantic
models. Proceedings of LREC 2014, Reykjavik (Iceland):
ELRA, 216-223. The SICK data set described in this paper.
- R. Bernardi. 2014. Distributional
semantics: A Montagovian view. In C. Casadio, B. Coecke,
M. Moortgat and P. Scott (Eds.), Categories and Types in Logic,
Language, and Physics, Berlin: Springer.
- J. Li, M. Baroni and
G. Dinu. 2014. Improving
the lexical function composition model with pathwise optimized
elastic-net regression. Proceedings of EACL 2014 (14th Conference
of the European Chapter of the Association for Computational
Linguistics), East Stroudsburg PA: ACL, 434-442.
- M. Baroni. 2013. Composition in distributional semantics. Language
and Linguistics Compass 7(10): 511-522. Please contact Marco if you
would like a copy.
- E. Vecchi, R. Zamparelli and
M. Baroni. 2013. Studying
the recursive behaviour of adjectival modification with compositional
distributional semantics. Proceedings of EMNLP 2013 (Conference
on Empirical Methods in Natural Language Processing), East Stroudsburg
PA: ACL, 141-151.
- A. Lazaridou, E. Vecchi and
M. Baroni. 2013. Fish
transporters and miracle homes: How compositional distributional
semantics can help NP parsing. Proceedings of EMNLP 2013
(Conference on Empirical Methods in Natural Language Processing), East
Stroudsburg PA:
ACL, 1908-1913. The data
set from this study.
- G. Dinu, N. Pham and M. Baroni. 2013. General
estimation and evaluation of compositional distributional semantic
models. Proceedings of the ACL 2013 Workshop on Continuous Vector
Space Models and their Compositionality (CVSC 2013), East Stroudsburg
PA: ACL, 50-58.
- A. Lazaridou, M. Marelli, R. Zamparelli and M. Baroni. 2013. Compositional-ly
derived representations of morphologically complex words in
distributional semantics. Proceedings of ACL 2013 (51st Annual
Meeting of the Association for Computational Linguistics), East
Stroudsburg PA:
ACL, 1517-1526. The data
set from this study.
- R. Bernardi, G. Dinu, M. Marelli and
M. Baroni. 2013. A
relatedness benchmark to test the role of determiners in compositional
distributional semantics. Proceedings of ACL 2013 (51st Annual
Meeting of the Association for Computational Linguistics) Volume 2:
Short Papers, East Stroudsburg PA:
ACL, 53-57. The data
set from this study.
- G. Dinu, N. Pham and M. Baroni. 2013. DISSECT:
DIStributional SEmantics Composition Toolkit. Proceedings of the
System Demonstrations of ACL 2013 (51st Annual Meeting of the
Association for Computational Linguistics), East Stroudsburg PA:
ACL, 31-36.
- E. Grefenstette, G. Dinu, Y.-Z. Zhang, M. Sadrzadeh and
M. Baroni. 2013. Multi-step regression learning for compositional
distributional semantics. Proceedings of IWCS 2013 (10th
International Conference on Computational Semantics), East Stroudsburg
PA: ACL, 131-142.
- G. Boleda, M. Baroni, L. McNally and N. Pham. 2013. Intensionality was only
alleged: On adjective-noun composition in distributional
semantics. Proceedings of IWCS 2013 (10th International Conference
on Computational Semantics), East Stroudsburg PA: ACL, 35-46. The data
set from this study.
- N. Pham, R. Bernardi, Y.-Z. Zhang and M. Baroni. 2013. Sentence
paraphrase detection: When determiners and word order make the
difference. Proceedings of the Towards a Formal Distributional
Semantics Workshop at IWCS
2013, East Stroudsburg PA: ACL, 21-29. The data
sets from this study.
- M. Baroni, R. Bernardi, N. Do and C. Shan. 2012. Entailment above the
word level in distributional semantics. Proceedings of EACL 2012
(13th Conference of the European Chapter of the Association for
Computational Linguistics), East Stroudsburg PA:
ACL, 23-32. The data
sets from this study.
- E. Vecchi, M. Baroni and
R. Zamparelli. 2011. (Linear)
maps of the impossible: Capturing semantic anomalies in distributional
space. Proceedings of the DISCO (Distributional Semantics and
Compositionality) Workshop at ACL 2011, East Stroudsburg PA: ACL,
1-9.
- M. Baroni and R. Zamparelli. 2010. Nouns are vectors,
adjectives are matrices: Representing adjective-noun constructions in
semantic space. Proceedings of the Conference on Empirical Methods
in Natural Language Processing (EMNLP 2010), East Stroudsburg PA: ACL,
1183-1193.
Software and resources
We developed the DISSECT
toolkit to construct and compose distributional semantic
representations.
This archive contains code written by Denis Paperno and
Alicia Krebs extending DISSECT functionality with context
distribution smoothing and shifted PMI.
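For readers unfamiliar with the two weighting options just named, here is a minimal sketch of positive PMI with a shift and with context distribution smoothing, following the formulation commonly used in the literature (subtract log k from the PMI value and clip at zero; raise context counts to a power such as 0.75 before normalizing). It is not the code in the archive and does not use the DISSECT API; the function name, the parameter names and the input format are hypothetical.

```python
import math
from collections import Counter


def shifted_ppmi(pair_counts, shift_k=1.0, cds_alpha=0.75):
    """Weight (word, context) co-occurrence counts with shifted positive PMI.

    pair_counts: dict mapping (word, context) -> co-occurrence count
    shift_k:     shift constant k; log(k) is subtracted from every PMI value
    cds_alpha:   exponent for context distribution smoothing (1.0 disables it)
    """
    total = sum(pair_counts.values())
    word_counts = Counter()
    context_counts = Counter()
    for (word, context), n in pair_counts.items():
        word_counts[word] += n
        context_counts[context] += n

    # Context distribution smoothing: flatten the context distribution by
    # raising counts to the power alpha before normalizing.
    smoothed = {c: n ** cds_alpha for c, n in context_counts.items()}
    smoothed_total = sum(smoothed.values())

    weights = {}
    for (word, context), n in pair_counts.items():
        p_wc = n / total
        p_w = word_counts[word] / total
        p_c = smoothed[context] / smoothed_total
        pmi = math.log(p_wc / (p_w * p_c))
        # Shifted positive PMI: subtract log k, then clip negative values to 0
        weights[(word, context)] = max(pmi - math.log(shift_k), 0.0)
    return weights
```

With `cds_alpha` below 1, rare contexts receive a larger smoothed probability, which lowers the PMI of pairs involving them; a `shift_k` above 1 additionally zeroes out weakly associated pairs.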
Other code developed by COMPOSES includes:
We developed the SICK data
set for large-scale evaluation of compositional semantic
models. The data set constitutes the basis
for the SEMEVAL
2014 Task 1.
We make high-performance semantic vectors available from
this page.
See the publications above for links to other
data sets that we make publicly available and the corresponding
reference papers.
Slides from
the Composition in distributional semantics mini-course taught
by Marco Baroni and Georgiana Dinu at ESSLLI 2014.
Contact information
Write to mbaroni AT gmail com.
Acknowledgments
We gratefully acknowledge the European Commission and European Research
Council for the COMPOSES Starting Independent Research Grant funded
under the 7th Framework Program.