COMPOSES

Compositional Operations in Semantic Space (COMPOSES) was a 5-year European Research Council project (nr. 283554) that started on November 1st, 2011. COMPOSES was funded within the 7th Framework Program by an ERC 2011 Starting Independent Research Grant (SH4: The Human Mind and Its Complexity panel).




Project proposal: Part B1 and Part B2

Follow this link for information about the end-of-project workshop (including slides from some talks)!

Project outline

Pink dogs are rare. You understood this sentence even if you've never read it before, because you know the meanings of thousands of words (including pink, dogs and rare) and how to construct the meaning of a novel sentence from the meanings of its parts. The ability to construct new meanings by combining words into larger constituents is one of the fundamental and peculiarly human characteristics of language. For decades, scientists in different fields have tried to develop computational systems that understand sentences as humans do. They have, however, failed to meet either the challenge of coverage (acquiring the meaning of thousands of words) or that of compositionality (putting together the parts to reconstruct the meaning of new sentences).

COMPOSES tackles the meaning induction and composition problem from a new perspective that brings together two traditions. Corpus-based distributional semantics is very successful at inducing the meaning of single content words, but ignores functional elements and compositionality. Formal semantics focuses on functional elements and composition, but largely ignores lexical aspects of meaning and lacks methods to learn the proposed structures from data. As in distributional semantics, we represent some content words (such as nouns) by vectors recording their corpus contexts. Implementing ideas from formal semantics, we represent functional elements (such as determiners) by functions mapping from expressions of one type onto composite expressions of the same or other types. These composition functions are induced from corpus data by statistical learning of mappings from observed context vectors of input arguments to observed context vectors of composite structures. We model a number of compositional processes in this way, developing a coherent fragment of the semantics of English in a large-scale data-driven fashion.
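
As a rough illustration of how a composition function can be induced from example pairs, the sketch below learns a single function from input vectors and the vectors of the corresponding composite expressions. It assumes, for simplicity, that the function is a linear map (a matrix) estimated by ridge regression; the outline above does not commit to this choice. The names `learn_composition_matrix` and `true_map` are hypothetical, and the vectors are random toy data standing in for corpus-derived context vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 5        # dimensionality of the toy semantic space (assumption)
n_pairs = 50   # number of (argument, composite) training pairs (assumption)

# Toy stand-ins for observed context vectors of input arguments (e.g. nouns).
nouns = rng.normal(size=(n_pairs, dim))

# Hypothetical "true" effect of one function word or modifier, used only to
# generate toy vectors for the composite expressions (e.g. adjective-noun phrases).
true_map = rng.normal(size=(dim, dim))
phrases = nouns @ true_map.T + 0.01 * rng.normal(size=(n_pairs, dim))


def learn_composition_matrix(inputs, outputs, ridge=1e-3):
    """Ridge regression: find W such that outputs[i] is close to W @ inputs[i]."""
    d = inputs.shape[1]
    gram = inputs.T @ inputs + ridge * np.eye(d)
    # Solve (X^T X + ridge * I) W^T = X^T Y, then transpose to get W.
    return np.linalg.solve(gram, inputs.T @ outputs).T


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Induce the composition function from the observed pairs.
learned_map = learn_composition_matrix(nouns, phrases)

# Apply it to an argument that was not seen during training.
new_noun = rng.normal(size=dim)
predicted_phrase = learned_map @ new_noun
print(cosine(predicted_phrase, true_map @ new_noun))
```

With real data, `nouns` and `phrases` would be vectors collected from a corpus, and the quality of the learned function would be judged on held-out composite expressions rather than against a known `true_map`.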

Given the novelty of the approach, we also propose several new evaluation frameworks. On the one hand, we take inspiration from cognitive science and psycholinguistics in designing elicitation methods to measure the perceived similarity and plausibility of sentences (such data will be elicited on a large scale by crowdsourcing). On the other hand, specialized entailment tests assess the semantic inference properties of our corpus-induced system.
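
One common way to use elicited similarity judgments is to check how well a model's scores rank sentence pairs in the same order as human raters. The sketch below assumes that comparison is done with Spearman rank correlation, which is our illustrative choice and not necessarily the project's protocol. The ratings and scores are invented toy numbers, and the helper names are hypothetical.

```python
import numpy as np


def ranks(values):
    """Rank positions (0 = smallest). Assumes no tied values, for simplicity."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(len(values))
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


# Invented example: mean human similarity ratings for four sentence pairs,
# and the similarity scores a model assigns to the same pairs.
human_ratings = [4.8, 1.2, 3.5, 2.1]
model_scores = [0.91, 0.15, 0.40, 0.62]

rho = spearman(human_ratings, model_scores)
print(round(rho, 3))
```

A value near 1 means the model orders the pairs much as the raters do; handling of tied ratings is left out of this sketch.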

The following article sketches the approach we are implementing in COMPOSES in some detail:

  • M. Baroni, R. Bernardi and R. Zamparelli. 2014. Frege in space: A program for compositional distributional semantics. Linguistic Issues in Language Technology 9(6): 5-110.

Team

COMPOSES is carried out at the CLIC lab, a unit of the University of Trento's Center for Mind/Brain Sciences (CIMeC), in collaboration with the Departments of Computer Science (DISI) and Cognitive Science (DiPSCo).

Senior researchers

Post docs

PhD Students

Project manager

Milestones

1. April 2014: First global evaluation of COMPOSES system
2. January 2015: Release of semantic space models
3. October 2015: Semantic norm data set release
4. July 2016: COMPOSES code toolkit release
5. October 2016: Second global evaluation of COMPOSES system

Publications and associated data

Software and resources

We developed the DISSECT toolkit to construct and compose distributional semantic representations.

This archive contains code written by Denis Paperno and Alicia Krebs extending DISSECT functionality with context distribution smoothing and shifted PMI.
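
For readers unfamiliar with the two weighting options just mentioned, here is a minimal NumPy sketch of how they are commonly defined. It is not the code in the archive and does not use the DISSECT API; the function name `shifted_ppmi`, its parameters and the toy counts are assumptions made for illustration. Context distribution smoothing raises context counts to a power `alpha` before normalizing them, and shifted PMI subtracts `log(shift)` from each PMI value before negative values are clipped to zero.

```python
import numpy as np


def shifted_ppmi(counts, alpha=0.75, shift=1.0):
    """Weight a word-by-context co-occurrence matrix (rows = words).

    alpha: exponent for context distribution smoothing (1.0 = no smoothing).
    shift: PMI values are lowered by log(shift) (1.0 = no shift).
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wc = counts / total                                # joint probabilities
    p_w = counts.sum(axis=1, keepdims=True) / total      # word marginals
    smoothed = counts.sum(axis=0, keepdims=True) ** alpha
    p_c = smoothed / smoothed.sum()                      # smoothed context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c)) - np.log(shift)
    # Unseen pairs give -inf (or nan for empty rows/columns); map them to 0.
    pmi = np.where(np.isfinite(pmi), pmi, 0.0)
    return np.maximum(pmi, 0.0)                          # keep positive values only


# Toy co-occurrence counts: 3 words by 3 contexts (invented numbers).
toy_counts = [[10, 0, 2],
              [0, 8, 2],
              [1, 1, 6]]
weighted = shifted_ppmi(toy_counts, alpha=0.75, shift=2.0)
print(weighted.round(3))
```

Setting `alpha=1.0` and `shift=1.0` reduces the function to plain positive PMI, which makes it easy to compare the effect of each option separately.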

Other code developed by COMPOSES includes:

We developed the SICK data set for large-scale evaluation of compositional semantic models. The data set is the basis for SemEval 2014 Task 1.

We make high-performance semantic vectors available from this page.

See the publications above for links to other data sets that we make publicly available and the corresponding reference papers.

Slides from the Composition in distributional semantics mini-course taught by Marco Baroni and Georgiana Dinu at ESSLLI 2014.

Contact information

Write to mbaroni AT gmail com.

Acknowledgments

We gratefully acknowledge the European Commission and European Research Council for the COMPOSES Starting Independent Research Grant funded under the 7th Framework Program.