Marco's publications and presentations

Index

Papers
Presentations and more
Edited volumes

Papers

2025

E. Cheng, D. Doimo, C. Kervadec, I. Macocco, J. Yu, A. Laio and M. Baroni. 2025. Emergence of a high-dimensional abstraction phase in language transformers. Proceedings of ICLR 2025 (International Conference on Learning Representations), online at https://openreview.net/group?id=ICLR.cc/2025/Conference.

M. Mahaut, F. Franzon, R. Dessi and M. Baroni. 2025. Referential communication in heterogeneous communities of pre-trained visual deep networks. Transactions on Machine Learning Research: April 2025.

B. Ginn Nielsen, I. Macocco and M. Baroni. 2025. Prediction hubs are context-informed frequent tokens in LLMs. Proceedings of ACL 2025 (63rd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL: 23715-23745.

N. Rakotonirina, C. Kervadec, F. Franzon and M. Baroni. 2025. Evil twins are not that evil: Qualitative insights into machine-generated prompts. Proceedings of the 8th BlackboxNLP Workshop, East Stroudsburg PA: ACL: 48-68.

I. Macocco, N. Graichen, G. Boleda and M. Baroni. 2025. Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models. Proceedings of the 8th BlackboxNLP Workshop, East Stroudsburg PA: ACL: 109-136.

O. Ruest, M. Baroni and S. Stoll. 2025. Contexts of language learning: Predicting child language by interactive speech in 9 languages. Proceedings of BUCLD 49 (49th annual Boston University Conference on Language Development), online at http://www.cascadilla.com/bucld49toc.html.

2024

N. Rakotonirina and M. Baroni. 2024. MemoryPrompt: A light wrapper to improve context tracking in pre-trained language models. Proceedings of LREC-Coling 2024 (Joint International Conference on Computational Linguistics, Language Resources and Evaluation): 11187-11195.

2023

B. Lake and M. Baroni. 2023. Human-like systematic generalization through a meta-learning neural network. Nature 623: 115-121.

E. Cheng, C. Kervadec and M. Baroni. 2023. Bridging information-theoretic and geometric compression in language models. Proceedings of EMNLP 2023 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL: 12397-12420.

C. Kervadec, F. Franzon and M. Baroni. 2023. Unnatural language processing: How do language models handle machine-generated prompts? Findings of EMNLP 2023 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL: 14377-14392.

R. Dessi, M. Bevilacqua, E. Gualdoni, N. Rakotonirina, F. Franzon and M. Baroni. 2023. Cross-domain image captioning with discriminative finetuning. Proceedings of CVPR 2023 (IEEE/CVF Conference on Computer Vision and Pattern Recognition): 6935-6944.

N. Rakotonirina, R. Dessi, F. Petroni, S. Riedel and M. Baroni. 2023. Can discrete information extraction prompts generalize across language models? Proceedings of ICLR 2023 (International Conference on Learning Representations), online at https://openreview.net/group?id=ICLR.cc/2023/Conference.

M. Mahaut, F. Franzon, R. Dessi and M. Baroni. 2023. Referential communication in heterogeneous communities of pre-trained visual deep networks. Proceedings of AAMAS 2023 (22nd International Conference on Autonomous Agents and Multiagent Systems): 2619-2621.

O. Ruest, M. Baroni and S. Stoll. 2023. Getting creative: A language modeling approach to predicting child utterances in 12 typologically diverse languages. Proceedings of BUCLD 47 (47th annual Boston University Conference on Language Development), online at http://www.cascadilla.com/bucld47toc.html.

2022

R. Dessi, E. Gualdoni, F. Franzon, G. Boleda and M. Baroni. 2022. Communication breakdown: On the low mutual intelligibility between human and neural captioning. Proceedings of EMNLP 2022 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL: 7998-8007.

M. Baroni. 2022. On the proper role of linguistically-oriented deep net analysis in linguistic theorizing. In Shalom Lappin (ed.), Algebraic systems and the representation of linguistic knowledge. Abingdon-on-Thames: Taylor and Francis: 5-22.

2021

R. Dessi, E. Kharitonov and M. Baroni. 2021. Interpretable agent communication from scratch (with a generic visual processor emerging on the side). Proceedings of NeurIPS 2021 (35th Conference on Neural Information Processing Systems), online at https://papers.nips.cc/paper/2021.

O. Ruest, M. Baroni and S. Stoll. 2021. The acquisition of case systems in typologically diverse languages: Children gradually generalize grammatical rules. Proceedings of BUCLD 46 (46th annual Boston University Conference on Language Development), online at http://www.cascadilla.com/bucld46toc.html.

Y. Lakretz, D. Hupkes, A. Vergallito, M. Marelli, M. Baroni and S. Dehaene. 2021. Mechanisms for handling nested dependencies in neural-network language models and humans. Cognition 213: 104699.

R. Chaabouni, E. Kharitonov, E. Dupoux and M. Baroni. 2021. Communicating artificial neural networks develop efficient color-naming systems. PNAS 118 (12): e2016569118.

T. Linzen and M. Baroni. 2021. Syntactic structure from deep learning. Annual Review of Linguistics 7: 195-212.

I. Sorodoc, G. Boleda and M. Baroni. 2021. Controlled tasks for model analysis: Retrieving discrete information from sequences. Proceedings of the EMNLP 2021 Workshop on Analyzing and Interpreting Neural Networks for NLP (Blackbox NLP), East Stroudsburg PA: ACL: 468-478.

2020

E. Kharitonov and M. Baroni. 2020. Emergent language generalization and acquisition speed are not tied to compositionality. Proceedings of the EMNLP 2020 Workshop on Analyzing and Interpreting Neural Networks for NLP (Blackbox NLP), East Stroudsburg PA: ACL: 11-15.

E. Kharitonov, R. Chaabouni, D. Bouchacourt and M. Baroni. 2020. Entropy minimization in emergent languages. Proceedings of ICML 2020 (37th International Conference on Machine Learning): 2718-2728.

R. Chaabouni, E. Kharitonov, D. Bouchacourt, E. Dupoux and M. Baroni. 2020. Compositionality and generalization in emergent languages. Proceedings of ACL 2020 (58th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL: 4427-4442.

A. Lazaridou and M. Baroni. 2020. Emergent multi-agent communication in the Deep Learning era. Manuscript.

L. Ruis, J. Andreas, M. Baroni, D. Bouchacourt and B. Lake. 2020. A benchmark for systematic generalization in grounded language understanding. Proceedings of NeurIPS 2020 (34th Conference on Neural Information Processing Systems), online at https://papers.nips.cc/paper/2020.

M. Baroni. 2020. Rat big, cat eaten! Ideas for a useful deep-agent protolanguage. Manuscript.

E. Kharitonov, R. Chaabouni, D. Bouchacourt and M. Baroni. 2020. Entropy minimization in emergent languages (extended abstract). ICLR Workshop on Bridging AI and Cognitive Science (BAICS).

M. Baroni. 2020. Linguistic generalization and compositionality in modern artificial neural networks. Philosophical Transactions of the Royal Society B. 375(1791): 20190307.

J. Gordon, D. Lopez-Paz, M. Baroni and D. Bouchacourt. 2020. Permutation equivariant models for compositional generalization in language. Proceedings of ICLR 2020 (International Conference on Learning Representations), online at https://openreview.net/group?id=ICLR.cc/2020/Conference.

2019

E. Kharitonov, R. Chaabouni, D. Bouchacourt and M. Baroni. 2019. EGG: a toolkit for research on Emergence of lanGuage in Games. Proceedings of the System Demonstrations of EMNLP 2019 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL: 55-60.

R. Chaabouni, E. Kharitonov, E. Dupoux and M. Baroni. 2019. Anti-efficient encoding in emergent communication. Proceedings of NeurIPS 2019 (33rd Conference on Neural Information Processing Systems), Vancouver, BC: Curran Associates: 6290-6300.

M. Hahn and M. Baroni. 2019. Tabula nearly rasa: Probing the linguistic knowledge of character-level neural language models trained on unsegmented text. Transactions of the Association for Computational Linguistics. 7: 467-484.

B. Lake, T. Linzen and M. Baroni. 2019. Human few-shot learning of compositional instructions. Proceedings of CogSci 2019 (41st Annual Meeting of the Cognitive Science Society), Montreal, QC: Cognitive Science Society: 611-617.

R. Chaabouni, E. Kharitonov, A. Lazaric, E. Dupoux and M. Baroni. 2019. Word-order biases in deep-agent emergent communication. Proceedings of ACL 2019 (57th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL: 5166-5175.

D. Bouchacourt and M. Baroni. 2019. Miss Tools and Mr Fruit: Emergent communication in agents learning about object affordances. Proceedings of ACL 2019 (57th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL: 3909-3918.

R. Dessi and M. Baroni. 2019. CNNs found to jump around more skillfully than RNNs: Compositional generalization in seq2seq convolutional networks. Proceedings of ACL 2019 (57th Annual Meeting of the Association for Computational Linguistics), Short Papers, East Stroudsburg PA: ACL: 3919-3923.

D. Blasi, R. Cotterell, L. Wolf-Sonkin, S. Stoll, B. Bickel and M. Baroni. 2019. On the distribution of deep clausal embeddings: a large cross-linguistic study. Proceedings of ACL 2019 (57th Annual Meeting of the Association for Computational Linguistics), Short Papers, East Stroudsburg PA: ACL: 3938-3943.

Y. Lakretz, G. Kruszewski, T. Desbordes, D. Hupkes, S. Dehaene and M. Baroni. 2019. The emergence of number and syntax units in LSTM language models. Proceedings of NAACL 2019 (17th Annual Conference of the North American Chapter of the Association for Computational Linguistics), East Stroudsburg PA: ACL: 11-20.

R. Dessi, D. Bouchacourt, D. Crepaldi and M. Baroni. 2019. Focus on what's informative and ignore what's not: Communication strategies in a referential game. 3rd NeurIPS Workshop on Emergent Communication.

2018

D. Bouchacourt and M. Baroni. 2018. How agents see things: On visual representations in an emergent language game. Proceedings of EMNLP 2018 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL: 981-985.

J. Loula, M. Baroni and B. Lake. 2018. Rearranging the familiar: Testing compositional generalization in recurrent networks. Proceedings of the EMNLP 2018 Workshop on Analyzing and Interpreting Neural Networks for NLP (Blackbox NLP), East Stroudsburg PA: ACL: 108-114.

J. Bastings, M. Baroni, J. Weston, K. Cho and D. Kiela. 2018. Jump to better conclusions: SCAN both left and right. Proceedings of the EMNLP 2018 Workshop on Analyzing and Interpreting Neural Networks for NLP (Blackbox NLP), East Stroudsburg PA: ACL: 47-55.

B. Lake and M. Baroni. 2018. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. Proceedings of ICML 2018 (35th International Conference on Machine Learning): 2873-2882.

K. Gulordava, P. Bojanowski, E. Grave, T. Linzen and M. Baroni. 2018. Colorless green recurrent networks dream hierarchically. Proceedings of NAACL HLT 2018 (16th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies), East Stroudsburg PA: ACL: 1195-1205.

A. Liska, G. Kruszewski and M. Baroni. 2018. Memorize or generalize? Searching for a compositional RNN in a haystack. Proceedings of AEGAP (FAIM Joint Workshop on Architectures and Evaluation for Generality, Autonomy and Progress in AI), online at: http://cadia.ru.is/workshops/aegap2018/.

A. Conneau, G. Kruszewski, G. Lample, L. Barrault and M. Baroni. 2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. Proceedings of ACL 2018 (56th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 2126-2136.

M. Baroni, G. Boleda and S. Pado. 2018. Show me the cup: Reference with continuous representations. Proceedings of CICLing 2017 (International Conference on Computational Linguistics and Intelligent Text Processing), New York: Springer: 209-224.

D. Bouchacourt and M. Baroni. 2018. Understanding inner representations of perceptual data in grounded multi-agent simulations. Proceedings of DaP 2018 (Workshop on Dialogue and Perception), online at https://clasp.gu.se/digitalAssets/1692/1692436_proceedings-of-the-workshop-on-dialogue-and-perception-2018.pdf.

M. Rojas-Carulla, M. Baroni and D. Lopez-Paz. 2018. Causal discovery using proxy variables. Proceedings of ICLR 2018 (International Conference on Learning Representations), workshop track, online at https://openreview.net/group?id=ICLR.cc/2018/Workshop.

T. Mikolov, A. Joulin and M. Baroni. 2018. A roadmap towards machine intelligence. Proceedings of CICLing 2016 (International Conference on Computational Linguistics and Intelligent Text Processing), New York: Springer: 29-61.

2017

A. Herbelot and M. Baroni. 2017. High-risk learning: Acquiring new word vectors from tiny data. Proceedings of EMNLP 2017 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL: 304-309. The definitional data set used in this study.

A. Lazaridou, M. Marelli and M. Baroni. 2017. Multimodal word meaning induction from minimal exposure to natural text. Cognitive Science. 41(S4): 677-705. The data sets described in this study.

M. Baroni, A. Joulin, A. Jabri, G. Kruszewski, A. Lazaridou, K. Simonic and T. Mikolov. 2017. CommAI: Evaluating the first steps towards a useful general AI. Proceedings of ICLR 2017 (International Conference on Learning Representations) Workshop Track, online at https://openreview.net/group?id=ICLR.cc/2017/workshop.

A. Lazaridou, A. Peysakhovich, and M. Baroni. 2017. Multi-agent cooperation and the emergence of (natural) language. Proceedings of ICLR 2017 (International Conference on Learning Representations), online at http://iclr.cc/doku.php?id=ICLR2017:main.

E. Vecchi, M. Marelli, R. Zamparelli and M. Baroni. 2017. Spicy adjectives and nominal donkeys: Capturing semantic deviance using compositionality in distributional spaces. Cognitive Science 41(1): 102-136. The adjective-noun plausibility ratings described in this study.

G. Boleda, S. Pado, N. Pham and M. Baroni. 2017. Living a discrete life in a continuous world: Reference with distributed representations. Proceedings of IWCS 2017 (12th International Conference on Computational Semantics), Short Papers, East Stroudsburg PA: ACL, online at http://aclweb.org/anthology/W/W17/#6900.

J. Hernandez-Orallo, M. Baroni, J. Bieger, N. Chamit, D. Dowe, K. Hofmann, F. Martinez-Plumed, C. Strannegard and K. Thorisson. 2017. A new AI evaluation cosmos: Ready to play the game? AI Magazine 38(3): 66-69.

2016

G. Kruszewski, D. Paperno, R. Bernardi and M. Baroni. 2016. There is no logical negation here, but there are alternatives: modeling conversational negation with distributional semantics. Computational Linguistics 42(4): 637-660. The data set described in this article.

D. Paperno, G. Kruszewski, A. Lazaridou, N. Pham, R. Bernardi, S. Pezzelle, M. Baroni, G. Boleda and R. Fernandez. 2016. The LAMBADA dataset: Word prediction requiring a broad discourse context. Proceedings of ACL 2016 (54th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 1525-1534. The LAMBADA page, with the dataset.

A. Lazaridou, N. Pham and M. Baroni. 2016. The red one! On learning to refer to things based on their discriminative properties. Proceedings of ACL 2016 (54th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 213-218.

A. Lazaridou, N. Pham and M. Baroni. 2016. Towards multi-agent communication-based language learning. arXiv e-print 1605.07133.

A. Lazaridou, G. Chrupala, R. Fernandez and M. Baroni. 2016. Multimodal semantic learning from child-directed input. Proceedings of NAACL HLT 2016 (2016 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies), East Stroudsburg PA: ACL, 387-392.

D. Paperno and M. Baroni. 2016. When the whole is less than the sum of its parts: How composition affects PMI values in distributional semantic vectors. Computational Linguistics 42(2): 345-350.

M. Baroni. 2016. Grounding distributional semantics in the visual world. Language and Linguistics Compass 10(1): 3-13. Please contact me if you would like a copy.

P. Tremblay, I. Deschamps, M. Baroni and U. Hasson. 2016. Neural sensitivity to syllable frequency and mutual information in speech perception and production. NeuroImage 136: 106-121.

L. Bentivogli, R. Bernardi, M. Marelli, S. Menini, M. Baroni and R. Zamparelli. 2016. SICK through the SemEval glasses: Lessons learned from the evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. Journal of Language Resources and Evaluation 50(1): 95-124.

2015

M. Marelli and M. Baroni. 2015. Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review 122(3): 485-515.

A. Anderson, E. Bruni, A. Lopopolo, M. Poesio and M. Baroni. 2015. Reading visually embodied meaning from the brain: visually grounded computational models decode visual-object mental imagery induced by written text. Neuroimage 120: 309-322.

G. Kruszewski, D. Paperno and M. Baroni. 2015. Deriving Boolean structures from distributional vectors. Transactions of the Association for Computational Linguistics 3: 375-388.

A. Gupta, G. Boleda, M. Baroni and S. Pado. 2015. Distributional vectors encode referential attributes. Proceedings of EMNLP 2015 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL, 12-21.

A. Lazaridou, D. Nguyen, R. Bernardi and M. Baroni. 2015. Unveiling the dreams of word embeddings: Towards language-driven image generation. Poster at the Multimodal Machine Learning Workshop of NIPS 2015, Montreal (Canada).

A. Lazaridou, D. Nguyen and M. Baroni. 2015. Do distributed semantic models dream of electric sheep? Visualizing word representations through image synthesis. Proceedings of the EMNLP 2015 Workshop on Vision and Language (VL15), East Stroudsburg PA: ACL, 81-86.

N. Pham, G. Kruszewski, A. Lazaridou and M. Baroni. 2015. Jointly optimizing word representations for lexical and sentential tasks with the C-PHRASE model. Proceedings of ACL 2015 (53rd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 971-981. The C-PHRASE vectors.

N. Pham, A. Lazaridou and M. Baroni. 2015. A multitask objective to inject lexical contrast into distributional semantics. Proceedings of ACL 2015 (53rd Annual Meeting of the Association for Computational Linguistics) Volume 2: Short Papers, East Stroudsburg PA: ACL, 21-26.

A. Lazaridou, G. Dinu and M. Baroni. 2015. Hubness and pollution: Delving into cross-space mapping for zero-shot learning. Proceedings of ACL 2015 (53rd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 270-280.

A. Lazaridou, N. Pham and M. Baroni. 2015. Combining language and vision with a multimodal skip-gram model. Proceedings of NAACL HLT 2015 (2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies), East Stroudsburg PA: ACL, 153-163.

G. Kruszewski and M. Baroni. 2015. So similar and yet incompatible: Toward automated identification of semantically compatible words. Proceedings of NAACL HLT 2015 (2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies), East Stroudsburg PA: ACL, 964-969. The compatibility data set described in this article.

S. Ritter, C. Long, D. Paperno, M. Baroni, M. Botvinick and A. Goldberg. 2015. Leveraging preposition ambiguity to assess compositional distributional models of semantics. Proceedings of *SEM 2015 (Fourth Joint Conference on Lexical and Computational Semantics), East Stroudsburg PA: ACL, 199-204.

G. Dinu, A. Lazaridou and M. Baroni. 2015. Improving zero-shot learning by mitigating the hubness problem. Proceedings of ICLR 2015 (International Conference on Learning Representations), workshop track, online at http://www.iclr.cc/doku.php?id=iclr2015:main. Code and data for the bilingual lexicon induction experiments.

A. Lazaridou, G. Dinu, A. Liska and M. Baroni. 2015. From visual attributes to adjectives through decompositional distributional semantics. Transactions of the Association for Computational Linguistics 3: 183-196.

F.M. Zanzotto, L. Ferrone and M. Baroni. 2015. When the whole is not greater than the combination of its parts: A decompositional look at compositional distributional semantics. Computational Linguistics 41(1): 165-173.

M. Marelli, G. Dinu, R. Zamparelli and M. Baroni. 2015. Picking buttercups and eating butter cups: Spelling alternations, semantic relatedness and their consequences for compound processing. Applied Psycholinguistics 36(6): 1421-1439.

2014

G. Kruszewski and M. Baroni. 2014. Dead parrots make bad pets: Exploring modifier effects in noun phrases. Proceedings of *SEM 2014 (Third Joint Conference on Lexical and Computational Semantics), East Stroudsburg PA: ACL, 171-181. The Norwegian Blue Parrot data set described in this article.

D. Paperno, M. Marelli, K. Tentori and M. Baroni. 2014. Corpus-based estimates of word association predict biases in judgment of word co-occurrence likelihood. Cognitive Psychology 74: 66-83.

A. Lazaridou, E. Bruni and M. Baroni. 2014. Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world. Proceedings of ACL 2014 (52nd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 1403-1414.

G. Dinu and M. Baroni. 2014. How to make words with vectors: Phrase generation in distributional semantics. Proceedings of ACL 2014 (52nd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 624-633. The AN-to-NPN data set from this study. Code and data for the monolingual generation experiments.

D. Paperno, N. Pham and M. Baroni. 2014. A practical and linguistically-motivated approach to compositional distributional semantics. Proceedings of ACL 2014 (52nd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 90-99. Code for the PLF model.

M. Baroni, G. Dinu and G. Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of ACL 2014 (52nd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 238-247. An archive with results with further models and parameter settings on the same benchmarks. The best count and predict semantic vectors from this study.

M. Baroni, R. Bernardi and R. Zamparelli. 2014. Frege in space: A program for compositional distributional semantics. Linguistic Issues in Language Technologies 9(6): 5-110.

E. Bruni, N. Tran and M. Baroni. 2014. Multimodal distributional semantics. Journal of Artificial Intelligence Research 49: 1-47. 2017 IJCAI-JAIR Best Paper Prize.

M. Hernandez, S. Fairhall, A. Lenci, M. Baroni and A. Caramazza. 2014. Predication drives verb cortical signatures. Journal of Cognitive Neuroscience 26(8): 1829-1839.

J. Li, M. Baroni and G. Dinu. 2014. Improving the lexical function composition model with pathwise optimized elastic-net regression. Proceedings of EACL 2014 (14th Conference of the European Chapter of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 434-442.

M. Marelli, L. Bentivogli, M. Baroni, R. Bernardi, S. Menini and R. Zamparelli. 2014. Semeval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. Proceedings of SemEval 2014 (International Workshop on Semantic Evaluation), East Stroudsburg PA: ACL, 1-8.

M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi and R. Zamparelli. 2014. A SICK cure for the evaluation of compositional distributional semantic models. Proceedings of LREC 2014, Reykjavik (Iceland): ELRA, 216-223. The SICK data set described in this paper.

2013

A. Lenci, M. Baroni, G. Cazzolli and G. Marotta. 2013. BLIND: a set of semantic feature norms from the congenitally blind. Behavior Research Methods 45(4): 1218-1233. The norms described in this article.

M. Baroni. 2013. Composition in distributional semantics. Language and Linguistics Compass 7(10): 511-522. Please contact me if you would like a copy.

A. Anderson, E. Bruni, U. Bordignon, M. Poesio and M. Baroni. 2013. Of words, eyes and brains: Correlating image-based distributional semantic models with neural representations of concepts. Proceedings of EMNLP 2013 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL, 1960-1970.

E. Vecchi, R. Zamparelli and M. Baroni. 2013. Studying the recursive behaviour of adjectival modification with compositional distributional semantics. Proceedings of EMNLP 2013 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL, 141-151.

A. Lazaridou, E. Vecchi and M. Baroni. 2013. Fish transporters and miracle homes: How compositional distributional semantics can help NP parsing. Proceedings of EMNLP 2013 (Conference on Empirical Methods in Natural Language Processing), East Stroudsburg PA: ACL, 1908-1913. The data set from this study.

G. Dinu, N. Pham and M. Baroni. 2013. General estimation and evaluation of compositional distributional semantic models. Proceedings of the ACL 2013 Workshop on Continuous Vector Space Models and their Compositionality (CVSC 2013), East Stroudsburg PA: ACL, 50-58.

A. Lazaridou, M. Marelli, R. Zamparelli and M. Baroni. 2013. Compositional-ly derived representations of morphologically complex words in distributional semantics. Proceedings of ACL 2013 (51st Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 1517-1526. The data set from this study.

R. Bernardi, G. Dinu, M. Marelli and M. Baroni. 2013. A relatedness benchmark to test the role of determiners in compositional distributional semantics. Proceedings of ACL 2013 (51st Annual Meeting of the Association for Computational Linguistics) Volume 2: Short Papers, East Stroudsburg PA: ACL, 53-57. The data set from this study.

G. Dinu, N. Pham and M. Baroni. 2013. DISSECT: DIStributional SEmantics Composition Toolkit. Proceedings of the System Demonstrations of ACL 2013 (51st Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 31-36.

E. Grefenstette, G. Dinu, Y.-Z. Zhang, M. Sadrzadeh and M. Baroni. 2013. Multi-step regression learning for compositional distributional semantics. Proceedings of IWCS 2013 (10th International Conference on Computational Semantics), East Stroudsburg PA: ACL, 131-142.

G. Boleda, M. Baroni, L. McNally and N. Pham. 2013. Intensionality was only alleged: On adjective-noun composition in distributional semantics. Proceedings of IWCS 2013 (10th International Conference on Computational Semantics), East Stroudsburg PA: ACL, 35-46. The data set from this study.

N. Pham, R. Bernardi, Y.-Z. Zhang and M. Baroni. 2013. Sentence paraphrase detection: When determiners and word order make the difference. Proceedings of the Towards a Formal Distributional Semantics Workshop at IWCS 2013, East Stroudsburg PA: ACL, 21-29. The data sets from this study.

B. Magnini, M. Baroni, M. Federico and R. Navigli. 2013. Recent advancements in human language technology in Italy. Intelligenza Artificiale, VII-2. 91-100.

P. Tremblay, M. Baroni and U. Hasson. 2013. Processing of speech and non-speech sounds in the supratemporal plane: Auditory input preference does not predict sensitivity to statistical structure. NeuroImage 66: 318-332.

M. Baroni and S. Bernardini. 2013. Corpus query tools for lexicography. In Rufus Gouws, Ulrich Heid, Wolfgang Schweickard and Herbert Wiegand (eds.), Dictionaries: An international encyclopedia of lexicography, supplementary volume: Recent developments with focus on electronic and computational lexicography, Berlin: Mouton de Gruyter: 1395-1405.

2012

E. Bruni, J. Uijlings, M. Baroni and N. Sebe. 2012. Distributional semantics with eyes: Using image analysis to improve computational representations of word meaning. Brave New Idea paper. Proceedings of MM 12 (20th ACM International Conference on Multimedia), New York NY: ACM, 1219-1228.

M. Baroni, R. Bernardi, N. Do and C. Shan. 2012. Entailment above the word level in distributional semantics. Proceedings of EACL 2012 (13th Conference of the European Chapter of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 23-32. The data sets from this study.

E. Bruni, G. Boleda, M. Baroni and N. Tran. 2012. Distributional semantics in technicolor. Proceedings of ACL 2012 (50th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 136-145. The data sets from this study.

A. Herdagdelen and M. Baroni. 2012. Bootstrapping a game with a purpose for commonsense collection. ACM Transactions on Intelligent Systems and Technology 3(4): 1-24.

2011

A. Herdagdelen and M. Baroni. 2011. Stereotypical gender actions can be extracted from Web text. Journal of the American Society for Information Science and Technology 62(9): 1653-1666.

E. Vecchi, M. Baroni and R. Zamparelli. 2011. (Linear) maps of the impossible: Capturing semantic anomalies in distributional space. Proceedings of the DISCO (Distributional Semantics and Compositionality) Workshop at ACL 2011, East Stroudsburg PA: ACL, 1-9.

E. Bruni, G.B. Tran and M. Baroni. 2011. Distributional semantics from text and images. Proceedings of the EMNLP 2011 Geometrical Models for Natural Language Semantics (GEMS 2011) Workshop, East Stroudsburg PA: ACL, 22-32.

M. Baroni and A. Lenci. 2011. How we BLESSed distributional semantic evaluation. Proceedings of the EMNLP 2011 Geometrical Models for Natural Language Semantics (GEMS 2011) Workshop, East Stroudsburg PA: ACL, 1-10.

K. Gulordava and M. Baroni. 2011. A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus. Proceedings of the EMNLP 2011 Geometrical Models for Natural Language Semantics (GEMS 2011) Workshop, East Stroudsburg PA: ACL, 67-71. The 100-word similarity data-set described in the paper.

G. Kremer and M. Baroni. 2011. A set of semantic norms for German and Italian. Behavior Research Methods 43(1): 97-109. The norms described in this article.

M. Baroni. 2011. Statistiche linguistiche. In Raffaele Simone (ed.), Enciclopedia dell'italiano, vol. 2. Roma: Istituto della Enciclopedia Italiana: 1400-1401.

2010

M. Baroni and A. Lenci. 2010. Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics 36(4): 673-721. ACL 10-year Test of Time Award.

M. Baroni and R. Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), East Stroudsburg PA: ACL, 1183-1193. ACL 10-year Test of Time Award Shortlist.

M. Baroni, B. Murphy, E. Barbu and M. Poesio. 2010. Strudel: A corpus-based semantic model based on properties and types. Cognitive Science 34(2): 222-254.

A. Herdagdelen and M. Baroni. 2010. The Concept Game: Better commonsense knowledge extraction by combining text mining and a game with a purpose. In Catherine Havasi, Doug Lenat and Benjamin Van Durme (eds.), Commonsense Knowledge: Papers from the AAAI Fall Symposium, Menlo Park (CA): AAAI Press: 52-57.

M. Poesio, M. Baroni, O. Lanz, A. Lenci, A. Potamianos, H. Schütze, S. Schulte im Walde and L. Surian. 2010. BabyExp: Constructing a huge multimodal resource to acquire commonsense knowledge like children do. Proceedings of LREC 2010, Valletta (Malta): ELRA.

G. Kremer and M. Baroni. 2010. Predicting cognitively salient modifiers of the constitutive parts of concepts. Proceedings of the Cognitive Modeling and Computational Linguistics Workshop at ACL 2010, East Stroudsburg PA: ACL, 54-62.

M. Baroni. 2010. Corpora di italiano. In Raffaele Simone (ed.), Enciclopedia dell'italiano, vol. 1. Roma: Istituto della Enciclopedia Italiana: 300-303.

V. Pirrelli, E. Guevara and M. Baroni. 2010. Computational issues in compound parsing. In Sergio Scalise and Irene Vogel (eds.), Cross-disciplinary issues in compounding, Amsterdam: Benjamins: 271-286.

2009

B. Murphy, M. Baroni and M. Poesio. 2009. EEG responds to conceptual stimuli and corpus semantics. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), East Stroudsburg PA: ACL, 619-627.

M. Baroni and A. Lenci. 2009. One distributional memory, many semantic spaces. Proceedings of the EACL 2009 Geometrical Models for Natural Language Semantics (GEMS) Workshop, East Stroudsburg PA: ACL, 1-8.

A. Herdagdelen and M. Baroni. 2009. BagPack: A general framework to represent semantic relations. Proceedings of the EACL 2009 Geometrical Models for Natural Language Semantics (GEMS) Workshop, East Stroudsburg PA: ACL, 33-40.

M. Baroni. 2009. Distributions in text. In Anke Lüdeling and Merja Kytö (eds.), Corpus linguistics: An international handbook, Volume 2, Berlin: Mouton de Gruyter: 803-821.

M. Baroni and S. Evert. 2009. Statistical methods for corpus exploitation. In Anke Lüdeling and Merja Kytö (eds.), Corpus linguistics: An international handbook, Volume 2, Berlin: Mouton de Gruyter: 777-802.

M. Baroni, S. Bernardini, A. Ferraresi and E. Zanchetta. 2009. The WaCky Wide Web: A collection of very large linguistically processed Web-crawled corpora. Journal of Language Resources and Evaluation 43(3): 209-226.

A. Herdagdelen, K. Erk and M. Baroni. 2009. Measuring semantic relatedness with vector space models and random walks. Proceedings of TextGraphs-4: Graph-based Methods for Natural Language Processing, East Stroudsburg PA: ACL, 50-53.

M. Kirschner, R. Bernardi, M. Baroni and L.T. Dinh. 2009. Analyzing interactive QA dialogues using logistic regression models. In Roberto Serra and Rita Cucchiara (eds.), AI*IA 2009: Emergent Perspectives in Artificial Intelligence (Lecture Notes in Computer Science 5883), New York: Springer, 334-344.

D. Pucci, M. Baroni, F. Cutugno and A. Lenci. 2009. Unsupervised lexical substitution with a word space model. Poster and Workshop Proceedings of the 11th Conference of the Italian Association for Artificial Intelligence, online at http://evalita.fbk.eu/proceedings.html.

M. Baroni, E. Guevara and R. Zamparelli. 2009. The dual nature of deverbal nominal constructions: Evidence from acceptability ratings and corpus analysis. Corpus Linguistics and Linguistic Theory 5(1): 27-60.

M. Baroni, E. Guevara and V. Pirrelli. 2009. Sulla tipologia dei composti N+N in italiano: Principi categoriali ed evidenza distribuzionale a confronto. In Ruben Benatti, Giacomo Ferrari and Monica Mosca (eds.), Linguistica e modelli tecnologici di ricerca (Atti del 40esimo Congresso della Società di Linguistica Italiana). Roma: Bulzoni: 73-95.

2008

M. Baroni and A. Lenci. 2008. Concepts and properties in word spaces. In Alessandro Lenci (ed.), From context to meaning: Distributional models of the lexicon in linguistics and cognitive science (Special issue of the Italian Journal of Linguistics 20(1)): 55-88.

L. Onnis, T. Farmer, M. Baroni, M. Christiansen and M. Spivey. 2008. Generalizable distributional regularities aid fluent language processing: The case of semantic valence tendencies. In Alessandro Lenci (ed.), From context to meaning: Distributional models of the lexicon in linguistics and cognitive science (Special issue of the Italian Journal of Linguistics 20(1)): 129-156.

G. Kremer, A. Abel and M. Baroni. 2008. Cognitively salient relations for multilingual lexicography. Proceedings of CogALex (Cognitive Aspects of the Lexicon) Workshop at COLING 2008. 94-101.

M. Baroni, F. Chantree, A. Kilgarriff and S. Sharoff. 2008. CleanEval: A competition for cleaning Webpages. Proceedings of LREC 2008, Marrakech: ELRA.

A. Ferraresi, E. Zanchetta, M. Baroni and S. Bernardini. 2008. Introducing and evaluating ukWaC, a very large Web-derived corpus of English. Proceedings of the WAC4 Workshop at LREC 2008, Marrakech: ELRA.

A. Ferraresi, S. Bernardini, G. Picci and M. Baroni. 2008. Web corpora for bilingual lexicography: A pilot study of English/French collocation extraction and translation. Proceedings of UCCTS: International Symposium on Using Corpora in Contrastive and Translation Studies.

2007

M. Baroni, A. Lenci and L. Onnis. 2007. ISA meets Lara: An incremental word space model for cognitively plausible simulations of semantic learning. Proceedings of the ACL 2007 Workshop on Cognitive Aspects of Computational Language Acquisition, East Stroudsburg PA: ACL. 49-56.

M. Baroni and S. Evert. 2007. Words and echoes: Assessing and mitigating the non-randomness problem in word frequency distribution modeling. Proceedings of ACL 2007 (45th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL. 904-911.

S. Evert and M. Baroni. 2007. zipfR: Word frequency distributions in R. Proceedings of the Demo and Poster Sessions of ACL 2007 (45th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL. 29-32.

M. Baroni, E. Guevara and V. Pirrelli. 2007. NN compounds in Italian: Modelling category induction and analogical extension. In Vito Pirrelli (ed.), Psycho-Computational Issues in Morphology Learning and Processing (Special issue of Lingue e Linguaggio, 6.2), Bologna: il Mulino. 263-290.

M. Baroni. 2007. I sensi di ri-: Un'indagine preliminare. In Roberta Maschi, Nicoletta Penello and Piera Rizzolatti (eds.), Miscellanea di studi linguistici offerti a Laura Vanelli. Udine: Forum. 163-171.

H. Schmid, M. Baroni, E. Zanchetta and A. Stein. 2007. The Enriched TreeTagger System. Intelligenza Artificiale IV-2. 22-23.

A. Lüdeling, S. Evert and M. Baroni. 2007. Using Web data for linguistic purposes. In Marianne Hundt, Nadjia Nesselhauf and Caroline Biewer (eds.), Corpus linguistics and the Web. Amsterdam: Rodopi. 7-24.

2006

M. Baroni and S. Bernardini. 2006. A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing 21(3). 259-274.

S. Bernardini, M. Baroni and S. Evert. 2006. A WaCky introduction. In Marco Baroni and Silvia Bernardini (eds.), Wacky! Working papers on the Web as Corpus, Bologna: Gedit. 9-40.

M. Ciaramita and M. Baroni. 2006. Measuring Web-corpus randomness: A progress report. In Marco Baroni and Silvia Bernardini (eds.), Wacky! Working papers on the Web as Corpus, Bologna: Gedit. 127-158.

M. Baroni, A. Kilgarriff, J. Pomikalek and P. Rychly. 2006. WebBootCaT: a web tool for instant corpora. Proceedings of Euralex 2006, Alessandria: Edizioni dell'Orso. 123-132.

M. Baroni, A. Kilgarriff, J. Pomikalek and P. Rychly. 2006. WebBootCaT: Instant domain-specific corpora to support human translators. Proceedings of EAMT-2006. 247-252.

M. Ciaramita and M. Baroni. 2006. A figure of merit for the evaluation of Web-corpus randomness. Proceedings of EACL 2006 (11th Conference of the European Chapter of the Association for Computational Linguistics), East Stroudsburg PA: ACL. 217-224.

M. Baroni and A. Kilgarriff. 2006. Large linguistically-processed Web corpora for multiple languages. Conference Companion of EACL 2006 (11th Conference of the European Chapter of the Association for Computational Linguistics), East Stroudsburg PA: ACL. 87-90.

S. Evert and M. Baroni. 2006. Testing the extrapolation quality of word frequency models. Proceedings of Corpus Linguistics 2005, online at http://www.corpus.bham.ac.uk/PCLC/.

M. Ueyama and M. Baroni. 2006. Automated construction and evaluation of a Japanese web-based reference corpus. Proceedings of Corpus Linguistics 2005, online at http://www.corpus.bham.ac.uk/PCLC/.

S. Bernardini and M. Baroni. 2006. Spotting translationese: A corpus-driven approach using support vector machines. Proceedings of Corpus Linguistics 2005, online at http://www.corpus.bham.ac.uk/PCLC/.

E. Zanchetta and M. Baroni. 2006. Morph-it! A free corpus-based morphological resource for the Italian language. Proceedings of Corpus Linguistics 2005, online at http://www.corpus.bham.ac.uk/PCLC/.

M. Baroni and M. Ueyama. 2006. Building general- and special-purpose corpora by Web crawling. Proceedings of the 13th NIJL International Symposium, Language Corpora: Their Compilation and Application. 31-40.

R. Scarborough, P. Keating, M. Baroni, T. Cho, S. Mattys, A. Alwan, E. Auer and L. Bernstein. 2006. Optical cues to the visual perception of lexical and phrasal stress in English. Speech Prosody 2006 (Proceedings of the 3rd International Conference on Speech Prosody), Dresden: TUDpress Verlag. 217-220.

2005

H. Trost, J. Matiasek and M. Baroni. 2005. The language component of the FASTY text prediction system. Applied Artificial Intelligence 19(8). 743-781.

M. Mazzoleni and M. Baroni. 2005. I toponimi stranieri nella stampa italiana: primi risultati di una ricerca sul corpus de la Repubblica. Atti di XXII ICOS: Congresso Internazionale di Scienze Onomastiche.

M. Baroni and M. Mazzoleni. 2005. I toponimi stranieri nella stampa quotidiana italiana: fasi preliminari di una ricerca sul corpus de la Repubblica. In I. Korzen (ed.), Lingua, cultura e intercultura: l'italiano e le altre lingue, Atti dell'VIII Convegno SILFI, Frederiksberg: Samfundslitteratur Press (in the CD-ROM attached to the volume).

2004

M. Baroni and S. Vegnaduzzo. 2004. Identifying subjective adjectives through web-based mutual information. In Ernst Buchberger (ed.), Proceedings of KONVENS 2004, Vienna: ÖGAI. 17-24.

M. Baroni and M. Ueyama. 2004. Retrieving Japanese specialized terms and corpora from the World Wide Web. In Ernst Buchberger (ed.), Proceedings of KONVENS 2004, Vienna: ÖGAI. 13-16.

M. Baroni and S. Bernardini. 2004. BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC 2004, Lisbon: ELDA. 1313-1316.

M. Baroni and S. Bisi. 2004. Using cooccurrence statistics and the web to discover synonyms in a technical language. Proceedings of LREC 2004, Lisbon: ELDA. 1725-1728.

M. Baroni, S. Bernardini, F. Comastri, L. Piccioni, A. Volpi, G. Aston and M. Mazzoleni. 2004. Introducing the la Repubblica corpus: A large, annotated, TEI(XML)-compliant corpus of newspaper Italian. Proceedings of LREC 2004, Lisbon: ELDA. 1771-1774.

C. Bendazzoli, C. Monti, A. Sandrelli, M. Russo, M. Baroni, S. Bernardini, G. Mack, E. Ballardini and P. Mead. 2004. Towards the creation of an electronic corpus to study directionality in simultaneous interpreting. In Nelleke Oostdijk, Gjert Kristoffersen and Geoffrey Sampson (eds.), Compiling and processing spoken language corpora: Proceedings of the LREC 2004 Satellite Workshop, Lisbon: ELDA. 33-39.

2003

M. Baroni. 2003. Distribution-driven morpheme discovery: A computational/experimental study. In Geert Booij and Jaap van Marle (eds.), Yearbook of Morphology 2003, Dordrecht: Springer. 213-248. The full data-sets from the surveys discussed in this study.

P. Keating, M. Baroni, S. Mattys, R. Scarborough, A. Alwan, E. Auer and L. Bernstein. 2003. Optical phonetics and visual perception of lexical and phrasal stress in English. Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS). 2071-2074.

J. Matiasek and M. Baroni. 2003. Exploiting long distance collocational relations in predictive typing. In Karin Harbusch, Michael Kühn and Harald Trost (eds.), Proceedings of the EACL-03 Workshop on Language Modeling for Text Entry Methods, East Stroudsburg PA: ACL. 1-8.

M. Baroni and S. Bernardini. 2003. A preliminary analysis of collocational differences in monolingual comparable corpora. In Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery (eds.), Proceedings of Corpus Linguistics 2003, Lancaster: UCREL. 82-91. Re-printed in: Wolfgang Teubert and Ramesh Krishnamurthy (eds.), Corpus Linguistics: Critical Concepts in Linguistics, London: Routledge, 2007, vol. IV, 366-383.

2002

M. Baroni. 2002. FASTY: A multilingual approach to text prediction. Elsnews 11.2. 11-12.

M. Baroni, J. Matiasek and H. Trost. 2002. Wordform- and class-based prediction of the components of German nominal compounds in an AAC system. In Shu-Chuan Tseng (ed.), COLING 2002: Proceedings of the 19th International Conference on Computational Linguistics, East Stroudsburg PA: ACL. 57-63.

M. Baroni, J. Matiasek and H. Trost. 2002. Unsupervised discovery of morphologically related words based on orthographic and semantic similarity. In Mike Maxwell (ed.), Proceedings of the Workshop on Morphological and Phonological Learning of ACL/SIGPHON-2002, East Stroudsburg PA: ACL. 48-57.

M. Baroni, J. Matiasek and H. Trost. 2002. Predicting the components of German nominal compounds. In Frank van Harmelen (ed.), Proceedings of the 15th European Conference on Artificial Intelligence (ECAI), Amsterdam: IOS Press. 470-474.

J. Matiasek, M. Baroni and H. Trost. 2002. FASTY: A multi-lingual approach to text prediction. In Klaus Miesenberger, Joachim Klaus, Wolfgang Zagler (eds.), Proceedings of the 8th International Conference on Computers Helping People with Special Needs (ICCHP), Dordrecht: Springer. 243-250.

2001

M. Baroni. 2001. The representation of prefixed forms in the Italian lexicon: Evidence from the distribution of intervocalic [s] and [z] in northern Italian. In Geert Booij and Jaap van Marle (eds.), Yearbook of Morphology 1999, Dordrecht: Springer. 121-152.

M. Baroni. 2001. How do languages get crazy constraints? Phonetically-based phonology and the evolution of the Galeata Romagnolo vowel system. (Gzipped PS file!) In Adam Albright and Taehong Cho (eds.), Papers in Phonology 5, Los Angeles: UCLA WPL Series. 152-178.

2000 and earlier

M. Baroni and L. Vanelli. 2000. The relationship between vowel length and consonantal voicing in Friulian. In Lori Repetti (ed.), Phonological theory and the dialects of Italy. Amsterdam: John Benjamins. 13-44.

M. Baroni and L. Vanelli. 1999. Il contrasto di lunghezza vocalica in friulano. In Paola Benincà, Alberto Mioni and Laura Vanelli (eds.), Fonologia e morfologia dell'Italiano e dei dialetti d'Italia. Roma: Bulzoni. 291-317.

M. Baroni. 1998. The phonetic nature of the Northern Italian allophones [s] and [z] in words with variable realization: Electroglottographic and acoustic evidence. UCLA Working Papers in Phonetics 96. 166-174.

M. Baroni. 1996. The natural classes of Lughese vowels and why they are natural. UCLA Working Papers in Phonology 1. 1-17.

M. Baroni. 1995. Iambic senarii. Quaderni Patavini di Linguistica 14. 13-38.

M. Baroni. 1994. Moraic structure and vowel length in Galeatese. Romance Linguistics and Literature Review 7. 24-52.

M. Baroni. 1993. Teorie della sottospecificazione e restrizioni sulle code consonantiche in italiano. Rivista di Grammatica Generativa 18. 3-59.

Presentations and more

Invited speeches, seminar talks, presentations and posters at conferences without proceedings, etc.

2025

M. Baroni. 2025. Exploring large language models through the lens of intrinsic dimensionality. Seminar at the Barcelona Supercomputing Center Language Technologies Laboratory, Barcelona (Spain).

2024

M. Baroni. 2024. Unnatural Language Processing: On the puzzling out-of-distribution behaviour of language models. Invited webinar at HiTZ, Donostia, Spain (earlier versions presented at SISSA, Trieste, Italy, and CIMeC, University of Trento, Italy).

2023

M. Baroni. 2023. Can discrete information extraction prompts generalize across language models? Presentation at the Deep Learning Barcelona Symposium 2023 (DLBCN), Barcelona (Spain).

2022

M. Baroni. 2022. Machine-to-machine communication: Do we need it? What should it be like? Is it "language"? Invited keynote talk at the 29th International Conference on Computational Linguistics (COLING), Gyeongju (Korea).

M. Baroni. 2022. Unnatural Language Processing: Entering the machine-to-machine communication era. Invited plenary Next Big Idea talk at the 60th Annual Meeting of the Association for Computational Linguistics (ACL), Dublin (Ireland).

M. Baroni. 2022. Dos and don'ts of being a researcher: some lessons I've learned on the way. Invited presentation in the PI Stories seminar series of the IECS doctoral program of the University of Trento (Italy).

M. Baroni. 2022. Deep net emergent communication: Why bother? Invited talk at the ICLR 2022 Emecom Workshop (5th Workshop on Emergent Communication), online.

2021

M. Baroni. 2021. On the gap between theoretical and computational linguistics. Keynote talk at the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL), online (also presented in a revised version at the Collège de France Representation of Language in Brains and Machines symposium, Paris, France).

2020

M. Baroni. 2020. Quirks of deep emergent languages. Invited lecture at the CIFAR/Mila 2020 Deep Learning and Reinforcement Learning Summer School (Montreal, Canada).

M. Baroni. 2020. Is compositionality over-rated? A view from emergent neural network language analysis. Invited seminar at the MIT Center for Brains, Minds and Machines (Cambridge, MA). Video of the presentation.

E. Kharitonov, R. Chaabouni, D. Bouchacourt and M. Baroni. 2020. Entropy minimization in emergent languages. Poster and spotlight presentation at the ICLR 2020 Bridging AI and Cognitive Science Workshop, Addis Ababa (Ethiopia).

2019

M. Baroni. 2019. Neural network linguistics. Invited talk at the University of Zurich Colloquium in Comparative Language Science, Switzerland (earlier versions presented as CCIL master inaugural lecture, Barcelona, Spain, and as invited talks at the MIT Computational Psycholinguistics Lab and at Artificial and Biological Cognition, the 7th Cambridge Neuroscience Symposium, Cambridge, UK).

M. Baroni. 2019. Language emergence as representation learning. Invited talk at the 4th Workshop on Representation Learning for NLP (RepL4NLP), co-located with ACL 2019, Florence (Italy).

M. Baroni. 2019. Emergence of a grammatical subsystem in a neural language model. Talk in the Generalisation in Mind and Machines seminar series, School of Psychological Science, University of Bristol (UK).

R. Chaabouni, E. Kharitonov, E. Dupoux and M. Baroni. 2019. Are deep communicating agents efficient coders? Poster at the Interaction and the Evolution of Linguistic Complexity Workshop, Center for Language and Evolution, University of Edinburgh (UK).

M. Baroni. 2019. Formal neural network linguistics. Talk at the COLT Kick-off Workshop, Barcelona (Spain).

2018

M. Baroni. 2018. On the linguistic knowledge acquired by character-level neural language models. Invited seminar at CLASP, Gothenburg (Sweden).

M. Baroni. 2018. Compositional generalization in artificial neural networks and humans. Invited talk at the Horizon Maths 2018 : Intelligence Artificielle event, Paris (France) (other versions presented at the Facebook Understanding Human and Machine Intelligence Workshop, New York, at Amazon CoreAI, Barcelona, Spain, CLASP, Gothenburg, Sweden, at the Collège de France, Paris, France, at the Institute for Logic, Language and Computation, Amsterdam, the Netherlands, and in the CLIC lab seminar series, University of Trento, Italy).

M. Baroni. 2018. What do modern recurrent neural networks learn about syntactic structure and compositionality? Invited talk at the Towards mechanistic models of meaning composition symposium, Trondheim (Norway).

M. Baroni. 2018. LSTMs vs hierarchical structure in language. Invited talk in the Amore seminar series, Universitat Pompeu Fabra, Barcelona (Spain).

2017

M. Baroni. 2017. Spectacular successes and failures of recurrent neural networks applied to language. Keynote talk at the Fourth Italian Conference on Computational Linguistics (CLIC-it), Rome (Italy) (other versions presented as keynote talk at the Paris Syntax and Semantics Conference, at the UPF Translation and Language Sciences Unit, Barcelona, Spain, and at LSCP-ENS, Paris).

M. Baroni. 2017. Learning to generalize by skill composition. Invited talk at the First Conference on Logic and Machine Learning in Natural Language (LaML), Gothenburg (Sweden).

M. Baroni. 2017. Simple tasks, grand challenges: defining an evaluation roadmap for general AI. Invited talk at the Third Research and Applied AI Summit, London (UK) (other versions presented at MAIN@NIPS, at the Facebook Faculty Summit, and as invited/keynote talk at: CICLing 2017, Budapest, Hungary, Language Technology Lab Seminar series, University of Cambridge, UK, TiCC Colloquium, Tilburg University, the Netherlands, Synthetic Language Learner seminar series, LSCP-ENS, Paris, From Computational Modelling to Behavior via Multimodal Corpus Data symposium at ICPS 2017, Vienna, Austria, Laboratoire Lattice seminar series, CNRS, Paris).

M. Baroni. 2017. Linguists on the verge of a nervous breakdown. Invited special event talk at CICLing 2017, Budapest (Hungary).

M. Baroni. 2017. Statistical learning from big data: Where it works, where it doesn't. Invited position statement at the Language and Big Data debate of ICPS 2017, Vienna (Austria).

2016

M. Baroni. 2016. End-to-end conversational agents: what's missing? Invited talk at the Let's Discuss Workshop of NIPS 2016, Barcelona (Spain).

M. Baroni. 2016. Will computers ever be able to chat with us? Invited evening lecture at ESSLLI 2016, Bolzano (Italy).

M. Baroni. 2016. Living a discrete life in a continuous world. Invited talk at the ESSLLI 2016 DSALT workshop, Bolzano (Italy).

M. Baroni. 2016. Composes: An executive summary. Talk at the Composes workshop, Bolzano (Italy).

M. Baroni. 2016. The role of simulation in training end-to-end conversational agents. Invited talk at the Deep Learning Workshop at ICML16, New York (USA).

2015

M. Baroni. 2015. Grounding distributional semantics in the visual world. Invited keynote talk at the EMNLP Workshop on Vision and Language (VL15), Lisbon (Portugal).

M. Baroni. 2015. Grounding word representations in the visual world. Invited presentation at the ERC Allegro Workshop, INRIA, Grenoble (France).

M. Baroni. 2015. What (multimodal) distributional semantic models learn during their childhood. Joint invited keynote talk at *SEM 2015 and SemEval-2015, Denver (Colorado, USA).

M. Baroni. 2015. The grumpy linguist's guide to distributed representations of sentence meaning. Invited keynote talk at the NAACL 2015 Workshop on Vector Space Modeling for NLP, Denver (Colorado, USA). Also presented at the Meaning in Context Symposium (CAS-LMU, Munich, Germany).

A. Gupta, G. Boleda, M. Baroni and S. Pado. 2015. Mapping conceptual features to referential properties. Talk at the 3rd International ESSENCE Workshop: Algorithms for Processing Meaning, Barcelona (Spain).

M. Baroni. 2015. Computational systems that learn linguistic meaning composition from natural data. Invited presentation at the Bridging Neural Mechanisms and Cognition FENS Spring Brain Conference, Rungstedgaard, Copenhagen (Denmark).

M. Baroni. 2015. Vector-based semantics from joint linguistic and visual evidence. Invited presentation at the Interaction between Formal and Distributional Semantics Workshop, IRIT, Toulouse (France).

2014

A. Lazaridou, N. Pham and M. Baroni. 2014. Combining language and vision with a multimodal skip-gram model. Poster at the Learning Semantics Workshop of NIPS 2014, Montreal (Canada).

S. Ritter, C. Long, D. Paperno, M. Baroni, M. Botvinick and A. Goldberg. 2014. Leveraging preposition ambiguity to assess representation of semantic interaction in CDSM. Poster at the Learning Semantics Workshop of NIPS 2014, Montreal (Canada).

M. Baroni. 2014. DEcompositional distributional semantics. Invited presentation in the CIS seminar series, Ludwig-Maximilians-Universität, Munich (Germany). Earlier versions presented in the GLIF seminar series (Universitat Pompeu Fabra, Barcelona, Spain) and at the Facebook AI Research Group (New York, USA).

M. Baroni. 2014. Multimodal and cross-modal distributional semantics: Towards a common semantic space for words and things. Invited keynote talk at Dialogue 2014, Bekasovo (Russia).

M. Baroni. 2014. Composition in distributional semantics. Invited presentation in the ABBYY Open Seminars series, ABBYY Headquarters, Moscow (Russia).

M. Marelli and M. Baroni. 2014. Dissecting semantic transparency effects in derived word processing: A new perspective from distributional semantics. Presentation at Mental Lexicon 2014, Niagara on the Lake (Canada).

M. Marelli and M. Baroni. 2014. A "mountain lake" is OK, but a "lake mountain" is not: A model for the generation of novel compounds based on distributional semantics. Poster at Mental Lexicon 2014, Niagara on the Lake (Canada).

M. Baroni. 2014. Learning meaning representations from naturally occurring text and images. Invited talk at the SLI Usage-based Theories and Approaches in Linguistics Conference, Bolzano (Italy).

M. Baroni. 2014. Your first encounter with a wampimuk: Cross-modal mapping of words and things. Invited presentation at GET 2014, the first Google Event in Trento, DISI, Trento (Italy).

M. Baroni. 2014. Linking vectors to the world: Multimodal and cross-modal distributional semantics. Invited presentation in the TLA e-Humanities in Action lecture series, Max Planck Institute for Psycholinguistics, Nijmegen (Holland).

M. Baroni. 2014. Bringing distributional semantics out in the (visual) world. Invited talk at the Department of Linguistics of the University of Potsdam (Germany).

2013

L. McNally, G. Boleda and M. Baroni. 2013. Conceptual vs. referential affordance in concept composition. Talk at the Workshop on Concept Composition & Experimental Semantics/Pragmatics (CC&ESP 2013), Utrecht (Netherlands).

M. Baroni. 2013. My love affair with the Web, and why it ended. Invited talk at BOTWU, Forlì (Italy).

2012

M. Baroni. 2012. Compositionality in (high-dimensional) space. Invited keynote talk at KONVENS 2012, Vienna (Austria).

M. Baroni. 2012. Compositional operations to represent phrases and sentences in distributional semantics. Invited keynote talk at LSD 2012 (Leuven Statistics Days), KU Leuven (Belgium).

M. Baroni. 2012. Distributional semantics with eyes: Enriching corpus-based models of word meaning with automatically extracted visual features. Invited talk at the Computational Linguistics Colloquium of the Department of Computational Linguistics and Phonetics of Saarland University, Saarbruecken (Germany).

M. Baroni. 2012. You can tell a word by the (visual) company it keeps: Extracting lexico-semantic information from text and images. Invited presentation at the Internet Lexicography workshop, EURAC, Bolzano (Italy).

M. Baroni. 2012. Making corpus-based semantics more human-like. Invited presentation in the Cognition, Perceptual and Brain Sciences seminar series, UCL, London (UK).

M. Baroni. 2012. Distributional semantics beyond and above words. Invited keynote talk at Linguistic Evidence 2012, Tübingen University (Germany).

M. Baroni. 2012. Distributional semantics in the phrasal and sentential domains: How and why. Invited keynote talk at CLIN 2012, Tilburg University (the Netherlands).

M. Baroni. 2012. You shall know a word by the (visual) company it keeps: Towards a multimodal distributional semantics. Invited presentation in the CLiPS Colloquia series, CLiPS research center, University of Antwerp (Belgium).

M. Marelli, G. Dinu, R. Zamparelli and M. Baroni. 2012. Semantic transparency and the distributional origin of constituent effects in compound processing. Poster at AMLaP 2012, Riva del Garda, Italy.

2011

E. Bruni and M. Baroni. 2011. Multimodal distributional semantics. Poster and short presentation at the Workshop on Integrating Language and Vision held at NIPS 2011.

M. Baroni, E. Bruni and G.B. Tran. 2011. Multimodal distributional semantics. Poster and short presentation at the V&L Net Workshop on Vision and Language (VL11).

A. Anderson, E. Bruni, B. Murphy, M. Baroni and M. Poesio. 2011. fMRI analyses of semantic structure using joint text and image models. Poster and short presentation at the V&L Net Workshop on Vision and Language (VL11).

M. Poesio, A. Anderson, M. Baroni, S. Eisenbeiss, C. Rennie and A. Lenci. 2011. BabyExp: From data collection to analysis. Poster and short presentation at the V&L Net Workshop on Vision and Language (VL11).

M. Baroni. 2011. Distributional semantics at CLIC. Invited presentation at the Information Technology for Cultural Heritage Workshop, FBK, Trento (Italy).

2010

M. Baroni. 2010. The PAISA' project. Invited presentation at the Human Language Technologies Workshop of AI*IA 2010.

M. Baroni. 2010. Web 2.0 as corpus: One decade of textual analysis with Web data. Invited keynote talk at JADT 2010, Sapienza University, Rome (Italy).

2009

M. Baroni, B. Murphy and M. Poesio. 2009. Using corpus-based semantic models to predict EEG activation patterns. Presentation at the DiSCo workshop of CogSci 2009.

M. Baroni. 2009. Of corpora and brains: Predicting EEG activation in response to conceptual stimuli using corpus-based semantic models. Presentation in the Natural Language Processing seminar series, NLP Research Group, UPC - Barcelona Tech.

2008

M. Baroni. 2008. Distributional Semantics: From Ad Hoc Solutions to Persistent Models. Invited keynote talk at IS-LTC 2008, Stefan Institute, Ljubljana, Slovenia.

M. Baroni, A. Lenci, B. Murphy and M. Poesio. 2008. Modelling semantic property acquisition from single linguistic exposures. Talk at PsychoCompLA-2008, Washington, USA.

M. Baroni, E. Guevara and V. Pirrelli. 2008. Psycho-computational modeling of NN compound interpretation. Talk at the CompoNet Congress on Compounding, Bologna, Italy.

M. Baroni. 2008. Extracting conceptual knowledge from text corpora. Invited talk at the ESP Doctoral School, Federico II University, Naples, Italy.

M. Baroni. 2008. Distributional semantics. Invited talk at the Beyond Short Units workshop of the "Sound to Sense" Marie Curie Research Training Network, Federico II University, Naples, Italy.

M. Baroni. 2008. From word co-occurrences to properties of concepts: Using corpora to simulate the human experience. Invited talk at the Corpus Tools in Teaching and Research Colloquium, Bressanone/Brixen, Italy.

2007

M. Baroni. 2007. Extracting structured semantic spaces from corpora. Invited talk at the National Institute for Japanese Language, Tokyo, Japan.

M. Baroni. 2007. Building large linguistic corpora by Web crawling. Invited lecture at the Dipartimento di Scienze Letterarie e Filologiche, University of Turin, Italy.

M. Baroni. 2007. Building very large corpora from the Web. IRST HLT Seminar series, Povo, Italy.

R. Zamparelli, E. Guevara and M. Baroni. 2007. Italian deverbal compounds: Words, phrases or either. 33esimo Incontro di Grammatica Generativa, University of Bologna, Bologna, Italy.

M. Baroni and E. Guevara. 2007. Categorie di composti NN in italiano: Induzione ed estensione. Invited lecture at the Seminario dei Dottorandi di Linguistica di Padova, University of Padua, Padua, Italy.

2006

M. Baroni. 2006. Empirical NLP in a cognitive perspective: The case of word meaning. Lecture in the Computational and Cognitive Neuroscience series, CIMeC, University of Trento, Rovereto, Italy.

S. Evert and M. Baroni. 2006. The zipfR library: Words and other rare events in R. useR! 2006: The second R user conference, Vienna, Austria.

M. Baroni, E. Guevara, V. Pirrelli and E. Zanchetta. 2006. Corpus evidence and compound structure: The case of Italian NN compounds. Quantitative Investigations in Theoretical Linguistics 2 (QITL-2), University of Osnabrück, Osnabrück, Germany.

A. Lüdeling, M. Baroni and S. Evert. 2006. Need and competition: Deconstructing quantitative productivity. Quantitative Investigations in Theoretical Linguistics 2 (QITL-2), University of Osnabrück, Osnabrück, Germany.

A. Lüdeling and M. Baroni. 2006. Need and competition: Deconstructing quantitative productivity. Talk given at the MorBo Seminar, Dipartimento di Lingue, University of Bologna, Bologna, Italy.

A. Lüdeling, M. Baroni and S. Evert. 2006. Need and competition in word formation and where to find data to study them. Poster presented at the International Conference on Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, Tübingen, Germany.

M. Baroni. 2006. Building large corpora from the Web. Language and Communication Technologies colloquia, University of Bolzano, Italy.

2005

M. Baroni. 2005. Spotting translated text: Humans vs. (support vectors) machines. Invited talk at the Cognitive Science Institute, University of Osnabrück, Germany.

L. Onnis, T. Farmer, M. Baroni, M. Spivey and M. Christiansen. 2005. Native speakers capitalize on semantic valence tendencies to boost fluent comprehension: Experimental and computational evidence. Poster presented at AMLaP 2005 (Architectures and Mechanisms for Language Processing), Ghent, Belgium.

M. Baroni. 2005. Large crawls of the Web for linguistic purposes. Presented at the Corpus Linguistics 2005 Web as Corpus Workshop, Birmingham, UK.

M. Baroni and S. Sharoff. 2005. Creating specialized and general corpora using automated search engine queries. Presented at the Corpus Linguistics 2005 Web as Corpus Workshop, Birmingham, UK.

M. Baroni. 2005. Misurare la produttività: Esperimenti e riflessioni. Talk given at the MorBo Seminar, Dipartimento di Lingue, University of Bologna, Bologna, Italy.

M. Baroni. 2005. Usare la rete come fonte di dati linguistici: Esperienze, problemi e prospettive. Talk given at the Laboratorio di Ontologia Applicata, CNR, Rome, Italy.

2004

M. Baroni. 2004. Using the web as a source of linguistic data: Experiences, problems and perspectives. Invited lecture at the Humboldt University, Berlin, Germany.

S. Bernardini and M. Baroni. 2004. Web-mining disposable corpora in the translation classroom. Teaching and Language Corpora (TaLC) 2004, Granada, Spain.

2003

M. Baroni. 2003. Metodi non-supervisionati per la scoperta di morfemi e relazioni morfologiche. Slides. Invited lecture at the Istituto di Linguistica Computazionale, CNR, Pisa, Italy.

M. Baroni. 2003. Annotazione morfosintattica e lemmatizzazione. Talk given at the Corpus Linguistics Seminar, SITLEC, University of Bologna at Forlì, Italy.

M. Baroni and S. Vegnaduzzo. 2003. Assessing morphological productivity via automated measures of semantic transparency. Slides. Presented at the Workshop on Explaining Productivity of the DGFS.

M. Baroni. 2003. Estrazione non supervisionata di informazioni morfologiche da corpora non annotati. Invited lecture at the Seminario dei Dottorandi di Linguistica di Padova, University of Padua, Padua, Italy.

2002 and earlier

M. Baroni, J. Matiasek and H. Trost. 2002. Using textual association measures and minimum edit distance to discover morphological relations. Slides (in PS format!). Presented at the International Workshop on Computational Approaches to Collocations, Vienna, Austria.

P. Keating, T. Cho, S. Mattys, L. Bernstein, B. Chaney, M. Baroni and A. Alwan. 2000. Articulation of word and sentence stress. Poster presented at the Meeting of the Acoustical Society of America, Newport Beach CA, USA.

M. Baroni. 2000. Using distributional information to discover morphemes: An automated distribution-driven prefix learner. Presented at the 9th International Morphology Meeting, Vienna, Austria.

M. Baroni. 2000. Using distributional information to discover morphemes: A distribution-driven prefix learner. Presented at the Meeting of the Linguistic Society of America, Chicago IL, USA.

M. Baroni and L. Vanelli. 1997. Il contrasto di lunghezza vocalica in friulano. Presented at the 31st Italian Linguistic Society Conference (SLI), Padua, Italy. (Warning: There are problems with the IPA fonts in the PDF file -- I would say that the handout is still legible, though...)

M. Baroni. 1996. An acoustic study of Italian unstressed mid-vowels. Poster presented at the Meeting of the Acoustical Society of America, Honolulu, Hawaii. Poster text, experiment 1 plots, experiment 2 plots. (Note that the almost perfect overlap among categories that makes the plots in experiment 1 unreadable is a fact about the distribution of the relevant sounds, and not the result of a technical problem... On the other hand, there are real technical problems with the phonetic characters.)

M. Baroni. 1993. Teorie della sottospecificazione e restrizioni sulle code consonantiche in italiano. Presented at the Incontro di Grammatica Generativa, Trento, Italy.

Edited volumes

M. Baroni, S. Evert and A. Lenci (editors). 2008. Bridging the gap between semantic theory and computational simulations: Proceedings of the ESSLLI 2008 Workshop on Distributional Semantics, Hamburg: ESSLLI.

M. Baroni, A. Lenci and M. Sahlgren (editors). 2007. Beyond words and documents: Proceedings of the Workshop on Contextual Information in Semantic Space Models at CONTEXT 07, Roskilde: Roskilde University.

M. Baroni and S. Bernardini (editors). 2006. Wacky! Working papers on the Web as Corpus, Bologna: Gedit.

A. Kilgarriff and M. Baroni (editors). 2006. Proceedings of the 2nd International Workshop on the Web as Corpus (EACL 2006 SIGWAC Workshop), East Stroudsburg PA: ACL.
