LREC 2010

MACAQ : A Multi Annotated Corpus to Study how we Adapt Answers to Various Questions

This paper presents a new corpus of human answers in natural language. The answers were collected to build a base of examples for use in natural language answer generation. We present the corpus and the approach used to acquire it. The answers correspond to questions with fixed linguistic form, focus, and topic. Answers to a given question exist for two modalities of interaction: oral and written. The whole corpus was annotated both manually and automatically at several levels; the most innovative annotations are the words from the question that are reused in the answer, the precise part of the sentence that answers the question (which we call “answering-information”), and completions. Each annotation is described in detail. Two example analyses of the corpus are described. The first shows some differences between the oral and written modalities, especially in answer length. The second concerns the reuse of the question focus in the answers.

author = {Anne Garcia-Fernandez and Sophie Rosset and Anne Vilnat},
title = {MACAQ : A Multi Annotated Corpus to study how we adapt Answers to various Questions},
booktitle = {Proceedings of LREC},
year = {2010},

Associated documents:
Garcia-Fernandez&al_LREC2010_slideshow - 165 KB - Thu, Mar 17 2011
Garcia-Fernandez&al_LREC2010 (en) - 129 KB - Mon, May 17 2010

Associated news:
Multi-Annotated Corpus of Answers to Question (MACAQ) - Wed, Mar 17 2010
