LREC 2010

MACAQ: A Multi-Annotated Corpus to Study How We Adapt Answers to Various Questions

This paper presents a new corpus of human answers in natural language. The answers were collected to build a base of examples useful for generating natural language answers. We present the corpus and the approach used to acquire it. The answers correspond to questions with a fixed linguistic form, focus, and topic. Answers to each question were collected in two modalities of interaction: oral and written. The whole corpus of answers was annotated both manually and automatically at several levels. The most innovative of these are the words from the question that are reused in the answer, the precise part of the sentence that answers the question (which we call “answering-information”), and completions. A detailed description of each annotation is presented. Two example analyses of the corpus are described. The first shows differences between the oral and written modalities, especially in answer length. The second concerns the reuse of the question focus in the answers.
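As a rough illustration of the first innovative annotation level (question words reused in the answer), the sketch below flags answer tokens that also occur in the question and computes the share of answer tokens that are reused. It is a minimal sketch under stated assumptions, not the annotation procedure used for MACAQ: the tokenizer (`\w+` on lowercased text), the optional stopword list, and the function names `reused_question_words` and `reuse_ratio` are hypothetical, and the English example sentence is invented for illustration only.

```python
import re

# Hypothetical tokenizer: lowercased runs of word characters (Unicode-aware).
TOKEN = re.compile(r"\w+")


def tokens(text: str) -> list[str]:
    """Split text into lowercased word tokens."""
    return TOKEN.findall(text.lower())


def reused_question_words(question: str, answer: str,
                          stopwords: frozenset = frozenset()) -> list[str]:
    """Return answer tokens that also appear in the question.

    Tokens are listed once each, in the order they first occur in the answer.
    Stopwords (a hypothetical, user-supplied set) are ignored.
    """
    q_vocab = set(tokens(question)) - stopwords
    seen: set[str] = set()
    reused: list[str] = []
    for tok in tokens(answer):
        if tok in q_vocab and tok not in seen:
            seen.add(tok)
            reused.append(tok)
    return reused


def reuse_ratio(question: str, answer: str,
                stopwords: frozenset = frozenset()) -> float:
    """Fraction of (non-stopword) answer tokens that occur in the question."""
    a_toks = [t for t in tokens(answer) if t not in stopwords]
    if not a_toks:
        return 0.0
    q_vocab = set(tokens(question)) - stopwords
    return sum(t in q_vocab for t in a_toks) / len(a_toks)


if __name__ == "__main__":
    q = "Who wrote the novel Germinal?"
    a = "Emile Zola wrote the novel Germinal."
    stop = frozenset({"the"})
    print(reused_question_words(q, a, stop))  # ['wrote', 'novel', 'germinal']
    print(reuse_ratio(q, a, stop))            # 0.6
```

Surface word matching of this kind ignores inflection and synonymy, so it would only approximate a manual annotation of reused question words.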

