Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph | CSQA dataset

Abstract
While conversing with chatbots, humans typically tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KG). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them closely to evaluate such real-world scenarios faced by bots involving both these tasks. Towards this end, we introduce the task of Complex Sequential QA which combines the two tasks of (i) answering factual questions through complex inferencing over a realistic-sized KG of millions of entities, and (ii) learning to converse through a series of coherently linked QA pairs. Through a labor-intensive semi-automatic process, involving in-house and crowdsourced workers, we created a dataset containing around 200K dialogs with a total of 1.6M turns. Further, unlike existing large-scale QA datasets which contain simple questions that can be answered from a single tuple, the questions in our dialogs require a larger subgraph of the KG. Specifically, our dataset has questions which require logical, quantitative, and comparative reasoning as well as their combinations. This calls for models which can: (i) parse complex natural language questions, (ii) use conversation context to resolve coreferences and ellipsis in utterances, (iii) ask for clarifications for ambiguous queries, and finally (iv) retrieve relevant subgraphs of the KG to answer such questions. However, our experiments with a combination of state-of-the-art dialog and QA models show that they clearly do not achieve the above objectives and are inadequate for dealing with such complex real-world settings. We believe that this new dataset, coupled with the limitations of existing models as reported in this paper, should encourage further research in Complex Sequential QA.

CODE

GitHub repository of the code: repo link



PAPER

Please download the paper here: paper link



AAAI 2018 SLIDES

Please download the slides here: slides link



BIBTEX

@article{1801.10314,
Author = {Amrita Saha and Vardaan Pahuja and Mitesh M. Khapra and Karthik Sankaranarayanan and Sarath Chandar},
Title = {Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph},
Year = {2018},
Eprint = {arXiv:1801.10314},
}



LICENSE

This dataset is released under a Creative Commons license.



DATASET

Please click here to download the dataset CSQA.
NEW: We have revised the dialogs after incorporating some more feedback from users. (DATED March 29, 2018).

NEW: Some JSON fields in the dialog zip have been slightly renamed. (DATED March 15, 2018).

NEW: We have revised the dialog and Wikidata JSONs after incorporating feedback from several users. All users are requested to re-download the entire data, including both the Wikidata and dialog JSONs. (DATED March 6, 2018).

Please click here to download the dataset CQA.
This contains the subset of the QA pairs from the CSQA dataset whose questions are answerable without the previous dialog context (hence the name Complex Question Answering, i.e. CQA).

Please click here to download the dataset CQA_12K.
This is the same as the above dataset, except that it is a smaller version containing only 10K QA pairs for training and 1K each for the development and test sets. Each of the three splits is a subset of the corresponding original train, development, or test split of the CQA dataset.
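
The split-wise subset property described above can be illustrated with a minimal sketch. It assumes each split is simply a list of QA pairs and uses uniform random sampling. The function name `subsample_splits`, the split keys, and the placeholder data are hypothetical, and the sampling procedure actually used to build CQA_12K is not stated here.

```python
import random


def subsample_splits(splits, sizes, seed=0):
    """Return smaller splits, each drawn only from the matching original split.

    splits: dict mapping a split name to a list of QA pairs (assumed layout).
    sizes:  dict mapping a split name to the number of QA pairs to keep.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    small = {}
    for name, size in sizes.items():
        pairs = splits[name]
        if size > len(pairs):
            raise ValueError(f"split {name!r} has only {len(pairs)} QA pairs")
        # Sampling without replacement keeps the result a subset of `pairs`.
        small[name] = rng.sample(pairs, size)
    return small


# Placeholder data standing in for the full CQA splits (not real dataset sizes).
full = {
    "train": [f"train-qa-{i}" for i in range(20_000)],
    "dev": [f"dev-qa-{i}" for i in range(2_000)],
    "test": [f"test-qa-{i}" for i in range(2_000)],
}

# 10K training pairs plus 1K each for development and test gives 12K in total.
small = subsample_splits(full, {"train": 10_000, "dev": 1_000, "test": 1_000})
print({name: len(pairs) for name, pairs in small.items()})
```

Because every small split is sampled only from its own original split, no training pair from the full dataset can end up in the smaller development or test set.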


Two-Fold Challenges of this Dataset