Dataset Details
Question Nomenclature
Question type | Question sub-type | Question sub-sub-type |
---|---|---|
ques_type_id=1 Simple Question (subject-based) | | |
ques_type_id=2 Secondary question | sec_ques_type=1 Subject based question | sec_ques_sub_type=1 Direct (Singular); sec_ques_sub_type=2 Indirect (Singular); sec_ques_sub_type=3 Indirect (Plural); sec_ques_sub_type=4 Direct (Plural) |
ques_type_id=2 Secondary question | sec_ques_type=2 Object based question | |
ques_type_id=3 Clarification (for secondary) question | | |
ques_type_id=4 Set-based question | set_op_choice=1 OR; set_op_choice=2 AND; set_op_choice=3 Difference | is_inc=1 Incomplete version of set-based question |
ques_type_id=5 Boolean (Factual Verification) question | bool_ques_type=1 Verification \| 2 entities, both direct; bool_ques_type=2 Verification \| 2 entities, one direct and one indirect, subject is indirect; bool_ques_type=3 Verification \| 2 entities, one direct and one indirect, object is indirect; bool_ques_type=4 Verification \| 3 entities, all direct, 2 are query entities; bool_ques_type=5 Verification \| 3 entities, 2 direct, 2 (direct) are query entities, subject is indirect; bool_ques_type=6 Verification \| one entity, multiple entities (as object) referred indirectly | |
ques_type_id=6 Incomplete question (for secondary) | inc_ques_type=1 Incomplete \| object parent is changed, subject and predicate remain the same; inc_ques_type=2 Only subject is changed, parent and predicate remain the same; inc_ques_type=3 Incomplete count-based question | |
ques_type_id=7 Comparative and Quantitative questions (involving single entity) | count_ques_sub_type=1 Quantitative (count) single entity; count_ques_sub_type=2 Quantitative (min/max) single entity; count_ques_sub_type=3 Quantitative (at least/at most) single entity (which); count_ques_sub_type=5 Quantitative (at least/at most) single entity (count); count_ques_sub_type=7 Quantitative Indirect (count) single entity | is_incomplete=1 Incomplete form (of the category of question) |
ques_type_id=7 Comparative and Quantitative questions (involving single entity) | count_ques_sub_type=4 Comparative (more/less) single entity (count); count_ques_sub_type=6 Comparative (more/less) single entity (which); count_ques_sub_type=8 Comparative Indirect (more/less) single entity (count); count_ques_sub_type=9 Comparative Indirect (more/less) single entity (which) | |
ques_type_id=8 Comparative and Quantitative questions (involving multiple (2) entities) | count_ques_sub_type=1 Quantitative with Logical Operators; count_ques_sub_type=2 Quantitative (count) multiple entity; count_ques_sub_type=3 Quantitative (min/max) multiple entity; count_ques_sub_type=4 Quantitative (at least/at most) multiple entity (which); count_ques_sub_type=6 Quantitative (at least/at most) multiple entity (count); count_ques_sub_type=8 Quantitative Indirect (count) multiple entity | is_incomplete=1 Incomplete form (of the category of question) |
ques_type_id=8 Comparative and Quantitative questions (involving multiple (2) entities) | count_ques_sub_type=5 Comparative (more/less) multiple entity (count); count_ques_sub_type=7 Comparative (more/less) multiple entity (which); count_ques_sub_type=9 Comparative Indirect (more/less) multiple entity (count); count_ques_sub_type=10 Comparative Indirect (more/less) multiple entity (which) | |
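
The nomenclature above can be turned into a small lookup for labelling annotated utterances. The following is only a minimal sketch: it assumes each utterance is available as a Python dict keyed by the field names from the table (`ques_type_id`, `set_op_choice`, `is_inc`). The dict layout and the `describe` helper are hypothetical, not part of the dataset tooling.

```python
# Labels copied from the nomenclature table (shortened for types 7 and 8).
QUES_TYPES = {
    1: "Simple Question (subject-based)",
    2: "Secondary question",
    3: "Clarification (for secondary) question",
    4: "Set-based question",
    5: "Boolean (Factual Verification) question",
    6: "Incomplete question (for secondary)",
    7: "Comparative and Quantitative questions (single entity)",
    8: "Comparative and Quantitative questions (multiple entities)",
}

# Set operations for ques_type_id=4, keyed by set_op_choice.
SET_OPS = {1: "OR", 2: "AND", 3: "Difference"}


def describe(utterance: dict) -> str:
    """Return a readable label for an utterance (hypothetical helper).

    Assumes the annotation fields are stored under the names used in the
    nomenclature table; missing fields are simply skipped.
    """
    type_id = utterance.get("ques_type_id")
    label = QUES_TYPES.get(type_id, "Unknown")
    if type_id == 4:
        op = SET_OPS.get(utterance.get("set_op_choice"))
        if op is not None:
            label += f" [{op}]"
        if utterance.get("is_inc") == 1:
            label += " (incomplete)"
    return label


print(describe({"ques_type_id": 4, "set_op_choice": 2, "is_inc": 1}))
```

The same pattern extends to `sec_ques_type`, `bool_ques_type`, `inc_ques_type`, and `count_ques_sub_type` by adding one dictionary per field.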
Dataset Statistics (number of QA pairs for each question type)
Question Type | Train | Valid | Test |
---|---|---|---|
Simple\|Direct | 465184 | 52189 | 81994 |
Simple\|Indirect | 293692 | 32877 | 54854 |
Simple\|Incomplete | 58627 | 6658 | 10045 |
Comparative\|Count over More/Less\|Mult. entity type\|Direct | 36658 | 3791 | 7711 |
Comparative\|Count over More/Less\|Mult. entity type\|Indirect | 7783 | 808 | 1177 |
Comparative\|Count over More/Less\|Mult. entity type\|Incomplete | 15137 | 1564 | 3249 |
Comparative\|Count over More/Less\|Single entity type\|Direct | 47682 | 4738 | 5224 |
Comparative\|Count over More/Less\|Single entity type\|Indirect | 9100 | 932 | 922 |
Comparative\|Count over More/Less\|Single entity type\|Incomplete | 19324 | 1929 | 1972 |
Comparative\|More/Less\|Mult. entity type\|Direct | 36538 | 3711 | 7655 |
Comparative\|More/Less\|Mult. entity type\|Indirect | 6797 | 645 | 1184 |
Comparative\|More/Less\|Mult. entity type\|Incomplete | 15086 | 1546 | 3209 |
Comparative\|More/Less\|Single entity type\|Direct | 47149 | 4725 | 5520 |
Comparative\|More/Less\|Single entity type\|Indirect | 7087 | 736 | 925 |
Comparative\|More/Less\|Single entity type\|Incomplete | 19107 | 1910 | 2064 |
Logical\|Union\|Direct | 70694 | 7345 | 14418 |
Logical\|Intersection\|Direct | 31205 | 3278 | 5708 |
Logical\|Difference\|Direct | 3726 | 373 | 661 |
Logical\|Incomplete | 6372 | 765 | 1679 |
Quantitative\|Atleast/ Atmost/ Approx. the same/Equal\|Mult. entity type\|Direct | 21110 | 2161 | 3910 |
Quantitative\|Atleast/ Atmost/ Approx. the same/Equal\|Single entity type\|Direct | 27613 | 2790 | 2306 |
Quantitative\|Count over Atleast/ Atmost/ Approx. the same/Equal\|Mult. entity type\|Direct | 21257 | 2272 | 3850 |
Quantitative\|Count over Atleast/ Atmost/ Approx. the same/Equal\|Single entity type\|Direct | 27507 | 2801 | 2288 |
Quantitative\|Count\|Logical operators\|Direct | 21734 | 2089 | 3753 |
Quantitative\|Count\|Logical operators\|Indirect | 10802 | 991 | 2035 |
Quantitative\|Count\|Mult. entity type\|Direct | 24561 | 2472 | 4329 |
Quantitative\|Count\|Single entity type\|Direct | 51584 | 5125 | 4477 |
Quantitative\|Count\|Single entity type\|Indirect | 15995 | 1519 | 2547 |
Quantitative\|Count\|Single entity type\|Incomplete | 20050 | 1990 | - |
Verification\|Single/Multiple Entity\|Direct | 47505 | 5376 | 10150 |
Verification\|Single/Multiple Entity\|Indirect | 83325 | 9218 | 16578 |
Clarification (All) | 77835 | 8164 | 12121 |
Indirect (All) | 407784 | 45216 | 75640 |
Incomplete (All) | 172957 | 18341 | 23220 |
Logical\|Multiple Relations\|Direct | 49970 | 5164 | 9598 |
Quantitative\|Min/Max\|Single entity type | 29409 | 2942 | 342 |
Quantitative\|Min/Max\|Mult. entity type | 21098 | 2133 | 2695 |
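
The question-type labels in this table are hierarchical, with levels joined by `|`. As a rough illustration of working with them, the sketch below splits a label into its levels and computes the share of QA pairs that fall in the train split. It uses three rows copied from the table; the names `ROWS`, `split_label`, and `train_share` are hypothetical, and the `-` entry is represented as `None`.

```python
from typing import Dict, List, Optional, Tuple

Counts = Tuple[Optional[int], Optional[int], Optional[int]]  # (train, valid, test)

# A few rows copied from the table; None stands for the "-" entry.
ROWS: Dict[str, Counts] = {
    "Simple|Direct": (465184, 52189, 81994),
    "Logical|Difference|Direct": (3726, 373, 661),
    "Quantitative|Count|Single entity type|Incomplete": (20050, 1990, None),
}


def split_label(label: str) -> List[str]:
    """Split a hierarchical question-type label into its levels."""
    return [part.strip() for part in label.split("|")]


def train_share(counts: Counts) -> Optional[float]:
    """Fraction of QA pairs in the train split, or None if a count is missing."""
    if any(c is None for c in counts):
        return None
    return counts[0] / sum(counts)


for label, counts in ROWS.items():
    share = train_share(counts)
    shown = "n/a" if share is None else f"{share:.3f}"
    print(" > ".join(split_label(label)), "train share:", shown)
```

Returning `None` for rows with a missing count avoids silently treating `-` as zero.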
Overall Dataset Statistics
Dataset Statistics | Train | Valid | Test |
---|---|---|---|
Total No. of Dialogs (chat sessions) | 152391 | 16813 | 27797 |
Avg. No. of Utterances per dialog | 15.9 | 15.65 | 19.44 |
Total No. of Utterances having Question/Answer | 1.2M | 0.13M | 0.27M |
Avg. length of user’s question (in words) | 9.7 | 9.68 | 10.28 |
Avg. length of system’s response (in words) | 4.74 | 4.67 | 4.37 |
Avg. No. of Dialog states per dialog | 3.89 | 3.84 | 4.53 |
Vocab size (freq>=10) | 0.1M | - | - |
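
The figures in this table can be cross-checked against each other. The sketch below multiplies the number of dialogs by the average utterances per dialog and halves the result, on the assumption (not stated in the table) that each question/answer exchange accounts for two utterances. Under that assumption the estimates come out close to the reported 1.2M, 0.13M, and 0.27M. The names `SPLITS` and `estimated_qa_utterances` are hypothetical.

```python
# Figures copied from the overall statistics table.
SPLITS = {
    "train": {"dialogs": 152391, "avg_utterances": 15.9},
    "valid": {"dialogs": 16813, "avg_utterances": 15.65},
    "test": {"dialogs": 27797, "avg_utterances": 19.44},
}


def estimated_qa_utterances(dialogs: int, avg_utterances: float) -> float:
    """Estimate the question/answer count for one split.

    Assumption: every question/answer exchange takes two utterances,
    so the total utterance count is halved.
    """
    return dialogs * avg_utterances / 2


for name, stats in SPLITS.items():
    estimate = estimated_qa_utterances(stats["dialogs"], stats["avg_utterances"])
    print(f"{name}: ~{estimate / 1e6:.2f}M")
```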