Download
The dataset can be downloaded in .zip format from the link below:
download link
Information About Different Files in the Dataset
- dataset: This contains the two versions of the dataset discussed in the paper. Each of the train, valid and test splits has one json file per dialog session. Each json file is a list of utterances, each utterance being a dictionary with the following fields:
- speaker: speaker of the current utterance (User or System)
- utterance: itself a dictionary with the following keys: "nlg" (the true text utterance), "images" (a list of true image responses), "false nlg" (a false text utterance) and "false images" (a list of false image responses). The remaining items of the dictionary hold more structured details about the utterance and should not be used for building models. They may be used only for evaluation, though our current evaluation strategy does not require any field beyond the four above ("nlg", "images", "false nlg" and "false images").
- question-type: when the speaker is "user", there is sometimes an extra annotated field named "question-type". Its value is one of the following: "ask_attribute", "buy", "celebrity", "do_not_like_earlier_show_result", "do_not_like_n_show_result", "do_not_like_show_result", "filter_results", "go_with", "like_earlier_show_result", "like_n_show_result", "like_show_result", "show_orientation", "show_result", "show_similar_to", "sort_results", "suited_for". Each of these corresponds to one of the states in Table 1 of (dataset.html) and is used for state-wise evaluation of the dialogs.
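As an illustration of the layout described above, the following is a minimal sketch of reading one dialog session file and keeping only the four permitted fields. The function name `load_dialog` is hypothetical, and the placement of "question-type" at the top level of each utterance dictionary (next to "speaker") is an assumption that should be checked against the actual files.

```python
import json

# The four fields that the description says models and evaluation rely on.
ALLOWED_FIELDS = ("nlg", "images", "false nlg", "false images")


def load_dialog(path):
    """Read one dialog session file and keep only the permitted fields.

    Hypothetical helper, not part of the dataset release.
    """
    with open(path, encoding="utf-8") as f:
        turns = json.load(f)  # a list of utterance dictionaries

    cleaned = []
    for turn in turns:
        utterance = turn.get("utterance") or {}
        record = {"speaker": turn.get("speaker")}
        for field in ALLOWED_FIELDS:
            # Missing fields become None instead of raising KeyError.
            record[field] = utterance.get(field)
        # Assumption: "question-type" sits next to "speaker"; it is only
        # present on some user turns, so None is the common case.
        record["question-type"] = turn.get("question-type")
        cleaned.append(record)
    return cleaned
```

Dropping every other key at load time is one simple way to make sure the structured details are not accidentally used as model input.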
- raw_catalogs: This contains the raw catalog of fashion items crawled from 4 fashion sites. Each folder contains a list of json files, each json file being the catalog description of a single product. The catalog description usually comes as a list of attribute-value pairs, where a value may be an image or text. Further, the textual attributes can be short, crisp phrases or longer, more unstructured descriptions. Some of the top attributes (and some example values) are listed below.
- fashion-category (taxonomy): category of the fashion item (e.g. men > jacket > leather jacket)
- gender: gender suited for the fashion item (e.g. men, women, kids, all)
- product_name: name (or title) of the product (e.g. “Levis blue skinny fit jeans for women”)
- product_url: url of the product (the url may now be obsolete)
- image-main_url: url of the main image of the product (only <5% of cases have obsolete image urls)
- image-back_url: url of the image of the product taken from the back
- image-front_url: url of the image of the product taken from the front
- image-left_url: url of the image of the product taken from the left
- image-right_url: url of the image of the product taken from the right
- image-detail_url: url of a zoomed-in image of the product
- model_worn: yes or no depending on whether the image is that of a model wearing the item or an isolated image of the item itself
- price: price of the item (in INR (Indian currency) or USD)
- currency: currency of the price
- material: material the item is made out of
- care: wash-care instructions for the item
- color: color of the item
- brand: brand of the item
- type: specific type of the fashion item (e.g. espadrille type shoes)
- style: style of the fashion item (e.g. funky, formal, casual)
- neck: neckline of the fashion item (e.g. round, v-neck)
- fit: fitting of the fashion item (e.g. skinny-fit, form-fit, one-size-fits-all)
- length: length of the fashion item (e.g. ankle-length, knee-length)
- sleeves: type of sleeves of the fashion item (e.g. full sleeves, bell sleeves, sleeveless)
- available_sizes: list of available sizes for the fashion item (e.g. Small, Medium, Large)
- details: details of other attributes of the fashion item (it is usually more unstructured, with longer descriptions)
- bestSellerRanking: bestSeller Ranking of the fashion item (e.g. Rank #1003 in the Category: footwear)
- reviewStars: Review rating received by the fashion item
- review: Actual reviews (anonymized) posted by users about the fashion item
- similar-items: list of urls of similar-looking products
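To show how the attributes above might be consumed, here is a minimal sketch that pulls a few of them out of one catalog entry. It assumes that each product json file parses to a flat mapping from attribute name to value and that any of the listed attributes may be absent; the helper name `summarize_product` is hypothetical.

```python
# Image attributes named in the list above, main image first.
IMAGE_KEYS = (
    "image-main_url",
    "image-front_url",
    "image-back_url",
    "image-left_url",
    "image-right_url",
    "image-detail_url",
)


def summarize_product(product):
    """Pick a few documented attributes out of one catalog entry.

    Assumption: `product` is a flat dict of attribute -> value.
    """
    # Keep only the image urls that are present and non-empty.
    images = [product[key] for key in IMAGE_KEYS if product.get(key)]
    return {
        "name": product.get("product_name"),
        "gender": product.get("gender"),
        "brand": product.get("brand"),
        "price": product.get("price"),
        "currency": product.get("currency"),
        "image_urls": images,
    }
```

A product parsed with `json.load` can be passed straight to `summarize_product`. Because some urls may be obsolete, any download step built on `image_urls` should expect failures.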
- meta_data: This contains several kinds of meta-data, e.g. a taxonomy over the fashion items, a handcrafted lexicon of more than 40 fashion attributes, celebrity profiles (anonymized) and style-tips
- taxonomy: contains two files, taxonomy_men and taxonomy_women. Each taxonomy entry is called a fashion synset or simply synset (e.g. turtleneck, sweater, quilted jacket etc. are synsets). Entries have the following form (examples from taxonomy_men):
- man>apparel>layer_3_upper_body>sweater>turtleneck,turtle neck
- man>apparel>layer_3_upper_body>jacket>quilted jacket
- man>apparel>layer-2-lower-body>trouser>formal-trousers, dressed pants
- man>apparel>layer_2_lower_body>joggers,jogger
- attribute_lexicons: contains a txt file for each of the 47 fashion attributes. Each line in a txt file gives a lexicon word together with its frequency (here frequency refers to the approximate frequency of the lexicon word in the dataset)
- celebrity profiles: this contains two files, one capturing the distribution of fashion preferences of each celebrity over all the fashion items, and the other listing the distribution of celebrities likely to show a preference for each of the fashion items
- style-tip: this contains three files, each for men's fashion and women's fashion, listing, for every fashion item, the associated fashion items that are likely to go well with it:
- goes_with_synset_per_synset: for every fashion synset (taxonomy entry), the list of other fashion items that can go well with the former (e.g. scarf goes well with blouson top)
- goes_with_synset_attribute_per_synset: for every fashion item, list the other fashion items having a specific attribute, that can go well with the former (e.g. polka dotted scarf goes well with blouson top)
- goes_with_synset_attribute_per_synset_attribute: for every fashion item having a specific attribute, list the other fashion items having a specific attribute, that can go well with the former (e.g. polka dotted scarf goes well with white blouson top)
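The taxonomy entries shown under meta_data can be split into a category path and the synset names at the leaf. The sketch below assumes, based only on the four example entries, that ">" separates levels and that the last level holds comma-separated names for the same synset; `parse_taxonomy_line` is a hypothetical helper.

```python
def parse_taxonomy_line(line):
    """Split one taxonomy entry into (ancestor path, synset names).

    Assumption: '>' separates levels and the final level is a
    comma-separated list of names for one synset.
    """
    parts = [part.strip() for part in line.strip().split(">")]
    *ancestors, leaf = parts
    # Drop empty names that a trailing comma would otherwise produce.
    names = [name.strip() for name in leaf.split(",") if name.strip()]
    return ancestors, names


path, names = parse_taxonomy_line(
    "man>apparel>layer_3_upper_body>sweater>turtleneck,turtle neck"
)
print(path)   # ['man', 'apparel', 'layer_3_upper_body', 'sweater']
print(names)  # ['turtleneck', 'turtle neck']
```

Keeping the ancestors as a list makes it easy to look up all synsets under a given level, for example everything below "jacket".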