MMD Dataset: Download


The dataset can be downloaded in .zip format from the link below:
download link

Information About Different Files in the Dataset

  1. dataset: This contains the two versions of the dataset discussed in the paper. Each of the train, valid and test splits has one JSON file per dialog session. Each JSON file is a list of utterances, and each utterance is a dictionary with the following fields:

  2. raw_catalogs: This contains the raw catalog of fashion items crawled from 4 fashion sites. Each folder contains a list of JSON files, and each JSON file is the catalog description of a single product. A catalog description usually comes as a list of attribute-value pairs, where a value may be an image or text. Furthermore, the textual attributes can be short, crisp phrases or longer, more unstructured descriptions. Some of the top attributes (and some example values) are listed below.

  3. meta_data: This contains several kinds of metadata, e.g. a taxonomy over the fashion items, a handcrafted lexicon of more than 40 fashion attributes, anonymized celebrity profiles and style tips.
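The description of the dataset folder (train, valid and test splits, one JSON file per dialog session, each file a list of utterance dictionaries) suggests a simple way to load a split. The sketch below assumes a hypothetical layout `<version_dir>/<split>/*.json`; the actual folder names for the two dataset versions are not given above, and no utterance field names are assumed, so each utterance is kept exactly as parsed.

```python
import json
from pathlib import Path


def load_split(split_dir):
    """Load every dialog session in one split directory.

    Assumes (hypothetically) that each *.json file in split_dir holds one
    dialog session stored as a JSON list of utterance dictionaries.
    Returns {session_file_stem: [utterance, ...]}.
    """
    sessions = {}
    for path in sorted(Path(split_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            utterances = json.load(f)
        if not isinstance(utterances, list):
            raise ValueError(f"{path} does not contain a list of utterances")
        sessions[path.stem] = utterances
    return sessions


def load_dataset_version(version_dir, splits=("train", "valid", "test")):
    """Load all splits of one dataset version.

    The layout version_dir/<split>/*.json is an assumption, not a
    documented structure.
    """
    return {split: load_split(Path(version_dir) / split) for split in splits}


def summarize(dataset):
    """Return {split: (number_of_sessions, number_of_utterances)}."""
    return {
        split: (len(sessions), sum(len(u) for u in sessions.values()))
        for split, sessions in dataset.items()
    }
```

For example, `summarize(load_dataset_version("path/to/version_dir"))` would report session and utterance counts per split, where the path is a placeholder for wherever one version of the dataset was unzipped.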
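The raw_catalogs description (one folder per fashion site, one JSON file per product, each holding attribute-value pairs) can likewise be walked with a short script, for instance to count products per site or to find the most frequent attributes. This is a minimal sketch under two assumptions that are not stated above: the layout is `raw_catalogs/<site>/<product>.json`, and each product record is a JSON object whose keys are attribute names. Records of any other shape are counted as products but skipped when tallying attributes.

```python
import json
from collections import Counter
from pathlib import Path


def iter_catalog(raw_catalogs_dir):
    """Yield (site, product_id, record) for every product JSON file.

    Hypothetical layout: raw_catalogs_dir/<site>/<product>.json, with one
    product description per file. The record is returned as parsed.
    """
    root = Path(raw_catalogs_dir)
    for site_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in sorted(site_dir.glob("*.json")):
            with open(path, encoding="utf-8") as f:
                yield site_dir.name, path.stem, json.load(f)


def products_per_site(raw_catalogs_dir):
    """Count product files in each site folder."""
    return Counter(site for site, _, _ in iter_catalog(raw_catalogs_dir))


def attribute_frequencies(raw_catalogs_dir):
    """Count how many products mention each attribute name.

    Assumes (hypothetically) that a record is a JSON object keyed by
    attribute name; non-dict records are skipped.
    """
    counts = Counter()
    for _, _, record in iter_catalog(raw_catalogs_dir):
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

Calling `attribute_frequencies(...).most_common(10)` would then give a rough list of the top attributes across all crawled sites, provided the records really are keyed by attribute name.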