mindcv.data

Data processing

class mindcv.data.Cifar100Download(root)

Bases: DownLoad

Utility class for downloading the CIFAR-100 dataset.

Parameters: root (str) – The root path where the downloaded dataset is placed.

base_dir = 'cifar-100-binary'

download(): Download the CIFAR-100 dataset if it doesn’t exist.

resources = ['train.bin', 'test.bin', 'fine_label_names.txt', 'coarse_label_names.txt']

url = ('http://www.cs.toronto.edu/~kriz/cifar-100-binary.tar.gz', '03b5dce01913d631647c71ecec9e9cb8')

class mindcv.data.Cifar10Download(root)

Bases: DownLoad

Utility class for downloading the CIFAR-10 dataset.

Parameters: root (str) – The root path where the downloaded dataset is placed.

base_dir = 'cifar-10-batches-bin'

download(): Download the CIFAR-10 dataset if it doesn’t exist.

resources = ['data_batch_1.bin', 'data_batch_2.bin', 'data_batch_3.bin', 'data_batch_4.bin', 'data_batch_5.bin', 'test_batch.bin', 'batches.meta.txt']

url = ('http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz', 'c32a1d4ab5d03f1284b67883e8d87530')

class mindcv.data.MnistDownload(root)

Bases: DownLoad

Utility class for downloading the MNIST dataset.

Parameters: root (str) – The root path where the downloaded dataset is placed.

download(): Download the MNIST dataset if it doesn’t exist.

resources = [('train-images-idx3-ubyte.gz', 'f68b3c2dcbeaaa9fbdd348bbdeb94873'), ('train-labels-idx1-ubyte.gz', 'd53e105ee54ea40749a09fcbcd1e9432'), ('t10k-images-idx3-ubyte.gz', '9fb629c4189551a2d022fa330f9573f3'), ('t10k-labels-idx1-ubyte.gz', 'ec29112dd5afa0611ce80d1b7f02629c')]

url_path = 'http://yann.lecun.com/exdb/mnist/'
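
Each archive listed above is paired with a 32-character hexadecimal string, which has the shape of an MD5 digest. The following is a minimal sketch of how a downloaded file could be checked against such a digest using only the standard library. It assumes the strings are MD5 checksums; the helper names md5_of and is_intact are hypothetical and are not part of mindcv, and the way the DownLoad base class actually verifies files is not described here.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_md5: str) -> bool:
    """Return True if the file exists and matches the expected digest."""
    return path.is_file() and md5_of(path) == expected_md5.lower()


# Demonstration on a throwaway file rather than a real dataset archive.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    ok = is_intact(sample, hashlib.md5(b"hello").hexdigest())
    bad = is_intact(sample, "0" * 32)

print(ok, bad)
```

Reading in chunks keeps memory use flat for archives the size of cifar-10-binary.tar.gz.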

mindcv.data.create_dataset(name='', root='./', split='train', shuffle=True, num_samples=None, num_shards=None, shard_id=None, num_parallel_workers=None, download=False, num_aug_repeats=0, **kwargs)

Creates dataset by name.

Parameters

name (str) – dataset name such as MNIST, CIFAR10, ImageNet, or ‘’. ‘’ means a customized dataset. Default: ‘’.
root (str) – dataset root directory. Default: ‘./’.
split (str) – data split: ‘’ or a split name (train/val/test). If it is ‘’, no split is used. Otherwise, it is a subfolder of the root directory, e.g., train, val, test. Default: ‘train’.
shuffle (bool) – whether to shuffle the dataset. Default: True.
num_samples (Optional[int]) – Number of elements to sample (default=None, which means sample all elements).
num_shards (Optional[int]) – Number of shards that the dataset will be divided into (default=None). When this argument is specified, num_samples is the maximum number of samples per shard.
shard_id (Optional[int]) – The shard ID within num_shards (default=None). This argument can only be specified when num_shards is also specified.
num_parallel_workers (Optional[int]) – Number of workers to read the data (default=None, set in the config).
download (bool) – whether to download the dataset. Default: False.
num_aug_repeats (int) – Number of dataset repetitions for repeated augmentation. If 0 or 1, repeated augmentation is disabled. Otherwise, repeated augmentation is enabled; a common choice is 3. Default: 0.
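
To make the interaction between num_shards, shard_id, and num_samples concrete, here is a small pure-Python sketch of the semantics described above. The function shard_indices is hypothetical, and the round-robin split is an assumption chosen for illustration; the partitioning strategy actually used by the underlying dataset API may differ.

```python
from typing import List, Optional


def shard_indices(
    total: int,
    num_shards: Optional[int] = None,
    shard_id: Optional[int] = None,
    num_samples: Optional[int] = None,
) -> List[int]:
    """Return the sample indices one worker would read (illustrative only)."""
    if shard_id is not None and num_shards is None:
        raise ValueError("shard_id can only be given together with num_shards")
    indices = list(range(total))
    if num_shards is not None:
        if shard_id is None or not 0 <= shard_id < num_shards:
            raise ValueError("shard_id must be in [0, num_shards)")
        # Round-robin split: an assumption made for this sketch.
        indices = indices[shard_id::num_shards]
    if num_samples is not None:
        # With sharding enabled, num_samples caps each shard separately.
        indices = indices[:num_samples]
    return indices


print(shard_indices(10, num_shards=2, shard_id=0))                 # [0, 2, 4, 6, 8]
print(shard_indices(10, num_shards=2, shard_id=1, num_samples=3))  # [1, 3, 5]
```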

Note

For custom datasets and ImageNet, the dataset directory should follow a structure like:

    .dataset_name/
    ├── split1/
    │   ├── class1/
    │   │   ├── 000001.jpg
    │   │   ├── 000002.jpg
    │   │   └── ....
    │   └── class2/
    │       ├── 000001.jpg
    │       ├── 000002.jpg
    │       └── ....
    └── split2/
        ├── class1/
        │   ├── 000001.jpg
        │   ├── 000002.jpg
        │   └── ....
        └── class2/
            ├── 000001.jpg
            ├── 000002.jpg
            └── ....
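
The split/class/file layout described in this note can be checked with a short script before a custom dataset is handed to create_dataset. The sketch below builds a toy tree in a temporary directory and summarizes it; make_layout and summarize are hypothetical helpers, and the .jpg files are empty placeholders rather than decodable images.

```python
import tempfile
from pathlib import Path


def make_layout(root: Path, splits, classes, files_per_class: int = 2) -> None:
    """Create root/<split>/<class>/<000001.jpg ...> with empty placeholder files."""
    for split in splits:
        for cls in classes:
            class_dir = root / split / cls
            class_dir.mkdir(parents=True, exist_ok=True)
            for i in range(1, files_per_class + 1):
                (class_dir / f"{i:06d}.jpg").touch()


def summarize(root: Path) -> dict:
    """Map each split to {class name: number of files}."""
    return {
        split.name: {
            cls.name: sum(1 for f in cls.iterdir() if f.is_file())
            for cls in sorted(split.iterdir()) if cls.is_dir()
        }
        for split in sorted(root.iterdir()) if split.is_dir()
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset_name"
    make_layout(root, splits=["train", "val"], classes=["class1", "class2"])
    layout = summarize(root)

print(layout)
```

A summary like this makes it easy to spot a split that is missing a class folder before training starts.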

Returns

Dataset object

mindcv.data.create_loader(dataset, batch_size, drop_remainder=False, is_training=False, mixup=0.0, cutmix=0.0, cutmix_prob=0.0, num_classes=1000, transform=None, target_transform=None, num_parallel_workers=None, python_multiprocessing=False)

Creates dataloader.

Applies operations such as transform and batch to the ms.dataset.Dataset object created by the create_dataset function to get the dataloader.

Parameters

dataset (ms.dataset.Dataset) – dataset object created by create_dataset.
batch_size (int or function) – The number of rows each batch is created with. An int or a callable object which takes exactly 1 parameter, BatchInfo.
drop_remainder (bool, optional) – Determines whether to drop the last block whose number of data rows is less than the batch size (default=False). If True, and if there are fewer than batch_size rows available to make the last batch, those rows will be dropped and not propagated to the child node.
is_training (bool) – whether it is in train mode. Default: False.
mixup (float) – mixup alpha; mixup is enabled if > 0 (default=0.0).
cutmix (float) – cutmix alpha; cutmix is enabled if > 0 (default=0.0). This operation is experimental.
cutmix_prob (float) – probability of applying cutmix to an image (default=0.0).
num_classes (int) – the number of classes. Default: 1000.
transform (list or None) – the list of transformations that will be applied to the image, as obtained from create_transforms. If None, the default ImageNet transformation for evaluation will be applied. Default: None.
target_transform (list or None) – the list of transformations that will be applied to the label. If None, the label will be converted to the type ms.int32. Default: None.
num_parallel_workers (int, optional) – Number of workers (threads) to process the dataset in parallel (default=None).
python_multiprocessing (bool, optional) – Parallelize Python operations with multiple worker processes. This option can be beneficial if the Python operation is computationally heavy (default=False).

Note

cutmix is currently experimental (which means a performance gain is not guaranteed) and cannot be used together with mixup due to the label int type conflict.
is_training, mixup, and num_classes are used for MixUp, which is a kind of transform operation. However, it cannot be merged into transform due to the limitations of the mindspore.dataset API.

Returns: BatchDataset, the batched dataset.
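
As background for the mixup and num_classes parameters, the following is a generic, framework-free sketch of the mixup technique: two samples are blended with a weight drawn from Beta(alpha, alpha), and their labels are blended the same way after one-hot encoding, which is why the number of classes must be known and why the resulting labels are float vectors rather than integer ids. This illustrates the general idea only and is not MindCV's implementation; one_hot and mixup_pair are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode an integer class id as a float one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_pair(
    x_a: Sequence[float], y_a: int,
    x_b: Sequence[float], y_b: int,
    alpha: float, num_classes: int, rng: random.Random,
) -> Tuple[List[float], List[float], float]:
    """Blend two samples with a weight drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)  # requires alpha > 0
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b
         for a, b in zip(one_hot(y_a, num_classes), one_hot(y_b, num_classes))]
    return x, y, lam


rng = random.Random(0)
x, y, lam = mixup_pair([0.0, 1.0], 0, [1.0, 0.0], 2,
                       alpha=0.2, num_classes=3, rng=rng)
print(lam, y)  # y has weight lam on class 0 and 1 - lam on class 2
```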

mindcv.data.create_transforms(dataset_name='', image_resize=224, is_training=False, **kwargs)

Creates a list of transform operations on image data.

Parameters

dataset_name (str) – if ‘’, the dataset is treated as customized and currently receives the same transform pipeline as ImageNet. If a standard dataset name is given, including imagenet, cifar10, and mnist, preset transforms will be returned. Default: ‘’.
image_resize (int) – the image size after resizing, to match the network input. Default: 224.
is_training (bool) – if True, augmentation will be applied if supported. Default: False.
**kwargs – additional arguments passed to transforms_imagenet_train and transforms_imagenet_eval.

Returns

A list of transformation operations
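
Putting the three factory functions together, a training pipeline might be assembled as in the sketch below. It uses only the parameters documented on this page. The wrapper build_cifar10_train_loader is a hypothetical name, the lowercase 'cifar10' spelling for create_dataset is an assumption, and the import is placed inside the function so that the definition itself does not require MindCV to be installed.

```python
def build_cifar10_train_loader(root: str, batch_size: int = 64, download: bool = False):
    """Chain create_dataset, create_transforms and create_loader for CIFAR-10."""
    # Imported lazily so that defining this function does not require MindCV.
    from mindcv.data import create_dataset, create_loader, create_transforms

    # Assumption: the dataset name is accepted in lowercase here.
    dataset = create_dataset(
        name="cifar10", root=root, split="train", shuffle=True, download=download
    )
    # Training-mode transforms; images are resized to the default 224.
    transforms = create_transforms(
        dataset_name="cifar10", image_resize=224, is_training=True
    )
    return create_loader(
        dataset=dataset,
        batch_size=batch_size,
        is_training=True,
        num_classes=10,  # CIFAR-10 has 10 classes
        transform=transforms,
    )
```

Calling build_cifar10_train_loader("./cifar10", download=True) would then return the batched dataset, provided MindCV and MindSpore are available in the environment.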