mindcv.models

class mindcv.models.create_model(model_name, num_classes=1000, pretrained=False, in_channels=3, checkpoint_path='', use_ema=False, **kwargs)[source]

Creates a model by name.

Parameters
  • model_name (str) – The name of model.

  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – Whether to load the pretrained model. Default: False.

  • in_channels (int) – The input channels. Default: 3.

  • checkpoint_path (str) – The path of checkpoint files. Default: “”.

  • use_ema (bool) – Whether to use the EMA method. Default: False.

class mindcv.models.list_models(filter='', module='', pretrained=False, exclude_filters='')[source]

class mindcv.models.is_model(model_name)[source]

Check if a model name exists.

class mindcv.models.model_entrypoint(model_name)[source]

Fetch a model entrypoint for the specified model name.

class mindcv.models.list_modules[source]

Return a list of module names that contain models / model entrypoints.

class mindcv.models.is_model_in_modules(model_name, module_names)[source]

Check if a model exists within a subset of modules.

Parameters
  • model_name (str) –

  • module_names (tuple, list, set) –
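The lookup helpers above (is_model, model_entrypoint, list_models, create_model) suggest a name-to-factory registry. The following is a minimal sketch of that pattern in plain Python. It is not the mindcv implementation: register_model, the toy factories, and the wildcard matching via fnmatch are assumptions made purely for illustration.

```python
import fnmatch

# Hypothetical in-memory registry: model name -> factory function.
_MODEL_ENTRYPOINTS = {}


def register_model(fn):
    """Register a factory function under its own function name."""
    _MODEL_ENTRYPOINTS[fn.__name__] = fn
    return fn


def is_model(model_name):
    """Check if a model name exists in the registry."""
    return model_name in _MODEL_ENTRYPOINTS


def model_entrypoint(model_name):
    """Fetch the factory registered for the given model name."""
    return _MODEL_ENTRYPOINTS[model_name]


def list_models(filter=""):
    """Return sorted model names, optionally restricted by a wildcard pattern."""
    names = sorted(_MODEL_ENTRYPOINTS)
    if filter:
        names = fnmatch.filter(names, filter)
    return names


def create_model(model_name, **kwargs):
    """Look up the factory by name and call it with the given keyword arguments."""
    if not is_model(model_name):
        raise RuntimeError(f"Unknown model: {model_name}")
    return model_entrypoint(model_name)(**kwargs)


# Toy factories standing in for real network constructors.
@register_model
def toy_net_small(num_classes=1000):
    return {"name": "toy_net_small", "num_classes": num_classes}


@register_model
def toy_net_large(num_classes=1000):
    return {"name": "toy_net_large", "num_classes": num_classes}


@register_model
def other_net(num_classes=1000):
    return {"name": "other_net", "num_classes": num_classes}
```

With this sketch, list_models("toy_*") returns only the two toy_net entries, and create_model("toy_net_small", num_classes=10) forwards num_classes to the registered factory.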

class mindcv.models.is_model_pretrained(model_name)[source]

class mindcv.models.BiTresnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)[source]

Get the 50-layer ResNet model. Refer to the base class models.BiT for more details.

Parameters
  • pretrained (bool) –

  • num_classes (int) –

class mindcv.models.ConViT(in_channels=3, num_classes=1000, image_size=224, patch_size=16, embed_dim=48, num_heads=12, drop_rate=0.0, drop_path_rate=0.1, depth=12, mlp_ratio=4.0, qkv_bias=False, attn_drop_rate=0.0, local_up_to_layer=10, use_pos_embed=True, locality_strength=1.0)[source]

ConViT model class, based on “Improving Vision Transformers with Soft Convolutional Inductive Biases” <https://arxiv.org/pdf/2103.10697.pdf>

Parameters
  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

  • image_size (int) – images input size. Default: 224.

  • patch_size (int) – image patch size. Default: 16.

  • embed_dim (int) – embedding dimension in all head. Default: 48.

  • num_heads (int) – number of heads. Default: 12.

  • drop_rate (float) – dropout rate. Default: 0.

  • drop_path_rate (float) – drop path rate. Default: 0.1.

  • depth (int) – model block depth. Default: 12.

  • mlp_ratio (float) – ratio of hidden features in Mlp. Default: 4.

  • qkv_bias (bool) – have bias in qkv layers or not. Default: False.

  • attn_drop_rate (float) – attention layers dropout rate. Default: 0.

  • locality_strength (float) – determines how focused each head is around its attention center. Default: 1.

  • local_up_to_layer (int) – number of GPSA layers. Default: 10.

  • use_pos_embed (bool) – whether to use the position embedding. Default: True.
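The drop_path_rate parameter is a single number, while stochastic depth is usually applied per block. A common convention, assumed here and not confirmed by this page, is to raise the rate linearly from 0 in the first block to drop_path_rate in the last. The helper name drop_path_schedule is hypothetical.

```python
def drop_path_schedule(drop_path_rate, depth):
    """Per-block drop-path rates rising linearly from 0 to drop_path_rate.

    Assumption: linear schedule over blocks, as commonly used with
    stochastic depth; the library may use a different rule.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        return [0.0]
    return [drop_path_rate * (i / (depth - 1)) for i in range(depth)]


# With the ConViT defaults (drop_path_rate=0.1, depth=12) the first block
# gets rate 0.0 and the last block gets rate 0.1.
rates = drop_path_schedule(0.1, 12)
```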

class mindcv.models.ConvNeXt(in_channels, num_classes, depths, dims, drop_path_rate=0.0, layer_scale_init_value=1e-06, head_init_scale=1.0)[source]

ConvNeXt model class, based on “A ConvNet for the 2020s” <https://arxiv.org/abs/2201.03545>

Parameters
  • in_channels (int) – dim of the input channel.

  • num_classes (int) – dim of the classes predicted.

  • depths (List[int]) – the depths of each layer.

  • dims (List[int]) – the middle dim of each layer.

  • drop_path_rate (float) – the rate of droppath. Default: 0.

  • layer_scale_init_value (float) – the parameter of init for the classifier. Default: 1e-6.

  • head_init_scale (float) – the parameter of init for the head. Default: 1.

class mindcv.models.DenseNet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, bn_size=4, drop_rate=0.0, in_channels=3, num_classes=1000)[source]

DenseNet-BC model class, based on “Densely Connected Convolutional Networks”

Parameters
  • growth_rate (int) – how many filters to add each layer (k in paper). Default: 32.

  • block_config (Tuple[int, int, int, int]) – how many layers in each pooling block. Default: (6, 12, 24, 16).

  • num_init_features (int) – number of filters in the first Conv2d. Default: 64.

  • bn_size (int) – multiplicative factor for number of bottleneck layers (i.e. bn_size * k features in the bottleneck layer). Default: 4.

  • drop_rate (float) – dropout rate after each dense layer. Default: 0.

  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.
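To see how growth_rate, block_config and num_init_features interact, the sketch below tracks the channel count through the dense blocks. It assumes each transition between blocks halves the channels (compression 0.5, as in DenseNet-BC); the helper name densenet_channels is hypothetical and the code is an arithmetic illustration, not the library's construction code.

```python
def densenet_channels(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64):
    """Return the channel count at the output of each dense block.

    Every layer in a block adds growth_rate channels; a transition layer
    between blocks is assumed to halve the channel count.
    """
    channels = num_init_features
    per_block = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate
        per_block.append(channels)
        if i != len(block_config) - 1:
            channels //= 2  # transition layer (assumed compression 0.5)
    return per_block


# Defaults from the signature above: [256, 512, 1024, 1024]
default_channels = densenet_channels()
```

Under these assumptions the default configuration ends with 1024 feature channels before the classifier.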

class mindcv.models.DPN(num_init_channel=64, k_r=96, g=32, k_sec=(3, 4, 20, 3), inc_sec=(16, 32, 24, 128), in_channels=3, num_classes=1000)[source]

DPN model class, based on “Dual Path Networks”

Parameters
  • num_init_channel (int) – the output channel of first blocks. Default: 64.

  • k_r (int) – the first channel of each stage. Default: 96.

  • g (int) – number of groups in the conv2d. Default: 32.

  • k_sec (Tuple[int, int, int, int]) – multiplicative factor for number of bottleneck layers. Default: (3, 4, 20, 3).

  • inc_sec (Tuple[int, int, int, int]) – the first output channel in each stage. Default: (16, 32, 24, 128).

  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.EdgeNeXt(in_chans=3, num_classes=1000, depths=[3, 3, 9, 3], dims=[24, 48, 88, 168], global_block=[0, 0, 0, 3], global_block_type=['None', 'None', 'None', 'SDTA'], drop_path_rate=0.0, layer_scale_init_value=1e-06, head_init_scale=1.0, expan_ratio=4, kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False], use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], **kwargs)[source]

EdgeNeXt model class, based on “Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision”

Parameters
  • in_chans – number of input channels. Default: 3

  • num_classes – number of classification classes. Default: 1000

  • depths – the depths of each layer. Default: [3, 3, 9, 3]

  • dims – the middle dim of each layer. Default: [24, 48, 88, 168]

  • global_block – number of global block. Default: [0, 0, 0, 3]

  • global_block_type – type of global block. Default: [‘None’, ‘None’, ‘None’, ‘SDTA’]

  • drop_path_rate – Stochastic Depth. Default: 0.

  • layer_scale_init_value – value of layer scale initialization. Default: 1e-6

  • head_init_scale – scale of head initialization. Default: 1.

  • expan_ratio – ratio of expansion. Default: 4

  • kernel_sizes – kernel sizes of different stages. Default: [7, 7, 7, 7]

  • heads – number of attention heads. Default: [8, 8, 8, 8]

  • use_pos_embd_xca – use position embedding in xca or not. Default: [False, False, False, False]

  • use_pos_embd_global – use position embedding globally or not. Default: False

  • d2_scales – scales of splitting channels. Default: [2, 3, 4, 5]

class mindcv.models.EfficientNet(arch, dropout_rate, width_mult=1.0, depth_mult=1.0, in_channels=3, num_classes=1000, inverted_residual_setting=None, keep_prob=0.2, norm_layer=None)[source]

EfficientNet architecture, based on “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”.

Parameters
  • arch (str) – The name of the model.

  • dropout_rate (float) – The dropout rate of efficientnet.

  • width_mult (float) – The ratio of the channel. Default: 1.0.

  • depth_mult (float) – The ratio of num_layers. Default: 1.0.

  • in_channels (int) – The input channels. Default: 3.

  • num_classes (int) – The number of classes. Default: 1000.

  • inverted_residual_setting (Sequence[Union[MBConvConfig, FusedMBConvConfig]], optional) – The settings of block. Default: None.

  • keep_prob (float) – The dropout rate of MBConv. Default: 0.2.

  • norm_layer (nn.Cell, optional) – The normalization layer. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, 1000)\).
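As an illustration of what depth_mult means in compound scaling, the sketch below scales a list of per-stage block repeats and rounds up. The base repeats are those of the EfficientNet-B0 baseline from the paper, and the ceiling rule is the convention used by common reference implementations; whether this class rounds in exactly the same way is an assumption. The name scale_repeats is hypothetical.

```python
import math

# Block repeats per stage of the EfficientNet-B0 baseline (from the paper).
B0_REPEATS = [1, 2, 2, 3, 3, 4, 1]


def scale_repeats(repeats, depth_mult):
    """Scale the number of blocks per stage, rounding up to a whole block."""
    return [int(math.ceil(r * depth_mult)) for r in repeats]


# depth_mult=1.0 keeps the baseline; depth_mult=1.1 gives [2, 3, 3, 4, 4, 5, 2].
scaled = scale_repeats(B0_REPEATS, 1.1)
```

width_mult plays the analogous role for channel counts, typically combined with rounding to a multiple of 8.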

class mindcv.models.GhostNet(cfgs, num_classes=1000, in_channels=3, width=1.0, dropout=0.2)[source]

GhostNet model class, based on “GhostNet: More Features from Cheap Operations”

Parameters
  • cfgs – the config of the GhostNet.

  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • width (float) – base width of hidden channel in blocks. Default: 1.0.

  • dropout (float) – dropout probability applied to the features before classification. Default: 0.2.

class mindcv.models.GoogLeNet(num_classes=1000, aux_logits=False, in_channels=3, drop_rate=0.2, drop_rate_aux=0.7)[source]

GoogLeNet (Inception v1) model architecture from “Going Deeper with Convolutions”.

Parameters
  • num_classes (int) – number of classification classes. Default: 1000.

  • aux_logits (bool) – use auxiliary classifier or not. Default: False.

  • in_channels (int) – number of input channels. Default: 3.

  • drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.

  • drop_rate_aux (float) – dropout rate of the layer before auxiliary classifier. Default: 0.7.

class mindcv.models.InceptionV3(num_classes=1000, aux_logits=True, in_channels=3, drop_rate=0.2)[source]

Inception v3 model architecture from “Rethinking the Inception Architecture for Computer Vision”.

Note

Important: In contrast to the other models, inception_v3 expects tensors with a size of N x 3 x 299 x 299, so ensure your images are sized accordingly.

Parameters
  • num_classes (int) – number of classification classes. Default: 1000.

  • aux_logits (bool) – use auxiliary classifier or not. Default: True.

  • in_channels (int) – number of input channels. Default: 3.

  • drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.

class mindcv.models.InceptionV4(num_classes=1000, in_channels=3, drop_rate=0.2)[source]

Inception v4 model architecture from “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning”.

Parameters
  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.

class mindcv.models.Mnasnet(alpha, in_channels=3, num_classes=1000, drop_rate=0.2)[source]

MnasNet model architecture from “MnasNet: Platform-Aware Neural Architecture Search for Mobile”.

Parameters
  • alpha (float) – scale factor of model width.

  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

  • drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.

class mindcv.models.MobileNetV1(alpha=1.0, in_channels=3, num_classes=1000)[source]

MobileNetV1 model class, based on “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”

Parameters
  • alpha (float) – scale factor of model width. Default: 1.

  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.MobileNetV2(alpha=1.0, round_nearest=8, in_channels=3, num_classes=1000)[source]

MobileNetV2 model class, based on “MobileNetV2: Inverted Residuals and Linear Bottlenecks”

Parameters
  • alpha (float) – scale factor of model width. Default: 1.

  • round_nearest (int) – divisor of make divisible function. Default: 8.

  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.
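The alpha and round_nearest parameters work together: channel counts are multiplied by alpha and then rounded to a multiple of round_nearest. The sketch below shows the "make divisible" helper as it appears in widely used MobileNet reference implementations. That this library uses an equivalent rule is an assumption, and the name make_divisible is illustrative.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round value to the nearest multiple of divisor.

    The result is never below min_value (default: divisor) and never more
    than 10% smaller than the unrounded value.
    """
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if new_value < 0.9 * value:
        new_value += divisor  # avoid shrinking the layer by more than 10%
    return new_value


# A 32-channel layer scaled by alpha=0.75 keeps 24 channels; scaled by
# alpha=1.4 (44.8 channels) it is rounded to 48.
narrow = make_divisible(32 * 0.75, 8)
wide = make_divisible(32 * 1.4, 8)
```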

class mindcv.models.MobileNetV3(arch, alpha=1.0, round_nearest=8, in_channels=3, num_classes=1000)[source]

MobileNetV3 model class, based on “Searching for MobileNetV3”

Parameters
  • arch (str) – size of the architecture. ‘small’ or ‘large’.

  • alpha (float) – scale factor of model width. Default: 1.

  • round_nearest (int) – divisor of make divisible function. Default: 8.

  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.NASNetAMobile(in_channels=3, num_classes=1000, stem_filters=32, penultimate_filters=1056, filters_multiplier=2)[source]

NasNet model class, based on “Learning Transferable Architectures for Scalable Image Recognition”

Parameters
  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

  • stem_filters (int) – number of stem filters. Default: 32.

  • penultimate_filters (int) – number of penultimate filters. Default: 1056.

  • filters_multiplier (int) – size of filters multiplier. Default: 2.

class mindcv.models.Pnasnet(in_channels=3, num_classes=1000)[source]

PNasNet model class, based on “Progressive Neural Architecture Search”

Parameters
  • in_channels (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.PoolFormer(layers, embed_dims=(64, 128, 320, 512), mlp_ratios=(4, 4, 4, 4), downsamples=(True, True, True, True), pool_size=3, in_chans=3, num_classes=1000, global_pool='avg', norm_layer=<class 'mindspore.nn.layer.normalization.GroupNorm'>, act_layer=<class 'mindspore.nn.layer.activation.GELU'>, in_patch_size=7, in_stride=4, in_pad=2, down_patch_size=3, down_stride=2, down_pad=1, drop_rate=0.0, drop_path_rate=0.0, layer_scale_init_value=1e-05, fork_feat=False)[source]

PoolFormer model class, based on “MetaFormer Is Actually What You Need for Vision”

Parameters
  • layers – number of blocks for the 4 stages

  • embed_dims – the embedding dims for the 4 stages. Default: (64, 128, 320, 512)

  • mlp_ratios – mlp ratios for the 4 stages. Default: (4, 4, 4, 4)

  • downsamples – flags to apply downsampling or not. Default: (True, True, True, True)

  • pool_size – the pooling size for the 4 stages. Default: 3

  • in_chans – number of input channels. Default: 3

  • num_classes – number of classes for the image classification. Default: 1000

  • global_pool – define the types of pooling layer. Default: avg

  • norm_layer – define the types of normalization. Default: nn.GroupNorm

  • act_layer – define the types of activation. Default: nn.GELU

  • in_patch_size – specify the patch embedding for the input image. Default: 7

  • in_stride – specify the stride for the input image. Default: 4.

  • in_pad – specify the pad for the input image. Default: 2.

  • down_patch_size – specify the downsample. Default: 3.

  • down_stride – specify the downsample (patch embed.). Default: 2.

  • down_pad – specify the downsample (patch embed.). Default: 1.

  • drop_rate – dropout rate of the layer before main classifier. Default: 0.

  • drop_path_rate – Stochastic Depth. Default: 0.

  • layer_scale_init_value – LayerScale. Default: 1e-5.

  • fork_feat – whether to output the features of the 4 stages, for dense prediction. Default: False.

class mindcv.models.PyramidVisionTransformer(img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=<class 'mindspore.nn.layer.normalization.LayerNorm'>, depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1], num_stages=4)[source]

Pyramid Vision Transformer model class, based on “Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions”

Parameters
  • img_size (int) – size of an input image.

  • patch_size (int) – size of a single image patch.

  • in_chans (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

  • embed_dims (list) – how many hidden dim in each PatchEmbed.

  • num_heads (list) – number of attention head in each stage.

  • mlp_ratios (list) – ratios of MLP hidden dims in each stage.

  • qkv_bias (bool) – use bias in attention.

  • qk_scale (float) – Scale multiplied by qk in attention(if not none), otherwise head_dim ** -0.5.

  • drop_rate (float) – The drop rate for each block. Default: 0.0.

  • attn_drop_rate (float) – The drop rate for attention. Default: 0.0.

  • drop_path_rate (float) – The drop rate for drop path. Default: 0.0.

  • norm_layer (nn.Cell) – Norm layer that will be used in blocks. Default: nn.LayerNorm.

  • depths (list) – number of Blocks.

  • sr_ratios (list) – stride and kernel size of each attention.

  • num_stages (int) – number of stage. Default: 4.
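The sr_ratios parameter controls spatial-reduction attention: keys and values are computed on a feature map downsampled by the stage's ratio, while queries keep the full resolution. The sketch below counts the resulting tokens per stage. It assumes the first stage works on an img_size / patch_size grid and that each later stage halves the resolution, as described in the PVT paper; pvt_token_counts is a hypothetical helper.

```python
def pvt_token_counts(img_size=224, patch_size=4, sr_ratios=(8, 4, 2, 1)):
    """Return (query_tokens, key_value_tokens) for each stage."""
    side = img_size // patch_size  # feature-map side length in stage 1
    counts = []
    for sr in sr_ratios:
        query_tokens = side * side
        kv_tokens = (side // sr) ** 2  # keys/values after spatial reduction
        counts.append((query_tokens, kv_tokens))
        side //= 2  # assumed 2x downsampling between stages
    return counts


# Defaults from the signature above:
# [(3136, 49), (784, 49), (196, 49), (49, 49)]
tokens = pvt_token_counts()
```

With the default ratios, every stage attends over the same 49 key/value tokens even though the number of queries shrinks from 3136 to 49.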

class mindcv.models.PyramidVisionTransformerV2(img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dims=[64, 128, 256, 512], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=False, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=<class 'mindspore.nn.layer.normalization.LayerNorm'>, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], num_stages=4, linear=False)[source]

Pyramid Vision Transformer V2 model class, based on “PVTv2: Improved Baselines with Pyramid Vision Transformer”

Parameters
  • img_size (int) – size of an input image.

  • patch_size (int) – size of a single image patch.

  • in_chans (int) – number of input channels. Default: 3.

  • num_classes (int) – number of classification classes. Default: 1000.

  • embed_dims (list) – how many hidden dim in each PatchEmbed.

  • num_heads (list) – number of attention head in each stage.

  • mlp_ratios (list) – ratios of MLP hidden dims in each stage.

  • qkv_bias (bool) – use bias in attention.

  • qk_scale (float) – Scale multiplied by qk in attention(if not none), otherwise head_dim ** -0.5.

  • drop_rate (float) – The drop rate for each block. Default: 0.0.

  • attn_drop_rate (float) – The drop rate for attention. Default: 0.0.

  • drop_path_rate (float) – The drop rate for drop path. Default: 0.0.

  • norm_layer (nn.Cell) – Norm layer that will be used in blocks. Default: nn.LayerNorm.

  • depths (list) – number of Blocks.

  • sr_ratios (list) – stride and kernel size of each attention.

  • num_stages (int) – number of stage. Default: 4.

  • linear (bool) – use linear SRA.

class mindcv.models.RepMLPNet(in_channels=3, num_class=1000, patch_size=(4, 4), num_blocks=(2, 2, 6, 2), channels=(192, 384, 768, 1536), hs=(64, 32, 16, 8), ws=(64, 32, 16, 8), sharesets_nums=(4, 8, 16, 32), reparam_conv_k=(3,), globalperceptron_reduce=4, use_checkpoint=False, deploy=False)[source]

RepMLPNet model class, based on “RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality”

Parameters
  • in_channels – number of input channels. Default: 3.

  • num_class – number of classification classes. Default: 1000.

  • patch_size – size of a single image patch. Default: (4, 4)

  • num_blocks – number of blocks per stage. Default: (2,2,6,2)

  • channels – number of in_channels(channels[stage_idx]) and out_channels(channels[stage_idx + 1]) per stage. Default: (192,384,768,1536)

  • hs – height of picture per stage. Default: (64,32,16,8)

  • ws – width of picture per stage. Default: (64,32,16,8)

  • sharesets_nums – number of share sets per stage. Default: (4,8,16,32)

  • reparam_conv_k – convolution kernel size in local Perceptron. Default: (3,)

  • globalperceptron_reduce – reduction factor of the intermediate convolution in the global perceptron (out_channels = in_channels / globalperceptron_reduce). Default: 4

  • use_checkpoint – whether to use checkpoint

  • deploy – whether to use bias

class mindcv.models.RepVGG(num_blocks, num_classes=1000, in_channels=3, width_multiplier=None, override_group_map=None, deploy=False, use_se=False)[source]

RepVGG model class, based on “RepVGG: Making VGG-style ConvNets Great Again”

Parameters
  • num_blocks (list) – number of RepVGGBlocks

  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • width_multiplier (list) – width multipliers for the channels of each stage. Default: None.

  • override_group_map (dict) – mapping that overrides the number of convolution groups for selected layers. Default: None.

  • deploy (bool) – use rbr_reparam block or not. Default: False

  • use_se (bool) – use se_block or not. Default: False

class mindcv.models.Res2Net(block, layer_nums, version='res2net', num_classes=1000, in_channels=3, groups=1, base_width=26, scale=4, norm=None)[source]

Res2Net model class, based on “Res2Net: A New Multi-scale Backbone Architecture”

Parameters
  • block (Type[Cell]) – block of resnet.

  • layer_nums (List[int]) – number of layers of each stage.

  • version (str) – variety of Res2Net, ‘res2net’ or ‘res2net_v1b’. Default: ‘res2net’.

  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • groups (int) – number of groups for group conv in blocks. Default: 1.

  • base_width (int) – base width of the per-group hidden channels in blocks. Default: 26.

  • scale (int) – scale factor of Bottle2neck. Default: 4.

  • norm (Optional[Cell]) – normalization layer in blocks. Default: None.
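The interplay of base_width, scale and groups can be made concrete with the width rule used by the Res2Net reference implementation: each of the scale branches gets floor(planes * base_width / 64) * groups channels. Whether this class computes widths in exactly this way is an assumption, and res2net_widths is a hypothetical helper.

```python
import math


def res2net_widths(planes, base_width=26, scale=4, groups=1):
    """Return (width of one branch, total hidden channels) of a Bottle2neck."""
    width = int(math.floor(planes * (base_width / 64.0))) * groups
    return width, width * scale


# Defaults from the signature above, for the usual stage widths:
# 64 -> (26, 104), 128 -> (52, 208), 256 -> (104, 416), 512 -> (208, 832)
stage_widths = [res2net_widths(p) for p in (64, 128, 256, 512)]
```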

class mindcv.models.ResNet(block, layers, num_classes=1000, in_channels=3, groups=1, base_width=64, norm=None)[source]

ResNet model class, based on “Deep Residual Learning for Image Recognition”

Parameters
  • block (Type[Union[BasicBlock, Bottleneck]]) – block of resnet.

  • layers (List[int]) – number of layers of each stage.

  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • groups (int) – number of groups for group conv in blocks. Default: 1.

  • base_width (int) – base width of the per-group hidden channels in blocks. Default: 64.

  • norm (Optional[Cell]) – normalization layer in blocks. Default: None.
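The block and layers arguments together determine the familiar ResNet depth names. As a small worked example, the nominal depth counts the convolutions inside the residual blocks (2 per BasicBlock, 3 per Bottleneck) plus the stem convolution and the final fully connected layer. The helper resnet_depth is hypothetical and only does this arithmetic.

```python
def resnet_depth(layers, convs_per_block):
    """Nominal depth: block convolutions + stem convolution + classifier."""
    return sum(layers) * convs_per_block + 2


# BasicBlock has 2 convolutions per block, Bottleneck has 3.
depth_18 = resnet_depth([2, 2, 2, 2], convs_per_block=2)    # 18
depth_34 = resnet_depth([3, 4, 6, 3], convs_per_block=2)    # 34
depth_50 = resnet_depth([3, 4, 6, 3], convs_per_block=3)    # 50
depth_101 = resnet_depth([3, 4, 23, 3], convs_per_block=3)  # 101
```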

class mindcv.models.ShuffleNetV1(num_classes=1000, in_channels=3, model_size='2.0x', group=3)[source]

ShuffleNetV1 model class, based on “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”

Parameters
  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • model_size (str) – scale factor which controls the number of channels. Default: ‘2.0x’.

  • group (int) – number of group for group convolution. Default: 3.

class mindcv.models.ShuffleNetV2(num_classes=1000, in_channels=3, model_size='1.5x')[source]

ShuffleNetV2 model class, based on “ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design”

Parameters
  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • model_size (str) – scale factor which controls the number of channels. Default: ‘1.5x’.

class mindcv.models.SKNet(block, layers, num_classes=1000, in_channels=3, groups=1, base_width=64, norm=None, sk_kwargs=None)[source]

SKNet model class, based on “Selective Kernel Networks”

Parameters
  • block (Type[Cell]) – block of sknet.

  • layers (List[int]) – number of layers of each stage.

  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • groups (int) – number of groups for group conv in blocks. Default: 1.

  • base_width (int) – base width of the per-group hidden channels in blocks. Default: 64.

  • norm (Optional[Cell]) – normalization layer in blocks. Default: None.

  • sk_kwargs (Optional[Dict]) – kwargs of selective kernel. Default: None.

class mindcv.models.SqueezeNet(version='1_0', num_classes=1000, drop_rate=0.5, in_channels=3)[source]

SqueezeNet model class, based on “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size”

Note

Important: In contrast to the other models, SqueezeNet expects tensors with a size of N x 3 x 227 x 227, so ensure your images are sized accordingly.

Parameters
  • version (str) – version of the architecture, ‘1_0’ or ‘1_1’. Default: ‘1_0’.

  • num_classes (int) – number of classification classes. Default: 1000.

  • drop_rate (float) – dropout rate of the classifier. Default: 0.5.

  • in_channels (int) – number of input channels. Default: 3.

class mindcv.models.SwinTransformer(image_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dim=96, depths=None, num_heads=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, norm_layer=<class 'mindspore.nn.layer.normalization.LayerNorm'>, ape=False, patch_norm=True)[source]

SwinTransformer model class, based on “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”

Parameters
  • image_size (int | tuple(int)) – Input image size. Default: 224

  • patch_size (int | tuple(int)) – Patch size. Default: 4

  • in_chans (int) – Number of input image channels. Default: 3

  • num_classes (int) – Number of classes for classification head. Default: 1000

  • embed_dim (int) – Patch embedding dimension. Default: 96

  • depths (tuple(int)) – Depth of each Swin Transformer layer.

  • num_heads (tuple(int)) – Number of attention heads in different layers.

  • window_size (int) – Window size. Default: 7

  • mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim. Default: 4

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float) – Override default qk scale of head_dim ** -0.5 if set. Default: None

  • drop_rate (float) – Dropout rate. Default: 0

  • attn_drop_rate (float) – Attention dropout rate. Default: 0

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.1

  • norm_layer (nn.Cell) – Normalization layer. Default: nn.LayerNorm.

  • ape (bool) – If True, add absolute position embedding to the patch embedding. Default: False

  • patch_norm (bool) – If True, add normalization after patch embedding. Default: True
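To relate image_size, patch_size, embed_dim and window_size, the sketch below lists the feature-map resolution, channel width and number of attention windows for each stage. It assumes four stages with patch merging between them (resolution halves, channels double), as described in the Swin Transformer paper. The depths default is None in the signature above, so the stage count is an assumption, and swin_stage_shapes is a hypothetical helper.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, window_size=7, num_stages=4):
    """Return (resolution, channels, num_windows) for each stage."""
    resolution = image_size // patch_size
    shapes = []
    for stage in range(num_stages):
        channels = embed_dim * 2 ** stage
        num_windows = (resolution // window_size) ** 2
        shapes.append((resolution, channels, num_windows))
        resolution //= 2  # patch merging halves the resolution
    return shapes


# Defaults from the signature above:
# [(56, 96, 64), (28, 192, 16), (14, 384, 4), (7, 768, 1)]
shapes = swin_stage_shapes()
```

Each window holds window_size * window_size = 49 tokens, so attention cost per window stays constant across stages.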

class mindcv.models.VGG(model_name, batch_norm=False, num_classes=1000, in_channels=3, drop_rate=0.5)[source]

VGGNet model class, based on “Very Deep Convolutional Networks for Large-Scale Image Recognition”

Parameters
  • model_name (str) – name of the architecture. ‘vgg11’, ‘vgg13’, ‘vgg16’ or ‘vgg19’.

  • batch_norm (bool) – use batch normalization or not. Default: False.

  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.

  • drop_rate (float) – dropout rate of the classifier. Default: 0.5.

class mindcv.models.ViT(image_size=224, input_channels=3, patch_size=16, embed_dim=768, num_layers=12, num_heads=12, mlp_dim=3072, keep_prob=1.0, attention_keep_prob=1.0, drop_path_keep_prob=1.0, activation=<class 'mindspore.nn.layer.activation.GELU'>, norm=<class 'mindspore.nn.layer.normalization.LayerNorm'>, pool='cls')[source]

Vision Transformer architecture implementation.

Parameters
  • image_size (int) – Input image size. Default: 224.

  • input_channels (int) – The number of input channel. Default: 3.

  • patch_size (int) – Patch size of image. Default: 16.

  • embed_dim (int) – The dimension of embedding. Default: 768.

  • num_layers (int) – The depth of transformer. Default: 12.

  • num_heads (int) – The number of attention heads. Default: 12.

  • mlp_dim (int) – The dimension of MLP hidden layer. Default: 3072.

  • keep_prob (float) – The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.

  • attention_keep_prob (float) – The keep rate for attention layer. Default: 1.0.

  • drop_path_keep_prob (float) – The keep rate for drop path. Default: 1.0.

  • activation (nn.Cell) – Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.

  • norm (nn.Cell, optional) – Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.

  • pool (str) – The method of pooling. Default: ‘cls’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, 768)\)

Raises

ValueError – If split is not ‘train’, ‘test’ or ‘infer’.

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindcv.models import ViT
>>> net = ViT()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 768)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
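The "sequences of image patches" mentioned above have a length that follows directly from image_size and patch_size. The sketch below computes it. That one extra class token is prepended when pool is 'cls' is an assumption about this implementation, and vit_sequence_length is a hypothetical helper.

```python
def vit_sequence_length(image_size=224, patch_size=16, pool="cls"):
    """Number of tokens the transformer encoder sees for one image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    num_patches = (image_size // patch_size) ** 2
    # Assumption: a class token is prepended only when pooling on it.
    return num_patches + 1 if pool == "cls" else num_patches


# Defaults from the signature above: 14 * 14 = 196 patches, plus one
# class token, gives a sequence of 197 tokens.
seq_len = vit_sequence_length()
```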

Citation:

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}

class mindcv.models.Xception(num_classes=1000, in_channels=3)[source]

Xception model architecture from “Xception: Deep Learning with Depthwise Separable Convolutions”.

Parameters
  • num_classes (int) – number of classification classes. Default: 1000.

  • in_channels (int) – number of input channels. Default: 3.