mindcv.models

mindcv.models.create_model(model_name, num_classes=1000, pretrained=False, in_channels=3, checkpoint_path='', use_ema=False, **kwargs)

Creates a model by name.

Parameters

model_name (str) – The name of the model.
num_classes (int) – The number of classes. Default: 1000.
pretrained (bool) – Whether to load the pretrained weights. Default: False.
in_channels (int) – The number of input channels. Default: 3.
checkpoint_path (str) – The path of the checkpoint file to load. Default: ''.
use_ema (bool) – Whether to load the EMA (exponential moving average) weights from the checkpoint. Default: False.

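mindcv.models.list_models(filter='', module='', pretrained=False, exclude_filters='')

Return a list of registered model names, optionally restricted by a name filter (filter), a module (module), the availability of pretrained weights (pretrained) and exclusion filters (exclude_filters).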

mindcv.models.is_model(model_name)

Check if a model name exists.

mindcv.models.model_entrypoint(model_name)

Fetch the model entrypoint for the specified model name.

mindcv.models.list_modules()

Return a list of module names that contain models / model entrypoints.

mindcv.models.is_model_in_modules(model_name, module_names)

Check if a model exists within a subset of modules.

Parameters

model_name (str) – name of the model.
module_names (tuple, list, set) – names of the modules to search.

mindcv.models.is_model_pretrained(model_name)

Check if pretrained weights are available for a model.
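The helpers above (model_entrypoint, is_model, list_models, is_model_pretrained) and create_model cooperate around a mapping from model names to factory functions. The following plain-Python sketch illustrates that pattern under stated assumptions: the tables _model_entrypoints and _models_with_pretrained, the register_model decorator and the dummy resnet18/resnet50 factories are hypothetical stand-ins, the module argument of list_models is omitted, and filters are assumed to be shell-style wildcards (fnmatch) as in timm. It does not import mindcv and is not its implementation.

```python
import fnmatch

# Hypothetical registry tables (illustration only, not mindcv internals).
_model_entrypoints = {}           # model name -> factory function
_models_with_pretrained = set()   # names that have pretrained weights


def register_model(fn):
    """Decorator: register a factory function under its own name."""
    _model_entrypoints[fn.__name__] = fn
    return fn


def is_model(model_name):
    return model_name in _model_entrypoints


def model_entrypoint(model_name):
    return _model_entrypoints[model_name]


def is_model_pretrained(model_name):
    return model_name in _models_with_pretrained


def list_models(filter="", pretrained=False, exclude_filters=""):
    names = sorted(_model_entrypoints)
    if filter:
        names = fnmatch.filter(names, filter)  # wildcards, e.g. "resnet*"
    if exclude_filters:
        names = [n for n in names if not fnmatch.fnmatch(n, exclude_filters)]
    if pretrained:
        names = [n for n in names if is_model_pretrained(n)]
    return names


def create_model(model_name, num_classes=1000, pretrained=False, in_channels=3, **kwargs):
    if not is_model(model_name):
        raise RuntimeError(f"Unknown model: {model_name}")
    create_fn = model_entrypoint(model_name)
    return create_fn(pretrained=pretrained, num_classes=num_classes,
                     in_channels=in_channels, **kwargs)


@register_model
def resnet18(pretrained=False, num_classes=1000, in_channels=3, **kwargs):
    # Stand-in for a real network: just return its configuration.
    return {"arch": "resnet18", "num_classes": num_classes, "in_channels": in_channels}


@register_model
def resnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs):
    return {"arch": "resnet50", "num_classes": num_classes, "in_channels": in_channels}


_models_with_pretrained.add("resnet18")

print(list_models("resnet*"), create_model("resnet50", num_classes=10))
```

In this sketch create_model only looks up the factory by name and forwards the shared arguments, which is why every registered factory accepts the same keyword parameters.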

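mindcv.models.BiTresnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 50-layer ResNet model. Refer to the base class models.BiT for more details.

Parameters

pretrained (bool) – whether to load the pretrained weights. Default: False.
num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.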

class mindcv.models.ConViT(in_channels=3, num_classes=1000, image_size=224, patch_size=16, embed_dim=48, num_heads=12, drop_rate=0.0, drop_path_rate=0.1, depth=12, mlp_ratio=4.0, qkv_bias=False, attn_drop_rate=0.0, local_up_to_layer=10, use_pos_embed=True, locality_strength=1.0)

ConViT model class, based on “Improving Vision Transformers with Soft Convolutional Inductive Biases” (https://arxiv.org/pdf/2103.10697.pdf).

Parameters

in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.
image_size (int) – input image size. Default: 224.
patch_size (int) – image patch size. Default: 16.
embed_dim (int) – embedding dimension of all heads. Default: 48.
num_heads (int) – number of attention heads. Default: 12.
drop_rate (float) – dropout rate. Default: 0.
drop_path_rate (float) – drop path rate. Default: 0.1.
depth (int) – number of transformer blocks. Default: 12.
mlp_ratio (float) – ratio of the MLP hidden dimension to the embedding dimension. Default: 4.
qkv_bias (bool) – whether to add a bias to the qkv projections. Default: False.
attn_drop_rate (float) – dropout rate of the attention layers. Default: 0.
local_up_to_layer (int) – number of leading blocks that use GPSA (gated positional self-attention); the remaining blocks use standard self-attention. Default: 10.
use_pos_embed (bool) – whether to use position embeddings. Default: True.
locality_strength (float) – strength of the locality initialization, which determines how focused each head is around its attention center. Default: 1.
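To make locality_strength concrete, the sketch below assumes the locality initialization described in the ConViT paper: a head with attention center Δ scores each relative patch offset δ as -α·‖δ - Δ‖², where α plays the role of locality_strength, and the scores are normalized with a softmax. It is a plain-Python illustration of that formula, not mindcv's implementation; the function name positional_attention is hypothetical.

```python
import math


def positional_attention(center, alpha, grid=5):
    """Attention weights of one head over a grid x grid neighbourhood of
    relative offsets, scored as -alpha * ||delta - center||^2 and
    normalized with a softmax."""
    half = grid // 2
    offsets = [(dy, dx) for dy in range(-half, half + 1)
               for dx in range(-half, half + 1)]
    scores = [-alpha * ((dy - center[0]) ** 2 + (dx - center[1]) ** 2)
              for dy, dx in offsets]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {off: e / total for off, e in zip(offsets, exps)}


weak = positional_attention(center=(1, 0), alpha=0.5)
strong = positional_attention(center=(1, 0), alpha=2.0)
# Both heads peak at their attention center; the larger alpha is more focused.
print(max(strong, key=strong.get), round(weak[(1, 0)], 3), round(strong[(1, 0)], 3))
```

Raising alpha only sharpens the distribution around the same center, which matches the description of locality_strength as controlling how focused each head is.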

class mindcv.models.ConvNeXt(in_channels, num_classes, depths, dims, drop_path_rate=0.0, layer_scale_init_value=1e-06, head_init_scale=1.0)

ConvNeXt model class, based on “A ConvNet for the 2020s” (https://arxiv.org/abs/2201.03545).

Parameters

in_channels (int) – number of input channels.
num_classes (int) – number of classification classes.
depths (List[int]) – number of blocks in each stage.
dims (List[int]) – number of channels in each stage.
drop_path_rate (float) – stochastic depth (drop path) rate. Default: 0.
layer_scale_init_value (float) – initial value of the layer scale parameters in each block. Default: 1e-6.
head_init_scale (float) – scaling factor applied to the initial weights of the classifier head. Default: 1.

class mindcv.models.DenseNet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, bn_size=4, drop_rate=0.0, in_channels=3, num_classes=1000)

DenseNet-BC model class, based on “Densely Connected Convolutional Networks”.

Parameters

growth_rate (int) – number of filters added by each layer (k in the paper). Default: 32.
block_config (Tuple[int, int, int, int]) – number of layers in each dense block. Default: (6, 12, 24, 16).
num_init_features (int) – number of filters in the first Conv2d. Default: 64.
bn_size (int) – multiplicative factor for the number of bottleneck features (i.e. bn_size * k features in the bottleneck layer). Default: 4.
drop_rate (float) – dropout rate after each dense layer. Default: 0.
in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.
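How growth_rate, block_config and num_init_features determine the width of the network can be worked out by hand. The sketch below assumes that each transition layer between dense blocks halves the channel count (compression 0.5, as in DenseNet-BC in the paper); densenet_channels is a hypothetical helper, not a mindcv function.

```python
def densenet_channels(growth_rate=32, block_config=(6, 12, 24, 16),
                      num_init_features=64, compression=0.5):
    """Channel count at the output of each dense block.

    Each layer in a dense block adds `growth_rate` channels; every transition
    layer between blocks multiplies the count by `compression`."""
    channels = num_init_features
    per_block = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate
        per_block.append(channels)
        if i != len(block_config) - 1:  # no transition after the last block
            channels = int(channels * compression)
    return per_block


print(densenet_channels())                               # default (DenseNet-121 layout)
print(densenet_channels(block_config=(6, 12, 32, 32)))   # DenseNet-169 layout
```

The last value is the number of features fed to the classifier, so changing block_config changes the classifier input size as well as the depth.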

class mindcv.models.DPN(num_init_channel=64, k_r=96, g=32, k_sec=(3, 4, 20, 3), inc_sec=(16, 32, 24, 128), in_channels=3, num_classes=1000)

DPN model class, based on “Dual Path Networks”.

Parameters

num_init_channel (int) – number of output channels of the first block. Default: 64.
k_r (int) – the first channel of each stage. Default: 96.
g (int) – number of groups in the grouped convolutions. Default: 32.
k_sec (Tuple[int, int, int, int]) – number of blocks in each stage. Default: (3, 4, 20, 3).
inc_sec (Tuple[int, int, int, int]) – channel increment of the densely connected path per block in each stage. Default: (16, 32, 24, 128).
in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.EdgeNeXt(in_chans=3, num_classes=1000, depths=[3, 3, 9, 3], dims=[24, 48, 88, 168], global_block=[0, 0, 0, 3], global_block_type=['None', 'None', 'None', 'SDTA'], drop_path_rate=0.0, layer_scale_init_value=1e-06, head_init_scale=1.0, expan_ratio=4, kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False], use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], **kwargs)

EdgeNeXt model class, based on “EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications”.

Parameters

in_chans – number of input channels. Default: 3.
num_classes – number of classification classes. Default: 1000.
depths – number of blocks in each stage. Default: [3, 3, 9, 3].
dims – number of channels in each stage. Default: [24, 48, 88, 168].
global_block – number of global blocks in each stage. Default: [0, 0, 0, 3].
global_block_type – type of the global blocks in each stage. Default: [‘None’, ‘None’, ‘None’, ‘SDTA’].
drop_path_rate – stochastic depth rate. Default: 0.
layer_scale_init_value – initial value of layer scale. Default: 1e-6.
head_init_scale – scale of the classifier head initialization. Default: 1.
expan_ratio – expansion ratio of the MLP in each block. Default: 4.
kernel_sizes – convolution kernel sizes of the different stages. Default: [7, 7, 7, 7].
heads – number of attention heads in each stage. Default: [8, 8, 8, 8].
use_pos_embd_xca – whether to use position embeddings in the XCA (cross-covariance attention) of each stage. Default: [False, False, False, False].
use_pos_embd_global – whether to use a global position embedding. Default: False.
d2_scales – number of scales the channels are split into in each stage. Default: [2, 3, 4, 5].

class mindcv.models.EfficientNet(arch, dropout_rate, width_mult=1.0, depth_mult=1.0, in_channels=3, num_classes=1000, inverted_residual_setting=None, keep_prob=0.2, norm_layer=None)

EfficientNet model class, based on “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”.

Parameters

arch (str) – The name of the model.
dropout_rate (float) – The dropout rate of EfficientNet.
width_mult (float) – Width multiplier applied to the number of channels. Default: 1.0.
depth_mult (float) – Depth multiplier applied to the number of layers. Default: 1.0.
in_channels (int) – The number of input channels. Default: 3.
num_classes (int) – The number of classes. Default: 1000.
inverted_residual_setting (Sequence[Union[MBConvConfig, FusedMBConvConfig]], optional) – The settings of the blocks. Default: None.
keep_prob (float) – The dropout rate of MBConv. Default: 0.2.
norm_layer (nn.Cell, optional) – The normalization layer. Default: None.

Inputs:

x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, \text{num\_classes})\).
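The depth_mult argument scales how many blocks each stage repeats. A minimal sketch, assuming the per-stage repeats of the EfficientNet-B0 baseline from Table 1 of the paper and the round-up (ceil) rule used by common reference implementations; width_mult is typically applied the same way to channel counts, which are then rounded to a hardware-friendly multiple. B0_REPEATS, scale_repeats and total_blocks are hypothetical names, not mindcv functions.

```python
import math

# Per-stage block repeats of the EfficientNet-B0 baseline (Table 1 of the paper).
B0_REPEATS = [1, 2, 2, 3, 3, 4, 1]


def scale_repeats(repeats, depth_mult):
    """Depth scaling: multiply each stage's block count and round up."""
    return [int(math.ceil(n * depth_mult)) for n in repeats]


def total_blocks(depth_mult):
    return sum(scale_repeats(B0_REPEATS, depth_mult))


print(scale_repeats(B0_REPEATS, 1.1), total_blocks(1.0), total_blocks(1.1))
```

Rounding up per stage means even a small depth_mult such as 1.1 adds a block to every stage.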

class mindcv.models.GhostNet(cfgs, num_classes=1000, in_channels=3, width=1.0, dropout=0.2)

GhostNet model class, based on “GhostNet: More Features from Cheap Operations”.

Parameters

cfgs – the block configurations of the GhostNet.
num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
width (float) – width multiplier of the channels in blocks. Default: 1.0.
dropout (float) – dropout rate of the features before the classifier. Default: 0.2.

class mindcv.models.GoogLeNet(num_classes=1000, aux_logits=False, in_channels=3, drop_rate=0.2, drop_rate_aux=0.7)

GoogLeNet (Inception v1) model architecture from “Going Deeper with Convolutions”.

Parameters

num_classes (int) – number of classification classes. Default: 1000.
aux_logits (bool) – use auxiliary classifier or not. Default: False.
in_channels (int) – number of input channels. Default: 3.
drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.
drop_rate_aux (float) – dropout rate of the layer before auxiliary classifier. Default: 0.7.

class mindcv.models.InceptionV3(num_classes=1000, aux_logits=True, in_channels=3, drop_rate=0.2)

Inception v3 model architecture from “Rethinking the Inception Architecture for Computer Vision”.

Note

Important: In contrast to the other models, inception_v3 expects tensors with a size of N x 3 x 299 x 299, so ensure your images are sized accordingly.

Parameters

num_classes (int) – number of classification classes. Default: 1000.
aux_logits (bool) – use auxiliary classifier or not. Default: True.
in_channels (int) – number of input channels. Default: 3.
drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.

class mindcv.models.InceptionV4(num_classes=1000, in_channels=3, drop_rate=0.2)

Inception v4 model architecture from “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning”.

Parameters

num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.

class mindcv.models.Mnasnet(alpha, in_channels=3, num_classes=1000, drop_rate=0.2)

MnasNet model architecture from “MnasNet: Platform-Aware Neural Architecture Search for Mobile”.

Parameters

alpha (float) – scale factor of model width.
in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.
drop_rate (float) – dropout rate of the layer before main classifier. Default: 0.2.

class mindcv.models.MobileNetV1(alpha=1.0, in_channels=3, num_classes=1000)

MobileNetV1 model class, based on “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”

Parameters

alpha (float) – scale factor of model width. Default: 1.
in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.MobileNetV2(alpha=1.0, round_nearest=8, in_channels=3, num_classes=1000)

MobileNetV2 model class, based on “MobileNetV2: Inverted Residuals and Linear Bottlenecks”

Parameters

alpha (float) – scale factor of model width. Default: 1.
round_nearest (int) – divisor used by the make-divisible function that rounds the scaled channel counts. Default: 8.
in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.
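round_nearest is the divisor passed to the make-divisible helper that rounds every alpha-scaled channel count. The sketch below follows the widely used reference version of that helper (round to the nearest multiple, never drop more than 10% below the scaled value); treating mindcv's helper as identical is an assumption. The base widths are the MobileNetV2 stem and bottleneck output channels from the paper; the names MOBILENET_V2_BASE and scaled_channels are hypothetical.

```python
# Stem and bottleneck output channels of MobileNetV2 (Table 2 of the paper).
MOBILENET_V2_BASE = [32, 16, 24, 32, 64, 96, 160, 320]


def make_divisible(value, divisor=8, min_value=None):
    """Round `value` to the nearest multiple of `divisor`, never going below
    `min_value` (defaults to `divisor`) or more than 10% below `value`."""
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if new_value < 0.9 * value:  # rounding down lost too much: go up one step
        new_value += divisor
    return new_value


def scaled_channels(base_channels, alpha=1.0, round_nearest=8):
    return [make_divisible(c * alpha, round_nearest) for c in base_channels]


print(scaled_channels(MOBILENET_V2_BASE, alpha=0.5))
print(make_divisible(32 * 0.35, 8))  # the 10% guard bumps 8 up to 16
```

Keeping every width a multiple of round_nearest tends to map better onto hardware, at the cost of widths that are only approximately alpha times the baseline.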

class mindcv.models.MobileNetV3(arch, alpha=1.0, round_nearest=8, in_channels=3, num_classes=1000)

MobileNetV3 model class, based on “Searching for MobileNetV3”

Parameters

arch (str) – size of the architecture. ‘small’ or ‘large’.
alpha (float) – scale factor of model width. Default: 1.
round_nearest (int) – divisor used by the make-divisible function that rounds the scaled channel counts. Default: 8.
in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.NASNetAMobile(in_channels=3, num_classes=1000, stem_filters=32, penultimate_filters=1056, filters_multiplier=2)

NASNet model class, based on “Learning Transferable Architectures for Scalable Image Recognition”.

Parameters

in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.
stem_filters (int) – number of stem filters. Default: 32.
penultimate_filters (int) – number of penultimate filters. Default: 1056.
filters_multiplier (int) – filter multiplier. Default: 2.

class mindcv.models.Pnasnet(in_channels=3, num_classes=1000)

PNASNet model class, based on “Progressive Neural Architecture Search”.

Parameters

in_channels (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.

class mindcv.models.PoolFormer(layers, embed_dims=(64, 128, 320, 512), mlp_ratios=(4, 4, 4, 4), downsamples=(True, True, True, True), pool_size=3, in_chans=3, num_classes=1000, global_pool='avg', norm_layer=<class 'mindspore.nn.layer.normalization.GroupNorm'>, act_layer=<class 'mindspore.nn.layer.activation.GELU'>, in_patch_size=7, in_stride=4, in_pad=2, down_patch_size=3, down_stride=2, down_pad=1, drop_rate=0.0, drop_path_rate=0.0, layer_scale_init_value=1e-05, fork_feat=False)

PoolFormer model class, based on “MetaFormer Is Actually What You Need for Vision”

Parameters

layers – number of blocks in each of the 4 stages.
embed_dims – embedding dimensions of the 4 stages. Default: (64, 128, 320, 512).
mlp_ratios – MLP ratios of the 4 stages. Default: (4, 4, 4, 4).
downsamples – flags for whether to apply downsampling at each stage. Default: (True, True, True, True).
pool_size – pooling size of the token mixer. Default: 3.
in_chans – number of input channels. Default: 3.
num_classes – number of classes for image classification. Default: 1000.
global_pool – type of global pooling layer. Default: ‘avg’.
norm_layer – type of normalization layer. Default: nn.GroupNorm.
act_layer – type of activation layer. Default: nn.GELU.
in_patch_size – patch size of the input patch embedding. Default: 7.
in_stride – stride of the input patch embedding. Default: 4.
in_pad – padding of the input patch embedding. Default: 2.
down_patch_size – patch size of the downsampling patch embeddings. Default: 3.
down_stride – stride of the downsampling patch embeddings. Default: 2.
down_pad – padding of the downsampling patch embeddings. Default: 1.
drop_rate – dropout rate of the layer before the main classifier. Default: 0.
drop_path_rate – stochastic depth rate. Default: 0.
layer_scale_init_value – initial value of LayerScale. Default: 1e-5.
fork_feat – whether to output the features of the 4 stages, for dense prediction. Default: False.
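PoolFormer's token mixer is pooling: each token is replaced by the average of its pool_size x pool_size neighbourhood minus the token itself, and the block adds the input back through its residual connection. A single-channel, plain-Python sketch, assuming stride-1 "same" pooling that excludes padded positions from the average, as in the reference PoolFormer code; pooling_token_mixer is a hypothetical name, not a mindcv function.

```python
def pooling_token_mixer(x, pool_size=3):
    """Average pooling (stride 1, same-size output, padded cells excluded
    from the average) minus the input, on a single-channel H x W grid."""
    h, w = len(x), len(x[0])
    r = pool_size // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [x[ii][jj]
                      for ii in range(max(0, i - r), min(h, i + r + 1))
                      for jj in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(window) / len(window) - x[i][j])
        out.append(row)
    return out


spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
for row in pooling_token_mixer(spike):
    print(row)
```

Subtracting the input means a spatially constant feature map produces a zero mixer output, so the mixer only passes on how each token differs from its neighbours.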

class mindcv.models.PyramidVisionTransformer(img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=<class 'mindspore.nn.layer.normalization.LayerNorm'>, depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1], num_stages=4)

Pyramid Vision Transformer model class, based on “Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions”

Parameters

img_size (int) – size of the input image. Default: 224.
patch_size (int) – size of a single image patch. Default: 4.
in_chans (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.
embed_dims (list) – embedding dimension of each stage's patch embedding. Default: [64, 128, 320, 512].
num_heads (list) – number of attention heads in each stage. Default: [1, 2, 5, 8].
mlp_ratios (list) – ratio of the MLP hidden dimension to the embedding dimension in each stage. Default: [8, 8, 4, 4].
qkv_bias (bool) – whether to use bias in the attention qkv projections. Default: True.
qk_scale (float) – scale multiplied with q·k in attention; if None, head_dim ** -0.5 is used. Default: None.
drop_rate (float) – dropout rate in each block. Default: 0.0.
attn_drop_rate (float) – dropout rate of the attention. Default: 0.0.
drop_path_rate (float) – drop path rate. Default: 0.0.
norm_layer (nn.Cell) – normalization layer used in the blocks. Default: nn.LayerNorm.
depths (list) – number of blocks in each stage. Default: [2, 2, 2, 2].
sr_ratios (list) – spatial-reduction ratio of the attention in each stage (used as the kernel size and stride of the reduction). Default: [8, 4, 2, 1].
num_stages (int) – number of stages. Default: 4.
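sr_ratios controls spatial-reduction attention (SRA): keys and values are computed on a feature map downsampled by sr_ratio in each spatial dimension, so the number of keys shrinks by a factor of sr_ratio². A rough sketch of the resulting sequence lengths, assuming the default 224 input, patch_size=4 for the first stage and a 2x downsampling at the start of every later stage (overall strides 4, 8, 16 and 32, as in the paper); pvt_attention_sizes is a hypothetical helper.

```python
def pvt_attention_sizes(img_size=224, patch_size=4, sr_ratios=(8, 4, 2, 1)):
    """Return (number of queries, number of keys/values) for each stage."""
    sizes = []
    side = img_size // patch_size          # token grid side in stage 1
    for i, sr in enumerate(sr_ratios):
        if i > 0:
            side //= 2                     # each later stage halves the grid
        num_queries = side * side
        num_keys = (side // sr) ** 2       # keys/values after spatial reduction
        sizes.append((num_queries, num_keys))
    return sizes


for stage, (q, k) in enumerate(pvt_attention_sizes(), start=1):
    print(f"stage {stage}: {q} queries attend to {k} keys")
```

With the default ratios every stage ends up attending to the same small 7 x 7 key grid, which is what keeps the high-resolution early stages affordable.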

class mindcv.models.PyramidVisionTransformerV2(img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dims=[64, 128, 256, 512], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=False, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=<class 'mindspore.nn.layer.normalization.LayerNorm'>, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], num_stages=4, linear=False)

Pyramid Vision Transformer V2 model class, based on “PVTv2: Improved Baselines with Pyramid Vision Transformer”

Parameters

img_size (int) – size of the input image. Default: 224.
patch_size (int) – size of a single image patch. Default: 16.
in_chans (int) – number of input channels. Default: 3.
num_classes (int) – number of classification classes. Default: 1000.
embed_dims (list) – embedding dimension of each stage's patch embedding. Default: [64, 128, 256, 512].
num_heads (list) – number of attention heads in each stage. Default: [1, 2, 4, 8].
mlp_ratios (list) – ratio of the MLP hidden dimension to the embedding dimension in each stage. Default: [4, 4, 4, 4].
qkv_bias (bool) – whether to use bias in the attention qkv projections. Default: False.
qk_scale (float) – scale multiplied with q·k in attention; if None, head_dim ** -0.5 is used. Default: None.
drop_rate (float) – dropout rate in each block. Default: 0.0.
attn_drop_rate (float) – dropout rate of the attention. Default: 0.0.
drop_path_rate (float) – drop path rate. Default: 0.0.
norm_layer (nn.Cell) – normalization layer used in the blocks. Default: nn.LayerNorm.
depths (list) – number of blocks in each stage. Default: [3, 4, 6, 3].
sr_ratios (list) – spatial-reduction ratio of the attention in each stage (used as the kernel size and stride of the reduction). Default: [8, 4, 2, 1].
num_stages (int) – number of stages. Default: 4.
linear (bool) – whether to use linear spatial-reduction attention (linear SRA). Default: False.

class mindcv.models.RepMLPNet(in_channels=3, num_class=1000, patch_size=(4, 4), num_blocks=(2, 2, 6, 2), channels=(192, 384, 768, 1536), hs=(64, 32, 16, 8), ws=(64, 32, 16, 8), sharesets_nums=(4, 8, 16, 32), reparam_conv_k=(3,), globalperceptron_reduce=4, use_checkpoint=False, deploy=False)

RepMLPNet model class, based on “RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality”

Parameters

in_channels – number of input channels. Default: 3.
num_class – number of classification classes. Default: 1000.
patch_size – size of a single image patch. Default: (4, 4).
num_blocks – number of blocks in each stage. Default: (2, 2, 6, 2).
channels – input channels (channels[stage_idx]) and output channels (channels[stage_idx + 1]) of each stage. Default: (192, 384, 768, 1536).
hs – height of the feature map in each stage. Default: (64, 32, 16, 8).
ws – width of the feature map in each stage. Default: (64, 32, 16, 8).
sharesets_nums – number of share sets in each stage. Default: (4, 8, 16, 32).
reparam_conv_k – convolution kernel sizes in the local perceptron. Default: (3,).
globalperceptron_reduce – channel reduction ratio of the intermediate convolution in the global perceptron (out_channels = in_channels / globalperceptron_reduce). Default: 4.
use_checkpoint – whether to use checkpointing. Default: False.
deploy – whether to use bias. Default: False.

class mindcv.models.RepVGG(num_blocks, num_classes=1000, in_channels=3, width_multiplier=None, override_group_map=None, deploy=False, use_se=False)

RepVGG model class, based on “RepVGG: Making VGG-style ConvNets Great Again”

Parameters

num_blocks (list) – number of RepVGGBlocks in each stage.
num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
width_multiplier (list) – width multipliers of the stages.
override_group_map (dict) – maps a layer index to the number of groups of its convolution; layers not in the map use ordinary convolutions.
deploy (bool) – whether to build the re-parameterized rbr_reparam blocks. Default: False.
use_se (bool) – use se_block or not. Default: False.
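deploy=True selects the re-parameterized block (rbr_reparam), in which the training-time 3x3, 1x1 and identity branches are merged into a single 3x3 convolution. Because convolution is linear, the 1x1 kernel and the identity can be folded into the centre tap of the 3x3 kernel. The single-channel, plain-Python sketch below checks this; it omits the batch-normalization folding that a full conversion also performs, and all names are hypothetical rather than mindcv's.

```python
def conv3x3(x, k):
    """Single-channel 3x3 cross-correlation, stride 1, zero padding 1."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


def multi_branch(x, k3, k1):
    """Training-time block (BN omitted): 3x3 conv + 1x1 conv + identity."""
    y = conv3x3(x, k3)
    return [[y[i][j] + k1 * x[i][j] + x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def fuse_kernels(k3, k1):
    """Fold the 1x1 kernel and the identity into the centre of the 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0
    return fused


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k3 = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
k1 = 2.0
print(multi_branch(x, k3, k1) == conv3x3(x, fuse_kernels(k3, k1)))
```

The fused block computes the same output with one convolution instead of three branches, which is why the deploy form is faster at inference time.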

class mindcv.models.Res2Net(block, layer_nums, version='res2net', num_classes=1000, in_channels=3, groups=1, base_width=26, scale=4, norm=None)

Res2Net model class, based on “Res2Net: A New Multi-scale Backbone Architecture”

Parameters

block (Type[Cell]) – block of resnet.
layer_nums (List[int]) – number of layers of each stage.
version (str) – variety of Res2Net, ‘res2net’ or ‘res2net_v1b’. Default: ‘res2net’.
num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
groups (int) – number of groups for group conv in blocks. Default: 1.
base_width (int) – base width of the hidden channels per group in blocks. Default: 26.
scale (int) – scale factor of Bottle2neck, i.e. the number of feature subsets each block splits its channels into. Default: 4.
norm (Optional[Cell]) – normalization layer in blocks. Default: None.

class mindcv.models.ResNet(block, layers, num_classes=1000, in_channels=3, groups=1, base_width=64, norm=None)

ResNet model class, based on “Deep Residual Learning for Image Recognition”

Parameters

block (Type[Union[BasicBlock, Bottleneck]]) – block of resnet.
layers (List[int]) – number of layers of each stage.
num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
groups (int) – number of groups for group conv in blocks. Default: 1.
base_width (int) – base width of the hidden channels per group in blocks. Default: 64.
norm (Optional[Cell]) – normalization layer in blocks. Default: None.

class mindcv.models.ShuffleNetV1(num_classes=1000, in_channels=3, model_size='2.0x', group=3)

ShuffleNetV1 model class, based on “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”

Parameters

num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
model_size (str) – scale factor which controls the number of channels. Default: ‘2.0x’.
group (int) – number of groups for the group convolutions. Default: 3.

class mindcv.models.ShuffleNetV2(num_classes=1000, in_channels=3, model_size='1.5x')

ShuffleNetV2 model class, based on “ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design”

Parameters

num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
model_size (str) – scale factor which controls the number of channels. Default: ‘1.5x’.

class mindcv.models.SKNet(block, layers, num_classes=1000, in_channels=3, groups=1, base_width=64, norm=None, sk_kwargs=None)

SKNet model class, based on “Selective Kernel Networks”

Parameters

block (Type[Cell]) – block of sknet.
layers (List[int]) – number of layers of each stage.
num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
groups (int) – number of groups for group conv in blocks. Default: 1.
base_width (int) – base width of the hidden channels per group in blocks. Default: 64.
norm (Optional[Cell]) – normalization layer in blocks. Default: None.
sk_kwargs (Optional[Dict]) – keyword arguments of the selective kernel units. Default: None.

class mindcv.models.SqueezeNet(version='1_0', num_classes=1000, drop_rate=0.5, in_channels=3)

SqueezeNet model class, based on “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size”

Note

Important: In contrast to the other models, SqueezeNet expects tensors with a size of N x 3 x 227 x 227, so ensure your images are sized accordingly.

Parameters

version (str) – version of the architecture, ‘1_0’ or ‘1_1’. Default: ‘1_0’.
num_classes (int) – number of classification classes. Default: 1000.
drop_rate (float) – dropout rate of the classifier. Default: 0.5.
in_channels (int) – number of input channels. Default: 3.

class mindcv.models.SwinTransformer(image_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dim=96, depths=None, num_heads=None, window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, norm_layer=<class 'mindspore.nn.layer.normalization.LayerNorm'>, ape=False, patch_norm=True)

SwinTransformer model class, based on “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”

Parameters

image_size (int | tuple(int)) – Input image size. Default: 224
patch_size (int | tuple(int)) – Patch size. Default: 4
in_chans (int) – Number of input image channels. Default: 3
num_classes (int) – Number of classes for classification head. Default: 1000
embed_dim (int) – Patch embedding dimension. Default: 96
depths (tuple(int)) – Number of blocks in each Swin Transformer stage. Default: None
num_heads (tuple(int)) – Number of attention heads in each stage. Default: None
window_size (int) – Window size. Default: 7
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim. Default: 4
qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float) – Override default qk scale of head_dim ** -0.5 if set. Default: None
drop_rate (float) – Dropout rate. Default: 0
attn_drop_rate (float) – Attention dropout rate. Default: 0
drop_path_rate (float) – Stochastic depth rate. Default: 0.1
norm_layer (nn.Cell) – Normalization layer. Default: nn.LayerNorm.
ape (bool) – If True, add absolute position embedding to the patch embedding. Default: False
patch_norm (bool) – If True, add normalization after patch embedding. Default: True
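With the defaults, patch embedding turns a 224 x 224 image into a 56 x 56 token grid, self-attention is computed inside non-overlapping window_size x window_size windows, and patch merging halves the grid between stages. The sketch below is a hypothetical helper, not a mindcv function; it assumes square inputs and four stages and counts the windows per stage.

```python
def swin_stage_layout(image_size=224, patch_size=4, window_size=7, num_stages=4):
    """Return (tokens per side, number of attention windows) for each stage.

    Assumes square inputs, non-overlapping patches and a 2x patch-merging
    downsample between consecutive stages."""
    side = image_size // patch_size           # token grid after patch embedding
    layout = []
    for _ in range(num_stages):
        if side % window_size:
            raise ValueError("token grid must be divisible by window_size")
        windows = (side // window_size) ** 2   # non-overlapping windows
        layout.append((side, windows))
        side //= 2                             # patch merging halves the grid
    return layout


for stage, (side, windows) in enumerate(swin_stage_layout(), start=1):
    print(f"stage {stage}: {side}x{side} tokens in {windows} windows")
```

Attention cost per window is fixed by window_size, so total cost grows linearly with the number of windows rather than quadratically with the number of tokens. In alternating blocks the paper shifts the window grid by window_size // 2 tokens so that information crosses window borders.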

class mindcv.models.VGG(model_name, batch_norm=False, num_classes=1000, in_channels=3, drop_rate=0.5)

VGGNet model class, based on “Very Deep Convolutional Networks for Large-Scale Image Recognition”

Parameters

model_name (str) – name of the architecture. ‘vgg11’, ‘vgg13’, ‘vgg16’ or ‘vgg19’.
batch_norm (bool) – use batch normalization or not. Default: False.
num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.
drop_rate (float) – dropout rate of the classifier. Default: 0.5.
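The name passed as model_name encodes the number of weight layers. A sketch using the standard configurations A, B, D and E from the paper (integers are the output channels of 3x3 conv layers, "M" is 2x2 max pooling); these tables are assumed to match mindcv's internal configuration and are reproduced only for illustration, and both helper functions are hypothetical.

```python
# Standard VGG configurations (A, B, D and E in the paper).
VGG_CFGS = {
    "vgg11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "vgg19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


def count_weight_layers(model_name):
    """Conv layers from the configuration plus the three fully connected layers."""
    num_convs = sum(1 for v in VGG_CFGS[model_name] if v != "M")
    return num_convs + 3


def classifier_in_features(model_name, image_size=224):
    """Flattened feature size entering the first fully connected layer."""
    cfg = VGG_CFGS[model_name]
    last_channels = [v for v in cfg if v != "M"][-1]
    side = image_size // 2 ** cfg.count("M")   # five 2x poolings: 224 -> 7
    return last_channels * side * side


for name in VGG_CFGS:
    print(name, count_weight_layers(name), classifier_in_features(name))
```

All four variants share the same five pooling stages and classifier; they differ only in how many conv layers each stage stacks.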

class mindcv.models.ViT(image_size=224, input_channels=3, patch_size=16, embed_dim=768, num_layers=12, num_heads=12, mlp_dim=3072, keep_prob=1.0, attention_keep_prob=1.0, drop_path_keep_prob=1.0, activation=<class 'mindspore.nn.layer.activation.GELU'>, norm=<class 'mindspore.nn.layer.normalization.LayerNorm'>, pool='cls')

Vision Transformer architecture implementation.

Parameters

image_size (int) – Input image size. Default: 224.
input_channels (int) – The number of input channels. Default: 3.
patch_size (int) – Patch size of image. Default: 16.
embed_dim (int) – The dimension of embedding. Default: 768.
num_layers (int) – The depth of transformer. Default: 12.
num_heads (int) – The number of attention heads. Default: 12.
mlp_dim (int) – The dimension of MLP hidden layer. Default: 3072.
keep_prob (float) – The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.
attention_keep_prob (float) – The keep rate for attention layer. Default: 1.0.
drop_path_keep_prob (float) – The keep rate for drop path. Default: 1.0.
activation (nn.Cell) – Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.
norm (nn.Cell, optional) – Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.
pool (str) – The method of pooling. Default: ‘cls’.

Inputs:

x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, 768)\)

Raises

ValueError – If split is not ‘train’, ‘test’ or ‘infer’.


Supported Platforms: GPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindcv.models import ViT
>>> net = ViT()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 768)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
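The patch-sequence view described above can be checked with simple arithmetic. A sketch, assuming non-overlapping patches and a class token prepended to the sequence (as implied by pool='cls'); vit_token_layout is a hypothetical helper, not part of mindcv.

```python
def vit_token_layout(image_size=224, patch_size=16, in_channels=3, use_cls_token=True):
    """Return (num_patches, values per flattened patch, sequence length)."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    patch_dim = in_channels * patch_size * patch_size    # one flattened patch
    seq_len = num_patches + (1 if use_cls_token else 0)  # class token prepended
    return num_patches, patch_dim, seq_len


print(vit_token_layout())                # defaults of mindcv.models.ViT
print(vit_token_layout(image_size=384))  # larger input, same patch size
```

With the defaults, each flattened 16 x 16 x 3 patch has 768 values, the same as the default embed_dim, while a larger input only lengthens the sequence.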

Citation:

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}

class mindcv.models.Xception(num_classes=1000, in_channels=3)

Xception model architecture from “Deep Learning with Depthwise Separable Convolutions”.

Parameters

num_classes (int) – number of classification classes. Default: 1000.
in_channels (int) – number of input channels. Default: 3.