deeplite-torch-zoo

The deeplite-torch-zoo package is a collection of popular CNN model architectures and benchmark datasets for the PyTorch framework. The models are grouped by dataset and by task type, such as classification, object detection, and segmentation. The primary aim of deeplite-torch-zoo is to bootstrap applications by starting from the most suitable pretrained models. In addition, the pretrained models from deeplite-torch-zoo can be used as a good starting point for optimizing model architectures with our Deeplite Neutrino™.

Installation

1. Install using pip

Use the following commands to install the package from our internal PyPI repository.

$ pip install --upgrade pip
$ pip install deeplite-torch-zoo

2. Install from source

$ git clone https://github.com/Deeplite/deeplite-torch-zoo.git
$ cd deeplite-torch-zoo
$ pip install .

3. Install in Dev mode

$ git clone https://github.com/Deeplite/deeplite-torch-zoo.git
$ cd deeplite-torch-zoo
$ pip install -e .
$ pip install -r requirements-test.txt

To test the installation, run the basic tests with the pytest command from the repository root folder.

Minimal Dependencies

  • torch>=1.4,<=1.8.1

  • opencv-python

  • scipy>=1.4.1

  • numpy==1.19.5

  • pycocotools==2.0.4

  • Cython==0.29.30

  • tqdm==4.46.0

  • albumentations

  • pretrainedmodels==0.7.4

  • torchfcn==1.9.7

  • tensorboardX==2.4.1

  • pyvww==0.1.1

  • timm==0.5.4

  • texttable==1.6.4

  • pytz

  • torchmetrics==0.8.0

  • mean_average_precision==2021.4.26.0

  • ptflops==0.6.2

How to Use

The deeplite-torch-zoo is a collection of benchmark computer vision datasets and pretrained models. The main API functions provided by the zoo are:

from deeplite_torch_zoo import get_data_splits_by_name  # create dataloaders
from deeplite_torch_zoo import get_model_by_name  # get a pretrained model for a task
from deeplite_torch_zoo import get_eval_function  # get an evaluation function for a given model and dataset
from deeplite_torch_zoo import create_model  # create a model with an arbitrary number of classes

Loading Datasets

The loaded datasets are returned as a dictionary of the following format: {'train': train_dataloader, 'test': test_dataloader}, where train_dataloader and test_dataloader are objects of type torch.utils.data.DataLoader.

Classification Datasets

# Example: DATASET_NAME = "cifar100", BATCH_SIZE = 128, MODEL_NAME = "resnet18"
data_splits = get_data_splits_by_name(
    data_root="./",
    dataset_name=DATASET_NAME,
    model_name=MODEL_NAME,
    batch_size=BATCH_SIZE
)
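Because each split is an ordinary torch.utils.data.DataLoader, it can be consumed with standard PyTorch code. The sketch below downloads nothing: make_fake_splits is a hypothetical stand-in that builds a dictionary of the same shape from random CIFAR-100-sized tensors, purely for illustration. With the zoo installed, you would iterate the data_splits returned by get_data_splits_by_name instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_fake_splits(batch_size=128):
    """Hypothetical stand-in for get_data_splits_by_name: random tensors shaped
    like CIFAR-100 (3x32x32 images, 100 classes), so nothing is downloaded."""
    train_set = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 100, (512,)))
    test_set = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 100, (256,)))
    return {
        "train": DataLoader(train_set, batch_size=batch_size, shuffle=True),
        "test": DataLoader(test_set, batch_size=batch_size, shuffle=False),
    }


data_splits = make_fake_splits()

# Each value is a regular DataLoader: iterate it to get (images, labels) batches.
for split_name, loader in data_splits.items():
    images, labels = next(iter(loader))
    print(split_name, len(loader), tuple(images.shape), tuple(labels.shape))
```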

Object Detection Datasets

The following sample code loads the PASCAL VOC dataset. The 'train' split contains a dataloader for the VOC2007 and/or VOC2012 train sets: if both datasets are provided, their train sets are concatenated; otherwise, the train set of the provided dataset is returned. The 'test' split contains a dataloader (always with batch_size=1) for the VOC2007 test set. You also need to provide the model name to instantiate the dataloaders.

# Example: DATASET_NAME = "voc", BATCH_SIZE = 32, MODEL_NAME = "yolo4s"
data_splits = get_data_splits_by_name(
    data_root=PATH_TO_VOCdevkit,
    dataset_name=DATASET_NAME,
    model_name=MODEL_NAME,
    batch_size=BATCH_SIZE,
)
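The train-split selection rule described above can be sketched with PyTorch's ConcatDataset. This is an illustration of the rule, not the zoo's internal code: build_voc_train_set is a hypothetical helper, and the datasets are tiny placeholders rather than real VOC data.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset


def build_voc_train_set(voc07_train=None, voc12_train=None):
    """Hypothetical helper: concatenate the VOC2007 and VOC2012 train sets when
    both are available, otherwise return the one that is available."""
    available = [ds for ds in (voc07_train, voc12_train) if ds is not None]
    if not available:
        raise ValueError("at least one VOC train set is required")
    if len(available) == 1:
        return available[0]
    return ConcatDataset(available)


# Tiny placeholder datasets; the real VOC splits hold thousands of images.
voc07_train = TensorDataset(torch.zeros(5, 1))
voc12_train = TensorDataset(torch.zeros(11, 1))
voc07_test = TensorDataset(torch.zeros(4, 1))

train_set = build_voc_train_set(voc07_train, voc12_train)
print(len(train_set))  # 16: both train sets concatenated

# The test split is always built from VOC2007 with batch_size=1.
test_loader = DataLoader(voc07_test, batch_size=1)
print(len(test_loader))  # 4 batches of one image each
```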

Note

As shown in the example, the dataloaders depend on the model name passed in the model_name argument. Different object detection models expect inputs and outputs in different formats, so the data splits are formatted according to the needs of the model.

Loading Models

Models are generally provided with weights pretrained on specific datasets: loading a model X pretrained on a dataset Y gives the appropriate weights for task Y. The get_model_by_name function can be used for this purpose. There is also an option to create a new model with an arbitrary number of categories for a downstream task and load the weights from another dataset for transfer learning (e.g., to load COCO weights to train a model on the VOC dataset); create_model should generally be used for that. Note that get_model_by_name always returns a fully trained model for the specified task, so it does not allow specifying a custom number of classes.

Classification Models

To get a pretrained classification model, use:

model = get_model_by_name(
    model_name=MODEL_NAME, # example: "resnet18"
    dataset_name=DATASET_NAME, # example: "cifar100"
    pretrained=True, # or False, if pretrained weights are not required
    progress=False, # or True, if a progressbar is required
    device="cpu", # or "cuda"
)

To create a new model with ImageNet weights and a custom number of classes, use:

model = create_model(
    model_name=MODEL_NAME, # example: "resnet18"
    pretraining_dataset=PRETRAIN_DATASET, # example: "imagenet"
    num_classes=NUM_CLASSES, # example: 42
    pretrained=True, # or False, if pretrained weights are not required
    progress=False, # or True, if a progressbar is required
    device="cpu", # or "cuda"
)

This method loads the ImageNet-pretrained weights into every module of the model whose weight tensor shapes match (i.e., all layers except the last fully-connected one in the example above).
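The shape-matching rule above can be illustrated with plain PyTorch. This is a sketch of the general transfer-learning technique, not the zoo's create_model implementation: load_matching_weights and tiny_net are hypothetical names, and the toy network stands in for a real backbone.

```python
import torch
import torch.nn as nn


def load_matching_weights(model, pretrained_state):
    """Copy every pretrained tensor whose name and shape match the target model;
    return the names of the target tensors that were left untouched."""
    own_state = model.state_dict()
    matched = {
        name: tensor
        for name, tensor in pretrained_state.items()
        if name in own_state and own_state[name].shape == tensor.shape
    }
    # strict=False tolerates the keys we deliberately skipped.
    model.load_state_dict(matched, strict=False)
    return sorted(set(own_state) - set(matched))


def tiny_net(num_classes):
    # Hypothetical stand-in: conv "backbone" followed by a linear classifier (index 4).
    return nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                         nn.Flatten(), nn.Linear(8, num_classes))


source = tiny_net(num_classes=1000)  # plays the role of an ImageNet-pretrained model
target = tiny_net(num_classes=42)    # new downstream task with 42 classes
skipped = load_matching_weights(target, source.state_dict())
print(skipped)  # ['4.bias', '4.weight']: only the classifier head is not copied
```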

Object Detection Models

To create an object detection model pretrained on a given dataset:

model = get_model_by_name(
    model_name=MODEL_NAME, # example: "yolo4s"
    dataset_name=DATASET_NAME, # example: "voc"
    pretrained=True, # or False, if pretrained weights are not required
    progress=False, # or True, if a progressbar is required
)

Likewise, to create an object detection model with an arbitrary number of classes:

model = create_model(
    model_name=MODEL_NAME, # example: "yolo4s"
    num_classes=NUM_CLASSES, # example: 8
    pretraining_dataset=PRETRAIN_DATASET, # example: "coco"
    pretrained=True, # or False, if pretrained weights are not required
    progress=False, # or True, if a progressbar is required
)

Evaluating models

To create an evaluation function for a given model and dataset, call get_eval_function with the model_name and dataset_name arguments:

eval_fn = get_eval_function(
    model_name=MODEL_NAME, # example: "resnet50"
    dataset_name=DATASET_NAME, # example: "imagenet"
)

The returned evaluation function is a Python callable that takes two arguments: a PyTorch model object and a PyTorch dataloader object (logically corresponding to the test split dataloader) and returns a dictionary with metric names as keys and their corresponding values.
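To make the call shape concrete, here is a minimal sketch of a callable with the same contract: it takes a model and a dataloader and returns a dictionary of metrics. It computes plain top-1 accuracy; classification_eval and the "acc" key are hypothetical, and the metric names returned by the zoo's own evaluation functions may differ.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def classification_eval(model, test_loader, device="cpu"):
    """Hypothetical eval function with the documented call shape:
    takes (model, dataloader) and returns {metric_name: value}."""
    model = model.to(device).eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            logits = model(images.to(device))
            correct += (logits.argmax(dim=1) == labels.to(device)).sum().item()
            total += labels.numel()
    return {"acc": correct / total}


# Toy model that always predicts class 0 (zero weights, bias favors class 0).
model = nn.Linear(4, 3)
with torch.no_grad():
    model.weight.zero_()
    model.bias.copy_(torch.tensor([1.0, 0.0, 0.0]))

labels = torch.tensor([0, 0, 1, 2])
loader = DataLoader(TensorDataset(torch.randn(4, 4), labels), batch_size=2)
print(classification_eval(model, loader))  # {'acc': 0.5}
```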

Please refer to the tables below for the performance metrics of the pretrained models available in the deeplite-torch-zoo. After downloading a model, please evaluate it with deeplite-profiler to verify the metric values. Note that execution time may differ from the reported numbers, since it depends on the target hardware and on the load on the system.

Available Models

There is a useful utility function, list_models, which can be imported as

from deeplite_torch_zoo import list_models

This utility lists the available pretrained models or datasets.

For instance, list_models("yolo5") lists the available pretrained models whose names contain yolo5. A shorter substring such as list_models("yo") also works, but it matches every model name containing that substring. Models can also be filtered by task type by passing the task type string as the task_type_filter argument (available task types: classification, object_detection, semantic_segmentation).

+------------------+------------------------------------+
| Available models |          Source datasets           |
+==================+====================================+
| yolo5_6l         | voc                                |
+------------------+------------------------------------+
| yolo5_6m         | coco, voc                          |
+------------------+------------------------------------+
| yolo5_6m_relu    | person_detection, voc              |
+------------------+------------------------------------+
| yolo5_6ma        | coco                               |
+------------------+------------------------------------+
| yolo5_6n         | coco, person_detection, voc, voc07 |
+------------------+------------------------------------+
| yolo5_6n_hswish  | coco                               |
+------------------+------------------------------------+
| yolo5_6n_relu    | coco, person_detection, voc        |
+------------------+------------------------------------+
| yolo5_6s         | coco, person_detection, voc, voc07 |
+------------------+------------------------------------+
| yolo5_6s_relu    | person_detection, voc              |
+------------------+------------------------------------+
| yolo5_6sa        | coco, person_detection             |
+------------------+------------------------------------+
| yolo5_6x         | voc                                |
+------------------+------------------------------------+
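The substring and task-type filtering described for list_models can be sketched in plain Python. This is not the zoo's implementation: REGISTRY and filter_models are hypothetical, a few entries are borrowed from the benchmark tables in this README, and the real function's output format may differ.

```python
# Hypothetical registry of (model_name, dataset_name, task_type) entries.
REGISTRY = [
    ("yolo5_6n", "coco", "object_detection"),
    ("yolo5_6n", "voc", "object_detection"),
    ("yolo5_6s", "person_detection", "object_detection"),
    ("resnet18", "cifar100", "classification"),
    ("unet_scse_resnet18", "voc", "semantic_segmentation"),
]


def filter_models(filter_string="", task_type_filter=None):
    """Group source datasets by model name, keeping entries whose model name
    contains filter_string and, optionally, whose task type matches."""
    result = {}
    for model_name, dataset_name, task_type in REGISTRY:
        if filter_string not in model_name:
            continue
        if task_type_filter is not None and task_type != task_type_filter:
            continue
        result.setdefault(model_name, []).append(dataset_name)
    return {name: sorted(datasets) for name, datasets in sorted(result.items())}


print(filter_models("yolo5"))
# {'yolo5_6n': ['coco', 'voc'], 'yolo5_6s': ['person_detection']}
print(filter_models(task_type_filter="classification"))
# {'resnet18': ['cifar100']}
```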

Available Datasets

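+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| #  | Dataset (dataset_name)  | Training Instances   | Test Instances | Resolution  | Comments                               |
+====+=========================+======================+================+=============+========================================+
| 1  | MNIST                   | 60,000               | 10,000         | 28x28       | Downloadable through torchvision API   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 2  | CIFAR100                | 50,000               | 10,000         | 32x32       | Downloadable through torchvision API   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 3  | VWW                     | 40,775               | 8,059          | 224x224     | Based on COCO dataset                  |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 4  | Tiny Imagenet           | 100,000              | 10,000         | 64x64       | Subset of Imagenet with 200 classes    |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 5  | Imagenet10              | 385,244              | 15,011         | 224x224     | Subset of Imagenet2012 with 10 classes |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 6  | Imagenet16              | 180,119              | 42,437         | 224x224     | Subset of Imagenet2012 with 16 classes |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 7  | Imagenet                | 1,282,168            | 50,000         | 224x224     | Imagenet2012                           |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 8  | VOC2007 (Detection)     | 5,011                | 4,952          | 500xH/Wx500 | 20 classes, 24,640 annotated objects   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 9  | VOC2012 (Detection)     | 11,530 (train/val)   | N/A            | 500xH/Wx500 | 20 classes, 27,450 annotated objects   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 10 | COCO2017 (Detection)    | 117,266, 5,000 (val) | 40,670         | 300x300     | 80 classes, 1.5M object instances      |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 11 | COCO Person (Detection) | 39,283 (train/val)   | 1,648          | 300x300     | 1 class                                |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+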

Contribute a Model/Dataset to the Zoo

Design

The deeplite-torch-zoo is organized as follows. It has two main directories: src and wrappers. The src directory contains all the source code required to define and load the models and datasets. The wrappers directory contains the entry-point API for loading the datasets and models. The API definitions in wrappers follow a specific structure, and any new model or dataset must respect this structure.

- src
    - classification
    - objectdetection
    - segmentation
- wrappers
    - datasets
        - classification
        - objectdetection
        - segmentation
    - models
        - classification
        - objectdetection
        - segmentation
    - eval

Contribute

Please perform the following steps to contribute a new model or dataset to the deeplite-torch-zoo:

  1. Add the source code under the directory src/task_type:
    1. if the code lives in an existing repository, add that repository as a git submodule;

    2. otherwise, add the source code of the data loaders, model definition, loss function, and eval function in a separate directory.

  2. Train the model and upload the trained model weights to a public storage container. Please contact us to add the trained model weights to Deeplite's common hosted Amazon S3 container.

  3. Add API calls in the wrappers directory:
    1. The entry-point method for loading a model must be named {model_name}_{dataset_name}_{num_classes}

    2. The entry-point method for dataloaders must be named get_{dataset_name}_for_{model_name}

    3. The eval function must accept two inputs: (i) a model and (ii) a data_loader

  4. Import the wrapper functions in the __init__ file of the same directory

  5. Add tests for the model in tests/real_tests/test_models.py, following the format used in that file

  6. Add a fake test for the model in tests/fake_tests/test_models.py
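The naming conventions in step 3 can be expressed as two string templates. The sketch below is illustrative only: the helper names are hypothetical, and missing_entry_points simply checks whether a module namespace (for example, the dict returned by vars(module)) defines the expected wrapper names.

```python
def model_entry_point_name(model_name, dataset_name, num_classes):
    # Step 3.1 convention: {model_name}_{dataset_name}_{num_classes}
    return f"{model_name}_{dataset_name}_{num_classes}"


def dataloader_entry_point_name(model_name, dataset_name):
    # Step 3.2 convention: get_{dataset_name}_for_{model_name}
    return f"get_{dataset_name}_for_{model_name}"


def missing_entry_points(module_namespace, model_name, dataset_name, num_classes):
    """Return the expected wrapper names that the namespace does not define."""
    expected = [
        model_entry_point_name(model_name, dataset_name, num_classes),
        dataloader_entry_point_name(model_name, dataset_name),
    ]
    return [name for name in expected if name not in module_namespace]


print(model_entry_point_name("resnet18", "cifar100", 100))  # resnet18_cifar100_100
print(dataloader_entry_point_name("resnet18", "cifar100"))  # get_cifar100_for_resnet18
print(missing_entry_points({"resnet18_cifar100_100": None}, "resnet18", "cifar100", 100))
# ['get_cifar100_for_resnet18']
```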

Benchmark Results

Models on VOC Object Detection Dataset

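+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mean Average Precision | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+========================+===========+=================+====================+=======================+====================+
| 1  | vgg16_ssd                 | 0.7733                 | 100.2731  | 31.4368         | 26.2860            | 309.7318              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | mb1_ssd                   | 0.6718                 | 36.1214   | 1.5547          | 9.4690             | 143.1124              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | resnet18_ssd              | 0.728                  | 32.489    | 6.2125          | 8.516              | 122.866               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | resnet34_ssd              | 0.761                  | 54.044    | 14.306          | 14.16              | 194.167               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | resnet50_ssd              | 0.766                  | 58.853    | 16.2557         | 15.428             | 443.1532              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | mb2_ssd_lite              | 0.687                  | 12.9      | 0.699           | 3.38               | 149.7                 | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | yolo-v3                   | 0.8291                 | 235.0847  | 38.0740         | 61.6260            | 999.7075              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 8  | yolo-v4s                  | 0.849                  | 34.9      | 5.1             | 9.1                | 355.72                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 9  | yolo-v4m                  | 0.874                  | 93.2      | 13              | 24.4               | 606.41                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 10 | yolo-v4l                  | 0.872                  | 200.65    | 29.30           | 52.60              | 1006.24               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 11 | yolo-v4l-leaky            | 0.891                  | 200.65    | 29.35           | 52.60              | 1006.24               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 12 | yolo-v4x                  | 0.882                  | 368       | 55.32           | 96                 | 1528                  | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 13 | yolo-v5l                  | 0.875                  | 176.39    | 26.52           | 46.24              | 806.64                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 14 | yolo-v5m                  | 0.902                  | 79.91     | 11.82           | 20.94              | 471.96                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 15 | yolo-v5m-relu             | 0.856                  | 79.91     | 11.85           | 20.94              | 471.9                 | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 16 | yolo-v5n                  | 0.762                  | 6.832     | 1.043           | 1.790              | 115.40                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 17 | yolo-v5s                  | 0.871                  | 26.98     | 3.92            | 7.073              | 235.95                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 18 | yolo-v5s_relu             | 0.819                  | 26.98     | 3.93            | 7.073              | 235.95                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 19 | yolo-v5x                  | 0.884                  | 329.4     | 50.13           | 86.34              | 1252.96               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+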

Models on COCO Object Detection Dataset

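+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mean Average Precision | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+========================+===========+=================+====================+=======================+====================+
| 1  | yolo4m                    | 0.309                  | 94.133    | 11.44           | 24.67              | 548.83                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | yolo4s                    | 0.288                  | 35.58     | 4.50            | 9.32               | 324.34                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | yolo5_6m                  | 0.374                  | 80.83     | 10.36           | 21.19              | 431.063               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | yolo5_6n                  | 0.211                  | 7.14      | 0.954           | 1.87               | 112.94                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | yolo5_6n_hswish           | 0.183                  | 7.14      | 0.954           | 1.872              | 112.94                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | yolo5_6n_relu             | 0.167                  | 34.9      | 5.1             | 9.1                | 320.7                 | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | yolo5_6s                  | 0.301                  | 27.60     | 3.49            | 7.235              | 219.96                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+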

Models on COCO Person Detection Dataset (only person class from the 80-class COCO dataset)

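+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mean Average Precision | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+========================+===========+=================+====================+=======================+====================+
| 1  | yolo5_6m_relu             | 0.709                  | 79.61     | 6.015           | 20.87              | 277.36                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | yolo5_6n                  | 0.6718                 | 6.73      | 0.522011        | 1.765              | 59.84                 | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | yolo5_6n_relu             | 0.621                  | 6.73      | 0.5249          | 1.765              | 59.847                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | yolo5_6s                  | 0.738                  | 26.788    | 1.981           | 7.022              | 131.122               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | yolo5_6s_relu             | 0.682                  | 26.788    | 1.987           | 7.022              | 131.122               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | yolo5_6sa                 | 0.659                  | 47        | 2.026           | 12.32              | 153.033               | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+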

Models on VOC2007 Dataset (VOC2007 train split taken as training data and VOC2007 val split used for testing)

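+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mean Average Precision | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+========================+===========+=================+====================+=======================+====================+
| 1  | yolo5_6n                  | 0.620                  | 6.83      | 1.043           | 1.79               | 115.40                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | yolo5_6s                  | 0.687                  | 26.98     | 3.92            | 7.07               | 235.95                | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+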

Models on VOC Segmentation Dataset

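+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mean Inter. over Union | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+========================+===========+=================+====================+=======================+====================+
| 1  | unet_scse_resnet18        | 0.582                  | 83.3697   | 20.8930         | 21.8549            | 575.0954              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | unet_scse_resnet18_1cls   | 0.673                  | 83.3647   | 20.5522         | 21.8536            | 535.0954              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | unet_scse_resnet18_2cls   | 0.679                  | 83.3652   | 20.5862         | 21.8537            | 539.0954              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | fcn32                     | 0.713                  | 519.382   | 136.142         | 136.152            | 858.2010              | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | deeplab_mobilenet         | 0.571                  | 29.0976   | 26.4870         | 5.8161             | 1134.6057             | download           |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+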

Models on MNIST dataset

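+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Millions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | lenet5                    | 99.1199  | 0.1695    | 0.2930          | 0.0444             | 0.1904                | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | mlp2                      | 97.8046  | 0.4512    | 0.1211          | 0.1183             | 0.4572                | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | mlp4                      | 97.8145  | 0.5772    | 0.1549          | 0.1513             | 0.5861                | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | mlp8                      | 96.6970  | 0.8291    | 0.2226          | 0.2174             | 0.8439                | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+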

Models on CIFAR100 dataset

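+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 76.8295  | 42.8014   | 0.5567          | 11.2201            | 48.4389               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet50                  | 78.0657  | 90.4284   | 1.3049          | 23.7053            | 123.5033              | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | vgg19                     | 72.3794  | 76.6246   | 0.3995          | 20.0867            | 80.2270               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | densenet121               | 78.4612  | 26.8881   | 0.8982          | 7.0485             | 66.1506               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | googlenet                 | 79.3513  | 23.8743   | 1.5341          | 6.2585             | 64.5977               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | mobilenet_v1              | 66.8414  | 12.6246   | 0.0473          | 3.3095             | 16.6215               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | mobilenet_v2              | 73.0815  | 9.2019    | 0.0947          | 2.4122             | 22.8999               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 8  | pre_act_resnet18          | 76.5229  | 42.7907   | 0.5566          | 11.2173            | 48.1781               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 9  | resnext29_2x64d           | 79.9150  | 35.1754   | 1.4167          | 9.2210             | 67.6879               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 10 | shufflenet_v2_1_0         | 69.9169  | 5.1731    | 0.0462          | 1.356              | 12.3419               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+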

Models on VWW dataset

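+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 93.5496  | 42.6389   | 1.8217          | 11.1775            | 74.6057               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet50                  | 94.3675  | 89.6917   | 4.1199          | 23.5121            | 233.5413              | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | mobilenet_v1              | 92.4444  | 12.2415   | 0.5829          | 3.2090             | 70.5286               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | mobilenet_v3_small        | 89.1180  | 5.7980    | 0.0599          | 1.5199             | 30.2576               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | mobilenet_v3_large        | 89.1800  | 16.0393   | 0.2286          | 4.2046             | 83.8590               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+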

Models on Imagenet10 dataset

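+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 93.8294  | 42.6546   | 1.8217          | 11.1816            | 74.6215               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | mobilenet_v2_0_35         | 81.0492  | 1.5600    | 0.0664          | 0.4089             | 34.9010               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+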

Models on Imagenet16 dataset

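+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 94.5115  | 42.6663   | 1.8217          | 11.1816            | 74.6332               | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet50                  | 96.8518  | 89.8011   | 4.1199          | 23.5408            | 233.6508              | download           |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+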

Models on Imagenet dataset (from torchvision)

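+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 69.7319  | 44.5919   | 1.8222          | 11.6895            | 76.5664               | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet34                  | 73.2880  | 83.1515   | 3.6756          | 21.7977            | 131.8740              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | resnet50                  | 76.1001  | 97.4923   | 4.1219          | 25.5570            | 241.3496              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | resnet101                 | 77.3489  | 169.9416  | 7.8495          | 44.549             | 385.3847              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | resnet152                 | 78.2836  | 229.6173  | 11.5807         | 60.1928            | 533.4902              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | inception_v3              | 69.5109  | 90.9217   | 2.8472          | 27.1613            | 149.3052              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | densenet121               | 74.4106  | 30.4369   | 2.8826          | 7.9789             | 187.7805              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 8  | densenet161               | 77.1120  | 109.4093  | 7.8184          | 28.681             | 393.9603              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 9  | densenet169               | 75.5635  | 53.9760   | 3.4184          | 14.149             | 238.9538              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 10 | densenet201               | 76.8702  | 76.3471   | 4.3670          | 20.0139            | 307.5974              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 11 | alexnet                   | 56.4758  | 233.0812  | 0.7156          | 61.1008            | 237.8486              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 12 | squeezenet1_0             | 58.0591  | 4.7624    | 0.8300          | 1.2484             | 51.2403               | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 13 | squeezenet1_1             | 58.1438  | 4.7130    | 0.3559          | 1.235              | 32.1729               | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 14 | vgg11                     | 68.9946  | 506.8334  | 7.6301          | 132.8633           | 570.0989              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 15 | vgg11_bn                  | 70.3433  | 506.8544  | 7.6449          | 132.8688           | 598.4480              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 16 | vgg13                     | 69.9017  | 507.5373  | 11.3391         | 133.0478           | 607.5527              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 17 | vgg13_bn                  | 71.5557  | 507.5597  | 11.3636         | 133.0537           | 654.2783              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 18 | vgg16                     | 71.5605  | 527.7921  | 15.5035         | 138.3575           | 637.7607              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 19 | vgg16_bn                  | 73.3352  | 527.8243  | 15.5306         | 138.3660           | 689.4726              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 20 | vgg19                     | 72.3449  | 548.0470  | 19.6679         | 143.6672           | 667.9687              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 21 | vgg19_bn                  | 74.1900  | 548.0890  | 19.6976         | 143.6782           | 724.6669              | torchvision        |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+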

Models on Imagenet dataset (timm/torchvision)

The zoo can load any ImageNet-pretrained model from the timm repository as well as any ImageNet model from torchvision. If a model name exists in both, the timm model is loaded.

  • Model Size: Memory consumed by the parameters (weights and biases) of the model

  • MACs: Total number of multiply-accumulate operations (MACs) per single image (batch_size=1)

  • #Parameters: Total number of parameters (trainable and non-trainable) in the model

  • Memory Footprint: Total memory consumed by the parameters (weights and biases) and activations (per layer) per single image (batch_size=1)
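The Model Size column appears to follow directly from the parameter count, assuming 4-byte float32 weights and 1 MB = 2^20 bytes: for resnet18 in the ImageNet table, 11.6895 M parameters × 4 bytes / 2^20 ≈ 44.59 MB, which matches the listed 44.5919. The sketch below reproduces that arithmetic together with the standard MAC count of a single 2D convolution. It is an illustration only, not the profiler used to produce the tables, and the helper names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def model_size_mb(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Parameter memory in MB (2**20 bytes), assuming float32 weights."""
    return num_params * bytes_per_param / 2**20


def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w, groups=1):
    """Multiply-accumulates of one 2D convolution for a single image (bias ignored)."""
    return (in_ch // groups) * out_ch * kernel * kernel * out_h * out_w


# resnet18 (ImageNet table): 11.6895 M parameters
print(round(model_size_mb(11.6895e6), 2))  # 44.59

# First ResNet-18 convolution: 3 -> 64 channels, 7x7 kernel, 112x112 output
print(conv2d_macs(3, 64, 7, 112, 112))  # 118013952
```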