deeplite-torch-zoo
The deeplite-torch-zoo
package is a collection of popular CNN model architectures and benchmark datasets for the PyTorch framework. The models are grouped by dataset and task type, such as classification, object detection, and segmentation. The primary aim of deeplite-torch-zoo
is to bootstrap applications by starting with the most suitable pretrained model. In addition, the pretrained models from deeplite-torch-zoo
can serve as a good starting point for optimizing model architectures using our Deeplite Neutrino™.
Installation
1. Install using pip
Use the following commands to install the package from our internal PyPI repository.
$ pip install --upgrade pip
$ pip install deeplite-torch-zoo
2. Install from source
$ git clone https://github.com/Deeplite/deeplite-torch-zoo.git
$ cd deeplite-torch-zoo
$ pip install .
3. Install in Dev mode
$ git clone https://github.com/Deeplite/deeplite-torch-zoo.git
$ cd deeplite-torch-zoo
$ pip install -e .
$ pip install -r requirements-test.txt
To test the installation, one can run the basic tests using the pytest command in the root folder.
Minimal Dependencies
torch>=1.4,<=1.8.1
opencv-python
scipy>=1.4.1
numpy==1.19.5
pycocotools==2.0.4
Cython==0.29.30
tqdm==4.46.0
albumentations
pretrainedmodels==0.7.4
torchfcn==1.9.7
tensorboardX==2.4.1
pyvww==0.1.1
timm==0.5.4
texttable==1.6.4
pytz
torchmetrics==0.8.0
mean_average_precision==2021.4.26.0
ptflops==0.6.2
How to Use
The deeplite-torch-zoo
is a collection of benchmark computer vision datasets and pretrained models. The main API functions provided in the zoo are:
from deeplite_torch_zoo import get_data_splits_by_name # create dataloaders
from deeplite_torch_zoo import get_model_by_name # get a pretrained model for a task
from deeplite_torch_zoo import get_eval_function # get an evaluation function for a given model and dataset
from deeplite_torch_zoo import create_model # create a model with an arbitrary number of classes
Loading Datasets
The loaded datasets are available as a dictionary of the following format: {'train': train_dataloader, 'test': test_dataloader}
. Both train_dataloader and test_dataloader are objects of type torch.utils.data.DataLoader
.
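As an illustration of this format, here is a minimal stand-in (plain Python lists of batches in place of real DataLoader objects, which iterate the same way):

```python
# Hypothetical stand-in for the dict returned by get_data_splits_by_name.
# Real values are torch.utils.data.DataLoader objects; plain lists of
# (inputs, labels) batches iterate the same way.
batch = ([[0.1, 0.2], [0.3, 0.4]], [1, 0])  # one batch: 2 samples, 2 labels
data_splits = {"train": [batch, batch], "test": [batch]}

for inputs, labels in data_splits["train"]:
    assert len(inputs) == len(labels)  # one label per sample in the batch

print(len(data_splits["train"]), len(data_splits["test"]))  # 2 1
```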
Classification Datasets
# Example: DATASET_NAME = "cifar100", BATCH_SIZE = 128, MODEL_NAME = "resnet18"
data_splits = get_data_splits_by_name(
    data_root="./",
    dataset_name=DATASET_NAME,
    model_name=MODEL_NAME,
    batch_size=BATCH_SIZE,
)
Object Detection Datasets
The following sample code loads the PASCAL VOC dataset. train
contains the data loader for the train sets of VOC2007 and/or VOC2012. If both datasets are provided, their train sets are concatenated; otherwise, the train set of the provided dataset is returned. test contains a dataloader (always with batch_size=1
) for the VOC2007 test set. You also need to provide the model name to instantiate the dataloaders.
# Example: DATASET_NAME = "voc", BATCH_SIZE = 32, MODEL_NAME = "yolo4s"
data_splits = get_data_splits_by_name(
    data_root=PATH_TO_VOCdevkit,
    dataset_name=DATASET_NAME,
    model_name=MODEL_NAME,
    batch_size=BATCH_SIZE,
)
Note
As can be observed, the dataloaders are created based on the passed model name argument (model_name). Different object detection models consume inputs/outputs in different formats, and thus the data_splits are formatted according to the needs of the model.
Loading Models
Models are generally provided with weights pretrained on specific datasets. One would load a model X
pretrained on a dataset Y
to get the appropriate weights for the task Y
. The get_model_by_name
method can be used for this purpose. There is also an option to create a new model with an arbitrary number of categories for the downstream task and load the weights from another dataset for transfer learning (e.g. to load COCO
weights to train a model on the VOC
dataset). The create_model
method should generally be used for that. Note that get_model_by_name
always returns a fully trained model for the specified task; this method thus does not allow specifying a custom number of classes.
Classification Models
To get a pretrained classification model one could use
model = get_model_by_name(
    model_name=MODEL_NAME,      # example: "resnet18"
    dataset_name=DATASET_NAME,  # example: "cifar100"
    pretrained=True,            # or False, if pretrained weights are not required
    progress=False,             # or True, if a progressbar is required
    device="cpu",               # or "cuda"
)
To create a new model with ImageNet weights and a custom number of classes one could use
model = create_model(
    model_name=MODEL_NAME,                  # example: "resnet18"
    pretraining_dataset=PRETRAIN_DATASET,   # example: "imagenet"
    num_classes=NUM_CLASSES,                # example: 42
    pretrained=True,                        # or False, if pretrained weights are not required
    progress=False,                         # or True, if a progressbar is required
    device="cpu",                           # or "cuda"
)
This method would load the ImageNet-pretrained weights into all the modules of the model where the shapes of the weight tensors match (i.e. all the layers except the final fully-connected one in the above case).
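The shape-matching logic can be sketched with plain dictionaries of parameter shapes (the names and shapes below are illustrative, loosely following a ResNet18 with a 42-class head; this is not the zoo's actual implementation):

```python
def matching_keys(model_shapes, pretrained_shapes):
    """Return the parameter names whose shapes agree between the new model
    and the pretrained checkpoint; only these would receive pretrained weights."""
    return [name for name, shape in model_shapes.items()
            if pretrained_shapes.get(name) == shape]

# Illustrative shapes: the backbone layer matches, but the final fully-connected
# layer differs (42 custom classes vs. 1000 ImageNet classes) and is skipped.
model = {"conv1.weight": (64, 3, 7, 7), "fc.weight": (42, 512), "fc.bias": (42,)}
imagenet_ckpt = {"conv1.weight": (64, 3, 7, 7), "fc.weight": (1000, 512), "fc.bias": (1000,)}

print(matching_keys(model, imagenet_ckpt))  # ['conv1.weight']
```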
Object Detection Models
To create an object detection model pretrained on a given dataset:
model = get_model_by_name(
    model_name=MODEL_NAME,      # example: "yolo4s"
    dataset_name=DATASET_NAME,  # example: "voc"
    pretrained=True,            # or False, if pretrained weights are not required
    progress=False,             # or True, if a progressbar is required
)
Likewise, to create an object detection model with an arbitrary number of classes:
model = create_model(
    model_name=MODEL_NAME,                  # example: "yolo4s"
    num_classes=NUM_CLASSES,                # example: 8
    pretraining_dataset=PRETRAIN_DATASET,   # example: "coco"
    pretrained=True,                        # or False, if pretrained weights are not required
    progress=False,                         # or True, if a progressbar is required
)
Evaluating models
To create an evaluation function for a given model and dataset, one could call get_eval_function
passing the model_name
and dataset_name
arguments:
eval_fn = get_eval_function(
    model_name=MODEL_NAME,      # example: "resnet50"
    dataset_name=DATASET_NAME,  # example: "imagenet"
)
The returned evaluation function is a Python callable that takes two arguments: a PyTorch model object and a PyTorch dataloader object (logically corresponding to the test split dataloader) and returns a dictionary with metric names as keys and their corresponding values.
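This contract can be illustrated with a toy evaluation function of the same shape (the model and dataloader below are stand-ins, not zoo objects):

```python
def toy_eval_fn(model, dataloader):
    # Same signature as a zoo eval function: (model, test dataloader) ->
    # dict mapping metric names to values.
    correct = total = 0
    for inputs, labels in dataloader:
        preds = [model(x) for x in inputs]
        correct += sum(p == y for p, y in zip(preds, labels))
        total += len(labels)
    return {"acc": correct / total}

def sign_model(x):
    # Stand-in "model": predicts class 1 for positive inputs, else 0.
    return int(x > 0)

# Two batches of (inputs, labels); the second batch has one mistake.
dataloader = [([1.5, -0.3], [1, 0]), ([2.0, -1.0], [1, 1])]

print(toy_eval_fn(sign_model, dataloader))  # {'acc': 0.75}
```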
Please refer to the tables below for the performance metrics of the pretrained models available in the deeplite-torch-zoo
. After downloading a model, please evaluate it using deeplite-profiler to verify the metric values. Note, however, that the execution time may differ, as it depends on the target hardware and/or the load on the system.
Available Models
There is a useful utility function list_models
which can be imported as
from deeplite_torch_zoo import list_models
This utility lists the available pretrained models or datasets.
For instance, list_models("yolo5")
returns the list of available pretrained models whose names contain yolo5
. Similar results can be obtained using e.g. list_models("yo")
. Filtering models by task type is also possible by passing the task type string via the task_type_filter
argument (the following task types are available: classification
, object_detection
, semantic_segmentation
).
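The filtering behaviour can be sketched as substring matching over a model registry (the registry below is a small made-up subset, not the zoo's actual one):

```python
# Made-up subset of a (model_name, dataset_name) -> task type registry.
REGISTRY = {
    ("yolo5_6n", "coco"): "object_detection",
    ("yolo5_6s", "voc"): "object_detection",
    ("resnet18", "cifar100"): "classification",
    ("unet_scse_resnet18", "voc"): "semantic_segmentation",
}

def list_models_sketch(filter_string, task_type_filter=None):
    """Return model names containing filter_string, optionally restricted
    to a single task type."""
    return sorted({model for (model, _), task in REGISTRY.items()
                   if filter_string in model
                   and task_type_filter in (None, task)})

print(list_models_sketch("yolo5"))                       # ['yolo5_6n', 'yolo5_6s']
print(list_models_sketch("resnet18", "classification"))  # ['resnet18']
```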
+------------------+------------------------------------+
| Available models | Source datasets |
+==================+====================================+
| yolo5_6l | voc |
+------------------+------------------------------------+
| yolo5_6m | coco, voc |
+------------------+------------------------------------+
| yolo5_6m_relu | person_detection, voc |
+------------------+------------------------------------+
| yolo5_6ma | coco |
+------------------+------------------------------------+
| yolo5_6n | coco, person_detection, voc, voc07 |
+------------------+------------------------------------+
| yolo5_6n_hswish | coco |
+------------------+------------------------------------+
| yolo5_6n_relu | coco, person_detection, voc |
+------------------+------------------------------------+
| yolo5_6s | coco, person_detection, voc, voc07 |
+------------------+------------------------------------+
| yolo5_6s_relu | person_detection, voc |
+------------------+------------------------------------+
| yolo5_6sa | coco, person_detection |
+------------------+------------------------------------+
| yolo5_6x | voc |
+------------------+------------------------------------+
Available Datasets
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| #  | Dataset (dataset_name)  | Training Instances   | Test Instances | Resolution  | Comments                               |
+====+=========================+======================+================+=============+========================================+
| 1  | MNIST                   | 60,000               | 10,000         | 28x28       | Downloadable through torchvision API   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 2  | CIFAR100                | 50,000               | 10,000         | 32x32       | Downloadable through torchvision API   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 3  | VWW                     | 40,775               | 8,059          | 224x224     | Based on COCO dataset                  |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 4  | Tiny Imagenet           | 100,000              | 10,000         | 64x64       | Subset of Imagenet with 100 classes    |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 5  | Imagenet10              | 385,244              | 15,011         | 224x224     | Subset of Imagenet2012 with 10 classes |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 6  | Imagenet16              | 180,119              | 42,437         | 224x224     | Subset of Imagenet2012 with 16 classes |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 7  | Imagenet                | 1,282,168            | 50,000         | 224x224     | Imagenet2012                           |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 8  | VOC2007 (Detection)     | 5,011                | 4,952          | 500xH/Wx500 | 20 classes, 24,640 annotated objects   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 9  | VOC2012 (Detection)     | 11,530 (train/val)   | N/A            | 500xH/Wx500 | 20 classes, 27,450 annotated objects   |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 10 | COCO2017 (Detection)    | 117,266, 5,000 (val) | 40,670         | 300x300     | 80 classes, 1.5M object instances      |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
| 11 | COCO Person (Detection) | 39,283 (train/val)   | 1,648          | 300x300     | 1 class                                |
+----+-------------------------+----------------------+----------------+-------------+----------------------------------------+
Contribute a Model/Dataset to the Zoo
Design
The deeplite-torch-zoo
is organized as follows. It has two main directories: src
and wrappers
. The src
directory contains all the source code required to define and load models and datasets. The wrappers
directory contains the entry-point API to load the datasets and models. The API definitions in wrappers
follow a specific structure, and any new model/dataset has to respect this structure.
- src
- classification
- objectdetection
- segmentation
- wrappers
- datasets
- classification
- objectdetection
- segmentation
- models
- classification
- objectdetection
- segmentation
- eval
Contribute
Please perform the following steps to contribute a new model or dataset to the deeplite-torch-zoo:
1. Add the source code under the directory src/task_type:
- add an existing repository as a git-submodule, or
- add the source code of the data loaders, model definition, loss function, and eval function in a separate directory.
2. Train the model and upload the trained model weights to a public storage container. Please contact us to add the trained model weights to Deeplite's common hosted Amazon S3 container.
3. Add API calls in the wrappers
directory:
- the entry-point method for loading a model has to be named {model_name}_{dataset_name}_{num_classes}
- the entry-point method for dataloaders has to be named get_{dataset_name}_for_{model_name}
- the eval function has to take two inputs: (i) a model and (ii) a data_loader.
4. Import the wrapper functions in the __init__
file of the same directory.
5. Add tests for the model in tests/real_tests/test_models.py
(check the file for the expected format).
6. Add a fake test for the model in tests/fake_tests/test_models.py
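The naming conventions above can be expressed as simple string templates:

```python
def model_wrapper_name(model_name, dataset_name, num_classes):
    # Entry-point method name for loading a model.
    return f"{model_name}_{dataset_name}_{num_classes}"

def dataloader_wrapper_name(dataset_name, model_name):
    # Entry-point method name for dataloaders.
    return f"get_{dataset_name}_for_{model_name}"

print(model_wrapper_name("yolo4s", "voc", 20))   # yolo4s_voc_20
print(dataloader_wrapper_name("voc", "yolo4s"))  # get_voc_for_yolo4s
```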
Benchmark Results
Models on VOC Object Detection Dataset
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mAP      | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | vgg16_ssd                 | 0.7733   | 100.2731  | 31.4368         | 26.2860            | 309.7318              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | mb1_ssd                   | 0.6718   | 36.1214   | 1.5547          | 9.4690             | 143.1124              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | resnet18_ssd              | 0.728    | 32.489    | 6.2125          | 8.516              | 122.866               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | resnet34_ssd              | 0.761    | 54.044    | 14.306          | 14.16              | 194.167               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | resnet50_ssd              | 0.766    | 58.853    | 16.2557         | 15.428             | 443.1532              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | mb2_ssd_lite              | 0.687    | 12.9      | 0.699           | 3.38               | 149.7                 |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | yolo-v3                   | 0.8291   | 235.0847  | 38.0740         | 61.6260            | 999.7075              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 8  | yolo-v4s                  | 0.849    | 34.9      | 5.1             | 9.1                | 355.72                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 9  | yolo-v4m                  | 0.874    | 93.2      | 13              | 24.4               | 606.41                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 10 | yolo-v4l                  | 0.872    | 200.65    | 29.30           | 52.60              | 1006.24               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 11 | yolo-v4l-leaky            | 0.891    | 200.65    | 29.35           | 52.60              | 1006.24               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 12 | yolo-v4x                  | 0.882    | 368       | 55.32           | 96                 | 1528                  |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 13 | yolo-v5l                  | 0.875    | 176.39    | 26.52           | 46.24              | 806.64                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 14 | yolo-v5m                  | 0.902    | 79.91     | 11.82           | 20.94              | 471.96                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 15 | yolo-v5m-relu             | 0.856    | 79.91     | 11.85           | 20.94              | 471.9                 |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 16 | yolo-v5n                  | 0.762    | 6.832     | 1.043           | 1.790              | 115.40                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 17 | yolo-v5s                  | 0.871    | 26.98     | 3.92            | 7.073              | 235.95                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 18 | yolo-v5s_relu             | 0.819    | 26.98     | 3.93            | 7.073              | 235.95                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 19 | yolo-v5x                  | 0.884    | 329.4     | 50.13           | 86.34              | 1252.96               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on COCO Object Detection Dataset
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mAP      | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | yolo4m                    | 0.309    | 94.133    | 11.44           | 24.67              | 548.83                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | yolo4s                    | 0.288    | 35.58     | 4.50            | 9.32               | 324.34                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | yolo5_6m                  | 0.374    | 80.83     | 10.36           | 21.19              | 431.063               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | yolo5_6n                  | 0.211    | 7.14      | 0.954           | 1.87               | 112.94                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | yolo5_6n_hswish           | 0.183    | 7.14      | 0.954           | 1.872              | 112.94                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | yolo5_6n_relu             | 0.167    | 34.9      | 5.1             | 9.1                | 320.7                 |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | yolo5_6s                  | 0.301    | 27.60     | 3.49            | 7.235              | 219.96                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on COCO Person Detection Dataset (only person class from the 80-class COCO dataset)
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mAP      | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | yolo5_6m_relu             | 0.709    | 79.61     | 6.015           | 20.87              | 277.36                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | yolo5_6n                  | 0.6718   | 6.73      | 0.522011        | 1.765              | 59.84                 |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | yolo5_6n_relu             | 0.621    | 6.73      | 0.5249          | 1.765              | 59.847                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | yolo5_6s                  | 0.738    | 26.788    | 1.981           | 7.022              | 131.122               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | yolo5_6s_relu             | 0.682    | 26.788    | 1.987           | 7.022              | 131.122               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | yolo5_6sa                 | 0.659    | 47        | 2.026           | 12.32              | 153.033               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on VOC2007 Dataset (VOC2007 train split taken as training data and VOC2007 val split used for testing)
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mAP      | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | yolo5_6n                  | 0.620    | 6.83      | 1.043           | 1.79               | 115.40                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | yolo5_6s                  | 0.687    | 26.98     | 3.92            | 7.07               | 235.95                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on VOC Segmentation Dataset
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | mean Inter. over Union | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+========================+===========+=================+====================+=======================+====================+
| 1  | unet_scse_resnet18        | 0.582                  | 83.3697   | 20.8930         | 21.8549            | 575.0954              |                    |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | unet_scse_resnet18_1cls   | 0.673                  | 83.3647   | 20.5522         | 21.8536            | 535.0954              |                    |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | unet_scse_resnet18_2cls   | 0.679                  | 83.3652   | 20.5862         | 21.8537            | 539.0954              |                    |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | fcn32                     | 0.713                  | 519.382   | 136.142         | 136.152            | 858.2010              |                    |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | deeplab_mobilenet         | 0.571                  | 29.0976   | 26.4870         | 5.8161             | 1134.6057             |                    |
+----+---------------------------+------------------------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on MNIST dataset
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Millions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | lenet5                    | 99.1199  | 0.1695    | 0.2930          | 0.0444             | 0.1904                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | mlp2                      | 97.8046  | 0.4512    | 0.1211          | 0.1183             | 0.4572                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | mlp4                      | 97.8145  | 0.5772    | 0.1549          | 0.1513             | 0.5861                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | mlp8                      | 96.6970  | 0.8291    | 0.2226          | 0.2174             | 0.8439                |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on CIFAR100 dataset
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 76.8295  | 42.8014   | 0.5567          | 11.2201            | 48.4389               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet50                  | 78.0657  | 90.4284   | 1.3049          | 23.7053            | 123.5033              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | vgg19                     | 72.3794  | 76.6246   | 0.3995          | 20.0867            | 80.2270               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | densenet121               | 78.4612  | 26.8881   | 0.8982          | 7.0485             | 66.1506               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | googlenet                 | 79.3513  | 23.8743   | 1.5341          | 6.2585             | 64.5977               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | mobilenet_v1              | 66.8414  | 12.6246   | 0.0473          | 3.3095             | 16.6215               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | mobilenet_v2              | 73.0815  | 9.2019    | 0.0947          | 2.4122             | 22.8999               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 8  | pre_act_resnet18          | 76.5229  | 42.7907   | 0.5566          | 11.2173            | 48.1781               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 9  | resnext29_2x64d           | 79.9150  | 35.1754   | 1.4167          | 9.2210             | 67.6879               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 10 | shufflenet_v2_1_0         | 69.9169  | 5.1731    | 0.0462          | 1.356              | 12.3419               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on VWW dataset
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 93.5496  | 42.6389   | 1.8217          | 11.1775            | 74.6057               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet50                  | 94.3675  | 89.6917   | 4.1199          | 23.5121            | 233.5413              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | mobilenet_v1              | 92.4444  | 12.2415   | 0.5829          | 3.2090             | 70.5286               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | mobilenet_v3_small        | 89.1180  | 5.7980    | 0.0599          | 1.5199             | 30.2576               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | mobilenet_v3_large        | 89.1800  | 16.0393   | 0.2286          | 4.2046             | 83.8590               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on Imagenet10 dataset
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 93.8294  | 42.6546   | 1.8217          | 11.1816            | 74.6215               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | mobilenet_v2_0_35         | 81.0492  | 1.5600    | 0.0664          | 0.4089             | 34.9010               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on Imagenet16 dataset
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 94.5115  | 42.6663   | 1.8217          | 11.1816            | 74.6332               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet50                  | 96.8518  | 89.8011   | 4.1199          | 23.5408            | 233.6508              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on Imagenet dataset (from torchvision)
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| #  | Architecture (model_name) | Top1 (%) | Size (MB) | MACs (Billions) | #Params (Millions) | Memory Footprint (MB) | Pretrained Weights |
+====+===========================+==========+===========+=================+====================+=======================+====================+
| 1  | resnet18                  | 69.7319  | 44.5919   | 1.8222          | 11.6895            | 76.5664               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 2  | resnet34                  | 73.2880  | 83.1515   | 3.6756          | 21.7977            | 131.8740              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 3  | resnet50                  | 76.1001  | 97.4923   | 4.1219          | 25.5570            | 241.3496              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 4  | resnet101                 | 77.3489  | 169.9416  | 7.8495          | 44.549             | 385.3847              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 5  | resnet152                 | 78.2836  | 229.6173  | 11.5807         | 60.1928            | 533.4902              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 6  | inception_v3              | 69.5109  | 90.9217   | 2.8472          | 27.1613            | 149.3052              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 7  | densenet121               | 74.4106  | 30.4369   | 2.8826          | 7.9789             | 187.7805              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 8  | densenet161               | 77.1120  | 109.4093  | 7.8184          | 28.681             | 393.9603              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 9  | densenet169               | 75.5635  | 53.9760   | 3.4184          | 14.149             | 238.9538              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 10 | densenet201               | 76.8702  | 76.3471   | 4.3670          | 20.0139            | 307.5974              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 11 | alexnet                   | 56.4758  | 233.0812  | 0.7156          | 61.1008            | 237.8486              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 12 | squeezenet1_0             | 58.0591  | 4.7624    | 0.8300          | 1.2484             | 51.2403               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 13 | squeezenet1_1             | 58.1438  | 4.7130    | 0.3559          | 1.235              | 32.1729               |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 14 | vgg11                     | 68.9946  | 506.8334  | 7.6301          | 132.8633           | 570.0989              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 15 | vgg11_bn                  | 70.3433  | 506.8544  | 7.6449          | 132.8688           | 598.4480              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 16 | vgg13                     | 69.9017  | 507.5373  | 11.3391         | 133.0478           | 607.5527              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 17 | vgg13_bn                  | 71.5557  | 507.5597  | 11.3636         | 133.0537           | 654.2783              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 18 | vgg16                     | 71.5605  | 527.7921  | 15.5035         | 138.3575           | 637.7607              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 19 | vgg16_bn                  | 73.3352  | 527.8243  | 15.5306         | 138.3660           | 689.4726              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 20 | vgg19                     | 72.3449  | 548.0470  | 19.6679         | 143.6672           | 667.9687              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
| 21 | vgg19_bn                  | 74.1900  | 548.0890  | 19.6976         | 143.6782           | 724.6669              |                    |
+----+---------------------------+----------+-----------+-----------------+--------------------+-----------------------+--------------------+
Models on Imagenet dataset (timm/torchvision)
The zoo enables loading any ImageNet-pretrained model from the timm repo, as well as any ImageNet model from torchvision. If a model name exists in both, the corresponding timm model is loaded.
Model Size: Memory consumed by the parameters (weights and biases) of the model
MACs: Total number of multiply-accumulate (MAC) operations per single image (batch_size=1)
#Parameters: Total number of parameters (trainable and non-trainable) in the model
Memory Footprint: Total memory consumed by the parameters (weights and biases) and activations (per layer) per single image (batch_size=1)
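For fp32 models, the first definition reduces to four bytes per parameter. For example, the ImageNet resnet18 entry above (11.6895M parameters, 44.5919 MB) is consistent with:

```python
def fp32_model_size_mb(num_params_millions):
    # 4 bytes per fp32 parameter, converted to MB (2**20 bytes).
    return num_params_millions * 1e6 * 4 / 2**20

print(round(fp32_model_size_mb(11.6895), 4))  # 44.5919
```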