deeplite-profiler
To use a deep learning model in research and production, it is essential to understand performance metrics of the model beyond just its accuracy. `deeplite-profiler` helps to easily and effectively measure the different performance metrics of a deep learning model. In addition to the built-in metrics, users can seamlessly contribute custom metrics to the profiler. `deeplite-profiler` can also be used to compare the performance of two different deep learning models, for example, a teacher and a student model. `deeplite-profiler` currently supports PyTorch and TensorFlow Keras (v1) as backend frameworks.
Installation
1. Install using pip
Use the following commands to install the package from PyPI.
$ pip install --upgrade pip
$ pip install deeplite-profiler[backend]
2. Install from source
$ git clone https://github.com/Deeplite/deeplite-profiler.git
$ pip install .[backend]
One can install a specific `backend` module, depending on the required framework and compute support. `backend` can be one of the following values:
- `torch`: installs a PyTorch-specific profiler
- `tf`: installs a TensorFlow-specific profiler (CPU compute only)
- `tf-gpu`: installs a TensorFlow-gpu-specific profiler (both CPU and GPU compute)
- `all`: installs both the PyTorch and TensorFlow profilers (CPU compute only for TensorFlow)
- `all-gpu`: installs both the PyTorch and TensorFlow-gpu profilers, for a GPU environment (both CPU and GPU compute for TensorFlow)
3. Install in Dev mode
$ git clone https://github.com/Deeplite/deeplite-profiler.git
$ pip install -e .[backend]
$ pip install -r requirements-test.txt
To test the installation, one can run the basic tests using the `pytest` command in the root folder.
Note
Currently, we support TensorFlow 1.14 and 1.15 versions, for Python 3.6 and 3.7. We do not support TensorFlow for Python 3.8+.
Minimal Dependencies
- numpy>=1.17
- torch (if using the PyTorch backend)
- tensorflow or tensorflow-gpu (if using the TensorFlow backend)
How to Use
For a PyTorch model
# Step 1: Define native pytorch dataloaders and model
# 1a. data_splits = {"train": train_dataloader, "test": test_dataloader}
data_splits = ...  # load iterable data loaders
model = ...  # load a native deep learning model
# Step 2: Create Profiler class and register the profiling functions
profiler = TorchProfiler(model, data_splits, name="Original Model")
profiler.register_profiler_function(ComputeComplexity())
profiler.register_profiler_function(ComputeExecutionTime())
profiler.register_profiler_function(ComputeEvalMetric(get_accuracy, 'accuracy', unit_name='%'))
# Step 3: Compute the registered profiler metrics for the PyTorch Model
profiler.compute_network_status(batch_size=1, device=Device.CPU, short_print=False,
include_weights=True, print_mode='debug')
# Step 4: Compare two different models or profilers.
profiler2 = profiler.clone(model=deepcopy(model)) # Creating a dummy clone of the current profiler
profiler2.name = "Clone Model"
profiler.compare(profiler2, short_print=False, batch_size=1, device=Device.CPU, print_mode='debug')
For a TensorFlow model
# Step 1: Define native TensorFlow dataloaders and model
# 1a. data_splits = {"train": train_dataloader, "test": test_dataloader}
data_splits = ...  # load iterable data loaders
model = ...  # load a native deep learning model
# Step 2: Create Profiler class and register the profiling functions
profiler = TFProfiler(model, data_splits, name="Original Model")
profiler.register_profiler_function(ComputeFlops())
profiler.register_profiler_function(ComputeSize())
profiler.register_profiler_function(ComputeParams())
profiler.register_profiler_function(ComputeLayerwiseSummary())
profiler.register_profiler_function(ComputeExecutionTime())
profiler.register_profiler_function(ComputeEvalMetric(get_accuracy, 'accuracy', unit_name='%'))
# Step 3: Compute the registered profiler metrics for the TensorFlow Keras Model
profiler.compute_network_status(batch_size=1, device=Device.CPU, short_print=False,
include_weights=True, print_mode='debug')
# Step 4: Compare two different models or profilers.
profiler2 = profiler.clone(model=model) # Creating a dummy clone of the current profiler
profiler2.name = "Clone Model"
profiler.compare(profiler2, short_print=False, batch_size=1, device=Device.CPU, print_mode='debug')
Output Display
An example output of `deeplite-profiler` for a `resnet18` model on the standard CIFAR100 dataset, using the PyTorch backend, looks as follows:
+---------------------------------------------------------------+
| Neutrino Model Profiler |
+-----------------------------------------+---------------------+
| Param Name (Original Model) | Value|
| Backend: TorchBackend | |
+-----------------------------------------+---------------------+
| Evaluation Metric (%) | 76.8295|
| Model Size (MB) | 42.8014|
| Computational Complexity (GigaMACs) | 0.5567|
| Total Parameters (Millions) | 11.2201|
| Memory Footprint (MB) | 48.4389|
| Execution Time (ms) | 2.6537|
+-----------------------------------------+---------------------+
Evaluation Metric: Computed performance of the model on the given data
Model Size: Memory consumed by the parameters (weights and biases) of the model
Computational Complexity: Summation of Multiply-Add Cumulations (MACs) per single image (batch_size=1)
Total Parameters: Total number of parameters (trainable and non-trainable) in the model
Memory Footprint: Total memory consumed by parameters and activations per single image (batch_size=1)
Execution Time: On NVIDIA TITAN V GPU, time required for the forward pass per single image (batch_size=1)
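As a rough sanity check (not part of the library), the reported Model Size follows directly from the parameter count: with float32 weights, each parameter takes 4 bytes.

```python
# Approximate model size from the parameter count, assuming float32 weights
# (4 bytes per parameter). The parameter count is taken from the table above.
def model_size_mb(num_params, bytes_per_param=4):
    return num_params * bytes_per_param / (1024 ** 2)

size = model_size_mb(11.2201e6)  # resnet18 on CIFAR100
print(round(size, 2))  # close to the reported 42.8014 MB
```

The remaining ~5.6 MB of the Memory Footprint (48.4389 MB) is the activation memory for a single image.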
Examples
A list of different examples using `deeplite-profiler` to profile different PyTorch and TensorFlow models can be found here.
Torch Example
from neutrino_torch_zoo.wrappers.wrapper import get_data_splits_by_name, get_model_by_name
from neutrino.framework.torch_profiler.torch_profiler import TorchProfiler
from neutrino.framework.torch_profiler.torch_profiler import *
from neutrino.framework.profiler import Device, ComputeEvalMetric
from neutrino.framework.torch_profiler.torch_inference import get_accuracy
from copy import deepcopy
# Step 1: Define pytorch dataloaders and model
# 1a. data_splits = {"train": train_dataloader, "test": test_dataloader}
data_splits = get_data_splits_by_name(dataset_name='cifar100',
data_root='',
batch_size=128,
num_torch_workers=4)
# 1b. Load the Pytorch model
model = get_model_by_name(model_name='resnet18',
dataset_name='cifar100',
pretrained=True,
progress=True)
# Step 2: Create Profiler class and register the profiling functions
data_loader = TorchProfiler.enable_forward_pass_data_splits(data_splits)
profiler = TorchProfiler(model, data_loader, name="Original Model")
profiler.register_profiler_function(ComputeComplexity())
profiler.register_profiler_function(ComputeExecutionTime())
profiler.register_profiler_function(ComputeEvalMetric(get_accuracy, 'accuracy', unit_name='%'))
# Step 3: Compute the registered profiler metrics for the PyTorch Model
profiler.compute_network_status(batch_size=1, device=Device.GPU, short_print=False,
include_weights=True, print_mode='debug')
# Step 4: Clone and Compare
profiler2 = profiler.clone(model=deepcopy(model))
profiler2.name = "Clone Model"
profiler2.compare(profiler, short_print=False, batch_size=1, device=Device.GPU, print_mode='debug')
Output of the above code is as follows:
Computing network status...
--------------------------------------------------------------------------------------------------------------------------------------------
Layer () Weight Shape Bias Shape Output Shape ActivationSize (Bytes) # Params Time (ms)
============================================================================================================================================
Conv2d-1 [64, 3, 3, 3] None [[1, 64, 32, 32]] 262,144 1,728 0.6216
BatchNorm2d-2 [64] [64] [[1, 64, 32, 32]] 262,144 128 0.1142
Conv2d-3 [64, 64, 3, 3] None [[1, 64, 32, 32]] 262,144 36,864 0.1574
BatchNorm2d-4 [64] [64] [[1, 64, 32, 32]] 262,144 128 0.0937
Conv2d-5 [64, 64, 3, 3] None [[1, 64, 32, 32]] 262,144 36,864 0.1509
BatchNorm2d-6 [64] [64] [[1, 64, 32, 32]] 262,144 128 0.0913
BasicBlock-7 None None [[1, 64, 32, 32]] 262,144 0 0.0839
Conv2d-8 [64, 64, 3, 3] None [[1, 64, 32, 32]] 262,144 36,864 0.1223
BatchNorm2d-9 [64] [64] [[1, 64, 32, 32]] 262,144 128 0.0908
Conv2d-10 [64, 64, 3, 3] None [[1, 64, 32, 32]] 262,144 36,864 0.1450
BatchNorm2d-11 [64] [64] [[1, 64, 32, 32]] 262,144 128 0.0894
BasicBlock-12 None None [[1, 64, 32, 32]] 262,144 0 0.1090
Conv2d-13 [128, 64, 3, 3] None [[1, 128, 16, 16]] 131,072 73,728 0.1042
BatchNorm2d-14 [128] [128] [[1, 128, 16, 16]] 131,072 256 0.0877
Conv2d-15 [128, 128, 3, 3] None [[1, 128, 16, 16]] 131,072 147,456 0.1416
BatchNorm2d-16 [128] [128] [[1, 128, 16, 16]] 131,072 256 0.0863
Conv2d-17 [128, 64, 1, 1] None [[1, 128, 16, 16]] 131,072 8,192 0.1125
BatchNorm2d-18 [128] [128] [[1, 128, 16, 16]] 131,072 256 0.0849
BasicBlock-19 None None [[1, 128, 16, 16]] 131,072 0 0.0732
Conv2d-20 [128, 128, 3, 3] None [[1, 128, 16, 16]] 131,072 147,456 0.1125
BatchNorm2d-21 [128] [128] [[1, 128, 16, 16]] 131,072 256 0.0882
Conv2d-22 [128, 128, 3, 3] None [[1, 128, 16, 16]] 131,072 147,456 0.1407
BatchNorm2d-23 [128] [128] [[1, 128, 16, 16]] 131,072 256 0.0861
BasicBlock-24 None None [[1, 128, 16, 16]] 131,072 0 0.0765
Conv2d-25 [256, 128, 3, 3] None [[1, 256, 8, 8]] 65,536 294,912 0.1161
BatchNorm2d-26 [256] [256] [[1, 256, 8, 8]] 65,536 512 0.0868
Conv2d-27 [256, 256, 3, 3] None [[1, 256, 8, 8]] 65,536 589,824 0.1357
BatchNorm2d-28 [256] [256] [[1, 256, 8, 8]] 65,536 512 0.0911
Conv2d-29 [256, 128, 1, 1] None [[1, 256, 8, 8]] 65,536 32,768 0.1161
BatchNorm2d-30 [256] [256] [[1, 256, 8, 8]] 65,536 512 0.0896
BasicBlock-31 None None [[1, 256, 8, 8]] 65,536 0 0.0737
Conv2d-32 [256, 256, 3, 3] None [[1, 256, 8, 8]] 65,536 589,824 0.1066
BatchNorm2d-33 [256] [256] [[1, 256, 8, 8]] 65,536 512 0.0861
Conv2d-34 [256, 256, 3, 3] None [[1, 256, 8, 8]] 65,536 589,824 0.1276
BatchNorm2d-35 [256] [256] [[1, 256, 8, 8]] 65,536 512 0.0856
BasicBlock-36 None None [[1, 256, 8, 8]] 65,536 0 0.0825
Conv2d-37 [512, 256, 3, 3] None [[1, 512, 4, 4]] 32,768 1,179,648 0.1166
BatchNorm2d-38 [512] [512] [[1, 512, 4, 4]] 32,768 1,024 0.1233
Conv2d-39 [512, 512, 3, 3] None [[1, 512, 4, 4]] 32,768 2,359,296 0.1392
BatchNorm2d-40 [512] [512] [[1, 512, 4, 4]] 32,768 1,024 0.0861
Conv2d-41 [512, 256, 1, 1] None [[1, 512, 4, 4]] 32,768 131,072 0.1197
BatchNorm2d-42 [512] [512] [[1, 512, 4, 4]] 32,768 1,024 0.0858
BasicBlock-43 None None [[1, 512, 4, 4]] 32,768 0 0.0725
Conv2d-44 [512, 512, 3, 3] None [[1, 512, 4, 4]] 32,768 2,359,296 0.1118
BatchNorm2d-45 [512] [512] [[1, 512, 4, 4]] 32,768 1,024 0.0861
Conv2d-46 [512, 512, 3, 3] None [[1, 512, 4, 4]] 32,768 2,359,296 0.1359
BatchNorm2d-47 [512] [512] [[1, 512, 4, 4]] 32,768 1,024 0.0889
BasicBlock-48 None None [[1, 512, 4, 4]] 32,768 0 0.0758
Linear-49 [100, 512] [100] [[1, 100]] 400 51,300 0.1304
ResNet-50 None None [[1, 100]] 400 0 0.0515
--------------------------------------------------------------------------------------------------------------------------------------------
+---------------------------------------------------------------+
| Neutrino Model Profiler |
+-----------------------------------------+---------------------+
| Param Name (Original Model) | Value|
| Backend: TorchBackend | |
+-----------------------------------------+---------------------+
| Evaluation Metric (%) | 76.8295|
| Model Size (MB) | 42.8014|
| Computational Complexity (GigaMACs) | 0.5567|
| Total Parameters (Millions) | 11.2201|
| Memory Footprint (MB) | 48.4389|
| Execution Time (ms) | 2.6537|
+-----------------------------------------+---------------------+
TensorFlow Example
import numpy as np
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
try:
tf.compat.v1.enable_eager_execution()
except Exception:
pass
from neutrino.framework.tf_profiler.tf_profiler import TFProfiler
from neutrino.framework.tf_profiler.tf_profiler import *
from neutrino.framework.profiler import Device, ComputeEvalMetric
from neutrino.framework.tf_profiler.tf_inference import get_accuracy, get_missclass
# Step 1: Define TensorFlow dataloaders and model (tf.data.Dataset)
# 1a. data_splits = {"train": train_dataloader, "test": test_dataloader}
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar100.load_data()
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
y_train = np.eye(100)[y_train.reshape(-1)]
y_test = np.eye(100)[y_test.reshape(-1)]
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)) \
.shuffle(buffer_size=x_train.shape[0]) \
.batch(128)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)) \
.batch(128)
data_splits = {'train': train_dataset, 'test': test_dataset}
# 1b. Load the TensorFlow Keras model: Transfer learning from pretrained model
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input  # or: tf.keras.applications.vgg19.preprocess_input
base_model = tf.keras.applications.VGG19(input_shape=(32, 32, 3),
include_top=False,
weights='imagenet')
inputs = tf.keras.Input(shape=(32, 32, 3))
x = preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(100)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(), optimizer=tf.keras.optimizers.SGD(), metrics=['accuracy'])
# Step 2: Create Profiler class and register the profiling functions
data_loader = TFProfiler.enable_forward_pass_data_splits(data_splits)
profiler = TFProfiler(model, data_loader, name="Original Model")
profiler.register_profiler_function(ComputeFlops())
profiler.register_profiler_function(ComputeSize())
profiler.register_profiler_function(ComputeParams())
profiler.register_profiler_function(ComputeLayerwiseSummary())
profiler.register_profiler_function(ComputeExecutionTime())
profiler.register_profiler_function(ComputeEvalMetric(get_accuracy, 'accuracy', unit_name='%'))
# Step 3: Compute the registered profiler metrics for the TensorFlow Keras Model
profiler.compute_network_status(batch_size=1, device=Device.GPU, short_print=False,
include_weights=True, print_mode='debug')
Output of the above code is as follows:
Computing network status...
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 32, 32, 3)] 0
_________________________________________________________________
tf_op_layer_tf_op_layer_true (None, 32, 32, 3) 0
_________________________________________________________________
tf_op_layer_tf_op_layer_sub (None, 32, 32, 3) 0
_________________________________________________________________
vgg19 (Model) (None, 1, 1, 512) 20024384
_________________________________________________________________
global_average_pooling2d (Gl (None, 512) 0
_________________________________________________________________
dropout (Dropout) (None, 512) 0
_________________________________________________________________
dense (Dense) (None, 100) 51300
=================================================================
+---------------------------------------------------------------+
| Neutrino Model Profiler |
+-----------------------------------------+---------------------+
| Param Name (Original Model) | Value|
| Backend: TFBackend | |
+-----------------------------------------+---------------------+
| Evaluation Metric (%) | 1.0000|
| Model Size (MB) | 76.5827|
| Computational Complexity (GigaMACs) | 0.4185|
| Total Parameters (Millions) | 20.0757|
| Memory Footprint (MB) | 76.6358|
| Execution Time (ms) | 3.1892|
+-----------------------------------------+---------------------+
Contribute a Custom Metric
Note
If you are looking for the SDK documentation, please head over here.
The following section gives a conceptual overview of how `deeplite-profiler` works. This helps in adding and/or contributing custom performance metrics while profiling deep learning models. The core of `deeplite-profiler` is made up of three main classes: `Profiler` itself, `ProfilerFunction`, and `StatusKey`. To use a custom profiling metric, the following three steps must be performed:
Create a new StatusKey
Consider a custom metric to be profiled. All the properties of this metric are captured in the `StatusKey` class and its children. From the `Profiler`'s perspective, think of a `StatusKey` as a slightly fancier `dict` entry. Thus, a `StatusKey` requires a hashable key and will be assigned a value after its associated `ProfilerFunction` computes it. Its hashable key (usually a `str`) is the class attribute `NAME`.
Its most used abstract child is the `Metric` class. A metric object is composed of five methods:
- `description`, a string description of the metric to be computed
- `friendly_name`, a human-friendly name as opposed to the hashable key. For example, Computational Complexity is a human-friendly name, while flops is the hashable key.
- `get_units`, the units for measuring the metric. For example, the unit of size is bytes, the unit of accuracy is %, and the unit of execution time is seconds.
- `get_comparative`, a way to compare two metrics together. It captures the sense of better or worse. For example, flops should be smaller, accuracy should be higher. Refer to the Comparative enum for more information.
- `get_value`, how to report the value of the metric; often used to take care of scaling the value. For example, it might be better to report "1 megabyte" rather than "1000000 bytes".
Understandably, these methods have been designed with displaying and reporting the values in mind. This is essential for presenting a good, readable output of the profiled metrics.
An example `Metric` is shown below. A list of other existing `Metric` classes in `deeplite-profiler` can be found here.
class Flops(Metric):
NAME = 'flops'
@staticmethod
def description():
return "Summation of Multiply-Add Cumulations (MACs) per single image (batch_size=1)"
@staticmethod
def friendly_name():
return 'Computational Complexity'
def get_comparative(self):
return Comparative.RECIPROCAL
def get_value(self):
if self.value > 1e-3:
return self.value
if self.value > 1e-6:
return self.value * 1024
return self.value * (1024 ** 2)
def get_units(self):
if self.value > 1e-3:
return 'GigaMACs'
if self.value > 1e-6:
return 'MegaMACs'
return 'KiloMACs'
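Following the same pattern, a new custom metric only needs to fill in the five methods. The sketch below is standalone: the `Metric` base here is a stub stand-in, the `InferenceLatency` name is purely illustrative, and `get_comparative` returns a plain string where the library would use a `Comparative` enum member. In a real project you would subclass the library's own `Metric`.

```python
class Metric:  # stub stand-in for the library's Metric base class
    NAME = None

    def __init__(self, value=None):
        self.value = value

class InferenceLatency(Metric):  # hypothetical custom metric
    NAME = 'latency'  # hashable key used by the profiler

    @staticmethod
    def description():
        return "Average forward-pass time per single image (batch_size=1)"

    @staticmethod
    def friendly_name():
        return 'Execution Time'

    def get_comparative(self):
        # smaller is better; the library expresses this as Comparative.RECIPROCAL
        return 'reciprocal'

    def get_value(self):
        return self.value * 1000  # report milliseconds rather than seconds

    def get_units(self):
        return 'ms'

m = InferenceLatency(0.0026537)
print(f"{m.friendly_name()} ({m.get_units()}): {m.get_value():.4f}")
# Execution Time (ms): 2.6537
```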
Write a ProfilerFunction
The job of the `ProfilerFunction` class is to compute the `StatusKey`. It is a function, and therefore a callable, which will be called by the `Profiler` upon a status request. There are two things to know about this object:
- First, it needs to inform the profiler which `StatusKey` it is computing. A `ProfilerFunction` can compute more than one `StatusKey`. This information is provided by the return tuple of its `get_bounded_status_key` method.
- Second, its `__call__` signature does not have the full arbitrary freedom of any callable. Since the `Profiler` object does not know in advance which functions it will have, it tries to magically pipe arguments to the correct functions. Therefore, the `__call__` signature of a `ProfilerFunction` instance cannot have *args or **kwargs. The emphasis is on the instance: mid-level interfaces can still contain *args and **kwargs, since they won't be instantiated.
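The first point can be sketched in plain Python. The class names and the dummy model below are illustrative stand-ins, not the library's API: one callable declares, via `get_bounded_status_key`, that it produces two status keys at once.

```python
# Illustrative stand-ins for the library's StatusKey / ProfilerFunction classes.
class SizeKey:
    NAME = 'model_size'

class ParamsKey:
    NAME = 'total_params'

class ComputeSizeAndParams:
    """Hypothetical profiler function computing two status keys at once."""

    def get_bounded_status_key(self):
        # tells the profiler which keys this callable produces
        return (SizeKey, ParamsKey)

    def __call__(self, model):  # explicit signature: no *args / **kwargs
        num_params = sum(model.param_counts)
        size_mb = num_params * 4 / (1024 ** 2)  # float32 weights assumed
        return {SizeKey.NAME: size_mb, ParamsKey.NAME: num_params}

class DummyModel:  # toy stand-in for a real model
    param_counts = [1_000, 24]

fn = ComputeSizeAndParams()
print([k.NAME for k in fn.get_bounded_status_key()])  # ['model_size', 'total_params']
print(fn(DummyModel()))
```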
Some example `ProfilerFunction` classes written in PyTorch can be found here, and some written in TensorFlow can be found here.
Registering with the Profiler
A `Profiler` object is originally empty. One needs to register `ProfilerFunction` objects in order to start using it. Registering functions builds the internal dict-like structure that maps each `StatusKey` to a `ProfilerFunction`. The method `register_profiler_function` also has an optional `override` parameter. This ensures the `ProfilerFunction` you are currently registering takes precedence over previously registered functions.
For example, suppose a `Profiler` object is instantiated and all the default functions are registered. In your custom project, you have a special case for computing the model size. You can simply register your function with `override=True` and it will be used to compute the model size instead. However, an error is raised if you register yet another function with `override=True` for model size, since there would now be an ambiguous priority clash.
As an example, for using PyTorch a ProfilerFunction
can be registered as follows,
profiler = TorchProfiler(model, data_loader, name="Original Model")
profiler.register_profiler_function(ComputeComplexity()) # Provide your `ProfilerFunction` here
Compute status
The main method to be implemented is `compute_status`, which requires a valid `StatusKey` as an argument, plus any additional arguments to be passed to the relevant `ProfilerFunction`. Valid here simply means that a function was registered whose `get_bounded_status_key` returned a `StatusKey` with its `NAME` equal to this argument.
To compute all the statuses at once, the class provides the method `compute_network_status`. It takes arbitrary **kwargs, which are all the arguments to be passed to every `ProfilerFunction` being called. This is where the limitation mentioned above on the `__call__` signature makes sense. Internally, the `Profiler` introspects each function's signature through Python's standard `inspect` module in order to pipe a portion of these **kwargs to each function. If such a function also had **kwargs, there would be no way to know which portion of `compute_network_status`'s **kwargs to pass; the `Profiler` could not call the function unambiguously.
Compare
A popular use case is to compare two different models (for example, a teacher and a student model) and see their relative performance in terms of different metrics. `Profiler` has the method `compare` to do just that. It takes another profiler and accepts additional **kwargs. Internally, it calls `compute_network_status` on both profilers and uses a display function to produce the comparison report.
Please look at some existing examples here showing how to register profiling functions and use them to compute network status.