Going Deeper with Neutrino
By default, Neutrino is wired to optimize a classification task with a fairly simple setup. This imposes tight constraints on the assumed structure of how tensors flow from the data loader to the model, to the loss function and to the evaluation.
More complex and custom tasks can be supported by Neutrino by following some additional steps. The three main pieces are how to extract what comes out of the data loader, the loss function and the evaluation function. Finally, we need some information about the optimizer used for the provided pretrained model in order to make it all work.
Customize Forward Pass
One of the most important things when using the engine with customized data and models is to tell it how to extract
the tensors from the data loader and trigger forward passes on the model object. We provide a modular interface
that needs to be implemented only in non-standard cases. The interface can also default to a working implementation
if provided with two keywords, model_input_pattern and expecting_common_inputs.
input pattern
By default, we assume that your data loader returns a 2-tuple where the first element goes to the model
and the second element goes to the loss function; we call this the standard (x, y) pattern. If the model's
call signature expects more than simply x, this can be accommodated by providing
a tuple of integers and '_' as the value of model_input_pattern. Each integer gives the argument position at which
the corresponding element of the data loader tuple is piped into the model, and the '_' entries are placeholders to ignore.
It is easier to understand with a few examples:
# default classification
model_input_pattern = (0, '_')
x, y = next(dataloader)
logits = model(x)
# example 1
model_input_pattern = (1, 0, '_', 2)
a, b, c, d = next(dataloader)
out = model(b, a, d)
# example 2
model_input_pattern = (0, '_', 1, '_', '_')
a, b, c, d, e = next(dataloader)
out = model(a, c)
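The examples above can be captured in a small pure-Python sketch of the pattern semantics (this is an illustration, not Neutrino's actual implementation): each integer in the pattern gives the model-argument position of the corresponding loader element, and '_' drops it.

```python
def apply_input_pattern(batch, pattern):
    # Map each loader element to the model-argument position given by the
    # pattern; '_' entries are dropped. Illustrative helper only.
    args = {pos: elem for elem, pos in zip(batch, pattern) if pos != '_'}
    return tuple(args[i] for i in sorted(args))

# example 1 above: apply_input_pattern((a, b, c, d), (1, 0, '_', 2))
# yields the model arguments (b, a, d)
```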
common inputs
Another assumption is that the value of expecting_common_inputs is True. This means we expect
each element of the data loader tuple to be standard. Standard here means it is either a tensor or a
common container type (list, tuple or dict) of tensors. This grants some abilities, such as automatically
inferring the shapes of the input tensors or their numeric types. If the output of your data loader is not
standard, then it is required to implement some methods of the interface.
interface
class ForwardPass(ABC):
    def __init__(self, model_input_pattern=None, expecting_common_inputs=True):
        """ init """

    def model_call(self, model, x, device):
        """
        Call the model with 'x' extracted from a loader's batch and on device 'device'.
        'x' is literally := x = forward_pass.extract_model_inputs(next(dataloader))
        Default implementation is provided if the ForwardPass is instantiated expecting common inputs.
        """

    def create_random_model_inputs(self, batch_size):
        """
        Create a compatible random input of corresponding 'batch_size'. Compatible in the sense that
        `model_call` can run without crashing on this return value.
        Default implementation is provided if the ForwardPass is instantiated expecting common inputs.
        """

    def extract_model_inputs(self, batch):
        """
        Extract a compatible input from a loader's 'batch'. Compatible in the sense that
        `model_call` can run without crashing on this return value.
        Default implementation is provided if the ForwardPass is instantiated with a pattern.
        """

    def get_model_input_shapes(self):
        """
        Returns a tuple of all input shapes that are fed to the model.
        Default implementation is provided if the ForwardPass is instantiated expecting common inputs.
        """
Note
When subclassing, or when only using the keywords model_input_pattern and expecting_common_inputs,
you have to use the framework-specific ForwardPass. An example can be found at the end, in Wrapping it up together.
The following example shows how to implement the ForwardPass when you cannot rely on either of the two default implementations.
import torch

from deeplite.profiler.data_loader import ForwardPass

class ClassificationTorchForwardPass(ForwardPass):
    def __init__(self):
        super().__init__(model_input_pattern=None, expecting_common_inputs=False)

    def model_call(self, model, x, device):
        # this is built on the assumption that you know how to call your model.
        # imagine here that it is like 'def forward(self, x, z)'
        x, z = x  # this comes from the output of the method `extract_model_inputs`
        if device == Device.GPU:
            x, z = x.cuda(), z.cuda()
        else:
            x, z = x.cpu(), z.cpu()
        return model(x, z)

    def create_random_model_inputs(self, batch_size):
        shapes = self.get_model_input_shapes()
        return torch.rand(batch_size, *shapes[0]), torch.rand(batch_size, *shapes[1])

    def extract_model_inputs(self, batch):
        x, y, z = batch
        return x, z

    def get_model_input_shapes(self):
        # imagine your model input data have these shapes
        return (3, 32, 32), (100,)
Customize Loss Function
The next class that needs an implementation is the LossFunction
. This is a straightforward interface that needs to be implemented
is __call__
which accepts the model
and a batch
. model
has exactly the same call signature as the one you have provided to the
engine and batch
is an element in the iteration over your data loader. There is much freedom as to what can happen there. It simply needs
to return a dict
of tensors that will be summed or a single tensor to yield the scalar for backprop.
interface
class LossFunction(ABC):
    def __init__(self, device=Device.CPU):
        self._device = None
        self.to_device(device)

    def to_device(self, device):
        """
        Optionally do something if there is a device switch
        """
        self._device = device

    @property
    def device(self):
        return self._device

    @abstractmethod
    def __call__(self, model, batch):
        raise NotImplementedError
The following example shows how to implement a cross-entropy loss function with this interface.
import torch.nn.functional as F

from neutrino.framework.functions import LossFunction

class ClassificationLoss(LossFunction):
    def __call__(self, model, batch):
        x, y = batch
        if self.device == Device.GPU:
            x, y = x.cuda(), y.cuda()
        else:
            x, y = x.cpu(), y.cpu()
        out = model(x)
        return {'loss': F.cross_entropy(out, y)}
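The returned dict may also contain several named terms; per the contract above, the engine sums the dict values to obtain the scalar for backprop. Below is a minimal sketch of such a multi-term loss, written as a standalone function whose body would live in your LossFunction's __call__; the L2 auxiliary term and its 1e-4 weight are illustrative choices, not part of the interface.

```python
import torch
import torch.nn.functional as F

def multi_term_loss(model, batch):
    # Sketch only: returns several named loss terms. The engine sums the
    # dict values to get the backprop scalar. The 'l2_penalty' term and
    # its 1e-4 weight are hypothetical, for illustration.
    x, y = batch
    out = model(x)
    ce = F.cross_entropy(out, y)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return {'cross_entropy': ce, 'l2_penalty': 1e-4 * l2}
```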
Customize Evaluation Function
The last class that needs an implementation is the EvaluationFunction. Only the apply method needs to
be implemented, and there is even more flexibility than with the LossFunction. It receives your model
and your loader as input and is expected to return a dict of the metrics you wish to keep track of.
Important
You are free to return multiple evaluation metrics from the evaluation function, and we will report all of them. However, the engine can only listen to one at a time (this is the value that has to be specified in the config as eval_key).
interface
class TorchEvaluationFunction(EvaluationFunction):
    @abstractmethod
    def _compute_inference(self, model, data_loader, device=Device.CPU, transform=None):
        raise NotImplementedError("Base class call")
The following example shows how to implement a top-1 accuracy evaluation function for a classification task in PyTorch (TorchFramework).
import torch
import torch.nn.functional as F

from deeplite.torch_profiler.torch_inference import TorchEvaluationFunction

class EvalAccuracy(TorchEvaluationFunction):
    def __init__(self, device='cuda'):
        self.device = device

    def _compute_inference(self, model, data_loader, **kwargs):
        total_acc = 0
        with torch.no_grad():
            for x, y in data_loader:
                if self.device == 'cuda':
                    x, y = x.cuda(), y.cuda()
                else:
                    x, y = x.cpu(), y.cpu()
                out = model(x)
                out = F.softmax(out, dim=-1)
                out = torch.argmax(out, dim=1)
                if out.dim() == 1 and y.dim() == 2 and y.shape[1] == 1:
                    y = y.flatten()
                acc = torch.mean((out == y).float())
                total_acc += acc.cpu().item()
        return {'accuracy': 100. * (total_acc / float(len(data_loader)))}
Customize Optimizer
It is important that the optimizer used to train the model is the same as the one we will use internally. There are two ways to bring your optimizer into the engine:
- A dict format for an optimizer directly importable from the framework library. The dict needs to have a ‘name’ key that points to the optimizer class to import; all the remaining items are key-value pairs used to instantiate it.
- Implementing Neutrino’s interface NativeOptimizerFactory.
# If you use SGD with 0.1 learning rate, we would need
optimizer = {'name': 'SGD', 'lr': 0.1}
# this allows such a thing to happen:
# from torch.optim import SGD
# opt = SGD(lr=0.1)

# Now an implementation of the interface:
class NativeOptimizerFactory(ABC):
    @abstractmethod
    def make(self, native_model):
        """ Returns a native optimizer object """

# Example
from neutrino.framework.torch_nn import NativeOptimizerFactory

class CustomOptimizerFactory(NativeOptimizerFactory):
    def make(self, native_model):
        from torch.optim import Adam
        return Adam(native_model.parameters(), lr=1e-4, betas=(0.9, 0.9))

optimizer = CustomOptimizerFactory()
Customize Scheduler
It is also possible to provide a scheduler, and it is recommended to do so if one was used to train the original model.
The scheduler has to be given as a dict with keys ‘factory’ and ‘interval’.
‘factory’ is the factory pattern to bring in the scheduler, which follows the same structure as the optimizer. There are two ways to bring your scheduler into the engine:
- A dict format for a scheduler directly importable from the framework library. The dict needs to have a ‘name’ key that points to the scheduler class to import; all the remaining items are key-value pairs used to instantiate it.
- Implementing Neutrino’s interface NativeSchedulerFactory.
‘interval’ is a str that controls when the scheduler is stepped. The valid values are:
- ‘eval’: Step the scheduler after a call to the evaluation function. The scheduler will receive the evaluation metric when stepped.
- ‘epoch’: Step the scheduler after each training epoch.
- ‘iteration’: Step the scheduler after each training batch.
# If using the pytorch scheduler that reduces the learning rate by some factor at every patience count.
# Note that this scheduler listens to the evaluation metric (ex.: accuracy) to guide its schedule.
scheduler = {'factory': {'name': 'ReduceLROnPlateau', 'mode': 'max', 'patience': 10, 'factor': 0.2},
             'interval': 'eval'}

# Now an implementation of the interface:
class NativeSchedulerFactory(ABC):
    @abstractmethod
    def make(self, native_optimizer):
        """ Returns a native scheduler object """

# Example
from neutrino.framework.torch_nn import NativeSchedulerFactory

class CustomSchedulerFactory(NativeSchedulerFactory):
    def make(self, native_optimizer):
        from torch.optim.lr_scheduler import MultiplicativeLR
        return MultiplicativeLR(native_optimizer, lr_lambda=lambda epoch: 0.95)

scheduler = {'factory': CustomSchedulerFactory(),
             'interval': 'epoch'}
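The three ‘interval’ values differ only in where the engine calls the scheduler's step(). The pure-Python sketch below (illustrative names, not the engine's internals) shows the stepping point for each value:

```python
class CountingScheduler:
    """Stand-in scheduler that only counts how often it is stepped."""
    def __init__(self):
        self.steps = 0

    def step(self, metric=None):
        self.steps += 1

def run_training(scheduler, interval, epochs, batches_per_epoch, eval_freq=1):
    # Illustrative training loop showing where step() lands for each
    # 'interval' value; not Neutrino's actual loop.
    for epoch in range(1, epochs + 1):
        for _ in range(batches_per_epoch):
            if interval == 'iteration':
                scheduler.step()  # after each training batch
        if interval == 'epoch':
            scheduler.step()      # after each training epoch
        if epoch % eval_freq == 0:
            metric = 0.0          # placeholder for the evaluation result
            if interval == 'eval':
                scheduler.step(metric)  # receives the evaluation metric
```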
Model Averaging
A popular method for improving the training of object detection models is exponential model averaging (EMA). We support this method within Neutrino with a keyword argument to the full_trainer in the config:
‘ema’ is a bool or dict which enables EMA model averaging during model training:
- bool: Set to True to use the default PyTorch EMA configuration.
- dict: Use EMA with modified parameters ‘decay_rate’ and/or ‘period’, i.e. {'decay_rate': 0.9999, 'period': 2000}
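Assuming the full_trainer config structure shown later in Wrapping it up together, the ‘ema’ keyword would sit alongside the other training keys like this (the surrounding values are illustrative):

```python
# 'ema' as a bool: default EMA configuration (other values illustrative)
full_trainer = {
    'epochs': 100,
    'ema': True,
}

# 'ema' as a dict: custom decay rate and update period
full_trainer = {
    'epochs': 100,
    'ema': {'decay_rate': 0.9999, 'period': 2000},
}
```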
Custom Training Loop
If your model requires complex training methods that are not configurable via our API, you may interface your own training loop code with our optimization engine. Neutrino will take care of the model transformations and the
ExternalTrainingLoop
will control any model training.
Interface
The interface between Neutrino and your training code is implemented through the
ExternalTrainingLoop. An instance of this class must be initialized in your script with two inputs: a callable train_function
and an args dict/Namespace train_args. The loop is passed to Neutrino via the config dict and is used to train the model throughout the optimization process. When it is time to train the model, the train_function will be called with both the model and the train_args
passed as inputs.
- class ExternalTrainingLoop
  Training loop interface. It accepts a callable train_function with signature: trained_model = train_function(model, train_args).
  The train_args are given at construction of the object and mostly remain static throughout its lifetime. Care should then be taken to set up any globals the training function needs in order to run training multiple times.
  There are two mandatory modifications that Neutrino needs to be able to make to the way the train_function is run, and they have to go through the args:
  - modify_args_for_validation(args, validation) -> activate or deactivate eval according to the bool validation
  - modify_args_for_epochs(args, epochs) -> change the number of epochs according to the int epochs
  And one optional:
  - modify_args_for_finetuning(args) -> change the args so the loop is in finetuning mode
- copy_args(self)
  Copy args. Provides a default implementation through deepcopy if the args are of type argparse.Namespace or a simple python dict. It is essential that the args can be copied because they could be modified inside the train callable and pollute further repeated calls.
- static _modify_args(args, key, value)
Updates args with key, value pair
- modify_args_for_finetuning(self, args)
  Modify args for a loop in finetuning mode. By default it is not implemented and the loop is considered not finetuning-enabled.
Note
If finetuning is enabled and requested, this modification to args takes precedence over
modify_args_for_epochs()
.
- modify_args_for_validation(self, args, validation)
  Activate or deactivate validation according to validation. The default implementation assumes the key ‘validation’ in args is the control for activating validation:
  return self._modify_args(args, ‘validation’, validation)
- modify_args_for_epochs(self, args, epochs)
  Modify the number of epochs the train function will run according to epochs. The default implementation assumes the key ‘epochs’ in args is the control for changing the number of epochs.
- train(self, model, epochs=None, validation=True, finetuning=False)
  Train the model through train_function with train_args. The flow is:
  1. copy args
  2. if finetuning, modify the args for it; elif epochs is given, modify for the new number of epochs; else the training function takes its default value
  3. modify args for validation
  4. call train_function(model, args)
See below for an example of the interface:
from neutrino.training import ExternalTrainingLoop

class MyTrainingLoop(ExternalTrainingLoop):
    def modify_args_for_finetuning(self, args):
        # example: update scheduler arg for finetuning
        return self._modify_args(args, 'scheduler', 'finetune_scheduler')

    def modify_args_for_validation(self, args, validation):
        # example: translate to 'validate' arg name
        return self._modify_args(args, 'validate', validation)

    def modify_args_for_epochs(self, args, epochs):
        # example: translate to 'train_epochs' arg name
        return self._modify_args(args, 'train_epochs', epochs)

my_train_args = my_argparser.parse_args()
my_loop = MyTrainingLoop(my_train_function, my_train_args)

config = {
    'optimization': 'quantization',
    'task_type': 'classification',
    'external_training_loop': my_loop
}

opt_model = Neutrino.Job(
    config=config,
    ...,
).run()
Wrapping it up together
Here is an example of how the call to the engine would be made with some of those specifications. Please note
that we do not show all the possibilities of the ForwardPass object here. We only use
model_input_pattern (and by default expecting_common_inputs is True).
from deeplite.torch_profiler.torch_data_loader import TorchForwardPass as FP
from neutrino.framework.torch_framework import TorchFramework

framework = TorchFramework()
forward_pass = FP(model_input_pattern=(0, '_', '_'))
eval_func = MyEvalFunc()

config = {
    'deepsearch': args.deepsearch,  # boolean
    'level': args.level,  # int {1, 2}
    'delta': args.delta,  # between 0 to 100
    'device': args.device,  # 'GPU' or 'CPU'
    'use_horovod': args.horovod,  # boolean
    'full_trainer': {
        'optimizer': {'name': 'SGD', 'lr': 0.1},  # optimizer in a dict format
        'scheduler': {'factory': MySchedulerFactory(), 'interval': 'epoch'},  # scheduler in custom factory format
        'epochs': 100,  # int for nb of epochs required
        'eval_freq': 2,  # useful if the evaluation takes a lot of time
        'eval_key': 'mykey',  # str key to take from the dict returned by MyEvalFunc
        'eval_split': 'test',
    }
}

neutrino = Neutrino(framework=framework,
                    data=data_splits,
                    model=reference_model,
                    config=config,
                    forward_pass=forward_pass,
                    eval_func=eval_func,
                    loss_function_cls=MyLoss,
                    loss_function_kwargs=my_loss_config)
optimized_model = neutrino.run()
Warning
Neutrino tries to do model analysis to help improve metric retention while compressing. Doing so requires turning a PyTorch model into a graph using PyTorch's own JIT infrastructure. Therefore, the compression capacity of the engine can be dramatically harmed if the model cannot be turned into a PyTorch graph due to JIT's limitations. One JIT limitation to watch out for is that every torch.nn.Module (i.e. the model and its intermediate layers) should return a Tensor or a list/tuple of Tensors.
Warning
Neutrino needs to keep copies of the model in order to test different variants. Therefore, the standard Python function deepcopy
needs to be able to return a copy of the model without crashing. In PyTorch, a common pitfall that prevents
deepcopy is assigning an arbitrary Tensor (or something that contains a Tensor) as an attribute of the torch.nn.Module.
PyTorch supports deepcopy only for Parameter tensors, not arbitrary ones.
Important
For object detection and segmentation models, the community version displays the results of the optimization process, including all the optimized metric values. To obtain the optimized model produced by Deeplite Neutrino, consider upgrading to the production version. Refer to how to upgrade.
Important
Currently, multi-GPU support is available only in the production version of Deeplite Neutrino. Refer to how to upgrade.