megengine.quantization#

Note

import megengine.quantization as Q

model = ... # the pre-trained float model to be quantized

Q.quantize_qat(model, qconfig=...) # convert to a QAT model with the given qconfig

for _ in range(...):
    train(model) # quantization-aware fine-tuning

Q.quantize(model) # convert the QAT model to a quantized model for inference

For detailed usage, see the Quantization page in the user guide.

Quantization configuration QConfig#

QConfig

A config class indicating how to quantize a QATModule's activations and weights.

Available preset configurations#

min_max_fakequant_qconfig

Preset using MinMaxObserver with FakeQuantize.

ema_fakequant_qconfig

Preset using ExponentialMovingAverageObserver with FakeQuantize.

sync_ema_fakequant_qconfig

Preset using SyncExponentialMovingAverageObserver with FakeQuantize.

ema_lowbit_fakequant_qconfig

Preset using ExponentialMovingAverageObserver with FakeQuantize and the qint4 numeric type.

calibration_qconfig

Preset that applies HistogramObserver to activations for post-training quantization (no FakeQuantize).

tqt_qconfig

Preset using TQT for fake quantization.

passive_qconfig

Preset using PassiveObserver with FakeQuantize.

easyquant_qconfig

QConfig for the EasyQuant algorithm; equivalent to passive_qconfig.
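To make the qint8 vs. qint4 distinction among the presets above concrete, the symmetric scale implied by an observed value range can be computed by hand. This is a plain-Python sketch of the general idea, not MegEngine's implementation:

```python
def symmetric_scale(min_val, max_val, qmax):
    # Symmetric quantization: the scale must cover the larger-magnitude side.
    return max(abs(min_val), abs(max_val)) / qmax

# An activation range as an Observer might record it (illustrative numbers).
lo, hi = -6.0, 6.0

scale_q8 = symmetric_scale(lo, hi, 127)  # signed 8-bit grid, qmax = 127
scale_q4 = symmetric_scale(lo, hi, 7)    # signed 4-bit grid, qmax = 7

# qint4 has far fewer levels, so its grid step is ~18x coarser here.
print(scale_q8, scale_q4)
```

The same observed range therefore costs much more resolution at 4 bits, which is why the low-bit preset is a separate configuration.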

Observer#

Observer

A base class for Observer Module.

MinMaxObserver

An Observer module that records the input tensor's running min and max values to calculate the scale.

SyncMinMaxObserver

A distributed version of MinMaxObserver.

ExponentialMovingAverageObserver

A MinMaxObserver with momentum support for min/max updating.

SyncExponentialMovingAverageObserver

A distributed version of ExponentialMovingAverageObserver.

HistogramObserver

A MinMaxObserver that uses a running histogram of tensor values for min/max updating.

PassiveObserver

An Observer that supports setting scale directly.
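Conceptually, the two basic observer flavors above can be sketched in a few lines of plain Python: a min-max observer keeps running extrema, and the EMA variant smooths updates with a momentum term. This is an illustrative sketch (the 0.9 momentum is an arbitrary choice), not MegEngine's code:

```python
class MinMaxSketch:
    """Tracks the running min/max of everything it observes."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def scale(self, qmax=127):
        # Symmetric scale for a signed 8-bit range.
        return max(abs(self.min_val), abs(self.max_val)) / qmax


class EmaSketch(MinMaxSketch):
    """Smooths min/max updates with an exponential moving average."""
    def __init__(self, momentum=0.9):
        super().__init__()
        self.momentum = momentum

    def observe(self, values):
        lo, hi = min(values), max(values)
        if self.min_val == float("inf"):  # first batch: take values as-is
            self.min_val, self.max_val = lo, hi
        else:
            m = self.momentum
            self.min_val = m * self.min_val + (1 - m) * lo
            self.max_val = m * self.max_val + (1 - m) * hi


obs = MinMaxSketch()
obs.observe([-2.0, 0.5, 3.0])
obs.observe([-1.0, 4.0])
print(obs.min_val, obs.max_val)  # -2.0 4.0
```

The EMA update makes the recorded range less sensitive to a single outlier batch, which is the motivation for the ema_* presets.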

FakeQuantize#

FakeQuantize

A module that performs quant and dequant according to the observer's scale and zero_point.

TQT

TQT: Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks. https://arxiv.org/abs/1903.08066

LSQ

LSQ: Learned Step Size Quantization, estimating and scaling the task loss gradient at each weight and activation layer's quantizer step size. https://arxiv.org/pdf/1902.08153.pdf
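The quant-dequant round trip these modules perform can be sketched in plain Python (symmetric qint8 with zero_point fixed at 0 for simplicity). This deliberately omits what makes the real modules interesting, namely gradient handling such as the straight-through estimator and the learned thresholds/step sizes of TQT and LSQ:

```python
def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    # Quantize: map to the integer grid and clamp to the dtype range.
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    # Dequantize: map back to float, now snapped to the grid.
    return (q - zero_point) * scale

scale = 0.05
print(fake_quantize(0.123, scale))  # -> 0.1, rounded to the nearest multiple of 0.05
print(fake_quantize(100.0, scale))  # clamped: qmax * scale, about 6.35
```

Running the float model with this transform inserted lets training see (and adapt to) the rounding and clamping error that real integer inference will introduce.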

Quantization operations#

quantize_qat

Recursively convert a float Module to a QATModule through apply and set its qconfig accordingly.

quantize

Recursively convert a QATModule to a QuantizedModule through apply.

apply_easy_quant

Implementation of EasyQuant: https://arxiv.org/pdf/2006.16669.
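The core idea of EasyQuant is to search for quantization scales that maximize the cosine similarity between float and quantized outputs. Below is a toy single-tensor sketch of that search; the paper alternates this optimization across layers for both activations and weights, and every name and constant here is illustrative, not MegEngine's API:

```python
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def fake_quantize(x, scale, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale))) * scale

def search_scale(xs, base_scale, num=20):
    """Grid-search a scale around base_scale that maximizes the cosine
    similarity between the float values and their quant-dequant images."""
    best_scale, best_sim = base_scale, -1.0
    for i in range(num):
        s = base_scale * (0.5 + i / num)  # candidates in [0.5, 1.45] * base
        q = [fake_quantize(x, s) for x in xs]
        sim = cosine_similarity(xs, q)
        if sim > best_sim:
            best_scale, best_sim = s, sim
    return best_scale

xs = [0.9, -1.4, 2.1, 0.3, -0.7]
s0 = max(abs(v) for v in xs) / 127  # naive min-max scale as the starting point
s = search_scale(xs, s0)
```

Because the candidate grid includes the starting scale itself, the searched scale can only match or improve the similarity of the naive min-max choice.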

enable_fake_quant

Recursively enable fake quantization for modules in a QATModule through apply.

disable_fake_quant

Recursively disable fake quantization for modules in a QATModule through apply.

enable_observer

Recursively enable observers for modules in a QATModule through apply.

disable_observer

Recursively disable observers for modules in a QATModule through apply.

propagate_qconfig

Recursively set module's qconfig through apply.

reset_qconfig

Reset _FakeQuantize and Observer according to qconfig.

Utils#

QParams

Standardizes the qparams format of FakeQuant, Observer, and Tensor.

QuantMode

Quantization mode enumeration class.

create_qparams

Create a QParams object (the mode parameter is of type QuantMode).

fake_quant_bias

Apply fake quantization to the bias, using a scale derived from the input tensor and weight tensor; the quantized dtype is set to qint32.

fake_quant_tensor

Apply fake quantization to the inp tensor.
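Numerically, the bias convention described above looks like the following hedged sketch, assuming a single shared weight scale (MegEngine also handles per-channel weight scales); the function name is illustrative:

```python
def fake_quant_bias_sketch(bias, act_scale, weight_scale):
    # The bias is added to the (input x weight) accumulator, so it shares
    # that accumulator's scale, and the wide qint32 range avoids clamping.
    bias_scale = act_scale * weight_scale
    qmin, qmax = -(2 ** 31), 2 ** 31 - 1  # qint32 range
    q = max(qmin, min(qmax, round(bias / bias_scale)))
    return q * bias_scale

act_scale, weight_scale = 0.1, 0.02
print(fake_quant_bias_sketch(0.5, act_scale, weight_scale))  # ~0.5, snapped to the qint32 grid
```

With 32 bits available, the grid is so fine that the bias round trip is nearly lossless, unlike the qint8 activations and weights.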