megengine.quantization#

Note

import megengine.quantization as Q

model = ... # the pre-trained float model to be quantized

Q.quantize_qat(model, qconfig=...) # convert to a QAT model with the given qconfig

for _ in range(...):
    train(model) # quantization-aware fine-tuning

Q.quantize(model) # convert the QAT model to a quantized model for inference

For detailed usage, see the Quantization page in the user guide.

Quantization configuration QConfig#

QConfig

A config class indicating how to quantize a QATModule's activations and weights.

Available preset configurations#

min_max_fakequant_qconfig

Preset using MinMaxObserver with FakeQuantize.

ema_fakequant_qconfig

Preset using ExponentialMovingAverageObserver with FakeQuantize.

sync_ema_fakequant_qconfig

Preset using SyncExponentialMovingAverageObserver with FakeQuantize.

ema_lowbit_fakequant_qconfig

Preset using ExponentialMovingAverageObserver with FakeQuantize and the qint4 numeric type.

calibration_qconfig

Preset that applies HistogramObserver to activations for post-training quantization (no FakeQuantize).

tqt_qconfig

Preset using TQT for fake quantization.

passive_qconfig

Preset using PassiveObserver with FakeQuantize.

easyquant_qconfig

QConfig for the EasyQuant algorithm; equivalent to passive_qconfig.
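To make the qint8 vs. qint4 distinction among the presets above concrete, the symmetric scale implied by an observed value range can be computed by hand. This is a plain-Python sketch of the general idea, not MegEngine's implementation:

```python
def symmetric_scale(min_val, max_val, qmax):
    # Symmetric quantization: the scale must cover the larger-magnitude side.
    return max(abs(min_val), abs(max_val)) / qmax

# An activation range as an Observer might record it (illustrative numbers).
lo, hi = -6.0, 6.0

scale_q8 = symmetric_scale(lo, hi, 127)  # signed 8-bit grid, qmax = 127
scale_q4 = symmetric_scale(lo, hi, 7)    # signed 4-bit grid, qmax = 7

# qint4 has far fewer levels, so its grid step is ~18x coarser here.
print(scale_q8, scale_q4)
```

The same observed range therefore costs much more resolution at 4 bits, which is why the low-bit preset is a separate configuration.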

Observer#

Observer

A base class for Observer Module.

MinMaxObserver

An Observer module that records the input tensor's running min and max values to calculate the scale.

SyncMinMaxObserver

A distributed version of MinMaxObserver.

ExponentialMovingAverageObserver

A MinMaxObserver with momentum support for min/max updating.

SyncExponentialMovingAverageObserver

A distributed version of ExponentialMovingAverageObserver.

HistogramObserver

A MinMaxObserver that uses a running histogram of tensor values for min/max updating.

PassiveObserver

An Observer that supports setting scale directly.
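Conceptually, the two basic observer flavors above can be sketched in a few lines of plain Python: a min-max observer keeps running extrema, and the EMA variant smooths updates with a momentum term. This is an illustrative sketch (the 0.9 momentum is an arbitrary choice), not MegEngine's code:

```python
class MinMaxSketch:
    """Tracks the running min/max of everything it observes."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def scale(self, qmax=127):
        # Symmetric scale for a signed 8-bit range.
        return max(abs(self.min_val), abs(self.max_val)) / qmax


class EmaSketch(MinMaxSketch):
    """Smooths min/max updates with an exponential moving average."""
    def __init__(self, momentum=0.9):
        super().__init__()
        self.momentum = momentum

    def observe(self, values):
        lo, hi = min(values), max(values)
        if self.min_val == float("inf"):  # first batch: take values as-is
            self.min_val, self.max_val = lo, hi
        else:
            m = self.momentum
            self.min_val = m * self.min_val + (1 - m) * lo
            self.max_val = m * self.max_val + (1 - m) * hi


obs = MinMaxSketch()
obs.observe([-2.0, 0.5, 3.0])
obs.observe([-1.0, 4.0])
print(obs.min_val, obs.max_val)  # -2.0 4.0
```

The EMA update makes the recorded range less sensitive to a single outlier batch, which is the motivation for the ema_* presets.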

FakeQuantize#

FakeQuantize

A module that performs quant and dequant according to the observer's scale and zero_point.

TQT

TQT: Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks. https://arxiv.org/abs/1903.08066

LSQ

LSQ: Learned Step Size Quantization, estimating and scaling the task loss gradient at each weight and activation layer's quantizer step size. https://arxiv.org/pdf/1902.08153.pdf
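The quant-dequant round trip these modules perform can be sketched in plain Python (symmetric qint8 with zero_point fixed at 0 for simplicity). This deliberately omits what makes the real modules interesting, namely gradient handling such as the straight-through estimator and the learned thresholds/step sizes of TQT and LSQ:

```python
def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    # Quantize: map to the integer grid and clamp to the dtype range.
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    # Dequantize: map back to float, now snapped to the grid.
    return (q - zero_point) * scale

scale = 0.05
print(fake_quantize(0.123, scale))  # -> 0.1, rounded to the nearest multiple of 0.05
print(fake_quantize(100.0, scale))  # clamped: qmax * scale, about 6.35
```

Running the float model with this transform inserted lets training see (and adapt to) the rounding and clamping error that real integer inference will introduce.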

Quantization operations#

quantize_qat

Recursively convert a float Module to a QATModule through apply and set its qconfig accordingly.

quantize

Recursively convert a QATModule to a QuantizedModule through apply.

apply_easy_quant

Implementation of EasyQuant: https://arxiv.org/pdf/2006.16669.
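The core idea of EasyQuant is to search for quantization scales that maximize the cosine similarity between float and quantized outputs. Below is a toy single-tensor sketch of that search; the paper alternates this optimization across layers for both activations and weights, and every name and constant here is illustrative, not MegEngine's API:

```python
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def fake_quantize(x, scale, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale))) * scale

def search_scale(xs, base_scale, num=20):
    """Grid-search a scale around base_scale that maximizes the cosine
    similarity between the float values and their quant-dequant images."""
    best_scale, best_sim = base_scale, -1.0
    for i in range(num):
        s = base_scale * (0.5 + i / num)  # candidates in [0.5, 1.45] * base
        q = [fake_quantize(x, s) for x in xs]
        sim = cosine_similarity(xs, q)
        if sim > best_sim:
            best_scale, best_sim = s, sim
    return best_scale

xs = [0.9, -1.4, 2.1, 0.3, -0.7]
s0 = max(abs(v) for v in xs) / 127  # naive min-max scale as the starting point
s = search_scale(xs, s0)
```

Because the candidate grid includes the starting scale itself, the searched scale can only match or improve the similarity of the naive min-max choice.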

enable_fake_quant

Recursively enable fake quantization for modules in a QATModule through apply.

disable_fake_quant

Recursively disable fake quantization for modules in a QATModule through apply.

enable_observer

Recursively enable observers for modules in a QATModule through apply.

disable_observer

Recursively disable observers for modules in a QATModule through apply.

propagate_qconfig

Recursively set module's qconfig through apply.

reset_qconfig

Reset _FakeQuantize and Observer according to qconfig.

Utils#

QParams

Standardizes the qparams format of FakeQuant, Observer, and Tensor.

QuantMode

Quantization mode enumeration class.

create_qparams

Create a QParams object (the mode parameter is of type QuantMode).

fake_quant_bias

Apply fake quantization to the bias, using a scale derived from the input tensor and weight tensor; the quantized dtype is set to qint32.

fake_quant_tensor

Apply fake quantization to the inp tensor.
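Numerically, the bias convention described above looks like the following hedged sketch, assuming a single shared weight scale (MegEngine also handles per-channel weight scales); the function name is illustrative:

```python
def fake_quant_bias_sketch(bias, act_scale, weight_scale):
    # The bias is added to the (input x weight) accumulator, so it shares
    # that accumulator's scale, and the wide qint32 range avoids clamping.
    bias_scale = act_scale * weight_scale
    qmin, qmax = -(2 ** 31), 2 ** 31 - 1  # qint32 range
    q = max(qmin, min(qmax, round(bias / bias_scale)))
    return q * bias_scale

act_scale, weight_scale = 0.1, 0.02
print(fake_quant_bias_sketch(0.5, act_scale, weight_scale))  # ~0.5, snapped to the qint32 grid
```

With 32 bits available, the grid is so fine that the bias round trip is nearly lossless, unlike the qint8 activations and weights.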