Model Deployment Overview and Process Recommendations

After using MegEngine to complete the model training process, in order for the model to realize its value, we need to “deploy” the model, that is, use the model for inference under the constraints of specific hardware devices and system environments.

Depending on the target device, the deployment path differs:

| Computing hardware | Example | Applicable scenario |
| --- | --- | --- |
| Devices with a Python environment | GPU servers | You want deployment to be as simple as possible and do not mind Python's performance limitations |
| Devices with a C/C++ environment | Any device, especially embedded chips, TEE environments, etc. | You want the highest possible performance and low resource usage, and can accept the complexity of compiling C++ libraries |
| NPU | Atlas / RockChip / Cambricon and other chips | You need the computing power of an NPU and can accept slightly more complicated conversion steps |

The following flow chart shows the basic steps of each deployment path:

```mermaid
graph LR
    training_code[training code] ==> |tm.trace_module| tm_file[.tm file]
    training_code -.-> |dump| mge_file
    tm_file ==> |dump| mge_file[.mge file]
    mge_file ==> |load| litepy[Lite Python runtime]
    mge_file ==> |load| lite[Lite C++ runtime]
    tm_file -- mge_convert --> otherformat[other formats: ONNX/TFLite/Caffe] -- NPU vendor converter --> NPU
    tm_file -- mge_convert built-in NPU converter --> NPU
```
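As a rough sketch of the first two steps of the recommended route (training code -> .tm file -> .mge file), the example below traces a model into a TracedModule, saves it as a .tm file, then dumps a static graph as a .mge file. The toy `Net` module, the `infer` function name, the tensor name "data", and the input shape are illustrative assumptions, and argument details of `jit.trace` / `dump` may vary across MegEngine versions:

```python
import numpy as np
import megengine as mge
import megengine.functional as F
import megengine.module as M
import megengine.traced_module as tm
from megengine import jit

# A toy model standing in for your real training code (hypothetical).
class Net(M.Module):
    def __init__(self):
        super().__init__()
        self.conv = M.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

net = Net()
net.eval()
data = mge.tensor(np.random.random((1, 3, 224, 224)).astype("float32"))

# Step 1: training code -> .tm file (the deliverable to archive).
traced_net = tm.trace_module(net, data)
mge.save(traced_net, "model.tm")

# Step 2: .tm file -> .mge file (dump a static graph for Lite).
@jit.trace(symbolic=True, capture_as_const=True)
def infer(x, *, model):
    return model(x)

infer(data, model=traced_net)
infer.dump("model.mge", arg_names=["data"])
```

The same `jit.trace` + `dump` pair is also what the dashed shortcut in the flow chart stands for: applied directly to the training code, it produces the .mge file without the intermediate .tm archive.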

Note

To choose the right deployment path, keep the following points in mind:

  • The most recommended route is training code -> .tm file -> .mge file -> Lite execution (see the loading sketch after this list);

  • If your team divides work between researchers and engineers, we recommend using the .tm file as the interface: researchers deliver the .tm model (permanently archived), and engineers handle the subsequent deployment steps;

  • If you are solely responsible for the complete training-to-deployment process and do not need to archive the model long-term, you can, for convenience, generate the .mge file directly from the training code (i.e. the dashed line in the flow chart above); the result is equivalent.
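To illustrate the last step of the recommended route, loading a dumped .mge file with the Lite Python runtime might look like the sketch below. The tensor name "data" and the input shape are assumptions carried over from the dump example above, and the exact megenginelite API surface may differ between versions:

```python
import numpy as np
from megenginelite import LiteNetwork

# Load the dumped .mge model into a Lite network.
network = LiteNetwork()
network.load("model.mge")

# Fill the input tensor; "data" matches arg_names used at dump time (assumed).
inp = network.get_io_tensor("data")
inp.set_data_by_copy(np.random.random((1, 3, 224, 224)).astype("float32"))

# Run inference and read back the result.
network.forward()
network.wait()
out = network.get_io_tensor(network.get_output_name(0))
print(out.to_numpy().shape)
```

The Lite C++ runtime follows the same load / fill input / forward / read output pattern, at the cost of compiling the C++ libraries.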