Convert dynamic graphs to static graphs (Trace)#

Note

Dynamic graphs are recommended during the typical model development stage, since they are easier to debug.

Use the trace decorator#

MegEngine provides a very convenient way to convert a dynamic graph into a static graph, requiring almost no code changes.

Suppose we have already written dynamic graph code whose training portion looks like this:

import megengine as mge
import megengine.functional as F

# model, optimizer, gm (a GradManager) and dataloader are assumed to have
# been created earlier.
for epoch in range(total_epochs):
    total_loss = 0
    for step, (batch_data, batch_label) in enumerate(dataloader):
        data = mge.tensor(batch_data)
        label = mge.tensor(batch_label)

        with gm:  # record the forward pass so gradients can be computed
            logits = model(data)
            loss = F.loss.cross_entropy(logits, label)
            gm.backward(loss)
            optimizer.step().clear_grad()

        total_loss += loss.numpy().item()
    print("epoch: {}, loss {}".format(epoch, total_loss / len(dataloader)))

We can convert the dynamic graph above into a static graph in three steps:

  1. Extract the forward computation, backpropagation, and parameter optimization code inside the loop into a separate function, such as train_func() in the example below;

  2. Pass the inputs required by the network (such as data and labels) as parameters of this training function, and return whatever results you need (such as the network output or the loss value);

  3. Decorate the function with the trace decorator from the jit module, which turns the code inside it into static graph code.

The modified code is as follows:

from megengine.jit import trace

@trace
def train_func(data, label, *, opt, gm, net):
    with gm:
        logits = net(data)
        loss = F.loss.cross_entropy(logits, label)
        gm.backward(loss)
        opt.step().clear_grad()
    return loss

for epoch in range(total_epochs):
    total_loss = 0
    for step, (batch_data, batch_label) in enumerate(dataloader):
        data = mge.tensor(batch_data)
        label = mge.tensor(batch_label)

        loss = train_func(data, label, opt=optimizer, gm=gm, net=model)
        total_loss += loss.numpy().item()
    print("epoch: {}, loss {}".format(epoch, total_loss/len(dataloader)))

Some further explanation of the code above:

  • jit : short for Just-In-Time compilation; here it serves as the name of the module that hosts all static graph functionality.

  • trace : a way of obtaining a static graph, literally “tracing”: the complete computation graph is obtained by tracing back through the network structure that the outputs (such as the loss value or predictions) depend on, and is then compiled.

  • Parameter list : when compiling a static graph, trace handles positional and keyword parameters differently. Positional parameters pass in the inputs of the network, such as data and labels, while keyword parameters pass in other variables, such as the network and the optimizer.

Advanced trace settings#

Specify the static graph construction method#

When compiling a static graph, MegEngine supports two modes, “dynamic construction” and “static construction” (the former is used by default).

In most cases the static graphs constructed in the two modes are the same, and there is no difference in how they are used.

We can choose the construction method with the symbolic parameter; sample code is as follows:

from megengine.jit import trace

@trace(symbolic=True)
def train_func(data, label, *, opt, gm, net):
    pass

Setting it to True means “static construction”, i.e. “construction based on symbols”:

  • Principle: all data nodes (i.e. tensors) in the computation graph are treated as symbols: they serve only as placeholders, without any actual memory allocation or actual values. The compilation of the computation graph then depends entirely on the structure of the graph, not on the concrete values of the tensors, which makes it truly “static”.

  • Advantage: consistently efficient, and able to take full advantage of the memory optimizations of static graphs.

  • Disadvantage: if the network contains conditional statements whose evaluation requires runtime information, compilation will fail (see the sketch after this list).
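
As a minimal sketch of this failure mode (the function f below is hypothetical): under symbolic=True a tensor is only a placeholder, so a Python-level branch that needs its concrete value cannot be evaluated at trace time:

from megengine import jit, tensor

@jit.trace(symbolic=True)
def f(x):
    # x has no concrete value during symbolic tracing, so converting it
    # to a Python number in order to branch on it fails at compile time
    if float(x.sum()) > 0:
        return x + 1
    return x - 1

f(tensor([1.0]))  # raises an error under symbolic=True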

Setting it to False means “dynamic construction”, i.e. “construction based on values”:

  • Principle: the first time the decorated function is called, the computation is actually performed on the input data in order to build a dynamic graph. That dynamic graph is then compiled into a static graph, and every subsequent call to the function runs this static graph, regardless of the values passed in later. This mode can be thought of as “dynamic construction on the first run, static execution afterwards”.

  • Advantage: different static graphs can be constructed depending on the information available during the first run.

  • Disadvantage: since the first run executes in dynamic graph mode, it cannot use the memory optimizations of static graphs and usually consumes more memory. A network that would fit in memory in static graph mode may therefore fail with an out-of-memory error on its first run.

Warning

In dynamic construction mode (symbolic=False), if a conditional statement appears inside a loop, the static graph constructed during the first iteration of the loop is fixed and no longer changes, even if the result of the conditional statement changes in later iterations.
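
A minimal sketch of this pitfall (the function step below is hypothetical): with symbolic=False, whichever branch is taken on the first call is baked into the compiled graph, and later calls keep executing it:

from megengine import jit, tensor

@jit.trace(symbolic=False)
def step(x):
    # this condition is evaluated with a concrete value only on the
    # first call; afterwards the compiled graph is replayed as-is
    if float(x.sum()) > 0:
        return x + 1
    return x - 1

print(step(tensor([1.0])))   # first call takes the "+1" branch
print(step(tensor([-1.0])))  # the frozen graph still applies "+1"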

Fix parameters for export#

Sometimes we want to freeze some parameters as constants (such as the convolution kernels of convolutional layers), which requires specifying capture_as_const=True:

from megengine.jit import trace

@trace(capture_as_const=True)
def train_func(data, label, *, opt, gm, net):
    pass

Note

If you want to use jit.trace.dump to export a serialized model file for subsequent processing, the parameters must be captured as constants when applying trace.
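
For reference, a minimal export sketch under these constraints (sample stands for an example input of the right shape; the file name model.mge and the arg_names value are illustrative):

from megengine import jit, tensor

@jit.trace(capture_as_const=True)
def infer_func(data, *, net):
    return net(data)

net.eval()                                  # switch the network to inference mode
out = infer_func(tensor(sample), net=net)   # run once so the graph gets traced
infer_func.dump("model.mge", arg_names=["data"])  # serialize graph and weights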

Reduce memory access operations to achieve acceleration#

Usually a model contains not only compute-bound operations but also some memory-bound ones (such as Elemwise operations). MegEngine has a built-in Codegen optimization mechanism that can fuse multiple operations in the model at runtime and generate code for the target machine, reducing memory accesses and thereby speeding up execution.

Note

MegEngine's Codegen currently integrates three backends: NVRTC, HALIDE, and MLIR. NVRTC and HALIDE can only be used on GPUs, while MLIR supports both GPUs and CPUs. The backends differ in their code generation strategies, so their runtime efficiency also differs. We can switch the Codegen backend by setting the MGB_JIT_BACKEND environment variable, for example:

export MGB_JIT_BACKEND="NVRTC"

In an NVIDIA GPU environment, the possible values of this variable are NVRTC, HALIDE, and MLIR, with HALIDE being the default.

On CPU, only the MLIR backend is currently supported.
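
If you launch training from a Python script, the same variable can also be set from within Python (this assumes, as with the shell export above, that the variable is read when the traced function is compiled, so it must be set early):

import os

# equivalent to `export MGB_JIT_BACKEND="NVRTC"`; must run before the
# first compilation of a traced function
os.environ["MGB_JIT_BACKEND"] = "NVRTC"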

Warning

If you want to use the MLIR backend, you need to build MegEngine yourself. Use the following command when building with CMake:

cmake .. -DMGE_WITH_JIT=ON -DMGE_WITH_JIT_MLIR=ON -DMGE_WITH_HALIDE=OFF

Then set the following environment variables:

export MGB_JIT_BACKEND="MLIR"

Exclude specified code from tracing#

Code placed inside exclude_from_trace will not be traced, and it is allowed to access Tensors in the static region.

Sample code:

from megengine import jit, tensor

@jit.trace
def f(x):
    x += 1
    with jit.exclude_from_trace():  # the if statement below is not traced
        if i % 2 == 0:
            x += 1
    return x

for i in range(3):
    x = tensor([1])
    print(f(x))

The output is as follows (the excluded if statement is re-executed on every call, so x receives the extra increment only when i is even):

Tensor([3], dtype=int32, device=xpux:0)
Tensor([2], dtype=int32, device=xpux:0)
Tensor([3], dtype=int32, device=xpux:0)