trace#

class trace(function, symbolic=False, capture_as_const=False, record_only=False, sublinear_memory_config=None, dtr_config=None, profiling=False, opt_level=2, graph_opt_config=None, symbolic_shape=True)[source]#

Wraps a callable and provides:

  • tracing via trace and dump

  • accelerated evaluation via __call__

Parameters:
  • function – the function to be traced.

  • symbolic – whether to apply symbolic execution for tracing. Default: False

  • capture_as_const – capture global vars or closures as const value. Default: False

  • record_only – if True, the function will only be recorded, not actually run, when called. Default: False

  • sublinear_memory_config (Optional[SublinearMemoryConfig]) – configuration for sublinear memory optimization. If not None, it enables sublinear memory optimization with the given setting.

  • dtr_config (Optional[DTRConfig]) – configuration for DTR memory optimization. If not None, it enables DTR optimization with the given setting.

  • profiling (bool) – whether to profile compiled trace. Default: False

  • opt_level (int) – optimization level for compiling trace. Default: 2

  • graph_opt_config (Optional[GraphOptimizationConfig]) – configuration for graph optimization. Default: None

  • symbolic_shape (bool) – whether to use symbolic shape for tracing. Default: True
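
A minimal usage sketch (hedged; it assumes the standard megengine.jit.trace import path and a working MegEngine installation):

    import numpy as np
    import megengine as mge
    import megengine.functional as F
    from megengine.jit import trace

    @trace(symbolic=True, capture_as_const=True)
    def fwd(data):
        return F.exp(data)

    x = mge.tensor(np.random.randn(1, 3).astype("float32"))
    y = fwd(x)  # the first call traces and compiles; later calls reuse the compiled graph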

dump(file, *, arg_names=None, output_names=None, append=False, keep_var_name=1, keep_opr_name=False, keep_param_name=False, keep_opr_priority=False, no_change_graph=False, strip_info_file=None, append_json=False, optimize_for_inference=True, user_info=None, enable_metadata=True, input_data=None, repeat=1, silent=False, no_assert=False, maxerr=1e-4, resize_input=False, input_transform=None, dump_format=None, model_version=2, **kwargs)[source]#

Serializes trace to file system.

Parameters:
  • file – output file; can be a file object or a filename.

  • arg_names – names of the input tensors in the traced function.

  • output_names – names of the output tensors in the traced function, use the default name if not specified.

  • append – whether output is appended to file. Only works when file is str.

  • keep_var_name (int) –

    level for keeping variable names:

    • 0: none of the names are kept

    • 1: (default) keep names of output vars

    • 2: keep names of all (output and internal) vars

  • keep_opr_name (bool) – whether to keep operator names.

  • keep_param_name (bool) – whether to keep param names, so that param values can be easily manipulated after loading the model.

  • keep_opr_priority (bool) – whether to keep the priority setting for operators.

  • no_change_graph (bool) –

    whether to keep the compute graph unchanged when dumping; for model compatibility, some operators are converted to a compatible format in this version.

    • if set False, some operators may be converted to other operators for compatibility; all operators are guaranteed to remain compatible.

    • if set True, no operator in the graph is changed when dumping.

  • strip_info_file – a path string or a file handler. If not None, the dump information for code strip will be written to strip_info_file.

  • append_json – only checked when strip_info_file is not None. If set True, the information for code strip is appended to strip_info_file; if set False, strip_info_file is rewritten.

  • optimize_for_inference – enable optimizations; all optimize options are skipped if this is False. Default: True

  • user_info (Optional[Any]) – any type object, which will be pickled to bytes.

  • enable_metadata (bool) – whether to save metadata into output file.

  • input_data – input test data; the current network output will be used as the groundtruth. The format is “var0:file0;var1:file1…” to specify data files for input vars. It can also be “#rand(min,max,shape…)” for generating random input data, for example, “#rand(0,255)”, “#rand(0,255,1,3,224,224)” or “#rand(0, 255, 1, …)”, where “…” means the remaining part of the original shape. If the shape is not specified, the shape of the corresponding input tensor in the network will be used. If there is only one input var, its name can be omitted. Each data file can either be an image that can be loaded by opencv, or a pickled numpy.ndarray. This option can be given multiple times to add multiple testcases. If you start the data with the letter @, the rest should be a filename, and each line in the file should be a single datum in the format described above. NOTE: if input_data is not None, the output file can only be run with load-and-run.

  • repeat – how many times the input image is repeated. Useful when benchmarking batch sizes other than one. Has no effect on randomly generated input data.

  • silent – whether to set verbose to False in the assert_equal opr.

  • no_assert – whether to skip inserting the assert_equal opr that checks results; this option is useful for benchmarking.

  • maxerr – max error for assert_equal check during runtime.

  • resize_input – whether to resize the input image to fit the input var shape.

  • input_transform – a Python expression to transform the input data. Example: data / np.std(data)

  • dump_format (Optional[str]) – the dump format to use. Open-source MegEngine defaults to the FBS_V2 format; two formats, FBS_V2 and FBS, are available. Internal MegEngine additionally offers internal proprietary formats.

  • model_version (int) – the model version of FBS_V2, starting from version 2. This takes effect only when the dump format is FBS_V2.
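
A hedged sketch of a typical dump call, continuing the fwd example above; it assumes the function was traced with capture_as_const=True, and the file and tensor names here are illustrative:

    fwd(x)  # run at least once so the compiled graph exists before dumping
    fwd.dump(
        "model.mge",           # output file: a path or a file object
        arg_names=["data"],    # name for the single input var
        output_names=["out"],  # name for the single output var
        keep_var_name=2,       # keep names of all vars, not only outputs
    )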

Keyword Arguments:

  • enable_io16xc32 – whether to use float16 for I/O between oprs and use float32 as internal computation precision. Note the output var would be changed to float16.

  • enable_ioc16 – whether to use float16 for both I/O and computation precision.

  • enable_hwcd4 – whether to use the NHWCD4 data layout. This is faster on some OpenCL backends.

  • enable_nchw88 – whether to use the NCHW88 data layout, currently used in the x86 AVX backend.

  • enable_nchw44 – whether to use the NCHW44 data layout, currently used in the ARM backend.

  • enable_nchw44_dot – whether to use the NCHW44_dot data layout, currently used in the ARMv8.2+dotprod backend.

  • enable_nchw4 – whether to use the NCHW4 data layout, currently used in the NVIDIA backend (based on cuDNN).

  • enable_nchw32 – whether to use the NCHW32 data layout, currently used in the NVIDIA backend with TensorCore (based on cuDNN).

  • enable_chwn4 – whether to use the CHWN4 data layout, currently used in the NVIDIA backend with TensorCore.

  • enable_nchw64 – whether to use the NCHW64 data layout, used for fast int4 support on NVIDIA GPUs.

  • enable_fuse_conv_bias_nonlinearity – whether to fuse conv+bias+nonlinearity into one opr.

  • enable_fuse_conv_bias_with_z – whether to fuse conv_bias with the z input for inference on the NVIDIA backend (this optimization pass will cause the output precision of training and inference to mismatch).

  • enable_fuse_preprocess – whether to fuse preprocessing oprs such as astype, pad_channel and dimshuffle.
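
A hedged sketch combining several of the keyword arguments above (which layout or fusion flag helps depends on the target backend; the values here are illustrative only):

    # dump an inference-optimized model: float16 I/O with float32 computation,
    # plus conv+bias+nonlinearity fusion
    fwd.dump(
        "model_opt.mge",
        arg_names=["data"],
        output_names=["out"],
        optimize_for_inference=True,  # must stay True for the flags below to apply
        enable_io16xc32=True,
        enable_fuse_conv_bias_nonlinearity=True,
    )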