class DataLoader(dataset, sampler=None, transform=None, collator=None, num_workers=0, timeout=0, preload=False, parallel_stream=False)

Provides a convenient way to iterate over a given dataset. The process is as follows:

flowchart LR
    Dataset.__len__ -- Sampler --> Indices
    batch_size -- Sampler --> Indices
    Indices -- Dataset.__getitem__ --> Samples
    Samples -- Transform + Collator --> mini-batch

DataLoader combines a Dataset with a Sampler, a Transform, and a Collator, which makes it flexible to draw mini-batches continually from a dataset. See "Use Data to build the input Pipeline" for more details.
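The flow above can be sketched in plain Python without any framework. This is a minimal illustration, not the library's implementation: `ToyDataset`, `sequential_sampler`, `scale_batch`, `collate`, and `iterate_minibatches` are hypothetical names invented for this example, and the sequential batching strategy is an assumption chosen for simplicity.

```python
from typing import Callable, Iterator, List, Tuple

Sample = Tuple[float, int]  # hypothetical (feature, label) pair


class ToyDataset:
    """Hypothetical map-style dataset exposing __len__ and __getitem__."""

    def __init__(self, size: int) -> None:
        self._items: List[Sample] = [(float(i), i % 2) for i in range(size)]

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Sample:
        return self._items[index]


def sequential_sampler(length: int, batch_size: int) -> Iterator[List[int]]:
    """Sampler step: turn the dataset length and batch_size into index batches."""
    for start in range(0, length, batch_size):
        yield list(range(start, min(start + batch_size, length)))


def scale_batch(samples: List[Sample]) -> List[Sample]:
    """Transform step: applied to a sampled batch (here, doubles each feature)."""
    return [(feature * 2.0, label) for feature, label in samples]


def collate(samples: List[Sample]) -> Tuple[List[float], List[int]]:
    """Collator step: merge a transformed batch into one mini-batch."""
    return [f for f, _ in samples], [label for _, label in samples]


def iterate_minibatches(
    dataset: ToyDataset,
    batch_size: int,
    transform: Callable[[List[Sample]], List[Sample]],
    collator: Callable[[List[Sample]], Tuple[List[float], List[int]]],
) -> Iterator[Tuple[List[float], List[int]]]:
    for indices in sequential_sampler(len(dataset), batch_size):
        samples = [dataset[i] for i in indices]  # Dataset.__getitem__
        yield collator(transform(samples))       # Transform + Collator


batches = list(iterate_minibatches(ToyDataset(5), 2, scale_batch, collate))
```

Each stage is a separate callable, so a different sampling, transforming, or merging strategy can be swapped in without touching the others, which is the flexibility the description refers to.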

  • dataset (Dataset) – the dataset from which to load mini-batches.

  • sampler (Optional[Sampler]) – defines the strategy to sample data from the dataset. If None, it will sequentially sample from the dataset one by one.

  • transform (Optional[Transform]) – defines the transforming strategy for a sampled batch.

  • collator (Optional[Collator]) – defines the merging strategy for a transformed batch.

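  • num_workers (int) – the number of subprocesses used to load, transform, and collate batches. 0 means a single process is used. Default: 0

  • timeout (int) – if positive, the timeout value (in seconds) for collecting a batch from the workers. Default: 0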

  • preload (bool) – whether to enable the preloading strategy of the dataloader. When enabled, the dataloader preloads one batch into device memory to speed up the whole training process.

  • parallel_stream (bool) – whether to split the workload across all workers when the dataset is a StreamDataset and num_workers > 0. When enabled, each worker collects data from a different dataset in order to speed up the whole loading process. See ref:streamdataset-example for more details.

The effect of enabling preload

  • All elements in map, list, and tuple will be converted to Tensor by preloading, so you will get a Tensor instead of the original NumPy array or Python built-in data structure.

  • Host-to-device tensor copies and device kernel execution are overlapped, which improves training speed at the cost of higher device memory usage (one extra batch resides in device memory). This feature saves the most time when each training step is short or when the host PCIe bandwidth available to each device is low.
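The overlap described above follows a general "stage one batch ahead" pattern. The sketch below is a host-side analogy only, written with the Python standard library: it does not touch device memory and is not how the dataloader implements preload. `prefetch_one` is a hypothetical helper invented for this illustration, and the background thread stands in for the host-to-device copy.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_SENTINEL = object()  # marks the end of the stream


def prefetch_one(batches: Iterable[T]) -> Iterator[T]:
    """Yield batches in order while a background thread stages the next one."""
    # maxsize=1: at most one staged batch waits in the buffer, which mirrors
    # the "one more batch in memory" cost mentioned above.
    buffer: "queue.Queue[object]" = queue.Queue(maxsize=1)

    def producer() -> None:
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is _SENTINEL:
            return
        # While the consumer works on this batch, the producer can already
        # stage the following one.
        yield item  # type: ignore[misc]


result = list(prefetch_one([[1, 2], [3, 4], [5]]))
```

The benefit of this pattern grows when the consumer's work per batch is short relative to the staging cost, which matches the note above about short training steps and low PCIe bandwidth.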