This object implements a dataset using the torch::dataset()
framework.
It loads all the images, as well as the labels, into torch tensors.
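The sketch below defines a toy generator with `torch::dataset()` to illustrate this in-memory pattern. It is not the package's implementation. `toy_dataset`, the random image tensor, the zero-filled label tensor, and their shapes are all hypothetical stand-ins. They only show how `initialize()`, `.getitem()` and `.length()` fit together.

```r
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  # Hypothetical generator: keeps every tensor in memory, loaded once
  toy_dataset <- torch::dataset(
    name = "toy_dataset",
    initialize = function(x, y) {
      # x: images (n x channels x height x width); y: labels (n x ...)
      self$x <- x
      self$y <- y
    },
    .getitem = function(i) {
      # One observation: the i-th slice along the first dimension
      list(x = self$x[i, ..], y = self$y[i, ..])
    },
    .length = function() {
      # Number of observations (torch dimensions are 1-based in R)
      self$x$size(1)
    }
  )

  # Hypothetical data: 3 grayscale 32 x 96 images, labels of 5 positions
  # over a 10-symbol vocabulary
  ds <- toy_dataset(
    x = torch::torch_rand(3, 1, 32, 96),
    y = torch::torch_zeros(3, 5, 10)
  )
  ds$.length()
  ds$.getitem(2)
}
```

Because the tensors are built once in `initialize()`, `.getitem()` only slices memory. This is the trade-off a fully in-memory dataset makes: it is fast per item, but it costs RAM proportional to the dataset size.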
Usage
captcha_dataset(
  root,
  transform_image = captcha::captcha_transform_image,
  transform_label = captcha::captcha_transform_label,
  augmentation = NULL
)
Arguments
- root
(string): root directory where the files are stored
- transform_image
(callable, optional): A function/transform that takes a file path and returns a torch tensor ready to be fed to the model. By default, uses the captcha_transform_image() function.
- transform_label
(callable, optional): A function/transform that takes the file paths and transforms them into labels. By default, uses the captcha_transform_label() function.
- augmentation
(function, optional): If not NULL, applies a function that augments the data with randomized preprocessing layers.
Value
An object of class dataset_generator, created using the torch::dataset() function. It has an initialize() method that takes a directory containing the input images and loads all the information into memory, storing the response variable as an array. It also has a .getitem() method that extracts a single observation from this data structure, and a .length() method that returns the number of captchas in the dataset.
The vocabulary is computed from the distinct values found in the dataset labels.
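The following base-R sketch makes the vocabulary and the array-shaped response concrete. It rests on two hypothetical assumptions. First, each label is stored in the file name after the last underscore (for example `img001_ab3k9.png`). Second, all labels have the same length. The helpers `label_from_path()`, `build_vocab()` and `one_hot_label()` are illustrative only and are not part of the captcha package.

```r
# Hypothetical helper: assumes the label is the text after the last "_"
# in the file name, e.g. "img001_ab3k9.png" -> "ab3k9".
label_from_path <- function(path) {
  stem <- tools::file_path_sans_ext(basename(path))
  sub(".*_", "", stem)
}

# Vocabulary: sorted set of distinct characters found across all labels
build_vocab <- function(labels) {
  sort(unique(unlist(strsplit(labels, ""))))
}

# One-hot encode one label into a (label length) x (vocabulary size) matrix
one_hot_label <- function(label, vocab) {
  chars <- strsplit(label, "")[[1]]
  stopifnot(all(chars %in% vocab))
  m <- matrix(0L, nrow = length(chars), ncol = length(vocab),
              dimnames = list(NULL, vocab))
  m[cbind(seq_along(chars), match(chars, vocab))] <- 1L
  m
}

# Hypothetical file paths
paths <- c("captchas/img001_ab3k9.png", "captchas/img002_k9a3b.png")
labels <- vapply(paths, label_from_path, character(1), USE.NAMES = FALSE)
vocab <- build_vocab(labels)

# Stack the per-label matrices into an array: captcha x position x character
y <- aperm(simplify2array(lapply(labels, one_hot_label, vocab = vocab)),
           c(3, 1, 2))
dim(y)
```

Each position of a label has exactly one 1 in the last dimension. Decoding a prediction therefore amounts to taking, for each position, the vocabulary entry with the highest score.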
Examples
if (torch::torch_is_installed()) {
  annotated_folder <- system.file(
    "examples/annotated_captcha",
    package = "captcha"
  )
  suppressMessages({
    ds <- captcha_dataset(annotated_folder)
  })
  # gets the first item (the only item in the example)
  # returns a list with x and y torch tensors
  ds$.getitem(1)
}
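For training, a dataset like this is typically wrapped in `torch::dataloader()`, which batches the items returned by `.getitem()`. The sketch below assumes the captcha and torch packages are installed and reuses the bundled example folder. The batch size of 1 simply matches the single example image. No particular tensor shapes are assumed; they are printed rather than asserted.

```r
if (requireNamespace("captcha", quietly = TRUE) &&
    requireNamespace("torch", quietly = TRUE) &&
    torch::torch_is_installed()) {
  annotated_folder <- system.file(
    "examples/annotated_captcha",
    package = "captcha"
  )
  ds <- suppressMessages(captcha::captcha_dataset(annotated_folder))

  # Batch the dataset; each batch is a list holding stacked x and y tensors
  dl <- torch::dataloader(ds, batch_size = 1, shuffle = FALSE)
  batch <- torch::dataloader_next(torch::dataloader_make_iter(dl))
  print(batch$x$shape)
  print(batch$y$shape)
}
```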