fairseq_dataset.py 1.8 KB

# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the LICENSE file in
# the root directory of this source tree. An additional grant of patent rights
# can be found in the PATENTS file in the same directory.

import torch.utils.data


class FairseqDataset(torch.utils.data.Dataset):
    """A dataset that provides helpers for batching."""

    def __getitem__(self, index):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError

    def collater(self, samples):
        """Merge a list of samples to form a mini-batch.

        Args:
            samples (List[int]): sample indices to collate

        Returns:
            dict: a mini-batch suitable for forwarding with a Model
        """
        raise NotImplementedError

    def get_dummy_batch(self, num_tokens, max_positions):
        """Return a dummy batch with a given number of tokens."""
        raise NotImplementedError

    def num_tokens(self, index):
        """Return the number of tokens in a sample. This value is used to
        enforce ``--max-tokens`` during batching."""
        raise NotImplementedError

    def size(self, index):
        """Return an example's size as a float or tuple. This value is used when
        filtering a dataset with ``--max-positions``."""
        raise NotImplementedError

    def ordered_indices(self):
        """Return an ordered list of indices. Batches will be constructed based
        on this order."""
        raise NotImplementedError

    @property
    def supports_prefetch(self):
        """Whether this dataset supports prefetching."""
        return False

    def prefetch(self, indices):
        """Prefetch the data required for this epoch."""
        raise NotImplementedError
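
The class above only declares an interface, so a small concrete example may help show how the pieces fit together. The sketch below is illustrative only and is not part of fairseq: `ToyTokenDataset` and `batch_by_max_tokens` are hypothetical names, the class is written in plain Python (rather than subclassing `FairseqDataset`) so that it runs without torch, `collater` is assumed to receive the items returned by `__getitem__`, and the token-budget rule is a simplified stand-in for whatever the real `--max-tokens` batching does.

```python
from typing import Dict, List


class ToyTokenDataset:
    """Hypothetical stand-in that mirrors the FairseqDataset interface.

    In real use this would subclass FairseqDataset; plain Python is used
    here so the sketch runs without torch installed.
    """

    def __init__(self, sentences: List[List[int]], pad: int = 0):
        self.sentences = sentences
        self.pad = pad  # assumed padding id, not taken from the source

    def __getitem__(self, index: int) -> Dict:
        return {"id": index, "tokens": self.sentences[index]}

    def __len__(self) -> int:
        return len(self.sentences)

    def num_tokens(self, index: int) -> int:
        # Token count used for the per-batch token budget.
        return len(self.sentences[index])

    def size(self, index: int) -> int:
        # Size used when filtering out over-long examples.
        return len(self.sentences[index])

    def ordered_indices(self) -> List[int]:
        # Sort by length so similarly sized samples end up in the same batch.
        return sorted(range(len(self)), key=self.num_tokens)

    def collater(self, samples: List[Dict]) -> Dict:
        # Right-pad every sample to the longest one in the batch.
        max_len = max(len(s["tokens"]) for s in samples)
        padded = [
            s["tokens"] + [self.pad] * (max_len - len(s["tokens"]))
            for s in samples
        ]
        return {
            "id": [s["id"] for s in samples],
            "tokens": padded,
            "ntokens": sum(len(s["tokens"]) for s in samples),
        }


def batch_by_max_tokens(dataset, max_tokens: int) -> List[List[int]]:
    """Simplified batching: keep (longest sample * batch size) <= max_tokens."""
    batches, current, longest = [], [], 0
    for idx in dataset.ordered_indices():
        n = dataset.num_tokens(idx)
        new_longest = max(longest, n)
        if current and new_longest * (len(current) + 1) > max_tokens:
            # Adding this sample would exceed the budget: close the batch.
            batches.append(current)
            current, new_longest = [], n
        current.append(idx)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    dataset = ToyTokenDataset([[5, 6, 7], [8], [9, 10], [1, 2, 3, 4]])
    for batch in batch_by_max_tokens(dataset, max_tokens=6):
        print(dataset.collater([dataset[i] for i in batch]))
```

Sorting by length in `ordered_indices` keeps padding low, which is why the budget check can use the longest sample in the running batch as the padded width.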