For how long does the meta batch data loader iterate? #69
Like any vanilla PyTorch dataloader, the dataloader has a size: roughly `len(dataset) / batch_size`. However, since the dataset is combinatorially large, it is not recommended to loop over the whole dataloader (and reach the `StopIteration`). I am closing this issue, because the example is working as intended. Feel free to re-open it if you still run into this.
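For illustration, here is a minimal sketch of that relationship, using Torchmeta's Omniglot helper (the `'data'` folder and `batch_size=16` are arbitrary choices, not anything from the example):

```python
from torchmeta.datasets.helpers import omniglot
from torchmeta.utils.data import BatchMetaDataLoader

dataset = omniglot('data', ways=5, shots=5, meta_train=True, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=16, shuffle=True)

# The dataset enumerates every possible combination of classes, so its
# length is combinatorially large; the dataloader's length follows the
# usual DataLoader convention of len(dataset) / batch_size (rounded).
print(len(dataset))
print(len(dataloader))
```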
Hi Tristan, thanks for your reply. It was very helpful. But I am still confused. When I call `len(dataloader)`, I get a specific (very large) number.
Where is that number coming from? What confuses me is that it is NOT infinity. In standard training (when meta-learning is not involved) we usually have the data loader go through the dataset entirely (one epoch). However, in N-way, K-shot classification, to form one (meta-)batch we sample N classes and K images for each (plus K_eval for the query set) per sampled task. This means that in principle the number of tasks we can generate is essentially unbounded, since the tasks are combinations of classes and examples (at least in principle). So what I am confused about is where that specific number comes from. Can you clarify that?
I don't know where this number comes from. When I tried it on my end, I got a different value, using the following change to the example:

```diff
diff --git a/examples/maml-higher/train.py b/examples/maml-higher/train.py
index 71634d8..1d5abd3 100644
--- a/examples/maml-higher/train.py
+++ b/examples/maml-higher/train.py
@@ -82,7 +82,7 @@ def train(args):
                        meta_train=True,
                        download=args.download)
     dataloader = BatchMetaDataLoader(dataset,
-                                     batch_size=args.batch_size,
+                                     batch_size=1,
                                      shuffle=True,
                                      num_workers=args.num_workers)
@@ -94,6 +94,8 @@ def train(args):
     inner_optimiser = torch.optim.SGD(model.parameters(), lr=args.step_size)
     meta_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
 
+    print(f'len(dataloader) = {len(dataloader)}')
+    import pdb; pdb.set_trace()
     # Training loop
     with tqdm(dataloader, total=args.num_batches) as pbar:
         for batch_idx, batch in enumerate(pbar):
```

Can you provide a minimal example where you get that number?
The number of tasks is not infinite, but it is combinatorially large. For example, in Omniglot 5-way classification, the number of possible tasks is enormous (the exact count is worked out below).
Hi Tristan, thanks for your help!
That number comes from miniImageNet, from the code you provided.
I will provide code as soon as I'm on my computer.
This was a really useful discussion, thanks Tristan. I am curious: what is the formula for the size of the dataloader (i.e., the number of tasks)? I was thinking of something using the choose function, C(C * N_Ci, N * (K + K_eval)), where C is the total number of labels (e.g. 64 for miniImageNet) and N_Ci is the number of examples per label (e.g. 600 images for each label). But that formula seems to give me a number that is much larger than what I expected. Do you have an actual formula for calculating this length?
There was indeed a bug in the way the length of the dataset was computed, thank you! I have fixed it in Torchmeta 1.5.2.

The length is `comb(num_classes, ways)`, where `num_classes` counts each class augmentation as a separate class (in Omniglot, each of the 1028 meta-train characters appears with 4 rotations). That is, for 5-way classification tasks on Omniglot from the meta-train split, there are in total `comb(1028 * 4, 5) = 9772996309770512` tasks.
And you can verify it with:

```python
from torchmeta.datasets.helpers import omniglot
from scipy.special import comb

dataset = omniglot('data', ways=5, shots=5, meta_train=True)
print(len(dataset))                   # 9772996309770512
print(comb(1028 * 4, 5, exact=True))  # 9772996309770512
```
@tristandeleu thanks for checking that out! I will read through the details to help you double-check it later. For now, though, I was wondering if the meta-loaders work as expected for regression (my belief is that they do not, but I might have yet another misunderstanding of how looping through tasks/dataloaders happens). I noticed that if I make 500 tasks (functions) with 700 examples each, then the torchmeta dataloader does 500 loops/iterations. I would have expected many more iterations: only 5+15=20 examples are used per task, so with 700 examples per task there seem to be many left unused. Perhaps something like ~C(500*700, 20) is what I more or less expected (or at least that's a very rough estimate). But whatever it is, it's definitely more than 500. Can you take a look at that for me, please? Thanks for sharing your great library btw!
The way it works in Torchmeta (for both regression and classification tasks) is that tasks and datasets have a one-to-one correspondence, to ensure reproducibility. This means that a specific task will be associated with one and only one dataset: if you sample the same task twice, you'll get the same train/test datasets. Concretely, for regression tasks, this means that even though you ask for 700 examples for a specific task/function, only the first* 5+15 = 20 examples (with `shots=5` and `test_shots=15`) will ever be used for that task.

If you want to have different samples for the same function, one way to do it would be to sample the same function multiple times when the tasks are created (e.g. several tasks sharing the same parameters).

*Not necessarily the first 20 as in indices 0-19: the 20 examples are a fixed subset determined by the task, but it is the same subset every time the task is sampled.
Using this scheme, did you try reproducing MAML and getting its ~63% accuracy on miniImageNet? I am just worried that this isn't what most standard meta-learning algorithms mean by "episodic training". Using the same 20 images per task during training seems to me a strong underutilization of the meta-train set.
Why is my augmented dataloader the same length as the normal one?
You can reproduce it with this:

I made sure to have Torchmeta 1.5.3 when I ran it.
Yes, I have been able to reproduce the 63% accuracy with MAML on miniImageNet (5-shot, 5-way) using this data-loading scheme. Keep in mind that the number of possible tasks is combinatorially large, so the probability of sampling the same task twice is very small (e.g. for the meta-train split of miniImageNet and 5-way classification tasks, you have over 7.6M tasks). This means the behavior is very similar to what you'd see if you were to sample the tasks on the fly (which is probably what you mean by "standard"), because that sampling scheme also has very little chance of sampling the same task twice. Also note that it doesn't mean that the same 20 images are always used for all tasks involving a specific class; see the second part of #67 (comment) for an example.

Although I wouldn't recommend it, you can bypass this behavior if you want by setting the hash of the task to a random number (see #95 (comment)).
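For reference, the 7.6M figure matches the number of 5-class combinations out of miniImageNet's 64 meta-train classes:

```python
from scipy.special import comb

# 64 classes in miniImageNet's meta-train split, 5-way tasks
print(comb(64, 5, exact=True))  # 7624512, i.e. the "over 7.6M tasks" above
```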
It looks like you don't have any class augmentations in that code, which would explain why the two dataloaders have the same length.
Here is an example of what the class augmentations do:

```python
from torchmeta.datasets.helpers import omniglot
from PIL import Image

dataset = omniglot('data', shots=1, ways=2, meta_train=True,
                   transform=None, target_transform=None)

# There are 1028 classes in Omniglot's meta-train split. We offset
# by this much to get the same label with a different rotation.
task = dataset[(0, 1028 + 0)]
train = task['train']
print(f'Number of examples in training set: {len(train)}')

img_0, target_0 = train[0]
img_1, target_1 = train[1]
print(f'Target of sample 0: {target_0}')
print(f'Target of sample 1: {target_1}')

# For display
img = Image.new(img_0.mode, (2 * img_0.width, img_0.height))
img.paste(img_0, (0, 0))
img.paste(img_1, (img_0.width, 0))
img.show()
```
@tristandeleu in my regression tasks I don't have access to many tasks/functions, and they are not being created on the fly like yours, where they are basically unbounded. It's a small set of tasks (well, 500 seems pretty large to me already), and it's easy for me to go through all the tasks with the meta data loader (it doesn't combinatorially explode, since for regression there is no choose function going on). I can't create them from scratch as you are suggesting. Thus, when the next epoch starts, does my meta-learner see different x values? (I do have many other x values which I'd like it to see, if possible.) In summary: what I need is that when the same task (function) is sampled in a different epoch, the examples are also different. How is that possible in torchmeta? Maybe a simple flag, or does it just do it automagically?
There is no flag to enable that, unfortunately. But in your case, since this is your own dataset, you can write the `__hash__` method of the task yourself so that it returns a random value:

```python
import random

from torchmeta.utils.data import Task

class SinusoidTask(Task):
    # Other functions: __init__, __getitem__ and __len__

    def __hash__(self):
        # Random hash, so the examples are re-sampled every time
        # the task is used
        return random.randrange(1 << 32)
```

Let me know if this works!
@tristandeleu will let you know! I also wanted to do this: have N-way, K-shot of size 64-way, 5-shot. But if the dataloader is always giving me the same set of examples (since we only have 1 task), that is no good. How do I get different examples in this case?
You can probably do the same thing as in #69 (comment). But really, if you have a 64-way problem you are likely to get a very large number of possible tasks, so this wouldn't be necessary (for the reasons explained in #69 (comment)). Of course this depends on the number of classes you have in your meta-split, but for example in the meta-train split of Omniglot, 64-way tasks give you an astronomically large number of possible tasks.
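Reusing the class count from earlier in the thread (1028 meta-train characters, each with 4 rotations), that 64-way count is easy to check:

```python
from scipy.special import comb

# 64-way tasks over Omniglot's meta-train classes (1028 classes x 4 rotations)
print(comb(1028 * 4, 64, exact=True))  # a ~140-digit number
```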
What I want is to use all the labels at once, e.g. as the authors do here: https://arxiv.org/abs/1801.05401. I want my dataloader to do this:

I also want to always have an output of 64 but get 5 classes (the different episodic manner that Y. Wang uses); for this one I think plain PyTorch might solve it for me. The first one probably requires changing the hash function (to guarantee different examples), since I always do "1 epoch" according to the definition you have (btw, your definitions are well motivated; I'm just trying some different things).
Oh sorry, I was confused about the setting. In case you want to use all the labels at once, this would probably require a fair amount of work. I think one big issue you might run into is that the meta-dataset would have a single task (even though you are doing the random hash trick). I don't think this would be easy with the current tools in Torchmeta. One way I can think of doing it is to have explicitly an index per possible dataset, making the length of the meta-dataset the number of possible example selections across all classes.
With an appropriate indexing scheme this could be possible: indexes could be tuples of length 64, each element being a tuple of length k (for k-shot). You can take inspiration from the way indexes work for `CombinationMetaDataset`:

```python
((1, 2, 3, 4, 5), (2, 4, 6, 8, 10), ...)  # of length 64
```

meaning that you'd select images with indices 1-5 from the first class, 2, 4, 6, 8, 10 from the second class, and so on. You would also need to create custom components for a number of things for compatibility, though, most importantly the data loader.
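A rough sketch of that indexing idea (hypothetical, not an existing Torchmeta API; the 64 classes / 600 examples / k=5 figures are the miniImageNet-style numbers from earlier in the thread):

```python
import random
from scipy.special import comb

num_classes, examples_per_class, k = 64, 600, 5

# One index = a tuple of 64 entries, each selecting k example indices
# within the corresponding class
index = tuple(tuple(sorted(random.sample(range(examples_per_class), k)))
              for _ in range(num_classes))

# The meta-dataset would then have one dataset per possible index
num_datasets = comb(examples_per_class, k, exact=True) ** num_classes
print(len(str(num_datasets)))  # number of digits: astronomically large
```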
Do you think it's just simpler for me to implement my own dataloader? I was hoping not to do that. E.g., if I have a meta-batch size of B, N-way, total labels C, and k-shot, then I'd have a final batch of data of the following shape:

[BNK, CHW]

I was hoping to do this with your library so that I don't have to re-implement it for each dataset I use (since you already have a really nice set of datasets available). If you have an idea how to implement this so all your datasets work, let me know! Even just a rough outline would be great! :) Btw, thanks for all the discussion you've already had with me, and great library! It's impressive work.
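For what it's worth, once you have a Torchmeta batch, getting a flat [B*N*K, C, H, W] tensor is just a reshape. A minimal sketch, assuming the standard `batch['train']` layout of inputs shaped [B, N*K, C, H, W] (the dataset and loader settings here are arbitrary):

```python
from torchmeta.datasets.helpers import omniglot
from torchmeta.utils.data import BatchMetaDataLoader

dataset = omniglot('data', ways=5, shots=5, meta_train=True, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=4, shuffle=True)

batch = next(iter(dataloader))
inputs, targets = batch['train']  # inputs: [B, N*K, C, H, W]

flat_inputs = inputs.reshape(-1, *inputs.shape[2:])  # [B*N*K, C, H, W]
flat_targets = targets.reshape(-1)                   # [B*N*K]
print(flat_inputs.shape, flat_targets.shape)
```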
To make the problem simpler first, I think the above should be easy when C = N, i.e. when the number of total classes is equal to the N in N-way. For that, we sample a single task and make sure the hash trick is turned on (so that distinct examples are given each epoch, since in this case we'd go through the whole data loader at once, according to the data loader's definition). That at least gives us a 64-way, 64-label task, only a single one of course. Not sure if it's possible to get more at once, or to extract the 5 we'd need from that... hmm... will think about it.
If C = N, then there is only one possible combination of classes, so the meta-dataset contains a single task. If you want that single task to give you different examples across epochs, the random hash trick is the way to go.
How would I test if this works?
@tristandeleu I don't think the random hash trick is working for me. Do you have thoughts?

Output:
This is due to two things (possibly three): the dataset is not seeded, and `num_workers > 0` adds the stochasticity of multiple processes (and possibly the shuffling as well).
The best test you can do is to run your code twice (ensuring that everything is seeded) and check that you get the same results:

```python
from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

batch_size = 16
shots = 5
test_shots = 15

# Seed the dataset with `seed = 0`
dataset = sinusoid(shots=shots, test_shots=test_shots, seed=0)
# `num_workers = 0` to avoid stochasticity of multiple processes
dataloader = BatchMetaDataLoader(dataset, batch_size=batch_size,
                                 shuffle=False, num_workers=0)

batch = next(iter(dataloader))
inputs, _ = batch['train']
print(f'Sum of inputs: {inputs.sum()}')
```

If you run the code twice, you should get the same result. That random hash trick, on the other hand, would make the two runs differ.
I don't think I see the random hash trick working for me. Is it working for you?
Anyway, it seems I don't actually need your random hash trick (@tristandeleu please confirm if this is right): what I want is randomness, not determinism (of the data, not of the tasks/classes). Just removing the seed input is enough (I also set num_workers > 0, since that's the setting closest to my real code). Output to confirm this:
The random hash trick I was talking about would be at the level of the task itself (its `__hash__` method, as in the snippet above). If having different data across runs is all you need, then yes, removing the seed is enough.
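Concretely, the unseeded variant of the earlier reproducibility test (same sketch as above, minus the `seed` argument and with workers enabled) should print a different sum on every run:

```python
from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

# No `seed` argument: each run samples different data
dataset = sinusoid(shots=5, test_shots=15)
dataloader = BatchMetaDataLoader(dataset, batch_size=16,
                                 shuffle=True, num_workers=2)

inputs, _ = next(iter(dataloader))['train']
print(f'Sum of inputs: {inputs.sum()}')  # expected to differ between runs
```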
Thanks for all the help, Tristan! I really appreciate it. Hopefully a last question (just to make sure it is all good): what did you have in mind that the hash trick would solve (i.e., what did you think I was trying to do)? Again, thanks for your willingness to discuss, and for your great dataloader!
The random hash trick would still allow you to keep task reproducibility: if you run the same code twice with the random seed fixed, you'll get the same tasks (e.g. same amplitudes/phases for the sinusoid). The key difference is that the examples sampled within each task would no longer be reproducible.

Note that this is when you run the same code twice, not if you call the same task twice within your code. Something I overlooked, and corrected in #69 (comment), is that if you call the same task twice in your code, you do get different data, because the dataset's random state advances between the two calls.
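A quick way to see that last point with the toy sinusoid helper (a sketch; it relies on the behavior described just above, where re-requesting a task re-samples its examples):

```python
from torchmeta.toy.helpers import sinusoid

dataset = sinusoid(shots=5, test_shots=15, seed=0)

input_a, _ = dataset[0]['train'][0]  # first training example of task 0
input_b, _ = dataset[0]['train'][0]  # same task, requested a second time

# Same underlying function (same amplitude/phase), but the sampled
# x values generally differ between the two calls
print(input_a, input_b)
```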
Usually in Python, iterators stop when the StopIteration exception is raised.
But I saw that the length of the data loader is a strange number, where I expected infinity, since it's usually up to the user how many episodes they want to run (each episode usually corresponding to a batch of sampled tasks).
So when does the data loader stop?
Code I am referencing is from the example: https://github.com/tristandeleu/pytorch-meta/blob/master/examples/maml-higher/train.py
Is `args.num_batches` the same as the number of episodes? The weird size I mentioned is the `len(dataloader)` value discussed above.
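For reference, a sketch of how such a loop typically stops early, mirroring the structure of the example script (hedged: `num_batches` here plays the role of `args.num_batches`, and the toy dataset is used only to keep the snippet self-contained):

```python
from tqdm import tqdm
from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

dataset = sinusoid(shots=5, test_shots=15)
dataloader = BatchMetaDataLoader(dataset, batch_size=16, shuffle=True)

num_batches = 100  # plays the role of `args.num_batches`
with tqdm(dataloader, total=num_batches) as pbar:
    for batch_idx, batch in enumerate(pbar):
        # ... training step on `batch` goes here ...
        if batch_idx >= num_batches:
            break  # stop long before exhausting the combinatorially large loader
```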