
I understand the need to lock this setting away from the average user. Is there any way now to target retention under 70 percent? #718

Open
aedoncassiel opened this issue Jan 3, 2025 · 16 comments
Labels
enhancement New feature or request

Comments

@aedoncassiel

aedoncassiel commented Jan 3, 2025

I could have marked this as a question, since I'm happy if there is an existing way to achieve this.

I thoroughly understand that for the vast majority of users, there is never a reason to go below 70% retention.

For most use cases, the gains are minimal, and the risks are high. I also see the need to simplify the program and make it as fool-proof as possible if it's going to achieve more recognition. And there is a lot to gain from seeing it being used more widely. So I sympathize with the reason why more fine-grained control has been removed in the latest updates. I also understand the theoretical likelihood that lower retention can lead to more total time spent on review.

However, it is worth observing that many language learners are in a unique position amongst all users of spaced repetition. The vast majority of Anki users, even those preparing for a language exam, have a defined time frame within which they need specific items available in memory. By contrast, many language learners may have no concern whatsoever about the short-term results of their spacing program, as long as they move towards fluency over the following ten or more years.

There are many thousands of vocabulary terms that most people will never use in conversation in their second language, even if they do have opportunities to speak. This is especially true for learners who want to understand literary prose or technical discussions. Generating and testing can accelerate comprehension of these terms, and unless the learner becomes a literary author or a science lecturer, this can remain a very substantial benefit of SRS regardless of mastery level, throughout their entire life with the language. Clearly, few people need to master the word "catachresis" next week, but someone who wants to read discussions of poetry or science in a second language may need to understand many thousands of words like it.

But there is no universally optimal spacing pattern for retention, because "retention" is not one thing. One can only select a specific time frame within which retention matters, and then different spacings will be optimal on different time frames.

Specifically, shorter intervals between repetitions are needed to make a more fragile memory available soon, while longer intervals will eventually create more durable memories, even at the expense of more short-term failure. This runs counter to a consensus some users have accepted: a spacing regimen that leads to more short-term failure than its alternative can actually be preferable for efficiently building durable long-term memories. Thus, depending on one's goals, failing to retrieve an item over a moderately long interval may not be a good reason to start spacing it over shorter intervals before testing it over longer intervals again.

As the authors of Spacing Repetitions Over Long Timescales: A Review and a Reconsolidation Explanation observe, spacing of language tasks is capable of leading to "equal or worse learning but enhanced retention." Thus, we always have to consider the time frame on which retention matters: "the optimal space varied depending on the retention delay, with the optimal space being longer for longer retention delays (e.g., for the 7-day retention delay the optimal space was 3 days and for the 35-day retention delay the optimal space was 8 days)." As observed in The right time to learn: mechanisms and optimization of spaced learning: "For verbal learning with a retention interval of 6 months, a training interval of 7 days was superior to an interval of 3 days."

Thus:

  • This correlation between longer training and retention intervals suggests that longer training intervals preferentially form a memory trace with a very long lifetime.

It is not only that memories can be spaced further as they become more ingrained; larger spacing also deepens memory.

... eventually.

That paper makes the case that, at least for within-day learning steps, the short-interval repetitions needed to make information available in memory soon interfere directly with the consolidation of more durable memories; again, this holds even if those repetitions are necessary to make the information available soon. So it may be that repeating the testing effect for an item over large intervals sends this message to the brain: "this information is worth building a memory for that will last over this kind of long interval."

My experience with FSRS since early adoption when it was first made available in spreadsheets has given me an openness to these conclusions that I would frankly never have had otherwise, so I have this tool to thank for changing my approach. And so while I may be cherry-picking from the research on SRS to some degree, I am doing so in an attempt to understand my own experience.

Specifically, when I have opted for lower retention in a deck despite the program's initial advice (and, to be clear, I have given it a bit of time for results to show), the program learns that lower retention is quite effective for me after all. But decks in which I have strictly followed the recommendations never figure out that they can drop the intervals quite as much as those decks do.

I believe this may be partly due to what I described above (that longer intervals themselves simply give the brain the message, "this information is worth encoding a memory for that will last over this span of time") and partly due to desirable difficulty.

Originally, I believed the rationale laid out by Anki: that we are all constantly forgetting things, and the job of SRS is to retrieve knowledge from the garbage bin before it's hauled away to the dump forever. Now I believe that the value of SRS has a lot more to do with desirable difficulty. That is, perhaps the reason a target retention is such a powerful pivot point to design spacing around isn't that it lets you spend the least time pulling the most things back out of the garbage. Maybe it's exactly about waiting until it becomes difficult to get them back out, because the work is the point.

In other words: maybe the real value of an 80% target retention isn't in the 80%, but in the 20%.

For the average user, of course, failing cards is going to be discouraging. So it may be true that most users mentally give up as they start failing cards. And if the user gives up, flipping cards over in an irritated rush is certainly not building memory at all.

But it is important to notice that if this is what happens, it is not some objective feature of the mathematical forgetting curve. It is due in part to what that user believes about how and why spacing works!

If the user believes that retaining an item consistently over short intervals is a prerequisite for building durable memories, and that information needs to be pulled from the trash again and again before it's hauled away, they may even feel incompetent, ashamed, or self-loathing after failing a certain fraction of cards. Thus, even leaving aside the many users for whom the shorter time scale of an upcoming exam is all that matters anyway, users' perceptions are an unavoidable confounder in the data.

But what if the user understands the value of spaced repetition to be precisely in straining, as an act that builds and strengthens memory in and of itself, and what if that is true? Then success lies not so much in "rescuing" knowledge before it is wiped, but in the work. And since that work is hard, should be hard, and is ultimately more effective the harder it is (so long as you continue to do it), being given fewer cards may make one more willing to perform that work each time it's cued. The value of FSRS might be in its calculation of difficulty, not as a means but as an end. It is not entirely clear where the logical cut-off point lies beyond which more difficulty stops being more beneficial, besides the user's emotional tolerance and developed skill at straining for memories they don't immediately see a mental path towards.

In short, current simulations do not intrinsically account for the eventual value that appears over extended time frames from practicing over longer intervals. They also cannot determine how much effort the user applies to retrieval, so they cannot easily account for the value of more effortful retrieval at lower retention, provided that the user with low retention does keep making that effort. To the extent either phenomenon explains the value of spaced repetition, it seems plausible that lower retention yields more eventual value than current simulations are able to show.

Granted, this reasoning applies to memories whose strength we want to keep building, yet there are many that are already as strong as we want them to be. At that point, the opportunity costs in time and effort of simply maintaining the memory are so low that it is completely logical to change priorities and worry only about not forgetting. In other words: once knowledge built in this way reaches high durability, we may want to put it in another deck with a higher retention rate.

The net effect of this, of course, is to flatten intervals away from the "expanding" pattern towards somewhat more equal spacing, and this is not obviously a bad thing: meta-analysis so far finds that "(c) equal and expanding spacing were statistically equivalent." I am excited to see whether FSRS changes this situation in research for the first time, and I subjectively feel that it will. But it also seems plausible that the evidence could point somewhere between the exponential intervals we all see in SRS sales pitches and flatter ones.

There is one last practical consideration that applies especially to this type of user and has nothing to do with anything argued above. Time in spaced repetition is necessarily time not spent exposed to spontaneous expressions of a language, and no app can account for how many times a user encounters a word outside the app. If the user makes a card for Spanish "gato" and the algorithm escalates 'too far,' they are likely to encounter the word and absorb it in the interim anyway. That likelihood is higher the more SRS time they can trade for exposure, especially because real-world repetitions are often more impactful due to emotional significance and contextual variety. On the flip side, some portion of words will turn out to be like "catachresis," and their lack of value for that user can only be perceived with more time and experience in the language (though, again, they may want to know some thousands of terms in that ballpark within several years). Thus, SRS itself bears the most value for an unknown fraction of common words that won't stick with the learner otherwise, and for a vague range of words that are not common enough to be reliably instilled by direct experience of the language but not so rare as to be useless. For the most and least frequent words, it is far less necessary. Thus, a less efficient algorithm (if it is less efficient) could simply lead to more efficient language learning anyway.

Thus, there is a use case where experiments with very low retention are worth making, and the opportunity costs are low.

With fine-grained control taken out of the base settings, is it still possible?

@aedoncassiel aedoncassiel added the enhancement New feature or request label Jan 3, 2025
@brishtibheja

(I didn't read all of it, sorry.) Did you try the custom scheduling script?

@Expertium
Collaborator

Your first thought should be that this research hasn't been able to account for what impact innovations like FSRS might have on the equation, because it hasn't existed until now.

If any article suggests that there is no difference between expanding and fixed intervals, it's a load of baloney.
Anyway, I only see 2 ways to get desired retention below 70%:

  1. Use custom scheduling code
  2. Fork Anki and make your own version

I really doubt that lowering the lower bound would be a net positive.
Minimum recommended retention rarely goes below 70%, suggesting that studying in such a regime is not beneficial for most users. It would also result in absurdly long intervals, which is not what most users want. And if FSRS is not very accurate in that regime, it won't become apparent for a very long time.
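To make the "absurdly long intervals" point concrete, here is a minimal sketch of how the interval scales with desired retention. It assumes the FSRS-4.5/FSRS-5 forgetting curve, R(t, S) = (1 + FACTOR · t / S)^DECAY with DECAY = -0.5 and FACTOR = 19/81 (chosen so that R = 0.9 when t = S). The helper name `intervalFor` is hypothetical, and the real scheduler also rounds, applies fuzz, and clamps to the maximum interval.

```javascript
// Minimal sketch, assuming the FSRS-4.5 / FSRS-5 power forgetting curve:
//   R(t, S) = (1 + FACTOR * t / S) ** DECAY
// DECAY and FACTOR are chosen so that R(S, S) = 0.9. Other FSRS versions differ.
const DECAY = -0.5;
const FACTOR = 19 / 81;

// Hypothetical helper: days until predicted retrievability falls to the
// desired retention r, for a card whose stability is `stability` days.
// Unrounded, no fuzz, no maximum-interval clamp.
function intervalFor(r, stability) {
  return (stability / FACTOR) * (Math.pow(r, 1 / DECAY) - 1);
}

// Interval expressed as a multiple of stability (S = 1 day).
for (const r of [0.95, 0.9, 0.8, 0.7, 0.5, 0.3]) {
  console.log(`desired retention ${r}: interval ~ ${intervalFor(r, 1).toFixed(2)} x S`);
}
```

Under these assumptions the interval is about 1 × S at 0.9, about 4.4 × S at 0.7, about 12.8 × S at 0.5 and about 43 × S at 0.3, so each step below 70% stretches intervals much faster than the steps above it.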

@Expertium
Collaborator

@L-M-Sherlock is it possible to add a setting to the Helper add-on that removes desired retention bounds? So that users can just use any value between 0 and 1.

@aedoncassiel
Author

aedoncassiel commented Jan 3, 2025

(I didn't read all of it sorry) Did you try the custom scheduling script?

As far as I can tell, custom scheduling right now is going to affect the entire collection: the FAQ now says "The code you enter applies to the entire collection, and not just decks using the preset". This is not what I want, because I would want at least two decks with different retentions anywhere I tried this.

There's nothing necessary in the post, I just wanted to explain that I'm not asking in ignorance of the 'risks'. And maybe one of the four people in the world who could gain something from the argument might stumble across the post some day.

@Expertium
Collaborator

You can specify different desired retentions for different decks. Btw, custom FSRS scheduling works on a per-deck basis, unlike built-in FSRS, which works on a per-preset basis.

@aedoncassiel
Author

aedoncassiel commented Jan 3, 2025

Oh! So it works just like the original implementation used to, and I can just ignore whatever the slider displays afterwards (with no explicit confirmation that it's doing what I want, except by judging from the intervals)? I suppose the only trouble is that it won't auto-update if I move or rename a deck.

@L-M-Sherlock
Member

There is a simple method: create a filtered deck with condition prop:r<0.7.
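As a rough sketch of what the `prop:r<0.7` condition selects: it compares each card's current predicted retrievability with 0.7. This assumes the FSRS-4.5/FSRS-5 forgetting curve (DECAY = -0.5, FACTOR = 19/81); the card records and the helper names `retrievability` and `daysUntil` are hypothetical, not Anki's data model.

```javascript
// Sketch of which cards a `prop:r<0.7` search would select, assuming the
// FSRS-4.5 / FSRS-5 forgetting curve (DECAY = -0.5, FACTOR = 19 / 81).
// The card records below are hypothetical examples, not real Anki data.
const DECAY = -0.5;
const FACTOR = 19 / 81;

// Predicted retrievability `elapsedDays` after the last review, for stability in days.
function retrievability(elapsedDays, stability) {
  return Math.pow(1 + (FACTOR * elapsedDays) / stability, DECAY);
}

// Days after the last review at which retrievability drops to `threshold`.
function daysUntil(threshold, stability) {
  return (stability / FACTOR) * (Math.pow(threshold, 1 / DECAY) - 1);
}

const cards = [
  { id: "a", stability: 10, elapsedDays: 5 },
  { id: "b", stability: 10, elapsedDays: 50 },
  { id: "c", stability: 2, elapsedDays: 12 },
];

// Equivalent of the filter condition: current retrievability below 0.7.
const selected = cards.filter((c) => retrievability(c.elapsedDays, c.stability) < 0.7);
console.log(selected.map((c) => c.id));
console.log(daysUntil(0.7, 10).toFixed(1)); // days until a stability-10 card crosses 0.7
```

Under these assumptions a card with stability S crosses R = 0.7 about 4.4 × S days after its last review, so the filtered deck gathers cards only once they have decayed that far; it does not change the desired retention the home deck's preset uses to set due dates.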

@Expertium
Collaborator

But that's not the same as setting desired retention below 70%

@L-M-Sherlock
Member

I'm not very interested in supporting this use case. The filtered deck method is good enough, and very flexible.

@aedoncassiel
Author

aedoncassiel commented Jan 6, 2025

I really do think we are underestimating the number of language learners who can gain from this approach, because there is a compelling argument here for any language learner who is serious about learning long-term. Lately, every time I open a paper or a book, I see something relevant to this point: the optimal spacing for nearer-term learning directly conflicts with the optimal pattern for longer-term retention, and we have to choose which time frame matters.

Here's yet another example I found last night after opening up a book from Oxford Press on effective studying:

[Image: excerpt from the book]

People seem to be resistant to this idea for exactly the same reason that they are resistant to spaced repetition (and often prefer cramming). It would appear that stressing about retention patterns for vocabulary learned this week essentially is cramming: not as bad as juggling several excessive learning steps on new items the day they were learned, but inefficient for the same reasons. That is, it is arguably inefficient not just because it isn't the use of time that gives the best bang for the buck, but because it specifically postpones doing the thing that builds more durable long-term memories. Going by the study above, and though I'm aware long steps conflict with FSRS, I wonder whether months of thirty-day steps would be enough time that it would then make sense to graduate to maintaining the memory with a higher FSRS retention. This is not my preference, of course.

Custom scheduling, which I would greatly prefer and have been able to use since the program came out, doesn't seem to be overriding the primary setting. I don't know if I've formatted something wrong. Expertium didn't confirm that this should work after I asked again whether I can ignore what the regular slider shows, and Sherlock didn't confirm either, so I'd like to be sure.

@L-M-Sherlock
Member

I get an error if I change the value to anything under 7

What's the error message?

@aedoncassiel
Author

aedoncassiel commented Jan 6, 2025

It's working now and I'm unable to replicate the error.

@aedoncassiel
Author

aedoncassiel commented Jan 6, 2025

I'm very confused about what the custom scheduler can or can't do now though! Opening up the section, I see I still have leftover code from when direct input was required, which doesn't seem to have affected anything, including default settings in new decks.

```javascript
// User's custom parameters for global
"requestRetention": 0.75, // recommended setting: 0.75 ~ 0.95
"maximumInterval": 565,
```

In the section for specific decks, set up as in the old days:

```javascript
{
  "deckName": "全て::パソコン::Core PC Decks::私の鍵ー仮名",
  "w": [0.7697, 0.7697, 1.3614, 2.0597, 7.3905, 0.3137, 1.6535, 0.0010, 1.3423, 0.0000, 0.8134, 2.0636, 0.0010, 0.4185, 2.3967, 0.2918, 3.1273, 0.2907, 0.7392],
  "requestRetention": 0.70,
  "maximumInterval": 36500,
},
```

This still gives 1-2 day intervals in a deck previously set to 99% retention.
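One way to check whether the custom per-deck value is actually in effect is to compare observed intervals with what those parameters should produce. Below is a minimal sketch, assuming FSRS-5 conventions (a new card's initial stability for rating G is w[G-1], taken from the `w` array above) and the FSRS-4.5/FSRS-5 forgetting curve. It ignores learning steps and fuzz, and `nextInterval` and `firstIntervals` are hypothetical helper names.

```javascript
// Minimal sketch, assuming the FSRS-4.5 / FSRS-5 forgetting curve
// (DECAY = -0.5, FACTOR = 19 / 81). Learning steps and fuzz are ignored.
const DECAY = -0.5;
const FACTOR = 19 / 81;

// Initial stabilities w[0..3] (Again, Hard, Good, Easy) from the deck config above.
const initialStability = [0.7697, 0.7697, 1.3614, 2.0597];

// Hypothetical helper: days until retrievability falls to requestRetention,
// rounded to whole days and clamped to [1, maximumInterval].
function nextInterval(stability, requestRetention, maximumInterval) {
  const raw = (stability / FACTOR) * (Math.pow(requestRetention, 1 / DECAY) - 1);
  return Math.min(Math.max(Math.round(raw), 1), maximumInterval);
}

// First-review interval for each possible first rating of a new card.
function firstIntervals(requestRetention, maximumInterval) {
  return initialStability.map((s) => nextInterval(s, requestRetention, maximumInterval));
}

console.log(firstIntervals(0.7, 36500));  // per-deck custom value
console.log(firstIntervals(0.99, 36500)); // the deck's previous preset value
```

Under these assumptions, 0.70 would give first intervals of roughly 3, 3, 6 and 9 days, while 0.99 collapses every rating to the 1-day minimum. Intervals of 1-2 days in that deck would therefore be more consistent with the 0.99 preset value still being applied than with the custom 0.70, though learning steps could also explain short first intervals.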

@aedoncassiel
Author

aedoncassiel commented Jan 8, 2025

I believe many of us discovered Anki and spaced repetition at the same time, and in the avalanche of clear evidence supporting the practice in general, some claims about the how and why slipped in without much scrutiny. I don't want to be annoying, but I'm willing to take the risk, because I believe this is worth harping on:

If any article suggests that there is no difference between expanding and fixed intervals, it's a load of baloney.

Well, here is one study that would agree with that, but wait for the punchline:

Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention:

equally spaced retrieval practice creates a desirable difficulty that enhances learning more than expanding retrieval practice. The important difficulty for promoting long-term retention is a delayed initial retrieval attempt ... the evidence is mixed regarding the effect of expanding versus equally spaced retrieval practice at short retention intervals... In contrast, all previous experiments examining long-term retention on a delayed criterial test have showed benefits of equally spaced retrieval practice over expanding retrieval (Cull, 2000; Logan & Balota, in press; our present results).

As I've been digging deeper to understand why the only meta-analyses we have find no overall benefits for expanding intervals, I've found one particularly shocking reason why expanding is found to be superior in many trials.

Many of them seem to have used the equivalent of flash cards with nothing on the back. That is: there was no corrective feedback. If a participant ever got an answer wrong, that information was likely gone forever. That is the main condition under which we have clear peer-reviewed evidence to say expanding intervals are definitely, clearly, significantly superior. It is obvious why this would happen. I suppose the design is an attempt to mimic what happens when an average person tries to recall a person's name they haven't written down, or recall a fact in a conversation while washing dishes (in many cases before the invention of PCs, never mind smartphones), rather than an attempt to find optimal study patterns. The following quotations just show that I'm not pulling this claim out of thin air; if you believe me, you can skip them or just read the bolded words:

Optimizing retrieval as a learning event: When and why expanding retrieval practice enhances long-term retention:

Recently, however, several studies have failed to replicate Landauer and Bjork’s (1978) findings***. In some instances, there has been a failure to find an advantage of expanding retrieval practice (e.g., Balota, Duchek, Sergent-Marshall, & Roediger, 2006; Carpenter & DeLosh, 2005), and in other studies, uniform retrieval practice has actually led to significantly better performance than has expanding retrieval practice (e.g., Cull, 2000; Karpicke & Roediger, 2007; Logan & Balota, 2008) when memory was tested after a long retention interval. These and other findings have called into question whether expanding retrieval practice is the most effective practice schedule for learning and for long-term recall (for a recent review, see Balota, Duchek, & Logan, 2007).

When Expanding retrieval practice is and is not Optimal
Why Expanding retrieval practice is Optimal, When it is Optimal

The results of the present experiments provide an explanation for the apparent discrepancies in the literature ... In Experiments 1A, 1B, and 2, the participants first read a passage about Antarctica and were then instructed to recall facts about Antarctica without feedback ...

The logic for why we observed this pattern of results is relatively straightforward. Expanding retrieval practice allows for spacing to occur between test trials while minimizing the costs associated with retrieval failures. By testing immediately and then systematically increasing delays between subsequent tests, expanding retrieval practice ensures that a maximum number of items will continue to be recallable and therefore continue to benefit from the powerful consequences of repeated testing. In the corresponding uniform schedule, the likelihood of retrieval failure on the first test is much higher, and any such failures will propagate to all subsequent tests.

Obviously, the failures would only propagate because there is no feedback. Coming back to the *** triple asterisk in the first quoted paragraph, here is another paper's summary of Landauer and Bjork's original 1978 finding that expanding intervals were superior. The absence of corrective feedback in trials finding expanding intervals superior has been a feature from day one:

"Landauer and Bjork (1978) were the first to compare the efficacy of various schedules of retrieval practice. In one experiment, their subjects studied first name–last name pairs once, followed by a practice phase in which they made three attempts to retrieve the appropriate last name when cued with a first name (no feedback was provided)."

It is clear this is compatible with what I've argued: that recall over short intervals may be necessary to make a fast-decaying memory form soon, but recall over longer intervals is both necessary and sufficient to make a durable memory. If an answer must be supplied by the participants themselves, then we have to give them a flimsy memory first so that they have a chance of supplying the answer at the longer intervals. However, if we do have corrective feedback, and we care about building durable memories efficiently, we may be able to save a great deal of time by skipping that first step.
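The propagation argument can be illustrated with a deliberately simple toy model (hypothetical; it is not FSRS and not the model used in the cited studies): recall probability after a lag t is exp(-t / S), each successful retrieval doubles S, and with no corrective feedback the first failed test loses the item for every later test. The lags are in arbitrary units, the parameters are made up, and both schedules have the same total lag.

```javascript
// Toy model (hypothetical, for illustration only):
//   P(recall after lag t) = exp(-t / S); each success multiplies S by `growth`;
//   without corrective feedback, one failure makes the item unrecallable forever.
function survivalWithoutFeedback(lags, initialStability, growth) {
  let stability = initialStability;
  let pSurvive = 1; // probability the item has been recalled on every test so far
  for (const lag of lags) {
    pSurvive *= Math.exp(-lag / stability); // must also succeed on this test
    stability *= growth; // on the all-success path, each success strengthens memory
  }
  return pSurvive;
}

// Same total lag (15 units), different shapes; parameters are made up.
const expanding = survivalWithoutFeedback([1, 5, 9], 5, 2);
const equal = survivalWithoutFeedback([5, 5, 5], 5, 2);
console.log(expanding.toFixed(3), equal.toFixed(3));
```

With these made-up parameters, about 32% of items survive all three tests under the expanding schedule versus about 17% under the equal one, which is the mechanism the quoted passage describes. The model says nothing about how durable the surviving memories are, and with corrective feedback a failed test would re-expose the answer rather than remove the item.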

It seems that what FSRS is currently doing is finding the optimal spacing to ensure that a flimsy memory becomes available now, then grows sound enough to last a while, and then becomes durable over the coming months or years. But this doesn't mean it finds the minimum investment of time needed to make memories durable over the coming months or years, because building flimsy memories now does not seem to be a prerequisite for building durable ones... unless we're being tested without any corrective feedback. In fact, the latter goal may require a lot less time than FSRS has already saved us. If what I've shown here is true, I don't know what the optimal tweak would be, but it might be as simple as a bias towards a large gap before the very first review, with low weight given to failed recalls on this first attempt. Or maybe it's an ability to handle large learning steps: if I were to copy what were effectively a few 30d learning steps with nothing in between, as in the Bahrick study I posted an image of, I have no idea what effect this would have on FSRS's behavior.

@Expertium
Collaborator

Jarrett actually experimented with a different kind of scheduling, in which retention is not held at some target level; instead, the schedule maximizes the growth rate of memory stability, i.e. the maximum memory stability in the least amount of time. But he said it didn't work reliably enough.

@aedoncassiel
Author

aedoncassiel commented Jan 8, 2025

Just for the sake of argument, and so I can understand the implications for FSRS, suppose I took the extreme step of copying Bahrick 1979's "30d 30d 30d 30d 30d 30d" learning steps for new Spanish vocabulary. What effect do you believe this would have on FSRS at the end of those six months? From the algorithm's point of view, would there be an optimal way to approximate this, say, by setting two 30d intervals so that cards stay at 30d while I mark passes or fails for a while, instead of actually setting multiple 30d steps? I'm imagining that this way you could see FSRS's prediction at each test and decide at what point to hand things over to the algorithm. Supposing large steps like these do successfully create durable memories over time, even with some failed retrievals in the mix, at what point would FSRS "learn" that this is the case and recommend similar intervals on its own (by which I mean repeated large intervals on new items despite a few early failed retrievals)?
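As a partial, hedged answer to the "what would FSRS predict at each test" part: the sketch below computes only the predicted recall probability at a first test 30 days after learning, assuming the FSRS-4.5/FSRS-5 forgetting curve and the initial stabilities w[0..3] from the custom deck config quoted earlier in this thread. It deliberately leaves out the stability and difficulty updates after each review, so it says nothing about the later 30d tests.

```javascript
// Minimal sketch, assuming the FSRS-4.5 / FSRS-5 forgetting curve
// (DECAY = -0.5, FACTOR = 19 / 81). Stability updates after reviews are not modeled.
const DECAY = -0.5;
const FACTOR = 19 / 81;

// Initial stabilities in days, w[0..3] from the custom deck config in this thread.
const initialStability = { again: 0.7697, hard: 0.7697, good: 1.3614, easy: 2.0597 };

// Predicted probability of recall `elapsedDays` after a review, given stability.
function retrievability(elapsedDays, stability) {
  return Math.pow(1 + (FACTOR * elapsedDays) / stability, DECAY);
}

for (const [firstRating, s] of Object.entries(initialStability)) {
  console.log(`${firstRating}: predicted R at 30 days = ${retrievability(30, s).toFixed(2)}`);
}
```

Under these assumptions the model would expect roughly 31% (Again/Hard), 40% (Good) and 48% (Easy) recall at that first 30-day test, so it would treat such a step as a very low-retention regime. Whether re-optimizing the parameters on months of such reviews would eventually make FSRS propose similar intervals by itself is not something this sketch can show.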

4 participants