Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable cache #32

Open
mmoreram opened this issue Sep 4, 2020 · 26 comments
Open

Disable cache #32

mmoreram opened this issue Sep 4, 2020 · 26 comments

Comments

@mmoreram
Copy link
Member

mmoreram commented Sep 4, 2020

Cache is not useful anymore. In fact, it produces a lot of invalid and non expected behaviors (like having to work with dev or cleaning cache by hand each time we reload the server.

This task should be

  • Delete cache folder recreation, or, if that step is too complicated (in terms of code duplication), find a place where deleting the generated cache when it's completely filled.
  • Keep var/ folder for log files.
@nivpenso
Copy link
Contributor

nivpenso commented Feb 2, 2021

can you please be more specific when saying "cache is not useful anymore" ?

  1. Which caching parts are not useful?
  2. What are the side effects of having the current cache logic?
  3. What is the expected behavior from the async kernel when booted?

I'm asking it because some of the cache logic in the kernel is to "warmup" (prepare) some files that the environment will be able to deal with later on. (as explained here).
For example:
Symfony\Bundle\FrameworkBundle\CacheWarmer\RouterCacheWarmer

@mmoreram
Copy link
Member Author

mmoreram commented Feb 2, 2021

@nivpenso sure!

  1. In fact, all parts are now absolutely useless. Cache in Symfony is built to share pre-processed logic across requests. This is because memory is not shared among them all, so all this data that should be prebuilt like annotations, twig templates, doctrine metadata, container services architecture... is built exactly once, and written in disk in a way that you only need to require a file or call a method, and everything is done. In fact, all this pre-built data is precisely done for one and only one purpose: know how to build services with their in-memory data.
  2. What happens in Drift is that this cache is built once and again. As expected in Symfony. And each time something changes in terms of service definition, or a route... well, then Drift needs to rely on this cache logic on top of environments and console commands.
  3. As Drift shares memory across Requests, this pre-processing actions is only done once, just before the server starts listening the HTTP socket. Because it doesn't matter if we talk about 100ms or 1 second building this cache, we can delete this cache each time before starting the server (to avoid cache zombie files), and because we can delete the cache each time we want to start the server, then there's no need to have this cache. We don't even need to build it. At all.

So, what if we want to preload everything before we start listening to HTTP socket? Like something loaded from our database, or a file content, or a configuration file to avoid file reading each time we handle a request... then we can rely on this new kernel event named preload. This is a blocking event (can be blocking because the event loop has not even started), so you can do whatever you need, even using blocking actions.

You can see a bit more about this kernel.preload event in the documentation

So yes. Cache is not useful, and only adds some troubles when working on development (if you don't remember to delete the cache after some configuration changes).

The main goal of this PR is to find a way of, literally, do not generate this cache.

@nivpenso
Copy link
Contributor

nivpenso commented Feb 9, 2021

Here is where I got so far in my investigation after digging into symfony/http-kernel

Kernel::initializeContainer() is the function that initiates the cache for the container.
it first checks whether the cache dir exists or not - if not:

  1. it creates the cache directory
  2. create a lock file in the directory
  3. create annotation file (in here Kernel::buildContainer())
  4. compiling the container - this part doesn't include writing the files into the file system, and it doesn't set the kernel's container yet. It just preparing the files.
  5. dump the compiled container to the file system (cache folder)
  6. removing the lock file
  7. loading the compiled container from the file system

It is a little bit tricky to remove the cache from the file system because it used by the kernel to load the compiled container (from the file system).

I'm now investigating 2 approaches:

  1. writing the compiled container to PHP streams instead of file system
  2. immediately load all the compiled container to the kernel using eval() function

A different approach can be: just leave the cache directory as it is now, and make sure to delete it (if it exists) on kernel preload.
@mmoreram WDYT?

@mmoreram
Copy link
Member Author

mmoreram commented Feb 9, 2021

@nivpenso I like the container compilation into PHP streams. Basically because as soon as the kernel shuts down, this data is lost (expected behavior).

About the second option, I don't understand this approach. Can you explain a bit more?
And about the last one, delete the whole cache on preload, can be a good option for now, yes. I would go to avoid generating it, basically because is an amount of time we can win here (not much, but this is something).

Finally, take in account that this cache is not only used by the container, but as well by Twig, Doctrine... so the last option would solve them all.

@nivpenso
Copy link
Contributor

nivpenso commented Feb 9, 2021

I agree with you that working with streams sounds like a good solution but unfortunately, the kernel code for building the container and especially writing it into the filesystem is scattered across 2-4 different places and requires many code changes.

As of the 2nd approach, the compiled container is basically a group of PHP files that include things like configuration and other optimizations embedded within these files. So instead of dumping these files into the filesystem and load the files using require, we can use the eval() function on each one of the compiled container classes to load it into the memory. It is not that easy task of course, since the loading order is critical.

As more I think about it the more I agree with you. The 3rd approach of removing the cache directory on the server's preload state is the easiest idea to implement and will require the least changes and code duplication.

@nivpenso
Copy link
Contributor

nivpenso commented Feb 10, 2021

Just to update it seems that the preload kernel event can't help us here because the event is being fired after the container has created.

@mmoreram
Copy link
Member Author

Merged! Well done!

@raebbar
Copy link

raebbar commented Feb 22, 2021

Disabling the cache in general has a big impact on tests based on KernelTestCase. The kernel is rebooted for each individual test and the container is now rebuilt each time, which takes a long time in total.
In my case, 198 kernel tests ran in 10 seconds before the change, now they take over 30 minutes.

As we can see, the cache is not useless and should not be disabled, or disabling it should be optional.

@mmoreram
Copy link
Member Author

@raebbar you're. totally. right.

My appologies for that. I was completely focused on production, but in testing, working with the kernel cache, is something very useful.

We have 2 options here

  • Disabling the cache only on prod (which makes sense)
  • Adding a --no-cache flag in the server, and use it in the kernel. This change would explicitly modify the kernel adapter interface in the server package.

WDYT?

@nivpenso
Copy link
Contributor

@raebbar good catch!

@raebbar, Out of curiosity what is the reason to reboot the kernel for every test?

@mmoreram
Copy link
Member Author

@nivpenso each test should be isolated, so working with different kernels would have sense. The same happens when you have functional tests with a running Drift server, Several isolated servers would work with the same kernel definition, and rebuilding the same kernel once and again would result a dramatic decrease of performance.

@raebbar
Copy link

raebbar commented Feb 22, 2021

@nivpenso Just as mmoreram has already described, tests should be isolated in order to have a clear state at the beginning and not influence each other.

The question I am asking is whether we even need this logic for deleting the cache in the kernel.
php bin/console cache:clear --env=prod --no-debug
clears the cache already. If you use containers, you simply build this into the Docker entrypoint or you can always execute it directly before the php vendor/bin/server call. What do you think?

@mmoreram
Copy link
Member Author

@raebbar yes, you're right. Cache is something that is useful. But is only useful when you need velocity in terms of building and shutting up kernel instances. And this only happens in tests.

So, the question is.
Why don't we just keep the kernel and that's it? Why don't we just use rm -Rf var/cache/*?
Well, you're right. We could do that. But I can tell you that I've seen people an entire hour working against time, looking for the reason why a change is not seen in the server. And that was because the cache. And I've seen this scenarios dozens and dozens of times, both in Symfony and Drift.

In Symfony is something that you cannot bypass, basically because ALL requests re-build the same Kernel, and need this cache to do it efficiently. But in Drift, the cache is never reused across requests.

So.
My thoughts are that, in the main Drift experience, the cache should not be present at all. That will improve the developer experience so much.
In tests, cache makes sense, and I think that we should find a way (as invisible as possible) to keep using the cache only in tests.

@raebbar
Copy link

raebbar commented Feb 22, 2021

@mmoreram I understand your point ... ok let's make it convenient :-)

"Disabling the cache only on prod (which makes sense)" - doesn't solve the problem completely because I don't want to have a cache when developing locally (for example)

"Adding a --no-cache flag in the server" - With this option, I first have to look at the server package myself to decide whether it is the right way or not. You know that better.

@mmoreram
Copy link
Member Author

@raebbar what about working with cache ONLY in test environment?

@raebbar
Copy link

raebbar commented Feb 22, 2021

I think we can start with that.

Another possibility would be an environment variable to keep it flexible for all cases. By default, we would deactivate the cache unless the DRIFT_CACHE_ENABLED variable is not empty.
DRIFT_CACHE_ENABLED is then set in the .env.test.

@mmoreram
Copy link
Member Author

I like this idea as well. That makes it completely configurable, no matter the environment.
Said that, in that case, that option should be properly documented.

@nivpenso
Copy link
Contributor

Just wanted to go deeper into that isolation thing - I totally understand and agree that each test should not be affected by another. With that said, it doesn't mean that they can not share resources. An isolated environment for testing can also be done without restarting the server or rebooting the kernel each time.

Before trying to provide an in-framework solution. I think we need to understand better the problem we are facing here. The KernelTestCase class is a "Symfony" class, maybe it doesn't fit the Drift type of testing?

It is still not very clear to me why it is so important to reboot the kernel and load it again from cache instead of just working with the existing one? can you give an example of two tests that use the same kernel which one affects the other?

@raebbar
Copy link

raebbar commented Feb 22, 2021

Functional tests check the integration of the different layers of an application (from the routing to the response). Most of the application's services are used as they are. Only a few special ones may be mocked.
Any service with a state can change when a test is run. If tests are added or existing ones are changed, the initial state changes for subsequent tests.
You would have to reset the status manually for each test run. This is time-consuming, error-prone, sometimes impossible and requires constant maintenance when services change and creates many dependencies.
It is better to start a new "clean" system for each test.

@mmoreram
Copy link
Member Author

Just a thing here.

Drift can ONLY work with stateless services.
The thing to work with several kernels would be for parallelism features, but that's it.

@raebbar
Copy link

raebbar commented Feb 23, 2021

@mmoreram Just for clarification. I meant services in the sense of Symfony Container. And of course they can have a state.

@mmoreram
Copy link
Member Author

@raebbar nopes. In Drift (neither in any ReactPHP application) you can not use stateful services. Basically because you cannot guess the order of requests accesses.

The only state that a service can have in an emulation of a stack, like a list of logs, a repository or something similar.

@raebbar
Copy link

raebbar commented Feb 23, 2021

You have just given simple examples yourself. Would you say the Redis server implementation in ReactPHP by clue (https://github.com/clue/php-redis-server) has a state? Write your Varnish clone in reactphp, imaginable - state. Configs that are loaded at startup and can be changed at runtime - state. Statistics, metrics - state. DNS-Resolver mit Cache - state. Connections to external resources - state. And so on ...

I guess I know what your point was now, but saying "Drift can ONLY work with stateless services" is wrong in my opinion.

But our discussion is now a bit offtopic for this issue :-)

@mmoreram
Copy link
Member Author

@raebbar whan I talk about services, I mean this, state per request. So yes, can have state, but this state change will be shared per all request at any order.

Said that, I would go for an environment variable for this cache usage.
As soon as I have a PR, I will send it here :)

@mmoreram
Copy link
Member Author

mmoreram commented Mar 1, 2021

@raebbar @nivpenso

#45

WDYT?
It's a simple workaround that can work properly in dev environments.

@raebbar
Copy link

raebbar commented Mar 2, 2021

Looks good. Thank you for the implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants