Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 32KB buffer for copyFile() (reduces copy time by 30%) #1749

Merged
merged 5 commits into from
Jun 14, 2024

Conversation

MahouShoujoMivutilde
Copy link
Contributor

@MahouShoujoMivutilde MahouShoujoMivutilde commented Jun 12, 2024

Fixes #1685

This makes :paste comparable in performance to cp --reflink=never for large files.

For large number of small files improvements are less substantial (compared to cp -r --reflink=never), though still noticeable.

In both cases the copy takes about 30% less time than with buf size at 4096.

32KB is the same number io.Copy() uses internally (when it can), and is about where improvements stop.

Testing of other possible values for buf size, relevant channels, and app.ui.draw(app.nav) interval during progress output, and their influence on copy performance:
#1685 (comment)

This makes `:paste` comparable in performance to `cp --reflink=never`
for large files.

For large number of small files improvements are less substantial
(compared to `cp -r --reflink=never`), though still noticeable.

In both cases the copy takes about 30% less time than with `buf` size at 4096.

32KB is the same number `io.Copy()` uses internally (when it can), and is about where improvements stop.

Context:
gokcehan#1685 (comment)
Copy link
Collaborator

@joelim-work joelim-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I think it would be cleaner and more idiomatic to enhance the io.Writer to include progress updates, so that it can be used directly with io.Copy.

Something like this:

type ProgressWriter struct {
	writer io.Writer
	nums   chan<- int64
}

func NewProgressWriter(writer io.Writer, nums chan<- int64) *ProgressWriter {
	return &ProgressWriter{
		writer: writer,
		nums:   nums,
	}
}

func (progressWriter *ProgressWriter) Write(b []byte) (int, error) {
	n, err := progressWriter.writer.Write(b)
	progressWriter.nums <- int64(n)
	return n, err
}


// add the following to the copyFile function
io.Copy(NewProgressWriter(w, nums), r)

I haven't tested the performance of io.Copy though - hopefully it also works well so that there's no need to have the low-level code for allocating a buffer and reading into it.

@MahouShoujoMivutilde
Copy link
Contributor Author

It works fine, but has problems with the error handling (see the end).

Tested with the same 4.6GB file, like before.

4096 buf: 24.601001088s

This PR as it is: 16.244729426s

Wrapped writer, io.Copy(): 15.705168349s

So pretty much the same, which is good.

Sanity check, using io.CopyBuffer(), which is the same as io.Copy(), except you can provide your own buffer:

io.CopyBuffer() with buf 4096: 17.685000156s
io.CopyBuffer() with buf 32768: 17.195739509s
io.CopyBuffer() with buf 4 (just 4): 16.687611596s

What??

It seems the provided buffer isn't used.

Adding logs to the key places of io.CopyBuffer() implementation yield this: myCopyBuffer: using wt.WriteTo(dst).

So it never gets to the buf part in the first place, however it still probably uses 32kb buffer internally (considering the similar speed).

I don't like this ambiguous buffer. But if you prefer it like that - sure.

Also, I don't know how to precisely replicate the current error handling behavior, because io.Copy() doesn't differentiate between general read and write errors.

I can match specific errors, but that's different - with the current implementation we check for "something (IDK what) is wrong with write", or "something is wrong with read".

BTW, somehow, io.ErrShortWrite, which is "ErrShortWrite means that a write accepted fewer bytes than requested but failed to return an explicit error", does not cover "not enough space left on device". So with this implementation a half-written file will be lost if space on device runs out. This should probably be something else, but what I've found to catch it is linux only. Overall, if we need to catch OS specific errors I'd rather use lf's implementation.

Anyway, here it is.

@joelim-work
Copy link
Collaborator

So looking at the Git history, this code was added in a single commit d6e9aec. Apart from 99734c7 where the buffer size was increased from 1024 to 4096, the code (including the error handling) hasn't really changed since.

Regarding the error handling in the original code, I don't know why the file is not removed if writing fails halfway, and I'm not sure if it's intentional or just a bug. So I was thinking that it would be OK to just close and remove the file for all error scenarios - in practical terms is there a use case for leaving the incomplete file there? How do cp and other file managers work?

@MahouShoujoMivutilde
Copy link
Contributor Author

Maybe if you copied a very large file and run out of space halfway through - you could copy the other half manually and stitch the parts together. Though, I've never done this. If that or incomplete transfers of any kind are a likely possibly - I would just use rsync.

  • cp throws "no space left" error and leaves what was written.
  • yazi just hangs (maybe waiting for you to clear space), leaving copied part of the file on exit
  • joshuto hangs for some time, then silently stops copy, keeping the file.

Dolphin does the sane thing and just checks if there is enough space before copy.

So I was thinking that it would be OK to just close and remove the file for all error scenarios

I agree. It's not like we delete everything that was copied (even if it's complete).

And if we're copying multiple files, having some files complete and one (more if other errors?) incomplete without immediate indication which file is broken is not a good user experience, IMO. Marking incomplete files as e.g. some-file.mkv.partial or just deleting on failure makes it obvious which file(s) user should try to copy again.

Copy link
Collaborator

@joelim-work joelim-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this patch is fine to merge now - users will appreciate a speedup in copying files, and the error handling looks reasonable (can be enhanced in the future if necessary).

I have also updated the PR description to link the original issue. Thanks once again for your changes.

@MahouShoujoMivutilde
Copy link
Contributor Author

Nice!

Do you want me to redo commit descriptions (mention now using io.Copy()) / squash?

@joelim-work
Copy link
Collaborator

We use squash when merging PRs anyway - all of the commits and messages will be squashed into a single commit on the master branch. The commit message will look something like this:

Use 32KB buffer for `copyFile()` (reduces copy time by 30%) (#1749)

* Use 32KB buffer for `copyFile()` (reduces copy time by 30%)

This makes `:paste` comparable in performance to `cp --reflink=never`
for large files.

For large number of small files improvements are less substantial
(compared to `cp -r --reflink=never`), though still noticeable.

In both cases the copy takes about 30% less time than with `buf` size at 4096.

32KB is the same number `io.Copy()` uses internally (when it can), and is about where improvements stop.

Context:
https://github.com/gokcehan/lf/issues/1685#issuecomment-2163934296

* Apply suggested changes

* Just delete incomplete an file on all errors

Let me know if you want to change anything, force push, etc.

@MahouShoujoMivutilde
Copy link
Contributor Author

Then I believe it's fine as it is.

While I could rewrite the message to reflect switching to io.Copy(), 32KB buffer is really the most important part, and is what's used internally. And with the wrapper used it's not like accelerated copy methods io.Copy() is capable of in certain cases (e.g. reflinks) will unexpectedly be used.

So you can merge it.

(progress UI updates don't seem to impact performance much)
@joelim-work
Copy link
Collaborator

Nice catch with the progress update frequency. I missed that, thanks.

@joelim-work joelim-work merged commit 2d3cc97 into gokcehan:master Jun 14, 2024
4 checks passed
@joelim-work joelim-work added the fix Pull requests that fix existing behavior label Jun 24, 2024
@joelim-work joelim-work added this to the r33 milestone Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Pull requests that fix existing behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Copy operations are (relatively) slow
2 participants