Use 32KB buffer for copyFile() (reduces copy time by 30%) #1749
This makes `:paste` comparable in performance to `cp --reflink=never` for large files. For a large number of small files, the improvements are less substantial (compared to `cp -r --reflink=never`), though still noticeable. In both cases the copy takes about 30% less time than with a `buf` size of 4096. 32KB is the same size `io.Copy()` uses internally (when it can), and is about where the improvements stop. Context: gokcehan#1685 (comment)
Hi, I think it would be cleaner and more idiomatic to enhance the `io.Writer` to include progress updates, so that it can be used directly with `io.Copy`.
Something like this:
```go
// ProgressWriter wraps an io.Writer and reports the size of each write on nums.
type ProgressWriter struct {
	writer io.Writer
	nums   chan<- int64
}

func NewProgressWriter(writer io.Writer, nums chan<- int64) *ProgressWriter {
	return &ProgressWriter{
		writer: writer,
		nums:   nums,
	}
}

func (progressWriter *ProgressWriter) Write(b []byte) (int, error) {
	n, err := progressWriter.writer.Write(b)
	progressWriter.nums <- int64(n)
	return n, err
}

// add the following to the copyFile function
io.Copy(NewProgressWriter(w, nums), r)
```
I haven't tested the performance of `io.Copy` though - hopefully it also works well, so that there's no need for the low-level code that allocates a buffer and reads into it.
It works fine, but has problems with the error handling (see the end). Tested with the same 4.6GB file, like before.
4096 buf: 24.601001088s
This PR as it is: 16.244729426s
Wrapped writer, So pretty much the same, which is good. Sanity check, using
What?? It seems the provided buffer isn't used. Adding logs to the key places of So it never gets to the I don't like this ambiguous buffer. But if you prefer it like that - sure. Also, I don't know how to precisely replicate the current error handling behavior, because I can match specific errors, but that's different - with the current implementation we check for "something (IDK what) is wrong with write", or "something is wrong with read". BTW, somehow, Anyway, here it is.
So looking at the Git history, this code was added in a single commit d6e9aec. Apart from 99734c7 where the buffer size was increased from 1024 to 4096, the code (including the error handling) hasn't really changed since. Regarding the error handling in the original code, I don't know why the file is not removed if writing fails halfway, and I'm not sure if it's intentional or just a bug. So I was thinking that it would be OK to just close and remove the file for all error scenarios - in practical terms is there a use case for leaving the incomplete file there? How do
Maybe if you copied a very large file and ran out of space halfway through, you could copy the other half manually and stitch the parts together. Though I've never done this. If that, or incomplete transfers of any kind, are a likely possibility, I would just use rsync.
Dolphin does the sane thing and just checks if there is enough space before copying.
I agree. It's not like we delete everything that was copied (even if it's complete). And if we're copying multiple files, having some files complete and one (more if other errors?) incomplete without immediate indication which file is broken is not a good user experience, IMO. Marking incomplete files as e.g.
I think this patch is fine to merge now - users will appreciate a speedup in copying files, and the error handling looks reasonable (can be enhanced in the future if necessary).
I have also updated the PR description to link the original issue. Thanks once again for your changes.
Nice! Do you want me to redo commit descriptions (mention now using
We use squash when merging PRs anyway - all of the commits and messages will be squashed into a single commit on the master branch. The commit message will look something like this:
Let me know if you want to change anything, force push, etc.
Then I believe it's fine as it is. While I could rewrite the message to reflect switching to So you can merge it.
(progress UI updates don't seem to impact performance much)
Nice catch with the progress update frequency. I missed that, thanks.
Fixes #1685

This makes `:paste` comparable in performance to `cp --reflink=never` for large files.

For a large number of small files, the improvements are less substantial (compared to `cp -r --reflink=never`), though still noticeable.

In both cases the copy takes about 30% less time than with a `buf` size of 4096.

32KB is the same size `io.Copy()` uses internally (when it can), and is about where the improvements stop.

Testing of other possible values for `buf` size, relevant channels, and `app.ui.draw(app.nav)` interval during progress output, and their influence on copy performance: #1685 (comment)