Hw3 #418 (Open)
george-qi wants to merge 5 commits into harvard-cs205:HW3 from george-qi:HW3

Below, please find my results:

#0: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz on Apple
#1: Intel(R) Iris(TM) Graphics 6100 on Apple

Best one: configuration ('coalesced', 512, 128): 0.00284936 seconds

Coalesced reads (time in seconds):

| workgroups \ num_workers | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|
| 8 | 0.14073944 | 0.07071592 | 0.04490056 | 0.02539208 | 0.01459816 | 0.00884136 |
| 16 | 0.07848448 | 0.03739232 | 0.02174264 | 0.01281008 | 0.00894688 | 0.00466936 |
| 32 | 0.04298944 | 0.02368376 | 0.01368224 | 0.00891504 | 0.00464296 | 0.00299816 |
| 64 | 0.02353056 | 0.01384016 | 0.00888672 | 0.00468648 | 0.00326712 | 0.00355136 |
| 128 | 0.02740848 | 0.01566936 | 0.0089792 | 0.00475928 | 0.00319496 | 0.0034688 |
| 256 | 0.02341696 | 0.01217416 | 0.00652944 | 0.00355272 | 0.00321208 | 0.002994 |
| 512 | 0.02225928 | 0.0117604 | 0.00665648 | 0.00359816 | 0.00299704 | 0.00284936 |

Blocked reads (time in seconds):

| workgroups \ num_workers | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|
| 8 | 0.14570672 | 0.08341456 | 0.05593968 | 0.03242192 | 0.01547184 | 0.00994824 |
| 16 | 0.07736544 | 0.04720448 | 0.03139616 | 0.01388616 | 0.00971104 | 0.00672608 |
| 32 | 0.04234944 | 0.02223264 | 0.0127568 | 0.00973904 | 0.00668352 | 0.00656008 |
| 64 | 0.02402304 | 0.01290296 | 0.00971152 | 0.006676 | 0.0063536 | 0.006968 |
| 128 | 0.02390144 | 0.0153476 | 0.01144752 | 0.00693616 | 0.00701288 | 0.00648784 |
| 256 | 0.01887544 | 0.01251208 | 0.00761376 | 0.00619296 | 0.00656528 | 0.00608576 |
| 512 | 0.018794 | 0.01230216 | 0.00884832 | 0.0061172 | 0.00630272 | 0.00573808 |
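
The difference between the two read patterns measured above can be illustrated with a small CPU-side model. This is a minimal sketch assuming the usual formulation of the exercise: with coalesced reads, worker i of N reads elements i, i+N, i+2N, ..., so neighbouring workers touch neighbouring addresses at the same step; with blocked reads, each worker reads one contiguous chunk. The function names and the ceil-division chunking rule are hypothetical illustrations, not the homework's kernel code.

```python
# Hypothetical CPU-side model of the two read patterns; not the HW kernels.


def coalesced_indices(worker, num_workers, n):
    """Indices read by `worker` when workers stride through the array."""
    # At step k the workers together read the contiguous range
    # [k * num_workers, (k + 1) * num_workers).
    return list(range(worker, n, num_workers))


def blocked_indices(worker, num_workers, n):
    """Indices read by `worker` when each worker owns one contiguous chunk."""
    chunk = (n + num_workers - 1) // num_workers  # ceil(n / num_workers)
    start = worker * chunk
    return list(range(start, min(start + chunk, n)))


def addresses_at_step(pattern, step, num_workers, n):
    """Addresses requested by all workers at the same step."""
    out = []
    for w in range(num_workers):
        idx = pattern(w, num_workers, n)
        if step < len(idx):
            out.append(idx[step])
    return out


print(addresses_at_step(coalesced_indices, 0, 4, 16))  # [0, 1, 2, 3]
print(addresses_at_step(blocked_indices, 0, 4, 16))    # [0, 4, 8, 12]
```

In the coalesced case the workers request adjacent addresses in the same step, which the memory system can typically serve with fewer transactions; in the blocked case simultaneous requests are a whole chunk apart. This is consistent with, though not proof of, the coalesced variant being faster in every configuration above with num_workers of 32 or more.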
-- PART 1

Maze 1:
Finished after 915 iterations, 213.29096 ms total, 0.233104874317 ms per iteration
Found 2 regions

Maze 2:
Finished after 531 iterations, 122.0564 ms total, 0.229861393597 ms per iteration
Found 35 regions

-- PART 2

Maze 1:
Finished after 529 iterations, 121.61456 ms total, 0.229895198488 ms per iteration
Found 2 regions

Maze 2:
Finished after 272 iterations, 62.71688 ms total, 0.230576764706 ms per iteration
Found 35 regions

-- PART 3

Maze 1:
Finished after 10 iterations, 2.81728 ms total, 0.281728 ms per iteration
Found 2 regions

Maze 2:
Finished after 9 iterations, 2.19712 ms total, 0.244124444444 ms per iteration
Found 35 regions

-- PART 4

Maze 1:
Finished after 10 iterations, 8.49528 ms total, 0.849528 ms per iteration
Found 2 regions

Maze 2:
Finished after 9 iterations, 7.49304 ms total, 0.83256 ms per iteration
Found 35 regions

A graph of my results from Parts 1-4 can be found in "maze_1.png" and "maze_2.png".

From the results above, it is clear that, on this hardware, having a single thread per workgroup perform the label lookups (Part 4) is not a reasonable choice: it is roughly 3-3.5x slower than Part 3 (0.85 ms vs. 0.28 ms per iteration on Maze 1, 0.83 ms vs. 0.24 ms on Maze 2). Note that the single thread does not change the number of iterations (10 and 9 in both parts); it only increases the time per iteration. The motivation for using a single thread is to avoid redundant global memory reads, but in exchange we lose the benefit of performing those reads in parallel.

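The slowdown discussed above can be checked directly from the Part 3 and Part 4 logs. The snippet only recomputes time per iteration from the reported totals and takes the Part 4 / Part 3 ratio; the dictionary is just a convenient container for the logged values.

```python
# Totals (ms) and iteration counts copied from the Part 3 and Part 4 logs above.
results = {
    "Maze 1": {"part3": (2.81728, 10), "part4": (8.49528, 10)},
    "Maze 2": {"part3": (2.19712, 9), "part4": (7.49304, 9)},
}


def per_iteration(total_ms, iterations):
    """Average time of one iteration in ms."""
    return total_ms / iterations


def slowdown(maze):
    """How many times slower Part 4 is than Part 3, per iteration."""
    p3 = per_iteration(*results[maze]["part3"])
    p4 = per_iteration(*results[maze]["part4"])
    return p4 / p3


for maze in results:
    # Prints 3.02x for Maze 1 and 3.41x for Maze 2.
    print(f"{maze}: single-thread version is {slowdown(maze):.2f}x slower per iteration")
```

Because both parts need the same number of iterations, this per-iteration ratio equals the ratio of total times.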
In this case, the savings from eliminating redundant reads do not outweigh the slowdown caused by serializing the work (one thread must process every pixel in the workgroup, remembering the last label it fetched). If the labels of neighboring pixels were more similar (i.e., more redundancy in the labels), this optimization would likely have fared better, since more trips to global memory, and their latency, would be skipped.

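To make the trade-off concrete, here is a hedged CPU-side model of the single-thread idea: one thread walks over every pixel of a workgroup and fetches that pixel's label from (simulated) global memory, but skips the fetch when the label equals the one it fetched last. The function name `single_thread_lookup` and the counting of "global reads" are hypothetical illustrations; the real kernel operates on OpenCL buffers, not Python lists. The point is that the number of reads saved depends entirely on how often consecutive labels repeat.

```python
def single_thread_lookup(labels, global_mem):
    """Resolve global_mem[label] for every label in a workgroup, as one
    thread would, caching only the most recent fetch (hypothetical model).

    Returns (results, number_of_global_reads).
    """
    results = []
    reads = 0
    last_label = None
    last_value = None
    for label in labels:
        if label != last_label:
            # Cache miss: go to (simulated) global memory.
            last_value = global_mem[label]
            last_label = label
            reads += 1
        results.append(last_value)
    return results, reads


global_mem = list(range(100))      # hypothetical table: label -> parent label
uniform = [7] * 8                  # all pixels share a label
varied = [1, 2, 3, 4, 5, 6, 7, 8]  # no repeated labels
print(single_thread_lookup(uniform, global_mem)[1])  # 1 read instead of 8
print(single_thread_lookup(varied, global_mem)[1])   # 8 reads, now done serially
```

With highly redundant labels a single read replaces many; with varied labels the thread still performs every read, only now one after another, which matches the slowdown observed in Part 4.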
This suggests that, on this GPU, the kernel is not limited by global memory reads: the redundant reads cost less than the serialization needed to remove them. If the kernel were instead memory bound (i.e., its runtime were dominated by global memory latency or bandwidth rather than by computation), a single thread could be a good choice. Relevant factors include the latency of global memory accesses, how many redundant reads the single thread actually eliminates, the number of iterations needed in the single-thread case, and so on.

-- PART 5

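First, atomic_min(p, val) atomically reads the 32-bit value stored at the location pointed to by p, computes the minimum of that value and val, stores the result back at that location, and returns the old value. Because the read, comparison, and write form one indivisible operation, no other thread can modify the location in between.
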
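The semantics described above (an indivisible read, min, and write that returns the old value) can be modelled on the CPU with a lock. This is only a model: the class `AtomicIntArray` is hypothetical, and a GPU implements atomic_min in hardware rather than with a mutex.

```python
import threading


class AtomicIntArray:
    """Hypothetical CPU model of an int buffer supporting atomic_min."""

    def __init__(self, values):
        self._values = list(values)
        self._lock = threading.Lock()

    def atomic_min(self, index, val):
        # Read, compare, and write happen as one indivisible step under the
        # lock; the previous value is returned, mirroring OpenCL's atomic_min.
        with self._lock:
            old = self._values[index]
            self._values[index] = min(old, val)
            return old

    def __getitem__(self, index):
        return self._values[index]


labels = AtomicIntArray([30])
threads = [threading.Thread(target=labels.atomic_min, args=(0, v)) for v in (20, 10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(labels[0])  # 10, regardless of which thread runs first
```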
Given this, if we used a plain min() instead of atomic_min() (i.e., a separate read, min, and write), one thread could overwrite the correct, smaller value that another thread had just written.

This is best described with an example. Say a location currently holds label X3, and two threads with smaller labels, X1 and X2, both compare against it, where X1 < X2 < X3. What should happen (and what does happen with atomic updates) is that the location ends up holding the minimum of the three, X1.

With min(), however, both threads may read X3 before either one writes. One thread might overwrite X3 with X1, and the other might then overwrite that with X2, which is higher than X1. In this example the stored label was unintentionally raised from X1 to X2, forcing at least one more iteration to lower it to X1. Thus, after a given iteration the result may be wrong, but at the very end we will still have a correct result (it may just take more iterations to reach it): each write is still the minimum of a label already in memory and a valid label, so a label never rises above its value at the start of the iteration, and the algorithm keeps making progress until it converges.

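The lost update in this example can be reproduced deterministically by writing out the interleaving by hand. In the sketch below each "thread" performs the non-atomic pattern (read the old value, then write min(old, mine)) split into a read step and a write step. The schedule, with both reads happening before either write and the X1 thread writing first, is an assumption chosen to trigger the race; the numbers 1, 2, 3 stand in for X1, X2, X3.

```python
X1, X2, X3 = 1, 2, 3  # labels with X1 < X2 < X3


def non_atomic_race(initial, first, second):
    """Both threads read before either writes; `first` writes, then `second`."""
    mem = initial
    seen_by_first = mem                 # thread holding `first` reads
    seen_by_second = mem                # thread holding `second` reads the same value
    mem = min(seen_by_first, first)     # writes the smaller label
    mem = min(seen_by_second, second)   # overwrites it: the first update is lost
    return mem


def atomic_updates(initial, *labels):
    """Each read-min-write is indivisible, so the order does not matter."""
    mem = initial
    for label in labels:
        mem = min(mem, label)
    return mem


after_race = non_atomic_race(X3, X1, X2)
print(after_race)                  # 2 (X2): still below X3, but not yet X1
print(min(after_race, X1))         # 1 (X1): fixed by the next iteration's update
print(atomic_updates(X3, X1, X2))  # 1 (X1) in a single iteration
```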
As a result, the number of iterations may increase, which could make the total computation time longer. On the other hand, min() is generally faster than atomic_min() (atomic operations on the same address must be serialized), so it is unclear which gives better overall performance: it comes down to which hurts more, the extra iterations or the slower atomic_min().

Review comment: nice