Question for the implementation of ParseFileParallel #29

Open
lch32111 opened this issue Oct 16, 2024 · 0 comments

lch32111 commented Oct 16, 2024

Hello.

I also have a question about the implementation of ParseFileParallel.
In the multithreaded configuration, each thread runs ProcessBlocksImpl over its own range of blocks, given by block_begin and block_end.

My concern is how the code handles a block whose buffer ends with an incomplete line.
For example, let's assume thread 2 is given block_begin 4 and block_end 8 in ProcessBlocksImpl. Here is a hypothetical OBJ file for this example:

# BLOCK 4 Start
v 0.0 0.0 0.0
...
# BLOCK 4 End

# BLOCK 5 Start
v 0.0 0.0 0.0
...
# BLOCK 5 End

# BLOCK 6 Start
v 0.0 0.0 0.0
...
# BLOCK 6 End

# BLOCK 7 Start
v 0.0 0.0 0.0
v 0.0 0.0 0.0
...
v 0.0 0.0
# BLOCK 7 End

# BLOCK 8 Start
0.0
v 0.0 0.0 0.0
...
# BLOCK 8 End

In this case, while processing BLOCK 7, the parser encounters the incomplete line v 0.0 0.0, which is missing one component of the vertex. I think the code does not handle this in the multithreaded case. In the single-threaded case, it does: the rest of the line is copied into back_buffer, tracked by the remainder variable, with stop_parsing_after_eol set to false.
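To make the carry-over idea concrete, here is a minimal sketch of it over an in-memory string. SplitLinesWithCarry and its carry string are hypothetical names made up for this illustration; this is not rapidobj's code, which works with back_buffer and remainder rather than a std::string.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not part of rapidobj): reads `text` in fixed-size
// blocks and carries an unfinished trailing line over into the next block.
std::vector<std::string> SplitLinesWithCarry(std::string_view text, std::size_t block_size)
{
    std::vector<std::string> lines;
    if (block_size == 0) return lines;  // avoid an endless loop

    std::string carry;  // incomplete line left over from earlier blocks

    for (std::size_t offset = 0; offset < text.size(); offset += block_size) {
        std::string_view block = text.substr(offset, block_size);
        std::size_t start = 0;
        for (;;) {
            std::size_t eol = block.find('\n', start);
            if (eol == std::string_view::npos) break;
            // Complete the line: carried prefix plus the part in this block.
            carry.append(block.substr(start, eol - start));
            lines.push_back(std::move(carry));
            carry.clear();
            start = eol + 1;
        }
        // Whatever follows the last newline is incomplete; keep it for later.
        carry.append(block.substr(start));
    }
    if (!carry.empty()) lines.push_back(std::move(carry));  // file without a final newline
    return lines;
}
```

With a block size of 20, the two-line input "v 0.0 0.0 0.0\nv 1.0 2.0 3.0\n" splits the second line across two blocks, and the carried prefix "v 1.0 " is joined with "2.0 3.0" from the next block.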

My guess is that the problem comes from stop_parsing_after_eol being set to true in the multithreaded case:

for (size_t i = 0; i != tasks.size(); ++i) {
    bool is_last = i + 1 == tasks.size();
    auto begin = tasks[i];
    auto end = is_last ? num_blocks : (tasks[i + 1] + 1);
    bool stop_parsing_after_eol = !is_last;
    auto chunk = &(*chunks)[i];
    threads.emplace_back(ProcessBlocks, source, i, begin, end, stop_parsing_after_eol, chunk, context);
    threads.back().detach();
}
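To see which block ranges this loop produces, its arithmetic can be replayed in isolation. The sketch below copies only the begin/end/stop_parsing_after_eol computation from the quoted loop; BlockRange and ComputeRanges are hypothetical names for this illustration, and the sample values in the note after the code are made up, not taken from a real run.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical struct for illustration; in the quoted loop these values are
// passed to ProcessBlocks as separate arguments.
struct BlockRange {
    std::size_t begin;
    std::size_t end;  // exclusive, as in `i != block_end`
    bool        stop_parsing_after_eol;
};

// Replays the range arithmetic of the quoted loop. `tasks[i]` is used as the
// first block index of thread i, as `auto begin = tasks[i];` suggests.
std::vector<BlockRange> ComputeRanges(const std::vector<std::size_t>& tasks, std::size_t num_blocks)
{
    std::vector<BlockRange> ranges;
    for (std::size_t i = 0; i != tasks.size(); ++i) {
        bool        is_last = i + 1 == tasks.size();
        std::size_t begin   = tasks[i];
        std::size_t end     = is_last ? num_blocks : (tasks[i + 1] + 1);
        ranges.push_back({begin, end, !is_last});
    }
    return ranges;
}
```

For the made-up input tasks = {0, 4, 8} and num_blocks = 12, this yields the ranges [0, 5), [4, 9) and [8, 12). Because of the + 1, the final block of each non-last range is also the first block of the next thread's range. The sketch only restates the arithmetic; it says nothing about what ProcessBlocksImpl does inside that shared block.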

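In the ParseFileParallel loop above, stop_parsing_after_eol is set to true for every thread except the last one. As a result, the last iteration of the block loop in ProcessBlocksImpl takes the else if (stop_parsing_after_eol) branch: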
for (size_t i = block_begin; i != block_end; ++i) {
    auto remainder = size_t{};
    bool last_block = (i + 1 == block_end) || reached_eof;
    if (!last_block) {
        file_offset = (i + 1) * kBlockSize;
        if (auto ec = reader->ReadBlock(file_offset, kBlockSize, back_buffer + kMaxLineLength)) {
            chunk->error = Error{ ec };
            return;
        }
    } else if (stop_parsing_after_eol) {
        if (auto ptr = static_cast<const char*>(memchr(text.data(), '\n', kMaxLineLength))) {
            auto pos = static_cast<size_t>(ptr - text.data());
            line = text.substr(0, pos);
            if (EndsWith(line, '\r')) {
                line.remove_suffix(1);
            }
            ++chunk->text.line_count;
            if (auto rc = ProcessLine(line, chunk, context); rc != rapidobj_errc::Success) {
                chunk->error = Error{ make_error_code(rc), std::string(line), chunk->text.line_count };
            }
        } else {
            ++chunk->text.line_count;
            auto ec = make_error_code(rapidobj_errc::LineTooLongError);
            chunk->error = Error{ ec, std::string(text, 0, kMaxLineLength), chunk->text.line_count };
        }
        return;
    }

When i reaches block_end - 1 (the last iteration), the else if (stop_parsing_after_eol) branch processes at most one line and then returns from ProcessBlocksImpl, leaving the rest of that block's text unparsed. Even if stop_parsing_after_eol were set to false for the other threads, more code would still be needed to handle the last line of BLOCK 7, which is missing an element. I think you would have to read the next block (BLOCK 8 in my example) and process one line from it to get the missing element.
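The kind of handling I have in mind can be sketched on an in-memory string. This is a common way to split line-oriented text across workers, shown under my own assumptions and not a description of what rapidobj does: each range parses every line that starts inside it, reads past its end to finish the last such line, and skips a leading partial line because the previous range finishes it. LinesOwnedByRange is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not rapidobj code). Returns the lines "owned" by the
// byte range [begin, end): every line whose first character lies in the range.
std::vector<std::string> LinesOwnedByRange(std::string_view text, std::size_t begin, std::size_t end)
{
    std::vector<std::string> lines;
    if (end > text.size()) end = text.size();
    std::size_t pos = begin;

    // A range that starts mid-line skips to the next line start; the range
    // that contains the start of that line is responsible for it.
    if (begin > 0 && begin <= text.size() && text[begin - 1] != '\n') {
        std::size_t eol = text.find('\n', begin);
        if (eol == std::string_view::npos) return lines;
        pos = eol + 1;
    }

    // Parse each line that starts before `end`. The search for '\n' is not
    // limited to `end`, so the last line may be completed from the next range.
    while (pos < end) {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string_view::npos) eol = text.size();
        lines.emplace_back(text.substr(pos, eol - pos));
        pos = eol + 1;
    }
    return lines;
}
```

Calling this for consecutive ranges such as [0, 10), [10, 20), and so on should return every line exactly once, including lines like v 0.0 0.0 / 0.0 that straddle a range boundary.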

I may be misreading the code, but I have been studying it for two days and this is still how it appears to work.
If you have any thoughts on this, please let me know.
