feat(autoware_lidar_centerpoint): added the cuda_blackboard to centerpoint #9453
…point Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
@knzo25 Thank you for your great work 🎉 Do you have any documentation for the cuda_blackboard package? A simple API reference and an overview of the design in a README would help in reviewing the PRs.
@kminoda
@kminoda
…terpoint Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
      std::bind(&LidarCenterPointNode::pointCloudCallback, this, std::placeholders::_1));
  pointcloud_sub_ =
    std::make_unique<cuda_blackboard::CudaBlackboardSubscriber<cuda_blackboard::CudaPointCloud2>>(
      *this, "~/input/pointcloud", false,
Would you write documentation for bool add_compatible_sub in the cuda_blackboard repository? It is difficult to tell from the current documentation whether this "false" value is OK.
  }
  inference();
  postProcess(det_boxes3d);
  return true;
Don't we also perform voxel size validation?
  return true;
  // Check the actual number of pillars after inference to avoid unnecessary synchronization.
  unsigned int num_pillars = 0;
  CHECK_CUDA_ERROR(
    cudaMemcpy(&num_pillars, num_voxels_d_.get(), sizeof(unsigned int), cudaMemcpyDeviceToHost));
  if (num_pillars >= config_.max_voxel_size_) {
    rclcpp::Clock clock{RCL_ROS_TIME};
    RCLCPP_WARN_THROTTLE(
      rclcpp::get_logger("image_projection_based_fusion"), clock, 1000,
      "The actual number of pillars (%u) exceeds its maximum value (%zu). "
      "Please considering increasing it since it may limit the detection performance.",
      num_pillars, config_.max_voxel_size_);
  }
  return true;
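The decision logic in the suggestion above can be isolated from CUDA and ROS for illustration. The following is a minimal host-only sketch, assuming the pillar count has already been copied back from the device; `pillarLimitReached` and `WarnThrottle` are hypothetical names that do not exist in Autoware, and the throttle only approximates the once-per-period behaviour of a throttled log call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: mirrors the comparison in the suggestion,
// `num_pillars >= config_.max_voxel_size_`. Reaching the limit means the
// pillar buffer is saturated and further points may have been dropped.
bool pillarLimitReached(unsigned int num_pillars, std::size_t max_voxel_size)
{
  assert(max_voxel_size > 0);
  return num_pillars >= max_voxel_size;
}

// Hypothetical stand-in for a throttled warning: lets a message through at
// most once per `period_ms` milliseconds. Time is passed in explicitly so the
// behaviour is deterministic.
class WarnThrottle
{
public:
  explicit WarnThrottle(std::int64_t period_ms) : period_ms_(period_ms) {}

  bool shouldEmit(std::int64_t now_ms)
  {
    if (!has_emitted_ || now_ms - last_ms_ >= period_ms_) {
      has_emitted_ = true;
      last_ms_ = now_ms;
      return true;
    }
    return false;
  }

private:
  std::int64_t period_ms_;
  std::int64_t last_ms_ = 0;
  bool has_emitted_ = false;
};
```

With a period of 1000 ms, a warning raised at t = 0 ms is emitted, one at t = 500 ms is suppressed, and one at t = 1000 ms is emitted again.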
Description
This PR is part of a series of PRs that aim to accelerate the Sensing/Perception pipeline through an appropriate use of CUDA.
List of PRs:
To use these branches, the following additions to autoware.repos are necessary:
Depending on your machine and how many nodes are in a container, the following branch may also be required:
https://github.com/knzo25/launch_ros/tree/fix/load_composable_node
There seems to be a bug in ROS where, if you send too many service requests at once, some are lost, and ros_launch cannot handle that.
Related links
Parent Issue:
How was this PR tested?
The sensing/perception pipeline was tested up to centerpoint for TIER IV's taxi using the logging simulator.
The following tests were executed on a laptop equipped with an RTX 4060 (laptop) GPU and an Intel(R) Core(TM) Ultra 7 165H (22 cores)
Notes for reviewers
The main branch that I used for development is feat/cuda_acceleration_and_transport_layer.
However, the changes were too big, so I split them into separate PRs. That being said, further development, if any, will still happen on that branch (and then be cherry-picked to the respective PRs), and the review changes will be cherry-picked into the development branch.
Interface changes
An additional topic is added to perform type negotiation:
Example:
input/pointcloud -> input/pointcloud and input/pointcloud/cuda
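The naming pattern in the example can be sketched as a small helper. This is only an illustration of the convention shown above; `negotiatedTopics` is a hypothetical function and not part of the cuda_blackboard API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: derives the pair of topic names from the example above.
// The first element is the original topic, the second is the additional topic
// with the "/cuda" suffix.
std::pair<std::string, std::string> negotiatedTopics(const std::string & base_topic)
{
  assert(!base_topic.empty());
  return {base_topic, base_topic + "/cuda"};
}
```

Calling `negotiatedTopics("input/pointcloud")` yields `input/pointcloud` and `input/pointcloud/cuda`, matching the example.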
Effects on system behavior
Enabling this preprocessing in the launchers should considerably reduce latency and CPU usage, at the cost of higher GPU usage.