Issues with Host-Host communication #404

Open
juliangaal opened this issue Jan 9, 2025 · 6 comments

Comments

@juliangaal

Hi, and thanks for making zenoh available to the ros2 world!

Issue

I am having trouble setting up a host-to-host connection. A topic published at 20 Hz on the "talker" arrives at only 0.5-4 Hz on the "listener", as reported by ros2 topic hz.

I have tried to make my experiments as reproducible as possible, with docker, in this repo. In addition, this is a recording of the sensor to reproduce the network traffic.

Setup

  • switch: TP-Link TL-SG105, connected:
    • OS1-128 (11.11.11.10)
    • Laptop (11.11.11.5, "listener")
    • Jetson Orin (11.11.11.2, "talker")

Steps to Reproduce

Based on the included Justfile:

On both listener and talker:

  • build the necessary jazzy container: just build

On talker:

  • Start router: just start router
  • Download the file(s), save them in a single directory, and place that directory in the root of the repo. The bag file will be available in the container in /mydata/FILE. Launch a container to play the bag file: just start debug, then ros2 bag play -l /mydata/FILE

On listener:

  • Start router: just start router_listener. Adjust endpoint in routerconfig.json5 if IPs differ
  • Start subscriber: just start debug. Enter ros2 topic hz /ouster/points

Running a single router on the talker, with the session config on the listener pointing to that router, gave even worse results.

I'm having a hard time understanding how to debug from here. Thanks for your help!

@Yadunund
Member

Yadunund commented Jan 9, 2025

@imstevenpmwork could you try to reproduce this?

@Yadunund
Member

@juliangaal it would also help if you could provide a much simpler reproducible setup. The reported issue involves a Jetson, docker containers and rosbags, all of which add layers of complexity and make it hard to isolate the cause of the problem.

I would greatly appreciate it if you could provide a single repo we can compile, with talker and listener nodes that publish messages of the payload size you expect (if you know the number of bytes, you can just fill in an Image message with zero-initialized data fields). Reproducing the issue across two Ubuntu 24.04 machines with rmw_zenoh running natively would then be the first step. If we can reproduce the problem with this setup, we can look into it more closely.
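
As a rough illustration of the zero-initialized Image idea above, the sketch below computes field values for a sensor_msgs/msg/Image that carries a given number of zero bytes. The helper name `zero_image_fields` and the 4,000,000-byte payload are hypothetical and only for illustration; the field names (height, width, encoding, is_bigendian, step, data) are those of the Image message definition. It deliberately avoids importing rclpy so it can be run anywhere.

```python
def zero_image_fields(payload_bytes: int) -> dict:
    """Field values for a sensor_msgs/msg/Image holding payload_bytes zero bytes.

    Hypothetical helper: the image is laid out as a single row of 8-bit
    pixels, so step (bytes per row) equals the payload size.
    """
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return {
        "height": 1,
        "width": payload_bytes,
        "encoding": "mono8",          # one byte per pixel
        "is_bigendian": 0,
        "step": payload_bytes,        # bytes per row
        "data": bytes(payload_bytes), # zero-initialized payload
    }


# Assumed payload size for illustration only; use the size of the real message.
fields = zero_image_fields(4_000_000)
print(fields["height"], fields["width"], fields["step"], len(fields["data"]))
```

In a publisher node these values would be copied onto an Image message before publishing at the target rate; that part is left out here because it depends on the ROS 2 installation.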

@juliangaal
Author

I get your point. However, my host system (Jetson) does not support Ubuntu 24.04, which is the reason for my docker setup.

Let me work on simplifying the conditions needed for reproducibility and get back to you.

@clalancette
Collaborator

One other thing I will point out: ros2 topic hz has known problems properly computing the hertz of a publication, so I would take what it outputs with a grain of salt. If you ignore the output of ros2 topic hz, can you confirm whether data is showing up at the subscriber at the expected frequency?
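
One way to check the arrival rate independently of ros2 topic hz is to record a timestamp in the subscriber callback for every received message and compute the rate from the gaps. The sketch below shows only that computation; the function name `arrival_stats` is hypothetical, and the timestamps are assumed to be in seconds from a monotonic clock.

```python
from statistics import mean, pstdev


def arrival_stats(timestamps):
    """Summarize message arrival times (seconds, increasing order)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two arrival timestamps")
    # Time between consecutive arrivals.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg_gap = mean(gaps)
    if avg_gap <= 0:
        raise ValueError("timestamps must be increasing")
    return {
        "rate_hz": 1.0 / avg_gap,
        "min_gap_s": min(gaps),
        "max_gap_s": max(gaps),       # a large value hints at stalls or drops
        "gap_stddev_s": pstdev(gaps), # jitter between arrivals
    }


# Synthetic example: 21 arrivals spaced 50 ms apart, i.e. a steady 20 Hz stream.
stamps = [i * 0.05 for i in range(21)]
print(arrival_stats(stamps))
```

Looking at the maximum gap and the jitter alongside the mean rate helps distinguish a uniformly slow stream from one that arrives in bursts.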

@juliangaal
Author

I have heard of these issues before. Yet rviz shows the same behavior.

juliangaal added a commit to juliangaal/ouster_zenoh that referenced this issue Jan 10, 2025
juliangaal added a commit to juliangaal/ouster_zenoh that referenced this issue Jan 10, 2025
juliangaal added a commit to juliangaal/ouster_zenoh that referenced this issue Jan 10, 2025
@juliangaal
Author

I have done my best to simplify the setup. I have added a node that publishes the exact same payload (please ignore the details, I have used the opportunity to familiarize myself with the PointCloud2 format), and the corresponding subscriber.

Steps to Reproduce

Compile (on Talker and Listener)

mkdir -p ws/src && cd ws/src
git clone -b v2 git@github.com:juliangaal/ouster_zenoh.git
cd .. && colcon build

Listener (in my setup: Laptop)

Router

export ROS_DOMAIN_ID=10
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
ros2 run rmw_zenoh_cpp rmw_zenohd

Subscriber

export ROS_DOMAIN_ID=10
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
ros2 run ouster_simulator ouster_subscriber_node
ros2 topic hz /data

Talker (in my setup: Jetson)

Router

export ROS_DOMAIN_ID=10
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
export ZENOH_ROUTER_CONFIG_URI=routerconfig.json5 # Adjust IP Address
ros2 run rmw_zenoh_cpp rmw_zenohd

Publisher

export ROS_DOMAIN_ID=10
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
export ZENOH_ROUTER_CONFIG_URI=routerconfig.json5 # Adjust IP Address
ros2 run ouster_simulator ouster_publisher_node

Observations

  • subscriptions on talker: 20hz
  • subscriptions on listener: 20hz

At this point, I was thoroughly confused about what was going on, so I recorded a bag file with the simulated data: timeout 10 ros2 bag record /data. When playing this file on the talker and listening, I ran into some performance issues, but nowhere near as severe as with the original data recorded from the sensor driver.

  • 17-20hz, possibly lower with 2 subscribers (ros2 topic hz /data and rviz2). It typically starts at 20hz and then drops, and is definitely less stable than when subscribing directly to the publisher node.
  • rviz2 always reports ~12-14hz on listener

What has also become clear is that the original issue with the extremely low rate is probably due to the ouster driver publishing with Durability: TRANSIENT_LOCAL (see the output of ros2 topic info -v /ouster/points below). I have no idea why. Thank you for suggesting slimming down the issue, @Yadunund.

Node namespace: /
Topic type: sensor_msgs/msg/PointCloud2
Topic type hash: RIHS01_9198cabf7da3796ae6fe19c4cb3bdd3525492988c70522628af5daa124bae2b5
Endpoint type: PUBLISHER
GID: ce.b2.88.88.a4.7d.b7.28.bf.a8.14.d3.84.17.30.bc
QoS profile:
  Reliability: RELIABLE
  History (Depth): KEEP_LAST (10)
  Durability: TRANSIENT_LOCAL
  Lifespan: Infinite
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite

Subscription count: 0
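
To compare QoS settings between the driver and the simulated publisher, the output above can be parsed into a dictionary and checked for TRANSIENT_LOCAL. This is only a sketch: the function name `parse_qos` is hypothetical, it assumes the text layout shown above (an unindented "QoS profile:" line followed by indented "key: value" lines), and it handles a single endpoint; with several endpoints in one output, later values would overwrite earlier ones.

```python
def parse_qos(info_text: str) -> dict:
    """Extract the indented key/value pairs under 'QoS profile:'."""
    qos = {}
    in_qos = False
    for line in info_text.splitlines():
        stripped = line.strip()
        if stripped == "QoS profile:":
            in_qos = True
            continue
        if in_qos:
            # A blank or unindented line ends the QoS section.
            if not stripped or not line.startswith(" "):
                in_qos = False
                continue
            key, _, value = stripped.partition(":")
            qos[key.strip()] = value.strip()
    return qos


# Excerpt of the output shown above, pasted as a string.
sample = """\
Endpoint type: PUBLISHER
QoS profile:
  Reliability: RELIABLE
  History (Depth): KEEP_LAST (10)
  Durability: TRANSIENT_LOCAL

Subscription count: 0
"""

qos = parse_qos(sample)
if qos.get("Durability") == "TRANSIENT_LOCAL":
    print("publisher uses TRANSIENT_LOCAL durability, depth:", qos.get("History (Depth)"))
```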

Are you able to reproduce the slight performance drop issues with recorded, simulated data? Thanks
