Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate CAN bus load #231

Closed
PeterBowman opened this issue Sep 7, 2019 · 13 comments · Fixed by #243
Closed

Investigate CAN bus load #231

PeterBowman opened this issue Sep 7, 2019 · 13 comments · Fixed by #243

Comments

@PeterBowman
Copy link
Member

PeterBowman commented Sep 7, 2019

Issue #217 targets the saturation of CAN buses due to the Cui encoders being configured in continuous send (i.e. push) mode. It is proposed that polling (i.e. pull) mode should be used instead, although we'd like to measure bus traffic beforehand to estimate the actual RX/TX load.

If we take a look into the PCAN user manual, we learn it supports event-like incoming CAN frames (in addition to actual data):

  • pcanfd_set_init() supports the PCANFD_INIT_BUS_LOAD_INFO flag via pcanfd_init struct:

    (...) the driver will periodically put STATUS messages in the rx fifo queue of this channel to inform the application of the current bus load the channel is connected to.

  • pcanfd_get_state() populates a pcanfd_state struct in which a bus_load 16-bit property is included.
  • pcanfd_recv_msg() (and "overloads") reads messages that may contains status information (pcanfd_msg.type holds PCANFD_TYPE_STATUS) and, specifically, bus load (PCANFD_BUSLOAD is set in pcanfd_msg.flags).

All of this concerns the Peak CAN bus device in particular. Device monitoring is proposed in #225 by means of the ICanBusErrors class. This solution entails dealing with YARP interfaces and propagating desired data through the CanBusControlboard layer and beyond, assuming we build a EasySetup-like GUI to rule everything status-related in a global context (having access to each node in all CAN buses).

Now, other solutions exist and do not meddle with the CanBusControlboard architecture. Credits to @rsantos88 for his research on these tools (read #217 (comment) and watch https://youtu.be/GRwmkx_YIIc):

This issue aims to extract the most of information we can from CAN (hey, I'm a wordplay ninja) using available tools. Then, tweak RX/TX rates (#217 regarding Cuis) and decide on how monitoring should be carried out (#225).

@jgvictores
Copy link
Member

Tiny grain of salt: technically, the Yokogawa oscilloscope we have can be used to monitor CAN messages (maybe https://www.youtube.com/watch?v=FqDPMUk1KQQ).

@PeterBowman
Copy link
Member Author

Martin Rostan, Configuration Guideline for CANopen Networks, provides very useful pointers on determining CAN bus load and several hints and guidelines on this matter. Our CAN design decisions should span to issue #232.

@PeterBowman
Copy link
Member Author

Idea: drop the whole zero-delay thing in CanBusControlboard's RW threads, assume users will set a tiny yet non-micro-nor-zero-delay (e.g. microseconds?), check that CPU usage does not rocket up, and embrace yarp::os::PeriodicThread for clarity (#191 (comment)).

void CanReaderWriterThread::setDelay(double delay)
{
this->delay = delay <= 0.0 ? std::numeric_limits<double>::min() : delay;
}

@PeterBowman
Copy link
Member Author

PeterBowman commented Jan 17, 2020

Idea: drop the whole zero-delay thing in CanBusControlboard's RW threads, assume users will set a tiny yet non-micro-nor-zero-delay (e.g. microseconds?), check that CPU usage does not rocket up, and embrace yarp::os::PeriodicThread for clarity (#191 (comment)).

Done at 908bc29, currently delaying 100 microseconds for both read/write threads (roboticslab-uc3m/teo-configuration-files@8c69d4c).

For completeness: previous yarp::os::Thread implementation raised up to 25-35% usage on a single CAN bus (0-1 millisecond), and almost 50% on a dual-bus configuration. The periodic thread alternative reaches around 20% and 35%, respectively (100-100 microseconds). Both tests at SYNC period of 30 milliseconds.

Edit: which raises up to 200% with a SYNC period of 5 milliseconds on a single arm...

@PeterBowman
Copy link
Member Author

I managed to perform position control via joystick with a SYNC period of just 2 milliseconds on TEO's right arm. CSP mode was running under the hood (#222). Setup (roboticslab-uc3m/kinematics-dynamics#173 (comment)):

  • yarpdev --device SpaceNavigator --period 1 --name /spacenavigator --ports "(mouse buttons)" --channels 8 --mouse 0 5 0 5 --buttons 6 7 0 1

  • launchCanBus --from manipulation-rightArm.ini (make sure syncPeriod 0.002)

  • yarpdev --device BasicCartesianControl --name /bcc --remote /teo/rightArm --ik st --from /usr/local/share/teo-configuration-files/contexts/kinematics/fixedTrunk-rightArm-fetch.ini --local /bcc/client

  • streamingDeviceController --streamingDevice SpaceNavigator --remoteCartesian /bcc --movi --gain 0.001 --SpaceNavigator::fixedAxes "(rotx roty rotz)" --period 0.001

@PeterBowman
Copy link
Member Author

Edit: which raises up to 200% with a SYNC period of 5 milliseconds on a single arm...

I'm unable to replicate this today. I decided to revert 908bc29 since this is a classical save-CPU-resources problem. I don't really need to run stuff at precise intervals nor any of the statistical data yarp::os::PeriodicThread provides. Also, it seems the yarp::os::Thread solution can spare a bit of CPU %.

PeterBowman added a commit that referenced this issue Jan 18, 2020
@PeterBowman
Copy link
Member Author

PeterBowman commented Jan 26, 2020

I implemented yarp::dev::ICanBusErrors in order to inhibit CAN TX operations, most notably the periodic SYNC messages:

//-- Query bus state.
if (!iCanBusErrors->canGetErrors(errors) || errors.busoff)
{
//-- Bus off, reset TX queue.
preparedMessages = 0;
return;
}

On power-off condition (i.e. emergency button pressed), we observed a constantly increasing CAN error count and a nearly 100% bus load via lspcan utility. It takes a single message written to the bus in order to saturate it. I presume there goes some kind of "bouncing" across dead nodes, or perhaps the power-off state is electrically detrimental for the bus. The canGetErrors check is therefore not enough, but I think there is no other means to diagnose whether power has gone off at any instant, and at least it prevents from commencing pointless CAN transfers.

@PeterBowman
Copy link
Member Author

PeterBowman commented Jan 27, 2020

@PeterBowman
Copy link
Member Author

PeterBowman commented Jan 27, 2020

Added new /<robot>/load:o port for streaming CAN bus load info, see 30a2927 and roboticslab-uc3m/teo-configuration-files@eb8f6d2. The yarpscope utility comes quite handy:

yarpscope --remote /teo/can/rightArm/load:o --min 0 --max 1 --persistent

@PeterBowman
Copy link
Member Author

I managed to perform position control via joystick with a SYNC period of just 2 milliseconds on TEO's right arm.

On this setup, (estimated) bus load is stable around 90%.

@PeterBowman
Copy link
Member Author

ASWJ we pick the following defaults: 20 ms for SYNC period, 10 ms for YARPs' controlboardwrapper period (roboticslab-uc3m/teo-configuration-files@933370a).

@PeterBowman
Copy link
Member Author

Added new /<robot>/load:o port for streaming CAN bus load info

Split into RX bus load, TX bus load, and overall bus load: cf1aeaa. We learned a CAN bus is half-duplex, therefore it actually makes sense to sum both terms: https://stackoverflow.com/a/58505043.

yarpscope --remote /teo/can/rightArm/load:o --min 0 --max 1.1 --index "(0 1 2)" --color "(Red Green Blue)" --persistent

@PeterBowman
Copy link
Member Author

Currently 68% CAN bus load at much lower CPU usage (albeit on a new and better PC) with a SocketCAN implementation: #251 (comment). For local inspection, the canbusload command is preferable and more accurate (accounts for stuffed bits if enabled).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants