
Localization


What is Localization?

In simple terms, localization is just figuring out where the robot is. In the case of our rover, we need to know its position in 3D space as well as its orientation about each axis. Together, this information is known as a pose (see this page). We need this data at all times, and it must be updated frequently enough to keep up with the rover's motion. To get this information, we employ a variety of sensors:

GPS

GPS (Global Positioning System) is a system that uses signals from an array of satellites orbiting Earth to figure out where you are on the planet (read more about how this works here). We use a GPS unit mounted to the rover, which outputs our position on Earth in degrees latitude and degrees longitude.
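
Downstream code usually wants positions in meters rather than degrees. Here is a minimal sketch (not our actual implementation) of flattening latitude/longitude into local (x, y) meters around a fixed reference point using an equirectangular approximation; the coordinates in the example are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def latlon_to_xy(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Approximate local tangent-plane (east, north) meters near the reference."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    ref_lat, ref_lon = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    x = EARTH_RADIUS_M * (lon - ref_lon) * math.cos(ref_lat)  # east offset
    y = EARTH_RADIUS_M * (lat - ref_lat)                      # north offset
    return x, y

# Example: a point ~111 m north of a hypothetical reference point
print(latlon_to_xy(42.294, -83.710, 42.293, -83.710))  # -> (0.0, ~111.2)
```

This approximation is only valid near the reference point, which is fine for a rover operating over at most a few kilometers.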

TODO: add image of lat lon coords on globe

TODO: add an image of a GPS unit

TODO: copy/adapt David's RTK wiki page (and any other useful pages) from the old mrover workspace into here

IMU

An IMU (Inertial Measurement Unit) is a device that contains several sensors which are used in combination to measure orientation and movement. A 9-DoF (Degrees of Freedom) IMU like the one we use consists of:

  • A 3-axis gyroscope, which measures angular velocity about each axis
  • A 3-axis accelerometer, which measures linear acceleration along each axis
  • A 3-axis magnetometer, which essentially acts as a compass and measures the direction of magnetic north

All of this data is then combined using sensor fusion algorithms (more on this later) to produce a bearing measurement, i.e. the angle of rotation about the Z axis.
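
As a rough illustration of the magnetometer's role, here is a minimal sketch (not our fusion pipeline) of turning a level magnetometer reading into a bearing. A real filter would also fuse the gyro and accelerometer (e.g. a Madgwick or Kalman filter), and the sign convention here is hypothetical.

```python
import math

def bearing_from_magnetometer(mx, my):
    """Heading in radians from magnetic north, assuming the sensor is level.

    mx/my are the horizontal components of the measured magnetic field.
    Sign conventions depend on the sensor's axis orientation; check yours.
    """
    return math.atan2(-my, mx)
```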

TODO: add an image of an IMU

TODO: add section about cameras and visual odometry

Guide to Localization Frames

When determining and defining where the robot is, we have to deal with an inherent tradeoff in data quality. A given data source can usually be either locally accurate or globally accurate, but it is much more difficult to get a single source of data that is both.

This problem is addressed by the standards introduced in REP 105 (which you should read to get a general overview). Unfortunately, the REP uses some misleading terms and gives a somewhat confusing explanation, so here is a hopefully clearer way of explaining it.

First, a few important notes and definitions:

local vs global accuracy

In the context of localization, we often talk about local and global accuracy, so we need to be very clear about what these terms mean. Locally accurate data must change smoothly, i.e. with no discrete/discontinuous jumps; however, it may drift over time without bound, gradually accumulating large errors. Globally accurate data, on the other hand, must not drift significantly over time, but may have discrete/discontinuous jumps at any time. These definitions are admittedly a little vague, since they depend on what counts as a short vs. long period of time, but in practice this isn't a problem since we rarely run into significant edge cases.
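
A toy numerical picture of the difference (with made-up noise parameters): a locally accurate source changes smoothly but accumulates a bias, while a globally accurate source stays near the truth but each reading can jump.

```python
import random

true_x = 0.0
local_x = 0.0  # e.g. integrated wheel odometry: smooth, but drifts
for _ in range(1000):
    true_x += 0.01                                  # robot creeps forward
    local_x += 0.01 + random.gauss(0.0005, 0.0001)  # tiny bias every step

# After 1000 steps, local_x has smoothly drifted ~0.5 m from true_x.
# A global reading has bounded error, but can jump between samples:
global_x = true_x + random.gauss(0.0, 0.3)
```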

abstraction of sensor data sources into transforms

For the purposes of this explanation, we will abstract away any sensor processing/fusion algorithms and just imagine all localization data sources as providing a transform telling us where the robot is located. A global sensor transform will be relative to a fixed frame, in this case the map frame. A local sensor transform on the other hand may be relative to any arbitrary starting point, so long as the transform obeys the rules of local accuracy.

map

Since pose is relative (see SE3 wiki entry), we need to define where the robot is relative to some "fixed" frame. Fixed in this case means it does not move relative to the thing/place we are navigating in, which we will call the world. We will use the "map" frame as our fixed frame, and the pose of the robot will be defined relative to that frame, i.e. it will be defined as a transform from map to the robot.

base_link

The base_link frame is simply the frame of the robot. It is rigidly attached to the robot, and in our case is located at the center of the chassis (TODO: is this true?).

odom

The odom frame is an intermediate frame in between map and base_link. It doesn't really have a physical representation, but it gives us a good way to separate local and global data sources. Odom essentially acts as a local reference frame for the robot. This means that the pose of the robot relative to the odom frame should always be locally accurate, but doesn't need to be globally accurate.

There are two transforms we need to define in order to connect these three frames, which thereby defines the pose of the robot:

odom_to_base_link

The transform from the odom frame to the base_link frame, which we will call odom_to_base_link, will be defined using locally accurate sensor data. This means the pose of the robot in the odom frame (which is exactly this transform) will be locally accurate, i.e. it can drift over time, but must always change smoothly and continuously without discrete jumps.
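
As a concrete sketch (assuming a ROS 1 setup with rospy and tf2_ros, which is what REP 105 targets), this is roughly what publishing odom_to_base_link from wheel odometry looks like; get_wheel_odometry() is a hypothetical helper that returns an integrated (x, y, yaw) estimate.

```python
#!/usr/bin/env python
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped
from tf.transformations import quaternion_from_euler

rospy.init_node("odom_publisher")
broadcaster = tf2_ros.TransformBroadcaster()
rate = rospy.Rate(30)  # publish fast enough to keep up with the rover's motion

while not rospy.is_shutdown():
    x, y, yaw = get_wheel_odometry()  # hypothetical: integrated encoder pose
    t = TransformStamped()
    t.header.stamp = rospy.Time.now()
    t.header.frame_id = "odom"
    t.child_frame_id = "base_link"
    t.transform.translation.x = x
    t.transform.translation.y = y
    qx, qy, qz, qw = quaternion_from_euler(0, 0, yaw)
    t.transform.rotation.x = qx
    t.transform.rotation.y = qy
    t.transform.rotation.z = qz
    t.transform.rotation.w = qw
    broadcaster.sendTransform(t)
    rate.sleep()
```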

map_to_odom

The transform from the map frame to the odom frame, which we will call map_to_odom, will be defined using globally accurate sensor data. Our goal here is to use this data to make the transform from map to base_link (which is equal to map_to_odom * odom_to_base_link) globally accurate, i.e. it will not drift over time, but it may change non-continuously with discrete jumps. We want to do this by changing only the map_to_odom transform, which means we have to first look up the current odom_to_base_link transform (by asking the TF tree) and then "subtract" it from our global sensor transform in order to determine the correct map_to_odom transform.
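
The "subtraction" above is really composition with an inverse. Treating 2D poses as homogeneous matrices for simplicity (the example values are hypothetical), a sketch of the computation:

```python
import numpy as np

def make_tf(x, y, yaw):
    """2D pose as a 3x3 homogeneous transform (3D works the same with 4x4)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

map_to_base_link = make_tf(10.0, 5.0, 0.3)   # from a global sensor (e.g. GPS)
odom_to_base_link = make_tf(9.2, 5.4, 0.25)  # from the TF tree (local data)

# map_to_base_link = map_to_odom @ odom_to_base_link
# => map_to_odom = map_to_base_link @ inv(odom_to_base_link)
map_to_odom = map_to_base_link @ np.linalg.inv(odom_to_base_link)
```

Publishing this map_to_odom transform whenever a new global measurement arrives is what produces the discrete jumps in the map frame while leaving the odom frame smooth.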

Using these frames in practice

The main benefit of this system is having an organized way of separately accessing local and global localization data. For applications where you need a locally accurate localization source, you can get the odom_to_base_link transform from the TF tree and use that as your robot pose. Similarly, for applications where you need globally accurate localization, you can get the map_to_base_link transform from the TF tree (TF computes it by chaining map_to_odom and odom_to_base_link) and use that as your robot pose (for exact syntax, see the wiki section about the TF tree or just Google it).
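
For reference, a minimal sketch of those two lookups with tf2_ros in ROS 1 (frame names per REP 105); rospy.Time(0) requests the latest available transform:

```python
import rospy
import tf2_ros

rospy.init_node("pose_reader")
buf = tf2_ros.Buffer()
listener = tf2_ros.TransformListener(buf)
rospy.sleep(1.0)  # give the listener a moment to fill the buffer

# Locally accurate pose: robot relative to the odom frame
local_pose = buf.lookup_transform("odom", "base_link", rospy.Time(0))

# Globally accurate pose: robot relative to the map frame
# (TF chains map -> odom -> base_link for you)
global_pose = buf.lookup_transform("map", "base_link", rospy.Time(0))
```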

Here's a simple example of using these localization frames: suppose we have a simple 4-wheeled robot equipped with a GPS unit and wheel encoders. The GPS provides a globally accurate source of localization data, while the wheel encoder data can be fed into a sensor processing algorithm to produce a locally accurate source of localization data.

In order to use these two data sources, our first step is to publish the wheel encoder localization data as the odom_to_base_link transform to the TF tree. Similarly, we will also use the GPS data to publish the map_to_odom transform to the TF tree, as described in the map_to_odom section. Once this is done, we can actually use this data for some autonomous routines.

Suppose we want to write a function that rotates the robot 90 degrees in place. Since this is a routine that won't take very much time, is based only on relative measurements, and needs to be quite accurate on a local scale, we want to use a locally accurate data source. To do this, we will get the odom_to_base_link transform from the TF tree, look at its rotation, and then send the robot drive commands until the rotation we read from odom_to_base_link has increased by 90 degrees.
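
A rough sketch of that routine, again assuming ROS 1 and a TF buffer built as above; send_turn_command() is a hypothetical function that commands a slow in-place rotation:

```python
import math
import rospy
import tf2_ros
from tf.transformations import euler_from_quaternion

def get_yaw(buf):
    """Current rotation about Z in the odom frame (locally accurate)."""
    q = buf.lookup_transform("odom", "base_link", rospy.Time(0)).transform.rotation
    return euler_from_quaternion([q.x, q.y, q.z, q.w])[2]

def wrap_angle(a):
    """Wrap to [-pi, pi] so comparisons work across the +/-pi boundary."""
    return math.atan2(math.sin(a), math.cos(a))

def turn_90_degrees(buf):
    target = get_yaw(buf) + math.pi / 2
    rate = rospy.Rate(20)
    while abs(wrap_angle(target - get_yaw(buf))) > 0.05:  # ~3 degree tolerance
        send_turn_command()  # hypothetical: spin the wheels in place
        rate.sleep()
```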

Now suppose we want to write a function to drive to a specific waypoint on our map, a mile away. Since this routine will take a while and is based on absolute measurements (relative to the map frame), we need to use a globally accurate data source so we can measure relative to the map frame without drift. To do this, we will get the map_to_base_link transform from the TF tree, look at its position, and then use a drive controller to drive the robot toward the waypoint until the position we read from map_to_base_link is close enough to our target waypoint.
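
And a matching sketch for the waypoint routine; drive_toward() is a hypothetical drive-controller step, and the 2 m tolerance is arbitrary:

```python
import math
import rospy

def drive_to_waypoint(buf, waypoint_x, waypoint_y, tolerance=2.0):
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        # Globally accurate pose: map -> base_link from the TF tree
        p = buf.lookup_transform("map", "base_link", rospy.Time(0)).transform.translation
        if math.hypot(waypoint_x - p.x, waypoint_y - p.y) < tolerance:
            return  # close enough to the target
        drive_toward(waypoint_x - p.x, waypoint_y - p.y)  # hypothetical controller
        rate.sleep()
```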
