[Documentation] [TitleIndex] [WordIndex

Description

Package implementing image-based machine learning with the MxNet platform. This code and documentation was developed for those working with the Rover Robotics OpenRover (TX1 or TX2). However, it should work for anyone with a compatible NVidia JetPack installed TX1 and TX2 (see below in install instructions).

Packages available

SSD Object Detection

Semantic Segmentation

Combined Segmentation and Monocular Depth Estimation

Note on Tegra Memory

Because the Tegra platform (and other NVidia embedded platforms) share main memory (RAM) between the CPU and GPU, you will see a doubling of the memory used if you enable the GPU in this node. The reason for this is all backends are now written assuming you have a separate GPU memory (with a PCIe graphics card, like a GTX1080), which is not the case for most embedded GPUs. That means the backend calls CUDA to copy the network and data over to the GPU, which on the Tegra is just another part of the same main memory, thus doubling the memory used. It may be possible for the backends to call CUDA differently (e.g. sharing the same memory location) but it isn't well-documented and it may be some time before someone with time and interest takes on the task.

Install from source

Requirements

If using a NVIDIA TX1

If using a NVIDIA TX2

Install ROS Package

cd ~/catkin_ws/src/
git clone https://github.com/RoverRobotics/rr_mxnet

Download Pre-Built MXNet Bindings (Ubuntu 18.04)

New for JetPack4.2 and TX2 with Ubuntu18.04. Rebuilt mxnet bindings:

Install dependencies:

sudo apt install python-scipy python3-scipy python-matplotlib python3-matplotlib
sudo pip install tqdm

Download and unzip in home directory, cd inside and run ./install_bindings.sh to automatically make links into the .local/lib/python[2.7,3.6]/site-packages/ directories. Keep this unpacked directory but remove the zipfile when unpacked. When finished, install gluoncv with pip:

pip3 install gluoncv --user
pip2 install gluoncv --user

Download Pre-Built MXNet Bindings (Ubuntu 16.04)

Unzip the TX and GluonCV bindings and copy to either the /usr/local/lib/python2.7/dist-packages/ or the ~/.local/lib/python2.7/dist-packages/ area. There may be additional dependencies required to load GluonCV, at minimum it requires scipy and the tqdm progress meter. After unpacking the relevant TX and GluonCV bindings, test they will load in python. Note, GluonCV is in active development and if you have issues, try the nightly build (pip install gluoncv --pre). This GluonCV snapshot zip (commit 6721429cb93637a0f9a7ec8a85deea4d43cb5d62) was verified to work with SSD and segmentation networks.

unzip mxnet_<target>_bindings_<jetpack_cuda_cudnn string>.zip
sudo mv mxnet_<target>_bindings_<jetpack_cuda_cudnn string>/mxnet-1.3.0-py2.7.egg /usr/local/lib/python2.7/dist-packages/.

unzip gluon-cv-0.3.zip
mv gluoncv ~/.local/lib/python2.7/dist-packages/

sudo apt install python-scipy
sudo pip install tqdm

Add a link to this package in easy-install.pth in the /usr/local/lib/python2.7/dist-packages/ subdirectory:

./mxnet-1.3.0-py2.7.egg

Install required python libraries:

sudo apt-get install python-opencv python-matplotlib python-numpy

Test MxNet Installation

python
>>> import mxnet as mx
>>> mx.__version___
'1.3.0'
>>> import gluoncv
>>> gluoncv.__version__
'0.3.0'
>>> print mx.nd.zeros((5,5),ctx=mx.gpu(0))

[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
<NDArray 5x5 @cpu(0)>

If you're working on a computer other than the TX1 or TX2, and/or if the above bindings didn’t work for you, follow the mxnet documentation on installation and if you are lucky, pip will work for you. If not, install from source.

Download and Run an Example Model

The model parameters for several SSD (single shot detection) networks are provided by GluonCV, based on MxNet. These will be downloaded automatically by specifying the relevant network in the launch file for fast and accurate object detection. Several example networks provided out-of-the-box with gluoncv include ResNet50, ResNet100, VGG16, mobilenet1.0 among others. See GluonCV's Detection Model Zoo for up-to-date models. Here is the relevant line in the launch file for loading a resnet50 (v1) based SSD, trained on the 20-class VOC dataset and intended to be run on 512x512 sized images:

    <param name="network" value="ssd_512_resnet50_v1_voc" />

Test the detector with an image being published to /usb_cam/image run:

roslaunch rr_mxnet example-resnet50-20classes.launch

View detections with rqt_image_view

rqt_image_view

Choose /rr_mxnet/image topic from the dropdown menu. If all is working well, you should see fresh images and possibly detections overlain on the image.

Troubleshooting

If you do not see output, first check the node started and check that mxnet is working. The node should output a line indicating it has initialized with the selected model file like so:

[INFO] [1537973019.312896]: [MxNet Initialized with model ssd_512_resnet50_voc]

If instead the node gives an error starting MxNet, the following areas should be checked: the MxNet bindings, GluonCV bindings, the model name or CUDA/cuDNN/NVidia driver setup.

Nodes

SSD Object Detection Semantic Segmentation

mxnet_ssd_node

The mxnet_ssd node runs a single shot detector on images from a subscribed image topic and publishes localization and classification predictions from a pre-defined list of classes. The localization consists of relative bounding box locations with respect to the full image size (xmin, ymin, xmax, ymax) and the classification consists of a number between 0 and 20 which is converted to a meaningful object string in the detections topic. The actual detection is performed on a 512x512 resized image that is cropped out of the subscribed image topic. This node can be configured to crop and resize a single image out of the image topic or it can be configured to perform a sliding-window set of crops on the image for better distance detection.

SSD is a single-shot detector, so why are we specifying a sliding window pattern?

To improve detection distance. The detection layers require meaningful features to produce detections and we don’t get useful features until ~16-fold reduction in the image. Therefore, to get the best distance detection, we also use a sliding window method to best match SSD’s capabilities with what we want out of it. However, this will take more time to complete each set of detections, so find the tradeoff that works best for you. Note, your camera image quality will be more important for longer distance performance. If you have a 5 megapixel image, try zooming in to a 500x500 crop of the image - if this is zoom is noticeably blurry, you will not get as much benefit from a sliding window. <show example images?> For the first-level sliding window we only need to specify the number of crops across the larger image dimension (typically width) as it will take crops that are sized to the smaller image dimension (e.g. height). For the second-level sliding window, we need to specify the number of crops across both dimensions and the crop size. The detector will work out the resulting overlap of the crops (depends on your application, you could aim for 20% at first) and report this in both dimensions on the node standard output. This will require a little trial and error to tune.

Image Sizes

A brief aside on image sizes, most often the x-dimension (width) of an image is larger than the y-dimension (height), with an aspect ratio of 4:3 being common (e.g. 640x480 images have an aspect ratio of 4:3). However, the detector can only accept square image inputs, so if you are not specifying a sliding window crop, you have two choices: 1) resize the non-square image to square, which will squish one dimension, or 2) center crop the non-square image to a square image with the same minimum dimension (e.g., a 640x480 image could be cropped to 480x480, discarding a column of 15% of the pixels on each side). It is possible someone will have a vertical camera and want to use this code, so we let you configure this. By default the code will center-crop.

Subscribed Topics

/rr_mxnet_ssd/enable std_msgs::Bool

/rr_mxnet_ssd/zoom std_msgs::Bool

/usb_cam/image_raw sensor_msgs::Image

/rr_mxnet_segmentation/segmentation_mask sensor_msgs::Image

/rr_mxnet_ssd/mask_overlap std_msgs::Int32

Published Topics

~/image sensor_msgs::Image ~/image/compressed sensor_msgs:CompressedImage

~/detections vision_msgs::Detection2DArray

~/detections/<class name> std_msgs::Bool

Parameters

~image_topic (String, default "/usb_cam/image_raw")

~detections_topic (String, default ~/detections)

~image_detections_topic (String, default ~/image)

~throttle_timer (Int32 or Float, default 5)

~latency_threshold (Int32 or Float, default 2)

~start_enabled (Bool, default False)

~publish_detection_images (Bool, default False)

~camera_frame (String, default "ssd_detector_frame")

~start_zoom_enabled (Bool, default False)

~mask_overlap (Int32, default 0)

MxNet Settings

~threshold (Float, default 0.5)

~batch_size (Int32, default 1)

~model_directory (String, default "~/mxnet/ssd")

~model_filename (String, default "ssd_512_resnet50_v1_voc.params")

~network (String, default "ssd_512_resnet50_v1_voc")

. Name of the network symbol to load. Choices include "ssd_512_resnet50_v1_voc", "ssd_512_resnet50_v1_coco", "ssd_512_resnet50_v1_custom" and similar with mobilenet1.0, vgg16. See GluonCV's Detection Model Zoo for a more complete list.

~enable_gpu (Bool, default false)

~classes (String, default "aeroplane, bicycle, bird, boat, bottle, bus,car, cat, chair, cow, diningtable, dog, horse, motorbike,person, pottedplant, sheep, sofa, train, tvmonitor")

Crop Pattern

~level0_ncrops (Int32, default 1)

~level1_xcrops (Int32, default 0)

<TODO: images of crop patterns>

~level1_ycrops (Int32, default 0)

~level1_crop_pixels (Int32, default 300)

Save Parameters

~save_detections (Bool, default False)

. Save detected images in jpeg form, with incrementing number.

~save_directory (String, default "/tmp")

. Directory to save detection images.

Example Launch File

<launch>
  <node name="rr_mxnet_ssd" pkg="rr_mxnet" type="mxnet_ssd_node.py" output="screen" >
    <param name="image_topic" value="/usb_cam/image_raw" />
    <param name="mask_topic" value="/rr_mxnet_segmentation/segmentation_mask" />
    <param name="detections_topic" value="~detections" />
    <param name="publish_detection_images" value="false" type="bool" />
    <param name="image_detections_topic" value="~image" />
    <param name="throttle_timer" value="5" type="int" />
    <param name="threshold" value="0.5" type="double" />
    <param name="start_enabled" value="false" type="bool" />
    <param name="start_zoom_enabled" value="false" type="bool" />
    <param name="level0_ncrops" value="2" type="int" />
    <param name="level1_xcrops" value="4" type="int" />
    <param name="level1_ycrops" value="2" type="int" />
    <param name="level1_crop_size" value="380" type="int" />
    <param name="enable_gpu" value="true" type="bool" />
    <param name="classes" value="aeroplane, bicycle, bird, boat, bottle, bus,car, cat, chair, cow, diningtable, dog, horse, motorbike,person, pottedplant, sheep, sofa, train, tvmonitor" />
    <param name="batch_size" value="1" type="int" />
    <param name="network" value="ssd_512_resnet50_v1_voc" />
    <param name="model_filename" value="ssd_512_resnet50_v1_voc.params" />
    <param name="save_detections" value="false" type="bool" />
    <param name="save_directory" value="/tmp" />
  </node>
</launch>

mxnet_segmentation_node

Semantic segmentation using pretrained segmentation networks from GluonCV, see segmentation model zoo for a list of relevant pre-trained segmentation networks.

Published Topics

~/segmentation_mask sensor_msgs::Image, latched output

~/segmentation_overlay sensor_msgs::Image

Subscribed Topics

/usb_cam/image_raw sensor_msgs::Image

/rr_mxnet_segmentation/run_once std_msgs::Bool

rostopic pub /rr_mxnet_segmentation/run_once std_msgs/Bool "data: true" -1

/rr_mxnet_segmentation/run_continuous std_msgs::Bool

Parameters

~image_topic (String, default "/usb_cam/image_raw")

~mask_topic (String, default "~segmentation_mask", latched=True)

~overlay_topic (String, default "~segmentation_overlay")

~run_continuous (Bool, default False)

~mask_values (String, default "9, 12")

~network (String, default "deeplab_resnet50_ade")

. Name of the network symbol to load. Choices include "vgg_resnet50_ade", "psp_resnet50_ade", "deeplab_resnet50_ade" and others. See GluonCV's Segmentation Model Zoo for a more complete list.

~enable_gpu (Bool, default false)

~throttle_timer (Int32, default 5)

~latency_threshold (Int32, default 2)

Example Launch File

<launch>
  <node name="rr_mxnet_segmentation" pkg="rr_mxnet" type="mxnet_segmentation_node.py" output="screen" >
    <param name="image_topic" value="/usb_cam/image_raw" />
    <param name="mask_topic" value="~segmentation_mask" />
    <param name="overlay_topic" value="~segmentation_overlay" />
    <param name="enable_gpu" value="true" type="bool" />
    <param name="avg_segmentation_frames" value="5" type="int" />
    <param name="run_continuous" value="false" type="bool" />
    <param name="mask_values" value="9, 12" />
    <param name="throttle_timer" value="5" type="int" />
    <param name="latency_threshold" value="2" type="int" />
    <param name="network" value="deeplab_resnet50_ade" />
  </node>
</launch>

mxnet_depseg_hydra_node

Combined monocular depth and semantic segmentation trained on CityScapes and KITTI datasets.

Published Topics

~/segmentation_mask sensor_msgs::Image, latched output

~/segmentation_overlay sensor_msgs::Image

Subscribed Topics

/usb_cam/image_raw sensor_msgs::Image

/rr_mxnet_segmentation/run_once std_msgs::Bool

rostopic pub /rr_mxnet_segmentation/run_once std_msgs/Bool "data: true" -1

/rr_mxnet_segmentation/run_continuous std_msgs::Bool

Parameters

~image_topic (String, default "/usb_cam/image_raw")

~mask_topic (String, default "~segmentation_mask", latched=True)

~overlay_topic (String, default "~segmentation_overlay")

~run_continuous (Bool, default False)

~mask_values (String, default "9, 12")

~network (String, default "deeplab_resnet50_ade")

. Name of the network symbol to load. Choices include "vgg_resnet50_ade", "psp_resnet50_ade", "deeplab_resnet50_ade" and others. See GluonCV's Segmentation Model Zoo for a more complete list.

~enable_gpu (Bool, default false)

~throttle_timer (Int32, default 5)

~latency_threshold (Int32, default 2)

Example Launch File

<launch>
  <node name="rr_mxnet_depseg_hydra" pkg="rr_mxnet" type="mxnet_depseg_hydra_node.py" output="screen" >
    <param name="image_topic" value="/usb_cam/image_raw" />
    <param name="mask_topic" value="~segmentation_mask" />
    <param name="overlay_topic" value="~segmentation_overlay" />
    <param name="enable_gpu" value="true" type="bool" />
    <param name="avg_segmentation_frames" value="5" type="int" />
    <param name="run_continuous" value="false" type="bool" />
    <param name="mask_values" value="9, 12" />
    <param name="throttle_timer" value="5" type="int" />
    <param name="latency_threshold" value="2" type="int" />
    <param name="network" value="UNetHydra18" />
  </node>
</launch>

Release Notes

This package is currently in pre-release meaning the code has been tested and is functional, but backwards compatibility is not guaranteed until v1.0.0, so make note of which version you are using and read the changelog carefully before updating.


2024-12-07 18:19