


The Probabilistic Scene Model (PSM) is a system for recognizing scenes. It models a scene through its objects and the relative poses (relations) between them. The relations can be dynamic, and each object can serve as a reference object.

The system consists of a training subsystem that trains new scenes and a scene inference subsystem that calculates scene probabilities of previously trained scenes.


The Learner collects objects from a database and learns a Gaussian mixture model (GMM) with expectation maximization. The GMM represents the relation between two objects. The training process is done offline.
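To illustrate the idea of fitting a GMM with expectation maximization, here is a minimal sketch in plain Python. It is not the package's implementation: it is reduced to one dimension (for example, the distance between two objects), whereas the learner works on full relative poses. The function name fit_gmm_1d, the sample values and the initial means are assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(samples, initial_means, iterations=50, min_var=1e-9):
    """Fit a 1D Gaussian mixture with EM (illustrative sketch only)."""
    n = len(samples)
    k = len(initial_means)
    means = list(initial_means)
    # Start every kernel with the overall sample variance and equal weight.
    overall_mean = sum(samples) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in samples) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each kernel for each sample.
        resp = []
        for x in samples:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against division by zero
            resp.append([j / total for j in joint])

        # M-step: re-estimate weight, mean and variance of each kernel.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # kernel explains no sample; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj
            variances[j] = max(var, min_var)

    return weights, means, variances


# Hypothetical distances (in m) between two objects, observed in two clusters.
distances = [0.09, 0.10, 0.11, 0.29, 0.30, 0.31]
weights, means, variances = fit_gmm_1d(distances, initial_means=[0.0, 0.5])
```

With these samples the two kernels settle on the two clusters, so the learned means end up close to 0.10 and 0.30.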


Figure 1: The learner uses observed object trajectories as the basis for its calculations. Here, the blue lines indicate the movement of the cup and plate respectively. Alternatively, combinatorial optimization can be used to find a set of relations; see below.

The learner also learns probability tables. They represent the probability that an object was not observed in some of the steps of the trajectory used for learning ("Object Existence") or that an object was misclassified as another by the independent object recognizer ("Object Appearance"). The current implementation assumes that every object was visible and correctly classified in every step.

Output: XML file that contains the learned Object Constellation Models (OCM, a modified variant of the Constellation Model, https://en.wikipedia.org/wiki/Constellation_model), Object Appearance and Object Existence for the new scene.


Figure 2: An example of the models created by the learner. The XML file contains the scene model of a breakfast scene with a coffee box and a cup. The model consists of a foreground and a background scene. The foreground is made up of the two objects, each containing the parameters for the three terms of the OCM.

The Inference calculates the a posteriori probability of every scene in the learned scene description XML file, and of the background scene, given a list of object_msgs. The probabilities are floats that sum to 1. The inference can be done online.
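The statement that the scene probabilities sum to 1 follows from Bayes' rule: each scene's likelihood is weighted by its prior and divided by the total over all scenes. The following sketch only shows this normalization step. The function name scene_posteriors, the uniform default prior and the likelihood values are assumptions for illustration, not the package's API or real results.

```python
def scene_posteriors(likelihoods, priors=None):
    """Turn per-scene likelihoods P(evidence | scene) into posteriors.

    likelihoods: dict mapping scene name -> likelihood of the evidence.
    priors: dict mapping scene name -> prior; uniform if omitted (assumption).
    """
    names = list(likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}

    # Unnormalized posterior: likelihood times prior.
    joint = {name: likelihoods[name] * priors[name] for name in names}

    # The evidence term is the sum over all scenes, including the background.
    evidence = sum(joint.values())
    if evidence <= 0.0:
        raise ValueError("all scenes have zero probability for this evidence")

    return {name: value / evidence for name, value in joint.items()}


# Hypothetical likelihoods for one learned scene and the background scene.
posteriors = scene_posteriors({"breakfast": 0.006, "background": 0.002})
```

Here the breakfast scene receives about three quarters of the probability mass and the background scene the rest.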

It uses the GMMs (representing a Bayesian network) and the probability tables.

Since the PSM is a parametric model and the learned models are stored in a human-readable XML file, the parameters of the model can be edited in a text editor. There is no need to train a whole new model if, for example, an object should be exchanged for a new one.

Compare asr_psm_visualizations and the general asr_psm tutorial for how the results are visualized.


Needed packages

Needed software

Needed hardware

Start system

Learner: Start ptu_driver, the recognition manager, the object detectors and the recorders. Then edit the parameters in learner.launch and finally launch it.

Inference: For online evaluation of the PSM start inference.launch.

You can also run the inference system through the ProbabilisticSceneRecognition::SceneInferenceEngine. It uses asr_msgs::AsrObject messages and calculates the scene probabilities of the scene that is given via a ROS parameter. Take a look at asr_recognition_prediction_psm/psm_node_server.cpp for an example.


The starting process is the same for simulation and real operation. For simulation, use rosbag play or the asr_ism fake_data_publisher to play back prerecorded or artificially created files.


Use the camera and object recognizers to create object messages.

ROS Nodes

Subscribed Topics


/stereo/objects (asr_msgs::AsrObject) for evidences



*The parameters kernels_min, kernels_max, runs_per_kernel, synthetic_samples, interval_position, interval_orientation and attempts_per_run can be specified for each scene separately by setting "<scene name>/<parameter name>". If unspecified, the default (without added "<scene name>/") is used. This can be done for each parameter separately.
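The per-scene override rule can be sketched as a simple lookup: first try "<scene name>/<parameter name>", then fall back to the unscoped default. The helper resolve_parameter and the parameter values below are hypothetical and only illustrate the rule; the actual node reads its parameters from the ROS parameter server.

```python
def resolve_parameter(params, scene_name, parameter_name):
    """Return the scene-specific value if set, otherwise the default."""
    scoped_name = f"{scene_name}/{parameter_name}"
    if scoped_name in params:
        return params[scoped_name]
    # No scene-specific entry: use the default without the scene prefix.
    return params[parameter_name]


# Hypothetical parameter set: one default and one override for "breakfast".
params = {
    "kernels_min": 1,
    "kernels_max": 5,
    "breakfast/kernels_max": 3,
}

kernels_max = resolve_parameter(params, "breakfast", "kernels_max")  # override
kernels_min = resolve_parameter(params, "breakfast", "kernels_min")  # default
```

Because each parameter is resolved on its own, a scene can override kernels_max while still inheriting kernels_min.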



A set of relations between the observed objects can be selected via a heuristic based on how parallel the trajectories are to one another, as illustrated above. Alternatively, combinatorial optimization can be used to find the relations. During optimization, different relation sets are considered; which ones depends on the particular optimization algorithm. Each set is assigned a cost based on its average recognition runtime and its number of recognition errors. Both are measured by learning a model on the relation set and running recognition with it against a number of test sets, which are randomly created observations of the scene objects. In the end, a relation set with optimal cost is returned. This method takes much longer to learn than the heuristic method, but can result in better relation sets in terms of recognition errors.
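How a cost can rank relation sets is sketched below. The text only states that the cost depends on average recognition runtime and number of recognition errors; the weighted sum, the weights, the class Evaluation and all numbers are assumptions for illustration and may differ from the cost function the package actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Test result for one candidate relation set (hypothetical structure)."""
    relations: frozenset      # pairs of object names that are related
    avg_runtime_s: float      # average recognition runtime over the test sets
    errors: int               # number of recognition errors on the test sets


def cost(evaluation, runtime_weight=1.0, error_weight=1.0):
    """Assumed cost: weighted sum of runtime and error count (lower is better)."""
    return (runtime_weight * evaluation.avg_runtime_s
            + error_weight * evaluation.errors)


def best_relation_set(evaluations, **weights):
    """Pick the evaluated relation set with the lowest cost."""
    return min(evaluations, key=lambda ev: cost(ev, **weights))


# Two hypothetical candidates: a sparse set that is fast but error-prone,
# and a denser set that is slower but makes no errors.
candidates = [
    Evaluation(frozenset({("A", "B")}), avg_runtime_s=0.2, errors=3),
    Evaluation(frozenset({("A", "B"), ("A", "C")}), avg_runtime_s=0.5, errors=0),
]
selected = best_relation_set(candidates)
```

With equal weights the denser set wins because its extra runtime costs less than three errors; setting error_weight to 0 would instead favor the faster set.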

To use combinatorial optimization, set the optional parameter relation_tree_trainer_type to combinatorial_optimization, as in combinatorial_learner.launch. If the parameter is not provided, the default value tree is used, which refers to the heuristic. Alternatively, it can be set to fully_meshed. In that case, all possible relations are used, which results in a higher recognition runtime but the best possible recognition accuracy. In addition to several familiar parameters (compare above), a couple of new ones have to be provided, as in combinatorial_optimization.yaml:


Figure 3: Two test sets randomly picked from the trajectories (lower objects; one valid and one invalid)


Figure 4: There are relations between B and D as well as between C and D. B and C are called the parents of D. In graphs generated by combinatorial optimization, each node can have several parents, unlike in the trees from the heuristic method.
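The difference between a tree and a graph with several parents per node can be made concrete with a small sketch. The relations B-D and C-D are taken from the caption above; the edges from a root object A and the helper names are hypothetical.

```python
from collections import defaultdict


def parents_by_node(relations):
    """Map each child object to the list of its parent objects.

    relations: iterable of (parent, child) pairs.
    """
    parents = defaultdict(list)
    for parent, child in relations:
        parents[child].append(parent)
    return dict(parents)


def is_tree_like(relations):
    """True if no object has more than one parent, as in the heuristic's trees."""
    return all(len(p) <= 1 for p in parents_by_node(relations).values())


# A graph as combinatorial optimization might produce it: D has two parents.
graph_relations = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
# A tree over the same objects: every object has at most one parent.
tree_relations = [("A", "B"), ("A", "C"), ("B", "D")]

parents_of_d = parents_by_node(graph_relations)["D"]
```

For the graph, parents_of_d contains both B and C, while is_tree_like returns True only for the second relation list.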


Figure 5: A partial model. Compare to the complete model above. Only the OCM part is used here.


Figure 6: A short optimization history, visualized through circles representing relation sets. The size of a circle depends on the set's cost (middle value), its colour on the average recognition runtime in s (lower right value). The value in the lower left is the number of recognition errors. Dotted lines divide optimization steps. A bright blue circle indicates the set selected in each step, a dark blue one the final, optimal set. The blue line shows the path taken through the sets.


Figure 7: Intermediate visualization of the recognition results of a triangle shaped set against two test sets.


General usage introduction

2024-06-15 12:30