3. KR&R within perception-action loops

3.1. Linking representation to perception and motion

In order to embody knowledge representation and reasoning into robot control systems we need to address the following aspects:

  • proposing suitable representations of motions that enable us to combine knowledge representation and reasoning with control engineering, cognitive psychology, and robot learning;
  • maintaining and acquiring environment and object representations that are detailed enough for object manipulation; and
  • maintaining and updating the symbolic representations by the    robot perception system.

Representation and reasoning about motions
Let us first look at suitable representations for robot body motions and their physical effects.  To develop such representations we can take inspiration of how humans accomplish their manipulation actions through the learning, planning, and execution of body motions, which is intensively investigated in a research field that Prinz, 2013 call action science. Action science lies in the intersection of several research disciplines including cognitive psychology, cognitive neuroscience, and sport science. These research threads contribute to the representation and reasoning about actions as they investigate more detailed and realistic models of manipulation actions.

For example, Flanagan, 2006 propose a widely used conceptual model for representing and reasoning about the motions that constitute human manipulation actions. This model is depicted in the previous chapter. In this model the body motions of manipulation actions are composed of motion phases. For example, fetching and placing an object is composed of a reaching, a lifting, a transport, and a release phase. Each of these motion phases has a motion goal. The goal of the reaching phase is that the object is grasped, of the lifting phase that the object is lifted off the supporting surface, of the transport phase that the object is at the destination, and of the releasing phase that the object has left the hand. The transitions between the motion phases are so-called force-dynamic events (Siskind, 2001). The force-dynamic events, which also cause distinctive perceptual events in the perceptual system of the robot agent are the contact of the hand with the objects, the hand feeling the gravity of the object, the collision of the object with the destination surface, and the hand losing the grip of the object. The individual motion phases also have knowledge preconditions. When starting the reaching motion the robot must have decided on the target pose, when starting the lifting motion the robot must have committed to a holding pose, and so on.

state transition vs constraint- and
        optimization-based control model

Fig. 1: Comparing the state transition system model with the constraint- and optimization-based control model of manipulation actions.

Another approach to represent the manipulation actions at the motion level is to represent the conceptual model underlying the constraint-and optimization-based control frameworks (Khatib, 1987; Smits, 2009) In constraint- and optimization-based control the bodies of robot agents are represented as chains of control frames (6D poses of key conrol points of the robot). Figure 1 compares the state transition model with the constraint- and optimization-based control model. While the logic approach represents the objects and effects, the control approach specifies constraints and objectives on motions. Therefore, the semantics of the manipulation actions, in particular the causal relation between motion parameterization and their physical effects, has to be encoded into the constraints and objectives for the motions.

Knowledge-enabled robot programming

When applying the Flannagan model the task of reasoning is to infer the motion parameters for the individual motion phases and in constraint- and optimization-based control the reasoning system has to infer the motion constraints and objective functions to be optimized. This can be accomplished through knowledge-enabled robot programming.

In the knowledge-enabled programming style programmers equip robots with a combination of generalized action plans and a knowledge base of general, modular knowledge pieces.  The basic idea of knowledge-enabled programming is to separate knowledge from the control program and to modularise it into small broadly applicable chunks, such as "grasp objects by their handles'', "hold filled containers upright'', "do not squeeze breakable items too hard'', etc. The program would ask: "How should I pick up the object in front of me?''. A reasoning mechanism would then collect the relevant knowledge units and combine them in order to propose the appropriate grasp. The primary advantage is that these knowledge pieces can be collected and that they can be reused by other robots accomplishing different tasks with different objects in different environments (Pratt, 2015). This reuse of generalized knowledge can, therefore, drastically accelerate the realisation of new robot applications. Figure~\ref{fig:generalized-knowledge} shows examples of such generalized knowledge chunks, which can be conveniently stated as logical axioms.

This idea of leveraging generalized plans for manipulation actions is also supported by a number of cognitive scientists including Schmidt (Drescher, 1991) suggest that movements within a class, such as fetch and place and pouring, share the same order of sub-movements and their relative timing. If these programs are to be executed in different contexts they are called generalized motion programs (or motion schemata) and include variable parameters (e.g., coordinates, body poses, timing, forces) are to be specified to the program. The parameterizations define the details of the movement and how it is executed for particular objects in particular contexts for different purposes. These generalized motion programs are called generalized motion plans if they cannot only be executed but also reasoned about and modified at execution time.

A generalized motion plan for a pouring action

Fig. 2: A generalized motion plan for a pouring action.

Figure 2 show a generalized motion plan for pouring actions. The plan consists of the signature of the plan and the body of the plan. The signature includes the name of the plan, which is pour and the formal parameters names theme, source, and dest specifying the destination of the theme to be poured out of the source container. The formal parameters are typed: the theme is some substance, the source is a container object that contains the theme, and the destination is a location.

The power and elegance of knowledge-enabled programming comes from facilitating the application and combination of relevant knowledge pieces for specific manipulation tasks. This enables robot agents to accomplish manipulation tasks in new contexts much more competently, flexibly, and robustly. (Note: As Pratt, 2015 puts it: "Robots are already making large strides in their abilities, but as the generalisable knowledge representation problem is addressed, the growth of robot capabilities will begin in earnest, and it will likely be explosive. The effects on economic output and human workers are certain to be profound.'')

Consider again the task of picking up an object. Instead of implementing the knowledge of how to pick up an object as a piece of code tailored for the respective context the knowledge of how to pick up objects is decomposed into a collection of generalized knowledge pieces as listed in Figure \ref{fig:generalized-knowledge}. This way the knowledge pieces can be applied to different robots, objects to be picked up, the tasks for which the object are to be picked up, and the scenes in which the objects are to be picked up.

Generalized knowledge for placing substances
Fig. 3: Generalized knowledge for placing substances.

Generalized plans are typically invoked with vague action descriptions such as "pour the popcorn onto a plate'', which might be stated as a symbolic description as follows:

    (perform
    (an action
    (type putting)
    (theme (some substance
    (type popcorn)))
    (destination (a location
    (on (an object (type plate))))))

The action description is vague or underdetermined because many pouring behaviors are possible executions of the action description. In Figure \ref{fig:making-popcorn} we have already seen two of them: the one-handed pouring of the popcorn from the bowl into the pot (upper row right) and the pouring of the popcorn from the pot onto the plate. Thus, in order to transform the underdetermined action description into a motion specification that can be executed by the generalized motion plan, the robot agent has to make a number of reasoned decisons based on current context possibly at planning or execution time. Examples of such decisions are which popcorn is meant and how much? The robot also has to choose a plate that is unused, clean, and has enough capacity.

In order to accomplish underdetermined action descriptions the robot agents have to be equipped with inference mechanisms that enable the robot agents to answer the body motion query:

how do I have to move my body
in order to
accomplish the given action description
for the current task
with the objects  and in the context
that I see or believe

When getting the instruction to pour popcorn onto the plate the robot has to first of all infer that it has to manipulate the container that contains the popcorn, rather than the popcorn itself as stated in the instruction. This requires the application of commonsense knowledge about the use of containers (Davis, 2017). The  robot also has to disambiguate "putting'' into  a pouring action, which again requires commonsense or the experience that pouring is more adequate than spooning with a scoop.

(perform
(an action
(type pouring)
(object-acted-on
(a container
(contains (some substance
(type popcorn)))))
(destination
(destination (a location
(on (an object (type plate))))))

After the original instruction is transformed into a description of the action to be performed, the robot has to decide on the body movements that it intends to realize the action with. To do so, the
robot has to infer the key movement constraints that ensure that the pot will not fall down, the popcorn on the plate and not spilled on the table including to hold the pot with two hands by its handles and
horizontally, to tilt the pot holding it in the right height and centered above the plate, and so on.  

(perform (an action
(type pouring)
(object-acted-on (an object ...))    (destination (a location...))
(graps-type ...)    (grasp-points ...)    (reaching-trajectory ...)
... )

An alternative is to specify an action in terms of its effects rather than in terms of its movement constraints and objectives by stating that the robot is to pour pancake mix onto a pancake maker such that the pancake mix forms a circle of a certain size as shown below. The advantage of such an action specification is that it gives more flexibility to the robot agent on how to perform the pouring. This
flexibility has to be paid for with the requirement of broader and deeper understanding of intuitive physics and the ability to regress effects into the motions that can achieve them.  

(perform  (an action
(type pouring)
(theme (a container
(contains (some substance (type pancake-mix)))))
(destination  (a location \\
(on (the object (type pancake-maker)))))
(results-in (an object
(type pancake)
(shape circular)
(size 12cm))) )