Emily Newman
As a preface, I would like to ask forgiveness for the informal nature of this essay: cognitive robotics is a new subject for me, and I am still ruminating over the basic principles at hand. Thank you.
An overall element I feel is lacking in our current class discussion is the idea that cognition is the product of evolution, both across millions of years and within our own lifetimes. As has been appropriately noted, the process of evolution from the beginning of life until now is still a matter of speculation, and therefore cannot easily be imposed on different, non-genetic “species” such as robotic agents. Over our own lifetimes, however, this evolution, or increase in cognitive ability, is much better understood. If the purpose of cognitive robotics is to create artificial human cognition, I feel it is important to start at the beginning of our own lives on earth: we should regard our agents as new, joyful bundles of plastic and metal. Throughout our discussion thus far there has been talk of top-down and bottom-up processing, grounding, and third person perspective. These are all concepts I feel should and could be dealt with in due time, but perhaps that is a tad ambitious for the time being. Let’s take a few steps back and assess the practicality of thrusting our own adult human priorities and abilities upon our robotic agent versus somehow creating an agent that learns to work toward these same priorities and abilities. Given the task of emulating human cognition, I feel we are only doing ourselves a disservice by leaving out the crucial elements of maturing, growing, and learning. In this paper I will take a brief look at the issues of grounding symbols and first vs. third person perspective, with a focus on the biological relationship between our agent and a human child.
1. Grounding Symbols:
The issue of grounding symbols lies in the association between the conceptualizations underlying language and the external world experienced through sensori-motor perception. Humans, especially children, handle this process through many proposed mechanisms, such as embodiment and situatedness. In our current efforts, the robot must be pre-programmed even to attend to the same salient events as humans, which adds the undesirable element of top-down processing. As has been suggested, we process our environments using both top-down and bottom-up strategies. For example, receiving a reward such as money is salient to most humans because we understand the value and meaning behind such an event. It is impractical for us to impose this same type of expectation on our robotic agent without some serious top-down processing. Therefore, I propose a mechanism where we either adapt the focus of attention to what is salient for a robot (e.g., fuel or battery charging), or implement a learning mechanism where the “young” robot learns to prefer certain human priorities (such as emotional support). We cannot expect our robotic agent to ground symbols in a way that is meaningful to us if it does not prioritize the same salient interactions with its environment. I don’t believe that Steels has proven a solution to the problem by creating social lexical meanings. I do, however, believe he has showcased the potential for social convergence, or self-organization, of our robotic agents. Grounding will not be achieved until the robot has a meaningful feedback loop representing its environmental stimuli in a salient way.
2. 1st -> 3rd Person Representation:
As we discussed in class, I believe that the 3rd person representation is valuable for communication between robotic agents; however, I believe that perhaps biological cognition is best modeled starting only within human-robot interaction. When humans are young, their interaction is mostly with their parents, and even when presented with a peer, the child most likely would not recognize him/her as an intentional agent separate from him/herself (due to the underdevelopment of theory of mind at that age). Therefore, I believe that utilizing the 3rd person representation should also be tapped into only after the basic principles of communication are developed.
The overarching point I am trying to make is that leaving out the growth process, in which we learn and mature through our social interactions, is detrimental to the goal of creating artificial human cognition. I believe it would help us achieve that goal to follow the human developmental sequence, which goes as follows:
• Communicative effect: robot acts, human reacts
• Communicative inference: robot develops goal-directed behaviors, human infers intention and responds
• Intentional Communication: robot realizes power of communication and uses it deliberately
• Upping the Ante: human acknowledgment requires more precise vocalizations that resemble correct language
In order for this process to occur, it is imperative that we consider the following preconditions:
– Attention/shared attention: The robotic agent must first gain the ability to lock eye contact and follow the gaze of its caregiver.
– Spontaneous babbling: It is important for the robot first to be able to produce verbal utterances that can later be used both to infer meaning and to be molded into a verbal lexicon.
– Motivation: Throughout the interaction between the robot and its caregiver, the robot must be motivated (just as a child with his/her mother) to imitate/emulate the verbal abilities and behavior of its caregiver. It is within this crucial step that the agent will learn the tools necessary for social interactions.
– Realizing cause and effect: Now the robot has learned that producing certain causes, or verbalizations, will create behavior in its caregiver. For example, a wailing noise may cause the caregiver to immediately attend to the agent. Here, the robot will also start to “understand” that it is an intentional agent separate from its caregiver.
Now we have set up the beginnings of the foundation for social interactions and theory of mind. These are the biological steps that Steels argues (and I strongly agree) are essential for the true emulation of human cognition. It will take further discussion with those knowledgeable in the computer sciences to determine how this will actually be possible, which I hope to expound upon in the next paper. I think our most difficult obstacle will be creating a robotic agent that not only attends to its environment, but also spontaneously babbles in an effort to communicate. I do not believe that top-down processing or imitation of the caregiver should be avoided; these are important elements of human development that will strongly aid us in our efforts. Humans are social beings by nature, and we should not underestimate the powerful effects our development has on our current cognitive abilities.
Sources of food for thought:
Breazeal, C. (0000), A Motivational System for Regulating Human-Robot Interaction, Journal
Steels, L. (2003), The Evolution of Communication Systems by Adaptive Agents, Adaptive Agents and MAS, LNAI 2636 125-140.
Steels, L. (2006), ‘The Symbol Grounding Problem has been solved. So what’s next?’ Sony Computer Science Laboratory Paris, VUB AI LAB, Vrije Universiteit Brussel (1-18).
Comments (10)
Nicholas Davis said
at 12:36 am on Oct 6, 2008
"Therefore, I believe that utilizing the 3rd person representation should also be tapped into only after the basic principles of communication are developed" This spawned an interesting idea. What came to me was a gradual increase of usability of the third person perspective. For example, when the agent is created, it will have a third person representation, but maybe it won’t know how to do any work with it. Then gradually, it could learn that by interacting with the environment, this third person space changes, or it can decouple this space and plan things out. All these advances coming from gradually more complex cognitive operations.
Nicholas Davis said
at 12:41 am on Oct 6, 2008
Going along this same thread, and related to Sarah's paper (the relationship between c/m representations and 1st to third person representations), there are most likely some developmental milestones relating to the use of the first to third person representation. It would be interesting to take a look at Leland's list to see where the third person representation begins and how it is utilized in the different behavioral stages. A nice diagram tying all these ideas together would be optimal. It would help structure the computational aspect in a developmental manner.
mde10@... said
at 11:59 am on Oct 7, 2008
I have a potential algorithm for dealing with attention. It makes a few fundamental assumptions, but it does work.
Nicholas Davis said
at 12:55 pm on Oct 9, 2008
What are the specifics of that algorithm, Matt? You started explaining it to me the other day, but maybe it would be good to describe what is actually going on. I'll say what I remember. Basically, this algorithm deals with finding the probability that an input belongs to some underlying 'state.' For example, the agent could be viewing the actions of another agent, which would essentially be a string of inputs. The state could correlate with what the other agent is 'attending' to at that moment, and the algorithm would give some probabilities for what the other agent's attentional state could be. However, would we then have to state every possible thing that the other agent could be attending to? I'm still not sure about the specifics of this algorithm.
mde10@... said
at 1:19 pm on Oct 9, 2008
The algorithm is called a mixture model, and it's used to describe the probability of some state causing some action, that is, to connect cause and effect. The algorithm tries to create an association between a one-dimensional input (i.e., a series of numbers) and a number of states. It does this by defining the states as the medians of binomial distributions, such that when a specific state is active, the output of that state is the numbers which fall inside that binomial distribution. In order to better approximate the relationship between the states and the one-dimensional input, the algorithm uses gradient descent.
In an application of this algorithm, the input could be how shiny an object is, and the states could be "pay attention to it" and "do not pay attention to it." It can also be applied to shared attention, where the input would be various numbers representing the actions of another agent and the states would answer the question: should the robot pay attention to these actions or not?
One of the fundamental assumptions here is that, because the algorithm does not know the correct associations it should make, it can and does get the association wrong even if the correlation is correct. A human analogue would be someone who associates hugs with being repulsive: an incorrect association.
I'll be going more in depth on this topic on Friday.
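As a rough illustration of the idea described above, here is a minimal sketch of a one-dimensional, two-state mixture model fitted by gradient descent. It rests on several assumptions that are not in the description: it uses Gaussian components with a fixed, shared width and equal prior weight per state (rather than binomial distributions), it learns only the state centres, and the "shininess" scores, function names, and parameter values are all made up for the example.

```python
import math

def responsibilities(x, centres, sigma=1.0):
    """Probability of each state given one input x (equal state priors assumed)."""
    logs = [-0.5 * ((x - c) / sigma) ** 2 for c in centres]
    top = max(logs)  # subtract the max before exponentiating to avoid underflow
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]

def fit_centres(data, centres, sigma=1.0, lr=0.5, steps=200):
    """Gradient descent on the average negative log-likelihood (centres only)."""
    centres = list(centres)
    n = len(data)
    for _ in range(steps):
        grads = [0.0] * len(centres)
        for x in data:
            r = responsibilities(x, centres, sigma)
            for k, c in enumerate(centres):
                # d(-log p(x)) / d centre_k = -r_k * (x - centre_k) / sigma^2
                grads[k] -= r[k] * (x - c) / sigma ** 2
        centres = [c - lr * g / n for c, g in zip(centres, grads)]
    return centres

# Hypothetical shininess scores: three dull objects and three shiny ones.
shininess = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
fitted = fit_centres(shininess, centres=[3.0, 6.0])

# Probability of each state for a new, very shiny object.
state_probs = responsibilities(9.0, fitted)
```

Note that the fit only separates the inputs into two groups. Which group should be read as "pay attention" is not something the model can determine by itself, which is the caveat about incorrect associations raised above.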
Matthew McCroskey said
at 1:41 pm on Oct 10, 2008
I understand and absolutely agree with the basic premise of your argument, Emily; I think that, if we want to meaningfully emulate human cognition, we should consider to the greatest degree feasible the means by which we acquire human cognition. As you mentioned, trying to build some sort of human-like intelligence from biological first principles is certainly out of the reach of this class, if not out of reach altogether. Attempting to mimic the language learning process of human beings seems more manageable, but is still quite daunting. Also, we seem to have been focusing on a robot which has a human-like capability to navigate more so than a human-like capability to communicate. On the one hand, I'm not sure that it is feasible to try to create within this semester a robot that can do both simultaneously. On the other hand, though, it could be argued that, if we can create a general model of human cognition, navigation and communication would be rather naturally intertwined.
I'm a little confused about the algorithm Matt describes, but feel confident that I'll better understand it and its relevance to implementing the concepts of your paper after today's class.
smj14 said
at 2:39 pm on Oct 10, 2008
Emily, your reasoning makes a lot of sense to me. I think that before we even go into the details of word lists and semiotic maps, we have to think long and hard about how humans acquire cognition and how we can emulate the same process in robots. Attaining cognition is a long and complicated process, and you are right that we should not bypass this critical component of development. In my paper, I mentioned that part of the reason why the symbol grounding problem is a problem to begin with is that computer scientists pre-encoded robots with a lot of essential knowledge. You and I share the view that if robots are to think and communicate like humans, they must develop the skills to do so in the same manner that humans do. Humans learn the meanings behind things through a lifetime of trial and error and through gauging the success of interaction with the outside world. If a robot is pre-encoded with information, then there is no way that the robot can understand the meaning and significance of that information. To me, this is a prime source of the Symbol Grounding Problem.
Nicholas Davis said
at 2:53 pm on Oct 10, 2008
Yes, this pre-encoding is precisely the thing. The alternative to complete pre-encoding is starting with some schematic layout, like image schemas, or something to that effect, which is what the work of Paul Cohen was trying to do. He would give the robot certain schemas like 'container' and then have different values that the schema can take ('empty,' 'full,' 'contained'); there would also be slots within that schema to further delineate the features of the object. For example, the closeness schema could be part of the description of the container schema: a container could have the value 'full' and hold closeness as another image schema within it, which in turn would have a value of 'close' or 'far.' But this approach still runs into the same basic problem of grounding these original lexical items. Starting with some schematic knowledge that has interchangeable parts is a nice idea, but there is still something missing. This is where perceptual symbols have something to say; the thread on agents sharing third person representations mentions this also. Perceptual symbols deal with making meanings based on sensory input, recording certain portions of sensory information and relating them to the experience that provoked them. There is more information on perceptual symbols in the readings list, and a summary here: http://cognitiverobotics.pbwiki.com/General+Project+Proposal (in appendix B).
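To make the nested-schema layout concrete, here is a small sketch of a schema with a fixed set of allowed values and slots that hold other schemas. The class name, fields, and `describe` method are hypothetical and chosen only for illustration; this is not Cohen's actual representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Schema:
    """An image schema with one value from a fixed set, plus nested slots."""
    name: str
    allowed: Tuple[str, ...]
    value: str
    slots: Dict[str, "Schema"] = field(default_factory=dict)

    def __post_init__(self):
        # Reject values outside the pre-encoded set for this schema.
        if self.value not in self.allowed:
            raise ValueError(f"{self.value!r} is not a valid {self.name} value")

    def describe(self) -> str:
        """Render the schema and any nested schemas as a short string."""
        inner = ", ".join(s.describe() for s in self.slots.values())
        return f"{self.name}={self.value}" + (f" [{inner}]" if inner else "")

# A full container whose closeness slot has the value 'close'.
closeness = Schema("closeness", ("close", "far"), "close")
container = Schema("container", ("empty", "full", "contained"), "full",
                   slots={"closeness": closeness})
summary = container.describe()
```

The sketch also shows where the grounding problem remains: the strings 'full' and 'close' are still supplied by the programmer, and nothing in the structure ties them to sensory experience.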
Yardena Daon said
at 9:35 pm on Oct 16, 2008
I agree with Emily about the symbol grounding problem. We should try and stimulate the robot as we do with children. I also like the ideas you (Emily) present about how we should first deal with communication and not with the 1st and 3rd person issue. I think you would like the diagram I will present (I think it will help us all understand what needs to be done before we go further).
maxjensen said
at 12:14 pm on Oct 17, 2008
Emily,
This is a very nice idea. The reason why studying development is so helpful (for all things cognitive and psychological) is that we can see what's missing from the child's mind that makes it a child's mind (and not an adult's). I was thinking about Jean Piaget's stages, which I know are quite controversial, and I'm certain many corrections have been made to them. However, there certainly are stages (however poorly defined they may be). In Piaget's model, children start with meaning only as physical. They then move to linguistic repetitions, and so forth.
Using models of actual human cognition, including the metaphors used in the field, will allow Robotics to speak with the other domains, which I think is what I like about your paper.
Max