Agents Sharing Third Person Representations
Nick Davis
The language games of Luc Steels centered mostly on one agent coming up with some correlation, whether individual phonemes (De Boer, 1997), symbol-meaning pairs (Steels, 1996b), or feature distinctions (Steels, 1996a), and sharing it with another agent. The other agent then tries to construct the same correlation, and the outcome is evaluated. This socio-robotic structure is a useful technique that could be implemented in our current approach to moving from a first person to a third person representation.
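For illustration only, here is a minimal sketch of such a game. It uses the simplified "minimal naming game" update rule rather than Steels' original word-scoring scheme. It also assumes that the two agents already attend to the same object, as if by pointing. All class and function names are hypothetical.

```python
import random
from collections import defaultdict

CONSONANTS = "bdgklmnpst"
VOWELS = "aeiou"


class Agent:
    """Hypothetical agent: for each object it keeps a set of candidate words."""

    def __init__(self):
        self.lexicon = defaultdict(set)  # object -> set of candidate words

    def name_for(self, obj, rng):
        words = self.lexicon[obj]
        if not words:
            # No word yet: invent a random two-syllable word for this object.
            words.add("".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(2)))
        return rng.choice(sorted(words))


def play_round(speaker, hearer, obj, rng):
    """One game about a jointly attended object; returns True on communicative success."""
    word = speaker.name_for(obj, rng)
    if word in hearer.lexicon[obj]:
        # Success: both agents drop their competing words for this object.
        speaker.lexicon[obj] = {word}
        hearer.lexicon[obj] = {word}
        return True
    # Failure: the hearer adopts the speaker's word as a new candidate.
    hearer.lexicon[obj].add(word)
    return False


def run(num_agents=5, objects=("wall", "door", "box"), rounds=3000, seed=0):
    """Play repeated games between randomly chosen speaker/hearer pairs."""
    rng = random.Random(seed)
    agents = [Agent() for _ in range(num_agents)]
    for _ in range(rounds):
        speaker, hearer = rng.sample(agents, 2)
        play_round(speaker, hearer, rng.choice(objects), rng)
    return agents


def converged(agents, objects):
    """True if every agent holds the same single word for every object."""
    return all(
        len({frozenset(a.lexicon[obj]) for a in agents}) == 1
        and len(agents[0].lexicon[obj]) == 1
        for obj in objects
    )
```

With these parameters, the population typically settles on one shared word per object. The shared vocabulary comes entirely from the pairwise interactions and not from a central designer, which is the property that makes the structure attractive for map sharing.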
We have often discussed the possibility of the agents ‘sharing maps,’ that is, exchanging third person information with other agents in the environment. However, a means of doing this has not been fully explored. Steels’ work demonstrates that dialogical interaction between agents, as in the language game, creates an environment in which shared meanings and representations can arise autonomously. The map-sharing idea should build on this established paradigm.
The problem, then, is what technique the agents should use to transmit the information. The easy method would be to trade the code for the third person representation in a kind of ‘telepathic’ manner; however, this would not be very naturalistic. Another option is to create a small spatial vocabulary whose meanings could either be preprogrammed or agreed upon as in Steels. Either way, this method seems complex and unattainable within the context of this course. Yet another option, as I mentioned in class, is some sort of virtual blackboard or virtual display that depicts what is happening in the agent’s third person representation. This option runs into the difficulty of visual object recognition: the agent will have a very limited sense of object recognition, so much so that, at this point, there will be no object classification, categorization, etc. Even without categorizing the contents of the virtual display, however, this system could still be used. Each agent could display its map on the virtual blackboard at the same time, and any discrepancies would be revealed simply by scanning the superimposed maps. This also allows the agents to manipulate the maps cooperatively (rotating, nudging, etc.), and the other agent will be able to perceive the behavior and respond immediately, which facilitates the reciprocal structure that Steels uses.
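Here is a minimal sketch of the blackboard idea. It assumes that each agent's map can be reduced to a set of occupied grid cells, which is an assumption because the representation has not been settled. Superimposing two maps and keeping the cells occupied in only one of them reveals discrepancies without any object categorization. Trying small rotations and nudges stands in for the cooperative manipulation described above. All names are hypothetical.

```python
Cell = tuple[int, int]


def discrepancies(a: set[Cell], b: set[Cell]) -> set[Cell]:
    """Cells occupied in exactly one of the two superimposed maps."""
    return a ^ b


def shift(cells: set[Cell], dx: int, dy: int) -> set[Cell]:
    """Nudge a whole map by (dx, dy)."""
    return {(x + dx, y + dy) for x, y in cells}


def rotate90(cells: set[Cell], k: int) -> set[Cell]:
    """Rotate a map by k quarter-turns (counter-clockwise) about the origin."""
    out = set(cells)
    for _ in range(k % 4):
        out = {(-y, x) for x, y in out}
    return out


def best_alignment(a: set[Cell], b: set[Cell], max_shift: int = 3):
    """Try every rotation and small nudge of map b; return (score, k, dx, dy) with fewest discrepancies."""
    best = None
    for k in range(4):
        rotated = rotate90(b, k)
        for dx in range(-max_shift, max_shift + 1):
            for dy in range(-max_shift, max_shift + 1):
                score = len(discrepancies(a, shift(rotated, dx, dy)))
                if best is None or score < best[0]:
                    best = (score, k, dx, dy)
    return best


a = {(0, 0), (1, 0), (2, 0), (2, 1)}
b = shift(a, -1, 2)  # the second agent's map of the same layout, displaced
print(best_alignment(a, b))
```

In this example, map `b` is map `a` displaced by (-1, 2). The search recovers the nudge (1, -2) with zero remaining discrepancies. Any cells that still disagree after the best alignment are the candidates for the agents to "discuss."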
Another issue I have been contemplating lately is how the agents could use the third person representation. For example, should goals and ambitions create different ‘desired outcome’ third person representations? This would allow one agent to ask another to do something just by transferring a desired outcome map. Another possibility in the same vein is using multiple third person representations to simulate time. For example, should there be at least three third person representations: 1) past, 2) present, and 3) future? If we do create these three spaces, should each of them always exist, or should they arise only when they are relevant to the agent? That is, should a past space appear only when the agent is trying to remember a previous state of affairs in order to reason about the current situation, and a future space only when it is planning an action? These multiple third person representations, viewed here almost as mental spaces (Fauconnier, 1985), could potentially give the agent another mental capacity that is hypothesized to be similar to human cognition.
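The following sketch makes the "only when relevant" option concrete. It assumes, as a hypothetical simplification, that a third person representation can be reduced to a dictionary from object names to positions. Only the present space always exists. A past space is created when the agent recalls, and a future space is created when it imagines a change. The future space could double as a desired outcome map.

```python
import copy


class MentalSpaces:
    """Hypothetical container: 'present' always exists; 'past' and 'future' arise on demand."""

    def __init__(self, present):
        self.present = dict(present)  # object name -> (x, y)
        self._history = []            # earlier states of the present space
        self._spaces = {}             # on-demand spaces, keyed "past" / "future"

    def update(self, new_present):
        """Record the current state before replacing it with a new observation."""
        self._history.append(copy.deepcopy(self.present))
        self.present = dict(new_present)

    def recall(self, steps_back=1):
        """Create a 'past' space only when the agent tries to remember."""
        if not 1 <= steps_back <= len(self._history):
            raise ValueError("nothing that far back in memory")
        self._spaces["past"] = copy.deepcopy(self._history[-steps_back])
        return self._spaces["past"]

    def imagine(self, changes):
        """Create a 'future' space by applying hypothetical changes to a copy of the present."""
        future = copy.deepcopy(self.present)
        future.update(changes)
        self._spaces["future"] = future
        return future

    def active_spaces(self):
        return ["present", *sorted(self._spaces)]


ms = MentalSpaces({"robot": (0, 0), "box": (3, 1)})
ms.update({"robot": (1, 0), "box": (3, 1)})
goal = ms.imagine({"robot": (3, 0)})  # could be sent to another agent as a desired outcome map
```

`imagine` returns an ordinary map, so the same object could be transmitted to another agent, which would tie the future space to the map-sharing idea.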
In addition, we have discussed the possibility of the agent scanning the third person representation space. This is somewhat confusing to me, so here I am going to try to disentangle what would actually be going on in this scenario. There is a first person, sensory-based environment with a limited viewpoint. There is also a third person representation that documents all environmental information, including the agent’s position, from a third person, somewhat omniscient viewpoint. Within this third person representation, the agent would then take a first person view and limit its visual field. My question is this: is the full third person view available to the agent in addition to this first person scanning within that space? It was stated that this scanning would be a kind of attention mechanism, dictating what is interesting in the first person. This link between the first and third person would create a dynamic interplay between the two spaces, which could prove interesting.
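The scanning question can be made concrete with a small sketch. It assumes that the third person representation is a dictionary of object positions and that the agent's pose is (x, y, heading in degrees). The field-of-view and range values are arbitrary. The full map stays available to the caller, while the function returns only the objects inside the first person "attention window." This is one possible answer to the question above, not a settled design.

```python
import math


def visible(objects, pose, fov_deg=90.0, max_range=5.0):
    """Names of map objects that fall inside a limited first person view cone."""
    x0, y0, heading_deg = pose
    half_fov = math.radians(fov_deg) / 2
    heading = math.radians(heading_deg)
    seen = []
    for name, (x, y) in objects.items():
        dx, dy = x - x0, y - y0
        dist = math.hypot(dx, dy)
        if dist == 0 or dist > max_range:
            continue
        # Signed angle between the heading and the direction to the object, wrapped to [-pi, pi).
        offset = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
        if abs(offset) <= half_fov:
            seen.append(name)
    return sorted(seen)


third_person_map = {"box": (3, 0), "door": (0, 3), "lamp": (-2, 0), "far_wall": (10, 0)}
print(visible(third_person_map, (0, 0, 0)))   # facing +x
print(visible(third_person_map, (0, 0, 90)))  # facing +y
```

Turning the virtual viewpoint changes what is attended to without moving anything in the underlying map.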
The third person space could also be decoupled from the first person, effectively becoming a mental simulation. The agent could peruse the third person space without moving its real body at all. This has interesting implications for the multiple spaces I suggested earlier. The simulation could be motivated by some goal, or the user could dictate it. This brings to mind the diagrammatic manipulation I suggested some time ago. If one could directly manipulate the third person representation by diagrammatic means, this could serve as an instruction to the robot with no language being exchanged. For example, we could simply move the robot’s body to the other side of the screen, and the robot would then work to navigate to that location in its first person view. This could also be tied to the map sharing suggested earlier: one agent could transmit a desired outcome map, which the other then helps to realize.
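The diagrammatic instruction could work roughly as sketched below, under several stated assumptions. The map is a character grid in which '#' marks obstacles, and positions are (row, column) pairs. The user's edit is a second copy of the position dictionary. Breadth-first search stands in for whatever navigation the robot would actually use. All names are hypothetical.

```python
from collections import deque


def goal_from_edit(current, edited, agent="robot"):
    """Read a navigation goal from a diagrammatic edit: where was the robot's body dragged to?"""
    if current[agent] == edited[agent]:
        return None
    return edited[agent]


def plan_path(grid, start, goal):
    """Shortest 4-connected path on a grid of strings ('#' = obstacle), or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    if goal not in came_from:
        return None
    path, node = [], goal
    while node is not None:  # walk back from the goal to the start
        path.append(node)
        node = came_from[node]
    return path[::-1]


grid = ["....",
        ".##.",
        "...."]
current = {"robot": (0, 0)}
edited = {"robot": (2, 3)}  # the user dragged the robot's body across the screen
path = plan_path(grid, current["robot"], goal_from_edit(current, edited))
```

The edit itself carries the instruction. The robot only has to notice the difference between the map it holds and the edited map, then act to remove it.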
To conclude, the first to third person distinction is certainly a large philosophical proposition in itself, but working out the details of how the agent could use this information may result in an agent whose ‘mental’ operations more closely resemble human cognition.
References:
De Boer, B. (1997). Emergent Vowel Systems in a Population of Agents. In Harvey, I. et al. (eds.) Proceedings of ECAL 97, Brighton, UK, July 1997. Cambridge: MIT Press.
Fauconnier, G. (1985). Mental Spaces: Aspects of Meaning Construction in Natural Language. Cambridge: MIT Press.
Steels, L. (1996a). Perceptually Grounded Meaning Creation. In Tokoro, M. (ed.) ICMAS-96. AAAI Press.
Steels, L. (1996b). Self-organizing Vocabularies. In Langton, C. G. and Shimohara, K. (eds.) Artificial Life V, pp. 179-184. Nara, Japan.
Comments (11)
Leland McCleary said
at 10:02 am on Oct 9, 2008
Nick,
You raise a number of important points in this reflection. They are questions that I agree need to be explored. In this comment, I'd like to focus on some of my concerns with the concept of 3rd-person representation, and basically, I'll be dealing with ideas you discuss in the second paragraph: "the possibility of the agents 'sharing maps', that is, exchanging third person information with other agents" and "dialogical interaction".
As you've noticed, I've been puzzled from the outset (that is, from when I joined this on-going discussion a month ago) with the focus on 1st-person and 3rd-person representations. This focus takes the form at times of asking how a robot can develop a 3rd-person representation ("based on" or "built up from" a prior 1st-person representation?); or, as here, of asking how one robot can "share" a 3rd-person representation with a person or another robot. Maybe it hasn't always been clear that my problem is *not* with the concept of representations itself. I'm perfectly comfortable talking about representations, and have no problem believing that 1st-person and 3rd person representations exist.
Leland McCleary said
at 10:04 am on Oct 9, 2008
[Continuing]
What are issues for me are:
1) What is the nature of these representations, and are we as humans being misled in our discussion by our own customary representations of representations, specifically of 3rd-person representations? (I suspect we feel much more comfortable representing 3rd-person representations than 1st-person representations, so in fact we end up using the same types of representations for both.)
2) What about 2nd-person representations? It is totally unclear to me why 2nd-person representations are being left out of the discussion when, by all accounts of human cognitive development, they appear to be central. And if they are introduced, then the question becomes not how we get the robot to go from using 1st-person to using and sharing 3rd-person representations, but how we get him to go from using 2nd-person representations (once he's acquired them) to using and sharing 3rd-person representations. I will maintain (and I'll elaborate on this later) that 3rd-person representations totally depend on prior 2nd-person representations; that the very concept of "sharing" only arises from 2nd-person representations; and that 3rd-person representations depend, at a minimum, on the concept of agents sharing a time and space that can support more than one point of view. But more about both of these points anon, since I'm out of time at the moment.
Nicholas Davis said
at 12:49 pm on Oct 9, 2008
I understand your problem with the diagrammatic representation of the third person space. As it's iconically depicted, there seems to be an omniscient looker (in the first person perspective) viewing the objects in the environment from an objective perspective. But maybe this is only stemming from the way we are drawing the space. I suppose that the third person space would be more of a sense of the agents 'knowing' where things are relative to itself, beyond what its sensory input is telling it. This idea of second person representation certainly needs to be explored further; I wonder what it would look like in a diagram. It seems like it would be hard to represent, because it involves an implicit relationship between two entities: the 'you' of narrative. Maybe the second person perspective would essentially be trying to understand what the other agent is experiencing, i.e. the sensory activations of the other agent. This perspective taking would be crucial to any kind of communicative situation. Steels addresses this in one of his articles (I'll find the exact place later). It has to do with the agent simulating, most likely from its own memory, what it would see if it were in the place of the other agent and facing in the same direction. Would this be the second person representation, then?
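The simulation described here can be sketched as a coordinate transform. The sketch makes two assumptions: the agent's memory stores object positions in a shared frame, and the agent knows the other agent's pose as (x, y, heading in radians). Both assumptions are contested in this discussion, so the sketch covers only the geometric part of perspective taking. The function names are hypothetical.

```python
import math


def to_egocentric(point, pose):
    """Express a remembered point in the frame of an agent at pose (x, y, heading_rad).

    In the result, +x is straight ahead of that agent and +y is to its left.
    """
    x0, y0, heading = pose
    dx, dy = point[0] - x0, point[1] - y0
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    return (cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy)


def take_perspective(remembered_objects, other_pose):
    """Simulate, from one's own memory, where each object would lie for the other agent."""
    return {name: to_egocentric(p, other_pose) for name, p in remembered_objects.items()}


memory = {"door": (2, 3), "box": (0, 0)}
other = (2, 0, math.pi / 2)  # the other agent stands at (2, 0) facing +y
view = take_perspective(memory, other)
```

For the other agent, the door ends up about 3 units straight ahead and the box about 2 units to its left, even though neither relation holds from the simulating agent's own position.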
Leland McCleary said
at 7:13 pm on Oct 9, 2008
Nick, you're basically right about my first point above. We have always drawn the 3rd-person representations as a "map" (which is what you call it in your Reflection) or a "floor plan", from the point of view of someone looking down from above (someone omniscient, as you say, since we show the "insides" of our spaces, agents and objects). That's what reminded me so forcefully of the classic _Flatland_. But let me elaborate a bit more about my misgivings about some of the other things you mention.
I don't think it's a good idea to identify the 3rd-person representation with “objective perspective”. What do we mean by an “objective perspective”? Probably something like “not tainted by the biases (the ‘subjectivity’?) of a particular observer”. Science makes an effort to produce descriptions that are maximally “objective” by keeping the observer out of the description. The idea is that the same descriptions (maps) should serve for interchangeable observers.
Leland McCleary said
at 7:14 pm on Oct 9, 2008
[Continuing]
What we shouldn’t forget, for our purposes, is that this effort by science to banish the conceptualizer from the conceptualization is possibly the major source of our symbol-grounding problem. It’s what has fed the idea that meaning can be contained in disembodied strings of symbols (c-representations), ignoring the inconvenient fact that at both the encoding and the decoding end there’s always an embodied, grounded human available to supply the meaning (m-representations). This process of “objectification” of knowledge of the world has been a collective effort that has taken place over centuries and is supported by scientific genres and practices, such as ‘knowing how to interpret a map’. When you say that our tendency to depict 3rd-person representations as top-down diagrams is “only stemming from the way we are drawing the space”, that *only* is a red flag that should alert us to the fact that this way of depicting is so ‘natural’ that we forget all of the cognitive work and learning that went into being able to do it without thinking it’s anything special. My point is that it *is* very special and gives us a very particular way of thinking about what a 3rd-person representation might be for the robot (or for us, for that matter!).
Leland McCleary said
at 8:11 pm on Oct 9, 2008
I promised to say something about the role of 2nd-person representations, but before that, I want to say a few more things about what we conceive to be 3rd-person representations.
Nick describes a 3rd-person representation as representing “the agents 'knowing' where things are relative to itself, beyond what its sensory input is telling it”. My question is: why can’t a 1st-person representation do that? Isn’t the agent building up a 1st-person representation based on his soundings of the environment? Isn’t he identifying objects (presumably with particular shapes) in a particular array, all in relation to his own position? That array will necessarily include knowledge of objects’ positions relative to each other. All the agent needs is a memory to be able to build a representation that includes more than sensory input at any particular time and place. Surely a 1st-person representation is not limited to being an on-line representation of sensory input in real time.
The essential difference, to my mind, between a 1st-person representation and a 3rd-person representation of space and surrounding objects is that a 1st-person representation is always “relative to oneself”, whereas a 3rd-person representation supports multiple perspectives. That's why a 3rd-person representation can be ‘shared’, and that’s why it can’t be achieved without intermediating 2nd-person representations.
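The point that a 1st-person representation with memory can go beyond current sensory input can be illustrated with a small sketch. The names are hypothetical, and the sketch assumes the agent knows how far it moved and how much it turned. Every remembered object is stored only relative to the agent, as (ahead, left), and is re-expressed after each movement. Objects that are out of view therefore remain represented, all "relative to oneself," without any top-down map.

```python
import math


class EgocentricMemory:
    """Hypothetical 1st-person memory: objects are kept only as (ahead, left) relative to self."""

    def __init__(self):
        self.objects = {}

    def observe(self, name, ahead, left):
        self.objects[name] = (ahead, left)

    def move(self, forward, turn_rad):
        """Self-motion update: step forward, then turn counter-clockwise by turn_rad."""
        c, s = math.cos(turn_rad), math.sin(turn_rad)
        updated = {}
        for name, (x, y) in self.objects.items():
            x -= forward  # the world slides backwards as the agent steps forward
            # Re-express the point in the rotated body frame.
            updated[name] = (c * x + s * y, -s * x + c * y)
        self.objects = updated


mem = EgocentricMemory()
mem.observe("box", ahead=3.0, left=0.0)
mem.move(forward=1.0, turn_rad=0.0)          # box is now 2 ahead
mem.move(forward=0.0, turn_rad=math.pi / 2)  # turn left: box is now about 2 to the right
```

After the left turn, the box is no longer in front of the agent, but it is still represented at roughly (0, -2), that is, 2 units to the right.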
Matthew McCroskey said
at 1:47 am on Oct 10, 2008
First of all, Nick, I think you've clearly and lucidly presented some of the main ideas that we all seem to have for this project; thanks for that. I take issue with many of the same things as Leland, and feel that he has brought up a fabulous point regarding the power of first person representations. I think that we all keep mixing together the concepts of first-person and third-person perception, the concept of memory, the concept of mental spaces, and the concept of navigation.
I think that a very laudable goal is to build a robot with sensory input, a first-person perspective, a memory of its environment, and an ability to combine these to navigate this environment to specified targets. However, assuming we are able to accomplish that more quickly than I foresee that we will, I agree with Leland that developing a second-person perspective is a necessary intermediate step between having a first- and creating a third-person perspective. I also agree with what I think Leland said or implied, that we don't necessarily need a third-person perspective at all: it seems more naturalistic (to me, at least) to have a system whereby one agent puts himself literally "into the other agent's shoes" to understand where that first agent wishes for him to go (or where that first agent is trying to show him to go). No third-person perspective is really required for this.
I'm going to post to the wiki my exact thoughts on how to implement such a robot in an entry of my own.
Leland McCleary said
at 12:52 pm on Oct 10, 2008
As for 2nd-person representations, my best bet at the moment is that they would begin to emerge together with Theory of Mind, that is, with the ‘understanding’ that there are Others that are “like me”. The 2nd-person representation would then be like the 1st-person representation with the difference that you have a dislocation of perspective. This dislocation of perspective is both motivated by and makes possible the concept of an Other “like me”. And, to my mind, this would be the beginning of the possibility of establishing a socially-shared space and time in which 3rd-person representations could have meaning.
I can understand why this scenario is awkward for robotics. Recognizing that there are Others “like me” has been assumed to be a uniquely human trait dependent on the establishment of the category Other, distinct from Object, also considered to be uniquely human, given that it depends on the recognition of intention in the action of the Other (and in fact intention in the action of Self). This is a challenging area in robotics. My feeling is that tricks that allow us to short-circuit 2nd-person representations will lead us right back to the symbol-grounding problem.
We’re aiming, of course, at 3rd-person representations, which is what will give us the groundwork for such typically human-like activities as deceit and narrative. But getting there is not so easy, I think.
Leland McCleary said
at 1:28 pm on Oct 10, 2008
Matthew,
I agree that building a robot with a robust 1st-person representation would be quite an accomplishment all by itself. But we’d want to communicate with it, and then we’d have to make a choice: either we give it all its meanings (in which case the symbols would only be grounded in us and not in it—back to classical robotics), or we design it so that it can arrive at its own meanings, and this is where we come face-to-face with the 2nd-person representation. Vogt and Steels are working in this area, in getting two robots to communicate about the world and mutually establish categories and signs referring to states. It doesn’t bother me so much (for the moment) that their robots don’t have to discover on their own that their partner is an Other that is like them, because that may be hard-wired into us as well; there may be no way around that.
About whether we really need a 3rd-person representation, that’s a good question. I’m pretty sure we do (to be cognitively modern humans), but I think it will be instructive to discuss exactly what is possible and not possible with only a 2nd-person representation; and whether it’s possible for agents to interact on the basis of 1st- and 2nd-person representations without 3rd-person representations emerging; and what the conditions of their emergence would be.
smj14 said
at 3:18 pm on Oct 10, 2008
My understanding of the theory behind cognitive robotics is very elementary, but I think I know what Steels is getting at when he talks about acquiring cognition via social interaction. The way I see it, humans essentially have two components that address the meaning behind objects. There is the personal component, which is shaped by personal experiences and the feelings that accompany them. There is also the social component, which is shaped by interaction with the outside world. This includes gauging the success of a variety of interactions with other people and sharing and comparing one’s own semiotic map with others’. The personal component refers to the 1st person because it deals with the self alone. The social component refers to the 3rd person because it acknowledges one’s perspective in relation to others. Achieving this 3rd person representation requires extensive visual and oral interaction. In robotics, this means that interacting robots must share some kind of pre-programmed (or, even better, pre-developed) language so that they can communicate efficiently. Communication will allow comparison of thought processes and world perspectives. It also means that some kind of visual sensory device is required so that one robot can compare another robot’s actions to its own. Of course, for each robot’s cognition to continue growing and developing, memory, both short-term and long-term, should be a key component.
These are my theoretical thoughts. I’ve no idea how they will fare computationally.
maxjensen said
at 11:57 am on Oct 17, 2008
Nick,
I agree with Leland for the most part, especially his ideas pertaining to what we mean by a third person representation. Just because we have a series of numbers that could be transmitted from one robot to another does not necessarily mean that we have a third person representation (although perhaps it is one). If two robots scanned the same space at different times, would they have the same representation? If so, then we have a map, but again, the idea that the robot is truly doing what we do seems all too easy. Cognition, as we know, is _never_ simple.
I also agree with Leland that we need to figure out what shared attention means, although I feel many people have worked on this problem for a long time, so we need to do a solid literature review. If we can get to the bottom of shared attention, then we can start to name universal objects in the space.
Max