Cognitive Robotics

 

On the role of cost, gain and value in decision making

Page history last edited by Nicholas Davis 1 yr ago

     In class, we discussed the robo blending model of intention. A blending network influences the robot's decision by weighing the costs and gains of each action and assigning a value to the objective.

 

     Each potential action has a cost and a gain associated with it. The cost depends on the distance to the object and the kind of terrain; basically, it is the amount of fuel the agent will use to reach it. A number of things can potentially be gained from these actions. There is the surface-level gain of eating, for example: the agent gets some fuel, and this is a good thing. But when the agent is just exploring and finds fuel objects in the environment, this is also a gain, because the information is useful for the future. So there are benefits to exploring that need to be taken into consideration, and gain is responsible for factoring in these various things. Matt demonstrated his computational method for calculating cost on a grid that corresponds to the agent's map. Each block has a number associated with it, calculated as the number of steps from the starting point added to the smallest number of steps to the end point. The equation then becomes: previous amount + 1 + shortest distance to finish. To find the optimal path, the algorithm starts from the finish and chooses the path with the smallest costs.

                 Costs:

S 6 12 18 24 30
6 5 10 14 19 24
12 10 13 14 18 22
18 15 17 17 20 23
24 20 19 22 24 26
30 25 29 22 26 F

                Path:

S 6 12 18 24 30
6 5 10 14 19 24
12 10 13 14 18 22
18 15 17 17 20 23
24 20 19 22 24 26
30 25 29 22 26 F

 

The first table lists the cost values. I worked these out in my head, so I'm sure they are not quite accurate, but you can get the picture. I think the computer does it differently, because it calculates the move from one block to all those around it and adjusts the scores from that, but the gist is that it starts from the finish, finds the blocks with the smallest cost, and picks that path. Walls and blocked regions can be taken into account by preventing the simulation from using that segment of the map, which forces the system to recalculate the path costs. In this sense, planning takes the map as input.
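
As a rough illustration of the grid scoring described above, here is a minimal Python sketch. It is not Matt's code: the function names, the use of '#' for walls, and the choice of breadth-first search for both step counts are my assumptions. It will not reproduce the hand-worked numbers in the tables, only the idea of cost = steps from start + shortest steps to finish, followed by a walk back from the finish.

```python
from collections import deque

def step_counts(grid, origin):
    """Fewest steps from origin to every open cell (breadth-first search)."""
    rows, cols = len(grid), len(grid[0])
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            # '#' marks a wall (hypothetical convention); walls are never entered.
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def plan(grid, start, finish):
    """Return (path, cost); path is None if the finish cannot be reached."""
    from_start = step_counts(grid, start)
    to_finish = step_counts(grid, finish)
    if finish not in from_start:
        return None, {}
    # Cost of a block = steps from the start + shortest steps to the finish.
    cost = {cell: from_start[cell] + to_finish[cell] for cell in from_start}
    # Walk back from the finish, always moving to the cheapest neighbour;
    # ties are broken in favour of the block closer to the start.
    path = [finish]
    while path[-1] != start:
        r, c = path[-1]
        neighbours = [n for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                      if n in cost]
        path.append(min(neighbours, key=lambda n: (cost[n], from_start[n])))
    path.reverse()
    return path, cost

# A 3x3 map with one wall in the middle; the map itself is the input to planning.
grid = ["...",
        ".#.",
        "..."]
path, cost = plan(grid, (0, 0), (2, 2))
print(path)
```

Adding or removing a '#' and calling plan again is the "recalculate the path costs" step: the blocked block simply never receives a cost, so the path cannot pass through it.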

 

     This part of the planning relates to the 'blended' space in the robo blending model. It is a mental simulation that takes the map and the goal as input and combines them to find the best possible path. However, the problem with the current algorithm is that it may be too simple. For example, it does not really account for the potential gains the agent may have, and there are different kinds of gains. As I said earlier, a gain could be learning about the properties of an object, like noting that this object is an energy object and committing that information to memory. The agent should essentially get some kind of reward for retaining this information. Naturalistic behavior takes into account the future states of the system, and this is precisely what would happen if the agent were rewarded for learning what kinds of things are energy sources and where those objects are located. Additionally, if multiple agents were in the environment, reaching an object before the other agent could count as a gain, since it prevents the 'other' from eating it and so secures more energy for the self. Conversely, it may be more beneficial to let the 'other' go first, because maybe later it will show you an energy source. This is starting to get into group dynamics and social interaction, but these are factors to consider in decision making.

 

     Matt asked us to start thinking about what other factors we would like to see in his grid strategy, such as social cognition and adding things to memory. What would the weight of each of these factors be? Would the state of the agent, for example its hunger, change the value of these objects (value = cost + gain)? Should the current state influence gain?
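
To make these questions concrete, here is one hypothetical way the pieces could fit together, sketched in Python. Everything in it is an assumption for discussion: the function name, the factor names, the weights, and especially the rule that hunger scales only the energy gain. Cost is entered as a negative number so that value = cost + gain still holds as written above.

```python
def action_value(cost, gains, weights, hunger):
    """Value of one action as cost + weighted gain.

    cost    -- fuel spent, entered as a negative number (assumption)
    gains   -- raw gain per factor, e.g. {"energy": 5, "memory": 1}
    weights -- relative weight per factor; missing factors default to 1.0
    hunger  -- current state of the agent, from 0.0 (full) to 1.0 (starving)
    """
    total_gain = 0.0
    for factor, gain in gains.items():
        weight = weights.get(factor, 1.0)
        if factor == "energy":
            # Hypothetical rule: a hungrier agent values energy more.
            weight *= 1.0 + hunger
        total_gain += weight * gain
    return cost + total_gain

# The same energy object, seen by a full agent and by a starving one.
full = action_value(-3, {"energy": 5, "memory": 1}, {"memory": 0.5}, hunger=0.0)
starving = action_value(-3, {"energy": 5, "memory": 1}, {"memory": 0.5}, hunger=1.0)
print(full, starving)
```

Under this rule the current state does influence gain, which is one possible answer to the last question. Whether hunger should also change the weight of memory or social factors is left open.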

 

     Another idea that goes along with the exploring-as-gain paradigm is to make the gain meter the energy meter. For example, the agent would gain energy by looking at energy objects, but also by discerning and looking at new objects in general, or at new angles. Maybe this could be equated with sustaining oneself on knowledge, not eating or sleeping (so to speak) because we are so enthralled by an idea. However, this gain has to decay eventually, possibly logarithmically. For example: 'I found this new wall, how great (gain +1). Hmm, does it have a side? Let me see (gain +.8). And a back (gain +.5)? How far is it to the other side (gain +.4)? And can I push it (gain +.2)?' and so on. Maybe certain operations should be more profitable than others. These are things we need to tease out.
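
A decaying exploration gain like the wall example could be sketched as follows in Python. The class name, the starting gain of 1, and the decay factor of 0.7 are all made up for illustration, and I use a simple geometric decay (each look is worth a fixed fraction of the previous one) as a stand-in, since we have not settled on the actual decay curve. It does not reproduce the exact +1, +.8, +.5 numbers above.

```python
from collections import defaultdict

class NoveltyGain:
    """Gain from examining an object that shrinks each time we look again."""

    def __init__(self, first_gain=1.0, decay=0.7):
        self.first_gain = first_gain    # gain for the very first look
        self.decay = decay              # fraction of gain kept on each later look
        self.looks = defaultdict(int)   # how often each object has been examined

    def observe(self, object_id):
        # Geometric decay (assumption): first_gain * decay ** previous_looks.
        gain = self.first_gain * self.decay ** self.looks[object_id]
        self.looks[object_id] += 1
        return gain

meter = NoveltyGain()
for operation in ("found it", "side", "back", "distance around", "push"):
    print(operation, round(meter.observe("wall"), 2))
# A brand new object starts again at the full gain.
print("new door", meter.observe("door"))
```

Making certain operations more profitable than others could be done by giving each operation its own first_gain or decay, but that is exactly the part we still need to tease out.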

 

     We discussed the role of the logic box and decided that it has a relationship with long-term memory. For example, in the 'eat' objective, we plan to go to some object, but how do we know which objects are energy objects and which are not? Long-term memory associates certain places on the map with certain qualities, like energy. So long-term memory is 'impregnating' objects in the sense of Prägnanz, that is, giving them some special meaning and therefore making them important things to consider.
