On the use of criterion functions to explain our basis of judgment. A Tavern Talk

Summary:
In the discourse about collective plans, the process of evaluation – making judgments about whether a plan is ‘good enough’ or which of several plans is the better one – gives rise to differences of opinion, which can be resolved by mutual explanation of the basis of evaluation judgments. The discussion here is focused on the role of ‘criterion functions’ in this process. These functions – showing how one’s subjective ‘goodness’ judgments depend on ‘objectively’ measurable criteria – make it possible to explain one’s basis of judgment in much more specific detail than is usually done even in the most cooperative group processes. Some key insights of the discussion are the following:

While such detailed explanation is possible and conceptually not overly challenging, including this process in actual decision-making procedures would add cumbersome provisions to a planning discourse already calling for more structure than many participants are used to and feel comfortable with. The issues of ‘aggregation’ – of individuals’ partial judgments into overall judgments, and especially of individual judgments into ‘group’ evaluation measures – are potential sources of controversy.
It becomes clear that any overall ‘measures’ or judgments that can meaningfully guide decisions cannot be derived in ‘top-down’ fashion from general ‘meta values’ and ‘common good’ concepts but must be constructed ‘bottom-up’ from individual participants’ concerns and understanding of specific situations and context.

Familiar claims of planning decision makers ‘to act on behalf of others’ (clients, users) with adequate knowledge of those others’ concerns and basis of judgment, made without having gone to the trouble of obtaining that knowledge, are unfounded and should be viewed with reserve.
Examination of criterion functions makes it obvious that the quest for and claim of having found ‘optimal’ solutions is unrealistic: no plan will achieve the ‘best possible’ scores on all evaluation aspects even for individuals, and different people will have very different but legitimate criterion functions.

Considering plans or policies whose effects will occur – and change – over time adds another level of complexity. System simulation models can track the performance of variables (criteria) over time, but do not show associated ‘goodness’ judgments (obviously, since this would either be just one person’s assessment, or some aggregated measure of judgments that have not been included in the system modeler’s data).

The examination of how evaluation judgments for different plans could be tracked over time (as a function of the simulated variable tracks) suggests a different decision guide: the (time-discounted) degree of improvement of a plan over the current or predicted problem situation under the ‘do nothing’ option.

ooo

The discussion takes place in the hypothetical ‘Fog Island Tavern’.

– Hey Bog-Hubert – what kind of critters are you guys talking about? I couldn’t quite get it coming in, but it sounds like serious wildlife?

– Good morning Renfroe. Well, I guess you could call them critters, but not the kind you mean. Actually, we were talking about criterion functions. About how we can explain what we mean when we’re making judgments about, say, proposed plans. Calling them good, so-so, or bad, or anything in-between.

– Huh. Plans, as in what to do about saving the beaches where each storm washes away more of the sand?

– Yes, or about what to do about the climate change that makes the storms worse and the oceans rise.

– What, you guys can’t even come up with a plan for beach preservation on this little island, and now you want to talk about the oceans and global climate?

– Well, Renfroe, it looks like the problems of coming to some agreements about what to do are the same everywhere, just at a different scale.

– So where do your critters come in on that one?

– I guess we need to backtrack a little on that. It has to do with evaluation, say of different proposed plans, to decide which one is best, or whether any of them should be implemented. You could start by looking at them and then make an offhand judgment, say ‘good’ or ‘not good’. If there are several proposals, you may want to use a more detailed scale, for example one with seven points ranging from ‘couldn’t be worse’: -3, to ‘couldn’t be better’: +3, with a midpoint of zero meaning ‘don’t know’ or ‘so-so’. In a group you’ll have to agree on some common scale. Now somebody asks you why you rate proposal X so high, and solution Y so low, since they came up with very different ratings from yours. So you may want to talk about the reasons, the basis of your judgments; maybe you each know something the other doesn’t know or see, that should be considered in making the decision. What do you do? How do you explain what makes a solution good or bad, in your view?

– Well, you could look at the various costs and benefits, and how well the plan will work for what it is meant to do?

– And how good it looks, if it’s a building or something we are looking at or living in?

– Yes, Sophie.

– And don’t we also have to worry about those ‘unexpected side-and-after-effects’ of plans, that people always talk about but keep forgetting?

– Good point. Next we list all those considerations or ‘aspects’ and make sure that we all mean the same thing when we name them. But we can only consider what people bring up in the discourse, so it’s important to make sure that is organized so as to let everybody put in their views. Now you can give each plan a ‘goodness’ judgment score – on the same scale — for each of those aspects. Let’s call those ‘partial’ judgments – all together they make up your ‘overall’, whole judgment for each plan. You’ll then have to explain how exactly all of those scores make up the overall judgment.
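The step from partial judgments to an overall judgment can be made explicit in several ways, and the talk does not commit to any one of them. The following is only a minimal sketch, assuming a weighted average in which each person assigns weights of relative importance to the aspects. The function name, the aspect names, the weights and the scores are all hypothetical.

```python
# Hypothetical illustration: aggregate partial judgments (-3..+3) into an
# overall judgment by a weighted average. The weighted average is an
# assumption; the discussion leaves the aggregation rule open.

def overall_judgment(partial_scores, weights):
    """Return the weighted average of partial scores on the -3..+3 scale."""
    if set(partial_scores) != set(weights):
        raise ValueError("every aspect needs both a score and a weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(partial_scores[a] * weights[a] for a in partial_scores)
    return weighted_sum / total_weight


# One person's partial judgments of one plan, aspect by aspect (invented).
scores = {"cost": -1, "effectiveness": 2, "appearance": 3, "side effects": 0}
# The same person's weights of relative importance (need not sum to 1).
weights = {"cost": 0.3, "effectiveness": 0.4, "appearance": 0.1, "side effects": 0.2}

overall = overall_judgment(scores, weights)
print(round(overall, 2))
```

Writing the weights down is itself part of the explanation: two people with identical partial scores can still reach different overall judgments because they weigh the aspects differently.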

– And looking at the different aspects, you may want to reconsider the first overall, offhand judgment you made?

– Good point, Vodçek. Deliberating already. Learning. Excellent. But to get to the criterion functions: You realize how an explanation of a judgment – the goodness score – always consists of showing how the judgment relates to something else. That something else can be another judgment – look at how Sophie suggested that her overall ‘goodness’ judgment of, say, a building, should depend in part on the beauty of that building – which of course would be another judgment, and people may have different opinions about that. But the relationship would be that of some ‘degree of beauty’ about which she would have to make another ‘goodness’ judgment to explain how it contributes to the overall building goodness judgment.

– That’s not much of an explanation though, is it?

– Right. So you could ask her what, in her mind, makes a building beautiful. What would be your answer, Sophie?

– Well, if I got the sense that he just wants to annoy me with all these questions – because I can see how each answer will just lead to another one, and where will that end – I could just say that if he can’t see it he just doesn’t understand beauty and tell him to get lost. But if I think he really wants to learn what I see as beautiful, I might suggest that it has to do with, say, its proportions.

– Ah. Now we are getting closer to the criterion issue. Because proportions are really measurement relations: in a rectangle, the length of the short side to the length of the longer one. The relationship is now something we can ‘objectively’ measure. Quantitative. And many people are really adamant about making our decisions based on objective ‘facts’ and measures. So would it satisfy those folks to show them a graph that has the ‘objective’ measure on one axis – the horizontal one, say – and our judgment scale on the vertical one? If you are convinced that the most beautiful proportion is the ‘golden ratio’ – 1:1.618… – the graph would touch the +3 line at that 1.618… point, and go down from there in both directions. Like this, for example: down towards 1:1 (the proportion of the square) on one side, and down to -3 for infinitely long rectangles, 1:∞, on the other.

Fig 1. A criterion function for proportion judgments
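A curve of this kind can also be written down as a small function. The sketch below is one possible shape, not the one drawn in the figure: it assumes a straight line from the square (1:1) up to +3 at the golden ratio, and an exponential decline towards -3 for ever longer rectangles. The score given to the square (`score_at_square`) and the steepness of the decline (`decay`) are invented parameters that each person would set for themselves.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # about 1.618


def proportion_judgment(ratio, score_at_square=0.0, decay=1.0):
    """Judgment (-3..+3) for a rectangle with sides 1 : ratio, ratio >= 1.

    score_at_square and decay are assumed shape parameters, not values
    taken from the discussion.
    """
    if ratio < 1:
        raise ValueError("express the proportion as 1 : ratio with ratio >= 1")
    if ratio <= GOLDEN_RATIO:
        # Straight line from the score at the square (1:1) up to +3 at the peak.
        share = (ratio - 1) / (GOLDEN_RATIO - 1)
        return score_at_square + (3 - score_at_square) * share
    # Beyond the peak: fall from +3 and approach -3 for very long rectangles.
    return -3 + 6 * math.exp(-decay * (ratio - GOLDEN_RATIO))


for r in (1.0, 1.414, GOLDEN_RATIO, 2.0, 5.0):
    print(f"1:{r:.3f} -> {proportion_judgment(r):+.2f}")
```

Someone who finds the square or the √2 rectangle most beautiful would simply move the peak; the form of the explanation stays the same.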

– Hey, didn’t we look at that proportion thing a while back? But then we looked at both sides on the 1:1 point – if the right side is for ‘vertical’ rectangles, the left one continues past the 1:1 point for horizontal ones.

– And if I remember correctly, didn’t somebody point out that there are many people who feel that the square, or the one based on the diagonal of the square, the √2 relationship, is the most beautiful proportion?

– You’re right. But that was in another book, a fat one, if I recall? So let’s not get too distracted by the details here. But perhaps we should just remember that there are many different forms of such functions: the one where ‘zero’ on the variable represents -3 on the judgment scale, which goes on to approach +3 at infinity; its opposite, which starts with +3 at zero on the criterion scale and approaches -3 at infinity; and the opposite of the golden ratio curve, which starts from -3 at some value of our criterion and rises towards +3 on both sides.

Fig. 2. Different criterion function types
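The three additional types just listed can be sketched with one building block. This is only an illustration under the assumption of exponential curves; the function names and the `scale` parameter (how quickly the curve approaches its limit) are hypothetical and not part of the discussion.

```python
import math


def rising(x, scale=1.0):
    """-3 at x = 0, approaching +3 as x grows ('the more the better')."""
    return 3 - 6 * math.exp(-x / scale)


def falling(x, scale=1.0):
    """+3 at x = 0, approaching -3 as x grows ('the less the better')."""
    return -rising(x, scale)


def valley(x, worst_at, scale=1.0):
    """-3 at x = worst_at, rising towards +3 on both sides."""
    return 3 - 6 * math.exp(-abs(x - worst_at) / scale)


for x in (0.0, 1.0, 2.0, 4.0, 8.0):
    print(f"x={x:4.1f}  rising={rising(x):+.2f}  "
          f"falling={falling(x):+.2f}  valley={valley(x, worst_at=2.0):+.2f}")
```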

– Can’t think of an example for that one though. Except perhaps weird ones like people’s appreciation for apartment levels going up towards the higher floors and +3 at the penthouse – but with a sudden drop to -3 at floor 13?

– Goes to show that people’s preferences can take strange contortions… But those are personal judgments. I guess they are entitled to hold those, privately. Now, what about collective decisions?

– Yes, Vodçek. That is the point where we left off when Renfroe came in. We can explain how our subjective goodness judgment relates to some objective measurement. And this is useful: I can ask somebody to make a decision on my behalf if I give him my criterion function to make a selection – even though I know that his criterion function likely has a different shape and the high +3 judgment score is in a different place. But the issue we started from was the idea that there should be measurements – expressing values – that everybody should agree on, so that collective decisions could be guided by those values and measurements. ‘Meta-values’?

– Yes, that notion seems to make perfect sense to many people – for example, they feel that trying to stop climate change is so important for the survival of human civilization that it is such a meta-value, one that everybody should agree on so that we can start taking more effective action. And that this is even an ethical, a moral duty. So they can’t understand why there are some people who don’t agree.

– Right. So we were trying to untangle the possible reasons why they don’t.

– Short of declaring them all just blindly indoctrinated by political or religious views or outright fraudulent misrepresentations, eh?

– Yes, let’s not go there. I guess there may have been some confusion because at first there was no distinction between the ‘meta-values’ – that were quite general and abstract – and the corresponding measures one would need to actually guide decisions: You can’t really argue against a concept like the Common Good as a guiding value – but when it comes to pinning down just what that common good is, and how to get an ‘objective’ measure for it, history shows that there’s plenty of disagreement both about the ‘common’ part – my family, tribe, town, country, my religion, humanity, all life on the planet? And even more vicious: about what the ‘good’ might be.

– I agree; and the attempts to justify those different views have led to some pretty desperate contortions of moral guidelines for decisions to meet those good things: I remember the old ‘dulce et decorum est pro patria mori’ (‘sweet and decorous it is to die for one’s country’), or promising all kinds of heavenly rewards in the hereafter to those willing to fight and kill and die for religious ideas.

– Well, if your tribe or country or your faith is your ‘common’, and somebody is attacking it, what’s wrong with that?

– I guess it becomes questionable when these definitions lead to more of the good ending up with some people – the fellows who are spouting these glorious quotes and values and then get to make the decisions ‘on behalf’ of everybody else, while many of the everybody else are doing the dying and sacrificing. If it becomes apparent that these value leaders and doers have actually been acting more on behalf of their own good, the trust in those values is eroded. No matter how they are justified: by philosophical theories, by divine revelations (mostly revealed only to some special prophets), by political theories, or scientific theories. Or systems thinking. Or holistic thinking and awareness.

– Even scientific and systemic investigations?

– That’s the tragedy, right. You may argue that science and systems tools are our best available tools for guiding action, based on objective measurements and facts, and have all the evidence and replicable observations to support them: if they then are used by leaders – governments or ‘movements’ – to postulate ‘meta values’ to guide actions ‘on behalf of’ others, they will run into the same trust issues as all the other attempts to do the same – unless…

– Unless what?

– Well: unless they can also offer guarantees that those leaders are not just acting on their own behalf. Even if Congress were full of scientists and systems thinkers: if they make laws to deprive people of health insurance while themselves enjoying the best insurance and health care, no matter what justification they offer for their actions, they would run into trust problems. So that would be one condition.

– One condition, Vodçek? Are there others?

– In theory, yes. It’s actually part of the job of the legislature: isn’t it in the constitution, even, that they should make laws on behalf of their constituents? And the representative notion is, of course, that representatives should be part of the communities they represent and communicate with them so that they know what the community wants? And then go and vote for laws that realize those preferences.

– Which they should do even if they don’t agree with them? Even if they know better? Doesn’t the constitution also state that they should vote according to their own conscience?

– Right: there seems to be some contradiction there. And it’s leading to that constant tug-of-war between the principle of electing representatives whose judgment we trust — to vote in view of the common good – even if we don’t agree, — because we don’t have all the information? Or because we are suffering from doctrinaire blindness or stupidity, or ethical or moral depravity? And to the alternative of passing laws by referendum, regardless of what the representatives are saying.

– Well, there’s the practical issue: we can’t possibly make all laws by referendum, can we? And if we are too ignorant to give our representatives proper guidance about how to vote so they have to vote by their own better knowledge and information and conscience, should we be allowed to vote on laws?

– But we are allowed to vote for the representatives? Isn’t there an expectation that the representatives should provide enough information to their constituents to know and understand the best available basis of judgment for the needed actions and also be confident that the representatives’ basis of judgment is sound enough to let them vote according to their own judgment and conscience if and when necessary?

– Right. Theory, perhaps; current practice of governance looks a bit different. But that’s where our question about the criterion functions comes in, again: They are part of the process of not only making up our minds about what actions to take or to support, but also of explaining our basis of judgment to each other. In the extreme, so that we can trust somebody else to make judgments and take action on our behalf – because we have conveyed our mutual basis of judgment well enough, as well as made sure it does give adequate consideration to all available information and concerns.

– So what we are trying to clear up for ourselves is this question: Are there general or universal ‘meta-values’ that can give us adequate criteria to guide effective action – not only let leaders invoke those values without any way of checking whether they actually are served by the proposed actions? And how would those criteria / measures come to be identified and agreed upon?
– Even more specifically, whether those measures should be determined by objective measurement, and then used by some people who are able or entitled to initiate ‘effective action’ to not only do so, — on behalf of everybody else – but to claim that this is ethically and morally defensible and necessary.

– And to declare everybody who disagrees to be ethically misguided – really: to be ‘bad’ or ‘evil’ people.

– Do they come right out and say so?

– I think both sides are, in so many words, unfortunately. So the question of whether there are ‘meta-values’ with related performance measures that everybody should adopt and thus support the resulting actions, is an important one. The sticking part being, to support collective actions taken on our behalf, and perhaps compelling us to take certain everyday actions ourselves, in pursuit of those meta-values. Didn’t we say, a little while ago, that somebody can take action on my behalf if I have given him my criterion functions? – But now it looks like the suggestion is the other way around: that there are criterion functions people tell us we should adopt, to comply with the ethical demands of the Meta Values. So we are trying to understand what that really means.

– Okay: so, let us assume that we have persuaded the participants in the climate change debate to explain their basis of judgment to each other, as much as possible by use of criterion functions. Diagrams that explain how our subjective ‘goodness’ judgments g about proposed plans relate to some ‘objective’, measurable properties of climate change aspects. Take the example of the ‘evil’ CO2 that we are adding to the atmosphere. Assuming that there is some agreement about how much that matters – a different scientific question that also needs sorting out in many people’s minds. And assuming there’s some meaningful agreement about how and where those levels of CO2 will be measured.

– What do you mean – are agreements about CO2 needed? Isn’t that just a simple scientific fact?

– No, Sophie: It makes a difference whether you measure CO2 content in the atmosphere, and at what height, or in the oceans. All those things must be sorted out. Then: is the ‘amount’ or ‘percentage’ of CO2 a good choice of criterion for explaining our judgments about the quality of a plan to improve things?

– Sure: Haven’t scientists found out, in many serious studies, that the amount of CO2, at the time when we began to see substantial human-based climate change, was some value ‘c*’? Whether you wish to use the actual measure of the amount of CO2 in the atmosphere or its percentage does not really matter, at least for the sake of this question. Now, say a plan or policy P1 has been proposed to try to get things back in line, whatever that means. So a person tries to explain how her judgment score on some scale +U to –U, say +3 to -3, depends on how close a proposed plan comes to that ‘goal’ of c*. Would you say that to get a ‘goodness’ score of +3 (on a scale from +3, meaning ‘couldn’t possibly get any better’, to -3, meaning ‘couldn’t possibly get any worse’, with a midpoint of zero for ‘so-so’ or ‘don’t know’), the plan would have to achieve a return to that value ‘c*’? So you’d draw up a criterion function with the top of the curve touching the +3 line at c*. But another person B thinks that a slightly higher level of CO2 would be OK given some measures in the plan to mitigate it faster. It looks like this:

Figure 3: Criterion functions of two persons, for a plan to return the CO2 value to a desirable CO2 level
It shows their different judgment basis regarding what that level c* should be, showing the CO2-measure on the x-axis and the judgment score on the y-axis. Any lower or higher than that would get a lower score.
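To see how two such curves lead to different scores for the same outcome, here is a small sketch. It assumes a simple ‘tent’ shape (falling linearly from +3 at the person’s ideal level to -3 at a chosen distance from it) and uses arbitrary units with c* set to 1.0. None of the numbers are real CO2 data, and `peaked_criterion`, `tolerance` and the 10% offset for the second person are hypothetical.

```python
def peaked_criterion(ideal, tolerance):
    """Build a criterion function with +3 at `ideal`, falling linearly to -3
    at a distance of `tolerance` or more on either side (assumed shape)."""
    def judge(level):
        distance = abs(level - ideal)
        return max(-3.0, 3.0 - 6.0 * distance / tolerance)
    return judge


C_STAR = 1.0  # the reference level c*, in arbitrary units (no real data)

person_a = peaked_criterion(ideal=C_STAR, tolerance=0.5)
# The second person accepts a slightly higher level as ideal (invented offset).
person_b = peaked_criterion(ideal=1.1 * C_STAR, tolerance=0.5)

for level in (0.9, 1.0, 1.1, 1.3):
    print(f"level {level:.1f}: A = {person_a(level):+.1f}, B = {person_b(level):+.1f}")
```

Handing such a function to someone else is what makes it possible for them to judge on one’s behalf: they can score any predicted level without having to share the preference.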

– Okay; I see. And you are saying that in principle, every participant in the evaluation would be entitled to his or her own curve? Even one that doesn’t have the highest score at ‘c*’?

– Now why would you – or anybody – put the highest score anywhere else?

– Well, aren’t you guys saying that having spewed so much CO2 into the atmosphere since it was at that benign level ‘c*’ has done a lot of damage already? So to repair that damage – to get the glaciers to grow back, for example, or to cool down the oceans, – it would be necessary to reduce the CO2 to a lower level than ‘c*’ – at least for a while? If you could calculate that level, that’s where a person B might put the +3 score. And give that score to another plan P2 that would achieve that level. Does that make sense?

Figure 4 – Criterion functions about assessment of CO2 levels relative to the ‘current state’
and the goal of ‘returning’ to a previous better state by temporarily lowering the target state below c*

– Wait a minute: wouldn’t it be better to look at the effectiveness of a plan to reduce the amount of CO2 – at least, as you say, for a while, until the effects of increased CO2 level have been ‘repaired’?

– Good question, at least for showing that even the choice of criterion is a controversial issue. You could say that it might be even better to look at how close to the ideal CO2 level a plan can get with its reduction effectiveness, and show how that relates to what would happen if nothing is done? Which is always an alternative ‘plan’?

– Good. But the question is really about whether the criterion of ‘c*’ or ‘c•’ is an appropriate one to base your judgment upon: Both plans are claiming to reach those values – claims that should be assessed for their plausibility, don’t forget – but at different points in time. So which one is ‘better’?

– I guess we should say: the plan that gets there ‘sooner’ – all other aspects being equal, for the sake of the argument.

– Yes, but doesn’t that mean you really have a different criterion? Specifically: the criterion of ‘time for the plan to get to c* – or c•’? Which really is a different criterion, so which one are you going to use?

– Does it matter whether we use different levels of desirable c’s as long as we all come up with a goodness judgment g, and we each have explained what that judgment depends upon?

– If the outcome is that your judgment will support P1 while B’s score will prefer P2 as the better plan – all else being equal, as you say: the problem isn’t really solved yet, is it?

Figure 5 – Criterion function of plan effectiveness of reduction of CO2 in relation to the ideal ‘stable’ c* level
and to the plan alternative of ‘doing nothing’.

– Hey guys, aren’t you missing something essential here? Well, you did sort of stumble on it, – with your remark ‘at least for a while’.

– Huh? Explain, Vodçek. What’s missing?

– Time, of course. In your criterion functions so far, you seem to make an assumption that your plans P1 and P2 will somehow ‘achieve’ some level of CO2 once they get implemented, and that things would then stay that way? But come on, that’s not how things work, is it? Even a plan for this kind of issue would have to involve activities, policies, processes, all continuing over time. It can’t be a sudden magic spell to change things overnight. And then it would take time for the actual CO2 level to change back to whatever level you’d have in mind – if the plan works. But how long is that going to take? And is it going to stay at that level? Shouldn’t that be part of your systemic ‘due consideration’?

– You are right. So let me see – looks like we have to draw up a kind of three-dimensional criterion function, is that what you are implying?

– Right. If you are serious about this idea of ‘explaining how your subjective judgment relates to objective criteria’. Not everybody does…

– Okay, I see your point. So we draw the diagram like this: Time on the horizontal right-left axis, CO2 level on the horizontal up-down axis y, and your judgment on the vertical axis z.

– And before you ask, I think Dexter would say simulation models to predict the track of the CO2 levels for each plan over time can and have been developed, given reliable data…

– But now, what’s your score for those plans? If the CO2 levels and judgments change all the time? By the way, wouldn’t you have to also include a simulated track for the ‘current’ state of affairs and its future track if nothing is done?

– Right. So we have the time-CO2 simulation with three tracks – like this?

Figure 6: Criterion function of criterion changes and corresponding judgments over time

– Okay, what about instead looking at the difference between the desired targets and what the plans are actually achieving, over time? It would require looking at the total measure of those differences over time, and I agree that it would also require an agreement about the time frame we are willing to look at. People keep talking about the world we are going to leave for our grandchildren, from a more systemic and holistic perspective – but don’t offer any specifics about how we might distinguish one plan from another, or even from just doing nothing?

– I like your point about looking at the difference, the degree of improvement we can expect from a plan. What would that look like in your diagram?

– Well, the measure would have to be one like the area of the judgment ‘surface’ between two alternatives? Because we agree that in order to reach some kind of meaningful overall measure, it will have to be made up of judgment measures, right? But I’m sure that at least for individual judgment systems, ways can be found to calculate that, just as there are equations that describe the simple line functions in the previous examples. But for the time assessment, it will have to include some feature of ‘discounting’ the judgments over time – like calculating the ‘present’ value of a series of income or cost payments over time. Work to do, eh? Because you can see in the diagram that if a plan ‘works’, the improvement judgment area will get wider and wider over time – but since the probability / plausibility judgments will also get more and more uncertain for long future time periods, they will become less and less important factors in the final overall judgment.

– I see what you mean. Not because we don’t care about future generations, mind you – but simply because we just can’t be certain about our predictions about the future.
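One way to make this ‘discounted improvement’ idea concrete is sketched below. It is only an illustration under several assumptions: time is cut into equal periods, the ‘area’ between a plan and the ‘do nothing’ option is approximated by a sum of judgment differences per period, and growing uncertainty is represented by a constant discount rate, as in a present-value calculation. The tracks are invented numbers in arbitrary units (with the ideal level at 1.0), and the `judge` function is an assumed peaked criterion function of one person.

```python
def discounted_improvement(plan_track, do_nothing_track, judge, discount_rate):
    """Sum of discounted judgment differences (plan minus 'do nothing').

    Tracks are lists of criterion values, one per period, starting at period 0.
    """
    if len(plan_track) != len(do_nothing_track):
        raise ValueError("both tracks must cover the same periods")
    total = 0.0
    for t, (plan_value, baseline_value) in enumerate(zip(plan_track, do_nothing_track)):
        gain = judge(plan_value) - judge(baseline_value)
        # Later periods count less, like payments in a present-value sum.
        total += gain / (1 + discount_rate) ** t
    return total


def judge(level, ideal=1.0, tolerance=1.0):
    """Assumed peaked criterion function: +3 at `ideal`, -3 beyond `tolerance`."""
    return max(-3.0, 3.0 - 6.0 * abs(level - ideal) / tolerance)


# Invented simulation tracks in arbitrary units; not real CO2 data.
do_nothing = [1.4, 1.5, 1.6, 1.7, 1.8]
plan_p1 = [1.4, 1.4, 1.3, 1.2, 1.1]  # slow but steady
plan_p2 = [1.4, 1.3, 1.1, 1.0, 1.0]  # reaches the ideal level sooner

for name, track in (("P1", plan_p1), ("P2", plan_p2)):
    print(name, round(discounted_improvement(track, do_nothing, judge, 0.05), 2))
```

Both the time frame (here five periods) and the discount rate would have to be agreed upon by the participants; different choices can change which plan comes out ahead.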

– Right. Meanwhile, the discourse may become more productive if we threw more of these kinds of diagrams into the fray? Like this one about the judgment surfaces – I agree it needs discussion?

– So you may have to look at the actual results that those changes in CO2 will bring about, over time? Like: levels of ocean rise expected until the remedy sets in – will it actually recede if the plans are any good? I haven’t heard anybody even mention that yet… And provisions for what to do about sea level change until it stops rising?

Figure 7 – Criterion functions showing judgment surfaces of comparative improvement over time

– Come to think of it – have there been any plans put out there with that kind of detail?

– I admit, I haven’t been following all the reports on that controversy – but I am not aware of any.

– Hey, here’s a test for promoters of a plan: ask them whether they are willing to put up down payments for new ocean front real estate that will emerge if their plan works as intended?

– Renfroe… Ah well, maybe you’ve got something there. Get Al to put down money for property on the beach in front of Mar-a-Lago?

– You guys are giving me a headache. How are we ever going to get moving on the problem if you keep analyzing things to death and wasting time with stupid pranks?

– Well, who was claiming and boasting about taking a more systemic and holistic look at these problems, and of making decisions based on objective facts? It may be useful to turn that big talk down a notch, and focus on the painful nitty-gritty aspects of how to make these decisions. So that people can agree on something specific instead of calling each other names, and resisting plans because of distrust – deserved or not – about what the promoters of plan P1 or P2 are really after? Like power? The next election? More profit for one industry rather than another? That lack of trust may be the real reason why people keep resisting plans to do something about problems. But they can’t talk about it because those suspected motives, or the lack of them, aren’t part of the discourse?

– Ah: the discourse. Yes. How to talk about all that without starting to call each other names or throwing rocks at each other and storefronts… What does that take? Where’s Abbé Boulah?
oooo

1 Response to “On the use of criterion functions to explain our basis of judgment. A Tavern Talk”


1 abbeboulah September 1, 2017 at 8:33 pm

Optimization
One claim in the summary got lost in the text – the comment about how criterion functions shed some light on the issue of optimization in the context of developing and evaluating collective responses – plans – to public problems.
Optimization as understood in its original sense as ‘aiming for the best’ implies a ‘goodness’ judgment. We evaluate the prospective goodness of plans according to a number of aspects, and explain – at the extreme – how our goodness judgments are related to some objective measure of performance. There are numerous sophisticated approaches for finding the ‘optimal’ values of such variables in existing or proposed systems as described by those variables (the optimal value having been pre-determined ‘outside’ of the systems model, e.g. by the ‘client’), but they ‘optimize’ the criterion, not the judgment score. (It should be called ‘maximization’ or ‘minimization’, but that term does not cover the following.) This may be seen as plausible as long as the ‘best’ or ‘worst’ values of the criterion are either infinity or zero, respectively: for example, infinite profit (best) and zero profit (worst), or infinite cost (worst) and zero cost (best). But for any situations in which the best or worst values of the criterion are some in-between number, that location is invariably subject to controversy, and declaring the achievement of the respective criterion value to be equivalent to ‘optimizing’ is illegitimate unless explicitly agreed upon.
For either situation, though, the emerging understanding of an ‘optimal’ plan would be the following:
a) For an individual, the plan would achieve a +3 goodness judgment (remember that the +3 judgments should be reserved for conditions that ‘couldn’t possibly be better’) for all aspects in that individual’s evaluation schedule, so that the resulting overall aggregated judgment would also be +3.
b) For a group, the ‘optimal’ plan would be one for which the goodness judgments of all participants in the evaluation process would be +3; which would require that all partial judgments would also be +3 for all participants and all aspects. This would be possible only if all the criterion functions – which might be more or less severe (‘steep’) for different participants – would have the +3 judgment score over the same value of the criterion.
These conditions are so unlikely to occur that they must be deemed impossible. As a result, any claim of optimality could be challenged by a simple demonstration of a deviation from these ‘unanimity’ conditions for only one aspect of one participant. It therefore seems prudent to avoid this term altogether, and to rather focus on questions such as: which of several proposed actions (they should always include the ‘no action’ alternative) shows the highest overall (group) score or statistical value; or what is the minimal such score that a plan should achieve to be at all ‘acceptable’.
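The difference between the ‘unanimity’ test and the more modest decision questions can be sketched as follows. This is an illustration only: the group score is assumed to be a plain unweighted mean of the participants’ mean partial judgments, which is just one of many possible aggregation rules, and the plans, participants, aspects and scores are all invented.

```python
def is_optimal(judgments):
    """True only if every participant gives +3 on every aspect."""
    return all(score == 3
               for partial in judgments.values()
               for score in partial.values())


def group_score(judgments):
    """Mean over participants of each participant's mean partial judgment.

    Unweighted means are an assumption; other aggregation rules are possible.
    """
    per_person = [sum(p.values()) / len(p) for p in judgments.values()]
    return sum(per_person) / len(per_person)


def best_acceptable(plans, minimum):
    """Return (name, score) of the highest scoring plan at or above `minimum`,
    or None if no plan reaches the minimum."""
    scored = {name: group_score(j) for name, j in plans.items()}
    acceptable = {name: s for name, s in scored.items() if s >= minimum}
    if not acceptable:
        return None
    best = max(acceptable, key=acceptable.get)
    return best, acceptable[best]


# Invented judgments: plan -> participant -> aspect -> partial score (-3..+3).
plans = {
    "no action": {"A": {"cost": 3, "effect": -3}, "B": {"cost": 2, "effect": -2}},
    "P1": {"A": {"cost": 0, "effect": 3}, "B": {"cost": -1, "effect": 2}},
    "P2": {"A": {"cost": -2, "effect": 3}, "B": {"cost": 1, "effect": 3}},
}

print({name: is_optimal(j) for name, j in plans.items()})
print(best_acceptable(plans, minimum=0.5))
```

In this made-up case no plan passes the unanimity test, yet one plan still scores highest and clears the chosen minimum; raising the minimum far enough leaves no acceptable plan at all.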

Abbe Boulah’s Weblog