
Eerily erring electioneering?

In the Fog Island Tavern on a dreary day in February:

– You look worried, Bog-Hubert: What’s bugging you today?
– Oh boy. I never thought I’d see Abbé Boulah getting worked up over politics, but let me tell you, Vodçek: this election is getting to him.
– Really? I thought he’d written off this whole voting business long ago, as a totally misguided crutch to bring any political or planning discourse to a meaningful decision?
– Yeah, he keeps working on his schemes to improve that. But you should have heard him this morning — you’d think he’s still training hard for his old pet project to get endurance cussing accepted as a new Olympic discipline —
– So what is it that’s got him riled up on this one now?
– Well, I think he’s mainly disappointed in the candidates’ apparent inability to learn from past mistakes, and to focus on what’s really important. For example, this business about starting to discredit the current front runner, because he’s too, shall we say, unorthodox for the party establishment.
– What’s wrong with that? It’s politics, isn’t it?
– Fulminating stinkbomb-bundles and moccasin-mouth-ridden swamp-weed kudzu tangles: you too, now?
– Oh Bog-Hubert: excellent — you’re shooting for a medal in that sport too?
– By all the overgrown rusty Dodge truck skeletons in my cousin’s front yard: Don’t you, don’t they get it?
– Get what? It’s BAU politics. So, care to explain?
– Well, isn’t it obvious? Rather than tearing each other apart, shouldn’t they try to figure out what it is that makes the messages of the frontrunner and of the opposition more appealing to those voters they want to convince to vote for them, and come up with a b e t t e r message, a more appealing and convincing vision? Because that tearing-apart strategy is bound to come back and kick ’em in the youknowwhat…
– Hmm. I see what you mean, by Abbé Boulah’s drooping mustache! And it’s giving the opposition free stinkbombs to launch at whoever ends up being the nominee…
– Yeah. And not only that: What if part of the problem is precisely that old habit of the old swamp establishment — of both parties — that those disgruntled voters are getting tired of? And that’s the rusty musket the establishment keeps shooting itself in the foot with?
– I can see why this upsets our friend. The futility of the hope that they’ll ever learn, I mean. Let’s try to get him back to work on those better ways he’s working on…
– I’ll drink to that. Do they make a decent grappa from Sonoma grapes?

— o —

EVALUATION IN THE PLANNING DISCOURSE — TARGET AUDIENCE

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.  Thorbjørn Mann,  February 2020

TARGET AUDIENCE


Audience and Distribution: Overview

The target audience for the results of this effort to clarify the role of evaluation in the planning discourse is admittedly, immodestly diverse. While the topic may be of interest to many participants in the social media groups currently discussing related issues, many of whom are consultants offering services and tools for planning, ‘problem-solving’ and ‘change management’ to corporate and institutional clients, the focus here will be on public planning, at all levels from small local communities to national, international and ultimately global challenges. Thus, the issues concern officials as well as the public involved in planning. But it is especially at the level of global challenges and crises, which transcend the boundaries of traditional institutions, that traditional decision-making modes and habits break down or become inapplicable, generating calls for new ideas, approaches and tools. Increased public participation is a common demand.

The planning discourse at all levels will have to include not just traditional planning experts and decision-makers in all institutions faced with the need for collective action, but also the public. Newly emerging IT tools and procedures must also be applied to the evaluation facet of planning, engaging all potentially affected parties; leadership as well as the public will have to be involved and become familiar and competent with their use. This will call for appropriate means of generating that familiarity: information and education.

Obviously, whatever discussion and presentation tools are chosen for this exploration of evaluation in the public planning discourse, they will not, at present, be adequate for achieving the aim of developing definitive answers, nor even for carrying out an effective discussion. This effort must be seen as just a first step in a more comprehensive strategy. To the extent that meaningful results emerge from the discussion, the issue of how to bring the ideas to a wider audience for general adoption will become part of the agenda. That agenda should include education at all levels, not only higher education but general education for all citizens. The hope, then, is to reach not only planners and decision-makers but eventually the general public.

The audience that can be reached via such vehicles as this blog, selected social media, and perhaps a book will be people who have given these issues some thought already, that is, ‘experts’. So any discussion it incites will likely involve discipline ‘jargon’ of several kinds. But in view of the desired larger audience, the language should remain as close to conversational as possible and avoid ‘jargon’ too unfamiliar to non-experts. Many valuable research results and ideas are expressed in academic, ‘scientific’, or technical terms that are likely to exclude parties from the discussion that should be invited and included.

Given the wide range of people and institutions involved with planning, the question of ‘target audience’ may be inadequate or incomplete: it should be expanded to look at the best ways for distributing these suggestions. Besides traditional forms of distribution such as books, textbooks, manuals, new forms or media of familiarizing potential users may have to be developed; for example, online games simulating planning projects using new ideas and methods. This aspect of the project is especially in need of ideas and comments.

–o–

EVALUATION IN PLANNING DISCOURSE: DECISION CRITERIA

Thorbjørn Mann, January 2020

DECISION CRITERIA

The term ‘decision criteria‘ needs explanation, so as not to be confused with the ‘evaluation criteria‘ used for the task of explaining one’s subjective ‘goodness’ (or ‘quality’) judgment about a plan or object by showing how it relates to an ‘objective’ criterion or performance measure (see section/post …). The criteria that actually determine or guide decisions may be very different from those ‘goodness’ evaluation criteria, even though the expectation of the entire effort here is to get decisions that are based more on the merit of the discourse contributions that clarify ‘goodness’.

For discourse aiming at actual actions to achieve changes in the real world we inhabit, a decision or recommendation has to be made when discussion stops: after all aspects have been assessed, and individual quality judgment scores have been aggregated into individual overall scores and into group statistics about the distribution of those individual scores. The question then arises: what should guide that decision? The aim of ‘reaching decisions based on the merit of discourse contributions’ can be understood in many different ways, of which actual ‘group statistics’ are only one, and not only because there are several such statistical indicators. (It is advisable not to use the term ‘group judgment‘ for this: the group or set of participants may make a collective decision, but there may be several factions within the group for which no single statistic is representative. And the most familiar decision criterion in use is the ratio of votes for or against a plan proposal, which may have little if any relation to the group members’ judgments about the plan’s quality.)

The following is an attempt to survey the range of different group decision criteria or guiding indicators that are used in practice, in part to show why the planning discourse for projects that affect many different governance entities (and, ultimately, decisions of a ‘global’ nature) calls for different decision guides than familiar tools such as majority voting.

A first distinction must be made between decision guides we may call ‘plan quality’-based, and those that are more concerned with the discourse process.

Examples of plan quality-based indicators are the different indicators derived from the quality-based evaluation scores (a small computational sketch of a few of these follows after the list):
–  Averaged scores of all ‘Quality’ or ‘Plausibility’ (or combined) judgment scores of participating members;
–  ‘Weighted average’ scores (where the manner of weighting becomes another controversial issue: degree of ‘affectedness’ of different parties? Number of people represented by participating group representatives? Number of stock certificates held by stockholders?…);
–  As the extreme form of ‘weighting’ participants’ judgments: the ‘leader’s’ judgment;
–  The judgment of ‘worst-off’ participants or represented groups (the ‘Max-min’ criterion for a set of alternatives);
–  The Benefit-Cost Ratio;
–  The criterion of having met all ‘regulation rules’, which usually are just ‘minimal’ expectation considerations (‘to get the permit’) or thresholds of performance, such as ‘coming in under the budget’;
–  Successive elimination of alternatives that show specific weaknesses on certain aspects, such that the last remaining alternative becomes the recommended decision. A related criterion, applied during plan development, would be the successive reduction of the ‘solution space’ until there is only one remaining solution, with ‘no alternative’ remaining.
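To make a couple of these indicators concrete, here is a minimal computational sketch of the ‘weighted average’ and ‘max-min’ criteria, assuming a +3/-3 judgment scale; the function names, scores and weights are invented for illustration, not part of any procedure proposed here:

```python
# Hedged sketch: two of the 'plan quality'-based indicators above,
# computed from invented participant scores on a +3/-3 scale.

def weighted_average(scores, weights):
    """'Weighted average' indicator; weights are assumed to sum to 1."""
    return sum(q * w for q, w in zip(scores, weights))

def maxmin_choice(alternatives):
    """'Max-min' criterion: pick the alternative whose worst
    participant score is the highest."""
    return max(alternatives, key=lambda name: min(alternatives[name]))

# Invented scores of four participants for three plan alternatives:
alternatives = {
    "Plan A": [3, 2, -1, 2],
    "Plan B": [1, 1, 1, 1],
    "Plan C": [3, 3, 3, -3],
}

print(weighted_average(alternatives["Plan A"], [0.4, 0.3, 0.2, 0.1]))  # 1.8
print(maxmin_choice(alternatives))  # 'Plan B': its worst score (1) is highest
```

Note how the two guides can point to different alternatives: Plan A wins on (this) weighted average, while Plan B wins on the max-min view of the worst-off participant.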

Given the burdensome complexity of more systematic evaluation procedures, many ‘process-based’ criteria are preferred in practice:

– Majority voting, in various forms, with the extreme being ‘consensus’, i.e. 100% approval;
– ‘Consent’, understood less as approval than as acceptance with reservations that are either not voiced or do not convince a majority (sometimes only achieved or invoked in live meetings by determinations such as ‘time’s up’ or ‘no more objections to the one proposed motion’);
– ‘Depth and breadth’ of the discussion (but without assessment of the validity or merit of the contributions making up that breadth or depth);
– All parties having been heard / given a chance to voice their concerns;
– Agreed-upon (or institutionally mandated) procedures and presentation requirements having been followed, legitimating approval, or violated, leading to rejection, e.g. of competing alternatives (‘handed in late’ means ‘failed assignment’…).

Of course, combinations of these criteria are possible. Does the variety of possible resulting decision criteria emphasize the need for more explicit and careful agreements, establishing clear, agreed-upon procedural rules at the outset of the process? For many projects, there is also a need for better decision criteria. A main reason is that in many important projects affecting populations beyond traditional governance boundaries (e.g. countries), traditional decision determinants such as voting become inapplicable: not only because votes may be based on inadequate information and understanding of the problem, but simply because the number of people holding a ‘voting right’ becomes indeterminate.

A few main issues or practical concerns can be seen to guide the selection of decision criteria: the principle of ‘coverage’ of ‘all aspects that should be given due consideration’ on the one hand, and the desire for simplicity, speed and clarity on the other. The first is aligned with either trust in, or demonstration (‘proof’) of, fair coverage: ‘accountability’; the second with expediency. Given the complexity of ‘thorough’ coverage of ‘all’ aspects, explored in previous segments, it should be obvious that full adherence to the first principle would call for a decision criterion based on the fully explained (i.e. completed) evaluation worksheet results of all parties affected by the project in any way, properly aggregated into an overall statistic accepted by all.

This is clearly not only difficult to define but practically impossible to apply, and equally clearly situated at the opposite end of an ‘expediency’ (speed, simplicity of understanding and application) scale. These considerations also show why there is a plausible tendency to use ‘procedural compliance criteria‘ to lend the appearance of legitimacy to decisions: ‘All parties have been given the chance to speak up; now time’s up and some decision must be made (whether it meets all parties’ concerns or not).’

It seems to follow that some compromise or ‘approximation’ solution will have to be agreed upon for each case, as opposed to proceeding without such agreements and relying on standard assumptions of ‘usual’ procedures that later lead to procedural quarrels.

For example, one conceivable ‘approximation’ version might be to arrange for a thorough discussion with all affected parties being encouraged to voice and explain their concerns, but with only the ‘leader’ or official responsible for actually making the decision being required to complete the detailed evaluation worksheets, and to publish them to ‘prove’ that all aspects have been entered, addressed (with criterion functions for explanation) and given acceptable weights, and that the resulting overall judgment, aggregated with acceptable aggregation functions, corresponds with the leader’s actual decision. (One issue in this version will be how ‘side payments’ or ‘logrolling’ provisions, compensating parties that do not benefit fairly from the decision but whose votes in traditional voting procedures would be ‘bought’ to support it, should be represented in such ‘accounts’.)

This topic may call for a separate, more detailed exploration of a ‘morphology‘ of possible decision criteria for such projects, and an examination of evaluation criteria for decision guides or modes to help participants in such projects agree on combinations suited to the specific project and circumstances.

Questions? Missing aspects? Wrong question? Ideas, suggestions?

Suggestions for ‘best answers’ given current state of understanding:
– Ensure better opportunity for all parties affected by problems or plans to contribute their ideas, concerns, and judgments (planning discourse platform);
– Focus on improved use of ‘quality/plausibility’-based decision guides, using plausibility-weighted quality evaluation procedures explained and accepted in initial ‘procedural agreements’;
– Reduce reliance on ‘process-based’ criteria.

[Map: Overview of decision criteria (indices to guide decisions)]

–o–

Abbe Boulah’s Brexit Solution

– Say, Bog-Hubert: What’s going on out there on the Fog Island Tavern deck?
– Good question, Vodçek. I thought I’d seen everything, but this…
– That bad, eh? Who’s that guy there with Abbe Boulah?
– It’s a tourist from the EU. Don’t know how he got lost out here; must be a friend of a friend of Otis. And he got into a bragging contest with Abbé Boulah about which part of the world is crazier, more polarized, has weirder politicians.
– Must be a toss-up, if you ask me.
– Right. So now Abbe Boulah is trying to teach the EU fellow — I couldn’t really figure out if he’s a still-EU Brit or from another part over there — how to fix the Brexit mess.
– Good grief. So what’s Abbé Boulah’s solution?
– It actually looked like a brilliant idea for a while, but…
– Now you’ve got me curious. Do I have to bribe you with a shot of your own moonshine production, the Tate’s Hell Special Reserve?
– Psst. Okay, talked me into it. He stunned the poor guy with the simplicity of the idea: Let the good, compassionate Europeans help the poor Brits out of the conundrum they voted themselves into. Instead of haggling for years about the details of a hard or soft or a medium-well done exit, he said: Why don’t you simply dissolve the EU?
– Urkhphfft: What??
– Hang on, don’t choke on that Zin of yours. It’s only for a day, Abbé Boulah says: all the other countries that want to stay in the union vote the next day to re-unite it, just with a little change of the name. So: Brexit? Done. Stroke of the pen. Paid vacation for a day for all the EU employees. Get it? A few crazy regulations not getting written; it’s actually a benefit for everybody…
– But…
– Yes. Worried about things like the trade treaties? He said, they are all reinstated as they were, for now; and the UK can either choose to re-join the new thing, or stay out and re-negotiate the individual agreements one by one, without any deadlines, while the existing arrangements stay as they are until replaced.
– Weird. But, I must say, it has a certain Abbeboulistic appeal, eh?
– Yes, but now they are arguing about what the new name should be. They agree that it should be just a minimal change to the current name, so it wouldn’t cost too much: such as adding, deleting or altering a single letter of the name.
– Makes sense.
– You’d think so. But now they’re up in arms about whether it should be ‘Buropean’ or ‘Iropean’ or ‘Furopean’ or ‘Luropean’ or ‘Nuropean’ (all just messing a little with the ‘E’), or ‘European Onion’ or ‘RUnion’, or just adding a ‘2’ (starting a series of ‘generations’ like some computer system: ‘EU2’, 3, 4…), or a star (‘European* Union’ or ‘*European’ or ‘European Union*’, ‘EU*’) and another star in the flag. Or just putting the whole current name in quotation marks… It’s getting vicious, I tell you; you may have to go out there and throw them into the channel to cool them off…

EVALUATION IN THE PLANNING DISCOURSE — THE OBJECTIVITY-SUBJECTIVITY CONTROVERSY

An effort to clarify the role of deliberative evaluation in the planning and policy-making process

Thorbjørn Mann

OBJECTIVE VERSUS SUBJECTIVE JUDGMENT:
MEASUREMENT VERSUS OPINION

There is a persistent controversy about objective versus subjective evaluation judgments, often expressed as the ‘principle’ of basing decisions on ‘objective facts’ rather than (‘mere’) subjective opinions. It is also framed in terms of absolute (universal, timeless) values as opposed to relativistic subjective preferences. The desire or quest for absolute, timeless, and universal judgments upon which we ought to base our actions and decisions about plans is an understandable, legitimate and admirable human motivation: our plans, plans that affect several people and communities, should be ‘good’ in a sense that transcends the temporary and often misguided ‘subjective’ opinions of imperfect individuals. But the opposite view, that individuals are entitled to hold such subjective values (as part of the quest to ‘make a difference’ in their lives), is also widely held, even as a constitutionally validated human right (the individual right to the ‘pursuit of happiness’). The difficulty of telling the difference between the two principles and of reconciling them in specific situations makes this a controversial issue.

The noble quest of seeking solid ‘facts’ upon which to base decisions leads to the identification of selected ‘objective’ features of plans or objects with their ‘goodness’. Is this a mistake, a fallacy? The selection of those features, from among the many physical features by which objects and plans can be described, is done by individuals. However wise, well-informed, well-intentioned and revered those authorities may be, does this not make even the selection of features (for which factual certainty can be established) a matter of their ‘subjective’ opinions?

Is the jump of deriving factual criteria from the opinions of ‘everybody’, or even from the comparison of features of objects from different times as proof of their universal, timeless validity, to be considered a mistake or wishful thinking? ‘Everybody’ in practice just means a subset of people, at specific times, in specific context conditions, in surveys mostly separated from situations of actual plans and emergencies and from the need for making decisions in the face of many divergent individual judgments.

Regarding ‘timelessness’: the objective fact that the forms and styles of the same category of buildings (for example, churches and temples, which are expressly intended to convey timeless, universal significance) change significantly over time should be a warning against such attempts to declare certain physical properties identical with the notions of goodness, beauty, awe, wholeness etc. that people feel when perceiving them.

What are the options for participants in a situation like the following?
Two people, A and B, are in a conversation about a plan, something they have to decide upon.
They encounter a key claim in the discussion, involving an aspect of the plan that will significantly guide their decision about what to do. The claim is about whether the proposed plan can be said to have certain features that constitute a quality (goodness, or beauty). They find that they both agree that the plan will indeed have those features. But they vehemently disagree about whether that also means that it will have the desired ‘goodness’ quality. An (impartial) observer may point out that in common understanding, the agreement about the plan having those observable features is called an ‘objective’ fact, and that their assessments of the plan’s goodness are ‘subjective‘ opinions. Person A insists that the objective fact of having those objective features implies, as a matter of equally objective fact, also having the desired quality; while B insists that the features do not yet deliver the desired experience of perceiving goodness or beauty at all. What are the options they have for dealing with this obstacle?

a) A attempts to persuade B that the features in question, claimed to constitute quality, are part of a valid theory, and that this should compel B to accept the plan regardless of the latter’s feelings about the matter. The effort might involve subtle or not-so-subtle application or invocation of authority, power, or experience, or, in the extreme, labeling B a member of undesirable categories (‘calling B names’): an ignorant follower of disreputable ‘isms’, an insensitive, tasteless being unable to perceive true quality, or even someone whose conscious or subconscious shameful pursuits do not let him (B) admit the truth. Of course, in response, B can do this too…

b) B can attempt to find other features that A will accept (as part of A’s theory, or of an alternative theory) as generating the feeling of experiencing quality, letting A continue to call this a matter of objective fact while B calls it subjective opinion. This may also involve B invoking compelling factors that have nothing to do with the plan’s validity or quality, but e.g. with a person’s ‘right’ to their own judgment, or with past injustices committed by A’s group against B’s tribe, etc.

c) They can decide to drop the issue of objective versus subjective judgments as determinants of the decision, and try to modify the plan so that it contains both the features A requires to satisfy the theory and features that satisfy B’s subjective feelings. This usually requires making compromises, with one or both parties backing off from the complete satisfaction of their wishes.

d) They could call in a ‘referee’ or authority to make a decision they will accept for the sake of getting something (anything) done, without either A or B having to actually change their mind.

e) They can of course abandon the plan altogether, because of their inability to reach common ground.

There does not seem to be an easy answer to this problem that would fit all situations. Seen from the point of view of planning argumentation, where there is an attempt to clearly distinguish between claims (premises of arguments) and their plausibility assessment, the question becomes: is the claim of objectivity for certain judgments an attempt to get everybody to assign high plausibility values to those claims because of their ‘objectivity’?

Stating such claims as planning arguments makes it easier to see the different potential sources of disagreement in theories claiming that desirable quality-like features are implied by must-have objective properties. In an argument like the following, A (from the above example) claims: the plan should have property (feature) f, because f will impart quality q to the plan, given conditions c, which are assumed to be present. The ‘complete’ argument, separating out the unspoken premises, is:

D: The plan should include feature f          (‘Conclusion’)
because
1)  FI (f → q)  (if f, then q)                (Factual-instrumental premise)
and
2)  D(q)                                      (Deontic premise)
and
3)  F(c)                                      (Factual premise)

Adherent A of a theory stating postulates like “quality q is generated by feature f, and this is a matter of objective fact” may be pointing to the first and third premises, which arguably involve ‘factual’ matters. Participant B may disagree with the argument if B thinks (subjectively) that one or several of the premises are not plausible. But B may even agree with all three premises, with the understanding that f is just one of several or many ways of creating plans with quality q: B will then still disagree with the ‘conclusion’ because there are reasons (consequences) to refrain from f and to look for different means of achieving q in the plan. This is a different interpretation of the factual-instrumental premise: A seems to hold that premise 1 should actually be understood as “if and only if f, then q”. (The discussion of planning arguments thus should be amended to make this difference clear.) Does the theory involve a logical fallacy by jumping from the plausible observation that both premises 1 and 3 involve ‘objective’ matters to the inference “iff f then q”? Such a theory gets itself into trouble because of the implication of that claim: “if not-f then not-q”. A proponent of Alexander’s theory (1) seems to have fallen for this fallacy by claiming that, because the shape of an egg does not meet this theory’s criteria for something having beauty, the egg shape cannot be considered beautiful. Which did not sit well even with other adherents of the basic underlying theory.
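One quick way to see the gap between the two readings is to enumerate the four truth-value combinations of f and q. A minimal sketch, purely illustrative, reading ‘if … then’ as the standard material conditional:

```python
# Does 'if f then q' entail 'if not-f then not-q'? (Only under the 'iff' reading.)
from itertools import product

def implies(a, b):
    # material conditional: false only when a is true and b is false
    return (not a) or b

counterexamples = [
    (f, q) for f, q in product([True, False], repeat=2)
    if implies(f, q) and not implies(not f, not q)
]
print(counterexamples)  # [(False, True)]: q can obtain without f,
                        # so 'if not-f then not-q' does not follow.
```

The single counterexample (f false, q true) is exactly B’s position: the quality can be achieved by other means than feature f.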

The more general question is: Must this issue be addressed in the ‘procedural agreements’ at the outset of a public planning project? And if so: how? What role does it play in the various evaluation activities throughout the planning process?

One somewhat facile answer might be: if the planning process includes adequate public participation (that is, all members of the public, or at least all affected parties, take part in it, including in the decision whether to adopt the plan for implementation), all participants would make their own judgments, and the question would just become the task of agreeing on the appropriate aggregation function (see the section on aggregation) for deriving an overall ‘group’ decision from all individual judgments. If this sounds too controversial, the current practice of majority voting (which is one such aggregation function, albeit a problematic one) should be kept in mind: it is accepted without much question as ‘the way it’s done’. Of course, it just conveniently sidesteps the controversy.

Looking at the issue more closely, things become more complicated. For one, there are evaluation activities involved in all phases of the planning process: raising the issue or problem, searching for and interpreting pertinent information, writing the specifications or ‘program’ for the designers to develop ‘solution’ proposals, and making a final decision. Even in the most ambitious participatory planning processes, there will be very different parties making key decisions in these phases. So the question becomes: how will the decisions of ‘programming’ and designing (solution development) impact the outcome if the decision-makers in these early phases are adherents of a theory that admits only certain ‘objective’ features as valid components of solution proposals, and ignores or rejects concerns stated as ‘subjective opinions’ by affected parties, so that the ‘solutions’ the ‘whole community’ is allowed to participate in accepting or rejecting simply do not reflect those ‘subjective’ concerns?

For the time being, one preliminary conclusion drawn here from these observations may be the following: different expressions and judgments about whether decisions are based on timeless, universal and absolute value considerations or on ‘subjective opinions‘ must be expected and accommodated in public planning, alongside ‘objective’ factual information. Is it one of the tasks of designing platforms and procedures to do just that: to find ways of reaching agreement about the practical reconciliation of these opinions? Is one first important step toward that goal the design and development of better ways to communicate about our subjective judgments and about how they relate to physical, ‘objectively measurable’ features? This question is both a justification of the need for deliberative evaluation in collective planning and policy-making, and one of its key challenges.

These are of course only some first draft thoughts about the controversy that has generated much literature, but has not yet been brought to a practical resolution that can more convincingly guide the design of a participatory online public planning discourse platform. More discussion seems to be urgently needed.

Note 1): Bin Jiang, in a Facebook discussion about Christopher Alexander’s theory as expressed in his books, e.g. ‘A Pattern Language’ and ‘The Nature of Order’.

–o–

EVALUATION IN THE PLANNING DISCOURSE — AGGREGATION

An effort  to clarify the role of evaluation in the planning process.

Thorbjørn Mann

THE AGGREGATION PROBLEM:

Getting Overall Judgments from Partial Judgments

The concept of ‘deliberation’ was explained, in part, as the process of ‘making overall judgments a function of partial judgments’. We may have gone through the process of trying to explain our overall judgment about something to others, or made the effort of ‘giving due consideration’ to all aspects of the situation, and arrived at a set of partial judgments. Now the question becomes: just how do we ‘assemble’ (‘aggregate’) these partial judgments into the overall judgment that can guide us in making the decision, for example, to adopt or reject the proposed plan?

The discussion has already gone past the level of familiar practices such as merely counting the number of supporting and opposing ‘votes’, and even past some well-intentioned approaches that begin to look at the number of explanations (arguments or support statements) in the ‘breadth‘ of the discussion (the number of different aspects brought up by each supporting or opposing party) and its ‘depth‘ (the number of levels of further support for the premises and assumptions of the individual arguments).

The reason why these approaches are not satisfying is that neither of them even begins to consider the validity, truth or probability (or, more generally, plausibility), weight or relevance of any of the aspects discussed, or whether the judgments about any such aspects or justifications have even been ‘duly considered’ and understood.

Obviously, it is the content merit, validity and ‘weight’ of arguments etc. that we try to bring to bear on the decision. Do we have better, more ‘systematic’ ways to do this than Ben Franklin’s suggestion? (He recommended writing up the pros and cons in two columns on a sheet of paper, looking for pairs of pros and cons that carry approximately equal weight and cancel each other out, and crossing those pairs out, until only those arguments remain that have no counterweight in the opposite column: those are the ones that should tilt the decision toward approval or rejection.)
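Franklin’s procedure is simple enough to sketch in a few lines of code, for illustration only; the argument weights are invented, and ‘approximately equal weight’ is read here as ‘within a chosen tolerance’:

```python
# Playful sketch of Ben Franklin's method: cross out pro/con pairs of
# roughly equal weight; the surviving, uncancelled arguments tilt the decision.

def franklin_balance(pros, cons, tolerance=0.5):
    """Cancel pro/con pairs whose weights differ by at most `tolerance`;
    return the surviving argument weights on each side."""
    remaining_cons = sorted(cons)
    remaining_pros = []
    for p in sorted(pros):
        match = next((c for c in remaining_cons if abs(p - c) <= tolerance), None)
        if match is not None:
            remaining_cons.remove(match)   # this pair cancels out
        else:
            remaining_pros.append(p)       # this argument survives
    return remaining_pros, remaining_cons

pros = [2.0, 1.5, 1.0]   # invented weights of 'pro' arguments
cons = [1.8, 0.4]        # invented weights of 'con' arguments
print(franklin_balance(pros, cons))  # ([1.0, 2.0], [0.4]): the pros prevail
```

Even this toy version shows the method’s weakness: the outcome depends on how ‘approximately equal’ is defined, and nothing in it examines the plausibility of the arguments themselves.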

What we have, on the one hand, is the impressively quantitative ‘Benefit/Cost’ approach, which works by assigning monetary value to all the b e n e f i t s of a proposed plan (the ‘pro’ arguments) and comparing those with the monetary value of the ‘c o s t’ of implementing it. It has run into considerable criticism, mainly for these reasons: the ‘moral’ reluctance to assign monetary value to people’s health, happiness, and lives; the fact that the analysis usually has to be done by ‘experts’, not by citizens or affected groups; and its reliance on some overall ‘common good’ perspective (usually the ‘biased’ perspective of the government currently in power) that may not be shared by all segments of society, because it tends to hide the issue of the distribution of benefits and costs: inequality.

On the other hand, we have the approaches that separate the ‘description’ of the plan or object to be evaluated from the perceived ‘goodness’ (‘quality’) judgments about the plan and its expected outcome, and from the ‘validity’ (plausibility, probability) of the statements (arguments) conveying the claims about those outcomes. So far, the assumption has been that ‘everybody‘, including all ‘affected’ parties, can make such judgments and ‘test’ their merit in a participatory discourse. What is still missing are the possible ways in which these judgments can be ‘aggregated’ into overall judgments and guiding measures of merit for the decision: first for individuals, and then for any group that will have to come to a commonly supported decision. This is the topic to be discussed under the heading of ‘aggregation’ and ‘aggregation functions’, the rules for getting ‘overall’ judgments from partial judgments and ‘criterion function’ results.

It turns out that there are different possible rules for this, assumptions that must be agreed upon in each evaluation situation, because they result in different decisions. The following are some considerations about assumptions or expectations for aggregation functions (suggested in H. Rittel’s UC Berkeley lectures on evaluation, and listed in H. Dehlinger’s article “Deontische Fragen: Urteilsbildung und Bewertungssysteme” in “Die methodische Bewertung: Ein Instrument des Architekten”, Festschrift zum 65. Geburtstag von Prof. Arne Musso, TU Berlin, 1993):

Possible expectation considerations for aggregation functions:

1 Do we wish to arrive at a single overall judgment (of quality / goodness or plausibility etc.) — one that can help us distinguish between e.g. plan alternatives of greater or lesser goodness?

2 Should the judgments be expressed on a commonly agreed-upon judgment scale whose end points and interim values ‘mean’ the same for all participants in the exercise? For example, should we agree that the end points of a ‘goodness’ judgment scale should mean ‘couldn’t possibly be better’ and ‘couldn’t possibly be worse’, respectively, and that there should be a ‘midpoint’ meaning ‘neither good nor bad; indifferent’ or ‘don’t know, can’t make a judgment’? (Most judgment scales in practice are ‘one-directional’, running from zero to some number.)

3 Should the judgment scale be the same at all levels of the aspect tree, to maintain consistency of the meaning of scores at all levels? If so, any equations for the aggregation functions should be designed to produce the respective overall judgment at the next higher level as a score on the same scale.

4 Should the aggregation function ensure that if a partial score is improved, the resulting overall score will also be higher or the same, but not lower (‘worse’) than the unimproved score? By the same rule, the overall score should not be better than the previous score if one of the partial judgments becomes lower than before.
This expectation means that in a criterion function, the line showing the judgment scores should be steadily rising or declining, without sudden spikes or valleys.

5 Should the overall score be the highest one (say, +3 = ’couldn’t be better’, on a +3/-3 scale) only if all partial scores are +3?

6 Should the overall score be a result of ‘due consideration’ of all the partial scores?

7a Should the overall score be ‘couldn’t be worse’ (e.g. -3 on the +3/-3 scale) if all partial scores are -3?
Or
7b Should the overall score become -3 if one of the partial scores becomes -3 and thus unacceptable?

Different functions (equations for ‘summing up’ partial judgments) will be needed for these different expectations. There will be situations or tasks in which aggregation functions meeting expectation 7b are needed. There is no one aggregation function meeting all these expectations; thus, the choice of aggregation functions must be discussed and agreed upon in the process.

Examples:

‘Formal’ Evaluation process for Plan ‘Quality’

Individual Assessment

The aggregation functions that can be considered for individual ‘quality’ evaluation (deliberating goodness judgments, aspect trees, and criteria, in what may be called ‘formal evaluation procedures’) include the following:

Type I:    ‘Weighted average’ function:    Q = ∑ (qi * wi) for the n aspects i

where Q is the overall deliberated ‘quality’ or ‘goodness’ score; qi is the partial score of aspect or sub-aspect i; n is the number of aspects at that level; and wi is the weight of relative importance of aspect i, on a scale 0 ≤ wi ≤ 1 and such that ∑ wi = 1. This is needed to ensure that Q will be on the same scale (with the same associated meaning of the resulting judgment score) as the qi.

This function does not meet expectation 7b; it allows ‘poor scores’ on some aspects to be compensated for by good scores on other aspects.

Type IIa (“the chain is as strong as its weakest link” function):      Q = Min (qi)

Type IIb:        Q = ∏ ((qi + u) ^ wi) − u

Here, Q is the overall score, qi the partial score of aspect i of n aspects, and u is the extreme value of the judgment scale (e.g. 3 in the above examples). This function (multiplying all the components (qi + u), each raised to the exponent of its weight wi, and then subtracting u from the result to get the overall score back to the +3/-3 scale) acts much like the type I function as long as all the scores are in the positive range, but pulls the overall score the closer to −u, the closer any one of the scores comes to −u, the ‘unacceptable’ performance or quality. (Example: if the structural stability of a building does not stand up to expected loads, it does not matter how functionally adequate or aesthetically pleasing it otherwise is: its evaluation should express that it should not be built.)
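To make the difference between these types tangible, here is a minimal sketch, assuming the +3/-3 scale (u = 3); the scores and weights are invented examples:

```python
# Sketch of aggregation function types I and IIb on a +3/-3 scale (u = 3).

def type_I(q, w):
    """Weighted average: Q = sum(qi * wi), with the wi summing to 1."""
    return sum(qi * wi for qi, wi in zip(q, w))

def type_IIb(q, w, u=3):
    """Q = prod((qi + u) ** wi) - u: behaves much like type I for good
    scores, but pulls Q toward -u as any single qi approaches -u."""
    result = 1.0
    for qi, wi in zip(q, w):
        result *= (qi + u) ** wi
    return result - u

w = [0.5, 0.3, 0.2]
print(type_I([2, 2, 2], w), type_IIb([2, 2, 2], w))    # both ≈ 2.0
print(type_I([2, 2, -3], w), type_IIb([2, 2, -3], w))  # ≈ 1.0 vs -3.0
# Type I lets the good scores compensate for the 'unacceptable' -3;
# type IIb drives the overall score to -3, meeting expectation 7b.
```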

Group assessments:

Individual scores from these functions can then be combined into statistical ‘group’ indicators GQ, for example:

GQ = 1/m ∑ Qj
This is the average or mean of all individual Qj scores for all m participants j.

GQ = Qj
This takes the judgment of one designated group member j as the group score.

GQ = Min (Qj)
The group score is equal to the score of the member with the lowest score in the group. Both of these last two functions effectively make one participant the ‘dictator’ of the group…
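A small sketch of these group statistics, again with invented numbers:

```python
# Invented individual overall scores Qj for a group of m = 4 participants:
Q = [2.0, 1.5, -1.0, 2.5]

GQ_mean = sum(Q) / len(Q)   # GQ = 1/m * sum(Qj)  -> 1.25
GQ_one  = Q[0]              # GQ = Qj: one designated member's judgment
GQ_min  = min(Q)            # GQ = Min(Qj)        -> -1.0, the 'worst-off' view
```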

Different functions should be explored that would, for example, consider the distribution of the improvement of scores for a plan, compared with the existing or expected situation the plan is meant to remedy. The form of aggregation function type IIb could also be used for group judgment aggregation.

The use of any of these aggregated (‘deliberated’) judgment scores as a ‘direct’ guiding measure of performance determining the decision c a n n o t be recommended: they should be considered decision guides, not determinants. For one, the expectation of ‘due consideration of all aspects‘ would require complete knowledge of all consequences of a plan and of all causes of the problem it aims to fix, an expectation that must be considered unrealistic in many situations, but especially for ‘wicked’ problems or ‘messes’. There, decision-makers must be willing to assume responsibility for the possibility of being wrong, a condition impossible to deliberate, by definition, when caused by ignorance of what we might be wrong about.

Aggregation functions for developing overall ‘Plan plausibility’ judgment
from the evaluation of ‘pro’ and ‘con’ arguments.

Plausibility judgments

It is necessary to reach agreement about the use of terms for the merit of judgments about plans as derived from argument evaluation, because the evaluation task for planning arguments is somewhat different from the assessment usually applied to arguments. Traditionally, the purpose of argument analysis and evaluation is seen as that of verifying whether a claim, the ‘conclusion’ of an argument, is true or false; this is seen as depending on the truth of the premises of the argument and on the ‘validity’ of the form, pattern or ‘inference rule’ of the argument. These criteria do not apply to planning arguments, which can generally be represented as follows (stating the ‘conclusion’, the claim about a proposed plan A, first):

Plan A ought to be implemented
because
Plan A will result in outcome B, (given or assuming conditions C);
and
Outcome B ought to be aimed for / pursued;
and
Conditions C are given (or will be when the plan is implemented)

As with many arguments studied by traditional logic and rhetoric, not all argument premises are stated explicitly in discussions, some being assumed as ‘taken for granted’ by the audience: ‘enthymemes’. But to evaluate these arguments, all premises must be stated and considered explicitly.

This argument pattern, with its variations due to different constellations of assertion or negation of the premises, does not conform to the validity conditions for ‘valid’ arguments in the formal-logic sense: it is, at best, inconclusive. Its premises cannot be established as ‘true or false‘: the proposed plan is discussed precisely because it, as well as the outcome B, isn’t there (‘true’) yet. This also means that some of the premises, the factual-instrumental claim ‘if A is implemented, then B will happen, given C’ and the claim ‘C will be present’, are estimates or predictions qualified as probabilities. And ‘B ought to be pursued’, as well as the conclusion ‘A ought to be implemented’, are neither adequately called ‘probable’ nor true or false: the term ‘plausible’ seems more fitting, at least for some participants, but not necessarily for all. Indeed, ‘plausibility’ judgments may be applied to all these claims, with the different interpretations easily understood for each kind. This is a matter of degrees, not a binary yes/no quality. And unlike the assessment of factual and even probability claims in common studies of logical argumentation, the ‘conclusion’ (the decision to implement) is not determined by a single ‘clinching’ argument: it rests on several or many ‘pros and cons’ that must be weighed against each other. That is the evaluation task for planning argumentation, and it will lead to different ‘aggregation’ tools.

The logical structure of planning argumentation can be stated in simplified form as follows:

– An individual’s overall plan plausibility judgment PLANPL is a function of the ‘weights’ Argw of the various pro and con arguments raised about the proposal.
– The argument weight Argw is a function of the argument’s plausibility Argpl and of the weight of relative importance w of its deontic (ought-) premise.
– The argument plausibility Argpl is a function of the plausibilities of its premises.

Examples of aggregation functions for this process might be the following:

1.a  Argument plausibility:    Argpli = ∏ {Premplj}  for all n premises j
or
1.b  Argpli = Min {Premplj}

2.  Argument weight:    Argwi = Argpli * wi,  with 0 ≤ wi and ∑ wi = 1 for the ought-premises of all m arguments

3.  Proposal plausibility:    PLANPL = ∑ Argwi  for all m arguments i

Aggregation functions for group judgment statistics (similar to the ‘quality’ group aggregations):

Mean group plausibility:    GPLANPL = 1/k ∑ PLANPLp  for all k participants p
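As a hedged sketch of the whole chain, from premise plausibilities to the group mean: all numbers below are invented, and plausibility values are assumed here to lie in the 0…1 range so that the product form 1.a behaves sensibly (the scale is not fixed in the text above):

```python
# Sketch of the plausibility aggregation chain: premises -> argument
# plausibility -> argument weight -> plan plausibility -> group mean.
from math import prod

def arg_plausibility(premise_pls):
    """1.a: Argpl = product of the argument's premise plausibilities."""
    return prod(premise_pls)

def plan_plausibility(arguments):
    """2 and 3: PLANPL = sum of Argpl * w over all arguments, with the
    ought-premise weights w assumed to sum to 1."""
    return sum(arg_plausibility(pls) * w for pls, w in arguments)

# Each argument: ([plausibilities of its premises], weight of its ought-premise)
participant_1 = [([0.9, 0.8, 0.9], 0.6), ([0.5, 0.7, 0.6], 0.4)]
participant_2 = [([0.4, 0.9, 0.8], 0.7), ([0.9, 0.9, 0.9], 0.3)]

PLANPL = [plan_plausibility(p) for p in (participant_1, participant_2)]
GPLANPL = sum(PLANPL) / len(PLANPL)   # mean group plausibility
print(PLANPL, GPLANPL)                # ≈ [0.4728, 0.4203] and ≈ 0.447
```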

There are of course other statistical measures of the set of individual plausibility judgments that can be examined and discussed. Like the aggregated ‘quality’ measures, these ‘group’ plausibility statistics should not be used as decision determinants but as guides, for instance as indicators of the need for further discussion and explanation of judgment differences, or for revision of plan details to alleviate the concerns behind large judgment differences.
[Map: Aggregation]

Comments? Additions?

–o–

EVALUATION IN THE PLANNING DISCOURSE — JUDGMENT SCALES

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.

Thorbjørn Mann

JUDGMENT SCALES

Differences of judgments are expressed on  s c a l e s, ‘yardsticks’ on which to locate different judgments ‘visually’. There are many different kinds of judgment scales.

Which kind of scale should be chosen and agreed upon in a specific evaluation situation depends on the purpose of the task: whether a decision for action (e.g. acceptance or rejection) on a single plan is called for, or a selection among a number of competing proposals or options, or a general expression of appraisal (e.g. goodness for some purpose, perhaps to guide design in a ‘program’ sense or to improve a proposed plan). For some purposes, like the acceptance/rejection decision, the scale needed will be ‘binary’, having only two ‘values’, or at most three: ‘yes’, ‘no’, ‘undecided’. For other purposes, such as comparison between alternative proposals, scales with more ‘values’ are needed.

Traditionally (as in science), four types of scales are distinguished: 

‘N o m i n a l’,  ‘O r d i n a l’ ,   ‘D i f f e r e n c e’  (or  ‘I n t e r v a l’)  and ‘R a t i o‘  scales.

The Nominal scale (not really an ‘ordered’ scale, since its values can be listed in any order) contains just ‘names’ of the different kinds of objects or options that must be distinguished. Architectural examples are ‘single-family detached home’, ‘duplex’, ‘row house’, ‘multistory apartment’, etc.

Ordinal scales, as the name suggests, put items in a distinct order, for example ‘first place, second place, third place’ etc. in sports, but without any information about how much faster the winner of the race was than the runner-up. Just rank order.

Difference or Interval scales offer more detailed ‘quantitative’ information. They specify  u n i t s, but these are arbitrary entities on a scale with an arbitrarily located ‘zero’. The temperature scales are examples: the Fahrenheit scale sets its ‘zero’ point (the temperature on the coldest day Mr. Fahrenheit had ever experienced and could not imagine being any lower) at what on the Celsius scale becomes about minus 17.8 degrees. Celsius ‘zero’ is the temperature of freezing water, and Fahrenheit’s 100 degrees is the approximate temperature of human blood, a mere 37.8 degrees Celsius; Celsius 100 degrees, in turn, is the temperature of boiling water, which on the Fahrenheit scale becomes 212 degrees.
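A quick arithmetic check of these landmark values, using the standard Fahrenheit-to-Celsius conversion (the formula is common knowledge, not drawn from the text above):

```python
# Interval scales: arbitrary units and arbitrarily placed zero points.
def f_to_c(f):
    return (f - 32) * 5 / 9

print(round(f_to_c(0), 1))    # -17.8 : Fahrenheit's zero point
print(round(f_to_c(100), 1))  # 37.8  : ~ human blood temperature
print(f_to_c(212))            # 100.0 : water boiling
```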

The Ratio scale has a ‘natural’ zero point (e.g. human height) but still arbitrary units: height can be measured in feet and inches or in centimeters, etc.

The more ‘o b j e c t i v e’, ‘factual’ and ‘scientific’ the judgments we wish to express, the more we will tend to use difference scales at least, and at best ratio scales that allow precise  m e a s u r e m e n t  rather than  s u b j e c t i v e  opinions, guesses and estimates.

For judgment purposes, this means that scales for expressing ‘goodness’ and similar judgments will be at the difference-scale level at best. Agreements are needed about whether judgment scales are to be ‘one-directional’ (e.g. running from 0 upwards) or ‘bi-directional’ (showing both positive and negative values), and whether their values are ‘d i s c r e t e’ or ‘c o n t i n u o u s’. They can be ‘unbounded’ (going to infinity) or ‘bounded‘, with a distinct upper (‘couldn’t be better‘) or lower (‘couldn’t be worse’) end point.

All these choices, and the  m e a n i n g  of the points on a chosen scale, must be agreed upon for any particular task, to avoid misunderstandings and conflicts.

[Map: Judgment types and scales]

The map above shows distinctions of judgment scales in a way that perhaps gives the wrong impression that these are mutually exclusive. In reality, they are combined even when judgments are ‘atomic’ (that is, one judgment applied to one object), and especially when two or more judgments are expressed in ‘compound‘ judgments, where several evaluation judgments are applied to the object. This variety of possible judgment types and scales is better shown in a ‘Zwicky Box’ or ‘morphological analysis’ diagram, as in the following, where the type categories are shown as ‘parameters’, the members of the categories are parameter values, and each particular scale is described by a profile linking several values:

[Diagram: morphological (‘Zwicky Box’) chart of judgment types and scales]

The profiles show two typical ‘measurement’ scales and two common judgment scales: A, the temperature scale; B, movement speed in miles per hour; C, the common academic grading scale; and D, the bidirectional ‘goodness’ judgment scale according to Musso and Rittel.

–o–