EVALUATION IN THE PLANNING DISCOURSE: SAMPLE EVALUATION PROCEDURES. EXAMPLE 1: FORMAL ‘QUALITY’ EVALUATION

Thorbjørn Mann,  January 2020

In the following segments, a few example procedures for evaluation by groups will be discussed, to illustrate how the various parts of the evaluation process are selectively assembled into a complete process aiming at a decision (or a recommendation for a decision) about a proposed plan or policy, and to facilitate understanding of how the different provisions and choices related to the evaluation task that are reviewed in this study can be assembled into practical procedures for specific situations. The examples are not intended as universal recommendations for use in all situations. They all will — arguably — call for improvement as well as adaptation to the specific project and situation at hand.

A common evaluation situation is that of a panel of evaluators comparing a number of proposed alternative plan solutions in order to select or recommend the ‘best’ choice for adoption, or — if there is only one proposal — to determine whether it is ‘good enough’ for implementation. It is usually carried out by a small group of people assumed to be knowledgeable in the specific discipline (for example, architecture) and reasonably representative of the interests of the project client (which may be the public). The rationale for such efforts, besides aiming for the ‘best’ decision, is the desire to ensure that the decision will be based on good expert knowledge, but also to provide transparency, legitimacy and accountability of the process — to justify the decision. The outcome will usually be a recommendation to the actual client decision-makers rather than the adoption or implementation decision itself, based on the group’s assessment of the ‘goodness’ or ‘quality’ of the proposed plan, documented in some form. (It will be referred to as a ‘Formal Quality Evaluation’ procedure.)

There are of course many possible variations of procedures for this task. The sample procedure described in the following is based on the Musso-Rittel (1) procedure for the evaluation of the ‘goodness’ or quality of buildings.

The group will begin by agreeing on the procedure itself and its various provisions: the steps to be followed (for example, whether evaluation aspects and weighting should be worked out before or after presentation of the plan or plan alternatives), general vocabulary, judgment and weighting scales, aggregation functions both for individual overall judgments and group indices, and decision rules for determining its final recommendation.

Assuming that the group has adopted the sequence of first establishing the evaluation aspects and criteria against which the plan (or plans) will be judged, the first step will be a general discussion of the aspects and sub-aspects to be considered, resulting in the construction of the ‘aspect tree’ of aspects, sub-aspects, sub-sub-aspects etc. (ref. the section on aspects and aspect trees) and criteria (the ‘objective’ measures of performance; ref. the section on evaluation criteria). The resulting tree will be displayed and become the basis for scoring worksheets.

The second step will be the assignment of aspect weights (on a scale of zero to 1, and such that at each level of the ‘tree’ the sum of weights at that level will be 1). Panel members will develop their own individual weightings. This phase can be further refined by applying ‘Delphi Method’ steps: establishing and displaying the mean / median and extreme weighting values, asking the authors of extremely low or high weights to share and discuss their reasoning for these judgments, and giving all members the chance to revise their weights.
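A minimal sketch, in Python and with hypothetical aspect names and numbers, of how one such weighting round might be tallied: each member’s raw ratings at one level of the tree are normalized to sum to 1, and the group statistics are displayed so that authors of extreme weights can be asked to explain their reasoning before a possible revision round. None of these names or values are part of the Musso-Rittel procedure itself; they are illustrative assumptions only.

    # Illustrative sketch of one Delphi-style weighting round (hypothetical data).
    from statistics import mean, median

    # Each panel member's raw importance ratings for the aspects at one tree level.
    raw_weights = {
        "A": {"cost": 4, "function": 5, "appearance": 3},
        "B": {"cost": 2, "function": 5, "appearance": 5},
        "C": {"cost": 5, "function": 3, "appearance": 1},
    }

    def normalize(ratings):
        """Scale one member's ratings so the weights at this level sum to 1."""
        total = sum(ratings.values())
        return {aspect: value / total for aspect, value in ratings.items()}

    weights = {member: normalize(r) for member, r in raw_weights.items()}

    # Group statistics per aspect, displayed so the authors of extreme weights
    # can be asked to explain their reasoning before a possible revision round.
    for aspect in ["cost", "function", "appearance"]:
        values = [weights[m][aspect] for m in weights]
        print(f"{aspect:12s} mean={mean(values):.2f} median={median(values):.2f} "
              f"min={min(values):.2f} max={max(values):.2f}")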

Once the weighted evaluation aspect trees have been established, the next step will be the presentation of the plan proposal or competing alternatives.

Each participant will assign a first ‘overall offhand’ quality score (on the agreed-upon scale, e.g. -3 to +3) to each plan alternative.

The group’s statistics of these scores are then established and displayed. This may help to decide whether any further discussion and detailed scoring of aspects will be needed: there may be a visible consensus for a clear ‘winner’. If there are disagreements, the group decides to go through with the detailed evaluation, and the initial scores are kept for later comparison with the final results. The detailed evaluation uses common worksheets or spreadsheets based on the aspect tree, in which panel members fill in their weighting and quality scores. This step may involve the drawing of ‘criterion functions’ (ref. the section on evaluation criteria and criterion functions) to explain how each participant’s quality judgments depend on (objective) criteria or performance measures. These diagrams may be discussed by the panel. They should be considered each panel member’s subjective basis of judgment (or representation of the interests of factions in the population of affected parties). However, some such functions may be mandatory official regulations (such as building regulations). The temptation to urge adoption of common (group) functions (‘for simplicity’ and as an expression of ‘common purpose’) should be resisted, to avoid possible bias towards the interests of some parties at the expense of others.
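A minimal sketch of what one member’s criterion function might look like in code, assuming a piecewise-linear form: the performance measure (here, hypothetically, project cost) is mapped to a quality score on the -3..+3 scale. The break points are this evaluator’s subjective choices, not prescribed by the procedure.

    # Illustrative criterion function (hypothetical): one member's mapping from an
    # objective performance measure (project cost in $M) to a quality score on -3..+3.
    def quality_from_cost(cost_millions: float) -> float:
        # Break points chosen by the individual evaluator; linear interpolation between them.
        points = [(5.0, 3.0), (8.0, 1.0), (10.0, 0.0), (14.0, -3.0)]
        if cost_millions <= points[0][0]:
            return points[0][1]
        if cost_millions >= points[-1][0]:
            return points[-1][1]
        for (x0, q0), (x1, q1) in zip(points, points[1:]):
            if x0 <= cost_millions <= x1:
                return q0 + (q1 - q0) * (cost_millions - x0) / (x1 - x0)

    print(quality_from_cost(9.0))   # 0.5: between 'indifferent' and 'good' for this evaluator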

Each group member will then fill in the scores for all aspects and sub-aspects etc. The results will be compiled, and the statistics compared; extreme differences in the scoring will be discussed, and members given the chance to change their assessments. This step may be repeated as needed (e.g. until there are no further changes in the judgments).

The results are calculated and the group recommendation determined according to the agreed-upon decision criterion. The ‘deliberated’ individual overall scores are compared with the members’ initial ‘offhand’ scores. The results may cause the group to revise the aspects, weights, or criteria (e.g. upon discovering that some critical aspect has been missed), or call for changes in the plan, before determining the final recommendation or decision (again, according to the initial procedural agreements).
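A minimal sketch of the core calculation, assuming the weighted-average aggregation discussed in the later section on aggregation: one member’s partial scores are rolled up through a small, hypothetical aspect tree to a deliberated overall score. Tree, weights and scores are illustrative assumptions only.

    # Illustrative roll-up of one member's scores over a small aspect tree
    # (weighted average at each level; tree, weights and scores are hypothetical).
    tree = {
        "overall": {
            "children": {
                "function":   {"weight": 0.5, "score": 2.0},
                "cost":       {"weight": 0.3, "score": -1.0},
                "appearance": {"weight": 0.2, "children": {
                    "massing":   {"weight": 0.6, "score": 1.0},
                    "materials": {"weight": 0.4, "score": 2.0},
                }},
            },
        }
    }

    def rollup(node):
        """Return the node's score: leaf scores as given, parents as weighted averages."""
        if "score" in node:
            return node["score"]
        return sum(child["weight"] * rollup(child) for child in node["children"].values())

    print(round(rollup(tree["overall"]), 2))   # 0.98: deliberated overall score on the -3..+3 scale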

The steps are summarized in the following ‘flow chart’.

(Flow chart) Evaluation example 1: Steps of a ‘Group Formal Quality Evaluation’

Questions related to this version of a formal evaluation process may include the issue of potential manipulation of weight assignments by changing the steepness of the criterion function.
Ostensibly, the described process aims at ‘giving due consideration’ to all legitimately ‘pertinent’ aspects, while eliminating or reducing the role of ‘hidden agenda’ factors. Questions may arise whether such ‘hidden’ concerns might be hidden behind other plausible but inordinately weighted aspects. A question that may arise from discussions and argumentation about controversial aspects of a plan, and from the examination of how such arguments should be assessed (ref. the section on a process for Evaluation of Planning Arguments), is the role of plausibility judgments about the premises of such arguments: especially the probability of claims that a plan will actually result in a desired or undesired outcome (an aspect). Should the ‘quality’ assessment process include a modification of quality scores based on plausibility / probability scores, or should this concern be explicitly included in the aspect list?

The process may of course seem ‘too complicated’, and if done by ‘experts’, invite critical questions as to whether the experts really can overcome their own interests, biases and preconceptions to adequately consider the interests of other, less ‘expert’ groups. The procedure obviously assumes a general degree of cooperativeness in the panel, which sometimes may be unrealistic. Are more adequate provisions needed for dealing with incompatible attitudes and interests?

Other questions? Concerns? Missing considerations?

–o–

EVALUATION IN PLANNING DISCOURSE: DECISION CRITERIA

Thorbjørn Mann, January 2020

DECISION CRITERIA

The term ‘Decision criteria‘ needs explanation, so as not to be confused with the ‘evaluation criteria‘ used for the task of explaining one’s subjective ‘goodness’ (or ‘quality’) judgment about a plan or object by showing how it relates to an ‘objective’ criterion or performance measure (in section /post …). The criteria that actually determine or guide decisions may be very different from those ‘goodness’ evaluation criteria — even though the expectation of the entire effort here is to get decisions that are more closely based on the merit of discourse contributions that clarify ‘goodness’.

For discourse aiming at actual actions to achieve changes in the real world we inhabit: when discussion stops after all aspects etc. have been assessed, and individual quality judgment scores have been aggregated into individual overall scores and into group statistics about the distribution of those individual scores, a decision or recommendation has to be made. The question then arises: what should guide that decision? The aim of “reaching decisions based on the merit of discourse contributions” can be understood in many different ways, of which ‘group statistics’ are only one — and not a single one at that, since there are several such statistical indicators. (It is advisable not to use the term ‘group judgment‘ for this: the group or set of participants may make a collective decision, but there may be several factions within the group for which any single statistic is not representative; and the most familiar decision criterion in use is the ratio of votes for or against a plan proposal — which may have little if any relation to the group members’ judgments about the plan’s quality.)

The following is an attempt to survey the range of different group decision criteria or guiding indicators that are used in practice, in part to show why planning discourse for projects that affect many different governance entities (and, ultimately, decisions of ‘global’ nature) calls for different decision guides than the familiar tools such as majority voting.

A first distinction must be made between decision guides we may call ‘plan quality’-based, and those that are more concerned with the discourse process.

Examples of plan quality-based indicators are of course the different indicators derived from the quality-based evaluation scores (a brief sketch comparing some of these follows the list below):
–  Averages of all ‘Quality’ or ‘Plausibility’ (or combined) judgment scores of participating members;
–  ‘Weighted average’ scores (where the manner of weighting becomes another controversial issue: degree of ‘affectedness’ of different parties? number of people represented by participating group representatives? number of stock certificates held by stockholders?…)
–  As the extreme form of ‘weighting’ participants’ judgments: the ‘leader’s’ judgment;
–  The judgment of ‘worst-off’ participants or represented groups (the ‘Max-min’ criterion for a set of alternatives);
–  The Benefit-Cost Ratio;
–  The criterion of having met all ‘regulation rules’ — which usually are just ‘minimal’ expectation considerations (‘to get the permit’) or thresholds of performance, such as ‘coming in under the budget’;
–  Successive elimination of alternatives that show specific weaknesses for certain aspects, such that the remaining alternative will become the recommended decision. A related criterion applied during the plan development would be the successive reduction of the ‘solution space’ until there is only one remaining solution with ‘no alternative’ remaining.
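A minimal sketch of how different quality-based decision guides can point to different choices for the same set of individual overall scores. The participants, plans and numbers are hypothetical; only the averaged-score and ‘worst-off participant’ (Max-min) rules from the list above are compared.

    # Hypothetical overall scores (-3..+3) of four participants for three alternatives.
    scores = {
        "Plan A": [2.5, 2.0, 2.5, -2.0],   # strong support, one strongly affected objector
        "Plan B": [1.0, 1.0, 1.5,  0.5],   # moderate support across the board
        "Plan C": [3.0, 0.0, 0.0, -1.0],
    }

    def mean(xs):
        return sum(xs) / len(xs)

    best_by_average = max(scores, key=lambda p: mean(scores[p]))
    best_by_maxmin  = max(scores, key=lambda p: min(scores[p]))   # 'worst-off participant' rule

    print("Averaged scores favor:   ", best_by_average)   # Plan A
    print("Max-min criterion favors:", best_by_maxmin)    # Plan B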

Given the burdensome complexity of more systematic evaluation procedures, many ‘process-based’ criteria are preferred in practice:

– Majority voting, in various forms, with the extreme being ‘consensus’ — i.e. 100% approval;
– ‘Consent’ — understood less as approval than as acceptance with reservations that are either not voiced or do not convince a majority. (Sometimes only achieved / invoked in live meetings by determinations such as ‘time’s up’ or ‘no more objections to the one proposed motion’.)
– ‘Depth and breadth’ of the discussion (but without assessment of the validity or merit of the contributions making up that breadth or depth);
– All parties having been heard / given a chance to voice their concerns;
– Agreed-upon (or institutionally mandated) procedures and presentation requirements having been followed, legitimating approval, or violated, leading to rejection e.g. of competing alternatives; (‘Handed in late’ means ‘failed assignment…’)

Of course, combinations of these criteria are possible. Does the variety of possible resulting decision criteria emphasize the need for more explicit and careful agreements: establishing clear, agreed-upon procedural rules at the outset of the process? And for many projects, there is a need for better decision criteria. A main reason for this is that in many important projects affecting populations beyond traditional governance boundaries (e.g. countries), traditional decision determinants such as voting become inapplicable, not only because votes may be based on inadequate information and understanding of the problem, but simply because the number of people having ‘voting rights’ becomes indeterminate.

A few main issues or practical concerns can be seen that guide the selection of decision criteria: the principle of ‘coverage’ of ‘all aspects that should be given due consideration’ on the one hand, and the desire for simplicity, speed and clarity on the other. The first is aligned with either trust or demonstration (‘proof’) of fair coverage: ‘accountability’; the second with expediency. Given the complexity of ‘thorough’ coverage of ‘all’ aspects, explored in previous segments, it should be obvious that full adherence to this principle would call for a decision criterion based on the fully explained (i.e. completed) evaluation worksheet results of all parties affected by the project in any way, properly aggregated into an overall statistic accepted by all.

This is clearly not only impossible to define precisely but also practically impossible to apply — and equally clearly situated at the opposite end of an ‘expediency’ (speed, simple to understand and apply) scale. These considerations also show why there is a plausible tendency to use ‘procedural compliance criteria‘ to lend the appearance of legitimacy to decisions: ‘All parties have been given the chance to speak up; now time’s up and some decision must be made (whether it meets all parties’ concerns or not).’

It seems to follow that some compromise or ‘approximation’ solution will have to be agreed upon for each case, as opposed to proceeding without such agreements and relying on standard assumptions of ‘usual’ procedures, which later lead to procedural quarrels.

For example, one conceivable ‘approximation’ version might be to arrange for a thorough discussion, with all affected parties encouraged to voice and explain their concerns, but with only the ‘leader’ or official responsible for actually making the decision required to complete the detailed evaluation worksheets — and to publish them, to ‘prove’ that all aspects have been entered, addressed (with criterion functions for explanation) and given acceptable weights, and that the resulting overall judgment, aggregated with acceptable aggregation functions, corresponds with the leader’s actual decision. (One issue in this version will be how ‘side payments’ or ‘logrolling’ provisions — compensating parties that do not benefit fairly from the decision but whose votes in traditional voting procedures would be ‘bought’ to support the decision — should be represented in such ‘accounts’.)

This topic may call for a separate, more detailed exploration of a ‘morphology‘ of possible decision criteria for such projects, and an examination of evaluation criteria for decision guides or modes to help participants in such projects agree on combinations suited to the specific project and circumstances.

Questions? Missing aspects? Wrong question? Ideas, suggestions?

Suggestions for ‘best answers’ given current state of understanding:
– Ensure better opportunity for all parties affected by problems or plans to contribute their ideas, concerns, and judgments: (Planning discourse platform);
– Focus on improved use of ‘quality/plausibility’-based decision guides, using ‘plausibility-weighted quality evaluation’ procedures explained and accepted in initial ‘procedural agreements’;
– Reducing the reliance on ‘process-based’ criteria.

(Map) Overview of decision criteria (indices to guide decisions)

–o–

Abbe Boulah’s Brexit Solution

– Say, Bog-Hubert: What’s going on out there on the Fog Island Tavern deck?
– Good question, Vodçek. I thought I’d seen everything, but this…
– That bad, eh? Who’s that guy there with Abbe Boulah?
– It’s a tourist from the EU. Don’t know how he got lost out here; must be a friend of a friend of Otis. And he got into a bragging contest with Abbe Boulah about which part of the world is crazier, more polarized, has weirder politicians.
– Must be a toss-up, if you ask me.
– Right. So now Abbe Boulah is trying to teach the EU fellow — I couldn’t really figure out if he’s a still-EU Brit or from another part over there — how to fix the Brexit mess.
– Good grief. So what’s Abbé Boulah’s solution?
– It actually looked like a brilliant idea for a while, but…
– Now you’ve got me curious. Do I have to bribe you with a shot of your own moonshine production, the Tate’s Hell Special Reserve?
– Psst. Okay, talked me into it. He stunned the poor guy with the simplicity of the idea: Let the good, compassionate Europeans help the poor Brits out of the conundrum they voted themselves into. Instead of haggling for years about the details of a hard or soft or a medium-well done exit, he said: Why don’t you simply dissolve the EU?
– Urkhphfft: What??
– Hang on, don’t choke on that Zin of yours. It’s only for a day, Abbé Boulah says: all the other countries who want to stay in the union vote the next day to re-unite the union, just with a little change of the name. So: Brexit? Done. Stroke of the pen. Paid vacation for a day, for all the EU employees. Get it? A few crazy regulations not getting written, it’s actually a benefit for everybody…
– But…
– Yes. Worried about things like the trade treaties? He said, they are all reinstated as they were, for now; and the UK can either choose to re-join the new thing, or stay out and re-negotiate the individual agreements one by one, without any deadlines, while the existing arrangements stay as they are until replaced.
– Weird. But, I must say, it has a certain Abbeboulistic appeal, eh?
– Yes — but now they are arguing about what the new name should be. They agree that it should just be a minimal exchange or change in the current name, so it wouldn’t cost too much. Such as just adding or deleting something in a single letter of the name.
– Makes sense.
– You’d think so. But now they’re up in arms about whether it should be ‘Buropean’ or ‘Iropean’ or ‘Furopean’ or ‘Luropean’ or ‘Nuropean’ — all just messing a little with the ‘E’ — or ‘European Onion’ or ‘RUnion’, or just adding a ‘2’ (starting a series of ‘generations’ like some computer system: ‘EU2’, 3, 4…), or a star: ‘European* Union’ or ‘*European’ or ‘European Union*’ — ‘EU*’ — and another star in the flag. Or just put the whole current name in quotation marks… It’s getting vicious, I tell you — you may have to go out there and throw them into the channel to cool them off…

EVALUATION IN THE PLANNING DISCOURSE: ASPECTS and ‘ASPECT TREES’

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.  Thorbjørn Mann,  January 2020

The questions surrounding the task of assembling ‘all’ aspects calling for ‘due consideration’.

 

ASPECTS AND ASPECT TREE DISPLAYS

Once an evaluation effort begins to get serious about its professed aims — deliberating, making overall judgments a transparent function of partial judgments, ‘weighing all the pros and cons’, trying not to forget anything significant, avoiding oversights that could lead to ‘unexpected’ adverse consequences of a plan (but that could have been anticipated with some care) — the people involved will begin to create ‘lists’ of items that ‘should be given due consideration’ before making a decision. One label for these things is ‘aspects’, originally meaning simply the act of looking at the object (plan) to be decided upon from different points of view.

A survey of different approaches to evaluation shows that there are many different such labels ‘on the market’ for these ‘things to be given due consideration’. And many of them — especially those of the many evaluation, problem-solving and systems-change consultant brands that compete for commissions to help companies and institutions cope with their issues — come with very different recommendations for the way this should be done. The question for the effort to develop a general public planning discourse support platform for dealing with projects and challenges that affect people in many governmental and commercial ‘jurisdictions’ — ultimately: ‘global’ challenges — then becomes: how can and should all these differences in the way people talk about these issues be accommodated in a common platform?

Whether a common ground for this can be found — or a way to accommodate all the different perspectives, if a common label can’t be agreed upon — depends upon a scrutiny of the different terms and their procedural implications. This is a significant task in itself, one for which I have not seen much in the way of inquiry and suggestions (other than the ‘brands’ recommendations for adopting ‘their’ terms and approach.) So raising this question might be the beginning of a sizable discussion in itself (or a survey of existing work I haven’t seen). Pending the outcome of such an investigation, many of the issues raised for discussion in this series of evaluation issues will continue to use the term ‘aspect’, with apologies to proponents of other perspectives.

This question of diversity of terminology is only one reason why discussion is needed, however. Another reason has to do with the possibility of bias in the very selection of terms, depending on the underlying theory or method, or on whether the perspective is focused on some ‘movement’ that by its very nature puts one main aspect at the center of attention (‘competitive strength and growth’, ‘sustainability’, ‘regeneration’, ‘climate change’, ‘globalization’ versus ‘local culture’, etc.). There are many efforts to classify or group aspects — starting with Vitruvius’ three main aspects ‘firmness, convenience and delight’, to the simple ‘cost, benefit, and risk’ grouping, or the recent efforts that encourage participants to explore aspects from different groups of affected or concerned parties, mixed in with concepts such as ‘principles’ and best and worst expected outcomes, shown in a ‘canvas’ poster for orientation. Are these efforts encouraging contribution of information from the public, or giving the impression of adequate coverage while inadvertently missing significant aspects? It seems that any classification scheme of aspects is likely to end up neglecting or marginalizing some concerns of affected parties.

Comparatively minor questions are about potential mistakes in applying the related tools: listing preferred or familiar means of plan implementation as aspects representing goals or concerns, for example, or listing essentially the same concern under different labels (and thus weighing it twice…). The issue of functional relationships between different aspects — a main concern of systems views of a problem situation — is one that is often not well represented in the evaluation work tools. A major potential controversy is, of course, the question of who is doing the evaluation, whose concerns are represented, and what sources of information a team will draw upon to assemble the aspect list.

It may be useful to look at the expectations for the vocabulary and its corresponding tools: Is the goal to ensure ‘scientific’ rigor, or to make it easy for lay participants to understand and to contribute to the discussion? To simplify things or to ensure comprehensive coverage? Which vocabulary facilitates further explanation (sub-aspects etc) and ultimately showing how valuation judgments relate to objective criteria — performance measures?

Finally: given the number of different ‘perspectives’ , how should the platform deal with the potential of biased ‘framing’ of discussions by the sequence in which comments are entered and displayed — or is this concern one that should be left to the participants in the process, while the platform itself should be as ‘neutral’ as possible — even with respect to potential bias or distortions?

The ‘aspect tree’ of some approaches refers to the hierarchical ‘tree’ structure emerging in a display of main aspects, each further explained by ‘sub-aspects’, sub-sub-aspects etc. The outermost ‘leaves’ of the aspect tree would be the ‘criteria’ or objective performance variables, to which participants might carry their explanations of their judgment basis. (See the later section on criteria and criterion functions.) Is the possibility of doing that a factor in the insistence on the part of some people to ‘base decisions on facts’ — only — thereby eliminating ‘subjective’ judgments that can be explained only by listing more subjective aspects?

An important warning was made by Rittel in discussing ‘Wicked Problems’ long ago: the more different perspectives, explanations of a problem, and potential solutions are entered into the discussion, the more aspects will appear claiming ‘due consideration’. The possible consequences of proposed solutions alone extend endlessly into the future. This makes it impossible for a single designer or planner, or even a team of problem-solvers, to anticipate them all: the principle of assembling ‘all’ such aspects is practically impossible to meet. This is both a reminder to humbly abstain from claims to comprehensive coverage, and a justification of wide participation on logical (rather than the more common ideological-political) grounds: inviting all potentially affected parties to contribute to the discourse is the best way to get that needed information.

The need for more discussion of this subject is shown, finally, by the presence of approaches or attitudes that deny the need for evaluation ‘methods’ altogether. These take different forms, ranging from calls for ‘awareness’ or general adoption of a new ‘paradigm’ or approach — like ‘systems thinking’, holism, or relying on ‘swarm’ guidance — to more specific approaches like Alexander’s Pattern Language, which suggests that using valid patterns (solution elements, not evaluation aspects) to develop plans will guarantee their validity and quality, thus making evaluation unnecessary.

One source of heuristic guidance to justify ‘stopping rules’ in the effort to assemble evaluation aspects may be seen in the weighting of relative importance given (as subjective judgments by participants) to the different aspects: if the assessment of a given aspect will not make a significant difference in the overall decision because that aspect is given too low a weight, is this a legitimate ‘excuse’ for not giving it a more thorough examination? (A later section will look at the weighting or preference ranking issue).
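One way to make such a stopping heuristic explicit, as a rough sketch with an assumed significance threshold (both the threshold and the example weights are hypothetical):

    # Rough heuristic sketch: defer detailed examination of an aspect whose weight is so
    # low that even the full score range could not shift the overall result noticeably.
    SCORE_RANGE = 6.0       # e.g. the span of a -3..+3 judgment scale
    THRESHOLD = 0.1         # assumed: smallest overall difference considered significant

    def worth_detailed_examination(weight: float) -> bool:
        return weight * SCORE_RANGE >= THRESHOLD

    for aspect, weight in {"daylight": 0.01, "structural safety": 0.25}.items():
        verdict = "examine in detail" if worth_detailed_examination(weight) else "defer"
        print(aspect, "->", verdict)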

–o–

The Agenda of Many Important but Connected Issues

Are the agenda platforms of governance candidates consisting of single ‘highest priority’ issues realistic? Aren’t all the issues so tightly connected that none can be resolved without the others?
Attempting to understand, I see this chain:

1 Humanity is confronted by many unprecedented challenges to its survival.

2 There is little if any agreement about how these problems should be addressed.

3 There is a growing sense that current systems of governance are inadequate to address and convincingly resolve these problems: Calls are raised for ‘systemic change’ and ‘a new system’.

4 While there are many well-intentioned theories, initiatives, experiments already underway, to develop new ways of doing things in many domains,

5 There is little if any agreement about what such a ‘new system’ should look like, and very different ideas are promoted in ways that seem more polarizing than unified. We — humanity — do not yet know what works and what does not work: some major ‘systems’ that were tried over recent centuries have turned into dramatic failures.

6 There is much promotion of the many ‘new’ and old ideas, but not enough communication and sharing of experiences among the initiatives for discussion, evaluation and cooperative adoption. Meanwhile, the crises intensify.

So, before attempting another grand system based on inadequate understanding and acceptance, whose failure we cannot afford, it seems that a number of steps are needed:

7 Encouraging the many diverse (usually small scale, local) initiatives and experiments;

8 Supporting these efforts (financially and with information and other resources) regardless of their differences, on condition of their acceptance of some agreements:
a) to avoid getting in each other’s way;
b) to share information about their experiences: successes and failures, for systematic discussion and evaluation, into a common resource repository;
c) to cooperate in a common discourse aiming at necessary (even if just intermediate) decisions — the common ‘rules of the road’ to avoid conflict and facilitate mutual aid in emergencies and system failures.

9 To facilitate the aims in point 8, it will be necessary to develop
a) a common ‘global’ discourse platform accessible to all parties affected by an issue or problem
b) with a system of meaningful incentives for participation, to access all information and concerns that must be given ‘due consideration’ in decisions;
c) with adequate translation support not only between different natural languages but also for disciplinary ‘jargon’ into conversational language;
d) new tools for assessment of the merit of information,
e) and new decision-making criteria and procedures based on the merit of contributions (since traditional voting will be inapplicable to issues affecting many parties in different ways across traditional boundaries that define voting rights).

10 It will also be necessary to develop
a) new means for ensuring that common agreements reached will actually be adhered to. Especially at the global level, these tools cannot be based on coercive ‘enforcement’ (which would require an entity endowed with greater power and force than any potential violator — a force which then would become vulnerable to the temptation of abuse of power, itself arguably one of the global challenges). Instead, development should aim at
b) preventive sanctions triggered by the very attempt at violation, and
c) other innovative means of control of power.

I submit that all of these considerations will have to be pursued simultaneously: without them, any attempt to resolve or mitigate the crises and problems (point 1) will be unsuccessful. The agenda of governance agencies and candidates for public office should include the entire set of interlinked aspects, not just isolated ‘priority’ items. Of course I realize that the practice of election campaign posters, 30-second ads or Twitter posts effectively prevents the communication of comprehensive platforms of this nature. What can we realistically hope for?

EVALUATION IN THE PLANNING DISCOURSE — THE OBJECTIVITY-SUBJECTIVITY CONTROVERSY

An effort to clarify the role of deliberative evaluation in the planning and policy-making process

Thorbjørn Mann

OBJECTIVE VERSUS SUBJECTIVE JUDGMENT :
MEASUREMENT VERSUS OPINION

There is a persistent controversy about objective versus subjective evaluation judgments, often expressed as the ‘principle’ of forming decisions based on ‘objective facts’ rather than (‘mere’) subjective opinions. It is also framed in terms of absolute (universal, timeless) values as opposed to relativistic subjective preferences. The desire or quest for absolute, timeless, and universal judgments upon which we ought to base our actions and decisions about plans is an understandable, legitimate and admirable human motivation: our plans — plans that affect several people, communities — should be ‘good’ in a sense that transcends the temporary and often misguided ‘subjective’ opinions of imperfect individuals. But the opposite view — that individuals are entitled to hold such subjective values (as part of the quest to ‘make a difference’ in their lives) — is also held to be a human right, even a constitutionally validated one (the individual right to the ‘pursuit of happiness’). The difficulty of how to tell the difference between the two principles and reconcile them in specific situations makes it a controversial issue.

The noble quest of seeking solid ‘facts’ upon which to base decisions leads to the identification of selected ‘objective’ features of plans or objects with their ‘goodness’. Is this a mistake, a fallacy? The selection of those features, from among the many physical features by which objects and plans can be described, is done by individuals. However wise, well-informed, well-intentioned and revered those authorities, does this make even the selection of features (for which factual certainty can be established) their ‘subjective’ opinion?

Is the jump of deriving factual criteria from the opinions of ‘everybody’ — or even from the comparison of features of objects from different times, taken as proof of their universal, timeless validity — to be considered a mistake or wishful thinking? ‘Everybody’ in practice just means a subset of people, at specific times, in specific context conditions, in surveys mostly separated from situations of actual plans and emergencies and from the need for making decisions in the face of many divergent individual judgments.

Regarding ‘timelessness’: the objective fact that the forms and styles of the same category of buildings (for example, churches and temples, which are expressly intended to convey timeless, universal significance) have changed significantly over time should be a warning against such attempts to declare the identity of certain physical properties with the notions of goodness, beauty, awe, wholeness etc. that people feel when perceiving them.

What are the options for participants in a situation like the following?
Two people, A and B, are in a conversation about a plan, something they have to decide upon.
They encounter a key claim in the discussion, involving an aspect of the plan that will significantly guide their decision about what to do. The claim is about whether the proposed plan can be said to have certain features that constitute a quality (goodness, or beauty). They find that they both agree that the plan indeed will have those features. But they vehemently disagree about whether that also means that it will have the desired ‘goodness’ quality. An (impartial) observer may point out that in common understanding, the agreement about the plan having those observable features is called an ‘objective’ fact, and that their assessments about the plan’s goodness are ‘subjective‘ opinions. Person A insists that the objective fact of having those objective features implies, as a matter of equally objective fact, also having the desired quality; while B insists that the features do not yet offer the desired experience of perceiving goodness or beauty at all. What are the options they have for dealing with this obstacle?

a) A attempts to persuade B that the features in question that constitute quality are part of a valid theory, and that this should compel B to accept the plan regardless of the latter’s feeling about the matter. The effort might involve subtle or not-so-subtle application or invocation of authority, power, experience, or in the extreme, of labeling B a member of undesirable categories: (‘calling B names’): an ignorant follower of disreputable ‘isms’, insensitive, tasteless beings unable to perceive true quality, even conscious or subconscious shameful pursuits not letting him (B) admit the truth. Of course, in response, B can do this, too…

b) B can attempt to find other features that A will accept (as part of A’s theory, or an alternate theory) that will generate the feeling of experiencing quality, while letting A continue to call this a matter of objective fact and B calling it subjective opinion. This may also involve B invoking compelling factors that have nothing to do with the plan’s validity or quality, but e.g. with a person’s ‘right’ to their judgment, or past injustices committed by A’s group against B’s tribe, etc.

c) They can decide to drop the issue of objective versus subjective judgments as determinants of the decision, and try to modify the plan so that it contains both the features A requires to satisfy the theory and features that satisfy B’s subjective feelings. This usually requires making compromises, one or both parties backing off from the complete satisfaction of their wishes.

d) They could call in a ‘referee’ or authority to make a decision they will accept for the sake of getting something — anything — done, without either A or B having to actually change their mind.

e) They can of course abandon the plan altogether, because of their inability to reach common ground.

There does not seem to be an easy answer to this problem that would fit all situations. Seen from the point of view of planning argumentation, where there is an attempt to clearly distinguish between claims (premises of arguments) and their plausibility assessment: is the claim of objectivity of certain judgments an attempt to get everybody to assign high plausibility values to those claims because of their ‘objectivity’?

Stating such claims as planning arguments makes it easier to see how theories claiming that desirable quality-like features are implied by must-have objective properties expose the different potential sources of disagreement. In an argument like the following, A (from the above example) claims: the plan should have property (feature) f, because f will impart quality q to the plan, given conditions c, which are assumed to be present. The ‘complete’ argument, separating the unspoken premises, is:

D: The Plan should include feature f                (‘Conclusion’)
because
1)   FI (f –> q) (if f, then q)                             (Factual – instrumental premise)
and
2)   D(q)                                                           (Deontic premise)
and
3)   F(c)                                                            (Factual premise)

Adherent A of a theory stating postulates like “quality q is generated by feature f, and this is a matter of objective fact” may be pointing to the first and third premises, which arguably involve ‘factual’ matters. Participant B may disagree with the argument if B thinks (subjectively) that one or several of the premises are not plausible. B may even agree with all three premises — with the understanding that f is just one of several or many ways of creating plans with quality q. Thus, B may still disagree with the ‘conclusion’ because there are reasons — consequences — to refrain from f, and to look for different means to achieve q in the plan. This is a different interpretation of the factual-instrumental premise: A seems to hold that premise 1 actually should be understood as “if and only if f, then q”. (The discussion of planning arguments thus should be amended to make this difference clear.) Does the theory involve a logical fallacy by jumping from the plausible observation that both premises 1 and 3 involve ‘objective’ matters to the inference “iff f then q”? Such a theory gets itself into some trouble because of the implication of that claim: “if not-f then not-q”. A proponent of Alexander’s theory (1) seems to have fallen for this fallacy by claiming that because the shape of an egg does not meet some criteria for something having beauty according to this theory, the egg shape cannot be considered to be beautiful — which did not sit well even with other adherents of the basic underlying theory.

The more general question is: Must this issue be addressed in the ‘procedural agreements’ at the outset of a public planning project? And if so: how? What role does it play in the various evaluation activities throughout the planning process?

One somewhat facile answer might be: If the planning process includes adequate public participation — that is, all members of the public, or at least all affected parties, are taking part in it, including the decisions whether to adopt the plan for implementation — all participants would make their own judgments, and the question would just become the task of agreeing on the appropriate aggregation function (see the section on aggregation) for deriving an overall ‘group’ decision from all individual judgments. If this sounds too controversial, the current practice of majority voting (which is one such aggregation function, albeit a problematic one) should be kept in mind: it is accepted without much question as ‘the way it’s done’. Of course, it just conveniently sidesteps the controversy.

Looking at the issue more closely, things will become more complicated. For one, there are evaluation activities involved in all phases of the planning process, from raising the issue or problem, searching for and interpreting pertinent information, writing the specifications or ‘program’ for the designers to develop ‘solution’ proposals, to making a final decision. Even in the most ambitious participative planning processes, there will be very different parties making key decisions in these phases. So the question becomes: how will the decisions of ‘programming’ and designing (solution development) impact the outcome, if the decision-makers in these early phases are adherents of a theory that admits only certain ‘objective’ features as valid components of solution proposals, and ignores and rejects concerns stated as ‘subjective opinions’ by affected parties? So that the ‘solutions’ the ‘whole community’ is allowed to participate in accepting or rejecting are just not reflecting those ‘subjective’ concerns?

For the time being, one preliminary conclusion drawn here from these observations may be the following: Different expressions and judgments about whether decisions are based on timeless, universal and absolute value considerations or ‘subjective opinions‘ must be expected and accommodated in public planning, besides ‘objective’ factual information. Is it one of the tasks of designing platforms and procedures to do that, to find ways of reaching agreement about practical reconciliation of these opinions? Is one first important step towards that goal the design and development of better ways to communicate about our subjective judgments and about how they relate to physical, ‘objectively measurable’ features? This question is both a justification for the need for deliberative evaluation in collective planning and policy-making, and one of its key challenges.

These are of course only some first draft thoughts about the controversy that has generated much literature, but has not yet been brought to a practical resolution that can more convincingly guide the design of a participatory online public planning discourse platform. More discussion seems to be urgently needed.

Note 1):  Bin Jiang in a FB discussion about Christopher Alexander’s theory as expressed in his books: e.g. ‘A Pattern Language’ and ‘The Nature of Order’

–o–

EVALUATION IN THE PLANNING DISCOURSE — AGGREGATION

An effort  to clarify the role of evaluation in the planning process.

Thorbjørn Mann

THE AGGREGATION PROBLEM:

Getting Overall Judgments from Partial Judgments

The concept of ‘deliberation’ was explained, in part, as the process of ‘making overall judgments a function of partial judgments’. We may have gone through the process of trying to explain our overall judgment about something to others, or made the effort of ‘giving due consideration’ to all aspects of the situation, and arrived at a set of partial judgments. Now the question becomes: just how do we ‘assemble’ (‘aggregate’) these partial judgments into the overall judgment that can guide us in making the decision, for example, to adopt or reject the proposed plan?

The discussion has already gone past the level of familiar practices such as merely counting the number of supporting and opposing ‘votes’, and even past some well-intentioned approaches that begin to look at the number of explanations (arguments or support statements) in their ‘breadth‘ (the number of different aspects brought up by each supporting or opposing party) and ‘depth‘ (the number of levels of further support for the premises and assumptions of the individual arguments).

The reason why these approaches are not satisfying is that neither of them even begins to consider the validity, truth and probability (or more generally: plausibility), weight or relevance of any of the aspects discussed, or whether the judgments about any such aspects or justifications have even been ‘duly considered’ and understood.

Obviously, it is the content merit, validity, the ‘weight’ of arguments etc. that we try to bring to bear on the decision. Do we have better, more ‘systematic’ ways to do this than Ben Franklin’s suggestion? (He recommended writing up the pros and cons in two different columns on a sheet of paper, then looking at pairs of pros and cons that carry approximately equal weight and cancel each other out, and crossing those pairs out, until only those arguments remain that have no opposing reasons in the opposite column: those are the ones that should tilt the decision towards approval or rejection.)

What we have, on the one hand, is the impressively quantitative ‘Benefit/Cost’ approach, which works by assigning monetary value to all the b e n e f i t s of a proposed plan (the ‘pro’ arguments) and comparing those with the monetary value of the ‘c o s t’ of implementing it. It has run into considerable criticism, mainly because of the ‘moral’ reluctance to assign monetary value to people’s health, happiness, and lives; the fact that the approach usually has to be carried out by ‘experts’, not by citizens or affected groups; and its adoption of some overall ‘common good’ perspective — usually the ‘biased’ perspective of the government currently in power — that may not be shared by all segments of society, because it tends to hide the issue of the distribution of benefits and costs: inequality.

On the other hand, we have the approaches that separate the ‘description’ of the plan or object to be evaluated from the perceived ‘goodness’ (‘quality’) judgments about the plan and its expected outcome, and from the ‘validity’ (plausibility, probability) of the statements (arguments) conveying the claims about those outcomes. And, so far, the assumption has been that ‘everybody‘, including all ‘affected’ parties, can make such judgments and ‘test’ their merit in a participatory discourse. What is still missing are the possible ways in which they can be ‘aggregated’ into overall judgments and guiding measures of merit for the decision — first for individuals, and then for any groups that will have to come to a commonly supported decision. This is the topic to be discussed under the heading of ‘aggregation’ and ‘aggregation functions’ — the rules for getting ‘overall’ judgments from partial judgments and ‘criterion function’ results.

It turns out that there are different possible rules about this: assumptions that must be agreed upon in each evaluation situation, because they result in different decisions. The following are some considerations about assumptions or expectations for ‘aggregation functions’ (suggested in H. Rittel’s UC Berkeley lectures on evaluation, and listed in H. Dehlinger’s article “Deontische Fragen: Urteilsbildung und Bewertungssysteme” in “Die methodische Bewertung: Ein Instrument des Architekten”, Festschrift zum 65. Geburtstag von Prof. Arne Musso, TU Berlin, 1993):

Possible expectation considerations for aggregation functions:

1 Do we wish to arrive at a single overall judgment (of quality / goodness or plausibility etc.) — one that can help us distinguish between e.g. plan alternatives of greater or lesser goodness?

2 Should the judgments be expressed on a commonly agreed-upon judgment scale whose end points and interim values ‘mean’ the same for all participants in the exercise? For example, should we agree that the end points of a ‘goodness’ judgment scale should mean ‘couldn’t possibly be better’ and ‘couldn’t possibly be worse’, respectively, and that there should be a ‘midpoint‘ meaning ‘neither good nor bad; indifferent’, or ‘don’t know, can’t make a judgment’? (Most judgment scales in practice are ‘one-directional’, running from zero to some positive number.)

3 Should the judgment scale be the same at all levels of the aspect tree, to maintain consistency of the meaning of scores at all levels? If so, any equations for the aggregation functions should be designed so that the resulting overall judgment at the next higher level is a score on the same scale.

4 Should the aggregation function ensure that if a partial score is improved, the resulting overall score will also be higher or the same, but not lower (‘worse’) than before? By the same rule, the overall score should not become better than the previous score if one of the partial judgments becomes lower than before.
This expectation means that in a criterion function, the line showing the judgment scores should rise or fall monotonically, without sudden spikes or valleys.

5 Should the overall score be the highest one (say, +3 = ’couldn’t be better’, on a +3/-3 scale) only if all partial scores are +3?

6 Should the overall score be a result of ‘due consideration’ of all the partial scores?

7a Should the overall score be ‘couldn’t be worse’ (e.g. -3 on the +3/-3 scale) if all partial scores are -3?
Or
7b Should the overall score become -3 if one of the partial scores becomes -3 and thus unacceptable?

Different functions — equations for ‘summing up’ partial judgments — will be needed for this. There will be situations or tasks in which aggregation functions meeting expectation 7b will be needed. There is no one aggregation function meeting all these expectations; thus, the choice of aggregation functions must be discussed and agreed upon in the process.

Examples:

‘Formal’ Evaluation process for Plan ‘Quality’

Individual Assessment

The aggregation functions that can be considered for individual ‘quality’ evaluation (deliberating goodness judgments, aspect trees, and criteria, in what may be called ‘formal evaluation procedures’) include the following:

Type I:    ‘Weighted average’ function:    Q = ∑ (qi * wi)

where Q is the overall deliberated ‘quality’ or ‘goodness’ score; qi is the partial score of aspect or sub-aspect i; n is the number of aspects at that level; wi is the weight of relative importance of aspect i, on a scale of 0 ≤ wi ≤ 1 and such that ∑ wi = 1. This is needed to ensure that Q will be on the same scale (and the associated meaning of the resulting judgment score the same) as qi.

This function does not meet expectation 7b; it allows ‘poor scores’ on some aspects to be compensated for by good scores on other aspects.

Type IIa:  (“the chain is as strong as its weakest link” function):      Q = Min (qi)

Type IIb:        Q = ∏ ((qi + u) ^ wi ) – u

Here, Q is the overall score, qi the partial score i of n aspects, and u is the extreme value of the judgment scale (e.g. 3 in the above examples). This function (multiplying all the terms (qi + u), each raised to the power of its weight wi, and then subtracting u from the result to bring the overall score back to the +3/-3 scale) acts much like the Type I function as long as all the scores are in the positive range, but pulls the overall score the closer to -u, the closer one of the partial scores comes to -u, the ‘unacceptable’ performance or quality. (Example: if the structural stability of a building does not stand up against expected loads, it does not matter how otherwise functionally adequate or aesthetically pleasing it is: its evaluation should express that it should not be built.)
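A minimal sketch of both function types, with hypothetical scores and weights on the +3/-3 scale, to show how the Type IIb form punishes a near-unacceptable partial score that the weighted average would simply average away:

    # Sketch of the Type I and Type IIb aggregation functions (scores and weights hypothetical).
    U = 3.0   # extreme value of the +3/-3 judgment scale

    def type_I(scores, weights):
        """Weighted average: poor scores can be compensated by good ones."""
        return sum(q * w for q, w in zip(scores, weights))

    def type_IIb(scores, weights):
        """Weighted geometric form: pulls the result toward -U as any score nears -U."""
        product = 1.0
        for q, w in zip(scores, weights):
            product *= (q + U) ** w
        return product - U

    scores  = [2.5, 2.0, -2.9]     # third aspect close to 'unacceptable'
    weights = [0.4, 0.4, 0.2]      # sum to 1

    print(round(type_I(scores, weights), 2))    # about 1.22: stays clearly positive
    print(round(type_IIb(scores, weights), 2))  # about -0.62: pulled below zero by the failing aspect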

Group assessments:

Individual scores from these functions can be aggregated to get statistical ‘Group’ indicators GQ, for example:

GQ = 1/m ∑ Qj
This is the average or mean of all individual Qj scores for all m participants j.

GQ = Qj (for one designated member j)
This takes the judgment of a single group member (e.g. the ‘leader’) as the group score.

GQ = Min (Qj)
The group score is equal to the score of the member with the lowest score in the group. Both this and the preceding function effectively make one participant the ‘dictator’ of the group…

Different functions should be explored that, for example, would consider the distribution of the improvement of scores for a plan, compared with the existing or expected situation the plan is expected to remedy. For example, the form of aggregation function type IIb could also be used for group judgment aggregation.
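A brief sketch of the three group indicators named above, with hypothetical individual overall scores Qj:

    # Group indicators from individual overall 'quality' scores Qj (hypothetical values).
    Q = {"member1": 1.2, "member2": 2.0, "member3": -0.5}

    GQ_mean   = sum(Q.values()) / len(Q)     # average of all individual scores
    GQ_leader = Q["member1"]                 # one designated member's score taken as the group score
    GQ_min    = min(Q.values())              # score of the member with the lowest judgment

    print(round(GQ_mean, 2), GQ_leader, GQ_min)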

The use of any of these aggregated (‘deliberated’) judgment scores as a ‘direct’ guiding measure of performance determining the decision c a n n o t be recommended: they should be considered decision guides, not determinants. For one, the expectation of ‘due consideration of all aspects‘ would require complete knowledge of all consequences of a plan and of all causes of the problem it aims to fix — an expectation that must be considered unrealistic in many situations, but especially for ‘wicked’ problems or ‘messes’. There, decision-makers must be willing to assume responsibility for the possibility of being wrong — a condition impossible to deliberate, by definition, when caused by ignorance of what we might be wrong about.

Aggregation functions for developing overall ‘Plan plausibility’ judgment
from the evaluation of ‘pro’ and ‘con’ arguments.

Plausibility judgments

It is necessary to reach agreements about the use of terms for the merit of judgments about plans as derived from argument evaluation, because the evaluation task for planning arguments is somewhat different from the assessment usually applied to arguments. Traditionally, the purpose of argument analysis and evaluation is seen as that of verifying whether a claim — the ‘conclusion’ of an argument — is true or false, and this is seen as depending on the truth of the premises of the argument and the ‘validity’ of the form or pattern or ‘inference rule’ of the argument. These criteria do not apply to planning arguments, which can generally be represented as follows (stating the ‘conclusion’ — the claim about a proposed plan A — first):

Plan A ought to be implemented
because
Plan A will result in outcome B, (given or assuming conditions C);
and
Outcome B ought to be aimed for / pursued;
and
Conditions C are given (or will be when the plan is implemented)

Like many arguments studied by traditional logic and rhetoric, not all argument premises are stated explicitly in discussions; some being assumed as ‘taken for granted’ by the audience: ‘Enthymemes’. But to evaluate these arguments, all premises must be stated and considered explicitly.

This argument pattern — and its variations due to different constellations of assertion or negation of different premises — does not conform to the validity conditions for ‘valid’ arguments in the formal logic sense: it is, at best, inconclusive. Its premises cannot be established as ‘true or false‘ — the proposed plan is discussed precisely because it, as well as the outcome B, isn’t there (‘true’) yet. This also means that some of the premises — the factual-instrumental claim ‘If A is implemented, then B will happen, given C’ and the claim ‘C will be present’ — are estimates or predictions qualified as probabilities. And ‘B ought to be pursued’, as well as the conclusion ‘A ought to be implemented’, are neither adequately called ‘probable’ nor true or false: the term ‘plausible’ seems more fitting, at least for some participants, though not necessarily for all. Indeed, ‘plausibility’ judgments may be applied to all the claims, with the different interpretations for each kind of claim easily understood. This is a matter of degrees, not a binary yes/no quality. And unlike the assessment of factual and even probability claims in common logic argumentation studies, the ‘conclusion’ (decision to implement) is not determined by a single ‘clinching’ argument: it rests on several or many ‘pros and cons’ that must be weighed against each other. That is the evaluation task for planning argumentation, and it will lead to different ‘aggregation’ tools.

The logical structure of planning argumentation can be stated in simplified form as follows:

– An individual’s overall plan plausibility judgment PLANPL is a function of the ‘weights’ Argw of the various pro and con arguments raised about the proposal.
– The argument weight Argw is a function of the argument’s plausibility Argpl and the weight of relative importance w of its deontic (ought-) premise.
– The argument plausibility Argpl is a function of the plausibility of its premises.

Examples of aggregation functions for this process might be the following:
                                                   
1.a  Argument plausibility:        Argpli = ∏ {Premplj} for all n premises j.

Or  

1.b   Argpli = Min{ Premplj}

2.    Argument weight:               Argwi = Argpli * wi with 0 ≤ wi and ∑ wi = 1
for the ought-premises of all m arguments

3. Proposal plausibility PLANPL = ∑ Argwi
                                               

Aggregation functions for Group judgment statistics: (Similar to the Quality group aggregations)

Group mean (average) plausibility:   GPLANPL = 1/k ∑ PLANPLp for all k participants p.
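A minimal sketch of the whole chain, from premise plausibilities to a group plausibility statistic. All values are hypothetical (premise plausibilities are assumed here to lie between -1 and +1), and the Min variant 1.b is used for argument plausibility:

    # Sketch: premise plausibilities -> argument plausibility (variant 1.b, Min) ->
    # argument weight -> individual plan plausibility -> group mean (values hypothetical).
    def argument_plausibility(premise_pls):
        return min(premise_pls)                      # variant 1.b

    def plan_plausibility(arguments):
        # arguments: list of (premise plausibilities, weight of the ought-premise);
        # the weights of all ought-premises are assumed to sum to 1 for each participant.
        return sum(argument_plausibility(pls) * w for pls, w in arguments)

    participants = {
        "p1": [([0.8, 0.9, 0.7], 0.6), ([-0.5, 0.9, 0.8], 0.4)],   # one pro, one con argument
        "p2": [([0.4, 0.6, 0.9], 0.5), ([0.2, 0.3, 0.5], 0.5)],
    }

    PLANPL = {p: plan_plausibility(args) for p, args in participants.items()}
    GPLANPL = sum(PLANPL.values()) / len(PLANPL)     # group mean plan plausibility
    print(PLANPL, round(GPLANPL, 2))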

There are of course other statistical measures of the set of individual plausibility judgments that can be examined and discussed. Like the aggregated ‘Quality’ measures, these ‘Group’ plausibility statistics should not be used as decision determinants but as guides, for instance as indicators of the need for further discussion and explanation of judgment differences, or for revision of plan details to alleviate concerns leading to large judgment differences.
(Map) Aggregation overview

Comments? Additions?

–o–