EVALUATION IN THE PLANNING DISCOURSE — TARGET AUDIENCE

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.  Thorbjørn Mann,  February 2020

TARGET AUDIENCE


Audience and Distribution: Overview

The target audience for the results of this effort to examine the role of evaluation in the planning discourse is admittedly, and immodestly, diverse. The results may be of interest to many participants in the social media groups currently discussing related issues, including consultants offering services and tools for planning, ‘problem-solving’ and ‘change management’ to corporate and institutional clients. The focus here, however, will be on public planning at all levels, from small local communities to national, international and ultimately global challenges. Thus, the issues concern officials as well as the public involved in planning. It is especially at the global level, where challenges and crises transcend the boundaries of traditional institutions, that traditional decision-making modes and habits break down or become inapplicable, generating calls for new ideas, approaches and tools. Increased public participation is a common demand.

The planning discourse at all levels will have to include not just traditional planning experts and decision-makers in all institutions faced with the need for collective action, but also the public. New and emerging IT tools and procedures must also be applied to the evaluation facet of planning, engaging all potentially affected parties. Leadership as well as the public will have to be involved and become familiar and competent with their use. This will call for appropriate means of generating that familiarity: information and education.

Obviously, whatever discussion and presentation tools are chosen at present for this exploration of evaluation in the public planning discourse will not be adequate for developing definitive answers, or even for carrying out an effective discussion. The exploration must be seen as just a first step in a more comprehensive strategy. To the extent that meaningful results emerge from this discussion, the issue of how to bring the ideas to a wider audience for general adoption will become part of the agenda. It should include education at all levels, down to general education for all citizens, not only higher levels. Thus, the hope is to reach planners and decision-makers as well as those involved in general education.

The audience that can be reached via such vehicles as this blog, selected social media, and perhaps a book, will be people who have already given these issues some thought, that is: ‘experts’. So any discussion it incites will likely involve disciplinary ‘jargon’ of several kinds. But in view of the desired larger audience, the language should remain as close to conversational as possible and avoid ‘jargon’ too unfamiliar to non-experts. Many valuable research results and ideas are expressed in academic, ‘scientific’, or technical terms that are likely to exclude parties who should be invited to and included in the discussion.

Given the wide range of people and institutions involved with planning, the question of ‘target audience’ may be inadequate or incomplete: it should be expanded to look at the best ways for distributing these suggestions. Besides traditional forms of distribution such as books, textbooks, manuals, new forms or media of familiarizing potential users may have to be developed; for example, online games simulating planning projects using new ideas and methods. This aspect of the project is especially in need of ideas and comments.

–o–

EVALUATION IN THE PLANNING DISCOURSE — SYSTEMS THINKING, MODELING AND EVALUATION IN PLANNING

An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, February 2020. (DRAFT)

SYSTEMS THINKING / MODELING AND EVALUATION IN PLANNING

 

Evaluation and Systems in Planning  — Overview

The contribution of systems perspective and tools to planning.

In just about any discourse about improving approaches to planning and policy-making, there will be claims containing reference to ‘systems’: ‘systems thinking’, ‘systems modeling and simulation’, the need to understand ‘the whole system’, the counterintuitive behavior of systems. Systems thinking as a whole mental framework is described as ‘humanity’s currently best tool for dealing with its problems and challenges’. There are by now so many variations, sub-disciplines, approaches and techniques, even definitions of systems and systems approaches, on the academic as well as the consulting market, that even a cursory description of this field would become a book-length project.

The focus here is the much narrower issue of the relationship between this ‘systems perspective’ and various evaluation tasks in the planning discourse. This sketch will necessarily be quite general, not doing adequate justice to many specific ‘brands’ of systems theory and practice. However, looking at the subject from the planning / evaluation perspective will identify some significant issues that call for more discussion.

Evaluation judgments at many stages of systems projects and planning

A survey of many ‘systems’ contributions reveals that ‘evaluation’ judgments are made at many stages of projects claiming to take a systems view, much like the finding that evaluation takes place at the various stages of planning projects whether explicitly guided by systems views or not. Those judgments are often not even acknowledged as ‘evaluation’, and they follow very different patterns of evaluation (as described in the sections exploring the variety of evaluation judgment types and procedures).

The similar aims of systems thinking and evaluation in planning

Systems practitioners feel that their work contributes well (or ‘better’ than other approaches) to the general aims of planning, such as:
– to understand the ‘problem’ that initiates planning efforts;
– to understand the ‘system’ affected by the problem, as well as
– the larger ‘context’ or ‘environment’ system of the project;
– to understand the relationships between the components and agents, especially the ‘loops’ of such relationships that generate the often counterintuitive and complex systems behavior;
– to understand and predict the effects (costs, benefits, risks) and performance of proposed interventions in those systems (‘solutions’) over time: both ‘desired’ outcomes and potentially ‘undesirable’ or even unexpected side- and after-effects;
– to help planners develop ‘good’ plan proposals,
– and to reach recommendations and/or decisions about plan proposals that are based on due consideration of all concerns for parties affected by the problem and proposed solutions, and of the merit of ‘all’ the information, contributions, insights and understanding brought into the process.
– To the extent that those decisions and their rationale must be communicated to the community for acceptance, these investigations and judgment processes should be represented in transparent, accountable form.

Judgment in early versus late stages of the process

Looking at these aims, it seems that ‘systems-guided’ projects tend to focus on the ‘early’ information (data) gathering and ‘understanding’ aspects of planning more than on the decision-making activities. These ‘early’ activities do involve judgment of many kinds, aiming at understanding ‘reality’ based on the gathering and analysis of facts and data. The validity of these judgments is drawn from standards of what may loosely be called ‘scientific method’: proper observation, measurement, statistical analysis. There is no doubt that systems modeling, which looks at the components of the ‘whole’ system and the relationships between them, and the development of simulation techniques have greatly improved the degree of understanding both of the problems and of the context that generates them, as well as the prediction of the effects (performance) of proposed interventions, that is, of ‘solutions’. Less attention seems to be given to the evaluation processes leading up to decisions in the later stages. Some justifications, or guiding attitudes, can be distinguished to explain this:

Solution quality versus procedure-based legitimization of decisions

One attitude, building on the ‘scientific method’ tools applied in the data-gathering and model-building phases, aims at finding solutions that are ideally ‘optimal’, or at least ‘satisficing’, as described by performance measures from the models. Sophisticated computer-assisted models and simulations are used to do this; the performance measures (which must be quantifiable in order to be calculated) are derived from ‘client’ goal statements or from surveys of affected populations, as interpreted by the model-building consultants: experts. On the one hand, their expert status is then used to assert the validity of results. On the other hand, this practice is increasingly criticized for its lack of transparency to the lay populations affected by problems and plans, with critics questioning the experts’ legitimacy to make judgments ‘on behalf of’ affected parties. If there are differences of opinion or conflicts about model assumptions, these are ‘settled’ (and must be settled) by the model builders in order for the programs to yield consistent results.

This practice (that Rittel and other critics called ‘first generation systems approach’) was seen as a superior alternative to traditional ways of generating planning decisions: the discussions in assemblies of people or their representatives, characterized by raising questions and debating the ‘pros and cons’ of proposed solutions, but then making decisions by majority voting or accepting the decisions of designated or self-designated leaders. Both of these decision modes obviously do not meet all of the postulated expectations in the list above: voting implies dominance of the interests of the ‘majority’ and potential disregard of the concerns of the minority; leaders’ decisions could lack transparency (much like expert advice), leading to public distrust of the leader’s claim of having given due consideration to ‘all’ concerns affecting people.

There were then some efforts to develop procedures (e.g. formal evaluation procedures) or tools, such as the widely used but also widely criticized ‘Benefit-Cost’ analysis, that tried to extend the ‘calculation-based’ development of valid performance measures into the stage of assessing solution quality against criteria to guide decisions. These were not equally widely adopted, for various reasons such as the complicated and burdensome procedures, which again require experts to facilitate the process and arguably make public participation more difficult. A different path is the tendency to make basic ‘quality’ considerations ‘mandatory’ as regulations and laws, or as ‘best practice’ standards. Apart from tending to set ‘minimum’ quality levels as requirements, e.g. for building permits, this represents a movement to combine or entirely replace quality-based planning decision-making with decisions that draw their legitimacy from having been generated by following prescribed procedures.

This trend is visible in approaches that specify procedures to generate solutions by using ‘valid’ solution components or features postulated by a theory (or laws): having followed those steps then validates the solution generated and removes the necessity to carry out any complicated evaluation procedure. An example of this is Alexander’s ‘Pattern Language’, though the ‘systems’ aspect is not as prevalent in that approach. Interestingly, that same stratagem is also visible in movements that focus on processes aimed at the mindsets of groups participating in special events: these ‘increase awareness’ of the nature and complexity of the ‘whole system’ but then rely on solutions ‘emerging’ from the resulting greater awareness and understanding, and aim at consensus acceptance in the group for the results generated, which then do not need further examination by more systematic, quantity-focused deliberation procedures. The invoked ‘whole system’ consideration, together with a claimed scientific understanding of the true reality of the situation calling for planning intervention, is a part of inducing that acceptance and legitimacy. A telltale feature of these approaches is that debate, argument, and the reasoned scrutiny of supporting evidence involving opposing opinions tend to be avoided or ‘screened out’ in the procedures generating collective ‘swarm’ consensus.

The controversy surrounding the role of ‘subjective’, feeling-based, intuitive judgments versus ‘objective’ measurable, scientific facts (not just opinions) as the proper basis for planning decisions also affects the role of systems thinking contributions to the planning process.

None of the ‘systems’ issues related to evaluation in the planning process can be considered ‘settled’ and needing no further discussion. The very basic ‘systems’ diagrams and models of planning may need to be revised and expanded to address the role and significance of evaluation, as well as argumentation, the assessment of the merit of arguments and other contributions to the discourse, and the development of better decision modes for collective planning decision-making.

–o–

EVALUATION IN THE PLANNING DISCOURSE: PROCEDURE EXAMPLE 2: EVALUATION OF PLANNING ARGUMENTS


An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, January 2020. (Draft)

PROCEDURE EXAMPLE 2:
EVALUATION OF PLANNING ARGUMENTS (PROS & CONS)

Argument evaluation in the planning discourse

Planning, like design, can be seen as an argumentative process (Rittel): ideas and proposals are generated, and questions are raised about them. The typical planning issues, especially the ‘deontic’ (ought-) questions about what the plan ought to be and how it can be achieved, generate not only answers but arguments: the proverbial ‘pros and cons’. The information needed to make meaningful decisions, based on ‘due consideration’ of all concerns by all parties affected by the problem the plan is aiming to remedy as well as by any solution proposals, often comes mainly via those pros and cons. Taking this view seriously, it becomes necessary to address the question of how those arguments should be evaluated or ‘weighed’. After all, those arguments support contradictory conclusions (claims), so just ‘considering’ them is not quite enough.

Argumentation as a cooperative rather than adversarial interaction

The very concept of the ‘argumentative view’ of planning is somewhat controversial because many people misunderstand ‘argument’ itself as a nasty, adversarial, combative, uncooperative phenomenon, a ‘quarrel’. (I have suggested the label ‘quarrgument’ for this.) But ‘argument’ is originally understood as a set of claims (premises) that together support another claim, the ‘conclusion’. For planning, arguments are items of reasoning that explore the ‘pros and cons’ of plans; an important underlying assumption is that we ‘argue’, that is, exchange arguments with others, because we believe that the other will accept or consider our position about the plan under discussion because the other already believes or accepts the premises we offer, or will do so once we offer the additional support we have for them. It is unfortunate that even recent research on computer-assisted argumentation seems to be stuck in the ‘adversarial’ view of arguments, seeing arguments as ‘attacks’ on opposing positions rather than as a cooperative search for a good planning response to problems or visions for a better future.

‘Planning arguments’

There is another critical difference between the arguments discussed in traditional logic textbooks and the kinds I call ‘planning arguments’: the traditional argumentation concern was to establish the truth or falsity of claims about the world, with the expectation that the discussion (the assessment of arguments) will ‘settle’ that question in favor of one or the other. This does not apply to planning arguments: the planning decision does not rest on single ‘clinching’ arguments but on the assessment of the entire set of pros and cons. There are always real expected benefits and real expected costs, and as the proverbial saying has it, they must be ‘weighed’ against one another to lead to a decision. There has not been much concern about how that ‘weighing’ can or should be done, and how that process might lead to a reasoned judgment about whether to accept or reject a proposed plan. I have tried to develop a way to do this, a way to explain what our judgments are based on, beginning with an examination of the structure of ‘planning arguments’.

The structure of planning arguments and their different types of premises

I suggest that planning arguments can be represented in the following general ‘standard planning argument’ form, the simplest version being this ‘pro’ argument pattern:

Proposal ‘ought’ claim (‘conclusion’):  Proposal PLAN A ought to be adopted
because
1. Factual-instrumental premise:        Implementing PLAN A will lead to outcome B, given conditions C
and
2. Deontic premise:                     Outcome B ought to be pursued;
and
3. Factual premise:                     Conditions C are (or will be) given.

This form is not conclusively ‘valid’ in the formal logic sense; by that standard it is considered ‘inconclusive’ and ‘defeasible’. There are usually many such pros and cons supporting or questioning a proposal: no single argument (other than evidence pointing out flaws of logical inconsistency or lacking feasibility, leading to rejection) will be sufficient to make a decision. Any evaluation of planning arguments therefore must be embedded in a ‘multi-criteria’ analysis and aggregation of judgments into the overall decision.

It will become evident that all the judgments people make will be personal ‘subjective’ judgments, not only about the deontic (ought) premise but even about the validity and salience of the ‘factual’ premises: they are all estimates about the future, not yet validated by observation and measurement.

The judgment types of planning argument premises:
‘plausibility’ and weight of importance

There are two kinds of judgments that will be needed. The first is an assessment of the ‘plausibility’ of each claim. The term ‘plausibility’ here includes the familiar ‘truth’ (or degree of certainty or probability about the truth) of a factual claim, and the advisability, acceptability or desirability of the deontic claim. It can be expressed as a judgment on a scale of, e.g., -1 to +1, with -1 meaning ‘complete implausibility’, +1 meaning ‘total plausibility’ or virtual certainty, and the center point of zero meaning ‘don’t know, can’t judge’. The second is a judgment about the ‘weight of relative importance’ of the ‘ought’ aspect. It can be expressed e.g. by a score between zero (meaning ‘totally unimportant’) and +1 (meaning ‘totally important’, overriding all other aspects); the sum of all the weights of deontic premises must be equal to +1.

Argument plausibility

The first step would be the assessment of plausibility of the entire single argument, which would be a function of all three premise plausibility scores to result in an ‘Argument plausibility’ score.

For example, an argument i with pl(1) = 0.5, pl(2) = 0.8, and pl(3) = 0.9 might get an argument plausibility Argpl(i) of 0.5 x 0.8 x 0.9 = 0.36.
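The multiplication in this example can be written as a small function. This is a minimal sketch, assuming the simple product rule used in the example; the function name `argument_plausibility` and the range check are illustrative choices, not part of the procedure as described.

```python
from math import prod

def argument_plausibility(premise_plausibilities):
    """Multiply premise plausibility scores (each on the -1..+1 scale)."""
    for pl in premise_plausibilities:
        if not -1.0 <= pl <= 1.0:
            raise ValueError(f"plausibility {pl} is outside the -1..+1 scale")
    return prod(premise_plausibilities)

# Example from the text: pl(1) = 0.5, pl(2) = 0.8, pl(3) = 0.9
arg_pl = argument_plausibility([0.5, 0.8, 0.9])
print(round(arg_pl, 2))  # 0.36
```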

Argument weight of relative importance

The second step would be to assess the ‘argument weight’ of each argument, which can be done by multiplying the weight of relative importance of its deontic premise (premise 2 in the pattern above) with the argument plausibility:    Argw(i) = Argpl(i) x w(i).
That weight will again be a value between zero (meaning ‘totally unimportant’) and +1 (meaning ‘all-important’ i.e. overriding all other considerations). This should be the result of the establishment of a ‘tree’ of deontic concerns (similar to the ‘aspects’ of the ‘Formal evaluation’ procedure in procedure example 1) that gives each deontic claim its proper place as a main aspect, sub-aspect, sub-sub-aspect or ‘criterion’ in the aspect tree, and assigning weights between 0 and 1 such that these add up to 1, at each level.

A deontic claim located at the second level of the aspect tree, having been assigned a weight of 0.8 at that level, being a sub-aspect to an aspect at the first level with a weight of 0.4 at that level, would have a premise weight of w = 0.8 x 0.4 = 0.32. For an argument with a plausibility of 0.36, the argument weight would be Argw(i) = 0.36 x 0.32 = 0.1152 (rounded to 0.12).
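The worked example can be reproduced in a few lines. This sketch assumes that a premise weight is the product of the weights along its path in the aspect tree, as in the example; the function names `premise_weight` and `argument_weight` are hypothetical.

```python
from math import prod

def premise_weight(path_weights):
    """Weight of a deontic claim: product of the weights assigned along
    its path in the aspect tree (top-level aspect first)."""
    return prod(path_weights)

def argument_weight(arg_plausibility, weight):
    """Argw(i) = Argpl(i) x w(i)."""
    return arg_plausibility * weight

w = premise_weight([0.4, 0.8])   # first-level aspect 0.4, sub-aspect 0.8
arg_w = argument_weight(0.36, w)
print(round(w, 2), round(arg_w, 4))  # 0.32 0.1152
```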

Plan plausibility

All the argument weights could then be aggregated into the overall ‘plan plausibility’ score, for example by adding up all argument weights:
Planpl = ∑ Argw(i) for all argument weights i (of an individual participant)

Of course, there are other possible aggregation forms. (See the sections on ‘Aggregation’ and ‘Decision Criteria’.) Which one of those should be used in any specific case must be specified, and agreed upon, in the ‘procedural agreements’ governing each planning project.

It should be noted that in a worksheet simply listing all arguments with their premises for plausibility and weight assignments, there is no need to identify arguments as ‘pro’ or ‘con’, as intended by their respective authors. Any argument given a negative premise plausibility by a participant will automatically end up getting a negative argument weight and thus become a ‘con’ argument for that participant, even if the argument was intended by its author as a ‘pro’ argument. This makes it obvious that all such assessments are individual, subjective judgments, even if the factual and factual-instrumental premises of arguments are considered ‘objective-fact’ matters.
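A worksheet of this kind, and the summation into a plan plausibility score, might be sketched as follows. The arguments, scores and weights are invented for illustration, and the product-and-sum rules are the simple ones suggested in the text; other aggregation forms would need different functions.

```python
from math import prod

# Hypothetical worksheet of one participant: each row holds the three
# premise plausibilities (-1..+1) and the weight of the deontic premise.
# The weights are assumed to sum to 1 across the arguments.
worksheet = [
    {"label": "A1 (offered as 'pro')", "pl": [0.5, 0.8, 0.9], "w": 0.5},
    {"label": "A2 (offered as 'pro')", "pl": [0.6, -0.5, 0.8], "w": 0.3},
    {"label": "A3 (offered as 'con')", "pl": [0.7, -0.9, 0.9], "w": 0.2},
]

def argument_weight(row):
    """Argw(i) = Argpl(i) x w(i), with Argpl(i) the product of premise scores."""
    return prod(row["pl"]) * row["w"]

def plan_plausibility(rows):
    """Planpl = sum of Argw(i) over all arguments of one participant."""
    return sum(argument_weight(row) for row in rows)

for row in worksheet:
    print(row["label"], round(argument_weight(row), 4))
print("Planpl:", round(plan_plausibility(worksheet), 4))
```

Argument A2 was offered as a ‘pro’ but ends up with a negative weight because this participant scored its deontic premise negatively. One property of the plain product rule is worth noting: two negative premise scores in the same argument would multiply to a positive value, so how such cases are to be treated is something the procedural agreements would have to settle.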

The process of evaluation of planning arguments within the overall discourse

The diagram below shows the argument assessment process as it will be embedded in an overall discourse. Its central feature is the ‘Next Step?’ decision, invoked after each major activity. It lets the participants in the effort decide — according to rules specified in those procedural agreements — how deeply into the deliberation process they wish to proceed: they could decide to go ahead with a decision after the first set of overall offhand judgments, skipping the detailed premise analysis and evaluation if they feel sufficiently certain about the plan.

[Diagram: Process of argument assessment within the overall discourse]

The use of overall plan plausibility scores:
Group statistics of the set of individual plan plausibility scores.

It may be tempting to use the overall plan plausibility scores directly as decision guides or determinants: for example, to compute a statistic such as the average of all individual scores Planpl(j) for the participants j in the assessment group as an overall ‘group plausibility score’ GPlanpl, e.g. GPlanpl = (1/n) ∑ Planpl(j) for all n members of the panel.

And, in evaluating a set of competing plan alternatives, to select the proposal with the highest ‘group plausibility’ score.
Such temptations should be resisted, for a number of reasons: whether a discussion has succeeded in bringing in all pertinent items that should be given ‘due consideration’; the concern that planning arguments tend to be of a ‘qualitative’ nature and often don’t easily address quantitative measures of performance; questions regarding principles and the time frame of expected plan effects and consequences; whether and how issues of ‘quality’ of a plan are adequately addressed in the form of arguments; and the question of the appropriate ‘social aggregation’ criterion to be applied to the problem and plan in question. These remain open questions.
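Purely to make the group statistic concrete, and not as a recommended decision rule, the averaging can be sketched as follows. The panel scores are invented for illustration and the function name `group_plausibility` is hypothetical. The two panels are chosen to show one further, easily demonstrated limitation of a plain average: it is the same whether the panel broadly agrees or is sharply split.

```python
from statistics import mean

def group_plausibility(individual_scores):
    """GPlanpl = (1/n) * sum of Planpl(j) over the n panel members."""
    return mean(individual_scores)

# Two hypothetical panels with the same average but very different spread
panel_agree = [0.2, 0.3, 0.25, 0.25]
panel_split = [0.9, -0.4, 0.8, -0.3]

for name, scores in [("agree", panel_agree), ("split", panel_split)]:
    print(name, round(group_plausibility(scores), 2), min(scores), max(scores))
```

Both panels average to about 0.25, yet in the second one half of the members judge the plan implausible, which is the kind of information a ‘social aggregation’ criterion would have to take into account.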

Open questions

Likely incompleteness of the discussion
It is argued that participation of all affected parties and a live discussion will be more likely to bring out the concerns people are actually worried about than, e.g., reliance on general textbook knowledge by panels, or surveys made up by experts who ‘don’t live there’. But even the assumption that the discussion guarantees complete coverage is unwarranted. For example, is somebody likely to raise an issue about a plan feature that they know will affect another party negatively (when they expect the plan to be good for their own faction) if the other party isn’t sufficiently aware of this effect and does not raise it? Likewise, some things may be expected to be so much matters ‘of course’ that nobody considers it necessary to mention them. So unless the overall process includes several different means of getting such information (systems modeling, simulation, extensive scrutiny of other cases etc.), the argumentative discussion alone can’t be assumed to be sufficient to bring up all needed information.

Quantitative aspects in arguments.
The typical planning argument will usually be framed in ‘qualitative’ terms rather than in quantitative measures, for example: “The plan will be more ‘sustainable’ than the current situation.” This matters in the plausibility assessment: such a claim can be seen as quite plausible as long as there is some evidence of sustainability improvement, so participants may be inclined to give it a high pl-score close to +1. By comparison, suppose somebody makes the same argument but now claims a specific ‘sustainability’ performance measure, one that others may consider too optimistic and therefore assign a plausibility score closer to zero or even slightly negative: how will that affect the overall assessment? What procedural provisions would be needed to adequately deal with this question?

The issue of ‘quality’ or ‘goodness’ of a proposed solution.
It is of course possible that a discussion examines the quality or ‘goodness’ of a plan in detail, but as mentioned above, this will likely also be in general, qualitative terms. Such examination is often even avoided because of the general acceptance of sayings like ‘you can’t argue about beauty’, so the discussion will have some difficulty in this respect, if it mentions beauty at all, or spiritual value, or the appropriateness of the resulting image. Likewise, requirements for the implementation of the plan, such as meeting regulations, may not be discussed.

The decreasing plausibility ‘paradox’
Arguably, all ‘systematic’ reasoning efforts, including discussion and debate, aim at giving decision-makers a higher degree of certainty about their final judgment than, say, fast offhand intuitive decisions. However, it turns out that the more depth as well as breadth of discussion there is, the more the final plausibility judgment scores will tend to end up close to the ‘zero’ or ‘don’t know’ plausibility, provided that the plausibility assessment is done honestly and seriously and the aggregation method suggested above is used: multiplying the plausibility assessments for the various premises (which for the factual premises will be probability estimates). Since these judgments are all about future expectations, they cannot honestly be given +1 (‘total certainty’) scores or even scores close to it, the less so the farther out in the future the effects are projected. This result can be quite disturbing and even disappointing to many participants when final scores are compared with initial ‘offhand’ judgments.
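The arithmetic behind this effect is easy to demonstrate. The sketch below assumes, for illustration only, that every premise in a chain of supporting reasoning is judged quite plausible (0.9) and that the scores are multiplied; it covers the ‘depth’ part of the effect, not the ‘breadth’ part. The function name `chain_plausibility` is hypothetical.

```python
from math import prod

def chain_plausibility(premise_scores):
    """Product of plausibility scores along a chain of supporting premises."""
    return prod(premise_scores)

# Assumption for illustration: every premise is judged quite plausible (0.9)
for depth in (1, 3, 5, 10, 20):
    print(depth, round(chain_plausibility([0.9] * depth), 3))
```

Even with each single premise scored at 0.9, the product falls to about 0.73 after three premises, about 0.35 after ten, and about 0.12 after twenty: the more thoroughly the support is examined, the closer the result moves toward zero.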
Other issues related to time have often been inadequately dealt with in evaluation of any kind:

Estimates of plan consequences over time
All planning arguments express people’s expectations of the plan’s effects in the future. Of course, we know that there are relatively few cases in which a plan or action will generate results that materialize immediately upon implementation and then stay that way. So what do we mean when we offer an argument that a plan ‘will improve society’s overall health’, even when resorting to ‘precise’ statistical indices like mortality rates or life expectancy? We know that these figures will change over time: one proposed policy will bring more immediate results than another, but the other will have a better effect in the long run; and again, the farther into the future we look, the less certain we must be about our prediction estimates. These things are not easily expressed in even carefully crafted arguments supported by the requisite statistics: how should we score their plausibility?

Tentative insights, conclusions?

These ‘not fully resolved / more work needed’ questions may seem to strengthen the case for evaluation approaches other than trying to draw support for planning decisions from discourse contributions, even with more detailed assessment of arguments than shown here (examining the evidence and support for each premise). However, the problems emerging from the examination of the argumentative process do affect other evaluation tools as well. I have not seen approaches that resolve them all more convincingly. So some first tentative conclusions are that planning debate and discourse, too familiar and accessible to experts and lay people alike to be dismissed in favor of other methods, would benefit from enhancements such as the argument assessment tools; but also that opportunities and encouragement should be offered to draw upon other tools, as called for by the circumstances of each case and the complexity of the plans.

These techniques and methods should be made available for use by experts and lay discourse participants in a ‘toolkit’ that is part of a general planning discourse support platform: not as mandatory components of a general-purpose, one-size-fits-all planning method, but as a repository of tools for creative innovation and expansion. Because plans, as well as the processes that generate them, define those involved as ‘the creators of that plan’, there will be a need to ‘make a difference’, to make it theirs: by changing, adapting, expanding and using the tools in new and different ways, besides inventing new tools in the process.

References:
Rittel, Horst: “APIS: A Concept for an Argumentative Planning Information System”. Institute of Urban and Regional Development, University of California at Berkeley, 1980. A report about research activities conducted for the Commission of European Communities, Directorate General XIIA.
–o–

 

 

EVALUATION IN THE PLANNING DISCOURSE: SAMPLE EVALUATION PROCEDURES EXAMPLE 1: FORMAL ‘QUALITY’ EVALUATION

Thorbjørn Mann,  January 2020

In the following segments, a few example procedures for evaluation by groups will be discussed. They illustrate how the various parts of the evaluation process are selectively assembled into a complete process aiming at a decision (or a recommendation for a decision) about a proposed plan or policy, and are meant to facilitate understanding of how the different provisions and choices related to the evaluation task that are reviewed in this study can be assembled into practical procedures for specific situations. The examples are not intended to be universal recommendations for use in all situations. They all will, arguably, call for improvement as well as adaptation to the specific project and situation at hand.

A common evaluation situation is that of a panel of evaluators comparing a number of proposed alternative plan solutions to select or recommend the ‘best’ choice for adoption or, if there is only one proposal, to determine whether it is ‘good enough’ for implementation. It is usually carried out by a small group of people assumed to be knowledgeable in the specific discipline (for example, architecture) and reasonably representative of the interests of the project client (which may be the public). The rationale for such efforts, besides aiming for the ‘best’ decision, is the desire to ensure that the decision will be based on good expert knowledge, but also the desire for transparency, legitimacy and accountability of the process, to justify the decision. The outcome will usually be a recommendation to the actual client decision-makers rather than the actual adoption or implementation decision, based on the group’s assessment of the ‘goodness’ or ‘quality’ of the proposed plan, documented in some form. (It will be referred to as a ‘Formal Quality Evaluation’ procedure.)

There are of course many possible variations of procedures for this task. The sample procedure described in the following is based on the Musso-Rittel (1) procedure for the evaluation of the ‘goodness’ or quality of buildings.

The group will begin by agreeing on the procedure itself and its various provisions: the steps to be followed (for example, whether evaluation aspects and weighting should be worked out before or after presentation of the plan or plan alternatives), general vocabulary, judgment and weighting scales, aggregation functions both for individual overall judgments and group indices, and decision rules for determining its final recommendation.

Assuming that the group has adopted the sequence of first establishing the evaluation aspects and criteria against which the plan (or plans) will be judged, the first step will be a general discussion of the aspects and sub-aspects to be considered, resulting in the construction of the ‘aspect tree’ of aspects, sub-aspects, sub-sub-aspects etc. (ref. the section on aspects and aspect trees) and criteria (the ‘objective’ measures of performance; ref. the section on evaluation criteria). The resulting tree will be displayed and become the basis for scoring worksheets.

The second step will be the assignment of aspect weights (on a scale of zero to 1, and such that at each level of the ‘tree’ the sum of weights at that level will be 1). Panel members will develop their own individual weighting. This phase can be further refined by applying ‘Delphi Method’ steps: establishing and displaying the mean / median and extreme weighting values, then asking the authors of extremely low or high weights to share and discuss their reasoning for these judgments, and giving all members the chance to revise their weights.
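The weighting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the aspect names, the three panel members and their raw weights are hypothetical, and the helper names `normalize` and `delphi_summary` are invented here, not part of the Musso-Rittel procedure.

```python
from statistics import mean, median

def normalize(raw_weights):
    """Rescale the raw weights at one level of the tree so that they sum to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {aspect: w / total for aspect, w in raw_weights.items()}

def delphi_summary(panel, aspect):
    """Mean, median and the authors of the extreme weights for one aspect."""
    values = {member: weights[aspect] for member, weights in panel.items()}
    return {
        "mean": mean(values.values()),
        "median": median(values.values()),
        "lowest": min(values, key=values.get),    # member asked to explain a low weight
        "highest": max(values, key=values.get),   # member asked to explain a high weight
    }

# Hypothetical raw weights of three panel members for one level of the tree.
panel = {
    "A": normalize({"function": 5, "cost": 3, "appearance": 2}),
    "B": normalize({"function": 2, "cost": 6, "appearance": 2}),
    "C": normalize({"function": 4, "cost": 4, "appearance": 2}),
}
summary = delphi_summary(panel, "cost")
```

Displaying `summary` to the panel would show that members A and B hold the extreme weights for ‘cost’; they would be the ones invited to explain their reasoning before everybody gets the chance to revise.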

Once the weighted evaluation aspect trees have been established, the next step will be the presentation of the plan proposal or competing alternatives.

Each participant will assign a first ‘overall offhand’ quality score (on the agreed-upon scale, e.g. -3 to +3) to each plan alternative.

The group’s statistics of these scores are then established and displayed. This may help to decide whether any further discussion and detailed scoring of aspects will be needed: there may be a visible consensus for a clear ‘winner’. If there are disagreements, the group decides to go through with the detailed evaluation, using common worksheets or spreadsheets of the aspect tree for panel members to fill in their weighting and quality scores; the initial scores are kept for later comparison with the final results. This step may involve the drawing of ‘criterion functions’ (ref. the section on evaluation criteria and criterion functions) to explain how each participant’s quality judgments depend on (objective) criteria or performance measures. These diagrams may be discussed by the panel. They should be considered each panel member’s subjective basis of judgment (or representation of the interests of factions in the population of affected parties). However, some such functions may be mandatory official regulations (such as building regulations). The temptation to urge adoption of common (group) functions (‘for simplicity’ and as an expression of ‘common purpose’) should be resisted, to avoid possible bias towards the interests of some parties at the expense of others.
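A criterion function of this kind might be represented, for illustration, as a piecewise-linear curve through a few points that a panel member marks on the diagram. The sketch below assumes the -3 to +3 scale mentioned above; the performance measure (a ceiling height in metres) and the marked points are hypothetical, and linear interpolation is just one convenient way of drawing the curve.

```python
def criterion_function(points):
    """Build a function mapping a performance measure to a quality score.

    `points` is a list of (measure, score) pairs with distinct measures.
    Scores between two points are linearly interpolated; measures outside
    the marked range receive the score of the nearest end point.
    """
    pts = sorted(points)

    def judge(x):
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]

    return judge

# Hypothetical member: 2.2 m is 'couldn't be worse', 2.7 m is good, 3.5 m is ideal.
ceiling_height = criterion_function([(2.2, -3), (2.7, 2), (3.5, 3)])
score = ceiling_height(2.45)   # halfway between the first two marked points
```

Each member would keep an individual curve; a mandatory regulation could be entered in the same form, for example as a steep drop to -3 below the required minimum.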

Each group member will then fill in the scores for all aspects and sub-aspects etc. The results will be compiled, and the statistics compared; extreme differences in the scoring will be discussed, and members given the chance to change their assessments. This step may be repeated as needed (e.g. until there are no further changes in the judgments).

The results are calculated and the group recommendation determined according to the agreed-upon decision criterion. The ‘deliberated’ individual overall scores are compared with the members’ initial ‘offhand’ scores. The results may cause the group to revise the aspects, weights, or criteria, (e.g. upon discovering that some critical aspect has been missed), or call for changes in the plan, before determining the final recommendation or decision (again, according to the initial procedural agreements).
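How one member’s ‘deliberated’ overall score could be computed from the worksheet and compared with the initial ‘offhand’ score is sketched below. The weighted sum used here is only one of the aggregation functions a group might agree on, and the tree, weights and scores are hypothetical.

```python
def overall_score(node):
    """Weighted-sum aggregation over an aspect tree.

    A leaf is {"weight": w, "score": s}; a branch is
    {"weight": w, "children": {name: node, ...}}.
    The weights of the children of each branch must sum to 1.
    """
    if "children" not in node:
        return node["score"]
    children = node["children"].values()
    total = sum(child["weight"] for child in children)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("child weights must sum to 1")
    return sum(child["weight"] * overall_score(child) for child in children)

# Hypothetical worksheet of one panel member for one plan (scale -3 to +3).
tree = {
    "weight": 1.0,
    "children": {
        "function": {
            "weight": 0.5,
            "children": {
                "layout": {"weight": 0.6, "score": 2},
                "access": {"weight": 0.4, "score": -1},
            },
        },
        "cost": {"weight": 0.3, "score": 1},
        "appearance": {"weight": 0.2, "score": 3},
    },
}

deliberated = overall_score(tree)   # 0.5 * 0.8 + 0.3 * 1 + 0.2 * 3
offhand = 2                         # the member's initial 'offhand' score
shift = deliberated - offhand       # negative: deliberation lowered the judgment
```

A large `shift` for many members may be a signal that some aspect is missing from the tree or that the weights do not reflect what members actually care about.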

The steps are summarized in the following ‘flow chart’.

Evaluation example 1: Steps of a ‘Group Formal Quality Evaluation’

Questions related to this version of a formal evaluation process may include the issue of potential manipulation of weight assignments by changing the steepness of the criterion function.
Ostensibly, the described process aims at ‘giving due consideration’ to all legitimately ‘pertinent’ aspects, while eliminating or reducing the role of ‘hidden agenda’ factors. Questions may arise whether such ‘hidden’ concerns might be hidden behind other plausible but inordinately weighted aspects. A question that may arise from discussions and argumentation about controversial aspects of a plan, and from the examination of how such arguments should be assessed (ref. the section on a process for Evaluation of Planning Arguments), is the role of plausibility judgments about the premises of such arguments, especially the probability of claims that a plan will actually result in a desired or undesired outcome (an aspect). Should the ‘quality assessment’ process include a modification of quality scores based on plausibility / probability scores, or should this concern be explicitly included in the aspect list?

The process may of course seem ‘too complicated’ and, if done by ‘experts’, invite critical questions about whether the experts really can overcome their own interests, biases and preconceptions to adequately consider the interests of other, less ‘expert’ groups. The procedure obviously assumes a general degree of cooperativeness in the panel, which sometimes may be unrealistic. Are more adequate provisions needed for dealing with incompatible attitudes and interests?

Other questions? Concerns? Missing considerations?

–o–

EVALUATION IN PLANNING DISCOURSE: DECISION CRITERIA

Thorbjørn Mann, January 2020

DECISION CRITERIA

The term ‘decision criteria’ needs explanation, so as not to be confused with the ‘evaluation criteria’ used for the task of explaining one’s subjective ‘goodness’ (or ‘quality’) judgment about a plan or object by showing how it relates to an ‘objective’ criterion or performance measure (in section /post …). The criteria that actually determine or guide a decision may be very different from those ‘goodness’ evaluation criteria, much as the expectation of the entire effort here is to get decisions that are based more on the merit of discourse contributions that clarify ‘goodness’.

For discourse aiming at actual actions to achieve changes in the real world we inhabit: when discussion stops after all aspects etc. have been assessed, and individual quality judgment scores have been aggregated into individual overall scores and into group statistics about the distribution of those individual scores, a decision or recommendation has to be made. The question then arises: what should guide that decision? The aim of “reaching decisions based on the merit of discourse contributions” can be understood in many different ways, of which actual ‘group statistics’ are only one, and not only because there are several such statistical indicators. (It is advisable not to use the term ‘group judgment’ for this: the group or set of participants may make a collective decision, but there may be several factions within the group for which any single statistic may not be representative; and the most familiar decision criterion in use is the ratio of votes for or against a plan proposal, which may have little if any relation to the group members’ judgments about the plan’s quality.)

The following is an attempt to survey the range of different group decision criteria or guiding indicators that are used in practice, in part to show why the planning discourse for projects that affect many different governance entities (and, finally, decisions of a ‘global’ nature) calls for different decision guides than the familiar tools such as majority voting.

A first distinction must be made between decision guides we may call ‘plan quality’-based, and those that are more concerned with the discourse process.

Examples of plan quality-based indicators are of course the different indicators derived from the quality-based evaluation scores:
–  Averaged scores of all ‘Quality’ or ‘Plausibility’ (or combined) judgment scores of participating members;
–  ‘Weighted average’ scores (where the manner of weighting becomes another controversial issue: degree of ‘affectedness’ of different parties? number of people represented by participating group representatives? number of stock certificates held by stock holders?…);
–  As the extreme form of ‘weighting’ participants’ judgments: the ‘leader’s’ judgment;
–  The judgment of ‘worst-off’ participants or represented groups (the ‘Max-min’ criterion for a set of alternatives);
–  The Benefit-Cost Ratio;
–  The criterion of having met all ‘regulation rules’ — which usually are just ‘minimal’ expectation considerations (‘to get the permit’) or thresholds of performance, such as ‘coming in under the budget’;
–  Successive elimination of alternatives that show specific weaknesses for certain aspects, such that the remaining alternative will become the recommended decision. A related criterion applied during the plan development would be the successive reduction of the ‘solution space’ until there is only one remaining solution with ‘no alternative’ remaining.
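That these quality-based indicators can point to different decisions is easy to show with a small sketch. The two plans, the three participants, their overall scores and the ‘affectedness’ weights below are all hypothetical; only the plain average, the weighted average and the ‘worst-off’ (Max-min) indicators from the list are computed.

```python
from statistics import mean

def group_indicators(scores, weights=None):
    """Compute group indicators per plan.

    scores:  {plan: {participant: individual overall score}}
    weights: optional {participant: weight}; equal weights if omitted.
    """
    result = {}
    for plan, by_member in scores.items():
        w = weights or {member: 1 for member in by_member}
        total_weight = sum(w[member] for member in by_member)
        result[plan] = {
            "mean": mean(by_member.values()),
            "weighted_mean": sum(w[m] * s for m, s in by_member.items()) / total_weight,
            "worst_off": min(by_member.values()),
        }
    return result

def recommend(indicators, key):
    """Plan with the highest value of the chosen indicator."""
    return max(indicators, key=lambda plan: indicators[plan][key])

# Hypothetical overall scores (scale -3 to +3) of three participants.
scores = {
    "Plan X": {"A": 3, "B": 2, "C": -2},
    "Plan Y": {"A": 1, "B": 1, "C": 0.5},
}
affectedness = {"A": 1, "B": 1, "C": 4}   # C is assumed to be affected most

equal = group_indicators(scores)
weighted = group_indicators(scores, affectedness)

by_mean = recommend(equal, "mean")                    # "Plan X"
by_worst_off = recommend(equal, "worst_off")          # "Plan Y"
by_weighted = recommend(weighted, "weighted_mean")    # "Plan Y"
```

With the same individual judgments, the plain average favors Plan X while both the Max-min criterion and the affectedness-weighted average favor Plan Y, which is why the choice of indicator belongs among the initial procedural agreements.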

Given the burdensome complexity of more systematic evaluation procedures, many ‘process-based’ criteria are preferred in practice:

– Majority voting, in various forms, with the extreme being ‘consensus’, i.e. 100% approval;
– ‘Consent’, understood less as approval than as acceptance with reservations either not voiced or not convincing a majority. (Sometimes only achieved / invoked in live meetings by determinations such as ‘time’s up’ or ‘no more objections to the one proposed motion’);
– ‘Depth and breadth’ of the discussion (but without assessment of the validity or merit of the contributions making up the breadth or depth);
– ‘All parties having been heard / given a chance to voice their concerns’;
– Agreed-upon (or institutionally mandated) procedures and presentation requirements having been followed, legitimating approval, or violated, leading to rejection e.g. of competing alternatives; (‘Handed in late’ means ‘failed assignment…’)

Of course, combinations of these criteria are possible. Does the variety of possible resulting decision criteria emphasize the need for more explicit and careful agreements: establishing clear, agreed-upon procedural rules at the outset of the process? And for many projects, there is a need for better decision criteria. A main reason for this is that in many important projects affecting populations beyond traditional governance boundaries (e.g. countries), traditional decision determinants such as voting become inapplicable, not only because votes may be based on inadequate information and understanding of the problem, but simply because the number of people having ‘voting rights’ becomes indeterminate.

A few main issues or practical concerns can be seen that guide the selection of decision criteria: the principle of ‘coverage’ of ‘all aspects that should be given due consideration’ on the one hand, and the desire for simplicity, speed and clarity on the other. The first is aligned with either trust or demonstration (‘proof’) of fair coverage: ‘accountability’; the second with expediency. Given the complexity of ‘thorough’ coverage of ‘all’ aspects, explored in previous segments, it should be obvious that full adherence to this principle would call for a decision criterion based on the fully explained (i.e. completed) evaluation worksheet results of all parties affected by the project in any way, properly aggregated into an overall statistic accepted by all.

This is clearly not only impossible to define but practically impossible to apply, and equally clearly situated at the opposite end of an ‘expediency’ (speed, simple to understand and apply) scale. These considerations also show why there is a plausible tendency to use ‘procedural compliance criteria’ to lend the appearance of legitimacy to decisions: ‘All parties have been given the chance to speak up; now time’s up and some decision must be made’ (whether it meets all parties’ concerns or not).

It seems to follow that some compromise or ‘approximation’ solution will have to be agreed upon for each case, as opposed to proceeding without such agreements, relying on standard assumptions of ‘usual’ procedures that later lead to procedural quarrels.

For example, one conceivable ‘approximation’ version might be to arrange for a thorough discussion with all affected parties being encouraged to voice and explain their concerns, but with only the ‘leader’ or official responsible for actually making the decision being required to complete the detailed evaluation worksheets, and to publish them to ‘prove’ that all aspects have been entered, addressed (with criterion functions for explanation) and given acceptable weights, and that the resulting overall judgment, aggregated with acceptable aggregation functions, corresponds with the leader’s actual decision. (One issue in this version will be how ‘side payments’ or ‘logrolling’ provisions, meant to compensate parties that do not benefit fairly from the decision but whose votes in traditional voting procedures would be ‘bought’ to support the decision, should be represented in such ‘accounts’.)

This topic may call for a separate, more detailed exploration of a ‘morphology’ of possible decision criteria for such projects, and an examination of evaluation criteria for decision guides or modes to help participants in such projects agree on combinations suited to the specific project and circumstances.

Questions? Missing aspects? Wrong question? Ideas, suggestions?

Suggestions for ‘best answers’ given current state of understanding:
– Ensure better opportunity for all parties affected by problems or plans to contribute their ideas, concerns, and judgments (planning discourse platform);
– Focus on improved use of ‘quality/plausibility’-based decision guides, using ‘plausibility-weighted quality evaluation’ procedures explained and accepted in initial ‘procedural agreements’;
– Reduce the reliance on ‘process-based’ criteria.

Overview of decision criteria (indices to guide decisions)

–o–

Abbé Boulah’s Brexit Solution

– Say, Bog-Hubert: What’s going on out there on the Fog Island Tavern deck?
– Good question, Vodçek. I thought I’d seen everything, but this…
– That bad, eh? Who’s that guy there with Abbe Boulah?
– It’s a tourist from the EU. Don’t know how he got lost out here. Must be a friend of a friend of Otis. And he got into a bragging contest with Abbé Boulah about which part of the world is crazier, more polarized, has weirder politicians.
– Must be a toss-up, if you ask me.
– Right. So now Abbé Boulah is trying to teach the EU fellow (I couldn’t really figure out if he’s a still-EU Brit or from another part over there) how to fix the Brexit mess.
– Good grief. So what’s Abbé Boulah’s solution?
– It actually looked like a brilliant idea for a while, but…
– Now you’ve got me curious. Do I have to bribe you with a shot of your own moonshine production, the Tate’s Hell Special Reserve?
– Psst. Okay, talked me into it. He stunned the poor guy with the simplicity of the idea: Let the good, compassionate Europeans help the poor Brits out of the conundrum they voted themselves into. Instead of haggling for years about the details of a hard or soft or a medium-well done exit, he said: Why don’t you simply dissolve the EU?
– Urkhphfft: What??
– Hang on, don’t choke on that Zin of yours. It’s only for a day, Abbé Boulah says: All the other countries who want to stay in the union voting the next day to re-unite the union, just with a little change of the name. So: Brexit? Done. Stroke of the pen. Paid vacation for a day, for all the EU employees. Get it? A few crazy regulations not getting written, it’s actually a benefit for everybody…
– But…
– Yes. Worried about things like the trade treaties? He said, they are all reinstated as they were, for now; and the UK can either choose to re-join the new thing, or stay out and re-negotiate the individual agreements one by one, without any deadlines, while the existing arrangements stay as they are until replaced.
– Weird. But, I must say, it has a certain Abbeboulistic appeal, eh?
– Yes — but now they are arguing about what the new name should be. They agree that it should just be a minimal exchange or change in the current name, so it wouldn’t cost too much. Such as just adding or deleting something in a single letter of the name.
– Makes sense.
– You’d think so. But now they’re up in arms about whether it should be ‘Buropean’ or ‘Iropean’ or ‘Furopean’ or ‘Luropean’ or ‘Nuropean’ (all just messing a little with the ‘E’), or ‘European Onion’ or ‘RUnion’, or just adding a ‘2’ (starting a series of ‘generations’ like some computer system: ‘EU2’, 3, 4…), or a star: ‘European* Union’ or ‘*European’ or ‘European Union*’, or ‘EU*’. And another star in the flag. Or just put the whole current name in quotation marks… It’s getting vicious, I tell you: you may have to go out there and throw them into the channel to cool them off…

EVALUATION IN THE PLANNING DISCOURSE: ASPECTS and ‘ASPECT TREES’

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.  Thorbjørn Mann,  January 2020

The questions surrounding the task of assembling ‘all’ aspects calling for ‘due consideration’.

 

ASPECTS AND ASPECT TREE DISPLAYS

Once an evaluation effort begins to get serious about its professed aims (deliberating, making overall judgments a transparent function of partial judgments, ‘weighing all the pros and cons’, trying not to forget anything significant, avoiding missing things that could lead to ‘unexpected’ adverse consequences of a plan but that could be anticipated with some care), the people involved will begin to create ‘lists’ of items that ‘should be given due consideration’ before making a decision. One label for these things is ‘aspects’, originally meaning just looking at the object (plan) to be decided upon from different points of view.

A survey of different approaches to evaluation shows that there are many different such labels ‘on the market’ for these ‘things to be given due consideration’. And many of them, especially those of the many evaluation, problem-solving and systems change consultant brands that compete for commissions to help companies and institutions cope with their issues, come with very different recommendations for the way this should be done. The question for the effort to develop a general public planning discourse support platform for dealing with projects and challenges that affect people in many governmental and commercial ‘jurisdictions’ (ultimately, ‘global’ challenges) then becomes: How can and should all these differences in the way people talk about these issues be accommodated in a common platform?

Whether a common ground for this can be found (or a way to accommodate all the different perspectives, if a common label can’t be agreed upon) depends upon a scrutiny of the different terms and their procedural implications. This is a significant task in itself, one for which I have not seen much in the way of inquiry and suggestions (other than the recommendations of the ‘brands’ for adopting ‘their’ terms and approach). So raising this question might be the beginning of a sizable discussion in itself (or a survey of existing work I haven’t seen). Pending the outcome of such an investigation, many of the issues raised for discussion in this series of evaluation issues will continue to use the term ‘aspect’, with apologies to proponents of other perspectives.

This question of diversity of terminology is only one reason for needed discussion, however. Another has to do with the possibility of bias in the very selection of terms, depending on the underlying theory or method, or on whether the perspective is focused on some ‘movement’ that by its very nature puts one main aspect at the center of attention (‘competitive strength and growth’; ‘sustainability’; ‘regeneration’; ‘climate change’; ‘globalization’ versus ‘local culture’ etc.). There are many efforts to classify or group aspects, from Vitruvius’ three main aspects ‘firmness, convenience and delight’ and the simple ‘cost, benefit, and risk’ grouping to recent efforts that encourage participants to explore aspects from different groups of affected or concerned parties, mixed in with concepts such as ‘principles’, best and worst expected outcomes, etc., shown in a ‘canvas’ poster for orientation. Are these efforts encouraging contribution of information from the public, or are they giving the impression of adequate coverage while inadvertently missing significant aspects? It seems that any classification scheme of aspects is likely to end up neglecting or marginalizing some concerns of affected parties.

Comparatively minor questions concern potential mistakes in applying the related tools: listing preferred or familiar means of plan implementation as aspects representing goals or concerns, for example, or listing essentially the same concern under different labels (and thus weighing it twice…). The issue of functional relationships between different aspects, a main concern of systems views of a problem situation, is one that is often not well represented in the evaluation work tools. A major potential controversy is, of course, the question of who is doing the evaluation: whose concerns are represented, and what sources of information will a team draw upon to assemble the aspect list?

It may be useful to look at the expectations for the vocabulary and its corresponding tools: Is the goal to ensure ‘scientific’ rigor, or to make it easy for lay participants to understand and to contribute to the discussion? To simplify things or to ensure comprehensive coverage? Which vocabulary facilitates further explanation (sub-aspects etc) and ultimately showing how valuation judgments relate to objective criteria — performance measures?

Finally: given the number of different ‘perspectives’, how should the platform deal with the potential for biased ‘framing’ of discussions by the sequence in which comments are entered and displayed? Or is this concern one that should be left to the participants in the process, while the platform itself should be as ‘neutral’ as possible, even with respect to potential bias or distortions?

The ‘aspect tree’ of some approaches refers to the hierarchical ‘tree’ structure emerging in a display of main aspects, each further explained by ‘sub-aspects’, sub-sub-aspects etc. The outermost ‘leaves’ of the aspect tree would be the ‘criteria’ or objective performance variables, to which participants might carry their explanations of their judgment basis. (See the later section on criteria and criterion functions.) Is the possibility of doing that a factor in the insistence on the part of some people to ‘base decisions on facts’ only, thereby eliminating ‘subjective’ judgments that can be explained only by listing more subjective aspects?

An important warning was made by Rittel in discussing ‘Wicked Problems’ long ago: The more different perspectives, explanations of a problem, potential solutions are entered into the discussion, the more aspects will appear claiming ‘due consideration’. The possible consequences of proposed solutions alone extend endlessly into the future. This makes it impossible for a single designer or planner, even a team of problem-solvers, to anticipate them all: the principle of assembling ‘all’ such aspects is practically impossible to meet. This is both a reminder to humbly abstain from claims to comprehensive coverage, and a justification of wide participation on logical (rather than the more common ideological-political) grounds: inviting all potentially affected parties to contribute to the discourse as the best way to get that needed information.

The need for more discussion of this subject, finally, should be shown by the presence of approaches or attitudes that deny the need for evaluation ‘methods’ altogether. This takes different forms, ranging from calls for ‘awareness’ or general adoption of a new ‘paradigm’ or approach (like ‘systems thinking’, holism, relying on ‘swarm’ guidance etc.) to more specific approaches like Alexander’s Pattern Language, which suggests that using valid patterns (solution elements, not evaluation aspects) to develop plans will guarantee their validity and quality, thus making evaluation unnecessary.

One source of heuristic guidance to justify ‘stopping rules’ in the effort to assemble evaluation aspects may be seen in the weighting of relative importance given (as subjective judgments by participants) to the different aspects: if the assessment of a given aspect will not make a significant difference in the overall decision because that aspect is given too low a weight, is this a legitimate ‘excuse’ for not giving it a more thorough examination? (A later section will look at the weighting or preference ranking issue).
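The question can be made a little more concrete under one assumption: that overall scores are formed as weighted sums of aspect scores (only one of the aggregation functions a group might adopt). In that case the most that re-scoring a single aspect can change the lead of one plan over another is bounded by the aspect’s effective weight times the width of the judgment scale. The function names and numbers in the following sketch are hypothetical.

```python
def effective_weight(path_weights):
    """Product of the weights along the path from the root to an aspect."""
    product = 1.0
    for w in path_weights:
        product *= w
    return product

def could_flip_ranking(lead, weight, current_diff, scale_min=-3, scale_max=3):
    """True if re-scoring one aspect could reverse the lead of plan X over plan Y.

    lead:         overall score of X minus overall score of Y (positive)
    weight:       effective weight of the aspect in question
    current_diff: current score of X minus score of Y on that aspect
    """
    span = scale_max - scale_min
    rest = lead - weight * current_diff   # lead contributed by all other aspects
    return rest - weight * span <= 0      # worst case: the aspect swings fully to Y

# A sub-aspect weighted 0.1 under a main aspect weighted 0.2: effective weight 0.02.
minor = effective_weight([0.2, 0.1])
minor_matters = could_flip_ranking(lead=0.5, weight=minor, current_diff=1)
major_matters = could_flip_ranking(lead=0.5, weight=0.3, current_diff=1)
```

Here the minor aspect cannot overturn a lead of 0.5 whatever scores it receives, while the aspect weighted 0.3 could; this holds for one participant’s worksheet only, and says nothing about whether the low weight itself is justified.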

–o–