Archive for December, 2019

EVALUATION IN THE PLANNING DISCOURSE — AGGREGATION

An effort to clarify the role of evaluation in the planning process.

Thorbjørn Mann

THE AGGREGATION PROBLEM:

Getting Overall Judgments from Partial Judgments

The concept of ‘deliberation’ was explained, in part, as the process of ‘making overall judgments a function of partial judgments’. Whether we have gone through the process of trying to explain our overall judgment about something to others, or have made the effort of ‘giving due consideration’ to all aspects of the situation, we have arrived at a set of partial judgments. Now the question becomes: just how do we ‘assemble’ (‘aggregate’) these partial judgments into an overall judgment that can guide us in making the decision, for example, to adopt or reject the proposed plan?

The discussion has already gone past the level of familiar practices such as merely counting the number of supporting and opposing ‘votes’, and even past some well-intentioned approaches that begin to look at the number of explanations (arguments or support statements): their ‘breadth’ (the number of different aspects brought up by each supporting or opposing party) and their ‘depth’ (the number of levels of further support for the premises and assumptions of the individual arguments).

The reason why these approaches are not satisfying is that none of them even begins to consider the validity, truth and probability (or, more generally, plausibility), weight or relevance of any of the aspects discussed, or whether the judgments about any such aspects or justifications have even been ‘duly considered’ and understood.

Obviously, it is the content merit, the validity, the ‘weight’ etc. of the arguments that we try to bring to bear on the decision. Do we have better, more ‘systematic’ ways to do this than Ben Franklin’s suggestion? (He recommended writing the pros and cons in two different columns on a sheet of paper, then looking for pairs of pros and cons that carry approximately equal weight and cancel each other out, and crossing those pairs out, until only the arguments that have no opposing counterpart in the other column remain: those are the ones that should tilt the decision towards approval or rejection.)
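Franklin’s procedure can be read as a simple pairing algorithm. The Python sketch below assumes, purely for illustration, that each pro and con has been given a numeric weight (Franklin judged the weights by eye) and that ‘approximately equal’ means ‘within a chosen tolerance’; the function name, the items and the numbers are hypothetical.

```python
def franklin_cancel(pros, cons, tolerance=0.5):
    """Cross out pro/con pairs of roughly equal weight; return what is left.

    pros, cons: lists of (label, weight) pairs. The weights are hypothetical
    numbers standing in for Franklin's judgment 'by eye'.
    """
    remaining_cons = list(cons)
    remaining_pros = []
    # Consider the heaviest pros first, pairing each with the closest con.
    for label, weight in sorted(pros, key=lambda item: item[1], reverse=True):
        candidates = [con for con in remaining_cons
                      if abs(con[1] - weight) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda con: abs(con[1] - weight))
            remaining_cons.remove(closest)  # the pair cancels out
        else:
            remaining_pros.append((label, weight))
    return remaining_pros, remaining_cons


pros = [("lower energy cost", 3), ("better daylight", 2), ("more floor space", 1)]
cons = [("higher first cost", 3), ("longer construction time", 1)]
left_pros, left_cons = franklin_cancel(pros, cons)
print(left_pros, left_cons)  # [('better daylight', 2)] []
```

Only one uncancelled pro remains here, so the sheet would tilt toward approval. Greedy pairing is just one possible reading of ‘cancel each other out’; it still says nothing about the plausibility of the items, which is the shortcoming discussed above.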

What we have, on the one hand, is the impressively quantitative ‘Benefit/Cost’ approach, which works by assigning monetary value to all the b e n e f i t s of a proposed plan (the ‘pro’ arguments) and comparing those with the monetary value of the ‘c o s t’ of implementing it. It has run into considerable criticism, mainly for three reasons: the ‘moral’ reluctance to assign monetary value to people’s health, happiness and lives; the fact that the approach usually has to be carried out by ‘experts’, not by citizens or affected groups; and the fact that it takes the point of view of some overall ‘common good’, which is usually the ‘biased’ perspective of the government currently in power and may not be shared by all segments of society, because it tends to hide the issue of the distribution of benefits and costs: inequality.

On the other hand, we have the approaches that separate the ‘description’ of the plan or object to be evaluated from the perceived ‘goodness’ (‘quality’) judgments about the plan and its expected outcome, and both of these from the ‘validity’ (plausibility, probability) of the statements (arguments) conveying the claims about those outcomes. And, so far, we have the assumption that ‘everybody’, including all ‘affected’ parties, can make such judgments and ‘test’ their merit in a participatory discourse. What is still missing are the possible ways in which these judgments can be ‘aggregated’ into overall judgments and guiding measures of merit for the decision: first for individuals, and then for any groups that will have to come to a commonly supported decision. This is the topic to be discussed under the heading of ‘aggregation’ and ‘aggregation functions’: the rules for getting ‘overall’ judgments from partial judgments and ‘criterion function’ results.

It turns out that there are different possible rules for this, and the assumptions behind them must be agreed upon in each evaluation situation, because they lead to different decisions. The following are some considerations about assumptions or expectations for ‘aggregation functions’ (suggested in H. Rittel’s UC Berkeley lectures on evaluation, and listed in H. Dehlinger’s article “Deontische Fragen: Urteilsbildung und Bewertungssysteme” [Deontic questions: judgment formation and evaluation systems] in “Die methodische Bewertung: Ein Instrument des Architekten” [Methodical evaluation: an instrument of the architect], Festschrift zum 65. Geburtstag von Prof. Arne Musso [Festschrift for the 65th birthday of Prof. Arne Musso], TU Berlin, 1993):

Possible expectation considerations for aggregation functions:

1 Do we wish to arrive at a single overall judgment (of quality / goodness or plausibility etc.) — one that can help us distinguish between e.g. plan alternatives of greater or lesser goodness?

2 Should the judgments be expressed on a commonly agreed-upon judgment scale whose end points and interim values ‘mean’ the same for all participants in the exercise? For example, should we agree that the end points of a ‘goodness’ judgment scale should mean ‘couldn’t possibly be better’ and ‘couldn’t possibly be worse’, respectively, and that there should be a ‘midpoint’ meaning ‘neither good nor bad; indifferent’ or ‘don’t know, can’t make a judgment’? (Most judgment scales in practice are ‘one-directional’ scales running from zero to some number.)

3 Should the judgment scale be the same at all levels of the aspect tree, to keep the meaning of scores consistent across levels? If so, the aggregation functions should be designed so that the overall judgment they produce at the next higher level is a score on the same scale.

4 Should the aggregation function ensure that if a partial score is improved, the resulting overall score will be higher than or the same as, but not lower (‘worse’) than, the unimproved score? By the same rule, the overall score should not become better than the previous score if one of the partial judgments becomes lower than before.
This expectation means that in a criterion function, the line showing the judgment scores should rise or fall steadily, without sudden spikes or valleys.

5 Should the overall score be the highest one (say, +3 = ‘couldn’t be better’, on a +3/-3 scale) only if all partial scores are +3?

6 Should the overall score be a result of ‘due consideration’ of all the partial scores?

7a Should the overall score be ‘couldn’t be worse’ (e.g. -3 on the +3/-3 scale) if all partial scores are -3?
Or
7b Should the overall score become -3 if one of the partial scores becomes -3 and thus unacceptable?

Different functions (equations for ‘summing up’ partial judgments) will be needed for this. There will be situations or tasks in which aggregation functions meeting expectation 7b are needed. There is no one aggregation function that meets all these expectations; thus, the choice of aggregation functions must be discussed and agreed upon in the process.

Examples:

‘Formal’ Evaluation process for Plan ‘Quality’

Individual Assessment

The aggregation functions that can be considered for individual ‘quality’ evaluation (deliberating goodness judgments, aspect trees, and criteria in what may be called ‘formal evaluation procedures’) include the following:

Type I: ‘Weighted average’ function: Q = ∑ (qi * wi), for i = 1 … n
where Q is the overall deliberated ‘quality’ or ‘goodness’ score; qi is the partial score of aspect or sub-aspect i; n is the number of aspects at that level; and wi is the weight of relative importance of aspect i, on a scale of 0 ≤ wi ≤ 1 and such that ∑wi = 1. This condition is needed to ensure that Q will be on the same scale as the qi (so that the resulting judgment score keeps the same meaning).

This function does not meet expectation 7b; it allows ‘poor scores’ on some aspects to be compensated for by good scores on other aspects.
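A minimal Python sketch of the Type I function, assuming the +3/-3 scale used in the examples and weights that lie between 0 and 1 and sum to 1. The function name and the sample scores are illustrative only; the example shows the compensation effect just described.

```python
import math


def weighted_average(scores, weights):
    """Type I aggregation: Q = sum(q_i * w_i), weights in [0, 1] summing to 1."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per partial score")
    if any(not 0 <= w <= 1 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return sum(q * w for q, w in zip(scores, weights))


# Hypothetical partial scores on the +3/-3 scale:
# function, aesthetics, structural stability.
scores = [3, 3, -3]
weights = [0.4, 0.4, 0.2]
print(round(weighted_average(scores, weights), 2))  # 1.8
```

Despite the ‘couldn’t be worse’ score for stability, the overall score stays clearly positive: exactly the compensation that expectation 7b would rule out.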

Type IIa: (“the chain is as strong as its weakest link” function): Q = Min (qi)

Type IIb: Q = ∏ ((qi + u) ^ wi) - u, for i = 1 … n
Here, Q is the overall score, qi the partial score of aspect i of n aspects, wi its weight (as in Type I, with ∑wi = 1), and u the extreme value of the judgment scale (e.g. 3 in the above examples). This function (multiplying all the terms (qi + u), each raised to the power of its weight wi, and then subtracting u from the result to bring the overall score back to the +3/-3 scale) acts much like the Type I function as long as all the scores are in the positive range, but pulls the overall score closer to -u the closer any one of the scores comes to -u, the ‘unacceptable’ performance or quality. (Example: if the structure of a building cannot stand up to the expected loads, it does not matter how functionally adequate or aesthetically pleasing it otherwise is: its evaluation should express that it should not be built.)
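A minimal Python sketch of Types IIa and IIb, assuming u = 3 and weights summing to 1 as in the Type I case; the names and sample scores are illustrative only. It shows IIb staying close to the weighted average when all scores are positive, and collapsing to -u as soon as one score reaches -u.

```python
import math

U = 3  # extreme value u of the +3/-3 judgment scale used in the examples


def min_aggregation(scores):
    """Type IIa: the chain is as strong as its weakest link."""
    return min(scores)


def weighted_geometric(scores, weights, u=U):
    """Type IIb: Q = prod((q_i + u) ** w_i) - u, with weights summing to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    product = 1.0
    for q, w in zip(scores, weights):
        if not -u <= q <= u:
            raise ValueError("partial scores must lie on the -u..+u scale")
        product *= (q + u) ** w  # shift each score onto the 0..2u range
    return product - u  # shift the result back onto the -u..+u scale


weights = [0.4, 0.4, 0.2]
print(min_aggregation([3, 3, -3]))                       # -3
print(round(weighted_geometric([3, 2, 1], weights), 2))  # 2.14 (Type I gives 2.2)
print(weighted_geometric([3, 3, -3], weights))           # -3.0
```

The range check matters in practice: a score below -u would make (q + u) negative, and a negative number raised to a fractional power has no real value.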

Group assessments:

Individual scores from these functions can be used to compute statistical ‘group’ indicators GQ, for example:

GQ = 1/m ∑ Qj
This is the average or mean of all individual Qj scores for all m participants j.

GQ = Qj
This takes the judgment of one group member as the group score.

GQ = Min (Qj)
The group score is equal to the score of the member with the lowest score in the group. Both this function and the previous one effectively make one participant the ‘dictator’ of the group…

Different functions should be explored that would, for example, consider the distribution of the improvement of scores brought about by a plan, compared with the existing or expected situation the plan is meant to remedy. The form of aggregation function Type IIb could, for instance, also be used for group judgment aggregation.
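The group indicators can be sketched in Python as follows. The IIb-form group function assumes equal weights 1/m for all participants, which is an assumption of this sketch (the text only says the form could be used); the scores of the three hypothetical participants are invented.

```python
import statistics

U = 3  # extreme value of the +3/-3 scale (assumed, as in the examples)


def group_mean(q):
    """GQ = 1/m * sum(Q_j): mean of the m individual overall scores."""
    return statistics.fmean(q)


def group_single_member(q, j):
    """GQ = Q_j: member j's judgment is taken as the group score."""
    return q[j]


def group_minimum(q):
    """GQ = Min(Q_j): the least satisfied member sets the group score."""
    return min(q)


def group_geometric(q, u=U):
    """Type IIb form across members, assuming equal weights 1/m."""
    product = 1.0
    for score in q:
        if not -u <= score <= u:
            raise ValueError("scores must lie on the -u..+u scale")
        product *= (score + u) ** (1 / len(q))
    return product - u


individual = [2.5, 1.0, -2.0]  # hypothetical Q_j of three participants
print(group_mean(individual))                 # 0.5
print(group_minimum(individual))              # -2.0
print(round(group_geometric(individual), 2))  # -0.2
```

Although the mean is positive, the one strongly dissatisfied member pulls the IIb-form group score below zero without making that member the sole ‘dictator’, which is one way of letting the distribution of judgments show in the group indicator.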

The use of any of these aggregated, (‘deliberated’ ) judgment scores as a ‘direct’ guiding measure of performance determining the decision c a n n o t be recommended: they should be considered decision guides, not determinants. For one, the expectation of ‘due consideration of all aspects‘ would require complete knowledge of all consequences of a plan and causes of the problem it aims to fix — an expectation that must be considered unrealistic in many situations but especially in ‘wicked’ problems or ‘messes’. There, decision-makers must be willing to assume responsibility for the possibility of being wrong — a condition impossible to deliberate, by definition, when caused by ignorance of what we might be wrong about.

Aggregation functions for developing overall ‘Plan plausibility’ judgment
from the evaluation of ‘pro’ and ‘con’ arguments.

Plausibility judgments

It is necessary to reach agreements about the use of terms for the merit of judgments about plans derived from argument evaluation, because the evaluation task for planning arguments is somewhat different from the assessment usually applied to arguments. Traditionally, the purpose of argument analysis and evaluation is seen as verifying whether a claim (the ‘conclusion’ of an argument) is true or false, and this is seen as depending on the truth of the argument’s premises and the ‘validity’ of its form, pattern or ‘inference rule’. These criteria do not apply to planning arguments, which can generally be represented as follows (stating the ‘conclusion’, the claim about a proposed plan A, first):

Plan A ought to be implemented
because
Plan A will result in outcome B, (given or assuming conditions C);
and
Outcome B ought to be aimed for / pursued;
and
Conditions C are given (or will be when the plan is implemented)

As with many arguments studied by traditional logic and rhetoric, not all premises of planning arguments are stated explicitly in discussions; some are assumed to be ‘taken for granted’ by the audience (‘enthymemes’). But to evaluate these arguments, all premises must be stated and considered explicitly.

This argument pattern (and its variations due to different constellations of assertion or negation of its premises) does not meet the conditions for ‘valid’ arguments in the formal-logic sense: it is, at best, inconclusive. Its premises cannot be established as ‘true’ or ‘false’: the proposed plan is being discussed precisely because neither it nor the outcome B is there (‘true’) yet. This also means that some of the premises, namely the factual-instrumental claim ‘If A is implemented, then B will happen, given C’ and the claim ‘C will be present’, are estimates or predictions qualified as probabilities. And ‘B ought to be pursued’, as well as the conclusion ‘A ought to be implemented’, cannot adequately be called either ‘probable’ or true or false: the term ‘plausible’ seems more fitting, at least for some participants, though not necessarily for all. Indeed, ‘plausibility’ judgments may be applied to all the claims, with the different interpretations for each kind easily understood. Plausibility is a matter of degree, not a binary yes/no quality. And unlike the assessment of factual and even probability claims in common logic and argumentation studies, the ‘conclusion’ (the decision to implement) is not determined by a single ‘clinching’ argument: it rests on several or many ‘pros and cons’ that must be weighed against each other. That is the evaluation task for planning argumentation, and it leads to different ‘aggregation’ tools.

The logical structure of planning argumentation can be stated in simplified form as follows:

– An individual’s overall plausibility judgment PLANPL about the plan is a function of the ‘weights’ Argw of the various pro and con arguments raised about the proposal.
– The argument weight is a function of the argument’s plausibility Argpl and the weight of relative importance w of its deontic (ought-) premise.
– The Argument plausibility Argpl is a function of the plausibility of its premises.

Examples of aggregation functions for this process might be the following:
                                                   
1a. Argument plausibility: Argpli = ∏ Premplj, over all n premises j of argument i

Or

1b. Argpli = Min (Premplj)

2. Argument weight: Argwi = Argpli * wi, with 0 ≤ wi ≤ 1 and ∑ wi = 1 over the ought-premises of all m arguments

3. Proposal plausibility: PLANPL = ∑ Argwi, over all m arguments
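A minimal Python sketch of functions 1a/1b, 2 and 3, taken literally. The text leaves open how ‘con’ arguments are distinguished from ‘pro’ arguments in the sum, so the sketch does not model that; the 0 … 1 plausibility values and the weights in the example are invented for illustration.

```python
import math


def argument_plausibility(premise_pl, rule="product"):
    """Function 1a (product) or 1b (minimum) over the premises of one argument."""
    if rule == "product":
        return math.prod(premise_pl)
    if rule == "min":
        return min(premise_pl)
    raise ValueError("rule must be 'product' or 'min'")


def plan_plausibility(arguments, weights, rule="product"):
    """Functions 2 and 3: PLANPL = sum of Argw_i = sum of Argpl_i * w_i."""
    if len(arguments) != len(weights):
        raise ValueError("need one ought-premise weight per argument")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(argument_plausibility(premises, rule) * w
               for premises, w in zip(arguments, weights))


# Hypothetical premise plausibilities (0..1) for two arguments, each with the
# premises 'A leads to B given C', 'B ought to be pursued', 'C is given'.
arguments = [[0.9, 0.8, 0.9], [0.5, 0.6, 1.0]]
weights = [0.7, 0.3]
print(round(plan_plausibility(arguments, weights), 4))         # 0.5436
print(round(plan_plausibility(arguments, weights, "min"), 4))  # 0.71
```

For values between 0 and 1 the product is never larger than the minimum, so rule 1a is the stricter of the two: every doubtful premise lowers the argument’s plausibility further.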
                                               

Aggregation functions for Group judgment statistics: (Similar to the Quality group aggregations)

Mean Group average plausibility   GPLANPL = 1/k ∑ PLANPLp for all k participants p.                                                  

There are of course other statistical measures of the set of individual plausibility judgments that can be examined and discussed. Like the ‘Quality’ Aggregated measures, these ‘Group’ plausibility statistics should not be used as decision determinants but as guides, for instance as indicators of need for further discussion and explanation of judgment differences, or for revision of plan details to alleviate concerns leading to large judgment differences.
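As an illustration of using group statistics as a guide rather than a determinant, this Python sketch computes the mean GPLANPL together with two simple spread measures and flags large judgment differences. The threshold `max_range` stands for a value the participants would have to agree on; it, the function names and the sample values are hypothetical.

```python
import statistics


def group_plausibility_summary(planpl):
    """GPLANPL (the mean) plus simple spread measures over k participants."""
    return {
        "mean": statistics.fmean(planpl),
        "stdev": statistics.pstdev(planpl),
        "range": max(planpl) - min(planpl),
    }


def needs_further_discussion(summary, max_range=0.5):
    """Flag large judgment differences; max_range is a hypothetical threshold."""
    return summary["range"] > max_range


close = group_plausibility_summary([0.6, 0.65, 0.7])
split = group_plausibility_summary([0.6, 0.7, 0.1])
print(needs_further_discussion(close))  # False
print(needs_further_discussion(split))  # True
```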
[Figure: Evalmap 11, Aggregation]

Comments? Additions?

–o–

EVALUATION IN THE PLANNING DISCOURSE — JUDGMENT SCALES

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.

Thorbjoern Mann

JUDGMENT SCALES

Differences in judgments are expressed on s c a l e s: ‘yardsticks’ on which to locate different judgments ‘visually’. There are many different kinds of judgment scales.

Which kind of scale should be chosen and agreed upon in a specific evaluation situation depends on the purpose of the task: whether a decision for action (e.g. acceptance or rejection) about a single plan is called for, or a selection among a number of competing proposals or options, or a general expression of appraisal (e.g. of goodness for some purpose, perhaps to guide design in a ‘program’ sense or to improve a proposed plan). For some purposes, like the ‘acceptance/rejection’ decision, the scale needed will be ‘binary’, having only two ‘values’, or at most three: ‘Yes, No, Undecided’. For other purposes, such as comparison between alternative proposals, scales with more such ‘values’ are needed.

Traditionally (as in science), four types of scales are distinguished: 

‘N o m i n a l’,  ‘O r d i n a l’ ,   ‘D i f f e r e n c e’  (or  ‘I n t e r v a l’)  and ‘R a t i o‘  scales.

The Nominal scale (not really an ‘ordered’ scale, since its values can be in any order) contains just ‘names’ of the different kinds of objects or options that must be distinguished. Architectural examples are ‘single-family detached home’, ‘duplex’, ‘row house’, ‘multistory apartment’, etc.

Ordinal scales, as the name suggests, put their items in a distinct order (for example ‘first place, second place, third place’ etc. in sports), but without any information about how much faster the winner of the race was than the runner-up. Just rank order.

Difference or Interval scales offer more detailed ‘quantitative’ information. They specify u n i t s, but these are arbitrary entities on a scale with an arbitrary location of ‘zero’. The temperature scales are examples: the Fahrenheit scale sets its ‘zero’ degrees point (popularly said to be the coldest temperature Mr. Fahrenheit had ever experienced; by most accounts he based it on a freezing mixture of ice, water and salt) at what on the Celsius scale is about minus 17.8 degrees. Celsius ‘zero’ is the freezing point of water. Fahrenheit’s 100 degrees is roughly the temperature of human blood, a mere 37.8 degrees Celsius, whereas Celsius 100 degrees is the boiling point of water, which in Fahrenheit becomes 212 degrees.

The Ratio scale has a ‘natural’ zero point (e.g. for human height) but still arbitrary units: height can be measured in feet and inches, in centimeters, etc.
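The practical difference between interval and ratio scales can be shown with the standard conversion formulas. In this small Python sketch the function names and sample values are arbitrary choices: on an interval scale, ratios of readings change when the scale changes, while ratios of differences survive; on a ratio scale, plain ratios survive a change of units.

```python
def celsius_to_fahrenheit(c):
    """Interval-scale conversion: a new unit size and a shifted zero point."""
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    """Ratio-scale conversion: only the unit changes, zero stays zero."""
    return cm / 2.54


# Interval scale: ratios of readings depend on where 'zero' was put ...
print(20 / 10)                                                # 2.0
print(celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10))  # 1.36
# ... while ratios of differences survive the change of scale.
print((30 - 20) / (20 - 10))                                  # 1.0
f10, f20, f30 = (celsius_to_fahrenheit(c) for c in (10, 20, 30))
print((f30 - f20) / (f20 - f10))                              # 1.0

# Ratio scale: 180 cm is twice 90 cm, whatever the unit.
print(cm_to_inches(180) / cm_to_inches(90))                   # 2.0
```

So ‘20 °C is twice as warm as 10 °C’ is not a meaningful statement, while ‘this person is twice as tall as that child’ is.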

The more ‘o b j e c t i v e’, ‘factual’ and ‘scientific’ the judgments we wish to express, the more we will tend to use difference scales at least, and at best ratio scales that allow precise m e a s u r e m e n t rather than s u b j e c t i v e opinions, guesses and estimates.

For judgment purposes, this means that the scales for expressing ‘goodness’ and similar judgments will be at the difference-scale level at best. Agreements are needed about whether judgment scales are to be ‘one-directional’ (e.g. from 0 upwards) or ‘bi-directional’ (showing both positive and negative values), and whether their values are ‘d i s c r e t e’ or ‘c o n t i n u o u s’. They can be ‘unbounded’ (going to infinity) or ‘bounded’, with a distinct upper (‘couldn’t be better’) or lower (‘couldn’t be worse’) number.

All these choices, and the m e a n i n g of the points on a chosen scale, must be agreed upon for any particular task, to avoid misunderstandings and conflicts.

[Map: Evalmap, Judgment types, scales]

The map above shows distinctions of judgment scales in a way that perhaps gives the wrong impression that they are mutually exclusive. In reality, they are combined even when judgments are ‘atomic’ (that is, one judgment applied to one object), and especially when two or more judgments are expressed in ‘compound’ judgments, where several evaluation judgments are applied to the object. This variety of possible judgment types and scales is better shown in a ‘Zwicky box’ or ‘morphological analysis’ diagram, as in the following diagram, where the type categories are shown as ‘parameters’, the members of the categories are parameter values, and each particular scale is described by the profile linking several values:

[Diagram: morphological (‘Zwicky box’) chart of judgment scale types, with profiles A to D]

The profiles show two typical ‘measurement’ scales and two common judgment scales: A, the temperature scale; B, movement speed in miles per hour; C, the common academic grading scale; and D, the bidirectional ‘goodness’ judgment scale according to Musso and Rittel.
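A morphological box is easy to represent as a data structure: each parameter lists its possible values, and each scale is a profile that picks one value per parameter. Since the diagram itself is not reproduced here, the parameters in this Python sketch are taken from the distinctions named in the text, and the values assigned to profiles A to D are this sketch’s own reading, not a transcription of the diagram; all names are hypothetical.

```python
# Parameters of the morphological box and their possible values.
PARAMETERS = {
    "level": ("nominal", "ordinal", "interval", "ratio"),
    "direction": ("one-directional", "bi-directional"),
    "values": ("discrete", "continuous"),
    "bounds": ("unbounded", "bounded below", "bounded"),
}

# Profiles A-D; the chosen values are an assumed reading of the text.
PROFILES = {
    "A temperature": {"level": "interval", "direction": "bi-directional",
                      "values": "continuous", "bounds": "bounded below"},
    "B speed (mph)": {"level": "ratio", "direction": "one-directional",
                      "values": "continuous", "bounds": "bounded below"},
    "C academic grades": {"level": "ordinal", "direction": "one-directional",
                          "values": "discrete", "bounds": "bounded"},
    "D goodness (+3/-3)": {"level": "interval", "direction": "bi-directional",
                           "values": "discrete", "bounds": "bounded"},
}


def check_profile(name, profile):
    """A profile must pick exactly one allowed value for every parameter."""
    if set(profile) != set(PARAMETERS):
        raise ValueError(f"{name}: profile must cover every parameter")
    for parameter, value in profile.items():
        if value not in PARAMETERS[parameter]:
            raise ValueError(f"{name}: {value!r} is not a value of {parameter!r}")


def shared_values(p, q):
    """Parameters on which two scale profiles agree."""
    return {k for k in PARAMETERS if p[k] == q[k]}


for name, profile in PROFILES.items():
    check_profile(name, profile)
print(sorted(shared_values(PROFILES["A temperature"],
                           PROFILES["D goodness (+3/-3)"])))  # ['direction', 'level']
```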

–o–

THE BRIEF FOG ISLAND TAVERN DISCOURSE ON LAWS AND MORALS


– You writing a letter to the editor there, Bog-Hubert?

– Huh? What makes you think so, Vodçek?

– Well, scribbling furiously in your notebook and crossing out half of what you wrote, as far as I can see from this inverted perspective, anyway: must be something with a word limit, like a letter to the editor. Am I right?

– Nah, sorry. I was just reading an article in the latest issue of Abbé Boulah’s NYRB, about the evolution of ethics and morality, a review of some fat book. And I don’t think that any comment about it other than ‘BS’ with an exclamation mark or two can be kept short enough for a letter to the editor.

– Good grief — and this in my lowly tavern? What makes you spend any more time on that profound conundrum, then?

– Well, … wait, why do you say ‘conundrum’?

– Hmm… can’t really tell you — it was just the first word matching the profundity of the topic that came to my mind.

– So you think it’s profound, huh? I tend to agree, but why make it so complicated? And why do serious, educated people throw it in with evolution and theories like that?

– To come up with some explanation or basis for ethical and moral rules that is somehow scientific, not just arbitrary?

– I got stuck on the arbitrary part too. Brings up the question of who’s doing the arbitrating, for one?

– It isn’t just arbitrary, though. Just looking no further, for a first step, than some society that has to deal with how its members act. Some behavior they don’t like, say, because it hurts another member of that society: what to do about it? So they make a rule: we don’t do things that hurt each other.

– And if somebody does such things?

– Good question. You want me to say something simple and stupid like ‘an eye for an eye’, etc.? We do the same thing to the rule-breaker? Doesn’t always work right.

– Why not?

– Let’s see. For example: if you steal somebody’s savings, well, you have to give them back. If you already spent it all (well, let’s not investigate), you’ll still have to give it back; but if you don’t have any savings (which is why you did the deed in the first place), there’s no ‘eye for an eye’. And if you kill my child, by accident or intent: if you don’t have kids, there’s no ‘eye for that eye’ either, and if you did have a child, it sure wouldn’t be the right thing to do to that kid. So there’s got to be some better way.

– Yeah, well: there are laws that specify what substitutes for those eyes, right? And that’s what I was wondering: what’s the essential difference between laws and morals?

– Ah: better question. So let’s look at laws, first. A community, or society, decides what kind of acts and behavior it doesn’t like and wants to keep its members from doing.

– What about things people like to have the community members do?

– Another good question! Because that gives us a first possible way of dealing with your question of what to do, for both kinds of things and behavior. For we can say: as a community, we provide some good things for all our fellow members regardless of what they do or don’t do (that’s why we are a community in the first place), and we can offer some good things as rewards for doing things we like. Incentives. Now the first rule we can make about doing ‘bad’ things, the ones we don’t like: we can threaten to take away some of the good things, the incentives first, and then some of the essentials, for really bad stuff. Like freedom: if you do some things that really hurt people or their savings or kids and such, we put you in a place where you can’t do that, which means that you can’t go and do things you like to do either, even if they don’t hurt anybody.

– So laws are trying to prevent people from doing bad things. If you do them anyway, we’ll do this, that etc. to you. Threats of punishment and follow-through if the threats don’t work. Well: that should do it, shouldn’t it?

– What do you mean, do it?

– Let’s assume for the moment that we can make laws that cover all the good things we like people to do, and all the bad things we don’t want them to do (everything!): why would we need any ethics and morals? Case closed. Sure, there are details to be worked out: what are those dos and don’ts, and what kinds of incentives and punishments should be provided? That will need some serious discussion.

– Yes, and don’t forget the problem of finding out what people did. If there’s evolution in this, it’s mainly in the evolution of the gadgets we can give law enforcement to go after the ‘bad’ guys. But hey, that’s just the point: there are some things that are hard or impossible to find out, and some where it’s hard to tell who is right: the person who says ‘the other guy did something bad to me’, or the other guy who says ‘that guy just wants to hurt me by saying I did those bad things: he’s really the bad one!’ And on the ‘good’ side: do we really want people to do good things for the incentives and rewards? That’s not really being good, is it?

– I see what you’re getting at. You’re saying that there are some things we should do or not do just for the sake of their goodness, and our own wanting to be good, in one way or another. And it’s for those that societies need moral ‘rules’: ones that society itself doesn’t or can’t react to with rewards or punishment, but just with admiration or disrespect, disapproval?

– Yes, except that even admiration and disapproval can be powerful incentives or deterrents. But you were jumping ahead a step or two there, weren’t you? There are some acts or behaviors that are good or bad, but if we don’t find out, we may treat a person as an upstanding moral one even if they are secretly evil and rotten: if we want to say something about those…

– Ah. I see. You’ll have to invent a higher power — one who will know your secret noble or evil thoughts and deeds and will reward or punish you in the hereafter. And that’s the moral system.

– Yes. Or it’s internal: it’s called your own conscience. The one that knows how bad you really are, and will punish you with guilty sleepless nights and poor digestion…

– Let’s not go there. It could drive a fellow to drink.

– You put the finger right on the sore spot, my friend. And I’m here to help you with that…

– What a guy. Ok. One glass of Zin. The gloriously evolved one from Sonoma County. Just one, eh?

– Good choice. The Fog Island Tavern Reward for good thinking and conversation. Cheers.

–o–

EVALUATION IN THE PLANNING PROCESS: EVALUATION TASKS


An effort to clarify the role of deliberative evaluation in the planning and policy-making process

Thorbjoern Mann

EVALUATION TASKS / SITUATIONS

The necessity for this review of evaluation practices and tools arises from the fact that evaluation tasks and judgments and related activities occur at many stages of planning projects. A focus on the most common task, the evaluation of a proposed plan or a set of plan alternatives in preparation for the last action, may hide the role and impact of many judgments along the way, where explicitly or implicitly not only different labels but also very different vocabulary, tools and principles are involved. Is it necessary to look at these differences, to ask whether there should be more of an effort of coordination and common vocabulary in the set of working agreements for a project?

This section will at least raise the question and begin to explore the different disguises of evaluation acts throughout the planning process, in order to answer these questions.

Many plans are started as extensions of routine ‘maintenance’ activities on existing processes and systems, using established performance measures as indicators of a need for extraordinary steps to ensure the continued desirable function of the system in question. In such tasks, the selected performance criteria, their threshold values demanding action and most of the expected remedial steps and means, are part of the factual ‘current conditions’ data basis of further planning.

To what extent are these data understood as part of the planning project, either as ‘given’ aspects or as needing revision, discussion, change, when the situation is so unprecedented as to call for activities going beyond the routine maintenance concerns? Such situations are often referred to as ‘problems’, which tends to trigger a very different way of talking. There are many different ‘definitions’ or views, understandings of problems, as well as different problem types. To what extent is an evaluation group’s decision to talk about the situation as a problem, or as a specific problem type, already an evaluative act? This holds even for adopting a view of a ‘problem’ as a perceived (by somebody!) discrepancy between an existing ‘IS’ state of affairs and a view of what that state ‘OUGHT’ to be, calling for ideas about ‘HOW’ to get from the IS to the OUGHT.

Judgments about what ‘is’ the case do call for judgments, perhaps even measurements, of current conditions: assessments of factual matters, even as those are perceived (again, by whom?) as ‘NOT-OUGHT’. Judgments specifying the OUGHT (‘goals’, ‘visions’, ‘desirable’ states of affairs) belong to the ‘deontic’ realm, even though this is often obscured by the invocation of ‘facts’ in the form of authorities and of polls showing percentages of populations ‘wanting’ this or that ‘OUGHT’: the ‘good’ they are after. The judgments about the ‘HOW’ (means, tools, etc. to reach those goals) may look like ‘factual-instrumental’ judgments, but they also reach into the deontic realm: some possible ‘means’ are decidedly NOT what we OUGHT to do, no matter how functionally effective they seem to be.

The ‘authority’ sources of judgments that participants in planning will have to consider come in the form of laws and ‘regulations’. Examined as ‘givens’, they may be helpful in defining and constraining the ‘solution space’ for the development of the plan. But they often ‘don’t fit the circumstances’ of a current planning situation, and raise questions about whether to apply for a ‘variance’, an exception to a rule. Of course, any regulation is itself the outcome of an evaluation or judgment process, one that may be acknowledged but is usually not thoroughly examined by the planners of a specific project. The temptation is, of course, to ‘accept’ such regulations as the critical performance objective (‘to get the permit’), conveniently forgetting that such regulations usually specify m i n i m a l performance expectations. They usually focus on meaningful concerns such as safety and conformance to setback and functional performance conventions, while neglecting, or drawing attention away from, other issues such as the aesthetics, sustainability, and environmental or mental health impact of the resulting ‘permitted’ but in many other ways quite mediocre or outright undesirable solutions.

Other guidance tools for the development of the plan (buildings, urban environments, but also general societal policy and policy implementation efforts) are the ‘programs’ (‘briefs’) and equivalent statements about the desired outcome. One main consideration of such statements is to describe the scope of the plan (for buildings: how many spaces, their size and functions, etc.) in relation to the constraint of the budget. In many cases, such descriptions are in turn guided by ‘standards’ and norms for similar uses, each time shifting responsibility for the evaluation judgments onto a different agency: tracing the basis of judgment behind such provisions becomes a complex task in itself.

The ‘participation’ demand for involving the eventual users, citizens and affected parties in these processes seems to take two main forms: one is general surveys, asking the participants to fill out questionnaires that try to capture expectations and preferences; the other is ‘hearings’ in connection with the presentation of in-progress ‘option’ decisions or final plans. Do the different methodological bases and treatment of these otherwise laudable efforts raise questions about their ultimate usefulness in nurturing the production of ‘quality’ plans?

The term ‘quality’ is a key concern of a very different approach to design and planning, one that explicitly denies the very need for ‘method’ in the form of systematic evaluation procedures. This is the key feature (from the current point of view) of the ‘Pattern Language’ by Christopher Alexander. Its promise (to put it briefly, and arguably with some unfair distortion) is that using ‘patterns’, such as the design precepts for building and town planning in his book ‘A Pattern Language’, in the development of the plan will ‘guarantee’ an outcome that embodies the ‘quality without a name’, including many of the aspects not addressed by the ‘usual’ design process and its regulation- and function-centered constraints.

This move seems to be very appealing to designers (surprisingly, even more in other domains such as computer programming than in architecture): any outcome produced in the proper way with the proper patterns is thereby ‘good’ (has the ‘quality’) and does not need further evaluation. Not discussed, as far as I can see, is the fact that the evaluation issue is merely moved to the process of suggesting and ‘validating’ the patterns: in the building case, by Alexander and his associates, as assembled in the book. Is the admirable and very necessary effort to bring those missing quality issues back into the design and planning process and discussion undercut by the removal of the evaluation problem from that discussion?

The Pattern Language example should make it very clear how drastically the treatment of the evaluation question can influence the planning process and its decisions.

Comments: Missing items / issues? Wrong question?

–o–

EVALUATION IN THE PLANNING DISCOURSE: TYPES OF JUDGMENTS

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.

Thorbjoern Mann

EVALUATION: TYPES OF JUDGMENTS

Different kinds of judgments can be distinguished according to the purpose for which they are made and communicated. Some judgments are aimed at supporting or indicating decisions, a c t i o n s: to accept or reject a plan proposal, to initiate processes. Others are just intended to provide d e s c r i p t i o n s of what things, environments, buildings, or plans are like. Such descriptions can be merely informative, but the label ‘judgment’ is often held to apply mainly to ‘evaluative’ expressions of the goodness, value, or appropriateness of the things we evaluate (‘a p p r e c i a t i v e’ judgments), and these judgments are often considered ‘subjective’. Is it important to recognize that descriptions of objects, which we expect to state ‘true’, ‘objective’ matters of fact, are often also only e s t i m a t e s, that is, judgments of the degree of certainty that things really are so? And that these judgments, too, are personal and seem more subjective than objective?

The classes of judgments usually distinguished in evaluation discussions are:

  • ‘Descriptive’ and ‘Evaluative’ judgments;
  • ‘Offhand’ (‘spontaneous’) and ‘Deliberated’ judgments;
  • ‘Overall’ judgments (about the ‘whole’) and ‘Partial’ judgments (about parts, sub-aspects, etc.).

This leads to the following sets of judgments:

  • ‘Action’ or decision-oriented judgments (arguably evaluative):
    • overall offhand
    • overall deliberated
  • Descriptive:
    • overall offhand
    • overall deliberated
    • partial offhand
    • partial deliberated
  • Evaluative:
    • overall offhand
    • overall deliberated
    • partial offhand
    • partial deliberated

If we admit m e a s u r e m e n t s as a kind of descriptive judgment, ‘e s t i m a t e s’ of the results of measurements (that have not yet been done, or that are predicted for the future) will also have to be added. And though the aim may be to have both measurements and estimates be as close to actual ‘objective’ properties as possible, are these estimates subjective or objective? There are scientific protocols (precise descriptions of where, when, and by what means and tools measurements have been taken, so that they can be repeated by anyone, anytime, anywhere, yielding the same results) that must be met to support claims of objectivity. Do we need to be more demanding about similar protocols for predictions and forecasts?

Additionally, the distinctions between judgments seen as ‘objectively’ true or ‘factual’, as opposed to ‘subjective’ opinions, are the subject of persistent controversies: are evaluative judgments objective or subjective? Has the legitimate and ‘responsible’ human quest to base decisions on facts and truth, that is, on objective premises that confer certainty and authority, led to tendencies not to take subjective judgments as seriously as objective judgments or measurements?

The distinctions are not clear-cut. It is tempting, for the sake of simplicity and clarity, to declare ‘objective’ descriptions as related to features of the object evaluated (which everybody should then accept as such, whether we like it or not, provided those protocols have been met), and ‘subjective’ evaluations as pertaining to the effects generated in the person (subject) doing the evaluating, and thus as changeable matters of personal taste ‘about which we cannot argue’. But consider a judgment such as ‘affordable’: eminently personal, and thus subjective. I cannot afford the multi-million-dollar yacht or mansion a billionaire has no trouble buying. Isn’t that judgment based on the objective fact that the cost of the yacht exceeds the amount of money in my bank account? Or we observe (objectively?) that some people like a certain building and call it ‘beautiful’, while others do not: even if we declare the judgments of the latter group ‘wrong’ or misguided (on what ‘objective’ grounds?), are they not still judgment facts that will influence the decisions made? Some, like C. Alexander, argue that aspects like beauty and value are “matters of objective fact”. Is the implication that differing judgments (opinions) should be dismissed? Can these differences be resolved? How should they be handled in public deliberative evaluation efforts?

This brings us to the distinction between ‘i n d i v i d u a l’ judgments by single participants in the process and ‘g r o u p’ judgments or indicators based on all those individual scores. Is it appropriate to speak of a ‘group judgment’ at all? Groups are seldom if ever one unified entity that ‘has’ a unified judgment; there usually are significant differences in the judgments of individuals and sub-groups within a collective. Surely a group can agree on a judgment. But even using the term ‘group judgment’, and demanding its pursuit as a patriotic, solidarity, or moral duty, can be seen as an effort to enforce the views of one party in the group over the objections of another, dismissing the latter’s concerns. It may be better to consider just the different kinds of statistical descriptions of the set of individual judgments: different statistical indicators that can guide the group’s decision in different ways.
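As an illustration of such ‘different statistical indicators’, here is a minimal sketch that describes a set of individual judgment scores without merging them into a single ‘group judgment’. The -3 to +3 score scale, the particular indicators chosen, and the function name `describe_judgments` are assumptions made for this example only, not a procedure proposed in the text.

```python
from statistics import mean, median, pstdev


def describe_judgments(scores: list[float]) -> dict[str, float]:
    """Describe a set of individual judgment scores with several statistics.

    Assumption for illustration: each score is one participant's overall
    judgment of the same plan on a -3 (very bad) to +3 (very good) scale.
    """
    if not scores:
        raise ValueError("need at least one individual judgment")
    return {
        "mean": mean(scores),      # average view; can hide strong dissent
        "median": median(scores),  # the 'middle' participant's view
        "min": min(scores),        # the least satisfied participant
        "max": max(scores),        # the most satisfied participant
        "spread": pstdev(scores),  # how far the judgments diverge
        # fraction of participants who judge the plan negatively
        "share_negative": sum(1 for s in scores if s < 0) / len(scores),
    }


# Five hypothetical participants judging the same plan.
summary = describe_judgments([2, 1, 3, -2, 1])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A decision guided mainly by the mean or median tends to follow the majority, while attention to the lowest score or the share of negative scores keeps the objections of dissenting participants visible; which indicators should carry weight is itself a question for the discourse.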

How all these issues should be dealt with in the design of support systems for the planning discourse is very much a matter needing discussion, since extreme positions on some of these issues would have significant implications for the role of evaluation in the process.

–o–

EVALUATION, DELIBERATION IN THE PLANNING DISCOURSE

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.

Thorbjoern Mann

EVALUATION / DELIBERATION

‘Evaluation’ and its related term ‘deliberation’ are understood in many different ways. In a simple view, evaluation is just the act of making a value judgment about something, for example about a plan: is it ‘worth’ implementing? To many, it evokes a somewhat cumbersome, bureaucratic process that itself constitutes a problem. Seen from the perspective of theories like the Pattern Language, for example, it is a ‘method’ from which the Pattern Language ‘frees’ the designer: not needed, even ‘part of the problem’ of misguided design and planning processes. So does the idea need some clarification and discussion?

Some answers to this question might be found by examining the reasons people feel such efforts are necessary. They begin with trying to make up one’s own mind when facing a somewhat complicated situation and plan: trying to consider all pertinent aspects, all significant causes of the problem a plan is supposed to fix, and also its possible consequences, its ‘pros and cons’; trying not to forget important details, expected benefits, costs, and risks if things don’t turn out quite as we might wish.

Such ‘mulling’ about the task, in order for an individual person to arrive at a judgment, may not require a very systematic and orderly process. Things may be somewhat different when we are asked to explain or justify our judgments to others, and even more so when participants in a project discourse try to get other parties not only to become aware of their concerns and judgments, but also to give them ‘due consideration’ in making decisions. The same applies when clients or users ask designers, planners and ultimate decision-makers to make the decisions in developing the plan ‘on their behalf’: the burden of explanation (of what they would consider a viable answer to their needs or wishes) falls first on the former, and then on the latter, who must point out how their plan features will meet those expectations. The common denominator is explaining the basis of one’s judgment to others for the purpose of justification or persuasion: to get them to accept the plan. The basic pattern in that process is to show how o v e r a l l judgments or quality scores depend on various p a r t i a l judgments, or ultimately on some ‘objective’ quantifiable features (‘criteria’) of the plan. (The very term ‘objective’, used in asserting its distinction from ‘subjective’ judgments and ‘opinions’, is of course itself a major controversy, to be dealt with in a later segment.)
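To make this ‘basic pattern’ concrete: one common, but by no means the only, way of making an overall judgment a function of partial judgments is a weighted sum. The minimal sketch below assumes, purely for illustration, that partial judgments are scores on a -3 to +3 scale and that the weights expressing each aspect’s relative importance are non-negative and add up to 1; the function name `overall_judgment` and the example aspects are hypothetical.

```python
def overall_judgment(partial_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine partial judgments into an overall judgment as a weighted sum.

    Assumptions for illustration: scores lie on a -3..+3 scale, and the
    weights are non-negative and sum to 1.
    """
    if set(partial_scores) != set(weights):
        raise ValueError("every aspect needs both a score and a weight")
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    for aspect, score in partial_scores.items():
        if not -3 <= score <= 3:
            raise ValueError(f"score for {aspect!r} is outside -3..+3")
    # Each aspect contributes its score in proportion to its weight.
    return sum(weights[aspect] * score for aspect, score in partial_scores.items())


# Hypothetical partial judgments about one plan proposal.
scores = {"cost": -1.0, "function": 2.0, "appearance": 1.0}
weights = {"cost": 0.5, "function": 0.3, "appearance": 0.2}
print(round(overall_judgment(scores, weights), 2))  # prints 0.3
```

Other aggregation functions (for example, taking the worst partial score, so that one unacceptable aspect cannot be offset by good ones) express different attitudes toward trade-offs between aspects; choosing among them is itself an evaluation decision.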

The shift in the burden of explanation mentioned above points to a fact that is often overlooked in discussions about evaluation issues: evaluation occurs in many different shapes and forms, at many different stages all along the planning process, not just on the final occasions of accepting or rejecting a proposed plan, or of a competition jury selecting ‘the best’ of a set of proposed alternatives. Should better coordination be developed between those different events, and among the often very different terms they use?

The claims and arguments made in the different evaluation tasks use different terms, and draw on different sources and methods for obtaining the ‘evidence’ that supports them. The near obsession with ‘data’ (or ‘facts’) in this connection overshadows the problems associated with the relationships between the facts describing the current ‘problem’ situations to be remedied, the ‘facts’ about the expectations, concerns, wishes, and needs of different groups in the affected populations (which themselves are not ‘facts’ …yet), and the ‘facts’ (but also just estimates and predictions) generated by systems models about the ‘whole system’ in which the current problem, plans, and future consequences are embedded.

A final aspect should be mentioned in this connection. There will be, in real life, many situations in which people, leaders and others, will be called upon to make quick decisions, with no time for lengthy public discourse. These will be ‘intuitive’, often ‘offhand’ decisions, made without sufficient information on which to base them reasonably. We expect such decisions to be made by people whose (intuitive?) judgment can be trusted. This suggests that we think some people have ‘better’ intuitive judgment than others. So where does better intuition, better judgment come from? Experience with similar situations is one likely source. There are claims that having experienced the process of organized, systematic deliberation and evaluation may also improve the quality of decision-makers’ intuitive judgment. What is the evidence for this, and what implications, if any, should be considered?

Given the speculative nature of many of these considerations, there seems to be a need for more thorough study and discussion of these issues. What are the implications of the assumptions we make for the design of better planning discourse platforms? What other aspects should be added to the picture?

–o–

Abbé Boulah’s Hack-Rigged Funding Scheme

In the Fog Island Tavern:

Hey Vodçek — has Abbé Boulah been in today?

And a good morning to you, too, Bog-Hubert. No, haven’t seen him yet. What’s stirring your urgency to see him?

Well, it’s a mystery. I was out in the Gulf trying to get to Rigatopia (you know, the new refugee society on that abandoned rig) when my GPS conked out and I had to navigate by compass, the old-fashioned way. Turns out I’d forgotten to reset the compass declination for the new position of that wandering magnetic North Pole. So I ran into a different rig nearby, but was warned off by radio not to go anywhere near it. Secret prison rehab project or something. Near Rigatopia? Sounded like another one of Abbé Boulah’s crazy schemes: do you know anything about it?

Ah, Bog-Hubert: you’ve been over in your Tate’s Hell bog cooking stuff too long. Yes, it’s Abbé Boulah’s new project. He’s gotten another abandoned oil rig for it. But this one has prison inmates on it, working on a new kind of ‘community service’ to try to get reduced sentences.

Doesn’t surprise me. Abbé Boulah again. Prison inmates? What kind of community service — there’s no community out there? And why secret?

Patience. Remember Abbé Boulah and his friend up in town working on that global planning discourse project?

Of course: I was working on that one too.

Oh yes, I forgot. Well, one day, here, Abbé Boulah was talking about it with a guy who turned out to be a bit of a planning-discourse-cum-argument-evaluation sceptic, a ‘NAA’: a ‘never-argue arguer’, as Abbé Boulah calls them. This one thought it’d be impossible to get any serious mainstream company to write the programs for that kind of public platform, or to fund the implementation even of small local prototypes. So Abbé Boulah sat there fuming for a while, using up a good part of my Sonoma Zinfandel supply, and came up with this idea to get imprisoned hackers to work on the project. You know, some of those brilliant computer freaks who were caught showing humanity how naively vulnerable our precious IT systems have become.

Brilliant guys, eh — but not brilliant enough to avoid getting caught?

Well, turns out some of them were sold out by their fellow hackers. You know, there’s fierce and unfair competition in that murky kingdom too. And I think the FBI has hired some such experts…

But hey, sounds like an interesting idea?

We’ll see how it works out. Anyway, he convinced some judges that incarcerating these people at great expense to society is a sinful waste of brilliant minds and public money, and got them to set up a program offering these guys reductions of their sentences if they worked on writing the programs needed for this project, and similar projects. Then Abbé Boulah got a friend of his to rehab another abandoned rig, where these people can be kept safely while they work on the project. The friend is a brilliant fellow, once a student of his buddy up in town, who’s been busy helping people whose lives have been disrupted by all the stupid wars in the Middle East learn programming to get well-paying jobs.

Hmm. Sounds a little like putting the fox to work on guarding the hen-house, though?

Well, the things they come up with will be thoroughly tested, of course.

Tested, how?

Easy. They put separate hackers or hacker teams to work on trying to hack the system designs, promising those guys rewards: three years off their sentence if they can break the competition’s system… And vice versa. Anyway, it’s an inexpensive way to get those programs written, putting those minds to productive use on work no other company wants to do. And possibly giving those people not only a chance to keep their skills honed but also a better chance at a rehabilitated, legitimate existence once released.

So Abbé Boulah is out there on that rig now, is that what you are saying?

Not sure. He’s a difficult fellow to keep track of. Of course somebody has to tell those guys what the platform is supposed to do. And he’s getting some sailing and fishing in on his breaks…

I knew it. Having a great time doing interesting stuff… I’ll drink to that.

Cheers!

—o—