
EVALUATION IN THE PLANNING DISCOURSE — AI SUPPORT OF EVALUATION IN PLANNING

Part of a series of  issues to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, February 2020.

The necessity of information technology assistance

A planning discourse support platform aiming at accommodating projects that cannot be handled by small F2F ‘teams’ or deliberation bodies must use current (or yet-to-be-developed) advanced information technology, if only to handle communication. The examination of evaluation tasks in such large project discourse has so far also shown that serious, thorough deliberation and evaluation can become so complex that information technology assistance for many tasks will seem unavoidable, whether in the form of simple data management or more sophisticated ‘artificial intelligence’.

So the question arises what role advanced Artificial or Augmented Intelligence tools might play in such a platform. A first cursory examination will begin by surveying the simpler data management (‘house-keeping’) aspects that have no direct bearing on actual ‘intelligence’ or ‘reasoning’ and evaluation in planning thinking, and then exploring possible expansion of the material being assembled and sorted, into the intelligence assistance realm. It will be important to remain alert to the concern of where the line between assistance to human reasoning and substituting machine calculation results for human judgment should be drawn.

‘House-keeping’ tasks

a. File maintenance. A first ‘simple’ data management task will of course be to gather and store the contributions to the discourse, for record-keeping, retrieval and reference. This will apply to all entries, in their ‘verbatim‘ form, most of which will be in conversational language. They may be stored in simple chronological order as they are entered, with date and author information. A separate file will keep track of authors and cross-reference them with entries and other actions. A log of activities may also be needed.

b. ‘Ordered’, or ‘formatted’ files. For a meaningfully orchestrated evaluation in the discourse, it will be necessary to check for and eliminate duplication of essentially the same information, to sort the entries, for example according to issues, proposals, arguments, factual information — perhaps already in some formatted manner — and to keep the resulting files updated. This may already involve some formatting of the content of ‘verbatim’ entries.

c. Preparation of displays, for overview. This will involve displays of ‘candidates’ for decision, the resulting agenda of accepted candidates; ‘issue maps’ of the evolving discussion; and evaluation and decision results and statistics.

d. Preparation of evaluation worksheets.

e. Tabulating, aggregating evaluation results for statistics and displays.
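A minimal sketch of the kind of record-keeping implied by tasks a and b is shown below; the field names and structures are hypothetical illustrations, not a specification of the proposed platform.

    # Minimal sketch of discourse record-keeping (hypothetical field names).
    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class Entry:
        entry_id: int
        author: str
        timestamp: datetime
        verbatim: str                     # the contribution in conversational language (task a)
        kind: str = "unclassified"        # later sorted as 'issue', 'proposal', 'argument', 'fact' (task b)
        refers_to: list = field(default_factory=list)   # cross-references to other entries

    verbatim_file: list = []              # task a: chronological record of entries
    author_index: dict = {}               # task a: author cross-reference
    activity_log: list = []               # task a: log of activities

    def record(entry):
        """Store an entry, index it by author, and log the action."""
        verbatim_file.append(entry)
        author_index.setdefault(entry.author, []).append(entry.entry_id)
        activity_log.append(f"{entry.timestamp.isoformat()} entry {entry.entry_id} by {entry.author}")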

‘Analysis’ tasks, examples

f. Translation. Verbatim entries submitted in different languages and their formatted ‘content’ will have to be translated into the languages of all participants. Also, entries expressed in ‘discipline jargon’ will have to be translated into conversational language.

g. Entries will have to be checked for duplication of essentially identical content expressed in different words (to avoid counting the same content twice in evaluation procedures).

h. Standard information search (‘googling’) for available pertinent information already documented by existing research, databases, case studies etc. This will require the selection of search terms and the assessment of the relevance of found items, which are then entered as a separate section of the ‘verbatim’ file.

i. Entered items (verbal contributions and researched material) will have to be formatted for evaluation; arguments with unstated (‘taken for granted’) premises must be completed with all premises stated explicitly; evaluation aspects, sub-aspects etc must be ordered into coherent ‘aspect trees’.  (Optional: Information claims found in searches may be combined to form ‘new’ arguments that have not been made by human participants).

j. Identifying the argument patterns (inference rules) of arguments, and checking them (to alert participants to validity problems and contradictions).

k. Normalization of weight assignments, aggregation of judgments and arguments, and displays of the results of different aggregation functions, as well as their effects on different decision criteria, will have to be prepared. (A small sketch of the weight normalization follows this list.)

l. More sophisticated support examples would be the development of systems models of the ‘system’ at hand, (for example, constructing cause-effect connections and loops for the factual-instrumental premises in arguments) to predict performance of proposed solutions, to simulate the behavior of the resulting system in its environment over time.
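The following small sketch illustrates the weight normalization and the comparison of aggregation functions mentioned in item k; the names and numbers are invented for illustration only, not part of any existing tool.

    # Weight normalization and two possible aggregation rules (illustrative only).
    def normalize(raw_weights):
        """Rescale one participant's raw importance ratings so they sum to 1."""
        total = sum(raw_weights.values())
        return {aspect: value / total for aspect, value in raw_weights.items()}

    raw = {"safety": 8, "cost": 5, "ecology": 7}            # raw importance ratings
    weights = normalize(raw)                                # safety 0.40, cost 0.25, ecology 0.35

    scores = {"safety": 0.6, "cost": -0.2, "ecology": 0.4}  # partial judgments on the -1..+1 scale

    weighted_sum = sum(scores[a] * weights[a] for a in scores)
    worst_case = min(scores.values())                       # a more pessimistic aggregation rule
    print(round(weighted_sum, 2), worst_case)               # 0.33 vs. -0.2: the rules can point in different directions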

The boundary between human and machine judgments

It should be clear from preceding sections that general algorithms should not be used to generate evaluative judgments (unless there are criteria expressed in regulations, laws, or norms that expressly substitute for human judgment). Any calculated statistics of participant judgments should be clearly identified as ‘statistics’ of individuals’ judgments, not as ‘group judgments’. The boundary issue may be illustrated by examining the idea of complete ‘objectification’ or explanation of a person’s basis of judgment, using the ‘formal evaluation’ process explained in that segment. A complete description of a judgment basis would require descriptions of criterion functions for all aspect judgments, the weighting of all aspects and sub-aspects etc., and the estimates of plausibility (probability) for the plan to meet the performance expectations involved. This would allow a person A to make judgments on behalf of another person B, while not necessarily sharing B’s basis of judgment. Imagining a computer doing the same thing is meaningful only if all those values of B’s judgment basis can be given to the computer. The judgments would then be ‘deliberated’ and fully explained (though not necessarily justified or mandatory for all to share).
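As an illustration of how much would have to be specified, the following sketch (hypothetical names, invented numbers) shows one simple way such a ‘judgment basis’ could be written down and applied on someone’s behalf; it is not the formal evaluation procedure discussed in other segments, only an indication of its ingredients: a criterion function, a weight and a plausibility estimate for every aspect.

    # One person B's (hypothetical) judgment basis, and one simple way to apply it.
    def criterion_function(performance, worst, best):
        """Map a predicted performance value onto a -1..+1 judgment by linear
        interpolation between B's stated 'worst' and 'best' levels."""
        x = (performance - worst) / (best - worst)
        return max(-1.0, min(1.0, 2.0 * x - 1.0))

    # aspect -> (worst level, best level, weight, plausibility of the prediction)
    judgment_basis = {
        "travel time saved (min/day)": (0, 30, 0.5, 0.7),
        "added noise (dB)":            (10, 0, 0.3, 0.8),   # less noise is better
        "annual cost ($M)":            (20, 5, 0.2, 0.9),
    }
    predicted = {"travel time saved (min/day)": 18, "added noise (dB)": 6, "annual cost ($M)": 12}

    # Multiplying judgment, weight and plausibility is just one possible combination rule.
    overall = sum(criterion_function(predicted[a], worst, best) * weight * pl
                  for a, (worst, best, weight, pl) in judgment_basis.items())
    print(round(overall, 2))   # a single number, but only as trustworthy as every entry above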

In practice, doing that even for another person is too cumbersome to be realistic. People usually shortcut such complete objectification, making decisions with ‘offhand’ intuitive judgments that they do not or cannot explain. That step cannot be performed by a machine, by definition: the machine must base its simulation of our judgment basis on some explanation. (Admittedly, it could be simulating the human equivalent of tossing a coin: deciding randomly, though most humans would resent having their intuitive judgments called ‘random’.) And vague reference is usually made to ‘common sense’ or otherwise societally accepted values, obscuring and sidestepping the problem of dealing with the reality of significantly different values and opinions.

Where would the machine get the information for making such judgments if not from a human? Any algorithm for this would be written by a human programmer, including the specifics for obtaining the ‘factual’ information needed to develop even the most crude criterion function. A common AI argument would be that the machine can be designed to observe (gather the needed factual information) and ‘learn’ to assemble a basis of judgment, for measurable and predictable objectives such as ‘growth’ or stability (survival) of the system. The trouble is that the ‘facts’ involved in evaluating the performance and advisability of plans are not ‘facts’ at all: they are estimates, predictions of future facts, so they cannot be ‘observed’ but must be extrapolated from past observations by means of some program. And we can deceive ourselves into accepting information about the desirability of ‘ought’ or ‘goodness’ aspects of a plan as ‘factual’ data only by looking at statistics (also extrapolated into the future) or legal requirements — which must have been adopted by some human agent or agency.

To be sure: these observations are not intended to dismiss the usefulness of AI (which should perhaps better be called augmented intelligence) for the planning discourse. They are meant to call attention to the question of where to draw the boundary between human and machine ‘judgment’. Ignoring this issue can easily lead to the development of processes in which machine ‘judgment’ — presented to the public as non-partisan, ‘objective’, and therefore more ‘correct’ than human decisions, but inevitably programmed to represent some party’s intentions and values — can become a source of serious mistakes and a tool of oppression. This brief sketch can only serve as encouragement to more thorough discussion.


— o —

EVALUATION IN THE PLANNING DISCOURSE — THE DIMINISHING PLAUSIBILITY PARADOX

Thorbjørn Mann,  February 2020

THE DIMINISHING PLAUSIBILITY PARADOX

Does thorough deliberation increase or decrease confidence in the decision?

There is a curious effect of careful evaluation and deliberation that may appear paradoxical to people involved in planning decision-making, who expect such efforts to lead to greater certainty and confidence in the validity of their decisions. There are even consulting approaches that derive measures of such confidence from the ‘breadth’ and ‘depth’ achieved in the discourse.

The effect is the observation that with a well-intentioned, honest effort to give due consideration, and even systematic evaluation, to all concerns — as expressed e.g. by the pros and cons of proposed plans perceived by affected and experienced people — the degree of certainty or plausibility for a proposed plan actually seems to decrease, or move towards a central ‘don’t know’ point on a +1 to -1 plausibility scale. Specifically: the more carefully breadth (meaning coverage of the entire range of aspects or concerns) and depth (understood as the thorough examination of the support — evidence and supporting arguments — of the premises of each ‘pro’ and ‘con’ argument) are evaluated, the more the degree of confidence felt by evaluators moves from initial high support (or opposition) towards the central point ‘zero’ on the scale, meaning ‘don’t know; can’t decide’.

This is, of course, the opposite of what the advice to ‘carefully evaluate the pros and cons’ seems to promise, and of what approaches striving for breadth and depth actually appear to achieve. This creates a suspicion that either the method for measuring the plausibility of all the pros and cons must be faulty, or that the approaches relying on the degree of breadth and depth directly as equivalent to greater support are making mistakes. So it seems necessary to take a closer look at this apparently counterintuitive phenomenon.

The effect was first observed in the course of the journal review of an article on the structure and evaluation of planning arguments [1] — several reviewers pointed out what they thought must be a flawed method of calculation.

Explanation of the effect

The crucial steps of the method (also explained in the section on planning argument assessment) are the following:

– All pro and con arguments are converted from their often incomplete, missing-premises state to the complete pattern explicitly stating all premises (e.g. “Yes, adopt plan A because 1) A will lead to effect B given conditions C, and 2) B ought to be aimed for, and 3) conditions C will be present”).

– Each participant will assign plausibility judgments to each premise, on the +1/-1 scale, where +1 stands for complete certainty or plausibility, -1 for complete certainty that the claim is not true or totally implausible (in the judgment of the individual participant), and the center point of zero expresses inability to judge: ‘don’t know; can’t decide’. Since in the planning argument all premises are estimates or expectations of future states — effects of the plan, applicability of the causal rule that connects future effects or ‘consequences’ with actions of the plan, and the desirability or undesirability of those consequences — complete certainty assessments (pl = +1 or -1) for the premises must be considered unreasonable; so all the plausibility values will be somewhere between those extremes.

– Deriving a plausibility value for the entire argument from these plausibility judgments can be done in different ways. One extreme is to assign the lowest premise plausibility judgment prempl to the entire argument, expressing an attitude like ‘the strength of a chain is equal to the strength of its weakest link’. Or the plausibility values can be multiplied, giving the argument plausibility for argument i:

            Argpl(i) = ∏ (prempl(i,j))  for all premises j of argument i

Either way, the resulting argument plausibility cannot be higher than the premise plausibilities.

– Since arguments do not carry the same ‘weight’ in determining the overall plausibility judgment, it is necessary to assign some weight factor to each argument plausibility judgment. That weight will depend on the relative importance of the ‘deontic’ (ought) premises; it can be approximately expressed by assigning each of the deontic claims in all the arguments a weight between zero and +1, such that all the weights add up to +1. So the weight of argument i will be the plausibility of argument i times the weight of its deontic premise: Argw(i) = Argpl(i) x w(i)

– A plausibility value for the entire plan will have to be calculated from all the argument weights. Again, there are different ways to do that (discussed in the section on aggregation), but an aggregation function such as adding all the argument weights (as derived by the preceding steps) will yield a plan plausibility value on the same scale as the initial premise and argument plausibility judgments. It will also be the result of considering all the arguments, both pro and con; and since the weights of arguments considered ‘con’ arguments in the view of individual participants will be subtracted from the summed-up weight of ‘pro’ arguments, it will be nowhere near the complete certainty value of +1 or -1 — unless, of course, the process revealed that there were no arguments carrying any weight at all on the pro or con side. That is unlikely, since e.g. all plans have been conceived from some expectation of generating some benefit, and will carry some cost or effort, etc.
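A minimal sketch, in Python with invented numbers, of these steps for a single participant; the function names are hypothetical, and the product rule is only one of the aggregation choices mentioned above.

    # Plausibility aggregation for one participant (illustrative sketch only).
    def argument_plausibility(premise_pls, weakest_link=False):
        """Argpl(i): product of the premise plausibilities (all on the -1..+1 scale),
        or, alternatively, the 'weakest link' minimum."""
        if weakest_link:
            return min(premise_pls)
        result = 1.0
        for pl in premise_pls:
            result *= pl
        return result

    def plan_plausibility(arguments):
        """Planpl = sum of Argpl(i) * w(i), where the deontic weights w(i) add up to 1.
        A 'con' argument shows up as a negative Argpl(i) (e.g. a negative deontic
        premise judgment), so its weighted contribution is subtracted."""
        return sum(argument_plausibility(premise_pls) * w for premise_pls, w in arguments)

    # Two 'pro' arguments and one 'con' argument: (premise plausibilities, deontic weight)
    arguments = [
        ([0.8, 0.9, 0.7], 0.5),    # pro: fairly plausible premises
        ([0.6, 0.5, 0.9], 0.3),    # pro: weaker premises
        ([0.7, -0.8, 0.9], 0.2),   # con: the effect is judged undesirable
    ]
    print(round(plan_plausibility(arguments), 2))   # about 0.23: far from +1, even though every premise was considered

The result lands much closer to the ‘don’t know’ center of the scale than the offhand ‘yes, adopt it’ reaction the first argument alone might have produced.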

This approach as described thus far can be considered a ‘breadth-only’ assessment, justly so if there is no effort to examine the degree of support of premises. But of course the same reasoning can be applied to any of the premises — to any degree of ‘depth’ demanded by participants from each other. The effect of overall plan plausibility tending toward the center point of zero (‘don’t know’ or ‘undecided’), compared with initial offhand convinced ‘yes: apply the plan!’ or ‘no: reject!’ reactions, will be the same — unless there are completely ‘principle’-based or logical or physical ‘impossibility’ considerations, in plans that arguably should not even have reached the stage of collective decision-making.

Explanation of the opposite effect in ‘breadth/depth’ based approaches

So what distinguishes this method from approaches that claim to use degrees of ‘breadth and depth’ deliberation as measures justifying the resulting plan decisions — and that, in the process, increase the team’s confidence in the ‘rightness’ of their decision?

One obvious difference — one that must be considered a definite flaw — is that the degree of deliberation, measured by the mere number of comments or arguments, of ‘breadth’ or ‘depth’, does not include assessment of the plausibility (positive or negative) of the claims involved, nor of their weights of relative importance. Just having talked about a number of considerations, without that distinction, cannot already be a valid basis for decisions, even if Popper’s advice about the degree of confidence in scientific hypotheses we are entitled to hold is not considered applicable to design and planning. (“We are entitled to tentatively accept a hypothesis to the extent we have given our best effort to test, to refute it, and it has withstood all those tests”…)

Sure, we don’t have ‘tests’ that definitively refute a hypothesis (or ‘null hypothesis’) that we have to apply as best we can, and planning decisions don’t stand or fall on the strength of single arguments or hypotheses. All we have are arguments explaining our expectations, speculations about the future resulting from our planning actions — but we can adapt Popper’s advice to planning: “We can accept a plan as tentatively justified to the extent we have tried our best to expose it to counterarguments (con’s) and have seen that those arguments are either flawed (not sufficiently plausible) or outweighed by the arguments in its favor.”

And if we do this, honestly admitting that we really can’t be very certain about all the claims that go into the arguments, pro or con, and look at how all those uncertainties come together in totaling up the overall plausibility of the plan, the tendency of that plausibility to go towards the center point of the scale looks more reasonable.

Could these considerations be the key to understanding why approaches relying on mere breadth and depth measurements may result in increased confidence of the participants in such projects? There are two kinds of extreme situations in which it is likely that even extensive breadth and depth discussions can ignore or marginalize one side or the other of necessary ‘pro’ or ‘con’ arguments.

One is the typical ‘problem-solving’ team assembled for the purpose of developing a ‘solution’ or recommendation. The enthusiasm of the collective creative effort itself (but possibly also the often invoked ‘positive’ thinking, the call to defer judgment so as not to disrupt the creative momentum, as well as the expectation of a ‘consensus’ decision?) may focus the thinking of team members on ‘pro’ arguments justifying the emerging plan — but neglect or divert attention from counterarguments. Finding sufficiently good reasons for the plan is then taken to be enough to make a decision?

An opposite type of situation is the ‘protest’ demonstration, or events arranged for the express purpose of opposing a plan. Disgruntled citizens outraged by how a big project will change their neighborhood: counting up all the damaging effects: Must we not assume that there will be a strong focus on highlighting the plan’s negative effects or potential consequences: assembling a strong enough ‘case’ to reject it? In both cases, there may be considerable and even reasonable deliberation in breadth and depth involved — but also possible bias due to neglect of the other side’s arguments.

Implications of the possibility of decreasing plan plausibility?

So, pending some more research into this phenomenon — if it is found to be common enough to worry about — it may be useful to look at what it means: what adjustments to common practice it would suggest, and what ‘side-stepping’ stratagems may have evolved due to the mere sentiment that more deliberation might shake any undue, undeserved expectations in a plan. Otherwise, cynical observers might recommend throwing up our hands and leaving the decision to the wisdom of ‘leaders’ of one kind or another, in the extreme to oracle-like devices: artificial intelligence from algorithms whose rationales remain as unintelligible to the lay person as the medieval ‘divine judgment’ validated by mysterious rituals (but otherwise amounting to tossing coins?).

Besides the above-mentioned research into the question, examining common approaches on the consulting market for their potential vulnerability to this tendency would be a first step. For example, adding plausibility assessment to the approaches using depth and breadth criteria would be necessary to make them more meaningful.

The introduction of more citizen participation into the public planning process is an increasingly common move that has been urged — among other undeniable advantages such as getting better information about how problems and the plans proposed to solve them actually affect people — to also make plans more acceptable to the public because the plans then are felt to be more ‘their own’. As such, could this make the process vulnerable to the above first fallacy of overlooking negative features? If so, the same remedy of actually including more systematic evaluation into the process might be considered.

A common temptation of promoters of ‘big’ plans can’t be overlooked: to resort to ‘big’ arguments that are so difficult to evaluate that made-up ‘supporting’ evidence can’t be distinguished from predictions based on better data and analysis (following the old quip about ‘the bigger the lie, the more likely people will buy it’…). Many people already are suggesting that we should return to smaller (local) governance entities that can’t offer big lies.

Again: this issue calls for more research.

[1] Thorbjoern Mann, “The Structure and Evaluation of Planning Arguments”, Informal Logic, December 2010.

— o —

EVALUATION IN THE PLANNING DISCOURSE — PROCEDURAL AGREEMENTS

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.  Thorbjørn Mann,  February 2020

PROCEDURAL AGREEMENTS FOR EVALUATION

The need for procedural agreements

Any group, team or assembly having decided to embark upon a common evaluation / deliberation task aimed at a recommendation or decision about a plan, will have to adopt a set of agreements about the procedure to be followed, explicitly or implicitly. These rules can become quite detailed and complicated. Even the familiar ‘rules of order’ of standard parliamentary procedure, aiming at simple yea/nay decisions on ‘motions’ for the assembly to accept or reject, will become book-length guides (like ‘Robert’s Rules of Order’) that the chairpersons of such processes may have to consult when disputes arise. For simplified versions based on the expected simplicity of ending the discussions with a majority vote, and citizens’ familiarity with basic rules, agreements can even be tacitly taken for granted, without recourse to written guides. However, this no longer applies when the decision-making body engages in more detailed and systematic deliberation aiming at making the decisions more transparently justified by the evaluative judgments made on the comments in the discourse.

General overall agreements versus procedures for ‘special techniques’

This could be seen as a call for a general procedure that includes the necessary procedural rules, as an extension of the familiar parliamentary procedure. Would such a one-size-fits-all solution be appropriate? As the preceding sections of this study show, we now see not only a great variety of different evaluation tasks and context situations, but also a variety of different ‘approaches’ for such processes now on the ‘market’ — especially as they are assisted by new technology. Each one comes with different assumptions about the rules or ‘procedural agreements’ guiding the process. So it seems that the question is less one of developing and adopting one general-purpose pattern than one of providing a ‘toolkit’ of different approaches that the participants in a planning process could choose from as the task at hand requires. That opportunity-step for choice must be embedded in a general and flexible overall process that participants either are already familiar with, or can easily learn and agree to.

Once a special technique is selected, as decided by the group, its procedural steps and decision rules should then be explicitly agreed upon at the very beginning of the specific process — the more so, the ‘newer’ the approach, tools and techniques — so as to avoid disruption of the actual deliberation by disagreements about procedure later on. Such quibbles could easily become quite destructive and polarizing, and even their in-process resolution can introduce significant bias into the actual assessment work itself. It may be necessary to change some rules, as the participants learn more about the nature of the problem at hand. That process should be governed by rules set out in the initial agreements: a provision such as the ‘Next step’ proposed in the process for the overall planning discourse platform would offer that opportunity. [See ‘PDSS-REVISED’.]

This seemingly matter-of-course step can become controversial because different ‘special techniques’ may involve different concepts and corresponding vocabulary to be used: even ‘systems’ approaches of different ‘generations’ are likely to use different labels for essentially the same things, which can result in miscommunication and misunderstanding or worse. New techniques and tools may require different responsibilities, behavior, decision modes, replacing rules still taken for granted: must new agreements be set ‘upfront’ to prevent later conflicts?

The main agreements — possibly different rules for different project types — will then cover the basic procedural steps; the ‘stopping rules’ for deciding when a decision can be said to have been accepted (since one of the key properties of ‘wicked problems’ is that there is nothing in the nature of the problem itself that tells problem-solvers that a solution has been reached and the work can stop); and the decision criteria and modes according to which this should be done. For the details of the evaluation part itself, the kinds of judgments and judgment scales will have to be agreed upon, so that e.g. a judgment score will have the same meaning for all participants. (These issues will be addressed in separate sections.)

An argument can be made that efforts should be made to preserve consistency between the overall approach and its frame of reference and vocabulary, and any ‘special techniques’ for evaluation within that process along the way.

Doing without cumbersome procedural rules?

There will be attempts to escape procedures felt to be too ‘cumbersome’ or bureaucratic, with an easier route to a decision. Majority voting itself can be seen as such an escape. Even easier are decision criteria such as ‘consent’ — declared, for example, by the chair that there are ‘no more objections’ combined with ‘time’s up’ — which may indicate that the congregation has become exhausted, rather than convinced of the advantages of a proposed plan, or dissuaded from voicing more ‘critical’ questions. But aren’t the conditions leading to ‘consent’ outcomes in some approaches — group size, seating arrangements, sequences of steps and phases — themselves procedural provisions?

Examples of aspects calling for agreements

Examples of different procedural agreements are the above-mentioned ‘rules of order’; the steps for determining the ‘Benefit/Cost Ratio’ of plans; provisions for a ‘formal evaluation’ process of the ‘quality’ of a proposed plan or for the evaluation of a set of alternative proposals; agreements needed for evaluating the plausibility of a plan by systematic assessment of argument plausibility; and the guides for a ‘Pattern Language’ approach to planning. (Some of these will be described in separate segments.)

The procedural agreements cover aspects such as the following:
– The conceptual frame of reference and its vocabulary, and corresponding techniques and displays;
– Proper ‘etiquette’ and behavior;
– The process steps (sequence), participant rights and responsibilities;
– Formatting of entries as needed for evaluation;
– For the evaluation tasks: judgment scales and units, and the meaning of the scores;
– The aggregation functions to be used to derive overall judgments from partial judgment scores, and from individual participant scores to ‘group’ statistics and decision rules;
– Decision criteria and decision modes;
– The stopping rule(s) for the process.
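A minimal sketch of how such agreements might be recorded explicitly at the start of a project is shown below; all field names and values are hypothetical illustrations, not a prescribed format.

    # Hypothetical record of procedural agreements for one project.
    procedural_agreement = {
        "frame_of_reference": "issue-based planning discourse",
        "etiquette": ["no personal attacks", "one claim per entry"],
        "process_steps": ["raise issues", "collect proposals", "argue pros and cons",
                          "evaluate", "decide"],
        "entry_format": "standard planning argument with all premises stated",
        "judgment_scale": {"min": -1, "max": +1, "zero_means": "don't know / can't decide"},
        "aggregation": {"individual": "weighted sum of argument plausibilities",
                        "group": "report statistics of individual scores, not a 'group judgment'"},
        "decision_mode": "qualified consent after deliberation",
        "stopping_rule": "no new entries for two sessions and all open questions answered",
    }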

Specific agreements for different evaluation ‘approaches’ and special techniques must then be discussed in the sections describing those methods.


–o–

Eerily erring electioneering?

In the Fog Island Tavern on a dreary day in February:

– You look worried, Bog-Hubert: What’s bugging you today?
– Oh boy. I never thought I’d see Abbé Boulah getting worked up over politics, but let me tell you, Vodçek: this election is getting to him.
– Really? I thought he’d written off this whole voting business long ago, as a totally misguided crutch to bring any political or planning discourse to a meaningful decision?
– Yeah, he keeps working on his schemes to improve that. But you should have heard him this morning — you’d think he’s still training hard for his old pet project to get endurance cussing accepted as a new Olympic discipline —
– So what is it that’s getting him riled up on this one now?
– Well, I think he’s mainly disappointed in the candidates’ apparent inability to learn from past mistakes, and to focus on what’s really important. For example, this business about starting to discredit the current front runner, because he’s too, shall we say, unorthodox for the party establishment.
– What’s wrong with that? It’s politics, isn’t it?
– Fulminating stinkbomb-bundles and moccasin-mouth-ridden swamp-weed kudzu tangles: you too, now?
– Oh Bog-Hubert: excellent — you’re shooting for a medal in that sport too?
– By all the overgrown rusty Dodge truck skeletons in my cousin’s front yard: Don’t you, don’t they get it?
– Get what? it’s BAU politics. So, care to explain?
– Well, isn’t it obvious: Rather than tearing each other apart, shouldn’t they try to figure out what it is that makes the frontrunner’s — and the opposition’s message more appealing to those voters they want to convince to vote for them, and come up with a b e t t e r message, a more appealing and convincing vision?? Because that strategy is bound to come back and kick’em in the youknowwhat…
– Hmm. I see what you mean, by Abbé Boulah’s drooping mustache! And it’s giving the opposition free stinkbombs to launch at whoever ends up being the nominee…
– Yeah. And not only that: What if part of the problem is precisely that old habit of the old swamp establishment — of both parties — that those disgruntled voters are getting tired of? And that’s the rusty musket the establishment keeps shooting itself in the foot with?
– I can see why this upsets our friend. The futility of the hope that they’ll ever learn, I mean. Let’s try to get him back to work on those better ways he’s working on…
– I’ll drink to that. Do they make a decent grappa from Sonoma grapes?

— o —

On the style of government architecture

Thorbjørn Mann, February 2020

The current administration of the U.S.  Federal Government has proposed that buildings for federal government use should be designed in the ‘classical’ style of ancient Greek and Roman architecture; this has led to some passionate objections, e.g. from the American Institute of Architects.

Both the desire to set some general rules for designing government (at least ‘federal’) architecture and the particular choice of style, as well as the reaction to that government move, are understandable, though the rationales for both deserve some discussion.

In traditional societies, it was almost a matter of course that buildings were designed in a way that made them recognizable as to their role or function or purpose: A house (for living in) was a house, distinct from the barn or the stable or the storehouse, a church, a temple or synagogue or mosque were recognizable as what they were even to children, a store was a store, and a government building was a government building — a city hall, a ruler’s palace. Even in societies changed by the industrial revolution, a factory or a railway station were recognizable to the citizens as what they were and what they were for.

For government buildings, the design or style carried additional expectations: what kind of government, what kind of societal order did they represent? At one time, a ruler would live in a fortress — ostensibly for protection from exterior enemies, but as a convenient side-effect also protection from the ruler’s own subjects who didn’t like the taxes and what he used them for, or other edicts. More ‘democratic’ or ‘republican’ governance systems favored more ‘civil’ connotations, say, like a ‘marketplace of ideas’ for how to run their lives; the issue of designing suitable places that told the governance folks that they were ‘servants of the people’ but also told visitors how great their cities or nations were, became a delicate challenge. This also affected the design of residences of oligarchs who ‘ran’ government from their own palaces, but wished to insist on the right to do so by their wealth and erudition and good taste. (1) Their administrations — bureaucracies — could no longer use the fortress symbols to keep the citizenry in line, but architects helped the rulers to find other means to do that; the sheer size and complexity of rule-based designs of administrative institutions were intimidating, sorry ‘inspiring’ enough?

That clarity and comprehensibility of buildings has been lost in recent architecture: we see many kinds of clients, governmental and commercial and in-between institutions, trying to impress the public and each other with their buildings, by means of size and the novelty supplied by architectural creativity. This is leading to a ‘diversity’ of the public visual environment that many find refreshing and interesting, but that others are beginning to resent as disturbing and boring, since as a whole it expresses a different kind of uninspiring uniformity: the common desire to impress by means of size (who’s got the tallest building and most brilliant plumage?) or of ‘different’ signature architecture. Coming across as more puerile than ‘inspiring’: is that who we are as a society?

So the question of whether at least some clear distinction between governmental architecture and other buildings should be re-established, is not an entirely meaningless one. But insisting that the issue should be the sole domain of architects to decide rather than the government is also missing just that point: what is it that architecture tells us about who we — and our government — are, or ought to be? Just big and impressively ‘imperial’ — like the Roman or other empires that ended up collapsing under their own weight and corruption that all the marble couldn’t hide? The ‘inspiration’ being mainly the same kind of puerile awe of its sheer power but also — and not just incidentally: fear? What is the kind of architecture that would inspire us to cooperate, through our government, towards a more ‘perfect’ just, free, creative but kind and peaceful society?

Part of the problem is that we do not have a good forum for the discussion of these issues. The government itself, in most countries, has lost the standing of being that forum, for various reasons. The forms of ‘classical’ architecture won’t bring it back — they have too easily been adopted by commercial and other building clients: the example of an insane asylum with a classical portico, an old standard joke in architecture schools that advocated more modern styles, is beginning to give us a new chilling feeling… So where: Books? Movies? TV? Ah: Twitter? Is that who we are? Just asking…

(1) I have written about this issue (under the heading of the role of ‘occasion’ and ‘image’ in the built environment) in some articles and a book, using the example of government architecture in Renaissance Florence (where we can see buildings showing the dramatic evolution of the image of government in close proximity), and about the forum for discussion of public policy. I consider the design and organization of that ‘forum’ one of the urgent challenges of our time.

— o —

EVALUATION IN THE PLANNING DISCOURSE — TIME AND EVALUATION OF PLANS

An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, February 2020

TIME AND EVALUATION OF PLANS  (Draft, for discussion)

Inadequate attention to time in current common assessment approaches

Considering that the evaluation of plans (especially ‘strategic’ plans) and policy proposals is by its very nature concerned with the future, it is curious that the role of time has not received more attention, even with the development of simulation techniques that aim at tracking the behavior of key variables of systems over many years into the future. The neglect of this question, for example in the education of architects, can be seen in the practice of judging students’ design project presentations on the basis of their drawings and models.

The exceptions — for example in building and engineering economics — look at very few performance variables, with quite sophisticated techniques: expected cost of building projects, ‘life cycle cost’, return on investment etc., to be put into relation to expected revenues and profit. Techniques such as ‘Benefit/Cost Analysis’, which in its simplest form considers those variables as realized immediately upon implementation, can also apply this kind of analysis to forecasting costs and benefits and comparing them over time, using methods for converting initial amounts (of money) to ‘annualized’ or future equivalents, or vice versa.
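For reference, the standard conversions mentioned here (the present worth of a future amount, and the uniform annual equivalent of a present amount) can be sketched as follows; the discount rate and amounts are invented for illustration and are not tied to any particular approach discussed in this text.

    # Standard engineering-economics conversions (illustrative values only).
    def present_value(future_amount, i, n):
        """Present worth of a single amount n periods in the future, at discount rate i."""
        return future_amount / (1 + i) ** n

    def annual_equivalent(present_amount, i, n):
        """Uniform annual amount equivalent to a present amount over n periods
        (the 'capital recovery factor')."""
        return present_amount * i * (1 + i) ** n / ((1 + i) ** n - 1)

    print(round(present_value(1_000_000, 0.05, 10)))      # about 613,913: benefits 10 years out, at 5%
    print(round(annual_equivalent(1_000_000, 0.05, 10)))  # about 129,505: an initial cost spread over 10 years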

Criticisms of such approaches amount to pointing out problems such as having to convert ‘intangible’ performance aspects (like public health, satisfaction, loss of lives) into money amounts to be compared (raising serious ethical questions), or, for entities like nations, the way the money amounts drawn from or entering the national budget hide controversies such as inequities in the distribution of the costs and benefits. Looking at the issue from the point of view of other evaluation approaches might at least identify the challenges in the consideration of time in the assessment of plans, and help guide the development of better tools.

A first point is that from the perspective of the formal evaluation process, for example (see e.g. the previous section on the Musso/Rittel approach), measures like the present value of future cost or profit, or the benefit-cost ratio, must be considered ‘criteria’ (measures of performance) for more general evaluation aspects: one set among the (goodness) evaluation aspects that each evaluator must weight for their relative importance to make up overall ‘goodness’ or quality judgments. (See the segments on evaluation judgments, criteria and criterion functions, and aggregation.) As such, the use of these measures alone as decision criteria must be considered incomplete and inappropriate. However, in those approaches, the time factor is usually not treated with even the attention expressed in the above tools for discounting future costs and benefits to comparable present worth: for example, pro or con arguments in a live verbal discussion about expected economic performance often amount to mere qualitative comparisons or claims like ‘over the budget’ or ‘more expensive in the long run’.

Finally, in approaches such as the Pattern language, (which makes valuable observations about ‘timeless’ quality of built environments, but does not consider explicit evaluation a necessary part of the process of generating such environments), there is no mention or discussion of how time considerations might influence decisions: the quality of designs is guaranteed by having been generated by the use of patterns, but the efforts to describe that quality do not include consideration of effects of solutions over time.

Time aspects calling for attention in planning

Assessments of undesirable present or future states ‘if nothing is done’

The implementation of a plan is expected to bring about changes in states of affairs that are felt to be ‘problems’ — things not being as they ought to be — or ‘challenges’ and ‘opportunities’ calling for better, improved states of affairs. Many plans and policies aim at preventing future developments from occurring, either as distinctly ‘sudden’ events or as developments over time. Obviously, the degree of undesirability depends on the expected severity of these developments; they are matters of degree that must be predicted in order for the plan’s effectiveness to be judged.

The knowledge that goes into the estimates of future change comes from experience: observation of the pattern and rate of change in the past (even if that knowledge is taken to be well enough established to be considered a ‘law’). But not all such change tracks have been well enough observed and recorded in the past, so much estimation and judgment already goes into the assumptions about the changes over time in the past.

Individual assessments of future plan performance

Our forecasts of future changes ‘if nothing is done’, resting on such shaky past knowledge, must be considered less than 100% reliable. Should our confidence in the application of that knowledge to estimates of a plan’s future ‘performance’ then not be acknowledged as equal (at best) or arguably less certain — expressed as deserving a lower ‘plausibility’ qualifier? This would be expressed, for example, with the pl — plausibility — judgment for the relationship claimed in the factual-instrumental premise of an argument about the desirability of the plan effects: “Plan A will result (by virtue of the law or causal relationship R) in producing effect B”.

This argument should be (but is often not) qualified by adding the assumption ‘given the conditions C under which the relationship R will hold’: the conditions which the third (factual claim) premise of the ‘standard planning argument’ claims is — or will be — ‘given’.

Note: ‘Will be’: since the plan will be implemented in the future, this premise also involves a prediction. And to the extent the condition is not a stable, unchanging one but also a changing, evolving phenomenon, the degree of the desirable or undesirable effect B must be expected to change. And, to make things even more interesting and complex: as explained in the sections on argument assessment and systems modeling: the ‘condition’ is never adequately described by a single variable, but actually represents the  evolving state of the entire ‘system’ in which the plan will intervene.

This means that when two people exchange their assumptions and judgments, opinions, about the effectiveness of the plan by citing its effect on B, they may well have very different degrees (or performance measures) in mind, occurring under very different assumptions about both R and C — at different times.

Things become more fuzzy when we consider the likelihood that the desired or undesired effects are not expected to change things overnight, but gradually, over time. So how should we make evaluation judgments about competing plan alternatives when, for example, one plan promises rapid improvement soon after implementation (as measured by one criterion) but then slows down or even starts declining, while the other will improve at a much slower but more consistent rate? A mutually consistent evaluation must be based on agreed-upon measures of performance: measured at what future time? Over what future time period, aka ‘planning horizon’? And this question applies just to the prediction of the performance criterion — what about the plausibility and weight-of-importance judgments we need to offer a complete explanation of our judgment basis? Is it enough to apply the same plausibility factor to forecasts of trends decades in the future as the one we use for near-future predictions? As discussed in the segment on criteria, the crisp fine forecast lines we see in simulation printouts are misleading: the line should really be a fuzzy track widening more and more, the farther out in time it extends. Likewise: is it meaningful to use the same weight of relative importance for the assessment of effects at different times?
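To make the question concrete: the small sketch below, under purely invented assumptions, reduces two competing performance forecasts to comparable scores using a per-year weight that discounts far-future, less certain predictions. The point is only that the ranking can depend on the planning horizon and the time-weighting chosen, not that these particular numbers or weights are appropriate.

    # Comparing two predicted performance tracks over a planning horizon (illustrative only).
    def time_weighted_score(track, yearly_weight=0.9):
        """Sum of predicted yearly performance values, each multiplied by a weight
        that shrinks geometrically the farther out the prediction lies."""
        return sum(value * yearly_weight ** year for year, value in enumerate(track))

    plan_A = [8, 7, 6, 5, 4, 3, 2, 1, 0, 0]   # fast early improvement, then decline
    plan_B = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]   # slower but steadier improvement

    for horizon in (5, 10):
        print(horizon,
              round(time_weighted_score(plan_A[:horizon]), 1),
              round(time_weighted_score(plan_B[:horizon]), 1))
    # With a 5-year horizon plan A scores far higher; over 10 years the gap narrows,
    # and with no discounting at all (yearly_weight=1.0) plan B comes out ahead.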

These considerations apply, so far, only to the explanation of individual judgments, and already show that it would be almost impossible to construct meaningful criterion functions and aggregation functions to get adequately ‘objectified’ overall deliberated judgment scores for individual participants in evaluation procedures.

Aggregation issues for group judgment indicators

The time-assessment difficulties described for individual judgments do not diminish in the task of constructing decision guides for groups, based on the results of individual judgment scores. Reminder: to meet the ideal ‘democratic’ expectation that the community decision about a plan should be based on due consideration of ‘all’ concerns expressed by ‘all’ affected parties, the guiding indicator (‘decision guide’ or criterion) should be an appropriate aggregation statistic of all individual overall judgments. The above considerations show, to put it mildly, that it would be difficult enough to aggregate individual judgments into overall judgment scores, but even more so to construct group indicators that are based on the same assumptions about the time qualifiers entering the assessments.
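A minimal sketch of this point (with invented numbers): the same set of individual overall plan plausibility scores can be summarized by several different statistics, and the choice of summary is itself a procedural agreement, not a judgment made by the group.

    # Statistics of individual judgments are not a 'group judgment'.
    import statistics

    individual_plan_pl = [0.6, 0.4, 0.1, -0.3, 0.5]   # one overall score per participant

    summary = {
        "mean": statistics.mean(individual_plan_pl),
        "median": statistics.median(individual_plan_pl),
        "min": min(individual_plan_pl),               # 'protect the most worried participant'
        "spread": statistics.pstdev(individual_plan_pl),
    }
    print(summary)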

This makes it understandable (but not excusable) why decision-makers in practice tend either to screen out the uncomfortable questions about time in their judgments, or to resort to vague ‘goals’ measured by vague criteria to be achieved within arbitrary time periods: “Carbon-emission neutrality by 2050”, for example. How to choose between different plans or policies whose performance simulation forecasts do not promise 100% achievement of the goal, but only ‘approximations’ with different interim performance tracks, at different costs and other side-effects in society? But 2050 is far enough in the future to ensure that none of the decision-makers for today’s plans will be held responsible for today’s decisions…

‘Conclusions’?

The term ‘conclusion’ is obviously inappropriate if referring to expected answers to the questions discussed. These issues have just been raised, not resolved, which means that more research, experiments and discussion are called for to find better answers and tools. For the time being, the best recommendation that can be drawn from this brief exploration is that the decision-makers for today’s plans should routinely be alerted to these difficulties before making decisions, carry out the ‘objectification’ process for the concerns expressed in the discourse (of course facilitating discourse with wide participation, adequate to the severity of the challenge of the project), and then admit that any high degree of ‘certainty’ for proposed decisions is not justified. Decisions about ‘wicked problems’ are more like ‘gambles’, for which responsibility, ‘accountability’, must be assumed. If official decision-makers cannot assume that responsibility — as expressed in ‘paying’ for mistaken decisions — should they seek supporters to share that responsibility?

So far, this kind of talk is just that: mere empty talk, since there is at best only the vague and hardly measurable ‘reputation’ available as the ‘account’ from which ‘payment’ can be made — in the next election, or in history books. Which does not prevent reckless mistakes in planning decisions: there should be better means for making the concept of ‘accountability’ more meaningful. (Some suggestions for this are sketched in the sections on the use of ‘discourse contribution credit points’, earned by decision-makers or contributed by supporters from their credit point accounts, and made the required form of ‘investment payment’ for decisions.) The needed research and discussion of these issues will have to consider new connections between the factors involved in evaluation for public planning.



— o —

EVALUATION IN THE PLANNING DISCOURSE — TARGET AUDIENCE

An effort to clarify the role of deliberative evaluation in the planning and policy-making process.  Thorbjørn Mann,  February 2020

TARGET AUDIENCE


Audience and Distribution: Overview

The target audience for the results of the effort to evaluate the role of evaluation in the planning discourse is admittedly immodestly diverse. While it may be of interest to many participants in the social media groups currently discussing related issues, many of whom are consultants offering services and tools for planning, ‘problem-solving’ and ‘change management’ to corporate and institutional clients, the focus here will be on public planning, at all levels from small, local communities to national, international and ultimately global challenges. Thus, the issues concern officials as well as the public involved in planning. But it is especially at the global level of challenges and crises that transcend the boundaries of traditional institutions that traditional decision-making modes and habits break down or become inapplicable, generating calls for new ideas, approaches and tools. Increased public participation is a common demand.

The planning discourse at all levels will have to include not just traditional planning experts, decision-makers in all institutions faced with the need for collective action, but also the public. New emerging IT tools and procedures must also be applied to the evaluation facet of planning engaging all potentially affected parties, and leadership as well as the public will have to be involved and become familiar and competent with their use. This will call for appropriate means for generating that familiarity: information, education.

Obviously, at present, whatever discussion and presentation tools are chosen for this exploration of evaluation in public planning discourse, they will not be adequate for informing and achieving the aim of developing definitive answers, not even carrying out an effective discussion. It must be seen as just a first step in a more comprehensive strategy. To the extent that meaningful results emerge from this discussion, the issue of how to bring the ideas to a wider audience for general adoption will become part of the agenda. It should include education at all levels, down to general education for all citizens, not only higher levels. Thus, the hope is to reach planners and decision-makers for general education.

The audience that can be reached via such vehicles as this blog, selected social media, and perhaps a book will be people who have given these issues some thought already, that is: ‘experts’. So any discussion it incites will likely involve discipline ‘jargon’ of several kinds. But in view of the desired larger audience, the language should remain as close to conversational as possible and avoid ‘jargon’ too unfamiliar to non-experts. Many valuable research results and ideas are expressed in academic, ‘scientific’, or technical terms that are likely to exclude parties from the discussion who should be invited and included.

Given the wide range of people and institutions involved with planning, the question of ‘target audience’ may be inadequate or incomplete: it should be expanded to look at the best ways of distributing these suggestions. Besides traditional forms of distribution such as books, textbooks and manuals, new forms or media for familiarizing potential users may have to be developed; for example, online games simulating planning projects using new ideas and methods. This aspect of the project is especially in need of ideas and comments.

–o–

EVALUATION IN THE PLANNING DISCOURSE — SYSTEMS THINKING, MODELING AND EVALUATION IN PLANNING

An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, February 2020. (DRAFT)

SYSTEMS THINKING / MODELING AND EVALUATION IN PLANNING

 

Evaluation and Systems in Planning  — Overview

The contribution of systems perspective and tools to planning.

In just about any discourse about improving approaches to planning and policy-making, there will be claims containing reference to ‘systems’: ‘systems thinking’, ‘systems modeling and simulation’, the need to understand ‘the whole system’, the counterintuitive behavior of systems. Systems thinking as a whole mental framework is described as ‘humanity’s currently best tool for dealing with its problems and challenges’. There are by now so many variations, sub-disciplines, approaches and techniques, even definitions of systems and systems approaches, on the academic as well as the consulting market, that even a cursory description of this field would become a book-length project.

The focus here is the much narrower issue of the relationship between this ‘systems perspective’ and various evaluation tasks in the planning discourse. This sketch will necessarily be quite general, not doing adequate justice to many specific ‘brands’ of systems theory and practice. However, looking at the subject from the planning / evaluation perspective will identify some significant issues that call for more discussion.

Evaluation judgments at many stages of systems projects and planning

A survey of many ‘systems’ contributions reveals that ‘evaluation’ judgments are made at many stages of projects claiming to take a systems view — much like the finding that evaluation takes place at the various stages of planning projects whether explicitly guided by systems views or not. Those judgments are often not even acknowledged as ‘evaluation’, and are made according to very different patterns of evaluation (as described in the sections exploring the variety of evaluation judgment types and procedures).

The similar aims of systems thinking and evaluation in planning

Systems practitioners feel that their work contributes well (or ‘better’ than other approaches) to the general aims of planning, such as:
– to understand the ‘problem’ that initiates planning efforts;
– to understand the ‘system’ affected by the problem, as well as
– the larger ‘context’ or ‘environment’ system of the project;
– to understand the relationships between the components and agents, especially the ‘loops’ of such relationships that generate the often counterintuitive and complex systems behavior;
– to understand and predict the effects (costs, benefits, risks) and performance of proposed interventions in those systems (‘solutions’) over time, both ‘desired’ outcomes and potentially ‘undesirable’ or even unexpected side- and after-effects;
– to help planners develop ‘good’ plan proposals,
– and to reach recommendations and/or decisions about plan proposals that are based on due consideration of all concerns of the parties affected by the problem and proposed solutions, and of the merit of ‘all’ the information, contributions, insights and understanding brought into the process.
– To the extent that those decisions and their rationale must be communicated to the community for acceptance, these investigations and judgment processes should be represented in transparent, accountable form.

Judgment in early versus late stages of the process

Looking at these aims, it seems that ‘systems-guided’ projects tend to focus on the ‘early’ information (data) -gathering and ‘understanding’ aspects of planning – more than on the decision-making activities. These ‘early’ activities do involve judgment of many kinds, aiming at understanding ‘reality’ based on the gathering and analysis of facts and data. The validity of these judgments is drawn from standards of what may loosely be called ‘scientific method’ – proper observation, measurement, statistical analysis. There is no doubt that systems modeling, looking at the components of the ‘whole’ system, and the relationships between them, and the development of simulation techniques have greatly improved the degree of understanding both of the problems and the context that generates them, as well as the prediction of proposed effects (performance) of interventions: of ‘solutions’. Less attention seems to be given to the evaluation processes leading up to decisions in the later stages. Some justifications, guiding attitudes, can be distinguished to explain this:

Solution quality versus procedure-based legitimization of decisions

One attitude, building on the ‘scientific method’ tools applied in the data-gathering and model-building phases, aims at finding ‘optimal’ (ideally, or at least ‘satisficing’) solutions described by performance measures from the models. Sophisticated computer-assisted models and simulations are used to do this; the performance measures (which must be quantifiable, to be calculated) are derived from ‘client’ goal statements or from surveys of affected populations, interpreted by the model-building consultants: experts. On the one hand, their expert status is then used to assert the validity of results. On the other hand, this practice is increasingly criticized for its lack of transparency to the lay populations affected by problems and plans, questioning the experts’ legitimacy to make judgments ‘on behalf of’ affected parties. If there are differences of opinion or conflicts about model assumptions, these are ‘settled’ — must be settled — by the model builders in order for the programs to yield consistent results.

This practice (which Rittel and other critics called the ‘first generation systems approach’) was seen as a superior alternative to traditional ways of generating planning decisions: the discussions in assemblies of people or their representatives, characterized by raising questions and debating the ‘pros and cons’ of proposed solutions — but then making decisions by majority voting or by accepting the decisions of designated or self-designated leaders. Both of these decision modes obviously do not meet all of the postulated expectations in the list above: voting implies dominance of the interests of the ‘majority’ and potential disregard of the concerns of the minority; leaders’ decisions could lack transparency (much like expert advice), leading to public distrust of the leader’s claim of having given due consideration to ‘all’ concerns affecting people.

There were then some efforts to develop procedures (e.g. formal evaluation procedures) or tools, such as the widely used but also widely criticized ‘Benefit-Cost’ analysis, that tried to extend the ‘calculation-based’ development of valid performance measures into the stage of decision criteria based on the assessment of solution quality. These were not equally widely adopted, for various reasons such as the complicated and burdensome procedures, again requiring experts to facilitate the process and arguably making public participation more difficult. A different path is the tendency to make basic ‘quality’ considerations ‘mandatory’ as regulations and laws, or ‘best practice’ standards. Apart from tending to set ‘minimum’ quality levels as requirements, e.g. for building permits, this represents a movement to combine or entirely replace quality-based planning decision-making with decisions that draw their legitimacy from having been generated by following accepted procedures.

This trend is visible in approaches that specify procedures to generate solutions by using ‘valid’ solution components or features postulated by a theory (or laws): having followed those steps then validates the solution generated and removes the necessity to carry out any complicated evaluation procedure. An example of this is Alexander’s ‘Pattern Language’ — though the ‘systems’ aspect is not as prevalent in that approach. Interestingly, the same stratagem is visible in movements that focus on processes aimed at the mindsets of groups participating in special events, ‘increasing awareness’ of the nature and complexity of the ‘whole system’, but then rely on solutions ‘emerging’ from the resulting greater awareness and understanding, aiming at consensus acceptance in the group for the results generated, which then do not need further examination by more systematic, quantity-focused deliberation procedures. The invoked ‘whole system’ consideration, together with a claimed scientific understanding of the true reality of the situation calling for planning intervention, is part of inducing that acceptance and legitimacy. A telltale feature of these approaches is that debate, argument, and the reasoned scrutiny of supporting evidence involving opposing opinions tend to be avoided or ‘screened out’ in the procedures generating collective ‘swarm’ consensus.

The controversy surrounding the role of ‘subjective’, feeling-based, intuitive judgments versus ‘objective’ measurable, scientific facts (not just opinions) as the proper basis for planning decisions also affects the role of systems thinking contributions to the planning process.

None of the ‘systems’ issues related to evaluation in the planning process can be considered ‘settled’ and needing no further discussion. The very basic ‘systems’ diagrams and models of planning may need to be revised and expanded to address the role and significance of evaluation, as well as argumentation, the assessment of the merit of arguments and other contributions to the discourse, and the development of better decision modes for collective planning decision-making.

–o–