Archive for the 'Design discourse' Category

EVALUATION IN THE PLANNING DISCOURSE — AI SUPPORT OF EVALUATION IN PLANNING

Part of a series of  issues to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, February 2020.

The necessity of information technology assistance

A planning discourse support platform aiming at accommodating projects that cannot be handled by small F2F ‘teams’ or deliberation bodies, must use current (or yet-to-be developed) advanced information technology, if only just to handle communication. The examination of evaluation tasks in such large project discourse, so far, also has shown that serious, thorough deliberation and evaluation can become so complex that information technology assistance for many tasks will seem unavoidable, whether in form of simple data management or more sophisticated ‘artificial intelligence‘.

So the question arises what role advanced Artificial or Augmented Intelligence tools might play in such a platform. A first cursory examination will begin by surveying the simpler data management (‘house-keeping’) aspects that have no direct bearing on actual ‘intelligence’ or ‘reasoning’ and evaluation in planning thinking, and then exploring possible expansion of the material being assembled and sorted, into the intelligence assistance realm. It will be important to remain alert to the concern of where the line between assistance to human reasoning and substituting machine calculation results for human judgment should be drawn.

‘House-keeping’ tasks

a. File maintenance. A first ‘simple’ data management task will of course be to gather and store the contributions to the discourse, for record-keeping, retrieval and reference. This will apply to all entries, in their ‘verbatim‘ form, most of which will be in conversational language. They may be stored in simple chronological order as they are entered, with date and author information. A separate file will keep track of authors and cross-reference them with entries and other actions. A log of activities may also be needed.

b. ‘Ordered’, or ‘formatted’ files. For a meaningfully orchestrated evaluation in the discourse, it will be necessary to check for and eliminate duplication of essential the same information, to sort the entries, for example according to issues, proposals, arguments, factual information, — perhaps already in some formatted manner — and to keep the resulting files updated. This may already involve some formatting of the content of ‘verbatim’ entries.

c.  Preparation of displays, for overview. This will involve displays of ‘candidates’ for
decision, the resulting agenda of accepted candidates; ‘issue maps’ of the evolving discussion, evaluation and decision results and statistics.

d. Preparation of evaluation worksheets.

e. Tabulating, aggregating evaluation results for statistics and displays.

‘Analysis’ tasks, examples

f. Translation. Verbatim entries submitted in different languages and their formatted ‘content’ will have to be translated into the languages of all participants. Also, entries expressed in ‘discipline jargon’ will have to be translated into conversational language.

g. Entries will have to be checked for duplication of essential identical content, expressed in different words (to avoid counting the same content twice in evaluation procedures).

h. Standard information search (‘googling’) for available pertinent information already
documented by existing research, data bases, case studies etc. This will require the selection of search terms, and the assessment of relevance of found items, then entered into as separate section of the ‘verbatim’ file.

i. Entered items (verbal contributions and researched material) will have to be formatted for evaluation; arguments with unstated (‘taken for granted’) premises must be completed with all premises stated explicitly; evaluation aspects, sub-aspects etc must be ordered into coherent ‘aspect trees’.  (Optional: Information claims found in searches may be combined to form ‘new’ arguments that have not been made by human participants).

j. Identifying argument patterns (inference rules) of arguments, and checked (to alert participants for validity problems and contradictions)

k. Normalization of weight assignments, aggregation of judgments and arguments and display if different aggregation result (different aggregation functions) as well as their effect on different decision criteria will have to be prepared and displayed.

l. More sophisticated support examples would be the development of systems models of the ‘system’ at hand, (for example, constructing cause-effect connections and loops for the factual-instrumental premises in arguments) to predict performance of proposed solutions, to simulate the behavior of the resulting system in its environment over time.

The boundary between human and machine judgments

It should be clear from preceding sections that general algorithms should not be used to generate evaluative judgments (unless there are criteria expressed in regulations, laws, or norms, to expressly substitute for human judgment.) Any calculated statistics of participant judgments should be clearly identified as ‘statistics’ of individuals’ judgments, not as ‘group judgments’. The boundary issue may be illustrated with the examination of the idea of complete ‘objectification’ or explanation of a person’s basis of judgment, with the ‘formal evaluation’ process explained in that segment. Complete description of judgment basis would require description of criterion functions for all aspect judgments, the weighting of all aspects and sub-aspects etc., and the estimates of plausibility (probability) for a plan to meet the performance expectations involved. This would allow a person A to make judgments on behalf of another person B, while not necessarily sharing B’s basis of judgment. Imagining a computer doing the same thing is meaningful only if all those values of B’s judgment basis can be given to the computer. The judgments would then be ‘deliberated’ and fully explained (not necessarily justified or mandatory for all to share).

In practice, doing that even for another person is too cumbersome to be realistic. People usually shortcut such complete objectification, making decisions with ‘offhand’ intuitive judgments — that they do not or cannot explain. That step cannot be performed by a machine, by definition: the machine must base its simulation of our judgment basis on some explanation. (Admittedly, It could be simulating the human equivalent of tossing a coin: randomly, though most humans would resent describing their intuitive judgments to be called ‘random’). And vague reference is usually made to ‘common sense’ or otherwise societally accepted values, obscuring and sidestepping the problem of dealing with the reality of significantly different values and opinions.

Where would the machine get the information for making such judgments if not from a human? Any algorithm for this would be written by a human programmer, including the specifics for obtaining the ‘factual’ information needed to develop even the most crude criterion function. A common AI argument would be that the machine can be designed to observe (gather the needed factual information) and ‘learn’ to assemble a basis of judgment, for measurable and predictable objectives such as ‘growth’ or stability (survival) of the system. The trouble is that the ‘facts’ involved in evaluating the performance and advisability of plans are not ‘facts’ at all:  They are estimates, predictions of future facts, so they cannot be ‘observed’ but must be extrapolated from past observations by means of some program. And we can deceive ourselves to accept information about the desirability of ‘ought’ or ‘goodness aspects of a plan as ‘factual’ data only by looking at statistics, (also extrapolated into the future) or legal requirements — that must have been adopted by some human agent or agency.

To be sure: these observations are not intended to dismiss the usefulness of AI (that should be called augmented intelligence) for the planning discourse. They are trying to call attention to the question of where to draw the boundary between human and machine ‘judgment’. Ignoring this issue can easily lead to development of processes in which machine ‘judgment’ — presented to the public as non-partisan, ‘objective’, and therefore more ‘correct’ than human decisions, but inevitably programmed to represent some party’s intentions and values — can become sources of serious mistakes, and tools of oppression. This brief sketch can only serve as encouragement to more thorough discussion.


— o —

On the style of government architecture

Thorbjørn Mann, February 2020

The current administration of the U.S.  Federal Government has proposed that buildings for federal government use should be designed in the ‘classical’ style of ancient Greek and Roman architecture; this has led to some passionate objections, e.g. from the American Institute of Architects.

Both the desire to get some general rules for designing government (at least ‘federal’) architecture and to the particular choice of style, as well as the reaction to that government move, are understandable, though the rationale for both deserve some discussion.

In traditional societies, it was almost a matter of course that buildings were designed in a way that made them recognizable as to their role or function or purpose: A house (for living in) was a house, distinct from the barn or the stable or the storehouse, a church, a temple or synagogue or mosque were recognizable as what they were even to children, a store was a store, and a government building was a government building — a city hall, a ruler’s palace. Even in societies changed by the industrial revolution, a factory or a railway station were recognizable to the citizens as what they were and what they were for.

For government buildings, the design or style carried additional expectations: what kind of government, what kind of societal order did they represent? At one time, a ruler would live in a fortress — ostensibly for protection from exterior enemies, but as a convenient side-effect also protection from the ruler’s own subjects who didn’t like the taxes and what he used them for, or other edicts. More ‘democratic’ or ‘republican’ governance systems favored more ‘civil’ connotations, say, like a ‘marketplace of ideas’ for how to run their lives; the issue of designing suitable places that told the governance folks that they were ‘servants of the people’ but also told visitors how great their cities or nations were, became a delicate challenge. This also affected the design of residences of oligarchs who ‘ran’ government from their own palaces, but wished to insist on the right to do so by their wealth and erudition and good taste. (1) Their administrations — bureaucracies — could no longer use the fortress symbols to keep the citizenry in line, but architects helped the rulers to find other means to do that; the sheer size and complexity of rule-based designs of administrative institutions were intimidating, sorry ‘inspiring’ enough?

That clarity and comprehensibility of buildings has been lost in recent architecture: We see many kinds of clients, governmental and commercial and in-between institutions trying to impress the public and each other by means of size and novelty supplied by architectural creativity with their buildings. This is leading to a ‘diversity’ of the public visual environment that many find refreshing and interesting but others are beginning to resent as disturbing and boring, since as a whole it expresses a different kind of uninspiring uniformity of common desire to impress: by means of size (who’s got the tallest building and most brilliant plumage?) of ‘different’ signature architecture. Coming across as more puerile than ‘inspiring’: is that who we are as a society?

So the question of whether at least some clear distinction between governmental architecture and other buildings should be re-established, is not an entirely meaningless one. But insisting that the issue should be the sole domain of architects to decide rather than the government is also missing just that point: what is it that architecture tells us about who we — and our government — are, or ought to be? Just big and impressively ‘imperial’ — like the Roman or other empires that ended up collapsing under their own weight and corruption that all the marble couldn’t hide? The ‘inspiration’ being mainly the same kind of puerile awe of its sheer power but also — and not just incidentally: fear? What is the kind of architecture that would inspire us to cooperate, through our government, towards a more ‘perfect’ just, free, creative but kind and peaceful society?

Part of the problem is that we do not have a good forum for the discussion of these issues. The government itself, in most countries, has lost the standing of being that forum, for various reasons. The forms of ‘classical’ architecture won’t bring it back — they have too easily been adopted by commercial and other building clients: the example of an insane asylum with a classical portico, an old standard joke in architecture schools that advocated more modern styles, is beginning to give us a new chilling feeling… So where: Books? Movies? TV? Ah: Twitter? Is that who we are? Just asking…

(1) I have written about this issue (under the heading of the role of ‘occasion’ and ‘image’ in the built environment) in some articles and book; using the example of government architecture in Renaissance Florence, (where we can see buildings showing the dramatic evolution of the image of government in close proximity) and about the forum for discussion of public policy. I consider the design and organization of that ‘forum’ one of the urgent challenges of our time.

EVALUATION IN THE PLANNING DISCOURSE — TIME AND EVALUATION OF PLANS

An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, February 2020

TIME AND EVALUATION OF PLANS  (Draft, for discussion)

Inadequate attention to time in current common assessment approaches

Considering that evaluation of plans (especially ‘strategic’ plans) and policy proposals, by their very nature are concerned with the future, it is curious that the role of time has not received more attention, even with the development of simulation techniques that aim at tracking the behavior of key variables of systems over many years into the future. The neglect of this question, for example in the education or architects, can be seen in the practice of judging students’ design project presentations on the basis of their drawings and models.

The exceptions — for example in building and engineering economics — are looking at very few performance variables, with quite sophisticated techniques: expected cost of building projects, ‘life cycle cost’, return on investment etc., — to be put into relation to expected revenues and profit. Techniques such as ‘Benefit/Cost Analysis‘, which in its simplest form considers those variables as realized immediately upon implementation, also can apply this kind of analysis to forecasting costs and benefits and comparing them over time by methods for converting initial amounts (of money) to ‘annualized’ or future equivalents, or vice versa.

Criticism of such approaches amount to pointing out problems such as having to convert ‘intangible’ performance aspects (like public health, satisfaction, loss of lives) into money amounts to be compared, (raising serious ethical questions) for entities like nations, where the money amounts drawn from or entering the national budget hide controversies such as inequities in the distribution of the costs and benefits. Looking at the issue from the point of view of other evaluation approaches might at least identify the challenges in the consideration of time in the assessment of plans, and help guide the development of better tools.

A first point to be pointed out is that from the perspective of the formal evaluation process, for example, (See e.g. the previous section on the Musso/Rittel approach), measures like present value of future cost or profit, or benefit-cost ratio must be considered ‘criteria’ (measures of performance) for more general evaluation aspects, for among a set of (goodness) evaluation aspects that each evaluator must be weighted for their relative importance, to make up overall ‘goodness’ or quality judgments. (See the segments on evaluation judgments, criteria and criterion functions, and aggregation.) And as such, the use of these measures as decision criteria must be considered incomplete and inappropriate. However, in those approaches, the time factor is usually not treated with even the attention expressed in the above tools for discounting future costs and benefits to comparable present worth: For example, pro or con arguments in a live verbal discussion about expected economic performance often amount to mere qualitative comparisons or claims like ‘over the budget’ or ‘more expensive in the long run’. 

Finally, in approaches such as the Pattern language, (which makes valuable observations about ‘timeless’ quality of built environments, but does not consider explicit evaluation a necessary part of the process of generating such environments), there is no mention or discussion of how time considerations might influence decisions: the quality of designs is guaranteed by having been generated by the use of patterns, but the efforts to describe that quality do not include consideration of effects of solutions over time.

Time aspects calling for attention in planning

Assessments of undesirable present or future states ‘if nothing is done’

The implementation of a plan is expected to bring about changes in the state of affairs that is felt to be ‘problems’ — things not being as they ought to be, or ‘challenges’,‘opportunities’ calling for better, improved states of affairs. Many plans and policies aim at preventing future developments to occur, either as distinctly ‘sudden’ events or development over time. Obviously, the degree of undesirability depends on the expected severity of these developments; they are matters of degree that must be predicted in order for the plan’s effectiveness to be judged.

The knowledge that goes into the estimates of future change comes from experience: observation of the pattern and rate of change in the past, (even if that knowledge is taken to be well enough established to be considered a ‘law’). But not all such change tracks have been well enough observed and recorded in the past, so much estimate and judgment goes into the assumptions already about the changes over time in the past.

Individual assessments of future plan performance

Our forecasts for future changes ‘if nothing is done’, resting on such shaky past knowledge must be considered less that 100% reliable. Should our confidence in the application of that knowledge to estimates of a plan’s future ‘performance‘ then not be be acknowledged as equal (at best) or arguably less certain — expressed as deserving a lower ‘plausibility’ qualifier? This would be expressed, for example, with the pl — plausibility — judgment for the relationship claimed in the factual-instrumental premise of an argument about the desirability of the plan effects: “Plan A will result (by virtue of the law or causal relationship R) in producing effect B”.

This argument should be (but is often not) qualified by adding the assumption ‘given the conditions C under which the relationship R will hold’: the conditions which the third (factual claim) premise of the ‘standard planning argument’ claims is — or will be — ‘given’.

Note: ‘Will be’: since the plan will be implemented in the future, this premise also involves a prediction. And to the extent the condition is not a stable, unchanging one but also a changing, evolving phenomenon, the degree of the desirable or undesirable effect B must be expected to change. And, to make things even more interesting and complex: as explained in the sections on argument assessment and systems modeling: the ‘condition’ is never adequately described by a single variable, but actually represents the  evolving state of the entire ‘system’ in which the plan will intervene.

This means that when two people exchange their assumptions and judgments, opinions, about the effectiveness of the plan by citing its effect on B, they may likely have very different degrees (or performance measures in mind, occurring under very different assumptions about both R and C, — at different times.

Things become more fuzzy when the likelihood is considered that the desired or undesired effects are not expected to change things overnight, but gradually, over time. So how should we make evaluation judgments about competing plan alternatives, when, for example, one plan promises rapid improvement soon after implementation, (as measured by one criterion), but then slowing down or even start declining, while the other will improve at a much slower but more consistent rate? A mutually consistent evaluation must be based on agreed-upon measures of performance: measured at what future time? Over what future time period, aka ‘planning horizon’? This question will just apply to the prediction of the performance criterion — what about the plausibility and weight of importance judgments we need to offer complete explanation of our judgment base?  Is it enough to apply the same plausibility factor to forecasts of trends decades in the future, as the one we use for near future predictions? As discussed in the segment on criteria, the crisp fine forecast lines we see in simulation printouts are misleading: the line should really be a fuzzy track widening more and more, the farther out in time it extends?  Likewise: is it meaningful to use the same weight of relative importance for the assessment of effects at different times?

These considerations apply, so far, only to the explanation of individual judgments, and already show that it would be almost impossible to construct meaningful criterion functions and aggregation functions to get adequately ‘objectified’ overall deliberated judgment scores for individual participants in evaluation procedures.

Aggregation issues for group judgment indicators

The time-assessment difficulties described for individual judgments do not diminish in the task of construction decision guides for groups, based on the results of individual judgment scores. Reminder: to meet the ideal ‘democratic’ expectation that the community decision about a plan should be based on due consideration of ‘all’ concerns expressed by ‘all’ affected parties, the guiding indicator (‘decision guide’ or criterion) should be an appropriate aggregation statistic of all individual overall judgments. The above considerations show, to put it mildly, that it would be difficult enough to aggregate individual judgments into overall judgment scores, but even more so to construct group indicators that are based on the same assumptions about the time qualifiers entering the assessments.

This makes it understandable (but not excusable) why decision-makers in practice tend to either screen out the uncomfortable questions about time in their judgments, or resort to vague ‘goals’ measured by vague criteria to be achieved within arbitrary time periods: “Carbon-emission neutrality by 2050”, for example: How to choose between different plan or policies whose performance simulation forecasts do not promise 100% achievement of the goal, but only ‘approximations’ with different interim performance tracks, at different costs and other side-effects in society? But 2050 is far enough in the future to ensure that none of the decision-makers for today’s plans will be held responsible for today’s decisions…

“Conclusions’ ?

The term ‘conclusion’ is obviously inappropriate if referring to expected answers to the questions discussed. These issues have just been raised, not resolved; which means that more research, experiments, discussion is called for to find better answers and tools. For the time being, the best recommendation that can be drawn from this brief exploration is that the decision-makers for today’s plans should routinely be alerted to these difficulties before making decisions, carry out the ‘objectification’ process for the concerns expressed in the discourse (of course: facilitating discourse with wide participation adequate to the severity of the challenge of the project), and then admit that any high degree of ‘certainty‘ for proposed decisions is not justified. Decisions about ‘wicked problems’ are more like ‘gambles’ for which responsibility, ‘accountability’ must be assumed. If official decision-makers cannot assume that responsibility — as expressed in ‘paying’ for mistaken decisions, should they seek supporters to share that responsibility?

So far, this kind of talk is just that: mere empty talk, since there is at best only the vague and hardly measurable ‘reputation’ available as the ‘account‘ from which ‘payment‘ can be made — in the next election, or in history books. Which does not prevent reckless mistakes in planning decisions: there should be better means for making the concept of ‘accountability’ more meaningful. (Some suggestions for this are sketched in the sections on the use of ‘discourse contribution credit points’ earned by decision-makers or contributed by supporters from their credit point accounts,and made the required form of ‘investment payment’ for decisions.) The needed research and discussion of these issues will have to consider new connections between the factors involved in evaluation for public planning.


Overview

— o —

EVALUATION IN THE PLANNING DISCOURSE — SYSTEMS THINKING, MODELING AND EVALUATION IN PLANNING

An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann , February 2020. (DRAFT)

SYSTEMS THINKING / MODELING AND EVALUATION IN PLANNING

 

Evaluation and Systems in Planning  — Overview

The contribution of systems perspective and tools to planning.

In just about any discourse about improving approaches to planning and policy-making, there will be claims containing reference to ‘systems’: ‘systems thinking’, ‘systems modeling and simulation’, the need to understand ‘the whole system’, the counterintuitive behavior of systems. Systems thinking as a whole mental framework is described as ‘humanity’s currently best tool for dealing with its problems and challenges. There are by now so many variations, sub-disciplines, approaches and techniques, even definitions of systems and systems approaches on the academic as well as the consulting market, that even a cursory description of this field would become a book-length project.

The focus here is the much narrower issue of the relationship between this ‘systems perspective’ and various evaluation tasks in the planning discourse. This sketch will necessarily be quite general, not doing adequate justice to many specific ‘brands’ of systems theory and practice. However, looking at the subject from the planning / evaluation perspective will identify some significant issues that call for more discussion.

Evaluation judgments at many stages of systems projects and planning

A survey of many ‘systems’ contributions reveals that ‘evaluation’ judgments are made at many stages of projects claiming to take a systems view – like the finding that evaluation takes place at the various stages of planning projects whether explicitly guided by systems views or not. Those judgments are often not even acknowledged as ‘evaluation’, and done by very different patterns of evaluation (as described in the sections exploring the variety of evaluation judgment types and procedures.)

The similar aims of systems thinking and evaluation in planning

Systems practitioners feel that their work contributes well (or ‘better’ than other approaches) to the general aims of planning: such as
– to understand the ‘problem’ that initiates planning efforts;
– to understand the ‘system’ affected by the problem, as well as
– the larger ‘context’ or ‘environment’ system of the project;
– to understand the relationships between the components and agents, especially the ‘loops’ of such relationships that generates the often counterintuitive and complex systems behavior;
– to understand and predict the effects (costs, benefits, risks) and performance of proposed interventions in those systems (‘solution’) over time; both ‘desired’ outcomes and potentially ‘undesirable’ or even unexpected side-and after-effects;
– to help planners develop ‘good’ plan proposals,
– and to reach recommendations and/or decisions about plan proposals that are based on due consideration of all concerns for parties affected by the problem and proposed solutions, and of the merit of ‘all’ the information, contributions, insights and understanding brought into the process.
– To the extent that those decisions and their rationale must be communicated to the community for acceptance, these investigations and judgment processes should be represented in transparent, accountable form.

Judgment in early versus late stages of the process

Looking at these aims, it seems that ‘systems-guided’ projects tend to focus on the ‘early’ information (data) -gathering and ‘understanding’ aspects of planning – more than on the decision-making activities. These ‘early’ activities do involve judgment of many kinds, aiming at understanding ‘reality’ based on the gathering and analysis of facts and data. The validity of these judgments is drawn from standards of what may loosely be called ‘scientific method’ – proper observation, measurement, statistical analysis. There is no doubt that systems modeling, looking at the components of the ‘whole’ system, and the relationships between them, and the development of simulation techniques have greatly improved the degree of understanding both of the problems and the context that generates them, as well as the prediction of proposed effects (performance) of interventions: of ‘solutions’. Less attention seems to be given to the evaluation processes leading up to decisions in the later stages. Some justifications, guiding attitudes, can be distinguished to explain this:

Solution quality versus procedure based legitimatization on of decisions

One attitude, building on the ‘scientific method’ tools applied in the data-gathering and model-building phases, aims at finding ‘optimal’ (ideally, or at least ‘satisficing’) solutions described by performance measures from the models. Sophisticated computer-assisted models and simulations are used to do this; the performance measures (that must be quantifiable, to be calculated) derived from ‘client’ goal statements or from surveys of affected populations, interpreted by the model-building consultants: experts. One the one hand, their expert status is then used to assert validity of results. But on the other hand, increasingly criticized for the lack of transparency to the lay populations affected by problems and plans: questioning the experts’ legitimacy to make judgments ‘on behalf of’ affected parties. If there are differences of opinions, conflicts about model assumptions, these are ‘settled’ – must be settled – by the model builders in order for the programs to yield consistent results.

This practice (that Rittel and other critics called ‘first generation systems approach’) was seen as a superior alternative to traditional ways of generating planning decisions: the discussions in assemblies of people or their representatives, characterized by raising questions and debating the ‘pros and cons’ of proposed solutions – but then making decisions by majority voting or accepting the decisions of designated or self-designated leaders. Both of these decision modes obviously are not meeting all of the postulated expectations in the list above: voting implies dominance of interests of the ‘majority’ and potential disregard on the concerns of the minority; leader’s decisions could lack transparency (much like expert advice) leading to public distrust of the leader’s claim of having given due consideration to ‘all’ concerns affecting people.

There were then some efforts to develop procedures (e.g. formal evaluation procedures) or tools such as the widely used but also widely criticized ‘Benefit-Cost’ analysis tried to extend the ‘calculation based’ development of valid performance measures into the stage of criteria based on the assessment of solution quality to guide decisions. These were not equally widely adopted, for various reasons such as the complicated and burdensome procedures, again requiring experts to facilitate the process but arguably making public participation more difficult. A different path is the tendency to make basic ‘quality’ considerations ‘mandatory’ as regulations and laws, or ‘best practice’ standard. Apart from tending to set ‘minimum’ quality levels as requirement e.g. for building permits, this represents a movement to combine or entirely replace quality-based planning decision-making with decisions that draw their legitimacy from having been generated and following procedures.

This trend is visible both in approaches that specify procedures to generate solutions by using ‘valid’ solution components or features postulated by a theory (or laws): having followed those steps then validates the solution generated removes the necessity to carry out any complicated evaluation procedure. An example of this is Alexander’s ‘Pattern Language’ – though the ‘systems’ aspect is not as prevalent in that approach. Interestingly, that same stratagem is visible in movements that focus on processes aimed at mindsets of groups participating in special events, ‘increasing awareness’ of the nature and complexity of the ‘whole system’ but then rely on solutions ‘emerging’ from the resulting greater awareness and understanding that aim at consensus acceptance in the group for the results generated, that then do not need further examination by more systematic, quantity-focused deliberation procedures. The invoked ‘whole system’ consideration, together with a claimed scientific understanding of the true reality of the situation calling for planning intervention is a part of inducing that acceptance and legitimacy. A telltale feature of these approaches is that debate, argument, and the reasoning scrutiny of supporting evidence involving opposing opinions tends to be avoided or ‘screened out’ in the procedures generating collective ‘swarm’ consensus.

The controversy surrounding the role of ‘subjective’, feeling-based, intuitive judgments versus ‘objective’ measurable, scientific facts (not just opinions) as the proper basis for planning decisions also affects the role of systems thinking contributions to the planning process.

None of the ‘systems’ issues related to evaluation in the planning process can be considered ‘settled’ and needing no further discussion. The very basic ‘systems’ diagrams and models of planning may need to be revised and expanded to address the role and significance of evaluation, as well as argumentation, the assessment of the merit of arguments and other contributions to the discourse, and the development of better decision modes for collective planning decision-making.

–o–

EVALUATION IN THE PLANNING DISCOURSE: PROCEDURE EXAMPLE 2: EVALUATION OF PLANNING ARGUMENTS


An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, January 2020. (Draft)

PROCEDURE EXAMPLE 2:
EVALUATION OF PLANNING ARGUMENTS (PROS & CONS)

Argument evaluation in the planning discourse

Planning, like design, can be seen as an argumentative process (Rittel): Ideas and proposals are generated, questions are raised about them. The typical planning issues — especially the ‘deontic’ (ought-) questions about what the plan ought to be and how it can be achieved — generate not only answers but arguments — the proverbial ‘pros and cons’ . The information needed to make meaningful decisions — based on ‘due consideration’ of all concerns by all parties affected by the problem the plan is aiming to remedy, as well as by any solution proposals, is often coming mainly via those pros and cons. Taking this view seriously, it becomes necessary to address the question of how those arguments should be evaluated or‘weighed’ . After all, those arguments are supporting contradictory conclusions (claims), so just ‘considering. is not quite enough.

Argumentation as a cooperative rather than adversarial interaction

The very concept of the‘argumentative view of planning is somewhat controversial because many people misunderstand ‘argument’ itself as a nasty adversarial, combative, uncooperative phenomenon, a ‘quarrel’ . (I have suggested the label ‘quarrgument’ for this). But ‘argument’ is originally understood as a set of claims (premises) that together support another claim, the ‘conclusion. For planning, arguments are items of reasoning that explore the ‘pros and cons about plans; and an important underlying assumption is that we ‘argue’ — exchange arguments with others because we believe that the other will accept or consider the position about the plan we are talking about because the other already believes or accepts the premises we offer, — or will do so once we offer the additional support we have for them. It is unfortunate that even recent research on computer-assisted argumentation seems to be stuck in the ‘adversarial’ view of arguments, seeing arguments as ‘attacks’ on opposing positions rather than a cooperative search for a good planning response to problems or visions for a better future.

‘Planning arguments’

There is another critical difference between the arguments discussed in traditional logic textbooks and and the kinds I call ‘planning arguments: The traditional argumentation concern was to establish the truth or falsity of claims about the world, and that the discussion — the assessment of arguments — will ‘settle’ that question in favor of one or the other. This does not apply to planning arguments: The planning decision does not rest on single ‘clinching’ arguments but on the assessment of the entire set of pros and cons. There are always real expected benefits and real expected costs, and as the proverbial saying has it, they must be ‘weighed’ against one another to lead to a decision. There has not been much concern about how that ‘weighing’ can or should be done, and how that process might lead to a reasoned judgment about whether to accept or reject a proposed plan. I have tried to develop a way to do this — a way to explain what our judgments are based on — beginning with an examination of the structure of ‘planning arguments.

The structure of planning arguments and their different types of premises

I suggest that planning arguments can be represented in a following general ‘standard planning argument’ form, the simplest version being the following ‘pro’ argument pattern:

Proposal ‘ought’ claim (‘conclusion’):  Proposal PLAN A ought to be adopted
because
1. Factual-instrumental premise:         Implementing PLAN A will lead to outcome B
                                                                     given conditions C
and
2. Deontic premise:                                  Outcome B ought to be pursued;
and
3. Factual premise:                                  Conditions C are (or will be) given.

This form is not conclusively ‘valid’ in the formal logic sense, according to which it is considered ‘inconclusive’ and ‘defeasible’. There are usually many such pros and cons supporting or questioning a proposal: no single argument (other that evidence pointing out flaws of logical inconsistency or lacking feasibility, leading to rejection) will be sufficient to make a decision. Any evaluation of planning arguments therefore must be embedded in a ‘multi-criteria’ analysis and aggregation of judgments into the overall decision.

It will become evident that all the judgments people make will be personal ‘subjective’ judgments, not only about the deontic (ought) premise but even about the validity and salience of the ‘factual’ premises: they are all about estimated about the future — not yet validated by observation and measurement.

The judgment types of planning argument premises:
‘plausibility’ and weight of importance

There are two kinds of judgments that will be needed. The first is an assessment of the ‘plausibility’ of each claim. The term ‘plausibility’ here includes the familiar‘truth’ (or degree of certainty or probability about the truth of a claim, and the advisability, acceptability, desirability of the deontic claim. It can be expressed as a judgment on a scale e.g. of -1 to +1, with ‘-1’ meaning complete implausibility to +1 expressing ‘total plausibility’, virtual certainty, and the center point of zero meaning ‘don’t know, can’t judge’ . The second one is a judgment about the ‘weight’ of relative importance‘ of the ‘ought’ aspect. It can be expressed e.g. by a score between zero meaning (totally unimportant) and +1 meaning ‘totally important’, overriding all other aspects; the sum of all the weights of deontic premises must be equal to +1.

Argument plausibility

The first step would be the assessment of plausibility of the entire single argument, which would be a function of all three premise plausibility scores to result in an ‘Argument plausibility’ score.

For example, an argument i with pl(1) =0.5, pl(2) = 0.8, and pl(3) = 0.9 might get an argument plausibility :   Argpl (i) of 0.5 x 0.8 x 0.9 = 0.36.

Argument weight of relative importance

The second step would be to assess the ‘argument weight’ of each argument, which can be done by multiplying the weight of relative importance of its deontic premise (premise 2 in the pattern above) with the argument plausibility:    Argw(i) = Argpl(i) x w(i).
That weight will again be a value between zero (meaning ‘totally unimportant’) and +1 (meaning ‘all-important’ i.e. overriding all other considerations). This should be the result of the establishment of a ‘tree’ of deontic concerns (similar to the ‘aspects’ of the ‘Formal evaluation’ procedure in procedure example 1) that gives each deontic claim its proper place as a main aspect, sub-aspect, sub-sub-aspect or ‘criterion’ in the aspect tree, and assigning weights between 0 and 1 such that these add up to 1, at each level.

A deontic claim located at the second level of the aspect tree, having been assigned a weight of .8 at that level, being a sub-aspect to an aspect at the first level with a weight of +.4 at that level, would have a premise weight of w = 0.8 x 0.4 = 0.32. The argument weight with a plausibility of 0.36 would be  Argw(i) = 0.36 x 0.32 = 0.1152 (rounded up as 0.12).

Plan plausibility

All the argument weights could the be aggregated to the overall ‘plan plausibility’ score, for example by adding up all argument weights:
Planpl = ∑ Argw(i) for all argument weights i (of an individual participant)

Of course, there are other possible aggregation forms. (See the sections on ‘Aggregation’ and ‘Decision Criteria).  Which one of those should be used in any specific case must be specified — agreed upon — in the ‘procedural agreements’ governing each planning project.

It should be noted that in a worksheet simply listing all arguments with their premises for plausibility and weigh assignments, there is no need for identifying  arguments as ‘pro’ and ‘con’, as intended by their respective authors. Any argument given a negative premise plausibility by a participant will automatically end up getting a negative argument weight and thus becoming a ‘con’ argument for that participant — even if the argument was intended by its author as a ‘pro’ argument. This makes it obvious that all such assessments are individual, subjective judgments, even if the factual and factual-instrumental premises of arguments are considered ‘objective-fact’ matters.

The process of evaluation of planning arguments within the overall discourse

The diagram below shows the argument assessment process as it will be embedded in an overall discourse. Its central feature is the ‘Next Step?’ decision, invoked after each major activity. It lets the participants in the effort decide — according to rules specified in those procedural agreements — how deeply into the deliberation process they wish to proceed: they could decide to go ahead with a decision after the first set of overall offhand judgments, skipping the detailed premise analysis and evaluation if they feel sufficiently certain about the plan.

Process of argument assessment within the overall discourse

The use of overall plan plausibility scores:
Group statistics of the set of individual plan plausibility scores.

It may be tempting to use the overall plan plausibility scores directly as decision guides or determinants.  For example, to determine a statistic such as the average of all individual scores Planpl(j) for the participants j in the assessment group, as an overall ‘group plausibility score‘ GPlanpl,  e.g.   GPlanpl = 1/n ∑ Planpl(j) for all n members of the panel.

And in evaluating a set of competing plan alternatives: to select the proposal with the highest ‘group plausibility’ score.
Such temptations should be resisted, for a number of reasons, such as: whether a discussion has succeeded in bringing in all pertinent items that should be given ‘due consideration’; the concern that planning arguments tend to be of ‘qualitative’ nature and often don’t easily address quantitative measures of performance; questions regarding principles, the time frame of expected plan effects and consequences; whether and how issues of ‘quality’ of a plan are adequately addressed in the form of arguments; and the question of the appropriate ‘social aggregation’ criterion to be applied to the problem and plan in question: many open questions:

Open questions

Likely incompleteness of the discussion
It is argued that participation of all affected parties and a live discussion will be more likely to bring our the concerns people are actually worried about, than e.g. reliance on general textbook knowledge by panels or surveys made up by experts who ‘don’t live there’. But even the assumption that the discussion guarantees complete coverage is unwarranted. For example, is somebody likely to consider raising an issue about a plan feature that they know will affect another party negatively (when they expect the plan to be good for the own faction) — if the other party isn’t aware enough about this effect, and does not raise it? Likewise; some things may be expected to be so much matters ‘of course’ that nobody considers it necessary to mention it. So unless the overall process includes several different means of getting such information — systems modeling, simulation, extensive scrutiny of other cases etc. — the argumentative discussion alone can’t be assumed to be sufficient to bring up all needed information.

Quantitative aspects in arguments.
The typical planning argument will usually be framed in more ‘qualitative’ terms than quantitative measures. For example: in an argument that “The plan will be more sustainable’ than the current situation” this matters in the plausibility assessment: It can be seen as quite plausible as long as there is some evidence of sustainability improvement, so participants may be inclined to give it a high pl-score close to +1. By comparison, if somebody instead makes the same argument but now claims a specific ‘sustainability’ performance measure — one that others may consider as too optimistic, and therefore assign it a plausibility score closer to zero or even slightly negative: how will that affect the overall assessment? What procedural provisions would be necessary to needed to adequately deal with this question?

The issue of ‘quality’ or ‘goodness’ of a proposed solution.
It is of course possible that a discussion examines the quality or ‘goodness’ of a plan in detail, but as mentioned above, this will likely also be in general, qualitative terms, and often even avoided because to the general acceptance of sayings like’ you can’t argue about beauty’ , so the discussion will have some difficulty in this respect, if it does mention beauty at all, or spiritual value, or the appropriateness of the resulting image. Likewise, requirements for the implementation of the plan, such as meeting regulations, may not be discussed.

The decreasing plausibility ‘paradox’
Arguably, all ‘systematic’ reasoning efforts, including discussion and debate, aim a giving decision-makers a higher degree of certainty about their final judgment, than, say, just fast offhand intuitive decisions. However, it turns out that the more depth as well as breadth of discussion is done, the more final plausibility judgment scores will tend to end up closer to the ‘zero’ or ‘don’t know’ plausibility — if the plausibility assessment is done honestly and seriously, and the aggregation method suggested above is used: Multiplying the plausibility assessments for the various premises (which for the factual premises will be probability estimates). These judgments being all about future expectations, they cannot honestly be given +1 (‘total certainty’) scores or even scores close to it, the less so, the farther out in the future the effects are projected. This result can be quite disturbing and even disappointing to many participants, when final scores are compared with initial ‘offhand’ judgments.
Other issues related to time have often been inadequately dealt with in evaluation of any kind:

Estimates of plan consequences over time
All planning arguments are expressing people’s expectations of the plan’s effect in the future. Of course, we know that there are relatively few cases in which a plan or action will generate results that will materialize immediately upon implementation and then stay that way. So what do we mean when we offer an argument that a plan ‘will bring improve society’s overall health’ — even resorting to ‘precise ‘statistical’ indices like mortality rates, or life expectancy? We know that these figures will change over time, one proposed policy will bring more immediate results than another, but the other will have better effect in the long run; and again, the father into the future we look, the less certain we must be about our prediction estimates. These things are not easily expressed in even carefully crafted arguments supported by the requisite statistics: how should we score their plausibility?

Tentative insights, conclusions?

These ‘not fully resolved / more work needed’ questions may seem to strengthen the case for evaluation approaches other than trying to draw support for planning decisions from discourse contributions, even with more detailed assessment of arguments than shown here (examining the evidence and support for each premise). However, the problems emerging from the examination of the argumentative process do affect other evaluation tools as well. I have not seen approaches that resolve them all more convincingly. So:       Some first tentative conclusions are that planning debate and discourse  — too familiar and accessible to experts and lay people alike to be dismissed in favor of other methods — would benefit from enhancements such as the argument assessment tools, but also, opportunities and encouragement should be offered to draw upon other tools, as called for by the circumstances of each case and the complexity of the plans.

These techniques, methods, should be made available for use by experts and lay discourse participants, in a ‘toolkit’ part of a general planning discourse support platform — not as mandatory components of a general-purpose one-size-fit-all planning method but as a repository of tools for creative innovation and expansion: Because plans as well as the process that generate plans define those involved as ‘the creators of that plan’ , there will be a need to ‘make a difference, to make it theirs: by changing, adapting, expanding and using the tools in new and different ways, besides inventing new tools in the process.

References:
Rittel, Horst: “APIS: A Concept for an Argumentative Planning Information System” Institute of Urban and Regional Development, University of California at Berkeley, 1980 . A report about research activities conducted for the Commission of European Communities, Directorate General XIIA.
–o–

 

 

EVALUATION IN THE PLANNING DISCOURSE: SAMPLE EVALUATION PROCEDURES EXAMPLE 1: FORMAL ‘QUALITY‘ EVALUATION

Thorbjørn Mann,  January 2020

In the following segments, a few examples procedures for evaluation by groups will be discussed, to illustrate how the various parts of the evaluation process are selectively assembled into a complete process aiming at decision (or recommendation) for decision about a proposed plan or policy; to facilitate understanding of the way the different provisions and choices related to the evaluation task that are reviewed in this study can be assembled to practical procedures for specific situations. The examples are not intended to be universal recommendations for use in all situations. They all will — arguably — call for improvement as well as adaptation to the specific project and situation at hand.

A common evaluation situation is that of a panel of evaluators comparing a number of proposed alternative plan solutions to select or recommend the ‘best’ choice for adoption. Or — if there is only one proposal, — to determine if it is ‘good enough’ for implementation. It is usually carried out by a small group of people assumed to be knowledgeable of the specific discipline (for example, architecture) and reasonably representative of the interests of the project client (which may be the public). The rationale for such efforts, besides aiming for the ‘best’ decision, is the desire for ensuring that the decision will be based on good expert knowledge, but also for transparency and legitimacy and accountability of the process — to justify the decision. The outcome will usually be a recommendation to the actual client decision-makers rather than the actual adoption or implementation decision, based on the group’s assessment of the ‘goodness’ or ‘quality’ of the proposed plan, documented in some form. (It will be referred to as a ‘Formal Quality Evaluation’ procedure.)

There are of course many possible variations of procedures for this task. The sample procedure described in the following is based on the Musso-Rittel (1) procedure for the evaluation of the ‘goodness’ or quality of buildings.

The group will begin by agreeing on the procedure itself and its various provisions: the steps to be followed (for example, whether evaluation aspects and weighting should be worked out before or after presentation of the plan or plan alternatives), general vocabulary, judgment and weighting scales, aggregation functions both for individual overall judgments and group indices, and decision rules for determining its final recommendation.

Assuming that the group has adopted the sequence of first establishing the evaluation aspects and criteria against which the plan (or plans) will be judged, the first step will be a general discussion of the aspects and sub-aspects to be considered, resulting in the construction of the ‘aspect tree’ of aspects, sub-aspects, sub-sub-aspects etc. (ref. the section on aspects and aspect trees) and criteria (the ‘objective’ measures of performance; ref. the section on evaluation criteria). The resulting tree will be displayed and become the basis for scoring worksheets.

The second step will be the assignment of aspect weights (on a scale of zero to to 1 and such that at each level of the ‘tree’, the sum of weights at that level will be 1. Panel members will develop their own individual weighting. This phase can be further refined by applying ‘Delphi Method’ steps: establishing and displaying the mean / median and extreme weighting values and then asking the authors of extremely low or high weights to share and discuss their reasoning for these judgments, and giving all members the chance to revise their weights.

Once the weighted evaluation aspect trees have been established, the next step will be the presentation of the plan proposal or competing alternatives.

Each participant will assign a first ‘overall offhand’ quality score (on the agreed-upon scale, e.g. -3 to +3) to each plan alternative.

The group’s statistics of these scores are then established and displayed. This may help to decide whether any further discussion and detailed scoring of aspects will be needed: there may be a visible consensus for a clear ‘winner’. If there are disagreements, the group decides to go through with the detailed evaluation, and the initial scores are kept for later comparison with the final results. using common worksheets or spreadsheets of the aspect tree, for panel members to fill in their weighting and quality scores. This step may involve the drawing of ‘criterion functions’ (ref. the section of evaluation criteria and criterion functions) to explain how each participant’s quality judgments depend on (objective) criteria or performance measures. These diagrams may be discussed by the panel. They should be considered each panel member’s subjective basis of judgment (or representation of the interests of factions in the population of affected parties). However, some such functions may be the mandatory official regulations (such as building regulations). The temptation to urge adoption of common (group) functions (‘for simplicity and expression of ‘common purpose’) should be resisted to avoid possible bias towards the interests of some parties at the expense of others.

Each group member will then fill in the scores for all aspects and sub-aspects etc. The results will be compiled, and the statistics compared; extreme differences in the scoring will be discussed, and members given the chance to change their assessments. This step may be repeated as needed (e.g. until there are no further changes in the judgments).

The results are calculated and the group recommendation determined according to the agreed-upon decision criterion. The ‘deliberated’ individual overall scores are compared with the members’ initial ‘offhand’ scores. The results may cause the group to revise the aspects, weights, or criteria, (e.g. upon discovering that some critical aspect has been missed), or call for changes in the plan, before determining the final recommendation or decision (again, according to the initial procedural agreements).

The steps are summarized in the following ‘flow chart’.

Evalmap15 FormalevalEvaluation example 1: Steps of a ‘Group Formal Quality Evaluation’

Questions related to this version of a formal evaluation process may include the issue of potential manipulation of weight assignments by changing the steepness of the criterion junction.
Ostensibly, the described process aims at ‘giving due consideration’ to all legitimately ‘pertinent’ aspects, while eliminating or reducing the role of ‘hidden agenda’ factors. Questions may arise whether such ‘hidden’ concerns might be hidden behind other plausible but inordinately weighted aspects. A question that may arise from discussions and argumentation about controversial aspects of a plan and the examination of how such arguments should be assessed (ref. the section on a process for Evaluation of Planning Arguments) is the role of plausibility judgments about the premises of such arguments: esp. the probability of assumption claims that a plan will actually result in a desired or undesired outcome (an aspect). Should the ‘quality’ assessment’ process include a modification of quality scores based on plausibility / probability scores, or should this concern be explicitly included in the aspect list?

The process may of course seem ‘too complicated’, and if done by ‘experts’, invite critical questions whether the experts really can overcome their own interests, bias and preconceptions to adequately consider the interests of other, less‘expert’ groups. The procedure obviously assumes a general degree of cooperativeness in the panel, which sometimes may be unrealistic. Are more adequate provisions needed for dealing with incompatible attitudes and interests?

Other questions? Concerns? Missing considerations?

–o–

EVALUATION IN THE PLANNING DISCOURSE — AGGREGATION

An effort  to clarify the role of evaluation in the planning process.

Thorbjørn Mann

THE AGGREGATION PROBLEM:

Getting Overall Judgments from Partial Judgments

The concept of ‘deliberation’ was explained, in part, as the process of ‘making overall judgments a function of partial judgments’. We may have gone through the process of trying to explain our overall judgment about something to others, or made the effort of ‘giving due consideration’ to all aspects of the situation, we arrived at a set of partial judgments. Now the question becomes: just how do we‘assemble’ (‘aggregate’) these partial judgments into the overall judgment that can guide us in making the decision, for example, to adopt or reject the proposed plan.

The discussion has already gone past the level of familiar practices such as merely counting the number of supporting and opposing ‘votes’ and even some well-intentioned approaches that begin to look at the number of explanations (arguments or support statements) in the ‘breadth‘ (number of different aspects brought up by each supporting or opposing party, and ‘depth‘ — the number of levels of further support for the premises and assumptions of the individual arguments.

The reason why these approaches are not satisfying is that neither of them even begin to consider the validity, truth and probability (or more generally: plausibility), weight or relevance of any of the aspects discussed, or whether the judgments about any such aspects or justifications even have been ‘duly considered’ and understood.

Obviously, it is the content merit, validity, the ‘weight’ of arguments etc. we try to bring to bear on the decision. Do we have better, more ‘systematic’ ways to do this than Ben Franklin’s suggestion? (He recommended to write up the pros and cons in two different columns on a sheet of paper, then look at pairs of pros and cons that carry approximately equal weight and cancel each other out, and cross those pairs out, until there are the remaining arguments left that do not have any opposing reasons in the opposite column: those are the ones that should tilt the decision towards approval or rejection.)

What we have, on the one hand, is the impressively quantitative ‘Benefit/Cost’ approach, that works by assigning monetary value to all the b e n e f i t s of a proposed plan (the ‘pro’ arguments), and compare those with the monetary value of the ‘c o s t’ of implementing it. It has run into considerable criticism, mainly for the reasons that the ‘moral’ reluctance of having to assign monetary value to people’s health, happiness, lives; the fact that the approach usually has to be done by ‘experts’, not by citizens or affected groups, and from the overall point of view of some overall ‘common good’ perspective that is the usually ‘biased’ perspective of the government currently in power, that may not be shared by all segments of society, because it tends to hide the issue of the distribution of benefits and costs: inequality.

On the other hand, we have the approaches that separate the ‘description’ of the evaluated plan or object to be evaluated from the perceived ‘goodness’ (‘quality’) judgments about the plan and its expected outcome, from the‘validity’ (plausibility, probability) of the statements (arguments) conveying the claims about those outcomes. And, so far, the assumption that ‘everybody‘ including all ‘affected’ parties can make such judgments and ‘test’ their merit in a participatory discourse. What is still missing are the possible ways in which they can be ‘aggregated’ into overall judgments and guiding measures of merit for the decision– first, for individuals, and then for any groups that will have to come to a commonly supported decision. This is the topic to be discussed under the heading of ‘aggregation’ and ‘aggregation functions’ — the rules for getting ‘overall’ judgments from partial judgments and ‘criterion function’ results.

It turns out that there are different possible rules about this, assumptions that must be agreed upon in each evaluation situation, because they result in different decisions: The following are some considerations about assumptions or expectations for ‘aggregation functions (suggested in H. Rittel’s UC Berkeley lectures on evaluation, and listed in H. Dehlingers article  “Deontische Fragen: Urteilsbildung und Bewertungssysteme”  in “DIe methodische Bewertung: Ein Instrument des Architekten”  Festschrift zum 65. Geburtstag von Prof. Arne Musso, TU Berlin, 1993):

Possible expectation considerations for aggregation functions:

1 Do we wish to arrive at a single overall judgment (of quality / goodness or plausibility etc.) — one that can help us distinguish between e.g. plan alternatives of greater or lesser goodness?

2 Should the judgments be expressed on a commonly agreed-upon judgment scale whose end points and interim values ‘mean’ the same for all participants in the exercise? For example, should we agree that the end points of a ‘goodness’ judgment scale should mean ‘couldn’t possibly be better’ and ‘couldn’t possibly be worse’, respectively; and that there should be a ‘midpoint ‘ meaning’ neither good nor bad; indifferent; or ‘don’t know, can’t make a judgment’? (Most judgments scales in practice are expressed on a ‘zero to some ‘one-directed’ scale such as zero to some number.)

3 Should the judgment scale be the same at all levels of the aspect tree, to maintain consistency of the meaning of scores at all levels? So any equations for the aggregation functions should be designed to produce the respective overall judgment at the next higher level to be a score on the same scale.

4 Should the aggregation function ensure that if a partial score is improved, the resulting overall score should also be higher or the same, but not lower (‘worse’) than the unimproved score? By the same rule, the overall score should not be better than the previous score, if one of the partial judgments becomes lower than before.
This expectation means that in a criterion function, the line showing the judgments cores should be steadily declining and decreasing, but not have sudden spikes or valleys.

5 Should the overall score be the highest one (say, +3 = ’couldn’t be better’, on a +3/-3 scale) only if all partial scores are +3?

6 Should the overall score be a result of ‘due consideration’ of all the partial scores?

7a Should the overall score be ‘couldn’t be worse’ (e.g. -3 on the +3/-3 scale) if all partial scores are -3?
Or
7b Should the overall score become -3 if one of the partial scores becomes -3 and thus unacceptable?

Different functions — equations of ‘summing up partial judgments — will be needed for this. There will be situations or tasks in which aggregation functions meeting expectation 7b may be needed. There is no one aggregation function meeting all these expectations. Thus, the choice of aggregation functions must be discussed and agreed upon in the process.

Examples:

‘Formal’ Evaluation process for Plan ‘Quality’

Individual Assessment

The aggregation functions that can be considered for individual ‘quality’ evaluation (deliberating goodness judgments, aspect trees, and criteria i what may be called ‘formal evaluation procedures’) include the following:

Type I:    ‘Weigthed average’ function:    Q = ∑ (qi * wi)
                                                                       
where Q is the overall deliberated ‘quality’ or ‘goodness’ score; qi is the partial score of aspect or sub-aspect i, n is the number of aspects at that level; wi is the weight of relative importance of aspect i, on a scale of 0 ≤ wi ≤ 1 and such that ∑wi = 1. This is needed to ensure that Q will be on the same scale (and the associated meaning of the resulting judgment score the same) as q.

This function does not meet expectation 7b; it allows ‘poor scores’ on some aspects to be compensated for by good scores on other aspects.

Type II a:  (“the chain as strong as its weakest link” function):      Q = Min (qi)

Type IIb:        Q = ∏ ((qi + u) ^wi ) – u
                       
Here, Q is the overall score, qi the partial score i of n aspects, and u is the extreme value of the judgment score (e.g. 3 in the above examples). This function, (multiplying all the components of (qi + u) with the exponent of their weights wi, and then subtracting u from the result to get the overall score back to the +3/-3 scale) acts much like the type I function as long as all the scores are in the positive range, but pulls the overall score the closer to -u , the lower one of the scores comes to – u, the ‘unacceptable’ performance or quality. (Example: if the structural stability of a building does not stand up against expected loads, it does not matter how otherwise functionally adequate or aesthetically pleasing it is: its evaluation should express that it should not be built.)

Group assessments:

Individual scores from these functions can be applied to get statistical ‘Group’ indicators GQ : for example:

GQ = 1/m ∑ Qj
This is the average or mean of all individual Qj scores for all m participants j.

GQ = Qj
This takes the judgment of one group member as the group score.

GQ = Min (Qj)
The group score is equal to the score of the member with the lowest score in the group; both these functions effectively make one participant the ‘dictator’ of the group…

Different functions should be explored that, for example, would consider the distribution of the improvement of scores for a plan, compared with the existing or expected situation the plan is expected to remedy. For example, the form of aggregation function type IIb could also be used for group judgment aggregation.

The use of any of these aggregated, (‘deliberated’ ) judgment scores as a ‘direct’ guiding measure of performance determining the decision c a n n o t be recommended: they should be considered decision guides, not determinants. For one, the expectation of ‘due consideration of all aspects‘ would require complete knowledge of all consequences of a plan and causes of the problem it aims to fix — an expectation that must be considered unrealistic in many situations but especially in ‘wicked’ problems or ‘messes’. There, decision-makers must be willing to assume responsibility for the possibility of being wrong — a condition impossible to deliberate, by definition, when caused by ignorance of what we might be wrong about.

Aggregation functions for developing overall ‘Plan plausibility’ judgment
from the evaluation of ‘pro’ and ‘con’ arguments.

Plausibility judgments

It is necessary to reach agreements about the use of terms for the merit of judgments about plans as derived from argument evaluation, because the evaluation task for planning arguments is somewhat different from the assessment usually applied to arguments. Traditionally, the purpose of argument analysis and evaluation is seen as that of verifying whether a claim — the ‘conclusion’ of an argument — is true or false, and this is seen as depending on the truth of the premises of the argument and the ‘validity’ of the form or pattern or ‘inference rule’ of the argument. These criteria do not apply to planning arguments, that can generally be represented as follows: (Stating the ‘conclusion’ — the claim about a proposed plan A first:)

Plan A ought to be implemented
because
Plan A will result in outcome B, (given or assuming conditions C);
and
Outcome B ought to be aimed for / pursued;
and
Conditions C are given (or will be when the plan is implemented)

Like many arguments studied by traditional logic and rhetoric, not all argument premises are stated explicitly in discussions; some being assumed as ‘taken for granted’ by the audience: ‘Enthymemes’. But to evaluate these arguments, all premises must be stated and considered explicitly.

This argument pattern — and its variations due to different constellations of assertion or negation of different premises — does not conform to the validity conditions for ‘valid’ arguments in the formal logic sense: it is, at best inconclusive. Its premises cannot be established as ‘true or false‘ — the proposed plan is discussed precisely because it as well as the outcomes B aren’t there (‘true’) yet. This also means that some of the premises — the factual-instrumental claim ‘If A is implemented, then B will happen, given C) and the claim ‘C will be present’ are estimates or predictions qualified as probabilities. And ‘B ought to be pursued’ as well as the conclusions ‘A ought to be implemented) are neither adequately called ‘probable’ nor true or false: the term ‘plausible’ seems more fitting at least for some participants, but not necessarily for all. Indeed: ‘plausible’ judgments may be applied to all the claims, with the different interpretations easily understood to each kind. This is is a matter of degrees, not a binary yes/no quality. And unlike the assessment of factual and even probability claims in common logic argumentation studies, the ‘conclusion’ (decision to implement) is not determined by a single ‘clinching’ argument: it rests on several or many ‘pros and cons’ that must be weighed against each other. That is the evaluation task for planning argumentation, that will lead to different ‘aggregation’ tools.

The logical structure of planning argumentation can be stated in simplified for as follows:

– An individual’s overall plausibility judgment of plausibility PLANPL is a function of the ‘weight’ Argw of the various pro and con arguments raised about the proposal.
– The argument weight is a function of the argument’s plausibility Argpl and the weight of relative importance w of its deontic (ought-) premise.
– The Argument plausibility Argpl is a function of the plausibility of its premises.

Examples of aggregation functions for this process might be the following:
                                                   
1. a Argument plausibility:        Argpli = ∏ {Premplj} for all n premises j.

Or  

1.b   Argpli = Min{ Premplj}

2.    Argument weight:               Argwi = Argpli * wi with 0 ≤ wi and ∑ wi = 1
for the ought-premises of all m arguments

3. Proposal plausibility PLANPL = ∑ Argwi
                                               

Aggregation functions for Group judgment statistics: (Similar to the Quality group aggregations)

Mean Group average plausibility   GPLANPL = 1/k ∑ PLANPLp for all k participants p.                                                  

There are of course other statistical measures of the set of individual plausibility judgments that can be examined and discussed. Like the ‘Quality’ Aggregated measures, these ‘Group’ plausibility statistics should not be used as decision determinants but as guides, for instance as indicators of need for further discussion and explanation of judgment differences, or for revision of plan details to alleviate concerns leading to large judgment differences.
Evalmap11 Aggregation

Comments? Additions?

–o–