Archive for the 'Rittel' Category

EVALUATION IN THE PLANNING DISCOURSE — TIME AND EVALUATION OF PLANS

An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, February 2020

TIME AND EVALUATION OF PLANS  (Draft, for discussion)

Inadequate attention to time in current common assessment approaches

Considering that the evaluation of plans (especially ‘strategic’ plans) and policy proposals is by its very nature concerned with the future, it is curious that the role of time has not received more attention, even with the development of simulation techniques that aim at tracking the behavior of key variables of systems over many years into the future. The neglect of this question, for example in the education of architects, can be seen in the practice of judging students’ design project presentations on the basis of their drawings and models.

The exceptions — for example in building and engineering economics — look at very few performance variables, but with quite sophisticated techniques: expected cost of building projects, ‘life cycle cost’, return on investment etc., to be put into relation to expected revenues and profit. Techniques such as ‘Benefit/Cost Analysis’, which in its simplest form treats those variables as realized immediately upon implementation, can also apply this kind of analysis to forecasting costs and benefits and comparing them over time, by methods for converting initial amounts (of money) to ‘annualized’ or future equivalents, or vice versa.

Criticism of such approaches amounts to pointing out problems such as having to convert ‘intangible’ performance aspects (like public health, satisfaction, loss of lives) into money amounts to be compared (raising serious ethical questions), or, for entities like nations, that the money amounts drawn from or entering the national budget hide controversies such as inequities in the distribution of the costs and benefits. Looking at the issue from the point of view of other evaluation approaches might at least identify the challenges in the consideration of time in the assessment of plans, and help guide the development of better tools.

A first point: from the perspective of the formal evaluation process (see e.g. the previous section on the Musso/Rittel approach), measures like the present value of future costs or profits, or the benefit-cost ratio, must be considered ‘criteria’ (measures of performance) for more general evaluation aspects: aspects that each evaluator must weight for relative importance among the whole set of (goodness) evaluation aspects, to make up overall ‘goodness’ or quality judgments. (See the segments on evaluation judgments, criteria and criterion functions, and aggregation.) As such, the use of these measures alone as decision criteria must be considered incomplete and inappropriate. However, in those broader approaches, the time factor is usually not treated with even the attention expressed in the above tools for discounting future costs and benefits to comparable present worth: for example, pro or con arguments in a live verbal discussion about expected economic performance often amount to mere qualitative comparisons or claims like ‘over the budget’ or ‘more expensive in the long run’.

Finally, in approaches such as the Pattern Language (which makes valuable observations about the ‘timeless’ quality of built environments, but does not consider explicit evaluation a necessary part of the process of generating such environments), there is no mention or discussion of how time considerations might influence decisions: the quality of designs is guaranteed by their having been generated by the use of patterns, but the efforts to describe that quality do not include consideration of the effects of solutions over time.

Time aspects calling for attention in planning

Assessments of undesirable present or future states ‘if nothing is done’

The implementation of a plan is expected to bring about changes to states of affairs that are felt to be ‘problems’ — things not being as they ought to be — or ‘challenges’ and ‘opportunities’ calling for better, improved states of affairs. Many plans and policies aim at preventing future developments from occurring, either as distinctly ‘sudden’ events or as developments over time. Obviously, the degree of undesirability depends on the expected severity of these developments; they are matters of degree that must be predicted in order for the plan’s effectiveness to be judged.

The knowledge that goes into estimates of future change comes from experience: observation of the pattern and rate of change in the past (even if that knowledge is taken to be well enough established to be considered a ‘law’). But not all such tracks of change have been well enough observed and recorded, so much estimation and judgment already goes into the assumptions about changes over time in the past.

Individual assessments of future plan performance

Our forecasts of future changes ‘if nothing is done’, resting on such shaky past knowledge, must be considered less than 100% reliable. Should our confidence in the application of that knowledge to estimates of a plan’s future ‘performance’ then not be acknowledged as equal at best, or arguably less certain, deserving a lower ‘plausibility’ qualifier? This would be expressed, for example, with the pl — plausibility — judgment for the relationship claimed in the factual-instrumental premise of an argument about the desirability of the plan effects: “Plan A will result (by virtue of the law or causal relationship R) in producing effect B”.

This argument should be (but often is not) qualified by adding the assumption ‘given the conditions C under which the relationship R will hold’: the conditions which the third premise (the factual claim) of the ‘standard planning argument’ asserts are — or will be — ‘given’.

Note ‘will be’: since the plan will be implemented in the future, this premise also involves a prediction. And to the extent the condition is not a stable, unchanging one but a changing, evolving phenomenon, the degree of the desirable or undesirable effect B must be expected to change. And, to make things even more interesting and complex, as explained in the sections on argument assessment and systems modeling: the ‘condition’ is never adequately described by a single variable, but actually represents the evolving state of the entire ‘system’ in which the plan will intervene.

This means that when two people exchange their assumptions, judgments and opinions about the effectiveness of the plan by citing its effect on B, they may well have very different degrees (or performance measures) in mind, occurring under very different assumptions about both R and C, at different times.

Things become fuzzier still when we consider the likelihood that the desired or undesired effects will not change things overnight, but gradually, over time. So how should we make evaluation judgments about competing plan alternatives when, for example, one plan promises rapid improvement soon after implementation (as measured by one criterion) but then slows down or even starts declining, while the other will improve at a much slower but more consistent rate? A mutually consistent evaluation must be based on agreed-upon measures of performance: measured at what future time? Over what future time period, aka ‘planning horizon’? And this question applies not just to the prediction of the performance criterion — what about the plausibility and weight-of-importance judgments we need to offer a complete explanation of our judgment basis? Is it enough to apply the same plausibility factor to forecasts of trends decades in the future as the one we use for near-future predictions? As discussed in the segment on criteria, the crisp, fine forecast lines we see in simulation printouts are misleading: the line should really be a fuzzy track widening more and more, the farther out in time it extends. Likewise: is it meaningful to use the same weight of relative importance for the assessment of effects at different times?

These considerations apply, so far, only to the explanation of individual judgments, and already show that it would be almost impossible to construct meaningful criterion functions and aggregation functions to get adequately ‘objectified’ overall deliberated judgment scores for individual participants in evaluation procedures.

Aggregation issues for group judgment indicators

The time-assessment difficulties described for individual judgments do not diminish in the task of constructing decision guides for groups based on the results of individual judgment scores. Reminder: to meet the ideal ‘democratic’ expectation that the community decision about a plan should be based on due consideration of ‘all’ concerns expressed by ‘all’ affected parties, the guiding indicator (‘decision guide’ or criterion) should be an appropriate aggregation statistic of all individual overall judgments. The above considerations show, to put it mildly, that it is difficult enough to aggregate individual judgments into overall judgment scores, but even more so to construct group indicators that are based on the same assumptions about the time qualifiers entering the assessments.

This makes it understandable (but not excusable) why decision-makers in practice tend either to screen out the uncomfortable questions about time in their judgments, or to resort to vague ‘goals’ measured by vague criteria to be achieved within arbitrary time periods: “Carbon-emission neutrality by 2050”, for example. How to choose between different plans or policies whose performance simulation forecasts do not promise 100% achievement of the goal, but only ‘approximations’ with different interim performance tracks, at different costs and other side-effects in society? But 2050 is far enough in the future to ensure that none of the decision-makers for today’s plans will be held responsible for today’s decisions…

‘Conclusions’?

The term ‘conclusion’ is obviously inappropriate if it suggests settled answers to the questions discussed. These issues have just been raised, not resolved, which means that more research, experiments and discussion are called for to find better answers and tools. For the time being, the best recommendation that can be drawn from this brief exploration is that the decision-makers for today’s plans should routinely be alerted to these difficulties before making decisions, carry out the ‘objectification’ process for the concerns expressed in the discourse (facilitating, of course, a discourse with wide participation adequate to the severity of the challenge of the project), and then admit that any high degree of ‘certainty’ for proposed decisions is not justified. Decisions about ‘wicked problems’ are more like ‘gambles’ for which responsibility (‘accountability’) must be assumed. If official decision-makers cannot assume that responsibility — as expressed in ‘paying’ for mistaken decisions — should they seek supporters to share that responsibility?

So far, this kind of talk is just that: mere empty talk, since there is at best only the vague and hardly measurable ‘reputation’ available as the ‘account’ from which ‘payment’ can be made — in the next election, or in history books. That does not prevent reckless mistakes in planning decisions: there should be better means for making the concept of ‘accountability’ more meaningful. (Some suggestions for this are sketched in the sections on the use of ‘discourse contribution credit points’ earned by decision-makers or contributed by supporters from their credit point accounts, and made the required form of ‘investment payment’ for decisions.) The needed research and discussion of these issues will have to consider new connections between the factors involved in evaluation for public planning.


Overview

— o —

EVALUATION IN THE PLANNING DISCOURSE: PROCEDURE EXAMPLE 2: EVALUATION OF PLANNING ARGUMENTS


An effort to clarify the role of deliberative evaluation in the planning and policy-making process. Thorbjørn Mann, January 2020. (Draft)

PROCEDURE EXAMPLE 2:
EVALUATION OF PLANNING ARGUMENTS (PROS & CONS)

Argument evaluation in the planning discourse

Planning, like design, can be seen as an argumentative process (Rittel): ideas and proposals are generated, and questions are raised about them. The typical planning issues — especially the ‘deontic’ (ought-) questions about what the plan ought to be and how it can be achieved — generate not only answers but arguments: the proverbial ‘pros and cons’. The information needed to make meaningful decisions — based on ‘due consideration’ of all concerns by all parties affected by the problem the plan is aiming to remedy, as well as by any solution proposals — often comes mainly via those pros and cons. Taking this view seriously, it becomes necessary to address the question of how those arguments should be evaluated or ‘weighed’. After all, those arguments support contradictory conclusions (claims), so just ‘considering’ them is not quite enough.

Argumentation as a cooperative rather than adversarial interaction

The very concept of the ‘argumentative view’ of planning is somewhat controversial, because many people misunderstand ‘argument’ itself as a nasty adversarial, combative, uncooperative phenomenon, a ‘quarrel’. (I have suggested the label ‘quarrgument’ for this.) But ‘argument’ is originally understood as a set of claims (premises) that together support another claim, the ‘conclusion’. For planning, arguments are items of reasoning that explore the ‘pros and cons’ about plans; and an important underlying assumption is that we ‘argue’ — exchange arguments with others — because we believe that the other will accept or consider the position about the plan we are talking about because the other already believes or accepts the premises we offer, or will do so once we offer the additional support we have for them. It is unfortunate that even recent research on computer-assisted argumentation seems to be stuck in the ‘adversarial’ view of arguments, seeing arguments as ‘attacks’ on opposing positions rather than as a cooperative search for a good planning response to problems or visions for a better future.

‘Planning arguments’

There is another critical difference between the arguments discussed in traditional logic textbooks and the kinds I call ‘planning arguments’: the traditional concern of argumentation was to establish the truth or falsity of claims about the world, with the expectation that the discussion — the assessment of arguments — will ‘settle’ that question in favor of one side or the other. This does not apply to planning arguments: the planning decision does not rest on single ‘clinching’ arguments but on the assessment of the entire set of pros and cons. There are always real expected benefits and real expected costs, and as the proverbial saying has it, they must be ‘weighed’ against one another to lead to a decision. There has not been much concern about how that ‘weighing’ can or should be done, and how that process might lead to a reasoned judgment about whether to accept or reject a proposed plan. I have tried to develop a way to do this — a way to explain what our judgments are based on — beginning with an examination of the structure of ‘planning arguments’.

The structure of planning arguments and their different types of premises

I suggest that planning arguments can be represented in the following general ‘standard planning argument’ form, the simplest version being this ‘pro’ argument pattern:

Proposal ‘ought’ claim (‘conclusion’):  Proposal PLAN A ought to be adopted
because
1. Factual-instrumental premise:         Implementing PLAN A will lead to outcome B
                                                                     given conditions C
and
2. Deontic premise:                                  Outcome B ought to be pursued;
and
3. Factual premise:                                  Conditions C are (or will be) given.

This form is not ‘valid’ in the formal logic sense; formally, it must be considered ‘inconclusive’ and ‘defeasible’. There are usually many such pros and cons supporting or questioning a proposal: no single argument (other than evidence pointing out flaws of logical inconsistency or lacking feasibility, leading to rejection) will be sufficient to make a decision. Any evaluation of planning arguments must therefore be embedded in a ‘multi-criteria’ analysis and an aggregation of judgments into the overall decision.

It will become evident that all the judgments people make here are personal, ‘subjective’ judgments — not only about the deontic (ought) premise, but even about the validity and salience of the ‘factual’ premises: they are all estimates about the future, not yet validated by observation and measurement.

The judgment types of planning argument premises:
‘plausibility’ and weight of importance

Two kinds of judgments will be needed. The first is an assessment of the ‘plausibility’ of each claim. The term ‘plausibility’ here includes the familiar ‘truth’ (or degree of certainty or probability about the truth) of a claim, as well as the advisability, acceptability or desirability of the deontic claim. It can be expressed as a judgment on a scale, e.g. of -1 to +1, with -1 meaning ‘complete implausibility’, +1 expressing ‘total plausibility’ (virtual certainty), and the center point of zero meaning ‘don’t know, can’t judge’. The second is a judgment about the ‘weight of relative importance’ of the ‘ought’ aspect. It can be expressed e.g. by a score between zero (meaning ‘totally unimportant’) and +1 (meaning ‘totally important’, overriding all other aspects); the sum of all the weights of deontic premises must be equal to +1.

Argument plausibility

The first step would be the assessment of the plausibility of each single argument as a whole, as a function of its three premise plausibility scores, resulting in an ‘argument plausibility’ score.

For example, an argument i with pl(1) = 0.5, pl(2) = 0.8, and pl(3) = 0.9 might get an argument plausibility of Argpl(i) = 0.5 x 0.8 x 0.9 = 0.36.
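As a minimal sketch in Python (assuming the multiplicative rule used in the example; the function name is illustrative):

    from math import prod

    def argument_plausibility(premise_pls):
        """Multiplicative aggregation of premise plausibilities (each judged
        on the -1..+1 scale) into a single argument plausibility."""
        return prod(premise_pls)

    print(argument_plausibility([0.5, 0.8, 0.9]))  # 0.36 (up to float rounding)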

Argument weight of relative importance

The second step would be to assess the ‘argument weight’ of each argument, which can be done by multiplying the weight of relative importance of its deontic premise (premise 2 in the pattern above) by the argument plausibility:    Argw(i) = Argpl(i) x w(i).
That weight will again be a value between zero (meaning ‘totally unimportant’) and +1 (meaning ‘all-important’, i.e. overriding all other considerations). It should result from establishing a ‘tree’ of deontic concerns (similar to the ‘aspects’ of the ‘Formal Evaluation’ procedure in procedure example 1) that gives each deontic claim its proper place as a main aspect, sub-aspect, sub-sub-aspect or ‘criterion’ in the aspect tree, and from assigning weights between 0 and 1 such that they add up to 1 at each level.

A deontic claim located at the second level of the aspect tree, having been assigned a weight of 0.8 at that level, and being a sub-aspect of a first-level aspect with a weight of 0.4 at its level, would have a premise weight of w = 0.8 x 0.4 = 0.32. With an argument plausibility of 0.36, the argument weight would be Argw(i) = 0.36 x 0.32 = 0.1152 (rounded to 0.12).

Plan plausibility

All the argument weights could then be aggregated into an overall ‘plan plausibility’ score, for example by adding up all argument weights:
Planpl = ∑ Argw(i) for all arguments i (of an individual participant)

Of course, there are other possible aggregation forms. (See the sections on ‘Aggregation’ and ‘Decision Criteria’.) Which of these should be used in any specific case must be specified — agreed upon — in the ‘procedural agreements’ governing each planning project.
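Continuing the worked example, a sketch of these two steps and the final summation (the weight propagation down the aspect tree and the simple sum follow the description above; the extra argument weights in the last line are invented):

    from math import prod

    def premise_weight(path_weights):
        """Weight of a deontic premise: the product of the weights assigned
        at each aspect-tree level on the path from the root to its position."""
        return prod(path_weights)

    def argument_weight(arg_pl, w):
        """Argument weight = argument plausibility x deontic premise weight."""
        return arg_pl * w

    def plan_plausibility(arg_weights):
        """One possible aggregation: the sum of all argument weights
        of one individual participant."""
        return sum(arg_weights)

    w = premise_weight([0.4, 0.8])   # 0.32, as in the worked example
    aw = argument_weight(0.36, w)    # 0.1152, i.e. 0.12 rounded
    print(round(aw, 2))
    # plus two invented further weights, one 'con', one 'pro':
    print(plan_plausibility([aw, -0.05, 0.08]))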

It should be noted that in a worksheet simply listing all arguments with their premises for plausibility and weight assignments, there is no need to identify arguments as ‘pro’ or ‘con’, as intended by their respective authors. Any argument given a negative premise plausibility by a participant will automatically end up with a negative argument weight, thus becoming a ‘con’ argument for that participant — even if it was intended by its author as a ‘pro’ argument. This makes it obvious that all such assessments are individual, subjective judgments, even if the factual and factual-instrumental premises of arguments are considered ‘objective-fact’ matters.

The process of evaluation of planning arguments within the overall discourse

The diagram below shows the argument assessment process as it will be embedded in an overall discourse. Its central feature is the ‘Next Step?’ decision, invoked after each major activity. It lets the participants in the effort decide — according to rules specified in those procedural agreements — how deeply into the deliberation process they wish to proceed: they could decide to go ahead with a decision after the first set of overall offhand judgments, skipping the detailed premise analysis and evaluation if they feel sufficiently certain about the plan.

Process of argument assessment within the overall discourse

The use of overall plan plausibility scores:
Group statistics of the set of individual plan plausibility scores.

It may be tempting to use the overall plan plausibility scores directly as decision guides or determinants: for example, to compute a statistic such as the average of all individual scores Planpl(j) of the participants j in the assessment group as an overall ‘group plausibility score’ GPlanpl, e.g. GPlanpl = 1/n ∑ Planpl(j) for all n members of the panel.

And in evaluating a set of competing plan alternatives: to select the proposal with the highest ‘group plausibility’ score.
Such temptations should be resisted, for a number of reasons: whether a discussion has succeeded in bringing in all pertinent items that should be given ‘due consideration’; the concern that planning arguments tend to be of a ‘qualitative’ nature and often don’t easily address quantitative measures of performance; questions regarding principles and the time frame of expected plan effects and consequences; whether and how issues of ‘quality’ of a plan are adequately addressed in the form of arguments; and the question of the appropriate ‘social aggregation’ criterion to be applied to the problem and plan in question. Many questions remain open:

Open questions

Likely incompleteness of the discussion
It is argued that participation of all affected parties in a live discussion is more likely to bring out the concerns people are actually worried about than, say, reliance on general textbook knowledge by panels or surveys made up by experts who ‘don’t live there’. But even the assumption that the discussion guarantees complete coverage is unwarranted. For example, is somebody likely to raise an issue about a plan feature that they know will affect another party negatively (when they expect the plan to be good for their own faction) — if the other party isn’t aware enough of this effect, and does not raise it? Likewise, some things may be considered so much a matter ‘of course’ that nobody thinks it necessary to mention them. So unless the overall process includes several different means of getting such information — systems modeling, simulation, extensive scrutiny of other cases etc. — the argumentative discussion alone can’t be assumed to be sufficient to bring up all needed information.

Quantitative aspects in arguments.
The typical planning argument will usually be framed in ‘qualitative’ terms rather than quantitative measures. For example, in an argument that “the plan will be ‘more sustainable’ than the current situation”, this matters for the plausibility assessment: the claim can be seen as quite plausible as long as there is some evidence of sustainability improvement, so participants may be inclined to give it a high pl-score close to +1. By comparison, if somebody makes the same argument but claims a specific ‘sustainability’ performance measure — one that others may consider too optimistic, and therefore assign a plausibility score closer to zero or even slightly negative — how will that affect the overall assessment? What procedural provisions would be needed to deal adequately with this question?

The issue of ‘quality’ or ‘goodness’ of a proposed solution.
It is of course possible for a discussion to examine the quality or ‘goodness’ of a plan in detail, but as mentioned above, this will likely also be in general, qualitative terms, and is often even avoided due to the general acceptance of sayings like ‘you can’t argue about beauty’. So the discussion will have some difficulty in this respect, if it mentions beauty at all, or spiritual value, or the appropriateness of the resulting image. Likewise, requirements for the implementation of the plan, such as meeting regulations, may not be discussed.

The decreasing plausibility ‘paradox’
Arguably, all ‘systematic’ reasoning efforts, including discussion and debate, aim at giving decision-makers a higher degree of certainty about their final judgment than, say, fast offhand intuitive decisions. However, it turns out that the greater the depth and breadth of the discussion, the closer the final plausibility judgment scores will tend to end up to the ‘zero’ (‘don’t know’) plausibility — if the plausibility assessment is done honestly and seriously, and the aggregation method suggested above is used: multiplying the plausibility assessments for the various premises (which for the factual premises will be probability estimates). These judgments being all about future expectations, they cannot honestly be given +1 (‘total certainty’) scores or even scores close to it — the less so, the farther out in the future the effects are projected. This result can be quite disturbing and even disappointing to many participants, when final scores are compared with initial ‘offhand’ judgments.
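A small numerical illustration of this effect, assuming the multiplicative rule suggested above: premises that are each judged quite plausible at 0.9 still pull the product toward zero as the number of premises examined grows.

    pl = 0.9
    for n in (1, 3, 5, 10):
        print(n, round(pl ** n, 2))   # 1: 0.9   3: 0.73   5: 0.59   10: 0.35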
Other issues related to time have often been inadequately dealt with in evaluation of any kind:

Estimates of plan consequences over time
All planning arguments express people’s expectations of the plan’s effects in the future. Of course, we know that there are relatively few cases in which a plan or action will generate results that materialize immediately upon implementation and then stay that way. So what do we mean when we offer an argument that a plan ‘will improve society’s overall health’ — even resorting to ‘precise’ statistical indices like mortality rates or life expectancy? We know that these figures will change over time; one proposed policy will bring more immediate results than another, but the other will have a better effect in the long run; and again, the farther into the future we look, the less certain we must be about our prediction estimates. These things are not easily expressed even in carefully crafted arguments supported by the requisite statistics: how should we score their plausibility?

Tentative insights, conclusions?

These ‘not fully resolved / more work needed’ questions may seem to strengthen the case for evaluation approaches other than trying to draw support for planning decisions from discourse contributions, even with more detailed assessment of arguments than shown here (examining the evidence and support for each premise). However, the problems emerging from the examination of the argumentative process affect other evaluation tools as well; I have not seen approaches that resolve them all more convincingly. So some first tentative conclusions are these: planning debate and discourse — too familiar and accessible to experts and lay people alike to be dismissed in favor of other methods — would benefit from enhancements such as the argument assessment tools; but opportunities and encouragement should also be offered to draw upon other tools, as called for by the circumstances of each case and the complexity of the plans.

These techniques and methods should be made available for use by experts and lay discourse participants in a ‘toolkit’ part of a general planning discourse support platform — not as mandatory components of a general-purpose, one-size-fits-all planning method, but as a repository of tools for creative innovation and expansion. Because plans, as well as the processes that generate plans, define those involved as ‘the creators of that plan’, there will be a need to ‘make a difference’, to make it theirs: by changing, adapting, expanding and using the tools in new and different ways, besides inventing new tools in the process.

References:
Rittel, Horst: “APIS: A Concept for an Argumentative Planning Information System”, Institute of Urban and Regional Development, University of California at Berkeley, 1980. A report about research activities conducted for the Commission of European Communities, Directorate General XIIA.
–o–


EVALUATION IN THE PLANNING DISCOURSE: SAMPLE EVALUATION PROCEDURES EXAMPLE 1: FORMAL ‘QUALITY‘ EVALUATION

Thorbjørn Mann,  January 2020

In the following segments, a few example procedures for evaluation by groups will be discussed, to illustrate how the various parts of the evaluation process can be selectively assembled into a complete process aiming at a decision (or recommendation) about a proposed plan or policy, and to facilitate understanding of how the different provisions and choices related to the evaluation task reviewed in this study can be assembled into practical procedures for specific situations. The examples are not intended as universal recommendations for use in all situations. They all will — arguably — call for improvement as well as adaptation to the specific project and situation at hand.

A common evaluation situation is that of a panel of evaluators comparing a number of proposed alternative plan solutions to select or recommend the ‘best’ choice for adoption — or, if there is only one proposal, to determine whether it is ‘good enough’ for implementation. It is usually carried out by a small group of people assumed to be knowledgeable in the specific discipline (for example, architecture) and reasonably representative of the interests of the project client (which may be the public). The rationale for such efforts, besides aiming for the ‘best’ decision, is the desire to ensure that the decision will be based on good expert knowledge, but also on transparency, legitimacy and accountability of the process — to justify the decision. The outcome will usually be a recommendation to the actual client decision-makers rather than the actual adoption or implementation decision, based on the group’s assessment of the ‘goodness’ or ‘quality’ of the proposed plan, documented in some form. (It will be referred to as a ‘Formal Quality Evaluation’ procedure.)

There are of course many possible variations of procedures for this task. The sample procedure described in the following is based on the Musso-Rittel (1) procedure for the evaluation of the ‘goodness’ or quality of buildings.

The group will begin by agreeing on the procedure itself and its various provisions: the steps to be followed (for example, whether evaluation aspects and weighting should be worked out before or after presentation of the plan or plan alternatives), general vocabulary, judgment and weighting scales, aggregation functions both for individual overall judgments and group indices, and decision rules for determining its final recommendation.

Assuming that the group has adopted the sequence of first establishing the evaluation aspects and criteria against which the plan (or plans) will be judged, the first step will be a general discussion of the aspects and sub-aspects to be considered, resulting in the construction of the ‘aspect tree’ of aspects, sub-aspects, sub-sub-aspects etc. (ref. the section on aspects and aspect trees) and criteria (the ‘objective’ measures of performance; ref. the section on evaluation criteria). The resulting tree will be displayed and become the basis for scoring worksheets.

The second step will be the assignment of aspect weights, on a scale of zero to 1, such that at each level of the ‘tree’ the sum of weights at that level will be 1. Panel members will develop their own individual weightings. This phase can be further refined by applying ‘Delphi Method’ steps: establishing and displaying the mean/median and extreme weighting values, asking the authors of extremely low or high weights to share and discuss their reasoning for these judgments, and giving all members the chance to revise their weights.
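The display step of such a Delphi round could be supported by something as simple as the following sketch (the sample weights are invented):

    from statistics import mean, median

    def weight_statistics(weights):
        """Summarize one aspect's weights across panel members; the authors
        of the extreme values may be asked to explain their reasoning."""
        return {"mean": round(mean(weights), 3), "median": median(weights),
                "low": min(weights), "high": max(weights)}

    print(weight_statistics([0.2, 0.25, 0.3, 0.6]))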

Once the weighted evaluation aspect trees have been established, the next step will be the presentation of the plan proposal or competing alternatives.

Each participant will assign a first ‘overall offhand’ quality score (on the agreed-upon scale, e.g. -3 to +3) to each plan alternative.

The group’s statistics of these scores are then established and displayed. This may help to decide whether any further discussion and detailed scoring of aspects will be needed: there may be a visible consensus for a clear ‘winner’. If there are disagreements, the group decides to go through with the detailed evaluation; the initial scores are kept for later comparison with the final results, using common worksheets or spreadsheets of the aspect tree for panel members to fill in their weighting and quality scores. This step may involve the drawing of ‘criterion functions’ (ref. the section on evaluation criteria and criterion functions) to explain how each participant’s quality judgments depend on (‘objective’) criteria or performance measures. These diagrams may be discussed by the panel. They should be considered each panel member’s subjective basis of judgment (or representation of the interests of factions in the population of affected parties). However, some such functions may be mandatory official regulations (such as building regulations). The temptation to urge adoption of common (group) functions (for ‘simplicity’ and expression of ‘common purpose’) should be resisted, to avoid possible bias towards the interests of some parties at the expense of others.
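As an illustration of what such a criterion function might look like: a sketch of a piecewise-linear function translating an ‘objective’ performance measure (walking distance to the nearest transit stop, an invented example) into a quality judgment on the -3..+3 scale. The break points are arbitrary; each participant (or an applicable regulation) would define their own.

    def quality_score(distance_m):
        """Illustrative criterion function: walking distance to the nearest
        transit stop (meters) mapped to a quality score on the -3..+3 scale."""
        if distance_m <= 100:
            return 3.0                                  # 'couldn't be better'
        if distance_m >= 1000:
            return -3.0                                 # 'couldn't be worse'
        # linear decline between the two break points
        return 3.0 - 6.0 * (distance_m - 100) / 900.0

    print(quality_score(550))   # 0.0: 'neither good nor bad'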

Each group member will then fill in the scores for all aspects and sub-aspects etc. The results will be compiled, and the statistics compared; extreme differences in the scoring will be discussed, and members given the chance to change their assessments. This step may be repeated as needed (e.g. until there are no further changes in the judgments).

The results are calculated and the group recommendation determined according to the agreed-upon decision criterion. The ‘deliberated’ individual overall scores are compared with the members’ initial ‘offhand’ scores. The results may cause the group to revise the aspects, weights, or criteria, (e.g. upon discovering that some critical aspect has been missed), or call for changes in the plan, before determining the final recommendation or decision (again, according to the initial procedural agreements).

The steps are summarized in the following ‘flow chart’.

Evaluation example 1: Steps of a ‘Group Formal Quality Evaluation’ (flow chart)

Questions related to this version of a formal evaluation process include the issue of potential manipulation of weight assignments by changing the steepness of the criterion function.
Ostensibly, the described process aims at ‘giving due consideration’ to all legitimately pertinent aspects, while eliminating or reducing the role of ‘hidden agenda’ factors. Questions may arise whether such ‘hidden’ concerns might be concealed behind other plausible but inordinately weighted aspects. A question arising from discussions and argumentation about controversial aspects of a plan, and from the examination of how such arguments should be assessed (ref. the section on a process for Evaluation of Planning Arguments), is the role of plausibility judgments about the premises of such arguments: especially the probability of claims that a plan will actually result in a desired or undesired outcome (an aspect). Should the ‘quality assessment’ process include a modification of quality scores based on plausibility/probability scores, or should this concern be explicitly included in the aspect list?

The process may of course seem ‘too complicated’, and if done by ‘experts’, invite critical questions about whether the experts really can overcome their own interests, biases and preconceptions to adequately consider the interests of other, less ‘expert’ groups. The procedure obviously assumes a general degree of cooperativeness in the panel, which sometimes may be unrealistic. Are more adequate provisions needed for dealing with incompatible attitudes and interests?

Other questions? Concerns? Missing considerations?

–o–

EVALUATION IN THE PLANNING DISCOURSE — AGGREGATION

An effort to clarify the role of evaluation in the planning process.

Thorbjørn Mann

THE AGGREGATION PROBLEM:

Getting Overall Judgments from Partial Judgments

The concept of ‘deliberation’ was explained, in part, as the process of ‘making overall judgments a function of partial judgments’. Having gone through the process of trying to explain our overall judgment about something to others, or having made the effort of ‘giving due consideration’ to all aspects of the situation, we arrive at a set of partial judgments. Now the question becomes: just how do we ‘assemble’ (‘aggregate’) these partial judgments into the overall judgment that can guide us in making the decision, for example, to adopt or reject the proposed plan?

The discussion has already gone past the level of familiar practices such as merely counting the number of supporting and opposing ‘votes’, and even past some well-intentioned approaches that begin to look at the number of explanations (arguments or support statements): their ‘breadth’ (the number of different aspects brought up by each supporting or opposing party) and ‘depth’ (the number of levels of further support for the premises and assumptions of the individual arguments).

The reason these approaches are not satisfying is that neither of them even begins to consider the validity, truth and probability (or more generally: plausibility), weight or relevance of any of the aspects discussed, or whether the judgments about any such aspects or justifications have even been ‘duly considered’ and understood.

Obviously, it is the content merit, validity, the ‘weight’ of arguments etc. that we try to bring to bear on the decision. Do we have better, more ‘systematic’ ways to do this than Ben Franklin’s suggestion? (He recommended writing up the pros and cons in two columns on a sheet of paper, looking for pairs of pros and cons that carry approximately equal weight and cancel each other out, and crossing those pairs out, until only arguments remain that have no counterpart in the opposite column: those are the ones that should tilt the decision towards approval or rejection.)

What we have, on the one hand, is the impressively quantitative ‘Benefit/Cost’ approach, which works by assigning monetary value to all the b e n e f i t s of a proposed plan (the ‘pro’ arguments) and comparing those with the monetary value of the ‘c o s t’ of implementing it. It has run into considerable criticism, mainly for these reasons: the ‘moral’ reluctance to assign monetary value to people’s health, happiness, lives; the fact that the approach usually has to be carried out by ‘experts’, not by citizens or affected groups; and that it is done from the point of view of some overall ‘common good’ perspective (usually the ‘biased’ perspective of the government currently in power) that may not be shared by all segments of society, because it tends to hide the issue of the distribution of benefits and costs: inequality.

On the other hand, we have the approaches that separate the ‘description’ of the plan or object to be evaluated from the perceived ‘goodness’ (‘quality’) judgments about the plan and its expected outcome, and from the ‘validity’ (plausibility, probability) of the statements (arguments) conveying the claims about those outcomes — so far with the assumption that ‘everybody’, including all ‘affected’ parties, can make such judgments and ‘test’ their merit in a participatory discourse. What is still missing are the possible ways in which they can be ‘aggregated’ into overall judgments and guiding measures of merit for the decision: first for individuals, and then for any groups that will have to come to a commonly supported decision. This is the topic to be discussed under the heading of ‘aggregation’ and ‘aggregation functions’ — the rules for getting ‘overall’ judgments from partial judgments and ‘criterion function’ results.

It turns out that there are different possible rules for this — assumptions that must be agreed upon in each evaluation situation, because they result in different decisions. The following are some considerations about assumptions or expectations for aggregation functions (suggested in H. Rittel’s UC Berkeley lectures on evaluation, and listed in H. Dehlinger’s article “Deontische Fragen: Urteilsbildung und Bewertungssysteme” in “Die methodische Bewertung: Ein Instrument des Architekten”, Festschrift zum 65. Geburtstag von Prof. Arne Musso, TU Berlin, 1993):

Possible expectation considerations for aggregation functions:

1 Do we wish to arrive at a single overall judgment (of quality / goodness or plausibility etc.) — one that can help us distinguish between e.g. plan alternatives of greater or lesser goodness?

2 Should the judgments be expressed on a commonly agreed-upon judgment scale whose end points and interim values ‘mean’ the same for all participants in the exercise? For example, should we agree that the end points of a ‘goodness’ judgment scale should mean ‘couldn’t possibly be better’ and ‘couldn’t possibly be worse’, respectively, and that there should be a midpoint meaning ‘neither good nor bad; indifferent’ or ‘don’t know, can’t make a judgment’? (Most judgment scales in practice are ‘one-directional’, running from zero to some positive number.)

3 Should the judgment scale be the same at all levels of the aspect tree, to maintain consistency of the meaning of scores at all levels? If so, any equations for the aggregation functions should be designed so that the resulting overall judgment at the next higher level is a score on the same scale.

4 Should the aggregation function ensure that if a partial score is improved, the resulting overall score is also higher or the same, but not lower (‘worse’) than before? By the same rule, the overall score should not become better than before if one of the partial judgments becomes lower.
For a criterion function, this expectation means that the curve showing the judgment scores should rise or decline steadily, without sudden spikes or valleys.

5 Should the overall score be the highest one (say, +3 = ’couldn’t be better’, on a +3/-3 scale) only if all partial scores are +3?

6 Should the overall score be a result of ‘due consideration’ of all the partial scores?

7a Should the overall score be ‘couldn’t be worse’ (e.g. -3 on the +3/-3 scale) if all partial scores are -3?
Or
7b Should the overall score become -3 if one of the partial scores becomes -3 and thus unacceptable?

Different functions — equations for ‘summing up’ partial judgments — will be needed for these. There will be situations or tasks in which aggregation functions meeting expectation 7b are needed. No single aggregation function meets all of these expectations; thus, the choice of aggregation functions must be discussed and agreed upon in the process.

Examples:

‘Formal’ Evaluation process for Plan ‘Quality’

Individual Assessment

The aggregation functions that can be considered for individual ‘quality’ evaluation (deliberating goodness judgments, aspect trees, and criteria in what may be called ‘formal evaluation procedures’) include the following:

Type I:    ‘Weighted average’ function:    Q = ∑ (qi * wi)

where Q is the overall deliberated ‘quality’ or ‘goodness’ score; qi is the partial score of aspect or sub-aspect i (of n aspects at that level); wi is the weight of relative importance of aspect i, on a scale of 0 ≤ wi ≤ 1 and such that ∑ wi = 1. This is needed to ensure that Q will be on the same scale (and the associated meaning of the resulting judgment score the same) as qi.

This function does not meet expectation 7b; it allows ‘poor scores’ on some aspects to be compensated for by good scores on other aspects.

Type IIa (“the chain is as strong as its weakest link” function):      Q = Min (qi)

Type IIb:        Q = ∏ ((qi + u) ^ wi) – u

Here, Q is the overall score, qi the partial score of aspect i (of n aspects), and u is the extreme value of the judgment scale (e.g. 3 in the above examples). This function (raising each component (qi + u) to the power of its weight wi, multiplying the results, and then subtracting u to bring the overall score back to the +3/-3 scale) acts much like the Type I function as long as all the scores are in the positive range, but pulls the overall score closer to -u the closer any one of the scores comes to -u, the ‘unacceptable’ performance or quality. (Example: if the structural stability of a building does not stand up against expected loads, it does not matter how otherwise functionally adequate or aesthetically pleasing it is: its evaluation should express that it should not be built.)
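For concreteness, the three function types as a Python sketch (u is the scale extreme, +3 in the examples; the weights are assumed to sum to 1; the sample scores are invented):

    from math import prod

    def q_type_i(q, w):
        """Type I, weighted average: poor scores can be compensated."""
        return sum(qi * wi for qi, wi in zip(q, w))

    def q_type_iia(q):
        """Type IIa, 'weakest link': the lowest partial score decides."""
        return min(q)

    def q_type_iib(q, w, u=3.0):
        """Type IIb, weighted product: any score at -u forces the result to -u."""
        return prod((qi + u) ** wi for qi, wi in zip(q, w)) - u

    q, w = [3, 2, -3], [0.5, 0.3, 0.2]
    print(q_type_i(q, w))    # 1.5  : the -3 is 'compensated'
    print(q_type_iia(q))     # -3
    print(q_type_iib(q, w))  # -3.0 : the 'veto' effect of expectation 7b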

Group assessments:

Individual scores from these functions can be combined into statistical ‘group’ indicators GQ, for example:

GQ = 1/m ∑ Qj
This is the average or mean of the individual scores Qj of all m participants j.

GQ = Qj
This takes the judgment of one group member j as the group score.

GQ = Min (Qj)
The group score is equal to the score of the member with the lowest score in the group. Both of the latter two functions effectively make one participant the ‘dictator’ of the group…

Different functions should be explored that would, for example, consider the distribution of score improvements for a plan, compared with the existing or expected situation the plan is meant to remedy. The form of aggregation function Type IIb could also be used for group judgment aggregation.
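The three group indicators just listed, as a sketch with invented individual scores:

    from statistics import mean

    scores = [2.1, 0.4, -1.0, 1.5]   # invented individual Q scores

    print(mean(scores))              # GQ = 1/m * sum(Qj): group mean
    print(scores[0])                 # GQ = Qj: one member's judgment
    print(min(scores))               # GQ = Min(Qj): most critical member decides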

The use of any of these aggregated (‘deliberated’) judgment scores as a ‘direct’ guiding measure of performance determining the decision c a n n o t be recommended: they should be considered decision guides, not determinants. For one, the expectation of ‘due consideration of all aspects’ would require complete knowledge of all consequences of a plan and all causes of the problem it aims to fix — an expectation that must be considered unrealistic in many situations, but especially for ‘wicked’ problems or ‘messes’. There, decision-makers must be willing to assume responsibility for the possibility of being wrong — a condition impossible to deliberate, by definition, when caused by ignorance of what we might be wrong about.

Aggregation functions for developing overall ‘Plan plausibility’ judgment
from the evaluation of ‘pro’ and ‘con’ arguments.

Plausibility judgments

It is necessary to reach agreements about the terms used for the merit of judgments about plans as derived from argument evaluation, because the evaluation task for planning arguments is somewhat different from the assessment usually applied to arguments. Traditionally, the purpose of argument analysis and evaluation is seen as that of verifying whether a claim — the ‘conclusion’ of an argument — is true or false; this is seen as depending on the truth of the premises of the argument and the ‘validity’ of the form, pattern or ‘inference rule’ of the argument. These criteria do not apply to planning arguments, which can generally be represented as follows (stating the ‘conclusion’ — the claim about a proposed plan A — first):

Plan A ought to be implemented
because
Plan A will result in outcome B, (given or assuming conditions C);
and
Outcome B ought to be aimed for / pursued;
and
Conditions C are given (or will be when the plan is implemented)

Like many arguments studied by traditional logic and rhetoric, planning arguments do not always have all premises stated explicitly in discussions, some being assumed as ‘taken for granted’ by the audience: ‘enthymemes’. But to evaluate these arguments, all premises must be stated and considered explicitly.

This argument pattern — and its variations due to different constellations of assertion or negation of different premises — does not conform to the validity conditions for ‘valid’ arguments in the formal logic sense: it is, at best, inconclusive. Its premises cannot be established as ‘true or false’ — the proposed plan is discussed precisely because neither it nor the outcome B is ‘there’ (true) yet. This also means that some of the premises — the factual-instrumental claim ‘if A is implemented, then B will happen, given C’ and the claim ‘C will be present’ — are estimates or predictions qualified as probabilities. And ‘B ought to be pursued’, as well as the conclusion ‘A ought to be implemented’, are neither adequately called ‘probable’ nor true or false: the term ‘plausible’ seems more fitting, at least for some participants, though not necessarily for all. Indeed, ‘plausibility’ judgments may be applied to all the claims, with the different interpretations easily understood for each kind. This is a matter of degree, not a binary yes/no quality. And unlike the assessment of factual and even probability claims in common logic argumentation studies, the ‘conclusion’ (the decision to implement) is not determined by a single ‘clinching’ argument: it rests on several or many ‘pros and cons’ that must be weighed against each other. That is the evaluation task for planning argumentation, and it will lead to different ‘aggregation’ tools.

The logical structure of planning argumentation can be stated in simplified form as follows:

– An individual’s overall plan plausibility judgment PLANPL is a function of the ‘weights’ Argw of the various pro and con arguments raised about the proposal.
– The argument weight Argw is a function of the argument’s plausibility Argpl and the weight of relative importance w of its deontic (ought-) premise.
– The argument plausibility Argpl is a function of the plausibilities of its premises.

Examples of aggregation functions for this process might be the following:
                                                   
1.a  Argument plausibility:        Argpli = ∏ {Premplj} for all n premises j.

Or  

1.b   Argpli = Min{ Premplj}

2.    Argument weight:               Argwi = Argpli * wi with 0 ≤ wi and ∑ wi = 1
for the ought-premises of all m arguments

3. Proposal plausibility PLANPL = ∑ Argwi
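A short numerical comparison of candidate rules 1.a and 1.b (a sketch with invented premise plausibilities) shows how the choice matters:

    from math import prod

    prem_pl = [0.5, 0.8, 0.9]
    print(prod(prem_pl))   # rule 1.a: 0.36 -- every doubted premise discounts further
    print(min(prem_pl))    # rule 1.b: 0.5  -- only the weakest premise counts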
                                               

Aggregation functions for Group judgment statistics: (Similar to the Quality group aggregations)

Mean Group average plausibility   GPLANPL = 1/k ∑ PLANPLp for all k participants p.                                                  

There are of course other statistical measures of the set of individual plausibility judgments that can be examined and discussed. Like the aggregated ‘quality’ measures, these ‘group’ plausibility statistics should not be used as decision determinants but as guides, for instance as indicators of the need for further discussion and explanation of judgment differences, or for revision of plan details to alleviate concerns leading to large judgment differences.
(Diagram: Aggregation)

Comments? Additions?

–o–

Artificial Intelligence for the Planning Discourse?

The discussion about whether and to what extent Artificial Intelligence technology can meaningfully support the planning process with contributions similar or equivalent to human thinking is largely dominated by controversies about what constitutes thinking. An exploration of the reasoning patterns in the various phases of human planning discourse could produce examples for that discussion, leaving the definition of the label ‘thinking’ open for the time being.

One specific example (only one of several different and equally significant aspects of planning):
People propose plans for action, e.g. to solve problems, and then engage in discussion of the ‘pros and cons’ of those plans: arguments. A typical planning argument can be represented as follows:
“Plan A should be adopted for implementation, because
i) Plan A will produce consequences B, given certain conditions C, and
ii) Consequences B ought to be pursued (are desirable); and
iii) Conditions C are present (or will be, at implementation).”

Question 1: could such an argument be produced by automated technological means?
This question is usually followed up by question 2: Would or could the ‘machine’ doing this be able (or should it be allowed) to also make decisions to accept or reject the plan?

Can meaningful answers to these questions be found (currently, or definitively)?

Beginning with question 1: Formulating such an argument in their minds, humans draw on their memory — or on explanations and information provided during the discourse itself — for items of knowledge that could become premises of arguments:

‘Factual-instrumental’ knowledge of the form “FI(A –> X | C)”, for example “A will cause X, given conditions C”;
‘Deontic’ knowledge of the form “D(X)”, i.e. “X ought to be (is desirable)”; and
‘Factual’ knowledge of the form “F(C)”, i.e. “Conditions C are given”.
‘Argumentation-pattern knowledge’: recognition that the three knowledge items above can be inserted into an argument pattern of the form
D(A) <– (((A –> X) | C) & D(X) & F(C)).

(There are of course many variations of such argument patterns, depending on assertion or negation of the premises, and different kinds of relations between A and X.)

It does not seem to be very difficult to develop a Knowledge Base (collection) of such knowledge items and a search-and-match program that would assemble ‘arguments’ of this pattern.
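A minimal sketch of such a knowledge base and search-and-match step (plain Python; all data structures and knowledge items are invented for illustration, and this is not a reconstruction of Rittel’s APIS):

    from dataclasses import dataclass

    @dataclass
    class CausalClaim:
        """FI(A -> X | C): action A leads to outcome X under condition C."""
        action: str
        outcome: str
        condition: str

    # A toy knowledge base (all entries invented):
    causal_claims = [CausalClaim("build bike lanes", "less car traffic",
                                 "demand for cycling exists")]
    deontic_claims = {"less car traffic"}            # D(X): desirable outcomes
    factual_claims = {"demand for cycling exists"}   # F(C): conditions held to be given

    def assemble_arguments(plan):
        """Match FI, D and F items to instantiate the pattern
        D(A) <- (((A -> X) | C) & D(X) & F(C))."""
        for fi in causal_claims:
            if (fi.action == plan and fi.outcome in deontic_claims
                    and fi.condition in factual_claims):
                yield (f"{plan} ought to be adopted, because it will lead to "
                       f"{fi.outcome}, given {fi.condition}; {fi.outcome} ought "
                       f"to be pursued; and {fi.condition} is (or will be) given.")

    for argument in assemble_arguments("build bike lanes"):
        print(argument)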

Any difficulties would arguably relate more to the task of recognizing and suitably extracting such items (‘translating’ them into a form recognizable to the program) from the human-recorded and documented sources of knowledge than to the mechanics of the search-and-match process itself. Interpretation of meaning: is an item expressed in different words equivalent to the terms appearing in the other potential premises of an argument?

Another slight quibble relates to the question whether and to what extent the consequence qualifies as one that ‘ought to be’ (or not) — but this can be dealt with by reformulating the argument as follows:
“If (FI(A –> X | C) & D(X) & F(C)) then D(A)”.

(It should be accompanied by the warning that this formulation, which ‘looks’ like a valid logic argument pattern, is in fact not really applicable to arguments containing deontic premises, and that a plan’s plausibility does not rest on one single argument but on the weight of all its pros and cons.)

But assuming that these difficulties can be adequately dealt with, the answer to question 1 seems obvious: yes, the machine would be able to construct such arguments. Whether that already qualifies as ‘thinking’ or ‘reasoning’ can be left open; the significant realization is equally obvious: such contributions could be potentially helpful to the discourse. For example, by contributing arguments human participants had not thought of, they could help meet the aim of ensuring — as much as possible — that the plan will not have ‘unexpected’ undesirable side-and-after-effects. (One important part of H. Rittel’s very definition of design and planning.)

The same cannot as easily be said about question 2.

The answer to that question hinges on whether the human ‘thinking’ activities needed to make a decision to accept or reject the proposed plan can be matched by ‘the machine’. The reason is, of course, that not only will the plausibility of each argument have to be ‘evaluated’, judged (by assessing the plausibility of each premise), but the arguments must also be weighed against one another. (A method for doing that has been described e.g. in ‘The Fog Island Argument’ and several papers.)

So a ‘search and match’ process as the first part of such a judgment process would have to look for those judgments in the data base, and the difficulty here has to do with where such judgments would come from.

The prevailing answers for factual-instrumental premises as well as for fact-premises – premises i) and iii) – draw on ‘documented’ and commonly accepted truth, probability, or validity. Differences of opinion about claims drawn from ‘scientific’ and technical work, if any, are decided by a version of ‘majority voting’: ‘prevailing knowledge’ accepted by the community of scientists or domain experts, ‘settled’ controversies, conclusions derived from sufficiently ‘big data’ (“95% of climate scientists…”) can serve as the basis of such judgments. It is often overlooked, however, that the premises of planning arguments, no matter how securely based on ‘past’ measurements and observations, are inherently predictions. So any certainty about their past truth must be qualified with a somewhat lesser degree of confidence that they will be equally reliably true in the future: will the conditions under which the A –> X relationships are assumed to hold be equally likely to hold in the future? Including the conditions that may be – intentionally or inadvertently – changed as a result of future human activities pursuing different aims than those of the plan?

The question becomes even more controversial for the deontic (ought-) premises of the planning arguments. Where do the judgments come from by which their plausibility and importance can be determined? Humans can be asked to express their opinions — and prevalent social conventions consider the freedom to not only express such judgments but to have them given ‘due consideration’ in public decision-making (however roundabout and murky the actual mechanisms for realizing this may be) as a human right.

Equally commonly accepted is the principle that machines do not ‘have’ such rights. Thus, any judgment about deontic premises that might be used by a program for evaluating planning arguments would have to be based on information about human judgments that can be found in the data base the program is using. There are areas where this is possible and even plausible. Not only is it prudent to assign a decidedly negative plausibility to deontic claims whose realization contradicts natural laws established by science (and considered still valid…like ‘any being heavier than air can’t fly…’). But there also are human agreements — regulations and laws, and predominant moral codes — that summarily prohibit or mandate certain plans or parts of plans; supported by subsequent arguments to the effect that we all ought not break the law, regardless of our own opinions. This will effectively ‘settle’ some arguments.

And there are various approaches in design and planning that seem to aim at finding – or establishing – enough such mandates or prohibitions that, taken together, would make it possible to ‘mechanically’ determine at least whether a plan is ‘admissible’ or not – e.g. for a building, whether its developer should get a building permit.

This pattern is supported in theory by modal logic branches that seek to resolve deontic claims on the basis of ‘true/false’ judgments (that must have been made somewhere, by some authority) of ‘obligatory’, ‘prohibited’, ‘permissible’ etc. It can be seen to be extended by at least two different ‘movements’ that must be seen as sidestepping the judgment question.

One is the call for society as a whole to adopt (collectively agree upon) moral, ethical codes whose function is equivalent to ‘laws’ — from which the deontic judgment about plans could be derived by mechanically applying the appropriate reasoning steps — invoking ‘Common Good’ mandates supposedly accepted unanimously by everybody. The question whether and how this relates to the principle of granting the ‘right’ of freely holding and happily pursuing one’s own deontic opinions is usually not examined in this context.

Another example is the ‘movement’ of Alexander’s ‘Pattern Language’. Contrary to claims that it is a radically ‘new’ theory, it stands in a long and venerable tradition of many trades and disciplines of establishing codes and collections of ‘best practice’ rules or ‘patterns’ – learned by apprentices in years of observing the masters, or compiled in large volumes of proper patterns. The basic idea is that of postulating ‘elements’ (patterns) of the realm of plans, and relationships between them, by means of which plans can be generated. The ‘validity’ or ‘quality’ of the generated plan is then guaranteed by the claim that each of the patterns (rules) is ‘valid’ (‘true’, or having that elusive ‘quality without a name’). This is supported by showing examples of environments judged (by intuition, i.e. needing no further justification) to be exhibiting ‘quality’, produced by applications of the patterns. The remaining ‘solution space’ left open by e.g. the different combinations of patterns then serves as the basis for claims that the theory offers ‘participation’ to prospective users. However, it hardly needs pointing out that individual ‘different’ judgments – e.g. about the appropriateness of a given pattern or relationship – are effectively eliminated by such approaches. (This assessment should not be seen as a wholesale criticism of the approach, whose unquestionable merit is to introduce quality considerations into the discourse about the built environment that ‘common practice’ has neglected.)

The relevance of discussing these approaches for the two questions above now becomes clear: If a ‘machine’ (which could of course just be a human, untiringly pedantic bureaucrat assiduously checking plans for adherence to rules or patterns) were able to draw upon a sufficiently comprehensive data base of factual-instrumental knowledge and ‘patterns or rules’, it could conceivably be able to generate solutions. And if the deontic judgments have been inherently attached to those rules, it could claim that no further evaluation (i.e. no inconvenient intrusion of differing individual judgments) would be necessary.

The development of ‘AI’ tools of automated support for planning discourse will have to make a choice. It could follow this vision of a ‘common good’ and valid truth of solution elements, universally accepted by all members of society. Or it could accept the challenge of the view that it should either refrain from intruding on the task of making judgments, or go to the trouble of obtaining those judgments from human participants in the process before using them in the task of deriving decisions. Depending on which course is followed, I suspect the agenda and tasks of current and further research, development, and programming will be very different. This is, in my opinion, a controversial issue of prime significance.

A hypothetical ‘perfect’ artificial argumentative systems planner — D R A F T

A tavern discussion looking at the idea of an artificial planning discourse participant from the perspectives of the argumentative model and of systems thinking, expanding both (or mutually patching up their shortcomings), and inadvertently stumbling upon potential improvements upon the concept of democracy.

Customers and patron of a fogged-in island tavern with nothing better to do,
awaiting news on progress on the development of a better planning discourse
begin an idly speculative exploration of the idea of an artificial planner:
would such a creature be a better planning discourse participant?

– Hey Bog-Hubert: Early up and testing Vodçek’s latest incarnation of café cataluñia forte forte? The Fog Island Tavern mental alarm clock for the difficult-to-wake-up?

– Good morning, professor. Well, have you tried it? Or do you want to walk around in a fogged-in-morning daze for just a while longer?

– Whou-ahmm, sorry. Depends.

– Depends? On what?

– Whether this morning needs my full un-dazed attention yet.

– Makes sense. Okay. Let me ask you a question. I hear you’ve been up in town. Did you run into Abbé Boulah, by any chance? He’s been up there for a while, sorely neglecting his Fog Island Tavern duties here, ostensibly to help his buddy at the university with the work on his proposals for a better planning discourse system. Hey, Sophie: care to join us?

– Okay, good morning to you too. What’s this about a planning system?

– I’m not sure if it’s a ‘system’. I was asking the professor if he has heard whether Abbé Boulah and his buddy have made any progress on that. It’s more like a discourse platform than a ‘system’ – if by ‘system’ you mean something like an artificial planning machine – a robot planner.

– Oh, I’m relieved to hear that.

– Why, Sophie?

– Why? Having a machine make our plans for our future? That would be soo out of touch. Really. Just when we are just beginning to understand that WE have to take charge, to redesign the current ‘MeE’ system, from a new Awareness of the Whole, of our common place on the planet, in the universe, our very survival as a species? That WE have to get out from under that authoritarian, ME-centered linear machine systems thinking, to emerge into a sustainable, regenerative NEW SYSTEM?

– Wow. Sounds like we are in more trouble than I thought. So who’s doing that, how will we get to that New System?

– Hold on, my friends. Let’s not get into that New System issue again – haven’t we settled that some time ago here – that we simply don’t know yet what it should be like, and should try to learn more about what works and what doesn’t, before starting another ambitious grand experiment with another flawed theory?

– Okay, Vodçek, good point. But coming to think about it – to get there – I mean to a better system with a better theory – wouldn’t that require some smart planning? You can’t just rely on everybody coming to that great awareness Sophie is talking about, for everything just to fall into place? So wouldn’t it be interesting to just speculate a bit about what your, I mean Abbé Boulah’s buddy’s, planning machine would have to do to make decent plans?

– You mean the machine he doesn’t, or, according to Sophie, emphatically shouldn’t even think about developing?

– That’s the one.

– Glad we have that cleared up… Well, since we haven’t heard anything new about the latest scandals up in town yet, it might be an interesting way to pass the time.

– Hmm.

– I hear no real objections, just an indecisive Hmm. And no, I don’t have any news from Abbé Boulah either – didn’t see him. He tends to stay out of public view. So it’s agreed. Where do we start?

– Well: how about at the beginning? What triggers a planning project? How does it start?

Initializing triggers for planning?

– Good idea, Sophie. Somebody having a problem – meaning something in the way things are that is perceived as unsatisfactory, hurtful, ugly, whatever: not the way it ought to be?

– Or: somebody just has a bright idea for doing something new and interesting?

– Or there’s a routine habit or institutional obligation to make preparations for the future – to lay in provisions for a trip, or heating material for the winter?

– Right: there are many different things that could trigger a call for ‘doing something about it’ – a plan. So what would the machine do about that?

– You are assuming that somebody – a human being – is telling the machine to do something? Or are you saying that it could come up with a planning project on its own?

– It would have to be programmed to recognize a discrepancy between what IS and what OUGHT to be, about a problem or need, wouldn’t it? And some human would have had to tell him that. Because it’s never the machine (or the human planner working on behalf of people) hurting if there’s a problem; it’s only people who have problems.

– So it’s a Him already?

– Easy, Sophie. Okay: A She? You decide. Give her, him, it a name. So we can get on with it.

– Okay. I’d call it the APT – Abominable Planning Thing. And it’s an IT, a neuter.

– APT it is. Nicely ambiguous… For a moment I thought you meant Argumentative Planning Tool. Or Template.

– Let’s assume, for now, that somebody told it about a problem or a bright idea. So what would that APT do?

Ground rules, Principles?
Due consideration of all available information;
Whole system understanding guiding decisions
towards better (or at least not worse) outcomes
for all affected parties

– Wait: Shouldn’t we first find out some ground rules about how it’s going to work? For example, it wouldn’t do to just come up with some random idea and say ‘this is it’?

– Good point. You have any such ground rules in mind, professor?

– Sure. I think one principle is that it should try to gather and ‘duly consider’ ALL pertinent information that is available about the problem situation. Ideally. Don’t you agree, Sophie? Get the WHOLE picture? Wasn’t that part of the agenda you mentioned?

– Sounds good, professor. But is it enough to just ‘have’ all the information? Didn’t someone give a good description of the difference between ‘data’ (just givens, messages, numbers etc.) and ‘information’ – the process of data changing someone’s state of knowledge, insight, understanding?

– No, you are right. There must be adequate UNDERSTANDING – of what it means and how it all is related.

– I see a hot discussion coming up about what that really means: ‘understanding’… But go on.

– Well, next: wouldn’t we expect that there needs to be a process of developing or drawing a SOLUTION or a proposed PLAN – or several – from that understanding? Not just from the stupid data?

– Det er da svœrt så fordringfull du er idag, Sophie: Now you are getting astoundingly demanding here. Solutions based on understanding?

– Oh, quit your Norwegian bickering. I’ll do even more demanding: Mustn’t there be a way to CONNECT all that understanding, all the concerns, data, facts, arguments, with any proposed DECISION, especially the final one that leads to action, implementation. If we ever get to that?

– Are you considering that all the affected folks will expect that the decision should end up making things BETTER for them? Or at least not WORSE than before? Would that be one of your ground rules?

– Don’t get greedy here, Vodçek. The good old conservative way is to ask some poor slobs to make some heroic Sacrifices for the Common Good. “Mourir pour des idées, d’accord, mais de mort lente…” (“to die for ideas, agreed, but of a slow death”), as Georges Brassens complains. But you are right: ideally, that would be a good way to put the purpose of the effort.

– All right, we have some first principles or expectations. We’ll probably add some more of those along the way, but I’d say it’s enough for a start. So what would our APT gizmo do to get things moving?

Obtaining information
Sources?

– I’d say it would start to inquire and assemble information about the problem’s IS state, first. Where is the problem, who’s hurting and how, etc. What caused it? Are there any ideas for how to fix it? What would be the OUGHT part — of the problem as well as a bright idea as the starting point?

– Sounds good, Bog-Hubert. Get the data. I guess there will be cases where the process actually starts with somebody having a bright idea for a solution. But that’s a piece of data too, put it in the pile. Where would it get all that information?

– Many sources, I guess. First: from whoever is hurting or affected in any way.

– By the problem, Vodçek? Or the solutions?

– Uh, I guess both. But what if there aren’t any solutions proposed yet?

– It means that the APT will have to check and re-check that whenever someone proposes a solution — throughout the whole process, doesn’t it? It’s not enough to run a single first survey of citizen preferences, like they usually do to piously meet the mandate for ‘citizen participation’. Information gathering, research, re-research, analysis will accompany the whole process.

– Okay. It’s a machine, it won’t get tired of repeated tasks.

– Ever heard of devices overheating, eh? But to go on, there will be experts on the particular kind of problem. There’ll be documented research, case studies from similar events, the textbooks, newspapers, letters to the editor, petitions, the internet. The APT would have to go through everything. And I guess there might have to be some actual ‘observation’, data gathering, measurements.

Distinctions, meaning
Understanding

– So now it has a bunch of stuff in its memory. Doesn’t it have to sort it somehow, so it can begin to do some real work on it?

– You don’t think gathering that information is work, Sophie?

– Sure, but just a bunch of megabytes of stuff… what would it do with it? Don’t tell me it can magically pull the solution from that pile of data!

– Right. Some seem to think they can… But you’ll have to admit that having all the information is part of the answer to our first expectation: to consider ALL available information. The WHOLE thing, remember? The venerable Systems Thinking idea?

– Okay. If you say so. So what to you mean by ‘consider’ – or ‘due consideration’? Just staring at the pile of data until understanding blossoms in your minds and the solution jumps out at you like the bikini-clad girl out of the convention cake? Or Aphrodite rising out of the data ocean?

– You are right. You need to make some distinctions, sort out things. What you have now, at best, are a bunch of concepts, vague, undefined ideas. The kind of ‘tags’ you use to google stuff.

– Yeah. Your argumentation buddy would say you’d have to ask for explanations of those tags – making sure it’s clear what they mean, right?

– Yes. Now he’d also make the distinction that some of the data are actual claims about the situation. Of different types: ‘fact’-claims about the current situation; ‘ought’ claims about what people feel the solution should be. Claims of ‘instrumental’ knowledge about what caused things to become what they are, and thus what will happen when we do this or that: connecting some action on a concept x with another concept ‘y’ – an effect. Useful when we are looking for x’s to achieve desired ‘y’s that we want – the ‘ought’ ideas – or avoid the proverbial ‘unexpected / undesirable’ side-and after-effect surprises of our grand plans: ‘How’ to do things.

– You’re getting there. But some of the information will also consist of several claims arranged into arguments. Like: “Yes, we should do ‘x’ (as part of the plan) because it will lead to ‘y’, and ‘y’ ought to be…” And counterarguments: “No, we shouldn’t do ‘x’ because x will cause ‘z’ which ought not to be.”

– Right. You’ve been listening to Abbé Boulah’s buddy’s argumentative stories, I can tell. Or even reading Rittel? Yes, there will be differences of opinion – not only about what ought to be, but about what we should do to get what we want, about what causes what, even about what Is the case. Is there an old sinkhole on the proposed construction site? And if so, where? That kind of issue. And different opinions about those, too. So the data pile will contain a lot of contradictory claims of all kinds. Which means, for one thing, that we – even Spock’s relative APT – can’t draw any deductively valid conclusions from contradictory items in the data. ‘Ex contradictio sequitur quodlibet’, remember – from a contradiction you can conclude anything whatever. So APT can’t be a reliable ‘artificial intelligence’ or ‘expert system’ that gives you answers you can trust to be correct. We discussed that too once, didn’t we – there was an old conference paper from the 1990s about it. Remember?

– But don’t we argue about contradictory opinions all the time – and draw conclusions about them too?

– Sure. Living recklessly, eh? All the time, and especially in planning and policy-making. But it means that we can’t expect to draw ‘valid’ conclusions that are ‘true or false’ from our planning arguments. Just more or less plausible. Or ‘probable’ – for claims that are appropriately labeled that way.

Systems Thinking perspective
Versus Argumentative Model of Planning?

– Wait. What about the ‘Systems Thinking’ perspective — systems modeling and simulation? Isn’t that a better way to meet the expectation of ‘due consideration’ of the ‘whole system’? So should the APT develop a systems model from the information it collected?

– Glad you brought that up, Vodçek. Yes, it’s claimed to be the best available foundation for dealing with our challenges. So what would that mean for our APT? Is it going to have a split robopersonality between Systems and the Argumentative Model?

– Let’s look at both and see? There are several levels we can distinguish there. The main tenets of the systems approach have to do with the relationships between the different parts of a system – a system is a set of parts or entities, components, that are related in different ways – some say that ‘everything is connected / related to everything else’ – but a systems modeler will focus on the most significant relationships, and try to identify the ‘loops’ in that network of relationships. Those are the ones that will cause the system to behave in ways that can’t be predicted from the relationships between any of the individual pairs of entities in the network. Complexity; nonlinearity. Emergence.
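
(An aside: the loop-finding the modeler does by eye can itself be mechanized. A minimal sketch in Python, with an invented toy network:)

```python
# Minimal sketch: finding feedback loops (cycles) in a causal network.
# The network and variable names are invented for illustration.
network = {
    "x": ["y"],        # increasing x affects y
    "y": ["z"],
    "z": ["x", "w"],   # z feeds back to x: the loop x -> y -> z -> x
    "w": [],
}

def find_loops(graph):
    """Return simple cycles found by depth-first search.
    (Deduplicated by node set -- crude, but enough for a toy model.)"""
    seen, loops = set(), []
    def visit(node, path):
        if node in path:
            cycle = path[path.index(node):]
            if frozenset(cycle) not in seen:
                seen.add(frozenset(cycle))
                loops.append(cycle + [node])
            return
        for nxt in graph.get(node, []):
            visit(nxt, path + [node])
    for start in graph:
        visit(start, [])
    return loops

print(find_loops(network))   # [['x', 'y', 'z', 'x']]
```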

– Wow. You’re throwing a lot of fancy words around there!

– Sorry, Renfroe; good morning, I didn’t see you come in. Doing okay?

– Yeah, thanks. Didn’t get hit by a nonlinearity, so far. This a dangerous place now, for that kind of thing?

– Not if you don’t put too much brandy in that café cataluñia Vodçek is brewing here.

– Hey, lets’ get back to your systems model. Can you explain it in less nonlinear terms?

– Sure, Sophie. Basically, you take all the significant concepts you’ve found, put them into a diagram, a map, and draw the relationships between them. For example, cause-effect relationships; meaning increasing ‘x’ will cause an increase in ‘y’. Many people think that fixing a system can best be done by identifying the causes that brought about the state of affairs we now see as a problem. This will add a number of new variables to the diagram, to the ‘understanding’ of the problem.

– They also look for the presence of ‘loops’ in the diagram, don’t they? – Where cause-effect chains come back to previous variables.

– Right, Vodçek. This is an improvement over a simple listing of all the pro and con arguments, for example – they also talk about relationships x – y, but only one at a time, so you don’t easily see the whole network, and the loops, in the network. So if you are after ‘understanding the system’, seeing the network of relationships will be helpful. To get a sense of its complexity and nonlinearity.

– I think I understand: you understand a system when you recognize that it’s so loopy and complex and nonlinear that its behavior can’t be predicted so it can’t be understood?

– Renfroe… Professor, can you straighten him out?

– Sounds to me like he’s got it right on, Sophie. Going on: of course, to be really helpful, the systems modeler will tell you that you should find a way to measure each concept, that is, find a variable – a property of the system that can be measured with precise units.

– What’s the purpose of that, other than making it look more scientific?

– Well, Renfroe, remember the starting point, the problem situation. Oh, wait, you weren’t here yet. Okay: say there’s a problem. We described it as a discrepancy between what somebody feels Is the case and what Ought to be. Somebody complains about it being too hot in here. Now just saying ‘it’s too hot; it ought to be cooler’ is a starting point, but in order to become useful, you need to be able to say just what you mean by ‘cooler’. See, you are stating the Is/Ought problem in terms of the same variable, ‘temperature’. So to even see the difference between Is and Ought, you have to point to the levels of each. 85 degrees F? Too hot. Better: cool it to 72. Different degrees or numbers on the temperature scale.

– Get it. So now we have numbers, math in the system. Great. Just what we need. This early in the morning, too.

– I was afraid of that too. It’s bound to get worse…nonlinear. So in the argumentative approach – the arguments don’t show that? Is that good or bad?

– Good question. Of course you can get to that level, if you bug them enough. Just keep asking more specific questions.

– Aren’t there issues where degrees of variables are not important, or where variables have only two values: Present or not present? Remember that the argumentative model came out of architectural and environmental design, where the main concerns were whether or not to provide some feature: ‘should the entrance to the building be from the east, yes or no?’ or ‘Should the building structure be of steel or concrete?’ Those ‘conceptual’ planning decisions could often be handled without getting into degrees of variables. The decision to go with steel could be reached just with the argument that steel would be faster and cheaper than concrete, even before knowing just by how much. The arguments and the decision were then mainly yes or no decisions.

– Good points, Vodçek. Fine-tuning, or what they call ‘parametric’ planning comes later, and could of course cause much bickering, but doesn’t usually change the nature of the main design that much. Just its quality and cost…

Time
Simulation of systems behavior

– Right. And they also didn’t have to worry too much about the development of systems over time. A building, once finished, will usually stay that way for a good while. But for policies that would guide societal developments or economies, the variables people were concerned about will change considerably over time, so more prediction is called for, trying to beat complexity.

– I knew it, I knew it: time’s the culprit, the snake in the woodpile. I never could keep track of time…

– Renfroe… You just forget winding up your old alarm clock. Now, where were we? Okay: In order to use the model to make predictions about what will happen, you have to allocate each relationship step to some small time unit: x to y during the first time unit; y to z in the second, and so on. This will allow you to track the behavior of the variables of the system over time, given some initial setting, and make predictions about the likely effects of your plans. The APT computer can quickly calculate predictions for a variety of planning options.

– I’ve seen some such simulation predictions, yes. Amazing. But I’ve always wondered how they can make such precise forecasts – those fine crisp lines over several decades: how do they do that, when for example our meteorologists can only make forecasts of hurricane tracks of a few days only, tracks that get wider like a fat trumpet in just a few days? Are those guys pulling a fast one?

– Good point. The answer is that each simulation only shows the calculated result of one specific set of initial conditions and settings of relationships equations. If you make many forecasts with different numbers, and put them all on the same graph, you’d get the same kind of trumpet track. Or even a wild spaghetti plate of tracks.
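
(A minimal sketch of both points in Python: a tiny time-step simulation, re-run with slightly varied initial settings to produce that widening ‘trumpet’ of tracks. The model and its coefficients are invented:)

```python
# Minimal sketch: a time-step simulation of a toy two-variable system,
# re-run over a range of initial settings. Model and coefficients invented.

def simulate(x0, y0, steps=20, a=0.10, b=0.08):
    """Each time unit: x pushes y up, and y feeds back on x."""
    x, y, track = x0, y0, []
    for _ in range(steps):
        x, y = x + b * y, y + a * x   # both updates use the old values
        track.append(y)
    return track

# One run gives one crisp line; many runs with varied settings fan out
# into the 'trumpet' of diverging tracks.
for d in (-0.2, -0.1, 0.0, 0.1, 0.2):
    track = simulate(x0=1.0 + d, y0=0.0)
    print(f"x0 = {1.0 + d:.1f}: final y after 20 steps = {track[-1]:.2f}")
```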

– I am beginning to see why those ‘free market’ economists had such an advantage over people who wanted to gain some control of the economy. They just said: the market is unpredictable. It’s pointless to make big government plans and laws and regulations. Just get rid of all the regulations, let the free market play it out. It will control and adapt and balance itself by supply and demand and competition and creativity.

– Yeah, and if something goes wrong, blame it on the remaining regulations of big bad government. Diabolically smart and devious.

– But they do appreciate government research grants, don’t they? Wait. They get them from the companies that just want to get rid of some more regulations. Or from think tanks financed by those companies.

– Hey, this is irresponsibly interesting but way off our topic, wouldn’t you say?

– Right, Vodçek. Are you worried about some government regulation – say, about the fireworks involved in your café catastrofia? But okay. Back to the issue.

– So, to at least try to be less irresponsible, our APT thing would have systems models and be able to run simulations. A simulation, if I understand what you were saying, would show how the different variables in the system would change over time, for some assumed initial setting of those variables. That initial setting would be different from the ‘current’ situation, though, wouldn’t it? So where does the proposed solution in the systems model come from? Where are the arguments? Does the model diagram show what we want to achieve? Or just the ‘current state’?

Representation of plan proposals
and arguments in the systems model?
Leverage points

– Good questions, all. They touch on some critical problems with the systems perspective. Let’s take one at a time. You are right: the usual systems model does not show a picture of a proposed solution. To do that, I think we’ll have to expand a little upon our description of a plan: Would you agree that a plan involves some actions by some actors, using some resources acting upon specific variables in the system? Usually not just one variable but several. So a plan would be described by those variables, and the additional concepts of actions, actor, resources etc. Besides the usual sources of plans, — somebody’s ‘brilliant idea’, some result of a team brainstorming session, or just an adaptation of a precedent, a ‘tried and true’ known solution with a little new twist, —  the systems modeler may have played around with his model and identified some ‘leverage points’ in the system – variables where modest and easy-to-do changes can bring about significant improvement elsewhere in the system: those are suggested starting points for solution ideas.

– So you are saying that the systems tinkerer should get with it and add all the additional solution description to the diagram?

– Yes. And that would raise some new questions. What are those resources needed for the solution? Where would they come from, are they available? What will they cost? And more: wouldn’t just getting all that together cause some new effects, consequences, that weren’t in the original data collection, and that some other people than those who originally voiced their concerns about the problem would now be worried about? So your data collection component will have to go back to do some more collecting. Each new solution idea will need its own new set of information.

– There goes your orderly systematic procedure all right. That may go on for quite some time, eh?

– Right. Back and forth, if you want to be thorough. ‘Parallel processing’. And it will generate more arguments that will have to be considered, with questions about how plausible the relationship links are, how plausible the concerns about the effects – the desirable / undesirable outcomes. More work. So it will often be shouted down with the usual cries of ‘analysis paralysis’.

Intelligent analysis of data:
Generating ‘new’ arguments?

– Coming to think of it: if our APT has stored all the different claims it has found – in the literature, the textbooks, previous cases, and in the ongoing discussions, would it be able to construct ‘new’ arguments from those? Arguments the actual participants haven’t thought about?

– Interesting idea, Bog-Hubert. – It’s not even too difficult. I actually heard our friend Dexter explain that recently. It would take the common argument patterns – like the ones we looked at – and put claim after claim into them, to see how they fit: all the if-then connections to a proposal claim would generate more arguments for and against the proposal. Start looking at an ‘x’ claim of the proposal. Then search for (‘google’)  ‘x→ ?’:  any ‘y’s in the data that have been cited as ‘caused by x’. If a ‘y’ you found was expressed somewhere else as ‘desirable or undesirable’ – as a deontic claim, — it makes an instant ‘new’ potential argument. Of course, whether it would work as a ‘pro’ or a ‘con’ argument in some participant’s mind would depend on how that participant feels about the various premises.

– What are you saying, professor? This doesn’t make sense. A ‘pro’ argument is a ‘pro’ argument, and ‘con’ argument is a ‘con’ argument. Now you’re saying it depends on the listener?

– Precisely. I know some people don’t like this. But consider an example. People are discussing a plan P; somebody A makes what he thinks is a ‘pro’ argument: “Let’s do P because P will produce Q; and Q is desirable, isn’t it?” Okay, for A it is a pro argument, no question. Positive plausibility, he assumes, for P→Q as well as for Q; so it would get positive plausibility pl for P. Now for curmudgeon B, who would also like to achieve Q but is adamant that P→Q won’t work, (getting a negative pl) that set of premises would produce a negative pl for P, wouldn’t it? Similarly, for his neighbor C, who would hate for Q to become true, but thinks that P→Q will do just that, that same set of premises also is a ‘con’ argument.
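
(A minimal sketch of this listener-relative reading of arguments, in Python; the plausibility numbers and the simple multiplication rule are assumptions for illustration:)

```python
# Minimal sketch: the same premises work as 'pro' or 'con' depending on
# the listener's own premise plausibilities (scale -1..+1).
# Argument: D(P) <- (P -> Q) & D(Q). All values invented for illustration.

listeners = {
    "A (believes P->Q, wants Q)":  {"P->Q": +0.9, "D(Q)": +0.8},
    "B (doubts P->Q, wants Q)":    {"P->Q": -0.7, "D(Q)": +0.8},
    "C (believes P->Q, dreads Q)": {"P->Q": +0.9, "D(Q)": -0.8},
}

for name, pl in listeners.items():
    # One simple aggregation rule: multiply the premise plausibilities.
    arg_pl = pl["P->Q"] * pl["D(Q)"]
    stance = "pro" if arg_pl > 0 else "con"
    print(f"{name}: argument plausibility {arg_pl:+.2f} -> {stance}")
```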

– So what you’re saying is that all the programs out there, that show ‘dialogue maps’ identifying all arguments as pro or con, as they were intended by their authors, are patently ignoring the real nature and effects of arguments?

– I know some people have been shocked – shocked – by these heretical opinions; they have been written up. But I haven’t seen any serious rebuttals; those companies, if they have heard of them, have chosen to ignore them. Haven’t changed their evil ways though…

– So our devious APT could be programmed to produce new arguments. More arguments. Just what we need. The arguments can be added to the argument list, but I was going to ask you before: how would the deontic claims, the ‘oughts’, be shown in the model?

– You’d have to add another bubble to each variable bubble, right? Now, we have the variable itself, the value of each variable in the current IS condition, the value of the variable if it’s part of a plan intervention, and the desired value – hey: at what time?

– You had to put the finger on the sore spot, Vodçek. Bad boy. Not only does this make the diagram a lot less clean, simple, and legible. Harder to understand. And showing what somebody means by saying what the solution ought to achieve, when all the variables are changing over time, now becomes a real challenge. Can you realistically expect that a desired variable should stay ‘stable’ at one desired value all the time, after the solution is implemented? Or would people settle for something like: remaining within a range of acceptable values? Or, if a disturbance has occurred, return to a desired value after some reasonably short specified time?

– I see the problem here. Couldn’t the diagram at least show the central desired value, and then let people judge whether a given solution comes close enough to be acceptable?

– Remember that we might be talking about a large number of variables that represent measures of how well all the different concerns have been met by a proposed solution. But if you don’t mind complex diagrams, you could add anything to the systems model. Or you can use several diagrams. Understanding can require some work, not just sudden ‘aha!’ enlightenment.

Certainty about arguments and predictions
Truth, probability, plausibility and relative importance of claims

– And we haven’t even talked about the question of how sure we can be that a solution will actually achieve a desired result.

– I remember our argumentative friends at least claimed to have a way to calculate the plausibility of a plan proposal based on the plausibility of each argument and the weight of relative importance of each deontic, each ought concern. Would that help?

– Wait, Bog-hubert: how does that work, again? Can you give us the short explanation? I know you guys talked about that before, but…

– Okay, Sophie: The idea is this: a person would express how plausible she thinks each of the premises of an argument is. On some plausibility scale of, say, +1, which means ‘totally plausible’, to -1, which means ‘totally implausible’, with a midpoint zero meaning ‘don’t know, can’t tell’. These plausibility values together will then give you an ‘argument plausibility’ – on the same scale, either by multiplying them or taking the lowest score as the overall result. The weakest link in the chain, remember. Then: multiplying that plausibility with the weight of relative importance of the ought-premise in the argument, which is a value between zero and +1 such that the weights of all the ‘oughts’ in all the arguments about the proposal add up to +1. That will give you the ‘argument weight’ of each argument; and all the argument weights together will give you the proposal plausibility – again, on the same scale of +1 to -1, so you’d know what the score means. A value higher than zero means it’s somewhat plausible; a value lower than zero and close to -1 means it’s so implausible that it should not be implemented. But we aren’t saying that this plausibility could be used as the final decision measure.
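
(A minimal sketch of that calculation in Python, with invented numbers; both the ‘weakest link’ rule and the product rule are shown:)

```python
import math

# Minimal sketch of the argument-weight calculation just described.
# Scales: plausibility -1..+1; deontic weights sum to 1. Numbers invented.

arguments = [
    {"premise_pls": [0.8, 0.9],  "weight": 0.6},   # e.g. a 'pro' argument
    {"premise_pls": [0.7, -0.5], "weight": 0.4},   # e.g. a 'con' argument
]

def argument_plausibility(pls, rule="min"):
    """'Weakest link' (lowest premise score), or the product of all premises."""
    return math.prod(pls) if rule == "product" else min(pls)

# Proposal plausibility: sum of (argument plausibility * deontic weight).
proposal_pl = sum(argument_plausibility(a["premise_pls"]) * a["weight"]
                  for a in arguments)
print(f"proposal plausibility: {proposal_pl:+.2f}")   # here: +0.28
```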

– Yeah, I remember now. So that would have to be added to the systems model as well?

– Yes, of course – but I have never seen one that does that yet.

‘Goodness’ of solutions
not just plausibility?

– But is that all? I mean: ‘plausibility’ is fine. If there are several proposals to compare: is plausibility the appropriate measure? It doesn’t really tell me how good the plan outcome will be? Even comparing a proposed solution to the current situation: wouldn’t the current situation come up with a higher plausibility — simply because it’s already there?

– You’ve got a point there. Hmm. Let me think. You have just pointed out that both these illustrious approaches – the argumentative model, at least as we have discussed it so far, as well as the systems perspective, for all its glory – have grievously sidestepped the question of what makes a solution, a systems intervention, ‘good’ or ‘bad’. The argument assessment work, because it was just focused on the plausibility of arguments, as the first necessary step that had not been looked at yet. And the systems modeling focusing on the intricacies of the model relations and simulation, leaving the decision and its preparatory evaluation, if any, to the ‘client’. Fair enough; they are both meritorious efforts, but it leaves both approaches rather incomplete. Not really justifying the claims of being THE ultimate tools to crack the wicked problems of the world. It makes you wonder: why didn’t anybody call the various authors on this?

– But haven’t there long been methods, procedures for people to evaluate the presumed ‘goodness’ of plans? Why wouldn’t they have been added to either approach?

– They have, just as separate, detached and not really integrated extra techniques. Added, cumbersome complications, because they represent additional effort and preparation, even for small groups. And never even envisaged for large public discussions.

– So would you say there are ways to add the ‘goodness’ evaluation into the mix? We’ve already brought systems and arguments closer together? You say there are already tools for doing that?

– Yes, there are. For example, as part of a ‘formal’ evaluation procedure, you can ask people to explain the basis of their ‘goodness’ judgment about a proposed solution by specifying a ‘criterion function’ that shows how that judgment depends on the values of a system variable. The graph of it looks like this: on one axis it would have positive (‘like’, ‘good’, ‘desirable’) judgment values on the positive side, and ‘dislike’, ‘bad’, ‘undesirable’ values on the negative one, with a midpoint of ‘neither good nor bad’ or ‘can’t decide’. And the specific system variable on the other axis, for example that temperature scale from our example a while ago. So by drawing a line in the graph that touches the ‘best possible’ judgment score at the person’s most comfortable temperature, and curves down towards ‘so-so’, and down to ‘very bad’ and ultimately ‘intolerable, couldn’t get worse’, a person could ‘explain’ the ‘objective’, measurable basis of her subjective goodness judgment.
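
(A minimal sketch of such a criterion function for the temperature example, in Python; the bell-shaped curve and all numbers are assumptions for illustration:)

```python
import math

# Minimal sketch: a criterion function mapping the 'objective' variable
# (temperature, deg F) to a subjective goodness judgment on a +2..-2 scale.
# The bell shape around a personal ideal of 72 F is an invented example.

def goodness(temperature, ideal=72.0, tolerance=10.0):
    """+2 at the ideal temperature, falling toward -2 far away from it."""
    return 4.0 * math.exp(-((temperature - ideal) / tolerance) ** 2) - 2.0

for t in (62, 72, 85):
    print(f"{t} F -> goodness {goodness(t):+.2f}")   # -0.53, +2.00, -1.26
```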

– But that’s just one judgment out of many others she’d have to make about all the other system variables that have been declared ‘deontic’ targets? How would you get to an overall judgment about the whole plan proposal?

– There are ways to ‘aggregate’ all those partial judgments into an overall deliberated judgment. All worked out in the old papers describing the procedure. I can show you that if you want. But that’s not the real problem here – you don’t see it?

– Huh?

The problem of ‘aggregation’
of many different personal, subjective judgments
into group or collective decision guides

– Well, tell me this, professor: would our APTamajig have the APTitude to make all those judgments?

– Sorry, Bog-Hubert: No. Those judgments would be judgments of real persons. The APT machine would have to get those judgments from all the people involved.

– That’s just too complicated. Forget it.

– Well, commissioner, — you’ve been too quiet here all this time – remember: the expectation was to make the decision based on ‘due consideration’ of all concerns. Of everybody affected?

– Yes, of course. Everybody has the right to have his or her concerns considered.

– So wouldn’t ‘knowing and understanding the whole system’ include knowing how everybody affected feels about those concerns? Wasn’t that, in a sense, part of your oath of office, to serve all members of the public to the best of your knowledge and abilities? So now that we have a way to express that, you don’t want to know about it because it’s ‘too complicated’?

– Cut the poor commissioner some slack: the systems displays would get extremely crowded trying to show all that. And adding all that detail will not really convey much insight.

– It would, professor, if the way that it’s being sidestepped wasn’t actually a little more tricky, almost deceptive. Commissioner, you guys have some systems experts on your staff, don’t you? So where do they get those pristine performance track printouts of their simulation models?

– Ah. Huh. Well, that question never came up.

– But you are very concerned about public opinion, aren’t you? The polls, your user preference surveys?

– Oh, yeah: that’s a different department – the PR staff. Yes, they get the Big Data about public opinions. Doing a terrific job at it too, and we do pay close attention to that.

– But – judging just from the few incidents in which I have been contacted by folks with such surveys – those are just asking general questions, like ‘How important is it to attract new businesses to the city?’ Nobody has ever asked me to do anything like those criterion functions the professor was talking about. So if you’re not getting that: what’s the basis for your staff recommendations about which new plan you should vote for?

– Best current practice: we have those general criteria, like growth rate, local or regional product, the usual economic indicators.

– Well, isn’t that the big problem with those systems models? They have to assume some performance measure to make a recommendation. And that is usually one very general aggregate measure – like the quarterly profit for companies. Or your Gross National Product, for countries. The one all the critics now are attacking, for good reasons, I’d say, — but then they just suggest another big aggregate measure that nobody really can be against – like Gross National Happiness or similar well-intentioned measures. Sustainability. Systemicity. Whatever that means.

– Well, what’s wrong with those? Are you fixin’ to join the climate change denier crowd?

– No, Renfroe. The problem with those measures is that they assume that all issues have been settled, all arguments resolved. But the reality is that people still do have differences of opinions, there will still be costs as well as benefits for all plans, and those are all too often not fairly distributed. The big single measure, whatever it is, only hides the disagreements and the concerns of those who have to bear more of the costs. Getting shafted in the name of overall social benefits.

Alternative criteria to guide decisions?

– So what do you think should be done about that? And what about our poor APT? It sounds like most of the really important stuff is about judgments it isn’t allowed or able to make? Would even a professional planner named APT – ‘Jonathan Beaujardin APT, Ph.D M.WQ, IDC’ — with the same smarts as the machine, not be allowed to make such judgments?

– As a person, an affected and concerned citizen, he’d have the same right as everybody else to express his opinions, and bring them into the process. As a planner, no. Not claiming to judge ‘on behalf’ of citizens – unless they have explicitly directed him to do that, and told him how… But now the good Commissioner says he wouldn’t even need to understand his own basis of judgment,  much less make it count in the decision?

– Gee. That really explains a lot.

– Putting it differently: Any machine – or any human planner, for that matter, however much they try to be ‘perfect’ – trying to make those judgments ‘on behalf’ of other people, is not only imperfect but wrong, unless it has somehow obtained knowledge about those feelings about good or bad of others, and has found an acceptable way of reconciling the differences into some overall common ‘goodness’ measure. Some people will argue that there isn’t any such thing: judgments about ‘good or ‘bad’ are individual, subjective judgments; they will differ, there’s no method by which those individual judgments can be aggregated into a ‘group’ judgment that wouldn’t end up taking sides, one way or the other.

– You are a miserable spoilsport, Bog-Hubert. Worse than Abbé Boulah! He probably would say that coming to know good and bad, or rather thinking that you can make meaningful judgments about good or bad IS the original SIN.

– I thought he’s been excommunicated, Vodçek? So does he have any business saying anything like that? Don’t put words in his mouth when he’s not here to spit them back at you. Still, even if Bog-Hubert is right: if that APT is a machine that can process all kinds of information faster and more accurate than humans, isn’t there anything it can do to actually help the planning process?

– Yes, Sophie, I can see a number of things that can be done, and might help.

– Let’s hear it.

– Okay. We were assuming that APT is a kind of half-breed argumentative-systems creature, except we have seen that it can’t make up new claims, nor plausibility or goodness judgments, on its own. It must get them from humans; only then can it use them for things like making new arguments. If it does that – it may take some bribery to get everybody to make and give those judgments, mind you – it can of course store them, analyze them, and come up with all kinds of statistics about them.
One kind of information I’d find useful would be to find out exactly where people disagree, and how much, and for what reasons. I mean, people argue against a policy for different reasons – one because he doesn’t believe that the policy will be effective in achieving the desired goal – the deontic premise that he agrees with – and the other because she disagrees with the goal.

– I see: Some people disagree with the US health plan they call ‘Obamacare’ because they genuinely think it has some flaws that need correcting, and perhaps with good reasons. But others can’t even name any such flaws and just rail against it, calling it a disaster or a trainwreck etc. because, when you strip away all the reasons they can’t substantiate, simply because it’s Obama’s.

– Are you saying Obama should have called it Romneycare, since it was alleged to be very similar to what Romney did in Massachusetts when he was governor there? Might have gotten some GOP support?

– Let’s not get into that quarrgument here, guys. Not healthy. Stay with the topic. So  our APT would be able to identify those differences, and other discourse features that might help decide what to do next – get more information, do some more discussion, another analysis, whatever. But so far, its systems alter ego hasn’t been able to show any of that in the systems model diagram, to make that part of holistic information visible to the other participants in the discourse.

– Wouldn’t that require that it become fully conscious of its own calculations, first?

– Interesting question, Sophie. Conscious. Hmm. Yes: my old car wouldn’t show me a lot of things on the dashboard that were potential problems – whether a tire was slowly going flat or the left rear turn indicator was out – so you could say it wasn’t aware enough, — even ‘conscious?’ — of those things to let me know. The Commissioner’s new car does some of that, I think. Of course my old one could be very much aware but just ornery enough to leave me in the dark about them; we’ll never know, eh?

– Who was complaining about running off the topic road here just a while ago?

– You’re right, Vodçek: sorry. The issue is whether and how the system could produce a useful display of those findings. I don’t think it’s a fundamental problem, just work to do. My guess is that all that would need several different maps or diagrams.

Discourse-based criteria guiding collective decisions?

– So let’s assume that not only all those judgments could be gathered, stored, analyzed and the results displayed in a useful manner. All those individual judgments, the many plausibility and judgment scores and the resulting overall plan plausibility and ‘goodness’ judgments. What’s still open is this: how should those determine or at least guide the overall group’s decision? In a way that makes it visible that all aspects, all concerns were ‘duly considered’, and ending up in a result that does not make some participants feel that their concerns were neglected or ignored, and that the result is – if not ‘the very best we could come up with’ then at least somewhat better than the current situation and not worse for anybody?

– Your list of aspects there already throws out a number of familiar decision-making procedures, my friend. Leaving the decision to authority, which is what the systems folks have cowardly done, working for some corporate client, (who also determines the overall ‘common good’ priorities for a project, that will be understood to rank higher than any individual concerns) – that’s out. Not even pretending to be transparent or connected to the concerns expressed in the elaborate process. Even traditional voting, that has been accepted as the most ‘democratic’ method, for all its flaws. Out. And don’t even mention ‘consensus’ or the facile ‘no objection?‘ version. What could our APT possibly produce that can replace those tools? Do we have any candidate tools?

– If you already concede that ‘optimal’ solutions are unrealistic and we have to make do with ‘not worse’ – would it make sense to examine possible adaptations of one of the familiar techniques?

– It may come to that if we don’t find anything better – but I’d say let’s look at the possibilities for alternatives in the ideas we just discussed, first? I don’t feel like going through the pros and cons about our current tools. It’s been done.

– Okay, professor: Could our APT develop a performance measure made up of the final scores of the measures we have developed? Say, the overall goodness score modified by the overall plausibility score a plan proposal achieved?

– Sounds promising.

– Hold your horses, folks. It sounds good for individual judgment scores – may even tell a person whether she ought to vote yes or no on a plan – but how would you concoct a group measure from all that – especially in the kind of public asynchronous discourse we have in mind? Where we don’t even know what segment of the whole population is represented by the participants in the discourse and its cumbersome exercises, and how they relate to the whole public population for the issue at hand?

– Hmm. You got some more of that café catawhatnot, Vodçek?

– Sure – question got you flummoxed?

– Well, looks like we’ll have to think for a while. Think it might help?

– What an extraordinary concept!

– Light your Fundador already, Vodçek, and quit being obnoxious!

– Okay, you guys. Let’s examine the options. The idea you mentioned, Bog-Hubert, was to combine the goodness score and the plausibility score for a plan. We could do that for any number of competing plan alternatives, too.

– It was actually an idea I got from Abbé Boulah some time ago. At the time I just didn’t get its significance.

– Abbé Boulah? Let’s drink to his health. So we have the individual scores: the problem is to get some kind of group score from them. The mean – the average – of those scores is one candidate; we discussed the problems with the mean many times here, didn’t we? It obscures the way the scores are distributed on the scale: you get the same result from a bunch of scores tightly grouped around that average as you’d get from two groups of extreme scores at opposite ends of the scale. You can’t see the differences of opinion.

– That can be somewhat improved upon if you calculate the variance – it measures the extent of disagreement among the scores. So if you get two alternatives with the same mean, the one with the lower variance will be the less controversial one. The range is a crude version of the same idea – just take the difference between the highest and the lowest score; the better solution is the one with a smaller range.

– What if there’s only one proposal?

– Well, hmm; I guess you’d have to look at the scores and decide if it’s good enough.

– Let’s go back to what we tried to do – the criteria for the whole effort: wasn’t there something about making sure that nobody ends up in worse shape in the end?

– Brilliant, Sophie – I see what you are suggesting. Look at the lowest scores in the result and check whether they are lower or higher than, than …

– Than what, Bog-Hubert?

– Let me think, let me think. If we had a score for the assessment of the initial condition for everybody (or for the outcome that would occur if the problem isn’t taken care of) then an acceptable solution would simply have to show a higher score than that initial assessment, for everybody. Right? The higher the difference, even something like the average, the better.

– Unusual idea. But if we don’t have the initial score?

– I guess we’d have to set some target threshold for any lowest score – no lower than zero (not good, not bad), or at least a +0.5 on a +2/-2 goodness scale, for the worst-off participant’s score? That would be one way to take care of the worst-off affected folks. The better-off people couldn’t complain, because they are doing better, according to their own judgment. And we’d have made sure that the worst-off outcomes aren’t all that bad.
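
(A minimal sketch of these candidate group measures in Python; the scores and the threshold are invented for illustration:)

```python
# Minimal sketch: candidate group measures over individual overall scores
# on the +2..-2 goodness scale. Scores and threshold are invented.

scores = [1.2, 0.8, 0.6, 1.5, 0.7]   # one overall score per participant

mean     = sum(scores) / len(scores)
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
rng      = max(scores) - min(scores)   # crude spread: highest minus lowest
worst    = min(scores)

print(f"mean {mean:+.2f}, variance {variance:.2f}, range {rng:.2f}")

# The 'worst-off' rule: accept only if the lowest score clears a threshold.
THRESHOLD = 0.5
print("acceptable" if worst >= THRESHOLD else "not acceptable")
```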

– You’re talking as if ‘we’ or that APT thing is already set up and doing all that. The old Norwegian farmer’s rule says: Don’t sell the hide before the bear is shot! It isn’t that easy though, is it? Wouldn’t we need a whole new department, office, or institution to run those processes for all the plans in a society?

– You have a point there, Vodçek. A new branch of government? Well now that you open that Pandora’s box: yes, there’s something missing in the balance.

– What in three twisters name are you talking about, Bog-Hubert?

– Well, Sophie. We’ve been talking about the pros and cons of plans. In government, I mean the legislative branch that makes the laws, that’s what the parties do, right? Now look at the judicial branch. There, too, they are arguing – prosecutor versus defense attorney – like the parties in the House and Senate. But then there’s a judge and the jury: they are looking at the pros and cons of both sides, and they make the decision. Where is that  jury or judge ‘institution’ in the legislature? Both ‘chambers’ are made up of parties, who too often look like they are concerned about gaining or keeping their power, their majority, their seats, more than the quality of their laws. Where’s the jury? The judge? And to top that off: even the Executive is decided by the party, in a roundabout process that looks perfectly designed to blow the thinking cap off every citizen. A spectacle! Plenty of circenses but not enough panem. Worse than old Rome…

– Calm down, Bog-Hubert. Aren’t they going to the judiciary to resolve quarrels about their laws, though?

– Yes, good point. But you realize that the courts can only make decisions based on whether a law complies with the Constitution or prior laws – issues of fact, of legality. Not about the quality, the goodness of the law. What’s missing is just what Vodçek said: another entity that looks at the quality and goodness of the proposed plans and policies, and makes the decisions.

– What would the basis of judgment of such an entity be?

– Well, didn’t we just draw up some possibilities? The concerns are those that have been discussed, by all parties. The criteria that are drawn from all the contributions of the discourse.  The party ‘in power’ would only use the criteria of its own arguments, wouldn’t it? Just like they do now… Of course the idea will have to be discussed, thought through, refined. But I say that’s the key missing element in the so-called ‘democratic’ system.

– Abbé Boulah would be proud of you, Bog-Hubert. Perhaps a little concerned, too? Though I’m still not sure how it all would work, for example considering that the humans in the entity or ‘goodness panel’ are also citizens, and thus likely ‘party’. But that applies to the judge and jury system in the judicial as well. Work to do.

– And whatever decision they come up with, that worst-off guy could still complain that it isn’t fair, though?

– Better than 49% of the population peeved and feeling taken advantage of? Commissioner: what do you say?

– Hmmm. That one guy might be easier to buy off than the 49%, yes. But I’m not sure I’d get enough financing for my re-election campaign with these ideas. The money doesn’t come from the worst-off folks, you know…

– Houston, we have a problem …