Papers as graphs or, to “sex things up”: can this help spot disinformation and/or improve peer review?

The work on causal claims in economics has taken off, and there is quite a lot of interest in the methodology across many different fields. I keep saying that technology will humble us all, and there may well have to be a fundamental debate about the social contract and the organisation of society. I have been part of such debates and, in a strange way, it is intriguing to see how there does seem to be adjustment in policy spaces following such conversations. At least that is my perception, but again, it may be wrong, or I am just intrigued.

I think hyper-personalization, if anything, should happen in small communities of people that support and trust one another. Groups where care and compassion are a defining principle of the human relationship. Defining the relationships between smaller communities and bigger ones then requires a role for individuals who can traverse between fields or groups. In my work I have criss-crossed so many different fields and have had exposure to so many different areas of work at different levels that I think there is, deep down, a growing appreciation of the nuances and subtleties involved. But it does mean some neural nodes have connected in ways that they may not have for others. I also think these are the bits that folks like Musk may not understand. Or they react to the ongoing change as something to be afraid of. That is why I wrote about the encoding of the first bit.

Now, let’s have a look at papers as graphs. The project has been gestating for a long time. My students and people I have worked with will recognize the patterns, and I guess there is some connecting of dots at a meta level that I usually do at our socials. Some of you may have been reading my ramblings on social media, which, at times, can get quite cryptic. I want to give a bit of an outlook on some future directions this work on knowledge and networks as graphs will take. But I think it is worth delving a bit more deeply into some detail on graph structures.

What strikes us as most relevant at this point is to characterize the “average” shape of a causal claims graph and how it may help evaluate research papers in economics and other fields going forward. Much like storytelling, plays or movies, economics and social science research is characterized by a shape and a topology of how an argument is being made. There is an arc and maybe, and this is where things get cryptic: our whole life has an arc, and the encoding of the first bits is really what may matter.

There is much to learn from this in a way that I think may help streamline and possibly improve academic publishing, in particular by speeding it up and reducing the refereeing burden. It may also help in tackling disinformation or shape our understanding of the media ecosystem, but more on this later.

Let’s talk graphs. Let me illustrate this with an example of a paper I know reasonably well: a paper on Housing Insecurity & Homelessness in the UK. This very much has to do with lived experience from around 2009–2012 in and around London. At the time, I was every now and then helping out with a Winter Night Shelter run by a local church community.

When we leverage our prompt to retrieve causal claims, it extracts the following set from the paper:

Set of causal claims retrieved in Fetzer, Sen and Souza (2023)

What does such a graph look like when you plot it? Ultimately, it is a star-like layout with a root and many terminal nodes. This is an archetype of a graph structure that you see quite often in applied economics papers. It is a spanning tree that essentially links the X variable, measuring a shock to the UK welfare system, to a range of y-variables or outcomes.

Causal Claims in Fetzer, Sen and Souza (2023) 
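To make this concrete, here is a minimal sketch in Python (using networkx) of what such a star-shaped claims graph looks like as a data structure. The edge list is illustrative, my paraphrase of the paper’s claims, not the actual output of our extraction prompt.

```python
import networkx as nx

# Hypothetical (cause, effect) pairs: one treatment X, many outcomes y.
claims = [
    ("housing benefit cut", "rent arrears"),
    ("housing benefit cut", "evictions"),
    ("housing benefit cut", "statutory homelessness"),
    ("housing benefit cut", "council homelessness-prevention spending"),
]

G = nx.DiGraph(claims)

# A star: a single root with positive out-degree, and terminal nodes
# that only receive edges.
roots = [n for n in G if G.in_degree(n) == 0 and G.out_degree(n) > 0]
leaves = [n for n in G if G.out_degree(n) == 0]
print("roots:", roots)
print("leaves:", leaves)
print("star-like:", len(roots) == 1 and len(leaves) == len(G) - 1)
```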

Sometimes, what can happen in empirical papers is that such a star-like structure is derived from a range of different empirical exercises that do not stem from the same econometric framework. For example, one relationship may be estimated only in a cross-sectional framework rather than in a panel framework. In other cases, some arcs may be estimated with data of a different granularity compared to the main unit of analysis.

Right now, our Causal Claims in Economics dataset is in version 7. But we are working on augmenting it and also extending the approach to other fields of research.

From a set of claims to a story

Of course, the paper’s story is not captured in the star-like graph. The graph is not arranged. What is a story? A story ultimately is just a temporal sequencing of A leading to B leading to C and maybe, at the “end” of the story, there is an “aha, I learned something” or a realisation that this was “time well spent”.

So to turn a set of empirical observations into a story, you need to think of a way of structuring them. This is what we would often refer to as a “mechanism”. In the case of the above paper, the implicit mechanism can be displayed visually by adding arcs that form the chain of events.

A story is spanned across a set of empirical claims from Fetzer, Sen and Souza (2023)

The structure of the graph has now changed: it is no longer a minimum spanning tree, as there are also direct (or indirect, depending on your brain’s wiring) paths. These (in)direct links tell the implicit story of the paper: cuts to housing benefit increased rent arrears, resulting in evictions, driving involuntary homelessness. The twist in the paper is that the increase in council spending on homelessness prevention neutralized many of the original housing benefit savings (and yes, I am aware that the Ministry of Housing, Communities and Local Government did look at this research).
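As a rough sketch, with the same illustrative edge names as above, adding the mechanism arcs turns the star into a web with direct and indirect paths:

```python
import networkx as nx

G = nx.DiGraph([
    ("housing benefit cut", "rent arrears"),
    ("housing benefit cut", "evictions"),
    ("housing benefit cut", "statutory homelessness"),
    ("housing benefit cut", "council homelessness-prevention spending"),
])

# Mechanism arcs: cut -> arrears -> evictions -> homelessness -> spending.
G.add_edges_from([
    ("rent arrears", "evictions"),
    ("evictions", "statutory homelessness"),
    ("statutory homelessness", "council homelessness-prevention spending"),
])

# The graph is no longer a tree: direct and indirect paths coexist.
print("is tree:", nx.is_tree(G.to_undirected()))
for path in nx.all_simple_paths(G, "housing benefit cut",
                                "council homelessness-prevention spending"):
    print(" -> ".join(path))
```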

Anyways. You may be able to rearrange the sequence or order of claims, but the logical “flow” in terms of chains of events is quite “clear” here. If you think about refereeing, one way to evaluate the veracity of a set of claims is to check for timing consistency of the sequence of events and effects. This is something that one can study with the underlying panel data to see if the dynamics align (dynamic consistency). And yes, these are aspects that we have started incorporating in our causal claims v8.
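A minimal sketch of what such a timing-consistency check could look like, assuming a long panel DataFrame with unit and period columns and a known shock period. The column names and the two-standard-deviation break rule are my illustrative choices, not what we implement in causal claims v8:

```python
import pandas as pd

def first_break(df: pd.DataFrame, outcome: str, shock_period: int):
    """First post-shock period where the outcome leaves its pre-shock band."""
    series = df.groupby("period")[outcome].mean()
    pre = series[series.index < shock_period]
    band = 2 * pre.std()
    post = series[series.index >= shock_period]
    breaks = post[(post - pre.mean()).abs() > band]
    return breaks.index.min() if not breaks.empty else float("inf")

def timing_consistent(df, chain, shock_period):
    """Breaks along the claimed chain should occur in (weakly) rising order."""
    times = [first_break(df, outcome, shock_period) for outcome in chain]
    return all(a <= b for a, b in zip(times, times[1:]))
```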

Now, we can think of these as qualitative evaluation filters that one may want to apply when evaluating applied economics research. These are the “questions” to ask in the review process, and a lot of this can be retrieved.

A risk of an overclaim arises when we, for example, piece together a story from not necessarily “aligned” empirical and measurement frameworks, as the set of identifying assumptions is then possibly much more restrictive, and it is easier to find a set of false positives in a set of non-connected exercises just by looking hard enough. So this is one quality filter.

Overclaiming

In one extension of Causal Claims, we are experimenting with constructing measures of the extent to which a paper and its narrative may be “overclaiming” vis-à-vis the substantive or positive econometric exercise. An alternative way is to think of this as “overselling”. One specific way to operationalize this is to retrieve a graph of causal claims, and the narrative — that is, the arranged sequence of claims — solely from the text description. The second ingredient is the set of claims that are empirically evidenced, and to what extent they are evidenced in depth.
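One minimal way to sketch this, under the assumption that both graphs have already been retrieved: treat the narrative and the evidence as edge sets and measure the share of narrated links that are never evidenced. The edges below are placeholders:

```python
# Narrative graph (from the text) vs. evidence graph (from the empirics).
narrated = {("benefit cut", "arrears"), ("arrears", "evictions"),
            ("evictions", "homelessness"), ("benefit cut", "crime")}
evidenced = {("benefit cut", "arrears"), ("arrears", "evictions"),
             ("evictions", "homelessness")}

# Links told in the story but never shown in the data.
unevidenced = narrated - evidenced
overclaim_share = len(unevidenced) / len(narrated)
print("unevidenced links:", unevidenced)
print(f"overclaim share: {overclaim_share:.2f}")  # 0.25 in this toy example
```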

A feature that I have noticed is that, well, sometimes the work can take on a dynamic of its own, for example in journalistic accounts of research, which may well skew the way research is represented to broader audiences. There was always the articulated need to “make it simple”; a few lines or a few words. No complexity. Busy people. When in fact I did at times have longer conversations with some folks who seem (very) close to policy.

How can one measure overclaiming? One way is to leverage the classification system of the economic literature, as we presently do. One can attempt to map the paper title to concepts, for example, Housing Insecurity and Homelessness — this links two high-level concepts but is quite devoid of detail on how either the former or the latter is actually measured. The JEL classification system is quite flat and not as hierarchical as other classification systems (think the hard sciences), yet it allows for a mapping of links between first and second digits in terms of JEL codes.

An overclaim may be detected if, for example, the measurement is more granular — a simple way to spot this is when it takes many words to describe the actual measurement X. Individuals deemed involuntarily homeless — complex. Homeless — easy to understand and attached to visual stereotypes. (I shared my thinking on this with quite a few people — the first time to a broader audience in April 2024 in Berlin, but beforehand, for example, in February 2020 at Warwick in an internal seminar — the “local decline” work; the homelessness paper aimed to do exactly the same but, owing to measurement issues, I was asked to remove the populism results.) An attention-grabbing headline or title may thus be spotted by exploring, quite literally, the number of words you need to describe a concept vis-à-vis its empirical measurement.
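A deliberately crude sketch of that word-count heuristic (the strings are illustrative):

```python
def granularity_gap(concept: str, measurement: str) -> float:
    """Ratio of measurement length to concept length, in words."""
    return len(measurement.split()) / max(len(concept.split()), 1)

print(granularity_gap(
    "Homelessness",
    "households deemed involuntarily homeless under statutory duty"))
# A large ratio flags a headline concept far coarser than its measurement.
```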

But of course, this does not capture an overclaim by itself; it does so only in conjunction with the shock being studied.

Another ingredient to overclaiming may be significance.

(Statistical?) Significance

Going back to the example of the paper, I plot here the version of the links that are evidenced in the paper: both direct as well as indirect links. Moving from a star-like graph to this web or chain of links is what spans a story.

With such a web of relationships, it should be clear that we need to think differently about statistical significance. First, of course, the graph implies that there is a mechanical correlation structure. This means naïve corrections for multiple hypothesis testing with standard clustering approaches are not suitable (hence why I like randomization inference or permutation approaches). But what is more: sometimes statistical significance is constrained by the quality of outcome or treatment measurement, e.g. due to data being aggregated or not made available, for example for confidentiality or privacy reasons.
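For intuition, here is a minimal randomization-inference sketch for a single link: permute the treatment assignment and ask how often a placebo effect is at least as large as the observed one. The toy data and the simple difference in means are my assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def permutation_pvalue(outcome, treated, n_perm=5000):
    """Share of permuted assignments with an effect at least as large
    (in absolute value) as the observed difference in means."""
    observed = outcome[treated].mean() - outcome[~treated].mean()
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(treated)
        effect = outcome[shuffled].mean() - outcome[~shuffled].mean()
        count += abs(effect) >= abs(observed)
    return count / n_perm

# Toy data: 200 units, half treated, small true effect.
treated = np.arange(200) < 100
outcome = rng.normal(0, 1, 200) + 0.3 * treated
print(permutation_pvalue(outcome, treated))
```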

This may constrain the “number of stars” in a single exercise, which, unfortunately, in my publishing experience, has at times resulted in papers being rejected. The focus was on the individual tree, not the implicit forest spun by the narrative of a set of evidenced causal claims.

I think this is why ongoing research on inference in high-dimensional settings is so relevant, and, not surprisingly, the ERC is supporting exactly one such project in its current Consolidator Grants (if my understanding, just from the title of the project, is correct). This also links with the work with Jakob and Christina at the Office for National Statistics and the CMA, respectively.

Thinking about statistical significance

A simple way to benchmark “significance” is against the null of there being no effect in the joint distribution, assuming independence. An alternative is to evaluate a set of claims in a more systematic way. For example, one can do an exercise that considers effect sizes relative to shock sizes.
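For the first benchmark: under the global null with independent tests, the number of significant links at level alpha is Binomial(m, alpha), so one can ask how surprising the observed count is. The numbers below are made up:

```python
from scipy.stats import binom

m = 12        # number of claimed links tested in the paper (illustrative)
k = 7         # number reported as significant at the 5% level (illustrative)
alpha = 0.05

# Probability of seeing k or more significant links by chance alone.
p_joint = binom.sf(k - 1, m, alpha)
print(f"P(>= {k} of {m} significant | global null, independence) = {p_joint:.2g}")
# Independence is only a benchmark: the graph implies a mechanical
# correlation structure across outcomes, as noted above.
```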

The above paper is a great example of some of the confusion that sometimes arises with empirical research. We did not have at our disposal individual-level linked data that would allow us to trace through the impact that the cuts had across all the stages that we document. But we know that, for the average family, the cut to housing benefit was quite sizable — for some, it could be up to several hundred pounds per month, which is a big deal if you can only just make ends meet in an environment where chronic housing undersupply (hence the work on planning I started in 2022) drives rents.

But essentially, what you can do is chain the effect sizes in descending order based on some priors. These should “align” in terms of magnitude. I must admit I have not done that in the Housing Insecurity paper, but it should be totally doable. This would help think about statistical significance in a more structural sense, as ultimately you should see gradually declining effect sizes relative to the whole treated population when evaluating against the effect size distribution.
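A sketch of what that alignment check might look like, with invented magnitudes; the idea is simply that each downstream population-level effect should be (weakly) smaller than the upstream effect it passes through:

```python
# Illustrative population-level effect magnitudes along the claimed chain.
chain = [
    ("benefit cut -> rent arrears", 0.40),
    ("rent arrears -> evictions", 0.15),
    ("evictions -> statutory homelessness", 0.08),
]

sizes = [size for _, size in chain]
aligned = all(a >= b for a, b in zip(sizes, sizes[1:]))
print("effect sizes weakly declining along the chain:", aligned)
```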

Now, we are working to refine this, as I think this graph-like retrieval could really help the reviewing process, help editors do a better job, and help referees by evaluating the logical cohesion. In the end, I guess there is beauty in complexity.

Thinking about disinformation and conspiracy theories

Now, I promised I would say something about conspiracy theories or disinformation. What is a conspiracy theory, and what is disinformation? At the heart is a story that posits some behavior and from there spans a set of causal claims or chains. What is the case with most disinformation is that the starting premise is something that is not observable or not a large phenomenon.

Some random or bizarre fact. And you can potentially connect a story or a chain of events to it. Most humans are predisposed to like stories over numbers. That is why I pitched this framework to the European Commission/JRC folks — ultimately, to see if F(data) = stories and/or if F^-1(h(stories)) = data, a concept I introduced in a short memo to the JRC and more broadly here.

If a chain of causal claims strings together claims made by individuals with sample size N=1, the claims are highly bizarre, they are not shared or amplified beyond specific groups of individuals, and/or the chain is not indicative of a high degree of diversity of views or consideration, then we can probably conclude that this is indicative of misinformation, or primarily an attempt at mobilization. I need to think a bit more about this, but that is why I think zero-knowledge proofs are so important, as they create a role for trust. But of course, it will require a rewiring of the economy, and that is exactly what appears to be in the works, big time.
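To make the filter above a bit more concrete, here is a very tentative sketch with made-up feature names and a simple flag count; this is a thought experiment, not a tool:

```python
def misinformation_flags(claim_chain: dict) -> int:
    """Count how many red flags a chain of claims raises."""
    flags = 0
    flags += claim_chain.get("source_sample_size", 0) <= 1      # N = 1 sourcing
    flags += not claim_chain.get("independently_shared", True)  # no wide sharing
    flags += claim_chain.get("viewpoint_diversity", 1.0) < 0.2  # echo chamber
    return flags

example = {"source_sample_size": 1,
           "independently_shared": False,
           "viewpoint_diversity": 0.05}
print(misinformation_flags(example))  # 3: likely mobilization, not information
```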

Thinking about the media ecosystem

We are all shaped by our very own life experiences. The encoding of the first bits. Now, some of us have more cognitive complexity in some domains, some have more in others. We all have different skills in information processing, and connecting dots, seeing relationships and patterns is one of these. In addition, there is obviously human creativity, which AI can help or augment — say, if you lack your own words to describe your thoughts in a language that may be socially acceptable.

Anyways, you see the challenge. The need for simplicity arises simply because some folks have different engagement capacity. But this is where a tension invariably arises in a population with heterogeneous abilities, talents, etc.: when you dumb things down or make content more easily accessible, you threaten to irritate others, creating noise and cacophony.

Yet, at the same time, you need the possibility of mass communication to organize collective action. That is why, in some way, one line of inquiry would induce me to conclude that it is entirely reasonable that there is a conflict. The best we can do is to make sure we align our understanding of the same pieces of evidence, as I argued in the 2023 keynote Shared Perspectives in Bologna. I was overwhelmed and stopped the talk short.

But you see, this is a challenge, as ultimately disinformation employs some of the very same techniques that may be needed to induce a desired behavioral change. This links with marketing as well, which also uses these techniques and turns out to be a huge industry. That is why we need to measure and better capture the exchange of knowledge, data, etc., and for this we need to arrive at a governance framework for the global trading system, as some of these are increasingly traded as services. And we need to come to a global accord. The US is aggressively rolling out digital ID, as I have commented on here extensively, but I first noticed some moves in late 2022. The UK is a hotbed here, as it is one of the flashpoints of geopolitical conflict, but so is Germany.

Anyways, a more plural media ecosystem can be helpful, but it needs to be subject to a healthy degree of diversity on the receiving side so that people can internalize the impact that their words may have on others. This requires a new information topology with more decentralised nodes, and we need trust anchors. This trust can be measured and may even be displayed. I do not think that a super-centralised single node becoming the gateway is a desirable feature here. But it could well be that the trust system that awards “influence” is something that is subject to a type of peer review or evaluation, and for that, again, we need to negotiate the informational boundaries.

