Many years ago, I served on a committee responsible for recommending funding levels for research grants. After the awards were announced, a colleague commented that all we did was count the number of publications and award grants in proportion to that number. So, I checked and did a scatter plot. Boy, did they scatter. The correlation between the grant size and the number of publications was not that strong. I then tried citations; again a large scatter. Well, perhaps the results really were random—nah, that could not happen; I was on the committee after all.
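The check described above is easy to reproduce in miniature. Here is a minimal sketch, using entirely made-up grant and publication numbers (the real committee data are not shown in this essay), that computes the Pearson correlation a scatter plot makes visible:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: grant size (k$) and publication count for ten grants.
grants = [120, 85, 200, 150, 95, 300, 110, 175, 90, 250]
pubs   = [14,  22,  9,  30,  5,  18,  25,  7,  16,  12]

r = pearson(grants, pubs)
print(f"r = {r:.2f}")  # a weak correlation: the points scatter
```

With data this scattered, |r| is small; a value near 1 would have meant the committee really was just counting publications.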
I did not do a multivariable analysis, but there were no simple correlations between what might be called quantitative indicators and the size of the research grant. This supports the conclusions of the Expert Panel on Science Performance and Research Funding: Mapping research funding allocation directly to quantitative indicators is far too simplistic, and is not a realistic strategy. Trying to do that repeats the mistake of the logical positivists, who wanted to attach significance directly to the measurements. As I have argued in previous essays, the meaning is always in the model, and logical positivism leads to a dead end.
In deciding funding levels, the situation is too complicated for a simple algorithm. Consider the number of publications. There are different types of publications: letters, regular journal articles, review articles, conference contributions, and so on. Publications are of different lengths. Should one count pages rather than publications? Or is one letter worth two regular journal papers, letters being shorter but considered by some to be more important than regular articles? In reality, one wants to see a mix of the different types of publications. A review article might indicate standing in the field, but one also wants to see original papers. Is a paper in a prestigious journal worth more than one in a more mundane journal? What is a prestigious journal anyway? There is also the question of multi-author papers. One gets suspicious if all the papers are with more senior or well-known authors, but a record of nothing except single-author papers is also a warning sign. Generally, co-authoring papers with junior collaborators is a good thing. In some fields, all papers include every member of the collaboration, so the number of co-authors carries very little information. The order of authors on a publication may or may not be important. And on it goes. Expert judgment is, as always, required to sort out what it all means.
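The arbitrariness of any fixed weighting is easy to demonstrate. In this deliberately naive sketch, with invented researcher records and weights, the ranking of two researchers flips depending on whether a letter is taken as worth half a regular article or, per the view mentioned above, twice one:

```python
# A deliberately naive publication scorer; all weights and records are hypothetical.

def score(record, weights):
    """Weighted count of publications by type."""
    return sum(weights.get(kind, 0) * n for kind, n in record.items())

researcher_a = {"letter": 8, "article": 3, "review": 0}
researcher_b = {"letter": 1, "article": 6, "review": 1}

# One plausible weighting: a letter worth half a regular article.
w1 = {"letter": 0.5, "article": 1.0, "review": 1.5}
# Another: a letter worth two regular articles.
w2 = {"letter": 2.0, "article": 1.0, "review": 1.5}

print(score(researcher_a, w1), score(researcher_b, w1))  # 7.0 8.0
print(score(researcher_a, w2), score(researcher_b, w2))  # 19.0 9.5
```

Both weightings are defensible on their face, yet they rank the two records in opposite orders, which is exactly why the numbers need an expert interpreter rather than a formula.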
Citations are an even bigger can of worms. Even in a field as small as sub-atomic theoretical physics there are distinct variations in the pattern of citations among the subfields: string theory, particle phenomenology, and nuclear physics. For example, the lifetime for citations in particle phenomenology is significantly shorter than in nuclear physics. Then there is the question of self-citations: citations to one's own work or, more subtly, to close collaborators. And what about review articles? Is a citation to a review article as important as one to an article on original research? Review articles frequently collect more references. My most cited paper is a review article. A person can, with a bit of effort, sort all this out. Setting up an algorithm would be damn near impossible. A person could even, gasp, read some of the papers and form an independent opinion of their validity. But that could introduce biases. Hence, numbers are important, but they must be interpreted. This leads to the conclusion: Quantitative indicators should be used to inform rather than replace expert judgment in the context of science assessment for research funding allocation.
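The direct form of self-citation is the easy part to count. A small sketch, with invented author names, flags citations that share an author with the cited paper:

```python
def self_citation_fraction(paper_authors, citing_author_lists):
    """Fraction of citations sharing at least one author with the cited paper."""
    paper = set(paper_authors)
    if not citing_author_lists:
        return 0.0
    self_cites = sum(1 for authors in citing_author_lists if paper & set(authors))
    return self_cites / len(citing_author_lists)

# Hypothetical paper and its citing papers.
authors = ["A. Theorist", "B. Postdoc"]
citations = [
    ["A. Theorist", "C. Student"],    # shares an author: self-citation
    ["D. Rival"],
    ["B. Postdoc"],                   # shares an author: self-citation
    ["E. Colleague", "F. Colleague"],
]
print(self_citation_fraction(authors, citations))  # 0.5
```

Note what this cannot do: catching the subtler case of citations from close collaborators would require a collaboration graph, and deciding which collaborators count as "close" is itself a judgment call, which is the point.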
The other problem with simple algorithms is the feedback loop. With a simple algorithm, researchers naturally change their behaviour to maximize their grants. For example, if we judge on the number of publications, people split papers up, publish weak papers, or publish what is basically the same thing several times. I have done that myself. None of these improve the quality of the work being done. Expert judgment can generally spot these a mile away. After all, the experts have used these tricks themselves.
More generally, there is the problem of trying to reduce everything to questions that have nice quantitative answers. Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. There seems to be an argument that since science normally uses quantitative methods, administration should follow suit so it can share in the success of science. It is like the medieval argument that since most successful farmers had three cows, the way to make farmers successful was to give them all three cows. But the wrong question can never give the right answer. It is far better to ask the right question and then work on getting a meaningful answer. What we want to do at a science laboratory, or in funding science generally, is to advance our understanding of how the universe works to the maximum extent possible and to use the findings for the benefit of society. The real question is: how do we do this? That is neither an easy question to answer nor one that can be easily quantified. Not being quantifiable does not make it a meaningless question. There are various metrics an informed observer can use to make intelligent judgments. But it is very important that administrators avoid the siren call of logical positivism and not try to attach meaning directly to a few simple measurements.
Quote from: Expert Panel on Science Performance and Research Funding (2012). Informing research choices: indicators and judgment. Council of Canadian Academies.

Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics 33(1), 1-67.