Wednesday, November 28, 2012

Cumulative Impact-Factor Benchmarking

Speaking of CVs, publications, and impact factors. Some time ago I got pretty anxious about this whole publication-benchmarking story. You know, when some people say that "everyone should publish at least one paper a year", or somebody mentions in passing that "nobody is hired without at least 1 glamorous paper", or "second-authors do not count", and so on.

So I decided to do some research myself. I did the following:

  1. Identified some people in my field who did something remotely similar to what I do, and who are or were on the job market within the last ~5 years.
  2. For each of them, I downloaded a full list of their publications as undergrads, grad students and postdocs, as that's what they showed (or are showing) on their CVs when looking for a job.*
  3. For every publication I found the impact-factor of the journal it was published in.
  4. I discounted 2nd-author papers and reviews by 75% (so 4 second author papers = 4 reviews = one first author paper). It's obviously a wild guess, and an oversimplification, as the formula would not hold at extremes, but overall it's probably about right.
  5. And finally, I calculated their cumulative impact factor. And then plotted this value vs. years that passed since they got their PhDs.
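
For anyone who wants to try the same calculation, steps 3-5 can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the `Paper` fields, the function names and the example numbers are all made up for illustration, the impact factors still have to be looked up by hand, and a paper that is both a review and second-author is discounted only once (the scheme above does not say what to do in that case, or with third and later author positions).

```python
from dataclasses import dataclass

# Step 4: second-author papers and reviews are discounted by 75%.
DISCOUNT_WEIGHT = 0.25


@dataclass
class Paper:
    year: int                   # publication year
    journal_if: float           # journal impact factor, looked up by hand
    second_author: bool = False
    review: bool = False


def paper_score(p: Paper) -> float:
    """Impact factor credited for one paper after the step-4 discount."""
    # Assumption: a second-author review is discounted once, not twice.
    discounted = p.second_author or p.review
    return p.journal_if * (DISCOUNT_WEIGHT if discounted else 1.0)


def cumulative_if(papers, phd_year):
    """Return (years since PhD, running cIF) points, one per publication year."""
    by_year = {}
    for p in papers:
        by_year[p.year] = by_year.get(p.year, 0.0) + paper_score(p)
    total, curve = 0.0, []
    for year in sorted(by_year):
        total += by_year[year]
        curve.append((year - phd_year, total))
    return curve


# Invented example, not real data: PhD in 2006, four papers.
example = [
    Paper(2005, 4.0),
    Paper(2007, 8.0, second_author=True),
    Paper(2009, 30.0),
    Paper(2009, 4.0, review=True),
]
print(cumulative_if(example, phd_year=2006))
# prints [(-1, 4.0), (1, 6.0), (3, 37.0)]
```

Plotting the second value of each pair against the first gives one line of the kind shown on the chart.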

Here are the results. Black lines represent those who got their TT positions in really cool (glamorous) places. Brown lines indicate successful landing on TT positions in quite decent places (universities, colleges). Blue lines are for those who either got non-TT positions, or only got some really terrible (unacceptable) offers in some weird places, or have not received any offers so far.

What do we see here? A bunch of stuff!

  1. To get any kind of a TT position in my subfield you need to reach a threshold of about 60 cumulative IF. That's either 2 publications in Nature, or 15 papers in PLoS ONE, or anything in between**.
  2. You need to get it in about 12 years including grad school. A gentle slope means asking for trouble.
  3. Glamorous papers (those sudden jumps in the cIF) do increase your chances, but mostly because they pump up your cIF. Although one can argue that they also improve your image (see that black line among the brown ones, with a distinct CNS jump).

I personally am not on track yet, but I have some chance of getting on the brown track, if only the papers I'm working on now are published properly (in good journals).

* Practically speaking, I took all papers published before they got their first last-author research paper, plus any non-last-author research papers published within 2 years after that. (This additional complication is necessary, as apparently many people publish their last postdoc paper after publishing their first PI paper. But I assume they still had it shown in their CVs as "submitted"; thus the adjustment.)
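
The selection rule in this footnote can be written down as a small Python filter. Again, this is a sketch rather than the exact procedure used for the chart: the dictionary keys `year`, `last_author` and `research` are hypothetical names, and since publication dates are only known to the year, "before" is taken to mean a strictly earlier year and "within 2 years" to mean the year of the first last-author paper plus the two following years.

```python
def cv_papers(papers):
    """Pick the papers a candidate probably showed on the CV while job hunting.

    Each paper is a dict with hypothetical keys:
      'year' (int), 'last_author' (bool), 'research' (bool, False for reviews).
    """
    # Years of last-author research papers, i.e. papers published as a PI.
    pi_years = [p["year"] for p in papers if p["last_author"] and p["research"]]
    if not pi_years:
        return list(papers)  # no PI paper yet, so everything counts
    first_pi = min(pi_years)

    selected = []
    for p in papers:
        before = p["year"] < first_pi
        # Late postdoc papers: non-last-author research papers that came out
        # in the year of the first PI paper or in the two years after it.
        late_postdoc = (
            p["research"]
            and not p["last_author"]
            and first_pi <= p["year"] <= first_pi + 2
        )
        if before or late_postdoc:
            selected.append(p)
    return selected
```

Only the papers returned by this filter would then go into the cumulative impact factor.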

** Update: No doubt, the "threshold" will be very different for different fields, and even subfields. My goal was to benchmark myself against those who would have been my peers, had I started my career some 5 years earlier. It would be really great if somebody could make a personalized online benchmarking tool like that, for everybody to use, for my web-programming skills are just not good enough for developing it. If you can do it - please, do it!


  1. Wow those are some pretty significant correlations I would guess... (can you publish this somewhere? ;-)..)

  2. Another reason for trainees not to publish in new non-established journals.

  3. Namnezia: the initial statement in DrugMonkey's blog was extremely bold: "Academic trainees should not be publishing in journals that do not yet have Impact Factors". I only commented that it means "nobody will publish in new journals". By design.

    Negative and orphan data, if they are methodologically sound, should be published somewhere. Better to publish them in PeerJ than to leave them on the hard drive. All the rest - yes, should be published in journals as good as possible, but that is kind of obvious, isn't it? =)

  4. InBabyAttachMode:
    It's a lot of manual work. My problem is that I cannot write a program to automatically download publications from PubMed. It is possible, I just cannot do it.

    If only somebody (ResearchGate, Mendeley) implemented it - it would give enough data for a fine Plos1 paper indeed =)

  5. The most striking element seems to be the relative consistency of the trajectory (I know there are some jumps too). In other words, if you are on a "black" track, you stay there (which is perhaps intuitive as you will have better PDF and TT choices/opportunities). Not so good if you initially publish in lower impact journals (although not without hope!). "Productivity" in early career years is quite deterministic.

    What do the data look like if you measure citations per paper rather than JIF? This would need a 3 year lag to allow cite accumulation.

  6. Jim:
    Consistency is not that surprising: a person's productivity is probably relatively constant, but people get more experienced, and also gradually travel to the lab that suits them (in a way, to the lab they deserve). So you may expect the trajectory to curve up a bit. But because there's an obvious upper limit for the slope (about one Nature paper every 2 years), they will all form a rather dense bundle.

    On citations: it would be much more interesting to look at the citations of course, but 1) in this case I would have to deconvolve a typical "citation curve" from the data, and it is just too hard to be beautiful; 2) in this case a young scientist would not be able to benchmark themselves vs the curve (you want to know your odds now, not wait 2-3 years for the citations to accumulate). And my initial motivation was exactly about benchmarking.

    Also as far as I know hiring committees would not normally bother downloading the citations. So for job search citations are maybe not that relevant (alas).

  7. While certainly an interesting observation, I think you need to qualify your observations. You looked only in neuroscience, yes? Also, I have to admit I threw up my hands at the description "really terrible offers." What does this even mean? As a tenured professor at a liberal arts college with a cumulative IF in the mid 20's, I'm guessing I work at a "weird place?"

  8. Jeremia,
    I sampled "some people in my field": that is, not just those in neuroscience, but in neuroscience that is kind of similar to mine. Those people I could potentially compete with. No doubt, in a different field (and even in a different sub-field of neuroscience) the "threshold" would be very different!

    On the "really terrible places": let me give you one example. A TT position, with a salary of 50 k$/y, an expectation to teach 3 courses AND get grants, 10 k$ in startup funds. To me it sounds like a joke (or maybe rather a scam). It is just unrealistic, and thus a person would not get tenure in such a place, even though formally the position is announced as a TT one.

    A liberal arts college would have been brown on my chart =)

  9. Arseny, I agree that citations are lagging but picking a three year point can help. The citation slug looks a bit like this (for a lifetime cite count of 60):

    Yr 1 3 cites (v. sensitive to time of yr published)
    Yr 2 10 cites
    Yr 3 14 cites
    Yr 4 12 cites
    Yr 5 7 cites
    Yr 6 5 cites etc

    Shape (including the length of the tail) depends on various factors but max cite rate is a surrogate of AUC. For papers with >100 cites, the tail can be very long.

    As for hiring committees, include your cites in your CV! Even when relatively fresh, this is useful data.

  10. Thanks for the clarification. I would agree that getting grants under those conditions is potentially a scam, but it is also the state of academia today. It would also depend heavily on what "getting grants" means. If it is the standard research model where PIs must pay X% of salary off grants by year Y, then it is the worst of both worlds. But if all it means is that candidates are expected to find funds to do summer research with students, many would jump at such a job.

    For example, at my college, assistant professors earn $45k, are lucky to get $5k startup, and teach a 4/4 load. There are publishing expectations, but grants are not expected. Six years ago, I took a 35% pay cut to come here. My salary has yet to get back to what I was earning when I left. On average, I have not regretted my decision. Furthermore, we have since recruited two faculty members from other institutions whose working conditions were much worse than here.

    I bring this up not to fish for sympathy, but simply to point out that the years we are in our PhD programs and postdocs can lead to serious tunnel vision regarding what kind of jobs are out there, and what is attainable. We all know the numbers can't work out, yet many still try to find that magical "well-funded position at an R1 with minimal teaching load" (quoting Dr. Becca).

  11. Jeramia,
    thanks for your comment! I agree that even the relatively detailed numbers I cited could mean very different things depending on so many other variables. E.g. 50k in rural Tennessee and 50k in the greater Boston area would tell two very different stories. So, again, a typical liberal arts college TT position would have produced a strong solid "brown line" on my chart.

    Also, since for me teaching is at least as important as research, each time I read that supposedly most people prefer research to teaching, I internally rejoice =)

  12. I love your metrics. I'm 2.5 years into a TT position now and just calculated my CIF using your scheme, stopping at the point I started my TT job, and it was 39. I'm at what you would characterize as a "quite decent" place. I'm in engineering, in a nanobio subfield. It looks to have a similar fit, just on a shorter time scale, since in engineering if you don't get a TT job within 4 years after your PhD, you almost certainly never will.

    One thing though, at the time the person is hired into the TT job, the pending publications would still be under review or in preparation, which in my department essentially aren't counted during hiring.

  13. EngineeringProf,
    thanks for your kind words! On the "papers in preparation do not count": for one, I'm an optimist, so I decided to count them. =) But more importantly, when I just have a list of publications downloaded from the web, I have no way of saying what was accepted by the time of a job search, and what was not. And also the publication date is rounded to the year. So basically I looked for an algorithm that would be easy to follow, but which also would not be too sensitive to the noise of individual circumstances.

  14. Hi, Interesting post ... I am looking for a way to calculate my "Cumulative impact factor"; do you know of any website or program? Many thanks, Mitra