Identifying Emergent Opportunities in Science

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1142795. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

The major research activity is to develop and test a way of predicting ‘hot’ or ‘emergent’ science from a detailed dynamic network model of the scientific literature, one characterized by a combination of temporal stability and instability at the topic level.

The model of science underlying all of our analyses was an 11-year co-citation model of science comprising hundreds of thousands of article clusters linked year-to-year. This model was constructed using the techniques reported in Klavans & Boyack (2011), JASIST 62, 1-18.
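The year-to-year linking of article clusters can be illustrated with a simple overlap matching scheme. This is a minimal sketch, not the actual method of Klavans & Boyack (2011): it assumes each cluster is represented as a set of cited-reference identifiers, and links a cluster to its best match in the following year when their Jaccard overlap exceeds a (hypothetical) threshold.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of cited-reference ids."""
    return len(a & b) / len(a | b)

def link_clusters(year1, year2, threshold=0.3):
    """Link each cluster in year1 to its best-matching cluster in year2.

    year1, year2: dicts mapping cluster id -> set of cited-reference ids.
    Returns a dict mapping year1 cluster ids to (year2 id, overlap) pairs
    for matches at or above the threshold; unmatched clusters are omitted.
    The 0.3 threshold is illustrative, not taken from the project.
    """
    links = {}
    for c1, refs1 in year1.items():
        best_id, best_score = None, 0.0
        for c2, refs2 in year2.items():
            score = jaccard(refs1, refs2)
            if score > best_score:
                best_id, best_score = c2, score
        if best_id is not None and best_score >= threshold:
            links[c1] = (best_id, best_score)
    return links
```

Chains of such links across successive years form the cluster "threads" referred to in the hypotheses below.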

Findings to date have been surprising in many respects. Five hypotheses were stated in our proposal:

H1: Given an emergent cluster of articles, linked micro-communities from the previous year will have higher vitality (i.e., a younger average reference age) than their disciplinary peers.

H2: Emergent clusters that occur in years 1-3 of a thread will have higher textual coherence than their disciplinary peers.

H3: Emergent clusters will be more prone to dying, splitting or restructuring than non-emergent clusters.

H4: Emergent papers that occur in years 1-3 of a thread will strongly connect multiple clusters rather than appear as central to a single cluster.

H5: Emergent papers are correlated with citation sentiment terms associated with impact, emergence, and/or breakthrough.

Each hypothesis was tested using the model, using results from interviews with program officers who assigned ‘hot’, ‘average’, and ‘cold’ values to hundreds of clusters, and using citation sentiments extracted from the full text of 2007 articles associated with specific clusters. We found little support for any of the five hypotheses.
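The vitality measure in H1 can be made concrete with a short sketch. This is an illustrative implementation under our own assumptions, not the project's exact indicator: vitality is taken as the negated mean reference age (so younger references give higher vitality), and a cluster is compared against the mean vitality of its disciplinary peers.

```python
from statistics import mean

def vitality(pub_year, reference_years):
    """Vitality as negated mean reference age: higher means younger refs.

    reference_years: publication years of the references cited by a cluster.
    """
    return -mean(pub_year - y for y in reference_years)

def relative_vitality(cluster_refs, peer_refs_list, pub_year):
    """Difference between a cluster's vitality and its peers' mean vitality.

    A positive value means the cluster cites younger literature than its
    disciplinary peers (the pattern H1 predicted for emergent clusters).
    """
    v = vitality(pub_year, cluster_refs)
    peer_v = mean(vitality(pub_year, refs) for refs in peer_refs_list)
    return v - peer_v
```

A cluster citing mostly recent work would score positively against peers that cite older literature, which is the signal H1 looked for (and which the data did not support).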

Although the original hypotheses were rejected, the results were very useful: they suggested alternative paths toward the same goal of identifying and predicting emerging topics from the literature. Details are available in our final report to NSF.

Using results from the hypothesis-based analyses, we moved forward along two new paths. First, results suggested that age (specifically, the stability of the cognitive frame used to characterize a document cluster) is extremely important. Document clusters that are born and quickly die (i.e., those in which the cognitive frame does not persist over time) tended to be considered cold (rather than hot) by experts. Document clusters that reach middle age (3-6 years) were more likely to be judged as hot science. Old (highly stable) document clusters were very close to the norm.
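The age pattern described above can be summarized as a simple classification over thread length. This is a descriptive sketch of the reported finding, not a predictive rule from the project; the 3-6 year band comes directly from the text, and the function name is our own.

```python
def age_category(thread_length_years):
    """Summarize how experts tended to judge clusters by thread age.

    thread_length_years: number of consecutive years a document cluster's
    cognitive frame persisted in the year-to-year model.
    """
    if thread_length_years < 3:
        return "short-lived: tended to be judged cold"
    elif thread_length_years <= 6:
        return "middle-aged: more likely judged hot"
    else:
        return "old/stable: close to the norm"
```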

Second, one prevalent assumption made in most bibliometric studies of science is that all high impact papers are innovative (Nicholson & Ioannidis, 2012). As this project progressed, we came to question this assumption, and correspondingly designed an experiment to test it. In Section 5, we develop and test indicators designed to differentiate high impact papers that are turning points (i.e., those that change the flow of science) from those that are conforming (i.e., those that reinforce the status quo in science). These analyses take advantage of full text data. The results are quite intriguing and show that over half of the 4,216 very high impact papers we analyzed are reinforcing the status quo.

Over the past year, we have expanded our ability to identify emergence at the group (document cluster) and individual (high impact paper) levels of analysis. We have communicated these results at several conferences (feel free to contact us for copies of those conference papers and presentations) and have started collaborations with researchers at Georgia Tech, Drexel, the University of Pennsylvania, and the University of Massachusetts. Current research in this area is now being supported by a subcontract under IARPA’s FUSE program. The culmination of these efforts is a method to identify the most highly emergent topics from the literature, and publication of work characterizing a list of the top 25 most emergent areas of science in 2010 (Small, Boyack, & Klavans, 2013). Although the work on the identification of emerging topics was directly funded by the FUSE program, the work done on this NSF project paved the way and made that work possible. Detailed information on the top 25 list is available here.

We also note an important unintended consequence of this research. In the process of identifying highly emergent areas, we had to tackle the problem of identifying what is not emergent, and we subsequently developed an indicator of non-innovative research. There is a significant risk in pursuing this line of research: consider the potential consequences of claiming that a large group of highly influential scientists are not innovative. But there is a greater risk in not developing an indicator of non-innovativeness. During an economic downturn, non-innovative research may need to be defunded. An objective indicator of non-innovative research is sorely needed.