NSF EAGER Project

Accurate Linking of Grants and Topics in Science

Proposed work

  1. Develop an accurate process for linking grants to topics.
    1. Develop a process for linking grants to topics using text.
    2. Quantify the accuracy of linkages between grants and topics.
    3. Significantly improve the accuracy of these linkages.
  2. Use accurate grant-to-topic assignments to analyze grant-to-article linkage data.
    1. Identify likely false positives and false negatives in the current NIH grant-to-article linkages published in NIH RePORTER.
    2. Propose possible links between non-NIH grants and articles.
  3. Make these data publicly available on a project website.

Assumptions

  1. Topics are best represented as clusters of papers in our model of science.
  2. STAR METRICS data (and data from similar databases) are the best available sources of data that can be used to link grants to topics.
  3. Text (titles and abstracts) will be the basis for accurate assignment of grants to topics.
  4. Once grants are accurately assigned to topics using text, these assignments can be used as the basis for testing the accuracy of acknowledged grant-to-article linkages (from NIH RePORTER or other sources).
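Assumption 3 can be made concrete with a minimal sketch of text-based assignment. It assumes, for illustration only, that each topic (a cluster of papers) is represented by the pooled titles and abstracts of its papers, and that a grant is ranked against topics by TF-IDF cosine similarity. The function names and the `topic_texts` mapping are hypothetical, and this is not necessarily the weighting or similarity measure used in the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately simple stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one TF-IDF dict per document (docs is a list of token lists)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms present in every document keep a weight of 1.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign_grant_to_topics(grant_text, topic_texts, top_k=3):
    """Rank topics by similarity between the grant text and each topic's pooled text.

    topic_texts: hypothetical mapping topic_id -> concatenated titles/abstracts
    of the papers in that topic's cluster.
    """
    ids = list(topic_texts)
    docs = [tokenize(topic_texts[i]) for i in ids] + [tokenize(grant_text)]
    vecs = tfidf_vectors(docs)
    grant_vec = vecs[-1]
    scores = {i: cosine(grant_vec, v) for i, v in zip(ids, vecs[:-1])}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

For example, with topics `{"T1": "protein folding molecular dynamics simulation", "T2": "labor market wages unemployment", "T3": "seismic waves earthquake faults"}`, the grant text "simulation of protein folding kinetics" ranks T1 first, because it is the only topic sharing vocabulary with the grant. Real topic clusters share far more vocabulary, which is one reason text-based assignment can be noisy at the level of individual grants.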

Significant changes

  1. We were able to obtain references for ~150 R01 grant proposals for which we already had STAR METRICS project data. This was unanticipated when our proposal was written, but it was very beneficial to the project because it allowed us to compare grant-to-topic assignments based on three sources: proposal references, project text from STAR METRICS data, and articles that acknowledge grants (grant-to-article linkages).
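One way to compare grant-to-topic assignments from these three sources is to turn each source into a topic distribution for the grant (the share of its references, linked articles, or text-matched topics falling in each topic) and score each distribution against the proposal-reference baseline. The sketch below assumes that framing and uses 1 minus the total variation distance as the overlap score; the report does not state which comparison measure was actually used, and all names here are hypothetical.

```python
from collections import Counter


def topic_distribution(topic_labels):
    """Normalize a list of topic IDs (one per reference or linked article) into shares."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no topic labels for this source")
    return {t: c / total for t, c in counts.items()}


def overlap(p, q):
    """1 - total variation distance: 1.0 = identical distributions, 0.0 = disjoint."""
    topics = set(p) | set(q)
    return 1.0 - 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)


def compare_sources(reference_topics, other_sources):
    """Score each alternative source against the proposal-reference baseline.

    reference_topics: topic IDs of the proposal's references.
    other_sources: hypothetical mapping source_name -> list of topic IDs.
    """
    baseline = topic_distribution(reference_topics)
    return {name: overlap(baseline, topic_distribution(labels))
            for name, labels in other_sources.items()}
```

With proposal references in topics `["A", "A", "B", "C"]`, linked articles in `["A", "B"]` score 0.75 against the baseline, while a text-based assignment of `["D", "D"]` scores 0.0. Averaging such scores over many grants gives one simple way to rank the sources by how closely they track proposal references.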

Outcomes

  1. The accuracy of using text-based methods to link grants to topics has been quantified.
    1. Our text-based method for linking grants to topics is sufficiently accurate to conduct large-scale studies (i.e., assigning funding to topics to evaluate large-scale effects) based on aggregated data.
    2. However, our text-based method for linking grants to topics is not sufficiently accurate to serve as a basis for testing the accuracy of grant-to-article linkages.
    3. Rather, we found that grant-to-article linkages are a more accurate way of assigning grants to topics than our text-based method. This runs directly counter to Assumption 3 above.
  2. Attempts to improve the accuracy of text-based assignment of projects to topics by excluding ‘broader impacts’ text were unsuccessful. Rather than diluting the signal, broader impacts text actually improves the accuracy of text-based assignment.
  3. The unexpected availability of grant proposal reference data proved decisive for the project.
    1. We had previously found that papers can be assigned to topics more accurately using their references than using their titles and abstracts. In this project we showed that the distribution of proposal references across topics is nearly identical to the distribution of paper references across topics. We therefore assumed that the most accurate way to link grants to topics is to use the references from grant proposals.
    2. By comparing to proposal reference data, we showed that grant-to-article linkages are nearly as accurate as references for assigning grants to topics.
    3. We showed that textual data are the least accurate method (although still often good) for assigning grants to topics.
    4. These results are being written up for publication.
    5. We note that grant reference data are rarely available. In our opinion, obtaining such data is the best path forward.
  4. The availability of grant reference data had an important spillover effect. We learned that proposal references, in addition to being an accurate way of assigning grants to topics, provide an early indicator of the intention to do either traditional or innovative work.
    1. Building on a recent paper by Foster, Rzhetsky & Evans (2015), we used the idea that a proposal is a statement of intentions. A proposal is (intended to be) innovative if it builds upon two topics that, at the time, are not yet linked in the literature. This intention is detected by assigning the proposal references to topics. Articles, by contrast, make claims. A claimed innovation is detected in the same manner: the references in the article are assigned to topics, and the claim is innovative if the two major topics of the article are not yet linked. Innovative outcomes occur only if a paper is highly cited by two topics, and those two topics were not yet linked when the paper was published. In these cases, the paper has formed a new link in the knowledge network.
    2. These ideas and results were presented at the ICSS meeting in April 2016 and are being written up for publication.
  5. The unexpected availability of proposal data had a second spillover effect. In reading these proposals, we realized that each proposal contains a well-articulated section on equipment. Equipment may be a very good way to anticipate which problems a researcher works on, and which topics in science will experience large growth (because the infrastructure to work on those topics is available). A follow-up SciSIP proposal building on these observations and their implications has been submitted.
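The intention/claim/outcome distinction in Outcome 4 can be sketched in a few lines. This assumes, hypothetically, that each reference or citing paper already carries a topic ID, that the "two major topics" are simply the two most frequent ones, and that the state of the literature at a given date is available as a set of already-linked topic pairs. The citation threshold `min_citations_per_topic` is an illustrative parameter, not a value from the project.

```python
from collections import Counter


def major_topic_pair(topic_labels):
    """Return the two most frequent topics as a frozenset, or None if fewer than two.

    Ties are broken by first occurrence (Counter.most_common behavior).
    """
    top = [t for t, _ in Counter(topic_labels).most_common(2)]
    return frozenset(top) if len(top) == 2 else None


def is_novel_combination(reference_topics, linked_pairs):
    """Intention (proposal) or claim (article): the two major reference topics are unlinked.

    linked_pairs: hypothetical set of frozenset({topic_a, topic_b}) pairs already
    linked in the literature at the proposal's or article's date.
    """
    pair = major_topic_pair(reference_topics)
    return pair is not None and pair not in linked_pairs


def innovative_outcome(citing_topics, linked_pairs_at_publication, min_citations_per_topic=10):
    """Outcome: the paper is heavily cited by two topics that were unlinked at publication."""
    top = Counter(citing_topics).most_common(2)
    if len(top) < 2 or top[1][1] < min_citations_per_topic:
        return False
    pair = frozenset(t for t, _ in top)
    return pair not in linked_pairs_at_publication
```

The same `is_novel_combination` test serves both proposals (intentions) and articles (claims), differing only in whose references are used and at which date `linked_pairs` is taken; only `innovative_outcome` looks forward at who later cites the paper.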

Outputs

  1. A project website has been created, and the non-proprietary data used in the project have been posted. Grant-to-article data were obtained, cleaned, and linked to Scopus article IDs where possible; the resulting files are made available to other researchers here.
    1. Grant-to-article linkages were mined from the NSF website for SciSIP (PEC=7626), Economics (PEC=1320), and some Geophysics (PEC=1574) projects. Scopus article IDs and PubMed IDs were added to these data where possible. (links)
    2. Grant-to-publication linkages were downloaded from the UK Gateway to Research site (http://gtr.rcuk.ac.uk/search/project?term=*). Many of the publication records already contained either DOIs or PubMed IDs. Scopus article IDs were added to these data where possible. (links)
  2. A paper detailing results of the project is being prepared, and will be submitted for publication in summer 2016.
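The Scopus-ID step described in Outputs 1.1 and 1.2 amounts to matching each grant-to-publication record on whatever identifier it carries. The sketch below assumes, for illustration, that records are dicts with optional `doi` and `pmid` fields and that DOI-to-Scopus and PMID-to-Scopus lookup tables have already been built from a local Scopus extract. No Scopus API is implied, and the record schema and function names are hypothetical.

```python
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi):
    """Lower-case a DOI and strip common URL/prefix forms so lookups match."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def attach_scopus_ids(records, doi_to_scopus, pmid_to_scopus):
    """Return copies of the records with a 'scopus_id' field (None when unmatched).

    DOI is tried first; PubMed ID is used as a fallback.
    """
    linked = []
    for rec in records:
        out = dict(rec)
        scopus_id = doi_to_scopus.get(normalize_doi(rec.get("doi")))
        if scopus_id is None and rec.get("pmid"):
            scopus_id = pmid_to_scopus.get(str(rec["pmid"]).strip())
        out["scopus_id"] = scopus_id
        linked.append(out)
    return linked
```

Trying DOIs before PubMed IDs is a design choice, not something stated in the report. Records that match on neither keep `scopus_id = None`, so the match rate can be reported alongside the released files.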