About the Data

Let's collaborate!

Below you can find more information about where we gather our data and how we process it. But let's state this first: we would love to collaborate with universities and other data providers to improve the quality of ScienceFinder. Please reach out to the team at data@sciencefinder.nl.

Ranking and Scoring

Results in ScienceFinder are ranked. ScienceFinder does not offer advertising or any option to boost or adjust these rankings. The only way for institutions to affect their ranking is to deliver more complete data through the existing infrastructures. All items are ranked with the same relevance algorithm, which is based on a full-text match against the search keywords. The relevance of a document decays with its age according to a Gaussian decay function, with a relevance drop of 5% for every year of age after the first year.
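A minimal sketch of how an age-based Gaussian decay could be applied to a full-text score, modeled on the `gauss` decay function offered by search engines such as Elasticsearch. The function names and parameter values (an offset of one year with no decay, a scale of one year, and a decay of 0.95 reached one scale past the offset) are hypothetical, chosen to match the 5% figure above, and are not ScienceFinder's actual configuration.

```python
import math

# Hypothetical parameters, not ScienceFinder's real settings:
OFFSET_YEARS = 1.0  # documents younger than this are not decayed
SCALE_YEARS = 1.0   # distance past the offset at which the multiplier equals DECAY
DECAY = 0.95        # multiplier reached at offset + scale (a 5% drop)


def gauss_decay(age_years: float, offset: float = OFFSET_YEARS,
                scale: float = SCALE_YEARS, decay: float = DECAY) -> float:
    """Gaussian decay multiplier in (0, 1] for a document of the given age."""
    distance = max(0.0, age_years - offset)
    # Choose sigma so that the multiplier is exactly `decay` at distance == scale.
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-distance ** 2 / (2.0 * sigma_sq))


def decayed_score(text_score: float, age_years: float) -> float:
    """Multiply a full-text relevance score by the age-based decay."""
    return text_score * gauss_decay(age_years)


for age in (0, 1, 2, 3):
    print(age, round(gauss_decay(age), 4))
```

With these assumed parameters the multiplier is 1.0 for documents up to one year old and 0.95 at two years. Because the curve is Gaussian, later years lose more than 5% each (about 0.8145 at three years); a constant 5% drop per year would instead correspond to an exponential decay.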

Result Aggregation

A ScienceFinder query combines three document types (Publications, Projects and Startups) and links them to individuals, who are in turn affiliated with organisations. The 75 organisations with the highest summed relevance score of their documents are returned.
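A minimal sketch of this aggregation step, assuming each document already carries its (decayed) relevance score and a list of linked individuals, and that a separate hypothetical mapping gives each individual's organisations. The data structures and the rule that a document counts at most once per organisation are assumptions for illustration, not ScienceFinder's actual implementation.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field

TOP_N = 75  # number of organisations returned, as described above


@dataclass
class Document:
    doc_id: str
    doc_type: str  # "Publication", "Project" or "Startup"
    score: float   # relevance score for the current query
    person_ids: list = field(default_factory=list)  # linked individuals


def top_organisations(documents, affiliations, top_n=TOP_N):
    """Sum document scores per organisation and return the top_n (org, score) pairs.

    `affiliations` is a hypothetical mapping from person id to organisation ids.
    Assumption: a document is counted at most once per organisation, even if
    several of its individuals share that affiliation.
    """
    totals = defaultdict(float)
    for doc in documents:
        orgs = {org for person in doc.person_ids
                for org in affiliations.get(person, ())}
        for org in orgs:
            totals[org] += doc.score
    # Highest summed score first.
    return heapq.nlargest(top_n, totals.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting every organisation when only the top 75 are needed; a plain `sorted(...)[:top_n]` gives the same result.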

Query Expansion

To make the often technical content of scientific outputs easier to find, ScienceFinder uses a combination of the PLOS taxonomy and Wikipedia categories for query expansion. Words that are both a PLOS taxonomy term and a Wikipedia category will also be found with search terms that are subcategories of that Wikipedia category.
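A minimal sketch of this kind of expansion, assuming a tiny hypothetical subcategory-to-parent map and a toy set of PLOS terms. A search term is expanded with its Wikipedia parent categories, but only those that are also PLOS taxonomy terms are kept. The data, the function name and the `max_depth` parameter (how many category levels to walk up) are assumptions for illustration.

```python
from collections import deque

# Hypothetical toy data; the real PLOS taxonomy and Wikipedia category graph are far larger.
PLOS_TERMS = {"machine learning", "artificial intelligence", "oncology"}
WIKI_PARENTS = {  # Wikipedia subcategory -> parent categories
    "deep learning": {"machine learning"},
    "machine learning": {"artificial intelligence"},
    "breast cancer": {"cancer"},
}


def expand_query(term, plos_terms=PLOS_TERMS, wiki_parents=WIKI_PARENTS, max_depth=1):
    """Return the search term plus parent categories that are also PLOS terms."""
    term = term.lower()
    expanded = {term}
    seen = {term}
    frontier = deque([(term, 0)])  # breadth-first walk up the category graph
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for parent in wiki_parents.get(current, ()):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in plos_terms:  # keep only categories that are PLOS terms
                expanded.add(parent)
            frontier.append((parent, depth + 1))
    return expanded
```

With this toy data, a search for "deep learning" also matches documents tagged "machine learning", while "breast cancer" is not expanded because its parent "cancer" is not in the toy PLOS set.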

Our Data Sources

ScienceFinder connects to various open data sources, such as Narcis and Cordis, and uses their APIs to update its data from these sources periodically. Additionally, ScienceFinder consults individual university libraries and staff pages to determine affiliation links. Currently, publication data from January 1, 2012 through 2019 is periodically harvested for all Dutch universities, as available in Narcis.

ScienceFinder is ‘downstream’ from these sources, and improvements in their data quality are the best way to improve the search results in ScienceFinder.

Imperfections

Unfortunately, some sources don’t offer persistent identifiers, such as DOIs for their content or ORCIDs for their authors, which makes deduplication and reconciliation a challenge. Please be aware that results from ScienceFinder may still contain imperfections; we are continuously working to improve the data.

Other Data Sources

In addition to the Narcis and Cordis data, we enrich our database with data that we receive through close collaboration with Dutch universities. This is usually data about academic spin-off companies, industry-academia collaboration and other relevant projects.

If you feel that specific project data is missing, please get in touch and help us improve the quality of the data! Reach us at data@sciencefinder.nl.