About the Data
Let's collaborate!
Below you can find more information about where we gather data and how we process it. But let us state this
first: we would love to collaborate with universities and other data providers to improve the quality of
ScienceFinder. Please reach out to the team at data@sciencefinder.nl.
Ranking and Scoring
Results in ScienceFinder are ranked. ScienceFinder does not offer advertising or the
option to boost or adjust these rankings. The only way for institutions to affect their ranking is to deliver
more complete data through the existing infrastructures.
All items are ranked by the same relevance algorithm, which uses a full-text match against the query keywords.
The relevance of a document decays over time according to a Gauss decay function, with a relevance drop of 5%
for every year of age after the first year.
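As an illustration, the decay can be sketched in Python. This is a minimal sketch and not the ScienceFinder implementation: the function name gauss_decay and the parameter values (an offset of one year, a scale of one year and a decay of 0.95) are assumptions chosen as one possible reading of the description above, using the common Gaussian decay formulation found in search engines.

```python
import math


def gauss_decay(age_years, offset=1.0, scale=1.0, decay=0.95):
    """Return a multiplier in (0, 1] for a document of the given age.

    Assumed parameters (not confirmed by ScienceFinder):
    offset -- age in years during which no decay is applied
    scale  -- distance past the offset at which the multiplier equals `decay`
    decay  -- multiplier reached at `scale` years past the offset
    """
    # Documents younger than the offset are not penalised.
    distance = max(0.0, age_years - offset)
    # Choose the variance so that the curve passes through `decay` at `scale`.
    sigma_squared = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_squared))


# A document's final relevance would be its text-match score times this multiplier.
for age in (0.5, 1.0, 2.0, 5.0):
    print(age, round(gauss_decay(age), 4))
```

Note that with a Gaussian shape the drop is not a constant 5% per year: under these assumed parameters the multiplier is 0.95 one year past the offset and falls faster for older documents.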
Result Aggregation
A ScienceFinder query links three document types (Publications, Projects and Startups) to individuals, who
are in turn affiliated with organisations. The 75 organisations with the highest summed relevance scores of
their documents are returned.
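The aggregation step can be sketched as follows. This is a hedged illustration only: the function name top_organisations, the dictionary layout and the sample data are hypothetical, and the choice to count each document once per organisation (even when several of its people share an affiliation) is an assumption that the description above does not confirm.

```python
from collections import defaultdict


def top_organisations(documents, affiliations, limit=75):
    """Sum document relevance per organisation and return the best `limit`.

    documents    -- list of dicts with a "score" and a list of "people"
    affiliations -- hypothetical mapping from person to organisation
    """
    totals = defaultdict(float)
    for doc in documents:
        # Assumption: a document counts once per organisation, even if
        # several of its people are affiliated with that organisation.
        organisations = {
            affiliations[person]
            for person in doc["people"]
            if person in affiliations
        }
        for organisation in organisations:
            totals[organisation] += doc["score"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up sample data covering the three document types.
documents = [
    {"type": "publication", "score": 2.0, "people": ["alice", "bob"]},
    {"type": "project", "score": 1.5, "people": ["carol"]},
    {"type": "startup", "score": 1.0, "people": ["bob"]},
]
affiliations = {"alice": "Uni A", "bob": "Uni A", "carol": "Uni B"}

print(top_organisations(documents, affiliations))
```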
Query Expansion
To facilitate access to the often technical content of scientific outputs, ScienceFinder uses a combination of
the PLOS taxonomy and Wikipedia categories for query
expansion.
Words that are both a PLOS taxonomy term and a Wikipedia category will also be found by search terms that are
subcategories of the Wikipedia parent category.
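One possible reading of this rule is sketched below: a search term that is a subcategory of a Wikipedia category is expanded with that parent category, provided the parent is also a PLOS taxonomy term. The function name expand_query and the toy data are hypothetical and are not taken from the real PLOS taxonomy or Wikipedia category graph; the actual expansion logic in ScienceFinder may differ.

```python
def expand_query(term, plos_terms, wiki_parents):
    """Return the search term plus any parent categories to search for.

    plos_terms   -- set of terms present in the (assumed) PLOS taxonomy
    wiki_parents -- hypothetical mapping from a Wikipedia category to
                    its parent categories
    """
    expanded = [term]
    for parent in wiki_parents.get(term, []):
        # Only parents that are also PLOS taxonomy terms are added.
        if parent in plos_terms and parent not in expanded:
            expanded.append(parent)
    return expanded


# Toy data for illustration only.
plos_terms = {"machine learning"}
wiki_parents = {
    "deep learning": ["machine learning", "artificial neural networks"],
}

print(expand_query("deep learning", plos_terms, wiki_parents))
print(expand_query("graphene", plos_terms, wiki_parents))
```

Under this reading, a search for the subcategory also retrieves documents that match the broader parent term, which helps non-specialist queries reach technical content.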
Our Datasources
ScienceFinder connects to various open data sources such as Narcis and Cordis.
We use APIs to connect to these open data sources and periodically update the data from them.
Additionally, ScienceFinder refers to individual university libraries and staff pages to determine affiliation
links. Currently, publication data from January 1st, 2012 through 2019 is periodically harvested for all
Dutch universities as available in Narcis.
ScienceFinder is ‘downstream’ from these sources, and improvements in their data quality are the
best way to improve the search results in ScienceFinder.
Imperfections
Unfortunately, some sources don’t offer persistent identifiers such as DOIs for their content or ORCIDs for
their authors, which makes deduplication and reconciliation a challenge. Please be aware that results gathered
from ScienceFinder still contain imperfections, as we work on continuously improving the data.
Other data sources
In addition to the Narcis and Cordis data, we enrich our database with data that we receive through close
collaboration with Dutch universities. This is usually data about academic spin-off companies,
industry-academia collaborations and other relevant projects.
If you feel that specific project data is missing, please get in touch and help us improve the quality of
the data! Reach us through data@sciencefinder.nl.