REXPLORE-4.js

Excerpt

The data sources used by Rexplore offer in most cases only the name of the author’s affiliation (e.g., Universities, Research Labs, Hospitals), which is usually derived from parsing research papers and thus it is simply treated as a string. The procedure uses initially Wikipedia to retrieve a ‘standard’ identifier for the affiliation and then searches for the location associated with the affiliation in DBpedia. If the latter search is unsuccessful, then the Wikipedia page is parsed for the tag “location” from which city and country are extracted using a set of heuristic rules. After recovering information about the city or the country, the affiliation is mapped to the correct GeoNames ID. If the search for affiliation and/or location in Wikipedia/DBpedia fails, then the affiliation name is stripped of a set of typical terms, such as “university”, “college” or “hospital”, and the remaining string is searched for in the GeoNames database. This simple method provided good results, allowing us to correctly map disambiguated affiliations to GeoNames in about 85% of the cases.

Synopsis

  1. Start with the affiliation as a plain string
  2. Retrieve a standard identifier for it from Wikipedia
  3. Search for the location of the affiliation in DBpedia, or
  4. Search for the location of the affiliation in its Wikipedia page
  5. Map the location to a GeoNames ID
  6. Otherwise, simplify the affiliation name and search for it in the GeoNames database to obtain the ID
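
The fallback order in the steps above can be sketched in Python. This is only an illustration under stated assumptions, not Rexplore's implementation: the four lookup callables are hypothetical stand-ins for the Wikipedia, DBpedia and GeoNames searches, the term list contains only the three examples named in the excerpt, and the IDs in the usage example are placeholders.

```python
import re
from typing import Callable, Optional

# Only the three terms named in the excerpt; the full list is not given there.
TYPICAL_TERMS = ("university", "college", "hospital")

# A lookup takes a string key and returns a result, or None on failure.
Lookup = Callable[[str], Optional[str]]


def simplify_affiliation(name: str) -> str:
    """Strip the typical institutional terms and collapse leftover spaces."""
    pattern = r"\b(?:" + "|".join(TYPICAL_TERMS) + r")\b"
    stripped = re.sub(pattern, " ", name, flags=re.IGNORECASE)
    return " ".join(stripped.split())


def map_affiliation(
    affiliation: str,
    wikipedia_id: Lookup,        # hypothetical: affiliation string -> identifier
    dbpedia_location: Lookup,    # hypothetical: identifier -> location name
    wikipedia_location: Lookup,  # hypothetical: identifier -> location name
    geonames_id: Lookup,         # hypothetical: name -> GeoNames ID
) -> Optional[str]:
    """Try DBpedia, then the Wikipedia page, then the simplified name."""
    identifier = wikipedia_id(affiliation)
    location = None
    if identifier is not None:
        location = dbpedia_location(identifier) or wikipedia_location(identifier)
    if location is None:
        # Last resort: search GeoNames with the stripped affiliation name.
        location = simplify_affiliation(affiliation)
    return geonames_id(location)


# Usage with in-memory stubs; all values below are made up for illustration.
wiki_ids = {"Open University": "The_Open_University"}.get
dbpedia = {}.get  # simulate a failed DBpedia search
wiki_pages = {"The_Open_University": "Milton Keynes"}.get
geonames = {"Milton Keynes": "GN-1", "Stanford": "GN-2"}.get

print(map_affiliation("Open University", wiki_ids, dbpedia, wiki_pages, geonames))
print(map_affiliation("Stanford University", wiki_ids, dbpedia, wiki_pages, geonames))
```

Passing the lookups in as parameters keeps the fallback logic separate from any particular web service, so each source can be swapped or stubbed independently.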

Diagram

Turtle

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://purl.org/datanode/ex/0.2/REXPLORE/4#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dn: <http://purl.org/datanode/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://purl.org/datanode/ex/0.2/REXPLORE/4#> a foaf:Document ;
    rdfs:comment "0) Have the affiliation as string 1) Retrieve standard identifier using wikipedia 2) Search the location of the affiliation in dbpedia, or 3) Search the location of the affiliation in the wikipedia page 4) Map the location to a GeoNames ID 5) Or simplify the affiliation name and search it in the GeoNames database the ID" ;
    rdfs:label "The data sources used by Rexplore offer in most cases only the name of the author’s affiliation (e.g., Universities, Research Labs, Hospitals), which is usually derived from parsing research papers and thus it is simply treated as a string. The procedure uses initially Wikipedia to retrieve a ‘standard’ identifier for the affiliation and then searches for the location associated with the affiliation in DBpedia. If the latter search is unsuccessful, then the Wikipedia page is parsed for the tag “location” from which city and country are extracted using a set of heuristic rules. After recovering information about the city or the country, the affiliation is mapped to the correct GeoNames ID. If the search for affiliation and/or location in Wikipedia/DBpedia fails, then the affiliation name is stripped of a set of typical terms, such as “university”, “college” or “hospital”, and the remaining string is searched for in the GeoNames database. This simple method provided good results, allowing us to correctly map disambiguated affiliations to GeoNames in about 85% of the cases." .

:affiliationsIdentifiers dn:extractedFrom :Wikipedia ;
    dn:processedFrom :affiliationsStrings .

:affiliationsStrings dn:isSectionOf :corpusOfPublications .

:locations dn:isSelectionOf :Geonames ;
    dn:processedFrom :locationsNames .

:locationsFromDBPedia dn:hasIdentifiers :affiliationsIdentifiers ;
    dn:isSelectionOf :DBPedia .

:locationsFromWikipedia dn:extractedFrom :Wikipedia ;
    dn:hasIdentifiers :affiliationsIdentifiers .

:locationsNames dn:hasPart :locationsFromDBPedia,
        :locationsFromWikipedia,
        :simplifiedLocationsNames .

:simplifiedLocationsNames dn:processedFrom [ dn:isSelectionOf :affiliationsStrings ] .
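
To see which sources the final `:locations` node ultimately depends on, the statements above can be walked as a graph. The sketch below transcribes the triples by hand into Python tuples (prefixes dropped, the anonymous node written as `_:b0`) and follows every property from subject to object. Treating all six properties as "depends on" links is a simplifying assumption made for this illustration, not something the Datanode vocabulary states.

```python
from collections import deque

# Triples transcribed by hand from the Turtle description (prefixes dropped).
# "_:b0" stands for the anonymous node in the last statement.
TRIPLES = [
    ("affiliationsIdentifiers", "extractedFrom", "Wikipedia"),
    ("affiliationsIdentifiers", "processedFrom", "affiliationsStrings"),
    ("affiliationsStrings", "isSectionOf", "corpusOfPublications"),
    ("locations", "isSelectionOf", "Geonames"),
    ("locations", "processedFrom", "locationsNames"),
    ("locationsFromDBPedia", "hasIdentifiers", "affiliationsIdentifiers"),
    ("locationsFromDBPedia", "isSelectionOf", "DBPedia"),
    ("locationsFromWikipedia", "extractedFrom", "Wikipedia"),
    ("locationsFromWikipedia", "hasIdentifiers", "affiliationsIdentifiers"),
    ("locationsNames", "hasPart", "locationsFromDBPedia"),
    ("locationsNames", "hasPart", "locationsFromWikipedia"),
    ("locationsNames", "hasPart", "simplifiedLocationsNames"),
    ("simplifiedLocationsNames", "processedFrom", "_:b0"),
    ("_:b0", "isSelectionOf", "affiliationsStrings"),
]


def reachable(start):
    """Return every node reachable from start, following subject -> object."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, _, obj in TRIPLES:
            if subject == node and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


def root_sources(start):
    """Reachable nodes that never appear as a subject, i.e. external inputs."""
    subjects = {s for s, _, _ in TRIPLES}
    return sorted(n for n in reachable(start) if n not in subjects)


print(root_sources("locations"))
# ['DBPedia', 'Geonames', 'Wikipedia', 'corpusOfPublications']
```

The four names printed are the nodes with no outgoing statements: the publication corpus that supplies the affiliation strings and the three external datasets consulted along the way.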