Towards a theoretical foundation for the harmonization of linked data

Enrico Daga Knowledge Media Institute (KMi) - The Open University; Semantic Technology Laboratory (STLab), Institute of Cognitive Sciences and Technologies (ISTC) - Italian National Research Council (CNR)

Doctoral Symposium, International Semantic Web Conference, 2012 (Submitted)

Online resources: Operators

Concepts

The following concepts are part of the definition of operators:

See also Types.

Operators

Operator Parameters Description
COPY
source - Dataset
target - Dataset
slice - Slice
Select from source the subgraph matching slice and copy it into target.
APPEND
source - TempDataset
target - Dataset
Append the content of source to target, deleting source.
FILTER
dataset - Dataset
slice - Slice
Delete from dataset all triples not matching slice.
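COPY, APPEND and FILTER can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: it assumes a dataset is an in-memory set of (subject, predicate, object) string tuples and, since Slice is not defined here, treats a Slice as a hypothetical triple pattern in which `None` is a wildcard. The names `matches`, `copy`, `append` and `filter_` are illustrative.

```python
from typing import Optional

# Assumption: IRIs and literals are plain strings; a dataset is a set of triples.
Triple = tuple[str, str, str]
# Hypothetical Slice: a triple pattern in which None acts as a wildcard.
Slice = tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, slice_: Slice) -> bool:
    """True if every non-wildcard position of slice_ equals the triple."""
    return all(pat is None or pat == term for term, pat in zip(triple, slice_))


def copy(source: set[Triple], target: set[Triple], slice_: Slice) -> None:
    """COPY: add to target the subgraph of source that matches slice_."""
    target |= {t for t in source if matches(t, slice_)}


def append(source: set[Triple], target: set[Triple]) -> None:
    """APPEND: move every triple of source into target, leaving source empty."""
    target |= source
    source.clear()


def filter_(dataset: set[Triple], slice_: Slice) -> None:
    """FILTER: delete from dataset every triple that does not match slice_."""
    dataset -= {t for t in dataset if not matches(t, slice_)}
```

All three functions mutate their dataset arguments in place, mirroring the description of the operators as transformations of named datasets.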
SHORTCUT
dataset - Dataset
from - Path
predicate - IRI
For each distinct pair of subject and value selected by the path from, generate in dataset a triple linking them through predicate.
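A minimal sketch of SHORTCUT, assuming (hypothetically) that a Path is a list of predicate IRIs followed in order from the subject, and that a dataset is a set of string triples. The function name `shortcut` is illustrative.

```python
Triple = tuple[str, str, str]


def shortcut(dataset: set[Triple], path: list[str], predicate: str) -> None:
    """Add (s, predicate, v) for every distinct s, v connected by the chain `path`."""
    # Start from every (subject, object) pair of the first predicate in the chain.
    pairs = {(s, o) for s, p, o in dataset if p == path[0]}
    for step in path[1:]:
        # Index the next hop: node -> objects reachable through `step`.
        nxt: dict[str, set[str]] = {}
        for s, p, o in dataset:
            if p == step:
                nxt.setdefault(s, set()).add(o)
        # Extend each pair by one hop; pairs with no continuation are dropped.
        pairs = {(s, v) for s, mid in pairs for v in nxt.get(mid, ())}
    dataset |= {(s, predicate, v) for s, v in pairs}
```

For example, `shortcut(ds, ["buyer", "country"], "buyerCountry")` links each tender directly to the country of its buyer, skipping the intermediate organization.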
TYPE
dataset - Dataset
predicates - Set ( IRI )
class - IRI
Assign class as type to any subject that has all of the predicates.
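TYPE can be sketched as a subset test on the predicates used by each subject. This assumes a dataset is a set of string triples; `type_` is a hypothetical name chosen to avoid shadowing Python's built-in `type`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = tuple[str, str, str]


def type_(dataset: set[Triple], predicates: set[str], class_: str) -> None:
    """TYPE: assign class_ as rdf:type to every subject having all `predicates`."""
    # Collect, per subject, the set of predicates it uses.
    used: dict[str, set[str]] = {}
    for s, p, _ in dataset:
        used.setdefault(s, set()).add(p)
    dataset |= {(s, RDF_TYPE, class_) for s, ps in used.items() if predicates <= ps}
```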
DEFAULT
dataset - Dataset
type - IRI
predicate - IRI
default - LITERAL | IRI
Assign default as the value of predicate to any individual of type type that does not have a value for predicate in dataset.
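A minimal sketch of DEFAULT, assuming a dataset is a set of string triples and that instances of type are found through explicit rdf:type triples (no inference). The function name `default` is illustrative.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = tuple[str, str, str]


def default(dataset: set[Triple], type_: str, predicate: str, value: str) -> None:
    """DEFAULT: give `value` to `predicate` for instances of type_ that lack it."""
    instances = {s for s, p, o in dataset if p == RDF_TYPE and o == type_}
    already_set = {s for s, p, _ in dataset if p == predicate}
    dataset |= {(s, predicate, value) for s in instances - already_set}
```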
COUNT
dataset - Dataset
predicate - IRI
size - IRI
Count, for each distinct subject, the occurrences of predicate and store the result as the value of size.

Example:

having:

<italy> <hasTender> <tender-1> .
<italy> <hasTender> <tender-2> .
<italy> <hasTender> <tender-3> .
<gb> <hasTender> <tender-11> .
<gb> <hasTender> <tender-21> .
<gb> <hasTender> <tender-31> .

applying:

COUNT (?triples, <hasTender>, <numberOfTenders>)

results in:

<italy> <hasTender> <tender-1> .
<italy> <hasTender> <tender-2> .
<italy> <hasTender> <tender-3> .
<italy> <numberOfTenders> "3"^^xsd:integer .
<gb> <hasTender> <tender-11> .
<gb> <hasTender> <tender-21> .
<gb> <hasTender> <tender-31> .
<gb> <numberOfTenders> "3"^^xsd:integer .
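The example above can be reproduced with a short Python sketch. It assumes a dataset is a set of triples with IRIs as plain strings (angle brackets dropped) and uses Python `int` values to stand in for xsd:integer literals; the function name `count` is illustrative.

```python
# Assumption: IRIs are plain strings and Python ints stand in for xsd:integer.
Triple = tuple[str, str, object]


def count(dataset: set[Triple], predicate: str, size: str) -> None:
    """COUNT: for each distinct subject, add (subject, size, number of predicate triples)."""
    totals: dict[str, int] = {}
    for s, p, _ in dataset:
        if p == predicate:
            totals[s] = totals.get(s, 0) + 1
    dataset |= {(s, size, n) for s, n in totals.items()}


# The example above, with the angle brackets dropped from the IRIs.
ds: set[Triple] = {("italy", "hasTender", f"tender-{i}") for i in (1, 2, 3)}
ds |= {("gb", "hasTender", f"tender-{i}") for i in (11, 21, 31)}
count(ds, "hasTender", "numberOfTenders")
# ds now also contains ("italy", "numberOfTenders", 3) and ("gb", "numberOfTenders", 3)
```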
REVERT
dataset - Dataset
from - IRI
to - IRI
For each triple with the from predicate in dataset, generate a triple with the to predicate, swapping subject and object. The to predicate must not already exist in dataset.
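REVERT can be sketched as follows, assuming a dataset is a set of string triples and that all objects of from are IRIs (a literal cannot become a subject in RDF). The precondition on to is enforced with an exception; `revert` and `from_` are illustrative names.

```python
Triple = tuple[str, str, str]


def revert(dataset: set[Triple], from_: str, to: str) -> None:
    """REVERT: add (o, to, s) for every (s, from_, o); `to` must be a new predicate."""
    if any(p == to for _, p, _ in dataset):
        raise ValueError(f"predicate {to!r} already exists in the dataset")
    dataset |= {(o, to, s) for s, p, o in dataset if p == from_}
```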
LINK
dataset - Dataset
Discover and materialize all owl:sameAs links in dataset.
KEYS
dataset - Dataset
keys - SET ( IRI ) 
Generate in dataset owl:sameAs links between entities that have the same values for the predicates in keys.
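A minimal sketch of KEYS, assuming a dataset is a set of string triples and that two entities match only when they have exactly the same set of values for every key predicate (subjects missing a key are skipped). One owl:sameAs link per pair is emitted, relying on the symmetry of owl:sameAs; `keys` is an illustrative name.

```python
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
Triple = tuple[str, str, str]


def keys(dataset: set[Triple], key_predicates: list[str]) -> None:
    """KEYS: link with owl:sameAs subjects sharing all values of the key predicates."""
    values: dict[str, dict[str, set[str]]] = {}
    for s, p, o in dataset:
        if p in key_predicates:
            values.setdefault(s, {}).setdefault(p, set()).add(o)
    groups: dict[tuple, list[str]] = {}
    for s, by_pred in values.items():
        # Subjects lacking any key predicate cannot be compared and are skipped.
        if all(k in by_pred for k in key_predicates):
            signature = tuple(frozenset(by_pred[k]) for k in key_predicates)
            groups.setdefault(signature, []).append(s)
    for members in groups.values():
        dataset |= {(a, OWL_SAME_AS, b) for a, b in combinations(sorted(members), 2)}
```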
RENEW
dataset - Dataset
prefix - IRI
Replace the occurrences of any set of IRIs linked by owl:sameAs with a newly generated IRI, using prefix as its initial part. If an IRI starting with prefix already exists in the set, use that IRI instead.
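RENEW can be sketched with a union-find structure, a swapped-in technique for computing the sets of IRIs connected by owl:sameAs. This is a sketch under several assumptions: the dataset is a set of string triples; the minting rule `prefix + counter` is hypothetical and does not check for collisions; and owl:sameAs triples that become reflexive after renaming are dropped.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
Triple = tuple[str, str, str]


def renew(dataset: set[Triple], prefix: str) -> None:
    """RENEW: give each owl:sameAs cluster a single IRI starting with prefix."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two ends of every owl:sameAs link.
    for s, p, o in dataset:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    clusters: dict[str, list[str]] = {}
    for x in list(parent):
        clusters.setdefault(find(x), []).append(x)

    rename: dict[str, str] = {}
    for n, members in enumerate(clusters.values()):
        # Reuse an IRI that already starts with prefix, else mint a new one.
        existing = sorted(m for m in members if m.startswith(prefix))
        new_iri = existing[0] if existing else f"{prefix}{n}"
        for m in members:
            rename[m] = new_iri

    renamed = {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in dataset}
    dataset.clear()
    # Drop owl:sameAs links that now point from an IRI to itself.
    dataset |= {t for t in renamed if not (t[1] == OWL_SAME_AS and t[0] == t[2])}
```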
MAXIMIZE
dataset - Dataset
select - IRI
build - IRI
[ datatype - IRI ]
For each subject, take the highest value among its select triples, casting the values to datatype, and add to the graph a triple with predicate build holding that value. If datatype is not provided, the values are CAST to xsd:integer. See also [1]
MINIMIZE
dataset - Dataset
select - IRI
build - IRI
[ datatype - IRI ]
Same as MAXIMIZE, but takes the lowest value.
AGGREGATE
dataset - Dataset
select - IRI
build - IRI
[ datatype - IRI ]
Same as MAXIMIZE, but sums the values if numeric, or concatenates them if xsd:string.

If the datatypes are numeric[1], CAST all values to xsd:float and return their sum; otherwise CAST all values to xsd:string and return their concatenation.

AVERAGE
dataset - Dataset
select - IRI
build - IRI
[ datatype - IRI ]
Same as MAXIMIZE, MINIMIZE and AGGREGATE, but computes the average of the values, casting them to datatype (xsd:integer if not provided).
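MAXIMIZE, MINIMIZE, AGGREGATE and AVERAGE share one pattern: group the select values by subject, cast them, reduce them, and add a build triple. A minimal Python sketch, assuming a dataset is a set of triples whose select objects are lexical forms (strings), and using a hypothetical, deliberately small `CASTS` table in place of full XSD casting. The non-numeric rule of footnote [1] for MAXIMIZE/MINIMIZE is not covered. Because a dataset is a set, identical triples collapse before aggregation, as they would in an RDF graph.

```python
from statistics import mean
from typing import Callable

XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical, deliberately small mapping from XSD datatypes to Python casts.
CASTS: dict[str, Callable[[str], object]] = {
    XSD + "integer": int,
    XSD + "float": float,
    XSD + "string": str,
}
Triple = tuple[str, str, object]


def _per_subject(dataset: set[Triple], select: str, build: str, datatype: str,
                 reducer: Callable[[list], object]) -> None:
    """Cast the select values of each subject, reduce them, add a build triple."""
    cast = CASTS[datatype]
    groups: dict[str, list] = {}
    for s, p, o in dataset:
        if p == select:
            groups.setdefault(s, []).append(cast(o))
    dataset |= {(s, build, reducer(values)) for s, values in groups.items()}


def maximize(dataset, select, build, datatype=XSD + "integer"):
    _per_subject(dataset, select, build, datatype, max)


def minimize(dataset, select, build, datatype=XSD + "integer"):
    _per_subject(dataset, select, build, datatype, min)


def aggregate(dataset, select, build, datatype=XSD + "integer"):
    if datatype == XSD + "string":
        # Sorted so the concatenation does not depend on set iteration order.
        _per_subject(dataset, select, build, datatype, lambda vs: "".join(sorted(vs)))
    else:
        # Numeric values are cast to xsd:float and summed.
        _per_subject(dataset, select, build, XSD + "float", sum)


def average(dataset, select, build, datatype=XSD + "integer"):
    _per_subject(dataset, select, build, datatype, mean)
```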
CAST
dataset - Dataset
select - IRI
datatype - IRI
Force all values of select in dataset to be cast to datatype.
EXPAND
dataset - Dataset
predicate - IRI
For any subject with predicate, generate equivalent values by means of inferred knowledge. These could include:
  1. Subsumptions
  2. Part-of relationships
  3. Broader/Narrower (e.g. SKOS)
  4. Other?
COLLAPSE
dataset - Dataset
predicate - IRI
policy - MAXIMIZE | MINIMIZE | AGGREGATE
Collapse the set of values of predicate to a single value for each subject.
  • If policy is MAXIMIZE: take the top-level individual in a tree-based relationship.
  • If policy is MINIMIZE: take all the bottom-level individuals (remove all values that can be inferred in a tree-based relationship).
  • If policy is AGGREGATE: generate a new individual that represents the collection of values (also generating a label as the concatenation of all labels).
FRAME
dataset - Dataset
predicates - SET ( IRI )
type - IRI
Filter out (remove all triples for) instances of type that have only some of the given predicates, but not all. Triples whose predicate is not in predicates are also removed from these instances (except for rdf:type).
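A minimal sketch of FRAME under one reading of the description: every instance of type that lacks at least one of the predicates is removed entirely, and the surviving instances keep only the listed predicates plus rdf:type. It assumes a dataset is a set of string triples; `frame` is an illustrative name.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = tuple[str, str, str]


def frame(dataset: set[Triple], predicates: set[str], type_: str) -> None:
    """FRAME: keep only complete instances of type_, restricted to `predicates`."""
    instances = {s for s, p, o in dataset if p == RDF_TYPE and o == type_}
    used: dict[str, set[str]] = {s: set() for s in instances}
    for s, p, _ in dataset:
        if s in used:
            used[s].add(p)
    # Instances missing any of the required predicates are dropped entirely.
    incomplete = {s for s in instances if not predicates <= used[s]}
    keep = predicates | {RDF_TYPE}
    dataset -= {
        (s, p, o) for s, p, o in dataset
        if s in incomplete or (s in instances and p not in keep)
    }
```

Subjects that are not instances of type are left untouched.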
VALUETOIRI
dataset - Dataset
predicate - IRI
[ prefix - IRI ]
Change any value of predicate to an IRI, optionally using prefix. The literal value is added as the rdfs:label of the new entity.
IRITOVALUE
dataset - Dataset
predicate - IRI
Change any object of predicate to its rdfs:label.
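VALUETOIRI and IRITOVALUE are inverse transformations. A minimal sketch, assuming plain strings are IRIs and literals are wrapped in a small hypothetical `Lit` class. The minting rule (prefix plus percent-encoded lexical form) and the default prefix `urn:value:` are assumptions, not part of the operator definition.

```python
from dataclasses import dataclass
from typing import Union
from urllib.parse import quote

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Lit:
    """A plain literal; bare strings are treated as IRIs in this sketch."""
    value: str


Term = Union[str, Lit]
Triple = tuple[str, str, Term]


def value_to_iri(dataset: set[Triple], predicate: str, prefix: str = "urn:value:") -> None:
    """VALUETOIRI: replace literal values of predicate with minted IRIs."""
    old, new = set(), set()
    for s, p, o in dataset:
        if p == predicate and isinstance(o, Lit):
            # Hypothetical minting rule: prefix + percent-encoded lexical form.
            iri = prefix + quote(o.value, safe="")
            old.add((s, p, o))
            new |= {(s, p, iri), (iri, RDFS_LABEL, o)}
    dataset -= old
    dataset |= new


def iri_to_value(dataset: set[Triple], predicate: str) -> None:
    """IRITOVALUE: replace IRI objects of predicate with their rdfs:label."""
    # If an IRI has several labels, an arbitrary one is kept.
    labels = {s: o for s, p, o in dataset if p == RDFS_LABEL and isinstance(o, Lit)}
    swapped = {(s, p, o) for s, p, o in dataset if p == predicate and o in labels}
    dataset -= swapped
    dataset |= {(s, p, labels[o]) for s, p, o in swapped}
```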

Pre/post conditions

Concepts

Contracts in 'Design by contract' (Wikipedia):

Criteria of correctness:

  1. If the graph invariants AND preconditions are true before the operator is executed, then the invariants AND the postconditions will be true after the execution has been completed.
  2. Before an operator is executed, the graph contained in a dataset must not violate the operator's preconditions.
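The two criteria above can be sketched as a wrapper that checks Expects and Maintains before running an operator, and Guarantees and Maintains afterwards. This is a minimal sketch, not the author's framework: the `Contract` class, `run` and `has_predicate` are hypothetical, a dataset is a set of triples, and the COUNT contract below is simplified (IsAmount is approximated by HasPredicate).

```python
from dataclasses import dataclass, field
from typing import Callable

Triple = tuple[str, str, object]
Dataset = set[Triple]
Check = Callable[[Dataset], bool]


@dataclass
class Contract:
    expects: list[Check] = field(default_factory=list)
    guarantees: list[Check] = field(default_factory=list)
    maintains: list[Check] = field(default_factory=list)


def run(op: Callable[[Dataset], None], contract: Contract, dataset: Dataset) -> None:
    """Execute op only under its contract."""
    # Criterion 2: refuse to run if a precondition or invariant is violated.
    if not all(check(dataset) for check in contract.expects + contract.maintains):
        raise ValueError("precondition violated")
    op(dataset)
    # Criterion 1: postconditions and invariants must hold afterwards.
    if not all(check(dataset) for check in contract.guarantees + contract.maintains):
        raise AssertionError("postcondition violated")


def has_predicate(predicate: str) -> Check:
    """Hypothetical counterpart of (HasPredicate dataset predicate)."""
    return lambda ds: any(p == predicate for _, p, _ in ds)


def count_tenders(ds: Dataset) -> None:
    """COUNT(ds, hasTender, numberOfTenders), hard-wired for the example."""
    totals: dict[str, int] = {}
    for s, p, _ in ds:
        if p == "hasTender":
            totals[s] = totals.get(s, 0) + 1
    ds |= {(s, "numberOfTenders", n) for s, n in totals.items()}


count_contract = Contract(
    expects=[has_predicate("hasTender"),
             lambda ds: not has_predicate("numberOfTenders")(ds)],
    guarantees=[has_predicate("numberOfTenders")],
    maintains=[has_predicate("hasTender")],
)
```

Running `count_tenders` a second time on the same dataset is rejected, because the precondition `(not (HasPredicate dataset build ) )` no longer holds.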

In other words:

Functions

When the input of an operator is a Slice, then the following functions can be applied:

List of operators bound to graph properties

Consider the following implications:

Operator Parameters Conditions
COPY
source - Dataset
target - Dataset
slice - Slice

Expects

(HasSlice source slice )

Guarantees

(HasSlice target (Sanitize slice) )
(IsMinimal target (Predicates slice) )

Maintain

(HasSlice source slice )
APPEND
source - TempDataset
target - TempDataset | OutputDataset

Expects

-

Guarantees

(forall (slice - Slice) 
  (when (HasSlice source slice) 
    (and (HasSlice target slice) (not (HasSlice source slice))) ) )

Maintain

(forall (slice - Slice) 
   (when (HasSlice target slice) (HasSlice target slice)))
FILTER
dataset - Dataset
slice - Slice

Expects

(HasSlice dataset slice )

Guarantees

(forall (slice2 - Slice) 
  (when 
    (and (HasSlice dataset slice2) (not (= slice slice2)))
    (not (HasSlice dataset slice2))))

Maintain

(HasSlice dataset slice)
SHORTCUT
dataset - Dataset
path - Path
predicate - IRI

Expects

(HasPath dataset path )
(not (HasPredicate dataset predicate ) )

Guarantees

(HasPredicate dataset predicate )
(not (IsMinimal dataset predicate ) )

Maintain

(HasPath dataset path )
TYPE
dataset - Dataset
predicates - SET ( IRI )
type - IRI

Expects

(not (HasType dataset type ) )
(HasPredicate dataset predicates )

Guarantees

(TypeHasPredicates dataset type predicates )

Maintains

(HasPredicate dataset predicates )
DEFAULT
dataset - Dataset
type - IRI
predicate - IRI
default - LITERAL | IRI

Expects

(HasType dataset type )

Guarantees

(TypeHasPredicates dataset type predicate )
(HasSymbols dataset default )
(not (IsMinimal dataset predicate ) ) ;; we also have at least rdf:type for sure

Maintains

(HasType dataset type )
COUNT
dataset - Dataset
predicate - IRI
build - IRI

Expects

(HasPredicate dataset predicate )
(not (HasPredicate dataset build ) )

Guarantees

(IsAmount dataset build predicate )
(not (IsMinimal dataset predicate ) )
  ;; we also have build for sure
(not (IsMinimal dataset build ) )
  ;; we also have predicate for sure

Maintains

(HasPredicate dataset predicate )
REVERT
dataset - Dataset
from - IRI
to - IRI

Expects

(HasPredicate dataset from )
(not (HasPredicate dataset to ) )

Guarantees

(HasPredicate dataset to )
(InversePredicates dataset from to )
(not (IsMinimal dataset from ) )
   ;; we also have to for sure
(not (IsMinimal dataset to ) )
   ;; we also have from for sure

Maintains

(HasPredicate dataset from )
LINK
dataset - Dataset

Expects

-

Guarantees

(ExpressedDuplicates dataset)

Maintains

;; all properties
RENEW
dataset - Dataset
prefix - IRI

Expects

(ExpressedDuplicates dataset)
(not (IsMinimal dataset <owl:sameAs> ) )

Guarantees

(UniqueInstances dataset)

Maintains

(ExpressedDuplicates dataset)
   ;; because there are no duplicates anymore
MAXIMIZE
dataset - Dataset
select - IRI
build - IRI
datatype - IRI

Expects

(HasPredicate dataset select )
(not (HasPredicate dataset build ) )

Guarantees

(IsMaximum dataset build select )

Maintains

(HasPredicate dataset select)
MINIMIZE
dataset - Dataset
select - IRI
build - IRI
datatype - IRI

Expects

(HasPredicate dataset select )
(not (HasPredicate dataset build ) )

Guarantees

(IsMinimum dataset build select )

Maintains

(HasPredicate dataset select )
AGGREGATE
dataset - Dataset
select - IRI
build - IRI
datatype - IRI

Expects

(HasPredicate dataset select )
(not (HasPredicate dataset build ) )

Guarantees

(IsAggregate dataset build select )

Maintains

(HasPredicate dataset select )
AVERAGE
dataset - Dataset
select - IRI
build - IRI
datatype - IRI

Expects

(HasPredicate dataset select )
(not (HasPredicate dataset build ) )

Guarantees

(IsAverage dataset build select )

Maintains

(HasPredicate dataset select )
CAST
dataset - Dataset
select - IRI
datatype - IRI

Expects

(HasPredicate dataset select )

Guarantees

(HasDatatype dataset select datatype )

Maintains

(HasPredicate dataset select )
EXPAND
dataset - Dataset
predicate - IRI

Expects

(HasPredicate dataset predicate )

Guarantees

-

Maintains

(HasPredicate dataset predicate )
COLLAPSE
dataset - Dataset
predicate - IRI
policy - MAXIMIZE | MINIMIZE | AGGREGATE

Expects

(HasPredicate dataset predicate )

Guarantees

(HasPredicateCardinality dataset predicate 1 )

Maintains

(HasPredicate dataset predicate )
KEYS
dataset - Dataset
predicates - SET ( IRI )

Expects

(HasPredicate dataset predicates )

Guarantees

(UniqueKeys dataset predicates )

Maintains

(HasPredicate dataset predicates )
FRAME
dataset - Dataset
predicates - SET ( IRI )
type - IRI

Expects

(HasPredicate dataset predicates )

Guarantees

(TypeIsMinimal dataset type predicates )

Maintains

(HasPredicate dataset predicates )
VALUETOIRI
dataset - Dataset
predicate - IRI
[ prefix - IRI ]

Expects

(HasPredicate dataset predicate )

Guarantees

(RangeIsIRI dataset predicate )
(not (RangeIsLiteral dataset predicate ) ) 

Maintains

(HasPredicate dataset predicate )
IRITOVALUE
dataset - Dataset
predicate - IRI

Expects

(HasPredicate dataset predicate )

Guarantees

(RangeIsLiteral dataset predicate )
(not (RangeIsIRI dataset predicate ) ) 

Maintains

(HasPredicate dataset predicate )

Relations with phases

These are the possible relations with the phases defined in the harmonization process



  1. If the datatypes are not numeric, CAST all values to xsd:string and take the longest value. List of XSD numeric datatypes:
    byte A signed 8-bit integer
    decimal A decimal value
    int A signed 32-bit integer
    integer An integer value
    long A signed 64-bit integer
    negativeInteger An integer containing only negative values (..,-2,-1)
    nonNegativeInteger An integer containing only non-negative values (0,1,2,..)
    nonPositiveInteger An integer containing only non-positive values (..,-2,-1,0)
    positiveInteger An integer containing only positive values (1,2,..)
    short A signed 16-bit integer
    unsignedLong An unsigned 64-bit integer
    unsignedInt An unsigned 32-bit integer
    unsignedShort An unsigned 16-bit integer
    unsignedByte An unsigned 8-bit integer