The many ways of decomposing data
‐decompositions
 ‐column‐major (dsm) vs. row‐major (nsm) tables (see the sketch below)
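  ‐a minimal sketch of the two layouts in python (table and field names
   invented here):

     # row-major (nsm): each record's fields stored contiguously;
     # column-major (dsm): each attribute stored contiguously
     records = [(1, "ann", 3.5), (2, "bob", 2.0), (3, "cay", 4.8)]

     nsm = [field for record in records for field in record]
     dsm = {name: [r[i] for r in records]
            for i, name in enumerate(("id", "name", "score"))}

     print(nsm)  # [1, 'ann', 3.5, 2, 'bob', 2.0, 3, 'cay', 4.8]
     print(dsm)  # {'id': [1, 2, 3], 'name': ['ann', 'bob', 'cay'], ...}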
 ‐normalization/factoring
  ‐introduces surrogate keys between tables
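  ‐a toy factoring in python (all names invented) showing where the
   surrogate keys come from:

     # denormalized rows repeat the (dept, floor) group
     wide = [("ann", "sales", 2), ("bob", "sales", 2), ("cay", "ops", 3)]

     depts, dept_key = [], {}
     for _, dept, floor in wide:
         if (dept, floor) not in dept_key:
             dept_key[(dept, floor)] = len(depts)  # mint a surrogate key
             depts.append((dept, floor))
     people = [(emp, dept_key[(dept, floor)]) for emp, dept, floor in wide]

     print(people)  # [('ann', 0), ('bob', 0), ('cay', 1)]
     print(depts)   # [('sales', 2), ('ops', 3)]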
 ‐chunking
  ‐preserves locality of reference in a certain dimension
  ‐based on data values
  ‐partitioning is a special case for single dimension
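  ‐a sketch of the index arithmetic in python (the chunk shape is an
   arbitrary choice here):

     # map a 2-d element index to (chunk id, offset inside the chunk);
     # with chunk shape (1, ncols) this degenerates to row partitioning,
     # i.e. the single-dimension special case
     def chunk_of(i, j, chunk_rows, chunk_cols, ncols):
         chunks_per_row = -(-ncols // chunk_cols)  # ceiling division
         cid = (i // chunk_rows) * chunks_per_row + j // chunk_cols
         off = (i % chunk_rows) * chunk_cols + j % chunk_cols
         return cid, off

     print(chunk_of(5, 7, 4, 4, 16))  # (5, 7)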
 ‐variance
  ‐record invariance
   ‐used to imply regular fields
   ‐a bijective mapping from a regular domain, through the integers, to the
    range (see the sketch at the end of this list)
   ‐the first part is implied by the order on a regular medium of equal
    dimension and topology
   ‐generalizable to a regular medium of any dimension
  ‐inter‐variable invariance
   ‐essentially a special case of normalization
   ‐compression is achieved by first decomposing repetitive sets of fields
    into an index and then using record invariance to collapse it into a
    constant over one of the indices
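  ‐a sketch of that bijection in python (record size and shapes invented):

     # fixed-size records on a linear medium give index -> address; the
     # usual row-major linearization supplies the domain -> index half,
     # and it generalizes to a regular medium of any dimension
     def linearize(coords, shape):
         idx = 0
         for c, n in zip(coords, shape):
             idx = idx * n + c
         return idx

     def delinearize(idx, shape):
         coords = []
         for n in reversed(shape):
             coords.append(idx % n)
             idx //= n
         return tuple(reversed(coords))

     RECORD_SIZE = 24  # the record invariant, in bytes
     addr = linearize((2, 3, 1), (4, 5, 6)) * RECORD_SIZE
     assert delinearize(addr // RECORD_SIZE, (4, 5, 6)) == (2, 3, 1)
     print(addr)  # record 79, byte 1896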
 ‐regular chunking vs. irregular tiling
  ‐multiresolution trees
 ‐chunking in domain coordinates vs. chunking in table indices
 ‐subtypes and nulls
 ‐oids, surrogate keys, record keys
  ‐where are these supposed to come from?
   ‐1‐1 correspondence with any smart key => also 1‐1 between keys
   ‐how do (multivalued?!?) functional dependencies fit in?
 ‐rdf
  ‐blank nodes and existential quantification
   ‐a blank node is essentially a relational null value => nulls are
    quantifiers as well (see the sketch below)
   ‐but essentially all model‐based representations are existentially
    quantified theories
   ‐how do oids/surrogate keys fit in?
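  ‐a sketch of the null <=> blank node reading in python (names invented):

     import itertools

     fresh = (f"_:b{i}" for i in itertools.count())  # fresh blank nodes

     def row_to_triples(subject, row):
         # a None field becomes a blank node: both merely assert
         # "exists x such that attr(subject, x)" without naming x
         return [(subject, attr, next(fresh) if val is None else val)
                 for attr, val in row.items()]

     print(row_to_triples("alice", {"name": "Alice", "phone": None}))
     # [('alice', 'name', 'Alice'), ('alice', 'phone', '_:b0')]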
 ‐relationships to be standardised and analysed
  ‐aggregation
  ‐generalization
  ‐typing
  ‐part‐whole vs. intensional/extensional sets
 ‐structured/object columns in tables are essentially equivalent to
  ordinary groups of columns stored inline
 ‐object columns in an ordbms are different only in that they can point
  anywhere; however, an oid/rowid column exists everywhere, and the
  relational model permits joins over arbitrary sets of tables
  ‐rm/t even allows us to find which tables!
  ‐how does this fit in with foreign keys and the relation/logic predicate
   identification?
 ‐partial indexes
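  ‐a toy partial index in python (rows and predicate invented):

     # index only the rows satisfying a predicate; the index stays small
     # when queries only ever probe that subset
     rows = [(1, "open"), (2, "done"), (3, "open"), (4, "done")]
     open_idx = {rid: pos for pos, (rid, status) in enumerate(rows)
                 if status == "open"}
     print(open_idx)  # {1: 0, 3: 2}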
 ‐vertical vs. horizontal representation
  ‐table functions for conversion
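  ‐a sketch of both conversions in python (attribute names invented); a
   dbms would expose these as table functions:

     # horizontal (wide) row <-> vertical (entity, attribute, value)
     def to_vertical(key, row):
         return [(key, attr, val) for attr, val in row.items()]

     def to_horizontal(triples):
         rows = {}
         for key, attr, val in triples:
             rows.setdefault(key, {})[attr] = val
         return rows

     v = to_vertical(1, {"name": "ann", "dept": "sales"})
     print(v)                 # [(1, 'name', 'ann'), (1, 'dept', 'sales')]
     print(to_horizontal(v))  # {1: {'name': 'ann', 'dept': 'sales'}}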
 ‐contrasting models for nulls
  ‐universal table
   ‐horizontal in the extreme
   ‐closure under outer join?
  ‐binary model
   ‐vertical in the extreme
   ‐closure under join?
   ‐oids/surrogates connect the relevant pieces of information (see the
    sketch at the end of this list)
  ‐vertical representation/triples
   ‐not as neat as the binary model
   ‐closure under join?
  ‐mixture
   ‐schema sql, rm/t, data dictionary, etc.
   ‐explicitly specified functional dependencies?
    ‐versus oids and (outer) join closure?
    ‐relationship with declared foreign keys/integrity
     ‐today can’t foreign key a field twice, for no real reason!!!
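  ‐a sketch of the binary model in python (data invented), showing why
   rebuilding the wide view needs an outer join:

     # one (oid, value) table per attribute; a missing fact is simply an
     # absent row, so no nulls are ever stored
     name = {1: "ann", 2: "bob", 3: "cay"}
     phone = {1: "555-0101", 3: "555-0103"}  # oid 2 has no phone

     # an inner join would silently drop oid 2; the outer join
     # reintroduces the null in the reconstructed universal view
     oids = name.keys() | phone.keys()
     universal = {o: (name.get(o), phone.get(o)) for o in sorted(oids)}
     print(universal)  # {1: ('ann', '555-0101'), 2: ('bob', None), ...}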
 ‐decomposition is a form of compression
  ‐relies on cartesian products; why precisely do we use cartesian
   products??? (see the sketch at the end of this list)
  ‐are there other operations that make sense?
   ‐polymorphism is an example, because column values affect presence of
    other columns/nulls; ragged tables
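  ‐a sketch of that compression in python (values invented):

     from itertools import product

     # a relation that happens to equal a cartesian product can be stored
     # as its two projections: |A| + |B| values instead of |A| * |B| tuples
     r = {(c, n) for c in ("red", "green", "blue") for n in (1, 2, 3, 4)}
     colors = {c for c, _ in r}
     nums = {n for _, n in r}

     assert r == set(product(colors, nums))  # the decomposition is lossless
     print(len(colors) + len(nums), "values vs", len(r), "tuples")  # 7 vs 12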
 ‐chunking can be used to enable multiple growable dimensions and even
  variable rank!
  ‐cf. hdf5, netcdf4
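  ‐a sketch of how chunking makes dimensions growable in python (the
   chunk shape is an arbitrary choice here):

     chunks = {}     # chunk coordinates -> chunk payload
     CHUNK = (4, 4)

     def write(i, j, value):
         key = (i // CHUNK[0], j // CHUNK[1])
         chunks.setdefault(key, {})[(i % CHUNK[0], j % CHUNK[1])] = value

     write(0, 0, "a")
     write(100, 3, "b")     # growing dim 0 just allocates a new chunk;
                            # nothing already written has to move
     print(sorted(chunks))  # [(0, 0), (25, 0)]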
 ‐chunking
  ‐originally an attempt to leverage locality of reference e.g. in image
   processing
  ‐the underlying problem: manifolds of different dimension are not even
   locally isomorphic, and cannot be smoothly embedded in ones of lower
   dimension
   ‐hence, when mapped into a lower dimension, locality is lost
   ‐most storage is single‐dimensional at the physical level, hence
    high‐dimensional structures lose locality
   ‐this leads to seek time and extra data movement
   ‐chunking reduces this by preserving more of the locality
   ‐however, the solution is partial; it only asymptotically approaches the
    optimum as the block size grows
    ‐the optimum that is possible still isn’t good enough
     ‐organization within the block is still a problem
      ‐e.g. cache‐conscious trees and the like
     ‐locality is broken at block edges
     ‐only performs optimally when all of a chunk’s data is processed at a
      time and operations span just a single block
      ‐which is rarely the case
    ‐witness the decomposition storage model (dsm), partition attributes
     across (pax), superblocks, etc.
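  ‐a sketch of the block‐edge penalty in python (array and chunk shapes
   invented):

     # count the chunks a query rectangle touches: accesses that cross
     # chunk boundaries pay the locality penalty described above
     def chunks_touched(r0, r1, c0, c1, cr, cc):
         return (((r1 - 1) // cr - r0 // cr + 1) *
                 ((c1 - 1) // cc - c0 // cc + 1))

     # 256x256 array in 32x32 chunks:
     print(chunks_touched(0, 32, 0, 32, 32, 32))  # aligned block: 1 chunk
     print(chunks_touched(0, 1, 0, 256, 32, 32))  # one full row: 8 chunks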