VODML Mapping

Ongoing Interoperable Implementations

Omar Laurino, Gerard Lemson,
Tom Donaldson

May 28 2018

Outline

  • Modeling Implementations Website (OL)
  • Rama, an Astropy-Aware Python parser for VODML (OL)
  • Mapping Tool (GL)
  • Hubble Source Catalog implementation (TD)
  • Demo of interoperability between independent implementations - STScI/JHU/CfA (TD)
  • Simplifying the VODML Mapping Syntax (OL)
  • Collaborating and Testing: Git and Continuous Integration (OL)

Implementations Website

Modeling Workflow

  1. Modeling
  2. Annotating instances
  3. Parsing instances

Writing Models

  • By hand/XML editor
  • With a UML tool + translation scripts
  • With a Domain Specific Language

Writing Models (Java/Groovy DSL)

Jovial

model("source") {
    include("ivoa", version: "1.0")
    dataType("Position") {
        attribute(name: "ra", dataType: "ivoa:real")
        attribute(name: "dec", dataType: "ivoa:real")
    }
    objectType("Source") {
        attribute(name: "name", dataType: "ivoa:string")
        attribute(name: "position", dataType: "source:Position")
    }
}

How to write Instances

  • Error prone: we are overloading XML semantics to describe non-XML data models
  • XML validation against VOTable schema, not model semantics
  • We don't have a validator yet

How to write Instances

  • By hand/XML editor
  • With a UML tool + translation scripts
  • With a Domain Specific Language
  • Gerard's Mapping Tool (point 'n click!)

Jovial (Java/Groovy)

def modelLocation = "file:example.vodml.xml"
def ivoaModelLocation = "https://volute.g-vo.org/svn/trunk/projects/dm/vo-dml/models/ivoa/vo-dml/IVOA-v1.0.vo-dml.xml"

dmInstance {
    model(vodmlURL: modelLocation)
    model(vodmlURL: ivoaModelLocation)
    instance(type: "source:Source") {
        instance(role: "name", value: "A Source")
        instance(role: "position") {
            instance(role: "ra", value: 122.02)
            instance(role: "dec", value: -12.44)
        }
    }
}
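Because VOTable schema validation does not check model semantics, it may help to picture what a semantic check could look like. The following is only a minimal sketch, not an existing validator: `MODEL`, `PRIMITIVES` and `validate` are hypothetical names, the model is a hand-written mirror of the "source" example (assuming `name` is typed as a string), and instances are plain nested dictionaries.

```python
# Hypothetical, hand-written mirror of the "source" model (role name -> type).
MODEL = {
    "source:Position": {"ra": "ivoa:real", "dec": "ivoa:real"},
    "source:Source": {"name": "ivoa:string", "position": "source:Position"},
}
# Assumed mapping from ivoa primitive types to Python types.
PRIMITIVES = {"ivoa:real": (int, float), "ivoa:string": (str,)}


def validate(instance, dmtype, path="<root>"):
    """Return a list of problems; an empty list means the instance conforms."""
    if dmtype in PRIMITIVES:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(instance, bool) or not isinstance(instance, PRIMITIVES[dmtype]):
            return [f"{path}: expected {dmtype}, got {instance!r}"]
        return []
    roles = MODEL.get(dmtype)
    if roles is None:
        return [f"{path}: unknown type {dmtype}"]
    if not isinstance(instance, dict):
        return [f"{path}: expected an object of type {dmtype}"]
    problems = []
    for role in sorted(instance.keys() - roles.keys()):
        problems.append(f"{path}: unknown role {role!r}")
    for role, role_type in roles.items():
        if role not in instance:
            problems.append(f"{path}: missing role {role!r}")
        else:
            problems.extend(validate(instance[role], role_type, f"{path}.{role}"))
    return problems


source = {"name": "A Source", "position": {"ra": 122.02, "dec": -12.44}}
broken = {"name": "A Source", "position": {"ra": "122.02"}}

print(validate(source, "source:Source"))
for problem in validate(broken, "source:Source"):
    print(problem)
```

The well-formed instance produces no problems, while the broken one is reported for a wrongly typed `ra` and a missing `dec`, which is exactly the kind of error an XML schema check would not catch.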

Parsing Annotated VOTables

Rama

Parses almost all the single-table patterns

@VO('source:Position')
class Position(BaseType):
    ra = Attribute('source:Position.ra', min_occurs=1, max_occurs=1)
    dec = Attribute('source:Position.dec', min_occurs=1, max_occurs=1)


@VO('source:Source')
class Source(BaseType):
    name = Attribute('source:Source.name', min_occurs=1, max_occurs=1)
    position = Attribute('source:Source.position', min_occurs=1, max_occurs=1)
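To make the declarations above less opaque, here is a self-contained sketch of how a class decorator and attribute descriptors of this shape could work. It is an illustration under assumptions, not Rama's actual implementation: `REGISTRY`, and the bodies of `VO`, `Attribute` and `BaseType`, are all hypothetical.

```python
REGISTRY = {}  # hypothetical lookup table: vodml-id -> Python class


class Attribute:
    """Descriptor that remembers the VODML role and checks multiplicity."""

    def __init__(self, vodml_id, min_occurs=0, max_occurs=1):
        self.vodml_id = vodml_id
        self.min_occurs = min_occurs
        self.max_occurs = max_occurs

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access returns the descriptor itself
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if value is None:
            count = 0
        elif isinstance(value, (list, tuple)):
            count = len(value)
        else:
            count = 1
        if not self.min_occurs <= count <= self.max_occurs:
            raise ValueError(f"{self.vodml_id}: {count} value(s) not allowed")
        obj.__dict__[self.name] = value


def VO(vodml_id):
    """Class decorator that registers a class under its vodml-id."""
    def register(cls):
        cls.vodml_id = vodml_id
        REGISTRY[vodml_id] = cls
        return cls
    return register


class BaseType:
    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)


@VO('source:Position')
class Position(BaseType):
    ra = Attribute('source:Position.ra', min_occurs=1, max_occurs=1)
    dec = Attribute('source:Position.dec', min_occurs=1, max_occurs=1)


# A parser that meets dmtype="source:Position" can look the class up by id.
position = REGISTRY['source:Position'](ra=122.02, dec=-12.44)
print(position.ra, position.dec)
```

The point of the registry is that a model-driven parser only needs the `dmtype` string found in the annotation to locate the class to instantiate.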

Addressing Duality

  • Row-based (Objects) vs Column-based (Tables) Views
  • Sources: one source per row, multiple per table.
  • Light Curves: one per table, data in columns.
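The two views can be illustrated with plain Python containers. This is a sketch only: the table contents and the `rows` helper are made up for illustration, and a real parser would work on VOTable columns rather than lists.

```python
# Illustrative table: three columns, two rows (values are invented).
source_table = {
    "name": ["src-1", "src-2"],
    "ra": [122.02, 10.5],
    "dec": [-12.44, 41.2],
}


def rows(table):
    """Row-based view: yield one dictionary (one object) per table row."""
    names = list(table)
    for values in zip(*(table[name] for name in names)):
        yield dict(zip(names, values))


# Sources: one object per row, so this table yields several objects.
sources = list(rows(source_table))

# Light curve: a single object per table whose attributes are whole columns.
light_curve = {"time": [0.0, 1.0, 2.0], "flux": [10.1, 10.4, 9.8]}

print(len(sources), sources[1]["ra"])
print(len(light_curve["time"]))
```

The same annotation syntax has to serve both cases, which is why a parser needs to know whether a column reference means "one value per object" or "the whole column belongs to one object".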

Tricky Use Cases

HSC catalog has columns referring to photometry filters.

ID          ...   Instrument   Filter
247770718   ...   WFPC2        F814W
255488227   ...   ACS          F606W

Astropy integration

Astropy is used to parse <TABLE>

Data is represented as Astropy quantities/columns when possible

Astropy integration

Adapters can be provided to decorate VODML objects

from astropy.coordinates import SkyCoord
from rama.models.coordinates import EquatorialCoord

# `read` and `StdPosition` come from rama; their import paths are not shown here.
simple_position_file = read("file.vot.xml")
position = simple_position_file.find_instances(StdPosition)[0]

assert isinstance(position.coord, SkyCoord)
assert isinstance(position.coord.__vo_object__, EquatorialCoord)

Gerard's Mapping Tool Demo

Tom's HSC and Interoperability Demo

Simplifying the Mapping Syntax

Feedback:

  • too many elements
  • use attribute names only, not full identifiers
  • leave annotation unchanged when FIELDs become PARAMs and vice-versa
  • remove LITERAL and only use CONSTANT/COLUMN?

Simplifying the Mapping Syntax

You can only start by mapping VODML concepts (roles, types) to VOTable concepts (tables, columns, params)

Simplifying the Mapping Syntax

View it on GitHub

Before

<INSTANCE ID="_source" dmtype="source:Detection">
    <COMPOSITION dmrole="source:Source.position">
        <INSTANCE dmtype="source:SourcePosition">
            <ATTRIBUTE dmrole="meas:CoordMeasure.coord">
                <INSTANCE dmtype="coords:domain.space.EquatorialCoord">
                    <ATTRIBUTE dmrole="coords:domain.space.EquatorialCoord.ra">
                        <COLUMN ref="SourceRA" dmtype="ivoa:real"/>
                    </ATTRIBUTE>
                    <ATTRIBUTE dmrole="coords:domain.space.EquatorialCoord.dec">
                        <COLUMN ref="SourceDec" dmtype="ivoa:real"/>
                    </ATTRIBUTE>
                    <REFERENCE dmrole="coords:Coordinate.frame">
                        <IDREF>_icrs_</IDREF>
                    </REFERENCE>
                    <ATTRIBUTE dmrole="omar:Made.this.up">
                        <CONSTANT ref="_SOME_PARAM" dmtype="ivoa:real"/>
                    </ATTRIBUTE>
                    <ATTRIBUTE dmrole="omar:Something.different">
                        <LITERAL value="42" dmtype="ivoa:real"/>
                    </ATTRIBUTE>
                </INSTANCE>
            </ATTRIBUTE>
        </INSTANCE>
    </COMPOSITION>
</INSTANCE>

After

<INSTANCE ID="_source" dmtype="source:Detection">
    <ROLE dmrole="position">
        <INSTANCE dmtype="source:SourcePosition">
            <ROLE dmrole="coord">
                <INSTANCE dmtype="coords:domain.space.EquatorialCoord">
                    <ROLE dmrole="ra">
                        <INSTANCE ref="SourceRA" dmtype="ivoa:real"/>
                    </ROLE>
                    <ROLE dmrole="dec">
                        <INSTANCE ref="SourceDec" dmtype="ivoa:real"/>
                    </ROLE>
                    <ROLE dmrole="frame">
                        <IDREF>_icrs_</IDREF>
                    </ROLE>
                    <ROLE dmrole="up">
                        <INSTANCE ref="_SOME_PARAM" dmtype="ivoa:real"/>
                    </ROLE>
                    <ROLE dmrole="different">
                        <INSTANCE value="42" dmtype="ivoa:real"/>
                    </ROLE>
                </INSTANCE>
            </ROLE>
        </INSTANCE>
    </ROLE>
</INSTANCE>
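The difference between the two listings is mechanical, which the following sketch tries to show with the standard library's `xml.etree.ElementTree`. It assumes the simplification amounts to three rules read off the listings (role elements become ROLE, the `dmrole` keeps only the attribute name, value elements become INSTANCE); `simplify`, `ROLE_TAGS` and `VALUE_TAGS` are hypothetical names, and the input is a shortened excerpt of the "Before" listing.

```python
import xml.etree.ElementTree as ET

ROLE_TAGS = {"COMPOSITION", "ATTRIBUTE", "REFERENCE"}
VALUE_TAGS = {"COLUMN", "CONSTANT", "LITERAL"}


def simplify(element):
    """Rewrite an annotation tree in place from the 'Before' to the 'After' syntax."""
    for node in element.iter():
        if node.tag in ROLE_TAGS:
            node.tag = "ROLE"
            dmrole = node.get("dmrole")
            if dmrole is not None:
                # Keep the attribute name only: "coords:Coordinate.frame" -> "frame"
                node.set("dmrole", dmrole.rsplit(".", 1)[-1])
        elif node.tag in VALUE_TAGS:
            node.tag = "INSTANCE"
    return element


before = ET.fromstring("""
<INSTANCE dmtype="coords:domain.space.EquatorialCoord">
    <ATTRIBUTE dmrole="coords:domain.space.EquatorialCoord.ra">
        <COLUMN ref="SourceRA" dmtype="ivoa:real"/>
    </ATTRIBUTE>
    <REFERENCE dmrole="coords:Coordinate.frame">
        <IDREF>_icrs_</IDREF>
    </REFERENCE>
    <ATTRIBUTE dmrole="omar:Something.different">
        <LITERAL value="42" dmtype="ivoa:real"/>
    </ATTRIBUTE>
</INSTANCE>
""")

after = simplify(before)
print(ET.tostring(after, encoding="unicode"))
```

The conversion is lossy in one direction: once COLUMN, CONSTANT and LITERAL all become INSTANCE, a parser has to work out from `ref` and `value` which kind of value it is looking at.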

Two different approaches

  • model driven
  • annotation driven/dynamic
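A model-driven parser maps each `dmtype` to a class declared in advance, as in the Rama class declarations earlier. An annotation-driven parser instead builds objects from whatever roles it finds in the annotation. The sketch below illustrates the second approach only, under assumptions: `build` is a hypothetical function, the annotation uses the simplified syntax, `ref` is resolved against a single row given as a dictionary, and references (IDREF) are ignored.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def build(instance, table_row):
    """Turn an <INSTANCE> element into a value or a dynamically built object."""
    if instance.get("value") is not None:
        return instance.get("value")  # literal value, kept as a string here
    if instance.get("ref") is not None:
        return table_row[instance.get("ref")]  # value taken from the row
    obj = SimpleNamespace(dmtype=instance.get("dmtype"))
    for role in instance.findall("ROLE"):
        child = role.find("INSTANCE")
        if child is not None:
            # The attribute name in dmrole becomes the Python attribute name.
            setattr(obj, role.get("dmrole"), build(child, table_row))
    return obj


annotation = ET.fromstring("""
<INSTANCE dmtype="source:SourcePosition">
    <ROLE dmrole="coord">
        <INSTANCE dmtype="coords:domain.space.EquatorialCoord">
            <ROLE dmrole="ra"><INSTANCE ref="SourceRA" dmtype="ivoa:real"/></ROLE>
            <ROLE dmrole="dec"><INSTANCE ref="SourceDec" dmtype="ivoa:real"/></ROLE>
        </INSTANCE>
    </ROLE>
</INSTANCE>
""")

row = {"SourceRA": 122.02, "SourceDec": -12.44}
position = build(annotation, row)
print(position.coord.ra, position.coord.dec)
```

No model classes are needed here, which is why short attribute names suit this style of parser: they can be used directly as Python attribute names.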

My recommendation

  • Explicit syntax is trivial-to-simple to implement for single table cases
  • Maybe replace COMPOSITION, REFERENCE, ATTRIBUTE with ROLE
  • Using attribute names imho only facilitates dynamic parsers
  • Merging CONSTANT and COLUMN is a mess (compelling use case?)

Implementations repository

  • Volute simply doesn't cut it, for so many reasons.
  • Model changes require changes to many examples/notebooks
  • Git(Lab) repository with Continuous Integration:
      1. static pages are built upon push (Jekyll)
      2. tests are run
      3. pages are deployed

GitLab CI Pipelines

[Figure: pipelines for the feature branch and the master branch]

Tests

Thank you!
