openPMD

A meta-data standard

for mesh-based data ...

and particle data sets.


Clean Common Structure

... and Domain-Specific Extensions

^*currently implemented, but not limited to

For hierarchical, self-describing data formats.

How to avoid confusion between actual low-level data sets & attributes (HDF5 speak), variables & attributes (ADIOS speak), files & groups/folders and the actual physical quantities?

openPMD naming convention

Physical quantities are records:

discretized vector field \(\vec F(\vec r)\) on a mesh
discretized scalar field \(T(\vec r)\) on a mesh
particle property position \(\vec r_i\)
...

Record: each particle property or mesh

Their actual components are stored in (multi-dimensional) arrays (= data sets / variables) inside those records!
position.x is not a record; it is a component of the record position.

Required attributes for each record:

unitSI: conversion factor to common unit system
unitDimension: parsable dimensionality
time / timeUnitSI: physical time (the iteration number is not a time)
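A minimal sketch of how a reader might apply `unitSI`, assuming the convention that multiplying the stored values of a record component by its `unitSI` factor yields SI values. The function name `to_si` and the numbers are hypothetical.

```python
def to_si(values, unit_si):
    """Multiply each stored value by the record component's unitSI factor."""
    return [v * unit_si for v in values]

# Hypothetical field component stored in units of GV/m (unitSI = 1e9),
# converted to V/m.
stored_ex = [0.5, 1.25, -2.0]
ex_si = to_si(stored_ex, unit_si=1.0e9)
print(ex_si)  # [500000000.0, 1250000000.0, -2000000000.0]
```

The same pattern would apply to an iteration's `time` together with `timeUnitSI`.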

Attributes for all records

unitDimension:

automate description of units
powers of the 7 (SI) base measures
e.g., `V/m` is length^1.0 mass^1.0 time^-3.0 electric current^-1.0 thermodynamic temperature^0.0 amount of substance^0.0 luminous intensity^0.0, i.e. `[1.0, 1.0, -3.0, -1.0, 0.0, 0.0, 0.0]`
(if your record is normalized in a general way, choose a reference and document it in the `comment` attribute)
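Because `unitDimension` is just a list of exponents, derived units can be composed arithmetically: exponents add for products and subtract for quotients. The sketch below checks the `V/m` example; the helper names (`unit_dimension`, `multiply`, `divide`) are hypothetical, and the order of base quantities is the one listed above.

```python
# Order of the 7 SI base quantities, as listed above:
# length L, mass M, time T, electric current I,
# thermodynamic temperature theta, amount of substance N, luminous intensity J
BASE = ("L", "M", "T", "I", "theta", "N", "J")

def unit_dimension(**powers):
    """Build a unitDimension list from keyword exponents, e.g. L=1, T=-1."""
    unknown = set(powers) - set(BASE)
    if unknown:
        raise ValueError(f"unknown base quantities: {sorted(unknown)}")
    return [float(powers.get(name, 0.0)) for name in BASE]

def multiply(a, b):
    """unitDimension of a product: exponents add."""
    return [x + y for x, y in zip(a, b)]

def divide(a, b):
    """unitDimension of a quotient: exponents subtract."""
    return [x - y for x, y in zip(a, b)]

volt = unit_dimension(L=2, M=1, T=-3, I=-1)  # V = kg m^2 s^-3 A^-1
metre = unit_dimension(L=1)
print(divide(volt, metre))  # [1.0, 1.0, -3.0, -1.0, 0.0, 0.0, 0.0]
```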

Attributes for all records

Required attributes for each mesh record:

geometry (+ parameters)
grid: spacing, global offset, axis labels
data order (C/Fortran)

Required attributes for each mesh record component:

position (on grid)
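To see how the grid attributes and a component's `position` fit together, here is a sketch of the physical coordinate of one sample. It assumes the openPMD 1.x convention as we understand it: `position` is a fraction of a cell in [0, 1), `gridGlobalOffset` and `gridSpacing` are in grid units, and an additional attribute `gridUnitSI` (not shown on this slide) converts grid units to SI. The function name and all values are hypothetical.

```python
def sample_coordinate(index, grid_global_offset, grid_spacing, position, grid_unit_si):
    """SI coordinate of the sample at integer `index`, one entry per axis.

    Axes are given in the same order as the record's axisLabels.
    """
    return [
        (off + (i + pos) * dx) * grid_unit_si
        for i, off, dx, pos in zip(index, grid_global_offset, grid_spacing, position)
    ]

# Hypothetical 2D mesh with axes (y, x); this component is staggered to the
# cell centre along x (position 0.5) and sits on the cell edge along y (0.0).
coord = sample_coordinate(
    index=(2, 3),
    grid_global_offset=(0.0, 10.0),
    grid_spacing=(0.5, 0.25),
    position=(0.0, 0.5),
    grid_unit_si=1.0e-6,  # grid quantities stored in micrometres
)
```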

Attributes for mesh records

Required records for each particle species:

position
position offset (can be constant)

Optional:

id
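A sketch of reconstructing absolute particle positions, assuming the absolute position is `position + positionOffset`, each converted with its own `unitSI`, and that a constant `positionOffset` may be given as a single value. The function name and the in-cell/cell-offset split in the example are hypothetical.

```python
def absolute_position(pos, pos_unit_si, offset, offset_unit_si):
    """Combine one component of position and positionOffset into SI values.

    pos:    per-particle values of e.g. position/x
    offset: per-particle values of positionOffset/x, or a single number
            if that component is stored as a constant
    """
    if isinstance(offset, (int, float)):
        offset = [offset] * len(pos)  # expand a constant component
    return [p * pos_unit_si + o * offset_unit_si for p, o in zip(pos, offset)]

# Hypothetical x components for three electrons, both stored in micrometres
x = absolute_position(
    pos=[0.1, 0.5, 0.9],
    pos_unit_si=1.0e-6,
    offset=[4.0, 4.0, 7.0],
    offset_unit_si=1.0e-6,
)
```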

Attributes for particle records

Examples for records

electric field \(\vec E(\vec r)\):

/ ... / meshes / E /
- x
- y
- z

temperature \(T(\vec r)\):

/ ... / meshes /
- \(T\)

Legend: Group / (multi-dim) Array

electron position \(\vec r\):

/ ... / particles / electrons / position /
- x
- y
- z

electron charge \(Q\):

/ ... / particles / electrons
- charge

Legend: Group / (1D) Array
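The examples above follow one rule: a vector record is a group holding one array per component, while a scalar record is itself the array. A sketch of that rule, with a hypothetical helper name; the prefix keeps the elided `/ ... /` from the slides rather than guessing it.

```python
def component_paths(prefix, record, components=()):
    """Paths of the arrays that make up a record.

    Scalar records (no components) are a single array named after the record.
    """
    base = prefix.rstrip("/") + "/" + record
    if not components:
        return [base]
    return [f"{base}/{c}" for c in components]

print(component_paths("/.../meshes", "E", ("x", "y", "z")))
# ['/.../meshes/E/x', '/.../meshes/E/y', '/.../meshes/E/z']
print(component_paths("/.../particles/electrons/", "charge"))
# ['/.../particles/electrons/charge']
```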

Ok ok, I got it! But what about constant components in a record?

electron charge \(Q\) might be constant for all particles stored in species electrons:

/ ... / particles / electrons
- charge ← might be very large
  - unitSI

Legend: Group / Array / Attribute

/ ... / particles / electrons
- charge
  - value ← few bytes
  - shape
  - unitSI

Legend: Group / Array / Attribute

possible for any record component, e.g.:

/ ... / particles / electrons
- position
  - x + unitSI
  - y + unitSI
  - z
    - value, shape
    - unitSI

Legend: Group / Array / Attribute
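On the reader side, a constant component only needs `value` and `shape` to be expanded on demand, so the file stays a few bytes no matter how many particles there are. This sketch models a component as a plain dict holding either full `data` or `value` + `shape`; that representation and the function name are assumptions for illustration.

```python
import math

def read_component(component):
    """Return a flat list of values for one record component.

    Assumption: `component` is a dict with either the full array under
    "data", or a constant under "value" together with its "shape".
    """
    if "data" in component:
        return list(component["data"])
    n = math.prod(component["shape"])  # number of entries the constant stands for
    return [component["value"]] * n

# Hypothetical constant electron charge for 4 particles, in units of e
charge = {"value": -1.0, "shape": [4], "unitSI": 1.602176634e-19}
q_si = [q * charge["unitSI"] for q in read_component(charge)]
```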

All right. But how to handle petabytes of data written from thousands (to millions) of compute nodes?

parallel, community file formats: writing/reading based on MPI & MPI-I/O
examples:
- PHDF5 .h5 (parallel/strided, uncompressed)
- ADIOS .bp (aggregated, compressed)

Parallel file formats

Particle Patches: Honor Decomposition

Particle Patches: Disjoint Particle Sets

Particle Patches: [Offset:Offset+Count]

Particle Patches: (Spatial) Hyperrectangles
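The patch idea can be sketched in a few lines: each writer's particles form a contiguous `[offset:offset+count]` range of the 1D arrays (offsets from an exclusive prefix sum of the counts), and each patch also records the spatial hyperrectangle it covers, so a reader can skip patches outside a region of interest. The openPMD standard's particle patches store similar information (e.g. `numParticles`, `numParticlesOffset`, `offset`, `extent`), but the helpers below are hypothetical and not part of any openPMD API.

```python
from itertools import accumulate

def patch_offsets(counts):
    """Exclusive prefix sum: start index of each patch in the 1D particle arrays."""
    return list(accumulate(counts, initial=0))[:-1]

def patch_slices(counts):
    """One [offset:offset+count] slice per patch."""
    return [slice(off, off + n) for off, n in zip(patch_offsets(counts), counts)]

def overlapping_patches(boxes, lo, hi):
    """Indices of patches whose box [offset, offset+extent) intersects [lo, hi)."""
    hits = []
    for idx, (offset, extent) in enumerate(boxes):
        if all(o < h and l < o + e for o, e, l, h in zip(offset, extent, lo, hi)):
            hits.append(idx)
    return hits

# Hypothetical 2x2 decomposition: one patch per unit square (offset, extent)
counts = [3, 0, 5, 2]
boxes = [((0.0, 0.0), (1.0, 1.0)), ((1.0, 0.0), (1.0, 1.0)),
         ((0.0, 1.0), (1.0, 1.0)), ((1.0, 1.0), (1.0, 1.0))]
particle_ids = list(range(sum(counts)))  # stand-in for a 1D particle array

wanted = overlapping_patches(boxes, lo=(0.5, 0.5), hi=(1.5, 0.9))
slices = patch_slices(counts)
selected = [pid for i in wanted for pid in particle_ids[slices[i]]]
```

Patches are coarse, so the selected particles would still need a per-particle position check if an exact region is required.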

In principle and everywhere^*: a human-readable comment (text) attribute is encouraged for everything not covered by the standard.

^* reserved for each group and data set

Comment Attribute

openPMD defines a minimal set of attributes.

You can always add more attributes and records!

openPMD is not exclusive

openPMD defines a minimal set of attributes, e.g.

openPMD: identifier
basePath: prefix, currently fixed to `/data/`
meshesPath: relative sub-group, e.g., `meshes/`
particlesPath: relative sub-group, e.g., `particles/`

Required Base Attributes for `/`

author: My Name <email@example.com>
software: e.g., PIConGPU
softwareVersion: e.g., 0.1.0
date: 2015-12-02 17:48:42 +0100

Recommended Base Attributes for `/`
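A sketch of checking the root attributes and formatting `date` like the example above (`YYYY-MM-DD HH:MM:SS +zzzz`). The attribute lists are taken from these slides and are not exhaustive; the warning text mimics the checker script's output, but this is a hypothetical helper, not the checker itself, and the `openPMD` value is made up.

```python
from datetime import datetime, timedelta, timezone

REQUIRED = ("openPMD", "basePath", "meshesPath", "particlesPath")  # from the slide
RECOMMENDED = ("author", "software", "softwareVersion", "date")

def check_root(attrs):
    """Return (errors, warnings) for missing root attributes."""
    errors = [f"Attribute {k} (required) does NOT exist in `/`!"
              for k in REQUIRED if k not in attrs]
    warnings = [f"Attribute {k} (recommended) does NOT exist in `/`!"
                for k in RECOMMENDED if k not in attrs]
    return errors, warnings

def openpmd_date(dt):
    """Format a timezone-aware datetime like the `date` example."""
    return dt.strftime("%Y-%m-%d %H:%M:%S %z")

attrs = {
    "openPMD": "1.0.0",  # hypothetical value
    "basePath": "/data/",
    "meshesPath": "meshes/",
    "particlesPath": "particles/",
    "author": "My Name <email@example.com>",
    "software": "PIConGPU",
    "date": openpmd_date(datetime(2015, 12, 2, 17, 48, 42,
                                  tzinfo=timezone(timedelta(hours=1)))),
}
errors, warnings = check_root(attrs)  # softwareVersion is missing on purpose
```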

Example files and creator scripts:

tools/ $ ./createExamples_h5.py

Developer Tools: Examples

$ ./checkOpenPMD_h5.py -i example.h5 --EDPIC

Warning: Attribute softwareVersion (recommended) does NOT exist in `/`!
Found 1 iteration(s)
Iteration 0 : found 2 meshes
Iteration 0 : found 1 particle species

Result: 0 Errors and 1 Warning.

Developer Tools: Checker Script

Serial processing: high-level API

meta-data parsing
file-format independent (ADIOS & HDF5)
object oriented: meshes, particles & iterations

Developer Tools: Serial Processing

openPMD viewer

Python API: openPMD object aware
GUI: IPython Notebook (interactive, remote)
ideal for investigating 1-2D data (or slices)
modular: e.g., domain specific analysis chains

Developer Tools: Parallel Processing

Parallel processing I: integration in suites such as

VisIt, ParaView (GUI, Python)
libSplash (C/C++)

Developer Tools: Parallel Processing

Parallel processing II: integration in parallel python processing via pyDive

numpy-like parallel access
read and write support
based on ZeroMQ / Jupyter notebook

Developer Tools: Parallel Processing

Common tools

openPMD-viewer
numpy-like parallel access
read and write support
GUI: Jupyter notebook widgets

Developer Tools: Parallel Processing

Extensions: Domain-Specific Additions

Implemented by PIConGPU & Warp

electrodynamic and electrostatic PIC
additional attributes
naming conventions for records

The ED-PIC Extension

One could add extensions for

Experimental data: CCD images, interferograms, ...
Simulations: MD, FEM, ...

Other Domain-Specific Extensions

For future versions

record patches and AMR support
irregular Cartesian grids
more mesh types: if required

Future Plans