Tuesday, February 12, 2008

Microformat for PolarGrid

Prompted by this interesting discussion of data formats that are readable by both humans and machines, here is my contribution to the Google ranking of the PolarGrid (a.k.a. Polar Grid) project.

Background
Over the weekend William gave us an overview of metadata generated from the Matlab data processing, which will be published through web feeds. Much of the following summary comes from email excerpts.

  • A processed dataset corresponds to one jpg image, which consists of processed measurement values from many radar samples. Fundamental properties of each processed dataset are defined by three parameters forming a triplet: waveform (wf), transmit antenna (tx), and receive antenna (rx).

  • Each radar system can transmit several different waveforms, and a waveform can be transmitted and received on one or more antennas. An example jpg filename would therefore be TYPE_YYYYMMDD_HHMMSS_image_wf_##_tx_##_rx_##.jpg, where TYPE describes the radar data type (such as InSAR), YYYYMMDD_HHMMSS is the UTC timestamp of the first sample in the dataset, the <wf, tx, rx> triplet specifies the particular waveform and the transmit and receive antennas used, and the word "image" distinguishes these data from position or reference data.
    Additionally, there may be pulse-compressed and f-k migrated images, named "image_pc" and "image_fk" respectively; these are different types of processed images for the same track. For combined images, the word "comb" replaces wf_##_tx_##_rx_##.

  • A plain text file will be associated with each jpg image, containing three levels of metadata.

    1. Overall system configuration and setup:

      Description: "InSAR 2008 01 22 "
      Sampling Frequency (MHz): 720
      Number of Averages (Samples): 1
      Number of Waveforms: 1
      TX Attenuation: [0 0]
      RX Attenuation: [0 0]
      Number of Samples: [20000 20000]
      DSP Mode: coherent
      System Delay (us): 0

      Each TX/RX Attenuation value corresponds to a tx/rx antenna. Hence in case of 2 tx antennas and 8 rx antennas, the values could be

      TX Attenuation: [10 10]
      RX Attenuation: [5 5 5 5 5 5 5 5]

      Each "Number of Samples" value corresponds to a data acquisition unit (daq), the total number of which is always less than or equal to the total number of rx antennas. Hence in case of 4 daqs for 8 rx antennas, the value could be

      Number of Samples: [20000 20000 20000 20000]

      For each daq, the system-level "Number of Samples" value is the sum of the corresponding values in the individual waveform metadata (for example, 15000 + 5000 = 20000 for the two waveforms listed under item 3).

    2. Spatial and temporal information of data processing chunks:

      Start lon: (Degrees)
      Stop lon: (Degrees)
      Start lat: (Degrees)
      Stop lat: (Degrees)
      Start UTC time: (seconds.useconds)
      Stop UTC time: (seconds.useconds)

      The complete expedition consists of multiple (~30) data chunks, and all <wf, tx, rx> triplets in the same chunk share the same spatial and temporal chunk metadata.

    3. Information about individual waveforms. Continuing the example of 2 tx antennas, 8 rx antennas, and 4 daqs, 2 waveforms can be described as:

      Start Frequency (MHz): 120
      Stop Frequency (MHz): 300
      Pulse Width (us): 10
      Zero/Pi Mode: 1
      TX Attenuation: [0 0]
      RX Attenuation: [0 0 0 0 0 0 0 0]
      Number of Samples: [15000 15000 15000 15000]
      Sample Delay (us): [20 20 20 20]
      Blanking Time (us): [20 20 20 20]

      Start Frequency (MHz): 120
      Stop Frequency (MHz): 300
      Pulse Width (us): 3
      Zero/Pi Mode: 1
      TX Attenuation: [5 5]
      RX Attenuation: [7 7 7 7 7 7 7 7]
      Number of Samples: [5000 5000 5000 5000]
      Sample Delay (us): [0 0 0 0]
      Blanking Time (us): [5 5 5 5]

      For each waveform, each daq also corresponds to one Sample Delay and one Blanking Time.

    4. A fourth level of metadata associated with each jpg image can be deduced from its file name as discussed earlier, including radar type, specific <wf, tx, rx> triplet value, processed image type, and a timestamp.
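As a rough illustration of how the fourth level of metadata could be recovered from a file name, here is a minimal Python sketch. The regular expression is only inferred from the naming convention described above, and the function name `parse_image_filename`, the sample file names, and the assumption that TYPE contains only letters and digits are all hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the naming convention in this post. It is an
# assumption for illustration, not part of the PolarGrid processing code.
FILENAME_RE = re.compile(
    r"^(?P<type>[A-Za-z0-9]+)_"                 # radar data type, e.g. InSAR
    r"(?P<date>\d{8})_(?P<time>\d{6})_"         # UTC timestamp of first sample
    r"(?P<image>image(?:_pc|_fk)?)_"            # processed image type
    r"(?:comb|wf_(?P<wf>\d+)_tx_(?P<tx>\d+)_rx_(?P<rx>\d+))"
    r"\.jpg$"
)

def parse_image_filename(name):
    """Split a jpg file name into radar type, timestamp, image type, triplet."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    timestamp = datetime.strptime(
        m["date"] + m["time"], "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)
    combined = m["wf"] is None  # "comb" replaces the wf/tx/rx triplet
    triplet = None if combined else (int(m["wf"]), int(m["tx"]), int(m["rx"]))
    return {
        "type": m["type"],
        "timestamp": timestamp,
        "image_type": m["image"],
        "combined": combined,
        "triplet": triplet,
    }

# Hypothetical file names that follow the convention.
print(parse_image_filename("InSAR_20080122_153000_image_wf_01_tx_01_rx_02.jpg"))
print(parse_image_filename("InSAR_20080122_153000_image_pc_comb.jpg"))
```

If TYPE can contain underscores or other characters, the first group of the pattern would need to be relaxed accordingly.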

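The relationship between the system-level and per-waveform "Number of Samples" values can also be checked mechanically. The sketch below assumes the bracketed, space-separated array notation shown in the metadata listings; the helper names `parse_int_array` and `check_sample_totals` are made up for this example.

```python
def parse_int_array(text):
    """Parse a bracketed, space-separated list such as "[20000 20000]"."""
    return [int(token) for token in text.strip().strip("[]").split()]

def check_sample_totals(system_samples, waveform_samples):
    """Return True if each daq total equals the sum over all waveforms."""
    if any(len(w) != len(system_samples) for w in waveform_samples):
        raise ValueError("every waveform must list one value per daq")
    # zip(*...) regroups the per-waveform lists into per-daq columns.
    totals = [sum(per_daq) for per_daq in zip(*waveform_samples)]
    return totals == list(system_samples)

# Values taken from the 4-daq, 2-waveform example in this post.
system = parse_int_array("[20000 20000 20000 20000]")
waveforms = [
    parse_int_array("[15000 15000 15000 15000]"),
    parse_int_array("[5000 5000 5000 5000]"),
]
print(check_sample_totals(system, waveforms))  # True: 15000 + 5000 = 20000
```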


Microformat Proposal
We propose the following set of microformats to describe the above metadata, and they can be composed to construct web feeds for individual jpg images. All class names in the proposed microformat start with "pg:" denoting the PolarGrid namespace.

  • Simple descriptive metadata:

    <span class="pg:description">InSAR 2008 01 22</span>
    <span class="pg:averages">1</span>
    <span class="pg:waveforms">1</span>
    <span class="pg:dsp-mode">coherent</span>
    <span class="pg:pulse-width">3</span>
    <span class="pg:zero-pi-mode">1</span>

  • Frequency, used in both overall system sampling and start/stop property of individual waveforms:

    <span class="pg:frequency">
    <span class="pg:name">Sampling/Start/Stop</span>
    <span class="pg:mhz">720</span>
    </span>

  • Antenna:

    <span class="pg:antenna">
    <span class="pg:id">0</span>
    <span class="pg:type">TX/RX</span>
    <span class="pg:attenuation">0</span>
    </span>

  • Array of antennas: (TX or RX)

    <span class="pg:antenna-array">
    <span class="pg:array-size">2</span>
    <span class="pg:antenna">
    <span class="pg:id">0</span>
    <span class="pg:type">TX</span>
    <span class="pg:attenuation">0</span>
    </span>
    <span class="pg:antenna">
    <span class="pg:id">1</span>
    <span class="pg:type">TX</span>
    <span class="pg:attenuation">0</span>
    </span>
    </span>

  • Data acquisition unit:

    <span class="pg:data-acquisition-unit">
    <span class="pg:id">0</span>
    <span class="pg:samples">50000</span>
    <span class="pg:delay">0</span>
    <span class="pg:blanking">5</span>
    </span>

  • Array of data acquisition units:

    <span class="pg:data-acquisition-unit-array">
    <span class="pg:array-size">2</span>
    <span class="pg:data-acquisition-unit">
    <span class="pg:id">0</span>
    <span class="pg:samples">50000</span>
    <span class="pg:delay">0</span>
    <span class="pg:blanking">5</span>
    </span>
    <span class="pg:data-acquisition-unit">
    <span class="pg:id">1</span>
    <span class="pg:samples">50000</span>
    <span class="pg:delay">0</span>
    <span class="pg:blanking">5</span>
    </span>
    </span>

  • Spatial and temporal information of a data processing chunk. Note that the existing geo microformat is used here, which is outside our pg namespace:

    <span class="pg:data-chunk">
    <span class="pg:name">Start/Stop</span>
    <span class="pg:utc-timestamp">1202755351.892651</span>
    <span class="geo">
    <span class="longitude">-2.193</span>
    <span class="latitude">52.686</span>
    </span>
    </span>

  • Individual waveform identification:

    <span class="pg:waveform">
    <span class="pg:id">1</span>
    </span>

  • Missing pieces: we do not yet understand the relationship between a waveform description and individual jpg images well enough to supply microformats for the fourth level of metadata, in particular the location of the jpg images to be provided in the final web feeds.
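To show how the antenna and antenna-array markup proposed above might be produced from the attenuation arrays in the metadata file, here is a small Python sketch. The helper names `span`, `antenna`, and `antenna_array` are invented for this example, and numbering antenna ids from 0 simply follows the sample markup; none of this is an agreed implementation.

```python
from html import escape

def span(cls, content):
    """Wrap content in a span carrying a class in the proposed pg: namespace."""
    return f'<span class="pg:{cls}">{content}</span>'

def antenna(antenna_id, kind, attenuation):
    """Render one antenna; kind is "TX" or "RX"."""
    inner = (
        span("id", antenna_id)
        + span("type", escape(kind))
        + span("attenuation", attenuation)
    )
    return span("antenna", inner)

def antenna_array(kind, attenuations):
    """Render an array of antennas from a list of attenuation values."""
    items = [antenna(i, kind, att) for i, att in enumerate(attenuations)]
    return span("antenna-array", span("array-size", len(items)) + "".join(items))

# "TX Attenuation: [0 0]" from the system configuration becomes:
print(antenna_array("TX", [0, 0]))
```

Generating the markup from one helper keeps the opening and closing tags balanced, which is easy to get wrong when the nested spans are written by hand.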


Comments
The above proposal is a very preliminary draft; please criticize, suggest, and comment. Every single click improves the internet fame of our project too.
