Catalog

Data Formats

Image generated with ChatGPT
slides created by Lukas Steinwender

FITS

Image credits: Lukas Steinwender
  • file format designed for astronomy
  • hierarchy
    • HDU (Header Data Unit)
      • Header: ASCII keyword records (80-character cards)
      • Data: binary N-dimensional array or table
    • Primary HDU
      • describes data
      • data must be image or no data at all
    • Extension HDU
      • optional
      • various data types
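
The header layout described above can be illustrated without any FITS library. The following is a minimal sketch, assuming the standard layout of 80-character cards padded to 2880-byte blocks; the helper names (`make_card`, `build_header`, `parse_header`) are hypothetical and only handle simple fixed-format values. In practice, astropy.io.fits does this work.

```python
CARD_LEN = 80      # every header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def make_card(keyword, value, comment=""):
    """Format one keyword/value card (simplified fixed-format layout)."""
    if isinstance(value, bool):
        text = "T" if value else "F"   # FITS logicals are written as T or F
    else:
        text = str(value)
    card = f"{keyword:<8}= {text:>20}"
    if comment:
        card += f" / {comment}"
    return card[:CARD_LEN].ljust(CARD_LEN)


def build_header(cards):
    """Join cards, append the END card, and pad with spaces to a full block."""
    raw = "".join(cards) + "END".ljust(CARD_LEN)
    padding = (-len(raw)) % BLOCK_LEN
    return (raw + " " * padding).encode("ascii")


def parse_header(raw):
    """Read cards until END and return a {keyword: value string} dict."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            header[keyword] = card[10:].split("/", 1)[0].strip()
    return header


# header of a primary HDU that carries no data (NAXIS = 0)
header_bytes = build_header([
    make_card("SIMPLE", True, "conforms to FITS standard"),
    make_card("BITPIX", 8, "bits per data value"),
    make_card("NAXIS", 0, "no data array"),
])
header = parse_header(header_bytes)
print(len(header_bytes), header)
```

Because the header is plain ASCII, it stays readable in a text editor even though the data that follows it is binary.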

Interacting With FITS Files

#reading
from astropy.io import fits
with fits.open(<path/to/file.fits>, mode="readonly") as hdul:   #file is closed automatically
    primary_header = hdul[0].header
    primary_data = hdul[0].data
    extension_header1 = hdul[1].header
    extension_data = hdul[1].data

#without a context manager
hdul = fits.open(<path/to/file.fits>)
hdul.info()                             #summary of all HDUs in the file
hdul.close()                            #never forget to clean up!

Industry Standards: Apache Parquet

industry standard for cloud-based data lakes

import pandas as pd
df = pd.read_parquet(<path/to/file.parquet>)    #reading
df.to_parquet(<path/to/file.parquet>)           #writing

Industry Standards: HDF5

  • accessible through h5py
  • freely available
  • strong compression
  • can be sliced without loading the entire file in memory
  • organized like a file system (groups act as folders, datasets as files)

industry standard for non-tabular data

import h5py
f = h5py.File(<path/to/file.h5>, "r")                                       #reading
f.close()
f2 = h5py.File(<path/to/file.h5>, "w")                                      #creating file
grp = f2.create_group(<name/of/group>)                                      #creating a group
dataset1 = grp.create_dataset(<name/of/dataset>, <shape>, dtype=<datatype>) #add data
dataset2 = grp.create_dataset(<name/of/dataset>, data=<your_data_array>)    #add data
f2.close()                                                                  #flush to disk and release the file
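
The points about compression and slicing can be sketched as a small round trip. This is a minimal illustration rather than a recommended layout: the file name, the group name `observations`, and the dataset name `flux` are made up for the example, and it assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np

# hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# writing: the context manager closes the file automatically
with h5py.File(path, "w") as f:
    grp = f.create_group("observations")          # behaves like a folder
    grp.create_dataset(
        "flux",
        data=np.arange(1000, dtype="f8").reshape(100, 10),
        compression="gzip",                       # built-in lossless filter
    )

# reading: only the requested slice is read from disk
with h5py.File(path, "r") as f:
    names = list(f["observations"].keys())        # list the group's contents
    shape = f["observations/flux"].shape          # metadata, no data loaded
    first_rows = f["observations/flux"][:5, :]    # NumPy array of 5 rows

print(names, shape, first_rows.shape)
```

Using `with` avoids the forgotten `close()` call, and path-style keys such as `"observations/flux"` mirror the folder-like organization of the file.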

Industry Standards: JSON

  • JavaScript Object Notation
  • human and machine readable
  • lightweight
  • based on key-value pairs and lists
import json
with open(<path/to/file.json>, "r") as f:
    data = json.load(f)         #reading
with open(<path/to/file.json>, "w") as f:
    json.dump(data, f)          #writing
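
To make the key-value and list structure concrete, here is a small self-contained round trip. The record contents and the file name are invented for illustration; only the standard library is used.

```python
import json
import os
import tempfile

# nested key-value pairs and lists map directly onto dicts and lists
record = {
    "name": "example source",
    "magnitudes": [12.3, 12.1, 11.9],
    "meta": {"survey": "demo", "flagged": False},
}

# hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "record.json")

with open(path, "w") as f:
    json.dump(record, f, indent=2)   # indent keeps the file human readable

with open(path, "r") as f:
    restored = json.load(f)

print(restored == record)
```

Only basic types (strings, numbers, booleans, null, lists, and dicts with string keys) survive this round trip unchanged, so arrays or dates need converting before they are dumped.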

FITS: Flexible Image Transport System