
LOFAR Data Format revisited
In document LOFAR-DATAFORMAT-001 Sydney Cadot describes the requirements for the LOFAR data format and shows that HDF5 meets nearly all of them. The casacore table system is not discussed in that document, but it meets all requirements with the exception of:
- fully nestable data types
- binding to Matlab and IDL
- installable on Windows
A few requirements mentioned in the document are debatable, especially the one in 3.4.4:
3.4.2. Accommodating 32-bit file systems. This is no longer needed. All operating systems support 64-bit file systems nowadays.
3.4.4. The data format should be optimized for sequential access of large arrays. The requirement should be the opposite, because different applications (flagging, calibration, imaging) require very different access patterns to the same data. Instead, the data format should allow efficient access in each of these patterns.
3.4.5. Pipeline processing. There is no need to process data sequentially from tape. Any processing of data in streaming mode will use very different data formats and is outside the scope of a discussion of disk-based data formats.
Furthermore, some possible requirements have been omitted:
- support of a boolean data type (in 3.5.1)
- support of concurrent access by multiple processes
- distributed storage. Note that distributed processing is a separate topic
- thread safety
Needs for visibility data access
The main data axes are baseline, time, and frequency. The application defines in which order
the data will be traversed. For example, calibration steps through the data in time order, while
for imaging it is preferable to step by frequency channel. Flagging is usually done per
baseline in a running time/freq window. Plots can be made in all kinds of ways.
The data can be regular, but that is not always the case. Shorter baselines might use longer
time integration.
The file format should be such that data traversal is possible in the various directions. The
access in those directions should be about equally fast.
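To make the point about traversal directions concrete, here is a minimal sketch in Python. It assumes a flat, time-major layout with made-up dimensions; neither the layout nor the function names come from HDF5 or casacore. It counts how many separate contiguous reads each access pattern needs when the data are stored in a single fixed order.

```python
# Hypothetical dimensions, chosen only for illustration.
N_TIME, N_BASELINE, N_CHAN = 4, 3, 2


def offset(t, b, c):
    """Position of one visibility in a flat, time-major layout (an assumption)."""
    return (t * N_BASELINE + b) * N_CHAN + c


def time_slot(t):
    """Calibration-style access: all baselines and channels of one time slot."""
    return [offset(t, b, c) for b in range(N_BASELINE) for c in range(N_CHAN)]


def channel_plane(c):
    """Imaging-style access: all times and baselines of one channel."""
    return [offset(t, b, c) for t in range(N_TIME) for b in range(N_BASELINE)]


def baseline_series(b):
    """Flagging-style access: all times and channels of one baseline."""
    return [offset(t, b, c) for t in range(N_TIME) for c in range(N_CHAN)]


def contiguous_runs(offsets):
    """Count the separate contiguous reads needed to fetch the given offsets."""
    runs = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


time_runs = contiguous_runs(time_slot(0))
channel_runs = contiguous_runs(channel_plane(0))
baseline_runs = contiguous_runs(baseline_series(0))
print(time_runs, channel_runs, baseline_runs)  # 1 12 4
```

With this single ordering, a time slot is one contiguous read, while a channel plane needs one read per value. A format that only favours one sequential order therefore penalizes the other applications.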
Brief comparison of HDF5 and CasaTables
Both the Tables and the HDF5 data format are well suited for large collections of structured
data. They share some characteristics, but differ in many others. The main difference is that
HDF5 is hierarchical in nature, while CasaTables is relational.
The 1980s saw a move from hierarchical databases to relational databases because the
latter offer more flexibility. Hierarchical databases are (too) hard to traverse in any order
other than that of the hierarchy.
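The difference in traversal can be sketched in a few lines of Python. This is only an illustration: the records, the nesting order (baseline, then time, then channel), and the `select` helper are hypothetical, and neither HDF5 nor CasaTables is involved.

```python
# Hypothetical records with columns (baseline, time, channel, value).
COLUMNS = ("baseline", "time", "channel", "value")
rows = [
    ("A-B", 0, 0, 1.0),
    ("A-B", 1, 0, 2.0),
    ("A-C", 0, 0, 3.0),
    ("A-C", 1, 0, 4.0),
]

# Hierarchical layout: baseline -> time -> channel -> value.
tree = {}
for baseline, time, channel, value in rows:
    tree.setdefault(baseline, {}).setdefault(time, {})[channel] = value

# Along the hierarchy, one lookup gives everything for a baseline.
along_tree = tree["A-B"]

# Across the hierarchy (all baselines at time 1), every branch must be visited.
across_tree = {b: times[1] for b, times in tree.items() if 1 in times}


def select(rows, **conditions):
    """Relational-style filter: return rows whose named columns match."""
    index = {name: i for i, name in enumerate(COLUMNS)}
    return [
        row for row in rows
        if all(row[index[name]] == wanted for name, wanted in conditions.items())
    ]


# In the flat layout the same expression works for any column.
per_baseline = select(rows, baseline="A-B")
per_time = select(rows, time=1)
```

In the nested layout the choice of outermost key decides which question is cheap to ask. In the flat layout no axis is privileged, which is the flexibility the relational model is credited with above.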
The following table gives a summary of the main differences between the formats and their
HDF5 has a much wider user base. Hence, some more tools are available. However, the
casapy tools like tablebrowser, tableplot, and casaviewer and the Table Query Language
make inspection (and change) of CasaTables very easy.
Usually single file (can be multiple)
Directory of files
storage manager best suited
Storage managers can be loaded dynamically, so very adaptable
Hierarchical
Keywordset for table and per column
Higher level
TaQL (SQL-like). Can also create, update, delete, and insert
supported by means of lock on entire table
Tableplot for arbitrary xy-plots (part of casapy, not casacore)
TaQL
However, very slow when retrieving smallish data sizes (e.g. lines)
differences)
Virtual Storage Managers to scale e.g. float to short
A data manager can be virtual (e.g. VirtualTaQLColumn)
Virtual columns
New storage managers possible and are automatically loaded as needed
very responsive
Only serious bug fixing. New developments only if paying
Documentation
Quite extensive, but not always clear
Good class documentation
Very small chance of file corruption
Very small chance of file corruption in case of machine crash
Rows can be deleted depending on storage manager
- Peter Fridman can access the table file containing the DATA directly in his RFI
- It is straightforward to store the DATA and FLAG as a normal file (outside table system) and access it later as a table column using a dynamically loaded storage manager (like LofarStMan).
- FLAG can be a virtual column on top of LOFAR_FLAGS which can be an Int or so
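The last point, a boolean FLAG column computed on the fly from a stored integer column, can be sketched as follows. This is a plain-Python illustration, not the casacore virtual column API: the class name `VirtualFlagColumn`, the `mask` parameter, and the rule that any selected bit being set means "flagged" are all assumptions made for the example.

```python
class VirtualFlagColumn:
    """Boolean view computed on read from a stored integer column (hypothetical)."""

    def __init__(self, stored_flags, mask=0xFFFFFFFF):
        self._stored = stored_flags  # one integer per data point, kept by reference
        self._mask = mask            # which bits count as "flagged" (assumption)

    def __len__(self):
        return len(self._stored)

    def __getitem__(self, row):
        # Nothing boolean is stored; the value is derived at access time.
        return (self._stored[row] & self._mask) != 0


# Stand-in for the stored LOFAR_FLAGS integers.
lofar_flags = [0, 1, 4, 0, 6]

flag = VirtualFlagColumn(lofar_flags)
all_bits = [flag[i] for i in range(len(flag))]

# A view that only looks at one bit of the same stored column.
bit2_only = VirtualFlagColumn(lofar_flags, mask=0b100)
one_bit = [bit2_only[i] for i in range(len(bit2_only))]

# Changing the stored integers is immediately visible through the view.
lofar_flags[0] = 4
updated_first = flag[0]
```

Because only the integers are stored, several boolean views with different masks can coexist over the same data without duplicating it.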

