LOFAR Data Format revisited
In document LOFAR-DATAFORMAT-001 Sydney Cadot describes the requirements for the LOFAR data format and shows that HDF5 meets nearly all of them. The casacore table system is not discussed in that document, but it meets all requirements with the exception of:
- fully nestable data types
- binding to Matlab and IDL
- installable on Windows
A few requirements mentioned in the document are debatable, especially the one in 3.4.4:

3.4.2. Accommodating 32-bit file systems. This is no longer needed. All operating systems support 64-bit file systems nowadays.

3.4.4. The data format should be optimized for sequential access of large arrays. This should be the opposite, because different applications (flagging, calibration, imaging) require very different access patterns to the same data. Instead, the data format should allow several different access patterns to be efficient.

3.4.5. Pipeline processing. There is no need to process data sequentially from tape. Any processing of data in streaming mode will use very different data formats and is outside the scope of a discussion about a disk-based data format.

Furthermore, some possible requirements have been omitted:
- support of a boolean data type (in 3.5.1)
- support of concurrent access by multiple processes
- distributed storage. Note that distributed processing is a separate topic
- thread safety
Needs for visibility data access
The main data axes are baseline, time, and frequency. The application determines the order in which the data are traversed. For example, calibration steps through the data in time order, while for imaging it is preferable to step by frequency channel. Flagging is usually done per baseline in a running time/frequency window. Plots can be made along any of these axes. The data can be regular, but that is not always the case: shorter baselines might use longer time integration. The file format should therefore make data traversal possible in the various directions, and access in those directions should be about equally fast.

Brief comparison of HDF5 and CasaTables
Both the CasaTables and the HDF5 data format are well suited for large collections of structured data. They share some characteristics, but differ in many others. The main difference is that HDF5 is hierarchical in nature, while CasaTables is relational. The 1980s saw a move from hierarchical databases to relational databases because the latter offer more flexibility. Hierarchical databases are (too) hard to traverse in a way that differs from the hierarchy.

The following table gives a summary of the main differences between the formats and their (dis)advantages. HDF5 has a much wider user base, so somewhat more tools are available for it. However, the casapy tools like tablebrowser, tableplot, and casaviewer, together with the Table Query Language, make inspection (and change) of CasaTables very easy.

CasaTables
Usually single file (can be multiple) Directory of files
storage manager best suited Storage managers can be loaded dynamically, so very adaptable
Hierarchical Keywordset for table and per column
Higher level TaQL (SQL-like) Can also create, update, delete, and insert
supported by means of lock on entire table
Tableplot for arbitrary xy-plots (part of casapy, not casacore) TaQL
However, very slow when retrieving smallish data sizes (e.g. lines)
differences) Virtual Storage Managers to scale e.g. float to short
A data manager can be virtual (e.g. VirtualTaQLColumn)
Virtual columns New storage managers possible and are automatically loaded as needed
very responsive Only serious bug fixing New developments only if paying
Documentation Quite extensive, but not always clear Good class documentation
Very small chance of file corruption Very small chance of file corruption in case of machine crash
Rows can be deleted depending on storage manager
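
The requirement that traversal by time, by frequency, and by baseline be about equally fast is usually addressed by storing the data array in tiles (casacore tiled storage managers) or chunks (HDF5 chunked datasets). The Python sketch below is only an illustration of that idea and does not use either library. The array shape, the tile sizes, and the function name read_amplification are assumptions chosen for the example. It counts how many stored elements must be read to obtain a requested slice, relative to the size of the slice itself.

```python
from math import prod

def read_amplification(chunk, selection):
    """Return (elements read) / (elements wanted) for one selection.

    chunk     -- chunk (tile) size per axis, here (time, baseline, freq)
    selection -- (start, stop) per axis, stop exclusive
    Whole chunks are assumed to be read, even if only part is needed.
    """
    chunks_touched = 1
    wanted = 1
    for (start, stop), size in zip(selection, chunk):
        first = start // size
        last = (stop - 1) // size
        chunks_touched *= last - first + 1
        wanted *= stop - start
    return chunks_touched * prod(chunk) / wanted

# Hypothetical data set: 1024 time steps, 64 baselines, 256 channels.
N_TIME, N_BL, N_FREQ = 1024, 64, 256

# Layout A: one chunk per time step (purely time-ordered storage).
time_ordered = (1, N_BL, N_FREQ)
# Layout B: tiles that are small along every axis.
tiled = (32, 8, 32)

# Three access patterns named in the text.
one_time_step = ((0, 1), (0, N_BL), (0, N_FREQ))       # calibration
one_channel = ((0, N_TIME), (0, N_BL), (0, 1))         # imaging
one_baseline = ((0, N_TIME), (0, 1), (0, N_FREQ))      # flagging

results = {
    name: {
        "time_ordered": read_amplification(time_ordered, sel),
        "tiled": read_amplification(tiled, sel),
    }
    for name, sel in [
        ("one_time_step", one_time_step),
        ("one_channel", one_channel),
        ("one_baseline", one_baseline),
    ]
}

for name, row in results.items():
    print(name, row)
```

With these assumed sizes the time-ordered layout is ideal for reading one time step but reads 256 times more data than needed for one frequency channel, whereas the tiled layout never exceeds a factor of 32 in any of the three directions. The actual trade-off depends on the tile shape and on caching, which this sketch ignores.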
- Peter Fridman can access the table file containing the DATA directly in his RFI
- It is straightforward to store the DATA and FLAG as a normal file (outside table system) and access it later as a table column using a dynamically loaded storage manager (like LofarStMan).
- FLAG can be a virtual column on top of LOFAR_FLAGS which can be an Int or so
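
The last point, a boolean FLAG column computed on the fly from an integer LOFAR_FLAGS column, can be sketched in plain Python as follows. This is not casacore code: the class name VirtualFlagColumn, the mask parameter, and the meaning of the individual bits are hypothetical and serve only to show the principle that the virtual column stores nothing itself.

```python
class VirtualFlagColumn:
    """Boolean view derived on demand from stored integer flags.

    Hypothetical illustration: a row is considered flagged when any
    bit selected by `mask` is set in the stored integer.
    """

    def __init__(self, int_flags, mask=~0):
        self._int_flags = int_flags   # the stored column (not copied)
        self._mask = mask             # default ~0 selects all bits

    def __len__(self):
        return len(self._int_flags)

    def __getitem__(self, row):
        # Only single integer row indices are supported in this sketch.
        return (self._int_flags[row] & self._mask) != 0


# Stored integer column; the bit assignments are assumptions.
lofar_flags = [0b000, 0b001, 0b100, 0b110]

flag = VirtualFlagColumn(lofar_flags)
any_bit = [flag[i] for i in range(len(flag))]

# A second virtual column that looks at one assumed bit only.
flag_bit1 = VirtualFlagColumn(lofar_flags, mask=0b010)
only_bit1 = [flag_bit1[i] for i in range(len(flag_bit1))]

print(any_bit)
print(only_bit1)
```

Because the boolean values are computed at read time, a change to the stored integers is immediately visible through every virtual column defined on top of them, and no extra disk space is needed for FLAG.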