Documentation for: nwb_core.py

NWB file format specification
Version 1.0.5g_beta, Oct 7, 2016
(File 'nwb_core.py', namespace 'core')

Table of contents

Introduction
    Naming conventions
    Link types
    Automatically created components
    Top level groups
    Top level datasets
TimeSeries
    <TimeSeries>
TimeSeries Class Hierarchy
Modules
    <Module>
    <Interface>
File organization
    /acquisition
    /analysis
    /epochs
    /general
    /general/extracellular_ephys
    /general/intracellular_ephys
    /general/optogenetics
    /general/optophysiology
    /processing
    /stimulus
Extending the format
Acknowledgements
Change history
Design notes

Introduction

Neurodata Without Borders: Neurophysiology is a project to develop a unified data format for cellular-based neurophysiology data, focused on the dynamics of groups of neurons measured under a large range of experimental conditions. Participating labs provided use cases and critical feedback to the effort. The design goals for the NWB format included:


Compatibility

Usability

Flexibility

Extensibility

Longevity


Hierarchical Data Format (HDF) was selected for the NWB format because it met several of the project's requirements. First, it is a mature data format standard with libraries available in multiple programming languages. Second, the format's hierarchical structure allows data to be grouped into logical self-documenting sections. Its structure is analogous to a file system in which its "groups" and "datasets" correspond to directories and files. Groups and datasets can have attributes that provide additional details, such as authorities' identifiers. Third, its linking feature enables data stored in one location to be transparently accessed from multiple locations in the hierarchy. The linked data can be external to the file. Fourth, HDFView, a free, cross-platform application, can be used to open a file and browse data. Finally, ensuring the ongoing accessibility of HDF-stored data is the mission of The HDF Group, the nonprofit that is the steward of the technology.


The NWB format standard is codified in a schema file written in a specification language created for this project. The specification language describes the schema, including data types and associations. A new schema file will be published for each revision of the NWB format standard. Data publishers can use the specification language to extend the format in order to store types of data not managed by the base format.

Naming conventions

In this document (and in the specification language used to define the format) an identifier enclosed in angle brackets (e.g. "<ElectricalSeries>") denotes a group or dataset with a "variable" name. That is, the name within the HDF5 file is set by the application creating the file and multiple instances may be created within the same group (each having a unique name). Identifiers that are not enclosed in angle brackets (e.g. "CompassDirection") are the actual name of the group or dataset within the HDF5 file. There can only be one instance within a given group since the name is fixed.
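The naming rule above can be checked mechanically. The following is a minimal sketch; the helper name `is_variable_name` is hypothetical and is not part of the NWB API.

```python
def is_variable_name(identifier):
    """Return True if a specification identifier denotes a variable name.

    Identifiers wrapped in angle brackets (e.g. "<ElectricalSeries>") are
    placeholders whose actual name is chosen by the application; all other
    identifiers are the literal group or dataset name.
    """
    return (
        len(identifier) > 2
        and identifier.startswith("<")
        and identifier.endswith(">")
    )


for ident in ["<ElectricalSeries>", "CompassDirection"]:
    kind = "variable name" if is_variable_name(ident) else "fixed name"
    print(f"{ident}: {kind}")
```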

Link types

In some instances, the specification refers to HDF5 links. When links are made within the file, HDF5 soft links (not hard links) should be used. Soft links distinguish between the link and the target of the link, whereas hard links create multiple names (paths) for the target, with no way to determine which of these names is preferable in a given situation. If the target of a soft link is removed or moved to another location in the HDF5 file (both of which can be done using the HDF5 API), the soft link will "dangle," that is, point to a target that no longer exists. For this reason, moving or removing targets of soft links should be avoided unless the links are updated to point to the new location.
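The behavior of soft links can be illustrated with h5py. This is a sketch only: the file, group and dataset names are hypothetical, and the file is kept in memory so nothing is written to disk.

```python
import h5py

# In-memory HDF5 file; all names below are made up for illustration.
with h5py.File("links_demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("/acquisition/raw", data=[1.0, 2.0, 3.0])
    f.create_group("/analysis")

    # Soft link: stores the target path, not a second name for the object.
    f["/analysis/raw_alias"] = h5py.SoftLink("/acquisition/raw")

    # The link itself can be inspected without following it.
    link = f.get("/analysis/raw_alias", getlink=True)
    link_target = link.path

    # Following the link reads the target's data.
    values = f["/analysis/raw_alias"][...].tolist()

    # Removing the target leaves the soft link dangling.
    del f["/acquisition/raw"]
    try:
        f["/analysis/raw_alias"]
        dangling = False
    except KeyError:
        dangling = True

print(link_target, values, dangling)
```

Because the link records only a path, a reader can always tell the alias from the original location, which is the property the recommendation above relies on.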

Automatically created components

In the format, the values of some datasets and attributes can usually be determined automatically from other parts of the HDF5 file. For example, a dataset whose value is the target of a link can be filled in automatically from a list of links in the HDF5 file. When possible, the NWB API will automatically create such components and required groups. The components (datasets, attributes and required groups) that are automatically created by the API are indicated by the phrase (Automatically created) in the description or comment. The creation of these components is specified by the "autogen" option in the format specification language. This is not a part of the format (different APIs may create the data files in different ways). The information is included for the convenience of those using the NWB API and for developers of other APIs who may wish to auto-generate these components.

Top level groups


The content of these organizational groups is more fully described in the section titled, File organization. The NWB format is based on TimeSeries and Modules and these are defined first.


NWB stores general optical and electrical physiology data in a way that should be understandable to a naive user after a few minutes of looking at the file in an HDF5 browser, such as HDFView. The format is designed to be friendly to and usable by software tools and analysis scripts, and to impose few a priori assumptions about data representation and analysis. Metadata required to understand the data itself (core metadata) is generally stored with the data. Information required to interpret the experiment (general metadata) is stored in the group 'general'. Most general metadata is stored in free-form text fields. Machine-readable metadata is stored as attributes on these free-form text fields.


The only API assumed necessary to read an NWB file is an HDF5 library (e.g., h5py in Python, libhdf5 in C, JHI5 in Java).

Top level datasets

Top-level datasets are for file identification and version information.


All times are stored in seconds using double precision (64 bit) floating point values. A smaller floating point value, e.g. 32 bit, is not permitted for storing times. This is because significant errors for time can result from using smaller data sizes. Throughout this document, sizes (number of bits) are provided for many datatypes (e.g. float32). If the size is followed by "!" then the size is the minimum size, otherwise it is the recommended size. For fields with a recommended size, larger or smaller sizes can be used (and for integer types both signed and unsigned), so long as the selected size encompasses the full range of data, and for floats, without loss of significant precision. Fields that have a minimum size can use larger, but not smaller sizes.
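The reason 32-bit floats are not permitted for times can be seen in a small calculation. The sketch below assumes a hypothetical recording sampled at 30 kHz and compares two consecutive sample times one hour into the session.

```python
import struct


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical scenario: one hour into a recording sampled at 30 kHz.
t0 = 3600.0
t1 = t0 + 1.0 / 30000.0  # time of the next sample

# 64-bit floats keep the two sample times distinct.
distinct_in_float64 = t1 != t0

# 32-bit floats resolve only about 0.24 ms near t = 3600 s, so both
# sample times collapse to the same stored value.
distinct_in_float32 = as_float32(t1) != as_float32(t0)

print(distinct_in_float64, distinct_in_float32)
```

Near 3600 s, adjacent 32-bit values are 2^-12 s (about 0.24 ms) apart, which is coarser than the 33 microsecond sample interval in this example.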

TimeSeries

The file format is designed around a data structure called a TimeSeries which stores time-varying data. A TimeSeries is a superset of several INCF types, including signal events, image stacks and experimental events. To account for different storage requirements and different modalities, a TimeSeries is defined in a minimal form and it can be extended, or subclassed, to account for different modalities and data storage requirements. When a TimeSeries is extended, it means that the 'subclassed' instance maintains or changes each of the components (e.g., groups and datasets) of its parent and may have new groups and/or datasets of its own. The TimeSeries makes this process of defining such pairs more hierarchical.


Each TimeSeries has its own HDF5 group, and all datasets belonging to a TimeSeries are in that group. The group contains time and data components and users are free to add additional fields as necessary. There are two time objects represented. The first, timestamps, stores time information that is corrected to the experiment's time base (i.e., aligned to a master clock, with time-zero aligned to the starting time of the experiment). This field is used for data processing and subsequent scientific analysis. The second, sync, is an optional group that can be used to store the sample times as reported by the acquisition/stimulus hardware, before samples are converted to a common timebase and corrected relative to the master clock. This approach allows the NWB format to support streaming of data directly from hardware sources.

<TimeSeries>

General purpose time series.

When data is streamed from experiment hardware it should be stored in an HDF5 dataset having the same attributes as data, with time information stored as necessary. This allows the raw data files to be separate file-system objects that can be set as read-only once the experiment is complete. TimeSeries objects in /acquisition will link to the data field in the raw time series. Hardware-recorded time data must be corrected to a common time base (e.g., timestamps from all hardware sources aligned) before it can be included in timestamps. The uncorrected time can be stored in the sync group.


The group holding the TimeSeries can be used to store additional information (HDF5 datasets) beyond what is required by the specification. That is, an end user is free to add additional key/value pairs as necessary for their needs. It should be noted that such lab-specific extensions may not be recognized by analysis tools/scripts existing outside the lab. Extensions are described in the section Extending the format.


The data element in the TimeSeries will typically be an array of any valid HDF5 data type (e.g., a multi-dimensional floating point array). The data stored can be in any unit. The attributes of the data field must indicate the SI unit that the data relates to (or appropriate counterpart, such as color-space) and the multiplier necessary to convert stored values to the specified SI unit.
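As an illustration of the unit and multiplier attributes, the sketch below scales stored values to the SI unit. The attribute names "unit" and "conversion", the helper `to_si`, and the amplifier scaling are assumptions made for this example.

```python
def to_si(stored_values, conversion):
    """Scale stored values by the multiplier to obtain values in the SI unit."""
    return [value * conversion for value in stored_values]


# Hypothetical example: an amplifier stores integer counts where one count
# corresponds to 0.5 microvolts, so the multiplier to Volts is 5e-7.
data = [-200, 0, 1000]
attributes = {"unit": "Volt", "conversion": 5e-7}

volts = to_si(data, attributes["conversion"])
print(volts, attributes["unit"])
```

Storing the raw counts together with the multiplier keeps the file compact while leaving the physical interpretation unambiguous.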

TimeSeries Class Hierarchy

The TimeSeries is a data structure/object. It can be "subclassed" (or extended) to represent more narrowly focused modalities (e.g., electrical versus optical physiology) as well as new modalities (e.g., video tracking of whisker positions). When a TimeSeries is subclassed, new datasets can be added while all datasets of parent classes are either preserved as specified in the parent class or replaced by a new definition (changed). In the tables that follow, identifiers in the "Id" column that change the definition in the parent class are underlined. An initial set of subclasses is described here. Users are free to define subclasses for their particular requirements. This can be done by creating an extension to the format defining a new TimeSeries subclass (see Extending the format).

All datasets that are defined to be part of TimeSeries have the text attribute 'unit' that stores the unit specified in the documentation.

<AbstractFeatureSeries> extends <TimeSeries>

Abstract features, such as quantitative descriptions of sensory stimuli. The TimeSeries::data field is a 2D array, storing those features (e.g., for visual grating stimulus this might be orientation, spatial frequency and contrast). Null stimuli (e.g., uniform gray) can be marked as being an independent feature (e.g., 1.0 for gray, 0.0 for actual stimulus) or by storing NaNs for feature values, or through use of the TimeSeries::control fields. A set of features is considered to persist until the next set of features is defined. The final set of features stored should be the null set.
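Because a set of features persists until the next set is defined, the features in effect at an arbitrary time can be found with a binary search over the timestamps. A minimal sketch follows; the helper `features_at` and the grating values are hypothetical.

```python
import math
from bisect import bisect_right


def features_at(timestamps, data, t):
    """Return the feature row in effect at time t, or None before the first row.

    A row persists from its timestamp until the next row's timestamp.
    """
    i = bisect_right(timestamps, t) - 1
    return data[i] if i >= 0 else None


# Hypothetical grating stimulus: [orientation (deg), spatial frequency, contrast].
timestamps = [0.0, 2.0, 4.0]
data = [
    [0.0, 0.04, 1.0],
    [90.0, 0.04, 0.5],
    [math.nan, math.nan, math.nan],  # final row: null stimulus marked with NaNs
]

print(features_at(timestamps, data, 1.5))  # row that started at t = 0.0
print(features_at(timestamps, data, 2.0))  # row that started at t = 2.0
```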

<AbstractFeatureSeries> includes all elements of <TimeSeries> with the following additions or changes:

<AnnotationSeries> extends <TimeSeries>

Stores, e.g., user annotations made during an experiment. The TimeSeries::data[] field stores a text array, and timestamps are stored for each annotation (i.e., interval=1). This is largely an alias of a standard TimeSeries storing a text array, but one that is identifiable as storing annotations in a machine-readable way.

<AnnotationSeries> includes all elements of <TimeSeries> with the following additions or changes:

<ElectricalSeries> extends <TimeSeries>

Stores acquired voltage data from extracellular recordings. The data field of an ElectricalSeries is an int or float array storing data in Volts. TimeSeries::data array structure: [num times] [num channels] (or [num times] for single electrode).

<ElectricalSeries> includes all elements of <TimeSeries> with the following additions or changes:

<SpikeEventSeries> extends <ElectricalSeries>

Stores "snapshots" of spike events (i.e., threshold crossings) in data. This may also be raw data, as reported by ephys hardware. If so, the TimeSeries::description field should describe how events were detected. All SpikeEventSeries should reside in a module (under EventWaveform interface) even if the spikes were reported and stored by hardware. All events span the same recording channels and store snapshots of equal duration. TimeSeries::data array structure: [num events] [num channels] [num samples] (or [num events] [num samples] for single electrode).

<SpikeEventSeries> includes all elements of <ElectricalSeries> with the following additions or changes:

<ImageSeries> extends <TimeSeries>

General image data that is common between acquisition and stimulus time series. Sometimes the image data is stored in the HDF5 file in a raw format while other times it will be stored as an external image file in the host file system. The data field will either be binary data or empty. TimeSeries::data array structure: [frame][y][x] or [frame][z][y][x].

<ImageSeries> includes all elements of <TimeSeries> with the following additions or changes:

<ImageMaskSeries> extends <ImageSeries>

An alpha mask that is applied to a presented visual stimulus. The data[] array contains an array of mask values that are applied to the displayed image. Mask values are stored as RGBA. Mask can vary with time. The timestamps array indicates the starting time of a mask, and that mask pattern continues until it's explicitly changed.

<ImageMaskSeries> includes all elements of <ImageSeries> with the following additions or changes:

<OpticalSeries> extends <ImageSeries>

Image data that is presented or recorded. A stimulus template movie will be stored only as an image. When the image is presented as stimulus, additional data is required, such as field of view (e.g., how much of the visual field the image covers, or what the area of the target being imaged is). If the OpticalSeries represents acquired imaging data, orientation is also important.

<OpticalSeries> includes all elements of <ImageSeries> with the following additions or changes:


Structured dimension(s):

Dimension         Components [ name (unit) ]
fov (option 1)    width (meter), height (meter)
fov (option 2)    width (meter), height (meter), depth (meter)

<TwoPhotonSeries> extends <ImageSeries>

A special case of optical imaging.

<TwoPhotonSeries> includes all elements of <ImageSeries> with the following additions or changes:


Structured dimension(s):

Dimension    Components [ name (unit) ]
whd          width (meter), height (meter), depth (meter)

<IndexSeries> extends <TimeSeries>

Stores indices to image frames stored in an ImageSeries. The purpose of the IndexSeries is to allow a static image stack to be stored somewhere, and the images in the stack to be referenced out-of-order. This can be for the display of individual images, or of movie segments (as a movie is simply a series of images). The data field stores the index of the frame in the referenced ImageSeries, and the timestamps array indicates when that image was displayed.
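The relationship between the index data, the timestamps and the referenced image stack can be sketched as follows. The frame labels and values are hypothetical; a real stack would hold image arrays.

```python
# Hypothetical static image stack (an ImageSeries' frames), here just labels.
image_stack = ["frame_A", "frame_B", "frame_C"]

# IndexSeries: data holds frame indices, timestamps hold display times (s).
index_data = [2, 0, 2, 1]
index_timestamps = [0.0, 0.5, 1.0, 1.5]

# Resolve each presentation to (display time, frame) without copying frames
# into the IndexSeries itself.
presentations = [
    (t, image_stack[i]) for t, i in zip(index_timestamps, index_data)
]

for t, frame in presentations:
    print(f"{t:.1f} s: {frame}")
```

Frames can repeat or appear out of order while each image is stored only once.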

<IndexSeries> includes all elements of <TimeSeries> with the following additions or changes:

<IntervalSeries> extends <TimeSeries>

Stores intervals of data. The timestamps field stores the beginning and end of intervals. The data field stores whether the interval just started (>0 value) or ended (<0 value). Different interval types can be represented in the same series by using multiple key values (e.g., 1 for feature A, 2 for feature B, 3 for feature C, etc.). The data field stores an 8-bit integer. This is largely an alias of a standard TimeSeries, but one that is identifiable as representing time intervals in a machine-readable way.
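One way to read such a series back into explicit intervals is sketched below. It assumes that an interval of type k is opened by the value +k and closed by the value -k; that pairing rule and the helper `decode_intervals` are assumptions made for illustration.

```python
def decode_intervals(timestamps, data):
    """Pair start (>0) and end (<0) markers into (key, start, stop) tuples.

    The absolute value of each marker identifies the interval type.
    """
    open_starts = {}  # interval type -> start time of the open interval
    intervals = []
    for t, marker in zip(timestamps, data):
        key = abs(marker)
        if marker > 0:
            open_starts[key] = t
        elif marker < 0 and key in open_starts:
            intervals.append((key, open_starts.pop(key), t))
    return intervals


# Hypothetical series: type 1 and type 2 intervals that overlap in time.
timestamps = [1.0, 2.0, 3.0, 5.0]
data = [1, 2, -1, -2]

print(decode_intervals(timestamps, data))
```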

<IntervalSeries> includes all elements of <TimeSeries> with the following additions or changes:

<OptogeneticSeries> extends <TimeSeries>

Optogenetic stimulus. The data[] field is in units of watts.

<OptogeneticSeries> includes all elements of <TimeSeries> with the following additions or changes:

<PatchClampSeries> extends <TimeSeries>

Stores stimulus or response current or voltage. Superclass definition for patch-clamp data (this class should not be instantiated directly).

<PatchClampSeries> includes all elements of <TimeSeries> with the following additions or changes:

<CurrentClampSeries> extends <PatchClampSeries>

Stores voltage data recorded from intracellular current-clamp recordings. A corresponding CurrentClampStimulusSeries (stored separately as a stimulus) is used to store the current injected.

<CurrentClampSeries> includes all elements of <PatchClampSeries> with the following additions or changes:

<IZeroClampSeries> extends <CurrentClampSeries>

Stores recorded voltage data from intracellular recordings when all current and amplifier settings are off (i.e., CurrentClampSeries fields will be zero). There is no CurrentClampStimulusSeries associated with an IZero series because the amplifier is disconnected and no stimulus can reach the cell.

<IZeroClampSeries> includes all elements of <CurrentClampSeries> with the following additions or changes:

<CurrentClampStimulusSeries> extends <PatchClampSeries>

An alias of the standard PatchClampSeries. Its purpose is to better tag PatchClampSeries for machine (and human) readability of the file.

<CurrentClampStimulusSeries> includes all elements of <PatchClampSeries> with the following additions or changes:

<VoltageClampSeries> extends <PatchClampSeries>

Stores current data recorded from intracellular voltage-clamp recordings. A corresponding VoltageClampStimulusSeries (stored separately as a stimulus) is used to store the voltage injected.

<VoltageClampSeries> includes all elements of <PatchClampSeries> with the following additions or changes:

<VoltageClampStimulusSeries> extends <PatchClampSeries>

An alias of the standard PatchClampSeries. Its purpose is to better tag PatchClampSeries for machine (and human) readability of the file.

<VoltageClampStimulusSeries> includes all elements of <PatchClampSeries> with the following additions or changes:

<RoiResponseSeries> extends <TimeSeries>

ROI responses over an imaging plane. Each row in data[] should correspond to the signal from one ROI.

<RoiResponseSeries> includes all elements of <TimeSeries> with the following additions or changes:

<SpatialSeries> extends <TimeSeries>

Direction, e.g., of gaze or travel, or position. The TimeSeries::data field is a 2D array storing position or direction relative to some reference frame. Array structure: [num measurements] [num dimensions]. Each SpatialSeries has a text dataset reference_frame that indicates the zero-position, or the zero-axes for direction. For example, if representing gaze direction, "straight-ahead" might be a specific pixel on the monitor, or some other point in space. For position data, the 0,0 point might be the top-left corner of an enclosure, as viewed from the tracking camera. The unit of data will indicate how to interpret SpatialSeries values.

<SpatialSeries> includes all elements of <TimeSeries> with the following additions or changes:

Modules

NWB uses modules to store data for, and represent the results of, common data processing steps, such as spike sorting and image segmentation, that occur before scientific analysis of the data. Modules store the data used by software tools to calculate these intermediate results. Each module provides a list of the data it makes available, and it is free to provide whatever additional data the module generates. Additional documentation is required for data that goes beyond standard definitions. All modules are stored directly under group /processing. The name of each module is chosen by the data provider (i.e., modules have a "variable" name). The particular data within each module is specified by one or more interfaces, which are groups residing directly within a module. Each interface extends (contains the attributes in) group <Interface> and has a fixed name (e.g., ImageSegmentation) that suggests the type of data it contains. The names of the interfaces within a given module are listed in the "interfaces" attribute for the module. The different types of Interfaces are described below.
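The relationship between a module's "interfaces" attribute and the interface groups it contains can be sketched with a plain dictionary standing in for the HDF5 group. The module name, the dictionary layout and the helper `check_interfaces` are hypothetical.

```python
def check_interfaces(module):
    """Compare a module's "interfaces" attribute with its interface groups.

    Returns (missing, unlisted): names listed but absent, and groups present
    but not listed.
    """
    listed = set(module["attrs"]["interfaces"])
    present = set(module["groups"])
    return sorted(listed - present), sorted(present - listed)


# Hypothetical in-memory stand-in for a module stored under /processing.
# The module name "shank_0" is chosen by the data provider; the interface
# names are fixed by the format.
shank_0 = {
    "attrs": {"interfaces": ["Clustering", "UnitTimes"]},
    "groups": {"Clustering": {}, "UnitTimes": {}, "LFP": {}},
}

missing, unlisted = check_interfaces(shank_0)
print("missing:", missing, "unlisted:", unlisted)
```

Here the LFP group is present but not announced in the attribute, which is the kind of inconsistency such a check would flag.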


<Module>

Module. Name should be descriptive. Stores a collection of related data organized by contained interfaces. Each interface is a contract specifying content related to a particular type of data.

<Interface>

The attributes specified here are included in all interfaces.

BehavioralEpochs

TimeSeries for storing behavioral epochs. The objective of this and the other two Behavioral interfaces (i.e., BehavioralEvents and BehavioralTimeSeries) is to provide generic hooks for software tools/scripts. This allows a tool/script to take the output of one specific interface (e.g., UnitTimes) and plot that data relative to another data modality (e.g., behavioral events) without having to define all possible modalities in advance. Declaring one of these interfaces means that one or more TimeSeries of the specified type is published. These TimeSeries should reside in a group having the same name as the interface. For example, if a BehavioralTimeSeries interface is declared, the module will have one or more TimeSeries defined in the module sub-group "BehavioralTimeSeries". BehavioralEpochs should use IntervalSeries. BehavioralEvents is used for irregular events. BehavioralTimeSeries is for continuous data.

BehavioralEpochs includes all elements of <Interface> with the following additions or changes:

BehavioralEvents

TimeSeries for storing behavioral events. See description of BehavioralEpochs for more details.

BehavioralEvents includes all elements of <Interface> with the following additions or changes:

BehavioralTimeSeries

TimeSeries for storing behavioral time series data. See description of BehavioralEpochs for more details.

BehavioralTimeSeries includes all elements of <Interface> with the following additions or changes:

ClusterWaveforms

The mean waveform shape, including standard deviation, of the different clusters. Ideally, the waveform analysis should be performed on data that is only high-pass filtered. This is a separate module because it is expected to require updating. For example, IMEC probes may have different requirements for storing and displaying mean waveforms, requiring a new interface or an extension of this one.

ClusterWaveforms includes all elements of <Interface> with the following additions or changes:

Clustering

Clustered spike data, whether from automatic clustering tools (e.g., klustakwik) or as a result of manual sorting.

Clustering includes all elements of <Interface> with the following additions or changes:

CompassDirection

With a CompassDirection interface, a module publishes a SpatialSeries object representing a floating point value for theta. The SpatialSeries::reference_frame field should indicate what direction corresponds to 0 and which is the direction of rotation (this should be clockwise). The si_unit for the SpatialSeries should be radians or degrees.

CompassDirection includes all elements of <Interface> with the following additions or changes:

DfOverF

dF/F information about a region of interest (ROI). Storage hierarchy of dF/F should be the same as for segmentation (i.e., same names for ROIs and for image planes).

DfOverF includes all elements of <Interface> with the following additions or changes:

EventDetection

Detected spike events from voltage trace(s).

EventDetection includes all elements of <Interface> with the following additions or changes:

EventWaveform

Represents either the waveforms of detected events, as extracted from a raw data trace in /acquisition, or the event waveforms that were stored during experiment acquisition.

EventWaveform includes all elements of <Interface> with the following additions or changes:

EyeTracking

Eye-tracking data, representing direction of gaze.

EyeTracking includes all elements of <Interface> with the following additions or changes:

FeatureExtraction

Features, such as PC1 and PC2, that are extracted from signals stored in a SpikeEvent TimeSeries or other source.

FeatureExtraction includes all elements of <Interface> with the following additions or changes:

FilteredEphys

Ephys data from one or more channels that has been subjected to filtering. Examples of filtered data include Theta and Gamma (LFP has its own interface). FilteredEphys modules publish an ElectricalSeries for each filtered channel or set of channels. The name of each ElectricalSeries is arbitrary but should be informative. The source of the filtered data, whether this is from analysis of another time series or as acquired by hardware, should be noted in each series' TimeSeries::description field. There is no assumed 1:1 correspondence between filtered ephys signals and electrodes, as a single signal can apply to many nearby electrodes, and one electrode may have different filtered (e.g., theta and/or gamma) signals represented.

FilteredEphys includes all elements of <Interface> with the following additions or changes:

Fluorescence

Fluorescence information about a region of interest (ROI). Storage hierarchy of fluorescence should be the same as for segmentation (i.e., same names for ROIs and for image planes).

Fluorescence includes all elements of <Interface> with the following additions or changes:

ImageSegmentation

Stores pixels in an image that represent different regions of interest (ROIs) or masks. All segmentation for a given imaging plane is stored together, with storage for multiple imaging planes (masks) supported. Each ROI is stored in its own subgroup, with the ROI group containing both a 2D mask and a list of pixels that make up this mask. Segments can also be used for masking neuropil. If segmentation is allowed to change with time, a new imaging plane (or module) is required and ROI names should remain consistent between them.

ImageSegmentation includes all elements of <Interface> with the following additions or changes:

ImagingRetinotopy

Intrinsic signal optical imaging or widefield imaging for measuring retinotopy. Stores orthogonal maps (e.g., altitude/azimuth; radius/theta) of responses to specific stimuli and a combined polarity map from which to identify visual areas.

Note: for data consistency, all images and arrays are stored in the format [row][column] and [row, col], which equates to [y][x]. Field of view and dimension arrays may appear backward (i.e., y before x).

ImagingRetinotopy includes all elements of <Interface> with the following additions or changes:


Structured dimension(s):

Dimension    Components [ name (unit) ]
row_col      row (meter), column (meter)

LFP

LFP data from one or more channels. The electrode map in each published ElectricalSeries will identify which channels are providing LFP data. Filter properties should be noted in the ElectricalSeries description or comments field.

LFP includes all elements of <Interface> with the following additions or changes:

MotionCorrection

An image stack where all frames are shifted (registered) to a common coordinate system, to account for movement and drift between frames. Note: each frame at each point in time is assumed to be 2-D (has only x & y dimensions).

MotionCorrection includes all elements of <Interface> with the following additions or changes:


Structured dimension(s):

Dimension    Components [ name (unit) ]
xy           x (pixels), y (pixels)

Position

Position data, whether along the x, x/y or x/y/z axis.

Position includes all elements of <Interface> with the following additions or changes:

PupilTracking

Eye-tracking data, representing pupil size.

PupilTracking includes all elements of <Interface> with the following additions or changes:

UnitTimes

Event times of observed units (e.g. cell, synapse, etc.). The UnitTimes group contains a group for each unit. The name of the group should match the value in the source module, if that is possible/relevant (e.g., name of ROIs from Segmentation module).

UnitTimes includes all elements of <Interface> with the following additions or changes:

File organization

Group: /acquisition

Acquired data includes tracking and experimental data streams (i.e., everything measured from the system). If bulky data is stored in the /acquisition group, the data can exist in a separate HDF5 file that is linked to by the file being used for processing and analysis.

When converting data from another format into NWB, there will be times that some data, particularly the raw data in acquisition and stimulus, is not included as part of the conversion. In such cases, a TimeSeries should be created that represents the missing data, even if the contents of that TimeSeries are empty. This helps to interpret the data in the file.

Group: /analysis

The file can store lab-specific and custom data analysis without restriction on its form or schema, reducing data formatting restrictions on end users. Such data should be placed in the analysis group. The analysis data should be documented so that it is sharable with other labs.

No members or attributes specified for this group

Group: /epochs

An experiment can be separated into one or many logical intervals, with the order and duration of these intervals often definable before the experiment starts. In this document, and in the context of NWB, these intervals are called 'epochs'. Epochs have acquisition and stimulus data associated with them, and different epochs can overlap. Examples of epochs are the time when a rat runs around an enclosure or maze as well as intervening sleep sessions; the presentation of a set of visual stimuli to a mouse running on a wheel; or the uninterrupted presentation of current to a patch-clamped cell. Epochs can be limited to the interval of a particular stimulus, or they can span multiple stimuli. Different windows into the same time series can be achieved by including multiple instances of that time series, each with different start/stop times.
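Selecting the part of a time series that falls inside an epoch amounts to a search over its timestamps. The sketch below assumes a half-open window (start_time included, stop_time excluded); that convention and the helper `epoch_window` are assumptions made for illustration.

```python
from bisect import bisect_left


def epoch_window(timestamps, start_time, stop_time):
    """Return (first_index, count) of samples with start_time <= t < stop_time."""
    first = bisect_left(timestamps, start_time)
    last = bisect_left(timestamps, stop_time)
    return first, last - first


# Hypothetical time series sampled every 0.5 s.
timestamps = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# Two overlapping epochs viewing the same series through different windows.
print(epoch_window(timestamps, 0.5, 2.0))
print(epoch_window(timestamps, 1.5, 3.5))
```

Storing only an index and a count per epoch lets several epochs refer to the same underlying samples without duplicating them.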

Group: /general

General experimental metadata, including animal strain, experimental protocols, experimenter, devices, etc., are stored under 'general'. Core metadata (e.g., that required to interpret data fields) is stored with the data itself, and implicitly defined by the file specification (e.g., time is in seconds). The strategy used here for storing non-core metadata is to use free-form text fields, such as would appear in sentences or paragraphs from a Methods section. Metadata fields are text to enable them to be more general, for example to represent ranges instead of numerical values. Machine-readable metadata is stored as attributes to these free-form datasets.

All entries in the below table are to be included when data is present. Unused groups (e.g., intracellular_ephys in an optophysiology experiment) should not be created unless there is data to store within them.

Group: /general/extracellular_ephys

Metadata related to extracellular electrophysiology.


Structured dimension(s):

Dimension    Components [ name (unit) ]
xyz          x (meter), y (meter), z (meter)

Group: /general/intracellular_ephys

Metadata related to intracellular electrophysiology.

Group: /general/optogenetics

Metadata describing optogenetic stimulation.

Group: /general/optophysiology

Metadata related to optophysiology.


Structured dimension(s):

Dimension    Components [ name (unit) ]
xyz          x (Meter), y (Meter), z (Meter)

Group: /processing

'Processing' refers to intermediate analysis of the acquired data to make it more amenable to scientific analysis. This intermediate analysis is performed using Modules, as defined above. All modules reside in the processing group.

Group: /stimulus

Stimuli are here defined as any signal that is pushed into the system as part of the experiment (e.g., sound, video, voltage, etc.). Many different experiments can use the same stimuli, and stimuli can be re-used during an experiment. The stimulus group is organized so that one version of each template stimulus can be stored and then used multiple times. These templates can exist in the present file or can be HDF5-linked to a remote library file.

Extending the format

The data organization presented in this document constitutes the core NWB format. Extensibility is handled by allowing users to store additional data as necessary using new datasets, attributes or groups. There are two ways to document these additions. The first is to add an attribute "neurodata_type" whose value is the string "Custom" to the additional groups or datasets, and to provide documentation describing the extra data if it is not clear from the context what the data represent. This method is simple but does not include a consistent way to describe the additions. The second method is to write an extension to the format. With this method, the additions are described by the extension and the attribute "schema_id" is set to the schema_id associated with the extension. Extensions to the format are written using the same specification language that is used to define the core format. Creating an extension allows adding the new data to the file through the API, validating files containing extra data, and also generating documentation for the additions. Popular extensions can be proposed and added to the official format specification. Writing and using extensions are described in the API documentation. Both methods allow extensibility without breaking backward compatibility.
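The two ways of documenting additions can be told apart from a node's attributes. The sketch below is only an illustration: plain Python dicts stand in for the HDF5 attributes of a group or dataset, and the function name `classify_addition`, the returned labels, and the extension identifier "my_lab_ext" are hypothetical. It does not model core nodes and is not part of the NWB API.

```python
def classify_addition(attrs):
    """Classify an added group or dataset from its attributes.

    `attrs` is a plain dict standing in for HDF5 attributes (an assumption
    made so the sketch runs without an HDF5 library).
    """
    # First method: the addition is flagged with neurodata_type = "Custom".
    if attrs.get("neurodata_type") == "Custom":
        return "custom"
    # Second method: the addition is described by an extension's schema.
    if "schema_id" in attrs:
        return "extension:" + str(attrs["schema_id"])
    # Neither marker is present.
    return "undocumented"

flagged = classify_addition({"neurodata_type": "Custom"})
extended = classify_addition({"schema_id": "my_lab_ext"})  # hypothetical id
unmarked = classify_addition({})
```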

Acknowledgements

The Neurodata Without Borders: Neurophysiology Initiative is funded by GE, the Allen Institute for Brain Science, the Howard Hughes Medical Institute (HHMI), The Kavli Foundation and the International Neuroinformatics Coordinating Facility. Our founding scientific partners are the Allen Institute, the Svoboda Lab at the Janelia Research Campus of HHMI, the Meister Lab at the California Institute of Technology, the Buzsaki Lab at New York University School of Medicine, and the University of California, Berkeley. Ovation.io is our founding development partner. Ken Harris at University College London provided invaluable input and advice.

Change history

1.0.5g_beta, Oct 7, 2016

Replace group options: autogen: {"type": "create"} and "_closed": True with "_properties": {"create": True} and "_properties": {"closed": True}. This was done to make the specification language more consistent by having these group properties specified in one place (the "_properties" dictionary).


1.0.5f_beta, Oct 3, 2016

Minor fixes to allow validation of schema using json-schema specification in file "meta-schema.py" using utility "check_schema.py".


1.0.5e_beta, Sept 22, 2016

Moved definition of <Module>/ out of /processing group to allow creating subclasses of Module. This is useful for making custom Module types that specify required interfaces. An example of this is in python-api/examples/create_scripts/module-e.py and the extension it uses (extensions/e-module.py).

Fixed malformed html in nwb_core.py documentation.

Changed html generated by doc_tools.py to html5 and fixed it so that it passes validation at https://validator.w3.org.


1.0.5d_beta, Sept 6, 2016

Changed ImageSeries img_mask dimensions to:
"dimensions": ["num_y","num_x"]
to match description.


1.0.5c_beta, Aug 17, 2016

Change IndexSeries to allow linking to any form of TimeSeries, not just an ImageSeries


1.0.5b_beta, Aug 16, 2016


1.0.5a_beta, Aug 10, 2016

Expand class of Ids allowed in TimeSeries missing_fields attribute to allow custom uses.


1.0.5_beta Aug 2016

Allow subclasses to be used for merges instead of base class (specified by 'merge+' in format specification file).
Use 'neurodata_type=Custom' to flag additions that are not described by a schema.
Exclude TimeSeries timestamps and starting time from under /stimulus/templates


1.0.4_beta June 2016

Generate documentation directly from format specification file.
Change ImageSeries external_file to an array.
Made TimeSeries description and comments recommended.


1.0.3 April, 2016

Renamed "ISI_Retinotopy" to "ISIRetinotopy"
Change ImageSeries external_file to an array. Added attribute starting_frame.

Added IZeroClampSeries.


1.0.2 February, 2016

Fixed documentation error, updating 'neurodata_version' to 'nwb_version'

Created ISI_Retinotopy interface

In ImageSegmentation module, moved pix_mask::weight attribute to be its own dataset, named pix_mask_weight. Attribute proved inadequate for storing sufficiently large array data for some segments

Moved 'gain' field from Current/VoltageClampSeries to parent PatchClampSeries, due to the need of stimuli to sometimes store gain

Added Ken Harris to the Acknowledgements section


1.0.1 October 7th, 2015

Added 'required' field to tables in the documentation, to indicate if group/dataset/attribute is required, standard or optional

Obsoleted 'file_create_date' attribute 'modification_time' and made file_create_date a text array

Removed 'resistance_compensation' from CurrentClampSeries due to being a duplicate of another field

Upgraded TwoPhotonSeries::imaging_plane to be a required value

Removed 'tags' attribute from group 'epochs' as it was fully redundant with the 'epoch/tags' dataset

Added text to the documentation stating that specified sizes for integer values are recommended sizes, while sizes for floats are minimum sizes

Added text to the documentation stating that, if the TimeSeries::data::resolution attribute value is unknown then store a NaN

Declaring the following groups as required (this was implicit before)


acquisition/
    images/
    timeseries/
analysis/
epochs/
general/
processing/
stimulus/
    presentation/
    templates/


This is to ensure consistency between .nwb files, to provide a minimum expected structure, and to avoid confusion by having someone expect time series to be in places they're not. For example, if 'acquisition/timeseries' is not present, someone might reasonably expect that acquisition time series might reside in 'acquisition/'. It is also a subtle reminder about what the file is designed to store, a sort of built-in documentation. Subfolders in 'general/' are only to be included as needed. Scanning 'general/' should give the user a quick idea of what the experiment is about, so only domain-relevant subfolders should be present (e.g., 'optogenetics' and 'optophysiology'). There should always be a 'general/devices', but it doesn't seem worth making it mandatory without making all subfolders mandatory here.
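The required groups listed above lend themselves to a simple structural check. The following Python sketch is an assumption-laden illustration, not the validator shipped with the API: it takes group paths as plain strings rather than reading an HDF5 file, and the names `REQUIRED_GROUPS` and `missing_required_groups` are hypothetical.

```python
# Required groups as declared in version 1.0.1 of the format.
REQUIRED_GROUPS = (
    "acquisition",
    "acquisition/images",
    "acquisition/timeseries",
    "analysis",
    "epochs",
    "general",
    "processing",
    "stimulus",
    "stimulus/presentation",
    "stimulus/templates",
)

def missing_required_groups(present_groups):
    """Return the required groups absent from `present_groups`.

    Paths may be given with or without leading/trailing slashes.
    """
    present = {path.strip("/") for path in present_groups}
    return [group for group in REQUIRED_GROUPS if group not in present]

# Hypothetical listing of the groups found in a file.
found = [
    "/acquisition", "/acquisition/timeseries", "/analysis", "/epochs",
    "/general", "/processing", "/stimulus", "/stimulus/presentation",
]
missing = missing_required_groups(found)
```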


1.0.0 September 28th, 2015

Convert document to .html

TwoPhotonSeries::imaging_plane was upgraded to mandatory to help enforce inclusion of important metadata in the file.

Design notes

The listed size of integers is the suggested size. What's important for integers is simply that the integer is large enough to store the required data, and preferably not larger. For floating point, double precision (float64) is required for timestamps, while single precision (float32) is largely sufficient for other uses. This is why doubles (float64) are stated in some places. Because floating point sizes are provided, integer sizes are provided as well.
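A short Python sketch can illustrate why timestamps need double precision. The timestamp and sampling rate below are hypothetical values chosen for illustration, and `as_float32` is a helper introduced here; it rounds a value to single precision using the standard `struct` module.

```python
import struct

def as_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

t = 100000.0        # hypothetical timestamp, roughly 28 hours into a session (seconds)
dt = 1.0 / 30000.0  # hypothetical sample interval for 30 kHz acquisition

next_sample = t + dt

# In float64 the two consecutive timestamps remain distinguishable.
distinct_in_float64 = next_sample != t

# In float32 both round to the same value, so the sample order is lost.
distinct_in_float32 = as_float32(next_sample) != as_float32(t)
```

Near 100000 s, adjacent float32 values are about 8 ms apart, far coarser than the sample interval, whereas float64 resolves the difference easily.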


Why do timestamps_link and data_link record linking between datasets, but links between epochs and timeseries are not recorded?

Epochs have a hard link to the entire time series (i.e., the HDF5 group). If 100 epochs link to a time series, there is still only one time series. The data and timestamps within it are not shared anywhere (at least not through the epoch linking). An epoch is an entity that is put in for convenience and annotation, so there isn't necessarily an important association between which epochs link to which time series (all epochs could link to all time series).

The timestamps_link and data_link fields refer to links made between time series, such as when timeseries A and timeseries B, each having different data (or time), share time (or data). This is much more important information, as it shows structural associations in the data.