annotations.precomputed#

Export annotations in neuroglancer’s precomputed annotations format. The single entry point is write_precomputed_annotations(), which supports five annotation types: 'point', 'line', 'axis_aligned_bounding_box', 'ellipsoid', and 'polyline'.

Geometry columns#

Annotations can live in a coordinate space of any dimensionality (not just 3D); the column names are derived from coord_space.names. For example, with coord_space.names == ['x', 'y', 'z'], the main DataFrame must have the following geometry columns (plus any property / relationship columns):

  • 'point': x, y, z

  • 'line' and 'axis_aligned_bounding_box': xa, ya, za, xb, yb, zb

  • 'ellipsoid': x, y, z, rx, ry, rz

  • 'polyline': no geometry columns; vertices are supplied separately via polyline_points
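The naming convention above can be sketched as a small helper. This is only an illustration: `expected_geometry_columns` is a hypothetical name, not part of ngsidekick, and its behavior for non-3D coordinate spaces is extrapolated from the 3D example rather than confirmed against the library.

```python
def expected_geometry_columns(annotation_type, names):
    """Hypothetical helper: list the geometry columns implied by the
    naming convention, for a given annotation type and axis names."""
    names = list(names)
    if annotation_type == 'point':
        return names
    if annotation_type in ('line', 'axis_aligned_bounding_box'):
        # Two endpoints/corners: suffix 'a' for the first, 'b' for the second.
        return [f'{n}a' for n in names] + [f'{n}b' for n in names]
    if annotation_type == 'ellipsoid':
        # Center coordinates followed by one radius per axis.
        return names + [f'r{n}' for n in names]
    if annotation_type == 'polyline':
        # Vertices are supplied separately via polyline_points.
        return []
    raise ValueError(f"Unknown annotation type: {annotation_type!r}")


print(expected_geometry_columns('line', 'xyz'))
print(expected_geometry_columns('ellipsoid', ['x', 'y', 'z']))
```

A check such as `set(expected_geometry_columns(kind, names)) <= set(df.columns)` can catch a misnamed column before a long export starts.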

The DataFrame’s index is used as the annotation ID. In most use-cases the annotation ID is not user-visible, so the index need not be carefully chosen; any unique uint64-compatible values will do (e.g. range(len(df))).
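If your table arrives with a non-numeric or duplicated index, one way to satisfy this requirement is to replace the index with sequential IDs. The sketch below uses only pandas and NumPy; the string labels are made-up sample data.

```python
import numpy as np
import pandas as pd

# Sample table whose index is not usable as an annotation ID.
df = pd.DataFrame(
    {'x': [10.0, 20.0, 30.0], 'y': [10.0, 20.0, 30.0], 'z': [10.0, 20.0, 30.0]},
    index=['a', 'b', 'b'],  # non-numeric and not unique
)

# Replace the index with sequential uint64 IDs.
df.index = np.arange(len(df), dtype=np.uint64)

# Sanity checks before exporting.
assert df.index.is_unique
assert df.index.dtype == np.uint64
```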

Examples#

Point annotations#

import pandas as pd
from ngsidekick.annotations.precomputed import write_precomputed_annotations

df = pd.DataFrame({
    'x': [10.0, 20.0, 30.0],
    'y': [10.0, 20.0, 30.0],
    'z': [10.0, 20.0, 30.0],
})

# df:
#       x     y     z
#    0  10.0  10.0  10.0
#    1  20.0  20.0  20.0
#    2  30.0  30.0  30.0

write_precomputed_annotations(df, 'xyz', 'point', output_dir='out/points')

Line annotations#

df = pd.DataFrame({
    'xa': [0.0, 10.0], 'ya': [0.0, 0.0], 'za': [0.0, 0.0],
    'xb': [5.0, 15.0], 'yb': [5.0, 5.0], 'zb': [0.0, 0.0],
})

# df:
#         xa   ya   za    xb   yb   zb
#    0   0.0  0.0  0.0   5.0  5.0  0.0
#    1  10.0  0.0  0.0  15.0  5.0  0.0

write_precomputed_annotations(df, 'xyz', 'line', output_dir='out/lines')

Bounding boxes use the same column convention as lines, with annotation_type='axis_aligned_bounding_box'.
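As a sketch, a bounding-box table can be assembled from two arrays of corner coordinates. The corner values and the output directory in the commented-out call are illustrative assumptions.

```python
import numpy as np
import pandas as pd

names = ['x', 'y', 'z']

# One row per box: first corner and second corner (sample values).
corner_a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
corner_b = np.array([[5.0, 5.0, 2.0], [15.0, 5.0, 2.0]])

# Column order: xa, ya, za, xb, yb, zb
df = pd.DataFrame(
    np.hstack([corner_a, corner_b]),
    columns=[f'{n}a' for n in names] + [f'{n}b' for n in names],
)
print(df)

# The export call would then mirror the line example:
# write_precomputed_annotations(df, 'xyz', 'axis_aligned_bounding_box', output_dir='out/boxes')
```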

Ellipsoid annotations#

df = pd.DataFrame({
    'x':  [10.0, 20.0], 'y':  [10.0, 20.0], 'z':  [10.0, 20.0],
    'rx': [ 2.0,  3.0], 'ry': [ 2.0,  3.0], 'rz': [ 2.0,  3.0],
})

# df:
#          x     y     z   rx   ry   rz
#    0  10.0  10.0  10.0  2.0  2.0  2.0
#    1  20.0  20.0  20.0  3.0  3.0  3.0

write_precomputed_annotations(df, 'xyz', 'ellipsoid', output_dir='out/ellipsoids')
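
When every ellipsoid is a sphere, the three radius columns can be filled from a single radius column. A minimal sketch, assuming a source table with a hypothetical 'radius' column:

```python
import pandas as pd

# Sample table of sphere centers with one radius each ('radius' is a made-up column name).
spheres = pd.DataFrame({
    'x': [10.0, 20.0], 'y': [10.0, 20.0], 'z': [10.0, 20.0],
    'radius': [2.0, 3.0],
})

# Expand the single radius into the rx/ry/rz columns used by 'ellipsoid' annotations.
df = spheres[['x', 'y', 'z']].assign(
    rx=spheres['radius'],
    ry=spheres['radius'],
    rz=spheres['radius'],
)
print(df)
```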

Polyline annotations#

Polylines have a variable number of vertices, so vertex coordinates are passed in a separate auxiliary DataFrame supplied via the polyline_points argument: one row per vertex, with coordinate columns plus an 'annotation_id' column linking each vertex back to its polyline. Vertex order within an annotation defines the polyline’s traversal order.

The main DataFrame carries any per-annotation properties or relationships; its index supplies the annotation IDs referenced by polyline_points['annotation_id']. In the example below, two polylines each get a single mycolor rgb color property; the main DataFrame’s default RangeIndex [0, 1] matches the annotation_id values in the points table.

main_df = pd.DataFrame({
    'mycolor_r': [255,   0],
    'mycolor_g': [128, 200],
    'mycolor_b': [  0, 255],
})

# main_df:
#       mycolor_r  mycolor_g  mycolor_b
#    0        255        128          0
#    1          0        200        255

polyline_points = pd.DataFrame({
    'x':             [0.0, 1.0, 2.0,    5.0, 5.0],
    'y':             [0.0, 0.5, 1.0,    5.0, 6.0],
    'z':             [0.0, 0.0, 0.0,    0.0, 0.0],
    'annotation_id': [   0,   0,   0,     1,   1],
})

# polyline_points:
#         x    y    z  annotation_id
#    0  0.0  0.0  0.0              0
#    1  1.0  0.5  0.0              0
#    2  2.0  1.0  0.0              0
#    3  5.0  5.0  0.0              1
#    4  5.0  6.0  0.0              1

write_precomputed_annotations(
    main_df, 'xyz', 'polyline',
    properties=['mycolor'],
    polyline_points=polyline_points,
    output_dir='out/polylines',
)

See Properties and relationships below for the full set of supported property and relationship column conventions, which apply identically to polyline annotations.
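
Polyline vertices often start out as one array per polyline rather than as a single table. The following sketch builds a points table in the layout described above from a dict of vertex arrays and checks that every annotation_id refers to a row of the main DataFrame. The `paths` dict and the check are illustrative, not part of the library.

```python
import numpy as np
import pandas as pd

# Sample input: annotation ID -> (N, 3) array of vertices in traversal order.
paths = {
    0: np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [2.0, 1.0, 0.0]]),
    1: np.array([[5.0, 5.0, 0.0], [5.0, 6.0, 0.0]]),
}

tables = []
for annotation_id, vertices in paths.items():
    table = pd.DataFrame(vertices, columns=['x', 'y', 'z'])
    table['annotation_id'] = annotation_id
    tables.append(table)

# Concatenating in order preserves each polyline's vertex order.
polyline_points = pd.concat(tables, ignore_index=True)

main_df = pd.DataFrame({
    'mycolor_r': [255, 0], 'mycolor_g': [128, 200], 'mycolor_b': [0, 255],
})

# Every vertex should point at an annotation ID present in main_df's index.
orphans = polyline_points.loc[~polyline_points['annotation_id'].isin(main_df.index)]
assert orphans.empty, f"Vertices reference unknown annotation IDs:\n{orphans}"
```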

If your polylines have no properties or relationships, you can omit the main DataFrame entirely: pass None for df and supply only the points table via polyline_points:

write_precomputed_annotations(None, 'xyz', 'polyline', polyline_points=polyline_points, output_dir='out/polylines')

Properties and relationships#

In addition to geometry columns, the main DataFrame can carry annotation properties (per-annotation attributes like color or a confidence score) and relationships (per-annotation lists of related segment IDs that neuroglancer can use to filter annotations by segment).

  • Numeric properties are plain numeric columns. The column dtype determines the encoded type (uint8, int8, …, float32).

  • Enum properties are pandas categorical columns. Each category becomes a discrete enum value with the category label shown in the neuroglancer UI.

  • Color properties (rgb or rgba) are split across one column per channel: <name>_r, <name>_g, <name>_b (and optionally <name>_a). List the base name in properties; the suffixed columns are picked up automatically.

  • Relationships are columns whose values are lists of related segment IDs (uint64). As a shortcut, if every annotation has exactly one related segment, the column may have dtype=np.uint64 (a scalar per row) instead of containing lists.

The example below demonstrates all four on 'line' annotations. The two single-segment relationships (body_pre / body_post) use scalar uint64 columns; the multi-segment relationship (nearby_mito) uses lists.

import numpy as np
import pandas as pd
from ngsidekick.annotations.precomputed import write_precomputed_annotations

df = pd.DataFrame({
    # line geometry columns
    'xa': [0.0, 10.0], 'ya': [0.0, 0.0], 'za': [0.0, 0.0],
    'xb': [5.0, 15.0], 'yb': [5.0, 5.0], 'zb': [0.0, 0.0],

    # numeric property
    'confidence': [0.92, 0.71],

    # enum property (pandas categorical)
    'kind': pd.Categorical(['excitatory', 'inhibitory']),

    # color property: one column per channel, rgb(a)
    'mycolor_r': [255,   0], 'mycolor_g': [128, 200], 'mycolor_b': [  0, 255],
    'mycolor_a': [255, 255],  # (alpha is optional)

    # single-segment relationships: scalar uint64 per row
    'body_pre':  np.array([100, 200], dtype=np.uint64),
    'body_post': np.array([300, 400], dtype=np.uint64),

    # multi-segment relationship: list of uint64 per row
    'nearby_mito': [[10, 11], [20, 21, 22]],
})

# df:
#         xa   ya   za    xb   yb   zb  confidence        kind  mycolor_r  mycolor_g  mycolor_b  mycolor_a  body_pre  body_post   nearby_mito
#    0   0.0  0.0  0.0   5.0  5.0  0.0        0.92  excitatory        255        128          0        255       100        300      [10, 11]
#    1  10.0  0.0  0.0  15.0  5.0  0.0        0.71  inhibitory          0        200        255        255       200        400  [20, 21, 22]

write_precomputed_annotations(
    df, 'xyz', 'line',
    # 'mycolor' is the base name; the _r/_g/_b/_a columns are picked up automatically.
    properties=['confidence', 'kind', 'mycolor'],
    relationships=['body_pre', 'body_post', 'nearby_mito'],
    output_dir='out/lines',
)
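
Data rarely arrives with these dtypes already in place. The sketch below shows one way to prepare columns with plain pandas: casting a default float64 column to float32, converting strings to a categorical, and collapsing a long-format link table (one row per annotation/segment pair) into a list-valued relationship column. The `links` table and its column names are made up for illustration, and whether the library would perform any of these conversions automatically is not something this sketch assumes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'xa': [0.0, 10.0], 'ya': [0.0, 0.0], 'za': [0.0, 0.0],
    'xb': [5.0, 15.0], 'yb': [5.0, 5.0], 'zb': [0.0, 0.0],
    'confidence': [0.92, 0.71],            # float64 by default
    'kind': ['excitatory', 'inhibitory'],  # plain strings
})

# Numeric property: pick the encoded type explicitly via the column dtype.
df['confidence'] = df['confidence'].astype(np.float32)

# Enum property: convert strings to a pandas categorical.
df['kind'] = df['kind'].astype('category')

# Relationship: long-format links (sample data), one row per (annotation, segment) pair.
links = pd.DataFrame({
    'annotation_id': [0, 0, 1, 1, 1],
    'mito_id': np.array([10, 11, 20, 21, 22], dtype=np.uint64),
})

# Collapse to one list of segment IDs per annotation, aligned to df's index.
nearby = links.groupby('annotation_id')['mito_id'].agg(list)
df['nearby_mito'] = nearby.reindex(df.index)

print(df[['confidence', 'kind', 'nearby_mito']])
```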

Releasing input data early#

For very large inputs (hundreds of millions of annotations and up), this function has high peak RAM requirements. You can partially mitigate the peak by transferring ownership of your input DataFrame to the function via a TableHandle: wrap your DataFrame, drop your own reference, and pass the handle. write_precomputed_annotations() will release the handle’s reference as soon as the data has been consumed, freeing the DataFrame before the writing phase begins:

from ngsidekick.annotations.precomputed import TableHandle

handle = TableHandle(df)
del df  # release your own reference
write_precomputed_annotations(handle, 'xyz', 'point', output_dir='out/points')

The same option applies to polyline_points.
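
To see why dropping your own reference matters, the sketch below imitates the ownership transfer with a stand-in class. `HandleSketch` and `consume` are hypothetical and only model the idea; they are not the library's TableHandle or its internals. A weak reference is used to observe whether the original DataFrame is still alive.

```python
import gc
import weakref
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class HandleSketch:
    """Hypothetical stand-in for a handle: it holds exactly one reference."""
    df: Optional[pd.DataFrame] = None


def consume(handle):
    """Hypothetical consumer: copy out what it needs, then drop the handle's reference."""
    coords = handle.df[['x', 'y', 'z']].to_numpy(copy=True)
    handle.df = None  # the handle no longer keeps the DataFrame alive
    return coords


df = pd.DataFrame({'x': [10.0, 20.0], 'y': [10.0, 20.0], 'z': [10.0, 20.0]})
probe = weakref.ref(df)  # lets us observe the object without keeping it alive

handle = HandleSketch(df)
del df  # release our own reference; only the handle holds the data now

coords = consume(handle)
gc.collect()

# With no remaining strong references, the original DataFrame can be freed.
print("original DataFrame released:", probe() is None)
```

If the `del df` line is skipped, the caller's variable keeps the DataFrame alive for the whole call, and the handle provides no benefit.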

API reference#

ngsidekick.annotations.precomputed.write_precomputed_annotations(df, coord_space, annotation_type, properties=(), relationships=(), output_dir='annotations', write_sharded=True, *, polyline_points=None, write_by_id=True, write_by_relationship=True, write_by_spatial_chunk=True, num_spatial_levels=7, target_chunk_limit=10000, shuffle_before_assigning_spatial_levels=True, max_threads=None, max_shards_per_transaction=None, description='')#

Export the data from a pandas DataFrame into neuroglancer’s precomputed annotations format as described in the neuroglancer spec.

A progress bar is shown when writing each portion of the export (annotation ID index, related ID indexes), but there may be a significant amount of preprocessing time that occurs before the actual writing begins.

Note

Internally, the data will be copied during processing and again during writing, incurring significant RAM usage for large datasets. To save at least some RAM, you can wrap your dataframe in a TableHandle and then delete your own reference to the dataframe before calling this function. The TableHandle’s reference will be deleted internally as soon as possible (after the data is transformed for writing, before this function returns).

Parameters:
  • df (DataFrame | TableHandle | None) –

    DataFrame or TableHandle. The index of the DataFrame is used as the annotation ID, so it must be unique. The required columns depend on the annotation_type and the coordinate space. For example, assuming coord_space.names == ['x', 'y', 'z'], then provide the following columns:

    • For point annotations, provide ['x', 'y', 'z'].

    • For line annotations or axis_aligned_bounding_box annotations, provide ['xa', 'ya', 'za', 'xb', 'yb', 'zb'].

    • For ellipsoid annotations, provide ['x', 'y', 'z', 'rx', 'ry', 'rz'] for the center point and radii.

    • For polyline annotations, do not provide x/y/z columns here. Instead, provide them in the polyline_points argument. If your polyline annotations have no properties or relationships, you may set df to None and pass only polyline_points.

    You may also provide additional columns to use as annotation properties, in which case their column names should be listed in the ‘properties’ argument. (See below.)

    If you provide a TableHandle, the handle’s reference will be unset before this function returns, deleting your data if you didn’t retain a reference to it yourself. (If you do retain a reference, it defeats the point of using a TableHandle in the first place.)

  • coord_space (CoordinateSpace | str | list[str] | dict[str, list]) –

    neuroglancer.coordinate_space.CoordinateSpace or equivalent. The coordinate space of the annotations. Among other things, this determines which input columns represent the annotation geometry. For convenience, several equivalent formats are accepted; if no scales or units are provided, a default scale of 1 nm is assumed.

    Examples (all equivalent):

    >>> coord_space = "xyz"
    >>> coord_space = ['x', 'y', 'z']
    >>> coord_space = {"names": ['x', 'y', 'z']}
    >>> coord_space = {
    ...     "names": ['x', 'y', 'z'],
    ...     "units": ['nm', 'nm', 'nm'],
    ...     "scales": [1, 1, 1]
    ... }
    >>> coord_space = CoordinateSpace(
    ...     names=['x', 'y', 'z'],
    ...     scales=[1.0, 1.0, 1.0],
    ...     units=['nm', 'nm', 'nm']
    ... )
    

  • annotation_type (Literal['point', 'line', 'ellipsoid', 'axis_aligned_bounding_box', 'polyline']) – The type of annotation to export. The geometry columns you must provide in the DataFrame depend on the annotation type.

  • properties (list[str] | list[AnnotationPropertySpec] | dict[str, AnnotationPropertySpec] | list[dict]) –

    If your dataframe contains columns for annotation properties, list the names of those columns here.

    Categorical columns will be automatically converted to integers with associated enum labels.

    To provide an rgb or rgba property such as ‘mycolor’, provide separate columns in your dataframe named ‘mycolor_r’, ‘mycolor_g’, ‘mycolor_b’ (and ‘mycolor_a’), and then include ‘mycolor’ in the properties list here.

    The full property spec for each property will be inferred from the column dtype, but if you want to explicitly override any property specs yourself, you can pass a list of AnnotationPropertySpec objects here instead of just listing column names.

    Property names must start with a lowercase letter and may contain only letters, numbers, and underscores.

  • relationships (list[str]) – If your annotations have related segment IDs, provide them in columns of your DataFrame. Each relationship occupies a single column whose values are lists of segment IDs. In the special case where each annotation has exactly one related segment, the column may have dtype=np.uint64 instead of containing lists.

  • output_dir (str) – The directory into which the exported annotations will be written. Subdirectories will be created for the “annotation ID index” and each “related object id index” as needed.

  • write_sharded (bool) – Whether to write the output as sharded files. The sharded format is preferable for most use cases. Without sharding, every annotation results in a separate file in the annotation ID index. Similarly, every related ID results in a separate file in the related ID index.

  • polyline_points (DataFrame | TableHandle | None) –

    DataFrame or TableHandle. Required when annotation_type='polyline'; must be None otherwise.

    One row per polyline vertex, with one column per coordinate axis plus an 'annotation_id' column indicating which polyline each vertex belongs to. For example, assuming coord_space.names == ['x', 'y', 'z'], provide the following columns: ['annotation_id', 'x', 'y', 'z']. (For a polyline with N vertices, its annotation_id should appear N times.)

    For each polyline, the point order in the emitted annotation will match the order in which they appear in this dataframe.

    As with df, you may pass a TableHandle so the reference can be released as soon as the table has been consumed.

  • write_by_id (bool) – Whether to write the annotations to the “Annotation ID Index”. If False, skip writing.

  • write_by_relationship (bool) – Whether to write the relationships to the “Related Object ID Index”. If False, skip writing.

  • write_by_spatial_chunk (bool) – Whether to write the spatial index.

  • num_spatial_levels (int) – The maximum number of spatial index levels to write. If not all levels are needed (because all annotations fit within the first N levels), then the actual number of levels written will be less than this value.

  • target_chunk_limit (int) –

    For the spatial index, this is how many annotations we aim to place in each chunk (regardless of the level). If there are more annotations than fit within the specified num_spatial_levels while (approximately) adhering to the target_chunk_limit at each level, then the extra annotations will be assigned to the last level.

    Note

    Instead of specifying a valid limit here, you can disable subsampling in neuroglancer by setting this to the special value of 0. In our implementation, this is only valid when num_spatial_levels=1.

  • shuffle_before_assigning_spatial_levels (bool) – Whether to shuffle the annotations before assigning spatial levels. By default, we shuffle the annotations to avoid any bias in the spatial assignment, which is what the neuroglancer spec recommends. However, in some use-cases a bias may be desirable (e.g. deliberately preferring to show larger annotations when zoomed out). If this is False, the annotations will be assigned to spatial levels in the order they appear in the input dataframe, with earlier annotations assigned to coarser spatial levels.

  • max_threads (int | None) – Caps tensorstore’s internal thread pool (data-copy and file-I/O concurrency) when writing. Defaults to LSB_DJOB_NUMPROC on LSF clusters, otherwise multiprocessing.cpu_count().

  • max_shards_per_transaction (int | None) –

    (Sharded mode only.) Caps the number of shards committed in a single tensorstore transaction. Tensorstore parallelizes the per-shard work (encode, compress, write) inside a transaction across its internal thread pool, so this knob trades RAM (more shards staged in memory at once) for throughput and effective CPU utilization (more parallel work available at commit).

    Defaults to max_threads so each transaction can saturate the available threads. Set higher for better throughput at extra RAM cost, or lower to reduce peak RAM.

  • description (str) – A description of the annotation collection.

  • write_by_relationship (bool)
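
The naming rule for properties (start with a lowercase letter; only letters, numbers, and underscores) can be checked before calling the function. The sketch below is a hypothetical pre-flight check, not something the library exposes, and it assumes "letters" means ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters only.
PROPERTY_NAME = re.compile(r'[a-z][A-Za-z0-9_]*')


def invalid_property_names(names):
    """Hypothetical helper: return the names that break the naming rule."""
    return [name for name in names if not PROPERTY_NAME.fullmatch(name)]


candidates = ['confidence', 'kind', 'mycolor', 'Score', '2nd_pass', 'has-dash']
print(invalid_property_names(candidates))
```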

class ngsidekick.annotations.precomputed.TableHandle(df=None)#

Bases: object

A wrapper for a pandas DataFrame that can be provided to transfer ownership of the DataFrame to write_precomputed_annotations(), which will delete the handle’s reference to the DataFrame as soon as possible to save RAM.

Example:

>>> handle = TableHandle(df)
>>> del df  # Delete your own reference to the original data
>>> write_precomputed_annotations(handle, 'xyz', 'point')
Parameters:

df (DataFrame | None)

df: DataFrame | None = None#