# Benchmark project structure

This page describes the expected folder structure and file naming conventions for pose estimation benchmark datasets.

:::{note}
We mark requirements with italicised *keywords* that should be interpreted as described by the [Network Working Group](https://www.ietf.org/rfc/rfc2119.txt).
In decreasing order of requirement, these are: *must*, *should*, and *may*.
:::

## Overview

- A benchmark dataset is organised into a `Train` and a `Test` split.
- Each split contains one or more [projects](#project) (i.e. datasets contributed by different groups).
- Each project contains one or more [sessions](#session).
- A session centres on a single video file (the [session video](#session-video)), from which [frames](#frames) (individually sampled images) and optionally [clips](#clips) (short video segments) are extracted.
- Frames and clips are accompanied by [label files](#label-format) in COCO keypoints format.

The current scope is limited to **single-animal pose estimation** from a **single camera view**.
Support for multi-camera setups is planned for a future version.

## Folder structure

```
.
├── Train/
│   └── /
│       └── sub-_ses-/
│           ├── Frames/
│           │   ├── sub-_ses-_cam-_frame-.png
│           │   ├── ...
│           │   └── sub-_ses-_cam-_framelabels.json
│           ├── Clips/ (optional)
│           │   ├── sub-_ses-_cam-_start-_dur-.mp4
│           │   ├── sub-_ses-_cam-_start-_dur-_cliplabels.json
│           │   └── ...
│           └── sub-_ses-_cam-.mp4
└── Test/
    └── /
        └── sub-_ses-/
            ├── Frames/
            │   ├── sub-_ses-_cam-_frame-.png
            │   └── ...
            ├── Clips/ (optional)
            │   ├── sub-_ses-_cam-_start-_dur-.mp4
            │   ├── sub-_ses-_cam-_start-_dur-_startlabels.json
            │   └── ...
            └── sub-_ses-_cam-.mp4
```

:::{note}
The `Test` split follows the same structure as `Train`, but includes different label files (see [Label format](#label-format) for details).
:::

### Train / Test

* The top level *must* contain a `Train` and a `Test` folder.
* Each split *must* contain at least one project folder.
* Each session *must* belong to exactly one split.
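
The three Train / Test rules above lend themselves to an automated check. The Python sketch below is illustrative only and is not an official validator: `read_layout` and `check_splits` are hypothetical helper names. It assumes that a session is identified by its project folder name plus its session folder name when checking that no session appears in both splits.

```python
from pathlib import Path

# Hypothetical in-memory view of the layout: split -> project -> session folder names.
Layout = dict[str, dict[str, list[str]]]


def read_layout(root: Path) -> Layout:
    """Collect split/project/session folder names under `root` (non-folders are ignored)."""
    layout: Layout = {}
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        layout[split_dir.name] = {
            project.name: sorted(s.name for s in project.iterdir() if s.is_dir())
            for project in sorted(p for p in split_dir.iterdir() if p.is_dir())
        }
    return layout


def check_splits(layout: Layout) -> list[str]:
    """Return human-readable violations of the Train / Test rules (empty list means OK)."""
    problems: list[str] = []
    for split in ("Train", "Test"):
        if split not in layout:
            # Rule: the top level must contain a Train and a Test folder.
            problems.append(f"missing {split}/ folder")
        elif not layout[split]:
            # Rule: each split must contain at least one project folder.
            problems.append(f"{split}/ contains no project folders")
    # Rule: each session must belong to exactly one split.
    # Assumption: a session is identified by (project name, session folder name).
    seen: dict[tuple[str, str], str] = {}
    for split in ("Train", "Test"):
        for project, sessions in layout.get(split, {}).items():
            for session in sessions:
                key = (project, session)
                if key in seen:
                    problems.append(
                        f"{project}/{session} appears in both {seen[key]} and {split}"
                    )
                else:
                    seen[key] = split
    return problems
```

Keeping the directory walk (`read_layout`) separate from the rule checks (`check_splits`) means the checks can be exercised on plain dictionaries; for instance, `check_splits({"Train": {"P": ["sub-1_ses-1"]}})` reports `missing Test/ folder`.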
### Project

* Each project *must* have exactly one project-level folder within a given split.
* The project folder name *should* be descriptive and without spaces (e.g. `SWC-plusmaze`, `IBL-headfixed`, `AIND-openfield`).

### Session

* Each session *must* have exactly one session-level folder within a project.
* Session folder names *must* be formatted as `sub-_ses-`.
* `` and `` *must* be strictly alphanumeric (i.e. only `A-Z`, `a-z`, `0-9`).
* A session folder *must* contain exactly one session video file at its root.
* A session folder *must* contain a `Frames` folder.
* A session folder *may* contain a `Clips` folder.

:::{admonition} Examples
:class: tip

* valid: `sub-M708149_ses-20200317`, `sub-001_ses-01`
* invalid:
  * `mouse-M708149_ses-20200317`: the first key should be `sub`.
  * `sub-M708149_20200317`: missing the `ses` key.
  * `sub-M70_8149_ses-20200317`: underscores are not allowed within values (ambiguous parsing).
  * `sub-M70-8149_ses-2020-03-17`: hyphens are not allowed within values (ambiguous parsing).
:::

### Session video

* All video files (session videos and clips) *should* be in MP4 format (H.264 codec, yuv420p pixel format).
  Contributors *should* re-encode their videos to this format before submission (see [SLEAP documentation](https://docs.sleap.ai/latest/help/#usage) for guidance).
* Session video filenames *must* follow the pattern: `sub-_ses-_cam-.mp4`.

### Frames

The `Frames` folder contains individually sampled images.
In the `Train` split, it also contains a label file with keypoint annotations.

* Frames *must* be extracted from the session video.
* Frame images *should* be in PNG format (`.png`). JPEG format (`.jpg` or `.jpeg`) *may* also be used.
* Frame image filenames *must* follow the pattern: `sub-_ses-_cam-_frame-.`, where `` is `.png`, `.jpg`, or `.jpeg`.
* `` *must* be the 0-based index of the frame in the session video.
* `` *must* be padded to a consistent width across all frame files within a session (e.g. `0000`, `1000`).
* In the `Train` split, a single label file *must* be provided per camera view, named `sub-_ses-_cam-_framelabels.json`.
  At present, only one camera view is included, so each `Frames` folder contains exactly one such label file.
  See [Frame labels](target-framelabels) for details.

### Clips

A session *may* include a `Clips` folder containing short video segments and their label files.

* Clips *must* be extracted from the session video and *must* have the same file format as the session video.
* Clip filenames *must* follow the pattern: `sub-_ses-_cam-_start-_dur-.mp4`.
* `` in the `start` field *must* be the 0-based index of the first frame of the clip in the session video, padded to a consistent width (e.g. `0500`, `1000`).
* `` in the `dur` field *must* be the duration of the clip in number of frames (e.g. `5`, `30`).
* A single label file *must* be provided per clip:
  * In the `Train` split, the file is named `sub-_ses-_cam-_start-_dur-_cliplabels.json` and contains keypoint annotations for every frame in the clip.
    See [Clip labels](target-cliplabels) for details.
  * In the `Test` split, the file is named `sub-_ses-_cam-_start-_dur-_startlabels.json` and contains keypoint annotations only for the first frame of the clip.
    See [Clip start labels](target-startlabels) for details.

## File naming

All filenames follow a key-value pair convention, similar to the [BIDS standard](https://bids-specification.readthedocs.io/en/stable/02-common-principles.html) and [NeuroBlueprint](https://neuroblueprint.neuroinformatics.dev/latest/specification.html).

* Filenames *must* consist of key-value pairs separated by underscores, with keys and values separated by hyphens.
  A filename *may* end with an additional suffix (not a key-value pair) before the extension:

  ```
  -_-.
  -_-_.
  ```

  The recognised suffixes are:

  * `framelabels` for [frame label files](target-framelabels).
  * `cliplabels` for [clip label files](target-cliplabels).
  * `startlabels` for [clip start label files](target-startlabels).
* The following keys are used:

  | Key     | Description                                                           | Examples                                 |
  |---------|-----------------------------------------------------------------------|------------------------------------------|
  | `sub`   | Subject identifier                                                    | `sub-001`, `sub-M708149`                 |
  | `ses`   | Session identifier                                                    | `ses-02`, `ses-25`, `ses-20200317`       |
  | `cam`   | Camera identifier                                                     | `cam-topdown`, `cam-side2`               |
  | `frame` | 0-based frame index in the session video                              | `frame-0000`, `frame-0500`, `frame-1000` |
  | `start` | 0-based frame index of the first frame of a clip in the session video | `start-0000`, `start-0500`, `start-1000` |
  | `dur`   | Clip duration in number of frames                                     | `dur-5`, `dur-30`                        |

* The keys `sub`, `ses`, and `cam` *must* appear in every filename, in that order.
* Key values *must* be strictly alphanumeric for `sub`, `ses` and `cam` (i.e. only `A-Z`, `a-z`, `0-9`).
* Key values *must* be strictly numeric for `frame`, `start` and `dur` (i.e. only `0-9`).
* Filenames *must not* contain spaces.

## Label format

* The `Train` split includes ground-truth keypoint annotations both for the sampled frames (`framelabels.json`) and for entire clips (`cliplabels.json`), if present.
* The `Test` split includes keypoint annotations only for the first frame of each clip (`startlabels.json`), if clips are present.
  Labels for frames and entire clips are withheld to support evaluation of pose estimation and point tracking methods.
* Labels *must* be stored in the same folder as the corresponding frames or clips.
* Labels *must* be stored in [COCO keypoints format](https://cocodataset.org/#format-data), with additional requirements described below.

Each label file is a JSON file with `images`, `annotations`, and `categories` arrays.
Image, annotation and category `id` values *must* be unique integers within their respective arrays in a label file.

:::{note}
Annotation and category `id` values *should* be 1-indexed.
This convention follows sleap-io's [`save_coco`](https://io.sleap.ai/latest/reference/sleap_io/io/coco/) function and avoids conflicts with models that treat category `0` as background.

Image `id` values are always 0-based, but what they index differs between label types: in frame labels, an image `id` is the frame index in the session video, whereas in clip labels it is the frame index within the clip.
Clip start labels follow the same conventions as clip labels.
Details are provided below.
:::

(target-framelabels)=
### Frame labels (`framelabels.json`)

* Frame labels *must* only exist in the `Train` split.
* Within the `Frames` folder, there *must* be one frame label file per camera view, named `sub-_ses-_cam-_framelabels.json`.
* Each entry in the `images` array *must* have an `id` equal to the 0-based frame index in the session video (matching the `` in the corresponding image filename).
* Each entry in the `images` array *must* have a `file_name` that exactly matches the name of an existing [frame image](#frames) in the `Frames` folder (including the extension).

:::{admonition} Example
:class: tip

For a session with 5 labelled frames sampled from different parts of the video, the `images` array would be:

```json
[
  {"id": 1000, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-01000.png", "width": 1300, "height": 1028},
  {"id": 2300, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-02300.png", "width": 1300, "height": 1028},
  {"id": 3500, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-03500.png", "width": 1300, "height": 1028},
  {"id": 7200, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-07200.png", "width": 1300, "height": 1028},
  {"id": 19800, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-19800.png", "width": 1300, "height": 1028}
]
```

Here each `id` is the 0-based frame index in the session video (matching the `` in the filename), and each `file_name` includes the `.png` extension.
:::

(target-cliplabels)=
### Clip labels (`cliplabels.json`)

* Clip labels *must* only exist in the `Train` split.
* If a `Clips` folder is present, there *must* be one clip label file per clip, named `sub-_ses-_cam-_start-_dur-_cliplabels.json`.
* The `images` array *must* contain an entry for every frame in the clip, in consecutive, monotonically increasing order (covering the entire clip duration).
* Clip labels follow the same COCO keypoints format as frame labels, but with different conventions for image `id` and `file_name` values:
  * Each image `id` *must* be the **0-based index of the frame within the clip** (i.e. `0`, `1`, `2`, ...), not the index in the session video.
  * Each `file_name` *must* follow the same pattern as [frame image filenames](#frames), but **without the extension**.
    The `frame` field in the `file_name` *must* correspond to the index of that frame in the **session video**.

This means that each entry in the `images` array encodes two pieces of information: the `id` gives the local position within the clip, while the `frame` field in `file_name` gives the global position in the session video.
Note that in both cases the indices are 0-based.

:::{admonition} Example
:class: tip

For a clip starting at frame 1000 with a duration of 5 frames, the `images` array would be:

```json
[
  {"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028},
  {"id": 1, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1001", "width": 1300, "height": 1028},
  {"id": 2, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1002", "width": 1300, "height": 1028},
  {"id": 3, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1003", "width": 1300, "height": 1028},
  {"id": 4, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1004", "width": 1300, "height": 1028}
]
```

Here `id: 0` through `id: 4` are the local clip indices, while `frame-1000` through `frame-1004` in the `file_name` values refer to the original frame positions in the session video.
:::

(target-startlabels)=
### Clip start labels (`startlabels.json`)

* Clip start labels *must* only exist in the `Test` split.
* If a `Clips` folder is present, there *must* be one clip start label file per clip, named `sub-_ses-_cam-_start-_dur-_startlabels.json`.
* Clip start labels provide keypoint annotations for the **first frame of the clip only**.
  They are intended for point-tracker evaluation, where the annotated points serve as the initial positions from which a tracker should propagate.
* Clip start labels follow the same format as [Clip labels](target-cliplabels), except that the `images` array *must* contain exactly one entry, corresponding to the first frame of the clip, which therefore *must* have `id: 0`.

:::{admonition} Example
:class: tip

For a clip starting at frame 1000 with a duration of 5 frames, the `images` array would be:

```json
[
  {"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028}
]
```
:::

### Visibility encoding

* Keypoint visibility *must* use ternary encoding:
  * `0`: not labelled
  * `1`: labelled but not visible (occluded)
  * `2`: labelled and visible

## Example

Below is a concrete example project structure:

```
.
├── Train/
│   └── SWC-plusmaze/
│       └── sub-M708149_ses-20200317/
│           ├── Frames/
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-01000.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-02300.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-03500.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-07200.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-19800.png
│           │   └── sub-M708149_ses-20200317_cam-topdown_framelabels.json
│           ├── Clips/
│           │   ├── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5.mp4
│           │   └── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5_cliplabels.json
│           └── sub-M708149_ses-20200317_cam-topdown.mp4
└── Test/
    └── SWC-plusmaze/
        └── sub-M235678_ses-20210415/
            ├── Frames/
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-00500.png
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-01200.png
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-04800.png
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-09100.png
            │   └── sub-M235678_ses-20210415_cam-topdown_frame-15300.png
            ├── Clips/
            │   ├── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5.mp4
            │   └── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5_startlabels.json
            └── sub-M235678_ses-20210415_cam-topdown.mp4
```
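
Because every filename follows the key-value convention described in [File naming](#file-naming), names such as those in the example above can be parsed mechanically. The Python sketch below is illustrative only: `parse_name` and `clip_image_entries` are hypothetical helpers, not part of any published tool. The second helper lists the `id`/`file_name` pairs that a clip's `cliplabels.json` would contain, assuming (as in the Clip labels example) that the `frame` values are padded to the same width as the clip's `start` value.

```python
import re
from typing import Optional

# Keys recognised by the naming convention, and the pattern each value must match.
KEY_PATTERNS = {
    "sub": r"[A-Za-z0-9]+",
    "ses": r"[A-Za-z0-9]+",
    "cam": r"[A-Za-z0-9]+",
    "frame": r"[0-9]+",
    "start": r"[0-9]+",
    "dur": r"[0-9]+",
}
SUFFIXES = {"framelabels", "cliplabels", "startlabels"}


def parse_name(name: str) -> tuple[dict[str, str], Optional[str], str]:
    """Split a filename into (key-value pairs, suffix or None, extension or "")."""
    stem, _, ext = name.partition(".")
    parts = stem.split("_")
    # An optional trailing suffix is the only part that is not a key-value pair.
    suffix = parts.pop() if parts[-1] in SUFFIXES else None
    pairs: dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("-")
        if not sep or key not in KEY_PATTERNS or key in pairs:
            raise ValueError(f"bad key-value pair {part!r} in {name!r}")
        # Values with extra hyphens or other characters fail the full match.
        if not re.fullmatch(KEY_PATTERNS[key], value):
            raise ValueError(f"invalid value for {key!r} in {name!r}")
        pairs[key] = value
    if list(pairs)[:3] != ["sub", "ses", "cam"]:
        raise ValueError(f"{name!r} must start with sub, ses and cam, in that order")
    return pairs, suffix, ext


def clip_image_entries(clip_name: str) -> list[dict]:
    """List the `id`/`file_name` pairs expected in a clip's `cliplabels.json`."""
    pairs, _, _ = parse_name(clip_name)
    start, dur = pairs["start"], int(pairs["dur"])
    prefix = f"sub-{pairs['sub']}_ses-{pairs['ses']}_cam-{pairs['cam']}"
    # Assumption: frame indices are padded to the width of the start value.
    width = len(start)
    return [
        {"id": i, "file_name": f"{prefix}_frame-{int(start) + i:0{width}d}"}
        for i in range(dur)
    ]
```

A validator built along these lines could compare the output of `clip_image_entries` against the `images` array of each `cliplabels.json` file; note that `parse_name` targets files, so session folder names (which have no `cam` key) are rejected by design.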