Benchmark project structure#

This page describes the expected folder structure and file naming conventions for pose estimation benchmark datasets.

Note

We mark requirements with italicised keywords that should be interpreted as described in RFC 2119, published by the Network Working Group. In decreasing order of requirement strength, these are: must, should, and may.

Overview#

A benchmark dataset is organised into a Train and a Test split.
Each split contains one or more projects (i.e. datasets contributed by different groups).
Each project contains one or more sessions.
A session centres on a single video file (the session video), from which frames (individually sampled images) and optionally clips (short video segments) are extracted.
Frames and clips are accompanied by label files in COCO keypoints format.

The current scope is limited to single-animal pose estimation from a single camera view. Support for multi-camera setups is planned for a future version.

Folder structure#

.
├── Train/
│   └── <ProjectName>/
│       └── sub-<subjectID>_ses-<sessionID>/
│           ├── Frames/
│           │   ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
│           │   ├── ...
│           │   └── sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json
│           ├── Clips/    (optional)
│           │   ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
│           │   ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json
│           │   └── ...
│           └── sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4
└── Test/
    └── <ProjectName>/
        └── sub-<subjectID>_ses-<sessionID>/
            ├── Frames/
            │   ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
            │   └── ...
            ├── Clips/    (optional)
            │   ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
            │   ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json
            │   └── ...
            └── sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4

Note

The Test split follows the same structure as Train, but includes different label files (see Label format for details).

Train / Test#

The top level must contain a Train and a Test folder.
Each split must contain at least one project folder.
Each session must belong to exactly one split.

Project#

Each project must have exactly one project-level folder within a given split.
The project folder name should be descriptive and contain no spaces (e.g. SWC-plusmaze, IBL-headfixed, AIND-openfield).

Session#

Each session must have exactly one session-level folder within a project.
Session folder names must be formatted as sub-<subjectID>_ses-<sessionID>.
<subjectID> and <sessionID> must be strictly alphanumeric (i.e. only A-Z, a-z, 0-9).
A session folder must contain exactly one session video file at its root.
A session folder must contain a Frames folder.
A session folder may contain a Clips folder.

Examples

valid: sub-M708149_ses-20200317, sub-001_ses-01
invalid:
- mouse-M708149_ses-20200317: the first key must be sub.
- sub-M708149_20200317: missing the ses key.
- sub-M70_8149_ses-20200317: underscores are not allowed within values (ambiguous parsing).
- sub-M70-8149_ses-2020-03-17: hyphens are not allowed within values (ambiguous parsing).
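
The session folder rules above can be checked with a regular expression. The following is a minimal sketch, not part of any official tooling; the names `SESSION_FOLDER_RE` and `parse_session_folder` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical helper for illustration only.
# Each value is one or more ASCII letters or digits, as required above.
SESSION_FOLDER_RE = re.compile(
    r"sub-(?P<sub>[A-Za-z0-9]+)_ses-(?P<ses>[A-Za-z0-9]+)"
)


def parse_session_folder(name: str) -> Optional[dict]:
    """Return the subject and session IDs, or None if the name is invalid."""
    match = SESSION_FOLDER_RE.fullmatch(name)
    return match.groupdict() if match else None


for candidate in [
    "sub-M708149_ses-20200317",
    "sub-001_ses-01",
    "mouse-M708149_ses-20200317",
    "sub-M708149_20200317",
    "sub-M70_8149_ses-20200317",
    "sub-M70-8149_ses-2020-03-17",
]:
    print(candidate, "->", parse_session_folder(candidate))
```

Using `fullmatch` rather than `match` ensures that trailing characters (for example an extra underscore-separated field) also make the name invalid.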

Session video#

All video files (session videos and clips) should be in MP4 format (H.264 codec, yuv420p pixel format). Contributors should re-encode their videos to this format before submission (see SLEAP documentation for guidance).
Session video filenames must follow the pattern: sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4.

Frames#

The Frames folder contains individually sampled images. In the Train split, it also contains a label file with keypoint annotations.

Frames must be extracted from the session video.
Frame images should be in PNG format (.png). JPEG format (.jpg or .jpeg) may also be used.
Frame image filenames must follow the pattern: sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.<ext>, where <ext> is .png, .jpg, or .jpeg.
<frameID> must be the 0-based index of the frame in the session video.
<frameID> must be padded to a consistent width across all frame files within a session (e.g. 0000, 1000).
In the Train split, a single label file must be provided per camera view, named sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json. At present, only one camera view is included, so each session's Frames folder contains exactly one such label file. See Frame labels for details.
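
As an illustration of the frame naming rules, the sketch below builds a frame image filename with a zero-padded `<frameID>`. The function names are hypothetical, and deriving the padding width from the total number of frames in the session video is only one possible way to keep the width consistent within a session.

```python
def min_pad_width(n_frames_in_video: int) -> int:
    """Smallest width that fits the largest 0-based frame index (an assumption,
    any consistent width would satisfy the rules)."""
    return len(str(n_frames_in_video - 1))


def frame_filename(sub: str, ses: str, cam: str, frame_index: int,
                   width: int, ext: str = ".png") -> str:
    """Build a frame image filename with a zero-padded 0-based frame index."""
    if frame_index < 0:
        raise ValueError("frame_index must be a 0-based, non-negative index")
    if ext not in (".png", ".jpg", ".jpeg"):
        raise ValueError(f"unsupported extension: {ext}")
    frame_id = f"{frame_index:0{width}d}"
    if len(frame_id) != width:
        raise ValueError(f"frame index {frame_index} does not fit in width {width}")
    return f"sub-{sub}_ses-{ses}_cam-{cam}_frame-{frame_id}{ext}"


width = min_pad_width(20000)  # hypothetical video with 20000 frames
print(frame_filename("M708149", "20200317", "topdown", 1000, width))
```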

Clips#

A session may include a Clips folder containing short video segments and their label files.

Clips must be extracted from the session video and must have the same file format as the session video.
Clip filenames must follow the pattern: sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4.
<frameID> in the start field must be the 0-based index of the first frame of the clip in the session video, padded to a consistent width (e.g. 0500, 1000).
<nFrames> in the dur field must be the duration of the clip in number of frames (e.g. 5, 30).
A single label file must be provided per clip:
- In the Train split, the file is named sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json and contains keypoint annotations for every frame in the clip. See Clip labels for details.
- In the Test split, the file is named sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json and contains keypoint annotations only for the first frame of the clip. See Clip start labels for details.
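
The clip naming rules can be sketched as a small helper that returns the clip filename together with the label filename expected in each split. The function name `clip_names` and its parameters are hypothetical.

```python
# Label suffix per split, as described in the list above.
LABEL_SUFFIX = {"Train": "cliplabels", "Test": "startlabels"}


def clip_names(sub: str, ses: str, cam: str, start: int, n_frames: int,
               width: int, split: str) -> tuple:
    """Return (clip filename, label filename) for one clip."""
    if split not in LABEL_SUFFIX:
        raise ValueError(f"split must be 'Train' or 'Test', got {split!r}")
    if start < 0 or n_frames < 1:
        raise ValueError("start must be >= 0 and n_frames must be >= 1")
    # The start field is zero-padded; the dur field is written as a plain count.
    stem = f"sub-{sub}_ses-{ses}_cam-{cam}_start-{start:0{width}d}_dur-{n_frames}"
    return f"{stem}.mp4", f"{stem}_{LABEL_SUFFIX[split]}.json"


video, labels = clip_names("M235678", "20210415", "topdown", 500, 5, 4, "Test")
print(video)
print(labels)
```

A clip with `start-0500` and `dur-5` covers frames 500 to 504 of the session video, i.e. `start` through `start + dur - 1`.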

File naming#

All filenames follow a key-value pair convention, similar to the BIDS standard and NeuroBlueprint.

Filenames must consist of key-value pairs separated by underscores, with keys and values separated by hyphens. A filename may end with an additional suffix (not a key-value pair) before the extension:
```
<key>-<value>_<key>-<value>.<extension>
<key>-<value>_<key>-<value>_<suffix>.<extension>
```
The recognised suffixes are:
- framelabels for frame label files.
- cliplabels for clip label files.
- startlabels for clip start label files.

The following keys are used:

| Key | Description | Examples |
|---|---|---|
| `sub` | Subject identifier | `sub-001`, `sub-M708149` |
| `ses` | Session identifier | `ses-02`, `ses-25`, `ses-20200317` |
| `cam` | Camera identifier | `cam-topdown`, `cam-side2` |
| `frame` | 0-based frame index in the session video | `frame-0000`, `frame-0500`, `frame-1000` |
| `start` | 0-based frame index of the first frame of a clip in the session video | `start-0000`, `start-0500`, `start-1000` |
| `dur` | Clip duration in number of frames | `dur-5`, `dur-30` |

The keys sub, ses, and cam must appear in every filename, in that order.
Key values must be strictly alphanumeric for sub, ses and cam (i.e. only A-Z, a-z, 0-9).
Key values must be strictly numeric for frame, start and dur (i.e. only 0-9).
Filenames must not contain spaces.
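
To make the key-value convention concrete, here is a minimal parser sketch that splits a filename into its key-value pairs, optional suffix, and extension, and applies the rules listed above. The function `parse_filename` is hypothetical and is not part of any official validator.

```python
from pathlib import PurePath

SUFFIXES = {"framelabels", "cliplabels", "startlabels"}
ALNUM_KEYS = ("sub", "ses", "cam")        # values must be alphanumeric
NUMERIC_KEYS = ("frame", "start", "dur")  # values must be numeric


def parse_filename(filename: str):
    """Return (pairs, suffix, extension) or raise ValueError if invalid."""
    if " " in filename:
        raise ValueError("filenames must not contain spaces")
    path = PurePath(filename)
    parts = path.stem.split("_")
    # A trailing part without a hyphen may be one of the recognised suffixes.
    suffix = parts.pop() if parts[-1] in SUFFIXES else None
    pairs = {}
    for part in parts:
        key, sep, value = part.partition("-")
        if not sep or not value or key in pairs:
            raise ValueError(f"malformed or repeated key-value pair: {part!r}")
        if key in ALNUM_KEYS:
            ok = value.isascii() and value.isalnum()
        elif key in NUMERIC_KEYS:
            ok = value.isascii() and value.isdigit()
        else:
            raise ValueError(f"unknown key: {key!r}")
        if not ok:
            raise ValueError(f"invalid value for {key!r}: {value!r}")
        pairs[key] = value
    # sub, ses and cam must come first, in that order.
    if list(pairs)[:3] != list(ALNUM_KEYS):
        raise ValueError("filename must start with sub, ses, cam (in that order)")
    return pairs, suffix, path.suffix


print(parse_filename("sub-M708149_ses-20200317_cam-topdown_frame-01000.png"))
print(parse_filename("sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5_cliplabels.json"))
```

Because a hyphen inside a value ends up in the value after `partition("-")`, names such as `sub-M70-8149_...` are rejected by the alphanumeric check rather than being silently misparsed.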

Label format#

The Train split includes ground-truth keypoint annotations both for the sampled frames (framelabels.json) and for entire clips (cliplabels.json), if present.
The Test split includes keypoint annotations only for the first frame of each clip (startlabels.json), if clips are present. Labels for frames and entire clips are withheld to support evaluation of pose estimation and point tracking methods.
Labels must be stored in the same folder as the corresponding frames or clips.
Labels must be stored in COCO keypoints format, with additional requirements described below. Each label file is a JSON file with images, annotations, and categories arrays. Image, annotation and category id values must be unique integers within a label file.

Note

Annotation and category id values should be 1-indexed. This convention follows sleap-io’s save_coco function and avoids conflicts with models that treat category 0 as background.

Image id values are always 0-indexed. The indexing origin differs for frame labels and clip labels, and clip start labels follow the same conventions as clip labels. Details are provided below.
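
The id requirements can be checked with a few lines of code. The sketch below assumes a label file has already been loaded into a dictionary (for example with `json.load`); the function name `check_ids` is hypothetical. Since 1-indexing of annotation and category ids is a "should" rather than a "must", it is reported as a warning.

```python
def check_ids(labels: dict) -> list:
    """Return a list of problems found in the id values of a label file."""
    problems = []
    for array in ("images", "annotations", "categories"):
        ids = [entry["id"] for entry in labels[array]]
        # type(...) is int also rejects booleans, which are a subclass of int.
        if any(type(i) is not int for i in ids):
            problems.append(f"{array}: id values must be integers")
        if len(set(ids)) != len(ids):
            problems.append(f"{array}: id values must be unique")
    for array in ("annotations", "categories"):
        if any(type(e["id"]) is int and e["id"] < 1 for e in labels[array]):
            problems.append(f"{array}: id values should start at 1 (warning)")
    return problems


# Hypothetical, heavily abbreviated label contents (only the id fields).
good = {
    "images": [{"id": 0}, {"id": 1}],
    "annotations": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1}],
}
bad = {
    "images": [{"id": 0}, {"id": 0}],
    "annotations": [{"id": 0}],
    "categories": [{"id": 1}],
}
print(check_ids(good))
print(check_ids(bad))
```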

Frame labels (`framelabels.json`)#

Frame labels must only exist in the Train split.
Within the Frames folder, there must be one frame label file per camera view, named sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json.
Each entry in the images array must have an id equal to the 0-based frame index in the session video (matching the <frameID> in the corresponding image filename).
Each entry in the images array must have a file_name that exactly matches the name of an existing frame image in the Frames folder (including the extension).

Example

For a session with 5 labelled frames sampled from different parts of the video, the images array would be:

[
  {"id": 1000, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-01000.png", "width": 1300, "height": 1028},
  {"id": 2300, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-02300.png", "width": 1300, "height": 1028},
  {"id": 3500, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-03500.png", "width": 1300, "height": 1028},
  {"id": 7200, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-07200.png", "width": 1300, "height": 1028},
  {"id": 19800, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-19800.png", "width": 1300, "height": 1028}
]

Here each id is the 0-based frame index in the session video (matching the <frameID> in the filename), and each file_name includes the .png extension.
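
A consistency check for the two rules above might look like the following sketch. It assumes the `images` array and the set of filenames present in the Frames folder are already available; the function name `check_frame_images` is hypothetical.

```python
import re

# Extracts the <frameID> digits from a frame image filename.
FRAME_RE = re.compile(r"_frame-([0-9]+)\.(?:png|jpg|jpeg)$")


def check_frame_images(images: list, existing_files: set) -> list:
    """Return a list of problems for the images array of a frame label file."""
    problems = []
    for image in images:
        name = image["file_name"]
        match = FRAME_RE.search(name)
        if match is None:
            problems.append(f"{name}: missing frame field or unsupported extension")
        elif int(match.group(1)) != image["id"]:
            problems.append(f"{name}: id {image['id']} does not match the frame index")
        if name not in existing_files:
            problems.append(f"{name}: no such file in the Frames folder")
    return problems


images = [
    {"id": 1000, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-01000.png"},
    {"id": 2301, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-02300.png"},
]
existing = {"sub-M708149_ses-20200317_cam-topdown_frame-01000.png"}
for problem in check_frame_images(images, existing):
    print(problem)
```

Converting the padded `<frameID>` with `int` means that `01000` and `1000` compare equal, so the padding width does not affect the check.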

Clip labels (`cliplabels.json`)#

Clip labels must only exist in the Train split.
If a Clips folder is present, there must be one clip label file per clip, named sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json.
The images array must contain an entry for every frame in the clip, in consecutive, monotonically increasing order (covering the entire clip duration).
Clip labels follow the same COCO keypoints format as frame labels, but with different conventions for image id and file_name values:
- Each image id must be the 0-based index of the frame within the clip (i.e. 0, 1, 2, …), not the index in the session video.
- Each file_name must follow the same pattern as frame image filenames, but without the extension. The frame field in the file_name must correspond to the index of that frame in the session video.

This means that each entry in the images array encodes two pieces of information: the id gives the local position within the clip, while the frame field in file_name gives the global position in the session video. Note that in both cases the indices are 0-based.

Example

For a clip starting at frame 1000 with a duration of 5 frames, the images array would be:

[
  {"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028},
  {"id": 1, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1001", "width": 1300, "height": 1028},
  {"id": 2, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1002", "width": 1300, "height": 1028},
  {"id": 3, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1003", "width": 1300, "height": 1028},
  {"id": 4, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1004", "width": 1300, "height": 1028}
]

Here id: 0 through id: 4 are the local clip indices, while frame-1000 through frame-1004 in the file_name values refer to the original frame positions in the session video.
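
The relationship between local and global indices can be sketched in code. The helper below generates the `images` array for a clip from its start frame and duration; the function name `clip_images` and its parameters (including `frame_width` for the padding) are hypothetical.

```python
def clip_images(prefix: str, start: int, n_frames: int, frame_width: int,
                width: int, height: int) -> list:
    """Build the images array of a clip label file.

    id is the local 0-based index within the clip, while the frame field in
    file_name is the global 0-based index in the session video.
    """
    return [
        {
            "id": local_index,
            "file_name": f"{prefix}_frame-{start + local_index:0{frame_width}d}",
            "width": width,
            "height": height,
        }
        for local_index in range(n_frames)
    ]


images = clip_images("sub-M708149_ses-20200317_cam-topdown",
                     start=1000, n_frames=5, frame_width=4,
                     width=1300, height=1028)
for image in images:
    print(image)
```

Called with the values from the example above, this reproduces the five entries shown, with ids 0 to 4 and frame fields 1000 to 1004.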

Clip start labels (`startlabels.json`)#

Clip start labels must only exist in the Test split.
If a Clips folder is present, there must be one clip start label file per clip, named sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json.
Clip start labels provide keypoint annotations for the first frame of the clip only. They are intended for point-tracker evaluation, where the annotated points serve as the initial positions from which a tracker should propagate.
Clip start labels follow the same conventions as Clip labels, except that the images array must contain exactly one entry, corresponding to the first frame of the clip. This entry must have id: 0.

Example

For a clip starting at frame 1000 with a duration of 5 frames, the images array would be:

[
  {"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028}
]
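
One way to picture the relationship between the two clip label types is to derive start labels from full clip labels by keeping only the first frame. The sketch below is an illustration under that assumption, not a description of how the benchmark files are actually produced; `to_start_labels` is a hypothetical name, and the example data (category, keypoint names, coordinates) is made up. It relies on the standard COCO `image_id` field that links each annotation to an image.

```python
def to_start_labels(clip_labels: dict) -> dict:
    """Keep only the first frame (image id 0) and its annotations."""
    return {
        "images": [im for im in clip_labels["images"] if im["id"] == 0],
        "annotations": [
            ann for ann in clip_labels["annotations"] if ann["image_id"] == 0
        ],
        "categories": clip_labels["categories"],
    }


# Hypothetical two-frame clip with one keypoint per annotation.
clip_labels = {
    "images": [
        {"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000",
         "width": 1300, "height": 1028},
        {"id": 1, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1001",
         "width": 1300, "height": 1028},
    ],
    "annotations": [
        {"id": 1, "image_id": 0, "category_id": 1,
         "keypoints": [10.0, 20.0, 2], "num_keypoints": 1},
        {"id": 2, "image_id": 1, "category_id": 1,
         "keypoints": [11.0, 21.0, 2], "num_keypoints": 1},
    ],
    "categories": [{"id": 1, "name": "mouse", "keypoints": ["snout"]}],
}
start_labels = to_start_labels(clip_labels)
print(start_labels["images"])
print(start_labels["annotations"])
```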

Visibility encoding#

Keypoint visibility must use ternary encoding:
- 0: not labelled
- 1: labelled but not visible (occluded)
- 2: labelled and visible
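
In COCO keypoints format, each annotation stores its keypoints as a flat list of `x, y, v` triplets, where `v` is the visibility flag. The sketch below decodes such a list using the ternary encoding above; the names `Visibility` and `decode_keypoints` are hypothetical.

```python
from enum import IntEnum


class Visibility(IntEnum):
    NOT_LABELLED = 0
    OCCLUDED = 1   # labelled but not visible
    VISIBLE = 2    # labelled and visible


def decode_keypoints(flat: list) -> list:
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, Visibility)."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    # Visibility(...) raises ValueError for any flag other than 0, 1 or 2.
    return [
        (flat[i], flat[i + 1], Visibility(flat[i + 2]))
        for i in range(0, len(flat), 3)
    ]


# Made-up coordinates for three keypoints.
for x, y, v in decode_keypoints([10.0, 20.0, 2, 35.5, 40.0, 1, 0, 0, 0]):
    print(x, y, v.name)
```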

Example#

Below is a concrete example project structure:

.
├── Train/
│   └── SWC-plusmaze/
│       └── sub-M708149_ses-20200317/
│           ├── Frames/
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-01000.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-02300.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-03500.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-07200.png
│           │   ├── sub-M708149_ses-20200317_cam-topdown_frame-19800.png
│           │   └── sub-M708149_ses-20200317_cam-topdown_framelabels.json
│           ├── Clips/
│           │   ├── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5.mp4
│           │   └── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5_cliplabels.json
│           └── sub-M708149_ses-20200317_cam-topdown.mp4
└── Test/
    └── SWC-plusmaze/
        └── sub-M235678_ses-20210415/
            ├── Frames/
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-00500.png
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-01200.png
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-04800.png
            │   ├── sub-M235678_ses-20210415_cam-topdown_frame-09100.png
            │   └── sub-M235678_ses-20210415_cam-topdown_frame-15300.png
            ├── Clips/
            │   ├── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5.mp4
            │   └── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5_startlabels.json
            └── sub-M235678_ses-20210415_cam-topdown.mp4
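
A structure like the one above could be checked programmatically. The following sketch covers only two of the session-level rules (exactly one session video at the root, and a Frames folder) and builds a throwaway directory to demonstrate them. The function name `check_session_folder` is hypothetical, and treating only `.mp4` files as session videos is an assumption based on the recommended format.

```python
import tempfile
from pathlib import Path


def check_session_folder(session_dir: Path) -> list:
    """Return a list of problems for a single session folder."""
    problems = []
    videos = [p for p in session_dir.iterdir()
              if p.is_file() and p.suffix == ".mp4"]
    if len(videos) != 1:
        problems.append(
            f"expected exactly one session video at the root, found {len(videos)}"
        )
    elif not videos[0].name.startswith(session_dir.name + "_cam-"):
        problems.append(f"{videos[0].name}: does not match the session folder name")
    if not (session_dir / "Frames").is_dir():
        problems.append("missing Frames folder")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp) / "Train" / "SWC-plusmaze" / "sub-M708149_ses-20200317"
    (session / "Frames").mkdir(parents=True)
    missing_video = check_session_folder(session)   # no video yet
    (session / "sub-M708149_ses-20200317_cam-topdown.mp4").touch()
    complete = check_session_folder(session)        # video now present

print(missing_video)
print(complete)
```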
