Benchmark dataset#
This page describes the expected folder structure and file naming conventions for poseinterface benchmark datasets.
Note
We mark requirements with italicised keywords that should be interpreted as described in RFC 2119 by the Network Working Group. In decreasing order of requirement level, these are: must, should, and may.
Overview#
A benchmark dataset is organised into a `Train` and a `Test` split. Each split contains one or more projects (i.e. datasets contributed by different groups).
Each project contains one or more sessions.
A session centres on a single video file (the session video), from which frames (individually sampled images) and optionally clips (short video segments) are extracted.
Frames and clips are accompanied by label files in COCO keypoints format.
The current scope is limited to single-animal pose estimation from a single camera view. Support for multi-camera setups is planned for a future version.
Folder structure#
Note
This specification describes both the contributed and the published versions of the dataset. Data contributors must provide full keypoint annotations (frame labels and clip labels) for both Train and Test splits. During the upload process, labels for the Test split are partially withheld to support evaluation. See Label format for details.
Contributed version (full labels in both splits):
.
├── Train/
│ └── <ProjectName>/
│ └── sub-<subjectID>_ses-<sessionID>/
│ ├── Frames/
│ │ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
│ │ ├── ...
│ │ └── sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json
│ ├── Clips/ (optional)
│ │ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
│ │ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json
│ │ └── ...
│ └── sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4
└── Test/
└── <ProjectName>/
└── sub-<subjectID>_ses-<sessionID>/
├── Frames/
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
│ ├── ...
│ └── sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json
├── Clips/ (optional)
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json
│ └── ...
└── sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4
Published version (Test frame labels withheld; Test clip labels replaced by clip start labels):
.
├── Train/
│ └── <ProjectName>/
│ └── sub-<subjectID>_ses-<sessionID>/
│ ├── Frames/
│ │ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
│ │ ├── ...
│ │ └── sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json
│ ├── Clips/ (optional)
│ │ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
│ │ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json
│ │ └── ...
│ └── sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4
└── Test/
└── <ProjectName>/
└── sub-<subjectID>_ses-<sessionID>/
├── Frames/
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
│ └── ...
├── Clips/ (optional)
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json
│ └── ...
└── sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4
Train / Test#
The top level must contain a `Train` and a `Test` folder. Each split must contain at least one project folder.
Each session must belong to exactly one split.
Project#
Each project must have exactly one project-level folder within a given split.
The project folder name should be descriptive and without spaces (e.g. `SWC-plusmaze`, `IBL-headfixed`, `AIND-openfield`).
Session#
Each session must have exactly one session-level folder within a project.
Session folder names must be formatted as `sub-<subjectID>_ses-<sessionID>`.
`<subjectID>` and `<sessionID>` must be strictly alphanumeric (i.e. only `A-Z`, `a-z`, `0-9`).
A session folder must contain exactly one session video file at its root.
A session folder must contain a `Frames` folder.
A session folder may contain a `Clips` folder.
Examples
Valid:
- `sub-M708149_ses-20200317`, `sub-001_ses-01`

Invalid:
- `mouse-M708149_ses-20200317`: the first key should be `sub`.
- `sub-M708149_20200317`: missing the `ses` key.
- `sub-M70_8149_ses-20200317`: underscores are not allowed within values (ambiguous parsing).
- `sub-M70-8149_ses-2020-03-17`: hyphens are not allowed within values (ambiguous parsing).
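As an illustration, the session folder rules above can be checked with a short regular expression. `validate_session_folder` is a hypothetical helper, not part of poseinterface:

```python
import re

# Session folders must match sub-<subjectID>_ses-<sessionID>, where both IDs
# are strictly alphanumeric (A-Z, a-z, 0-9): no underscores or hyphens inside
# values, since those would make key-value parsing ambiguous.
SESSION_RE = re.compile(r"^sub-([A-Za-z0-9]+)_ses-([A-Za-z0-9]+)$")

def validate_session_folder(name: str) -> bool:
    """Return True if `name` is a valid session folder name."""
    return SESSION_RE.match(name) is not None
```

For example, `validate_session_folder("sub-M708149_ses-20200317")` returns `True`, while `validate_session_folder("sub-M70_8149_ses-20200317")` returns `False` because the underscore splits the subject value.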
Session video#
All video files (session videos and clips) should be in MP4 format (H.264 codec, yuv420p pixel format). Data contributors should re-encode their videos to this format before submission (see SLEAP documentation for guidance).
Session video filenames must follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4`.
Frames#
The Frames folder contains individually sampled images and their label files.
Frames must be extracted from the session video.
Frame images should be in PNG format (`.png`). JPEG format (`.jpg` or `.jpeg`) may also be used.
Frame image filenames must follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.<ext>`, where `<ext>` is `.png`, `.jpg`, or `.jpeg`.
`<frameID>` must be the 0-based index of the frame in the session video.
`<frameID>` must be padded to a consistent width across all frame files within a session (e.g. `0000`, `1000`).
One frame label file (named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`) must be provided per camera view. At present, only one camera view is included, so each session contains exactly one such file. See Label format for differences between contributed and published versions.
Clips#
A session may include a Clips folder containing short video segments and their label files.
Clips must be extracted from the session video and must have the same file format.
Clip filenames must follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4`.
`<frameID>` in the `start` field must be the 0-based index of the first frame of the clip in the session video, padded to a consistent width (e.g. `0500`, `1000`).
`<nFrames>` in the `dur` field must be the duration of the clip in number of frames (e.g. `5`, `30`).
One clip label file (named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json`) must be provided per clip. See Label format for differences between contributed and published versions.
File naming#
All filenames follow a key-value pair convention, similar to the BIDS standard and NeuroBlueprint.
Filenames must consist of key-value pairs separated by underscores, with keys and values separated by hyphens. A filename may end with an additional suffix (not a key-value pair) before the extension:
<key>-<value>_<key>-<value>.<extension>
<key>-<value>_<key>-<value>_<suffix>.<extension>
The recognised suffixes are:
- `framelabels` for frame label files.
- `cliplabels` for clip label files.
- `startlabels` for clip start label files.
The following keys are used:
| Key | Description | Value format | Examples |
| --- | --- | --- | --- |
| `sub` | Subject identifier | alphanumeric | `sub-001`, `sub-M708149` |
| `ses` | Session identifier | alphanumeric | `ses-02`, `ses-25`, `ses-20200317` |
| `cam` | Camera identifier | alphanumeric | `cam-topdown`, `cam-side2` |
| `frame` | 0-based frame index in the session video | numeric | `frame-0000`, `frame-0500`, `frame-1000` |
| `start` | 0-based frame index of the first frame of a clip in the session video | numeric | `start-0000`, `start-0500`, `start-1000` |
| `dur` | Clip duration in number of frames | numeric | `dur-5`, `dur-30` |

The keys `sub`, `ses`, and `cam` must appear in every filename, in that order. Filenames must not contain spaces.
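The convention above can be sketched as a small parser that splits a filename into its key-value pairs and optional trailing suffix. `parse_filename` is an illustrative helper, not part of poseinterface:

```python
from __future__ import annotations

from pathlib import Path

# Recognised trailing suffixes (not key-value pairs).
SUFFIXES = {"framelabels", "cliplabels", "startlabels"}

def parse_filename(filename: str) -> tuple[dict[str, str], str | None]:
    """Split a benchmark filename into ({key: value, ...}, suffix or None)."""
    stem = Path(filename).stem  # drop the extension
    entities = stem.split("_")
    pairs: dict[str, str] = {}
    suffix = None
    for i, entity in enumerate(entities):
        key, sep, value = entity.partition("-")
        if sep and value:
            pairs[key] = value
        elif i == len(entities) - 1 and entity in SUFFIXES:
            suffix = entity  # trailing suffix is not a key-value pair
        else:
            raise ValueError(f"malformed entity {entity!r} in {filename!r}")
    return pairs, suffix
```

Note that this sketch does not enforce the required `sub`/`ses`/`cam` keys, their order, or alphanumeric values; a full validator would check those as well.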
Label format#
Data contributors must provide ground-truth keypoint annotations for both `Train` and `Test` splits: frame labels (`framelabels.json`) for sampled frames, and clip labels (`cliplabels.json`) for entire clips, if present.
In the published dataset, the `Train` split includes all submitted labels. The `Test` split withholds frame labels and full clip labels to support evaluation; only clip start labels (`startlabels.json`), derived from the annotations of the first frame of each clip, are published.
Labels must be stored in the same folder as the corresponding frames or clips.
Labels must be stored in COCO keypoints format, with additional requirements described below. Each label file is a JSON file with `images`, `annotations`, and `categories` arrays. Image, annotation, and category `id` values must be unique integers within a label file.
The `name` field in each `categories` entry should be the common English name of the species in lowercase (e.g. `"mouse"`, `"rat"`, `"zebrafish"`, `"macaque"`).
Note
Annotation and category `id` values should be 1-indexed. This convention follows sleap-io's `save_coco` function and avoids conflicts with models that treat category `0` as background.
Image `id` values are always 0-indexed. The indexing origin differs for frame labels and clip labels; clip start labels follow the same conventions as clip labels. Details are provided below.
Complete examples of label files are available in the repository under tests/data/Train.
Frame labels (framelabels.json)#
Within the `Frames` folder, there must be one frame label file per camera view, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`.
Each entry in the `images` array must have an `id` equal to the 0-based frame index in the session video (matching the `<frameID>` in the corresponding image filename).
Each entry in the `images` array must have a `file_name` that exactly matches the name of an existing frame image in the `Frames` folder (including the extension).
Example
For a session with 5 labelled frames sampled from different parts of the video, the images array would be:
[
{"id": 1000, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-01000.png", "width": 1300, "height": 1028},
{"id": 2300, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-02300.png", "width": 1300, "height": 1028},
{"id": 3500, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-03500.png", "width": 1300, "height": 1028},
{"id": 7200, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-07200.png", "width": 1300, "height": 1028},
{"id": 19800, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-19800.png", "width": 1300, "height": 1028}
]
Here each id is the 0-based frame index in the session video (matching the <frameID> in the filename), and each file_name includes the .png extension.
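The id/file_name consistency rule illustrated above can be checked mechanically. `frame_ids_consistent` is a hypothetical helper, not poseinterface API:

```python
import re

# The frame-<frameID> field sits just before the image extension.
FRAME_RE = re.compile(r"_frame-(\d+)\.(?:png|jpe?g)$")

def frame_ids_consistent(labels: dict) -> bool:
    """Check that every image id equals the frame-<frameID> index embedded
    in its file_name (i.e. the 0-based frame index in the session video)."""
    for image in labels["images"]:
        match = FRAME_RE.search(image["file_name"])
        if match is None or image["id"] != int(match.group(1)):
            return False
    return True
```

Zero-padding is harmless here: `frame-01000` parses to the integer `1000`, matching an `id` of `1000`.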
Clip labels (cliplabels.json)#
If a `Clips` folder is present, there must be one clip label file per clip, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json`.
The `images` array must contain an entry for every frame in the clip, in consecutive, monotonically increasing order (covering the entire clip duration).
Clip labels follow the same COCO keypoints format as frame labels, but with different conventions for image `id` and `file_name` values:
- Each image `id` must be the 0-based index of the frame within the clip (i.e. `0`, `1`, `2`, …), not the index in the session video.
- Each `file_name` must follow the same pattern as frame image filenames, but without the extension. The `frame` field in the `file_name` must correspond to the index of that frame in the session video.
This means that each entry in the images array encodes two pieces of information: the id gives the local position within the clip, while the frame field in file_name gives the global position in the session video. Note that in both cases the indices are 0-based.
Example
For a clip starting at frame 1000 with a duration of 5 frames, the images array would be:
[
{"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028},
{"id": 1, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1001", "width": 1300, "height": 1028},
{"id": 2, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1002", "width": 1300, "height": 1028},
{"id": 3, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1003", "width": 1300, "height": 1028},
{"id": 4, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1004", "width": 1300, "height": 1028}
]
Here id: 0 through id: 4 are the local clip indices, while frame-1000 through frame-1004 in the file_name values refer to the original frame positions in the session video.
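A sketch of validating this dual indexing and recovering the local-to-global frame mapping, assuming the conventions above (`clip_frame_map` is an illustrative helper):

```python
import re

def clip_frame_map(images: list, start: int, dur: int) -> dict:
    """Map each local clip index (image id) to its global frame index in the
    session video, checking the clip label conventions along the way."""
    assert len(images) == dur, "images must cover the entire clip"
    mapping = {}
    for expected_id, image in enumerate(images):
        assert image["id"] == expected_id, "ids must be consecutive from 0"
        # file_name has no extension, so the frame field runs to the end
        frame = int(re.search(r"_frame-(\d+)$", image["file_name"]).group(1))
        assert frame == start + expected_id, "frame field must equal start + id"
        mapping[expected_id] = frame
    return mapping
```

For the example above, the mapping is simply `{0: 1000, 1: 1001, 2: 1002, 3: 1003, 4: 1004}`.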
Intermediate file videolabels.json#
Note
This file is not a required part of a benchmark dataset. It is an intermediate cache file useful for data contributors when preparing labelled clips, and it is documented here only because it is optionally auto-discovered by the extract-clip command and the corresponding extract_clip() function.
A `videolabels.json` file uses the same schema as `cliplabels.json`, but it refers to a full video rather than to a clip of it.
It is produced once per video (e.g. by converting model predictions for the entire video into the `cliplabels` schema) and reused to extract any number of clip label files from that video.
When present alongside a session video as `sub-<subjectID>_ses-<sessionID>_cam-<camID>_videolabels.json`, the `extract-clip` command will slice it into per-clip `cliplabels.json` files matching the requested frame ranges.
In the `videolabels.json` file, each entry in the `images` list uses the 0-based frame index in the video as its `id` (same convention as frame labels).
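Schematically, the slicing shifts image ids from video frame indices to clip-local indices and re-points annotations accordingly. This is a schema-level sketch, not poseinterface's actual implementation:

```python
def slice_videolabels(videolabels: dict, start: int, dur: int) -> dict:
    """Slice a videolabels-style dict (image ids = video frame indices) into
    a cliplabels-style dict covering frames [start, start + dur)."""
    wanted = range(start, start + dur)
    images, annotations = [], []
    id_map = {}  # video frame index -> local clip index
    for image in videolabels["images"]:
        if image["id"] in wanted:
            id_map[image["id"]] = image["id"] - start
            # file_name keeps the global frame field; only the id goes local
            images.append({**image, "id": id_map[image["id"]]})
    for ann in videolabels["annotations"]:
        if ann["image_id"] in id_map:
            annotations.append({**ann, "image_id": id_map[ann["image_id"]]})
    return {"images": images, "annotations": annotations,
            "categories": videolabels["categories"]}
```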
Clip start labels (startlabels.json)#
Clip start labels only exist in the published `Test` split and are derived automatically from the contributed clip labels during the upload process.
They are identical to clip labels, except that the `images` array must contain exactly one entry (the first frame of the clip, with `id: 0`). They are intended for point-tracker evaluation, where the annotated points serve as the initial positions from which a tracker should propagate.
Example
For a clip starting at frame 1000 with a duration of 5 frames, the images array would be:
[
{"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028}
]
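The derivation during upload amounts to keeping the image with `id: 0` and the annotations attached to it. A schema-level sketch (not the actual upload code):

```python
def derive_startlabels(cliplabels: dict) -> dict:
    """Keep only the first clip frame (id 0) and its annotations."""
    images = [im for im in cliplabels["images"] if im["id"] == 0]
    annotations = [a for a in cliplabels["annotations"] if a["image_id"] == 0]
    return {"images": images, "annotations": annotations,
            "categories": cliplabels["categories"]}
```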
Visibility encoding#
Keypoint visibility must use ternary encoding:
- `0`: not labelled
- `1`: labelled but not visible (occluded)
- `2`: labelled and visible
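In COCO keypoints format, each keypoint is stored as an (x, y, v) triplet in a flat list, so the visibility flag is every third element. A small illustrative tally (`count_visibility` is a hypothetical helper):

```python
from collections import Counter

def count_visibility(keypoints: list) -> Counter:
    """Tally the visibility flag of each (x, y, v) triplet in a flat COCO
    keypoints list: 0 = not labelled, 1 = occluded, 2 = visible."""
    flags = [int(v) for v in keypoints[2::3]]  # every third value is v
    assert all(v in (0, 1, 2) for v in flags), "visibility must be ternary"
    return Counter(flags)
```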
Example#
Below is a concrete example. A matching example dataset (with label files) is available in the repository under tests/data/Train.
Contributed version:
.
├── Train/
│ └── SWC-plusmaze/
│ └── sub-M708149_ses-20200317/
│ ├── Frames/
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-01000.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-02300.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-03500.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-07200.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-19800.png
│ │ └── sub-M708149_ses-20200317_cam-topdown_framelabels.json
│ ├── Clips/
│ │ ├── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5.mp4
│ │ └── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5_cliplabels.json
│ └── sub-M708149_ses-20200317_cam-topdown.mp4
└── Test/
└── SWC-plusmaze/
└── sub-M235678_ses-20210415/
├── Frames/
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-00500.png
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-01200.png
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-04800.png
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-09100.png
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-15300.png
│ └── sub-M235678_ses-20210415_cam-topdown_framelabels.json
├── Clips/
│ ├── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5.mp4
│ └── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5_cliplabels.json
└── sub-M235678_ses-20210415_cam-topdown.mp4
Published version:
.
├── Train/
│ └── SWC-plusmaze/
│ └── sub-M708149_ses-20200317/
│ ├── Frames/
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-01000.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-02300.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-03500.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-07200.png
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-19800.png
│ │ └── sub-M708149_ses-20200317_cam-topdown_framelabels.json
│ ├── Clips/
│ │ ├── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5.mp4
│ │ └── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5_cliplabels.json
│ └── sub-M708149_ses-20200317_cam-topdown.mp4
└── Test/
└── SWC-plusmaze/
└── sub-M235678_ses-20210415/
├── Frames/
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-00500.png
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-01200.png
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-04800.png
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-09100.png
│ └── sub-M235678_ses-20210415_cam-topdown_frame-15300.png
├── Clips/
│ ├── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5.mp4
│ └── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5_startlabels.json
└── sub-M235678_ses-20210415_cam-topdown.mp4