DatasetApi

class DatasetApi[source]

Bases: supervisely.api.module_api.UpdateableModule, supervisely.api.module_api.RemoveableModuleApi

API for working with Datasets. The DatasetApi object is immutable.

Parameters
api : Api

API connection to the server.

Usage example
import os
from dotenv import load_dotenv

import supervisely as sly

# Load secrets and create API object from .env file (recommended)
# Learn more here: https://developer.supervisely.com/getting-started/basics-of-authentication
if sly.is_development():
    load_dotenv(os.path.expanduser("~/supervisely.env"))
api = sly.Api.from_env()

# Pass values into the API constructor (optional, not recommended)
# api = sly.Api(server_address="https://app.supervise.ly", token="4r47N...xaTatb")

project_id = 1951
ds = api.dataset.get_list(project_id)

Methods

copy

Copies the given Dataset to the destination Project by ID.

copy_batch

Copies the given Datasets to the destination Project by IDs.

create

Creates a Dataset with the given name in the given Project.

exists

Checks if an entity with the given parent_id and name exists.

get_free_name

Generates a free name for an entity with the given parent_id and name.

get_info_by_id

Get Dataset information by ID.

get_info_by_name

Return Dataset information by name or None if Dataset does not exist.

get_list

Returns a list of datasets in the given project, or a list of nested datasets in the dataset with the specified parent_id.

get_list_all

List all available datasets from all available teams for the user that match the specified filtering criteria.

get_list_all_pages

Get list of all or limited quantity entities from the Supervisely server.

get_list_all_pages_generator

This generator function retrieves a list of all or a limited quantity of entities from the Supervisely server, yielding batches of entities as they are retrieved.

get_or_create

Checks if a Dataset with the given name already exists in the Project; if not, creates a Dataset with that name.

get_tree

Returns a tree of all datasets in the project as a dictionary, where the keys are the DatasetInfo objects and the values are dictionaries containing the children of the dataset.

info_sequence

NamedTuple DatasetInfo with information about the Dataset.

info_tuple_name

NamedTuple name - DatasetInfo.

move

Moves the given Dataset to the destination Project by ID.

move_batch

Moves given Datasets to the destination Project by IDs.

move_to_dataset

Moves dataset with specified ID to the dataset with specified destination ID.

remove

Remove an entity with the specified ID from the Supervisely server.

remove_batch

Remove entities with given IDs from the Supervisely server.

remove_permanently

!!! WARNING !!! Be careful, this method deletes data from the database, recovery is not possible.

tree

Yields tuples of (path, dataset) for all datasets in the project.

update

Updates Dataset information (name, description) by ID.

Attributes

MAX_WAIT_ATTEMPTS

Maximum number of attempts that will be made to wait for a certain condition to be met.

WAIT_ATTEMPT_TIMEOUT_SEC

Number of seconds for intervals between attempts.

InfoType

alias of supervisely.api.module_api.DatasetInfo

copy(dst_project_id, id, new_name=None, change_name_if_conflict=False, with_annotations=False)[source]

Copies the given Dataset to the destination Project by ID.

Parameters
dst_project_id : int

Destination Project ID in Supervisely.

id : int

ID of copied Dataset.

new_name : str, optional

New Dataset name.

change_name_if_conflict : bool, optional

If True and the given name already exists, a suffix will be added to the end of the name.

with_annotations : bool, optional

If True, copies the Dataset with annotations, otherwise copies just the items from the Dataset without annotations.

Returns

Information about Dataset. See info_sequence

Return type

DatasetInfo

Usage example
import os
import supervisely as sly

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

dst_proj_id = 1982
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 0

new_ds = api.dataset.copy(dst_proj_id, id=2540, new_name="banana", with_annotations=True)
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 1
copy_batch(dst_project_id, ids, new_names=None, change_name_if_conflict=False, with_annotations=False)[source]

Copies the given Datasets to the destination Project by IDs.

Parameters
dst_project_id : int

Destination Project ID in Supervisely.

ids : List[int]

IDs of copied Datasets.

new_names : List[str], optional

New Datasets names.

change_name_if_conflict : bool, optional

If True and the given name already exists, a suffix will be added to the end of the name.

with_annotations : bool, optional

If True copies Datasets with annotations, otherwise copies just items from Datasets without annotations.

Raises

RuntimeError if the “ids” and “new_names” lists cannot be matched: len(ids) != len(new_names)

Returns

Information about Datasets. See info_sequence

Return type

List[DatasetInfo]

Usage example
import os
import supervisely as sly

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

dst_proj_id = 1980
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 0

ds_ids = [2532, 2557]
ds_names = ["lemon_test", "kiwi_test"]

copied_datasets = api.dataset.copy_batch(dst_proj_id, ids=ds_ids, new_names=ds_names, with_annotations=True)
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 2
create(project_id, name, description='', change_name_if_conflict=False, parent_id=None)[source]

Creates a Dataset with the given name in the given Project.

Parameters
project_id : int

Project ID in Supervisely where Dataset will be created.

name : str

Dataset Name.

description : str, optional

Dataset description.

change_name_if_conflict : bool, optional

If True and the given name already exists, a suffix will be added to the end of the name.

parent_id : Optional[int]

Parent Dataset ID. If set to None, then the Dataset will be created at the top level of the Project, otherwise the Dataset will be created in a specified Dataset.

Returns

Information about Dataset. See info_sequence

Return type

DatasetInfo

Usage example
import os
import supervisely as sly

project_id = 116482

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

ds_info = api.dataset.get_list(project_id)
print(len(ds_info)) # 1

new_ds = api.dataset.create(project_id, 'new_ds')
new_ds_info = api.dataset.get_list(project_id)
print(len(new_ds_info)) # 2
exists(parent_id, name)

Checks if an entity with the given parent_id and name exists.

Parameters
parent_id : int

ID of the parent entity.

name : str

Name of the entity.

Returns

Returns True if the entity exists, and False if not.

Return type

bool

Usage example
import os
import supervisely as sly

# You can connect to API directly
address = 'https://app.supervise.ly/'
token = 'Your Supervisely API Token'
api = sly.Api(address, token)

# Or you can use API from environment
os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()


name = "IMG_0315.jpeg"
dataset_id = 55832
exists = api.image.exists(dataset_id, name)
print(exists) # True
get_free_name(parent_id, name)

Generates a free name for an entity with the given parent_id and name. Adds an increasing suffix to the original name until a unique name is found.

Parameters
parent_id : int

ID of the parent entity.

name : str

Name of the entity.

Returns

Returns free name.

Return type

str

Usage example
import os
import supervisely as sly

# You can connect to API directly
address = 'https://app.supervise.ly/'
token = 'Your Supervisely API Token'
api = sly.Api(address, token)

# Or you can use API from environment
os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()


name = "IMG_0315.jpeg"
dataset_id = 55832
free_name = api.image.get_free_name(dataset_id, name)
print(free_name) # IMG_0315_001.jpeg
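
The suffix behavior can be sketched in pure Python. This is a hypothetical illustration: the zero-padded `_001`-style format is inferred from the example output above, not guaranteed by the API.

```python
def free_name(existing, name):
    """Append an increasing numeric suffix until the name is unused,
    mimicking get_free_name (suffix format inferred from the example above)."""
    if name not in existing:
        return name
    stem, dot, ext = name.rpartition(".")
    if not dot:  # name has no extension
        stem, ext = name, ""
    i = 1
    while True:
        candidate = f"{stem}_{i:03d}" + (f".{ext}" if dot else "")
        if candidate not in existing:
            return candidate
        i += 1

print(free_name({"IMG_0315.jpeg"}, "IMG_0315.jpeg"))  # IMG_0315_001.jpeg
```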
get_info_by_id(id, raise_error=False)[source]

Get Dataset information by ID.

Parameters
id : int

Dataset ID in Supervisely.

raise_error : bool, optional

If True, raises an error when a Dataset with the given ID does not exist; otherwise returns None.

Returns

Information about Dataset. See info_sequence

Return type

DatasetInfo

Usage example
import os
import supervisely as sly

dataset_id = 384126

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

ds_info = api.dataset.get_info_by_id(dataset_id)
get_info_by_name(project_id, name, fields=None, parent_id=None)[source]

Return Dataset information by name or None if Dataset does not exist. If parent_id is not None, the search will be performed in the specified Dataset. Otherwise the search will be performed at the top level of the Project.

Parameters
project_id : int

Project ID in which the Dataset is located.

name : str

Dataset name.

fields : List[str], optional

List of fields to return. If None, then all fields are returned.

parent_id : Union[int, None]

Parent Dataset ID. If the Dataset is not nested, then the value is None.

Returns

Information about Dataset. See info_sequence

Return type

Union[DatasetInfo, None]

get_list(project_id, filters=None, recursive=False, parent_id=None)[source]

Returns a list of datasets in the given project, or a list of nested datasets in the dataset with the specified parent_id. To get a list of all datasets including nested ones, set the recursive parameter to True. Otherwise, the method returns only top-level datasets.

Parameters
project_id : int

Project ID in which the Datasets are located.

filters : List[dict], optional

List of parameters for filtering the output Datasets.

parent_id : Optional[int]

Parent Dataset ID. If set to None, the search will be performed at the top level of the Project, otherwise the search will be performed in the specified Dataset.

recursive : bool, optional

If True, returns all Datasets from the given Project including nested Datasets.

Returns

List of all Datasets with information for the given Project. See info_sequence

Return type

List[DatasetInfo]

Usage example
import os
import supervisely as sly

project_id = 1951

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()
ds = api.dataset.get_list(project_id)

print(ds)
# Output: [
#     DatasetInfo(id=2532,
#                 name="lemons",
#                 description="",
#                 size="861069",
#                 project_id=1951,
#                 images_count=6,
#                 items_count=6,
#                 created_at="2021-03-02T10:04:33.973Z",
#                 updated_at="2021-03-10T09:31:50.341Z",
#                 reference_image_url="http://app.supervise.ly/z6ut6j8bnaz1vj8aebbgs4-public/images/original/...jpg"),
#                 DatasetInfo(id=2557,
#                 name="kiwi",
#                 description="",
#                 size="861069",
#                 project_id=1951,
#                 images_count=6,
#                 items_count=6,
#                 created_at="2021-03-10T09:31:33.701Z",
#                 updated_at="2021-03-10T09:31:44.196Z",
#                 reference_image_url="http://app.supervise.ly/h5un6l2bnaz1vj8a9qgms4-public/images/original/...jpg")
# ]
get_list_all(filters=None, sort=None, sort_order=None, per_page=None, page='all')[source]

List all available datasets from all available teams for the user that match the specified filtering criteria.

Parameters
filters : List[Dict[str, str]], optional

List of parameters for filtering the available Datasets. Every dict must consist of the keys:
- ‘field’: takes values ‘id’, ‘projectId’, ‘workspaceId’, ‘groupId’, ‘createdAt’, ‘updatedAt’
- ‘operator’: takes values ‘=’, ‘eq’, ‘!=’, ‘not’, ‘in’, ‘!in’, ‘>’, ‘gt’, ‘>=’, ‘gte’, ‘<’, ‘lt’, ‘<=’, ‘lte’
- ‘value’: takes values according to the meaning of ‘field’, or null

sort : str, optional

Specifies by which parameter to sort the project list. Takes values ‘id’, ‘name’, ‘size’, ‘createdAt’, ‘updatedAt’

sort_order : str, optional

Determines the sorting direction (ascending or descending).

per_page : int, optional

Number of first items found to be returned. ‘None’ will return the first page with a default size of 20000 datasets.

page : Union[int, Literal["all"]], optional

Page number, used to retrieve the following items if the number of them found is more than per_page. The default value is ‘all’, which retrieves all available datasets. ‘None’ will return the first page with datasets, the amount of which is set in param ‘per_page’.

Returns

Search response information and ‘DatasetInfo’ objects for all datasets matching the given criteria.

Return type

dict

Usage example

import supervisely as sly
import os

os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

filter_1 = {
    "field": "updatedAt",
    "operator": "<",
    "value": "2023-12-03T14:53:00.952Z"
}
filter_2 = {
    "field": "updatedAt",
    "operator": ">",
    "value": "2023-04-03T14:53:00.952Z"
}
filters = [filter_1, filter_2]
datasets = api.dataset.get_list_all(filters)
print(datasets)
# Output:
# {
#     "total": 2,
#     "perPage": 20000,
#     "pagesCount": 1,
#     "entities": [ DatasetInfo(id = 16,
#                       name = 'ds1',
#                       description = None,
#                       size = '861069',
#                       project_id = 22,
#                       images_count = None,
#                       items_count = None,
#                       created_at = '2020-04-03T13:43:24.000Z',
#                       updated_at = '2020-04-03T14:53:00.952Z',
#                       reference_image_url = None,
#                       team_id = 2,
#                       workspace_id = 2),
#                   DatasetInfo(id = 17,
#                       name = 'ds1',
#                       description = None,
#                       size = '1177212',
#                       project_id = 23,
#                       images_count = None,
#                       items_count = None,
#                       created_at = '2020-04-03T13:43:24.000Z',
#                       updated_at = '2020-04-03T14:53:00.952Z',
#                       reference_image_url = None,
#                       team_id = 2,
#                       workspace_id = 2
#                       )
#                 ]
# }
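
For intuition, the filter clauses above can be evaluated client-side with a small sketch. The real filtering happens server-side; the operator aliases follow the parameter list above, and ISO-8601 timestamps compare correctly as plain strings.

```python
import operator

# Map filter operators (symbolic and word aliases) to Python comparisons.
OPS = {
    "=": operator.eq, "eq": operator.eq,
    "!=": operator.ne, "not": operator.ne,
    ">": operator.gt, "gt": operator.gt,
    ">=": operator.ge, "gte": operator.ge,
    "<": operator.lt, "lt": operator.lt,
    "<=": operator.le, "lte": operator.le,
    "in": lambda a, b: a in b, "!in": lambda a, b: a not in b,
}

def matches(entity, filters):
    """True if an entity dict satisfies every filter clause."""
    return all(OPS[f["operator"]](entity[f["field"]], f["value"]) for f in filters)

filters = [
    {"field": "updatedAt", "operator": "<", "value": "2023-12-03T14:53:00.952Z"},
    {"field": "updatedAt", "operator": ">", "value": "2023-04-03T14:53:00.952Z"},
]
print(matches({"updatedAt": "2023-06-01T00:00:00.000Z"}, filters))  # True
```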
get_list_all_pages(method, data, progress_cb=None, convert_json_info_cb=None, limit=None, return_first_response=False)

Get list of all or limited quantity entities from the Supervisely server.

Parameters
method : str

Request method name

data : dict

Dictionary with request body info

progress_cb : Progress, optional

Function for tracking download progress.

convert_json_info_cb : Callable, optional

Function for converting JSON info.

limit : int, optional

Number of entities to retrieve.

return_first_response : bool, optional

Specify whether to return only the first response.
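
The paging loop this helper implements can be sketched in pure Python, assuming a response shape like the one shown for get_list_all (a ‘pagesCount’ count plus an ‘entities’ list per page); fetch_page here is a hypothetical stand-in for the actual server request.

```python
def list_all_pages(fetch_page, limit=None):
    """Accumulate entities page by page; fetch_page(page) is any callable
    returning a dict with 'pagesCount' and 'entities' keys."""
    entities = []
    page = 1
    while True:
        resp = fetch_page(page)
        entities.extend(resp["entities"])
        if limit is not None and len(entities) >= limit:
            return entities[:limit]  # honor the requested limit
        if page >= resp["pagesCount"]:
            return entities  # last page reached
        page += 1

# Fake two-page server response for demonstration.
pages = {
    1: {"pagesCount": 2, "entities": [1, 2]},
    2: {"pagesCount": 2, "entities": [3]},
}
print(list_all_pages(pages.__getitem__))  # [1, 2, 3]
```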

get_list_all_pages_generator(method, data, progress_cb=None, convert_json_info_cb=None, limit=None, return_first_response=False)

This generator function retrieves a list of all or a limited quantity of entities from the Supervisely server, yielding batches of entities as they are retrieved.

Parameters
method : str

Request method name

data : dict

Dictionary with request body info

progress_cb : Progress, optional

Function for tracking download progress.

convert_json_info_cb : Callable, optional

Function for converting JSON info.

limit : int, optional

Number of entities to retrieve.

return_first_response : bool, optional

Specify whether to return only the first response.

get_or_create(project_id, name, description='', parent_id=None)[source]

Checks if a Dataset with the given name already exists in the Project; if not, creates a Dataset with the given name. If parent_id is specified, the search will be performed in the specified Dataset, otherwise the search will be performed at the top level of the Project.

Parameters
project_id : int

Project ID in Supervisely.

name : str

Dataset name.

description : str, optional

Dataset description.

parent_id : Union[int, None]

Parent Dataset ID. If set to None, then the Dataset will be created at the top level of the Project, otherwise the Dataset will be created in a specified Dataset.

Returns

Information about Dataset. See info_sequence

Return type

DatasetInfo

Usage example
import os
import supervisely as sly

project_id = 116482

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

ds_info = api.dataset.get_list(project_id)
print(len(ds_info)) # 1

api.dataset.get_or_create(project_id, 'ds1')
ds_info = api.dataset.get_list(project_id)
print(len(ds_info)) # 1

api.dataset.get_or_create(project_id, 'new_ds')
ds_info = api.dataset.get_list(project_id)
print(len(ds_info)) # 2
get_tree(project_id)[source]

Returns a tree of all datasets in the project as a dictionary, where the keys are the DatasetInfo objects and the values are dictionaries containing the children of the dataset. Recommended to use with the dataset_tree method to iterate over the tree.

Parameters
project_id : int

Project ID for which the tree is built.

Returns

Dictionary of datasets and their children.

Return type

Dict[DatasetInfo, Dict]

Usage example

import supervisely as sly

api = sly.Api.from_env()

project_id = 123

dataset_tree = api.dataset.get_tree(project_id)
print(dataset_tree)
# Output:
# {
#     DatasetInfo(id=2532, name="lemons", description="", ...): {
#         DatasetInfo(id=2557, name="kiwi", description="", ...): {}
#     }
# }
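
The nested dictionary can be traversed with a short recursive walk; this is a sketch in which plain strings stand in for the DatasetInfo keys.

```python
def walk(tree, parents=()):
    """Yield (parents, node) for every dataset in a get_tree-style dict."""
    for node, children in tree.items():
        yield list(parents), node
        yield from walk(children, (*parents, node))

tree = {"lemons": {"kiwi": {}}}
for parents, name in walk(tree):
    print(parents, name)
# Output:
# [] lemons
# ['lemons'] kiwi
```

This mirrors the (path, dataset) tuples produced by the tree method.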
static info_sequence()[source]

NamedTuple DatasetInfo with information about the Dataset.

Example
DatasetInfo(id=452984,
            name='ds0',
            description='',
            size='3997776',
            project_id=118909,
            images_count=11,
            items_count=11,
            created_at='2021-03-03T15:54:08.802Z',
            updated_at='2021-03-16T09:31:37.063Z',
            reference_image_url='https://app.supervise.ly/h5un6l2bnaz1vj8a9qgms4-public/images/original/K/q/jf/...png',
            team_id=1,
            workspace_id=2)
static info_tuple_name()[source]

NamedTuple name - DatasetInfo.

move(dst_project_id, id, new_name=None, change_name_if_conflict=False, with_annotations=False)[source]

Moves the given Dataset to the destination Project by ID.

Parameters
dst_project_id : int

Destination Project ID in Supervisely.

id : int

ID of moved Dataset.

new_name : str, optional

New Dataset name.

change_name_if_conflict : bool, optional

If True and the given name already exists, a suffix will be added to the end of the name.

with_annotations : bool, optional

If True, moves the Dataset with annotations, otherwise moves just the items from the Dataset without annotations.

Returns

Information about Dataset. See info_sequence

Return type

DatasetInfo

Usage example
import os
import supervisely as sly

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

dst_proj_id = 1985
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 0

new_ds = api.dataset.move(dst_proj_id, id=2550, new_name="cucumber", with_annotations=True)
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 1
move_batch(dst_project_id, ids, new_names=None, change_name_if_conflict=False, with_annotations=False)[source]

Moves given Datasets to the destination Project by IDs.

Parameters
dst_project_id : int

Destination Project ID in Supervisely.

ids : List[int]

IDs of moved Datasets.

new_names : List[str], optional

New Datasets names.

change_name_if_conflict : bool, optional

If True and the given name already exists, a suffix will be added to the end of the name.

with_annotations : bool, optional

If True moves Datasets with annotations, otherwise moves just items from Datasets without annotations.

Raises

RuntimeError if the “ids” and “new_names” lists cannot be matched: len(ids) != len(new_names)

Returns

Information about Datasets. See info_sequence

Return type

List[DatasetInfo]

Usage example
import os
import supervisely as sly

os.environ['SERVER_ADDRESS'] = 'https://app.supervise.ly'
os.environ['API_TOKEN'] = 'Your Supervisely API Token'
api = sly.Api.from_env()

dst_proj_id = 1978
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 0

ds_ids = [2545, 2560]
ds_names = ["banana_test", "mango_test"]

moved_datasets = api.dataset.move_batch(dst_proj_id, ids=ds_ids, new_names=ds_names, with_annotations=True)
ds = api.dataset.get_list(dst_proj_id)
print(len(ds)) # 2
move_to_dataset(dataset_id, destination_dataset_id)[source]

Moves dataset with specified ID to the dataset with specified destination ID.

Parameters
dataset_id : int

ID of the dataset to be moved.

destination_dataset_id : int

ID of the destination dataset.

Usage example
import supervisely as sly

api = sly.Api.from_env()

dataset_id = 123
destination_dataset_id = 456

api.dataset.move_to_dataset(dataset_id, destination_dataset_id)
Return type

None

remove(id)

Remove an entity with the specified ID from the Supervisely server.

Parameters
id : int

Entity ID in Supervisely

remove_batch(ids, progress_cb=None)

Remove entities with given IDs from the Supervisely server.

Parameters
ids : List[int]

IDs of entities in Supervisely.

progress_cb : Callable

Function for tracking removal progress.

remove_permanently(ids, batch_size=50, progress_cb=None)[source]

!!! WARNING !!! Be careful, this method deletes data from the database, recovery is not possible.

Permanently deletes datasets with the given IDs from the Supervisely server. All dataset IDs must belong to the same team, so it is necessary to group the IDs by team before calling this method.

Parameters
ids : Union[int, List]

IDs of datasets in Supervisely.

batch_size : int, optional

The number of entities that will be deleted by a single API call. This value must be in the range 1-50 inclusive, if you set a value out of range it will automatically adjust to the boundary values.

progress_cb : Callable, optional

Function for tracking deletion progress.

Returns

A list of response content in JSON format for each API call.

Return type

List[dict]
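
The batch_size clamping and chunking described above can be sketched as follows; to_batches is a hypothetical helper, not part of the API.

```python
def to_batches(ids, batch_size=50):
    """Clamp batch_size to the valid 1-50 range and split ids into chunks;
    remove_permanently issues one API call per such batch."""
    if isinstance(ids, int):  # a single ID is also accepted
        ids = [ids]
    batch_size = max(1, min(batch_size, 50))  # out-of-range values adjust to the boundary
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

batches = to_batches(list(range(120)), batch_size=100)
print([len(b) for b in batches])  # [50, 50, 20]
```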

tree(project_id)[source]

Yields tuples of (path, dataset) for all datasets in the project. Path of the dataset is a list of parents, e.g. [“ds1”, “ds2”, “ds3”]. For root datasets, the path is an empty list.

Parameters
project_id : int

Project ID in which the Dataset is located.

Returns

Generator of tuples of (path, dataset).

Return type

Generator[Tuple[List[str], DatasetInfo], None, None]

Usage example

import supervisely as sly

api = sly.Api.from_env()

project_id = 123

for parents, dataset in api.dataset.tree(project_id):
    parents: List[str]
    dataset: sly.DatasetInfo
    print(parents, dataset.name)


# Output:
# [] ds1
# ["ds1"] ds2
# ["ds1", "ds2"] ds3
update(id, name=None, description=None)

Updates Dataset information (name, description) by ID.