DatasetApi
- class DatasetApi
Bases:
supervisely.api.module_api.UpdateableModule, supervisely.api.module_api.RemoveableModuleApi
API for working with Dataset. DatasetApi object is immutable.
- Parameters
- api : Api
API connection to the server.
- Usage example
    import os
    from dotenv import load_dotenv

    import supervisely as sly

    # Load secrets and create API object from .env file (recommended)
    # Learn more here: https://developer.supervisely.com/getting-started/basics-of-authentication
    if sly.is_development():
        load_dotenv(os.path.expanduser("~/supervisely.env"))
    api = sly.Api.from_env()

    # Pass values into the API constructor (optional, not recommended)
    # api = sly.Api(server_address="https://app.supervise.ly", token="4r47N...xaTatb")

    project_id = 1951
    ds = api.dataset.get_list(project_id)
Methods
_convert_info_to_json
copy: Copies given Dataset to the destination Project by ID.
copy_batch: Copies given Datasets to the destination Project by IDs.
create: Creates Dataset with given name in the given Project.
exists: Checks if the dataset with the given name exists in the project.
get_free_name: Generates a free name for an entity with the given parent_id and name.
get_info_by_id: Gets Dataset information by ID.
get_info_by_name: Returns Dataset information by name or None if Dataset does not exist.
get_list: Returns list of datasets in the given project, or list of nested datasets in the dataset with specified parent_id.
get_list_all: Lists all available datasets from all available teams for the user that match the specified filtering criteria.
get_list_all_pages: Gets a list of all entities, or a limited number of them, from the Supervisely server.
get_list_all_pages_generator: Generator that retrieves all entities, or a limited number of them, from the Supervisely server, yielding batches of entities as they are retrieved.
get_list_idx_page_async: Gets the list of items for a given page number.
get_list_page_generator_async: Yields list of images in dataset asynchronously page by page.
get_nested: Returns a list of all nested datasets in the specified dataset.
get_or_create: Checks if Dataset with given name already exists in the Project; if not, creates Dataset with the given name.
get_tree: Returns a tree of all datasets in the project as a dictionary, where the keys are the DatasetInfo objects and the values are dictionaries containing the children of the dataset.
info_sequence: NamedTuple DatasetInfo containing information about Dataset.
info_tuple_name: NamedTuple name - DatasetInfo.
move: Moves given Dataset to the destination Project by ID.
move_batch: Moves given Datasets to the destination Project by IDs.
move_to_dataset: Moves dataset with specified ID to the dataset with specified destination ID.
remove: Removes an entity with the specified ID from the Supervisely server.
remove_batch: Removes entities with given IDs from the Supervisely server.
remove_permanently: !!! WARNING !!! Be careful, this method deletes data from the database, recovery is not possible.
tree: Yields tuples of (path, dataset) for all datasets in the project.
update: Updates Dataset information by given ID.
update_custom_data: Updates custom data for Dataset by given ID.
Attributes
MAX_WAIT_ATTEMPTS: Maximum number of attempts that will be made to wait for a certain condition to be met.
WAIT_ATTEMPT_TIMEOUT_SEC: Number of seconds for intervals between attempts.
- InfoType
alias of
supervisely.api.module_api.DatasetInfo
- copy(dst_project_id, id, new_name=None, change_name_if_conflict=False, with_annotations=False)
Copies given Dataset to the destination Project by ID.
- Parameters
- dst_project_id : int
Destination Project ID in Supervisely.
- id : int
ID of copied Dataset.
- new_name : str, optional
New Dataset name.
- change_name_if_conflict : bool, optional
If True and the given name already exists in the destination, adds a suffix to the end of the name.
- with_annotations : bool, optional
If True, copies Dataset with annotations; otherwise copies only the items from Dataset, without annotations.
- Returns
Information about Dataset. See info_sequence
- Return type
DatasetInfo
- Usage example
    import os

    import supervisely as sly

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    dst_proj_id = 1982
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 0

    new_ds = api.dataset.copy(dst_proj_id, id=2540, new_name="banana", with_annotations=True)
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 1
- copy_batch(dst_project_id, ids, new_names=None, change_name_if_conflict=False, with_annotations=False)
Copies given Datasets to the destination Project by IDs.
- Parameters
- dst_project_id : int
Destination Project ID in Supervisely.
- ids : List[int]
IDs of copied Datasets.
- new_names : List[str], optional
New Datasets names.
- change_name_if_conflict : bool, optional
If True and a given name already exists in the destination, adds a suffix to the end of the name.
- with_annotations : bool, optional
If True, copies Datasets with annotations; otherwise copies only the items from Datasets, without annotations.
- Raises
RuntimeError if the “ids” and “new_names” lists cannot be matched, i.e. len(ids) != len(new_names)
- Returns
Information about Datasets. See info_sequence
- Return type
List[DatasetInfo]
- Usage example
    import os

    import supervisely as sly

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    dst_proj_id = 1980
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 0

    ds_ids = [2532, 2557]
    ds_names = ["lemon_test", "kiwi_test"]

    copied_datasets = api.dataset.copy_batch(dst_proj_id, ids=ds_ids, new_names=ds_names, with_annotations=True)
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 2
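The length check described under Raises can be sketched in plain Python. This is only an illustration of the documented rule, not the SDK's code: `pair_ids_with_names` is a hypothetical helper, and treating a missing `new_names` as "keep the original names" is an assumption.

```python
from typing import List, Optional, Tuple


def pair_ids_with_names(
    ids: List[int], new_names: Optional[List[str]] = None
) -> List[Tuple[int, Optional[str]]]:
    """Pair each dataset ID with its new name (hypothetical helper, not part of the SDK)."""
    if new_names is None:
        # Assumption: no renaming requested, so every dataset keeps its name.
        return [(ds_id, None) for ds_id in ids]
    if len(ids) != len(new_names):
        # Mirrors the documented condition: len(ids) != len(new_names).
        raise RuntimeError(
            f'Can not match "ids" and "new_names" lists: {len(ids)} != {len(new_names)}'
        )
    return list(zip(ids, new_names))


pairs = pair_ids_with_names([2532, 2557], ["lemon_test", "kiwi_test"])
print(pairs)
```

Running the same check locally before calling copy_batch or move_batch surfaces a mismatch without a round trip to the server.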
- create(project_id, name, description='', change_name_if_conflict=False, parent_id=None)
Creates Dataset with given name in the given Project.
- Parameters
- project_id : int
Project ID in Supervisely where Dataset will be created.
- name : str
Dataset Name.
- description : str, optional
Dataset description.
- change_name_if_conflict : bool, optional
If True and the given name already exists, adds a suffix to the end of the name.
- parent_id : Optional[int]
Parent Dataset ID. If set to None, the Dataset will be created at the top level of the Project; otherwise it will be created inside the specified Dataset.
- Returns
Information about Dataset. See info_sequence
- Return type
DatasetInfo
- Usage example
    import os

    import supervisely as sly

    project_id = 116482

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    ds_info = api.dataset.get_list(project_id)
    print(len(ds_info))  # 1

    new_ds = api.dataset.create(project_id, 'new_ds')
    new_ds_info = api.dataset.get_list(project_id)
    print(len(new_ds_info))  # 2
- exists(project_id, name, parent_id=None)
Checks if the dataset with the given name exists in the project. If parent_id is not None, the search will be performed in the specified Dataset.
- get_free_name(parent_id, name)
Generates a free name for an entity with the given parent_id and name. Adds an increasing suffix to original name until a unique name is found.
- Parameters
- Returns
Returns free name.
- Return type
- Usage example
    import os

    import supervisely as sly

    # You can connect to API directly
    address = 'https://app.supervise.ly/'
    token = 'Your Supervisely API Token'
    api = sly.Api(address, token)

    # Or you can use API from environment
    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    name = "IMG_0315.jpeg"
    dataset_id = 55832
    free_name = api.image.get_free_name(dataset_id, name)
    print(free_name)  # IMG_0315_001.jpeg
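The "increasing suffix" idea can be illustrated without a server. The sketch below is not the SDK's implementation: `free_name` is a hypothetical function that checks a local set of names, and the `_001` format placed before the extension is inferred from the example output above.

```python
import os
from typing import Set


def free_name(existing: Set[str], name: str) -> str:
    """Return `name` if unused, else add _001, _002, ... before the extension."""
    if name not in existing:
        return name
    stem, ext = os.path.splitext(name)
    suffix = 1
    while True:
        # Assumed format: zero-padded three-digit counter, as in IMG_0315_001.jpeg.
        candidate = f"{stem}_{suffix:03d}{ext}"
        if candidate not in existing:
            return candidate
        suffix += 1


taken = {"IMG_0315.jpeg", "IMG_0315_001.jpeg"}
print(free_name(taken, "IMG_0315.jpeg"))  # IMG_0315_002.jpeg
```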
- get_info_by_id(id, raise_error=False)
Gets Dataset information by ID.
- Parameters
- id : int
Dataset ID in Supervisely.
- Returns
Information about Dataset. See info_sequence
- Return type
DatasetInfo
- Usage example
    import os

    import supervisely as sly

    dataset_id = 384126

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    ds_info = api.dataset.get_info_by_id(dataset_id)
- get_info_by_name(project_id, name, fields=None, parent_id=None)
Returns Dataset information by name or None if Dataset does not exist. If parent_id is not None, the search will be performed in the specified Dataset. Otherwise the search will be performed at the top level of the Project.
- Parameters
- Returns
Information about Dataset. See info_sequence
- Return type
Union[DatasetInfo, None]
- get_list(project_id, filters=None, recursive=False, parent_id=None)
Returns a list of datasets in the given project, or a list of nested datasets in the dataset with the specified parent_id. To get all datasets including nested ones, set the recursive parameter to True. Otherwise, the method returns only the datasets at the top level.
- Parameters
- project_id : int
Project ID in which the Datasets are located.
- filters : List[dict], optional
List of parameters to filter the output Datasets.
- recursive : bool, optional
If True, returns all Datasets from the given Project including nested Datasets.
- parent_id : Optional[int]
Parent Dataset ID. If set to None, the search will be performed at the top level of the Project, otherwise the search will be performed in the specified Dataset.
- Returns
List of all Datasets with information for the given Project. See info_sequence
- Return type
List[DatasetInfo]
- Usage example
    import os

    import supervisely as sly

    project_id = 1951

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    ds = api.dataset.get_list(project_id)

    print(ds)
    # Output: [
    #     DatasetInfo(id=2532,
    #                 name="lemons",
    #                 description="",
    #                 size="861069",
    #                 project_id=1951,
    #                 images_count=6,
    #                 items_count=6,
    #                 created_at="2021-03-02T10:04:33.973Z",
    #                 updated_at="2021-03-10T09:31:50.341Z",
    #                 reference_image_url="http://app.supervise.ly/z6ut6j8bnaz1vj8aebbgs4-public/images/original/...jpg"),
    #     DatasetInfo(id=2557,
    #                 name="kiwi",
    #                 description="",
    #                 size="861069",
    #                 project_id=1951,
    #                 images_count=6,
    #                 items_count=6,
    #                 created_at="2021-03-10T09:31:33.701Z",
    #                 updated_at="2021-03-10T09:31:44.196Z",
    #                 reference_image_url="http://app.supervise.ly/h5un6l2bnaz1vj8a9qgms4-public/images/original/...jpg")
    # ]
- get_list_all(filters=None, sort=None, sort_order=None, per_page=None, page='all')
Lists all available datasets from all available teams for the user that match the specified filtering criteria.
- Parameters
- filters : List[Dict[str, str]], optional
List of parameters for filtering the available Datasets. Every Dict must consist of keys:
  - ‘field’: Takes values ‘id’, ‘projectId’, ‘workspaceId’, ‘groupId’, ‘createdAt’, ‘updatedAt’
  - ‘operator’: Takes values ‘=’, ‘eq’, ‘!=’, ‘not’, ‘in’, ‘!in’, ‘>’, ‘gt’, ‘>=’, ‘gte’, ‘<’, ‘lt’, ‘<=’, ‘lte’
  - ‘value’: Takes on values according to the meaning of ‘field’, or null
- sort : str, optional
Specifies by which parameter to sort the dataset list. Takes values ‘id’, ‘name’, ‘size’, ‘createdAt’, ‘updatedAt’
- sort_order : str, optional
Determines the sort direction, i.e. which value the list starts from.
- per_page : int, optional
Number of first items found to be returned. ‘None’ will return the first page with a default size of 20000 datasets.
- page : Union[int, Literal["all"]], optional
Page number, used to retrieve the following items if the number of them found is more than per_page. The default value is ‘all’, which retrieves all available datasets. ‘None’ will return the first page with datasets, the amount of which is set in param ‘per_page’.
- Returns
Search response information and ‘DatasetInfo’ of all datasets that are searched by a given criterion.
- Return type
- Usage example
    import supervisely as sly
    import os

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    filter_1 = {
        "field": "updatedAt",
        "operator": "<",
        "value": "2023-12-03T14:53:00.952Z"
    }
    filter_2 = {
        "field": "updatedAt",
        "operator": ">",
        "value": "2023-04-03T14:53:00.952Z"
    }
    filters = [filter_1, filter_2]
    datasets = api.dataset.get_list_all(filters)
    print(datasets)
    # Output:
    # {
    #     "total": 2,
    #     "perPage": 20000,
    #     "pagesCount": 1,
    #     "entities": [
    #         DatasetInfo(id = 16,
    #                     name = 'ds1',
    #                     description = None,
    #                     size = '861069',
    #                     project_id = 22,
    #                     images_count = None,
    #                     items_count = None,
    #                     created_at = '2020-04-03T13:43:24.000Z',
    #                     updated_at = '2020-04-03T14:53:00.952Z',
    #                     reference_image_url = None,
    #                     team_id = 2,
    #                     workspace_id = 2),
    #         DatasetInfo(id = 17,
    #                     name = 'ds1',
    #                     description = None,
    #                     size = '1177212',
    #                     project_id = 23,
    #                     images_count = None,
    #                     items_count = None,
    #                     created_at = '2020-04-03T13:43:24.000Z',
    #                     updated_at = '2020-04-03T14:53:00.952Z',
    #                     reference_image_url = None,
    #                     team_id = 2,
    #                     workspace_id = 2
    #         )
    #     ]
    # }
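Because every filter must carry the three keys listed under the filters parameter, a small builder can catch typos before the request is sent. This is a sketch only: `make_filter` is a hypothetical helper, not an SDK function, and the allowed sets are copied from the parameter description above.

```python
from typing import Any, Dict

# Values taken from the `filters` parameter description.
ALLOWED_FIELDS = {"id", "projectId", "workspaceId", "groupId", "createdAt", "updatedAt"}
ALLOWED_OPERATORS = {
    "=", "eq", "!=", "not", "in", "!in",
    ">", "gt", ">=", "gte", "<", "lt", "<=", "lte",
}


def make_filter(field: str, operator: str, value: Any) -> Dict[str, Any]:
    """Build one filter dict, rejecting unknown fields or operators (hypothetical helper)."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"Unsupported field: {field!r}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator!r}")
    return {"field": field, "operator": operator, "value": value}


# Same time window as in the usage example above.
filters = [
    make_filter("updatedAt", "<", "2023-12-03T14:53:00.952Z"),
    make_filter("updatedAt", ">", "2023-04-03T14:53:00.952Z"),
]
print(filters)
```

The resulting list has the same shape as the `filters` argument passed to get_list_all in the usage example.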
- get_list_all_pages(method, data, progress_cb=None, convert_json_info_cb=None, limit=None, return_first_response=False)
Gets a list of all entities, or a limited number of them, from the Supervisely server.
- Parameters
- method : str
Request method name
- data : dict
Dictionary with request body info
- progress_cb : Progress, optional
Function for tracking download progress.
- convert_json_info_cb : Callable, optional
Function for converting JSON info.
- limit : int, optional
Number of entities to retrieve.
- return_first_response : bool, optional
Whether to return the first response.
- get_list_all_pages_generator(method, data, progress_cb=None, convert_json_info_cb=None, limit=None, return_first_response=False)
Generator that retrieves all entities, or a limited number of them, from the Supervisely server, yielding batches of entities as they are retrieved.
- Parameters
- method : str
Request method name
- data : dict
Dictionary with request body info
- progress_cb : Progress, optional
Function for tracking download progress.
- convert_json_info_cb : Callable, optional
Function for converting JSON info.
- limit : int, optional
Number of entities to retrieve.
- return_first_response : bool, optional
Whether to return the first response.
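The "yield batches as they are retrieved, optionally up to a limit" behaviour can be sketched with a stand-in for the server. Nothing below calls the SDK: `fake_request` and `iter_pages` are hypothetical, the `page` request key is an assumption, and the `entities` and `pagesCount` response keys are borrowed from the get_list_all output shown earlier.

```python
from typing import Callable, Dict, Iterator, List, Optional

FAKE_ITEMS = list(range(7))  # pretend server-side entities
PER_PAGE = 3


def fake_request(data: Dict) -> Dict:
    """Stand-in for one server call: returns a single page of FAKE_ITEMS."""
    page = data.get("page", 1)
    start = (page - 1) * PER_PAGE
    pages_count = -(-len(FAKE_ITEMS) // PER_PAGE)  # ceiling division
    return {"entities": FAKE_ITEMS[start:start + PER_PAGE], "pagesCount": pages_count}


def iter_pages(
    request: Callable[[Dict], Dict], data: Dict, limit: Optional[int] = None
) -> Iterator[List[int]]:
    """Yield one batch per page, stopping at the last page or once `limit` is reached."""
    yielded = 0
    page = 1
    while True:
        response = request({**data, "page": page})
        batch = response["entities"]
        if limit is not None:
            batch = batch[: limit - yielded]  # trim the final batch to the limit
        if batch:
            yield batch
            yielded += len(batch)
        if page >= response["pagesCount"] or (limit is not None and yielded >= limit):
            break
        page += 1


batches = list(iter_pages(fake_request, {}, limit=5))
print(batches)  # [[0, 1, 2], [3, 4]]
```

A generator of this shape lets the caller start processing the first page while later pages are still to be fetched, which is the practical difference from the list-returning variant.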
- async get_list_idx_page_async(method, data)
Get the list of items for a given page number. Page number is specified in the data dictionary.
- async get_list_page_generator_async(method, data, pages_count=None, semaphore=None)
Yields list of images in dataset asynchronously page by page.
- Parameters
- method : str
Method to call for listing items.
- data : dict
Data to pass to the API method.
- pages_count : int, optional
Preferred number of pages to retrieve if used with a per_page limit. Will be automatically adjusted if the pagesCount differs from the requested number.
- semaphore : asyncio.Semaphore, optional
Semaphore for limiting the number of simultaneous requests.
- kwargs
Additional arguments.
- Returns
List of images in dataset.
- Return type
AsyncGenerator[List[ImageInfo]]
- Usage example
    import os

    import supervisely as sly
    import asyncio

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    method = 'images.list'
    data = {
        'datasetId': 123456
    }

    loop = sly.utils.get_or_create_event_loop()
    images = loop.run_until_complete(api.image.get_list_generator_async(method, data))
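How a semaphore caps the number of simultaneous page requests can be shown with plain asyncio. The sketch below does not use the SDK: `fake_fetch_page`, `page_generator`, and the `page` key are hypothetical stand-ins, and pages are served from a local dictionary instead of the network.

```python
import asyncio
from typing import AsyncGenerator, Dict, List

# Pretend server-side pages of image names.
PAGES = {1: ["img_a", "img_b"], 2: ["img_c", "img_d"], 3: ["img_e"]}


async def fake_fetch_page(data: Dict, semaphore: asyncio.Semaphore) -> List[str]:
    """Stand-in for one page request; the semaphore bounds concurrent requests."""
    async with semaphore:
        await asyncio.sleep(0)  # simulate network latency
        return PAGES[data["page"]]


async def page_generator(
    data: Dict, pages_count: int, semaphore: asyncio.Semaphore
) -> AsyncGenerator[List[str], None]:
    """Schedule all page requests up front, then yield results in page order."""
    tasks = [
        asyncio.create_task(fake_fetch_page({**data, "page": page}, semaphore))
        for page in range(1, pages_count + 1)
    ]
    for task in tasks:
        yield await task


async def collect() -> List[List[str]]:
    semaphore = asyncio.Semaphore(2)  # at most two requests in flight
    return [page async for page in page_generator({"datasetId": 123456}, 3, semaphore)]


pages = asyncio.run(collect())
print(pages)
```

An async generator is consumed with `async for`, as in `collect` above, rather than being passed directly to `run_until_complete`.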
- get_nested(project_id, dataset_id)
Returns a list of all nested datasets in the specified dataset.
- Parameters
- Returns
List of nested datasets.
- Return type
List[DatasetInfo]
- Usage example
    import supervisely as sly

    api = sly.Api.from_env()

    project_id = 123
    dataset_id = 456

    datasets = api.dataset.get_nested(project_id, dataset_id)
    for dataset in datasets:
        print(dataset.name, dataset.id)  # Output: ds1 123
- get_or_create(project_id, name, description='', parent_id=None)
Checks if a Dataset with the given name already exists in the Project; if not, creates a Dataset with that name. If parent_id is specified, the search is performed in the specified Dataset; otherwise it is performed at the top level of the Project.
- Parameters
- project_id : int
Project ID in Supervisely.
- name : str
Dataset name.
- description : str, optional
Dataset description.
- parent_id : Union[int, None]
Parent Dataset ID. If set to None, then the Dataset will be created at the top level of the Project, otherwise the Dataset will be created in a specified Dataset.
- Returns
Information about Dataset. See info_sequence
- Return type
DatasetInfo
- Usage example
    import os

    import supervisely as sly

    project_id = 116482

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    ds_info = api.dataset.get_list(project_id)
    print(len(ds_info))  # 1

    api.dataset.get_or_create(project_id, 'ds1')
    ds_info = api.dataset.get_list(project_id)
    print(len(ds_info))  # 1

    api.dataset.get_or_create(project_id, 'new_ds')
    ds_info = api.dataset.get_list(project_id)
    print(len(ds_info))  # 2
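The lookup scope described for get_or_create (top level when parent_id is None, otherwise inside the parent Dataset) can be modelled in memory. `FakeDatasetStore` below is a hypothetical stand-in that returns plain integer IDs; it only illustrates the documented semantics and is not how the SDK stores anything.

```python
from typing import Dict, Optional, Tuple


class FakeDatasetStore:
    """In-memory stand-in: datasets are keyed by (parent_id, name)."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[Optional[int], str], int] = {}
        self._next_id = 1

    def get_or_create(self, name: str, parent_id: Optional[int] = None) -> int:
        key = (parent_id, name)
        if key not in self._by_key:
            # Not found at this level, so create it.
            self._by_key[key] = self._next_id
            self._next_id += 1
        return self._by_key[key]


store = FakeDatasetStore()
first = store.get_or_create("ds1")                     # created at the top level
again = store.get_or_create("ds1")                     # found, nothing created
nested = store.get_or_create("ds1", parent_id=first)   # searched inside ds1, so created
print(first, again, nested)  # 1 1 2
```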
- get_tree(project_id)
Returns a tree of all datasets in the project as a dictionary, where the keys are the DatasetInfo objects and the values are dictionaries containing the children of the dataset. Recommended to use with the dataset_tree method to iterate over the tree.
- Parameters
- project_id : int
Project ID for which the tree is built.
- Returns
Dictionary of datasets and their children.
- Return type
Dict[DatasetInfo, Dict]
- Usage example
    import supervisely as sly

    api = sly.Api.from_env()

    project_id = 123

    dataset_tree = api.dataset.get_tree(project_id)
    print(dataset_tree)
    # Output:
    # {
    #     DatasetInfo(id=2532, name="lemons", description="", ...): {
    #         DatasetInfo(id=2557, name="kiwi", description="", ...): {}
    #     }
    # }
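A nested dictionary of this shape can be walked recursively to recover each dataset together with the names of its parents. The sketch below uses a two-field stand-in, `FakeDatasetInfo`, instead of the real DatasetInfo, and `walk` is a hypothetical helper rather than the SDK's own traversal.

```python
from collections import namedtuple
from typing import Dict, Iterator, List, Tuple

# Minimal stand-in for DatasetInfo (the real NamedTuple has more fields).
FakeDatasetInfo = namedtuple("FakeDatasetInfo", ["id", "name"])


def walk(tree: Dict, parents: Tuple[str, ...] = ()) -> Iterator[Tuple[List[str], FakeDatasetInfo]]:
    """Depth-first walk yielding (list of parent names, dataset) for every node."""
    for dataset, children in tree.items():
        yield list(parents), dataset
        yield from walk(children, (*parents, dataset.name))


ds1 = FakeDatasetInfo(1, "ds1")
ds2 = FakeDatasetInfo(2, "ds2")
ds3 = FakeDatasetInfo(3, "ds3")
tree = {ds1: {ds2: {ds3: {}}}}

paths = [(parents, ds.name) for parents, ds in walk(tree)]
print(paths)  # [([], 'ds1'), (['ds1'], 'ds2'), (['ds1', 'ds2'], 'ds3')]
```

NamedTuples are hashable, which is why DatasetInfo-like objects can serve as dictionary keys here.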
- static info_sequence()
NamedTuple DatasetInfo containing information about Dataset.
- Example
    DatasetInfo(id=452984,
                name='ds0',
                description='',
                size='3997776',
                project_id=118909,
                images_count=11,
                items_count=11,
                created_at='2021-03-03T15:54:08.802Z',
                updated_at='2021-03-16T09:31:37.063Z',
                reference_image_url='https://app.supervise.ly/h5un6l2bnaz1vj8a9qgms4-public/images/original/K/q/jf/...png',
                team_id=1,
                workspace_id=2)
- move(dst_project_id, id, new_name=None, change_name_if_conflict=False, with_annotations=False)
Moves given Dataset to the destination Project by ID.
- Parameters
- dst_project_id : int
Destination Project ID in Supervisely.
- id : int
ID of moved Dataset.
- new_name : str, optional
New Dataset name.
- change_name_if_conflict : bool, optional
If True and the given name already exists in the destination, adds a suffix to the end of the name.
- with_annotations : bool, optional
If True, moves Dataset with annotations; otherwise moves only the items from Dataset, without annotations.
- Returns
Information about Dataset. See info_sequence
- Return type
DatasetInfo
- Usage example
    import os

    import supervisely as sly

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    dst_proj_id = 1985
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 0

    new_ds = api.dataset.move(dst_proj_id, id=2550, new_name="cucumber", with_annotations=True)
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 1
- move_batch(dst_project_id, ids, new_names=None, change_name_if_conflict=False, with_annotations=False)
Moves given Datasets to the destination Project by IDs.
- Parameters
- dst_project_id : int
Destination Project ID in Supervisely.
- ids : List[int]
IDs of moved Datasets.
- new_names : List[str], optional
New Datasets names.
- change_name_if_conflict : bool, optional
If True and a given name already exists in the destination, adds a suffix to the end of the name.
- with_annotations : bool, optional
If True, moves Datasets with annotations; otherwise moves only the items from Datasets, without annotations.
- Raises
RuntimeError if the “ids” and “new_names” lists cannot be matched, i.e. len(ids) != len(new_names)
- Returns
Information about Datasets. See info_sequence
- Return type
List[DatasetInfo]
- Usage example
    import os

    import supervisely as sly

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    dst_proj_id = 1978
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 0

    ds_ids = [2545, 2560]
    ds_names = ["banana_test", "mango_test"]

    moved_datasets = api.dataset.move_batch(dst_proj_id, ids=ds_ids, new_names=ds_names, with_annotations=True)
    ds = api.dataset.get_list(dst_proj_id)
    print(len(ds))  # 2
- move_to_dataset(dataset_id, destination_dataset_id)
Moves dataset with specified ID to the dataset with specified destination ID.
- Parameters
- Usage example
    import supervisely as sly

    api = sly.Api.from_env()

    dataset_id = 123
    destination_dataset_id = 456

    api.dataset.move_to_dataset(dataset_id, destination_dataset_id)
- Return type
- remove(id)
Remove an entity with the specified ID from the Supervisely server.
- Parameters
- id : int
Entity ID in Supervisely
- remove_batch(ids, progress_cb=None)
Remove entities with given IDs from the Supervisely server.
- Parameters
- ids : List[int]
IDs of entities in Supervisely.
- progress_cb : Callable
Function for tracking removal progress.
- remove_permanently(ids, batch_size=50, progress_cb=None)
!!! WARNING !!! Be careful, this method deletes data from the database, recovery is not possible.
Permanently deletes datasets with the given IDs from the Supervisely server. All dataset IDs must belong to the same team, so IDs have to be sorted (grouped) by team before calling this method.
- Parameters
- ids : Union[int, List]
IDs of datasets in Supervisely.
- batch_size : int, optional
The number of entities that will be deleted by a single API call. This value must be in the range 1-50 inclusive, if you set a value out of range it will automatically adjust to the boundary values.
- progress_cb : Callable, optional
Function for tracking deletion progress.
- Returns
A list of response content in JSON format for each API call.
- Return type
List[dict]
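The two preparation steps implied above, grouping IDs by team and splitting them into batches of at most 50, can be sketched locally. All names here (`clamp_batch_size`, `batched`, `group_by_team`) are hypothetical helpers, and the (dataset_id, team_id) pairs are made-up input; the 1-50 range comes from the batch_size description.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

MIN_BATCH, MAX_BATCH = 1, 50  # documented range for batch_size


def clamp_batch_size(batch_size: int) -> int:
    """Pull an out-of-range batch size back to the nearest boundary."""
    return max(MIN_BATCH, min(MAX_BATCH, batch_size))


def batched(ids: List[int], batch_size: int) -> Iterator[List[int]]:
    """Split ids into consecutive chunks of the clamped batch size."""
    size = clamp_batch_size(batch_size)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def group_by_team(datasets: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group (dataset_id, team_id) pairs so each call sees a single team."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for dataset_id, team_id in datasets:
        groups[team_id].append(dataset_id)
    return dict(groups)


sizes = [len(batch) for batch in batched(list(range(1, 121)), batch_size=200)]
print(sizes)  # [50, 50, 20]
print(group_by_team([(10, 1), (11, 2), (12, 1)]))  # {1: [10, 12], 2: [11]}
```

Each per-team list would then be passed to remove_permanently in its own call.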
- tree(project_id)
Yields tuples of (path, dataset) for all datasets in the project. Path of the dataset is a list of parents, e.g. [“ds1”, “ds2”, “ds3”]. For root datasets, the path is an empty list.
- Parameters
- project_id : int
Project ID in which the Dataset is located.
- Returns
Generator of tuples of (path, dataset).
- Return type
Generator[Tuple[List[str], DatasetInfo], None, None]
- Usage example
    from typing import List

    import supervisely as sly

    api = sly.Api.from_env()

    project_id = 123

    for parents, dataset in api.dataset.tree(project_id):
        parents: List[str]
        dataset: sly.DatasetInfo
        print(parents, dataset.name)

    # Output:
    # [] ds1
    # ["ds1"] ds2
    # ["ds1", "ds2"] ds3
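Since each yielded path is a list of parent names, a display path can be built by joining it with the dataset's own name. The helper `full_path` and the "/" separator are assumptions for illustration; the input rows mirror the example output above.

```python
from typing import List


def full_path(parents: List[str], name: str, sep: str = "/") -> str:
    """Join parent names and the dataset name into one path string."""
    return sep.join([*parents, name])


# (parents, dataset name) pairs in the shape yielded by tree().
rows = [([], "ds1"), (["ds1"], "ds2"), (["ds1", "ds2"], "ds3")]
paths = [full_path(parents, name) for parents, name in rows]
print(paths)  # ['ds1', 'ds1/ds2', 'ds1/ds2/ds3']
```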
- update(id, name=None, description=None, custom_data=None)
Updates Dataset information by given ID.
- Parameters
- Returns
Information about Dataset. See info_sequence
- Return type
DatasetInfo
- Usage example
    import os

    import supervisely as sly

    dataset_id = 384126

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    new_ds = api.dataset.update(dataset_id, name='new_ds', description='new description')
- update_custom_data(id, custom_data)
Update custom data for Dataset by given ID. Custom data is a dictionary that can store any additional information about the Dataset.
- Parameters
- id : int
Dataset ID in Supervisely.
- custom_data : Dict[Any, Any]
New custom data.
- Returns
Information about Dataset. See info_sequence
- Return type
DatasetInfo
- Usage example
    import os

    import supervisely as sly

    dataset_id = 384126

    os.environ['SERVER_ADDRESS'] = 'https://app.supervisely.com'
    os.environ['API_TOKEN'] = 'Your Supervisely API Token'
    api = sly.Api.from_env()

    new_ds = api.dataset.update_custom_data(dataset_id, custom_data={'key': 'value'})