fs
Functions
- archive_directory: Create a tar archive from a directory and optionally split it into parts of a specified size.
- change_directory_at_index: Change the directory name at a given index in a path.
- clean_dir: Recursively delete a directory tree, but keep the root directory.
- copy_file: Copy a file from one path to another; if the destination directory doesn't exist, it is created.
- copy_file_async: Asynchronously copy a file from one path to another; if the destination directory doesn't exist, it is created.
- Check whether a directory is empty.
- Check whether a directory exists.
- dirs_filter: Generator that yields paths to directories that meet the requirements of the check_function.
- dirs_with_marker: Generator that yields paths to directories that contain marker files.
- download: Download a file (e.g. an image) from a URL to the target path on the host.
- Recursively create the parent directory for the target path.
- file_exists: Check whether a file exists.
- Get the size of a directory.
- Extract the file extension from a given path.
- Get the hash of the target file.
- Get the hash of the target file asynchronously.
- get_file_hash_chunked: Get the hash of the target file by reading it in chunks.
- get_file_hash_chunked_async: Asynchronously get the hash of the target file by reading it in chunks.
- Extract the file name from a given path.
- Extract the file name with extension from a given path.
- get_file_offsets_batch_generator: Extract offset information for files in TAR archives and return a generator that yields it in batches.
- Get the size of a file.
- get_subdirs: Get a list of the names of the directories in the given directory.
- get_subdirs_tree: Return a dictionary representing the directory tree.
- global_to_relative: Convert a global path to a relative path.
- hardlink_or_copy_file: Create a hard link named dst pointing to src.
- hardlink_or_copy_tree: Recursively create hard links in dst pointing to the files in src.
- is_archive: Check whether a file is an archive by its mimetype, using a list of the most common archive mimetypes.
- is_on_agent: Check whether remote_path is on an agent (i.e. starts with 'agent://<agent-id>/').
- list_dir_recursively: Recursively walk through a directory and return a list of all file paths, and optionally subdirectory paths.
- list_files: Return a list of file paths in the given directory.
- list_files_recursively: Recursively walk through a directory and return a list of all file paths.
- list_files_recursively_async: Recursively list files in a directory asynchronously.
- log_tree: Get the tree for the target directory and display it in the log.
- mkdir: Create a leaf directory and all intermediate ones.
- parse_agent_id_and_path: Return the agent ID and the path in the agent folder from remote_path.
- remove_dir: Recursively delete a directory tree.
- remove_junk_from_dir: Remove junk files and directories (e.g. .DS_Store, __MACOSX, Thumbs.db) from the given directory.
- save_blob_offsets_pkl: Process a blob file locally and create a pickle file with offset information.
- Remove a file that may not exist.
- Check whether a string is a valid URL.
- string_to_byte_size: Return the integer representation of a byte size from its string representation.
- subdirs_tree: Generator that yields directories in the directory tree, starting one level below the root directory and going down the tree.
- Set access and modification times for a file.
- Set access and modification times for a file asynchronously.
- tree: Get the tree for the target directory.
- unpack_archive: Unpack an archive to the target directory and remove junk files and directories.
- unpack_archive_async: Unpack an archive to the target directory and remove junk files and directories (asynchronous version).
Description
File system utilities for Supervisely.
- archive_directory(dir_, tar_path, split=None, chunk_size_mb=50)
Create a tar archive from a directory and optionally split it into parts of a specified size. You can adjust the size of the chunk read from the file while archiving it into parts. Use this parameter with care, as it can affect the function's performance. When splitting, if the split size is smaller than the chunk size, the chunk size is reduced to the split size.
- Parameters:
- dir_ : str
Target directory path.
- tar_path : str
Path for the output tar archive.
- split : Union[int, str]
Split the archive into parts of the specified size, given in bytes or as a size with a suffix (e.g. '1Kb' = 1024, '1Mb' = 1024 * 1024). Default is None.
- chunk_size_mb : int
Size of the chunk to read from the file, in megabytes. Default is 50.
- Returns:
None, or a list of archive part paths if split is not None.
- Return type:
Union[None, List[str]]
- Usage Example:
from supervisely.io.fs import archive_directory
# If split is not needed.
archive_directory('/home/admin/work/projects/examples', '/home/admin/work/examples.tar')
# If split is specified.
archive_parts_paths = archive_directory('/home/admin/work/projects/examples', '/home/admin/work/examples/archive.tar', split=1000000)
print(archive_parts_paths)
# ['/home/admin/work/examples/archive.tar.001', '/home/admin/work/examples/archive.tar.002']
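To make the split behaviour concrete, the sketch below splits an existing file into numbered parts using only the standard library. It is not archive_directory's implementation: split_file is a hypothetical helper that only mirrors two things stated above, the archive.tar.001, archive.tar.002 naming and the rule that a chunk larger than the split size is reduced to it.

```python
import os
from typing import List


def split_file(path: str, part_size: int, chunk_size: int = 50 * 1024 * 1024) -> List[str]:
    """Hypothetical helper: split `path` into <path>.001, <path>.002, ... parts."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # As described above: a chunk larger than the split size is reduced to it.
    chunk_size = min(chunk_size, part_size)
    total = os.path.getsize(path)
    parts: List[str] = []
    offset = 0
    with open(path, "rb") as src:
        while offset < total:
            part_path = f"{path}.{len(parts) + 1:03d}"
            remaining = min(part_size, total - offset)
            with open(part_path, "wb") as dst:
                # Copy this part chunk by chunk so a part is never fully loaded into memory.
                while remaining > 0:
                    data = src.read(min(chunk_size, remaining))
                    if not data:
                        raise IOError(f"{path} shrank while it was being split")
                    dst.write(data)
                    remaining -= len(data)
                    offset += len(data)
            parts.append(part_path)
    return parts
```

Reading in bounded chunks keeps memory use proportional to chunk_size rather than to the part size.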
- change_directory_at_index(path, dir_name, dir_index)
Change the directory name at the given index in a path. When counting from the end (negative indices), keep in mind that if the path ends with a file name, the file occupies the last index (-1).
- Parameters:
- Returns:
New path
- Return type:
- Raises:
IndexError – If the directory index is out of bounds for the given path.
- Usage Example:
import supervisely as sly
input_path = 'head/dir_1/file.txt'
new_path = sly.io.fs.change_directory_at_index(input_path, 'dir_2', -2)
print(new_path)
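A minimal sketch of the indexing rule, assuming '/'-separated paths; replace_path_component is a hypothetical name and not the library's code. Because the trailing file name takes index -1, index -2 in the example above points at dir_1.

```python
from typing import List


def replace_path_component(path: str, new_name: str, index: int) -> str:
    """Hypothetical helper: replace the path component at `index` (negative counts from the end)."""
    components: List[str] = path.split("/")
    # Same bounds as Python list indexing; anything outside raises IndexError.
    if not -len(components) <= index < len(components):
        raise IndexError(f"Index {index} is out of bounds for path {path!r}")
    components[index] = new_name
    return "/".join(components)


print(replace_path_component("head/dir_1/file.txt", "dir_2", -2))  # head/dir_2/file.txt
```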
- clean_dir(dir_, ignore_errors=True)
Recursively delete a directory tree, but keep the root directory.
- Parameters:
- dir_ : str
Target directory path.
- ignore_errors : bool
Ignore possible errors while removing the directory content. Default is True.
- Returns:
None
- Return type:
None
- Usage Example:
from supervisely.io.fs import clean_dir
clean_dir('/home/admin/work/projects/examples')
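For comparison, a standard-library sketch of "delete the contents but keep the root". clean_dir_sketch is a hypothetical name; how the library itself treats symlinks and errors is not documented here, so the choices noted in the comments are assumptions.

```python
import os
import shutil


def clean_dir_sketch(dir_path: str, ignore_errors: bool = True) -> None:
    """Hypothetical stdlib equivalent: delete everything inside dir_path but keep dir_path."""
    with os.scandir(dir_path) as it:
        entries = list(it)  # snapshot first so deletions do not disturb iteration
    for entry in entries:
        try:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)  # a real subdirectory and all of its contents
            else:
                os.remove(entry.path)  # files and symlinks (links are not followed)
        except OSError:
            if not ignore_errors:
                raise
```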
- copy_file(src, dst)
Copy a file from one path to another. If the destination directory doesn't exist, it will be created.
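The create-missing-parent behaviour can be sketched with os.makedirs and shutil; copy_file_sketch is a hypothetical helper, not the library's code.

```python
import os
import shutil


def copy_file_sketch(src: str, dst: str) -> None:
    """Hypothetical stdlib equivalent of the behaviour described for copy_file."""
    parent = os.path.dirname(dst)
    if parent:
        # Create the missing destination directory and any intermediate ones.
        os.makedirs(parent, exist_ok=True)
    shutil.copyfile(src, dst)  # copies file contents only
```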
- async copy_file_async(src, dst, progress_cb=None, progress_cb_type='size')
Asynchronously copy a file from one path to another. If the destination directory doesn't exist, it will be created.
- Parameters:
- Returns:
None
- Return type:
None
- Usage Example:
import supervisely as sly
from supervisely._utils import run_coroutine
coroutine = sly.fs.copy_file_async('/home/admin/work/projects/example/1.png', '/home/admin/work/tests/2.png')
run_coroutine(coroutine)
- dirs_filter(input_path, check_function)
Generator that yields paths to directories that meet the requirements of the check_function.
- Parameters:
- Usage Example:
import os
import supervisely as sly
input_path = '/home/admin/work/projects/examples'
# Prepare the check function.
def check_function(directory) -> bool:
    images_dir = os.path.join(directory, "images")
    annotations_dir = os.path.join(directory, "annotations")
    return os.path.isdir(images_dir) and os.path.isdir(annotations_dir)
for directory in sly.fs.dirs_filter(input_path, check_function):
    # Now you can be sure that the directory meets the requirements.
    # Do something with it.
    print(directory)
- dirs_with_marker(input_path, markers, check_function=None, ignore_case=False)
Generator that yields paths to directories that contain marker files. If check_function is specified, a marked directory is yielded only if check_function returns True. check_function must take a single argument, the path to the marked directory, and return True or False.
- Parameters:
- input_path : str
Path to the directory in which the search is performed.
- markers : Union[str, List[str]]
A single marker or a list of markers (e.g. 'config.json' or ['config.json', 'config.yaml']).
- check_function : Callable
Function that checks whether a directory meets the requirements and returns a bool.
- ignore_case : bool
Ignore case when searching for markers.
- Usage Example:
import os
import supervisely as sly
input_path = '/home/admin/work/projects/examples'
# You can pass a string if you have only one marker.
# markers = 'config.json'
# Or a list of strings if you have several markers.
# There's no need to pass one marker in different cases, you can use ignore_case=True for this.
markers = ['config.json', 'config.yaml']
# Check function is optional, if you don't need the directories to meet any requirements,
# you can omit it.
def check_function(dir_path):
    test_file_path = os.path.join(dir_path, 'test.txt')
    return os.path.exists(test_file_path)
for directory in sly.fs.dirs_with_marker(input_path, markers, check_function, ignore_case=True):
    # Now you can be sure that the directory contains the markers and meets the requirements.
    # Do something with it.
    print(directory)
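Roughly how a marker search with ignore_case can work, as a standard-library sketch. find_marked_dirs is a hypothetical name, and it assumes a directory qualifies when it contains any one of the markers; the text above does not say whether all markers are required.

```python
import os
from typing import Callable, Iterator, List, Optional, Union


def find_marked_dirs(
    input_path: str,
    markers: Union[str, List[str]],
    check_function: Optional[Callable[[str], bool]] = None,
    ignore_case: bool = False,
) -> Iterator[str]:
    """Hypothetical sketch: yield directories under input_path that contain a marker file."""
    if isinstance(markers, str):
        markers = [markers]
    wanted = {m.lower() for m in markers} if ignore_case else set(markers)
    for root, _dirs, files in os.walk(input_path):
        names = {f.lower() for f in files} if ignore_case else set(files)
        # Assumption: one matching marker is enough (any-of, not all-of).
        if wanted & names and (check_function is None or check_function(root)):
            yield root
```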
- download(url, save_path, cache=None, progress=None, headers=None, timeout=None)
Download a file (e.g. an image) from a URL and save it to the target path on the host.
- Parameters:
- url : str
URL of the file to download.
- save_path : str
The path where the file is saved.
- cache : FileCache, optional
An instance of the FileCache class that provides caching functionality for the downloaded content. If None, caching is disabled.
- progress : Progress, optional
Progress object for tracking download progress.
- headers : Dict, optional
A dictionary of HTTP headers to include in the request.
- timeout : int, optional
The maximum number of seconds to wait for a response from the server. If the server does not respond within the timeout period, a TimeoutError is raised.
- Returns:
Full path to downloaded image
- Return type:
- Usage Example:
from supervisely.io.fs import download
img_link = 'https://m.media-amazon.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_.jpg'
im_path = download(img_link, '/home/admin/work/projects/examples/avatar.jpeg')
print(im_path)
# Output:
# /home/admin/work/projects/examples/avatar.jpeg
# If you need to specify some headers:
headers = {'User-Agent': 'Mozilla/5.0'}
im_path = download(img_link, '/home/admin/work/projects/examples/avatar.jpeg', headers=headers)
print(im_path)
# Output:
# /home/admin/work/projects/examples/avatar.jpeg
- file_exists(path)
Check whether a file exists.
- Parameters:
- path : str
Target file path.
- Returns:
True if file exists, False otherwise.
- Return type:
- Usage Example:
from supervisely.io.fs import file_exists
file_exists('/home/admin/work/projects/examples/1.jpeg')  # True
file_exists('/home/admin/work/projects/examples/not_exist_file.jpeg')  # False
- get_file_hash_chunked(path, chunk_size=1048576)
Get the hash of the target file by reading it in chunks.
- Parameters:
- Returns:
File hash as a base64 encoded string.
- Return type:
- Usage Example:
from supervisely.io.fs import get_file_hash_chunked
file_hash = get_file_hash_chunked('/home/admin/work/projects/examples/1.jpeg')
print(file_hash)
# Example output: rKLYA/p/P64dzidaQ/G7itxIz3ZCVnyUhEE9fSMGxU4=
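The example output is 44 base64 characters, i.e. a 32-byte digest, which is consistent with SHA-256. Assuming SHA-256 (the algorithm is not stated on this page), chunked hashing can be sketched with hashlib; file_hash_chunked is a hypothetical name, not the library function.

```python
import base64
import hashlib


def file_hash_chunked(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hypothetical sketch: SHA-256 of a file, read in chunks, returned as base64 text."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("utf-8")
```

Only one chunk is held in memory at a time, so the result is the same as hashing the whole file at once but with bounded memory use.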
- async get_file_hash_chunked_async(path, chunk_size=1048576)
Asynchronously get the hash of the target file by reading it in chunks.
- get_file_offsets_batch_generator(archive_path, team_file_id=None, filter_func=None, output_format='dicts', batch_size=10000)
Extracts offset information for files in TAR archives and returns a generator that yields the information in batches.
- Parameters:
- archive_path : str
Local path to the archive.
- team_file_id : Optional[int]
ID of the file in Team Files. Default is None. team_file_id may be None if it's not possible to obtain the ID at this moment; you can set team_file_id later when uploading the file to Supervisely.
- filter_func : Callable, optional
Function to filter files. It should take a file name as input and return True if the file should be included.
- output_format : Literal["dicts", "objects"]
Format of the output. Default is "dicts". "objects" returns a list of BlobImageInfo objects; "dicts" returns a list of dictionaries.
- batch_size : int, optional
Number of files in each batch. Default is 10000.
- Returns:
Generator yielding batches of file information in the specified format.
- Return type:
Generator[Union[List[Dict], List[BlobImageInfo]], None, None]
- Raises:
ValueError – If the archive type is not supported or contains compressed files
- Usage Example:
import supervisely as sly
archive_path = '/home/admin/work/projects/examples.tar'
file_infos = sly.fs.get_file_offsets_batch_generator(archive_path)
for batch in file_infos:
    print(batch)
# Output:
# [
#     {
#         "title": "image1.jpg",
#         "teamFileId": None,
#         "sourceBlob": {
#             "offsetStart": 0,
#             "offsetEnd": 123456
#         }
#     },
#     {
#         "title": "image2.jpg",
#         "teamFileId": None,
#         "sourceBlob": {
#             "offsetStart": 123456,
#             "offsetEnd": 234567
#         }
#     }
# ]
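The offsets can be derived from the TAR headers themselves. The sketch below, a hypothetical tar_offset_batches built on the standard tarfile module, assumes offsetStart is the position of a member's data inside the archive and offsetEnd is that position plus the member size (exclusive); the library's exact definitions, and whether title holds the full member name or only the base name, are not documented here. The numbers in the example output above appear illustrative, since in a real TAR the first member's data starts after its header block.

```python
import tarfile
from typing import Callable, Dict, Iterator, List, Optional


def tar_offset_batches(
    archive_path: str,
    team_file_id: Optional[int] = None,
    filter_func: Optional[Callable[[str], bool]] = None,
    batch_size: int = 10000,
) -> Iterator[List[Dict]]:
    """Hypothetical sketch: yield batches of byte ranges for regular files in an uncompressed TAR."""
    batch: List[Dict] = []
    # "r:" opens uncompressed archives only; compressed ones raise tarfile.ReadError.
    with tarfile.open(archive_path, mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue
            if filter_func is not None and not filter_func(member.name):
                continue
            # offset_data is where the member's content starts, after its header block(s).
            batch.append(
                {
                    "title": member.name,
                    "teamFileId": team_file_id,
                    "sourceBlob": {
                        "offsetStart": member.offset_data,
                        "offsetEnd": member.offset_data + member.size,
                    },
                }
            )
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch
```

With these offsets, a single image can later be read from the blob with one seek and one read, without unpacking the archive.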
- get_subdirs(dir_path, recursive=False)
Get a list of the names of the directories in the given directory.
- Parameters:
- Returns:
List of directory names.
- Return type:
- Usage Example:
from supervisely.io.fs import get_subdirs
subdirs = get_subdirs('/home/admin/work/projects/examples')
print(subdirs)
# Output: ['tests', 'users', 'ds1']
- get_subdirs_tree(dir_path)
Returns a dictionary representing the directory tree. It contains only directories and subdirectories, not files.
- Parameters:
- Returns:
Dictionary representing the directory tree.
- Return type:
Dict[str, Union[str, Dict]]
- Usage Example:
from supervisely.io.fs import get_subdirs_tree
tree = get_subdirs_tree('/home/admin/work/projects/examples')
print(tree)
# Output: {'examples': {'tests': {}, 'users': {}, 'ds1': {}}}
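A directories-only nested dictionary like the one above can be built recursively with os.scandir. subdirs_dict is a hypothetical name; it sorts names for reproducibility and does not follow symlinked directories, both of which are assumptions rather than documented library behaviour.

```python
import os
from typing import Dict


def subdirs_dict(dir_path: str) -> Dict[str, Dict]:
    """Hypothetical sketch: nested dict of directory names only (files are skipped)."""

    def build(path: str) -> Dict[str, Dict]:
        with os.scandir(path) as entries:
            names = sorted(e.name for e in entries if e.is_dir(follow_symlinks=False))
        # Each subdirectory maps to its own nested dict; leaf directories map to {}.
        return {name: build(os.path.join(path, name)) for name in names}

    # The top-level key is the root directory's own name, as in the example output.
    root_name = os.path.basename(os.path.normpath(dir_path))
    return {root_name: build(dir_path)}
```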
- global_to_relative(global_path, base_dir)
Converts a global path to a path relative to base_dir.
- Parameters:
- Returns:
Relative path.
- Return type:
- Usage Example:
from supervisely.io.fs import global_to_relative
relative_path = global_to_relative('/home/admin/work/projects/examples/1.jpeg', '/home/admin/work/projects')
print(relative_path)
# Output: examples/1.jpeg
- hardlink_or_copy_file(src, dst)
Creates a hard link named dst pointing to src. If the link cannot be created, the file is copied instead.
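The link-then-copy fallback can be sketched with os.link and shutil.copy2; hardlink_or_copy_sketch is a hypothetical helper, and treating any OSError as "the link cannot be created" is an assumption.

```python
import os
import shutil


def hardlink_or_copy_sketch(src: str, dst: str) -> None:
    """Hypothetical sketch: try a hard link first and fall back to a regular copy."""
    try:
        # Fails e.g. across filesystems, on filesystems without hard links, or if dst exists.
        os.link(src, dst)
    except OSError:
        # copy2 overwrites an existing dst and preserves timestamps and permission bits.
        shutil.copy2(src, dst)
```

A hard link costs no extra disk space, which is the usual reason to prefer it over a copy when both paths are on the same filesystem.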
- hardlink_or_copy_tree(src, dst)
Recursively creates hard links in dst pointing to the files in src. If a link cannot be created, the file is copied instead.
- is_archive(file_path)
Checks whether the file is an archive by its mimetype, using a list of the most common archive mimetypes.
- is_on_agent(remote_path)
Checks whether remote_path is on an agent (i.e. starts with 'agent://<agent-id>/').
- list_dir_recursively(dir, include_subdirs=False, use_global_paths=False)
Recursively walks through a directory and returns a list of all file paths, and optionally subdirectory paths.
- Parameters:
- Returns:
List containing file paths, and optionally subdirectory paths.
- Return type:
List[str]
- Usage Example:
import supervisely as sly
list_dir = sly.fs.list_dir_recursively("/home/admin/work/projects/lemons_annotated/")
print(list_dir)
# Output: ['meta.json', 'ds1/ann/IMG_0748.jpeg.json', 'ds1/ann/IMG_4451.jpeg.json', 'ds1/img/IMG_0748.jpeg', 'ds1/img/IMG_4451.jpeg']
- list_files(dir, valid_extensions=None, filter_fn=None, ignore_valid_extensions_case=False)
Returns a list of file paths in the given directory. The list can be filtered by valid extensions and by a filter function, and the extension check can be made case-insensitive.
- Parameters:
- dir : str
Target directory path.
- valid_extensions : List[str]
List of valid file extensions.
- filter_fn : Callable, optional
Function with a single argument, a file path, that determines whether to keep that path. Must return True or False.
- ignore_valid_extensions_case : bool
If True, file extensions are validated case-insensitively.
- Returns:
List with file paths
- Return type:
List[str]
- Usage Example:
import supervisely as sly
list_files = sly.fs.list_files("/home/admin/work/projects/lemons_annotated/ds1/img/")
print(list_files)
# Output: ['/home/admin/work/projects/lemons_annotated/ds1/img/IMG_0748.jpeg', '/home/admin/work/projects/lemons_annotated/ds1/img/IMG_4451.jpeg']
- list_files_recursively(dir, valid_extensions=None, filter_fn=None, ignore_valid_extensions_case=False)
Recursively walks through a directory and returns a list of all file paths. The list can be filtered by valid extensions and by a filter function.
- Parameters:
- dir : str
Target directory path.
- valid_extensions : List[str], optional
List of valid file extensions.
- filter_fn : Callable, optional
Function with a single argument, a file path, that determines whether to keep that path. Must return True or False.
- ignore_valid_extensions_case : bool
If True, file extensions are validated case-insensitively.
- Returns:
List with file paths
- Return type:
List[str]
- Usage Example:
import supervisely as sly
list_files = sly.fs.list_files_recursively("/home/admin/work/projects/lemons_annotated/ds1/img/")
print(list_files)
# Output: ['/home/admin/work/projects/lemons_annotated/ds1/img/IMG_0748.jpeg', '/home/admin/work/projects/lemons_annotated/ds1/img/IMG_4451.jpeg']
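A standard-library sketch of how the two filters combine during the walk; list_files_recursively_sketch is a hypothetical name, and it assumes extensions are given with their leading dot (e.g. '.jpeg').

```python
import os
from typing import Callable, List, Optional


def list_files_recursively_sketch(
    dir_path: str,
    valid_extensions: Optional[List[str]] = None,
    filter_fn: Optional[Callable[[str], bool]] = None,
    ignore_valid_extensions_case: bool = False,
) -> List[str]:
    """Hypothetical sketch: walk dir_path and keep files that pass both filters."""
    if valid_extensions is not None and ignore_valid_extensions_case:
        valid_extensions = [ext.lower() for ext in valid_extensions]
    result: List[str] = []
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            path = os.path.join(root, name)
            # Assumption: extensions are compared with their leading dot, e.g. ".jpeg".
            ext = os.path.splitext(name)[1]
            if ignore_valid_extensions_case:
                ext = ext.lower()
            if valid_extensions is not None and ext not in valid_extensions:
                continue
            if filter_fn is not None and not filter_fn(path):
                continue
            result.append(path)
    return result
```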
- async list_files_recursively_async(dir_path, valid_extensions=None, filter_fn=None, ignore_valid_extensions_case=False)
Recursively lists files in the directory asynchronously and returns a list of all file paths. The list can be filtered by valid extensions and by a filter function.
- Parameters:
- Returns:
List of file paths
- Return type:
List[str]
- Usage Example:
import supervisely as sly
from supervisely._utils import run_coroutine
dir_path = '/home/admin/work/projects/examples'
coroutine = sly.fs.list_files_recursively_async(dir_path)
files = run_coroutine(coroutine)
- log_tree(dir_path, logger, level='info')
Gets the tree for the target directory and displays it in the log.
- mkdir(dir, remove_content_if_exists=False)
Creates a leaf directory and all intermediate ones.
- parse_agent_id_and_path(remote_path)
Returns the agent ID and the path in the agent folder from remote_path.
- Parameters:
- Returns:
agent id and path in agent folder
- Return type:
- Raises:
ValueError – if remote_path doesn’t start with ‘agent://<agent-id>/’
- Usage Example:
import os
from dotenv import load_dotenv
import supervisely as sly
# Load secrets and create API object from .env file (recommended)
# Learn more here: https://developer.supervisely.com/getting-started/basics-of-authentication
if sly.is_development():
    load_dotenv(os.path.expanduser("~/supervisely.env"))
api = sly.Api.from_env()
# Parse agent id and path in agent folder from remote_path
remote_path = "agent://1/agent_folder/subfolder/file.txt"
agent_id, path_in_agent_folder = sly.fs.parse_agent_id_and_path(remote_path)
print(agent_id)  # 1
print(path_in_agent_folder)  # /agent_folder/subfolder/file.txt
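The agent:// format can be checked and parsed with a regular expression. is_on_agent_sketch and parse_agent_path_sketch are hypothetical stand-ins for is_on_agent and parse_agent_id_and_path, assuming the agent ID is a decimal integer and the rest of the path is returned with its leading '/', as in the example above.

```python
import re
from typing import Tuple

# Assumed format, taken from the text above: agent://<agent-id>/<path in agent folder>
_AGENT_PATH = re.compile(r"agent://(\d+)(/.*)", re.DOTALL)


def is_on_agent_sketch(remote_path: str) -> bool:
    return _AGENT_PATH.fullmatch(remote_path) is not None


def parse_agent_path_sketch(remote_path: str) -> Tuple[int, str]:
    match = _AGENT_PATH.fullmatch(remote_path)
    if match is None:
        raise ValueError("remote_path must start with 'agent://<agent-id>/'")
    # Group 1 is the numeric agent ID, group 2 the path including its leading "/".
    return int(match.group(1)), match.group(2)


print(parse_agent_path_sketch("agent://1/agent_folder/subfolder/file.txt"))
# (1, '/agent_folder/subfolder/file.txt')
```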
- remove_dir(dir_)
Recursively delete a directory tree.
- Parameters:
- dir_ : str
Target directory path.
- Returns:
None
- Return type:
None
- Usage Example:
from supervisely.io.fs import remove_dir
remove_dir('/home/admin/work/projects/examples')
- remove_junk_from_dir(dir)
Removes junk files and directories (e.g. .DS_Store, __MACOSX, Thumbs.db) from the given directory.
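A sketch of junk removal with os.walk; remove_junk_sketch is hypothetical, and JUNK_NAMES holds only the three examples named above, not the library's full list.

```python
import os
import shutil

# Examples taken from the text; the library's own junk list is likely longer.
JUNK_NAMES = {".DS_Store", "__MACOSX", "Thumbs.db"}


def remove_junk_sketch(dir_path: str) -> None:
    """Hypothetical sketch: delete junk files and junk directories anywhere under dir_path."""
    for root, dirs, files in os.walk(dir_path):
        for name in files:
            if name in JUNK_NAMES:
                os.remove(os.path.join(root, name))
        for name in [d for d in dirs if d in JUNK_NAMES]:
            shutil.rmtree(os.path.join(root, name))
            dirs.remove(name)  # stop os.walk from descending into the deleted directory
```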
- save_blob_offsets_pkl(blob_file_path, output_dir, team_file_id=None, filter_func=None, batch_size=10000, replace=False)
Processes a blob file locally and creates a pickle file with offset information.
- Parameters:
- blob_file_path : str
Path to the local blob file.
- output_dir : str
Path to the output directory.
- team_file_id : Optional[int]
ID of the file in Team Files. Default is None. team_file_id may be None if it's not possible to obtain the ID at this moment; you can set team_file_id later when uploading the file to Supervisely.
- filter_func : Callable, optional
Function to filter files. It should take a file name as input and return True if the file should be included.
- batch_size : int, optional
Number of files to process in each batch. Default is 10000.
- replace : bool
If True, overwrite the output file if it already exists. If False, skip processing when the file already exists and return its path. Default is False.
- Returns:
Path to the output pickle file
- Return type:
- Usage Example:
import supervisely as sly
archive_path = '/path/to/examples.tar'
output_dir = '/path/to/output'
sly.fs.save_blob_offsets_pkl(archive_path, output_dir)
- string_to_byte_size(string)
Returns the integer representation of a byte size from its string representation.
If the input is an integer, the same integer is returned for convenience.
- Parameters:
- Returns:
Integer representation of byte size (or the same integer if input is integer).
- Return type:
- Raises:
ValueError – If input string is invalid.
- Usage Example:
from supervisely.io.fs import string_to_byte_size
string_size = "1.5M"
size = string_to_byte_size(string_size)
print(size)  # 1572864
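The parsing rule can be sketched with a regular expression. parse_byte_size is hypothetical; it assumes binary (1024-based) multipliers, which matches '1Kb' = 1024 and '1.5M' = 1572864 above, while the accepted suffixes (K, M, G, T with an optional B, case-insensitive) are a guess rather than the library's documented set.

```python
import re
from typing import Union

# Binary multipliers, matching this page's examples ('1Kb' = 1024, '1.5M' = 1572864).
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", re.IGNORECASE)


def parse_byte_size(value: Union[int, str]) -> int:
    """Hypothetical sketch of the documented behaviour of string_to_byte_size."""
    if isinstance(value, int):
        return value  # integers pass through unchanged
    match = _SIZE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"Invalid byte size string: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _MULTIPLIERS[unit.upper()])


print(parse_byte_size("1.5M"))  # 1572864
```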
- subdirs_tree(dir_path, ignore=None, ignore_content=None)
Generator that yields directories in the directory tree, starting one level below the root directory and going down the tree. If ignore is specified, paths that end with the specified directory names are not yielded, but their subdirectories still are.
- Parameters:
- dir_path : str
Target directory path.
- ignore : List[str]
List of directories to ignore. Note that the function still yields subdirectories of ignored directories; it only skips paths that end with the specified directory names.
- ignore_content : List[str]
List of directories whose subdirectories should be ignored.
- Returns:
Generator that yields directories in the directory tree.
- Return type:
Generator[str, None, None]
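The difference between ignore and ignore_content can be sketched with os.walk. walk_subdirs is a hypothetical name; it assumes both lists are matched against the final path component, yields directories in os.walk's top-down order, and still yields a directory listed in ignore_content while skipping everything below it.

```python
import os
from typing import Iterator, List, Optional


def walk_subdirs(
    dir_path: str,
    ignore: Optional[List[str]] = None,
    ignore_content: Optional[List[str]] = None,
) -> Iterator[str]:
    """Hypothetical sketch of the ignore / ignore_content semantics described above."""
    ignore_set = set(ignore or [])
    ignore_content_set = set(ignore_content or [])
    for root, dirs, _files in os.walk(dir_path):
        if root == dir_path:
            continue  # start one level below the root directory
        name = os.path.basename(root)
        if name not in ignore_set:
            yield root  # `ignore` only hides this path; its subdirectories are still visited
        if name in ignore_content_set:
            dirs[:] = []  # pruning `dirs` in place stops os.walk from descending further
```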
- tree(dir_path)
Gets the tree for the target directory.
- Parameters:
- Returns:
Tree with directory files and subdirectories
- Return type:
- Usage Example:
from supervisely.io.fs import tree
dir_tree = tree('/home/admin/work/projects/examples')
print(dir_tree)
# Output: /home/admin/work/projects/examples
# ├── [4.0K] 1
# │   ├── [165K] crop.jpeg
# │   ├── [169K] fliplr.jpeg
# │   ├── [169K] flipud.jpeg
# │   ├── [166K] relative_crop.jpeg
# │   ├── [167K] resize.jpeg
# │   ├── [169K] rotate.jpeg
# │   ├── [171K] scale.jpeg
# │   └── [168K] translate.jpeg
# ├── [ 15K] 123.jpeg
# ├── [158K] 1.jpeg
# ├── [188K] 1.txt
# ├── [1.3M] 1.zip
# ├── [4.0K] 2
# ├── [ 92K] acura.png
# ├── [1.2M] acura_PNG122.png
# ├── [198K] aston_martin_PNG55.png
# ├── [4.0K] ds1
# │   ├── [4.0K] ann
# │   │   ├── [4.3K] IMG_0748.jpeg.json
# │   │   ├── [ 151] IMG_0777.jpeg.json
# │   │   ├── [ 151] IMG_0888.jpeg.json
# │   │   ├── [3.7K] IMG_1836.jpeg.json
# │   │   ├── [8.1K] IMG_2084.jpeg.json
# │   │   ├── [5.5K] IMG_3861.jpeg.json
# │   │   ├── [6.0K] IMG_4451.jpeg.json
# │   │   └── [5.0K] IMG_8144.jpeg.json
# │   └── [4.0K] img
# │       ├── [152K] IMG_0748.jpeg
# │       ├── [210K] IMG_0777.jpeg
# │       ├── [210K] IMG_0888.jpeg
# │       ├── [137K] IMG_1836.jpeg
# │       ├── [139K] IMG_2084.jpeg
# │       ├── [145K] IMG_3861.jpeg
# │       ├── [133K] IMG_4451.jpeg
# │       └── [136K] IMG_8144.jpeg
# ├── [152K] example.jpeg
# ├── [2.4K] example.json
# ├── [153K] flip.jpeg
# ├── [ 65K] hash1.jpeg
# ├── [ 336] meta.json
# └── [5.4K] q.jpeg
# 5 directories, 37 files
- unpack_archive(archive_path, target_dir, remove_junk=True, is_split=False, chunk_size_mb=50)
Unpacks an archive to the target directory and removes junk files and directories. To extract a split archive, pass the path to the first part in archive_path; all parts must be in the same directory and named archive_name.tar.001, archive_name.tar.002, etc. Works with tar and zip. You can adjust the size of the chunk read from the file while unpacking the file from parts. Use this parameter with care, as it can affect the function's performance.
- Parameters:
- archive_path : str
Path to the archive.
- target_dir : str
Path to the target directory.
- remove_junk : bool
Remove junk files and directories. Default is True.
- is_split : bool
Determines whether the source archive is split into parts. If True, archive_path must be the path to the first part. Default is False.
- chunk_size_mb : int
Size of the chunk to read from the file, in megabytes. Default is 50.
- Returns:
None
- Return type:
None
- Usage Example:
import supervisely as sly
archive_path = '/home/admin/work/examples.tar'
target_dir = '/home/admin/work/projects'
sly.fs.unpack_archive(archive_path, target_dir)
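Reassembling split parts amounts to concatenating them in order before unpacking. join_split_parts is a hypothetical helper, not part of the library; it assumes the archive_name.tar.001, archive_name.tar.002 naming described above and that all parts sit next to the first one. The joined file can then be opened with the standard tarfile module.

```python
import glob
import os
import re
import shutil
from typing import List


def join_split_parts(first_part: str, output_path: str, chunk_size_mb: int = 50) -> List[str]:
    """Hypothetical sketch: concatenate archive.tar.001, .002, ... into output_path."""
    base, suffix = os.path.splitext(first_part)  # e.g. "archive.tar", ".001"
    if not re.fullmatch(r"\.\d{3}", suffix):
        raise ValueError(f"{first_part!r} does not look like a numbered part (e.g. archive.tar.001)")
    # Zero-padded three-digit suffixes sort correctly as plain strings.
    parts = sorted(glob.glob(glob.escape(base) + ".[0-9][0-9][0-9]"))
    with open(output_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part in chunks instead of reading it into memory at once.
                shutil.copyfileobj(src, dst, chunk_size_mb * 1024 * 1024)
    return parts
```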
- async unpack_archive_async(archive_path, target_dir, remove_junk=True, is_split=False, chunk_size_mb=50)
Unpacks an archive to the target directory and removes junk files and directories. To extract a split archive, pass the path to the first part in archive_path; all parts must be in the same directory and named archive_name.tar.001, archive_name.tar.002, etc. Works with tar and zip. You can adjust the size of the chunk read from the file while unpacking the file from parts. Use this parameter with care, as it can affect the function's performance.
- Parameters:
- archive_path : str
Path to the archive.
- target_dir : str
Path to the target directory.
- remove_junk : bool
Remove junk files and directories. Default is True.
- is_split : bool
Determines whether the source archive is split into parts. If True, archive_path must be the path to the first part. Default is False.
- chunk_size_mb : int
Size of the chunk to read from the file, in megabytes. Default is 50.
- Returns:
None
- Return type:
None
- Usage Example:
import supervisely as sly
from supervisely._utils import run_coroutine
archive_path = '/home/admin/work/examples.tar'
target_dir = '/home/admin/work/projects'
coroutine = sly.fs.unpack_archive_async(archive_path, target_dir)
run_coroutine(coroutine)