geopathfinder package
Subpackages
- geopathfinder.naming_conventions namespace
  - Submodules
    - geopathfinder.naming_conventions.acube_naming module
    - geopathfinder.naming_conventions.bmon_naming module
    - geopathfinder.naming_conventions.eodr_naming module
    - geopathfinder.naming_conventions.sgrt_naming module
      - SgrtFilename: decode_date(), decode_rel_orbit(), decode_time(), delimiter, encode_date(), encode_rel_orbit(), encode_time(), etime, fields_def, from_filename(), ftile, pad, product_id, stime, time
      - sgrt_path()
      - sgrt_tree()
    - geopathfinder.naming_conventions.yeoda_naming module
      - YeodaFilename: decode_datetime(), decode_extra_field(), delimiter, encode_datetime(), encode_extra_field(), etime, fields_def, from_filename(), ftile, pad, stime, time
      - yeoda_path()
      - yeoda_tree()
Submodules
geopathfinder.file_naming module
- class geopathfinder.file_naming.SmartFilename(fields, fields_def, ext=None, pad='-', delimiter='_', convert=False, compact=False)
Bases: object
SmartFilename class handles file names with pre-defined field names and field lengths.
- classmethod from_filename(filename_str, fields_def, pad='-', delimiter='_', convert=False, compact=False)
Converts a filename given as a string into a SmartFilename class object.
- Parameters:
filename_str (str) – Filename without any paths (e.g., “M20170725_test.tif”).
fields_def (OrderedDict) –
- Field names (keys) in the correct order, each mapped to its definition (values), which contains:
- “len”: int
Length of the filename part (must be given): a positive number, or 0 to allow any length.
- “start”: int, optional
Start index of the filename part (default is 0).
- “delim”: str, optional
Delimiter between this and the following filename part (default is the one from the parent class).
- “pad”: str, optional
Padding for the filename part (default is the one from the parent class).
pad (str, optional) – Padding symbol (default: ‘-’).
delimiter (str, optional) – Delimiter (default: ‘_’).
convert (bool, optional) – If True, decoding is applied to those parts of the filename for which such an operation is available (default is False).
- Returns:
Class representing a filename.
- Return type:
SmartFilename
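To illustrate how a fields_def of this shape describes a filename, the following sketch splits the example filename with a small standalone helper. The helper `split_fields` and the field names `pflag`, `dtime` and `name` are hypothetical and not part of geopathfinder. The sketch ignores the optional “start” key and does not claim to reproduce the library's parsing logic.

```python
from collections import OrderedDict


def split_fields(filename, fields_def, delimiter="_", pad="-"):
    """Split a filename into named parts (illustrative stand-in, not the library code)."""
    stem, _, ext = filename.partition(".")
    parts = OrderedDict()
    pos = 0
    for name, spec in fields_def.items():
        length = spec["len"]
        delim = spec.get("delim", delimiter)
        field_pad = spec.get("pad", pad)
        if length == 0:
            # "len" of 0: read up to the next delimiter, or to the end of the stem
            end = stem.find(delim, pos) if delim else -1
            if end == -1:
                end = len(stem)
        else:
            end = pos + length
        # Assumption: padding symbols carry no information and can be stripped
        parts[name] = stem[pos:end].strip(field_pad)
        pos = end + len(delim)
    return parts, ext


# Hypothetical field layout for "M20170725_test.tif"
fields_def = OrderedDict([
    ("pflag", {"len": 1, "delim": ""}),  # one character, no delimiter after it
    ("dtime", {"len": 8}),               # date as YYYYMMDD
    ("name", {"len": 4}),
])

parts, ext = split_fields("M20170725_test.tif", fields_def)
print(dict(parts), ext)
```

The ordered dictionary matters here because the position of each part follows from the lengths and delimiters of all the parts before it.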
- class geopathfinder.file_naming.SmartFilenamePart(arg, start=0, length=None, delimiter='_', pad='-', decoder=None, encoder=None, compact=False)
Bases: object
Represents a part of a filename.
- property decoded
Converts filename part to a decoded (object) representation.
- Returns:
Decoded (object) representation of a filename part.
- Return type:
- property encoded
Converts filename part to an encoded (string) representation.
- Returns:
Encoded (string) representation of a filename part.
- Return type:
str
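The relationship between the encoded (string) and decoded (object) representation can be sketched with two standalone functions. `encode_part` and `decode_part` are hypothetical names. Right-side padding and the error raised on overlong values are assumptions made for this illustration, not documented behaviour of SmartFilenamePart.

```python
from datetime import date, datetime


def encode_part(value, length, pad="-", encoder=None):
    """Object -> fixed-width string (assumption: padding is appended on the right)."""
    text = encoder(value) if encoder is not None else str(value)
    if length and len(text) > length:
        raise ValueError(f"{text!r} is longer than {length} characters")
    return text.ljust(length, pad) if length else text


def decode_part(text, pad="-", decoder=None):
    """Fixed-width string -> object, after removing the trailing padding."""
    stripped = text.rstrip(pad)
    return decoder(stripped) if decoder is not None else stripped


# A short variable name is padded to the field length
encoded_var = encode_part("SSM", 6)

# A date is encoded with a custom encoder and restored with the matching decoder
encoded_day = encode_part(date(2017, 7, 25), 8, encoder=lambda d: d.strftime("%Y%m%d"))
decoded_day = decode_part(
    encoded_day, decoder=lambda s: datetime.strptime(s, "%Y%m%d").date()
)
```

Passing the encoder and decoder as callables keeps the fixed-width handling independent of the type stored in the part.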
geopathfinder.folder_naming module
Module handling folder trees.
- class geopathfinder.folder_naming.NullSmartPath
Bases: SmartPath
Class for non-existing paths. Helps to avoid errors.
- class geopathfinder.folder_naming.NullSmartTree(root)
Bases: SmartTree
Class for non-existing paths. Helps to avoid errors.
- class geopathfinder.folder_naming.SmartPath(levels, hierarchy, make_dir=False)
Bases: object
Base class for the single path structure to a data set. It allows building a path, searching files with temporal slicing, and creating a pandas.DataFrame from a folder.
- base_onto_root(root)
Adds ‘root’ as the first level of the SmartPath instance.
- Parameters:
root (str) – String of the root directory
- build_file_register(down_to_level=None, up_to_level=None, pattern='.')
Builds a file register collecting files at all levels in the SmartPath.
- Parameters:
- get_disk_usage(unit=None, up_to_level=None, down_to_level=None, file_pattern='.')
Computes the disk usage for each SmartPath and creates a Pandas DataFrame.
- Parameters:
unit (str) – unit in which the disk usage is reported (e.g., “GB”, “TB”, …)
down_to_level (str, optional) – deepest level that should be included in the file register
up_to_level (str, optional) – highest level that should be included in the file register
file_pattern (str tuple, optional) – strings defining file pattern that are included in disk usage sums e.g. (‘C1003’, ‘E048N012T6’)
- Returns:
disk usage of all files along the SmartPath
- Return type:
Number
- remove_level(level)
Removes a level from the hierarchy and the level dictionary of the SmartPath instance.
- Parameters:
level (str) – Name of level in hierarchy.
- search_files(level, pattern='.', full_paths=False)
Searches files matching the regex pattern at the given level in the SmartPath.
- Parameters:
- Returns:
filenames – File names at the level.
- Return type:
- search_files_ts(level, pattern='.', date_position=1, date_format='%Y%m%d_%H%M%S', starttime=None, endtime=None, full_paths=False)
Searches files at a level in the SmartPath and returns the filenames and the datetimes as a pandas.DataFrame.
- Parameters:
level (str) – name of level in hierarchy
pattern (str tuple, optional) – strings defining search pattern for file search e.g. (‘C1003’, ‘E048N012T6’)
date_position (int) – position of first character of date string in name of files
date_format (str) – string with the datetime format in the filenames. e.g. ‘%Y%m%d_%H%M%S’ reflects ‘20161224_000000’
starttime (str or datetime, optional) – earliest date and time, if str must follow “date_format”
endtime (str or datetime, optional) – latest date and time, if str must follow “date_format”
full_paths (bool, optional) – should full paths be in the dataframe? default: False
- Returns:
df – Dataframe holding the filenames and the datetimes
- Return type:
pandas.DataFrame
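The interplay of pattern, date_position, date_format, starttime and endtime can be sketched with the standard library alone. The function `search_names_ts` and the file names are hypothetical. The sketch returns a list of tuples instead of a DataFrame, treats both time bounds as inclusive (an assumption), and only supports fixed-width date formats.

```python
import re
from datetime import datetime


def search_names_ts(names, pattern=".", date_position=1,
                    date_format="%Y%m%d_%H%M%S", starttime=None, endtime=None):
    """Return (name, datetime) pairs for names matching the pattern and time window."""
    # Width of a formatted timestamp; valid for fixed-width directives such as %Y%m%d
    width = len(datetime(2000, 1, 1).strftime(date_format))

    def as_datetime(value):
        # Strings must follow date_format; datetime objects and None pass through
        if value is None or isinstance(value, datetime):
            return value
        return datetime.strptime(value, date_format)

    start, end = as_datetime(starttime), as_datetime(endtime)
    hits = []
    for name in sorted(names):
        if not re.search(pattern, name):
            continue
        stamp = datetime.strptime(name[date_position:date_position + width], date_format)
        if start is not None and stamp < start:
            continue
        if end is not None and stamp > end:
            continue
        hits.append((name, stamp))
    return hits


names = [
    "M20161224_000000_SSM.tif",
    "M20170101_120000_SSM.tif",
    "M20170725_060000_SIG0.tif",
]
# starttime given as a string in date_format, endtime given as a datetime
hits = search_names_ts(names, starttime="20170101_000000", endtime=datetime(2017, 12, 31))
ssm_hits = search_names_ts(names, pattern="SSM")
```

With the default date_position of 1, the date string starts at the second character, directly after the one-letter prefix of these example names.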
- class geopathfinder.folder_naming.SmartTree(root, hierarchy, make_dir=False)
Bases: object
Class for collecting multiple SmartPaths at one root path.
- collect_level_smartpath(level, pattern=None, unique=False)
Returns a list of SmartPaths reaching a given level and matching a given pattern.
- Parameters:
- Returns:
list of paths at given level, matching the given pattern
- Return type:
list of SmartPaths
- collect_level_string(level, pattern=None, unique=False)
Returns a list of paths at a given level that match a given pattern.
- Parameters:
- Returns:
list of paths at given level, matching the given pattern
- Return type:
- collect_level_topnames(level, pattern=None, unique=True)
Returns a list of topnames of the folders at a given level.
- Parameters:
- Returns:
topnames – list of folder-topnames at given level, matching the given pattern
- Return type:
- copy_smarttree_on_fs(target_dir, level=None, level_pattern='', file_pattern=None, file_list=None)
Copies all files and directories in the SmartTree to a target directory, using shutil.copytree().
- Parameters:
target_dir (str) – the target directory
level (str, optional) – if set, only the branch at given “level” and matching given “pattern” will be copied. e.g. ‘wflow’. Otherwise all below root directory will be copied.
level_pattern (str) – string defining search pattern at given level e.g. ‘A0202’
file_pattern (str or tuple of str, optional) – string patterns for file matching. when starting with “-” it is interpreted as negative pattern that should be excluded from the matches
file_list (list of str, optional) – List with full file paths that should be included in the copy process. Only files that are in both: tree.file_register & file_list are copied! This list might come from a yeoda datacube filtering
- file_register_search(pattern, full_paths=True)
Searches files in the register matching the regex pattern.
- get_all_dirs()
Returns all full paths in the SmartTree.
- Returns:
Sorted list of all full paths
- Return type:
- get_all_smartpaths()
Returns all SmartPaths in the SmartTree.
- Returns:
List of all included SmartPaths
- Return type:
list of SmartPaths
- get_disk_usage(unit='KB', group_by=[], file_pattern='.', total=False)
Computes the disk usage for each SmartPath and creates a Pandas DataFrame.
- Parameters:
unit (str, optional) – unit in which the disk usage is reported (e.g., “GB”, “TB”, …)
group_by (list, optional) – list of levels forming groups, delivering disk usage sums e.g. [‘tile’, ‘var’]
file_pattern (str tuple, optional) – strings defining file pattern that are included in disk usage sums e.g. (‘M2019’, ‘SSM——‘)
total (bool, optional) – returns the total disk usage for the root
- Returns:
Pandas DataFrame containing the disk usage per SmartPath and the directory hierarchy as columns (without the root directory path)
- Return type:
DataFrame
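The core of such a computation, summing the sizes of matching files and converting to the requested unit, can be sketched as follows. The function `disk_usage` is hypothetical, the binary factors (1 KB = 1024 bytes) are an assumption, and the grouping by levels and the DataFrame output described above are left out.

```python
import os
import re
import tempfile

# Assumption: binary units; the library may use different factors
UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def disk_usage(path, unit="KB", file_pattern="."):
    """Sum the sizes of all files below path whose names match file_pattern."""
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            if re.search(file_pattern, filename):
                total_bytes += os.path.getsize(os.path.join(dirpath, filename))
    return total_bytes / UNIT_FACTORS[unit]


# Small demonstration on a temporary directory with two files of known size
with tempfile.TemporaryDirectory() as root:
    tile_dir = os.path.join(root, "E048N012T6")
    os.makedirs(tile_dir)
    with open(os.path.join(tile_dir, "M2019_SSM.tif"), "wb") as f:
        f.write(b"\0" * 2048)
    with open(os.path.join(tile_dir, "M2019_SIG0.tif"), "wb") as f:
        f.write(b"\0" * 1024)
    usage_all = disk_usage(root, unit="KB")
    usage_ssm = disk_usage(root, unit="KB", file_pattern="SSM")
```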
- get_smartpath(pattern)
Returns the one SmartPath object from the SmartTree that matches the pattern. If more than one path matches, None is returned.
- Parameters:
pattern (str tuple) – strings defining search pattern for path search e.g. (‘C1003’, ‘E048N012T6’)
- Returns:
The path object matching the pattern.
- Return type:
SmartPath
- get_subtree_matching(level, level_pattern, register_file_pattern=None)
Returns a subtree of the SmartTree with branches comprising ALL matches with the pattern at the given level.
- Parameters:
level (str) – Name of level in hierarchy. e.g. ‘wflow’
level_pattern (str) – string defining search pattern at given level e.g. ‘C1003’
register_file_pattern (str tuple, optional) – strings defining search pattern for file search for file_register e.g. (‘C1003’, ‘E048N012T6’) No asterisk is needed (‘*’)! Sequence of strings in given tuple is crucial! Be careful: If the tree is large, this can take a while!
- Returns:
branch – SmartTree object that describes the sought branch, part of the current SmartTree
- Return type:
SmartTree
- get_subtree_unique_rebased(level, level_pattern, register_file_pattern=None)
Returns a single branch (a subtree) of the SmartTree with ONE UNIQUE match with the pattern at the given level. Performs rebasing to deeper root.
- Parameters:
level (str) – Name of level in hierarchy. e.g. ‘wflow’
level_pattern (str) – string defining search pattern at given level e.g. ‘C1003’
register_file_pattern (str tuple, optional) – strings defining search pattern for file search for file_register e.g. (‘C1003’, ‘E048N012T6’) No asterisk is needed (‘*’)! Sequence of strings in given tuple is crucial! Be careful: If the tree is large, this can take a while!
- Returns:
branch – SmartTree object that describes the sought branch, part of the current SmartTree
- Return type:
SmartTree
- print_all_dirs()
Prints all directories of the SmartTree to the screen in a readable format.
- print_collect_level(level, pattern=None, unique=False)
Prints the output of collect_level_string() in a readable format.
- geopathfinder.folder_naming.build_smarttree(root, hierarchy, target_level=None, register_file_pattern=None)
Function walking through the directories in the root path to build a structure of SmartPaths. Can also search for files. Attention: The SmartTree only works properly if all folders in “root” follow the “hierarchy”!
- Parameters:
root (str) – root path of the SmartTree. Gets added as level ‘root’ in hierarchy.
hierarchy (list of str) – List defining the order of the levels
target_level (str, optional) – Can speed up things: Level name of target tree-depth. The SmartTree is only built from directories reaching this level, and only built down to this level. If not set, all directories are built down to deepest depth.
register_file_pattern (str tuple, optional) – strings defining search pattern for file search for file_register e.g. (‘C1003’, ‘E048N012T6’) No asterisk is needed (‘*’)! Sequence of strings in given tuple is crucial! Be careful: If the tree is large, this can take a while!
- Returns:
Tree object for the dataset.
- Return type:
SmartTree
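The requirement that all folders follow the hierarchy can be illustrated with a standalone sketch that walks a directory tree and assigns each directory name to a level by its depth. The function `map_levels` and the example folder names are hypothetical. The sketch records only leaf directories and does not add a ‘root’ level, so it is a simplification rather than the library's algorithm.

```python
import os
import tempfile


def map_levels(root, hierarchy):
    """Map every leaf directory below root to a {level: name} dictionary."""
    paths = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # in-place sort gives a deterministic walking order
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            continue  # the root itself carries no level name in this sketch
        names = rel.split(os.sep)
        if len(names) > len(hierarchy):
            raise ValueError(f"{rel!r} is deeper than the hierarchy {hierarchy}")
        if not dirnames:
            # Leaf directory: pair each folder name with its level by depth
            paths.append(dict(zip(hierarchy, names)))
    return paths


hierarchy = ["wflow", "tile"]
with tempfile.TemporaryDirectory() as root:
    for wflow, tile in [("C1003", "E048N012T6"),
                        ("C1003", "E048N018T6"),
                        ("A0202", "E048N012T6")]:
        os.makedirs(os.path.join(root, wflow, tile))
    tree_paths = map_levels(root, hierarchy)
```

A folder that sits deeper than the hierarchy allows cannot be assigned a level name, which is why the sketch raises an error in that case.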
- geopathfinder.folder_naming.copy_tree(source, dest, file_pattern=None, overwrite=False, file_list=None)
Copies a directory tree structure.
- Parameters:
source (str) – directory that should be copied, recursively
dest (str) – where the tree should be copied to
file_pattern (str or tuple of str, optional) – string patterns for file matching. when starting with “-” it is interpreted as negative pattern that should be excluded from the matches
overwrite (bool, optional) – should existing files be overwritten? default: False
file_list (list of str, optional) – List with full file paths that should be included in the copy process. This list might come from a yeoda datacube filtering
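How a leading “-” can turn a pattern into an exclusion is sketched below. The function `select_files` is hypothetical. Plain substring matching and the rule that all positive patterns must match are assumptions made for this illustration.

```python
def select_files(names, file_pattern=None):
    """Keep names that contain every positive pattern and no negative pattern."""
    if file_pattern is None:
        return list(names)
    patterns = (file_pattern,) if isinstance(file_pattern, str) else tuple(file_pattern)
    # Patterns starting with "-" are negative: the rest of the string must not occur
    include = [p for p in patterns if not p.startswith("-")]
    exclude = [p[1:] for p in patterns if p.startswith("-")]
    selected = []
    for name in names:
        if any(p in name for p in exclude):
            continue
        if include and not all(p in name for p in include):
            continue
        selected.append(name)
    return selected


names = ["M2019_SSM.tif", "M2019_SSM.tif.aux.xml", "M2018_SIG0.tif"]
# Copy the 2019 files, but skip the auxiliary XML files
selected = select_files(names, ("M2019", "-.xml"))
```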
- geopathfinder.folder_naming.create_smartpath(root, hierarchy, levels, make_dir=False)
Function for creating a SmartPath().
- Parameters:
root (str) – root path of the SmartPath. Gets added as level ‘root’ in hierarchy.
hierarchy (list of str) – list defining the order of the levels
levels (list) – list of the names of levels in the hierarchy
make_dir (bool, optional) – if set to True, then the full path of the SmartPath is created in the filesystem (default: False).
- Return type:
SmartPath
- geopathfinder.folder_naming.expand_full_path(path, files)
Joins the path at level with given filenames.
- geopathfinder.folder_naming.extract_times(files, date_position=1, date_format='%Y%m%d_%H%M%S')
Extracts the datetimes from filenames.
- Parameters:
- Returns:
times – List of datetime objects extracted from the filenames.
- Return type:
list of datetime
- geopathfinder.folder_naming.patterns_2_regex(patterns)
Converts any string, or tuple of strings, to a regex pattern.
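One plausible way to perform such a conversion is to join the strings with “.*”, which would explain why no asterisk is needed and why the order of the strings matters. The function `patterns_to_regex` below is a hypothetical stand-in written under that assumption. It passes the strings through unescaped, so regex syntax inside them stays active.

```python
import re


def patterns_to_regex(patterns):
    """Join a string or a tuple of strings into one order-sensitive regex."""
    if isinstance(patterns, str):
        patterns = (patterns,)
    # ".*" between the strings allows any text in between but fixes their order
    return ".*".join(patterns)


regex = patterns_to_regex(("C1003", "E048N012T6"))
path = "/data/C1003/E048N012T6/M2019_SSM.tif"
matches_in_order = re.search(regex, path) is not None
matches_reversed = re.search(patterns_to_regex(("E048N012T6", "C1003")), path) is not None
```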
- geopathfinder.folder_naming.reduce_2_basename(files)
Converts full file paths to file base names.
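Reducing full paths to base names is the inverse of joining a folder path with file names, which the following standard-library sketch shows. The functions `to_basenames` and `to_full_paths` are hypothetical equivalents, and the assumption that the input is a list of strings is not confirmed by the documentation above.

```python
import os


def to_full_paths(path, files):
    """Join a folder path with each file name."""
    return [os.path.join(path, name) for name in files]


def to_basenames(files):
    """Strip the folder part from each full file path."""
    return [os.path.basename(name) for name in files]


folder = os.path.join("data", "C1003", "E048N012T6")
names = ["M20170725_060000_SSM.tif", "M20170726_060000_SSM.tif"]
full_paths = to_full_paths(folder, names)
round_trip = to_basenames(full_paths)
```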