geopathfinder package

Subpackages

Submodules

geopathfinder.file_naming module

class geopathfinder.file_naming.SmartFilename(fields, fields_def, ext=None, pad='-', delimiter='_', convert=False, compact=False)[source]

Bases: object

SmartFilename class handles file names with pre-defined field names and field length.

classmethod from_filename(filename_str, fields_def, pad='-', delimiter='_', convert=False, compact=False)[source]

Converts a filename given as a string into a SmartFilename class object.

Parameters:
  • filename_str (str) – Filename without any paths (e.g., “M20170725_test.tif”).

  • fields_def (OrderedDict) –

    Name of fields (keys) in right order and length (values). It must contain:
    • “len”: int

      Length (positive number) of filename part (must be given). Use 0 to allow any length.

    • “start”: int, optional

      Start index of filename part (default is 0).

    • “delim”: str, optional

      Delimiter between this and the following filename part (default is the one from the parent class).

    • “pad”: str, optional

      Padding for filename part (default is the one from the parent class).

  • pad (str, optional) – Padding symbol (default: ‘-’).

  • delimiter (str, optional) – Delimiter (default: ‘_’).

  • convert (bool, optional) – If true, decoding is applied to parts of the filename, where such an operation is available (default is False).

Returns:

Class representing a filename.

Return type:

SmartFilename
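To illustrate how a fields_def can drive the parsing of a filename, here is a minimal sketch that uses only the standard library. It does not call geopathfinder: the helper split_fields and the field names “var”, “date” and “name” are hypothetical, the “start” key is ignored for simplicity, and the real class may treat padding, delimiters and decoding differently.

```python
from collections import OrderedDict


def split_fields(filename_str, fields_def, pad='-', delimiter='_'):
    """Hypothetical helper: cut a filename into named parts (not the library code)."""
    stem = filename_str.rsplit('.', 1)[0]  # drop the extension
    parts = OrderedDict()
    pos = 0
    for name, spec in fields_def.items():
        delim = spec.get('delim', delimiter)
        length = spec['len']
        if length == 0:
            # len == 0 means "any length": read up to the next delimiter
            end = stem.find(delim, pos) if delim else -1
            if end == -1:
                end = len(stem)
        else:
            end = pos + length
        # remove padding characters from the extracted part
        parts[name] = stem[pos:end].strip(spec.get('pad', pad))
        pos = end + len(delim)
    return parts


# Field names below are made up for this example.
fields_def = OrderedDict([
    ('var', {'len': 1, 'delim': ''}),
    ('date', {'len': 8}),
    ('name', {'len': 0}),
])
parts = split_fields('M20170725_test.tif', fields_def)
print(dict(parts))
```

The ordered mapping matters here: each part is read from where the previous one ended, so reordering the keys changes the result.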

class geopathfinder.file_naming.SmartFilenamePart(arg, start=0, length=None, delimiter='_', pad='-', decoder=None, encoder=None, compact=False)[source]

Bases: object

Represents a part of filename.

property decoded

Converts filename part to a decoded (object) representation.

Returns:

Decoded (object) representation of a filename part.

Return type:

object

property encoded

Converts filename part to an encoded (string) representation.

Returns:

Encoded (string) representation of a filename part.

Return type:

str

has_valid_len()[source]

Checks if a SmartFilenamePart instance has a valid length. It is valid if the specified length is equal to the length of the SmartFilenamePart, or if len == 0, i.e., any length is accepted.

Returns:

True if SmartFilenamePart instance is valid, else False.

Return type:

bool
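The rule described above can be written in one line. This is only a sketch of the stated condition; the function name length_is_valid is hypothetical and is not part of geopathfinder.

```python
def length_is_valid(part, length):
    """Sketch of the rule: valid if the lengths match, or if length == 0 (any length)."""
    return length == 0 or len(part) == length


print(length_is_valid('20170725', 8))
print(length_is_valid('2017', 8))
print(length_is_valid('anything', 0))
```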

geopathfinder.folder_naming module

Module handling folder trees.

class geopathfinder.folder_naming.NullSmartPath[source]

Bases: SmartPath

Class for non-existing paths. Helps to avoid errors.

class geopathfinder.folder_naming.NullSmartTree(root)[source]

Bases: SmartTree

Class for non-existing paths. Helps to avoid errors.

class geopathfinder.folder_naming.SmartPath(levels, hierarchy, make_dir=False)[source]

Bases: object

Base class for the single path structure to a data set. It allows:
  • building a path,
  • searching files with temporal slicing,
  • creating a pandas.DataFrame from a folder.

base_onto_root(root)[source]

Adds ‘root’ as the first level of the SmartPath instance.

Parameters:

root (str) – String of the root directory

build_file_register(down_to_level=None, up_to_level=None, pattern='.')[source]

Builds a file register collecting files at all levels in the SmartPath.

Parameters:
  • down_to_level (str, optional) – deepest level that should be included in the file register

  • up_to_level (str, optional) – highest level that should be included in the file register

  • pattern (str tuple, optional) – strings defining search pattern for file search e.g. (‘C1003’, ‘E048N012T6’)

build_levels(level='', make_dir=False)[source]

Parameters:
  • level (str, optional) – Name of level in hierarchy

  • make_dir (bool, optional) – creates the directory until level

Returns:

path – Full path of the SmartPath (to the deepest level)

Return type:

str

expand_full_path(level, files)[source]

Joins the path at level with given filenames.

Parameters:
  • level (str) – Name of level in hierarchy.

  • files (list of str) – List of file names.

Returns:

path – Full file path.

Return type:

str

get_dir(make_dir=False)[source]

Get directory.

Parameters:

make_dir (bool, optional) – Create directory if not exists (default: False).

Returns:

folder – Full path of the SmartPath

Return type:

str

get_disk_usage(unit=None, up_to_level=None, down_to_level=None, file_pattern='.')[source]

Computes the disk usage of all files along the SmartPath.

Parameters:
  • unit (str, optional) – output unit of the disk usage (e.g., “GB”, “TB”, …)

  • down_to_level (str, optional) – deepest level that should be included in the file register

  • up_to_level (str, optional) – highest level that should be included in the file register

  • file_pattern (str tuple, optional) – strings defining file pattern that are included in disk usage sums e.g. (‘C1003’, ‘E048N012T6’)

Returns:

disk usage of all files along the SmartPath

Return type:

Number
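The idea of summing file sizes under a folder, filtered by a file pattern and scaled to a unit, can be sketched with the standard library. The function disk_usage below is hypothetical and not the library's implementation; in particular, the use of powers of 1024 for the units is an assumption.

```python
import os
import re
import tempfile

# Assumption: units are powers of 1024; the library may define them differently.
UNITS = {'B': 0, 'KB': 1, 'MB': 2, 'GB': 3, 'TB': 4}


def disk_usage(folder, unit='KB', file_pattern='.'):
    """Hypothetical helper: sum the sizes of matching files below folder."""
    total = 0
    for dirpath, _, filenames in os.walk(folder):
        for name in filenames:
            if re.search(file_pattern, name):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total / 1024 ** UNITS[unit]


# Build a small throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'C1003_a.tif'), 'wb') as fh:
        fh.write(b'\0' * 2048)
    with open(os.path.join(tmp, 'other.txt'), 'wb') as fh:
        fh.write(b'\0' * 1024)
    usage_all = disk_usage(tmp)
    usage_c1003 = disk_usage(tmp, file_pattern='C1003')

print(usage_all, usage_c1003)
```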

get_level(level)[source]

Gets the path to the level.

Parameters:

level (str) – Name of level in hierarchy.

Returns:

path – Path from root to level.

Return type:

str
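The relation between a hierarchy (the order of levels) and the levels (the folder name at each level) can be shown in a few lines. This is a sketch of the concept only: path_to_level and the level names are hypothetical, and it assumes that levels is a mapping from level name to folder name.

```python
import os

# Hypothetical level names and folder names, for illustration only.
hierarchy = ['root', 'sensor', 'tile', 'var']
levels = {'root': '/data', 'sensor': 'S1', 'tile': 'E048N012T6', 'var': 'ssm'}


def path_to_level(levels, hierarchy, level):
    """Join the folder names from the first level down to the requested level."""
    idx = hierarchy.index(level)
    return os.path.join(*(levels[name] for name in hierarchy[:idx + 1]))


print(path_to_level(levels, hierarchy, 'tile'))
print(path_to_level(levels, hierarchy, 'var'))
```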

make_dir()[source]

Creates directory from root to deepest level

print_dir()[source]

Prints the directory of the path in a readable form.

print_file_register()[source]

Prints all registered files to screen in a readable form.

remove_level(level)[source]

In the SmartPath-instance, it removes a level from the hierarchy and level dictionary.

Parameters:

level (str) – Name of level in hierarchy.

search_files(level, pattern='.', full_paths=False)[source]

Searches files meeting the regex pattern at level in the SmartPath

Parameters:
  • level (str) – Name of level in hierarchy

  • pattern (str tuple, optional) – strings defining search pattern for file search e.g. (‘C1003’, ‘E048N012T6’)

  • full_paths (bool, optional) – If True, full paths are returned instead of file names (default: False)

Returns:

filenames – File names at the level.

Return type:

list of str

search_files_ts(level, pattern='.', date_position=1, date_format='%Y%m%d_%H%M%S', starttime=None, endtime=None, full_paths=False)[source]

Function searching files at a level in the SmartPath, returning the filenames and the datetimes as pd.DataFrame

Parameters:
  • level (str) – name of level in hierarchy

  • pattern (str tuple, optional) – strings defining search pattern for file search e.g. (‘C1003’, ‘E048N012T6’)

  • date_position (int) – position of first character of date string in name of files

  • date_format (str) – string with the datetime format in the filenames. e.g. ‘%Y%m%d_%H%M%S’ reflects ‘20161224_000000’

  • starttime (str or datetime, optional) – earliest date and time, if str must follow “date_format”

  • endtime (str or datetime, optional) – latest date and time, if str must follow “date_format”

  • full_paths (bool, optional) – should full paths be in the dataframe? default: False

Returns:

df – Dataframe holding the filenames and the datetimes

Return type:

pandas.DataFrame

trim2level(level, remove='deeper_including')[source]

Removes the given level together with all deeper levels (with the default setting of remove).

Parameters:
  • level (str) – Name of the level that should be removed, together with all deeper levels.

  • remove (str) – what should be removed? e.g. “deeper_including” removes the level itself, and deeper levels.

class geopathfinder.folder_naming.SmartTree(root, hierarchy, make_dir=False)[source]

Bases: object

Class for collecting multiple SmartPaths() at one root path.

add_smartpath(smartpath, make_dir=False)[source]

Adds a SmartPath-object to the SmartTree.

Parameters:
  • smartpath (SmartPath) – A SmartPath object. Only valid if hierarchy is compatible with the hierarchy of the SmartTree.

  • make_dir (bool, optional) – creates the full directory of the SmartPath

collect_level_smartpath(level, pattern=None, unique=False)[source]

Returns a list of SmartPaths reaching a given level, and matching a given pattern.

Parameters:
  • level (str) – name of level in hierarchy

  • pattern (str tuple, optional) – strings defining search pattern for path search e.g. (‘C1003’, ‘E048N012T6’)

  • unique (bool, optional) – if set, a list of unique paths is returned

Returns:

list of paths at given level, matching the given pattern

Return type:

list of SmartPaths

collect_level_string(level, pattern=None, unique=False)[source]

Returns a list of paths at given level, and matching a given pattern.

Parameters:
  • level (str) – name of level in hierarchy

  • pattern (str tuple, optional) – strings defining search pattern for path search e.g. (‘C1003’, ‘E048N012T6’)

  • unique (bool, optional) – if set, a list of unique paths is returned

Returns:

list of paths at given level, matching the given pattern

Return type:

list of str

collect_level_topnames(level, pattern=None, unique=True)[source]

Returns list of topnames of folders at given level.

Parameters:
  • level (str) – name of level in hierarchy

  • pattern (str tuple, optional) – strings defining search pattern for path search e.g. (‘C1003’, ‘E048N012T6’)

  • unique (bool, optional) – if set, a list of unique paths is returned

Returns:

topnames – list of folder-topnames at given level, matching the given pattern

Return type:

list of str

copy_smarttree_on_fs(target_dir, level=None, level_pattern='', file_pattern=None, file_list=None)[source]

Copies all files and directories in the SmartTree to a target directory, using shutil.copytree().

Parameters:
  • target_dir (str) – the target directory

  • level (str, optional) – if set, only the branch at the given “level” that matches “level_pattern” will be copied, e.g. ‘wflow’. Otherwise everything below the root directory will be copied.

  • level_pattern (str) – string defining search pattern at given level e.g. ‘A0202’

  • file_pattern (str or tuple of str, optional) – string patterns for file matching. when starting with “-” it is interpreted as negative pattern that should be excluded from the matches

  • file_list (list of str, optional) – List with full file paths that should be included in the copy process. Only files that are in both: tree.file_register & file_list are copied! This list might come from a yeoda datacube filtering

count_dirs()[source]

Sets the dir_count of the SmartTree.

Searches files in register meeting the regex pattern

Parameters:
  • pattern (str tuple, optional) – strings defining search pattern for file search e.g. (‘C1003’, ‘E048N012T6’)

  • full_paths (bool, optional) – If True, full paths will be included in dataframe (default: False)

Returns:

filenames – File names

Return type:

str or list of str

get_all_dirs()[source]

Returns all full paths in the SmartTree

Returns:

Sorted list of all full paths

Return type:

list

get_all_smartpaths()[source]

Returns SmartPaths in the SmartTree

Returns:

List of all included SmartPaths

Return type:

list of SmartPaths

get_disk_usage(unit='KB', group_by=[], file_pattern='.', total=False)[source]

Computes the disk usage for each SmartPath and creates a Pandas DataFrame.

Parameters:
  • unit (str, optional) – output unit of disk usage in bytes (e.g., “GB”, “TB”, …)

  • group_by (list, optional) – list of levels forming groups, delivering disk usage sums e.g. [‘tile’, ‘var’]

  • file_pattern (str tuple, optional) – strings defining file pattern that are included in disk usage sums e.g. (‘M2019’, ‘SSM------’)

  • total (bool, optional) – returns the total disk usage for the root

Returns:

Pandas DataFrame containing the disk usage per SmartPath and the directory hierarchy as columns (without the root directory path)

Return type:

DataFrame

get_smartpath(pattern)[source]

Returns one SmartPath object from the SmartTree that matches the pattern. If there is more than one match, None is returned.

Parameters:

pattern (str tuple) – strings defining search pattern for path search e.g. (‘C1003’, ‘E048N012T6’)

Returns:

The path object matching the pattern.

Return type:

SmartPath

get_subtree_matching(level, level_pattern, register_file_pattern=None)[source]

Returns a subtree of the SmartTree with branches comprising ALL matches with the pattern at the given level.

Parameters:
  • level (str) – Name of level in hierarchy. e.g. ‘wflow’

  • level_pattern (str) – string defining search pattern at given level e.g. ‘C1003’

  • register_file_pattern (str tuple, optional) – strings defining search pattern for file search for file_register e.g. (‘C1003’, ‘E048N012T6’) No asterisk is needed (‘*’)! Sequence of strings in given tuple is crucial! Be careful: If the tree is large, this can take a while!

Returns:

branch – SmartTree object that describes the sought branch, part of current SmartTree

Return type:

SmartTree

get_subtree_unique_rebased(level, level_pattern, register_file_pattern=None)[source]

Returns a single branch (a subtree) of the SmartTree with ONE UNIQUE match with the pattern at the given level. Performs rebasing to deeper root.

Parameters:
  • level (str) – Name of level in hierarchy. e.g. ‘wflow’

  • level_pattern (str) – string defining search pattern at given level e.g. ‘C1003’

  • register_file_pattern (str tuple, optional) – strings defining search pattern for file search for file_register e.g. (‘C1003’, ‘E048N012T6’) No asterisk is needed (‘*’)! Sequence of strings in given tuple is crucial! Be careful: If the tree is large, this can take a while!

Returns:

branch – SmartTree object that describes the sought branch, part of current SmartTree

Return type:

SmartTree

make_dirs()[source]

Creates a full path for each of the contained SmartPaths.

print_all_dirs()[source]

Prints all directories to screen in a readable form. I’m proud of this function!

print_collect_level(level, pattern=None, unique=False)[source]

Prints the output from collect_level_string() in a readable form.

print_collect_level_topnames(level, pattern=None, unique=True)[source]

Prints the output from collect_level_topnames() in a readable form.

print_file_register()[source]

Prints all registered files to screen in a readable form.

print_root()[source]

Function to print nicely the root to screen.

remove_smartpath(key)[source]

Removes the SmartPath with ‘key’ from the SmartTree

Parameters:

key (str) – Path representing the key for the SmartPath

geopathfinder.folder_naming.build_smarttree(root, hierarchy, target_level=None, register_file_pattern=None)[source]

Function walking through directories in root path for building a structure of SmartPaths. Can also search for files. Attention: The SmartTree only works properly if all folders in “root” follow the “hierarchy”!

Parameters:
  • root (str) – root path of the SmartTree. Gets added as level ‘root’ in hierarchy.

  • hierarchy (list of str) – List defining the order of the levels

  • target_level (str, optional) – Can speed up things: Level name of target tree-depth. The SmartTree is only built from directories reaching this level, and only built down to this level. If not set, all directories are built down to deepest depth.

  • register_file_pattern (str tuple, optional) – strings defining search pattern for file search for file_register e.g. (‘C1003’, ‘E048N012T6’) No asterisk is needed (‘*’)! Sequence of strings in given tuple is crucial! Be careful: If the tree is large, this can take a while!

Returns:

Tree object for the dataset.

Return type:

SmartTree
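The note that all folders below “root” must follow the “hierarchy” becomes clearer with a small sketch of a directory walk that maps folder depth to level names. This uses only os.walk; walk_levels is a hypothetical helper, not the function documented here, and it keeps only directories that reach the deepest level.

```python
import os
import tempfile


def walk_levels(root, hierarchy):
    """Hypothetical helper: map each full-depth directory to {level: folder name}."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == '.' else rel.split(os.sep)
        if len(parts) == len(hierarchy):
            found.append(dict(zip(hierarchy, parts)))
            dirnames[:] = []  # do not descend below the deepest level
    # sort for a deterministic result, independent of the walk order
    return sorted(found, key=lambda d: [d[k] for k in hierarchy])


# Build a small throwaway tree: <root>/<tile>/<var>
with tempfile.TemporaryDirectory() as root:
    for tile, var in [('E048N012T6', 'ssm'),
                      ('E048N012T6', 'sig0'),
                      ('E042N012T6', 'ssm')]:
        os.makedirs(os.path.join(root, tile, var))
    paths = walk_levels(root, ['tile', 'var'])

for p in paths:
    print(p)
```

A folder that sits at the wrong depth would either be skipped or be assigned to the wrong level name, which is why a consistent hierarchy matters.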

geopathfinder.folder_naming.copy_tree(source, dest, file_pattern=None, overwrite=False, file_list=None)[source]

Copies a directory tree structure.

Parameters:
  • source (str) – directory that should be copied, recursively

  • dest (str) – where the tree should be copied to

  • file_pattern (str or tuple of str, optional) – string patterns for file matching. when starting with “-” it is interpreted as negative pattern that should be excluded from the matches

  • overwrite (bool, optional) – should existing files be overwritten? default: False

  • file_list (list of str, optional) – List with full file paths that should be included in the copy process. This list might come from a yeoda datacube filtering

geopathfinder.folder_naming.create_smartpath(root, hierarchy, levels, make_dir=False)[source]

Function for creating a SmartPath().

Parameters:
  • root (str) – root path of the SmartPath. Gets added as level ‘root’ in hierarchy.

  • hierarchy (list of str) – list defining the order of the levels

  • levels (list) – list of the names of levels in the hierarchy

  • make_dir (bool, optional) – if set to True, then the full path of the SmartPath is created in the filesystem (default: False).

Return type:

SmartPath

geopathfinder.folder_naming.expand_full_path(path, files)[source]

Joins the path with given filenames.

Parameters:
  • path (str) – Directory path.

  • files (list of str) – List of file names.

Returns:

path – Full file path.

Return type:

str

geopathfinder.folder_naming.extract_times(files, date_position=1, date_format='%Y%m%d_%H%M%S')[source]

Extracts the datetimes from filenames.

Parameters:
  • files (list of str) – list of strings with filenames or filepaths

  • date_position (int) – position of first character of date string in name of files

  • date_format (str) – string with the datetime format in the filenames, e.g. ‘%Y%m%d_%H%M%S’ reflects ‘20161224_000000’

Returns:

times – List of datetime objects extracted from the filenames.

Return type:

list of datetime
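How date_position and date_format work together can be sketched with datetime.strptime. The function extract_times_sketch is hypothetical and not the library code. It assumes that date_position is a zero-based index into the base file name and that date_format produces a fixed-width string.

```python
import os
from datetime import datetime


def extract_times_sketch(files, date_position=1, date_format='%Y%m%d_%H%M%S'):
    """Hypothetical helper: parse a datetime from a fixed position in each file name."""
    # Width of a formatted date, e.g. 15 characters for '20161224_000000'
    width = len(datetime(2000, 1, 1).strftime(date_format))
    times = []
    for f in files:
        name = os.path.basename(f)  # works for file names and file paths
        date_str = name[date_position:date_position + width]
        times.append(datetime.strptime(date_str, date_format))
    return times


times = extract_times_sketch(['/tmp/M20161224_000000_x.tif',
                              'M20170725_123000_y.tif'])
print(times)
```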

geopathfinder.folder_naming.patterns_2_regex(patterns)[source]

Converts any string, or tuple of strings, to a regex pattern.

Parameters:

patterns (str or tuple of str) – string patterns for matching. when starting with “-” it is interpreted as negative pattern that should be excluded from the matches

Returns:

regex string

Return type:

str
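One possible way to turn such patterns into a single regex is sketched below. How the library actually combines the patterns is not stated in this reference, so this is an assumption: positive patterns must occur in the given order, and patterns starting with “-” exclude a match anywhere in the string. The name patterns_to_regex is hypothetical.

```python
import re


def patterns_to_regex(patterns):
    """Hypothetical helper: ordered positive patterns, '-' prefix for exclusions."""
    if isinstance(patterns, str):
        patterns = (patterns,)
    positives = [p for p in patterns if not p.startswith('-')]
    negatives = [p[1:] for p in patterns if p.startswith('-')]
    # each excluded pattern becomes a negative lookahead at the start of the string
    exclude = ''.join('(?!.*{})'.format(n) for n in negatives)
    # positive patterns are kept as regex fragments and must appear in order
    include = '.*'.join(positives)
    return '^' + exclude + '.*' + include + '.*$'


regex = patterns_to_regex(('C1003', 'E048N012T6', '-tmp'))
print(regex)
print(bool(re.search(regex, 'X_C1003_E048N012T6.tif')))
print(bool(re.search(regex, 'X_C1003_E048N012T6_tmp.tif')))
```

Because the patterns are used as regex fragments and are not escaped, a pattern such as ‘.’ matches any file name.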

geopathfinder.folder_naming.reduce_2_basename(files)[source]

Converts full file paths to file base names.

Parameters:

files (list of str) – list of filepaths

Returns:

filenames – List of base file names.

Return type:

list of str
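Joining a directory with file names and reducing paths back to base names are inverse steps, as the following standard-library sketch shows. The helpers join_files and to_basenames are hypothetical stand-ins, and returning one path per file name is an assumption.

```python
import os


def join_files(path, files):
    """Hypothetical helper: one full path per file name."""
    return [os.path.join(path, f) for f in files]


def to_basenames(files):
    """Hypothetical helper: strip the directory part of each path."""
    return [os.path.basename(f) for f in files]


names = ['a.tif', 'b.tif']
full = join_files(os.path.join('data', 'S1'), names)
print(full)
print(to_basenames(full))
```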

Carries out the file search using the strings in pattern as regex strings.

Parameters:
  • path – Search in this directory. Subdirectories are ignored.

  • pattern (str or tuple of str) – string patterns for matching. when starting with “-” it is interpreted as negative pattern that should be excluded from the matches

  • full_paths (bool, optional) – should full paths be returned? default: True

Returns:

a tuple (files, count) that contains the file list and the count of files

Return type:

tuple

geopathfinder.folder_naming.transform_bytes(bytes, unit='KB')[source]

Module contents