geoips.utils package#

Submodules#

geoips.utils.cache_files module#

Module for handling cached files in GeoIPS.

This module provides functions to manage cache files in GeoIPS. Cache files are stored in the user cache directory, which is platform-dependent. The cache directory is taken from the GEOIPS_CACHE_DIR environment variable, which defaults to platformdirs.user_cache_dir("geoips") if not set.
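The lookup order described above can be sketched as follows. This is a minimal illustration, not the GeoIPS implementation: the helper name resolve_cache_dir and its env parameter are hypothetical, and platformdirs is a third-party package that is only imported when the environment variable is absent.

```python
import os


def resolve_cache_dir(env=None):
    """Return the cache directory, preferring GEOIPS_CACHE_DIR when it is set.

    `env` is a hypothetical parameter (any mapping) that makes the sketch easy
    to exercise without touching the real process environment.
    """
    env = os.environ if env is None else env
    override = env.get("GEOIPS_CACHE_DIR")
    if override is not None:
        return override

    # Fall back to the platform-dependent user cache directory.
    # platformdirs is third-party and is imported only when it is needed.
    import platformdirs

    return platformdirs.user_cache_dir("geoips")


# With the variable set, the override wins and platformdirs is never consulted.
explicit = resolve_cache_dir({"GEOIPS_CACHE_DIR": "/tmp/geoips_cache"})
```

Whether an empty GEOIPS_CACHE_DIR should count as "not set" is not stated in the documentation; the sketch treats only a missing variable as unset.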

geoips.utils.cache_files.create_cached_json_from_yaml(source, cache_dir=None)[source]#

Create a cached JSON file from a YAML file.

This function reads a YAML file and writes its contents to a JSON file in the user cache directory. The JSON file will be created if it does not already exist, or updated if it does.

Parameters:
  • source (str) – The path to the source YAML file.

  • cache_dir (str, optional) – The path to the cache directory. If not provided, the default user cache directory will be used.

Returns:

The path to the cached JSON file.

Return type:

str

geoips.utils.cache_files.get_cached_json(source, cache_dir=None)[source]#

Get the cached JSON file corresponding to a YAML file.

Some files in GeoIPS are stored in YAML format, but we want to use JSON at runtime because loading JSON is faster than loading YAML. This function checks if the cached JSON file exists and is up to date. If it does not exist or is out of date, it creates a new cached JSON file from the YAML file. It then returns the contents of the cached JSON file.

Parameters:
  • source (str) – The path to the source YAML file.

  • cache_dir (str, optional) – The path to the cache directory. If not provided, the default user cache directory will be used.
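The cache-or-rebuild behavior described for get_cached_json can be sketched as below. This is an illustration under stated assumptions rather than the actual GeoIPS code: the function names, the "<stem>.json" naming scheme for cached files, and the use of PyYAML's safe_load are all assumptions.

```python
import json
import os
from pathlib import Path


def cached_json_path(source, cache_dir):
    """Map a YAML path to a JSON path in the cache (hypothetical naming scheme)."""
    return Path(cache_dir) / (Path(source).stem + ".json")


def rebuild_cache(source, dest):
    """Convert the YAML source into a JSON file at `dest`."""
    # PyYAML is third-party; importing it lazily means a cache hit never needs it.
    import yaml

    with open(source, "r", encoding="utf-8") as src:
        data = yaml.safe_load(src)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "w", encoding="utf-8") as out:
        json.dump(data, out)


def load_via_cache(source, cache_dir):
    """Return the parsed contents, rebuilding the JSON cache only when it is stale."""
    dest = cached_json_path(source, cache_dir)
    stale = not dest.exists() or os.path.getmtime(source) > os.path.getmtime(dest)
    if stale:
        rebuild_cache(source, dest)
    with open(dest, "r", encoding="utf-8") as cached:
        return json.load(cached)
```

The slower YAML parse happens only when the cache is missing or older than its source; every other call reads the JSON file directly.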

geoips.utils.cache_files.source_modified(source, dest)[source]#

Check if the source file was modified more recently than the destination file.

This uses os.path.getmtime() to determine whether the source file has been modified more recently than the destination file. Returns True if the source file was modified more recently than the destination file, False otherwise.

Parameters:
  • source (str) – The path to the source file we are monitoring for changes.

  • dest (str) – The path to the destination file we will update if the source file changes.

Returns:

True if the source file was modified more recently than the destination file, False otherwise.

Return type:

bool
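A modification-time comparison of this kind can be demonstrated with the standard library alone. The sketch below uses a hypothetical helper, is_newer, and sets the timestamps explicitly with os.utime so that the outcome does not depend on filesystem timing.

```python
import os
import tempfile


def is_newer(source, dest):
    """Return True if `source` was modified more recently than `dest`."""
    return os.path.getmtime(source) > os.path.getmtime(dest)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.yaml")
    dest = os.path.join(tmp, "dest.json")
    for path in (source, dest):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("")

    # Force known modification times (seconds since the epoch).
    os.utime(dest, (1_000, 1_000))
    os.utime(source, (2_000, 2_000))
    source_is_newer = is_newer(source, dest)  # source changed after dest

    os.utime(source, (500, 500))
    source_is_older = is_newer(source, dest)  # source now predates dest
```

Note that os.path.getmtime() raises OSError when a path does not exist, so a caller using this pattern has to handle a missing destination file separately; the sketch does not.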

geoips.utils.composite module#

Utilities for swath compositing in GeoIPS.

geoips.utils.composite.find_preproc_alg_files(product_time, composite_window, sector_name, product, sensor, platform, file_format='netcdf', product_db=False, db_query_plugin=None, db_schemas=None, db_tables=None)[source]#

Find pre-processed algorithm files that were saved to disk.

Parameters:
  • product_time (datetime.datetime) – Product time

  • composite_window (str) – How far back to search for pre-processed files. The window must be specified in ISO 8601 duration format (e.g. PT4H).

  • sector_name (str) – Name of sector to composite

  • product (str) – Name of product to composite

  • sensor (str) – Name of sensor to composite

  • platform (str) – Name of platform to composite

  • file_format (str, optional) – Pre-processed file format, by default “netcdf”

  • product_db (bool, optional) – Use product database to find any pre-processed file, by default False

  • db_query_plugin (str, optional) – Name of product database query plugin, by default None

  • db_schemas (list, optional) – Names of the Postgres schemas to query, by default None

  • db_tables (list, optional) – Names of the tables to query under the schemas, by default None

Returns:

List of pre-processed algorithm files

Return type:

list
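To illustrate how an ISO 8601 duration such as PT4H translates into a search window ending at product_time, here is a simplified sketch. The helper names are hypothetical, the parser handles only whole days, hours, minutes, and seconds, and GeoIPS may well rely on a dedicated parsing library instead.

```python
import re
from datetime import datetime, timedelta

# Simplified pattern: PnDTnHnMnS with integer fields only (no weeks, months, years).
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text):
    """Convert a simple ISO 8601 duration string into a timedelta."""
    match = _DURATION.fullmatch(text)
    parts = {}
    if match is not None:
        parts = {
            name: int(value)
            for name, value in match.groupdict().items()
            if value is not None
        }
    if not parts:
        raise ValueError(f"Unsupported ISO 8601 duration: {text!r}")
    return timedelta(**parts)


def search_window(product_time, composite_window):
    """Return the (start, end) datetimes covered by the composite window."""
    return product_time - parse_duration(composite_window), product_time


window_start, window_end = search_window(datetime(2024, 1, 1, 12, 0), "PT4H")
```

With these inputs the window runs from 08:00 to 12:00 on 2024-01-01, i.e. four hours back from the product time.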

geoips.utils.composite.find_preproc_alg_netcdfs(product_time_start, product_time_end, sector_name, product, sensor, platform, product_db=False, postgres_query_plugin=None, postgres_schemas=None, postgres_tables=None)[source]#

Find pre-processed algorithm netCDF files that were saved to disk.

Parameters:
  • product_time_start (datetime.datetime) – Earliest product time to search for valid files

  • product_time_end (datetime.datetime) – Latest product time to search for valid files

  • sector_name (str) – Name of sector to composite

  • product (str) – Name of product to composite

  • sensor (str) – Name of sensor to composite

  • platform (str) – Name of platform to composite

  • file_format (str, optional) – Pre-processed file format, by default “netcdf”

  • product_db (bool, optional) – Use product database to find any pre-processed file, by default False

  • postgres_query_plugin (str, optional) – Name of product database query plugin, by default None

  • postgres_schemas (list, optional) – Names of the Postgres schemas to query, by default None

  • postgres_tables (list, optional) – Names of the tables to query under the schemas, by default None

Returns:

List of pre-processed netCDF algorithm files

Return type:

list

geoips.utils.context_managers module#

Module for handling optional dependencies throughout GeoIPS.

geoips.utils.context_managers.import_optional_dependencies(loglevel='info')[source]#

Attempt to import a package and log the event if the import fails.

Parameters:

loglevel (str) – Name of the log level to write to. May be any valid log level (e.g. debug, info, etc.).
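A context manager with this behavior can be sketched with contextlib. This is an illustrative re-implementation under assumptions, not the GeoIPS source: the name optional_imports, the log message, and the choice to swallow only ImportError are all hypothetical.

```python
import logging
from contextlib import contextmanager

LOG = logging.getLogger(__name__)


@contextmanager
def optional_imports(loglevel="info"):
    """Log, rather than propagate, an ImportError raised inside the block."""
    try:
        yield
    except ImportError as err:
        # Look up the logging method by name, e.g. LOG.debug or LOG.info.
        getattr(LOG, loglevel)("Optional dependency unavailable: %s", err)


with optional_imports(loglevel="debug"):
    import a_package_that_is_not_installed  # hypothetical missing dependency

# Execution continues here even though the import failed.
survived = True
```

Only ImportError (and its subclass ModuleNotFoundError) is suppressed in this sketch; any other exception raised inside the block still propagates.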

geoips.utils.decorators module#

GeoIPS decorators module.

class geoips.utils.decorators.deprecated(replacement=None)[source]#

Bases: object

A decorator that deprecates a function.

When applied to a function, will cause that function to raise a DeprecationWarning when called.
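A class-based decorator of this shape might look like the sketch below. It is an assumption-laden illustration: the class name deprecated_sketch is hypothetical, replacement is assumed to be a string naming the preferred function, and the warning is issued with warnings.warn rather than raised as an exception.

```python
import functools
import warnings


class deprecated_sketch:
    """Decorator that emits a DeprecationWarning each time the function is called."""

    def __init__(self, replacement=None):
        # Assumed to be the name of the function callers should use instead.
        self.replacement = replacement

    def __call__(self, func):
        message = f"{func.__name__} is deprecated"
        if self.replacement is not None:
            message += f"; use {self.replacement} instead"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not the wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper


@deprecated_sketch(replacement="new_add")
def old_add(a, b):
    """Add two numbers (deprecated)."""
    return a + b
```

Because the decorator takes an argument, it is applied with parentheses, as in @deprecated_sketch(replacement="new_add") or @deprecated_sketch().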

geoips.utils.decorators.developmental(func)[source]#

Mark an interface as developmental.

When applied to a function, will prepend a “developmental” message to the beginning of that function’s docstring.
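Prepending a notice to a docstring can be sketched in a few lines. The decorator name and the wording of the notice below are hypothetical; only the idea of modifying __doc__ comes from the description above.

```python
def developmental_sketch(func):
    """Prepend a 'developmental' notice to the decorated function's docstring."""
    notice = "DEVELOPMENTAL: this interface may change without notice.\n\n"
    # A function without a docstring has __doc__ set to None.
    func.__doc__ = notice + (func.__doc__ or "")
    return func


@developmental_sketch
def experimental_feature():
    """Do something that is still being designed."""
    return None
```

The function's behavior is unchanged; only the text shown by help() or in generated documentation is affected.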

geoips.utils.memusg module#

Utilities for tracking and monitoring memory and resource usage.

class geoips.utils.memusg.PidLog(inpid, logstr='')[source]#

Bases: object

Track a PID and all children.

  • Requires psutil and threading

checkpoint_usage_stats()[source]#

Return organized dictionary of stats from track_resource_usage.

Returns:

Resource usage statistics for markers/checkpoints recorded by the track_resource_usage method. Dictionary is ordered by the markers passed to track_resource_usage. Each key in the return dictionary will hold a dictionary of the resource usage statistics. If a group is specified in the marker name (e.g. “FOO: BAR”), the return dictionary will have a “BAR” key, and will have a “checkpoint_group” key in the corresponding resource usage dictionary with a value of “FOO”.

Return type:

dict
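The marker-group convention described above ("FOO: BAR" becoming a "BAR" key with a "checkpoint_group" of "FOO") can be illustrated as follows. The function name regroup and the statistic names are hypothetical, and leaving markers without a ":" unchanged is an assumption.

```python
def regroup(raw_stats):
    """Re-key stats by marker name and record each marker's group, if it has one."""
    grouped = {}
    for marker, stats in raw_stats.items():
        entry = dict(stats)  # copy so the input is left untouched
        group, separator, name = marker.partition(":")
        if separator:
            entry["checkpoint_group"] = group.strip()
            marker = name.strip()
        grouped[marker] = entry
    return grouped


# "max_rss" and "elapsed" are made-up statistic names for illustration only.
stats = regroup(
    {
        "FOO: BAR": {"max_rss": 1024, "elapsed": 1.5},
        "BAZ": {"max_rss": 2048, "elapsed": 0.2},
    }
)
```

After this call, stats["BAR"]["checkpoint_group"] is "FOO", while the ungrouped "BAZ" entry has no "checkpoint_group" key.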

print_mem_usg(logstr='', verbose=False)[source]#

Print verbose resource usage.

print_resource_usage()[source]#

Print verbose resource usage, using the “resource” package.

save_csv()[source]#

Save a csv file to output.

save_exit()[source]#

Exit the thread cleanly.

track_pids()[source]#

Track pids and create a dict of values.

track_resource_usage(logstr='', verbose=False, key=None, show_log=True, checkpoint=False, increment_key=True)[source]#

Record resource usage for a given processing marker in GeoIPS.

Parameters:
  • logstr (str, optional) – String to include at the start of the log message, by default “”

  • verbose (bool, optional) – Print the full resource usage statistics dict to stdout, by default False

  • key (str, optional) – Unique name of marker to record statistics, by default None. A marker can be categorized as part of a group by using “:” in the marker name (e.g. “FOO: BAR”). This is used in the checkpoint_usage_stats method, where markers will have a “checkpoint_group” key in their resource usage statistics dictionary holding the name of the group (e.g. stats["BAR"]["checkpoint_group"] == "FOO").

  • show_log (bool, optional) – Use LOG.info to print the current memory usage, by default True

  • checkpoint (bool, optional) – Re-use a marker key to record more detailed profiling of resource usage in terms of both timing and maximum memory usage, by default False

  • increment_key (bool, optional) – Append the number of times a marker is re-used to the key, by default True. Marker names should be unique, and should at minimum hold the start/end resource usage for the marker. If a marker is re-used and increment_key is False, a KeyError is raised. If increment_key is True and the marker has both start/end stats, a number is automatically appended to the key name. This number represents how many times a marker has been re-used. For example: “FOO: BAR” -> “FOO: BAR(1)” -> “FOO: BAR(2)” (note: “FOO: BAR(2)” is only created if “FOO: BAR(1)” has start/end stats).

Returns:

Resource usage statistics for checkpoint

Return type:

dict

Raises:

KeyError – Duplicate key is discovered and increment_key is False
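One way to read the key-increment rules above is sketched below. This is an interpretation, not the GeoIPS implementation: the function name next_marker_key and the "start"/"end" entries used to represent a marker's recorded statistics are hypothetical.

```python
def next_marker_key(stats, key, increment_key=True):
    """Pick the key under which the next measurement for `key` would be stored."""

    def is_complete(name):
        # A marker is complete once both its start and end stats are recorded.
        entry = stats.get(name, {})
        return "start" in entry and "end" in entry

    if not is_complete(key):
        # Either a brand-new marker or one still waiting for its end stats.
        return key
    if not increment_key:
        raise KeyError(f"Marker {key!r} already has start/end stats")

    # Skip over completed re-uses: KEY(1), KEY(2), ...
    count = 1
    while is_complete(f"{key}({count})"):
        count += 1
    return f"{key}({count})"


recorded = {"FOO: BAR": {"start": 0.0, "end": 1.0}}
first_reuse = next_marker_key(recorded, "FOO: BAR")

recorded["FOO: BAR(1)"] = {"start": 2.0, "end": 3.0}
second_reuse = next_marker_key(recorded, "FOO: BAR")
```

In this reading, "FOO: BAR(2)" is handed out only after "FOO: BAR(1)" holds both start and end statistics, which matches the note in the parameter description.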

geoips.utils.memusg.print_mem_usage(logstr='', verbose=False)[source]#

Print memory usage to LOG.info.

  • By default include psutil output.

  • If verbose is True, include output from both psutil and resource packages.

geoips.utils.memusg.print_resource_usage(logstr='')[source]#

Print verbose resource usage, using “resource” package.
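For orientation, the kind of information the standard-library resource module exposes can be sketched as follows. The helper name resource_snapshot and the selection of fields are assumptions; the module is available on Unix-like systems only.

```python
import logging
import resource

LOG = logging.getLogger(__name__)


def resource_snapshot():
    """Collect a few resource usage figures for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        # Peak resident set size: kilobytes on Linux, bytes on macOS.
        "ru_maxrss": usage.ru_maxrss,
        # CPU time spent in user mode and in system mode, in seconds.
        "ru_utime": usage.ru_utime,
        "ru_stime": usage.ru_stime,
    }


snapshot = resource_snapshot()
for name, value in snapshot.items():
    LOG.info("%s: %s", name, value)
```

Because the unit of ru_maxrss differs between platforms, values from different operating systems should not be compared directly.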

geoips.utils.memusg.single_track_pid(procpid)[source]#

Output a snapshot of a PID's resource usage on the server.

  • Requires an input pid.

Module contents#

GeoIPS utilities init file.