geoips.utils package
Submodules
geoips.utils.cache_files module
Module for handling cached files in GeoIPS.
This module provides functions to manage cache files in GeoIPS. Cache files are stored in the user cache directory, which is platform-dependent. The cache directory is taken from the GEOIPS_CACHE_DIR environment variable, which defaults to platformdirs.user_cache_dir(“geoips”) if not set.
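The lookup order described above (environment variable first, platform default second) can be sketched as follows. This is a minimal illustration, not the GeoIPS implementation: resolve_cache_dir and its environ/default_factory parameters are hypothetical, and the injectable default exists only so the logic can be exercised without platformdirs installed.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_cache_dir(
    environ: Optional[Mapping[str, str]] = None,
    default_factory: Optional[Callable[[], str]] = None,
) -> str:
    """Return GEOIPS_CACHE_DIR if set, else a platform default (hypothetical sketch)."""
    env = os.environ if environ is None else environ
    if "GEOIPS_CACHE_DIR" in env:
        return env["GEOIPS_CACHE_DIR"]
    if default_factory is None:
        # Imported lazily so the environment-variable path works
        # even when platformdirs is not installed.
        from platformdirs import user_cache_dir

        return user_cache_dir("geoips")
    return default_factory()
```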
- geoips.utils.cache_files.create_cached_json_from_yaml(source, cache_dir=None)
Create a cached JSON file from a YAML file.
This function reads a YAML file and writes its contents to a JSON file in the user cache directory. The JSON file will be created if it does not already exist, or updated if it does.
- Parameters:
source (str) – The path to the source YAML file.
cache_dir (str, optional) – The path to the cache directory. If not provided, the default user cache directory will be used.
- Returns:
The path to the cached JSON file.
- Return type:
str
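A YAML-to-JSON conversion of the kind create_cached_json_from_yaml performs might look like the sketch below. It is hypothetical: the function name, the "same stem, .json extension" naming scheme for the cached file, and the injectable load_yaml parameter are assumptions for illustration; by default it uses PyYAML's yaml.safe_load.

```python
import json
import os
from typing import Any, Callable, Optional


def yaml_to_cached_json(
    source: str,
    cache_dir: str,
    load_yaml: Optional[Callable[[str], Any]] = None,
) -> str:
    """Convert a YAML file into a JSON file inside cache_dir (hypothetical sketch)."""
    if load_yaml is None:
        import yaml  # PyYAML, imported lazily

        load_yaml = yaml.safe_load
    with open(source, "r", encoding="utf-8") as f:
        data = load_yaml(f.read())
    os.makedirs(cache_dir, exist_ok=True)
    # Assumed naming scheme: keep the source file's stem, swap the extension.
    stem = os.path.splitext(os.path.basename(source))[0]
    dest = os.path.join(cache_dir, stem + ".json")
    # Mode "w" creates the file if missing and overwrites it otherwise.
    with open(dest, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return dest
```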
- geoips.utils.cache_files.get_cached_json(source, cache_dir=None)
Get the cached JSON file corresponding to a YAML file.
Some files in GeoIPS are stored in YAML format, but we want to use JSON at runtime because loading JSON is faster than loading YAML. This function checks if the cached JSON file exists and is up to date. If it does not exist or is out of date, it creates a new cached JSON file from the YAML file. It then returns the contents of the cached JSON file.
- Parameters:
source (str) – The path to the source YAML file.
cache_dir (str, optional) – The path to the cache directory. If not provided, the default user cache directory will be used.
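The "rebuild if missing or stale, then load" decision described for get_cached_json can be sketched generically. This is an illustration under assumptions, not the GeoIPS code: load_cached_json is a hypothetical name, and the build callback stands in for whatever YAML-to-JSON conversion regenerates the cache.

```python
import json
import os
from typing import Any, Callable


def load_cached_json(source: str, cache_path: str, build: Callable[[str, str], None]) -> Any:
    """Rebuild cache_path from source when missing or stale, then load it (sketch)."""
    stale = (
        not os.path.exists(cache_path)
        or os.path.getmtime(source) > os.path.getmtime(cache_path)
    )
    if stale:
        build(source, cache_path)  # e.g. a YAML -> JSON conversion
    with open(cache_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Only the first call, and any call after the source file changes, pays the cost of parsing YAML; every other call reads JSON directly.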
- geoips.utils.cache_files.source_modified(source, dest)
Check if the source file was modified more recently than the destination file.
This uses os.path.getmtime() to determine whether the source file has been modified more recently than the destination file. Returns True if the source file was modified more recently than the destination file, False otherwise.
- Parameters:
source (str) – The path to the source file we are monitoring for changes.
dest (str) – The path to the destination file we will update if the source file changes.
- Returns:
True if the source file was modified more recently than the destination file, False otherwise.
- Return type:
bool
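The os.path.getmtime() comparison described above reduces to a single expression. The sketch below mirrors that documented behavior under a hypothetical name (source_is_newer); it is not the library's source, and how source_modified handles a missing destination file is not stated here.

```python
import os


def source_is_newer(source: str, dest: str) -> bool:
    """Return True if `source` was modified more recently than `dest` (sketch)."""
    # Both paths must exist; os.path.getmtime raises OSError otherwise.
    return os.path.getmtime(source) > os.path.getmtime(dest)
```

Because the comparison is strict, files with identical modification times are treated as up to date.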
geoips.utils.composite module
Utilities for swath compositing in GeoIPS.
- geoips.utils.composite.find_preproc_alg_files(product_time, composite_window, sector_name, product, sensor, platform, file_format='netcdf', product_db=False, db_query_plugin=None, db_schemas=None, db_tables=None)
Find pre-processed algorithm files that were saved to disk.
- Parameters:
product_time (datetime.datetime) – Product time
composite_window (str) – How far back to search for pre-processed files. Must be specified in ISO 8601 duration format (e.g. PT4H)
sector_name (str) – Name of sector to composite
product (str) – Name of product to composite
sensor (str) – Name of sensor to composite
platform (str) – Name of platform to composite
file_format (str, optional) – Pre-processed file format, by default “netcdf”
product_db (bool, optional) – Use product database to find any pre-processed file, by default False
db_query_plugin (str, optional) – Name of product database query plugin, by default None
db_schemas (list, optional) – Names of postgres schemas to query, by default None
db_tables (list, optional) – Names of tables to query under the given schemas, by default None
- Returns:
List of pre-processed algorithm files
- Return type:
list
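One way to turn composite_window into a concrete search range is sketched below. It assumes the window is subtracted from product_time to give the start of the search, which matches "how far back to search" and the product_time_start/product_time_end parameters of find_preproc_alg_netcdfs; parse_duration and search_window are hypothetical helpers, and only days, hours, minutes and seconds are supported (the Python standard library has no ISO 8601 duration parser).

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Days plus the time-of-day designators; years, months, weeks and
# fractional values are deliberately not supported by this sketch.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a subset of ISO 8601 durations such as 'PT4H' or 'P1DT30M'."""
    match = _DURATION.match(text)
    # Reject empty durations ('P', 'PT') and a dangling 'T' ('P1DT').
    if match is None or text.endswith(("P", "T")):
        raise ValueError(f"Unsupported ISO 8601 duration: {text!r}")
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


def search_window(product_time: datetime, composite_window: str) -> Tuple[datetime, datetime]:
    """Return (start, end), where start = product_time - composite_window."""
    return product_time - parse_duration(composite_window), product_time
```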
- geoips.utils.composite.find_preproc_alg_netcdfs(product_time_start, product_time_end, sector_name, product, sensor, platform, product_db=False, postgres_query_plugin=None, postgres_schemas=None, postgres_tables=None)
Find pre-processed algorithm netCDF files that were saved to disk.
- Parameters:
product_time_start (datetime.datetime) – Earliest product time to search for valid files
product_time_end (datetime.datetime) – Latest product time to search for valid files
sector_name (str) – Name of sector to composite
product (str) – Name of product to composite
sensor (str) – Name of sensor to composite
platform (str) – Name of platform to composite
product_db (bool, optional) – Use product database to find any pre-processed file, by default False
postgres_query_plugin (str, optional) – Name of product database query plugin, by default None
postgres_schemas (list, optional) – Names of postgres schemas to query, by default None
postgres_tables (list, optional) – Names of tables to query under the given schemas, by default None
- Returns:
List of pre-processed netCDF algorithm files
- Return type:
list
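Selecting files between product_time_start and product_time_end can be illustrated with a simple filter. This sketch assumes each candidate file already has a known product time and that both bounds are inclusive; how GeoIPS discovers candidates and their times (on disk or through a postgres query plugin) is not shown, and files_in_window is a hypothetical name.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def files_in_window(
    candidates: Iterable[Tuple[str, datetime]],
    start: datetime,
    end: datetime,
) -> List[str]:
    """Keep paths whose product time lies in [start, end], oldest first (sketch)."""
    selected = [(time, path) for path, time in candidates if start <= time <= end]
    return [path for _, path in sorted(selected)]
```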
geoips.utils.context_managers module
Module for handling optional dependencies throughout GeoIPS.
geoips.utils.decorators module
GeoIPS decorators module.
geoips.utils.memusg module
Utilities for tracking and monitoring memory and resource usage.
- class geoips.utils.memusg.PidLog(inpid, logstr='')
Bases:
object
Track a PID and all of its child processes.
Requires psutil and threading.
- checkpoint_usage_stats()
Return organized dictionary of stats from track_resource_usage.
- Returns:
Resource usage statistics for markers/checkpoints recorded by the track_resource_usage method. Dictionary is ordered by the markers passed to track_resource_usage. Each key in the return dictionary will hold a dictionary of the resource usage statistics. If a group is specified in the marker name (e.g. “FOO: BAR”), the return dictionary will have a “BAR” key, and will have a “checkpoint_group” key in the corresponding resource usage dictionary with a value of “FOO”.
- Return type:
dict
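The grouping rule described above ("FOO: BAR" becomes key "BAR" with "checkpoint_group" set to "FOO") can be sketched as a reshaping of an ordered mapping. Splitting on the first ":" and stripping surrounding whitespace are assumptions, the per-marker statistics are placeholders, and group_checkpoints is a hypothetical name rather than part of PidLog.

```python
from typing import Any, Dict


def group_checkpoints(stats_by_marker: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Re-key marker stats and record any 'GROUP: NAME' prefix (hypothetical sketch)."""
    result: Dict[str, Dict[str, Any]] = {}
    for marker, stats in stats_by_marker.items():  # dicts preserve insertion order
        entry = dict(stats)
        if ":" in marker:
            group, name = [part.strip() for part in marker.split(":", 1)]
            entry["checkpoint_group"] = group
        else:
            name = marker
        result[name] = entry
    return result
```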
- track_resource_usage(logstr='', verbose=False, key=None, show_log=True, checkpoint=False, increment_key=True)
Record resource usage for a given processing marker in GeoIPS.
- Parameters:
logstr (str, optional) – String to include at the start of the log message, by default “”
verbose (bool, optional) – Print the full resource usage statistics dict to stdout, by default False
key (str, optional) – Unique name of the marker under which to record statistics, by default None. A marker can be assigned to a group by using “:” in the marker name (e.g. “FOO: BAR”). The group is used by the checkpoint_usage_stats method, where the marker's resource usage statistics dictionary will have a “checkpoint_group” key holding the group name (e.g. stats[“BAR”][“checkpoint_group”] == “FOO”).
show_log (bool, optional) – Use LOG.info to print the current memory usage, by default True
checkpoint (bool, optional) – Re-use a marker key to record more detailed profiling of resource usage in terms of both timing and maximum memory usage, by default False
increment_key (bool, optional) – Append the number of times a marker has been re-used to the key, by default True. Marker names should be unique, and each marker should at minimum hold the start/end resource usage. If a marker is re-used and increment_key is False, a KeyError is raised. If increment_key is True and the marker already has both start/end stats, a number is automatically appended to the key name. This number represents how many times the marker has been re-used. For example: “FOO: BAR” -> “FOO: BAR(1)” -> “FOO: BAR(2)” (Note: “FOO: BAR(2)” is only created if “FOO: BAR(1)” has start/end stats.)
- Returns:
Resource usage statistics for the checkpoint
- Return type:
dict
- Raises:
KeyError – Duplicate key is discovered and increment_key is False
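The increment_key rule can be illustrated with a small key-selection helper. This is a sketch under assumptions, not PidLog's implementation: next_marker_key is hypothetical, each record is assumed to be a dict with "start" and "end" entries, and a marker that has a start but no end yet is assumed to be reusable (so its end stats can be recorded) rather than counted as a duplicate.

```python
from typing import Dict


def next_marker_key(key: str, records: Dict[str, dict], increment_key: bool = True) -> str:
    """Choose the key for the next stats entry of `key` (hypothetical sketch)."""

    def complete(name: str) -> bool:
        record = records.get(name, {})
        return "start" in record and "end" in record

    if not complete(key):
        return key  # new marker, or one still waiting for its end stats
    if not increment_key:
        raise KeyError(f"Duplicate marker key: {key!r}")
    count = 1
    # "FOO: BAR(2)" is only reached once "FOO: BAR(1)" is complete.
    while complete(f"{key}({count})"):
        count += 1
    return f"{key}({count})"
```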
- geoips.utils.memusg.print_mem_usage(logstr='', verbose=False)
Print memory usage to LOG.info.
By default, include psutil output.
If verbose is True, include output from both the psutil and resource packages.
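The psutil/resource split described for print_mem_usage can be sketched as below. It is a hypothetical illustration (mem_usage_message is not a GeoIPS function, and the message format is invented): psutil is treated as optional, and the standard-library resource module is Unix-only.

```python
import logging
import resource  # standard library, Unix-only

LOG = logging.getLogger(__name__)


def mem_usage_message(logstr: str = "", verbose: bool = False) -> str:
    """Build and log a one-line memory summary (hypothetical sketch)."""
    parts = [logstr] if logstr else []
    try:
        import psutil  # optional third-party dependency

        rss_mb = psutil.Process().memory_info().rss / 1024**2
        parts.append(f"psutil rss: {rss_mb:.1f} MB")
    except ImportError:
        parts.append("psutil rss: unavailable (psutil not installed)")
    if verbose:
        # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
        max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        parts.append(f"resource max rss: {max_rss}")
    message = " | ".join(parts)
    LOG.info(message)
    return message
```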
Module contents
GeoIPS utilities init file.