Documentation#
PYCSE#
Module containing useful scientific and engineering functions:

- Linear regression
- Nonlinear regression
- Differential equation solvers
See http://kitchingroup.cheme.cmu.edu/pycse
Copyright 2020, John Kitchin (see accompanying license files for details).
- pycse.PYCSE.Rsquared(y, Y)#
Return R^2, or coefficient of determination.
y is a 1d array of observations. Y is a 1d array of predictions from a model.
- Returns:
- The R^2 value for the fit.
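A minimal usage sketch (the observations and predictions below are made-up numbers for illustration):

```python
import numpy as np
from pycse.PYCSE import Rsquared

# Made-up observations (y) and model predictions (Y)
y = np.array([1.0, 2.1, 2.9, 4.2, 5.0])
Y = np.array([1.1, 2.0, 3.0, 4.0, 5.1])

print(Rsquared(y, Y))  # close to 1 for a good fit
```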
- pycse.PYCSE.bic(x, y, model, popt)#
Compute the Bayesian information criterion (BIC).
- Parameters:
- model : function(x, …) that returns the prediction for y
- popt : optimal parameters
- y : array, known y-values
- Returns:
- The BIC value.
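A minimal sketch of how bic might be used together with nlinfit (documented below); the data and model here are invented for illustration:

```python
import numpy as np
from pycse.PYCSE import nlinfit, bic

def model(x, a, b):
    # simple linear model used only for illustration
    return a * x + b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit the model, then score it with the BIC
p, pint, se = nlinfit(model, x, y, [1.0, 0.0])
print(bic(x, y, model, p))
```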
- pycse.PYCSE.ivp(f, tspan, y0, *args, **kwargs)#
Solve an ODE initial value problem.
- Parameters:
- f : callable, y'(x, y) = f(x, y)
- tspan : array, the x points you want the solution at. The first and last points are used as the tspan argument in solve_ivp.
- y0 : array, initial conditions
- *args : arbitrary positional arguments to pass to solve_ivp
- **kwargs : arbitrary keyword arguments to pass to solve_ivp. max_step is set to the minimum diff of tspan, dense_output is set to True, and t_eval is set to the array specified in tspan.
- Returns:
- solution from solve_ivp
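A minimal sketch, assuming the returned object is the usual solve_ivp solution with t and y attributes:

```python
import numpy as np
from pycse.PYCSE import ivp

def f(x, y):
    # dy/dx = -2 y, with analytical solution y = y0 * exp(-2 x)
    return -2 * y

tspan = np.linspace(0, 2, 50)   # points where you want the solution
y0 = [1.0]                      # initial condition y(0) = 1

sol = ivp(f, tspan, y0)
print(sol.t[-1], sol.y[0, -1])  # solution at the last point, ~exp(-4)
```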
- pycse.PYCSE.lbic(X, y, popt)#
Compute the Bayesian information criterion for a linear model.
- Returns:
- BIC : float
- pycse.PYCSE.nlinfit(model, x, y, p0, alpha=0.05, **kwargs)#
Nonlinear regression with confidence intervals.
- Parameters:
- model : function f(x, p0, p1, …) = y
- x : array of the independent data
- y : array of the dependent data
- p0 : array of the initial guess of the parameters
- alpha : 100*(1 - alpha) is the confidence interval, i.e. alpha = 0.05 is 95% confidence
- kwargs are passed to curve_fit.
- Returns:
- [p, pint, SE]
- p is an array of the fitted parameters
- pint is an array of confidence intervals
- SE is an array of standard errors for the parameters.
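A minimal sketch with made-up data (an exponential decay with a little noise):

```python
import numpy as np
from pycse.PYCSE import nlinfit

def model(x, a, b):
    # exponential decay used only for illustration
    return a * np.exp(-b * x)

x = np.linspace(0, 5, 20)
rng = np.random.default_rng(0)
y = 3.0 * np.exp(-1.2 * x) + 0.05 * rng.normal(size=x.size)

p, pint, se = nlinfit(model, x, y, [1.0, 1.0], alpha=0.05)
print(p)     # fitted parameters
print(pint)  # 95% confidence intervals
print(se)    # standard errors
```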
- pycse.PYCSE.nlpredict(X, y, model, loss, popt, xnew, alpha=0.05, ub=1e-05, ef=1.05)#
Prediction error for a nonlinear fit.
- Parameters:
- model : model function with signature model(x, …)
- loss : loss function the model was fitted with, loss(…)
- popt : the optimized parameters
- xnew : x-values to predict at
- alpha : confidence level; alpha = 0.05 is 95% confidence
- ub : upper bound for the smallest allowed Hessian eigenvalue
- ef : eigenvalue factor for scaling the Hessian
- This function uses numdifftools for the Hessian and Jacobian.
- See https://en.wikipedia.org/wiki/Prediction_interval#Unknown_mean,_unknown_variance
- Returns:
- y, yint, se
- y : predicted values
- yint : prediction interval at the alpha confidence level
- se : standard error of prediction
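The docstring only shows loss(…), so the exact loss signature is not pinned down here. The sketch below is one plausible setup: the loss is a sum of squared errors written over the parameters, and popt comes from nlinfit. Treat it as an illustration under those assumptions, not a definitive call pattern:

```python
import numpy as np
from pycse.PYCSE import nlinfit, nlpredict

def model(x, a, b):
    return a * np.exp(-b * x)

X = np.linspace(0, 5, 20)
rng = np.random.default_rng(1)
y = 3.0 * np.exp(-1.2 * X) + 0.05 * rng.normal(size=X.size)

p, pint, se = nlinfit(model, X, y, [1.0, 1.0])

# ASSUMPTION: the loss is a function of the parameters (sum of squared
# errors here). It accepts the parameters either as one array or as
# separate arguments, since the docstring only shows loss(...).
def loss(*pars):
    pars = np.ravel(pars)
    return np.sum((y - model(X, *pars)) ** 2)

xnew = np.linspace(0, 6, 10)
ynew, yint, se_pred = nlpredict(X, y, model, loss, p, xnew, alpha=0.05)
print(ynew)  # predicted values at xnew
print(yint)  # prediction interval at the 95% level
```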
- pycse.PYCSE.polyfit(x, y, deg, alpha=0.05, *args, **kwargs)#
Least squares polynomial fit with parameter confidence intervals.
- Parameters:
- x : array_like, shape (M,). x-coordinates of the M sample points (x[i], y[i]).
- y : array_like, shape (M,) or (M, K). y-coordinates of the sample points. Several data sets of sample points sharing the same x-coordinates can be fitted at once by passing in a 2D array that contains one dataset per column.
- deg : int, degree of the fitting polynomial
- *args and **kwargs are passed to regress.
- Returns:
- [b, bint, se]
- b is a vector of the fitted parameters
- bint is a 2D array of confidence intervals
- se is an array of standard error for each parameter.
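A minimal sketch with made-up, roughly quadratic data:

```python
import numpy as np
from pycse.PYCSE import polyfit

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.0, 7.2, 13.1, 20.9])   # roughly quadratic

b, bint, se = polyfit(x, y, 2)   # degree-2 polynomial fit
print(b)     # fitted coefficients
print(bint)  # confidence interval for each coefficient
print(se)    # standard error for each coefficient
```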
- pycse.PYCSE.predict(X, y, pars, XX, alpha=0.05, ub=1e-05, ef=1.05)#
Prediction interval for linear regression.
Based on the delta method.
- Parameters:
- X : known x-value array, one row for each y-point
- y : known y-value array
- pars : fitted parameters
- XX : x-value array to make predictions for
- alpha : confidence level; alpha = 0.05 is 95% confidence
- ub : upper bound for the smallest allowed Hessian eigenvalue
- ef : eigenvalue factor for scaling the Hessian
- See https://en.wikipedia.org/wiki/Prediction_interval#Unknown_mean,_unknown_variance
- Returns:
- y, yint, pred_se
- y : the predicted values
- yint : the confidence interval
- pred_se : standard error on the predictions
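A minimal sketch, pairing predict with regress (documented below); the data are invented, and the design matrix X is built from the same basis functions used for the new points:

```python
import numpy as np
from pycse.PYCSE import regress, predict

T = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])

# Design matrix: one row per y-point, one column per basis function
X = np.column_stack([T**0, T**1])
b, bint, se = regress(X, y)

# New points to predict at, built with the same basis functions
Tnew = np.array([4.5, 5.0])
XX = np.column_stack([Tnew**0, Tnew**1])

ynew, yint, pred_se = predict(X, y, b, XX)
print(ynew)     # predicted values
print(yint)     # interval on the predictions
print(pred_se)  # standard error on the predictions
```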
- pycse.PYCSE.regress(A, y, alpha=0.05, *args, **kwargs)#
Linear least squares regression with confidence intervals.
Solve the matrix equation (A p = y) for p.
The confidence intervals account for sample size using a Student's t multiplier.
This code is derived from the descriptions at http://www.weibull.com/DOEWeb/confidence_intervals_in_multiple_linear_regression.htm and http://www.weibull.com/DOEWeb/estimating_regression_models_using_least_squares.htm
- Parameters:
- A : a matrix of function values in columns, e.g. A = np.column_stack([T**0, T**1, T**2, T**3, T**4])
- y : a vector of the values you want to fit
- alpha : 100*(1 - alpha) confidence level
- args and kwargs are passed to np.linalg.lstsq
- Returns:
- [b, bint, se]
- b is a vector of the fitted parameters
- bint is an array of confidence intervals. The ith row is for the ith parameter.
- se is an array of standard error for each parameter.
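A minimal sketch following the column_stack example from the docstring, with made-up data that is close to quadratic:

```python
import numpy as np
from pycse.PYCSE import regress

T = np.linspace(0, 2, 10)
rng = np.random.default_rng(2)
y = 2.0 + 3.0 * T + 0.5 * T**2 + 0.01 * rng.normal(size=T.size)

# Columns of A are the basis functions evaluated at T
A = np.column_stack([T**0, T**1, T**2])

b, bint, se = regress(A, y, alpha=0.05)
print(b)     # fitted parameters, near [2, 3, 0.5]
print(bint)  # confidence interval for each parameter (one row per parameter)
print(se)    # standard error for each parameter
```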
pycse.utils#
Provides utility functions in pycse:

- Fuzzy comparisons for float numbers
- An ignore-exception decorator
- A handy function to read a Google Sheet
- pycse.utils.feq(x, y, epsilon=np.float64(2.220446049250313e-16))#
Fuzzy equals.
x == y with tolerance
- pycse.utils.fge(x, y, epsilon=np.float64(2.220446049250313e-16))#
Fuzzy greater than or equal to.
x >= y with tolerance
- pycse.utils.fgt(x, y, epsilon=np.float64(2.220446049250313e-16))#
Fuzzy greater than.
x > y with tolerance
- pycse.utils.fle(x, y, epsilon=np.float64(2.220446049250313e-16))#
Fuzzy less than or equal to.
x <= y with tolerance
- pycse.utils.flt(x, y, epsilon=np.float64(2.220446049250313e-16))#
Fuzzy less than.
x < y with tolerance
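A quick illustration of the fuzzy comparisons above, using the default tolerance shown in the signatures (machine epsilon):

```python
from pycse.utils import feq, flt, fgt

a = 3 * 0.1   # 0.30000000000000004 due to floating point round-off
b = 0.3

print(a == b)     # False with the exact comparison
print(feq(a, b))  # True: a and b are equal within the tolerance
print(flt(a, b))  # False: a is not less than b within the tolerance
print(fgt(a, b))  # False: a is not greater than b within the tolerance
```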
- pycse.utils.ignore_exception(*exceptions)#
Ignore exceptions on decorated function.
>>> with ignore_exception(ZeroDivisionError):
...     print(1/0)
- pycse.utils.read_gsheet(url, *args, **kwargs)#
Return a dataframe for the Google Sheet at url.
args and kwargs are passed to pd.read_csv. The url should be viewable by anyone with the link.
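A minimal sketch; the URL below is a hypothetical placeholder and must be replaced with a sheet that is viewable by anyone with the link:

```python
from pycse.utils import read_gsheet

# Hypothetical sharing URL; replace <sheet-id> with a real sheet id.
url = "https://docs.google.com/spreadsheets/d/<sheet-id>/edit#gid=0"

# Extra args and kwargs are passed through to pd.read_csv
df = read_gsheet(url)
print(df.head())
```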
pycse.plotly#
Module for using plotly with orgmode.
This monkey-patches go.Figure.show to provide a png image for org-mode, and to save an html file that you can click on in org-mode to see the interactive version.
- pycse.plotly.myshow(self, *args, **kwargs)#
Make a PNG image to display for plotly.
pycse.hashcache#
hashcache - a class decorator for persistent, file/hash-based cache
I found some features of joblib were unsuitable for how I want to use a cache.
1. joblib saves results under the “file” Python thinks the function is defined in. This leads to repeated runs if you run the same code from a script, a notebook, or stdin; it also means the cache is not portable to other machines, and maybe not even stable over time, since temp directories and kernel parameters are involved. I could not figure out how to change this in joblib.
2. joblib uses the function source code in the hash, so inconsequential changes like whitespace, docstrings and comments change the hash.
This library aims to provide a simpler version of what I wish joblib did for me.
Results are cached based on a hash of the function name, argnames, bytecode, arg values and kwarg values. I use joblib.hash for this. This means any two functions with the same bytecode, even if they have different names, will cache to the same result.
The cache location is set as a class attribute:
HashCache.cache = './cache'
HashCache - stores joblib.dump pickle strings in files named by hash
SqlCache - stores orjson serialized data in a sqlite3 database by hash key
JsonCache - stores orjson serialized data in json files, compatible with maggma
This is still alpha, proof of concept code. Test it a lot for your use case. The API is not stable, and subject to change.
Pros:
1. File-based cache, which means many functions can read and write in parallel; you are limited only by file I/O speed and disk space.
2. Semi-portability. The cache can be synced across machines, and caches can be merged with little risk of conflict.
3. No server is required. Everything is done at the OS level.
4. Extendability. You can define your own functions for loading and dumping data.
Cons:
1. Hashes are fragile and not robust. They are fragile with respect to any change in how bytecode is generated, to mutable arguments, etc., and they are not robust to system-level changes like library versions or global variables. The only advantage of hashes is that you can compute them.
2. File-based cache, which means that if you generate thousands of files it can be slow to delete them. Accessing a result should be fast, since you go directly to its path, but iterating over all the results will not be, e.g. if you want to implement some kind of search or reporting.
3. No server. You have to roll your own update strategy if you run things on multiple machines that should all cache to a common location.
Changelog#
[2023-09-23 Sat] Changed hash signature (breaking change). It is too difficult to figure out how to capture global state, and the use of internal variable names is not consistent with using the bytecode to be insensitive to unimportant variable name changes.
Pulled out some functions for loading and dumping data. This is a precursor to enabling other backends like lmdb or sqlite instead of files. You can then simply provide new functions for this.
[2024-06-18 Tue] Changed from function to class decorator (breaking change).
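A minimal usage sketch of the HashCache class decorator described below, assuming it is applied directly to a function and the cache location is set as the class attribute shown above:

```python
import time
from pycse.hashcache import HashCache

HashCache.cache = './cache'   # cache location, set as a class attribute

@HashCache
def expensive(x):
    # stand-in for a slow computation
    time.sleep(2)
    return x ** 2

print(expensive(3))  # first call runs the function and stores the result
print(expensive(3))  # second call returns the cached result by hash lookup
```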
- class pycse.hashcache.HashCache(function)#
Class decorator to cache using hashes and pickle (via joblib). Data is stored in directories named by the hash.
Methods
__call__
(*args, **kwargs)This is the decorator code that runs around self.function.
dump
(**kwargs)Dump KWARGS to the cache.
dump_data
(hsh, data)Dump DATA into HSH.
get_hash
(args, kwargs)Get a hash for running FUNC(ARGS, KWARGS).
get_hashpath
(hsh)Return path to file for HSH.
get_standardized_args
(args, kwargs)Returns a standardized dictionary of kwargs for func(args, kwargs)
load
(hsh[, cache])Load saved variables from HSH.
load_data
(hsh)Load data for HSH.
- static dump(**kwargs)#
Dump KWARGS to the cache. Returns a hash string for future lookup.
cache is a special kwarg that is not saved.
- dump_data(hsh, data)#
Dump DATA into HSH.
- get_hash(args, kwargs)#
Get a hash for running FUNC(ARGS, KWARGS).
This is the most critical feature of hashcache, as it provides the key used to store and look up results later. You should think carefully before changing this function; it breaks past caches.
FUNC should be as pure as reasonable. This hash is insensitive to global variables.
The hash is on the function name, bytecode, and a standardized kwargs including defaults. We use bytecode because it is insensitive to things like whitespace, comments, docstrings, and variable name changes that don’t affect results. It is assumed that two functions with the same name and bytecode will evaluate to the same result. However, this makes the hash fragile to changes in Python version that affect bytecode.
- get_hashpath(hsh)#
Return path to file for HSH.
- get_standardized_args(args, kwargs)#
Returns a standardized dictionary of kwargs for func(args, kwargs)
This dictionary includes default values, even if they were not called.
- static load(hsh, cache='cache')#
Load saved variables from HSH.
- load_data(hsh)#
Load data for HSH.
HSH is a string for the hash associated with the data you want.
Returns success, data. If it succeeds, success will be True. If the data does not exist yet, success will be False and data will be None.
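The static dump and load methods above can also be used outside the decorator to stash and retrieve variables by hash; a minimal sketch, assuming the default cache location:

```python
from pycse.hashcache import HashCache

# dump returns a hash string you can use for a later lookup
hsh = HashCache.dump(a=1, b=[2, 3], note='example')

# Retrieve the saved variables by hash (cache defaults to 'cache')
data = HashCache.load(hsh)
print(data)
```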
- class pycse.hashcache.JsonCache(function)#
Json-based cache.
This is compatible with maggma.
- Attributes:
- default
Methods
__call__
(*args, **kwargs)This is the decorator code that runs around self.function.
dump
(**kwargs)Dump KWARGS to the cache.
dump_data
(hsh, data)Dump DATA into HSH.
get_hash
(args, kwargs)Get a hash for running FUNC(ARGS, KWARGS).
get_hashpath
(hsh)Return path to file for HSH.
get_standardized_args
(args, kwargs)Returns a standardized dictionary of kwargs for func(args, kwargs)
load
(hsh)Load data from HSH.
load_data
(hsh)Load data for HSH.
- static dump(**kwargs)#
Dump KWARGS to the cache. Returns a hash string for future lookup.
- dump_data(hsh, data)#
Dump DATA into HSH.
- static load(hsh)#
Load data from HSH.
- load_data(hsh)#
Load data for HSH.
HSH is a string for the hash associated with the data you want.
Returns success, data. If it succeeds, success will be True. If the data does not exist yet, success will be False and data will be None.
- class pycse.hashcache.SqlCache(function)#
Class decorator to cache using orjson and sqlite. Data is stored in a sqlite database as json.
- Attributes:
- default
Methods
__call__
(*args, **kwargs)This is the decorator code that runs around self.function.
dump
(**kwargs)Dump KWARGS to the cache.
dump_data
(hsh, data)Dump DATA into HSH.
get_hash
(args, kwargs)Get a hash for running FUNC(ARGS, KWARGS).
get_hashpath
(hsh)Return path to file for HSH.
get_standardized_args
(args, kwargs)Returns a standardized dictionary of kwargs for func(args, kwargs)
load
(hsh)Load data from HSH.
load_data
(hsh)Load data for HSH.
search
(query, *args)Run a sql QUERY with args.
- static dump(**kwargs)#
Dump KWARGS to the cache. Returns a hash string for future lookup.
- dump_data(hsh, data)#
Dump DATA into HSH. DATA must be serializable to json.
- static load(hsh)#
Load data from HSH.
- load_data(hsh)#
Load data for HSH.
HSH is a string for the hash associated with the data you want.
Returns success, data. If it succeeds, success will be True. If the data does not exist yet, success will be False and data will be None.
- static search(query, *args)#
Run a SQL QUERY with args. args are substituted into the ? placeholders in the query.
This is just a light wrapper on con.execute.
- pycse.hashcache.hashcache(*args, **kwargs)#
Raises an exception if the old hashcache decorator is used.