Documentation

PYCSE

Module containing useful scientific and engineering functions.

  • Linear regression.

  • Nonlinear regression.

  • Differential equation solvers.

See http://kitchingroup.cheme.cmu.edu/pycse

Copyright 2020, John Kitchin (see accompanying license files for details).

pycse.PYCSE.Rsquared(y, Y)

Return R^2, or coefficient of determination.

y is a 1d array of observations. Y is a 1d array of predictions from a model.

Returns
The R^2 value for the fit.
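
The docstring does not spell out the formula. The sketch below assumes the usual definition, R^2 = 1 - SS_res / SS_tot, and uses plain Python lists; the name r_squared is hypothetical and is not the pycse implementation.

```python
def r_squared(y, Y):
    """Coefficient of determination for observations y and predictions Y."""
    if len(y) != len(Y):
        raise ValueError("y and Y must have the same length")
    mean_y = sum(y) / len(y)
    # Residual sum of squares: scatter of the data about the model.
    ss_res = sum((yi - Yi) ** 2 for yi, Yi in zip(y, Y))
    # Total sum of squares: scatter of the data about its own mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


observations = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(observations, observations)
mean_only = r_squared(observations, [2.5, 2.5, 2.5, 2.5])
```

A model that reproduces the data exactly scores 1, and a model that only predicts the mean of the observations scores 0.
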
pycse.PYCSE.bic(x, y, model, popt)

Compute the Bayesian information criterion (BIC).

Parameters
model: function(x, …) returns prediction for y
popt: optimal parameters
y: array, known y-values
Returns
BIC: float
https://en.wikipedia.org/wiki/Bayesian_information_criterion#Gaussian_special_case
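
As a rough illustration of the Gaussian special case linked above, the BIC can be written as n ln(RSS / n) + k ln(n), where n is the number of data points, RSS is the residual sum of squares, and k is the number of fitted parameters. The helper gaussian_bic and its arguments are hypothetical, and whether pycse counts k exactly as len(popt) is an assumption.

```python
import math


def gaussian_bic(y, predictions, n_params):
    """BIC for Gaussian errors: n*ln(RSS/n) + k*ln(n)."""
    n = len(y)
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    return n * math.log(rss / n) + n_params * math.log(n)


y = [1.1, 1.9, 3.2, 3.8]
predictions = [1.0, 2.0, 3.0, 4.0]

# Same residuals, different parameter counts: the extra parameter is penalized.
bic_2 = gaussian_bic(y, predictions, 2)
bic_3 = gaussian_bic(y, predictions, 3)
```

With identical residuals, each additional parameter raises the BIC by ln(n), so the lower value favors the simpler model.
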
pycse.PYCSE.ivp(f, tspan, y0, *args, **kwargs)

Solve an ODE initial value problem.

Parameters
f: function
    callable y'(x, y) = f(x, y)
tspan: array
    The x points you want the solution at. The first and last points are used as t_span in solve_ivp.
y0: array
    Initial conditions
*args: arbitrary positional arguments to pass to solve_ivp
**kwargs: arbitrary kwargs to pass to solve_ivp.
    max_step is set to the smallest difference between consecutive points in tspan, dense_output is set to True, and t_eval is set to the array specified in tspan.
Returns
solution from solve_ivp
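
The defaults described above can be sketched without calling SciPy at all. The helper solve_ivp_arguments is hypothetical; it only shows how the settings t_span, t_eval, max_step and dense_output might be derived from tspan, and it is not the pycse source.

```python
def solve_ivp_arguments(tspan):
    """Hypothetical helper: derive the solve_ivp settings described above."""
    steps = [b - a for a, b in zip(tspan, tspan[1:])]
    return {
        "t_span": (tspan[0], tspan[-1]),  # only the end points define the interval
        "t_eval": list(tspan),            # report the solution at every requested point
        "max_step": min(steps),           # smallest spacing between requested points
        "dense_output": True,             # keep a continuous interpolant
    }


settings = solve_ivp_arguments([0.0, 0.5, 1.0, 2.0])
```

Capping max_step at the smallest spacing keeps the integrator from stepping over closely spaced output points.
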
pycse.PYCSE.lbic(X, y, popt)

Compute the Bayesian information criterion for a linear model.

Returns
BIC: float
pycse.PYCSE.nlinfit(model, x, y, p0, alpha=0.05, **kwargs)

Nonlinear regression with confidence intervals.

Parameters
model: function f(x, p0, p1, …) = y
x: array of the independent data
y: array of the dependent data
p0: array of the initial guess of the parameters
alpha: 100*(1 - alpha) is the confidence level in percent, i.e. alpha = 0.05 is 95% confidence

kwargs are passed to curve_fit.
Returns
[p, pint, SE]

p is an array of the fitted parameters.
pint is an array of confidence intervals.
SE is an array of standard errors for the parameters.

pycse.PYCSE.nlpredict(X, y, model, loss, popt, xnew, alpha=0.05, ub=1e-05, ef=1.05)

Prediction error for a nonlinear fit.

Parameters
model: model function with signature model(x, …)
loss: loss function the model was fitted with, loss(…)
popt: the optimized parameters
xnew: x-values to predict at
alpha: confidence level, 95% = 0.05
ub: upper bound for smallest allowed Hessian eigenvalue
ef: eigenvalue factor for scaling Hessian
This function uses numdifftools for the Hessian and Jacobian.
Returns
y, yint, se
y: predicted values
yint: prediction interval at alpha confidence level
se: standard error of prediction
pycse.PYCSE.polyfit(x, y, deg, alpha=0.05, *args, **kwargs)

Least squares polynomial fit with parameter confidence intervals.

Parameters
x: array_like, shape (M,)

x-coordinates of the M sample points (x[i], y[i]).

y: array_like, shape (M,) or (M, K)

y-coordinates of the sample points. Several data sets of sample points sharing the same x-coordinates can be fitted at once by passing in a 2D-array that contains one dataset per column.

deg: int

Degree of the fitting polynomial

*args and **kwargs are passed to regress.
Returns
[b, bint, se]
b is a vector of the fitted parameters
bint is a 2D array of confidence intervals
se is an array of standard error for each parameter.
pycse.PYCSE.predict(X, y, pars, XX, alpha=0.05, ub=1e-05, ef=1.05)

Prediction interval for linear regression.

Based on the delta method.

Parameters
X: known x-value array, one row for each y-point
y: known y-value array
pars: fitted parameters
XX: x-value array to make predictions for
alpha: confidence level, 95% = 0.05
ub: upper bound for smallest allowed Hessian eigenvalue
ef: eigenvalue factor for scaling Hessian
Returns
y, yint, pred_se
y: the predicted values
yint: an array of predicted confidence intervals
pycse.PYCSE.regress(A, y, alpha=0.05, *args, **kwargs)

Linear least squares regression with confidence intervals.

Solve the matrix equation (A p = y) for p.

The confidence intervals account for sample size using a student T multiplier.

This code is derived from the descriptions at http://www.weibull.com/DOEWeb/confidence_intervals_in_multiple_linear_regression.htm and http://www.weibull.com/DOEWeb/estimating_regression_models_using_least_squares.htm

Parameters
A: a matrix of function values in columns, e.g.

A = np.column_stack([T**0, T**1, T**2, T**3, T**4])

y: a vector of values you want to fit
alpha: 100*(1 - alpha) confidence level
args and kwargs are passed to np.linalg.lstsq
Returns
[b, bint, se]
b is a vector of the fitted parameters
bint is a 2D array of confidence intervals
se is an array of standard error for each parameter.

pycse.utils

Provides utility functions in pycse.

  1. Fuzzy comparisons for float numbers.

  2. An ignore exception decorator.

  3. A handy function to read a Google Sheet.

pycse.utils.feq(x, y, epsilon=2.220446049250313e-16)

Fuzzy equals.

x == y with tolerance

pycse.utils.fge(x, y, epsilon=2.220446049250313e-16)

Fuzzy greater than or equal to.

x >= y with tolerance

pycse.utils.fgt(x, y, epsilon=2.220446049250313e-16)

Fuzzy greater than.

x > y with tolerance

pycse.utils.fle(x, y, epsilon=2.220446049250313e-16)

Fuzzy less than or equal to.

x <= y with tolerance

pycse.utils.flt(x, y, epsilon=2.220446049250313e-16)

Fuzzy less than.

x < y with tolerance
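
The default epsilon in these signatures is the machine epsilon for double precision, the same value as sys.float_info.epsilon. The exact comparison rule is not stated here, so the sketch below assumes a simple absolute tolerance; the names fuzzy_eq, fuzzy_lt and fuzzy_le are hypothetical stand-ins, not the pycse functions.

```python
import sys

EPS = sys.float_info.epsilon  # 2.220446049250313e-16 for IEEE 754 doubles


def fuzzy_eq(x, y, epsilon=EPS):
    """x == y within an absolute tolerance."""
    return abs(x - y) <= epsilon


def fuzzy_lt(x, y, epsilon=EPS):
    """x < y by more than the tolerance."""
    return x < y - epsilon


def fuzzy_le(x, y, epsilon=EPS):
    """x <= y, allowing x to exceed y by at most the tolerance."""
    return x <= y + epsilon


total = 0.1 + 0.2                 # carries floating-point round-off
exact_equal = total == 0.3        # plain comparison sees the round-off
loose_equal = fuzzy_eq(total, 0.3, epsilon=1e-9)
```

An absolute tolerance this small only makes sense for values near 1; for much larger or smaller magnitudes a wider epsilon would likely be needed.
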

pycse.utils.ignore_exception(*exceptions)

Ignore exceptions on decorated function.

>>> with ignore_exception(ZeroDivisionError):
...     print(1/0)
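
A minimal sketch of the idea, assuming the helper behaves like a context manager that swallows only the listed exception types. The name ignore is a hypothetical stand-in; the real function may also log or print what it caught.

```python
from contextlib import contextmanager


@contextmanager
def ignore(*exceptions):
    """Hypothetical stand-in: swallow only the listed exception types."""
    try:
        yield
    except exceptions:
        pass


reached_end = False
with ignore(ZeroDivisionError):
    1 / 0            # raised inside the block, then swallowed
reached_end = True   # execution continues after the with block

# Exception types that are not listed still propagate.
try:
    with ignore(ZeroDivisionError):
        raise KeyError("not ignored")
    propagated = False
except KeyError:
    propagated = True
```

The standard library offers contextlib.suppress for the same pattern.
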
pycse.utils.read_gsheet(url, *args, **kwargs)

Return a dataframe for the Google Sheet at url.

args and kwargs are passed to pd.read_csv. The url should be viewable by anyone with the link.

pycse.plotly

Module for using plotly with orgmode.

This monkey-patches go.Figure.show to provide a png image for org-mode, and an html file that is saved that you can click on in org-mode to see the interactive version.

pycse.plotly.myshow(self, *args, **kwargs)

Make a PNG image to display for plotly.

pycse.hashcache

hashcache - a decorator for persistent, file/hash-based cache

I found some features of joblib were unsuitable for how I want to use a cache.

1. The “file” Python thinks the function is in is used to save the results in joblib, which leads to repeated runs if you run the same code in Python, notebook or stdin, and means the cache is not portable to other machines, and maybe not even in time since temp directories and kernel parameters are involved. I could not figure out how to change those in joblib.

2. joblib uses the function source code in the hash, so inconsequential changes like whitespace, docstrings and comments change the hash.

This library aims to provide a simpler version of what I wish joblib did for me.

Results are cached based on a hash of the function name, argnames, bytecode, arg values and kwarg values. I use joblib.hash for this. This means any two functions with the same name and bytecode, called with the same arguments, will cache to the same result.

The cache location is set as a function attribute:

hashcache.cache = './cache'

This is alpha, proof of concept code. Test it a lot for your use case. The API is not stable, and subject to change.

Some things to do:

1. the function attributes are kind of weird, maybe these should be decorator arguments.

Pros:

1. File-based cache which means many functions can run in parallel reading and writing, and you are limited only by file io speeds, and disk space.

2. semi-portability. The cache could be synced across machines, and caches can be merged with little risk of conflict.

3. No server is required. Everything is done at the OS level.

4. Extendability. You can define your own functions for loading and dumping data.

Cons:

1. hashes are fragile and not robust. They are fragile with respect to any changes in how byte-code is made, or via mutable arguments, etc. The hashes are not robust to system level changes like library versions, or global variables. The only advantage of hashes is you can compute them.

2. File-based cache which means if you generate thousands of files, it can be slow to delete them. Although it should be fast to access the results since you access them directly by path, it will not be fast to iterate over all the results, e.g. if you want to implement some kind of search or reporting.

3. No server. You have to roll your own update strategy if you run things on multiple machines that should all cache to a common location.

Changelog

[2023-09-23 Sat] Changed hash signature (breaking change). It is too difficult to figure out how to capture global state, and the use of internal variable names is not consistent with using the bytecode to be insensitive to unimportant variable name changes.

Pulled out some functions for loading and dumping data. This is a precursor to enabling other backends like lmdb or sqlite instead of files. You can then simply provide new functions for this.

pycse.hashcache.dump_data(hsh, data, verbose)

Dump DATA into HSH.

pycse.hashcache.get_hash(func, args, kwargs)

Get a hash for running FUNC(ARGS, KWARGS).

This is the most critical feature of hashcache as it provides a key to store and look up results later. Think carefully before changing this function, because any change breaks past caches.

FUNC should be as pure as reasonable. This hash is insensitive to global variables.

The hash is on the function name, bytecode, and a standardized dictionary of kwargs including defaults. We use bytecode because it is insensitive to things like whitespace, comments, docstrings, and variable name changes that don’t affect results. It is assumed that two functions with the same name and bytecode will evaluate to the same result.
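
The idea can be sketched with the standard library. This is an illustration under assumptions: hashlib and pickle are swapped in for joblib.hash, and sketch_hash is a hypothetical name that takes an already standardized kwargs dictionary.

```python
import hashlib
import pickle


def sketch_hash(func, standardized):
    """Hash the function name, raw bytecode, and sorted keyword arguments."""
    payload = (
        func.__name__,
        func.__code__.co_code,          # bytecode only: no comments or spacing
        sorted(standardized.items()),   # order-independent view of the kwargs
    )
    return hashlib.sha1(pickle.dumps(payload)).hexdigest()


def add(x, a=1):
    return x + a


def add_spaced(x, a=1):
    # Comments and extra spacing are not part of the bytecode.
    return x  +  a


same_bytecode = add.__code__.co_code == add_spaced.__code__.co_code

key_a = sketch_hash(add, {"x": 2, "a": 1})
key_b = sketch_hash(add, {"a": 1, "x": 2})   # same call, different dict order
key_c = sketch_hash(add, {"x": 3, "a": 1})   # different argument value
```

Because the function name is part of the payload here, add and add_spaced would still receive different keys even though their bytecode matches.
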

pycse.hashcache.get_hashpath(hsh)

Return path to file for HSH.

pycse.hashcache.get_standardized_args(func, args, kwargs)

Returns a standardized dictionary of kwargs for func(args, kwargs).

This dictionary includes default values, even if they were not called.
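
One way such a standardization might be done is with inspect.signature. The sketch below assumes that approach; standardized_kwargs and model are hypothetical names used only for illustration.

```python
import inspect


def standardized_kwargs(func, args, kwargs):
    """Map a call onto parameter names, filling in unused defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


def model(x, k=2.0, offset=0.0):
    return k * x + offset


# The same call written positionally and with keywords.
positional = standardized_kwargs(model, (1.0, 3.0), {})
keyword = standardized_kwargs(model, (), {"k": 3.0, "x": 1.0})
```

Both calls reduce to the same dictionary, so they would map to the same cache entry regardless of how the arguments were passed.
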

pycse.hashcache.hashcache(verbose=False, loader=<function load_data>, dumper=<function dump_data>)

Cache results by hash of the function, arguments and kwargs.

Set hashcache.cache to the directory you want the cache saved in. Default = cache
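
The loader and dumper arguments suggest that the storage backend can be swapped. The sketch below illustrates that pattern with an in-memory dictionary. Everything in it is hypothetical: sketch_cache, dict_loader and dict_dumper use simplified signatures and a simplified key, and they are not the pycse implementation.

```python
import functools
import hashlib
import pickle

store = {}  # hypothetical in-memory backend standing in for files


def dict_loader(hsh):
    """Return (success, data) for a key, mirroring the load_data contract."""
    return (hsh in store), store.get(hsh)


def dict_dumper(hsh, data):
    store[hsh] = data


def sketch_cache(loader=dict_loader, dumper=dict_dumper):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Simplified key: name, bytecode, and the arguments as passed.
            key = pickle.dumps(
                (func.__name__, func.__code__.co_code, args, sorted(kwargs.items()))
            )
            hsh = hashlib.sha1(key).hexdigest()
            success, data = loader(hsh)
            if success:
                return data
            data = func(*args, **kwargs)
            dumper(hsh, data)
            return data
        return wrapper
    return decorator


calls = []


@sketch_cache()
def slow_square(x):
    calls.append(x)  # records how often the body actually runs
    return x * x


first = slow_square(4)
second = slow_square(4)  # served from the store, body not re-run
```

Swapping in a different pair of loader and dumper functions would redirect the cache to another backend without touching the decorated function.
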

pycse.hashcache.load_data(hsh, verbose=False)

Load data for HSH.

HSH is a string for the hash associated with the data you want.

Returns success, data. If it succeeds, success will be True. If the data does not exist yet, success will be False, and data will be None.
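
A minimal sketch of this return contract, assuming one pickle file per hash in a flat directory. The names sketch_dump, sketch_load and cache_dir are hypothetical, and the real file layout may differ.

```python
import pickle
import tempfile
from pathlib import Path

cache_dir = Path(tempfile.mkdtemp())  # hypothetical cache location for this sketch


def sketch_dump(hsh, data):
    """Write data to a file named after its hash."""
    (cache_dir / hsh).write_bytes(pickle.dumps(data))


def sketch_load(hsh):
    """Return (success, data); a missing file is a miss, not an error."""
    path = cache_dir / hsh
    if not path.exists():
        return False, None
    return True, pickle.loads(path.read_bytes())


miss = sketch_load("abc123")
sketch_dump("abc123", {"answer": 42})
hit = sketch_load("abc123")
```

Returning a separate success flag lets a cached result of None be told apart from a cache miss.
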