Documentation
PYCSE
Module containing useful scientific and engineering functions.
Linear regression
Nonlinear regression.
Differential equation solvers.
See http://kitchingroup.cheme.cmu.edu/pycse
Copyright 2020, John Kitchin (see accompanying license files for details).
- pycse.PYCSE.Rsquared(y, Y)
Return R^2, or coefficient of determination.
y is a 1d array of observations. Y is a 1d array of predictions from a model.
- Returns
- The R^2 value for the fit.
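As a rough illustration, the coefficient of determination is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the name `r_squared` and the sample data are hypothetical, and this is not necessarily how pycse implements it.

```python
import numpy as np


def r_squared(y, Y):
    """Coefficient of determination for observations y and predictions Y."""
    y = np.asarray(y, dtype=float)
    Y = np.asarray(Y, dtype=float)
    ss_res = np.sum((y - Y) ** 2)           # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and model predictions.
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_fit = np.array([1.1, 1.9, 3.2, 3.8])
r2 = r_squared(y_obs, y_fit)
print(r2)
```

A perfect fit gives a value of 1; values near 0 mean the model does no better than predicting the mean of y.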
- pycse.PYCSE.bic(x, y, model, popt)
Compute the Bayesian information criterion (BIC).
- Parameters
- model: function(x, ...) that returns the prediction for y
- popt: optimal parameters
- y: array, known y-values
- Returns
- pycse.PYCSE.ivp(f, tspan, y0, *args, **kwargs)
Solve an ODE initial value problem.
- Parameters
- f: function
- callable y'(x, y) = f(x, y)
- tspan: array
- The x points you want the solution at. The first and last points are used as t_span in solve_ivp.
- y0: array
- Initial conditions
- *args
- arbitrary positional arguments to pass to solve_ivp
- **kwargs
- arbitrary keyword arguments to pass to solve_ivp.
- max_step is set to the minimum difference between consecutive points in tspan. dense_output is set to True.
- t_eval is set to the array specified in tspan.
- Returns
- solution from solve_ivp
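The argument handling described above can be sketched directly with `scipy.integrate.solve_ivp`. The wrapper name `ivp_sketch` and the decay example are hypothetical. Using `setdefault`, so that caller-supplied values win, is an assumption; the source only says these options are set.

```python
import numpy as np
from scipy.integrate import solve_ivp


def ivp_sketch(f, tspan, y0, **kwargs):
    """Call solve_ivp with the defaults described in the text."""
    tspan = np.asarray(tspan, dtype=float)
    # Assumption: only fill these in when the caller did not supply them.
    kwargs.setdefault("max_step", np.min(np.diff(tspan)))
    kwargs.setdefault("dense_output", True)
    kwargs.setdefault("t_eval", tspan)
    # The first and last requested points define the integration interval.
    return solve_ivp(f, (tspan[0], tspan[-1]), y0, **kwargs)


def decay(x, y):
    """y'(x, y) = -y, whose exact solution is exp(-x) for y(0) = 1."""
    return -y


x = np.linspace(0.0, 2.0, 11)
sol = ivp_sketch(decay, x, [1.0])
print(sol.t)     # the solution is reported at the requested x points
print(sol.y[0])  # numerical approximation of exp(-x)
```

Because `dense_output` is on, `sol.sol(x_new)` can also interpolate the solution between the requested points.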
- pycse.PYCSE.lbic(X, y, popt)
Compute the Bayesian information criterion for a linear model.
- Returns
- BIC: float
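For orientation, a widely used least-squares form of the BIC is n·ln(RSS/n) + k·ln(n), where n is the number of data points, k the number of parameters, and RSS the residual sum of squares. The sketch below assumes that form for a linear model with predictions `X @ popt`. Whether pycse uses exactly this expression is an assumption, and `bic_linear` is a hypothetical name.

```python
import numpy as np


def bic_linear(X, y, popt):
    """BIC for a linear model y ~ X @ popt (least-squares form, assumed)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    popt = np.asarray(popt, dtype=float)
    n = len(y)     # number of observations
    k = len(popt)  # number of fitted parameters
    rss = np.sum((X @ popt - y) ** 2)
    return n * np.log(rss / n) + k * np.log(n)


# Hypothetical data close to the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
X = np.column_stack([x**0, x**1])
value = bic_linear(X, y, [1.0, 2.0])
print(value)
```

When comparing models fitted to the same data, the one with the lower BIC is preferred; the k·ln(n) term penalizes extra parameters.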
- pycse.PYCSE.nlinfit(model, x, y, p0, alpha=0.05, **kwargs)
Nonlinear regression with confidence intervals.
- Parameters
- model: function f(x, p0, p1, ...) = y
- x: array of the independent data
- y: array of the dependent data
- p0: array of the initial guess of the parameters
- alpha: 100*(1 - alpha) is the confidence level in percent, i.e. alpha = 0.05 gives 95% confidence
- kwargs are passed to curve_fit.
- Returns
- [p, pint, SE]
- p is an array of the fitted parameters
- pint is an array of confidence intervals
- SE is an array of standard errors for the parameters.
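One common way to obtain such intervals is to take the covariance matrix returned by `scipy.optimize.curve_fit` and scale the standard errors by a Student t multiplier. The sketch below follows that recipe under stated assumptions. It is not taken from the pycse source, and `nlinfit_sketch`, the model, and the data are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t as student_t


def nlinfit_sketch(model, x, y, p0, alpha=0.05, **kwargs):
    """Fit model to (x, y); return parameters, intervals, standard errors."""
    pars, pcov = curve_fit(model, x, y, p0=p0, **kwargs)
    n, p = len(y), len(pars)
    dof = max(0, n - p)                           # degrees of freedom
    tval = student_t.ppf(1.0 - alpha / 2.0, dof)  # two-sided t multiplier
    se = np.sqrt(np.diag(pcov))                   # parameter standard errors
    pint = np.column_stack([pars - tval * se, pars + tval * se])
    return pars, pint, se


def model(x, a, b):
    return a * np.exp(b * x)


# Hypothetical data: a = 2, b = -1.5 plus a small fixed perturbation.
x = np.linspace(0, 1, 8)
y = model(x, 2.0, -1.5) + 0.01 * np.array([1, -1, 2, -2, 1, -1, 0.5, -0.5])
p, pint, se = nlinfit_sketch(model, x, y, p0=[1.0, -1.0])
print(p)
print(pint)
```

Each row of `pint` holds the lower and upper bound for the corresponding parameter.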
- pycse.PYCSE.nlpredict(X, y, model, loss, popt, xnew, alpha=0.05, ub=1e-05, ef=1.05)
Prediction error for a nonlinear fit.
- Parameters
- model: model function with signature model(x, ...)
- loss: loss function the model was fitted with, loss(...)
- popt: the optimized parameters
- xnew: x-values to predict at
- alpha: confidence level, alpha = 0.05 means 95% confidence
- ub: upper bound for smallest allowed Hessian eigenvalue
- ef: eigenvalue factor for scaling Hessian
- This function uses numdifftools for the Hessian and Jacobian.
- Returns
- y, yint, se
- y: predicted values
- yint: prediction interval at the alpha confidence level
- se: standard error of prediction
- pycse.PYCSE.polyfit(x, y, deg, alpha=0.05, *args, **kwargs)
Least squares polynomial fit with parameter confidence intervals.
- Parameters
- x: array_like, shape (M,)
x-coordinates of the M sample points (x[i], y[i]).
- y: array_like, shape (M,) or (M, K)
y-coordinates of the sample points. Several data sets of sample points sharing the same x-coordinates can be fitted at once by passing in a 2D-array that contains one dataset per column.
- deg: int
Degree of the fitting polynomial
- *args and **kwargs are passed to regress.
- Returns
- [b, bint, se]
- b is a vector of the fitted parameters
- bint is a 2D array of confidence intervals
- se is an array of standard error for each parameter.
- pycse.PYCSE.predict(X, y, pars, XX, alpha=0.05, ub=1e-05, ef=1.05)
Prediction interval for linear regression.
Based on the delta method.
- Parameters
- X: known x-value array, one row for each y-point
- y: known y-value array
- pars: fitted parameters
- XX: x-value array to make predictions for
- alpha: confidence level, alpha = 0.05 means 95% confidence
- ub: upper bound for smallest allowed Hessian eigenvalue
- ef: eigenvalue factor for scaling Hessian
- Returns
- y, yint, pred_se
- y: the predicted values
- yint: an array of predicted confidence intervals
- pycse.PYCSE.regress(A, y, alpha=0.05, *args, **kwargs)
Linear least squares regression with confidence intervals.
Solve the matrix equation (A p = y) for p.
The confidence intervals account for sample size using a student T multiplier.
This code is derived from the descriptions at http://www.weibull.com/DOEWeb/confidence_intervals_in_multiple_linear_regression.htm and http://www.weibull.com/DOEWeb/estimating_regression_models_using_least_squares.htm
- Parameters
- A: a matrix of function values in columns, e.g.
A = np.column_stack([T**0, T**1, T**2, T**3, T**4])
- y: a vector of values you want to fit
- alpha: 100*(1 - alpha) confidence level
- args and kwargs are passed to np.linalg.lstsq
- Returns
- [b, bint, se]
- b is a vector of the fitted parameters
- bint is a 2D array of confidence intervals
- se is an array of standard error for each parameter.
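The procedure described above (least squares for the parameters, then a Student t multiplier on the standard errors) can be sketched with NumPy and SciPy as follows. This is a textbook formulation under stated assumptions rather than the pycse source; `regress_sketch` and the data are hypothetical.

```python
import numpy as np
from scipy.stats import t as student_t


def regress_sketch(A, y, alpha=0.05):
    """Solve A p = y in the least-squares sense with confidence intervals."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    n, k = A.shape
    dof = n - k                              # degrees of freedom
    residuals = y - A @ b
    sigma2 = residuals @ residuals / dof     # estimated error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)    # parameter covariance
    se = np.sqrt(np.diag(cov))               # standard errors
    tval = student_t.ppf(1.0 - alpha / 2.0, dof)
    bint = np.column_stack([b - tval * se, b + tval * se])
    return b, bint, se


# Hypothetical data: y = 1 + 2 T with a small alternating perturbation.
T = np.linspace(0, 4, 9)
y = 1.0 + 2.0 * T + 0.05 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1])
A = np.column_stack([T**0, T**1])
b, bint, se = regress_sketch(A, y)
print(b)
print(bint)
```

A parameter whose interval contains zero is not distinguishable from zero at the chosen confidence level.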
pycse.utils
Provides utility functions in pycse.
Fuzzy comparisons for float numbers.
An ignore exception decorator
A handy function to read a google sheet.
- pycse.utils.feq(x, y, epsilon=2.220446049250313e-16)
Fuzzy equals.
x == y with tolerance
- pycse.utils.fge(x, y, epsilon=2.220446049250313e-16)
Fuzzy greater than or equal to.
x >= y with tolerance
- pycse.utils.fgt(x, y, epsilon=2.220446049250313e-16)
Fuzzy greater than.
x > y with tolerance
- pycse.utils.fle(x, y, epsilon=2.220446049250313e-16)
Fuzzy less than or equal to.
x <= y with tolerance
- pycse.utils.flt(x, y, epsilon=2.220446049250313e-16)
Fuzzy less than.
x < y with tolerance
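A minimal sketch of what the five fuzzy comparisons above could look like, assuming the tolerance is applied as an absolute offset. That is an assumption, since the source only says "with tolerance". The default epsilon shown in the signatures equals `sys.float_info.epsilon`.

```python
import sys

EPS = sys.float_info.epsilon  # 2.220446049250313e-16


def flt(x, y, epsilon=EPS):
    """x < y, but only if x is below y by more than epsilon."""
    return x < (y - epsilon)


def fgt(x, y, epsilon=EPS):
    """x > y, but only if x exceeds y by more than epsilon."""
    return y < (x - epsilon)


def feq(x, y, epsilon=EPS):
    """x == y when neither is clearly less than the other."""
    return not (flt(x, y, epsilon) or fgt(x, y, epsilon))


def fle(x, y, epsilon=EPS):
    """x <= y with tolerance."""
    return not fgt(x, y, epsilon)


def fge(x, y, epsilon=EPS):
    """x >= y with tolerance."""
    return not flt(x, y, epsilon)


total = 0.1 + 0.2
exact = total == 0.3     # False: floating-point round-off
fuzzy = feq(total, 0.3)  # True: the difference is below epsilon
print(exact, fuzzy)
```

For values much larger than 1 an absolute tolerance this small has no effect, so a larger `epsilon` would need to be passed explicitly.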
- pycse.utils.ignore_exception(*exceptions)
Ignore exceptions on decorated function.
>>> with ignore_exception(ZeroDivisionError):
...     print(1/0)
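The behaviour can be approximated with the standard library alone. The sketch below is not the pycse implementation; `ignore_exception_sketch` is a hypothetical name. A generator-based context manager from `contextlib` works both in a `with` statement and as a decorator.

```python
from contextlib import contextmanager


@contextmanager
def ignore_exception_sketch(*exceptions):
    """Silently swallow the listed exception types."""
    try:
        yield
    except exceptions:
        pass


# Use as a context manager: the block stops at the error, then continues.
results = []
with ignore_exception_sketch(ZeroDivisionError):
    results.append("before")
    print(1 / 0)
    results.append("never reached")
results.append("after")


# Use as a decorator: the call returns None when the error is swallowed.
@ignore_exception_sketch(ZeroDivisionError)
def risky():
    return 1 / 0


outcome = risky()
print(results, outcome)
```

The standard library's `contextlib.suppress` offers the same behaviour for the `with` form.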
- pycse.utils.read_gsheet(url, *args, **kwargs)
Return a dataframe for the Google Sheet at url.
args and kwargs are passed to pd.read_csv. The sheet at url must be viewable by anyone with the link.
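One plausible way to read a shared sheet with `pd.read_csv` is to rewrite the sharing link into Google's CSV export address. The sketch below only performs that URL rewrite. It assumes the export endpoint is how the data is reached, and `gsheet_csv_url` and the placeholder `SHEET_ID` are hypothetical.

```python
import re
from urllib.parse import urlparse


def gsheet_csv_url(url):
    """Turn a Google Sheets sharing link into a CSV export link."""
    parts = urlparse(url)
    is_sheet = (parts.netloc == "docs.google.com"
                and parts.path.startswith("/spreadsheets/d/"))
    if not is_sheet:
        raise ValueError(f"{url} does not look like a Google Sheet link")
    sheet_id = parts.path.split("/")[3]
    # The tab is identified by gid in the URL fragment; default to the first.
    match = re.search(r"gid=([0-9]+)", parts.fragment)
    gid = match.group(1) if match else "0"
    return (f"https://docs.google.com/spreadsheets/d/{sheet_id}"
            f"/export?format=csv&gid={gid}")


# SHEET_ID is a placeholder, not a real document.
link = "https://docs.google.com/spreadsheets/d/SHEET_ID/edit#gid=123"
csv_url = gsheet_csv_url(link)
print(csv_url)
```

The resulting address could then be passed to `pd.read_csv(csv_url, *args, **kwargs)`, which requires network access and a publicly viewable sheet.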
pycse.plotly
Module for using plotly with orgmode.
This monkey-patches go.Figure.show to provide a png image for org-mode, and an html file that is saved that you can click on in org-mode to see the interactive version.
- pycse.plotly.myshow(self, *args, **kwargs)
Make a PNG image to display for plotly.
pycse.hashcache
hashcache - a decorator for persistent, file/hash-based cache
I found some features of joblib were unsuitable for how I want to use a cache.
1. The "file" Python thinks the function is in is used to save the results in joblib, which leads to repeated runs if you run the same code in Python, notebook or stdin, and means the cache is not portable to other machines, and maybe not even in time since temp directories and kernel parameters are involved. I could not figure out how to change those in joblib.
2. joblib uses the function source code in the hash, so inconsequential changes like whitespace, docstrings and comments change the hash.
This library aims to provide a simpler version of what I wish joblib did for me.
Results are cached based on a hash of the function name, argnames, bytecode, arg values and kwarg values. I use joblib.hash for this. This means any two functions with the same bytecode, even if they have different names, will cache to the same result.
The cache location is set as a function attribute:
hashcache.cache = "./cache"
This is alpha, proof of concept code. Test it a lot for your use case. The API is not stable, and subject to change.
Some things to do:
1. the function attributes are kind of weird, maybe these should be decorator arguments.
Pros:
1. File-based cache which means many functions can run in parallel reading and writing, and you are limited only by file io speeds, and disk space.
2. semi-portability. The cache could be synced across machines, and caches can be merged with little risk of conflict.
3. No server is required. Everything is done at the OS level.
4. Extendability. You can define your own functions for loading and dumping data.
Cons:
1. hashes are fragile and not robust. They are fragile with respect to any changes in how byte-code is made, or via mutable arguments, etc. The hashes are not robust to system level changes like library versions, or global variables. The only advantage of hashes is you can compute them.
2. File-based cache which means if you generate thousands of files, it can be slow to delete them. Although it should be fast to access the results since you access them directly by path, it will not be fast to iterate over all the results, e.g. if you want to implement some kind of search or reporting.
3. No server. You have to roll your own update strategy if you run things on multiple machines that should all cache to a common location.
Changelog
[2023-09-23 Sat] Changed hash signature (breaking change). It is too difficult to figure out how to capture global state, and the use of internal variable names is not consistent with using the bytecode to be insensitive to unimportant variable name changes.
Pulled out some functions for loading and dumping data. This is a precursor to enabling other backends like lmdb or sqlite instead of files. You can then simply provide new functions for this.
- pycse.hashcache.dump_data(hsh, data, verbose)
Dump DATA into HSH.
- pycse.hashcache.get_hash(func, args, kwargs)
Get a hash for running FUNC(ARGS, KWARGS).
This is the most critical feature of hashcache, as it provides the key used to store and look up results later. Think carefully before changing this function, because any change invalidates past caches.
FUNC should be as pure as reasonable. This hash is insensitive to global variables.
The hash is on the function name, bytecode, and a standardized kwargs including defaults. We use bytecode because it is insensitive to things like whitespace, comments, docstrings, and variable name changes that don't affect results. It is assumed that two functions with the same name and bytecode will evaluate to the same result.
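To make the idea concrete, here is a minimal sketch that hashes the function name, its raw bytecode (`__code__.co_code`), and a dictionary of standardized keyword arguments. It uses `hashlib` and `repr` as stand-ins, which is an assumption, since the module description says joblib.hash is used. The name `hash_sketch` is hypothetical.

```python
import hashlib


def hash_sketch(func, std_kwargs):
    """Hash the function name, bytecode, and standardized kwargs."""
    h = hashlib.sha256()
    h.update(func.__name__.encode())
    h.update(func.__code__.co_code)  # instructions only, no comments
    h.update(repr(sorted(std_kwargs.items())).encode())
    return h.hexdigest()


def make_a():
    def scale(x, factor=2):
        return x * factor
    return scale


def make_b():
    def scale(x, factor=2):
        # Same logic; only this comment and a blank line differ.

        return x * factor
    return scale


kwargs = {"x": 3, "factor": 2}
hash_a = hash_sketch(make_a(), kwargs)
hash_b = hash_sketch(make_b(), kwargs)
hash_c = hash_sketch(make_a(), {"x": 4, "factor": 2})
print(hash_a == hash_b, hash_a == hash_c)
```

Note that `co_code` stores instructions but not constant values, so a sketch like this would not distinguish `x * 2` from `x * 3` written as literals. That is one example of the fragility the module description warns about.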
- pycse.hashcache.get_hashpath(hsh)
Return path to file for HSH.
- pycse.hashcache.get_standardized_args(func, args, kwargs)
Returns a standardized dictionary of kwargs for func(args, kwargs).
This dictionary includes default values, even if they were not passed in the call.
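The standard library's `inspect` module can produce this kind of dictionary. The sketch below is one way to do it and is not taken from the pycse source; `standardized_args_sketch` is a hypothetical name.

```python
import inspect


def standardized_args_sketch(func, args, kwargs):
    """Map a call's positional and keyword arguments to parameter names."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in defaults that were not passed
    return dict(bound.arguments)


def f(x, y=2, *, z=3):
    return x + y + z


# Two different ways of making the same call.
a = standardized_args_sketch(f, (1,), {})
b = standardized_args_sketch(f, (), {"x": 1, "y": 2})
print(a)
print(a == b)
```

Standardizing this way means that `f(1)` and `f(x=1, y=2)` map to the same dictionary and can therefore share a cache entry.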
- pycse.hashcache.hashcache(verbose=False, loader=<function load_data>, dumper=<function dump_data>)
Cache results by hash of the function, arguments and kwargs.
Set hashcache.cache to the directory you want the cache saved in. Default = cache
- pycse.hashcache.load_data(hsh, verbose=False)
Load data for HSH.
HSH is a string for the hash associated with the data you want.
Returns success, data. If it succeeds, success will be True. If the data does not exist yet, success will be False, and data will be None.
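A file-based pair of load and dump functions with this return convention might look like the sketch below. The pickle format, the two-character subdirectory layout, and the names `hashpath_sketch`, `dump_sketch`, and `load_sketch` are all assumptions made for illustration, not the pycse implementation.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; a temporary directory keeps the demo tidy.
CACHE = Path(tempfile.mkdtemp())


def hashpath_sketch(hsh):
    """Path for a hash, sharded by its first two characters (assumed)."""
    return CACHE / hsh[:2] / hsh


def dump_sketch(hsh, data):
    """Write data to the file associated with hsh."""
    path = hashpath_sketch(hsh)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_sketch(hsh):
    """Return (success, data); (False, None) if nothing is cached yet."""
    path = hashpath_sketch(hsh)
    if not path.exists():
        return False, None
    with open(path, "rb") as f:
        return True, pickle.load(f)


miss = load_sketch("abc123")
dump_sketch("abc123", {"answer": 42})
hit = load_sketch("abc123")
print(miss, hit)
```

Returning a separate success flag lets a cached value of None be distinguished from a cache miss.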