## New publication - Neural network embeddings based similarity search method for atomistic systems


Searching for atomic structures in databases is like finding a needle in a haystack. It is difficult to construct a query that finds what you want without returning nothing, or everything! Atomic coordinates are hard to use directly because they are sensitive to translations, rotations, and permutations, and the many equivalent ways to construct a unit cell make it difficult to query materials uniquely.

In this paper we show how to construct queries for atomic structures that let you quickly find similar structures. We achieve this by combining invariant fingerprint vectors from machine learning models with approximate nearest-neighbor vector search algorithms, and we apply the method to molecules, bulk materials, and adsorbates on surfaces. The retrieved systems tend to be similar in both geometry and electronic structure, and we show how that similarity leads to better data sets for building new machine learning models.
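
The query workflow can be sketched with exact nearest-neighbor search standing in for the approximate algorithms used in the paper. Everything here is illustrative: the fingerprints are random stand-ins for the invariant embeddings a trained neural network would produce.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical fingerprints: one invariant vector per atomic structure.
rng = np.random.default_rng(42)
fingerprints = rng.normal(size=(1000, 64))  # 1000 structures, 64-d embeddings

# Index the fingerprints with a nearest-neighbor search structure.
nn = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(fingerprints)

# Query: find the 5 structures most similar to structure 0.
distances, indices = nn.kneighbors(fingerprints[:1])
print(indices[0])  # structure 0 itself is the closest match, at distance 0
```

On large databases, swapping the exact index for an approximate one trades a little recall for much faster queries, which is what makes this practical at scale.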

@article{yang-2022-neural-networ,
author =       {Yilin Yang and Mingjie Liu and John R. Kitchin},
title =        {Neural Network Embeddings Based Similarity Search Method for
Atomistic Systems},
journal =      {Digital Discovery},
volume =       {},
number =       {},
pages =        {},
year =         2022,
doi =          {10.1039/d2dd00055e},
url =          {https://doi.org/10.1039/D2DD00055E},
DATE_ADDED =   {Mon Sep 12 17:21:30 2022},
}


org-mode source (Org-mode version 9.5.1)

## New publication - Evaluation of the Degree of Rate Control via Automatic Differentiation


Determining which steps in a chemical reaction network are important in controlling the reaction rate is challenging. The degree of rate control is a valuable tool for this, but it requires the derivatives of the reaction rate with respect to rate parameters. In many scenarios we do not have an analytical expression for the reaction rate, and even when we do the derivatives may be tedious to derive and implement. In this work, we show how to use automatic differentiation to address this difficulty, enabling straightforward evaluation of the degree of rate control and sensitivity analysis of complex reaction networks.
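
To make the idea concrete, here is a minimal sketch of the degree of rate control, X_i = (k_i / r) (dr/dk_i), computed with a hand-rolled forward-mode automatic differentiation class. The two-step series rate law and the rate constants are hypothetical; the paper uses AD libraries from modern machine learning frameworks rather than this toy implementation.

```python
class Dual:
    """Minimal forward-mode AD number: carries a value and a derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.val * o.val, self.val * o.der + self.der * o.val)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = self._lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / o.val ** 2)

    def __rtruediv__(self, other):
        return self._lift(other) / self


def rate(k1, k2):
    # Hypothetical two-step series mechanism: the resistances add.
    return 1 / (1 / k1 + 1 / k2)


def drc(i, k):
    # X_i = (k_i / r) * dr/dk_i, via one forward pass seeding dk_i = 1.
    duals = [Dual(kj, float(i == j)) for j, kj in enumerate(k)]
    r = rate(*duals)
    return k[i] / r.val * r.der


k = [1.0, 10.0]
print([drc(i, k) for i in range(2)])  # the DRCs sum to 1 for this mechanism
```

With k1 much smaller than k2, step 1 has the larger degree of rate control (10/11 here), matching the intuition that the slow step is rate-limiting. A real AD library gives the same derivatives without writing the Dual class by hand.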

@article{yang-2022-evaluat,
author =       {Yang, Yilin and Achar, Siddarth K. and Kitchin, John R.},
title =        {Evaluation of the degree of rate control via automatic
differentiation},
journal =      {AIChE Journal},
volume =       {n/a},
number =       {n/a},
pages =        {e17653},
year =         2022,
keywords =     {catalysis, reaction kinetics},
doi =          {10.1002/aic.17653},
url =          {https://aiche.onlinelibrary.wiley.com/doi/abs/10.1002/aic.17653},
eprint =       {https://aiche.onlinelibrary.wiley.com/doi/pdf/10.1002/aic.17653},
abstract =     {Abstract The degree of rate control (DRC) quantitatively
identifies the kinetically relevant (sometimes known as
rate-limiting) steps of a complex reaction network. This
concept relies on derivatives which are commonly implemented
numerically, for example, with finite differences (FDs).
Numerical derivatives are tedious to implement, and can be
problematic, and unstable or unreliable. In this study, we
demonstrate the use of automatic differentiation (AD) in the
evaluation of the DRC. AD libraries are increasingly available
through modern machine learning frameworks. Compared with the
FDs, AD provides solutions with higher accuracy with lower
computational cost. We demonstrate applications in
steady-state and transient kinetics. Furthermore, we
illustrate a hybrid local-global sensitivity analysis method,
the distributed evaluation of local sensitivity analysis, to
assess the importance of kinetic parameters over an uncertain
space. This method also benefits from AD to obtain
high-quality results efficiently.}
}



## New publication - Model-Specific to Model-General Uncertainty for Physical Properties


When we fit models to data there are two kinds of uncertainty: uncertainty in the data, e.g., random noise that we cannot fit, and uncertainty in the model, e.g., whether we are using the right one. With a physics-based model, we get model-specific estimates of uncertainty. In this paper we show how to think about and quantify both kinds of error, and in particular how to use Bayesian models like a Gaussian process to get a model-general error when making predictions about physical properties.
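
As a sketch of the model-general idea, a Gaussian process regressor reports a predictive standard deviation that is small near the training data and grows when extrapolating. The data and kernel below are toy choices for illustration, not the models or properties from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data: a smooth "property" with noise added to it.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=30)

# WhiteKernel captures the data noise; RBF captures the smooth trend.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predictive std at an interpolation point vs. a far extrapolation point.
X_test = np.array([[2.5], [10.0]])
mean, std = gp.predict(X_test, return_std=True)
```

The second test point lies well outside the training domain, so its predictive standard deviation is larger; that growing uncertainty is the model-general signal you do not get from a single physics-based fit.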

@article{zhan-2022-model-specif,
author =       {Ni Zhan and John R. Kitchin},
title =        {Model-Specific To Model-General Uncertainty for Physical
Properties},
journal =      {Industrial \& Engineering Chemistry Research},
volume =       {nil},
number =       {nil},
pages =        {acs.iecr.1c04706},
year =         2022,
doi =          {10.1021/acs.iecr.1c04706},
url =          {http://dx.doi.org/10.1021/acs.iecr.1c04706},
DATE_ADDED =   {Sun Feb 13 12:08:27 2022},
}



## New publication on segregation in ternary alloy surfaces


In this paper we combine density functional theory, machine learning, Monte Carlo simulations, and experimental data to study segregation at the surface of a ternary Cu-Pd-Au alloy across its composition space. We found varying degrees of agreement between the simulated and experimental results, and we discuss their origins. Overall, Au segregates significantly across the composition space, and we learned a lot about the contributions to the discrepancies observed in Cu-Pd and Au-Cu segregation.

Find the paper at https://pubs.acs.org/doi/10.1021/acs.jpcc.1c09647.
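
The Monte Carlo part of such a workflow can be sketched with a toy Metropolis simulation in which surface and bulk atoms swap identities. The per-element segregation energies below are invented for illustration; in the paper the energetics come from a machine-learning model trained on DFT, not a table of constants.

```python
import math
import random

# Hypothetical segregation energies (surface minus bulk, eV). A negative
# value means that element prefers the surface.
SEG_ENERGY = {"Au": -0.30, "Pd": 0.05, "Cu": 0.10}


def metropolis_segregation(surface, bulk, T=600.0, steps=20000, seed=0):
    """Swap random surface/bulk atom pairs with the Metropolis criterion."""
    kB = 8.617e-5  # Boltzmann constant, eV/K
    rng = random.Random(seed)
    surface, bulk = surface[:], bulk[:]
    for _ in range(steps):
        i, j = rng.randrange(len(surface)), rng.randrange(len(bulk))
        # Energy change for putting the bulk atom on the surface site.
        dE = SEG_ENERGY[bulk[j]] - SEG_ENERGY[surface[i]]
        if dE <= 0 or rng.random() < math.exp(-dE / (kB * T)):
            surface[i], bulk[j] = bulk[j], surface[i]
    return surface, bulk


surf, blk = metropolis_segregation(
    ["Cu"] * 50, ["Au"] * 50 + ["Pd"] * 50 + ["Cu"] * 50)
print(surf.count("Au") / len(surf))  # Au enriches the surface in this toy model
```

Because Au has the most favorable (most negative) segregation energy in this toy table, the surface layer fills with Au at equilibrium, which is the qualitative trend the paper reports across the Cu-Pd-Au composition space.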

@article{yang-2022-simul-segreg,
author =       {Yilin Yang and Zhitao Guo and Andrew J. Gellman and John R.
Kitchin},
title =        {Simulating Segregation in a Ternary Cu-Pd-Au Alloy With
Density Functional Theory, Machine Learning, and Monte Carlo
Simulations},
journal =      {The Journal of Physical Chemistry C},
volume =       {nil},
number =       {nil},
pages =        {acs.jpcc.1c09647},
year =         2022,
doi =          {10.1021/acs.jpcc.1c09647},
url =          {http://dx.doi.org/10.1021/acs.jpcc.1c09647},
DATE_ADDED =   {Thu Jan 20 12:39:49 2022},
}



## Launching Point Breeze Publishing

| categories: news | tags:

I am excited to launch a new project this year: https://pointbreezepubs.gumroad.com/. This venture exists to publish booklets that help people learn how to use Python in science and engineering. Why am I doing this? I think computing skills are as important as domain knowledge today. I have spent the last 25 years learning how to use computing in science and engineering, and I have been teaching other people how to do that for the past 15 years. In that time, huge changes have occurred in both hardware and software; data science and machine learning have emerged, and they are playing a role almost everywhere. It has never been more important for people to learn how to use computers than it is today.

Solving science and engineering problems with computers requires first, and foremost, domain knowledge. Without that, you won't know what problem to solve, or know if the solution makes sense after you get it. It also requires complementary computational skills. Similar to a math education, where you first learn algebra, then geometry, and then calculus, you should not simply jump into data science or machine learning without a foundation of computational skills. I think of these skills like this:

Level 1 is basic programming in Python. Although everything rests on this foundation, this level alone does not solve many interesting problems in science and engineering. For those, you have to combine it with some mathematical domain knowledge, which is level 2. Levels 1 and 2 are adequate for many common science and engineering problems. As you specialize, though, especially in computational work, these levels can become tedious for large problems or frequent use. The solution is almost always to create an abstraction, a framework, that removes the tedium and is more convenient to use. This is level 3. The abstraction hides a lot of detail and can make it more difficult to customize behavior, but the payoff is convenience. Finally, and this is debatable, I think level 4 contains today's machine learning frameworks. I separate them from level 3 because they are often used to write the tools used in level 3, and they typically require skills that are not learned in level 2.
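
As a hypothetical example of the kind of problem levels 1 and 2 cover together, here is first-order decay integrated with scipy: the programming itself is level 1, and knowing the ODE and how to check the result against the analytical solution is level 2.

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 0.5  # 1/min, a made-up rate constant


def rhs(t, C):
    # First-order decay: dC/dt = -k C, a staple of undergraduate kinetics.
    return -k * C


sol = solve_ivp(rhs, (0, 10), [1.0],
                t_eval=np.linspace(0, 10, 50), rtol=1e-8, atol=1e-10)
# The numerical solution should track the analytical one, C(t) = exp(-k t).
print(sol.y[0, -1])
```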

So how does Point Breeze Publishing help here? We have published the first step in this booklet:

These booklets come in two forms: PDF and ipynb. The first is traditional, and easy to read. The second format is less traditional, but it allows you to execute the code yourself to see how it works.

Over the next few weeks, I will publish these additional booklets, with some supplementary materials.

1. Intermediate Python computations in science and engineering
2. Python computations for lab courses
3. Ordinary differential equations
4. Optimization in Python

These booklets cover most of what chemical engineering undergraduate students need (my opinion of course), and lay a solid foundation for levels 3 and 4 as described above. These are not reference books, or documentation from the packages. They are a guided tour through the topics to help you get started, learn how to think about these topics, and become a self-learner in them.

Where to from here? Over the summer, I will work on some more advanced booklets on data science and machine learning. I will also explore some other ways to deliver these booklets. I use PDF and ipynb now because I know how to do it, but other options exist.

This whole venture is possible because of scimax, and I hope this becomes a route to publish books about using scimax for scientists and engineers.

Want to keep up with what we are doing?