New publication - Neural network embeddings based similarity search method for atomistic systems

Categories: publication, news

Searching for atomic structures in databases is like finding a needle in a haystack. It is difficult to construct a query that finds what you want without returning either nothing or everything. Raw atomic coordinates make poor search keys because they change under translations, rotations, and permutations of the atoms. There are also many equivalent ways to construct a unit cell for the same material, which makes it hard to query materials uniquely.
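To make the coordinate problem concrete, here is a minimal sketch using only the Python standard library. It compares a small structure with a rotated, translated, and reordered copy of itself. The coordinates are made up for illustration, and the sorted list of interatomic distances is only a toy invariant chosen for this example. It is not the fingerprint used in the paper, and it ignores element types and periodic cells.

```python
import math
from itertools import combinations


def distance_fingerprint(positions):
    """Sorted interatomic distances: a toy descriptor that does not depend
    on where the structure sits, how it is oriented, or atom order."""
    return sorted(math.dist(a, b) for a, b in combinations(positions, 2))


def rotate_z(point, theta):
    """Rotate a 3D point about the z axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def all_close(u, v, tol=1e-9):
    """True if two equal-length sequences of floats agree within tol."""
    return len(u) == len(v) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(u, v)
    )


# Hypothetical bent three-atom structure (arbitrary units).
original = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# The same structure after a rotation, a translation, and a reordering of atoms.
shift = (1.5, -2.0, 0.7)
moved = [
    tuple(c + d for c, d in zip(rotate_z(p, math.pi / 3), shift))
    for p in original
]
transformed = [moved[2], moved[0], moved[1]]

# Raw coordinates no longer match, but the invariant fingerprint does.
coords_match = all(all_close(p, q) for p, q in zip(original, transformed))
fingerprints_match = all_close(
    distance_fingerprint(original), distance_fingerprint(transformed)
)
print(coords_match, fingerprints_match)
```

A query built on the raw coordinates would miss the transformed copy, while one built on the invariant descriptor would find it.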

In this paper we show how to construct queries for atomic structures that allow you to quickly find similar atomic structures. We achieve this by using invariant fingerprint vectors from machine learning models coupled with approximate nearest neighbor vector search algorithms. We apply the method to molecules, bulk materials, and adsorbates on surfaces. We show that selecting geometrically similar systems leads to better data sets for building new machine learning models, and that the retrieved systems tend to be similar in both geometry and electronic structure.
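The search step described above can be sketched in a few lines. This is a hedged illustration and not the implementation from the paper. The fingerprint here is a toy vector of sorted interatomic distances standing in for a neural network embedding, the search is an exact brute-force scan standing in for an approximate nearest neighbor index, and the structure names and coordinates are hypothetical.

```python
import heapq
import math
from itertools import combinations


def fingerprint(positions):
    """Toy invariant vector: sorted interatomic distances.
    (Assumption: stands in for a learned embedding.)"""
    return sorted(math.dist(a, b) for a, b in combinations(positions, 2))


def search(query_vec, index, k=2):
    """Return the k (distance, name) pairs in `index` closest to `query_vec`.
    Exact brute-force scan; an approximate index would replace this at scale."""
    scored = ((math.dist(query_vec, vec), name) for name, vec in index.items())
    return heapq.nsmallest(k, scored)


# Hypothetical three-atom structures (made-up coordinates, arbitrary units).
structures = {
    "bent": [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
    "bent_stretched": [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0), (-0.26, 1.02, 0.0)],
    "linear": [(0.0, 0.0, 0.0), (1.16, 0.0, 0.0), (-1.16, 0.0, 0.0)],
}

# Build the "database": one fingerprint vector per stored structure.
index = {name: fingerprint(pos) for name, pos in structures.items()}

# Query: the "bent" geometry, translated and with its atoms in another order.
query = [(4.76, 5.93, 1.0), (5.0, 5.0, 1.0), (5.96, 5.0, 1.0)]

hits = search(fingerprint(query), index, k=2)
for dist, name in hits:
    print(f"{name}: {dist:.4f}")
```

The query finds the stored "bent" structure even though none of its coordinates match, and ranks the slightly stretched variant ahead of the linear one. Swapping the brute-force scan for an approximate index trades a little accuracy for speed on large databases.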

You can read more about this work here (it is Open Access): Yilin Yang, Mingjie Liu, John Kitchin, Digital Discovery, 2022, DOI: 10.1039/d2dd00055e.

  author =       {Yilin Yang and Mingjie Liu and John R. Kitchin},
  title =        {Neural Network Embeddings Based Similarity Search Method for
                  Atomistic Systems},
  journal =      {Digital Discovery},
  volume =       {},
  number =       {},
  pages =        {},
  year =         2022,
  doi =          {10.1039/d2dd00055e},
  url =          {},
  DATE_ADDED =   {Mon Sep 12 17:21:30 2022},

Copyright (C) 2022 by John Kitchin. See the License for information about copying.
