The sqlite variation of ase.db

| categories: ase, vasp | tags:

In a recent post we explored writing VASP calculations to an ase database in json format. Today we explore a similar idea, but writing to sqlite. I have incorporated the code from the previous post into a utils module in jasp.

from jasp.utils import *

print get_jasp_dirs('/home-research/jkitchin/research/rutile-atat')
['/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/0', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/1', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/10', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/11', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/12', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/13', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/14', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/15', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/16', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/17', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/2', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/3', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/4', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/5', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/59', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/6', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/66', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/7', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/73', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/74', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/78', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/8', '/home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/9']

That new function get_jasp_dirs just returns a list of directories that are known finished VASP calculations. We will use a functional style of programming to map a function onto each directory in that list.

from jasp import *
from jasp.utils import *

from ase.db import connect

# I want a sqlite extension, so we have to specify db as the type, which means sqlite
c = connect('vaspdb.sqlite', type='db')

dirs = get_jasp_dirs('/home-research/jkitchin/research/rutile-atat')

def write(directory):
    with jasp(directory) as calc:
        atoms = calc.get_atoms()
        calc.results['energy'] = atoms.get_potential_energy()
        calc.results['forces'] = atoms.get_forces()
    print c.write(atoms), directory

# functional approach to writing
map(write, dirs)
1 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/0
2 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/1
3 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/10
4 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/11
5 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/12
6 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/13
7 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/14
8 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/15
9 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/16
10 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/17
11 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/2
12 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/3
13 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/4
14 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/5
15 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/59
16 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/6
17 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/66
18 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/7
19 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/73
20 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/74
21 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/78
22 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/8
23 /home-research/jkitchin/research/rutile-atat/RuTi_O_rutile/9

Now, we have a sqlite database. Let us explore that a bit before using the python interface again.

1 Exploring the database with sqlite

First we see the tables.

.tables
keywords           species            text_key_values  
number_key_values  systems          

We can see details of the tables like this.

select * from sqlite_master;
table|systems|systems|2|CREATE TABLE systems (
    id integer primary key autoincrement,
    unique_id text unique,
    ctime real,
    mtime real,
    user text,
    numbers blob,
    positions blob,
    cell blob,
    pbc integer,
    initial_magmoms blob,
    initial_charges blob,
    masses blob,
    tags blob,
    momenta blob,
    constraints text,
    calculator text,
    calculator_parameters text,
    energy real,
    free_energy real,
    forces blob,
    stress blob,
    dipole blob,
    magmoms blob,
    magmom blob,
    charges blob,
    data text)
index|sqlite_autoindex_systems_1|systems|3|
table|sqlite_sequence|sqlite_sequence|4|CREATE TABLE sqlite_sequence(name,seq)
table|species|species|5|CREATE TABLE species (
    Z integer,
    n integer,
    id text,
    foreign key (id) references systems(id))
table|keywords|keywords|6|CREATE TABLE keywords (
    keyword text,
    id text,
    foreign key (id) references systems(id))
table|text_key_values|text_key_values|8|CREATE TABLE text_key_values (
    key text,
    value text,
    id text,
    foreign key (id) references systems(id))
table|number_key_values|number_key_values|10|CREATE TABLE number_key_values (
    key text,
    value real,
    id text,
    foreign key (id) references systems (id))
index|unique_id_index|systems|11|CREATE INDEX unique_id_index on systems(unique_id)
index|ctime_index|systems|12|CREATE INDEX ctime_index on systems(ctime)
index|user_index|systems|13|CREATE INDEX user_index on systems(user)
index|calculator_index|systems|14|CREATE INDEX calculator_index on systems(calculator)
index|species_index|species|15|CREATE INDEX species_index on species(Z)
index|keyword_index|keywords|16|CREATE INDEX keyword_index on keywords(keyword)
index|text_index|text_key_values|17|CREATE INDEX text_index on text_key_values(key)
index|number_index|number_key_values|18|CREATE INDEX number_index on number_key_values(key)

Let us see one entry from the systems table.

select * from systems where id=1;
id|unique_id|ctime|mtime|user|numbers|positions|cell|pbc|initial_magmoms|initial_charges|masses|tags|momenta|constraints|calculator|calculator_parameters|energy|free_energy|forces|stress|dipole|magmoms|magmom|charges|data
1|3ed58bb16897177be0ed56400c90b6f4|14.2361260035084|14.2361260035084|jkitchin|@|7|||||||vasp|{"incar": {"doc": "INCAR parameters", "nbands": 43, "sigma": 0.1, "prec": "high", "encut": 350.0}, "doc": "JSON representation of a VASP calculation.\n\nenergy is in eV\nforces are in eV/\\AA\nstress is in GPa (sxx, syy, szz,  syz, sxz, sxy)\nmagnetic moments are in Bohr-magneton\nThe density of states is reported with E_f at 0 eV.\nVolume is reported in \\AA^3\nCoordinates and cell parameters are reported in \\AA\n\nIf atom-projected dos are included they are in the form:\n{ados:{energy:data, {atom index: {orbital : dos}}}\n", "potcar": [["Ru", "potpaw_PBE/Ru/POTCAR", "dee616f2a1e7a5430bb588f1710bfea3001d54ea"], ["O", "potpaw_PBE/O/POTCAR", "9a0489b46120b0cad515d935f44b5fbe3a3b1dfa"]], "input": {"kpts": [6, 6, 10], "kpts_nintersections": null, "reciprocal": false, "setups": {}, "xc": "PBE", "txt": "-", "gamma": true}, "atoms": {"cell": [[4.526933343669885, 0.0, 0.0], [0.0, 4.526933343669885, 0.0], [0.0, 0.0, 3.095292162609941]], "symbols": ["O", "O", "O", "O", "Ru", "Ru"], "tags": [0, 0, 0, 0, 0, 0], "pbc": [true, true, true], "positions": [[1.3820537023391204, 1.3820537023391204, 0.0], [3.1448796413307645, 3.1448796413307645, 0.0], [3.645520374174063, 0.8814129694958222, 1.5476460813049704], [0.8814129694958222, 3.645520374174063, 1.5476460813049704], [0.0, 0.0, 0.0], [2.2634666718349425, 2.2634666718349425, 1.5476460813049704]]}, "data": {"stress": [0.0884313161515024, 0.0884313161515024, 0.06042693164307849, -0.0, -0.0, -0.0], "doc": "Data from the output of the calculation", "volume": 63.432210741434858, "total_energy": -44.251496, "forces": [[-0.023609, -0.023609, 0.0], [0.023609, 0.023609, 0.0], [-0.023609, 0.023609, 0.0], [0.023609, -0.023609, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "fermi_level": 5.0374}, "metadata": {"date.created": 1395241327.477995, "uuid": "7081ee4a-af77-11e3-a6e6-003048f5e49e", "date.created.ascii": "Wed Mar 19 11:02:07 2014", "user.username": "jkitchin", "atoms.resort": [2, 3, 4, 5, 0, 1], "user.email": "jkitchin@andrew.cmu.edu", "user.fullname": "John Kitchin", "O.potential.git_hash": "9a0489b46120b0cad515d935f44b5fbe3a3b1dfa", "atoms.tags": [0, 0, 0, 0, 0, 0], "O.potential.path": "potpaw_PBE/O/POTCAR", "Ru.potential.path": "potpaw_PBE/Ru/POTCAR", "Ru.potential.git_hash": "dee616f2a1e7a5430bb588f1710bfea3001d54ea"}}|-44.251496||+|||||{"keywords": [], "data": {}, "key_value_pairs": {}}

How about the species table:

select * from species where id=1;
Z|n|id
8|4|1
44|2|1

Now we can find a calculation with two Ru and 4 oxygen atoms, but it takes some sqlite knowledge.

select sp1.id,sys.id 
from species as sp1 
inner join
species as sp2
on sp1.id = sp2.id
inner join systems as sys
on sp1.id=sys.id
where (sp1.Z=44 and sp1.n=2) and (sp2.Z=8 and sp2.n=4);
id|id
1|1

That is an expected result. Let us get back to the python interface.

2 The python interface to the ase-db

Let us search for entries containing 2 Ru atoms and 4 O atoms again. We know this should be the first entry from before. Note how much simpler this syntax is.

from ase.db import connect

# I want a sqlite extension, so we have to specify db as the type, which means sqlite
c = connect('vaspdb.sqlite', type='db')

for row in c.select('Ru=2,O=4'): print row
{'ctime': 14.236126003508412, 'energy': -44.251496, 'tags': array([0, 0, 0, 0, 0, 0], dtype=int32), 'positions': array([[ 1.3820537 ,  1.3820537 ,  0.        ],
       [ 3.14487964,  3.14487964,  0.        ],
       [ 3.64552037,  0.88141297,  1.54764608],
       [ 0.88141297,  3.64552037,  1.54764608],
       [ 0.        ,  0.        ,  0.        ],
       [ 2.26346667,  2.26346667,  1.54764608]]), 'calculator': u'vasp', 'calculator_parameters': {u'incar': {u'doc': u'INCAR parameters', u'prec': u'high', u'nbands': 43, u'sigma': 0.1, u'encut': 350.0}, u'doc': u'JSON representation of a VASP calculation.\n\nenergy is in eV\nforces are in eV/\\AA\nstress is in GPa (sxx, syy, szz,  syz, sxz, sxy)\nmagnetic moments are in Bohr-magneton\nThe density of states is reported with E_f at 0 eV.\nVolume is reported in \\AA^3\nCoordinates and cell parameters are reported in \\AA\n\nIf atom-projected dos are included they are in the form:\n{ados:{energy:data, {atom index: {orbital : dos}}}\n', u'potcar': [[u'Ru', u'potpaw_PBE/Ru/POTCAR', u'dee616f2a1e7a5430bb588f1710bfea3001d54ea'], [u'O', u'potpaw_PBE/O/POTCAR', u'9a0489b46120b0cad515d935f44b5fbe3a3b1dfa']], u'input': {u'kpts': array([ 6,  6, 10]), u'reciprocal': False, u'xc': u'PBE', u'kpts_nintersections': None, u'setups': {}, u'txt': u'-', u'gamma': True}, u'atoms': {u'cell': array([[ 4.52693334,  0.        ,  0.        ],
       [ 0.        ,  4.52693334,  0.        ],
       [ 0.        ,  0.        ,  3.09529216]]), u'symbols': [u'O', u'O', u'O', u'O', u'Ru', u'Ru'], u'tags': array([0, 0, 0, 0, 0, 0]), u'pbc': array([ True,  True,  True], dtype=bool), u'positions': array([[ 1.3820537 ,  1.3820537 ,  0.        ],
       [ 3.14487964,  3.14487964,  0.        ],
       [ 3.64552037,  0.88141297,  1.54764608],
       [ 0.88141297,  3.64552037,  1.54764608],
       [ 0.        ,  0.        ,  0.        ],
       [ 2.26346667,  2.26346667,  1.54764608]])}, u'data': {u'stress': array([ 0.08843132,  0.08843132,  0.06042693, -0.        , -0.        , -0.        ]), u'doc': u'Data from the output of the calculation', u'volume': 63.43221074143486, u'total_energy': -44.251496, u'forces': array([[-0.023609, -0.023609,  0.      ],
       [ 0.023609,  0.023609,  0.      ],
       [-0.023609,  0.023609,  0.      ],
       [ 0.023609, -0.023609,  0.      ],
       [ 0.      ,  0.      ,  0.      ],
       [ 0.      ,  0.      ,  0.      ]]), u'fermi_level': 5.0374}, u'metadata': {u'date.created': 1395241327.477995, u'uuid': u'7081ee4a-af77-11e3-a6e6-003048f5e49e', u'date.created.ascii': u'Wed Mar 19 11:02:07 2014', u'user.username': u'jkitchin', u'atoms.resort': array([2, 3, 4, 5, 0, 1]), u'user.email': u'jkitchin@andrew.cmu.edu', u'user.fullname': u'John Kitchin', u'O.potential.git_hash': u'9a0489b46120b0cad515d935f44b5fbe3a3b1dfa', u'atoms.tags': array([0, 0, 0, 0, 0, 0]), u'O.potential.path': u'potpaw_PBE/O/POTCAR', u'Ru.potential.path': u'potpaw_PBE/Ru/POTCAR', u'Ru.potential.git_hash': u'dee616f2a1e7a5430bb588f1710bfea3001d54ea'}}, 'cell': array([[ 4.52693334,  0.        ,  0.        ],
       [ 0.        ,  4.52693334,  0.        ],
       [ 0.        ,  0.        ,  3.09529216]]), 'numbers': array([ 8,  8,  8,  8, 44, 44], dtype=int32), 'forces': array([[-0.023609, -0.023609,  0.      ],
       [ 0.023609,  0.023609,  0.      ],
       [-0.023609,  0.023609,  0.      ],
       [ 0.023609, -0.023609,  0.      ],
       [ 0.      ,  0.      ,  0.      ],
       [ 0.      ,  0.      ,  0.      ]]), 'mtime': 14.236126003508412, 'pbc': array([ True,  True,  True], dtype=bool), 'id': 1, 'unique_id': u'3ed58bb16897177be0ed56400c90b6f4', 'user': u'jkitchin'}

3 Summary

It is not yet obvious what the advantage of the sqlite format over the json format is. One is that you can use SQL to create queries, which is probably more powerful than the ase-db format. It is a little mysterious how the ase-db searches the json format to me.

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.5h

Discuss on Twitter