Building a MongoDB Database
=============================

This tutorial walks you through generating a MongoDB database from an HDF5 file.

Learning Objectives
-------------------
- Learn how to generate a MongoDB database from an HDF5 file.
- Understand the format of the MongoDB entries.
- Understand how to add information about the molecules to the database.


Required Files
--------------
The files for this tutorial are located in ``examples/DatabaseGeneration`` and ``examples/DatabaseGeneration/inputs``:

- ``t8_with_smiles_suggested_charge.hdf5``: the HDF5 file used to generate the MongoDB database. It contains molecular data, including SMILES strings and other relevant information.

Tutorial
--------

Building the database works much like the previous example, :doc:`adding_smiles_to_hdf5_files`.

First, import the library and start the MongoDB client.

.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python
    :start-at: from pymongo import MongoClient
    :end-at: client = MongoClient
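
Constructing a ``MongoClient`` does not by itself confirm that a server is running, because pymongo connects lazily. As a minimal sketch, assuming ``client`` is a ``pymongo.MongoClient`` pointed at a local server, you can check connectivity with MongoDB's ``ping`` admin command. The helper name ``server_is_reachable`` is hypothetical and not part of this library.

.. code-block:: python

    def server_is_reachable(client):
        """Return True if the server behind ``client`` answers a ``ping``.

        ``client`` is assumed to be a ``pymongo.MongoClient``; this helper
        is a hypothetical illustration, not part of this library.
        """
        try:
            # "ping" is a lightweight admin command that succeeds on any live server.
            client.admin.command("ping")
        except Exception:  # e.g. pymongo.errors.ServerSelectionTimeoutError
            return False
        return True


    # Usage (requires pymongo and a running server):
    #   from pymongo import MongoClient
    #   client = MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=2000)
    #   print(server_is_reachable(client))

A short ``serverSelectionTimeoutMS`` keeps the check from hanging for the default 30 seconds when no server is available.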

Next, make sure the collection starts out empty by dropping it if it already exists.

.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python
    :start-after: client = MongoClient
    :end-at: db.drop_collection
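
In pymongo, ``Database.drop_collection`` does not raise an error when the collection is missing, so the step above is safe to rerun. If you would rather know whether anything was actually removed, a small sketch (the helper name ``reset_collection`` is hypothetical) can check ``list_collection_names()`` first:

.. code-block:: python

    def reset_collection(db, name):
        """Drop collection ``name`` from ``db`` if present; return True if dropped.

        ``db`` is assumed to be a ``pymongo.database.Database``.
        """
        if name in db.list_collection_names():
            db.drop_collection(name)
            return True
        # Nothing to drop: the collection does not exist yet.
        return False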

In this example, we build the database from a single HDF5 file; however, the same tools can generate a
database from a collection of HDF5 files.

.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python
    :start-at: process_hdf5_folder
    :end-before: # This script processes

This call does several things. First, it looks in the specified ``folder_path`` (in this case ``"./inputs"``)
and processes each HDF5 file there, pulling the documents from it. It then assigns these documents a database name
(``test_db``) and a ``level_of_theory``, which is the QM method used to produce the data. Finally, the ``data_source``
key records where the data came from, in this case Tautobase.
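
To make these steps concrete, here is a minimal plain-Python sketch of the same idea. It is not the library's implementation of ``process_hdf5_folder``: the helper names are hypothetical, reading each HDF5 file is left out, and documents are assumed to be plain dictionaries. It shows how files in ``folder_path`` might be discovered and how every document gets the shared ``level_of_theory`` and ``data_source`` values.

.. code-block:: python

    from pathlib import Path


    def find_hdf5_files(folder_path):
        """Return the HDF5 files directly inside ``folder_path``, sorted by name."""
        folder = Path(folder_path)
        return sorted(
            p for p in folder.iterdir() if p.is_file() and p.suffix in {".hdf5", ".h5"}
        )


    def tag_documents(documents, level_of_theory, data_source):
        """Return copies of ``documents`` with the shared metadata fields added."""
        return [
            {**doc, "level_of_theory": level_of_theory, "data_source": data_source}
            for doc in documents
        ]


    # Hypothetical usage; "example_method" is a placeholder, not the method used here.
    #   for path in find_hdf5_files("./inputs"):
    #       docs = read_documents(path)  # hypothetical reader, not shown
    #       tagged = tag_documents(docs, "example_method", "tautobase")

Returning copies rather than mutating the input keeps the original documents unchanged if tagging is repeated with different metadata.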

Now, we will access the collection and count its documents. To do this, we first load the database from the
locally hosted client (which we just populated in the previous step) and then access the collection.

.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python
    :start-at: Access the client
    :end-before: Accessing Entries
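
If you want to reproduce the count yourself, pymongo's ``Collection.count_documents`` takes a query filter (an empty filter ``{}`` matches every document), while ``estimated_document_count`` returns a faster estimate from collection metadata. A minimal sketch, assuming ``client`` is a ``pymongo.MongoClient``; the collection name ``molecules`` in the usage comment is a hypothetical placeholder, so substitute the one used by the script.

.. code-block:: python

    def count_entries(client, db_name, collection_name, query=None):
        """Count documents in ``db_name.collection_name`` that match ``query``.

        ``client`` is assumed to be a ``pymongo.MongoClient``. An empty query
        (the default) counts every document in the collection.
        """
        collection = client[db_name][collection_name]
        return collection.count_documents(query or {})


    # Usage ("molecules" is a hypothetical collection name):
    #   print(count_entries(client, "test_db", "molecules"))
    #   print(count_entries(client, "test_db", "molecules", {"data_source": "tautobase"}))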


Next, let's look at an entry. We can do this by accessing the collection and retrieving the first entry.

.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python
    :start-at: Accessing Entries
    :end-before: Adding Data Fields
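
To get a quick overview of an entry's format without printing the whole document, you can fetch one document with ``find_one`` and list its top-level field names. This sketch assumes ``collection`` is a ``pymongo.collection.Collection``; the helper name is hypothetical, and the projection ``{"_id": 0}`` simply hides MongoDB's automatically generated ``_id`` field.

.. code-block:: python

    def entry_fields(collection):
        """Return the sorted top-level field names of one document, or [] if empty."""
        # The projection {"_id": 0} excludes MongoDB's generated _id field.
        doc = collection.find_one({}, {"_id": 0})
        if doc is None:
            return []
        return sorted(doc)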

As you can see from that entry, the database starts with relatively limited information.

Next, we will add more information to the database by adding a few fields to its entries.

.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python
    :start-at: Adding Data Fields
    :end-before: Alternatively,
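
Adding fields to existing MongoDB documents is typically done with the ``$set`` update operator. The sketch below is an assumption about how you might do this directly with pymongo, not a description of the script above; ``collection`` is assumed to be a ``pymongo.collection.Collection``, and the field name ``reviewed`` in the usage comment is hypothetical.

.. code-block:: python

    def add_fields(collection, query, new_fields):
        """Set ``new_fields`` on every document matching ``query``.

        Returns the number of documents that were actually modified.
        """
        # $set creates each field if it is missing and overwrites it otherwise.
        result = collection.update_many(query, {"$set": dict(new_fields)})
        return result.modified_count


    # Usage ("reviewed" is a hypothetical field):
    #   add_fields(collection, {"data_source": "tautobase"}, {"reviewed": False})

Note that ``modified_count`` excludes documents that already held the same values, so it can be smaller than the number of matches.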

And there you go! You've taken your first steps towards using the database software. 

You can also perform all of these steps at once by using a recipe with keyword arguments (kwargs), for instance:


.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python
    :start-at: Alternatively,
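
The recipe approach relies on Python keyword arguments: the settings are collected once and forwarded to a single call. As a generic sketch (``build_database`` here is a hypothetical stand-in, not the recipe used in the script), keeping shared settings in a dictionary lets you reuse them and override individual values per call.

.. code-block:: python

    def build_database(folder_path, db_name, level_of_theory, data_source):
        """Hypothetical stand-in for a recipe; it only echoes its settings."""
        return {
            "folder_path": folder_path,
            "db_name": db_name,
            "level_of_theory": level_of_theory,
            "data_source": data_source,
        }


    # Shared settings; "example_method" is a placeholder level of theory.
    base_settings = {
        "db_name": "test_db",
        "level_of_theory": "example_method",
        "data_source": "tautobase",
    }

    # ** unpacks the dictionary into keyword arguments.
    result = build_database(folder_path="./inputs", **base_settings)

    # Override one setting for a second run without editing base_settings.
    other = build_database(folder_path="./inputs", **{**base_settings, "db_name": "other_db"})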


Note: all output from this example is printed to the terminal.

Full Code
---------

.. literalinclude :: ../../../../examples/DatabaseGeneration/setup_database.py
    :language: python