The following Julia packages will be needed. When using a Pluto notebook, it’s built-in package manager will automatically download and install them for you.
using ODBC, DBInterface, DataFrames
Open Database Connectivity (ODBC) drivers allow us to make connections to the SQL server. Using the ODBC.jl package, we can check the currently available drivers on our system:
It is also possible to install a driver once it’s location is known.
To remove a driver, use:
Using a full connection string, we can now connect to the locally running SQL server, which was set up previously. The IP address, port, existing database name, user ID and password are needed. Note that in case the database name is unknown, we can connect to ‘master’ as this name always exists by default.
Using the conn_master object, we can now execute queries on the server. Let’s list all the databases.
In order to create a new database, we should first check if the name already exists using the list_db function. If not, then we create it as shown below with ‘FruitsDB’ as an example.
Listing all the databases again, we can verify that ‘FruitsDB’ has now been created.
SQL Server databases can contain a number of tables, which are simply an ordered collection of data. A table itself is a collection of rows, also known as records. Before we can start populating a table, we first need to create it within an existing database. As an example, let’s create a table called ‘Price_and_Origin’ within ‘FruitsDB’. This table will contain three columns — Name (String), Price (Float) and Origin (String). Note that VARCHAR(50) is used to denote variable-size string data. 50 is the size in bytes, and for single-byte encoding it also represents the length of the string.
Once a table exists, we can add data to it. Easiest way is to use DataFrame as the source. Remember that our table ‘Price_and_Origin’ expects three columns with name, price and origin. Therefore, we can use some dummy data as shown below:
To insert values, we can make use of DBInterface.executemany function, which allows passing multiple values in sequence. This can be done as shown in the function below. The finally clause ensures that database connection is closed using the DBInterface.close! function. This is generally a good practice, which helps avoid accidentally reusing the same connection for something else.
Let’s verify if the database got populated as we had expected. We first set up a connection ‘conn_fruit’ to connect to ‘FruitsDB’ on the SQL Server. Then we can select all entries from the table ‘Price_and_Origin’ and pass it to a DataFrame sink.
Following the same sequence as shown in the previous section, the database can now be updated with new data.
Let’s verify if the new data is indeed present within the database.
Re-executing the add_to_fruit_table function above again would add duplicate rows to the table.
Using a common table expression (CTE), we can delete duplicate rows from a given table. The following function helps us achieve this:
Check if the rows are unique.
It is often needed to remove entries (matching a certain condition) from the table within a database. For example, we can remove all fruits whose price is > 95 as shown below:
Using the DROP statement within DBInterface.execute function, a table can be deleted. Rest of the function will remain the same as delete_rows.
"DROP TABLE $table_name")