Working with MS SQL Server in Julia | by Vikas Negi | Jul, 2023


The following Julia packages will be needed. When using a Pluto notebook, it’s built-in package manager will automatically download and install them for you.

using ODBC, DBInterface, DataFrames

Open Database Connectivity (ODBC) drivers allow us to make connections to the SQL server. Using the ODBC.jl package, we can check the currently available drivers on our system:

Image by author

It is also possible to install a driver once it’s location is known.

Image by author

To remove a driver, use:

Image by author

Using a full connection string, we can now connect to the locally running SQL server, which was set up previously. The IP address, port, existing database name, user ID and password are needed. Note that in case the database name is unknown, we can connect to ‘master’ as this name always exists by default.

Image by author

Using the conn_master object, we can now execute queries on the server. Let’s list all the databases.

Image by author

In order to create a new database, we should first check if the name already exists using the list_db function. If not, then we create it as shown below with ‘FruitsDB’ as an example.

Image by author

Listing all the databases again, we can verify that ‘FruitsDB’ has now been created.

Image by author

SQL Server databases can contain a number of tables, which are simply an ordered collection of data. A table itself is a collection of rows, also known as records. Before we can start populating a table, we first need to create it within an existing database. As an example, let’s create a table called ‘Price_and_Origin’ within ‘FruitsDB’. This table will contain three columns — Name (String), Price (Float) and Origin (String). Note that VARCHAR(50) is used to denote variable-size string data. 50 is the size in bytes, and for single-byte encoding it also represents the length of the string.

Once a table exists, we can add data to it. Easiest way is to use DataFrame as the source. Remember that our table ‘Price_and_Origin’ expects three columns with name, price and origin. Therefore, we can use some dummy data as shown below:

Image by author

To insert values, we can make use of DBInterface.executemany function, which allows passing multiple values in sequence. This can be done as shown in the function below. The finally clause ensures that database connection is closed using the DBInterface.close! function. This is generally a good practice, which helps avoid accidentally reusing the same connection for something else.

Image by author

Let’s verify if the database got populated as we had expected. We first set up a connection ‘conn_fruit’ to connect to ‘FruitsDB’ on the SQL Server. Then we can select all entries from the table ‘Price_and_Origin’ and pass it to a DataFrame sink.

Image by author

Following the same sequence as shown in the previous section, the database can now be updated with new data.

Adding new fruits (Image by author)

Let’s verify if the new data is indeed present within the database.

Note that the number of rows is now 7 (Image by author)

Re-executing the add_to_fruit_table function above again would add duplicate rows to the table.

“Lichi” and “Pear” appear twice (Image from author)

Using a common table expression (CTE), we can delete duplicate rows from a given table. The following function helps us achieve this:

Remove duplicate entries (Image by author)

Check if the rows are unique.

Duplicate entries have been removed (Image by author)

It is often needed to remove entries (matching a certain condition) from the table within a database. For example, we can remove all fruits whose price is > 95 as shown below:

Fruits with price > 95 have been removed (Image by author)

Using the DROP statement within DBInterface.execute function, a table can be deleted. Rest of the function will remain the same as delete_rows.

DBInterface.execute(conn_db, 
"DROP TABLE $table_name")



Source link

Leave a Comment