Building a maintainable and modular LLM application stack with Hamilton in 13 minutes

By Stefan Krawczyk | Jul 2023


Here’s what the LLM application dataflow looks like when using Pinecone with sentence transformers. With Hamilton, understanding how everything connects is as simple as calling display_all_functions() on the Hamilton driver object.
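
For illustration, here’s a minimal sketch of producing that visualization. The module names and configuration values below are placeholders for this sketch, not the project’s exact code, and rendering the graph requires graphviz to be installed.

from hamilton import driver

import embedding_module  # placeholder module names for this sketch
import pinecone_module

dr = driver.Driver(
    {"embedding_service": "sentence_transformer", "vector_db": "pinecone"},
    embedding_module,
    pinecone_module,
)
# Render the full dataflow graph to a file.
dr.display_all_functions("dataflow.dot")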

Let’s explain the two main ways to implement modular code with Hamilton using our example for context.

Hamilton’s focus is on readability. Without explaining what @config.when does, you can probably tell that it acts as a conditional: the decorated function is only included in the dataflow when the predicate is satisfied. Below you will find the implementation for converting text to embeddings with the OpenAI and Cohere APIs.

Hamilton recognizes the two functions as alternative implementations because of the @config.when decorator and the shared function name embeddings preceding the double underscore (__cohere, __openai). Their function signatures need not be identical, which makes it easy and clear to adopt different implementations.

embedding_module.py
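
The Cohere implementation appears in full at the end of this post. As a sketch of how its OpenAI sibling might look (using the pre-v1 openai Python client; the exact signature in the project may differ):

from hamilton.function_modifiers import config

import numpy as np
import openai


@config.when(embedding_service="openai")
def embeddings__openai(
    text_contents: list[str],
    model_name: str = "text-embedding-ada-002",
) -> list[np.ndarray]:
    """Convert text to vector representations (embeddings) via the OpenAI API."""
    response = openai.Embedding.create(input=text_contents, model=model_name)
    return [np.asarray(item["embedding"]) for item in response["data"]]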

For this project, it made sense to have all embedding services implemented in the same file with the @config.when decorator, since there are only three functions per service. However, as the project grows in complexity, functions could be moved to separate modules, with the next section’s modularity pattern employed instead. Another point to note is that each of these functions is independently unit-testable. Should you have specific needs, it’s straightforward to encapsulate them in the function and test it.
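
To make the unit-testing point concrete, here’s a sketch of a test that stubs out the Cohere client; it assumes the embeddings__cohere function shown at the end of this post:

from unittest.mock import MagicMock

import numpy as np

import embedding_module


def test_embeddings__cohere():
    # Stub the Cohere client so no network call is made.
    fake_client = MagicMock()
    fake_client.embed.return_value.embeddings = [[0.1, 0.2], [0.3, 0.4]]
    result = embedding_module.embeddings__cohere(
        embedding_provider=fake_client,
        text_contents=["hello", "world"],
    )
    assert len(result) == 2
    assert isinstance(result[0], np.ndarray)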

Below you will find the implementation of vector database operations for Pinecone and Weaviate. The snippets are from pinecone_module.py and weaviate_module.py; notice how the function signatures resemble each other, and where they differ.

pinecone_module.py and weaviate_module.py
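
To illustrate the pattern, here’s a sketch of one such pair of functions. The function name and config keys are placeholders, and the client calls reflect the 2023-era pinecone and weaviate Python clients:

# pinecone_module.py (sketch)
import pinecone


def client_vector_db(vector_db_config: dict) -> pinecone.Index:
    """Instantiate a Pinecone client for the configured index."""
    pinecone.init(
        api_key=vector_db_config["api_key"],
        environment=vector_db_config["environment"],
    )
    return pinecone.Index(vector_db_config["index_name"])


# weaviate_module.py (sketch)
import weaviate


def client_vector_db(vector_db_config: dict) -> weaviate.Client:
    """Instantiate a Weaviate client for the configured instance."""
    return weaviate.Client(url=vector_db_config["url"])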

With Hamilton, the dataflow is stitched together using function names and function input arguments. Therefore, by sharing function names for similar operations, the two modules are easily interchangeable. Since the LanceDB, Pinecone, and Weaviate implementations reside in separate modules, each file has fewer dependencies and is shorter, improving both readability and maintainability. The logic for each implementation is clearly encapsulated in these named functions, so unit testing is straightforward to implement for each module. The separate modules also reinforce the idea that they shouldn’t be loaded simultaneously; the Hamilton driver will actually throw an error when it finds multiple functions with the same name, which helps enforce this concept.

The key part for running Hamilton code is the Driver object found in run.py. Excluding the code for the CLI and some argument parsing, we get:

Snippet of run.py
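
As a sketch of what that boils down to (module, node, and variable names here are illustrative, not the project’s exact code):

import importlib

from hamilton import driver

import embedding_module

# Pick the vector DB implementation by module name, e.g. from a CLI flag.
vector_db_module = importlib.import_module("pinecone_module")

config = {
    "embedding_service": "openai",  # selects the @config.when implementation
    "api_key": "<your-api-key>",
    "vector_db": "pinecone",
}
dr = driver.Driver(config, embedding_module, vector_db_module)
result = dr.execute(
    final_vars=["data_objects"],           # illustrative output node
    inputs={"class_name": "MyEmbedding"},  # execution-time input
)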

The Hamilton Driver, which orchestrates execution and is how you manipulate your dataflow, enables modularity through three mechanisms, as seen in the code snippet above:

  1. Driver configuration. This is a dictionary the driver receives at instantiation, containing information that should remain constant, such as which API to use or the embedding service API key. This integrates well with a command plane that can pass JSON or strings (e.g., a Docker container, Airflow, Metaflow, etc.). Concretely, this is where we’d specify which embedding API to swap in.
  2. Driver modules. The driver can receive an arbitrary number of independent Python modules to build the dataflow from. Here, vector_db_module can be swapped in for whichever vector database implementation we’re connecting to. Modules can also be imported dynamically through importlib (as in the run.py sketch above), which is useful for separating development and production contexts and enables a configuration-driven way to change the dataflow implementation.
  3. Driver execution. The final_vars parameter determines what output should be returned; you do not need to restructure your code to change what output you get. Say you want to debug something within your dataflow: you can request the output of any function by adding its name to final_vars. For example, if you have some intermediate output to inspect, it’s easy to request it, or to stop execution at that spot entirely (see the sketch after this list). Note that the driver can also receive inputs and overrides when calling execute(); in the code above, class_name is an execution-time input that indicates the embedding object we want to create and where to store it in our vector database.
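
To make the debugging point concrete, here’s a sketch continuing from the run.py snippet above (node and input names are illustrative):

# Request an intermediate node instead of the final output -- no
# restructuring required; execution stops once "embeddings" is computed.
debug_output = dr.execute(
    final_vars=["embeddings"],
    inputs={"text_contents": ["an example document"]},
)
print(debug_output["embeddings"][:2])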

In Hamilton, the key to enabling swappable components is to:

  1. define functions with effectively the same name, and then either
  2. annotate them with @config.when, choosing which one to use via the configuration passed to the driver, or
  3. put them in separate Python modules and pass the desired module to the driver.

So we’ve just shown how you can plug in, swap, and call various LLM components with Hamilton. We didn’t need to explain an object-oriented hierarchy, nor require extensive software engineering experience to follow along (we hope!). To accomplish this, we just had to match function names and their output types. We think this way of writing and modularizing code is therefore more accessible than what current LLM frameworks permit.

To add to our claims, here are a few practical implications of writing Hamilton code for LLM workflows that we’ve observed:

  1. The modularity Hamilton enables makes it easy to mirror cross-team boundaries. Function names and their output types become a contract, which lets one make surgical changes with confidence, and provides visibility into downstream dependencies via Hamilton’s visualization and lineage features (like the initial visualization we saw). For example, it’s clear how to interact with and consume from the vector database.
  2. Code changes are simpler to review, because the flow is defined by declarative functions. Changes are self-contained: there is no object-oriented hierarchy to learn, just a function to modify. Anything “custom” is de facto supported by Hamilton.
  3. The ability to swap out modules and @config.when implementations also means that integration testing in a CI system is straightforward to reason about, since you have the flexibility to swap or isolate parts of the dataflow as desired (see the sketch after this list).
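
As a sketch of point 3, a CI test might swap in a lightweight stand-in module and override the node that calls an external API (module and node names here are hypothetical):

import numpy as np

from hamilton import driver

import embedding_module
import fake_vector_db_module  # hypothetical in-memory stand-in for CI


def test_dataflow_without_external_services():
    dr = driver.Driver(
        {"embedding_service": "openai"},
        embedding_module,
        fake_vector_db_module,
    )
    result = dr.execute(
        final_vars=["data_objects"],                # illustrative downstream node
        overrides={"embeddings": [np.zeros(384)]},  # skip the real API call
        inputs={"class_name": "MyEmbedding"},
    )
    assert result["data_objects"] is not None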

When there is an error with Hamilton, it’s clear what code it maps to, and because of how the function is defined, one knows where it sits within the dataflow.

Take the simple example of the embeddings function using Cohere. If there were a timeout or an error parsing the response, it would be clear that it maps to this code, and from the function definition you’d know where in the flow it fits.

from hamilton.function_modifiers import config

import cohere
import numpy as np


@config.when(embedding_service="cohere")
def embeddings__cohere(
    embedding_provider: cohere.Client,
    text_contents: list[str],
    model_name: str = "embed-english-light-v2.0",
) -> list[np.ndarray]:
    """Convert text to vector representations (embeddings) using the Cohere Embed API.

    reference: https://docs.cohere.com/reference/embed
    """
    response = embedding_provider.embed(
        texts=text_contents,
        model=model_name,
        truncate="END",
    )
    return [np.asarray(embedding) for embedding in response.embeddings]


