In this post we’ll take a look at deploying an AutoGluon AutoML model on Snowflake as part of a Python UDF that requires custom Python dependencies not available in Snowflake’s built-in Anaconda repository.
We’ll explain how to import a “wheel” file that has been downloaded from PyPI and use it as a dependency in a Python UDF on Snowflake.
Goals of Exercise
In this post we’ll show two main points:
how to load external Python dependencies for Snowpark Python UDFs (pip packages that are not part of Snowflake’s Anaconda channel)
how to get AutoGluon working with Snowpark Python UDFs
With that in mind, let’s dig into how to load external Python dependencies and make them available to our Python UDF on Snowflake.
Part 1: Loading External Python Dependencies for Snowpark Python UDFs
The general process for planning and deploying external Python dependencies for Python UDFs on Snowflake is:
determine which dependencies you need for the Python UDF on Snowflake
download the .whl (wheel) files for those dependencies from PyPI
A WHL (wheel) file is a distribution package saved in Python’s wheel format. It is the standard built-distribution format for Python packages and contains all the files and metadata required for installation, including information about the Python versions and platforms the wheel supports. Similar to an MSI installer, a wheel is a ready-to-install format: the package can be installed without building it from a source distribution.
Once we have the .whl files locally, we can import these into Snowflake with the following steps:
create a stage to hold the .whl files on Snowflake
upload each .whl file to the stage you created (e.g., via a SQL PUT command, or using the Session object in the Snowpark Python API)
upload a Python script (snowpark_whl_loader.py) that unzips the dependency .whl files and adds them to the Python path on the Snowflake instance
write the Python UDF and reference the Python script (snowpark_whl_loader.py) so your code can use the dependencies
The caveat is that this approach only works under the following conditions:
the Python package is platform-independent
the Python package doesn’t require OS-native libraries or a specific CPU architecture (pure-Python wheels advertise this with the none-any platform tag, e.g. mypkg-1.0-py3-none-any.whl)
Let’s look at some details on the steps below.
Collect Dependency WHL Files
After you’ve determined which dependencies you need for your UDF, check whether Snowflake already has the dependency in the built-in Anaconda repo (https://repo.anaconda.com/pkgs/snowflake/).
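You can also run this check from within Snowflake by querying the information_schema.packages view. A quick sketch, assuming a Snowpark Session like the one created in the next section (the package name queried here is illustrative):

```python
# Returns rows only if the package ships in Snowflake's Anaconda channel;
# an empty result means we need to bring our own wheel files.
rows = session.sql(
    "select package_name, version from information_schema.packages "
    "where language = 'python' and package_name = 'autogluon'"
).collect()
print(rows)
```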
If you don’t see your dependency in that list, head over to https://pypi.org/ and download the .whl files for each dependency you need.
Note: you may end up having to track down several extra transitive dependencies that aren’t obvious at first; running pip download for a package can help, since it fetches the package’s wheel along with the wheels of its dependencies.
Use Snowpark Python API Session to Upload Files
Once you have the .whl files downloaded locally, you need to upload them to a stage (e.g., “@AUTOGLUON_PACKAGES”) on Snowflake. You can do this easily from a Python Jupyter notebook (with the Snowpark Python dependencies installed) using the Session object’s file.put() method, as seen in the example below:
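A minimal sketch of the upload step; the connection parameters, stage name, and wheel file names are placeholders for your own values:

```python
from snowflake.snowpark import Session

# Connection parameters are placeholders; fill in your own account details.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "role": "<role>", "warehouse": "<warehouse>",
    "database": "<database>", "schema": "<schema>",
}).create()

# Create a stage to hold the wheel files (a no-op if it already exists).
session.sql("CREATE STAGE IF NOT EXISTS AUTOGLUON_PACKAGES").collect()

# Upload each local .whl file to the stage. auto_compress=False keeps
# the wheels unmodified so they can be unzipped server-side later.
whl_files = [
    "autogluon.tabular-0.6.2-py3-none-any.whl",
    # ...plus any additional dependency wheels...
]
for whl in whl_files:
    session.file.put(f"./{whl}", "@AUTOGLUON_PACKAGES",
                     auto_compress=False, overwrite=True)
```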
Once we have the external dependency .whl files in a Snowflake stage, we can start building our Python UDF.
Upload Our WHL File Loader
We also need to upload the Python script that installs the WHL files from the Snowflake stage. You can see an example of this below, where we again use session.file.put() to upload the file to the stage.
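For example, using the same session and stage as above:

```python
# Upload the loader script alongside the wheels; no compression so it
# can be imported as-is by the UDF.
session.file.put("./snowpark_whl_loader.py", "@AUTOGLUON_PACKAGES",
                 auto_compress=False, overwrite=True)
```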
The snowpark_whl_loader.py script uses the approach described in Snowflake’s documentation: it extracts the contents of a zipped archive in a stage into a local directory, then adds that directory to the Python system path. (A .whl file is itself a zip archive, so the same pattern works for wheels.) The code section below shows the core of our WHL file loader:
```python
import fcntl, os, sys, threading, zipfile

IMPORT_DIRECTORY_NAME = "snowflake_import_directory"

# Cross-process file lock so only one UDF worker extracts the archive.
class FileLock:
    def __enter__(self):
        self._lock = threading.Lock()
        self._lock.acquire()
        self._fd = open('/tmp/lockfile.LOCK', 'w+')
        fcntl.lockf(self._fd, fcntl.LOCK_EX)
    def __exit__(self, type, value, traceback):
        self._fd.close()
        self._lock.release()

def load_whl(file_name):
    # Snowflake exposes the UDF import directory via sys._xoptions.
    import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]
    path = import_dir + file_name
    extracted = '/tmp/' + file_name
    with FileLock():
        if not os.path.isdir(extracted):
            with zipfile.ZipFile(path, 'r') as myzip:
                myzip.extractall(extracted)
    sys.path.append(extracted)  # make the unpacked wheel importable
```
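With this helper in place, UDF code can unpack a staged wheel and then import it. The wheel name below is illustrative; in practice all of a package’s dependency wheels must be loaded too:

```python
# Inside the UDF body, before importing the packaged module:
load_whl("autogluon.tabular-0.6.2-py3-none-any.whl")
from autogluon.tabular import TabularPredictor
```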
Later in this blog post we’ll show a few extra helper methods in the loader file that are specific to our AutoGluon loading goals.
Next, let’s install our UDF on Snowflake.
Install UDF with External Dependencies
Using the Snowpark Python API we can build and install a Python UDF from a Jupyter notebook.
To create and install this Python UDF on Snowflake, we need to import any staged files that our UDF code requires (see the snippet after this list). In this case we need:
any WHL files we need as dependencies
any Python scripts we need, such as the WHL file loading script
the list of packages our UDF code uses that are already available in Snowflake’s built-in Anaconda repository
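A sketch of that setup, using the Snowpark session from earlier; the stage paths, wheel names, UDF name, and function signature are all illustrative:

```python
from snowflake.snowpark.functions import udf

# Import the staged wheel files and the loader script into the session.
session.add_import("@AUTOGLUON_PACKAGES/autogluon.tabular-0.6.2-py3-none-any.whl")
session.add_import("@AUTOGLUON_PACKAGES/snowpark_whl_loader.py")

# Packages already available in Snowflake's Anaconda channel.
session.add_packages("pandas", "numpy")

@udf(name="autogluon_predict", is_permanent=True,
     stage_location="@AUTOGLUON_PACKAGES", replace=True)
def autogluon_predict(feature_1: float, feature_2: float) -> float:
    import snowpark_whl_loader
    # Unpack the wheel dependencies onto sys.path before importing them.
    snowpark_whl_loader.load_whl("autogluon.tabular-0.6.2-py3-none-any.whl")
    return 0.0  # placeholder; the real inference code comes later in the post
```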
As we can see in the code snippet above, we add both the WHL files and the snowpark_whl_loader.py script we created using the session.add_import() method.
Once the stage files are imported into our session, they are available locally to the Python code running in Snowflake. For the WHL files, however, we still need to unpack them and add them to the system path.
Install Dependencies on Remote Snowflake Instance Manually
Once we have the dependency WHL files and the loader script imported into our UDF Python code, we can install the dependencies with the loader script.
The code listing below shows our install script, snowpark_whl_loader.py:
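Since the original listing isn’t reproduced here, the following sketch shows the shape of the full script: the FileLock class and load_whl() function from earlier, plus a convenience helper whose name and behavior are assumptions for illustration:

```python
# snowpark_whl_loader.py -- sketch of the full loader script.
# FileLock and load_whl are defined as shown earlier in the post.

def load_whl_files(whl_file_names):
    # Hypothetical helper: unpack a list of staged wheels and add
    # each one to sys.path so the packages can be imported.
    for whl in whl_file_names:
        load_whl(whl)
```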
Running the AutoGluon training process produces a local directory on your machine containing the serialized versions of all the sub-models that make up the AutoML ensemble. We can use this directory to instantiate a copy of the model and produce inferences on new data.
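As a preview, here is a minimal sketch of reloading that directory with AutoGluon; the directory path and feature names are illustrative:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Load the serialized ensemble produced by training and score new rows.
predictor = TabularPredictor.load("./AutogluonModels/ag-example-run/")
new_data = pd.DataFrame([{"feature_1": 1.0, "feature_2": 2.0}])
predictions = predictor.predict(new_data)
```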