Building a Streamlit Application
To build a Streamlit application we need to do a few things:
- install Streamlit locally to develop against
- get the basic UI elements of our system working
- wire in our data elements via Streamlit bindings for pandas dataframes
- integrate our PMML model
- deploy to the cloud
With that in mind, let's get into how to install Streamlit.
Installing Streamlit locally is simple: just use pip install (or your favorite environment management tool):
pip install streamlit
Just like that, we're ready to fire up the text editor and start writing some Python code for Streamlit.
Writing Streamlit Applications
Once we have our text editor or IDE up and rolling, we can import Streamlit to get going:
import streamlit as st
With our environment ready, we can take a quick look at the core concepts in Streamlit. Once you are comfortable with those concepts, take a look at the basic steps to build a Streamlit application.
Streamlit Applications Automatically Update as You Change Code
The application in the web browser automatically updates as you iteratively change the source code, which makes for a nice way to quickly prototype applications.
Building Our End-User Credit Card Fraud Application
We want to build an application that allows an end-user to use the model we created in this series. This application should be point-and-click and not require the end-user to have any knowledge of data science, pandas, or machine learning workflows.
Profiling the End-User
Our model is for fraud detection in credit card transactions, and the end-user is an analyst at the bank who is not part of IT, but uses software provided by IT. They will access this application from their web browser.
Each day the analyst needs to find the top 20-30 fraud cases to analyze and potentially promote for further investigation.
They need a way to quickly see those top cases per day and then drill down into a specific customer account to see if they can manually discover any other patterns that may help them with their job.
Defining Key Information to Show
We need to provide the analyst with:
- A list of the highest scored fraudulent transactions for a given date
- A list of all transactions for a given customer
- The fraud probability for a given transaction
We want to present the above information in a way that allows a user to visually make associations and easily see patterns (across time and space) in the data.
Mapping Data to the UI
One of the core concepts in data science applications is working with pandas dataframes. Our applications will often need to display a dataframe, and this is easy with Streamlit, as we can see below.
data = pd.read_csv( "/path/file.csv" )
st.dataframe(data)
In the example above we're passing a dataframe to Streamlit and it renders as an interactive table. This is a great example of how Streamlit just takes care of things for us, such as writing a dataframe out as an interactive HTML table.
In Python data science applications we often use dataframes as the input to, or the output from, machine learning models. Being able to easily display dataframes is a huge help.
The other data component of our application is our PMML model. Later in this article we'll see how to load PMML models and display their output in the user interface with Streamlit.
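As a preview, one common way to load a PMML file in Python is the pypmml package. The snippet below is only a hedged sketch under that assumption: the file name and the feature names/values are placeholders, and the article's actual integration code comes later.

```python
# hypothetical sketch using the pypmml package (pip install pypmml);
# "fraud_model.pmml" and the feature names below are placeholders
from pypmml import Model

model = Model.load("fraud_model.pmml")
result = model.predict({"TX_AMOUNT": 250.0, "TX_TIME_SECONDS": 86400})
print(result)
```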
For now, let's dig into the major panels of information we want to provide to our end-user analyst.
Panel 1: "Fraud Case Search"
One of our stated goals is to get a list of transactions that are scored to be most likely fraudulent for a specific date. Most of the time we'd want to see "today", but some analysts might want to go back and take a look at previous dates as well.
Given that each transaction has an associated fraud rating, we also want to be able to quickly filter down the list of transaction records. To do this, we'll use a slider component, as we see below.
Snazzy-looking user interface, huh? This actually did not take a lot of code in Streamlit (under 25 lines), as we can see below:
# pick the date to analyze (defaults to the current date)
current_date_select = st.date_input("Date for Fraud", value=current_date)

st.subheader("Model Threshold Slider")
fraud_threshold = st.select_slider('Select Threshold for Fraud', value=0.9, options=[1.0, 0.975, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

st.header("Top Scored Fraudulent Transactions")
cols_to_show_top20 = ["CUSTOMER_ID", "TRANSACTION_ID", "TERMINAL_ID", "TX_DATETIME", "TX_AMOUNT", "FRAUD_SCORE"]

# keep transactions on the selected date at or above the threshold, highest scores first
df_rows_found = df_transactions[
    (df_transactions["TX_DATETIME"].dt.date == current_date_select)
    & (df_transactions["FRAUD_SCORE"] >= fraud_threshold)
][cols_to_show_top20].sort_values(by=["FRAUD_SCORE"], ascending=False)

rows_found = len(df_rows_found.index)
st.write('Transactions Found:', rows_found)
st.dataframe(df_rows_found)  # render the filtered list as an interactive table
We'll come back to seeing the full application in action in a moment; for now, let's move on and take a look at how to build the "customer analysis" panel.
Panel 2: "Customer Analysis"
Once we know what transactions have been flagged for a given date, we may want to dig into the customers who have fraud or the locations of the terminals where the fraud was committed.
In the image below we can see the user interface panel that shows the list of all transactions for a given customer. The user interface also includes an interactive map component showing where each transaction occurred, along with a component that gives a visual timeline for each customer's transactions.
This panel lets the analyst look up any specific customer to get a better sense of how and where fraud was committed in a time and space sense.
Finally, we have one last component on this page where we can look up a specific transaction and examine its fraud score as calculated by the PMML model, as well as every attribute of the given record.
The Streamlit Python code for the customer analysis panel is similar to the previous panel's, but it is longer due to the folium mapping logic. For the sake of page space, we won't list all of it here, but you can check it out on GitHub.
Let's now take a look at the code to see how we integrated the PMML model from part 4 of this series.