Platform Engineering, Data Engineering, and Generative AI
Technical examples for data platforms, data pipelines, and large language model applications.
2025
Building a Standard Claims Data Model With the Cube Semantic Layer and Databricks
Creating standard claims data models for claims knowledge work in property insurance companies.
Building a Property Insurance Claims Data Lakehouse with Airflow and Databricks
In this article we build a data lakehouse for property insurance claims management. This allows us to build out a complete claims analysis platform with standardized data modeling.
Capturing Email Attachments with Apache Airflow
Automate the capture of claims TPA bordereaux email attachments with Apache Airflow.
Selecting a Data Storage Strategy for Your Data Platform
The choice of data storage architecture directly impacts your organization's ability to efficiently query, manage, and scale large datasets on your data platform. This article walks you through making an informed choice of data storage strategy for your data platform.
Platform Architecture for Ingesting Bordereaux Data in Property Insurance
A general architecture for property insurance claims data integration and information architecture.
A Python Script to Generate Bordereaux Data
A free Python script that generates realistic TPA bordereaux claims data in multiple formats.
First Notice of Loss Analysis with Power BI in Property Insurance
This guide walks you through using Power BI with incoming bordereaux claims data, similar to what insurers might receive from a broker or TPA, to uncover trends, reduce manual effort, and drive smarter decisions across your insurance operations.
2024
- January 25th, 2024 - Building Natural Language User Interfaces over Analytics Applications
- January 19th, 2024 - The Evolution of Retrieval Augmented Generation
2023
DBT Hello World [Series]
- March 23rd, 2023 - Part 2: Building Our First DBT Pipeline for Patient History Data Summarization
- March 21st, 2023 - Part 1: Setting up a Local Data Stack with DBT and DuckDB
Assorted Articles
- July 25th, 2023 - Using Local LLM Models and LangChain to Evaluate Reasoning Ability of LLMs
- February 15th, 2023 - Snowflake Python UDF AutoML Predictions with Amazon AutoGluon
2022
Credit Card Fraud Detection [Series]
- June 17th, 2022 - Part 5: Deploying the Model as a Streamlit Application
- July 10th, 2022 - Part 4: Grid Search Model Selection with Amazon SageMaker Studio Lab and Shap Hypertune
- July 3rd, 2022 - Part 3: Connecting Amazon SageMaker Studio Lab to Snowflake
- June 28th, 2022 - Part 2: Scalable Feature Engineering with Snowpark
- June 16th, 2022 - Part 1: Loading the Credit Card Transaction Data Into Snowflake
Applied Predictive Maintenance [Series]
- May 24th, 2022 - Part 6: Going to Production with Snowpark
- Apr 15th, 2022 - Part 5: Analyzing the Results
- Mar 9th, 2022 - Part 4: Machine Learning Workflows with Sci-Kit Learn to Build Predictive Maintenance Models
- Feb 22nd, 2022 - Part 3: Exploratory Data Analysis
- Jan 26th, 2022 - Part 2: Sensor Data Ingest, Storage, and Analysis with Snowflake
- Jan 12th, 2022 - Part 1: Making the Business Case for Predictive Maintenance
2021
- Jan 28th, 2021 - An Introduction to Pandas: Coffee Analysis
- Feb 24th, 2021 - Forecasting Your AWS GPU Cloud Spend for Deep Learning
- Mar 3rd, 2021 - Ray: Distributed Python for Data Science and Other Applications
- Apr 20th, 2021 - An Introduction to NVIDIA's Multi-Instance GPUs Under Kubernetes
- May 25th, 2021 - A Cloud GPU Value Model for NVIDIA Multi-Instance GPUs (MIG)
- May 27th, 2021 - Applied Machine Learning Quarterly (Q2 2021)
- Sept 25th, 2021 - GANs for Unsupervised Anomaly Detection in Manufacturing
2020
- Feb 10th, 2020 - 2020 Trends: Data Markets and the Value of Data
- Apr 6th, 2020 - Analysis of April 2020 Covid-19 Impact vs Hospital Resources (v1.3) for Hamilton County, TN
- Apr 22nd, 2020 - Analyzing Covid-19 Shelter-In-Place Effects for Tennessee and Hamilton County
- Apr 27th, 2020 - An Adaptive Economic Model for Pandemics
- May 9th, 2020 - Applied Machine Learning Quarterly (May / Q2 2020) - Tools and Infrastructure
- July 22nd, 2020 - Deploying a HuggingFace NLP Model with KFServing
2019
- Jan 9th, 2019 - 2019 Trends: Kubernetes and Container Orchestration
- Jan 10th, 2019 - Tutorial: Setting Up Kafka on Amazon Web Services
- Jan 14th, 2019 - 2019 Trends: Kubeflow and Machine Learning Infrastructure
- Jan 21st, 2019 - Tutorial: Effects of Batch Size, Acknowledgments, and Compression on Kafka Throughput
- Feb 25th, 2019 - Tutorial: Setting Up Kubeflow 3.5 on Google Kubernetes Engine
- Feb 27th, 2019 - 2019 Trends: Interview on the Machine Learning and Artificial Intelligence Market
- Mar 5th, 2019 - Running Distributed TensorFlow Jobs on Kubeflow 3.5
- April 19th, 2019 - Using The TensorFlow Estimator Design Pattern
- May 20th, 2019 - A Practical Guide for Data Scientists Using GPUs with TensorFlow
- August 27th, 2019 - Announcing The CUIP 2019 Smart City Data Challenge
Building the Next-Generation Retail Experience with Computer Vision and Apache Kafka [Series]
- Oct 23rd, 2019 - Part 1 of 4: "Big Cloud Dealz and @BigCloudRon"
- Oct 31st, 2019 - Part 2 of 4: "Prototyping Shopping Cart 2.0 with Computer Vision"
- Nov 5th, 2019 - Part 3 of 4: "Running Kafka Connect to Ingest the Inventory"
- Jan 7th, 2020 - Part 4 of 4: "KStreams, KTables, and the Next Generation Shopping Cart"
2018
- May 25th, 2018 - Rail, Aquariums, and Data: Part 1
- May 25th, 2018 - Rail, Aquariums, and Data: Part 2
- May 25th, 2018 - Rail, Aquariums, and Data: Part 3
- May 25th, 2018 - Rail, Aquariums, and Data: Part 4
- May 25th, 2018 - Rail, Aquariums, and Data: Part 5
- July 16th, 2018 - Notes on Using Apache Avro with Apache Kafka
- July 26th, 2018 - Real-Time Sensor Data Processing with the Kafka Streaming API