
Written by — Mika Heino, Data Architect
Mika Heino explores how LangChain works with Snowflake and how to connect it to Snowflake Cortex.
Explore how LangChain works with Snowflake and connect it to Snowflake Cortex. In addition, create a workflow that uses a previously created connection to the Perplexity API: the user should get an answer from Perplexity if the answer provided by the chosen Cortex model does not satisfy the judge LLM (the LLM-as-a-judge concept).
Before diving into LangChain, we need to understand that this use case example is not an agent solution, even though the title might suggest so. I quote a LangGraph blog post, which in turn quotes Anthropic:
“Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.”
Our current use case falls into the category of a workflow. Later, for agents, we shall investigate LangGraph, a product by LangChain. LangGraph offers more advanced orchestration capabilities, as it is designed for scenarios requiring complex, stateful, multi-agent workflows, whereas LangChain provides a standard interface for integrating language models and components into applications, as you will see later in the example.
As said earlier, LangChain is a software framework that facilitates the integration of large language models (LLMs) into applications. Essentially, it helps with tasks like prompt chaining, logging, callbacks, persistent memory, and efficient connections to multiple data sources, all of which come standard out of the box. In the bigger picture, LangChain is a workflow orchestration component, whereas the Perplexity API is a tool in our context (even though it can itself be categorized as an AI agent).
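To make the chaining idea concrete, here is a minimal sketch of prompt chaining with LangChain's expression language (LCEL). It assumes the ChatSnowflakeCortex connector covered later in this post; the question and model choice are only illustrative.

import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.chat_models.snowflake import ChatSnowflakeCortex

# Connection values come from SNOWFLAKE_* environment variables,
# as in the full example later in this post.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a Volkswagen Beetle expert."),
    ("human", "{question}"),
])
llm = ChatSnowflakeCortex(model="mistral-large", cortex_function="complete")

# The | operator composes Runnables into a chain: prompt -> model.
chain = prompt | llm
answer = chain.invoke({"question": "What engine did the 1965 Beetle have?"})
print(answer.content)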
LangChain is just one of several frameworks, but it was chosen for this use case because it is code-first (Python and JavaScript) and one of the most popular frameworks. The need for a code-first solution arose because I wanted the code to run inside Snowpark Container Services as a runnable product and link it to other Snowflake Cortex services (Cortex Search and Cortex Guard, to name a few). The ability to leverage LLMs and deep-research tools was also a deciding factor.
A list of alternatives to LangChain can be found on the internet, and it already notes that LangChain might not be an ideal solution for independent AI agents. Another framework worth investigating would be LlamaIndex, but it currently supports Snowflake only through a CustomLLM wrapper.
Previous versions of LangChain used a CustomLLM wrapper approach for Snowflake Cortex, but the current version has better, "official" support for Snowflake Cortex, provided by Prathamesh Nimkar (from Snowflake) and released in December 2024. This version supports all Snowflake Cortex functions (classify_text, sentiment, summarize, and so on) but focuses on complete, the instruction-following function that gives you an answer.
LangChain also supports fetching data from Snowflake through its document loader functionality, as sketched below.
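As a quick illustration, here is a minimal sketch of that loader, assuming the community SnowflakeLoader class; the query and connection values are placeholders, not a working configuration.

from langchain_community.document_loaders import SnowflakeLoader

loader = SnowflakeLoader(
    query="SELECT part_name, oem_code FROM spare_parts LIMIT 10",  # placeholder query
    user="*****",
    password="*****",
    account="*****",
    warehouse="*****",
    role="*****",
    database="*****",
    schema="*****",
)
documents = loader.load()  # each result row becomes a LangChain Document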
As of the end of February 2025, the Runnable implementation, ChatSnowflakeCortex, does not support the following Snowflake Cortex COMPLETE options, which would help in workflow creation. The first is structured output.
The response_format option in Snowflake Cortex COMPLETE enables you to define a structured JSON output format for your answer. Structured outputs would be ideal, for example, for a workflow that gathers necessary spare parts with OEM codes. The SQL below specifies that we want the part name and the corresponding official OEM code.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'claude-3-5-sonnet',
    [
        {
            'role': 'user',
            'content': 'Return the necessary Volkswagen Transporter T2/T25 spare parts for fixing a broken clutch. Return the parts with official Volkswagen OEM codes.'
        }
    ],
    {
        'temperature': 0,
        'max_tokens': 1000,
        'response_format': {
            'type': 'json',
            'schema': {
                'type': 'object',
                'properties': {
                    'spare_parts': {
                        'type': 'array',
                        'items': {
                            'type': 'object',
                            'properties': {
                                'part_name': {'type': 'string'},
                                'oem_code': {'type': 'string'}
                            },
                            'required': ['part_name', 'oem_code']
                        }
                    }
                },
                'required': ['spare_parts']
            }
        }
    }
);
This results in a list of spare parts with OEM codes:
{ "created": 1741194512, "model": "claude-3-5-sonnet", "structured_output": [ { "raw_message": { "spare_parts": [ { "oem_code": "025141031", "part_name": "Clutch Disc" }, { "oem_code": "025141025E", "part_name": "Pressure Plate" }, { "oem_code": "020141165A", "part_name": "Release Bearing" }, { "oem_code": "251721335A", "part_name": "Clutch Cable" }, { "oem_code": "020311123", "part_name": "Pilot Bearing" } ] }, "type": "json" } ], "usage": { "completion_tokens": 34, "prompt_tokens": 461, "total_tokens": 495 } } |
I checked the results (image 1): the codes are correct, and these parts are indeed needed to fix a broken clutch (the clutch cable is not strictly necessary).
Image 1.
The second unsupported option is Cortex Guard, Snowflake's filter for potentially harmful content. It is enabled by setting guardrails to true:
select SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-7b',
    [{'role': 'user', 'content': 'Question that leads to harmful content'}],
    {'guardrails': true}
);
The answer will be filtered automatically.
{ "choices": [ { "messages": "Response filtered by Cortex Guard" } ], "created": 1721744057, "model": "mistral-7b", "usage": { "completion_tokens": 503, "guardrails_tokens": 651, "prompt_tokens": 86, "total_tokens": 1240 } } |
In addition to the options above, the community version of the Snowflake Cortex connector does not, at this stage, support key-pair authentication. My Python development setup uses key-pair authentication for connections to Snowflake, which meant that I had to fall back to LangChain's password-based authentication method, which is far from ideal. Snowflake offers a legacy service user type for applications that cannot easily use more secure authentication methods. More about these options can be found in Augusto Kiniama Rosa's blog about Snowflake security.
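For contrast, key-pair authentication with the plain Snowflake Python connector typically looks like the sketch below. This is only an illustration of what ChatSnowflakeCortex currently cannot accept; the key path and connection values are placeholders.

from cryptography.hazmat.primitives import serialization
import snowflake.connector

# Load the private key from a PKCS#8 PEM file (placeholder path).
with open("/path/to/rsa_key.p8", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

# The connector expects the key as DER-encoded bytes.
private_key_bytes = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

conn = snowflake.connector.connect(
    account="*****",
    user="*****",
    private_key=private_key_bytes,
    warehouse="*****",
)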
Snowflake Cortex offers a variety of models to choose from. All of these are hosted inside Snowflake, and using them requires passing messages into the database, which means that your user and system messages are sent as SQL. This raised quite a few challenges at the start with SQL sanitization, as apostrophes cut the messages short and caused SQL errors. The end result, though, is pretty clean: the messages with roles are sent as JSON (example 1).
select snowflake.cortex.complete(
    'mistral-large',
    parse_json('[{"content": "Analyze if the answer contains uncertainty or lacks specifics. Respond ONLY with CERTAIN or UNCERTAIN.", "role": "system"},
                 {"content": "Question: What was the original air-cooled engine size of the 1965 Volkswagen Beetle?\\nAnswer: The original air-cooled engine of the 1965 Volkswagen Beetle was a 1.2-liter (1192 cc) flat-four engine. This engine produced around 40 horsepower", "role": "user"}]'),
    parse_json('{"temperature": 0.0, "top_p": 0.95, "max_tokens": 200}')
) as llm_response;
This version of LLM-as-a-judge is oversimplified. More about the whole topic can be found in this GitHub repo, which gathers whitepapers and studies around the concept.
The code for this is rather short. Models are defined with ChatSnowflakeCortex and called in separate methods (create_expert_messages and create_judge_messages, which use LangChain core message types). Adding the Perplexity call would be an addition to the "uncertain" branch of the display-results block (see the sketch after the code), but I have not added it here, as we will explore tool calling later.
import os
import argparse
import re
from typing import List

from langchain_core.messages import SystemMessage, HumanMessage
from langchain_community.chat_models.snowflake import ChatSnowflakeCortex

# Environment Setup
os.environ["SNOWFLAKE_ACCOUNT"] = "*****"
os.environ["SNOWFLAKE_USERNAME"] = "*****"
os.environ["SNOWFLAKE_PASSWORD"] = "*****"
os.environ["SNOWFLAKE_DATABASE"] = "*****"
os.environ["SNOWFLAKE_SCHEMA"] = "*****"
os.environ["SNOWFLAKE_ROLE"] = "*****"
os.environ["SNOWFLAKE_WAREHOUSE"] = "*****"


def sanitize_sql_input(text: str) -> str:
    """Sanitize text for SQL safety with enhanced escaping"""
    # Strip characters that would break the generated SQL, then collapse whitespace
    sanitized = re.sub(r"[';\\\"\n\r\t]", " ", text)
    return re.sub(r"\s+", " ", sanitized).strip()[:500]


def create_expert_messages(question: str) -> List[SystemMessage | HumanMessage]:
    return [
        SystemMessage(content=(
            "You are a Volkswagen Beetle expert with deep technical knowledge. "
            "Provide detailed answers. If uncertain, say: "
            "I dont have sufficient information to answer that question."
        )),
        HumanMessage(content=f"Question: {question}")
    ]


def create_judge_messages(question: str, answer: str) -> List[SystemMessage | HumanMessage]:
    return [
        SystemMessage(content=(
            "Analyze if the answer contains uncertainty or lacks specifics. "
            "Respond ONLY with CERTAIN or UNCERTAIN."
        )),
        HumanMessage(content=f"Question: {question}\\nAnswer: {answer}")
    ]


def main():
    parser = argparse.ArgumentParser(description="VW Expert AI Agent with Verification")
    parser.add_argument("question", type=str, help="Your question about Volkswagen Beetle")
    args = parser.parse_args()

    # Sanitize input
    clean_question = sanitize_sql_input(args.question)
    print(f"\n[LOG] Sanitized input: {clean_question}\n")

    # Initialize models
    expert_llm = ChatSnowflakeCortex(
        model="snowflake-arctic",
        cortex_function="complete",
        temperature=0.1,
        max_tokens=50,
        top_p=0.95,
    )
    judge_llm = ChatSnowflakeCortex(
        model="claude-3-5-sonnet",
        cortex_function="complete",
        temperature=0.0,
        max_tokens=200,
        top_p=0.95,
    )

    try:
        # Generate expert response
        expert_messages = create_expert_messages(clean_question)
        print(f"[LOG] Expert messages:\n{expert_messages}\n")
        expert_response = expert_llm.invoke(expert_messages)
        expert_answer = sanitize_sql_input(expert_response.content)
        print(f"[LOG] Expert raw answer: {expert_answer}\n")

        # Generate judge verification
        judge_messages = create_judge_messages(clean_question, expert_answer)
        print(f"[LOG] Judge messages:\n{judge_messages}\n")
        judge_response = judge_llm.invoke(judge_messages)
        verdict = judge_response.content.strip().lower()
        print(f"[LOG] Judge verdict: {verdict}\n")

        # Display results
        if "uncertain" in verdict:
            print("\n[RESULT] Uncertain Answer:")
            print(expert_answer)
        else:
            print("\n[RESULT] Verified Certain Answer:")
            print(expert_answer)

    except Exception as e:
        print(f"\n[ERROR] Processing failed: {str(e)}")
        raise


if __name__ == "__main__":
    main()
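For illustration, here is a hedged sketch of what the Perplexity fallback mentioned above could look like. It is not part of the original script: it assumes Perplexity's OpenAI-compatible chat-completions endpoint, and both the model name "sonar" and the ask_perplexity helper are assumptions made for this sketch.

import os
import requests

def ask_perplexity(question: str) -> str:
    """Hypothetical fallback call used when the judge verdict is UNCERTAIN."""
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
        json={
            "model": "sonar",  # assumed model name; check current Perplexity docs
            "messages": [{"role": "user", "content": question}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# In the display-results block this would extend the "uncertain" branch:
# if "uncertain" in verdict:
#     print(ask_perplexity(clean_question))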
Getting a CERTAIN verdict is easy, as we ask about something that has been known for a long time: the original VW Beetle's engine size. In this use case the expert model was Llama, and the latest addition to the Snowflake Cortex library, Claude 3.5 Sonnet, acted as the judge.
As you can see from the output, the expert's answer is passed to the judge and verified as certain.
(agent) mheino@RWin-PF4J1VD3:~/aiagent$ python vw.py "What was the original air-cooled engine size of the 1965 Volkswagen Beetle?"

[LOG] Expert messages:
[SystemMessage(content='You are a Volkswagen Beetle expert with deep technical knowledge. Provide detailed answers. If uncertain, say: I dont have sufficient information to answer that question.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Question: What was the original air-cooled engine size of the 1965 Volkswagen Beetle?', additional_kwargs={}, response_metadata={})]

[LOG] Judge messages:
[SystemMessage(content='Analyze if the answer contains uncertainty or lacks specifics. Respond ONLY with CERTAIN or UNCERTAIN.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Question: What was the original air-cooled engine size of the 1965 Volkswagen Beetle?\\nAnswer: Answer: The original air-cooled engine size of the 1965 Volkswagen Beetle was 1.2-liter (1192cc) with a 4-cylinder, horizontally opposed', additional_kwargs={}, response_metadata={})]

[LOG] Judge verdict: certain

[RESULT] Verified Certain Answer:
Answer: The original air-cooled engine size of the 1965 Volkswagen Beetle was 1.2-liter (1192cc) with a 4-cylinder, horizontally opposed
Getting an UNCERTAIN verdict is not as easy as getting a CERTAIN one. For this purpose the expert model was changed to snowflake-arctic and the question was given a false premise (the 1999 VW Beetle was not air-cooled). The snowflake-arctic model excels at SQL and coding, not general history questions, and as expected it fails to answer correctly, which is picked up by the judge LLM.
(agent) mheino@RWin-PF4J1VD3:~/aiagent$ python vw.py "What was the original air-cooled engine size of the 1999 Volkswagen Beetle?"

[LOG] Sanitized input: What was the original air-cooled engine size of the 1999 Volkswagen Beetle?

[LOG] Expert messages:
[SystemMessage(content='You are a Volkswagen Beetle expert with deep technical knowledge. Provide detailed answers. If uncertain, say: I dont have sufficient information to answer that question.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Question: What was the original air-cooled engine size of the 1999 Volkswagen Beetle?', additional_kwargs={}, response_metadata={})]

[LOG] Expert raw answer: Answer: The original air-cooled engine size of the 1999 Volkswagen Beetle was 2.0L.

[LOG] Judge messages:
[SystemMessage(content='Analyze if the answer contains uncertainty or lacks specifics. Respond ONLY with CERTAIN or UNCERTAIN.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Question: What was the original air-cooled engine size of the 1999 Volkswagen Beetle?\\nAnswer: Answer: The original air-cooled engine size of the 1999 Volkswagen Beetle was 2.0L.', additional_kwargs={}, response_metadata={})]

[LOG] Judge verdict: uncertain the 1999 vw beetle was not air-cooled - it had a water-cooled engine. the original air-cooled beetles were produced until 1979. the 1999 model was part of the new beetle generation which used modern water-cooled engines. while the 2.0l engine size mentioned was available in the 1999 new beetle, the premise about it being air-cooled is incorrect, making this answer uncertain.

[RESULT] Uncertain Answer:
Answer: The original air-cooled engine size of the 1999 Volkswagen Beetle was 2.0L.
As you can see, in our use case LangChain brings a framework with message types. As this example is rather small, we are exploring only a subset of LangChain's possibilities, and in our use case LangChain actually limits our Snowflake Cortex usage. In the bigger picture, Snowflake Cortex COMPLETE is only a way to use base models, so it is expected that the support is what it is. Utilizing, for example, the Snowflake Cortex Search service is out of the picture, which means creating an autonomous AI agent and utilizing LangChain fully might be impossible at this time.