
How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

August 9, 2024


This post is co-written with Aishwarya Gupta, Apurva Gawad, and Oliver Cody from Twilio.

Today's leading companies trust Twilio's Customer Engagement Platform (CEP) to build direct, personalized relationships with their customers everywhere in the world. Twilio enables companies to use communications and data to add intelligence and security to every step of the customer journey, in a flexible, programmatic way. That covers sales and marketing, growth, customer service, and many more engagement use cases. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create personalized experiences for their customers. As one of the largest AWS customers, Twilio uses data, artificial intelligence (AI), and machine learning (ML) services to run its daily workloads.

Data is the foundational layer for all generative AI and ML applications. Managing and retrieving the right information can be complex, especially for data analysts working with large data lakes and complex SQL queries. To address this, Twilio partnered with AWS to develop a virtual assistant that helps their data analysts find and retrieve relevant data from Twilio's data lake. It does this by converting user questions asked in natural language into SQL queries. This virtual assistant tool uses Amazon Bedrock, a fully managed generative AI service that provides access to high-performing foundation models (FMs) and capabilities like Retrieval Augmented Generation (RAG). RAG improves language model outputs by extending the models' capabilities to specific domains or to an organization's internal data, producing tailored responses.

This post highlights how Twilio enabled natural language-driven exploration of business intelligence (BI) data with RAG and Amazon Bedrock.

Twilio’s use case

Twilio wanted to provide an AI assistant to help their data analysts find data in their data lake. As the source of truth, they used the metadata layer (schema information) over their data lake. This layer consists of views (tables) and models (relationships) from their data reporting tool, Looker. Looker is an enterprise platform for BI and data applications that helps data analysts explore and share insights in real time.

Twilio implemented RAG using Anthropic Claude 3 on Amazon Bedrock to develop a virtual assistant tool called AskData for their data analysts. This tool converts questions asked in natural language (such as "Which table contains customer address information?") into a SQL query. It uses the schema information available in Looker Modeling Language (LookML) models and views. The analysts can run the generated SQL directly. This saves them the time it would take to first identify the tables containing relevant information and then write a SQL query to retrieve that information.

The AskData tool provides ease of use and efficiency to its users:

Users need accurate information about the data, quickly and accessibly, to make business decisions. A tool that minimizes the time they spend finding tables and writing SQL queries lets them focus more on business outcomes and less on logistical tasks.
Users typically reach out to the engineering support channel when they have questions about data that is deeply embedded in the data lake, or when they can't access it with their own queries. An AI assistant can reduce the engineering time spent responding to these questions and provide answers more quickly.

Solution overview

In this post, we walk you through the design and step-by-step implementation of the AskData tool, an AI assistant for Twilio's data analysts. We discuss the following:

How to use a RAG approach to retrieve the relevant LookML metadata for users' questions, with the help of efficient data chunking and indexing, and generate SQL queries from natural language
How to select the optimal large language model (LLM) for your use case from Amazon Bedrock
How analysts can query the data using natural language questions
The benefits of using RAG for data analysis, including increased productivity and reduced engineering overhead for finding the data (tables) and writing SQL queries

This solution uses Amazon Bedrock, Amazon Relational Database Service (Amazon RDS), Amazon DynamoDB, and Amazon Simple Storage Service (Amazon S3). The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

An end user (data analyst) asks a question in natural language about the data that resides within a data lake.
To answer the question, the application retrieves metadata (schema information) stored in Amazon RDS and conversation history stored in DynamoDB, so that retrieval is personalized to the user's questions:
- The RDS database (PostgreSQL with pgvector) stores the LookML tables and views as embeddings, which are retrieved through a vector similarity search.
- The DynamoDB table stores the previous conversation history with this user.
Amazon Bedrock processes the context and the natural language question using an FM (in this case, Anthropic Claude 3 Haiku). The model responds with a personalized SQL query that the user can run to retrieve accurate information from the data lake. The following prompt template is used to generate the SQL query:

Human: The context information below represents the LookML data for Looker views and models.
Using this context data, please generate a presto SQL query that will return the correct result for the user's question.
Please provide a SQL query with the correct syntax, table names, and column names based on the provided LookML data.

1. Use the correct underlying SQL table names (table name in sql_table_name)
and column names (use column names from the dimensions of the view as they are the correct column names).
Use the following as an example:

{{example redacted}}

2. Join tables as necessary to get the correct result.
- Avoid unnecessary joins if not explicitly requested by the user.

3. Avoid unnecessary filters if not explicitly requested by the user.

4. If the view has a derived table, use the derived query to answer the question
using table names and column names from the derived query. Use the following as an example:

{{example redacted}}

5. The schema name is represented as <schema_name>.<table_name> within the LookML views.
Use the current schema name or "public" as the schema name if no schema is specified.

This is the chat history from previous messages:

{chat_history}

{context}

This is the user question:

{query}

A: Here is a SQL query for the user question:
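Before the prompt is sent to the model, the template is filled in with the retrieved LookML documents, the chat history, and the user question. The following standard-library Python sketch shows one way to do that.

Some assumptions to note:
- Claude 3 models on Amazon Bedrock are invoked through the Messages API. The sketch therefore replaces the `Human:`/`Assistant:` markers with a user message plus an assistant prefill, using the `bedrock-2023-05-31` request format.
- The temperature defaults to the 0.1 value this post mentions in its optimization section.
- `TEMPLATE`, `format_history`, `build_request_body`, and the `max_tokens` value are illustrative, not Twilio's code.
- Python's `str.format` turns doubled braces such as `{{...}}` into literal braces. Keep that in mind if you format the full template this way.

```python
import json

# Abbreviated stand-in for the full template above (rules 1-5 omitted).
# Placeholders match the template: {chat_history}, {context}, {query}.
TEMPLATE = (
    "The context information below represents the LookML data for Looker views and models.\n"
    "Using this context data, please generate a presto SQL query that will return "
    "the correct result for the user's question.\n\n"
    "This is the chat history from previous messages:\n\n{chat_history}\n\n"
    "{context}\n\n"
    "This is the user question:\n\n{query}"
)

# Amazon Bedrock model ID for Anthropic Claude 3 Haiku.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"


def format_history(turns: list[tuple[str, str]]) -> str:
    """Render earlier (question, answer) pairs as plain text."""
    return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in turns)


def build_request_body(query: str, retrieved_docs: list[str],
                       turns: list[tuple[str, str]],
                       temperature: float = 0.1,
                       max_tokens: int = 1024) -> str:  # max_tokens is an assumed value
    """Fill the template and wrap it in a Claude 3 Messages API request body."""
    prompt = TEMPLATE.format(
        chat_history=format_history(turns),
        context="\n\n".join(retrieved_docs),  # LookML files returned by retrieval
        query=query,
    )
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [
            {"role": "user", "content": prompt},
            # Prefill the reply with the template's closing line.
            {"role": "assistant", "content": "Here is a SQL query for the user question:"},
        ],
    }
    return json.dumps(body)


# Sending the body requires AWS credentials and model access (not run here):
#   import boto3
#   response = boto3.client("bedrock-runtime").invoke_model(modelId=MODEL_ID, body=body)
#   text = json.loads(response["body"].read())["content"][0]["text"]
```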

The solution includes four main steps:

Use semantic search on LookML metadata to retrieve the relevant tables and views for the user's questions.
Use FMs on Amazon Bedrock to generate accurate SQL queries based on the retrieved table and view information.
Create a simple web application using LangChain and Streamlit.
Refine your application using techniques such as prompt engineering, optimizing inference parameters, and improving the LookML content.

Prerequisites

To implement the solution, you should have an AWS account, model access to your choice of FM on Amazon Bedrock, and familiarity with DynamoDB, Amazon RDS, and Amazon S3.

Access to Amazon Bedrock FMs isn't granted by default. To gain access to an FM, an AWS Identity and Access Management (IAM) user with sufficient permissions needs to request access through the Amazon Bedrock console. After access to a model is granted, it is available to the users in the account.

To manage model access, choose Model access in the navigation pane of the Amazon Bedrock console. The model access page lists the available models, the output modality of each model, whether you have been granted access to it, and its End User License Agreement (EULA). Review the EULA for the terms and conditions of using a model before requesting access to it. For information about model pricing, refer to Amazon Bedrock pricing.

Model access

Structure and index the data

In this solution, we use the RAG approach to retrieve the relevant schema information from LookML metadata for each user question, and then generate a SQL query using this information.

This solution uses two separate collections in our vector store: one for Looker views and another for Looker models. We used the sentence-transformers/all-mpnet-base-v2 model to create vector embeddings and PostgreSQL with pgvector as our vector database. As long as a LookML file doesn't exceed the context window of the LLM that generates the final response, we don't split the file into chunks. Instead, we pass the entire file to the embeddings model. The vector similarity search finds the files that contain the LookML tables and views relevant to the user's question. We then pass the full contents of those LookML files to the LLM, taking advantage of its large context window. From there, the LLM determines the schemas of the relevant tables and views and generates the SQL query.
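The rule of embedding each LookML file whole unless it is too large for the generation model's context window can be sketched as follows.

This is an illustration under stated assumptions:
- Token counts are estimated with a rough four-characters-per-token heuristic instead of the model's tokenizer.
- The 200,000-token default corresponds to the Claude 3 context window.
- The `reserve_tokens` value is hypothetical.
- Oversized files are split on blank lines. The post does not describe how it would split a file, so this strategy is also hypothetical.

One caveat: by default, all-mpnet-base-v2 truncates input longer than 384 word pieces. The embedding of a large file is therefore computed from its beginning only, although the whole file is still what the LLM receives.

```python
CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def split_if_needed(text: str, context_window_tokens: int = 200_000,
                    reserve_tokens: int = 20_000) -> list[str]:
    """Return [text] if it fits the budget; otherwise group blank-line-separated
    blocks into chunks that each fit. reserve_tokens leaves room for the
    instructions, chat history, and answer (hypothetical value)."""
    budget = context_window_tokens - reserve_tokens
    if estimate_tokens(text) <= budget:
        return [text]  # the common case: embed and pass the whole file
    chunks: list[str] = []
    current: list[str] = []
    for block in text.split("\n\n"):
        candidate = "\n\n".join(current + [block])
        if current and estimate_tokens(candidate) > budget:
            chunks.append("\n\n".join(current))
            current = [block]
        else:
            # A single block larger than the budget still becomes its own chunk.
            current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# The resulting pieces could then be embedded, for example with
#   SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode(pieces)
```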

The two subsets of LookML metadata provide distinct types of information about the data lake. Views represent individual tables, and models define the relationships between those tables. Keeping these components separate lets us first retrieve the relevant views based on the user's question. We then use those results to identify the relevant models that capture the relationships between the retrieved views.

This two-step procedure gives a more complete picture of the tables relevant to the user's question and of how those tables relate to one another. The following diagram shows how both subsets of metadata are chunked and stored as embeddings in separate vector collections for better retrieval. A separate data pipeline (not shown) brings the LookML view and model information into Amazon S3.
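The two-step retrieval can be illustrated with a toy, in-memory version: views first, then the models that relate them. The details differ from the production system in several ways:
- In the solution, the similarity search runs in PostgreSQL with pgvector. Here, cosine similarity is computed in Python over hand-made three-dimensional vectors.
- A "relevant model" is assumed to be a model that joins at least one retrieved view. That is one plausible reading of the description, not Twilio's confirmed logic.
- Apart from raw_mailing_addresses, the view and model names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class View:
    name: str
    embedding: list[float]


@dataclass
class Model:
    name: str
    joined_views: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], views: list[View], models: list[Model], k: int = 2):
    # Step 1: rank views by similarity to the question. In pgvector this would be
    # roughly ORDER BY embedding <=> query_vector (cosine distance) LIMIT k.
    ranked = sorted(views, key=lambda v: cosine(query_vec, v.embedding), reverse=True)
    top_views = ranked[:k]
    # Step 2 (assumed rule): keep models that join any retrieved view.
    names = {v.name for v in top_views}
    related_models = [m for m in models if m.joined_views & names]
    return top_views, related_models


views = [
    View("raw_mailing_addresses", [0.9, 0.1, 0.0]),
    View("accounts", [0.7, 0.3, 0.1]),
    View("call_logs", [0.0, 0.2, 0.9]),
]
models = [Model("customers", {"accounts", "raw_mailing_addresses"}), Model("voice", {"call_logs"})]
top, rel = retrieve([1.0, 0.0, 0.0], views, models)
print([v.name for v in top], [m.name for m in rel])
# ['raw_mailing_addresses', 'accounts'] ['customers']
```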

Content ingestion into vector DB

Select the optimal LLM for your use case

Selecting the right LLM for any use case is essential. Every use case has different requirements for context length, token size, and the ability to handle tasks such as summarization, task completion, and chatbot applications. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs through a single API. The models come from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon. Amazon Bedrock also provides a broad set of capabilities for building generative AI applications with security, privacy, and responsible AI.

This solution is implemented using Anthropic Claude 3, available through Amazon Bedrock. Anthropic Claude 3 was chosen for two main reasons:

Increased context window – Anthropic Claude 3 can handle up to 200,000 tokens in its context, so larger LookML files can be passed to it in full. This capacity matters when dealing with complex or extensive data, because it ensures the LLM has the information it needs to give the user accurate, informed responses.
Enhanced reasoning abilities – Anthropic Claude 3 performs well with larger contexts. This lets it better understand and respond to user questions that require a deeper understanding of the views, the models, and their relationships. You can gain more granular control over its reasoning using various prompt engineering techniques.

Build a web application

This solution uses LangChain and Streamlit to build a web application that integrates Amazon Bedrock. LangChain is a framework designed to simplify building applications with LLMs, and its Amazon Bedrock component makes it straightforward to call Amazon Bedrock models. We use Streamlit to develop the frontend of the web application.

The solution implements a chat engine so that data analysts can easily interact with it and get queries that extract relevant data from their data lake. The chat engine uses the ConversationalRetrievalChain mechanism, which lets you pass a custom vector store retriever, prompt, and conversation history to the LLM to generate personalized answers to user questions. To store the chat history, we use DynamoDB with the user session ID as the primary key. DynamoDB is a highly scalable and durable NoSQL database service, so it can efficiently store and retrieve chat histories for many user sessions concurrently. The following screenshot shows an example of the chat interface built with Streamlit.
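Keying the chat history by session ID means each conversation is read and written independently of the others. Below is a minimal sketch of that layout. It assumes a single DynamoDB item per session, expressed in the low-level attribute-value format that boto3's client API uses. The `SessionId` and `History` attribute names and the table name are hypothetical. The boto3 calls appear only in comments because they need AWS credentials.

```python
def to_item(session_id: str, messages: list[dict]) -> dict:
    """Build a DynamoDB item: partition key 'SessionId' plus a list of messages.
    Attribute names are hypothetical."""
    return {
        "SessionId": {"S": session_id},
        "History": {"L": [
            {"M": {"role": {"S": m["role"]}, "content": {"S": m["content"]}}}
            for m in messages
        ]},
    }


def from_item(item: dict) -> list[dict]:
    """Inverse of to_item; an empty or missing item yields an empty history."""
    return [
        {"role": entry["M"]["role"]["S"], "content": entry["M"]["content"]["S"]}
        for entry in item.get("History", {}).get("L", [])
    ]

# With boto3 (not executed here; "AskDataChatHistory" is a hypothetical table name):
#   ddb = boto3.client("dynamodb")
#   ddb.put_item(TableName="AskDataChatHistory", Item=to_item(session_id, messages))
#   resp = ddb.get_item(TableName="AskDataChatHistory", Key={"SessionId": {"S": session_id}})
#   history = from_item(resp.get("Item", {}))
```

Storing a whole conversation in one item lets a single GetItem call load it. The trade-off is that DynamoDB caps items at 400 KB, so very long sessions would need trimming or a one-item-per-message design.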

AskData user interface

The following example scenario illustrates the user workflow when interacting with AskData:

User Question - how do I find a customer address?

AskData Response - Based on the provided LookML data,
you can find the customer address by querying the raw_mailing_addresses table,
which contains address-related information.
Here is the SQL query to retrieve the customer address:

SELECT
    account,
    address1,
    address2,
    city,
    province,
    postal_code,
    iso_country
FROM
    raw_mailing_addresses

This query returns the following for every entry in the raw_mailing_addresses table:
- the account SID, which identifies the customer
- the first line of the address and the second line (if any)
- the city and the province or state
- the postal code and the ISO country code

If you want the address for one specific customer account, you can add a WHERE clause to the query:

SELECT
    account,
    address1,
    address2,
    city,
    province,
    postal_code,
    iso_country
FROM
    raw_mailing_addresses
WHERE
    account = 'YourCustomerAccount' -- Replace with the actual account

Replace 'YourCustomerAccount' with the account of the customer whose address you want to find.
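An application that runs the generated SQL can bind the account as a query parameter instead of having someone edit the literal by hand. That avoids quoting mistakes and SQL injection. The following sketch uses Python's built-in sqlite3 module as a stand-in for the Presto engine the post targets, and the table row is invented for illustration. Placeholder syntax varies between database drivers, so treat the `?` style as specific to sqlite3.

```python
import sqlite3

# Same columns as the query above; "?" is sqlite3's placeholder style.
ADDRESS_QUERY = """
SELECT account, address1, address2, city, province, postal_code, iso_country
FROM raw_mailing_addresses
WHERE account = ?
"""


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with one made-up row, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_mailing_addresses (account TEXT, address1 TEXT, "
        "address2 TEXT, city TEXT, province TEXT, postal_code TEXT, iso_country TEXT)"
    )
    conn.execute(
        "INSERT INTO raw_mailing_addresses VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("AC0001", "1 Main St", None, "Springfield", "IL", "62701", "US"),
    )
    return conn


def find_address(conn: sqlite3.Connection, account: str) -> list[tuple]:
    # The driver binds the value, so no manual quoting or string editing is needed.
    return conn.execute(ADDRESS_QUERY, (account,)).fetchall()


print(find_address(make_demo_db(), "AC0001"))
# [('AC0001', '1 Main St', None, 'Springfield', 'IL', '62701', 'US')]
```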

Optimize the application

Using an LLM to answer user questions about data is efficient, but it comes with known limitations. For example, an LLM can generate inaccurate responses, often because of hallucinated information. To improve the accuracy of our application and reduce hallucinations, we did the following:

Set the temperature of the LLM to 0.1 to reduce its tendency toward overly creative responses.
Added instructions to the prompt to generate the SQL query only from the context (schema, chat history) provided in the prompt.
Carefully removed duplicate and redundant entries from the LookML data before ingesting it into the vector database.
Added user experience feedback (a rating from 1–5 with an optional text field for comments) to the AskData UI. We used the feedback to improve the quality of our data, prompts, and inference parameter settings.
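Deduplicating LookML entries before ingestion can be as simple as hashing a normalized form of each entry. The following sketch only catches exact duplicates and duplicates that differ in whitespace. Removing redundant entries that are near-duplicates, as the post describes, would need additional logic. The normalization rule here is an assumption, not the procedure Twilio used.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Collapse runs of whitespace so formatting-only differences don't count.
    return re.sub(r"\s+", " ", text).strip()


def dedupe(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each entry, comparing normalized content."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


print(dedupe(["view: a {}", "view:  a  {}\n", "view: b {}"]))
# ['view: a {}', 'view: b {}']
```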

Based on user feedback, the application achieved a net promoter score (NPS) of 40, surpassing the initial target of 35. We set this target with the following key factors in mind:
- The LookML data lacks relevant information for some user questions.
- Specific rules about the structure of SQL queries might need to be added.
- We expected that the LLM would sometimes make mistakes despite all the measures we put in place.
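For reference, NPS is conventionally computed on a 0–10 "how likely are you to recommend" scale. It is the percentage of promoters (scores 9–10) minus the percentage of detractors (scores 0–6), so it is a number between -100 and 100 rather than a percentage. The post does not say how AskData's 1–5 ratings were mapped onto this scale. The sketch below therefore uses the standard definition with made-up scores.

```python
def net_promoter_score(scores: list[int]) -> float:
    """Standard NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Made-up sample: 6 promoters, 2 passives (7-8), 2 detractors.
print(net_promoter_score([10, 10, 9, 9, 9, 10, 8, 7, 5, 3]))
# 40.0
```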

Conclusion

In this post, we illustrated how to use generative AI to significantly improve the efficiency of data analysts. Using LookML as metadata for our data lake, we built vector stores for views (tables) and models (relationships). With the RAG framework, we retrieved the pertinent information from these stores and provided it to the LLM as context, together with the user's question and any previous chat history. The LLM then generated SQL queries in response.

Several AWS services streamlined our development process. Amazon Bedrock made it easy to integrate an LLM for generating query responses, and Amazon RDS served as our vector store.


Get started with Amazon Bedrock today, and leave your feedback and questions in the comments section.

About the Authors

Apurva Gawad is a Senior Data Engineer at Twilio specializing in building scalable systems for data ingestion and empowering business teams to derive valuable insights from data. She has a keen interest in AI exploration, blending technical expertise with a passion for innovation. Outside of work, she enjoys traveling to new places, always seeking fresh experiences and perspectives.

Aishwarya Gupta is a Senior Data Engineer at Twilio focused on building data systems that empower business teams to derive insights. She enjoys traveling and exploring new places, food, and culture.

Oliver Cody is a Senior Data Engineering Manager at Twilio with over 28 years of professional experience, leading multidisciplinary teams across EMEA, NAMER, and India. His experience spans all things data across various domains and sectors. He has focused on developing innovative data solutions, significantly optimizing performance and reducing costs.

Amit Arora is an AI and ML specialist architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Johnny Chivers is a Senior Solutions Architect working within the Strategic Accounts team at AWS. With over 10 years of experience helping customers adopt new technologies, he guides them through architecting end-to-end solutions spanning infrastructure, big data, and AI.
