Catalog, query, and search audio programs with Amazon Transcribe and Knowledge Bases for Amazon Bedrock



Information retrieval systems have powered the information age through their ability to crawl and sift through massive amounts of data and quickly return accurate and relevant results. These systems, such as search engines and databases, typically work by indexing on keywords and fields contained in data files.

However, much of our data in the digital age also comes in non-text formats, such as audio and video files. Finding relevant content usually requires searching through text-based metadata such as timestamps, which must be manually added to those files. This can be hard to scale as the volume of unstructured audio and video files continues to grow.

Fortunately, the rise of artificial intelligence (AI) solutions that can transcribe audio and provide semantic search capabilities now offers more efficient options for querying content from audio files at scale. Amazon Transcribe is an AWS AI service that makes it straightforward to convert speech to text. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

In this post, we show how Amazon Transcribe and Amazon Bedrock can streamline the process to catalog, query, and search through audio programs, using an example from the AWS re:Think podcast series.

Solution overview

The following diagram illustrates how you can use AWS services to deploy a solution for cataloging, querying, and searching through content stored in audio files.

Architecture Diagram of Amazon Bedrock and related AWS Services

In this solution, audio files stored in MP3 format are first uploaded to Amazon Simple Storage Service (Amazon S3). Video files (such as MP4) that contain audio in supported languages can also be uploaded to Amazon S3 as part of this solution. Amazon Transcribe then transcribes these files and stores the complete transcript in JSON format as an object in Amazon S3.

To catalog these files, each JSON file in Amazon S3 should be tagged with the corresponding episode title. This allows us to later retrieve the episode title for each query result.

Next, we use Amazon Bedrock to create numerical representations of the content inside each file. These numerical representations are also called embeddings, and they're stored as vectors inside a vector database that we can later query.

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API. Included with Amazon Bedrock is Knowledge Bases for Amazon Bedrock. As a fully managed service, Knowledge Bases for Amazon Bedrock makes it straightforward to set up a Retrieval Augmented Generation (RAG) workflow.

With Knowledge Bases for Amazon Bedrock, we first set up a vector database on AWS. Knowledge Bases for Amazon Bedrock can then automatically split the data files stored in Amazon S3 into chunks and create embeddings of each chunk using Amazon Titan on Amazon Bedrock. Amazon Titan is a family of high-performing FMs from Amazon. Included with Amazon Titan is Amazon Titan Text Embeddings, which we use to create the numerical representation of the text within each chunk and store it in a vector database.

When a user queries the contents of the audio files through a generative AI application or AWS Lambda function, it makes an API call to Knowledge Bases for Amazon Bedrock. Knowledge Bases for Amazon Bedrock then orchestrates a call to the vector database to perform a semantic search, which returns the most relevant results. Next, Knowledge Bases for Amazon Bedrock augments the user's original query with these results in a prompt, which is sent to the large language model (LLM). The LLM returns results that are more accurate and relevant to the user's query.

Let's walk through an example of how you can catalog, query, and search through a library of audio files using these AWS AI services. For this post, we use episodes of the re:Think podcast series, which has over 20 episodes. Each episode is an audio program recorded in MP3 format. As we continue to add new episodes, we will want to use AI services to make the task of querying and searching for specific content more scalable, without the need to manually add metadata for each episode.

Prerequisites

In addition to having access to AWS services through the AWS Management Console, you need a few other resources to deploy this solution.

First, you need a library of audio files to catalog, query, and search. For this post, we use episodes of the AWS re:Think podcast series.

To make API calls to Amazon Bedrock from our generative AI application, we use Python version 3.11.4 and the AWS SDK for Python (Boto3).

Transcribe audio files

The first task is to transcribe each MP3 file using Amazon Transcribe. For instructions on transcribing with the AWS Management Console or AWS CLI, refer to the Amazon Transcribe Developer Guide. Amazon Transcribe can create a transcript for each episode and store it as an S3 object in JSON format.
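
For example, a transcription job can be started from Python using Boto3. The following is a minimal sketch; the job name, audio object key, and output key are placeholders for illustration rather than values from this post.

```python
import boto3

transcribe = boto3.client("transcribe")

# Start a transcription job for one episode (placeholder bucket/keys)
transcribe.start_transcription_job(
    TranscriptionJobName="AI-Accelerators",
    Media={"MediaFileUri": "s3://rethinkpodcast/audio/AI-Accelerators.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
    OutputBucketName="rethinkpodcast",
    OutputKey="text/transcripts/AI-Accelerators.json",
)
```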

Catalog audio files using tagging

To catalog each episode, we tag the S3 object for each episode with the corresponding episode title. For instructions on tagging objects in S3, refer to the Amazon Simple Storage Service User Guide. For example, for the S3 object AI-Accelerators.json, we tag it with key = "title" and value = "Episode 20: AI Accelerators in the Cloud."

Edit Tags in S3

The title is the only metadata we need to add manually for each audio file. There is no need to manually add timestamps for each chapter or section in order to later search for specific content.
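
In addition to the console, the tag can be added programmatically. Here is a minimal Boto3 sketch, assuming the transcript object location used in this post's example:

```python
import boto3

s3 = boto3.client("s3")

# Tag the transcript object with its episode title
s3.put_object_tagging(
    Bucket="rethinkpodcast",
    Key="text/transcripts/AI-Accelerators.json",
    Tagging={"TagSet": [
        {"Key": "title", "Value": "Episode 20: AI Accelerators in the Cloud"}
    ]},
)
```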

Set up a vector database using Knowledge Bases for Amazon Bedrock

Next, we set up our fully managed RAG workflow using Knowledge Bases for Amazon Bedrock. For instructions on creating a knowledge base, refer to the Amazon Bedrock User Guide. We begin by specifying a data source. In our case, we choose the S3 bucket location where our transcripts in JSON format are stored.

Configure data source for Knowledge Base

Next, we select an embedding model. The embedding model converts each chunk of our transcript into embeddings. Embeddings are numbers, and the meaning of each embedding depends on the model. In our example, we select Titan Text Embeddings v2 with a dimension size of 1024.

Select embeddings model and configure vector store for Knowledge Base

The embeddings are stored as vectors in a vector database. You can either specify an existing vector database you have already created or have Knowledge Bases for Amazon Bedrock create one for you. For our example, we have Knowledge Bases for Amazon Bedrock create a vector database using Amazon OpenSearch Serverless.

Create a new vector store

Before you can query the vector database, you must first sync it with the data source. During each sync operation, Knowledge Bases for Amazon Bedrock splits the data source into chunks and then uses the selected embedding model to embed each chunk as a vector. Knowledge Bases for Amazon Bedrock then stores these vectors in the vector database.

The sync operation, as well as the other Amazon Bedrock operations described so far, can be performed either from the console or through API calls.
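
For example, a sync can be triggered programmatically with the Boto3 bedrock-agent client. In this sketch, the knowledge base and data source IDs are placeholders you would replace with the values shown for your knowledge base:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Start an ingestion (sync) job for the knowledge base's S3 data source
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KBXXXXXXXX",  # placeholder
    dataSourceId="DSXXXXXXXX",     # placeholder
)
print(job["ingestionJob"]["status"])
```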

Query the audio files

Now we're ready to query and search for specific content from our library of podcast episodes. In episode 20, titled "AI Accelerators in the Cloud," our guest Matthew McClean, a senior manager from AWS's Annapurna team, shared why AWS decided to acquire Annapurna Labs in 2015. For our first query, we ask, "Why did AWS acquire Annapurna Labs?"

We entered this query into Knowledge Bases for Amazon Bedrock using Anthropic Claude and received the following response:

"AWS acquired Annapurna Labs in 2015 because Annapurna was providing AWS with nitro cards that offloaded virtualization, security, networking and storage from EC2 instances to free up CPU resources."

This is an exact quote from Matthew McClean in the podcast episode. You wouldn't get this quote if you entered the same prompt into other publicly available generative AI chatbots, because they don't have the vector database with embeddings of the podcast transcript to provide more relevant context.
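
For reference, the same query can be sent programmatically using the RetrieveAndGenerate API. The following is a minimal sketch; the knowledge base ID and model ARN are placeholders, and the Region and Claude model version may differ in your account:

```python
import boto3

bedrock_rt = boto3.client("bedrock-agent-runtime")

# Query the knowledge base and generate an answer with Anthropic Claude
response = bedrock_rt.retrieve_and_generate(
    input={"text": "Why did AWS acquire Annapurna Labs?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBXXXXXXXX",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)

print(response["output"]["text"])
# response["citations"] also carries the retrieved chunks and their S3 locations
```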

Retrieve an episode title

Now let's suppose that, in addition to getting more relevant responses, we also want to retrieve the correct podcast episode title associated with this query from our catalog of podcast episodes.

To retrieve the episode title, we first use the most relevant data chunk from the query. Whenever Knowledge Bases for Amazon Bedrock responds to a query, it also provides several chunks of data that it retrieved from the vector database as most relevant to the query, in order of relevance. We can take the first chunk that was returned. These chunks are returned as JSON documents. Nested inside the JSON is the S3 location of the transcript object. In our example, the S3 location is s3://rethinkpodcast/text/transcripts/AI-Accelerators.json.

The first words in the chunk text are: "Yeah, sure. So maybe I can start with the history of Annapurna…"

Because we have already tagged this transcript object in Amazon S3 with the episode title, we can retrieve the title by reading the value of the tag where key = "title". In this case, the title is "Episode 20: AI Accelerators in the Cloud."
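
A minimal sketch of this lookup, using the S3 URI returned with the most relevant chunk:

```python
import boto3

# S3 location of the transcript object, taken from the retrieved chunk
uri = "s3://rethinkpodcast/text/transcripts/AI-Accelerators.json"
bucket, key = uri.removeprefix("s3://").split("/", 1)

s3 = boto3.client("s3")
tags = s3.get_object_tagging(Bucket=bucket, Key=key)

# Find the value of the "title" tag
title = next(t["Value"] for t in tags["TagSet"] if t["Key"] == "title")
print(title)  # Episode 20: AI Accelerators in the Cloud
```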

Search the start time

What if we also want to find the start time inside the episode where the relevant content begins? We want to do so without having to manually read through the transcript or listen to the episode from the beginning, and without manually adding timestamps for every chapter.

We can find the start time much faster by having our generative AI application make a few additional API calls. We start by treating the chunk text as a substring of the complete transcript. We then search for the start time of the first word in the chunk text.

In our example, the first words returned were "Yeah, sure. So maybe I can start with the history of Annapurna…" We now need to search the complete transcript for the start time of the word "Yeah."

Amazon Transcribe outputs the start time of every word in the transcript. However, any word can appear more than once. The word "Yeah" occurs 28 times in the transcript, and each occurrence has its own start time. So how do we determine the correct start time for "Yeah" in our example?

There are multiple approaches an application developer can take to find the correct start time. For our example, we use the Python string find() method to find the position of the chunk text within the full transcript.

For the chunk text that begins with "Yeah, sure. So maybe I can start with the history of Annapurna…" the find() method returned the position 2047. If we treat the transcript as one long text string, the chunk "Yeah, sure. So maybe…" starts at character position 2047.

Finding the start time now becomes a matter of counting the character position of each word in the transcript and using it to look up the correct start time in the transcript file generated by Amazon Transcribe. This would be tedious for a person to do manually, but it is trivial for a computer.

In our example Python code, we loop through an array that contains the start time of each token while counting the character position at which each token starts. Because we're looping through the tokens, we can build a new array that maps each character position to its start time.
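
The following is a sketch of that logic. It stores position/time pairs rather than a full per-character array, and it assumes the transcript JSON follows the standard Amazon Transcribe output format and that the chunk text appears verbatim in the transcript; the bucket and key come from the example above.

```python
import json
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="rethinkpodcast", Key="text/transcripts/AI-Accelerators.json")
transcript = json.loads(obj["Body"].read())

# The complete transcript as one long string
full_text = transcript["results"]["transcripts"][0]["transcript"]

# Record the character position where each spoken word begins,
# together with its start time in seconds
positions = []  # list of (char_position, start_time_seconds)
cursor = 0
for item in transcript["results"]["items"]:
    if item["type"] != "pronunciation":
        continue  # punctuation tokens have no start time
    word = item["alternatives"][0]["content"]
    char_pos = full_text.find(word, cursor)
    if char_pos == -1:
        continue
    positions.append((char_pos, float(item["start_time"])))
    cursor = char_pos + len(word)

# Locate the chunk text returned by Knowledge Bases within the transcript
chunk_text = "Yeah, sure. So maybe I can start with the history of Annapurna"
chunk_pos = full_text.find(chunk_text)

# The start time is that of the first word at or after the chunk's position
start_time = next(t for p, t in positions if p >= chunk_pos)
print(f"Relevant content starts at {start_time:.0f} seconds")
```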

In this example query, the start time for the word "Yeah" at position 2047 is 160 seconds, or 2 minutes and 40 seconds into the podcast. You can verify this by listening to the recording starting at 2 minutes 40 seconds.

Clean up

This solution incurs charges based on the services you use:

  • Amazon Transcribe operates under a pay-as-you-go pricing model. For more details, see Amazon Transcribe Pricing.
  • Amazon Bedrock uses an on-demand quota, so you only pay for what you use. For more information, refer to Amazon Bedrock pricing.
  • With OpenSearch Serverless, you only pay for the resources consumed by your workload.
  • If you're using Knowledge Bases for Amazon Bedrock with vector databases other than OpenSearch Serverless, you may continue to incur charges even when not running any queries. It is recommended that you delete your knowledge base and its associated vector store, along with the audio files stored in Amazon S3, to avoid unnecessary costs when you're done testing this solution.

Conclusion

Cataloging, querying, and searching through large volumes of audio files can be difficult to scale. In this post, we showed how Amazon Transcribe and Knowledge Bases for Amazon Bedrock can help automate the process of retrieving relevant information from audio files and make it more scalable.

You possibly can start transcribing your personal library of audio information with Amazon Transcribe. To be taught extra on how Information Bases for Amazon Bedrock can then orchestrate a RAG workflow to your transcripts with vector shops, seek advice from Information Bases now delivers absolutely managed RAG expertise in Amazon Bedrock.

With the help of these AI services, we can now expand the frontiers of our knowledge bases.


About the Author


Nolan Chen is a Partner Solutions Architect at AWS, where he helps startup companies build innovative solutions using the cloud. Prior to AWS, Nolan specialized in data security and helping customers deploy high-performing wide area networks. Nolan holds a bachelor's degree in Mechanical Engineering from Princeton University.


