Meetings play an important role in decision-making, project coordination, and collaboration, and remote meetings are common across many organizations. However, capturing and structuring key takeaways from these conversations is often inefficient and inconsistent. Manually summarizing meetings or extracting action items requires significant effort and is prone to omissions or misinterpretations.
Large language models (LLMs) offer a more robust solution by transforming unstructured meeting transcripts into structured summaries and action items. This capability is especially useful for project management, customer support and sales calls, legal and compliance, and enterprise knowledge management.
In this post, we present a benchmark of different understanding models from the Amazon Nova family available on Amazon Bedrock, to provide insights on how to choose the best model for a meeting summarization task.
LLMs to generate meeting insights
Modern LLMs are highly effective for summarization and action item extraction due to their ability to understand context, infer topic relationships, and generate structured outputs. In these use cases, prompt engineering provides a more efficient and scalable approach than traditional model fine-tuning or customization. Rather than modifying the underlying model architecture or training on large labeled datasets, prompt engineering uses carefully crafted input queries to guide the model's behavior, directly influencing the output format and content. This method allows for rapid, domain-specific customization without the need for resource-intensive retraining. For tasks such as meeting summarization and action item extraction, prompt engineering enables precise control over the generated outputs, making sure they meet specific business requirements. It also allows for flexible adjustment of prompts to suit evolving use cases, making it an ideal solution for dynamic environments where model behaviors need to be quickly reoriented without the overhead of model fine-tuning.
Amazon Nova models and Amazon Bedrock
Amazon Nova models, unveiled at AWS re:Invent in December 2024, are built to deliver frontier intelligence at industry-leading price performance. They are among the fastest and most cost-effective models in their respective intelligence tiers, and are optimized to power enterprise generative AI applications in a reliable, secure, and cost-effective manner.
The understanding model family has four tiers of models: Nova Micro (text-only, ultra-efficient for edge use), Nova Lite (multimodal, balanced for versatility), Nova Pro (multimodal, a balance of speed and intelligence, ideal for most enterprise needs), and Nova Premier (multimodal, the most capable Nova model for complex tasks and a teacher for model distillation). Amazon Nova models can be used for a variety of tasks, from summarization to structured text generation. With Amazon Bedrock Model Distillation, customers can also bring the intelligence of Nova Premier to a faster and more cost-effective model such as Nova Pro or Nova Lite for their use case or domain. This can be achieved through the Amazon Bedrock console and APIs such as the Converse API and Invoke API.
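A minimal sketch of calling a Nova model through the Amazon Bedrock Converse API is shown below. The model ID follows Bedrock's `amazon.nova-*` naming convention, but you should verify the exact ID and regional availability in the Bedrock console before use; the prompt wording and inference settings are illustrative assumptions.

```python
# Hypothetical model ID following Bedrock's "amazon.nova-*" naming; verify
# the exact ID and region availability before use.
MODEL_ID = "amazon.nova-pro-v1:0"


def build_converse_request(transcript: str) -> dict:
    """Assemble keyword arguments for a bedrock-runtime Converse call."""
    return {
        "modelId": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [{"text": f"Summarize this meeting transcript:\n{transcript}"}],
            }
        ],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }


def summarize(transcript: str) -> str:
    """Invoke the model; requires AWS credentials and the boto3 SDK."""
    import boto3  # imported here so the request builder stays dependency-free

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_converse_request(transcript))
    return response["output"]["message"]["content"][0]["text"]
```

The same request structure works across the Nova tiers, so switching models for a cost/latency trade-off is a one-line change to the model ID.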
Solution overview
This post demonstrates how to use Amazon Nova understanding models, available through Amazon Bedrock, for automated insight extraction using prompt engineering. We focus on two key outputs:
- Meeting summarization – A high-level abstractive summary that distills key discussion points, decisions made, and critical updates from the meeting transcript
- Action items – A structured list of actionable tasks derived from the meeting conversation that apply to the entire team or project
The following diagram illustrates the solution workflow.
Prerequisites
To follow along with this post, familiarity with calling LLMs using Amazon Bedrock is expected. For detailed steps on using Amazon Bedrock for text summarization tasks, refer to Build an AI text summarizer app with Amazon Bedrock. For more information about calling LLMs, refer to the Invoke API and Using the Converse API reference documentation.
Solution components
We developed the two core features of the solution, meeting summarization and action item extraction, using popular models available through Amazon Bedrock. In the following sections, we look at the prompts that were used for these key tasks.
For the meeting summarization task, we used a persona assignment, prompting the LLM to generate a summary in a specified style and format.
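The exact prompt used in the benchmark isn't reproduced here; the following is a hypothetical sketch of a persona-style summarization prompt in that spirit (the persona wording and tag names are assumptions):

```python
# Hypothetical persona-assignment prompt; wording and tags are illustrative,
# not the benchmark's actual prompt.
SUMMARY_PROMPT_TEMPLATE = """You are an experienced executive assistant who writes
clear, factual meeting summaries. Summarize the meeting transcript below,
covering the key discussion points, decisions made, and important updates.
Write the summary inside <summary> tags.

<transcript>
{transcript}
</transcript>"""


def build_summary_prompt(transcript: str) -> str:
    """Fill the transcript into the persona prompt template."""
    return SUMMARY_PROMPT_TEMPLATE.format(transcript=transcript)
```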
For the action item extraction task, we gave specific instructions on generating action items in the prompts and used chain-of-thought to improve the quality of the generated action items. In the assistant message, a prefix tag is provided as prefilling to nudge the model generation in the right direction and to avoid redundant opening and closing sentences.
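Assistant-turn prefilling can be sketched with Converse-style messages as follows; the `<action_items>` prefix tag and instruction wording are assumptions for illustration, since the original prompt isn't shown above:

```python
# Hypothetical chain-of-thought instructions plus assistant-turn prefilling.
def build_action_item_messages(transcript: str) -> list:
    """Build Converse-style messages with a prefilled assistant prefix tag."""
    user_text = (
        "Extract team- or project-level action items from the meeting "
        "transcript below. First reason step by step about who committed to "
        "what, then list the action items inside <action_items> tags.\n\n"
        + transcript
    )
    return [
        {"role": "user", "content": [{"text": user_text}]},
        # The model continues generating from this prefix, which skips
        # redundant opening sentences such as "Sure, here are the items...".
        {"role": "assistant", "content": [{"text": "<action_items>"}]},
    ]
```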
Different model families respond to the same prompts differently, and it's important to follow the prompting guide defined for the particular model. For more information on best practices for Amazon Nova prompting, refer to Prompting best practices for Amazon Nova understanding models.
Dataset
To evaluate the solution, we used samples from the public QMSum dataset. The QMSum dataset is a benchmark for meeting summarization, featuring English-language transcripts from academic, business, and governance discussions with manually annotated summaries. It evaluates LLMs on producing structured, coherent summaries from complex, multi-speaker conversations, making it a valuable resource for abstractive summarization and discourse understanding. For testing, we used 30 randomly sampled meetings from the QMSum dataset. Each meeting contained 2–5 topic-wise transcripts, with roughly 8,600 tokens per transcript on average.
Evaluation framework
Achieving high-quality outputs from LLMs in meeting summarization and action item extraction can be a challenging task. Traditional evaluation metrics such as ROUGE, BLEU, and METEOR focus on surface-level similarity between generated text and reference summaries, but they often fail to capture nuances such as factual correctness, coherence, and actionability. Human evaluation is the gold standard but is expensive, time-consuming, and not scalable. To address these challenges, you can use LLM-as-a-judge, where another LLM is used to systematically assess the quality of generated outputs based on well-defined criteria. This approach offers a scalable and cost-effective way to automate evaluation while maintaining high accuracy. In this example, we used Anthropic's Claude 3.5 Sonnet v1 as the judge model because we found it to be most aligned with human judgment. We used the LLM judge to score the generated responses on three main metrics: faithfulness, summarization, and question answering (QA).
The faithfulness score measures the faithfulness of a generated summary as the portion of the parsed statements in the summary that are supported by the given context (for example, a meeting transcript), relative to the total number of statements.
The summarization score is the combination of the QA score and the conciseness score with equal weights (0.5). The QA score measures the coverage of a generated summary relative to the meeting transcript. It first generates a list of question and answer pairs from the meeting transcript and then measures the portion of those questions that are answered correctly when the summary is used as context instead of the transcript. The QA score is complementary to the faithfulness score because the faithfulness score doesn't measure the coverage of a generated summary. We only used the QA score to measure the quality of a generated summary, because the action items aren't supposed to cover all aspects of a meeting transcript. The conciseness score measures the ratio of the length of a generated summary to the length of the full meeting transcript.
We used a modified version of the faithfulness score and the summarization score that had much lower latency than the original implementation.
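The scoring arithmetic described above can be sketched as follows. In the real pipeline, statement parsing and question-answer generation are performed by the judge LLM; here, precomputed counts stand in for its outputs. The conciseness ratio is implemented exactly as the text defines it, though an actual implementation might invert it so that shorter summaries score higher.

```python
# Minimal sketch of the evaluation arithmetic; the integer counts stand in
# for outputs that a judge LLM would produce in the real pipeline.
def faithfulness_score(supported_statements: int, total_statements: int) -> float:
    """Fraction of parsed summary statements supported by the transcript."""
    return supported_statements / total_statements


def qa_score(correct_answers: int, total_questions: int) -> float:
    """Fraction of transcript-derived questions answered correctly from the summary."""
    return correct_answers / total_questions


def conciseness_score(summary_length: int, transcript_length: int) -> float:
    """Length of the generated summary divided by the length of the full transcript."""
    return summary_length / transcript_length


def summarization_score(qa: float, conciseness: float) -> float:
    """Equal-weight (0.5 each) combination of the QA and conciseness scores."""
    return 0.5 * qa + 0.5 * conciseness
```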
Results
Our evaluation of Amazon Nova models across meeting summarization and action item extraction tasks revealed clear performance-latency patterns. For summarization, Nova Premier achieved the highest faithfulness score (1.0) with a processing time of 5.34s, while Nova Pro delivered 0.94 faithfulness in 2.9s. The smaller Nova Lite and Nova Micro models provided faithfulness scores of 0.86 and 0.83, respectively, with faster processing times of 2.13s and 1.52s. In action item extraction, Nova Premier again led in faithfulness (0.83) with 4.94s processing time, followed by Nova Pro (0.8 faithfulness, 2.03s). Interestingly, Nova Micro (0.7 faithfulness, 1.43s) outperformed Nova Lite (0.63 faithfulness, 1.53s) on this particular task despite its smaller size. These measurements provide valuable insights into the performance-speed characteristics of the Amazon Nova model family for text-processing applications. The following graphs show these results. The following screenshot shows a sample output for our summarization task, including the LLM-generated meeting summary and a list of action items.
Conclusion
In this post, we showed how you can use prompting to generate meeting insights such as meeting summaries and action items using Amazon Nova models available through Amazon Bedrock. For large-scale AI-driven meeting summarization, optimizing latency, cost, and accuracy is essential. The Amazon Nova family of understanding models (Nova Micro, Nova Lite, Nova Pro, and Nova Premier) offers a practical alternative to high-end models, significantly improving inference speed while reducing operational costs. These factors make Amazon Nova an attractive choice for enterprises handling large volumes of meeting data at scale.
For more information on Amazon Bedrock and the latest Amazon Nova models, refer to the Amazon Bedrock User Guide and Amazon Nova User Guide, respectively. The AWS Generative AI Innovation Center has a group of AWS science and strategy experts with comprehensive expertise spanning the generative AI journey, helping customers prioritize use cases, build a roadmap, and move solutions into production. Check out the Generative AI Innovation Center for our latest work and customer success stories.
About the Authors
Baishali Chaudhury is an Applied Scientist at the Generative AI Innovation Center at AWS, where she focuses on advancing generative AI solutions for real-world applications. She has a strong background in computer vision, machine learning, and AI for healthcare. Baishali holds a PhD in Computer Science from the University of South Florida and completed a postdoc at Moffitt Cancer Centre.
Sungmin Hong is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he helps expedite a variety of use cases for AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a PhD in Computer Science from New York University. Outside of work, he prides himself on keeping his indoor plants alive for 3+ years.
Mengdie (Flora) Wang is a Data Scientist at the AWS Generative AI Innovation Center, where she works with customers to architect and implement scalable generative AI solutions that address their unique business challenges. She specializes in model customization techniques and agent-based AI systems, helping organizations harness the full potential of generative AI technology. Prior to AWS, Flora earned her Master's degree in Computer Science from the University of Minnesota, where she developed her expertise in machine learning and artificial intelligence.
Anila Joshi has more than a decade of experience building AI solutions. As an AWSI Geo Leader at the AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerate the adoption of AWS services by helping customers ideate, identify, and implement secure generative AI solutions.