Google Gemini Adds Audio File Upload Capability

Google has expanded its Gemini AI assistant to accept audio file uploads, letting users obtain transcriptions, summaries and key information from recordings up to ten minutes long. The feature, which Gemini vice president Josh Woodward described as the most‑requested addition, works in the web and mobile apps and complements the existing Gemini Live voice interactions. Free‑tier users face daily limits and pricing details remain undisclosed, but the update puts Gemini alongside competitors such as Anthropic’s Claude and Perplexity, which also offer audio processing tools.

New Audio Upload Feature

Google’s Gemini AI assistant now supports the upload of audio files. Users can submit recordings through the web interface or mobile applications, and Gemini will automatically transcribe the content, generate concise summaries, and pull out key details. The functionality handles files up to ten minutes in length, making it suitable for short voice memos, meeting snippets, lecture excerpts, and interview clips.

Motivation and Positioning

Gemini’s vice president, Josh Woodward, highlighted the addition as the most‑requested enhancement from the user community. Unlike Gemini Live, which focuses on real‑time voice conversation, the new capability treats pre‑recorded audio as another input type, like text or images. Users who previously ran recordings through a separate transcription service can now handle them directly in Gemini.

How It Works

Once a user selects an audio file through the standard upload dialog, Gemini returns a full transcription and can also produce optional outputs such as a simplified‑language version, speaker‑specific excerpts, generated questions, or a study guide. Its ability to extract action items from the transcript is presented as a practical benefit for both personal organization and professional tasks.

Limitations and Pricing

Each upload is currently limited to ten minutes, and free‑tier accounts are subject to daily usage caps. Google has not published a detailed pricing model for high‑volume audio processing; for now, the feature counts against the regular Gemini quota.

Competitive Landscape

Other AI assistants also offer audio handling capabilities. Anthropic’s Claude includes audio features in certain developer tools, while Perplexity can extract information from YouTube videos. Gemini’s integration of audio uploads adds a direct, consumer‑focused option that competes with these alternatives.

Implications

The rollout reflects a broader trend of AI platforms expanding multimodal support to match how users capture information. By turning voice recordings into searchable, actionable text, Gemini aims to reduce reliance on third‑party transcription services and enhance productivity for a range of everyday scenarios.