Azure speech to text read audio

3/22/2023

Whether you have a great idea for the next Internet of Things (IoT) device, you want to add live captioning to your media streaming service, or you’re creating a hands-free voice user interface for a mobile application, you’re going to need an automatic speech recognition (ASR) solution that’s up for the job.

There’s nothing that turns users off more than the frustration of trying to talk to a device that just can’t understand them, no matter how hard they try. If you’re picking between Azure Cognitive Services and Rev AI for your project, you want to compare these solutions along several metrics. Depending on your unique needs and uses, some will be more important than others. No matter what, you want to get a holistic picture of each technology and how they stack up against each other.

Accuracy Winner: Rev AI

By far the most important point of comparison is accuracy. After all, if the ASR engine messes up too many words, using it will be difficult at best and impossible at worst. The gold standard for accuracy benchmarking is word error rate (WER), which measures how many words the ASR tech deletes, inserts, or substitutes, expressed as a percentage of the words in the reference transcript. A 20% WER, for instance, means that it got 20% of the words wrong.

In our podcast transcription benchmarks, we compared Rev AI to Microsoft’s ASR for 30 podcasts and found that Rev’s WER, 14.22%, is about 2.3 percentage points lower than Microsoft’s, which came in at 16.51%. The reason that Rev’s AI outperforms others is that our network of over 60,000 human transcriptionists contributes data that we use to constantly improve our models.

Speaker ID and Diarization Winner: Rev AI

Identifying who is talking and when is a key feature for high-performance ASR systems. Microsoft claims their tech supports diarization, but they don’t ever say how many speakers it can handle. Rev AI, on the other hand, promises support for 8 English speakers or 6 non-English speakers. Both solutions can identify speakers equally well.

If you want to serve an international customer base or if you’re building anything involving translation, supporting multiple languages is essential. Rev works in 31 different languages, including diverse options such as German, French, Spanish, Russian, Japanese, Chinese, Korean, Arabic, and Turkish. Azure performs similarly, with support for 44 total languages for speech-to-text use cases, though that number drops to 30 languages for translating speech in real time. Note, however, that Azure charges substantially more for its translation service than for standard transcription: over twice as much.

How fast can the ASR turn words into text? We can break this question down into two general categories: synchronous and asynchronous uses. The former includes real-time speech-to-text applications like providing live captions for streaming media. Rev sees an average latency between 1 and 3 milliseconds, while Azure doesn’t specify.

Asynchronous ASR, on the other hand, deals with tasks that don’t happen in real time, such as generating transcripts from a recording. Rev AI uses batch transcription to break the recording into multiple chunks so that we can process them in parallel to achieve faster results. In effect, we achieve the following benchmarking metrics: Unfortunately, Azure Cognitive Services does not list turnaround times publicly.

Ease of Use Winner: It Depends on your Tech Stack

How quickly can we go from signing up for a service to making the first API call in production? Rev consistently hears from our customers that they’re able to get a proof of concept up and running within hours, a big difference from the days or even weeks that it takes with services like Azure Cognitive Services.

One thing that you need to consider is the programming languages that the software development kits (SDKs) support. Rev has SDKs for Python, Java, and Node.js, while Microsoft has ones for C#, C++, Go, Java, JavaScript, Objective-C / Swift, and Python. However, Azure can only process audio that goes through Azure’s servers and is stored on their cloud platform, while Rev takes any URL.

In general, Azure’s products are going to be far easier to use for those who are already on Azure’s infrastructure and who already leverage the Azure ecosystem. If you’re already there, then that’s the single best reason to sign up for Azure Cognitive Services ASR. Otherwise, it’s usually better to go with a more platform-agnostic product like Rev AI.

Price Comparison Winner: Rev AI
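To make the word error rate metric from the accuracy comparison concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration; this is not the benchmarking pipeline used by Rev or Microsoft, and real benchmarks typically also normalize punctuation and numerals.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration only. Normalization here is just
    lowercasing and whitespace splitting, which is an assumption.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of five reference words corresponds to a 20% WER.
    wer = word_error_rate("turn on the kitchen lights", "turn on the kitchen light")
    print(f"WER: {wer:.0%}")
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%, which is one reason WER is usually reported alongside the test set it was measured on.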
