An artificial neural network is a computational model used in machine learning, a field of artificial intelligence. This model currently achieves the best results in a variety of artificial intelligence applications, such as recognizing objects in images, speech recognition and translation from one language to another. Its structure is inspired by biological neural networks, such as those in the brain. Accordingly, an artificial neural network consists of a large number of small computational units (neurons) connected to one another both in series and in parallel. Much like the brain, the network is capable of learning: during the learning process, the weights of the connections between individual neurons are adjusted. A neuron then sends a signal on to the next layer only if the sum of its input signals, each multiplied by its learned weight, exceeds a certain threshold value (similar to the way neurons work in the human brain).
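To make the idea of weights and a threshold concrete, here is a minimal Python sketch of a single threshold neuron together with the classic perceptron learning rule, which adjusts the weights whenever the neuron's output is wrong. It is a textbook illustration under simplifying assumptions (one neuron, a hard threshold, and the hypothetical helper names `neuron_fires` and `perceptron_update`); it is not how NEWTON's networks are built or trained, and modern deep networks typically use smooth activation functions trained by gradient descent.

```python
from typing import List, Tuple


def neuron_fires(inputs: List[float], weights: List[float], threshold: float) -> bool:
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold


def perceptron_update(
    inputs: List[float],
    weights: List[float],
    threshold: float,
    target: int,
    learning_rate: float = 0.1,
) -> Tuple[List[float], float]:
    """One step of the classic perceptron rule (hypothetical helper).

    target is the desired output (1 = fire, 0 = stay silent). If the neuron
    already produces the target, nothing changes. Otherwise each weight is
    shifted in proportion to its input in the direction that reduces the
    error, and the threshold is shifted the opposite way.
    """
    output = 1 if neuron_fires(inputs, weights, threshold) else 0
    error = target - output  # -1, 0 or +1
    new_weights = [w + learning_rate * error * x for x, w in zip(inputs, weights)]
    new_threshold = threshold - learning_rate * error
    return new_weights, new_threshold


if __name__ == "__main__":
    inputs = [1.0, 1.0]
    weights, threshold = [0.5, 0.5], 1.0
    print(neuron_fires(inputs, weights, threshold))  # False: 1.0 does not exceed 1.0
    weights, threshold = perceptron_update(inputs, weights, threshold, target=1)
    print(neuron_fires(inputs, weights, threshold))  # True after one update
```

After one update the weights become 0.6 each and the threshold 0.9, so the same input now makes the neuron fire. A real network repeats such adjustments over many neurons, layers and training examples.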
What are the advantages of using neural networks in speech recognition, compared to the previous system?
Machine learning with neural networks offers significantly higher speech recognition accuracy. The difference is especially apparent under difficult conditions, for example when transcribing a compressed recording, a recording with heavy background noise or one made from a greater distance. In such cases, the neural network is much more robust, and recognition quality is preserved far better than with the previous system.
From the point of view of machine learning theory, one advantage of neural networks is that, if they are deep enough, they can essentially create their own internal abstract representations in their intermediate layers. These representations are far better than the features people can derive from a processed signal using various sophisticated hand-designed transformations and algorithms. Learning itself, however, is not a novelty of neural networks: the previous algorithms also had to be trained. It is also important to remember that no system can learn entirely on its own; there always has to be a teacher for learning to take place.
Under ideal conditions, where even the previous system worked well, we can expect a relative decrease in the error rate of 10 to 20%. This can raise accuracy from 90% to 91% or even 92%. Under difficult conditions, where the previous system achieved an accuracy of only 40 to 60%, for example, we can now expect significantly better results, around 80%.
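The figures above mix relative and absolute numbers: a relative decrease is measured against the old error rate (100% minus the accuracy), not against the accuracy itself. The arithmetic can be checked with a few lines of Python. The function names are illustrative only; the percentages are the ones quoted in the text.

```python
def accuracy_after_relative_reduction(accuracy: float, relative_reduction: float) -> float:
    """Accuracy obtained when the error rate (1 - accuracy) shrinks by a relative fraction."""
    error_rate = 1.0 - accuracy
    return 1.0 - error_rate * (1.0 - relative_reduction)


def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Relative decrease in the error rate when accuracy rises from old to new."""
    old_error, new_error = 1.0 - old_accuracy, 1.0 - new_accuracy
    return (old_error - new_error) / old_error


# Ideal conditions: 90% accuracy means a 10% error rate.
for reduction in (0.10, 0.20):
    print(f"{accuracy_after_relative_reduction(0.90, reduction):.0%}")  # 91%, then 92%

# Difficult conditions: going from 40% or 60% accuracy to about 80%.
for old in (0.40, 0.60):
    print(f"{relative_error_reduction(old, 0.80):.0%}")  # 67%, then 50%
```

In other words, the gain under ideal conditions is one or two percentage points, while the difficult-conditions example corresponds to cutting the error rate by roughly a half to two thirds.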
Training on several hundred hours of voice recordings takes approximately 24 hours on a single powerful graphics card.
What is the difference between the NEWTON Dictate program and the service for transcribing recordings?
NEWTON Dictate is most appreciated by those who want to take notes, make log entries or dictate a text previously written by hand. In contrast, the service for transcribing recordings is best suited to previously recorded audio files (such as recordings of interviews or meetings). Such recordings are best transcribed using our program NEWTON SpeechGrid.
NEWTON Dictate is designed for dictating general texts in standard language. It is available in Czech, Slovak, Polish and Croatian. To transcribe spontaneous speech or to dictate professional texts, it is necessary to use the corresponding general or specialized dictionary.
What is the minimum recommended computer configuration needed in order for the program to work properly?
The program requires a computer with at least an Intel Core i5 processor (1.7 GHz or faster) and 4 GB of RAM.
Supported operating systems: Microsoft Windows 10, 8 and 7 (32-bit or 64-bit). Installation requires Microsoft .NET 4, which is included in the package or can be downloaded from http://www.microsoft.com/net/. Sufficient hard drive space is needed (up to 600 MB for the general dictionary), as well as a standard sound card supporting a 16 kHz sampling rate at 16-bit resolution. The program will also work on less powerful computers, but recognition will then be delayed.
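If you are unsure whether your recording setup delivers the audio format mentioned above, one possible check is to record a short test clip as a WAV file and inspect its header with Python's standard `wave` module. This is a sketch under our own assumptions: it inspects a WAV file rather than the sound card itself, requiring an exact match of 16 kHz and 16 bits is our choice, and the helper names `matches_dictation_format` and `make_silent_wav` are hypothetical.

```python
import io
import wave

REQUIRED_RATE_HZ = 16_000   # 16 kHz sampling rate, as listed in the requirements
REQUIRED_SAMPLE_BYTES = 2   # 16-bit resolution = 2 bytes per sample


def matches_dictation_format(wav_source) -> bool:
    """Return True if a WAV file (path or binary file object) is 16 kHz, 16-bit."""
    with wave.open(wav_source, "rb") as wav:
        return (
            wav.getframerate() == REQUIRED_RATE_HZ
            and wav.getsampwidth() == REQUIRED_SAMPLE_BYTES
        )


def make_silent_wav(rate_hz: int, sample_bytes: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short silent mono WAV clip in memory, for trying out the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * sample_bytes * int(rate_hz * seconds))
    buffer.seek(0)  # rewind so the clip can be read back
    return buffer


if __name__ == "__main__":
    print(matches_dictation_format(make_silent_wav(16_000, 2)))  # True
    print(matches_dictation_format(make_silent_wav(8_000, 2)))   # False
```

In practice you would pass the path of a clip recorded through your microphone and sound card instead of the generated silent buffer.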
For dictation, we recommend a directional microphone, which, unlike the computer's built-in microphone, mainly picks up sound from its immediate vicinity. A high-quality microphone is included in the NEWTON Dictate package.
The application always tries to recognize the entire dictation. Unknown words are therefore not left blank but are replaced with what the application judges to be the phonetically most similar variant. If you need to dictate an unknown word repeatedly, you can add it to the user dictionary; the application will learn the word and recognize it in subsequent dictations.
The recognized text can be saved in the standard RTF or TXT formats. The application also keeps the audio recording of your dictation, which you can export in MP3, WAV or SPX format. If you want to continue working with both the text and the recording in NEWTON Dictate, you can save the entire document in TTAX format.
If you want NEWTON Dictate to transcribe your dictation directly into another program, you can use the “MINI” feature, which writes the dictated text into the current location of the mouse cursor. This allows you to dictate into any application, information system or Internet browser.
If the program has trouble recognizing your speech, first check whether your microphone is selected in the settings and correctly positioned in front of your mouth. The introductory tutorial to the program will take you through the microphone settings step by step. An incorrectly set microphone is the most frequent cause of problems with the program’s functionality.
Yes, the program automatically adapts to the voice of each new user and can compensate for minor speech defects, such as difficulty pronouncing a particular sound correctly.
Leave us a message or call us. We will reply to your queries as soon as possible.
Na Pankráci 1683/127,
140 00 Praha 4