An artificial neural network is a computational model used in machine learning, a branch of artificial intelligence. It currently yields the best results in a variety of artificial intelligence applications, such as object recognition in images, speech recognition and translation between languages. Its structure is inspired by biological neural networks, such as those in the brain. Accordingly, an artificial neural network consists of a large number of small computational units (neurons) that are interconnected in series and in parallel. Much like the brain, the network is capable of learning: during the learning process, the weights of the connections between individual neurons are adjusted. A given neuron then sends a signal to the subsequent layers only if the sum of its input signals, multiplied by the learned weighting factors, exceeds a certain threshold value (similar to the way neurons work in the human brain).
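The threshold behaviour described above can be sketched for a single neuron as follows. This is only a simplified illustration: the function name, the inputs, the weights and the thresholds are made-up values, not taken from any NEWTON product, and practical networks typically use smoother activation functions than a hard threshold.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold


# Illustrative values only (assumptions, not learned weights).
inputs = [1.0, 0.5, 0.0]
weights = [0.5, 1.0, 2.0]

# Weighted sum = 1.0*0.5 + 0.5*1.0 + 0.0*2.0 = 1.0
fires_with_low_threshold = neuron_fires(inputs, weights, threshold=0.75)   # True
fires_with_high_threshold = neuron_fires(inputs, weights, threshold=1.5)   # False
```

Learning then amounts to adjusting the values in `weights` so that the neuron fires for the right inputs.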
What are the advantages of using neural networks in speech recognition, compared to the previous system?
Machine learning with neural networks offers significantly higher accuracy in speech recognition. This is especially apparent under difficult conditions, for example when the recording is compressed, contains excessive background noise or was made from a greater distance. In such cases, the neural network is much more robust and retains the quality of speech recognition better than the previous version.
From the point of view of machine learning theory, one advantage of neural networks is that a sufficiently deep network can essentially create its own abstract internal features between layers, and these are far better than what a person can derive from a processed signal using various sophisticated transformations and algorithms. However, the previous algorithms also had to be trained; training is not a novelty of neural networks. It is also important to remember that no system can learn entirely on its own. There always has to be a teacher for learning to take place.
Under ideal conditions, where the previous system already worked well, we can expect a relative decrease in the error rate of 10% to 20%. This can increase accuracy from 90% to 91% or even 92%. Under difficult conditions, where the previous system achieved an accuracy of only 40% to 60%, we can now expect significantly better results, for example around 80%.
Training on several hundred hours of voice recordings takes approximately 24 hours on one powerful graphics card.
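The arithmetic behind these figures can be checked with a short sketch. A relative decrease applies to the error rate, not to the accuracy, so a 10% relative decrease of a 10% error rate removes one percentage point. The function name is hypothetical and chosen only for this illustration.

```python
def accuracy_after_relative_error_reduction(accuracy, relative_reduction):
    """Apply a relative reduction to the error rate and return the new accuracy.

    Both arguments are fractions, e.g. 0.90 for 90% and 0.10 for 10%.
    """
    error_rate = 1.0 - accuracy
    new_error_rate = error_rate * (1.0 - relative_reduction)
    return 1.0 - new_error_rate


# 90% accuracy means a 10% error rate.
# A 10% relative decrease leaves a 9% error rate, i.e. about 91% accuracy.
# A 20% relative decrease leaves an 8% error rate, i.e. about 92% accuracy.
improved_by_10 = accuracy_after_relative_error_reduction(0.90, 0.10)
improved_by_20 = accuracy_after_relative_error_reduction(0.90, 0.20)
```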
What is the difference between the NEWTON Dictate program and the service for transcribing recordings?
NEWTON Dictate is most appreciated by those who want to take down notes, make log entries or dictate a text previously written by hand. In contrast, the service for transcribing recordings is best suited for recognizing previously recorded audio files (such as recordings of interviews, meetings, etc.). Recordings are best transcribed using our program NEWTON SpeechGrid.
NEWTON Dictate is designed for dictating general texts in standard language. It is available in Czech, Slovak, Polish and Croatian. To transcribe spontaneous speech or to dictate professional texts, it is necessary to use the corresponding general or specialized dictionary.
What is the minimum recommended computer configuration needed in order for the program to work properly?
The program requires a computer with at least an Intel Core i5 processor (1.7 GHz or faster) and 4 GB of RAM.
Supported operating systems: Microsoft Windows 10, 8 and 7, 32-bit or 64-bit. Installation requires Microsoft .NET 4, which is included in the package or available for download at http://www.microsoft.com/net/. Sufficient space on the hard drive is needed (up to 600 MB for the general dictionary), as is a standard sound card supporting a sampling rate of 16 kHz with 16-bit resolution. The program will also work on computers with lower performance, but in that case there will be a delay in the recognition process.
For dictation, it is advisable to use a so-called directional microphone, which, unlike the computer’s internal microphone, will only capture sounds in its immediate vicinity. A high-quality microphone is also included in the NEWTON Dictate package.
The application always tries to recognize the entire dictation. Therefore, unknown words are not left as blank spaces, but are replaced with what is judged to be the phonetically most similar variant. If you need to dictate an unknown word repeatedly, you can add it to the user dictionary. The application will learn the word and recognize it in the next dictation.
The recognized text can be saved in the standard RTF or TXT format. The application also retains the audio recording of your dictation, which you can then export in MP3, WAV or SPX formats. If you want to continue working with the text and sound recording in NEWTON Dictate, the program allows you to save the entire document in TTAX format.
If you want NEWTON Dictate to transcribe your dictation directly into another program, you can use the “MINI” feature, which writes the dictated text into the current location of the mouse cursor. This allows you to dictate into any application, information system or Internet browser.
If the program has trouble recognizing your speech, first check whether your microphone is selected in the settings and correctly positioned in front of your mouth. The introductory tutorial to the program will take you through the microphone settings step by step. An incorrectly set microphone is the most frequent cause of problems with the program’s functionality.
Yes, the program will automatically adapt to the voice of every new user, and is capable of eliminating the effects of minor speech defects such as the inability to correctly pronounce a particular sound.
For a good recording, record in an environment with as little background noise as possible. Speak close to the recording device or microphone, but do not shout. If you plan on recording on your mobile phone, for example, talk straight into the phone as if making a phone call.
We recommend using a lavalier microphone (“clip mic”) or specialized podcast equipment. For recording in conference rooms, use high-quality conference systems.
If you need help choosing a recording device, feel free to contact our customer service.
If you can choose in what format or quality the recording will be saved, choose MP3 or WAV format. You may also use AAC (mp4 audio), VORBIS or OPUS.
The optimal sampling frequency is 16 kHz. If you set a higher frequency, the result will not improve much, but your recording will be unnecessarily large. The second key parameter is the bitrate. Set it as high as possible, and to at least 128 kbps.
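To illustrate how these parameters affect file size, here is a rough sketch. It assumes uncompressed 16-bit PCM audio (as in a typical WAV file) for the sampling-frequency comparison and a constant bitrate for the compressed case; the function names are hypothetical, and real files are slightly larger because of headers and metadata.

```python
def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Approximate size of uncompressed PCM audio, ignoring the file header."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds


def compressed_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a constant-bitrate compressed recording."""
    return bitrate_kbps * 1000 // 8 * seconds


ONE_HOUR = 3600  # seconds

# One hour of mono 16-bit audio at 16 kHz versus 44.1 kHz.
size_16k = pcm_size_bytes(16_000, 16, 1, ONE_HOUR)   # 115_200_000 bytes
size_44k = pcm_size_bytes(44_100, 16, 1, ONE_HOUR)   # 317_520_000 bytes

# One hour of compressed audio at 128 kbps.
size_128kbps = compressed_size_bytes(128, ONE_HOUR)  # 57_600_000 bytes
```

Under these assumptions, recording at 44.1 kHz instead of 16 kHz makes an uncompressed file almost three times larger.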
Your recording will usually be stereophonic, that is, it will have one track for the left and another for the right sound channel. However, voice recognition is always monaural, meaning that both channels are merged into one before recognition occurs. If you can choose the recording mode, choose mono.
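Merging two channels into one can be pictured with the following sketch. It assumes interleaved 16-bit samples (left, right, left, right, ...) and simply averages each pair. This is one common way to downmix; how the recognition service actually merges the channels is not stated here, and the function name is hypothetical.

```python
from array import array


def downmix_stereo_to_mono(stereo_samples):
    """Average interleaved left/right 16-bit samples into a single mono channel."""
    if len(stereo_samples) % 2 != 0:
        raise ValueError("expected interleaved left/right sample pairs")
    mono = array("h")  # signed 16-bit integers
    for i in range(0, len(stereo_samples), 2):
        left, right = stereo_samples[i], stereo_samples[i + 1]
        mono.append((left + right) // 2)
    return mono


# Three stereo frames: (1000, 3000), (-2000, 2000), (32767, 32767)
stereo = array("h", [1000, 3000, -2000, 2000, 32767, 32767])
mono = downmix_stereo_to_mono(stereo)  # array('h', [2000, 0, 32767])
```

Note the second frame: signals that are opposite in the two channels cancel out when averaged, which is one more reason to record in mono from the start.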
Note: MP3 files may contain a number of specific properties, such as embedded images, etc. These do not affect the quality of the transcription, but they can cause processing problems. Therefore, we do not recommend saving any additional information to the files.
Beey works with most video formats. However, with some non-standard formats, errors may occur during processing or while exporting subtitles, for example.
Note: With video and audio files, you cannot rely on the file extension to indicate the actual format, as you can with text documents or pictures.
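One way to look past the extension is to inspect the first bytes of the file, which carry a format signature. The sketch below is a minimal illustration, not part of Beey: the function names are hypothetical, only a few common signatures are checked, and it identifies the container only, not the audio or video codecs inside it.

```python
def sniff_media_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    # ISO base media files (MP4, M4A, MOV) carry "ftyp" at byte offset 4.
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "mp4"
    # WAV files start with "RIFF", a 4-byte size, then "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Ogg containers (used for Vorbis and Opus) start with "OggS".
    if header[:4] == b"OggS":
        return "ogg"
    # MP3 files often start with an ID3v2 tag.
    if header[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG audio frame sync with the Layer III bits set.
    if (len(header) >= 2 and header[0] == 0xFF
            and (header[1] & 0xE0) == 0xE0 and (header[1] & 0x06) == 0x02):
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_media_format(f.read(12))
```

For example, a file named `interview.mp3` whose first bytes are `RIFF....WAVE` would be reported as `wav`, regardless of its extension.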
We recommend using MP4 files. In some cases it will be necessary to save the file in the correct format before uploading it to Beey.
A more detailed description of the correct video format: MP4 container (file), MP3 or AAC audio track, H.264 video codec, faststart format with fragmented MP4 content. Video and audio files should have a constant frame rate and bitrate.
If you encounter problems with your file, please contact our customer support.
The quickest and easiest way to find out if your video is in the correct format is to play it in Google Chrome. All you have to do is drag and drop your file into the browser window or onto the browser icon on your desktop.
If the file starts playing, it is in the correct format and Beey should process it without any problems.
The Beey editor guarantees trouble-free processing of recordings up to two hours long. You can upload longer recordings if you need to, but bear in mind that the application may run slower and problems may occur while editing.
If you need to process a longer recording, we recommend dividing it into smaller segments before uploading it to Beey.
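Planning such a split can be sketched as below. The two-hour default mirrors the limit mentioned above; the function name is hypothetical, and the sketch only computes cut points in seconds. The actual cutting would be done in an audio or video tool of your choice, ideally at a pause in the speech rather than at the exact computed time.

```python
def segment_boundaries(total_seconds, max_segment_seconds=2 * 60 * 60):
    """Return (start, end) pairs in seconds, each at most max_segment_seconds long."""
    if total_seconds < 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be non-negative and the limit positive")
    boundaries = []
    start = 0
    while start < total_seconds:
        end = min(start + max_segment_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


# A five-hour recording (18000 seconds) becomes two full segments and one shorter one.
segments = segment_boundaries(5 * 60 * 60)
# [(0, 7200), (7200, 14400), (14400, 18000)]
```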
Leave us a message or call us. We will reply to your queries as soon as possible.
Na Pankráci 1683/127,
140 00 Praha 4