Frequently asked questions – NEWTON Technologies

FAQ

Voice technologies

The most frequently asked questions about voice technologies.

What are neural networks and how are they useful in voice technologies?

An artificial neural network is a computational model used in machine learning, a branch of artificial intelligence. It currently yields the best results in a variety of applications (object recognition in images, speech recognition, translation from one language to another…). Its structure is inspired by biological neural networks, such as those in the brain. Accordingly, an artificial neural network consists of a large number of small computational units (neurons) interconnected in series and in parallel. Much like the brain, the network is capable of learning: during the learning process, the weights of the connections between the individual neurons are adjusted. Any given neuron then sends a signal to the subsequent layers only if the sum of its input signals, multiplied by the learned weights, exceeds a certain threshold value (this is similar to the workings of neurons in the human brain).
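
As a minimal sketch of the thresholding behaviour described above, the following Python snippet models a single artificial neuron. The function name `neuron_fires`, the example weights and the threshold are illustrative assumptions and are not taken from any NEWTON product; real networks usually replace the hard threshold with other activation functions.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold


# Hypothetical values chosen only for illustration.
signals = [1.0, 0.5, 0.0]
learned_weights = [0.8, 0.4, 0.9]

# The weighted sum is 0.8 + 0.2 + 0.0 = 1.0.
print(neuron_fires(signals, learned_weights, threshold=0.9))  # True
print(neuron_fires(signals, learned_weights, threshold=1.5))  # False
```

Learning, in this picture, amounts to adjusting the values in `learned_weights` until the network's outputs match the training examples.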

What are the advantages of using neural networks in speech recognition, compared to the previous system?

The system of machine learning using neural networks offers a significantly higher accuracy in speech recognition. This is especially apparent under difficult conditions, e.g. when transcribing a compressed recording, with excessive background noise, when the sound has been recorded from a greater distance, etc. In cases such as these, the neural network is much more robust and the quality of speech recognition is retained more successfully than with the previous version.

Can your system for speech recognition learn on its own?

From the point of view of machine learning theory, one advantage of neural networks is that, if they are deep enough, they can essentially create their own abstract internal features between layers, which are far better than what humans can extract from a processed signal through various sophisticated transformations and algorithms. However, the previous algorithms also had to be trained; this is not a novelty of neural networks. It is important to remember that no system can learn entirely on its own. There always has to be a teacher for learning to take place.

How significant are the improvements brought by neural networks and where do they manifest?

Under ideal conditions, where even the previous system worked well, we can expect a relative decrease in the error rate of 10% – 20%. This can increase the accuracy from 90% up to even 98%. Under difficult conditions, where the previous system achieved an accuracy of only about 40% – 60%, we can now expect significantly better results, around 85% for example.

What does the process of “learning” in neural networks look like and how long does it take?

Learning with the aid of several hundred hours’ worth of voice recordings takes approximately 24 hours using one powerful graphics card.

NEWTON Dictate

Everything you need to know about the program for the automatic recognition of dictated speech.

What is the difference between NEWTON Dictate and the Beey service for transcribing recordings?

NEWTON Dictate can be used for dictating notes and meeting minutes, or for re-dictating a document that was created earlier. The Beey application, on the other hand, is suitable for transcribing and editing existing audio/video files (recordings of interviews, meetings etc.).

What can I dictate with the program?

NEWTON Dictate is suitable for dictating general texts in formal language. A corresponding specialized dictionary is required for dictating spontaneous speech or professional texts.

What are the supported languages?

The program is available in several variants, focusing on Slavic languages. It is therefore not available in English.

What is the minimum recommended hardware specification needed for the program to work properly?

Processor: min. Intel Core i5 (at least 1.7 GHz); 4 GB RAM.

OS: Windows 8.1, 10 or 11; 32-bit or 64-bit.

Min. 600 MB of disk space (with the general dictionary).

A standard sound card supporting a sampling rate of 16 kHz with 16-bit resolution.

Microsoft .NET Framework version 4.8 or higher is required to install NEWTON Dictate version 5.1.0.86 or newer.

The program also works on slower computers; however, the transcription might be significantly delayed.

Can I use any microphone?

We recommend using a directional microphone or a headset with a microphone. You can find the recommended accessories in our e-shop.

If you are interested in recommended accessories, please contact us at [email protected]/[email protected]

Internal microphones built into computers are not suitable for dictating.

What will the program write if I dictate a word which is not in the NEWTON Dictate dictionary?

The application always tries to recognize the entire dictation. Therefore, unknown words are not left as blank spaces, but are replaced with what is judged to be the phonetically most similar variant. If you need to dictate an unknown word repeatedly, you can add it to the user dictionary. The application will learn the word and recognize it in your next dictation.

Which formats can I use for saving the resulting text?

The recognized text can be saved in the standard RTF or TXT format. The application also retains the audio recording of your dictation, which you can then export in MP3, WAV or SPX formats. If you want to continue working with the text and sound recording in NEWTON Dictate, the program allows you to save the entire document in TTAX format.

What if I need my dictated text to be written directly into another program?

If you want NEWTON Dictate to transcribe your dictation directly into another program, you can use the “MINI” feature, which writes the dictated text into the current location of the mouse cursor. This allows you to dictate into any application, information system or Internet browser.

What should I do if the program does not understand me?

If the program has trouble recognizing your speech, first check whether your microphone is selected in the settings and correctly positioned in front of your mouth. The introductory tutorial to the program will take you through the microphone settings step by step. An incorrectly set microphone is the most frequent cause of problems with the program’s functionality.

Can I dictate if I have a minor speech defect?

Yes, the program automatically adapts to the voice of every new user and is capable of eliminating the effects of minor speech defects, such as the inability to pronounce a particular sound correctly.

Beey

Your most frequent questions about uploading files to the Beey editor.

How do I make a good recording?

To make a good recording, record in an environment with as little background noise as possible. Speak close to the recording device or microphone, but do not shout. If you plan to record on your mobile phone, for example, talk straight into the phone as if making a phone call.

What kind of microphone should I use?

We recommend using a lavalier microphone (“clip mic”) or specialized podcast equipment. For recording in conference rooms, use high-quality conference systems.

If you need help choosing a recording device, feel free to contact our customer service.

In what format should I save the recording?

If you can choose the format or quality in which the recording will be saved, choose MP3 or WAV. You may also use AAC (MP4 audio), Vorbis or Opus.

Which technical parameters should I set when recording?

The optimal sampling frequency is 16 kHz. If you set a higher frequency, the result will not improve much, but your recording will be unnecessarily large. The second key parameter is the so-called bitrate. Set it to the highest value available, at least 128 kbps.
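
To illustrate why a higher sampling frequency mainly makes the file larger, here is a rough size estimate in Python. The helper names are hypothetical, and the 16-bit sample depth and mono channel are assumptions made for the example; the figures cover raw audio data only and ignore file headers and metadata.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of uncompressed PCM audio data (no file header)."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds


def compressed_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a constant-bitrate compressed stream."""
    return bitrate_kbps * 1000 * seconds // 8


one_hour = 3600  # seconds

print(pcm_size_bytes(16_000, 16, 1, one_hour))  # 115200000 bytes (about 115 MB)
print(pcm_size_bytes(48_000, 16, 1, one_hour))  # 345600000 bytes (three times larger)
print(compressed_size_bytes(128, one_hour))     # 57600000 bytes (about 58 MB)
```

For uncompressed audio, the size grows in direct proportion to the sampling frequency, while for a compressed file at a constant bitrate it is determined by the bitrate and the duration.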

Mono or stereo?

Your recording will usually be stereophonic, that is, one track for the left and the other for the right sound channel. However, it should be noted that voice recognition is always monaural, meaning that both channels are merged into one before recognition occurs. If you can choose the recording mode, choose mono.
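
The merging of two channels into one can be sketched as follows. Averaging corresponding samples is one common way to downmix; it is an assumption here, since the exact method Beey uses is not stated. The function name `downmix_to_mono` and the sample values are illustrative only.

```python
def downmix_to_mono(left, right):
    """Merge two channels into one by averaging corresponding samples."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    return [(l + r) // 2 for l, r in zip(left, right)]


# Hypothetical 16-bit sample values for the left and right channels.
left_channel = [1000, -2000, 300]
right_channel = [3000, -2000, 100]

print(downmix_to_mono(left_channel, right_channel))  # [2000, -2000, 200]
```

Because the two channels end up combined in any case, recording in mono loses nothing for recognition and produces a smaller file.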

Note: MP3 files may contain a number of specific properties, such as embedded images, etc. These do not affect the quality of the transcription, but they can cause processing problems. Therefore, we do not recommend saving any additional information to the files.

Does Beey work with different video formats?

Beey works with most video formats. However, with some non-standard formats, errors may occur during processing or while exporting subtitles, for example.

Note: With video and audio files, the file extension is not a reliable indicator of the actual format, unlike with text documents or pictures.
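
One way to look past the extension is to inspect the first bytes of the file. The sketch below checks a few well-known container signatures; the function names are hypothetical, the list is far from exhaustive, and it says nothing about how Beey itself detects formats. It also identifies only the container, not the audio or video codec inside it.

```python
def sniff_container(header: bytes) -> str:
    """Guess a media container from the first 12 bytes of a file."""
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "mp4-family"  # ISO base media: MP4, M4A, MOV, 3GP...
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"OggS":
        return "ogg"  # typically carries Vorbis or Opus audio
    if header[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, most often at the start of an MP3 file
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the start of a file and guess its container."""
    with open(path, "rb") as f:
        return sniff_container(f.read(12))


print(sniff_container(b"\x00\x00\x00\x18ftypmp42"))  # mp4-family
print(sniff_container(b"RIFF\x24\x00\x00\x00WAVE"))  # wav
```

A file named `recording.mp3` whose first bytes are `RIFF….WAVE` would be reported as `wav` by this check, which is exactly the kind of mismatch the note warns about.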

What is the recommended video format?

We recommend using MP4 files. In some cases it will be necessary to save the file in the correct format before uploading it to Beey.

A more detailed description of the correct video format: MP4 container (file), MP3 or AAC audio track, H.264 video codec, faststart format with fragmented MP4 content. Video and audio files should have a constant frame rate and bitrate.

If you encounter problems with your file, please contact our customer support.

How do I find out if the video file is in the right format?

The quickest and easiest way to find out if your video is in the correct format is to play it in Google Chrome. All you have to do is drag and drop your file into the browser window or onto the browser icon on your desktop.

If the file starts playing, it is in the correct format and Beey should process it without any problems.

What is the recommended recording length?

The Beey editor guarantees trouble-free processing of recordings up to two hours long. You can upload longer recordings if you need to, but bear in mind that the application may run slower and problems may occur while editing.

If you need to process a longer recording, we recommend dividing it into smaller segments before uploading it to Beey.
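
As a small sketch of how the cut points for such segments could be planned, the following Python function divides a recording into chunks of at most two hours. The function name and the fixed-length strategy are assumptions made for illustration; in practice it may be preferable to cut at a pause in speech near each boundary so that no sentence is split.

```python
def segment_bounds(total_seconds, max_segment_seconds=2 * 60 * 60):
    """Return (start, end) pairs, in seconds, covering the whole recording."""
    if total_seconds < 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    bounds = []
    start = 0
    while start < total_seconds:
        end = min(start + max_segment_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


# A hypothetical five-hour recording yields two full segments and a shorter one.
print(segment_bounds(5 * 3600))  # [(0, 7200), (7200, 14400), (14400, 18000)]
```

The resulting start and end times can then be used in any audio or video editor to export the individual parts before uploading them.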

Didn’t find an answer to your question? Do you need advice regarding the selection or settings of our products? Leave us a message or call us on (+420) 225 540 120.

Kateřina Morozová

Customer support

Na Pankráci 1683/127,
140 00 Praha 4
Czech Republic

E: [email protected]