CVoiceControl consists of three executables:
First of all, you have to calibrate your microphone to use it
for speech recognition. Use microphone_config to do this.
When your sound hardware is prepared, you can use
model_editor to create speaker models.
These are the main objects needed for speech recognition.
cvoicecontrol is the actual speech recognition program.
These three components are described in the next three sections. The structure of the speaker models is also described in more detail.
microphone_config must be started from a command prompt (Linux console, xterm, kvt etc.). First, the tool generates a list of available mixer and audio devices. If the tool fails at this point, it could not find an appropriate mixer and/or audio device in your system. In this case, make sure that your sound device is installed correctly and that your sound driver is working properly.
The microphone calibration process is divided into five steps. These steps can be run from the main menu. A step that has been completed successfully is displayed in bold face, a step that has not yet been completed is displayed in normal face, and a step that cannot be run at this time is not displayed at all.
The five steps are:
1. Select Mixer Device
2. Select Audio Device
3. Adjust Mixer Levels
4. Calculate Recording Thresholds
5. Estimate Characteristics of Recording Channel
If microphone_config managed to detect your audio hardware
automatically, the first two steps (Select Mixer Device and
Select Audio Device) are displayed in bold, i.e. marked as
``completed successfully''. (The selected device files are displayed
in parentheses behind the menu entries.)
In this case you may continue with step three.
However, if you have more than one sound card installed or if
you have to select a non-default mixer or audio device, press
Enter on the respective menu item and select a device from the list.
In case of doubt, stick with the suggested settings!
Next, run step three:
Adjust Mixer Levels.
Here, we try to estimate good values for the mixer channels
MICROPHONE IN (MIC) and (if available) INPUT GAIN (IGAIN). You will be
guided through the process by detailed information dialogs.
To succeed, this step strongly relies on your cooperation!
Initially, the MIC level is set to the maximum and the IGAIN level (if available) is set to the minimum value.
If an IGAIN channel is available then its level is increased while you speak at a conversational volume until the input signal is strong enough. Hint: Reasonable values for the IGAIN level on my system range between 1 and 8.
Next, the microphone level is reduced repeatedly while you speak at a ``maximum volume level'' until the incoming signal does not exceed an upper limit anymore. Hint: Reasonable values for the MIC level on my system range between 60 and 95.
Upon successful completion of this step, the next two steps are available for selection from the main menu.
Next, select Calculate Recording Thresholds from the menu.
During this step, we try to find reasonable energy levels at which to start the automatic voice recording and at which to stop the recording. Again, you will be guided through the process by detailed information dialogs.
In the next step,
Estimate Characteristics of Recording Channel,
the characteristics of the recording channel (such as background
noise) are estimated. Again, on-screen information
guides you through the process.
If all five steps have been completed successfully, the item for
saving the configuration becomes available in the main menu. Select it to
store all the gathered information in a configuration file. This file
is put in the directory
.cvoicecontrol in your home directory.
.cvoicecontrol is created if necessary.
If the configuration has been saved successfully you can leave the
configuration tool by selecting
Exit from the main menu.
Congratulations, your microphone is set up for speech recognition!
CVoiceControl is a template-matching based speech recognition system, i.e. for each command that can be recognized there have to be some sample utterances to which an incoming utterance can be compared. The commands and their sample utterances are collected in a so-called speaker model.
A speaker model consists of a variable number of reference items, where each reference item corresponds to a command that can be recognized. A reference item consists of a label (a transcription of what is said), a command (a Unix command that is executed upon recognition of this reference item) and a variable number of sample utterances.
Roughly speaking, to recognize an incoming utterance, it is compared to all sample utterances of all reference items in the active speaker model. If the sample utterances of one reference item are most similar to the incoming utterance (i.e. have the smallest distance score), this reference item will be chosen as recognition result.
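The selection rule described above can be sketched in a few lines of shell. This is only an illustration, not CVoiceControl's actual code: pick_best is a hypothetical helper, the distance scores are made up, and the sketch assumes that the single closest sample utterance decides the result (the text does not say how the scores of several samples of one item are combined).

```shell
#!/bin/bash
# pick_best: hypothetical helper. Reads "label distance" lines on stdin
# (one line per sample utterance, single-word labels) and prints the label
# of the sample with the smallest distance to the incoming utterance.
pick_best() {
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; best = $1 }
         END { if (NR > 0) print best }'
}

# Made-up distance scores for illustration only.
printf '%s\n' 'yes 12' 'no 7' 'yes 9' | pick_best   # prints "no"
```

Here the sample with distance 7 is the closest one, so its reference item ``no'' would be chosen as the recognition result.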
To launch the speaker model editor, open a console and run model_editor.
From the main menu of the editor you can reset the current speaker
model (New Speaker Model), load one from file
(Load Speaker Model), edit the model
(Edit Speaker Model),
save it (Save Speaker Model) and leave the editor.
The model editor shows the reference items of the current speaker
model in a table view, one reference item per line. A reference item
in the table can be highlighted (selected) using the up and down
cursor keys. At the bottom
of the dialog a brief summary of keyboard commands is displayed
for your convenience. Press a to add a new reference item to
the model, press d to delete the currently highlighted item,
press Enter to edit the currently highlighted item and press
b to return to the main menu.
So for example, to add and edit a new reference item,
press a followed by Enter.
Edit Speaker Model Item:
Selecting a reference item by pressing
Enter opens the
item editor dialog. This dialog displays the label and
command of the selected item as well as a list of recorded
sample utterances. A brief summary of keyboard
commands is displayed at the bottom.
Sample utterances in the list view can be highlighted using the up
and down cursor keys.
To record a new sample utterance press r. The recording is
then done automatically, i.e. no further keyboard interaction is
required to record the utterance. Note: After pressing r you
should wait a second or so before starting to talk! This is because
an audio buffer needs to be filled before the actual automatic recording
can be started!
To delete a highlighted sample utterance press d.
The keys for playing the highlighted utterance, editing the label string
of the current item, editing the command string and leaving the current
dialog are listed in the keyboard summary at the bottom of the dialog.
Important: Listen to every utterance you record to make sure that nothing has been cut off at the boundaries! If many utterances are cut off, please rerun the microphone configuration tool!
Note: To ensure a good recognition quality, a minimum number of sample utterances per reference item is required. By default, the minimum number is set to ``4''.
Note: Recognized commands are executed in the foreground by default. This means that the speech recognizer blocks until the executed command has finished! This behaviour is required because many sound cards do not allow recording and playing at the same time. So, if a command plays any sound on the sound card, the speech recognizer has to wait until the command has finished before it continues in auto recording mode. If you want the speech recognizer to run a command in the background and continue with recognition, you have to append a ``&'' to the command!
By the way, the command may consist of a sequence of commands separated by ``;''.
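The following sketch shows how such command strings behave when handed to a shell. It assumes that a reference item's command string is passed to a shell more or less as written, which the use of ``&'' and ``;'' suggests but which is not stated explicitly here; run_item_command is a hypothetical stand-in, not part of CVoiceControl.

```shell
#!/bin/bash
# run_item_command: hypothetical stand-in for the way a recognizer might
# hand a reference item's command string to the shell.
run_item_command() {
    sh -c "$1"
}

# Several commands separated by ";" run one after another in the foreground.
run_item_command 'echo "first"; echo "second"'

# A trailing "&" puts the command in the background, so the caller
# does not wait for it to finish.
run_item_command 'sleep 1 >/dev/null 2>&1 &'
echo "returned without waiting for sleep"
```

With the first form the caller blocks until both commands are done; with the second it continues immediately, which is the behaviour described above for commands ending in ``&''.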
Important: If a reference item has been recognized by the speech recognizer, the associated command will be executed! There is no guarantee that the recognition result is correct. Also, the speech recognizer does not check whether the execution of a command would harm your system (think of commands like
rm). Thus, it is the user's responsibility to define harmless commands in the speaker model and to make sure that the reference items in a speaker model are not too easily confused with each other!
Once you have finished editing the speaker model, save it to disk
by selecting Save Speaker Model from the main menu. Note that speaker
model files must have the extension .cvc. If you do not
specify this extension, it will be appended to the file name.
To start the speech recognizer open a console and type:
% cvoicecontrol <model_file>
<model_file> is the name of the speaker model you want to use. The speech recognizer enters auto recording mode automatically.
Note: Make sure that no application needs access to the sound device at this time, as most sound devices only allow for exclusive access!
After a command has been recognized successfully, the speech recognizer re-enters automatic recording mode and is ready for the next speech command.
To finish the program, you have to kill the speech recognizer explicitly,
either by pressing Ctrl-C in the console where you started the recognizer
or by issuing the command
killall cvoicecontrol from any command prompt.
Hint: There is also a special command name that can be used in a speaker model's
reference item to finish cvoicecontrol.
Note: The speech recognizer can be started in a special mode by
specifying the command line option
--once, i.e. by starting it
the following way:
% cvoicecontrol --once <model_file>
In this case, the speech recognizer will exit automatically after the
first successful recognition run. The exit code of the program is set
to the id number of the reference item that has been recognized.
As an example let us consider a speaker model yes-no.cvc that
contains two reference items, the first one being ``Yes'', the
second one being ``No''. Invoked like
% cvoicecontrol --once yes-no.cvc
the speech recognizer returns 0 if ``Yes'' was recognized and 1 if ``No'' was recognized. Using speech prompts in shell scripts is then straightforward. Example:
#!/usr/bin/tcsh
# Run one recognition pass and remember its exit code.
cvoicecontrol --once yes-no.cvc
set result = $status
# An exit code of -1 is reported by the shell as 255.
if ($result == "255") then
    echo "Error!"
else if ($result == "0") then
    echo "You said yes"
else if ($result == "1") then
    echo "You said no"
endif
exit
Note: In a
tcsh script the shell variable $status
always contains the exit code of the most recently executed
command! To obtain the exit code in a
bash script you have
to use the special parameter $?.
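For bash, the same idea could be sketched as follows. ask_yes_no is a hypothetical helper name; it runs whatever recognizer command it is given and maps the exit code to a message, assuming as above that 0 means ``Yes'', 1 means ``No'' and anything else is an error. The exit code is captured right after the command, because any later command would overwrite $?.

```shell
#!/bin/bash
# ask_yes_no: hypothetical helper. Runs the given command and reports
# the answer based on its exit code (0 = yes, 1 = no, other = error).
# Usage: ask_yes_no <command> [args...]
ask_yes_no() {
    local result=0
    "$@" || result=$?        # capture the exit code immediately
    case "$result" in
        0) echo "You said yes" ;;
        1) echo "You said no" ;;
        *) echo "Error!" ;;
    esac
    return "$result"
}

# Intended use (assumes cvoicecontrol and yes-no.cvc are available):
# ask_yes_no cvoicecontrol --once yes-no.cvc
```

Passing the recognizer command as arguments keeps the helper independent of the model file name, so the same function works for any two-item speaker model.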
Have fun with CVoiceControl!