Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

Discussion:

Andre Natal

2014-10-30 23:18:06 UTC

I've been researching speech recognition in Firefox for two years: first
SpeechRTC, then emscripten, and now the Web Speech API with CMU pocketsphinx
[1] embedded in Gecko's C++ layer, a project that I had the luck to develop
for Google Summer of Code with the mentoring of Olli Pettay, Guilherme
Gonçalves, Steven Lee, Randell Jesup, and others, and under the management
of Sandip Kamat.

The implementation already works in B2G, Fennec and all FF desktop
versions, and the first language supported will be English. The API and
implementation conform to the W3C standard [2]. The preference to
enable it is: media.webspeech.service.default = pocketsphinx

The patches required to achieve this are:

- Import pocketsphinx sources in Gecko. Bug 1051146 [3]
- Embed English models. Bug 1065911 [4]
- Change SpeechGrammarList to store grammars inside SpeechGrammar objects.
Bug 1088336 [5]
- Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]

There are also other important features that we don't have patches for yet:
- Relax the VAD strategy to be less strict and avoid stopping in the middle
of speech when speaking low-volume phonemes [7]
- Integrate or develop a grapheme-to-phoneme algorithm for real-time
generation when compiling grammars [8]
- Include and build models for other languages [9]
- Continuous and word-spotting recognition [10]
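
To make the grammar items above more concrete, here is a minimal sketch of the kind of text a grammar could contain. It assumes the JSGF format, which pocketsphinx can load; the thread itself does not say which grammar format the Gecko service expects, and the function name, grammar name, and phrases are hypothetical.

```python
from typing import Iterable


def build_jsgf(grammar_name: str, rule_name: str, phrases: Iterable[str]) -> str:
    """Build a single-rule JSGF grammar that accepts any one of `phrases`.

    Hypothetical helper for illustration; not part of Gecko or pocketsphinx.
    """
    alternatives = " | ".join(p.strip().lower() for p in phrases if p.strip())
    if not alternatives:
        raise ValueError("at least one non-empty phrase is required")
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives} ;\n"
    )


# Example phrases are made up for this sketch.
grammar_text = build_jsgf("commands", "command", ["call", "open browser", "close"])
print(grammar_text)
```

Every word used in such a grammar needs a pronunciation, which is why the grapheme-to-phoneme item [8] matters for words missing from the phonetic dictionary.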

The WIP repo is here [11], and this Air Mozilla video [12] and this wiki
[13] have more detailed info.

In this comment you can see the CPU usage on Flame while recognition is
happening [14].

I look forward to your comments.

Thanks,

Andre Natal

[1] http://cmusphinx.sourceforge.net/
[2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
[6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
[7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
[8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
[9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
[10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
[11] https://github.com/andrenatal/gecko-dev
[12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (jump to 12:00)
[13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
[14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14

Nick Alexander

2014-10-30 23:36:00 UTC

First, Andre, let me offer my congratulations on getting this project to
this point. We've talked a few times and I've always been impressed.

Can you point me at Fennec try builds? I vaguely recall that these
speech recognition approaches require large pattern matching files, and
I'd like to see what including the Speech API does to the Fennec APK
size. We're pushing pretty hard on reducing our APK size right now
because we believe it's a big barrier to entry and especially to
upgrading older devices.

Nick

Andre Natal

2014-11-09 04:20:50 UTC

Thanks Nick, I appreciate your help.

I created two versions of the Fennec APK: one [1] with the English models
bundled (43.7 MB), and another [2] without them (34.6 MB). This is the
mozconfig I used [3].

Actually, I had a conversation with Jonas Sicking some months ago, and we
agreed that the ideal scenario is to let users download the package for
their preferred language from some sort of preferences screen, instead of
shipping the models bundled in the APK.
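
As a rough check of what bundling costs, the two APK sizes quoted above can be compared directly. This is only a sketch: the linked builds carry different version numbers (34.0a1 and 35.0a1), so part of the difference may come from something other than the models, and the function name is hypothetical.

```python
from typing import Tuple

# Sizes reported in the message above, in megabytes.
APK_WITH_MODELS_MB = 43.7     # build [1], English models bundled
APK_WITHOUT_MODELS_MB = 34.6  # build [2], no models bundled


def model_overhead(with_models: float, without_models: float) -> Tuple[float, float]:
    """Return (absolute increase in MB, relative increase in percent)."""
    delta = with_models - without_models
    return round(delta, 1), round(100.0 * delta / without_models, 1)


delta_mb, delta_pct = model_overhead(APK_WITH_MODELS_MB, APK_WITHOUT_MODELS_MB)
print(f"Bundled models add about {delta_mb} MB ({delta_pct}% larger APK)")
```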

[1]
https://www.dropbox.com/s/6snv6e3mqqcs4zi/fennec-34.0a1.en-US.android-arm.apk?dl=0
[2]
https://www.dropbox.com/s/zxxop34unj21r1s/fennec-35.0a1.en-US.android-arm.apk?dl=0
[3]
#DEBUG
#ac_add_options --enable-debug
#ac_add_options --enable-trace-malloc
#ac_add_options --enable-accessibility
#ac_add_options --enable-signmar
ac_add_options --disable-tests

# android options
ac_add_options --enable-application=mobile/android
ac_add_options --with-android-ndk="/Volumes/extra/android-ndk-r8e/"
ac_add_options --with-android-sdk="/Volumes/extra/android-sdk-macosx/platforms/android-19/"

# FOR ARM
ac_add_options --target=arm-linux-androideabi
mk_add_options MOZ_OBJDIR=./obj-arm-linux-androideabi-debug

# FOR 386
#ac_add_options --target=i386-linux-android
#mk_add_options MOZ_OBJDIR=./objdir-droid-i386

_______________________________________________
dev-platform mailing list
https://lists.mozilla.org/listinfo/dev-platform

smaug

2014-10-31 00:21:04 UTC

Intent to ship is too strong for this.
We need to first have implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty more testing.

-Olli

smaug

2014-10-31 00:24:23 UTC

Post by smaug
Intent to ship is too strong for this.
We need to first have implementation landed and tested ;)
I wouldn't ship the implementation in desktop FF without plenty of more testing.

But I guess the question is what people think about shipping Pocketsphinx + the API, even if disabled by default.

Andre, we need some numbers here. How much does Pocketsphinx increase the binary size, or the download size?
When the pref is enabled, how much memory does it use on desktop, and on b2g?

Chris Hofmann

2014-10-31 00:45:47 UTC

Post by smaug
But I guess the question is what people think about shipping the
Pocketsphinx + API, even if disabled by default.
Andre, we need some numbers here. How much does Pocketsphinx increase
binary size? or download size?
When the pref is enabled, how much does it use memory on desktop, what about on b2g?

This is important work, and the competition is ramping up quickly after many
years of promises about this year being the year of voice recognition.
We will probably fall behind quickly if we don't get something going
here in the next year.

Can you also talk a bit about what the plan and set of challenges look
like for expanding the supported languages, and how these would impact
the numbers Olli has asked for?

The place we really need this is b2g, but phones are only shipping in
international markets right now, so English-only is not all that helpful.

-chofmann

Andre Natal

2014-11-09 05:33:11 UTC

Hi Chris.

For new languages, once the decoder is integrated into Gecko, you only
need to build new models (acoustic and language), since the decoder is
language-agnostic.

The model-building procedure is the same for every language. In broad
terms, you need to record thousands of hours of spoken phrases covering all
the phones of the target language, from people of different genders, ages,
regions, and accents. All this data is then compiled and transformed into
the acoustic model.

For the language model, you need to build a phonetic dictionary for that
language, which then allows grapheme-to-phoneme tools (such as
Phonetisaurus [1]) to generate, in real time, phonetic representations of
the words in your grammar.
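
To illustrate the role of the phonetic dictionary, here is a toy sketch of looking up grammar words in a "word PHONE PHONE ..." style dictionary. The two entries and the function names are illustrative assumptions, not taken from a shipped model; words that are not found are the ones a grapheme-to-phoneme tool would have to handle.

```python
from typing import Dict, Iterable, List, Tuple

# Toy dictionary text, one "word PHONE PHONE ..." entry per line.
# Illustrative entries only.
TOY_DICT_TEXT = """\
hello HH AH L OW
world W ER L D
"""


def load_dictionary(text: str) -> Dict[str, List[str]]:
    """Parse dictionary text into a word -> phone list mapping."""
    entries: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0].lower()] = parts[1:]
    return entries


def pronounce(
    words: Iterable[str], dictionary: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split grammar words into those with a known pronunciation and those without."""
    known: Dict[str, List[str]] = {}
    missing: List[str] = []
    for word in words:
        key = word.lower()
        if key in dictionary:
            known[key] = dictionary[key]
        else:
            missing.append(key)  # would need grapheme-to-phoneme conversion
    return known, missing


known, missing = pronounce(["Hello", "Firefox"], load_dictionary(TOY_DICT_TEXT))
print("known:", known)
print("needs G2P:", missing)
```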

Building models is not a trivial task, and requires close collaboration
between speech engineers and linguists.

Pocketsphinx offers some models besides English [2] and they have useful
tutorials about acoustic [3] and language [4] model creation.

Thanks,

Andre

[1] https://code.google.com/p/phonetisaurus/
[2] http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
[3] http://cmusphinx.sourceforge.net/wiki/tutorialam?s[]=acoustic&s[]=models
[4] http://cmusphinx.sourceforge.net/wiki/tutoriallm

Andre Natal

2014-11-19 07:16:42 UTC

Chris,

I was discussing this with the Sphinx leaders, and we can build models from
audiobooks as well.

This approach saves a lot of time and improves quality, since the
narration is accurate and clear.

We are currently defining a way to create Hindi and Brazilian Portuguese
models.

Thanks

Andre

Mark Hammond

2014-10-31 01:50:13 UTC

Post by Chris Hofmann
The place we really need this is b2g, but phones are only shipping in
international markets right now so english only is not all that helpful.

While this doesn't change the point you are making in any way, FWIW,
Firefox OS phones are on sale in Australia via one of our largest
electronics retailers:

https://www.jbhifi.com.au/phones/Outright-Mobile-Handsets/zte/zte-open-c-handset-grey/624980/

http://www.gizmodo.com.au/2014/10/jb-hi-fi-is-now-selling-australias-first-firefox-os-phone/

Nice!

Mark

Andre Natal

2014-11-09 04:50:44 UTC

Hi Olli,

Post by smaug
How much does Pocketsphinx increase binary size? or download size?

In the past it was suggested that we avoid shipping the models with the
packages, and instead create a preferences panel in the apps that lets
users download the models they want.

As for the size of the pocketsphinx libraries themselves, on Mac OS they add
up to ~2.3 MB [1]. I don't know what kind of compression the build system
applies when compiling/packaging, but it should be efficient enough.

[1]
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libsphinxbase.a
2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39 /usr/local/lib/libsphinxbase.a
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libpocketsphinx.a
2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52 /usr/local/lib/libpocketsphinx.a
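
The ~2.3 MB figure follows from adding the two byte counts in the listing above, as this short sketch shows. Note that these are static archives (.a files), so the amount that ends up in a linked and stripped binary may well differ; the variable names are hypothetical.

```python
# Byte sizes copied from the `ls -lsa` output above.
LIB_SIZES_BYTES = {
    "libsphinxbase.a": 1114840,
    "libpocketsphinx.a": 1201240,
}

total_bytes = sum(LIB_SIZES_BYTES.values())
total_mb = round(total_bytes / 1_000_000, 2)   # decimal megabytes
total_mib = round(total_bytes / 2**20, 2)      # binary mebibytes

print(f"{total_bytes} bytes = {total_mb} MB = {total_mib} MiB")
```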

Post by smaug
When the pref is enabled, how much does it use memory on desktop, what about on b2g?

On b2g, it uses memory only after the decoder has been activated and has
loaded the models. I profiled it on a ZTE Open C; here is the report [2] and
here is the exact snapshot [3]. It seems ~21 MB is used after loading the
models.

On desktop Mac OS Nightly, the memory usage was ~11 MB.

[2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
[3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0

Sandip Kamat

2014-11-14 23:36:23 UTC

Hi Andre, I suggest we update the wiki with these sizes (as well as answers to the other questions in this thread) so we can use it as a central place for info.

-Sandip

Sandip Kamat

2014-11-14 23:53:18 UTC

Hi Olli, in general for FxOS devices, the idea is to let OEMs decide which language models to ship preloaded. That gives partners a choice based on region, while users can also download the packages they want directly. For now, since we are at a very early stage, we have English support only. We need help building and testing other language models in parallel.

Sandip

Marco Chen

2014-10-31 02:27:48 UTC

Hi Andre,

This is nice work, and I look forward to voice recognition on B2G.

Besides the final result, I am also interested in why you migrated from SpeechRTC -> emscripten -> Web Speech API.
Could you share what factors triggered these transitions? That could be a lesson learned for us.

e.g.: SpeechRTC -> voice recognition can't be performed locally.
emscripten -> performance issue? license issue? something else?

Thanks,
Sincerely yours.

Chris Mills

2014-11-03 11:58:10 UTC

Awesome to see this mail, Andre!

And remember that we do have the pages set up on MDN ready to be filled in also.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

Once this is shipped, do you think we can find some time to start collaborating on these docs?

Chris Mills
Senior tech writer || Mozilla
developer.mozilla.org || MDN

Andre Natal

2014-11-09 05:39:33 UTC

Thank you Chris, sure we can do it!

Here is a straightforward page that exercises all the objects and methods of
the Speech API we are aiming to implement:

https://github.com/andrenatal/webspeechapi/blob/gh-pages/index_clean.html

Maybe we can start from it.
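
For readers who have not used the API, here is a rough sketch of the kind of
grammar-based recognition such a page exercises. It is not taken from the
linked page. The interface names (GrammarListLike, RecognitionLike) and the
helper functions are local stand-ins invented for illustration, declared here
so the sketch does not depend on browser typings; only the members they list
(grammars, lang, onresult, onerror, start, addFromString) come from the W3C
draft.

```typescript
// Minimal structural types for the parts of the Web Speech API used here.
// Hypothetical names, declared locally for illustration only.
interface GrammarListLike {
  addFromString(grammar: string, weight?: number): void;
}

interface RecognitionLike {
  lang: string;
  grammars: GrammarListLike;
  onresult: ((event: any) => void) | null;
  onerror: ((event: any) => void) | null;
  start(): void;
}

// Build a JSGF grammar with a single public rule listing the allowed commands.
function buildCommandGrammar(commands: string[]): string {
  return "#JSGF V1.0; grammar commands; public <command> = " +
    commands.join(" | ") + " ;";
}

// Configure a recognizer for a fixed command set and start listening.
function listenForCommands(
  recognition: RecognitionLike,
  grammarList: GrammarListLike,
  commands: string[],
  onCommand: (transcript: string) => void
): void {
  grammarList.addFromString(buildCommandGrammar(commands), 1);
  recognition.grammars = grammarList;
  recognition.lang = "en-US";
  recognition.onresult = (event) => {
    // Take the first alternative of the first result.
    onCommand(event.results[0][0].transcript);
  };
  recognition.onerror = (event) => {
    console.error("recognition error:", event.error);
  };
  recognition.start();
}

// In a browser that exposes the API (an assumption; prefixed names vary):
//   const g = globalThis as any;
//   listenForCommands(new g.SpeechRecognition(), new g.SpeechGrammarList(),
//                     ["call", "dial", "open"], (t) => console.log(t));
```

Passing the recognizer and grammar list in as arguments keeps the sketch
independent of whichever constructor names a given browser build exposes.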

Thanks!

Andre

Post by Chris Mills
Awesome to see this mail, Andre!
And remember that we do have the pages set up on MDN ready to be filled in also.
https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
Once this is shipped, do you think we can find some time to start
collaborating on these docs?
Chris Mills
Senior tech writer || Mozilla
developer.mozilla.org || MDN


Andre Natal

2014-11-09 13:12:24 UTC

Hi Marco.

SpeechRTC was my first attempt with the platform. In early 2013 I didn't
have enough knowledge of Gecko internals, and B2G itself was at a very early
stage (in the very beginning, Steven Lee needed to send me patches to make
gUM work properly), so the fastest path was to capture the audio and stream
it online. The great part is that Opus is pretty efficient, and Node.js plus
a speech server wrapping pocketsphinx made the whole round trip really fast.

But I knew that was not ideal for command and control / grammar-based
recognition, so I started to research a direct port of pocketsphinx using
emscripten. It did work, but three reasons made me move to a full C++ version:

1) the whole Speech API frontend in Gecko was ready to roll, only waiting for
a backend, and this, as we know, was built in C++;

2) my tests ran very well, but on the Peak [2], for example, it performed
slower than on low-end devices running Android [3];

3) with emscripten, loading the model during the decoder's creation at each
reload ended up very slow, and I couldn't figure out how to keep the decoder
instance alive between tabs and reloads, while in C++ this happens only once,
due to Gecko's architecture.
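
To illustrate point 3, here is a minimal sketch of the "load once, reuse"
pattern. It is an illustration only, not the Gecko or emscripten code: the
Decoder interface, loadOnce, getDecoder and the modelLoads counter are all
hypothetical names, and the factory merely stands in for the expensive model
load.

```typescript
// Hypothetical decoder type: stands in for a pocketsphinx decoder whose
// construction loads the acoustic model (the expensive step).
interface Decoder {
  decode(audio: Float32Array): string;
}

// Wrap an expensive factory so it runs at most once; every later call
// returns the instance created by the first call.
function loadOnce<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// Counts how many times the (pretend) model load actually happens.
let modelLoads = 0;

const getDecoder = loadOnce<Decoder>(() => {
  modelLoads += 1; // placeholder for the slow model load
  return { decode: () => "" };
});
```

Inside a single page this kind of cache only helps between recognitions. It
is lost on every reload and is not shared between tabs, which is the
limitation described above; a decoder held by the browser process itself does
not have that problem.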


Andre Natal

2014-11-09 13:14:35 UTC

Sorry, I forgot the links:

2 - Speechrtc offline on Firefox OS (Peak):

3 - Continuous speech recognition on android with poc…:
