Thursday, 10 December 2015

There you go: speech recognition solved

Deep Speech 2 architecture
Just "Wow!" results from Andrew Ng and the team at Baidu. The first significant published account of human-competitive speech recognition performance is a real breakthrough, even if other non-public results may be lurking.

The paper is "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin" by Baidu Research – Silicon Valley AI Lab, Andrew Ng, et al., pushed to arXiv recently.

Google, with Geoffrey Hinton, Andrew Ng, and others, got much of the ball rolling with outstanding ImageNet results that are now human-competitive. Yann LeCun, the father of evolving Fukushima's 1980 Neocognitron into a learning convolutional neural net, has his team at Facebook beating human-level performance on facial recognition. So automatic speech recognition (ASR) getting to human-level performance is not unexpected, but it is still a major achievement with enormous ramifications for the way we will work and interact with the world.

From the introduction,
"The Deep Speech 2 ASR pipeline approaches or exceeds the accuracy of Amazon Mechanical Turk human workers on several benchmarks, works in multiple languages with little modification, and is deployable in a production setting. It thus represents a significant step towards a single ASR system that addresses the entire range of speech recognition contexts handled by humans."

A nice grab from the paper showing human-competitive performance.
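For anyone wondering how "accuracy" gets compared between a recognizer and a human transcriber: speech benchmarks are usually scored by word error rate (WER), the word-level edit distance between a transcript and the reference, divided by the reference length. Here is a minimal sketch of that metric. The function name and the sample sentences are mine, not from the paper, and real scoring tools also normalise text (case, punctuation) before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, purely for illustration.
machine = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
human = word_error_rate("the cat sat on the mat", "the cat sat on the mat")
print(f"machine WER: {machine:.3f}, human WER: {human:.3f}")
```

Lower is better, so "human competitive" means the system's error rate lands at or below that of the human transcribers on the same audio.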
From the conclusion,
"End-to-end deep learning presents the exciting opportunity to improve speech recognition systems continually with increases in data and computation. Indeed, our results show that, compared to the previous incarnation, Deep Speech has significantly closed the gap in transcription performance with human workers by leveraging more data and larger models. Further, since the approach is highly generic, we’ve shown that it can quickly be applied to new languages. Creating high-performing recognizers for two very different languages, English and Mandarin, required essentially no expert knowledge of the languages.
We believe these techniques will continue to scale, and thus conclude that the vision of a single speech system that outperforms humans in most scenarios is imminently achievable."
Much still to be done, but that is just work. Exciting times.


Low cost e-commerce not possible in Australia

I wished to try out a PAM8403 audio amplifier IC. It's a modest and very common 3 W class-D audio amp. So onto eBay and Alibaba to have a look for an inexpensive module. The picture here shows the most common form factor I see. I ordered one from Shenzhen and hope it turns up like all the other little $0.20 to $2.00 items that magically materialise, usually at 6 am, by the front door.

One of the common PAM8403 modules
This module cost AUD $0.48 to buy and get delivered, including international postage. You can do better per item if you need ten. My budget, aka wife's credit card, doesn't stretch that far.

Whilst the module may only cost a few cents to build, the total cost, module and postage, is below the rate for posting an empty letter domestically in Australia.

It is a remarkable logistics story, and not an unusual one. If you're prepared to wait the 2-8 weeks for such little things, you're better off getting them from Shanghai, Shenzhen, or Hong Kong. They almost always turn up if you stick to reputable (i.e. highly rated, thousands sold) e-commerce front ends.

China Post is the real story here. They provide the smart logistics and bulk bundling that power this machine. There is certainly a hint that someone must be getting exploited in this distribution system to make it work like this, but who knows.

You can make a clever product, completely automate the build, and have competitive input costs, but establishing distribution is always a problem. Direct is best for avoiding rent on distribution, but that is not always possible or wise, depending on the product. What is undoubtedly true, though, is that the spidery low-cost logistics of a China Post give SMEs an unprecedented ability to start up and grow, as long as you build and distribute from China or HK.



PS: It appears much of the credit, including for enabling ePacket in China/HK, goes to the Universal Postal Union, which is now part of the United Nations. International mail was originally delivered free of charge once it arrived at the destination country, but now a nett differential on bulk weight is paid, of approximately $1 per kg. The Straight Dope of 1990 vintage has a decent short explanation. It seems a clever hack for a sovereign nation to help exporters take advantage of the UPU agreements. If only Australia Post could do the same...
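To put that roughly $1 per kg in perspective, here is a back-of-envelope sketch. The 10 g packet weight is my guess for a small module in a padded envelope, not a measured figure, and I am treating the "$1" as if it were in the same currency as the AUD $0.48 purchase price, which is also an assumption.

```python
def bulk_weight_charge(weight_grams: float, rate_per_kg: float = 1.0) -> float:
    """Per-item share of a bulk per-kilogram charge (rate taken from the ~$1/kg above)."""
    if weight_grams < 0:
        raise ValueError("weight must be non-negative")
    return weight_grams / 1000.0 * rate_per_kg


ASSUMED_PACKET_GRAMS = 10.0  # hypothetical weight, not measured
TOTAL_PRICE = 0.48           # module plus postage, from the post

charge = bulk_weight_charge(ASSUMED_PACKET_GRAMS)
share = charge / TOTAL_PRICE
print(f"per-item charge: ${charge:.2f} ({share:.1%} of the total price)")
```

Under those assumptions the cross-border charge is about a cent per item, which goes some way to explaining how a 48 cent delivered price is possible at all.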