Combining software and hardware, Baidu steps onto the AI voice track

On February 16th, two pieces of news about Baidu were very interesting. The first was the acquisition of the Raven team. The second was the upgrade of the Duer team to a full Duer business unit. Both will report to Lu Qi. This strategic choice to combine software and hardware is exactly right.

This is not hindsight. In October 2016 I wrote an article titled "I am very bullish, but Li Yanhong may miss out on artificial intelligence." One of its core ideas was that Baidu invests more in artificial intelligence than any other company, but its strategy of staying away from hardware is likely to make it miss the most critical part of AI in the end. The general direction is now correct, but an Internet company that really wants to learn from Amazon still falls a little short.

What is the core problem for AI on the voice track?

There is no shortage of Echo-like products in China, but the earlier ones all failed. The key reason is that nobody was clear about what core problem such products must solve. Free-form conversation, wake words, response speed, dialects, content, and user habits can all look like the core problem. In fact, every player on this track faces the same core problem: while the hard constraints of semantics and dialects remain unbroken, build a product that delivers on speed, accuracy, and content.

In other words, intelligent voice is not a problem of broad adaptation. It is about working hard on the technology for mainstream users and getting the experience right. Voice interaction leaves very little room for error: users can judge the experience within two sentences. The biggest concern is semantics. Because people always yearn for free-form communication, semantics is certainly a bottleneck for interaction. Based on how far semantic technology has progressed, however, we can divide voice interaction into two stages:

The first stage is when there is no core breakthrough in semantics, which means the best achievable score is around 75 out of 100. At this stage, the basic premise of building a product is not to expect free-form conversation. Instead, voice interaction should be command-based while still delivering an excellent user experience. Echo is largely this kind of product. At this stage, the core bottleneck is front-end acoustics (microphone array + acoustic algorithms). This is the problem at hand, and until it is solved, the product experience cannot be good. Based on this judgment, I began a serious search for a company in this field at the end of 2015, and that search is the core reason I invested in SoundAI in 2016. The logic at the time was simple. This problem is clearly best handled by people from the Institute of Acoustics of the Chinese Academy of Sciences, and SoundAI was almost the only AI company founded by people from that institute.

The second stage is when semantics truly achieves a breakthrough, which will undoubtedly expand the range of applications for voice interaction. Once that barrier falls, voice interaction will become ubiquitous. We must admit, however, that nobody knows when semantics will be fully solved. That is a matter for the future.

Confusing these two stages on the timeline is dangerous. It pushes existing products to attack problems that cannot yet be solved, such as doing away with wake words or expecting an Echo-like product to do everything.

To sum up: if you really want to build an Echo-like product, the stack has three layers: acoustics (microphone array + algorithms), speech recognition, and semantics. The current bottleneck is acoustics, and the future bottleneck is semantics. The former decides whether the product sells at all. The latter decides how broad its range of applications can become. Even with only the former solved, this is a new category big enough to rival the tablet.

Falling a little short

The key elements of the software-hardware integration path are good product definition, sufficiently mature enabling technologies, and the ability to integrate the product as a system (which in practice also requires smooth sales channels). System integration requires one company to bring its software and hardware teams together. Otherwise, conflicting interests make it hard for the two teams to cooperate. In the early stage of an industry, interfaces between layers are poorly standardized, and this easily leads to product failure. Product definition depends heavily on individuals. It rests on the abilities of the people involved and also requires a bit of luck for the company.

The one element still missing is sufficiently mature technology. This is a very troublesome gap, and it is especially hard for Internet companies to close. What this road lacks is not computer algorithms but physics, and physics is a big blind spot for Internet companies.

From this perspective, you can understand the difference between Apple and Google. Apple pays close attention to the physical side: materials, batteries, screens, sensors, and so on. Google clearly pays more attention to algorithms. That is why Apple's cloud technology has always been weak, while Google's products have always fallen a bit short. Jobs was so eye-catching that everyone saw his paranoia, his madness, and even his artistic temperament. In fact, a group of people behind him solved the physical problems for him, and Tim Cook, today's CEO, was one of them.

This is not just Baidu's problem. It is a systemic challenge for Internet companies in the new era. As I mentioned in a previous article, the rise of an industry actually goes through three major stages:

The first stage is the maturity of enabling technology. The analogy is Qualcomm and MediaTek (MTK) in mobile phones. SoundAI plays a similar role in the voice industry chain.

The second stage is the maturity of hardware products. The analogy is the Apple II or the first-generation iPhone, and in the voice industry chain it is Echo. Note that Echo is a starting point, not an end point. It is a product that has only just crossed the usability threshold.

The third stage is the maturity of software applications on the new hardware platform, such as Office on the PC and WeChat on mobile phones. This stage has not yet begun in the voice industry chain.

The Internet has compressed the transition between the first and second stages to the point that they effectively happen at the same time, which is why software and hardware must converge. But Internet companies are weak at solving the first-stage problem on their own, and that is where they fall a little short.

The war starts today

Perhaps to boost its stock price, Baidu announced the move in a very high-profile way. This is also interesting, because it is likely to trigger a chain reaction. Until now, everyone has basically been waiting and watching. Once one player really places a bet, the others will start to think systematically about what that move means for them.

Once they look at it closely, Tencent and Alibaba will realize that this is a war they cannot afford to lose. I made this point in a previous article, but now that Baidu has acted, it is worth repeating:

Let us first make a basic assumption: Alexa has achieved great success. (Echo, smart speakers, and Alexa are related but actually different things; I covered this in the previous article and will not repeat it here.) It has spread into every device and surrounds people's lives, people spend 50% of their time interacting with devices by voice, and Alexa is approaching the position of Android.

At that point, a request like this will come up. The user says: "Alexa, tell Ma Huateng I can't see him tomorrow." Fulfilling it requires an instant messaging (IM) service, and Amazon has two choices. The first is to plug in an existing IM such as WhatsApp or Skype. The second is to build its own inside Alexa. There is at least a 50% chance that Amazon chooses the latter rather than opening up this infrastructure to others, because WhatsApp belongs to Facebook and Skype to Microsoft. If Alibaba built an Alexa in China, it would almost certainly not choose to connect to WeChat.

This is how a core feature of voice interaction produces something disruptive: behind the voice interface, each category is served by a single application.

It is hard to imagine voice interaction like this: "Alexa, use WhatsApp to send Ma Huateng a message that I won't see him tomorrow." In voice interaction, the identity of applications like WhatsApp is likely to be optimized away. If Alexa stays an application with millions or tens of millions of DAU, this feature is not critical. If it becomes a system with 1 billion DAU, however, its impact will be enormously magnified: search, IM, and e-commerce are each likely to have only one player, rather than the No. 1, No. 2, and No. 3 we see today.


Almost no one today doubts that voice interaction will take hold. Interestingly, this shift in opinion happened in less than half a year. Looking ahead, much of the excitement of 2017 is likely to happen here, and that is only natural. It will draw in AI companies (acoustics, speech, and semantics companies), and it will revive smart hardware companies. Smart hardware had actually fallen out of favor, but Echo is clearly not a smart hardware track; it is an AI track. So if you want in, move as soon as possible. If you have no plans to join this industry, pull up a bench and enjoy the show.
