Internet Conference: How Much Do You Know About AI's "Unreliability"?

On November 17, the third World Internet Conference entered its second day. At the meeting, Sogou CEO Wang Xiaochuan discussed the current bottlenecks and prospects of the artificial intelligence technology represented by deep learning, and unveiled Sogou's self-developed real-time machine translation technology on stage for the first time.

Wang Xiaochuan said that after AlphaGo, everyone paid attention to the progress of artificial intelligence, but in fact progress in the text domain is still relatively slow. There have been some breakthroughs in machine translation, but in question answering and semantic understanding, the technology is not yet good enough.

In this regard, Wang Xiaochuan pointed out two major challenges. The first is speech recognition in multi-speaker scenarios. Sogou's technology can currently achieve 95%–97% recognition accuracy in a quiet environment, but once two people speak at the same time, the machine cannot recognize the speech. Wang Xiaochuan said this problem still has no solution in academia.

The other challenge is semantic understanding. Wang Xiaochuan said Google's earlier solution was the knowledge graph, but it has now hit a bottleneck. For example, when a machine asks a user whether a parking space is needed, the user is expected to answer yes or no; but if the answer is "I don't have a car", the machine cannot understand it. "Natural language processing can be done, but the understanding of language is still at an unreliable stage."

Sogou CEO Wang Xiaochuan discussed AI's "unreliability" and unveiled the real-time machine translation feature for the first time.

Wang Xiaochuan also addressed the reliability of driverless cars. He believes driverless cars can be used in closed scenarios, but in a truly open environment, with current technology, they are still not safe.

Although artificial intelligence technology still faces many difficulties, Wang Xiaochuan believes search and input methods remain the two areas where the technology has the most potential applications, and these are Sogou's two core businesses.

Wang Xiaochuan believes the future of search is a question-answering robot. As for input methods, he said the ultimate form of the technology should be able to go out and find information and help users think. Wang Xiaochuan later demonstrated Sogou's latest real-time machine translation feature in a live demo video.

The following is a transcript of the speech:

I really liked the talk Mr. Zhang Yiming just gave. For a company founded such a short time ago to achieve today's results resonated with me both technically and emotionally. I have listened to twelve talks today, and this last one is about my understanding of artificial intelligence.

The previous twelve talks covered both technology and products. I hope to share something different with you and offer a perspective of my own.

At the start of the day, everyone mentioned AlphaGo. As the trigger of today's artificial intelligence boom, deep learning bore the most important responsibility in it. Now that everyone has started thinking about it, some believe artificial intelligence will eventually truly replace people. I hope today's sharing will help you understand what artificial intelligence can do today, what it cannot do, and what its ultimate ideal is. After AlphaGo, we saw that the most important breakthroughs were in speech and images, but progress in the text domain has in fact been slow. There have been some breakthroughs in machine translation, but in question answering and other aspects of language understanding, it is still not enough.

So let us go back, beyond AlphaGo, to the Turing test. In the 1950s, Turing proposed the concept of a question-answering machine and a dialogue system. From the beginning until today, the intuitive feeling is the same: speech and images are progressing very fast, but natural language processing is slow.

Setting technology aside, as a product manager I would point out that artificial intelligence has three product directions, all touched on today. One is recognition: speech recognition, image recognition, and visual recognition. Another, mentioned more often, is creation: generating a text description from a picture, generating music, generating images. The third is judgment, that is, making decisions. Among these three, I have told some investors that the most important one, with the greatest business significance, is judgment. Similar concepts have come up in today's other talks.

Everyone has mentioned several levels of advancement of artificial intelligence. I want to describe it in a different way: engineers will occupy an increasingly important position in today's artificial intelligence era. Traditionally, the method was to hand rules over to the machine. With the development of statistical systems, including deep learning, we began instead to hand answers over to the machine. The supervised learning that Tang Daosheng mentioned just now is such an approach: with enough accumulated data, we can make the machine smarter.
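The contrast between handing the machine rules and handing it answers can be sketched in a few lines. This is a toy illustration of the concept only (a hypothetical message classifier, not Sogou's system): in the first approach an engineer writes the decision logic by hand; in the second, the engineer supplies labeled examples and the machine derives a simple statistic from them.

```python
from collections import Counter

# Approach 1 — hand the RULES to the machine: a human writes them directly.
def rule_based(message):
    return "spam" if "free money" in message.lower() else "ok"

# Approach 2 — hand the ANSWERS to the machine: supply labeled examples,
# and let the machine derive word statistics instead of hand-written rules.
def train(examples):
    spam_words, ok_words = Counter(), Counter()
    for text, label in examples:
        target = spam_words if label == "spam" else ok_words
        target.update(text.lower().split())

    def classify(message):
        words = message.lower().split()
        spam_score = sum(spam_words[w] for w in words)
        ok_score = sum(ok_words[w] for w in words)
        return "spam" if spam_score > ok_score else "ok"

    return classify

data = [("free money now", "spam"), ("win free cash", "spam"),
        ("meeting at noon", "ok"), ("lunch at noon today", "ok")]
classify = train(data)
print(rule_based("Free money offer"))  # the hand-written rule fires
print(classify("free cash offer"))     # learned purely from the labels
```

The learned version generalizes to "free cash offer", which the hand-written rule misses; with more data it keeps improving, which is Wang's point about data accumulation making the machine smarter.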

The most cutting-edge approach is to hand the goal over to the machine. AlphaGo incorporates several sets of algorithms, but when I talked with its engineers, I learned that reinforcement learning — handing the goal to the machine in this way — is not yet mature. In other words, without the prior 30 million game records from human play, AlphaGo would not have been capable of beating humans through reinforcement learning alone. At the technical level, I feel this is a key point that needs a breakthrough. If machines achieve a new breakthrough in self-learning, then we will be closer to a new era of artificial intelligence.

I went to London in June this year and talked with engineers at DeepMind. I was particularly curious about the fourth game, the one AlphaGo lost, so that was the first question I asked. They told me it was not a bug in the program but a deep bottleneck in deep learning. The Go match was in March and I went to London in June — three months had passed — yet, sorry, the problem had still not been solved. The good news is that in the week after I left, their program became able to handle that fourth game correctly. But when asked whether the bug had been fixed, the engineers said no: the specific position from the fourth game was now something the machine could solve, but we still do not know under what circumstances AlphaGo will go wrong again. So we know that a system built on deep learning still has its bottlenecks. That is why today I want to talk more about where the artificial intelligence technology represented by deep learning is unreliable and not yet applicable in products.

The first question concerns speech recognition. Baidu and Tencent just mentioned their speech recognition capabilities, and my presentation today also uses speech recognition — Sogou's own technology. In a quiet environment, our recognition accuracy is 95% to 97%, but it drops rapidly once there is noise. If the noise is just the noise of a car engine or of the wind, we can feed that noise into the supervised learning system as raw training data, turning it into a problem the machine has already seen. But what if two people talk at the same time? I can tell you that there is still no solution in today's academic world. Two people talking at once is a kind of "noise" the system has never seen, and we cannot do any training for it in advance.
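The trick Wang describes — feeding recorded noise into the supervised training pipeline so the model has "seen" it — is usually done by mixing noise into clean speech at a chosen signal-to-noise ratio, producing new training examples with unchanged transcripts. A minimal NumPy sketch of the mixing step (illustrative only, not Sogou's pipeline; the sinusoid and random noise stand in for real recordings):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix noise into a clean signal at a target SNR in dB.

    The noise is scaled so that clean_power / noise_power
    equals 10**(snr_db / 10); the sum is a new training example."""
    clean = clean.astype(np.float64)
    noise = noise[:len(clean)].astype(np.float64)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in for speech
engine_noise = rng.normal(size=16000)                        # stand-in for car noise
noisy = mix_at_snr(speech, engine_noise, snr_db=10)          # augmented example
```

This works precisely because the noise type is known in advance and stationary; a second, competing voice is neither, which is why the same trick fails for overlapping speakers.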

In June this year I was also asking people in academia how humans and machines differ in speech recognition. On the machine side, we can use stereo methods for direction recognition: with a matrix of microphones, we can determine that one person is speaking from one direction and remove the other person stereoscopically. But is that how a person does it? If I plug one of my ears, can I no longer tell two simultaneous speakers apart? If the voices of two people are recorded on a mono tape, can people still distinguish them? What do you think? People can — so human methods are not the same as the machine's. I asked some PhDs how people manage it. Is it because the two voices sound different, because one is loud and one is quiet, or because they speak different languages? The doctor laughed and said that when two people talk at the same time, as long as a person can find any difference between them, he can pick out one of them. So human processing is still very different from machine processing — I will not elaborate further here. We believe that speech recognition, already the most mature field of machine intelligence, is still quite different from human ability.
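The "matrix of microphones" approach separates speakers by direction: sound from a given direction reaches each microphone with a known relative delay, so aligning the channels and summing reinforces that direction while attenuating others (delay-and-sum beamforming). A toy two-microphone sketch with integer sample delays (an idealized illustration with pure tones, not a production beamformer):

```python
import numpy as np

def delay_and_sum(mic1, mic2, delay):
    """Steer a two-mic array toward a source that reaches mic2
    `delay` samples after mic1: undo the delay, then average.
    The steered source adds coherently; other directions do not."""
    aligned = np.roll(mic2, -delay)
    return (mic1 + aligned) / 2

n, fs = 8000, 8000
t = np.arange(n)
target = np.sin(2 * np.pi * 200 * t / fs)      # the speaker we want
interferer = np.sin(2 * np.pi * 315 * t / fs)  # the competing speaker

# Geometry: target arrives 5 samples later at mic2, interferer 20 samples later.
mic1 = target + interferer
mic2 = np.roll(target, 5) + np.roll(interferer, 20)

out = delay_and_sum(mic1, mic2, delay=5)
# The target is preserved exactly; the interferer is only partially cancelled.
```

With one ear plugged a listener has no such spatial cue, yet people still separate voices — which is Wang's point that humans must be exploiting differences (timbre, pitch, language) that this directional method does not use.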

The other question: is semantics reliable — that is, the understanding of language? Google used to tackle this with the knowledge graph, and it has now hit a bottleneck. In June this year, in one of their labs, I saw their most advanced human-machine dialogue system, which can help you order food and book a hotel. During the dialogue the machine's performance was very impressive, so we gave it a try. At one point the machine asks: do you need a parking space? If we answer yes or no, there is no problem. But we said, "I have no car." Do you know what the machine did? It did not understand at all that having no car means I do not need a parking space. This is because today's machines are not good enough at understanding natural language concepts. So natural language processing can be done, but the understanding of language is still at an unreliable stage.
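The failure Wang describes is easy to reproduce with the surface-level matching that dialogue systems of that era leaned on. A toy sketch (hypothetical, not Google's actual system): the matcher handles literal yes/no answers but cannot infer that "I don't have a car" entails "no parking space needed", because that step requires world knowledge, not pattern matching.

```python
def parking_intent(answer):
    """Naive surface matcher for the question 'Do you need a parking space?'"""
    text = answer.lower()
    if any(w in text for w in ("yes", "yeah", "sure", "please")):
        return "needs_parking"
    if text.startswith("no") or "no," in text:
        return "no_parking"
    # Anything requiring real-world inference falls through unhandled.
    return "unknown"

print(parking_intent("Yes, please"))         # needs_parking
print(parking_intent("No thanks"))           # no_parking
print(parking_intent("I don't have a car"))  # unknown — entailment is out of reach
```

Mapping "no car" to "no parking needed" requires knowing that parking spaces are for cars — exactly the kind of conceptual knowledge Wang says today's machines lack.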

This year Google also released an engine that can analyze sentences in natural language — subject, predicate, and object can all be extracted — but the accuracy is only around 90%, and it will not go higher. Why? Because at that point, relying on statistics and grammar alone is not enough; you need to understand the specific concepts in the sentence to eliminate ambiguity. We know, for example, that it is impossible to put a road on a refrigerator. That is a very simple thing for us, but a huge challenge for a computer. So here, the artificial intelligence represented by deep learning is still not enough.
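A parser of the kind described labels each word with a grammatical role, and the subject, predicate, and object can then be read off the labels. A toy sketch that operates on already-annotated tokens (real parsers must predict these labels statistically, which is exactly where the ~90% ceiling Wang mentions comes from; the label names follow the common nsubj/root/dobj convention):

```python
def extract_svo(tokens):
    """tokens: list of (word, dependency_label) pairs, as a parser
    would produce. Returns (subject, predicate, object) or None."""
    subj = pred = obj = None
    for word, dep in tokens:
        if dep == "nsubj":
            subj = word
        elif dep == "root":
            pred = word
        elif dep == "dobj":
            obj = word
    return (subj, pred, obj) if subj and pred and obj else None

# "The chef put the dish on the table" — structurally unambiguous.
parsed = [("The", "det"), ("chef", "nsubj"), ("put", "root"),
          ("the", "det"), ("dish", "dobj"), ("on", "prep"),
          ("the", "det"), ("table", "pobj")]
print(extract_svo(parsed))  # ('chef', 'put', 'dish')
```

Reading off the labels is trivial; predicting them is not. Attachment decisions like whether "on the table" modifies "put" or "dish" sometimes hinge on world knowledge — the refrigerator example — which statistics over grammar alone cannot supply.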

A very sensitive question: is driverless driving reliable? Today Baidu also announced the release of a driverless car on site. But from my understanding, with today's technology, driverless cars are truly usable only in the scenarios we have already seen — closed scenarios. In a truly open environment — not just a highway or the Fifth Ring Road — sorry, today's technology is not safe. As long as the system encounters a scene it has never seen, it may make a serious mistake, just as AlphaGo can suddenly go wrong in a game. So we can say that assisted driving is feasible, but until there is a new technological breakthrough, I do not think fully unmanned driving can be done.

So the weaknesses of deep learning are now raised much more often: it is opaque, so its reliability is limited; it lacks reasoning ability; and above all it lacks an understanding of symbols. Without understanding symbols, natural language understanding becomes a bottleneck. Even so, as mentioned, it can replace people in some fields — chess players, doctors, drivers — where machines can do a good job. But for creative work, planning, and research into things never seen before, it is actually very difficult. The machines we see in the media today that automatically write articles and draw pictures are, I think, more demonstrations at the scientific level; they are not at the stage of genuinely replacing people.

So here I have first lowered everyone's expectations for artificial intelligence. Some people ask whether there will be a third trough. In the first two waves, everyone thought artificial intelligence had arrived; this time it may fare better. Before the previous two ebbs, if you asked a professor whether he was studying artificial intelligence, he would deny it, because artificial intelligence was considered unreliable. This time is the first time it has truly come into use: it really does better than humans in language processing, sound processing, image processing, and some high-dimensional data spaces. The difference this time is that large amounts of capital are being invested in artificial intelligence, and large numbers of researchers go into the field after graduation. This is different from before, and so we are beginning to look forward to new breakthroughs.

Personally, I am optimistic about this wave, but I am also very nervous; perhaps our own search engine will be among the things it disrupts.

From here I want to start thinking about where the future lies. In my own words, the future of search is the crown of the artificial intelligence era. Why say that? What is the future of search, what is the future of artificial intelligence, and why the crown? In short, I think the future of search is a question-answering robot. We are used to one thing: when we search, we enter keywords and get ten results, ten links. But is that really the best way? Certainly not. People also ask whether personalization can make search results more accurate, but in fact personalization provides very limited information. The way to really make the system useful is through questions. The reason we did not ask questions before is that the machine could not understand what you were saying. Once it can handle questions, going from giving you ten links to giving you one answer is much better.

Imagine asking the machine just the words "Wuzhen conference". That query cannot give you the content you want; at most it returns news, Wuzhen's encyclopedia entry, or the official website. But if you ask "On which day does the Wuzhen conference open?", the machine has a chance to give you a much better answer. So I believe that as the technology breaks through, search engines will naturally turn into question-answering engines.
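The contrast between keyword retrieval (ten links) and question answering (one answer) can be sketched in a few lines. This is a toy illustration, not Sogou's engine; the documents and the fact table are stand-ins (the opening date comes from the article itself, which dates day two to November 17):

```python
documents = {
    "wuzhen-news": "News coverage of the Wuzhen conference.",
    "wuzhen-wiki": "Encyclopedia entry about Wuzhen.",
    "wuzhen-site": "Official website of the Wuzhen conference.",
}

# A tiny knowledge-graph-style fact table: (entity, attribute) -> value.
facts = {("wuzhen conference", "opening date"): "November 16"}

def keyword_search(query):
    """Classic retrieval: return links to documents sharing a keyword."""
    words = set(query.lower().split())
    return [doc_id for doc_id, text in documents.items()
            if words & set(text.lower().replace(".", "").split())]

def answer(entity, attribute):
    """Question answering: return the fact itself, not links."""
    return facts.get((entity.lower(), attribute.lower()))

print(keyword_search("Wuzhen conference"))  # links the user still has to read
print(answer("Wuzhen Conference", "opening date"))  # a direct answer
```

The hard part, of course, is not the lookup but turning "On which day does the Wuzhen conference open?" into the (entity, attribute) pair — which is precisely the language-understanding bottleneck discussed above.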

Many companies are on this road — Apple, Microsoft, Amazon, Google — and the dialogue system everyone knew first was Apple's Siri. That system has not been successful: very few people in China know what it is like, and the same now holds for its English use. Why did it fail? The simple reason is that the technology had not yet arrived; our ability to process and understand natural language is still very limited. I wondered why Apple, a company with such pursuit of perfection, would release this system. One possibility is that Apple did not understand the technology well enough. Another possibility, I think, is that it was a wish of Steve Jobs. We know that when the iPhone 4S was released, he was already lying in a hospital bed watching the launch, and he passed away soon after it ended. So Siri was like a premature baby released with the iPhone 4S, and I think such a system represents Jobs's ultimate vision of human-computer interaction.

In fact, question-answering machines appear in a great many literary works, films, and works of science fiction — "Star Wars", "Big Hero 6", and "Interstellar" all feature them. Asimov, one of the greatest sci-fi writers, has a short story called "The Last Question". In it, humanity builds a machine that harnesses the energy of the entire planet and, eventually, the entire universe. The machine cannot answer the ultimate question — whether the universe's end can be reversed — but it can answer every other question. So literary works actually reflect our longing for a question-answering machine.

Beyond question answering in search engines, we know that Sogou's input method has 300 million users in China. What is the future of the input method? It, too, is related to automatic question answering. Let me show everyone a video. (video plays)

When we discuss the input method, many friends tell me voice is the most important thing. Sogou has complete speech recognition and speech synthesis technology, but in my mind this is far from the ultimate form of the input method. Its true ultimate form is to be able to go out and find information and help you think. What was just demonstrated is a sharing capability; the real question-answering capability will be shown in a new demonstration shortly. (video plays)

The input method may be the best entry scenario when we discuss question-answering techniques and human-machine dialogue. Baidu's Duer and Google's Assistant are standalone engines, but the input method sits in the middle of communication between people — closer to the concept Yu Chengdong mentioned — which makes it easier to build up a model of how people think. It starts as a pinyin tool and grows into a system of dialogue and question answering.

At Sogou we have two core products: one is the input method, the other is search. They correspond to expressing information and obtaining information — making expression and acquisition easier. As AI technology develops, we will really be able to better liberate people's thinking. So we have one concept and two things to do. One is natural interaction — not just voice, but language. The other is knowledge computation, enabling the machine to gradually build the ability to reason. Sogou's input method has the largest accumulation of language data, so we have the best chance of making breakthroughs in this field.
