When will Quora have a voice interface

Brian Roemmele: What is the significance of Siri?

The time has come to talk about the role Siri plays in modern technology, or may play in the future under certain circumstances. Brian Roemmele devoted an answer on Quora to the question "Why is Siri Important?". The answer to this simple question is quite extensive, so the reader is advised to stock up on drinks and chips and be patient. Such long texts are difficult to read on an empty stomach. So what is it about Siri that lets us talk about importance, not just usefulness and attractiveness? The review is full of forecasts, assumptions, and conjectures, and it is not always easy to say what such conclusions are based on.

It's not just a voice recognition system

It's tempting to think of Siri as just another speech recognition application, but that would be a mistake. Siri is much bigger than simple speech technology. It is more than a dynamic artificial intelligence infrastructure, and more than a self-learning, context-aware system. Siri is all of the above and a little more: something that gives the user a sense of real interaction with a virtual assistant. This can be summed up briefly: "Two or more components working together, making it possible to get a result that is unattainable if each of the components is used separately". None of the components is an absolute innovation, but the combination implemented in Siri opens up previously unheard-of possibilities.

For a long time, the dream of computer science researchers was to develop a device talkative and sensible enough to hold a dialogue with a person. Brian Roemmele speaks of his extensive experience with speech recognition systems, which was rather amusing but did not lead to the task being solved. Until now there has been no such combination of technologies working together. Siri is a by-product of a whole complex of developments over the past few years.

Apple has in effect brought to life forty years of work by scientists

Siri embodies the result of four decades of research initiated by DARPA through the Artificial Intelligence Center of SRI International (SRI). Siri Inc. was created in the depths of SRI International as part of the PAL ("Personalized Assistant that Learns") program and the CALO ("Cognitive Assistant that Learns and Organizes") project. Details of Siri's history can be found in the 9to5Mac article "Sensational Interview with the Creator of Artificial Intelligence iPhone 5" of October 4, 2011, which our readers are welcome to read.

The projects were led by groups of scientists from Carnegie Mellon University, the University of Massachusetts, the University of Rochester, the Institute for Human and Machine Cognition, Oregon State University, the University of Southern California, and Stanford University.

Technology is waiting for its time

Earlier forms of speech recognition and artificial intelligence systems went through several turning points. These turning points mainly occurred when new computing capabilities and new models of how people work with technology emerged. Moore's Law, the Internet, and Apple offered the practical opportunity, and forty years of university research supplied the rest. So the right device finally met the right speech technology.

There are three main characteristics of Siri technology:
- Voice interface
- Ability to act depending on the situation and the context
- Access to services

The fourth generation of computer interfaces

Don't forget that the current Siri is version 1.0, so it is only fair to compare it with version 1.0 of other products. Siri marks the beginning of the fourth and perhaps most important way of interacting with devices. It is too early to write off mechanical user interfaces (keyboard, mouse, and touchscreen) as yesterday's interfaces. They will coexist with the voice interface for a long time. Brian Roemmele had previously predicted the emergence of a new series of gestures for touchscreens and holographic displays, based on Apple patents. He devoted the article "How will Apple's new 3D display technology and 3D hand movements work?" to this topic.

The person asks questions and the device answers. This is the most effective way to interact with a machine, if only because that is how people communicate with each other. The main problem in human interaction with computers was, and still is, the need to reformulate a simple question so that the computer can understand it. In the short term, one should not expect the computer to give the correct answer immediately ("correct" meaning what the user actually needs). Still, the idea of asking a device a question the way one asks friends or library staff, that is, in natural language, is appealing.

Sophisticated baby

Currently, the iPhone screen is small. Even the screen of the iPhone 5, known so far only from rumors, is still very limited in size. On such a small device the voice interface is very convenient, as a miniature on-screen keyboard cannot compare to a full computer keyboard. Siri is not only, and not so much, an interface to search engines. It is used to get all kinds of information "on the fly" that a user may need at any time. Given the limitations of the mobile form factor, the capabilities of the pseudo-intelligent assistant look even more attractive.

A small screen and a relatively slow Internet connection often do not allow the same question to be phrased and rephrased. This is where ordinary speech, understood by Siri, comes to the rescue. This interface offers a number of advantages. The user just has to ask the right questions and Siri will give fairly detailed answers. This approach speeds up getting to the result, because the user does not have to set up intermediate tasks whose outcome may not be what was expected.

On the move, you simply do not have time to navigate linked pages and switch between different applications to get a simple answer to a simple question. A single spoken question can sometimes replace two dozen preliminary steps. And therein lies the power of Siri.

Siri's purpose is to complete the task assigned to it

With conventional input systems (mechanical user interfaces) it is difficult to keep track of all intermediate tasks. To get an answer to the question, you need to go through it step by step.

Each step has to be taken by the user, as there is no other option. With the help of Siri, one can get rid of many of these manual actions and reduce them to a single question. Siri's actions are described by three basic conceptual models.

It does things for you, bringing the task closer to completion:
- Searches vertically and horizontally using multiple criteria
- Combines information from different sources "on the fly"
- Processes information in real time based on dynamic criteria
- Carries the task through to an endpoint (for example, up to buying a ticket)

It recognizes the user's intentions from their words, taking into account:
- Geographical context
- Temporal context
- Task context
- Dialogue context
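
As an illustration only, the four kinds of context can be pictured as extra inputs that fill in whatever the spoken question leaves unsaid. The sketch below is a toy under stated assumptions, not a description of how Siri works: the `Context` class, the `resolve` function, and the keyword matching are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Context:
    """Hypothetical container for the four kinds of context."""
    location: str               # geographical context
    now: datetime               # temporal context
    task: Optional[str]         # task context (what the user is in the middle of)
    last_topic: Optional[str]   # dialogue context (what was just discussed)


def resolve(utterance: str, ctx: Context) -> dict:
    """Fill in what the utterance leaves unsaid, using the context."""
    text = utterance.lower().rstrip("?.!")

    # Dialogue context: a follow-up such as "And tomorrow?" inherits the topic.
    topic = "weather" if "weather" in text else ctx.last_topic

    # Temporal context: relative words are resolved against the current date.
    day = ctx.now.date()
    if "tomorrow" in text:
        day = day + timedelta(days=1)

    return {
        "topic": topic,
        "where": ctx.location,   # geographical context: no place named
        "when": day.isoformat(),
        "task": ctx.task,        # task context is passed along unchanged
    }
```

With a context of "Cupertino" on October 4, 2011, the question "What's the weather like?" resolves to a weather request for that place and date; if the topic is then stored in `last_topic`, the follow-up "And tomorrow?" resolves to the same topic for the next day without the user repeating anything.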

It gets to know the user as a person, learning about them and taking that into account in its work, in particular:
- Who are the user's friends?
- Where does the user live?
- How old is the user?
- What does the user like?

Imperceptibly to the user, Siri does the hard work needed to get an acceptable result. This work includes:
- Awareness of the location
- Awareness of the time
- Awareness of the task
- Semantic data
- Cloud application programming interface
- Models of tasks and domains
- Voice interface
- Clarification of the nature of the problem
- Conversion of speech to text
- Conversion of text to speech
- Dialogue process
- Access to personal information and demographic information
- Social graph
- Social data
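
To make the list more concrete, here is a minimal sketch of how a few of these pieces might be chained: speech to text, a domain model, a service call, a clarification step, and text to speech. Everything in it is an assumption made for illustration: the function names, the keyword table, and the canned service replies are hypothetical stand-ins, and real speech processing is replaced by simple byte/text conversion.

```python
from typing import Callable, Dict, Optional

# Hypothetical domain model: keywords that point to a task domain.
DOMAIN_KEYWORDS: Dict[str, str] = {
    "weather": "weather",
    "forecast": "weather",
    "movie": "movies",
    "tickets": "movies",
}

# Hypothetical stand-ins for cloud services, one per domain.
SERVICES: Dict[str, Callable[[str], str]] = {
    "weather": lambda query: "Sunny, 22 degrees.",
    "movies": lambda query: "Two showings found tonight.",
}


def speech_to_text(audio: bytes) -> str:
    # Stand-in: pretend the audio is already UTF-8 text.
    return audio.decode("utf-8")


def text_to_speech(text: str) -> bytes:
    # Stand-in: pretend the encoded text is synthesized audio.
    return text.encode("utf-8")


def classify(text: str) -> Optional[str]:
    """Return the first domain whose keyword appears in the text."""
    for word in text.lower().split():
        domain = DOMAIN_KEYWORDS.get(word.strip("?.!,"))
        if domain is not None:
            return domain
    return None


def assistant(audio: bytes) -> bytes:
    """Run one question through the whole chain."""
    text = speech_to_text(audio)
    domain = classify(text)
    if domain is None:
        reply = "Could you rephrase that?"   # clarification step
    else:
        reply = SERVICES[domain](text)       # delegate to a service
    return text_to_speech(reply)
```

The point of the sketch is the shape of the chain rather than any single step: the user supplies one question, and the intermediate steps stay hidden.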

Of course, the dual-core A5 processor also affects the end result, but the emphasis is on cloud computing, combined with preliminary conversion of speech to text on the client side.

Practical use

Siri was demonstrated on October 4, 2011. Getting started is very simple: just press the appropriate button and ask a question. This type of interaction is called "press to ask". There is also an option that uses the accelerometer; this feature is known as "lift to ask". Siri could also stay in listening mode for a long time when the user needs to work closely with their virtual secretary. In this case, no manual activation is required. This possibility is still purely theoretical and will most likely only appear in later versions, once a number of noise reduction algorithms and more elegant methods of implementing active listening have been developed. Siri is also optimized for use with Bluetooth 4 headsets, which improves its ability to pick out the task from a continuous flow of speech. In the future, Siri will be constantly active and will offer its comments and answers "within reasonable limits", even if there is no direct question. This will bring interaction with the device closer to ordinary communication with friends.

New infrastructure: the interface for access to the "cloud" database

As people become aware of how Siri is used, an understanding will grow that an entire layer of very popular applications (and the business models built around them) is redundant, or at least less useful than is currently believed. Under the new model, an API for passing requests to Siri and returning results from Siri will be enough. It is possible that over time an ecosystem will form around Siri and its cloud infrastructure that third-party developers can also join. Brian Roemmele immediately warns the reader that he by no means foresees the disappearance of applications. He believes that in time we will see them adapt to the new ecosystem that will form around Siri. The process of developing and adapting business models to this new trend will be extremely interesting. It is possible that the opportunities offered by Siri's cloud-based programming interface will be comparable to those created by the iTunes App Store. This refers to the benefits for both end users and developers.

Siri will form an ecosystem around the cloud-based programming interface of the remote buffer. In its simplest form, this interface will evaluate the relevance of the data on the Internet that is available to Siri. The concept of "ontology as specification" was formulated by Tom Gruber, co-founder and head of Siri's technical department, who now continues to develop his creation at Apple. He arrived at it as a way to access data on the Internet and extract something useful from it. With the help of a special programming interface and a persistent programming interface, an ecosystem can be formed that makes it quicker and easier to obtain relevant data.

It is important to know that the programming interface Tom Gruber is talking about is a cloud-based programming interface for the remote buffer. It is accessed only by the Siri engine, through the appropriate requests. The user will not contact the programming interface directly, as is now the case; Siri does this for them. Brian Roemmele doubts that Apple will open the programming interface to third-party developers and is fairly sure that Cupertino will retain the right to control the API and the data sources directly.

Difficult times await developers, who will have to grapple with the peculiarities of the new reality of the Semantic Web. Brian Roemmele covers this topic in more detail in the article "What Should Application Developers Need to Know About Siri in order to Interact With It?", published on the same site.

Apple Garden behind a high fence

Apple has always been an example of a "walled garden". In the period before Steve Jobs's return, this concept nearly bankrupted the company. At the same time, it is the "high fence" that underlies Apple's current success. This approach evokes a complex range of emotions: frustration, a sense of uniqueness, and also well-founded fear. The story of Apple, from the first Apple II to the iTunes Store, stretches before us as a story of puzzles. Brian Roemmele describes Apple's concept as follows:

Apple wants to own the garden and at the same time invites everyone to play in it. The walls of the garden have brought Apple unimagined wealth. Some do not like the garden path Apple has chosen, but few of the thousands of businesses out there can match its success.

The wall of secrecy erected around Apple's garden is the last bastion Apple can raise against competitors and the smartphones they offer. But secrecy alone is not enough. To compete with other companies, Apple needs technology that matches or surpasses that of its competitors. And Apple acquired such technology in the form of Siri. Overnight, Apple became the owner of Siri, the fruit of forty years of research. Of course, Google is able to compete. The search giant currently has an excellent voice recognition system, and its voice search is an excellent example of that system in practice. It still lacks Siri's scope, but Brian Roemmele believes that this state of affairs will soon change: similar offerings will appear for the Android platform, and Google will turn to the problem of working with the Semantic Web.

Here it is important to emphasize that Apple has a patent application that describes how to connect APIs to Siri and that could limit competitors in the scope of their potential response. Brian Roemmele addressed this question earlier in the article "Does Apple Have Patents That May Show the Future of Siri?", published on Quora.

At the beginning of the journey

It is obvious that this new way of human interaction with technology will continue to evolve. As already mentioned, Brian Roemmele is not inclined to believe that existing interfaces and means of accessing information will become a thing of the past. Accessing data through applications and traditional web surfing is not going away. At the same time, there is no doubt that Siri will have a powerful impact on how users interact with their devices and how devices respond to users' questions.

Although the new technology first appeared in the iPhone 4S, it will soon appear on the iPad 3 and even Apple TV. At least that is what Brian Roemmele thinks. The combination of Siri with Bluetooth 4 and Bluetooth Low Energy (BLE) also has great prospects. BLE devices, such as a door lock, could be controlled using Siri. The user could tell their electronic assistant: "Siri, close the front door" or "Open the front door when Sarah comes." Brian Roemmele devoted a separate article to this topic, "What are the Expected Consequences of Integrating Bluetooth 4.0 in the iPhone 4S?", published on the same site.
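
The two door commands quoted above hint at what such a request has to be turned into: an action, a target device, and, optionally, a condition. The following is only a rough sketch under stated assumptions, not Siri's grammar and not a real Bluetooth API: the `COMMAND` pattern and the `parse_command` function are hypothetical, cover just "open" and "close", and stop before any device is actually contacted.

```python
import re
from typing import Optional

# Hypothetical pattern: optional "Siri," prefix, an action, a device,
# and an optional "when ..." condition.
COMMAND = re.compile(
    r"^(?:siri,\s*)?(?P<action>open|close)\s+the\s+(?P<device>.+?)"
    r"(?:\s+when\s+(?P<trigger>.+))?$",
    re.IGNORECASE,
)


def parse_command(utterance: str) -> Optional[dict]:
    """Turn a spoken command into a structured request, or None if unknown."""
    match = COMMAND.match(utterance.strip().rstrip(".!"))
    if match is None:
        return None
    return {
        "action": match["action"].lower(),
        "device": match["device"].lower(),
        "trigger": match["trigger"],  # None when the command is unconditional
    }
```

In this sketch, "Siri, close the front door" yields an immediate request, while "Open the front door when Sarah comes." yields the same device with a trigger that some other component would have to watch for.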

All of this is wonderful, but it remains to be seen how well the theoretical research will work in practice. There is no doubt that the beginning of the path will not be strewn with roses. Only in five years will it be possible to draw conclusions about the true extent of the voice interface's influence. It cannot be completely ruled out that all of this will remain an amusing trick. Or maybe this way of interacting with electronics will become the most popular among users? Only the future can answer this question, and there is not, and cannot be, any certainty about how events will develop.

Source: Quora.com, Tomgruber.org