From dictating addresses to Waze to asking personal assistants such as Siri to do tasks for you, more and more products and services are migrating to the cloud and being controlled by voice. Hila Yonatan discusses this new trend from a UX perspective: what does it say about the next product or feature we’re going to work with?
When Apple first launched Siri, they had probably planned on changing the world, but they might not have anticipated the extent of the shift in our perspective towards technological devices. It’s true that much of the hype surrounding Siri is a direct result of the fact that it’s an Apple flagship product (if you can call Siri “a product”), but even a harsh critic would have a tough time overlooking the importance of the new and shiny personal assistant on which Jobs had bet all his chips. Countless people uploaded videos of themselves interacting with Siri. BuzzFeed lists of the “20 funniest Siri reactions” were published, and a dramatized version of the topic was even immortalized in the movie “Her”, starring Scarlett Johansson as Siri-not-Siri.
From there, voice user interfaces were pushed to center stage. What is it about this technology, which allows us to talk to our home appliances, that changes the world in front of our eyes? How does it affect the products we’ll use in the future? And from a professional standpoint: what does it say about upcoming interfaces, and to what extent will the people who design them have to adjust their way of thinking?
How do we actually use this thing?
As we learn more about automation and the devices around us get smarter, we see a larger array of voice user interfaces in various products. When I first learned about the ability to dictate where “I’m going to” on Waze or Google Maps, or to set a reminder without typing, I assumed we’d see an increasing number of such activities in apps.
A few years back we talked about IoT (the “Internet of Things”, a concept which holds that every device is a network-based entity) as a sci-fi idea; nowadays we see that it’s a factual reality. This trend came to pass with the help of two developments that grew towards each other.
The first is the fact that there’s “an app for everything”. Most electric appliances, in one way or another, are controlled with an app. It starts with internet-connected light bulbs, continues through printers and smart TV sets, and goes all the way to locking your front door by tapping a large, clear button in a dedicated app.
The second is smart assistants. A few years ago this was a science fiction scenario, but today we can see that everyone’s on board. It started with Siri and made its way to the Amazon Echo devices featuring Alexa; Google joined the bandwagon with its Assistant (I wonder why they decided not to give their persona a human name), and even Microsoft pulled Cortana from the videogame world into reality.
All of these are connected to your accounts and are able to remind you of an important meeting or set a timer for boiling an egg, but the potential of these applications goes far beyond helping a user do or remember stuff!
Recently, we’ve seen those two vectors merge into one product. So, if you could hire a real personal assistant who could handle many of your activities (aside from non-defined activities, such as “pick up from the floor” or “pack a bag”), what would you ask them to do, and when?
Managing your day from the cloud
As far as daily routine goes, connecting my phone to my wireless car speaker allows me to say “Hey Google, good morning!”. That phrase makes sure my curtains at home are shut, the air conditioner is off and the lights are dimmed, updates me on upcoming traffic on my route, reads out loud items from my daily agenda, and maybe even tops it all off by launching my “driving” Spotify playlist. By saying exactly four words, I get a complete suite of actions, with a large portion of delight.
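To make the idea of “one phrase, many actions” concrete, here is a minimal sketch in Python. It is not how Google Assistant is implemented; the `Routine` and `Assistant` classes, the `normalize` helper and the action strings are all hypothetical names invented for illustration. It only shows the shape of the mapping: a trigger phrase is matched after the wake words, and each registered action runs in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Routine:
    """A spoken trigger phrase mapped to an ordered list of actions (hypothetical model)."""
    trigger: str
    actions: List[Callable[[], str]] = field(default_factory=list)

    def run(self) -> List[str]:
        # Execute every action in order and collect what each one reports.
        return [action() for action in self.actions]


def normalize(utterance: str) -> str:
    # Lowercase and drop punctuation so "Good morning!" matches "good morning".
    kept = (ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join("".join(kept).split())


class Assistant:
    WAKE_WORDS = "hey google"  # assumption: a single fixed wake phrase

    def __init__(self) -> None:
        self.routines: Dict[str, Routine] = {}

    def register(self, routine: Routine) -> None:
        self.routines[normalize(routine.trigger)] = routine

    def handle(self, utterance: str) -> List[str]:
        text = normalize(utterance)
        if not text.startswith(self.WAKE_WORDS):
            return []  # not addressed to the assistant, so nothing happens
        command = text[len(self.WAKE_WORDS):].strip()
        routine = self.routines.get(command)
        return routine.run() if routine else ["Sorry, I don't know that routine."]


# The actions below are stand-ins; a real setup would call device APIs instead.
good_morning = Routine(
    trigger="good morning",
    actions=[
        lambda: "curtains: closed",
        lambda: "air conditioner: off",
        lambda: "lights: dimmed",
        lambda: "traffic: report read",
        lambda: "agenda: read aloud",
        lambda: "spotify: 'driving' playlist started",
    ],
)

assistant = Assistant()
assistant.register(good_morning)
results = assistant.handle("Hey Google, good morning!")
for line in results:
    print(line)
```

The design point is that the user only ever sees the four spoken words; the list of actions behind them is configured once and can grow without changing what the user has to say.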
The sheer number of products that are voice-based from the outset, or that offer voice components, is on the rise. Scratching the surface, I can mention the Philips and Xiaomi light bulbs, which are voice controlled and offer a gallery of moods and scenarios. More notable solutions include the infrared-based RM-Pro device that controls an array of devices, products that control your water heater and AC, refrigerators, robotic vacuums, window blinds, power sockets and more. The sky’s the limit.
Above all these, we have control apps such as Google Home that act as conductors of this orchestra by creating routines while being seamlessly connected to the Assistant. It means that our entire house can be online and controlled on request and, more importantly, that after an initial setup some of the app interfaces become irrelevant once “plugged” into your Assistant. In terms of accessibility, these capabilities also help us welcome new and untapped audiences that require extra attention (such as the visually impaired). So, alongside visual interfaces and their benefits, VUIs (voice user interfaces) open the door to new interaction possibilities for us all.
If that’s not enough, invest a moment of your day and watch this proof of concept, which recently became a reality:
The post-screen and user experience era
Voice and speech-based interfaces are examples of pure human-machine interaction, as perceived by our various senses. There isn’t a visual interface, and the number of screens is near zero. A (hopefully short) onboarding process takes place, with an obvious need to plan it right. We expect the average user to be able to operate the product and connect it to the web using a simple app, and from this point on to start talking to it with no further thought.
As a UX planner, I aim for my target audience’s intuition. I’m sure that most of us already invest time and thought into planning, researching and implementing the takeaways learned from our users’ behavior, all with the intention of letting them accomplish the central action in the most convenient and transparent manner. Taking it a step further: the ability of your system to have a conversation with your users is crucial. Think about how difficult it is to generate intuitive engagement, and add to that the fact that you’re required not to use visual aids.
From our perspective as interface and experience designers, we see a paradigm shift. We used to focus on screens, appearance and micro-interactions (all of these are still important); now we have a new world to chart, which includes conversation trees, immediate endpoints, tone of voice, data sources, conversation topics, real-time analysis and much more.
Trust me, conducting qualitative user research for such a thing requires a change of perspective. We have to consider everything, from checking the most basic way to phrase a question (in order for it to get a short and accurate response) to creating an experience that’s not too robotic or creepy.
Conversations with colleagues in parallel fields confirm that further adjustments are required beyond user experience. Content must be clear and focused enough to be deemed a worthy answer to a spoken question. I’m referring to technical preparation that affects the ability of Google (and other companies) to scan and interpret the text. If marketing efforts are taking place, it’s crucial for them to be in line with the new experience we’re trying to generate engagement with. These are no longer just algorithms, but virtual systems that seamlessly connect to physical appliances while negating the need for “official apps”. Therefore, the role of UXers in the development stage of products that include (or are based on) voice components is more crucial and central.
Next time you say “Hey Google, tell me a joke” or “make the light turquoise” or “count sheep before I doze off” (please :)) and get a light, informative or useful response that suits all and offends none, remember that there are user experience professionals who invest time and thought into engineering this entire thing.
Adjusting ourselves
I have assembled several topics for you to refer to when you think about designing a voice interface. Let’s look at the way we humans talk with each other and see if we can deduce from that how a voice interface should sound and feel.
1. Addressing attention, intuition and continuance
When we have a conversation with another person, we actually need all three of the above. The process requires attention from the person we are speaking to, followed by some kind of intuition as to what the response might be, and then being prepared to continue the conversation based on that response. A speech-based system should be inspired by those aspects.
The definition of a successful speech-based system largely depends on the user’s ability to interact smoothly with it. Stuttering, misunderstanding or irrelevance will deepen the conceptual gap and remind the person of the fundamental truth: that they’re talking to a piece of software. In order for us to create the perfect illusion, we must aim for the most fluent experience possible. When we reach a stage in which the user is able to have a complete conversation with the product (several sentences, in contrast to a single query), we can assume that every other factor checks out.
It’s important to remember that when addressing people’s attention span, the interaction should be even more intuitive than usual, in order for us to reach a point where we have a verbal conversation with what’s essentially an algorithm. If the Turing test examines the ability of a computer to sound convincingly human to a human tester, then in my opinion the voice assistants are well past this point, to the extent that the lines are blurred (sometimes, there’s a need to make the voice slightly “machine-like” to avoid dissonance).
2. Scenario trees
Let’s take a look at a relatively basic process: scheduling a brunch with a friend by ourselves, with all that it implies. We have to adjust to each other around our free time, confirm, and make a note in the calendar (with the specific time, for a reminder, and the place, for clickable navigation). It’s easy to see how this chain of actions could stump a program, and how many scenario trees this trivial action requires. A major part of creating voice-based interfaces is dictated by the need to create detailed scenario trees. You get bonus points for making your system “learn” and generate new scenarios by itself, which leads to the popular “machine learning” buzzword.
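For readers who prefer to see a scenario tree rather than imagine one, here is a small, hedged sketch in Python of the brunch example. Everything in it is an assumption made for illustration: the `Node` class, the `walk` and `count_paths` helpers, the prompts and the exact-match reply handling. A real assistant would use speech recognition and intent matching instead of literal “yes” and “no” strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One step in a scenario tree: what the system says, and where each reply leads."""
    prompt: str
    branches: Dict[str, "Node"] = field(default_factory=dict)
    fallback: Optional["Node"] = None  # taken when the reply matches no branch

    def is_endpoint(self) -> bool:
        return not self.branches and self.fallback is None


def walk(root: Node, replies: List[str]) -> List[str]:
    """Follow the user's replies through the tree and return the prompts spoken."""
    spoken = [root.prompt]
    node = root
    for reply in replies:
        if node.is_endpoint():
            break
        nxt = node.branches.get(reply.strip().lower(), node.fallback)
        if nxt is None:
            break  # no matching branch and no fallback: the scenario is stuck
        node = nxt
        spoken.append(node.prompt)
    return spoken


def count_paths(node: Node) -> int:
    """Count the distinct root-to-endpoint scenarios a designer has to cover."""
    if node.is_endpoint():
        return 1
    children = list(node.branches.values())
    if node.fallback is not None:
        children.append(node.fallback)
    return sum(count_paths(child) for child in children)


# Hypothetical brunch scenario (kept free of loops so that it stays a tree).
done = Node("Brunch is in your calendar, with a reminder and directions.")
give_up = Node("Sorry, I didn't get that. Let's try again later.")
ask_friend = Node("Okay, I'll ask your friend for other dates.")
ask_place = Node("Where would you like to meet?", fallback=done)  # any place is accepted
propose_sunday = Node(
    "Saturday doesn't work. How about Sunday at 11?",
    branches={"yes": ask_place, "no": ask_friend},
    fallback=give_up,
)
root = Node(
    "Your friend is free Saturday at 11. Does that work?",
    branches={"yes": ask_place, "no": propose_sunday},
    fallback=give_up,
)

for prompt in walk(root, ["no", "yes", "Cafe Luna"]):
    print(prompt)
print("scenarios to design:", count_paths(root))
```

Even this toy version, with two questions and a place, already yields several distinct endings, which is why the number of flowcharts grows so quickly once real-world variation is added.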
For example, when I speak with Google Assistant, I tend to add “thank you” at the end of our vocal exchange. For me, as a UXer, it can mean that the experience was successful (and that the conclusion of the scenario has been reached). When the primary user input is voice, and a user adds the “extra” touch of thanking the system, this means that the goal was reached. The SMS and Story culture leads us to create abbreviated content and immediate actions. From the perspective of language, voice interfaces “re-train” us in conversation skills, with complete and clear sentences, and even a bit of politeness. You’re welcome to try it yourselves…
3. Measurements and user research
For those of you engaged in the development of voice interfaces (or components), I’d suggest thinking about quantitative user research as early in the product’s life as possible. At some point, you’ll have to be able to measure the system’s performance, and that starts with setting specific KPIs.
The “obvious” metric is the number of actions that were accomplished successfully, but personally, I also consider a fluent conversation a success. Meaning: the system’s ability to hold a session that includes more than one action within the same context, while retaining the ability to collect big data in favor of improving the algorithm.
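As a rough illustration of those two metrics, the sketch below computes a completion rate and a “fluent session” share from logged sessions. The `Session` structure, the function names and the threshold of more than one successful action per session are my own assumptions for the example, not an established standard; your KPIs should come from your own product goals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One conversation: each entry records whether a requested action succeeded."""
    action_results: List[bool]


def completion_rate(sessions: List[Session]) -> float:
    """The 'obvious' metric: successful actions divided by all attempted actions."""
    results = [ok for session in sessions for ok in session.action_results]
    return sum(results) / len(results) if results else 0.0


def fluent_share(sessions: List[Session]) -> float:
    """Share of sessions with more than one successful action in the same context."""
    if not sessions:
        return 0.0
    fluent = sum(1 for session in sessions if sum(session.action_results) > 1)
    return fluent / len(sessions)


# Made-up sample data, purely to show the calculation.
sessions = [
    Session([True]),
    Session([True, True, False]),
    Session([False]),
    Session([True, True]),
]
print(f"completion rate: {completion_rate(sessions):.2f}")
print(f"fluent sessions: {fluent_share(sessions):.2f}")
```

Tracking the two numbers side by side is the point: a system can score well on single actions and still rarely sustain a multi-step conversation.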
In the field of voice-based system research, there are two general approaches: qualitative user research, and quantitative data.
Let’s address the verbal aspect first. Qualitative user research is pretty straightforward. The best case is speaking with the users themselves and listening to the structure of an average conversation. The next best option is a textual transcript of conversations. The purpose here is to identify when the process was completed successfully, and whether the exchange was positive or not.
Quantitative research is not that simple. Generally speaking, UX research in this context requires the initial development of proprietary tools, or at least heavy customization of existing tools. We have to teach a system what “success” and “failure” are, in order to identify pain points within the process, and even abandonment. In the absence of such data, there could be a state in which the process wasn’t successful at all, but as far as the system sees it, it was. For instance, the system “successfully” finds a song in response to a request for a chocolate chip cookie recipe (there are several explanations for this outcome).
In short, expect to go over many, many flowcharts.
The “coming soon” part
If the vision of the market leaders materializes, we’ll very soon be able to choose whether to conduct ourselves without screens, via the cloud, with everything connected: schedules, electric appliances, smart cars and routines that tend to our every need. Where does this leave us, user experience architects? We must start to think about whether and where we should integrate a voice user interface (or component), assuming it serves the purpose. When you are able to offer the right solution at the right moment, you have a killer feature in your grasp.
Amazon’s Alexa is constantly expanding its ecosystem by making itself more compatible, by improving the AI and by leaving enough space for external “skills” to enhance Alexa’s knowledge and abilities. Each skill is a microcosm of user experience, wrapped in the same, familiar voice.
If all of the aforementioned is executed flawlessly and doesn’t require attention and fixes in “crucial” moments like in the example above, we have an excellent usage experience that crosses platforms, runs in the background and is subject to quantification and measurement. On top of it all: if all of this adds up to happy users who got what they wanted, and we were able to spare them more actions and more screens to look at, we’re golden. That’s what we’re all here for, isn’t it?