Recently, revealed by Security Research Labs researchers Luise Frerichs and Fabian Bräunlein, there are major loopholes in Alexa and GoogleHome. The two products are smart assistants from Amazon and Google.
It is understood that hackers can use Amazon and Google to eavesdrop on user conversations. Such backends are used to customize Alexa or Google Home applications, and by adding a “�. ” (U+D801, dot, space) character sequence inside it, hackers can use voice assistant devices to send phishing emails, eavesdrop on user conversations, and more. After a hacker adds such a sequence of characters, the target device becomes stuck, but its light shows that the device is still active. The user will then receive a phishing email disguised as an Amazon/Google update message asking the user to provide their Amazon/Google account information. Often, it’s hard for users to realize that this is a phishing email.
The researchers explained how they developed the Alexa phishing skills:
1. Create a seemingly innocent skill that already contains two intents:
– an intent that is started by “stop” and copies the stop intent
– an intent that is started by a certain, commonly used word and saves the following words as slot values. This intent behaves like the fallback intent.
2. After Amazon’s review, change the first intent to say goodbye, but then keep the session open and extend the eavesdrop time by adding the character sequence “(U+D801, dot, space)” multiple times to the speech prompt.
3. Change the second intent to not react at all
When the user now tries to end the skill, they hear a goodbye message, but the skill keeps running for several more seconds. If the user starts a sentence beginning with the selected word in this time, the intent will save the sentence as slot values and send them to the attacker.
To develop the Google Home eavesdropping actions:
1. Create an Action and submit it for review.
2. After review, change the main intent to end with the Bye earcon sound (by playing a recording using the Speech Synthesis Markup Language (SSML)) and set expectUserResponse to true. This sound is usually understood as signaling that a voice app has finished. After that, add several noInputPrompts consisting only of a short silence, using the SSML element or the unpronounceable Unicode character sequence “�.”
3. Create a second intent that is called whenever an actions.intent.TEXT request is received. This intent outputs a short silence and defines several silent noInputPrompts.
After outputting the requested information and playing the earcon, the Google Home device waits for approximately 9 seconds for speech input. If none is detected, the device “outputs” a short silence and waits again for user input. If no speech is detected within 3 iterations, the Action stops.
When speech input is detected, a second intent is called. This intent only consists of one silent output, again with multiple silent reprompt texts. Every time speech is detected, this Intent is called and the reprompt count is reset.
The hacker receives a full transcript of the user’s subsequent conversations, until there is at least a 30-second break of detected speech. (This can be extended by extending the silence duration, during which the eavesdropping is paused.)
In this state, the Google Home Device will also forward all commands prefixed by “OK Google” (except “stop”) to the hacker. Therefore, the hacker could also use this hack to imitate other applications, man-in-the-middle the user’s interaction with the spoofed Actions, and start believable phishing attacks.
SRLabs said they reported the problem to the two companies earlier this year, but the two companies still have not given a solution.