Gergő Turcsányi (Content Development Expert, Avatao)
The bug was found by Masato Kinugawa, and LiveOverflow made a video about it that went viral. It is well worth watching, and you can find plenty of other great material on his YouTube channel. The video explains the bug clearly, but practice makes perfect, so we’ve created a tutorial challenge about it, where you can:
- Exploit the same vulnerability
- Play with the HTML parser of your browser
- See how the vulnerable version of the sanitizer worked
We also think it’s worth highlighting the key elements and main points of the story, which is the goal of this post.
Dealing with user input
Then let’s sanitize it on the client-side
It sounds really wrong, so why would Google do that? Because parsing HTML isn’t as easy as you may think. The specification is really complex, and implementations differ between browsers. Additionally, browsers don’t simply parse the HTML: they fix malformed code, complete missing tags, pay attention to headers, and load external resources. Implementing and maintaining a library that replicates all of that would be really hard, especially given the different versions of the different browsers. I guess using the client for that makes more sense now.
But how to do it securely?
There’s a very special <template> tag which is perfect for the job. Its content is parsed, but not rendered. That means the browser does its magic (like fixing the missing closing tags), but it won’t execute scripts or load images. The basic concept of sanitizing HTML on the client is the following:
- Loading the user input into a <template> tag and letting the browser parse it
- Removing scripts and unwanted tags and attributes (by using a whitelist for example)
- The result can now be used securely in the HTML code
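The three steps above can be sketched in a few lines of JavaScript. This is only a minimal illustration under stated assumptions, not how Closure or DOMPurify are actually implemented: the names sanitize, isAllowedAttribute, ALLOWED_TAGS and ALLOWED_ATTRS are made up for this post, and the whitelists are deliberately tiny. The sanitize function needs a browser, since it relies on document and the <template> element.

```javascript
// Hypothetical whitelists for this sketch; a real sanitizer needs far more care.
const ALLOWED_TAGS = new Set(['B', 'I', 'EM', 'STRONG', 'P', 'UL', 'LI']);
const ALLOWED_ATTRS = new Set(['title']);

// Pure helper: returns true only for whitelisted attribute names.
function isAllowedAttribute(name) {
  return ALLOWED_ATTRS.has(name.toLowerCase());
}

// Browser-only: parse inside a <template>, strip what is not whitelisted,
// and return the remaining markup as a string.
function sanitize(dirtyHtml) {
  // Step 1: let the browser parse the input without rendering or running it.
  const template = document.createElement('template');
  template.innerHTML = dirtyHtml;

  // Step 2: remove unwanted elements and attributes.
  for (const el of template.content.querySelectorAll('*')) {
    if (!ALLOWED_TAGS.has(el.tagName)) {
      el.remove(); // drops the element together with its children
      continue;
    }
    // Copy the attribute list first, because we modify it while looping.
    for (const attr of Array.from(el.attributes)) {
      if (!isAllowedAttribute(attr.name)) {
        el.removeAttribute(attr.name);
      }
    }
  }

  // Step 3: serialize the cleaned tree.
  return template.innerHTML;
}
```

In a browser console, sanitize('<b onclick=alert(1)>hi</b><script>alert(2)</script>') should return <b>hi</b>. Note that the last step turns the tree back into a string, which the page then parses a second time, so the result is only as safe as the assumption that both parses agree.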
And here comes the <noscript> tag which is really special as well. The specification says:
The noscript element represents nothing if scripting is enabled, and represents its children if scripting is disabled. It’s used to present different markup to user agents that support scripting and those that don’t support scripting, by affecting how the document is parsed.
Inside the <template> element scripting is disabled, but in the browser (after the sanitized HTML is inserted into the DOM) scripting is enabled. Combining this fact with the helpful behavior of browsers (where they finish incomplete tags) led to the ultimate payload, which could have been used to bypass both Google’s Closure library and Cure53’s DOMPurify (both are popular HTML sanitizers):
<noscript><p title='</noscript><img src=x onerror=alert(1)>'>
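It’s parsed inside the template element, where scripting is disabled, roughly like this (a <p> whose title attribute harmlessly holds the rest of the payload as text):
<noscript><p title="</noscript><img src=x onerror=alert(1)>"></p></noscript>
But when it’s used in a scripting-enabled context, the content of <noscript> is treated as raw text that ends at the first </noscript>, so it becomes:
<noscript><p title='</noscript><img src="x" onerror="alert(1)">'>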
And boom, alert(1) is executed. It’s a really awesome example of how weird browsers can be, and motivation for all the bug bounty hunters out there. I bet almost none of us thought that we’d see a working XSS on the homepage of Google.
How could it have been avoided?
The funny thing is that they (at Google) already knew about this attack vector and had fixed the code a long time ago. However, they didn’t add any unit tests, so when someone later reverted the fix, the build passed (because there was no test to break) and the vulnerability ended up in production code. So I think the most important takeaway from the story is that tests are really important.
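A regression test for this case could be as small as the sketch below. It rests on a hypothetical policy, not confirmed to be what Google or Cure53 actually shipped: sanitized output must never contain a noscript tag at all. The sanitizer under test is passed in as a parameter, so sanitize here stands for whatever function your project exposes, and REGRESSION_PAYLOADS, violatesNoscriptPolicy and runRegressionTests are names invented for this example.

```javascript
// Payloads that bypassed the sanitizer in the past; every fix adds one here.
const REGRESSION_PAYLOADS = [
  "<noscript><p title='</noscript><img src=x onerror=alert(1)>'>",
];

// Assumed policy for this sketch: any opening or closing noscript tag
// in the output counts as a failure. This is deliberately conservative.
function violatesNoscriptPolicy(sanitizedHtml) {
  return /<\/?noscript\b/i.test(sanitizedHtml);
}

// Runs every known payload through the given sanitizer and returns
// the list of payloads whose output breaks the policy.
function runRegressionTests(sanitize) {
  const failures = [];
  for (const payload of REGRESSION_PAYLOADS) {
    const output = sanitize(payload);
    if (violatesNoscriptPolicy(output)) {
      failures.push({ payload, output });
    }
  }
  return failures;
}
```

Wired into the build, a non-empty result from runRegressionTests would fail the pipeline, so reverting the fix could no longer slip through unnoticed. A plain string check like this is cruder than re-parsing the output in a scripting-enabled context, but it is cheap and needs no browser.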
I’ve watched the video multiple times, read many comments about this vulnerability, and played with the JS debugger for hours, but none of these could answer the real question here: why was the user input parsed as HTML in the search engine at all? But even if that hadn’t been the case, the payload probably could have worked in Gmail.
Now it’s your time to analyze the bug interactively by solving our tutorial challenge about it. Have fun!