Mixed Content

Definition

When an HTTPS website references insecure (HTTP) resources, this is called mixed content.

Browsers prevent an HTTPS website from loading most insecure resources, such as fonts and scripts. Migrating an existing website from HTTP to HTTPS therefore means identifying mixed content and fixing or replacing it.

Mixed content comes in two varieties:

  • Active mixed content includes resources that can greatly change the behavior of a website, such as JavaScript, CSS, fonts, and iframes. Browsers refuse to load active mixed content, which often leaves affected pages completely unstyled or broken. Browsers block these resources strictly because the consequences of a compromise are severe. For example, a single compromised JavaScript file compromises the entire website, regardless of how other resources are loaded.
  • Passive mixed content includes resources whose impact on the page’s overall behavior is more limited, such as images, audio, and video. Browsers will load passive mixed content, but will typically change the HTTPS indicator.
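
The two categories above can be sketched as a small classifier. This is only an illustration of the distinction, not how any browser implements it: the function name, the type labels, and the "unknown" fallback are assumptions made for this example, and real browsers apply more detailed rules.

```python
# Illustrative grouping based on the two categories described above.
# The labels are hypothetical; browsers use their own internal categories.
ACTIVE_TYPES = {"script", "stylesheet", "font", "iframe"}
PASSIVE_TYPES = {"image", "audio", "video"}


def classify_mixed_content(page_url: str, resource_url: str, resource_type: str) -> str:
    """Return 'none', 'active', 'passive', or 'unknown' for one resource."""
    # Mixed content only exists when the page itself is served over HTTPS.
    if not page_url.lower().startswith("https://"):
        return "none"
    # Only resources fetched over plain HTTP are insecure. Site-relative
    # URLs inherit the page's scheme, so they are never mixed content.
    if not resource_url.lower().startswith("http://"):
        return "none"
    if resource_type in ACTIVE_TYPES:
        return "active"
    if resource_type in PASSIVE_TYPES:
        return "passive"
    return "unknown"
```

A script loaded over http:// from an https:// page comes back as "active", while the same script referenced from an http:// page comes back as "none", which is why plain-HTTP sites never see these warnings.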

In Firefox, a website indicator for passive mixed content looks like this:

[Image: passive mixed content indicator in Firefox]

Migration strategy

Every website’s mixed content situation will be different, but the general approach is:

  • Identify the most obvious and widespread pieces of mixed content by loading your website in a browser over https:// and observing breakages. Chrome, Opera, and Firefox will log any mixed content warnings to the console, which should point out necessary site-wide changes. Use these to secure your resource links.
  • After fixing them, tackle the long tail by scanning your code and crawling your website.

Note: the instructions below use tools optimized for an OS X or Linux environment. Documentation for Windows-based tools would be a welcome contribution to this guide.

Linking to resources securely

Most commonly used third-party services, such as Google Analytics or AddThis, will adapt automatically when your website migrates to HTTPS.

Other services may require manual updates, but have an https:// version ready:

<link href="https://fonts.googleapis.com/css?family=Open+Sans" rel="stylesheet">

Generally speaking, for content on your own domain, stick to site-relative URLs wherever possible:

<img src="/media/my-picture.png" />
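
If your templates or stored content contain absolute http:// URLs that point at your own domain, a small helper can convert them to site-relative form. The sketch below is one possible approach using Python's standard urllib.parse module; the function name and the own_hosts parameter are assumptions made for this example, and the hostnames you pass in should be lowercase.

```python
from urllib.parse import urlsplit, urlunsplit


def to_site_relative(url: str, own_hosts: set[str]) -> str:
    """Strip scheme and host from URLs that point at one of our own hosts.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme in ("http", "https") and parts.hostname in own_hosts:
        # An empty path means the site root.
        path = parts.path or "/"
        # Rebuild the URL without scheme or host, keeping query and fragment.
        return urlunsplit(("", "", path, parts.query, parts.fragment))
    return url
```

For example, with own_hosts set to {"www.example.edu"} (a placeholder hostname), http://www.example.edu/media/my-picture.png becomes /media/my-picture.png, while a URL on any other host is left alone for manual review.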

When migrating a site with a lot of user- or staff-submitted content (e.g. a blog), you may find media hotlinked from a third-party domain which doesn’t support HTTPS.

This is a great opportunity to improve your website’s privacy and lessen your dependency on third parties, by copying those media files to your own server instead and hosting them yourself.

Scanning your code

After identifying and fixing the obvious issues, you can scan your website’s files for leads. On a Mac or Linux-based system, grep is very handy:

Images and scripts:

grep -r "src=\"http:" *

Stylesheets and fonts:

grep -r "href=\"http:" * | grep "<link"

CSS imports and references:

grep -r "url(\"http:" *

Finding links in JavaScript is more challenging, but you can look for all http: references and try to exclude hyperlinks in HTML or Markdown:

grep -r "http:" * | grep -v "href=\"http:"
grep -r "http:" * | grep -v "](http:"
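
If grep is not available, or you want the three checks in one pass, the same idea can be sketched in Python. This is a rough equivalent of the grep commands above, not a replacement for them: the pattern names and function names are assumptions made for this example, and the regular expressions are deliberately loose, so expect both false positives and misses.

```python
import re
from pathlib import Path

# One pattern per kind of reference covered by the grep commands above.
PATTERNS = {
    "src": re.compile(r"""\bsrc\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE),
    "link-href": re.compile(
        r"""<link\b[^>]*?\bhref\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE
    ),
    "css-url": re.compile(r"""url\(\s*["']?(http://[^"')\s]+)""", re.IGNORECASE),
}


def find_insecure_references(text):
    """Return (line_number, kind, url) for each http:// reference found."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_no, kind, match.group(1)))
    return findings


def scan_tree(root):
    """Scan every regular file under root, yielding (path, line, kind, url)."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            for line_no, kind, url in find_insecure_references(text):
                yield path, line_no, kind, url
```

Unlike the quoted grep patterns, these expressions also match single-quoted and unquoted values such as url(http://...). Ordinary hyperlinks in anchor tags are ignored, because following a link does not load a sub-resource into the page.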

Crawling your website

Finding and using a crawling tool is a complex task. If you really want to explore doing it yourself, mixed-content-scan is a very handy command line tool that can crawl an http:// or https:// website to see if it contains any references to insecure resources. This is especially helpful if your content is primarily managed in a CMS. However, it is not for the faint of heart: it takes some effort to configure properly (it took us a couple of weeks) and to interpret the results and weed out the false positives.

Please note that this tool is not 100% perfect! It can (and does) flag things that are innocent as well as miss a few things that are not so innocent. Still, as with all tools, it can be useful.
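
To get a feel for what a crawler checks on each page, here is a minimal sketch built on Python's standard html.parser module. It is not how mixed-content-scan works internally; the class name, function name, and tag list are assumptions made for this example, and it inspects only a single HTML string that you have already fetched.

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    """Collect http:// URLs from tags that load sub-resources."""

    # Tag -> attribute that makes the browser fetch something.
    # This list is illustrative and far from complete.
    RESOURCE_ATTRS = {
        "script": "src",
        "img": "src",
        "iframe": "src",
        "audio": "src",
        "video": "src",
        "source": "src",
        "link": "href",
    }

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # e.g. <a href="..."> is a hyperlink, not a sub-resource
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def insecure_resources(html):
    """Return a list of (tag, url) pairs for insecure sub-resources."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure
```

Even this small sketch shows where false positives come from: it reports every link tag with an http:// href, including ones such as rel="canonical" that do not cause the browser to load anything.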

Why do browsers block mixed content?

If mixed content were not blocked, an attacker could control the main website by conducting a “man in the middle” (MITM) attack against any of its active resources.

Even with passive content like images, attackers can manipulate what the page looks like, and so the yellow-lock icon is intended to communicate that security has been weakened and user confidence should be reduced. In addition, an attacker will be able to read any cookies for that domain which do not have the Secure flag, and alter or set cookies.
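
Setting the Secure flag tells the browser to send a cookie only over HTTPS, which keeps it out of plain-HTTP requests for passive resources on the same domain. The sketch below shows one way to build such a header value with Python's standard http.cookies module; the function name, cookie name, and cookie value are placeholders, and your web framework most likely has its own way of setting these attributes.

```python
from http.cookies import SimpleCookie


def build_session_cookie(value):
    """Return a Set-Cookie header value with the Secure flag set."""
    cookie = SimpleCookie()
    cookie["session"] = value             # "session" is a placeholder name
    cookie["session"]["secure"] = True    # only send over HTTPS
    cookie["session"]["httponly"] = True  # hide from page JavaScript
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```

Note that the Secure flag only stops the cookie from being sent over plain HTTP; it does not stop a network attacker from setting or overwriting cookies, as described above.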

When a website is accessible over http://, loading other insecure resources does not generate any sort of warning, and so websites operating over plain HTTP often accumulate many of these sub-resources.

Security considerations for third party content

Incorporating or loading content from third-party domains creates an additional attack vector.

Even if a page has all page elements loaded over HTTPS, variations in HTTPS configurations could result in security vulnerabilities. For example, if ‘foo.purdue.edu’ loads a page element over HTTPS from ‘bar.com’ but ‘bar.com’ is not as fastidious with its HTTPS/SSL configuration, the page element from ‘bar.com’ may allow injection of malicious software into the page.

For example, if ‘bar.com’ uses an SSL configuration that is known to be weak, a malicious network adversary may be able to modify or replace the page element to inject software that could read the page contents or, potentially, exploit browser vulnerabilities and accomplish more global access to the client device. Accordingly, it will be important to also evaluate the configurations of the domains that serve third-party page elements.
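
As a first, very rough check of a third-party domain, you can see whether it completes a handshake with a client that verifies certificates and refuses anything older than TLS 1.2. The sketch below uses Python's standard ssl and socket modules; the function names are assumptions made for this example, and a successful handshake covers only a small part of what evaluating an HTTPS configuration involves.

```python
import socket
import ssl


def strict_context():
    """Create a client context that verifies certificates and hostnames
    and refuses protocol versions older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls(host, port=443, timeout=10.0):
    """Connect to host and return (protocol_version, cipher_name).

    Raises ssl.SSLError (or a subclass) if the certificate is invalid or
    the server cannot negotiate TLS 1.2 or newer.
    """
    context = strict_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling negotiated_tls with the hostname of each third-party domain your pages load from gives a quick pass or fail. A dedicated TLS scanning tool will report much more, such as weak cipher suites and certificate chain problems.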

Note that this is still a strict improvement over incorporating content from third-party domains over unencrypted HTTP. Attacks on the privacy, integrity, and security of connections to third-party domains over unencrypted HTTP are trivial.