Avoiding Spam When Building Strong Backlinks Part 2

Last time we spoke of the dangers of building backlinks through spammy techniques. Here are examples of those dastardly techniques and the proper methods to replace them when dealing with blogs. On Saturday we will discuss ways of building strong directory references without being spammy.


You will first need to download the SEOmoz toolbar for Firefox. This toolbar has a function that highlights “No Follow” tags on page links. Using this tool will allow you to determine which blogs are worth commenting on and interacting with for links and track-backs.
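For readers who would rather script the check than rely on a toolbar, here is a minimal sketch that lists the links in a piece of HTML and marks the ones carrying rel="nofollow". The sample markup and the names NofollowFinder and find_links are made up for illustration; this is not a description of how the SEOmoz toolbar works internally.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collect (href, is_nofollow) for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "external nofollow"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def find_links(html):
    finder = NofollowFinder()
    finder.feed(html)
    return finder.links


# Hypothetical comment markup from a blog page
sample = (
    '<a href="http://example.com/a" rel="external nofollow">Commenter A</a>'
    '<a href="http://example.com/b">Commenter B</a>'
)
for href, nofollow in find_links(sample):
    print(href, "nofollow" if nofollow else "followed")
```

A blog whose comment links all come back as nofollow can still be worth joining for the conversation, but it will not pass along a followed link.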

Most everyone with a blog has visited or even perused Technorati… (if you haven’t, you will after reading this.) Within Technorati, you have access to blogs that are separated by category and relevance. There are other blog index sites that maintain a strong list of blogs by content, but Technorati tends to be the most reliable for the search engines.

Once you are on the site, search for a blog that resembles the topic of whatever page you wish to link to. If you are choosing to link to your main page, choose a blog that is closely associated with the keywords of your entire site. If you are using the link of a specific page, search for a blog that covers that specific topic. Technorati has a ranking system that allows the user to comb through different levels of relevance. Obviously, you want to get links from the site most relevant to the topic being covered.

Using this method, you’ll want to start by performing a “Time Honored Blogging Tradition”: RTFA!!! Read the article, people. Seriously, if you don’t know what the topic is, it will show in what you’ve written. Once you have read the article, it’s time to interact.

Make sure that the comment you leave is more than the following junk:

“nice post”

“I agree with the points you made”

“good thoughts but I take issue with your points”

These comments are common and quite annoying pieces of spam. They will likely be caught in any filter and removed from any site that cares about its relevancy. It is always better to be part of the conversation. If you’ve read a post, use enough of the material within it to make a valid statement.

Here’s a list of other rules that will avoid negative treatment for blog comments:

  • Only leave comments that are a full, comprehensible sentence.
  • Sign up for updates on future comments.  This can ensure many links and a future relationship between your site and the site you are commenting on.
  • Leave only one link on the site, the one in your name description. Leaving tons of links in the content of your comments makes them strongly resemble unwanted communication, leading many to list your comments as spam. If your site links or email addresses become associated with spam, it’s very likely that your future comments on other blogs will be filtered as well.
  • Leave the auto commenting software to those who don’t mind being banned from the search engines.  It’s just not worth it.
  • Read the articles!!! There will always be a better exchange of ideas when you do and you’ll likely receive more convertible visitors to your website if they believe your communication to be respectful.

Ensuring Quality Backlinks For Your Website Part 1

This is the first of three posts that will be added over the next few days. There is very little chance that the thoughts will stay completely congruent, but there’s just far more than one post within this topic…

While linking structure has always been an important piece of the SEO equation, it seems that a more intuitive linking structure is becoming more and more necessary for modern SEO development. The day of submitting to random directories or commenting on unrelated blogs is long gone. Spam comments and unrelated directory listings damage website relevance, and search engines are constantly looking for ways to counter their effects. That being said, there are highly effective methods of getting related backlinks; they just require more work than the spammers and scammers are likely to give.

If you’re wondering, “is Christian about to rail on the Spammers again?” the answer is always yes! Spammers have not only made the system harder for all of us, they’ve caused themselves to become endangered. Yes, you can still find tons of affiliate websites offering their “Link Building Software”. This software is usually some form of blog spamming software. While these sites can be found all over the place, they’re owned by only a small group of spam-jockeys who are becoming marginalized as people become more aware of affiliate scams and MLMs.

Ok, now that we got that out of the way: blog backlink software is going to do nothing for your site but possibly get it banned from Google. Not only is the software disastrous for your reputation in the SERPs, it almost always has a very short and limited list of sites to choose from. The vendors claim that their program searches through Google, but this is only a half truth; the programs are most often pre-rigged with a select set of search criteria. The resulting searches leave the user with the same tired group of blogs that were used thousands of times before by other spammers. Most of these sites and pages that have been bombarded by spam are already flagged by Google. Add your link to one of these pages, and you risk being considered spam as well.

That out of the way, the question of how to get the quality links and avoid being all spammy is going to be asked.  Unfortunately, the answer will have to wait until tomorrow. We’re out of coffee.

Excellent Free SEO Monitoring Websites

Like everyone else, we are constantly looking for ways to save money on our bottom line. We are often testing and grading monitoring and analytics websites and software to determine which is the most effective and gives the most accurate results.

If you’ve played with even half of the major names, you’ve likely run into and identified one major problem… they all give different numbers. Many find themselves wondering which ones to trust and what to make of the numbers they’ve received. Here is our take on the top six site analytics tools and why we use them.

  1. Google Analytics- OK, to those who always wish to bash the resource, I start by agreeing with you that Google loses a lot of information when scripts are blocked. That being said, the number of people using script blockers is still not that high, so it’s really a moot point. Google Analytics is the most widely used website traffic index and is a must for web traffic monitoring. -to avoid the debate, that’s all we’re saying about Google today-
  2. Majestic SEO Tools- While Majestic has been around and has been a major resource for a few years now, the past year has brought a lot of new and useful tools to the site. While it requires an account to gain access to some of the better tools, they are well worth the free account and the ten minutes it takes to set it up. After a month of indexed tracking, Majestic provides a complete breakdown of all your website backlinks. This is quite useful in determining where to build more backlinks and what anchor text to use with them. Granted, there are plenty of costly and buggy pieces of software that will perform this function, but most of them accompany an affiliate scam or someone’s get rich quick “Pyramid Scam”. While the data for the links may be lagging, it’s fairly respectable in determining the needs of a backlink campaign.
  3. Quantcast -  Same as below…
  4. Compete- Do they help? Honestly, not so much. What they do offer is valuable information for those considering advertising on your website. While many would argue, and often I’d agree with them, that the information in both sites is a random guess, they are free and offer a strong form of traffic evidence when looking to sell ad-space on your site. Whether you like them or not, we suggest getting listed within their database. While you may not like or trust their services, most advertising groups are going to want to see their numbers before putting ads on your site.
  5. SEOmoz Free Tools- These are an incredible assortment of custom tools designed for and by SEOmoz. The SEO tools and link structure tools provided here are much stronger than what most of their competition offers. While we have yet to try their Pro Tools for review, we will soon be putting them to the test. All I can say is that if the pro tools excel well above the free tools, we will likely remain a customer and refer our clients. The free tools include several strong tools, but the best and most unique one is the “Trifecta Tool”. This tool gives a full breakdown of a site in Page, Domain, or Blog format. The information given here is usually free in its individual formats, but the way that they collate and arrange the data makes for a clean and valuable view of a website’s performance. It’s well worth signing up for a free account to try it out.
  6. HubSpot Grader.com Program – What can’t be said for the brilliant strategy of Inbound Marketing that HubSpot has highlighted with this and many of its other websites. Websitegrader.com is one of the most widely used website evaluation websites in use today. With over 2.6 Million sites graded to date and an Alexa ranking around 2,000, they are becoming the most widely used “quick check” for determining any website’s effectiveness. We, like anyone else who has received it, value the 99% grade we were given by websitegrader.com and view an updated score from them on a weekly basis to see what adjustments need to be made. If you have yet to run your site through their system, give it a try. The information given will only serve to increase the reach and effectiveness of your website.

Downtime Can Be A Nightmare

We’re glad to be back in the land of the tubes. The past two weeks have made for an incredible experience of getting our site back up and working out the finer points with our host.

Before we get into that portion though, we should probably explain why we were down…

As we often point out, we are hosted by Byethost. They and Bluehost are the two hosting companies we’ve come to trust the most for shared server space. Unfortunately, we discovered that a plugin we were using for the feeds was not well accepted by our neighbors on the server. We were capturing all of our feeds with the WordPress plugin wp-o-matic. What we didn’t realize is that this plugin causes a major drain on the server side and, when on shared space, damages your neighbors’ bandwidth… OOPS!!!

We honestly never considered that such a widely-used plugin would be the cause of such issues, but tests afterward have shown that it couldn’t have been anything else. Because of the excessive bandwidth, resources, and memory that wp-o-matic was stripping from the shared space, we were booted to a Virtual Private Server… i.e., banished to the Dagobah system.

It took a week of negotiating, but Byethost was willing to settle the dispute once we were well assured that it had to be the plugin that was causing all of the trouble. One week later, we’re back up and on the way again…. Needless to say, our feeds will no longer run through WP-O-Matic.

Now for the sob-story

While we were only down about a day and went back and forth between IPs for one week, Google took a lot of notice. Our impressions went from an average of 1,000/day to 22 tonight. Our traffic went from an average of 200/day to 30 tonight.

The past week of being back up is representative of absolute downtime in the SERPs.

Notice in the screen-shot that the exact date of being down is visible, as well as the subsequent falling out with Google ranking.

[Screenshot: naper rank]

While this is a pretty steep setback for the short-term, we will return to our previous standing rather quickly.  We’ve run into similar situations with client sites, but never thought we would need to do damage control of our own.

Here are the steps that must be taken directly after an event like this.

1. Immediate push for Back-links- During the month following a down-time like this one, the need for sites around the web to verify your existence is crucial. The pages that Google attempts and fails to crawl can be put on the back burner for a period of time. Making a strong push for links and acknowledgment can get a faster crawl to those skipped pages than would likely come otherwise.

2. Add more content. We all know that the best way to get Google’s attention is to give it something new to look at. While we feed several different blogs on this site, we also enjoy adding our own content on a regular basis. At this moment it’s pertinent to add more than we normally would. The additional content will signal that the site is not dead and is in fact very much alive and active.

3. In addition to adding the content, we have to make sure that Google is aware of it being added. To drive home this point, we submit sitemaps… Several sitemaps. From XML to ROR, we submit every possible format of sitemap available. Some may think this a bit eccentric, but in dire moments like these, it can be the difference between being demoted for weeks or for months. For additional sitemaps in WordPress, we suggest using the following

We also have some made at xml-sitemaps.com in url.txt and rss.ror format.
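If you would rather generate the simpler formats yourself, the sketch below builds a basic XML sitemap and a plain-text URL list from a list of page addresses. The example.com URLs and the function names are placeholders, and this is only an illustration of the file formats, not the output of xml-sitemaps.com or of any particular plugin. ROR is not covered here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return an XML sitemap (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_url_list(urls):
    """Return the plain-text variant: one URL per line."""
    return "\n".join(urls) + "\n"


# Hypothetical pages; substitute your own
pages = ["http://example.com/", "http://example.com/blog/"]
print(build_sitemap(pages))
print(build_url_list(pages))
```

Each string can be saved to a file in the site root and then submitted through the search engine's webmaster tools.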

Our running experiment will now be to see how long it will take to return to the traffic and ranking that we had last month.  We will share this experience with you all and welcome any suggestion or thoughts you would like to add.

A Grateful Month in Naperville

This month we have picked up some very exciting work.  Over the course of July, we will be building or restyling seven websites in the Chicago and Naperville areas.  Some are just in need of a redesign, but some are complete database developments.  We are looking forward to the opportunity to expand our range.

Here are some of the websites we are inheriting.

We’ll show progress with them to all of you in the coming month.

While we are also engaged in the creation of content and SEO for other clients, we offer these as sites to visit and grade the experience this month. Over the course of July, you should have several opportunities to witness growth with them all.

Statistics a Win for SEO

Posted by bhendrickson

We recently posted some correlation statistics on our blog. We believe these statistics are interesting and provide insight into the ways search engines work (a core principle of our mission here at SEOmoz). As we will continue to make similar statistics available, I’d like to discuss why correlations are interesting, refute the math behind recent criticisms, and reflect on how exciting it is to engage in mathematical discussions where critiques can be definitively rebutted.

I’ve been around SEOmoz for a little while now, but I don’t post a lot. So, as a quick reminder, I designed and built the prototype for SEOmoz’s web index, and wrote a large portion of the back-end code for the project. We shipped the index with billions of pages nine months after I started on the prototype, and we have continued to improve it since. Recently I made the machine learning models that are used to make Page Authority and Domain Authority, and I am working on some fairly exciting stuff that has not yet shipped. As I’m an engineer and not a regular blogger, I’ll ask for a bit of empathy for my post – it’s a bit technical, but I’ve tried to make it as accessible as possible.

Why does Correlation Matter?

Correlation helps us find causation by measuring how much variables change together. Correlation does not imply causation; variables can be changing together for reasons other than one affecting the other. However, if two variables are correlated and neither is affecting the other, we can conclude that there must be a third variable that is affecting both. This variable is known as a confounding variable. When we see correlations, we do learn that a cause exists — it might just be a confounding variable that we have yet to figure out.
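A small simulation can make the confounding case concrete. In the sketch below, two variables are each built from a shared hidden variable plus independent noise; neither is computed from the other, yet they come out clearly correlated. The data are synthetic and the setup (unit-variance Gaussian noise, 5,000 samples, a fixed seed) is an assumption chosen for illustration, not anything taken from SEOmoz's dataset.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
confounder = [rng.gauss(0, 1) for _ in range(5000)]
# x and y each depend on the confounder plus their own noise,
# but neither one is computed from the other.
x = [c + rng.gauss(0, 1) for c in confounder]
y = [c + rng.gauss(0, 1) for c in confounder]

r_xy = pearson(x, y)
print(round(r_xy, 2))
```

With this setup the theoretical correlation between x and y is 0.5, so the printed value should land close to that even though there is no causal link between the two.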

How can we make use of correlation data? Let’s consider a non-SEO example.

There is evidence that women who occasionally drink alcohol during pregnancy give birth to smarter children with better social skills than women who abstain. The correlation is clear, but the causation is not. If it is causation between the variables, then light drinking will make the child smarter. If it is a confounding variable, light drinking could have no effect or even make the child slightly less intelligent (which is suggested by extrapolating the data that heavy drinking during pregnancy makes children considerably less intelligent).

Although these correlations are interesting, they are not black-and-white proof that behaviors need to change. One needs to consider which explanations are more plausible: the causal ones or the confounding variable ones. To keep the analogy simple, let’s suppose there were only two likely explanations – one causal and one confounding. The causal explanation is that alcohol makes a mother less stressed, which helps the unborn baby. The confounding variable explanation is that women with more relaxed personalities are more likely to drink during pregnancy and less likely to negatively impact their child’s intelligence with stress. Given this, I probably would be more likely to drink during pregnancy because of the correlation evidence, but there is an even bigger take-away: both likely explanations damn stress. So, because of the correlation evidence about drinking, I would work hard to avoid stressful circumstances. *

Was the analogy clear? I am suggesting that as SEOs we approach correlation statistics like pregnant women considering drinking – cautiously, but without too much stress.

* Even though I am a talented programmer and work in the SEO industry, do not take medical advice from me, and note that I contrived the likely explanations for the sake of simplicity :-)

Some notes on data and methodology

We have two goals when selecting a methodology to analyze SERPs:

  1. Choose measurements that will communicate the most meaningful data
  2. Use techniques that can be easily understood and reproduced by others

These goals sometimes conflict, but we generally choose the most common method still consistent with our problem. Here is a quick rundown of the major options we had, and how we decided between them for our most recent results:

Machine Learning Models vs. Correlation Data: Machine learning can model and account for complex variable interactions. In the past, we have reported derivatives of our machine learning models. However, these results are difficult to create, they are difficult to understand, and they are difficult to verify. Instead we decided to compute simple correlation statistics.

Pearson’s Correlation vs. Spearman’s Correlation: The most common measure of correlation is Pearson’s Correlation, although it only measures linear correlation. This limitation is important: we have no reason to think interesting correlations to ranking will all be linear. Instead we chose to use Spearman’s correlation. Spearman’s correlation is still pretty common, and it does a reasonable job of measuring any monotonic correlation.
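To see the difference on data, the sketch below computes both coefficients for a relationship that is perfectly monotonic but exponential rather than linear. The data are synthetic, and the helper functions are plain-Python stand-ins written for this illustration; they are not the code used to produce our published results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's correlation: Pearson's correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = list(range(1, 11))
y = [math.exp(v) for v in x]  # monotonic but far from a straight line

print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Spearman's coefficient comes out at 1 because the ordering of y matches the ordering of x exactly, while Pearson's coefficient is noticeably lower because the points do not fall on a line.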

Here is a monotonic example: The count of how many of my coworkers have eaten lunch for the day is perfectly monotonically correlated with the time of day. It is not a straight line and so it isn’t linear correlation, but it is never decreasing, so it is monotonic correlation.


Here is a linear example: assuming I read at a constant rate, the number of pages I can read is linearly correlated with the length of time I spend reading.


Mean Correlation Coefficient vs. Pooled Correlation Coefficient: We collected data for 11,000+ queries. For each query, we can measure the correlation of ranking position with a particular metric by computing a correlation coefficient. However, we don’t want to report 11,000+ correlation coefficients; we want to report a single number that reflects how correlated the data was across our dataset, and we want to show how statistically significant that number is. There are two techniques commonly used to do this:

  1. Compute the mean of the correlation coefficients. To show statistical significance, we can report the standard error of the mean.
  2. Pool the results from all SERPs and compute a global correlation coefficient. To show statistical significance, we can compute standard error through a technique known as bootstrapping.

The mean correlation coefficient and the pooled correlation coefficient would both be meaningful statistics to report. However, the bootstrapping needed to show the standard error of the pooled correlation coefficient is less common than using the standard error of the mean. So we went with #1.
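As a rough illustration of option #1, the sketch below takes a handful of per-query coefficients, computes their mean, and reports the standard error of that mean (sample standard deviation divided by the square root of the number of queries). The five coefficients are made up for the example; the real analysis used 11,000+ queries.

```python
import math
import statistics


def mean_and_standard_error(coefficients):
    """Mean of per-query coefficients and the standard error of that mean."""
    n = len(coefficients)
    mean = statistics.mean(coefficients)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n)
    standard_error = statistics.stdev(coefficients) / math.sqrt(n)
    return mean, standard_error


# Made-up per-query Spearman coefficients, for illustration only
per_query = [0.10, 0.25, -0.05, 0.30, 0.15]
mean, sem = mean_and_standard_error(per_query)
print(round(mean, 3), round(sem, 3))
```

The standard error shrinks as more queries are added, which is why a large sample of noisy per-query coefficients can still give a tight estimate of the mean.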

Fisher Transform vs. No Fisher Transform: When averaging a set of correlation coefficients, instead of computing the mean of the correlation coefficients, sometimes one computes the mean of the Fisher transforms of the coefficients (before applying the inverse Fisher transform). This would not be appropriate for our problem because:

  1. It will likely fail. The Fisher transform includes a division by the coefficient minus one, and so explodes when an individual coefficient is near one and outright fails when there is a one. Because we are computing hundreds of thousands of coefficients each with small sample sizes to average over, it is quite likely the Fisher transform will fail for our problem. (Of course, we have a large sample of these coefficients to average over, so our end standard error is not large.)
  2. It is unnecessary for two reasons. First, the advantage of the transform is that it can make the expected average closer to the expected coefficient. We do nothing that assumes this property. Second, as mean coefficients are near to zero, this property holds without the transform, and our coefficients were not large.
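The failure mode in point 1 is easy to reproduce. The sketch below uses the standard form of the Fisher transform, z = 0.5 * ln((1 + r) / (1 - r)), which is the inverse hyperbolic tangent, and averages in z-space before mapping back. The input coefficients are invented for the example.

```python
import math


def fisher_transform(r):
    """Fisher z-transform: 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)


def mean_via_fisher(coefficients):
    """Average in z-space, then map back with the inverse transform (tanh)."""
    zs = [fisher_transform(r) for r in coefficients]
    return math.tanh(sum(zs) / len(zs))


print(mean_via_fisher([0.1, 0.2, 0.3]))  # fine for moderate values
print(fisher_transform(0.999999))        # already large near one

try:
    mean_via_fisher([0.1, 0.2, 1.0])     # one perfect correlation
except ValueError as error:
    print("Fisher transform failed:", error)
```

With small per-query samples, a coefficient of exactly one is not unusual, and a single such value is enough to break the whole average.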

Rebuttals To Recent Criticisms

Two bloggers, Dr. E. Garcia and Ted Dzubia, have published criticisms of our statistics.

Eight months before his current post, Ted Dzubia wrote an enjoyable and jaunty post lamenting that criticism of SEO every six to eight months was an easy way to generate controversy, noting "it’s been a solid eight months, and somebody kicked the hornet’s nest. Is SEO good or evil? It’s good. It’s great. I <3 SEO." Furthermore, his twitter feed makes it clear he sometimes trolls for fun. To wit: "Mongrel 2 under the Affero GPL. TROLLED HARD," "Hacker News troll successful," and "mailing lists for different NoSQL servers are ripe for severe trolling." So it is likely we’ve fallen for trolling…

I am going to respond to both of their posts anyway because they have received a fair amount of attention, and because both posts seek to undermine the credibility of the wider SEO industry. SEOmoz works hard to raise the standards of the SEO industry, and protect it from unfair criticisms (like Garcia’s claim that "those conferences are full of speakers promoting a lot of non-sense and SEO myths/hearsays/own crappy ideas" or Dzubia’s claim that, besides our statistics, "everything else in the field is either anecdotal hocus-pocus or a decree from Matt Cutts"). We also plan to create more correlation studies (and more sophisticated analyses using my aforementioned ranking models) and thus want to ensure that those who are employing this research data can feel confident in the methodology employed.

Search engine marketing conferences, like SMX, OMS and SES, are essential to the vitality of our industry. They are an opportunity for new SEO consultants to learn, and for experienced SEOs to compare notes. It can be hard to argue against such subjective and unfair criticism of our industry, but we can definitively rebut their math.

To that end, here are rebuttals for the four major mathematical criticisms made by Dr. E. Garcia, and the two made by Dzubia.

1) Rebuttal to Claim That Mean Correlation Coefficients Are Uncomputable

For our charts, we compute a mean correlation coefficient. The claim is that such a value is impossible to compute.

Dr. E. Garcia : "Evidently Ben and Rand don’t understand statistics at all. Correlation coefficients are not additive. So you cannot compute a mean correlation coefficient, nor you can use such ‘average’ to compute a standard deviation of correlation coefficients."

There are two issues with this claim: a) peer reviewed papers frequently publish mean correlation coefficients; b) additivity is relevant for determining if two different meanings of the word "average" will have the same value, not if the mean will be uncomputable. Let’s consider each issue in more detail.

a) Peer Reviewed Articles Frequently Compute A Mean Correlation Coefficient

E. Garcia is claiming something is uncomputable that researchers frequently compute and include in peer reviewed articles. Here are three significant papers where the researchers compute a mean correlation coefficient:

"The weighted mean correlation coefficient between fitness and genetic diversity for the 34 data sets was moderate, with a mean of 0.432 +/- 0.0577" (Macquarie University – "Correlation between Fitness and Genetic Diversity", Reed, Frankham; Conservation Biology; 2003)

"We observed a progressive change of the mean correlation coefficient over a period of several months as a consequence of the exposure to a viscous force field during each session. The mean correlation coefficient computed during the force-field epochs progressively…" (MIT – F. Gandolfo, et al; "Cortical correlates of learning in monkeys adapting to a new dynamical environment," 2000)

"For the 100 pairs of MT neurons, the mean correlation coefficient was 0.12, a value significantly greater than zero" (Stanford – E Zohary, et al; "Correlated neuronal discharge rate and its implications for psychophysical performance", 1994)

SEOmoz is in a camp with reviewers from the journal Nature, as well as researchers from MIT, Stanford and authors of 2,400 other academic papers that use the mean correlation coefficient. Our camp is being attacked by Dr. E. Garcia, who argues our camp doesn’t "understand statistics at all." It is fine to take positions outside of the scientific mainstream, although when Dr. E. Garcia takes such a position he should offer more support for it. Given how commonly Dr. E. Garcia uses the pejorative "quack," I suspect he does not mean to take positions this far outside of academic consensus.

b) Additivity Relevant For Determining If Different Meanings Of "Average" Are The Same, Not If Mean Is Computable

Although "mean" is quite precise, "average" is less precise. By "average" one might intend the words "mean", "mode", "median," or something else. One of these other meanings is ‘the value of a function on the union of the inputs’. This last definition of average might seem odd, but it is sometimes used. Consider if someone asked "a car travels 1 mile at 20mph, and 1 mile at 40mph, what was the average mph for the entire trip?" The answer they are looking for is not 30mph, which is the mean of the two measurements, but ~26.7mph, which is the mph for the whole 2 mile trip (2 miles covered in 0.075 hours). In this case, the mean of the measurements is different from the colloquial average, which is the function for computing mph applied to the union of the inputs (the whole two miles).
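The two "averages" in the car example can be computed side by side. The function names below are just labels chosen for this sketch.

```python
def mean_of_speeds(speeds):
    """Arithmetic mean of the individual speed measurements."""
    return sum(speeds) / len(speeds)


def overall_speed(legs):
    """legs is a list of (miles, mph); returns total miles / total hours."""
    total_miles = sum(miles for miles, _ in legs)
    total_hours = sum(miles / mph for miles, mph in legs)
    return total_miles / total_hours


legs = [(1, 20), (1, 40)]
print(mean_of_speeds([mph for _, mph in legs]))  # 30.0
print(round(overall_speed(legs), 2))             # 26.67
```

Both numbers are computable and both are meaningful; they simply answer different questions.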

This may be what has confused Dr. E. Garcia. Elsewhere he cites Statsweb when repeating this claim, and Statsweb makes the point that this other "average" is different than the mean. Additivity is useful in determining if these averages will be different. But even if another interpretation of average is valid for a problem, and even if that other average is different than the mean, it makes the mean neither uncomputable nor meaningless.

2) Rebuttal to Claim About Standard Error of the Mean vs Standard Error of a Correlation Coefficient

Although he has stated unequivocally that one cannot compute a mean correlation coefficient, Garcia is quite opinionated on how we ought to have computed standard error for it. To wit:

E. Garcia: "Evidently, you don’t know how to calculate the standard error of a correlation coefficient… the standard error of the mean and the standard error of a correlation coefficient are two different things. Moreover, the standard deviation of the mean is not used to calculate the standard error of a correlation coefficient or to compare correlation coefficients or their statistical significance."

He repeats this claim even after making the point above about mean correlation coefficients, so he clearly is aware the correlation coefficients being discussed are mean coefficients and not coefficients computed after pooling data points. So let’s be clear on exactly what his claim implies. We have some measured correlation coefficients, and we take the mean of these measured coefficients. The claim is that we should have used the same formula for standard error of the mean of these measured coefficients that we would have used for only one. Garcia’s claim is incorrect. One would use the formula for the standard error of the mean.

The formula for the mean, and for the standard error of the mean, apply even if there is a way to separately compute standard error for one of the observations the mean was over. If we were computing the mean of the count of apples in barrels, lifespans of people in the 19th century, or correlation coefficients for different SERPs, the same formula for the standard error of this mean applies. Even if we have other ways to measure the standard error of the measurements we are taking the mean over – for instance, our measure of lifespans might only be accurate to the day of death and so could be off by 24 hours – we cannot use how we would compute standard error for an observation to compute standard error of the mean of those observations.

A smaller but related objection is over language. He objects to my using the term "standard deviations" in reference to a count of how far away a point is from a mean in units of the mean’s standard error. As Wikipedia notes, the "standard error of the mean (i.e., of using the sample mean as a method of estimating the population mean) is the standard deviation of those sample means." So the count of how many lengths of standard error a number is away from the estimate of a mean, according to Wikipedia, would be standard deviations of our mean estimate. Beyond being technically correct, the phrasing also fit the context, which was the accuracy of the sample mean.

3) Rebuttal to Claim That Non-Linearity Is Not A Valid Reason To Use Spearman’s Correlation

I wrote "Pearson’s correlation is only good at measuring linear correlation, and many of the values we are looking at are not. If something is well exponentially correlated (like link counts generally are), we don’t want to score them unfairly lower.”

E. Garcia responded by citing a source whom he cited as "exactly right": "Rand your (or Ben’s) reasoning for using Spearman correlation instead of Pearson is wrong. The difference between two correlations is not that one describes linear and the other exponential correlation, it is that they differ in the type of variables that they use. Both Spearman and Pearson are trying to find whether two variables correlate through a monotone function, the difference is that they treat different type of variables – Pearson deals with non-ranked or continuous variables while Spearman deals with ranked data."

E. Garcia’s source, and by extension E. Garcia, are incorrect. A desire to measure non-linear correlation, such as exponential correlations, is a valid reason to use Spearman’s over Pearson’s. The point that "Pearson deals with non-ranked or continuous variables while Spearman deals with ranked data" is true in that to compute Spearman’s correlation, one can convert continuous variables to ranked indices and then apply Pearson’s. However, the original variables do not need to originally be ranked indices. If they did, Spearman’s would always produce the same results as Pearson’s and there would be no purpose for it.
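The relationship between the two coefficients can be checked directly. In the sketch below, Spearman's correlation is computed by ranking the inputs and applying Pearson's formula; when the inputs are already ranks the two coefficients agree, and when they are raw values they can differ. The data and helper functions are made up for this illustration, and the simple ranking helper assumes there are no tied values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; this simple version assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's correlation: Pearson's correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Inputs that are already ranks: both coefficients give the same answer
rank_a = [1, 2, 3, 4, 5]
rank_b = [2, 1, 4, 3, 5]
print(pearson(rank_a, rank_b), spearman(rank_a, rank_b))

# Raw, non-ranked inputs: monotonic but not linear, so the answers differ
raw_x = [1, 2, 3, 4, 5]
raw_y = [1, 4, 9, 16, 100]
print(pearson(raw_x, raw_y), spearman(raw_x, raw_y))
```

The second pair is the case that matters for our data: the raw values are continuous, and ranking them is exactly what lets Spearman's coefficient pick up a monotonic relationship that Pearson's coefficient understates.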

My point that E. Garcia objects to, that Pearson’s only measures linear correlation while Spearman’s can measure other kinds of correlation such as exponential correlations, was entirely correct. We can quickly quote Wikipedia to show that Spearman’s measures any monotonic correlation (including exponential) while Pearson’s only measures linear correlation.

The Wikipedia article on Pearson’s Correlation starts by noting that it is a "measure of the correlation (linear dependence) between two variables".

The Wikipedia article on Spearman’s Correlation starts with an example in the upper right showing that a "Spearman correlation of 1 results when the two variables being compared are monotonically related, even if their relationship is not linear. In contrast, this does not give a perfect Pearson correlation."
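To make the difference concrete, here is a minimal sketch in plain Python. The data is invented for illustration (y = e^x over twenty points, not anything from our dataset), and the helper names `pearson`, `ranks` and `spearman` are my own. Spearman’s is computed exactly as described above: convert both variables to ranks, then apply Pearson’s.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's correlation: Pearson's applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: a perfectly exponential (hence monotonic) relationship.
xs = list(range(1, 21))
ys = [math.exp(x) for x in xs]

pearson_r = pearson(xs, ys)
spearman_rho = spearman(xs, ys)
print("Pearson: ", round(pearson_r, 3))
print("Spearman:", round(spearman_rho, 3))
```

On this toy data Spearman’s comes out at 1, because the ranks of x and y line up exactly, while Pearson’s lands well below 1 even though y is completely determined by x.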

E. Garcia’s position neither makes sense nor agrees with the literature. I would go into the math in more detail, or quote more authoritative sources, but I’m pretty sure Garcia now knows he is wrong. After E. Garcia made his incorrect claim about the difference between Spearman’s correlation and Pearson’s correlation, and after I corrected E. Garcia’s source (which was in a comment on our blog), E. Garcia has stated the difference between Spearman’s and Pearson’s correctly. However, we want to make sure there’s a good record of the points, and explain the what and why.

4) Rebuttal To Claim That PCA Is Not A Linear Method

This example is particularly interesting because it is about Principal Component Analysis (PCA), which is related to PageRank (something many SEOs are familiar with). In PCA one finds principal components, which are eigenvectors. PageRank is also an eigenvector. But I digress; let’s discuss Garcia’s claim.

After Dr. E. Garcia criticized a third party for using Pearson’s Correlation because Pearson’s only shows linear correlations, he criticized us for not using PCA. Like Pearson’s, PCA can only find linear correlations, so I pointed out his contradiction:

Ben: "Given the top of your post criticizes someone else for using Pearson’s because of linearity issues, isn’t it kinda odd to suggest another linear method?"

To which E. Garcia has responded: "Ben’s comments about… PCA confirms an incorrect knowledge about statistics" and "Be careful when you, Ben and Rand, talk about linearity in connection with PCA as no assumption needs to be made in PCA about the distribution of the original data. I doubt you guys know about PCA…The linearity assumption is with the basis vectors."

But before we get to the core of the disagreement, let me point out that E. Garcia is close to correct with his actual statement. PCA defines basis vectors such that they are linearly de-correlated, so it does not need to assume that they will be. But this is a minor quibble. The issue with Dr. E. Garcia’s position is the implication that the linear aspect of PCA is not in the correlations it finds in the source data, as I claimed, but only in the basis vectors.

So, there is the disagreement – analogous to how Pearson’s Correlation only finds linear correlations, does PCA also only find linear correlations? Dr. E. Garcia says no. SEOmoz, and many academic publications, say yes. For instance:

"PCA does not take into account nonlinear correlations among the features" ("Kernel PCA for HMM-Based Cursive Handwriting Recognition"; Andreas Fischer and Horst Bunke 2009)

"PCA identifies only linear correlations between variables" ("Nonlinear Principal Component Analysis Using Autoassociative Neural Networks"; Mark A. Kramer (MIT), AIChE Journal 1991)

However, besides citing authorities, let’s consider why his claim is incorrect. As E. Garcia imprecisely notes, the basis vectors are linearly de-correlated. As the sources he cites point out, PCA tries to represent the source data as linear combinations of these basis vectors. This is how PCA shows us correlations – by creating basis vectors that can be linearly combined to get close to the original data. We can then look at these basis vectors and see how aspects of our source data vary together, but because PCA only combines them linearly, it only shows us linear correlations. Therefore, PCA is used to provide an insight into linear correlations — even for non-linear data.
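A small sketch can illustrate the point. It runs a two-variable PCA by hand in plain Python (standardize, build the 2x2 covariance matrix, take its eigenvalues in closed form) on two invented datasets: one where y is a linear function of x, and one where y = x² over a range symmetric around zero. The function names and the data are my own assumptions for illustration, not part of our study.

```python
import math

def standardize(values):
    """Centre values on zero and scale them to unit sample variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def pca_2d_eigenvalues(xs, ys):
    """Eigenvalues (largest first) of the covariance matrix of the
    standardized variables, i.e. the variance along each principal component."""
    zx, zy = standardize(xs), standardize(ys)
    n = len(zx)
    a = sum(x * x for x in zx) / (n - 1)               # var(x)
    c = sum(y * y for y in zy) / (n - 1)               # var(y)
    b = sum(x * y for x, y in zip(zx, zy)) / (n - 1)   # cov(x, y)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    centre = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return centre + spread, centre - spread

def first_component_share(xs, ys):
    """Fraction of total variance captured by the first principal component."""
    largest, smallest = pca_2d_eigenvalues(xs, ys)
    return largest / (largest + smallest)

# Hypothetical data: x runs symmetrically from -5 to 5.
xs = [x / 10 for x in range(-50, 51)]
linear = [2 * x + 1 for x in xs]   # y is a linear function of x
parabola = [x * x for x in xs]     # y is fully determined by x, but not linearly

linear_share = first_component_share(xs, linear)
parabola_share = first_component_share(xs, parabola)
print("linear:  ", round(linear_share, 3))
print("parabola:", round(parabola_share, 3))
```

For the linear case the first component captures essentially all of the variance. For the parabola it captures about half, which is what PCA reports for two unrelated variables, even though y is an exact function of x. The covariance matrix PCA works from simply has no way to record that kind of relationship.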

5) Rebuttal To Claim About Small Correlations Not Being Published

Ted Dzubia suggests that small correlations are not interesting, or at least are not interesting because our dataset is too small. He writes:

Dzubia: "out of all the factors they measured ranking correlation for, nothing was correlated above .35. In most science, correlations this low are not even worth publishing. "

Academic papers frequently publish correlations of this size. On the first page of a Google Scholar search for "mean correlation coefficient" I see:

  1. The Stanford neurology paper I cited above to refute Garcia is reporting a mean correlation coefficient of 0.12.
  2. "Meta-analysis of the relationship between congruence and well-being measures", a paper with over 200 citations whose abstract cites coefficients of 0.06, 0.15, 0.21, and 0.31.
  3. "Do amphibians follow Bergmann’s rule" which notes that "grand mean correlation coefficient is significantly positive (+0.31)."

These papers were not cherry picked from a large number of papers. Contrary to Ted Dzubia’s suggestion, the size of a correlation that is interesting varies considerably with the problem. For our problem, looking at correlations in Google results, one would not expect any single high correlation value from features we were looking at unless one believes Google has a single factor they predominately use to rank results with and one is only interested in that factor. We do not believe that. Google has stated on many occasions that they employ more than 200 features in their ranking algorithm. In our opinion, this makes correlations in the 0.1 – 0.35 range quite interesting.
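As a rough illustration of why many factors imply small individual correlations, consider a deliberately simple toy model: a score that is the plain sum of 200 independent, equally weighted random factors. This is an assumption made only for the sketch and not a claim about how Google combines its features. Under that assumption, the Pearson correlation between any one factor and the score is 1/sqrt(200), or roughly 0.07.

```python
import math
import random

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_single_factor_correlation(n_factors, n_samples, seed=0):
    """Correlation between one factor and a score that sums n_factors
    independent standard-normal factors (toy model, equal weights)."""
    rng = random.Random(seed)
    first_factor = []
    scores = []
    for _ in range(n_samples):
        factors = [rng.gauss(0, 1) for _ in range(n_factors)]
        first_factor.append(factors[0])
        scores.append(sum(factors))
    return pearson(first_factor, scores)

n_factors = 200
theoretical = 1 / math.sqrt(n_factors)
simulated = simulate_single_factor_correlation(n_factors, n_samples=2000)
print("theoretical:", round(theoretical, 3))
print("simulated:  ", round(simulated, 3))
```

In a model like this, a factor correlating at 0.2 or 0.3 with the outcome would be carrying far more than an equal share of the weight, which is the sense in which correlations of that size are interesting.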

6) Rebuttal To Claim That Small Correlations Need A Bigger Sample Size

Dzubia: "Also notice that the most negative correlation metric they found was -.18…. Such a small correlation on such a small data set, again, is not even worth publishing."

Our dataset was over 100,000 results across over 11,000 queries, which is much more than sufficient for the size of correlations we found. The risk when having small correlations and a small dataset is that it may be hard to tell if correlations are statistical noise. Generally 1.96 standard deviations is required to consider results statistically significant. For the particular correlation Dzubia brings up, one can see from the standard error value that we have 52 standard deviations of confidence the correlation is statistically significant. 52 is substantially more than the 1.96 that is generally considered necessary.
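The check itself is simple enough to sketch. The numbers below are hypothetical, chosen only to show the arithmetic, and are not the standard errors from our data; the function names are mine. The idea is to divide the estimate by its standard error and compare the result with 1.96.

```python
def standard_errors_from_zero(estimate, standard_error):
    """How many standard errors the estimate lies from zero."""
    return abs(estimate) / standard_error

def is_significant(estimate, standard_error, threshold=1.96):
    """True when the estimate is more than `threshold` standard errors from zero."""
    return standard_errors_from_zero(estimate, standard_error) > threshold

# Hypothetical example: the same correlation of 0.10 measured on a large
# sample (small standard error) and on a small sample (large standard error).
large_sample = standard_errors_from_zero(0.10, 0.005)
small_sample = standard_errors_from_zero(0.10, 0.08)

print(round(large_sample, 2), is_significant(0.10, 0.005))
print(round(small_sample, 2), is_significant(0.10, 0.08))
```

The same small correlation clears the 1.96 bar easily when the standard error is tiny and fails it when the standard error is large, which is why the size of the dataset matters more than the size of the correlation here.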

We use a sample size so much larger than usual because we wanted to make sure the relative differences between correlation coefficients were not misleading. Although we feel this adds value to our results, it is beyond what is generally considered necessary to publish correlation results.

Conclusions

Some folks inside the SEO community have had disagreements about our interpretations and opinions regarding what the data means (and where/whether confounding variables exist to explain some points). As Rand carefully noted in our post on correlation data and his presentation, we certainly want to encourage this. Our opinions about where/why the data exists are just that – opinions – and shouldn’t be ascribed any value beyond their use in informing your own thinking about the data sources. Our goal was to collect data and publish it so that our peers in the industry could review and interpret.

It is also healthy to have a vigorous debate about how statistics such as these are best computed, and how we can ensure accuracy of reported results. As our community is just starting to compute these statistics (Sean Weigold Ferguson, for example, recently submitted a post on PageRank using very similar methodologies), it is only natural there will be some bumbling back and forth as we develop industry best practices. This is healthy, and it is to our industry’s advantage that it occur.

The SEO community is the target of a lot of ad hominem attacks which try to associate all SEOs with the behavior of the worst. Although we can answer such attacks by pointing out great SEOs and great conferences, it is exciting that we’ve been able to elevate some attacks to include mathematical points, because when they are arguing math they can be definitively rebutted. On the six points of mathematical disagreement, the tally is pretty clear – SEO community: Six, SEO bashers: zero. Being SEOs doesn’t make us infallible, so surely in the future the tally will not be so lopsided, but our tally today reflects how seriously we take our work and how we as a community can feel good about using data from this type of research to learn more about the operations of search engines.

Must-Have SEO Recommendations: Step 7 of the 8-Step SEO Strategy

Posted by laura

This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

You know the client.  The one that really needs your help.  The one that gets pumped when you explain how keywords work.  The one that has an image file for a site.  Or maybe the one that insists that if they copy their competitor’s title tags word-for-word, they’ll do better in search results (I had a product manager make his team do that once. Needless to say (I was thrilled when) it didn’t work). 

In Step 6 of the SEO Strategy document I noted that this strategy document we’ve been building isn’t a best practices document, and it’s more than a typical SEO audit.  It is a custom set of specific, often product-focused recommendations and strategies for gaining search traffic.  For that reason I recommended linking out to SEO basics and best practices elsewhere (in an intranet or a separate set of documents).

But most of the time you’ll still need to call out some horizontal things that this client must have put in front of their faces, or else it will be missed completely.  SEO/M is your area of expertise, not theirs, so help them make sure they’ve got their bases covered. You can create an additional section for these call-outs, wherever you feel it is appropriate in your document.

WHAT CAN I INCLUDE HERE?

Here are some examples of things you could include if you felt your client needed this brought to their attention:

  1. Press Release optimization and strategy
  2. SEO resources for specific groups in the company:
    1. SEO for business development (linking strategies in partner deals)
    2. SEO for writers/editorial
    3. SEO for designers
  3. SEO for long term results rather than short term fixes
  4. International rollout recommendations
  5. Content management system – how it is impairing their SEO
  6. Risks and avoidances
  7. Anything that you feel should be covered in more detail for this particular client, that wasn’t covered in your strategy in the last step. This is a catchall – a place to make sure you cover all bases.
  8. Nothing – if you don’t feel it’s needed.

If the client really needs a lot of help, you’d want to provide training and best practices, either as separate deliverables along with the strategy document, or better yet – work on training and best practices with them first, then dive into more specific strategy. You don’t want to end up with a 15 page (or even 4 page for that matter) best practices document in your strategy doc. Remember, we’re beyond best practices here, unless, in this case there’s something specific that needs to be called out.  

If the client needs more than one thing called out, do it.  If it’s several things, consider either adding an appendix, or as I mentioned, creating a separate best practices document.

The reason I recommend best practices as a separate document is because it is really a different project, often for an earlier phase.

EXAMPLE 1:

Let’s say, for example, my client has the type of content the press loves to pick up. They don’t do press releases, mostly because they don’t know exactly how to write them and where to publish them, but they want to.  I’ll add a Press Releases section after the strategy and I might give them these simple tidbits:

  • High level benefit of doing press releases
  • What person or group in the company might be best utilized to manage press releases
  • Examples of what to write press releases about
  • Channels they can publish press releases to
  • Optimization tips
  • References they can go to for more detailed information

EXAMPLE 2:

My client gets it. They’re pretty good at taking on most SEO on their own. This strategy document I’m doing for them is to really dig in and make sure all gaps are closed, and that they’re taking advantage of every opportunity they should.  Additionally, in a few months they are going to roll out the site to several international regions. 

My digging into the site and its competitors (and search engines) for this strategy has all been for the current site in this country. Because the international rollout hasn’t started yet, I will add a section to my document with specific things they need to keep in mind when doing this rollout.

  • Localized keyword research (rather than using translate tools)
  • ccTLD  (country code top level domain) considerations
  • Tagging considerations (like “lang”)
  • Proper use of Google Webmaster Tools for specifying region
  • Potential duplication issues
  • Maybe even a list of popular search engines in those countries
  • Point to more resources or list as a potential future contract project

Make sense?  Use your judgment here. Like we’ve seen in the rest of the steps, this strategy document is your work of art, so paint it how your own creative noggin sees it, Picasso.

Other suggestions for what you might include here? Love it? Hate it? Think this step stinks or mad I didn’t include music to listen to for this one? Let’s hear about it in the comments!

How To Combine Brand & SEO

Our latest feed from SEOBOOK

Patience is an SEO Virtue

Posted by Kate Morris

We have all been there once or twice, maybe even a few more times than that. You just launched a site or a project, a few days pass, and you log in to analytics and webmaster tools to see how things are going. Nothing is there. 

WAIT. What?!?!?! 

Scenarios start running through your mind, and you check to make sure everything is working right. How could this be?

It doesn’t even have to be a new project. I’ve realized things on clients’ sites that needed fixing: XML sitemaps, link building efforts, title tag duplication, or even 404 redirection. The right changes are made, and a week later, nothing has changed in rankings or in webmaster consoles across the board. You are left thinking "what did I do wrong?"

A few client sites, major sites mind you, have had issues recently like 404 redirection and toolbar PageRank drops. One even had to change a misplaced setting in Google Webmaster Tools pointing to the wrong version of their site (www vs non-www). We fixed it, and there was a drop in their homepage for their name.

That looks bad. Real bad. Especially to the higher ups. They want answers and the issue fixed now … yesterday really.

Most of these things are being measured for performance and some can even have a major impact on the bottom line. And it is so hard to tell them this, even harder to do, but the changes just take …

Patience

That homepage drop? They called on Friday; as of Saturday night things are back to normal. The drop happened for 2-3 days most likely, but this is a large site. Another client, smaller, had redesigned their entire site. We put all the correct 301 redirects in place for the old pages and launched the site. It took Google almost 4 weeks to completely remove the old pages from the index. Edits to URLs that caused 404 errors were fixed within a day, but took over a week to be reflected in Google Webmaster Tools. 

These are just a few examples where changes were made immediately, but the actions had no immediate return. We live in a society that thrives on the present, immediate return. As search marketers, we make c-level executives happy with our ability to show immediate returns on our campaigns. But like the returns on SEO, the reflection of changes in SEO takes time. 

The recent Mayday and Caffeine updates are sending many sites to the bottom of rankings because of the lack of original content. Many of them are doing everything "right" in terms of onsite SEO, but now that isn’t enough. They can change their site all they want to, but until there is relevant and good content plus traffic, those rankings are not going to return for long tail terms. 

There has also been a recent crackdown on over-optimized local search listings. I have seen a number of accounts suspended or just not ranking well because they are in effect trying too hard. There is such a thing as over-optimizing a site, and too many changes at once can raise a flag with the search engines. 

One Month Rule

Here is my rule: Make a change, leave it, go do social media/link building, and come back  to the issue a month later. It may not take a month, but for smaller sites, 2 weeks is a good time to check on the status of a few things. A month is when things should start returning to normal if there have been no other large changes to the site. 

We say this all the time with PPC accounts. It’s like in statistical analysis, you have to have enough data to work with to see results. And when you are waiting for a massive search engine to make some changes, once they do take effect in the system, you then have to give it time to work. 

So remember the next time something seems to be not working in Webmaster Tools or SERPs:

  1. If you must, double check the code (although you’ve probably already done this 15 times) to ensure it’s set up correctly. But then,
  2. Stop. Breathe. There is always a logical explanation. (And yes, Google being slow is a logical one)
  3. When did you last change something to do with the issue?
  4. If it’s less than 2 weeks ago, give it some more time.
  5. Major changes, give it a month. (Think major site redesigns and URL restructuring)

New Directories

We’ve revived and added new directories to the ever-expanding Naper Design Business Network. While the number of Business Directories is currently limited, there will be many more to come.  We are using the DirectoryPress produced by Mark Fail, and will be asking for everyone to list a business that they know of.  Our hope is to add 200 businesses a month for each directory.  Here is a list of the first batch….  more will be added in days to come.

Naperville Business Directory

Raleigh Business Directory (serving the entire Triangle Area)

Chicago Business Directory

Aurora Business Directory (Aurora, Illinois)

Charleston Business Directory (For the entire Tri-County and Low Country Area)

There will be many more directories added to this list and all will be managed for accuracy.   After implementation on July 1st, listings will be automatic for all registered users of each site.

Leave your thoughts, and we’ll see what we can add to the directory plot that will make it more user friendly.

And the winner of the BlueGlass LA prize is…

The anticipation was killing me, so I decided to let the news go a few minutes early. Without further ado, I would like to announce who the BlueGlass LA moderators announced as the winner of a free ticket and hotel room for BlueGlass LA!

Drum roll pleaaaaaaaaaaaaaaaaaaaaaaaaase!

*bang bang bang bang bang* …. *tap tap tap tap tap*????

okay… pathetic drum roll…but really, who can type that sound out without it coming off as weird? So yeah- the winner is…..

@JoshuaTitsworth!

He came through by an astounding landslide, and he has also promised us a video of his tweet. For those of you who haven’t been following the contest and didn’t see what he so deeply promised to share, this is it:

@JoshuaTitsworth: I would cross dress as Paris Hilton and sing Miley Cyrus to go to #blueglassla http://bit.ly/blueglass

He also posted up a blog post telling us all lovely reasons why we should not go to BlueGlass LA.

I would like to give honorable mention to @kristibug for coming in second- she put up a good fight and also had a blog post up. Many thanks :)

Last, but certainly not least… we absolutely had to give funniest tweet to this one:

@TallChickVic: I would cross dress as @JoshuaTitsworth dressed up as Paris Hilton & sing Miley Cyrus #blueglassla http://bit.ly/blueglass ;)

So I want to thank everybody who participated in this! We had tons of funny tweets come through, and I think this has been our most fun contest yet. Hope to see everyone out there! Don’t forget, the BlueGlass LA Early Bird pricing ends today, so make sure to grab your ticket at the price of $495 by 11:55 PM EST.

Sign up with the discount code sejreader for an additional 10% off the Early Bird price.

5 Simple Steps to Viral Video Results

Let’s start off funny. The following are NOT the 5 steps to viral video results:

  1. Sit around drinking and talking about funny video ideas.
  2. List a bunch of successful viral videos and come up with knock-off ideas to copy them.
  3. Call every funny video idea “viral” before it’s even created and before someone has ever seen it and before anyone has ever passed it to anyone.
  4. Let your clients disapprove all your good ideas and then run with the lame ones.
  5. Create videos without thinking about distribution because OF COURSE it’s going to go viral after just the first person views it.

You get the point.

I recently interviewed a couple of people for my Social Media Expert Interview series – Scott Stratten of UnMarketing fame and Carrie Wilkerson, The Barefoot Executive – who gave me a new perspective on viral video creation.

Why You Shouldn’t Try To Be Funny

The upshot is: funny videos are the hardest to get to go viral. Sense of humor is very personal. And it doesn’t matter if 100,000 people see your skateboarding dog catch fire and faceplant if you don’t get anything out of all those video views. Unless you’re just trying to have fun. But if you care about results, keep reading.

There’s a simpler way. And you can tie it to a conversion event you want to get.

A More Effective Viral Video Style

Just create what I call the “Emotional Slideshow” (because there was no name for it and that’s all I could think of on the spot) type of viral video.

These are nothing new, but they work like gangbusters.

The Time Movie has received 1.4MM views despite being very simple and cheesier than a Fabio movie backed by a Yanni soundtrack. Scott admits to being sick of it. But his goal was to get motivational speaking gigs and launch his speaking career without years of painful free gigs, and it worked.

The Boss Movie helped Carrie Wilkerson build a list of 24,000 work-at-home women to market to in just 9 months.

The Crappy Day Movie just debuted and is my first attempt at one of these Emotional Slideshow movies. But it has its own Facebook page and I hear those are really hard to start and very, very expensive.

The Five Steps to Creating An “Emotional Slideshow” Viral Video

  1. Know your audience
  2. List their three biggest problems and three biggest obstacles (you’ll have 6 points)
  3. List their three biggest dreams (not goals), then three examples each of life with those dreams fulfilled (you’ll have 9 points)
  4. Create a line of negative and positive affirmations for each of those 15 points above
  5. Find an emotionally evocative image for each point, and music for the entire slideshow, and create the movie

You can have more points, but you want each image and sentence to last about 7 seconds, and the full movie to be 3-4 minutes.
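If you want to check your timing before you start hunting for images, the arithmetic is quick to sketch. This assumes one image-and-sentence slide per line at a flat 7 seconds each; the function names are mine, and your actual pacing will vary with the music.

```python
SECONDS_PER_SLIDE = 7  # the rough per-slide duration suggested above

def runtime_seconds(n_slides, seconds_per_slide=SECONDS_PER_SLIDE):
    """Total running time for a slideshow of n_slides."""
    return n_slides * seconds_per_slide

def slides_for_minutes(minutes, seconds_per_slide=SECONDS_PER_SLIDE):
    """Approximate number of slides needed to fill the given running time."""
    return round(minutes * 60 / seconds_per_slide)

print(runtime_seconds(15))    # one slide per point
print(runtime_seconds(30))    # two slides per point
print(slides_for_minutes(3))  # slides needed for a 3 minute movie
print(slides_for_minutes(4))  # slides needed for a 4 minute movie
```

Fifteen slides at 7 seconds each comes to 105 seconds, which is under two minutes. If each point gets both a negative and a positive line (one way to read step 4), you have 30 slides and 210 seconds, or three and a half minutes. Either way, a 3-4 minute movie works out to roughly 26 to 34 slides.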

Then, of course, watch it on several occasions and have other people proof it, especially if they’re your target audience.

Getting Results

If you watched The Boss Movie, you’ll notice Carrie brings up an opt-in page after the movie for a pdf about the 7 Things Your Boss Doesn’t Want You To Know. This is a bribe that goes straight to the audience’s core problem- the limitations of employment. When conceiving your bribe, make sure you start with titles and think like a copywriter before you create the content you’re going to give away.
