Search Engine Optimization: There are No Free Lunches

January 12th, 2007

I don’t think there is any area of the web which has more snake oil being peddled than the area of Search Engine Optimization, aka SEO.

I get emails every day from outfits guaranteeing that I will get a top-ten listing on Google, Yahoo, and MSN… All I need to do is sign up with these guys and pay them a lot of money. Amazing… they don’t know what I am selling, what my website content looks like, or how much competition I have… But they can still guarantee me a top-ten listing. How can they do that?

The answer, of course, is they can’t. This is the web-age equivalent of the old UHF TV ads for veggiematics or some other poorly manufactured household product. You will pay them some amount (and it will vary all over the map… from a few bucks to thousands of dollars depending on how dumb you are) and, wonder of wonders, you won’t get a top-ten listing on Google, Yahoo, or MSN. The only difference between the ten dollar haircut and the thousand dollar one will be the ingenuity and creativity of their explanations of what went wrong.

So now you are both ripped off and ticked off… Are you going to sue them? If you only paid a few hundred it’s not worth it… If you paid them thousands then either they’re long gone to some South American country or they’ve put half their ill-gotten gains into hiring fancy lawyers to defend themselves.

The actual ranking algorithms of the major search engines are closely guarded secrets and there is a whole industry of folks reading tea leaves trying to reverse-engineer them.

But I think one can establish some pretty good guidelines based on simple common sense. So here are Denholm’s Common Sense Laws of Search Engine Optimization:

  1. The major search engines are genuinely trying to produce search results that are as useful as possible to their user base.
  2. What makes a website useful to their user base is content.
  3. The only thing that search engine spiders and crawlers can analyze is text. So it is only text (and not images) that matters.
  4. Apart from the content itself, the only thing that will matter to search engines in terms of ranking will be the number and quality of other sites that link to yours.
  5. The only additional thing you can do is to make sure that your content is properly indexed using metatags and h1, h2, h3 element tags.

Any attempt to “game the system” will only have temporary success and will, ultimately, backfire once the search engines figure out how the gaming works. Note that I am talking about “natural ranking”, not sponsored links or pay-for-click. The latter are, indeed, available to the highest bidder. But it is the natural rank that matters. Most search engine users either knowingly or instinctively ignore sponsored and pay-for-click listings.

So, the bottom line is that you want to have your website filled with as much useful text content as possible. Use meta-tags and h# tags to highlight for the search engines which keywords best describe your content. Get authoritative incoming links that tell the search engines that others find your content of value. Do not establish reciprocal links with sites which have no logical connection to yours. That is merely the latest form of gaming the system and it will not help… And it probably hurts.
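To make the point about meta-tags and h# tags concrete, here is a minimal Python sketch of what a crawler could pull out of a page as its "index card": the title, the keywords and description meta-tags, and the h1 to h3 headings. The sample page and the names IndexableTextParser and summarize are made up for illustration; this is not how any real search engine works internally, just a way to see your own page the way a text-only reader would.

```python
from html.parser import HTMLParser


class IndexableTextParser(HTMLParser):
    """Collects the title, keywords/description meta-tags, and h1-h3 headings."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # e.g. {"keywords": "...", "description": "..."}
        self.headings = []    # list of (tag, text) pairs in page order
        self._current = None  # tag whose text is being collected right now
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") in ("keywords", "description"):
            self.meta[attrs["name"]] = attrs.get("content") or ""
        elif tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so the text reads as one line.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None


def summarize(html):
    """Parse a page and return the parser holding what was found."""
    parser = IndexableTextParser()
    parser.feed(html)
    return parser


# Hypothetical page for a kayak dealer (invented for this example).
SAMPLE_PAGE = """<html><head><title>Sea Kayaks for Sale</title>
<meta name="keywords" content="kayak, sea kayak, paddles">
<meta name="description" content="A hypothetical kayak dealer.">
</head><body><h1>Sea Kayaks</h1><p>Some text.</p>
<h2>Touring models</h2><img src="boat.jpg"></body></html>"""

if __name__ == "__main__":
    page = summarize(SAMPLE_PAGE)
    print(page.title)
    print(page.meta)
    print(page.headings)
```

Note that the image contributes nothing to the summary, which is the point of Law 3.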

Good incoming links are likely to be those from professional organizations, chambers of commerce, trade groups, large companies, government agencies, universities, etc. One of our customers is a kayak dealer. He has incoming links from about 20 kayak manufacturers. My belief is that has a serious, positive impact on his search engine ranking.

And even if you have all the above in place, your ranking will still depend on how many other websites are competing in the same conceptual space and how good their content is, and their incoming links, and their keyword indexing.

There are no magic bullets and anyone who claims they can guarantee a high search engine ranking on the major search engines is blowing smoke.

That is not to say that you shouldn’t try to optimize your site for search engine visibility. I frequently see sites that seem to have been designed for stealth instead of visibility.

Apart from having good content, good indexing of said content, and incoming links from good sites that have a logical content-based reason to link to you, there are some things that you can do and some things you can avoid:

  • Do not create entire sites using Macromedia Flash (or any other dynamic architecture such as PHP/MySQL, Cold Fusion, or ASP). The search engines cannot be bothered figuring out how to analyze websites where navigation is dynamic and all content delivered dynamically will be ignored. The sole exception to this is where your dynamic mechanism uses mod_rewrite (or an equivalent) to mimic static links. (Note that Flash is fine for individual animated images… Just don’t build the site navigation using it.)
  • Do write compliant, valid HTML or XHTML and do validate your HTML/XHTML against the W3C validation engines. The search engine spiders and ranking algorithms depend on your code being correctly written in order to analyze your site. If your code is poorly written and incorrect, they will ignore your site. Note that your site can look fine to a human viewer using a browser and still be “broken” from a spider’s point of view.
  • Do use alt attributes (often called “alt-tags”) if your site is based on image content. The search engines “like” images but the alt text is the only way they “understand” what the images represent.
  • Do submit your websites to the search engines when you first create them and after major changes or updates.
  • Do promote your website address in non-web channels. Use it in print advertising, your letterhead, business cards, press-releases. This does not help directly with your ranking but it does increase traffic to your site… And a surprising number of people actually search for a web address rather than enter it directly in the browser.
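As a quick way to act on the alt-tag advice in the list above, here is a small Python sketch that scans a page and reports which images carry a description and which do not. The class name AltTextAudit, the function audit_images, and the file names in the sample are all invented for illustration.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Sorts <img> elements into described (has alt text) and missing."""

    def __init__(self):
        super().__init__()
        self.described = {}  # image src -> alt text
        self.missing = []    # image src values with no usable alt text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or "(no src)"
        alt = (attrs.get("alt") or "").strip()
        if alt:
            self.described[src] = alt
        else:
            # An absent alt and an empty alt both tell a spider nothing.
            self.missing.append(src)


def audit_images(html):
    """Return the audit object after scanning the page."""
    audit = AltTextAudit()
    audit.feed(html)
    return audit


if __name__ == "__main__":
    sample = ('<img src="a.jpg" alt="Red sea kayak">'
              '<img src="b.jpg"><img src="c.jpg" alt="">')
    result = audit_images(sample)
    print("described:", result.described)
    print("missing:", result.missing)
```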

Problems with spam filters…

January 10th, 2007

Over the last few months I have noticed an increasing problem with legitimate emails getting blocked by poorly conceived and configured spam filters.

It appears that many ISPs (Internet Service Providers) are responding to complaints by their customers (about the torrent of spam they receive) by modifying and reconfiguring their spam filtering technology.

Unfortunately, these responses have not been well thought out and implemented.

I will provide a couple of examples but first I need to define some terms:

  • User PC – the personal computer used by an end-user to send or receive an email
  • Email Client – the software on the User PC used to compose and read email
  • Email Server – the server used by multiple users to handle both incoming and outgoing emails
  • ISP – a provider of internet connectivity to individuals and small businesses (includes big phone companies like Verizon, big cable TV operators like Comcast, and smaller local providers).
  • Spammer – the bad guys, typically using someone else’s hijacked PC’s or servers to broadcast thousands of undesirable junk emails to huge lists of email addresses

Example #1

I recently sent an email to a customer (a firm of architects) and had it bounced back to me as spam by the firm’s ISP. In this case the ISP was a small, local outfit but a lot of the big guys are doing the same thing.

I created the email on my Windows XP computer using my email client (Mozilla Thunderbird). My email client is configured to use myname@salemdesign.com as the “From” address. It is also configured to send email via an email server provided by the hosting service I use to host salemdesign.com. This email server has a name along the lines of smtp.hostingserverdomain.net.

Why was my legitimate email bounced back to me? The local ISP analyzed the header information of my email and found that the email server I was using was not part of the same domain as my email address and decided that this meant my email was spam.

My “From” address was myname@salemdesign.com which had a different domain than the email server smtp.hostingserverdomain.net. The email header also includes the numeric IP address of the originating SMTP email server which would also be associated with my hosting service rather than my salemdesign.com domain.

Now, it is true that essentially all spam is sent from SMTP servers which are not associated with their purported “From” or “Reply-to” email addresses. But it is also true that the vast majority of small to medium size businesses and organizations do not have their own SMTP email servers. So this policy of blocking emails where the “From” domain is not associated with the originating email server means that a lot of legitimate emails are being blocked.
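I do not know exactly what rule the architect’s ISP applied, but the behavior I saw is consistent with something as simple as the following Python sketch. It compares the domain of the “From” address with the domain of the sending server and cries spam on any mismatch. The function names are made up, and the “last two labels” shortcut for finding a domain is a deliberate simplification.

```python
from email.utils import parseaddr


def registered_domain(hostname):
    """Crude guess at the owning domain: keep the last two dotted labels.

    Assumption for illustration only; it is wrong for names like
    example.co.uk, where a real filter would need a proper suffix list.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def naive_filter_flags_as_spam(from_header, sending_server):
    """Return True when the From domain and the server domain differ."""
    _, address = parseaddr(from_header)
    from_domain = registered_domain(address.rpartition("@")[2])
    return from_domain != registered_domain(sending_server)


if __name__ == "__main__":
    # My situation: a legitimate email sent through my hosting service.
    print(naive_filter_flags_as_spam(
        "myname@salemdesign.com", "smtp.hostingserverdomain.net"))  # True
    # The only case this rule lets through: running your own mail server.
    print(naive_filter_flags_as_spam(
        "myname@salemdesign.com", "mail.salemdesign.com"))  # False
```

The first call is the whole problem in one line: a rule this blunt treats every small business that rents its email server as a spammer.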

In this particular instance, I called up my customer on the phone and explained the situation and asked for his fax number… And then I faxed him what I would have otherwise emailed.

Note that this means that this firm of architects is having some portion of its legitimate incoming email blocked. If the sender is not that motivated, he or she may just shrug and “walk away” and some potential business is lost.

The big email providers such as Yahoo, MSN/Hotmail, and Google/GMail also follow this practice. They don’t block unassociated emails but they do automatically put them in the end-user’s spam or bulk folder.

SPF Record

There is a partial work-around but it’s a little complicated and very few small or medium size businesses know enough to apply it.

The partial work-around is called the SPF Record. SPF stands for Sender Policy Framework and you can find out more about it at www.openspf.org. But in essence, the SPF record is part of a domain’s DNS records (it is published as a TXT record) and it provides a list of domains and servers that can legitimately send email associated with that domain.

So I set up an SPF Record that included the email server information for my hosting provider’s email server (and a third party email server I use). I waited a few hours and then sent test emails to accounts I have on Yahoo, MSN, and GMail… And it worked. None treated my emails as spam.

I then sent a test email to my architect customer and it also got through fine.

To summarize… Once the SPF record is implemented, an ISP (such as Yahoo) receiving an email purporting to be from me@salemdesign.com can check the DNS records for salemdesign.com and confirm that the SMTP server I used was authorized to send salemdesign.com emails.
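Here is a rough Python sketch of the check a receiving server can make once an SPF record exists. It is far from a full SPF implementation: it only understands the “ip4:” mechanism and ignores “include:”, “a”, “mx”, and the rest, and it takes the record as a string instead of looking it up in DNS. The record and the IP addresses below are invented (they come from the address ranges reserved for documentation), not my real settings.

```python
import ipaddress

# Hypothetical SPF record: two permitted senders, everything else rejected.
EXAMPLE_SPF = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.25 -all"


def ip4_networks(spf_record):
    """Return the networks named by ip4: mechanisms that grant a pass."""
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    networks = []
    for term in terms[1:]:
        term = term.lstrip("+")  # "+" (pass) is the default qualifier
        if term.lower().startswith("ip4:"):
            # A bare address such as 198.51.100.25 becomes a /32 network.
            networks.append(ipaddress.ip_network(term[4:], strict=False))
    return networks


def sender_is_authorized(spf_record, sender_ip):
    """True if the sending server's IP falls inside a permitted network."""
    ip = ipaddress.ip_address(sender_ip)
    return any(ip in network for network in ip4_networks(spf_record))


if __name__ == "__main__":
    print(sender_is_authorized(EXAMPLE_SPF, "192.0.2.17"))    # True
    print(sender_is_authorized(EXAMPLE_SPF, "203.0.113.9"))   # False
```

The useful property is that the list lives with the domain owner, so the receiving side no longer has to guess whether a server with a different name is allowed to send for salemdesign.com.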

Example #2

My hosting service is very good about ensuring that their servers are not used for spamming. This is critical because each of their servers hosts dozens of domains and they all share a single email server. If any one of those domains is sending spam (either deliberately or because they were hacked) then all mail from that email server gets blocked or labeled as spam by the big ISPs.

But recently a new problem has arisen with one of the larger ISPs (Comcast).

The scenario is that someone hosting their domain with the hosting service decides to forward their email (sent to, for example, userA@theirdomain.com) to an email address they have on Comcast (e.g. userA@comcast.net). Presumably Comcast is the ISP for either their small business or their home computer.

This should not be a problem but it is… Because Comcast views any spam that is being forwarded as being generated by the forwarding server and Comcast then blocks said forwarding server. This is causing such a problem that the hosting service has banned anyone from forwarding email to Comcast addresses. This is, of course, hurting Comcast’s own customers… Some are finding that they are not allowed to forward emails to their own Comcast email accounts… And even more are having legitimate emails sent to them blocked.

I have a number of customers with Comcast addresses, and I now send emails to them via my GMail account. That seems to work fine even though I am sure a lot of spam gets forwarded via GMail. Perhaps Comcast is scared of blocking emails from an entity as large as Google.

Dealing with Spam

October 31st, 2006

I am writing this with a view to advising clients who have their email hosted on my server environment but much of the discussion is relevant to anyone with an internet email address.

First some definitions:

Email Server: This is the computer out there somewhere on the internet that handles all the mail being sent to individual accounts under a given domain. For example, if you have an email account with Verizon or Comcast (e.g. yourname@verizon.net or yourname@comcast.net) then any mail sent to you is initially sent to the verizon.net or comcast.net email servers.

Email Client: This is the software that you use to access and read your email. Common email clients are Microsoft Outlook Express, Microsoft Outlook, Mozilla Thunderbird, Apple Mail, Microsoft Entourage, and Eudora. The clients just listed all run on your PC, Macintosh, or Linux machine. There are also web-based email clients such as Google GMail, Yahoo Mail, Microsoft Hotmail, Horde, SquirrelMail, and others.

So, if I were to send you an email from my office computer the steps involved would be as follows:

  1. I would compose the email on my local computer using an email client (in my case, Mozilla Thunderbird).
  2. Once I am ready to send the email (having addressed it to yourname@yourdomain.com), I would click the send button in my Thunderbird client.
  3. Thunderbird would then contact an outgoing email server (usually either an SMTP or Microsoft Exchange server) and request that the email be sent. The outgoing email server will usually require me to provide it a login and password combination. In my case, the outgoing email server could be owned by my broadband provider (Verizon), or by my hosting environment (SalemDesign.com).
  4. Assuming Thunderbird provided a valid login/password combination, the outgoing email server will upload my email. It then looks at the address yourname@yourdomain.com and sends the email off across the internet to your incoming email server. (It is a tad more complicated than that but we don’t want to get bogged down in those details.)
  5. The incoming email server associated with yourdomain.com receives the email and it will check to see if it “knows” about an email account belonging to “yourname”. If you do have a valid account on the incoming email server then the email gets stored in that account.
  6. The next time you run your email client, it will query the incoming email server and “ask” if you have any emails waiting to be read. If you do, those emails get downloaded to your email client and (usually) deleted off the incoming email server. You can then open the individual emails and read them.
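For the curious, steps 1 through 4 above can be sketched in a few lines of Python using its standard email and smtplib modules, which do roughly what Thunderbird does behind the send button. The function names are mine, and the host, port, login, and password are placeholders you would get from your own provider. The send function is defined but not called here, since it needs a real server to talk to.

```python
import smtplib
from email.message import EmailMessage
from email.utils import parseaddr


def compose(sender, recipient, subject, body):
    """Steps 1-2: build the message the way an email client would."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def incoming_domain(message):
    """Step 4: the domain the outgoing server uses to route the message."""
    _, address = parseaddr(str(message["To"]))
    return address.rpartition("@")[2]


def send(message, host, port, login, password):
    """Steps 3-4: log in to the outgoing server and hand the message over.

    host, port, login, and password are placeholders; not every server
    wants STARTTLS on the port you were given, so treat this as a sketch.
    """
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(login, password)
        server.send_message(message)


if __name__ == "__main__":
    msg = compose("myname@salemdesign.com", "yourname@yourdomain.com",
                  "Hello", "Just testing.")
    print(incoming_domain(msg))  # yourdomain.com
    print(msg)
```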

One would like, of course, all these emails that get downloaded to our email clients to be ones we want to read (i.e. from friends, business associates, etc.). Unfortunately, as we all know too well, most of the email we receive is junk from people trying to sell us something we don’t want, or worse.

How do our email addresses get onto spammer lists?

Spambots: These are software mechanisms that “crawl” over websites (in the same way as the search engine spiders used by Google and Yahoo do) and identify and collect email addresses (basically anything that looks like blahblah@blahblah.com or .net, or .edu, etc.). Given the existence of these evil mechanisms, any time you have your email address listed on a website, whether it is your own or someone else’s, you will be getting spam.

Self-Inflicted: Anytime you provide your email to someone else, they may turn around and use it to spam you or sell your address to someone else who does. So be careful who you give your email address to… Even if they are apparently legit, ask them if you have to provide your email address and ask them how they use their lists and whether they sell or provide them to third parties.
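To show how little effort the “anything that looks like blahblah@blahblah.com” step takes, here is a Python sketch of the pattern matching involved. The pattern is a loose approximation I wrote for this example, not one taken from any actual spambot, and the sample text uses made-up addresses.

```python
import re

# Loose pattern: something@something.tld (simplified; real addresses
# can be more exotic than this allows).
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def harvest(page_text):
    """Return the distinct addresses found in a page, lowercased and sorted."""
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(page_text)})


if __name__ == "__main__":
    sample = ('Contact <a href="mailto:Info@example.com">info@example.com</a>'
              ' or sales@example.net.')
    print(harvest(sample))  # ['info@example.com', 'sales@example.net']
```

A dozen lines is all it takes, which is why any address that appears as plain text on a web page should be assumed to be on a list already.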

Domain Registrations: any email used as part of a domain registration is publi

Options for home and small business networking

August 12th, 2006

Assuming you have (or are getting) a DSL or Cable broadband connection… The next question is whether you want to make that connection available to more than one computer and, if so, how.

Almost any small business will end up with multiple computers and most middle-class households have at least two computers, one for the parents and one for the kids.

If one is only connecting one computer to DSL or Cable broadband then all that is needed is a DSL or Cable modem. But the need for networking two or more computers is now so common that most broadband providers routinely offer a combined modem and router unit for no additional charge. It is the router that provides the ability to create a local area network that allows multiple computers to share a single internet/broadband connection.

In our area, the broadband providers are routinely offering a combined modem and WIFI/wired ethernet router unit although you may need to ask for the WIFI/wired router specially or they will pawn off a much less expensive wired ethernet unit.

The WIFI capability will typically support both 802.11g and the older 802.11b wireless standards. These will allow you to share the broadband connection with notebook or desktop PCs that have the appropriate WIFI card installed in them. The speed of the WIFI connection ranges from a nominal 11 Mbps (for 802.11b) to a nominal 54 Mbps (802.11g). In real life application this range is more like 5.5 Mbps to 20 Mbps. However, even at 5.5 Mbps, this is faster than your typical broadband connection which will typically run 0.768 Mbps to 1.5 Mbps. So the bottleneck will not be your router or LAN connection.

The wired ethernet ports and cables will give you either 10 Mbps or 100 Mbps depending on which router you have and whether your computer’s network card (NIC) supports the 100 Mbps standard. Again, even the 10 Mbps is so much faster than your broadband that your local network will never be the bottleneck.
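The bottleneck argument is just arithmetic: a download can go no faster than the slowest link it crosses. Here is a small Python sketch using the speeds quoted above; the 3 megabyte file size is an arbitrary figure I picked for the example.

```python
def bottleneck_mbps(*link_speeds_mbps):
    """The usable speed of a chain of links is that of the slowest one."""
    return min(link_speeds_mbps)


def transfer_seconds(size_megabytes, *link_speeds_mbps):
    """Seconds to move a file across the chain (8 bits per byte)."""
    megabits = size_megabytes * 8
    return megabits / bottleneck_mbps(*link_speeds_mbps)


if __name__ == "__main__":
    broadband = 1.5  # typical broadband speed from the text, in megabits/s
    # A 3 MB download over real-life 802.11b, then over 100 megabit ethernet.
    print(transfer_seconds(3, 5.5, broadband))  # 16.0
    print(transfer_seconds(3, 100, broadband))  # 16.0
```

Both cases take the same 16 seconds because the broadband connection sets the pace either way. The LAN speed only starts to matter for traffic that stays inside the house, such as copying files between your own computers.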

802.11b/g WIFI
WIFI has the advantage that you do not need to trail wires between the router and the computer (or pull wires through the wall). But the WIFI signal does become attenuated by distance and walls/floors. For example, our WIFI router is in our second floor office. I had no problem using my notebook in the living room on the first floor but the signal dropped significantly if I took the notebook down into the basement.

Homeplug 1.0
Since I wanted to run a Linux server in the basement, I ended up getting two Homeplug 1.0 adapters that allow one to establish an ethernet connection over the house 120 volt AC wiring. I ended up getting one adapter from NetGear and the other from Belkin. In theory they should work together and I was relieved to find that they did. The claimed speed for Homeplug is about 14 Mbps but in actual use it is probably about 4 Mbps… Maybe a little slower than 802.11b WIFI but without, at least in my case, the distance attenuation that you get with WIFI.

I am not sure why but Homeplug has never really taken off. There are far more WIFI products and a lot more public awareness of WIFI… But, in some circumstances, Homeplug will work and WIFI will not.

So we now have a LAN with a Mac OS X desktop, a Windows XP desktop, and a SUSE Linux server, physically running over 802.11b wireless, the Homeplug 1.0 power circuit, and wired ethernet. We have two printers (a laser printer and a multifunction printer/scanner/fax) that are also accessed from all three computers via the LAN.

Security
One needs to be aware that both WIFI and Homeplug networking have some security issues.

If you have ever used or seen someone else using a WIFI notebook at a coffee shop or other WIFI hotspot you will have realized that there is no security and no barrier to the public accessing the network. Unless you are careful, your home network will be equally wide open. At the very least you may find that your neighbors are piggybacking on your DSL or Cable broadband connection; at worst some local high school hacker may be stealing your identity or storing porn on your computer.

There are three basic steps to making your WIFI network more secure. The first is to encrypt the connection using a 128-bit key. This is not as easy as it should be on most systems but make the effort anyway.

Basically you set a password on the router and then enter the same password on each of the machines you wish to have connected via WIFI. The tricky part is that one typically enters a plain language password and the router will generate a long hexadecimal string derived from that password… And you then need to enter that long string of characters exactly into each of the WIFI computers you wish to use.
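To give a feel for where that long hexadecimal string comes from, here is a Python sketch of one scheme that has been widely described for 128-bit WEP keys: repeat the password to fill 64 bytes, run it through MD5, and keep the first 13 bytes (26 hex digits, which is what a 128-bit WEP key looks like once you set aside the 24 bits the protocol uses for other purposes). This is an assumption on my part about how such routers work. Yours may derive the string differently, so always copy the string your own router shows you.

```python
import hashlib


def passphrase_to_hex_key(passphrase):
    """Derive a 26-hex-digit key from a plain language password.

    Illustrative only: assumes the repeat-to-64-bytes-then-MD5 scheme
    described in the text above, which not every router uses.
    """
    data = passphrase.encode("ascii")
    if not data:
        raise ValueError("passphrase must not be empty")
    # Repeat the password until it fills exactly 64 bytes.
    repeated = (data * (64 // len(data) + 1))[:64]
    digest = hashlib.md5(repeated).digest()
    return digest[:13].hex().upper()


if __name__ == "__main__":
    key = passphrase_to_hex_key("my kayak floats")
    print(key, len(key))
```

Whatever the scheme, the practical lesson is the same: the computers need the derived string, character for character, not the friendly password you typed into the router.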

You should also tell your router to stop broadcasting its presence. If people don’t know the router is there they won’t try to hack into it. (A sophisticated attacker will detect it regardless but it may keep the local high school kids out of your hair.)

Finally, if you really feel paranoid about your WIFI, you can restrict your network by MAC address. Each device (computer, printer, etc.) on your network will have a unique MAC address. You can enter a list of these addresses into your router and it will then only communicate with the machines on the list.
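The logic of a MAC list is about as simple as security gets, as this Python sketch shows. The only wrinkle is that different systems write the same address with different separators and capitalization, so the sketch normalizes them first. The addresses and function names are made up for the example, and a real router does all this inside its own firmware.

```python
import re


def normalize_mac(mac):
    """Convert 00-1A-2B-3C-4D-5E, 001a.2b3c.4d5e, etc. to 00:1a:2b:3c:4d:5e."""
    digits = re.sub(r"[:.\-\s]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def build_allow_list(macs):
    """The list you would type into the router, stored in one format."""
    return {normalize_mac(mac) for mac in macs}


def is_allowed(allow_list, mac):
    """True only if the device's address is on the list."""
    return normalize_mac(mac) in allow_list


if __name__ == "__main__":
    allowed = build_allow_list(["00-1A-2B-3C-4D-5E"])      # hypothetical notebook
    print(is_allowed(allowed, "00:1a:2b:3c:4d:5e"))  # True
    print(is_allowed(allowed, "00:1a:2b:3c:4d:5f"))  # False
```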

Homeplug is, in a sense, more secure because so few people use it that hackers do not look for it and are unlikely to understand it well enough to hack it. But even so, it is advisable to use the 56-bit network encryption option available for the Homeplug adapters. My understanding is that the Homeplug network signal will only be accessible as far as the nearest power transformer. In my case that is about 3 houses away.

DHCP
Dynamic Host Configuration Protocol is a router capability which automatically assigns each computer on your LAN a temporary but unique IP address. That is what allows you to walk into a Borders Bookstore cafe and just connect your notebook to their WIFI hotspot. Turning this off and using assigned IP addresses might increase your security marginally but using a MAC list would be much more effective. In my case I turned off DHCP simply because I never could get it to work properly and disabling DHCP and assigning IP addresses was the line of least resistance.
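Stripped of the network chatter, the “temporary but unique” part of DHCP is just bookkeeping over a pool of addresses. The Python sketch below models only that bookkeeping. It is not the DHCP protocol itself, and the class name, the address range, and the device names are invented for the example.

```python
import ipaddress


class TinyDhcpPool:
    """Hands out each address in a range to at most one device at a time."""

    def __init__(self, first, last):
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self._free = [ipaddress.IPv4Address(n) for n in range(start, end + 1)]
        self._leases = {}  # device identifier (e.g. MAC) -> leased address

    def request(self, device):
        """Give a device its existing address, or the next free one."""
        if device in self._leases:
            return self._leases[device]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        address = self._free.pop(0)
        self._leases[device] = address
        return address

    def release(self, device):
        """Return a device's address to the pool when it leaves."""
        address = self._leases.pop(device, None)
        if address is not None:
            self._free.append(address)


if __name__ == "__main__":
    pool = TinyDhcpPool("192.168.1.100", "192.168.1.101")
    print(pool.request("notebook"))  # 192.168.1.100
    print(pool.request("desktop"))   # 192.168.1.101
    pool.release("notebook")
    print(pool.request("visitor"))   # 192.168.1.100
```

Assigning IP addresses by hand, as I ended up doing, amounts to keeping this table yourself and typing each entry into the matching computer.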

ar.atwola, mapquest, and Firefox Adblock extension

July 31st, 2006

I was reading an email from a friend the other day telling me where she and her family had gone on vacation (Rehoboth Beach, Delaware). I wasn’t sure where Rehoboth Beach was, so I opened up a browser and went to mapquest.com.

The mapquest link showed up in my Firefox address field but nothing appeared in the main browser window. I tried a refresh, and still nothing appeared. I looked at the status window and saw that the browser was trying to load a link from some domain called ar.atwola.com… Which didn’t sound as if it was related to mapquest.com. I was starting to wonder if my browser had been hijacked. But I tried going to a few other sites (CNN, BBC, etc.) and they came up fine.

So I did a google on ar.atwola and found that it is an ad server run by AOL Time Warner. It seems that mapquest has some ads on their homepage that are served by the Time Warner server… And it appeared that the Time Warner server was having a bad hair day.

I still wanted to use Mapquest to see where Rehoboth Beach was so I thought about how I could get around the problem with the ar.atwola.com site.

And I remembered reading about the Firefox Adblock extension. I use Firefox as my main browser anyway so I found and installed the Adblock extension. I went back to the mapquest site and it still hung on me but I then used Adblock to identify and block all the ar.atwola.com links on the page. I then refreshed and, voila, mapquest worked and I figured out where Rehoboth Beach is (south side of the mouth of Delaware Bay).
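What I asked Adblock to do boils down to this: before fetching anything a page asks for, check where it lives and skip it if the host is on my block list. The Python sketch below shows that idea using plain host matching. It is a simplification of my own and does not use Adblock’s actual filter syntax, and the URLs other than the two domain names are invented.

```python
from urllib.parse import urlparse


def host_is_blocked(url, blocked_domains):
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in blocked_domains)


def filter_requests(urls, blocked_domains):
    """Keep only the requests the browser should still make."""
    return [url for url in urls if not host_is_blocked(url, blocked_domains)]


if __name__ == "__main__":
    blocked = {"ar.atwola.com"}
    page_requests = [
        "http://www.mapquest.com/",
        "http://ar.atwola.com/some/ad/banner",  # hypothetical ad link
    ]
    print(filter_requests(page_requests, blocked))
```

With the ad server’s links filtered out, the browser never waits on the machine that is having the bad day, which is why the page finally loaded.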

My compliments to the folks working on the Adblock Project: http://adblock.mozdev.org/index.html