Archive for the ‘Search Engines’ Category

Bing’s Blatant Censorship In Germany

Sunday, June 7th, 2009

Microsoft’s new search (decision) engine decides too much in Germany: it applies very blatant and, in my opinion, stupid censorship for no obvious reason. Google doesn’t have these problems in Germany, so with Google you can actually search for new pantyhose, while with Bing it’s impossible (this link only shows the message if accessed from Germany):

bingcensorship_1

The text says (in German):

The search pantyhose may return sexually explicit content.
To get results, change your search terms.

This censorship was already in use for some time on MSN Live Search. So does Microsoft really think this kind of censorship makes sense and will someday lead to more use than Google in Germany?

The problem I see is that the restrictions seem to be based only on specific words, so they are very easy to circumvent. Sometimes it is even sufficient to add one more word. The simplest solution is to change the country in the top right corner of the Bing website. Even if you are in Germany, switching the country to e.g. the United States removes the restrictions entirely.
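To picture why a purely word-based restriction is so easy to sidestep, here is a minimal sketch of an exact-match term filter. It is only an assumption about the general technique, not Bing's actual implementation; the function name and the term list (English translations of sample terms from this post) are made up for illustration.

```python
# Purely illustrative: an exact-match term filter. This is an assumption
# about the general technique, not Bing's actual implementation.
BLOCKED_QUERIES = {"pantyhose", "handcuffs", "syringes"}  # sample terms from this post


def is_blocked(query: str) -> bool:
    """Block only when the whole normalized query equals a listed term."""
    normalized = " ".join(query.lower().split())
    return normalized in BLOCKED_QUERIES


print(is_blocked("pantyhose"))      # the bare term matches the list
print(is_blocked("buy pantyhose"))  # one extra word no longer matches
```

A filter like this never looks at the results themselves, which would explain why one additional word can be enough to get past it.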

This kind of censorship also happens in other countries like India.

I have compiled a list of terms here that shows some of the blocked words and their translations. Bing does not seem to care whether a word is ambiguous. Very explicit terms have been excluded, but viewer discretion is advised anyway:

bing-censored-search

It is amazing that even terms like “handcuffs” and “pantyhose” are blocked by Bing.

Doublethink? Advertisers Are Allowed To Use These Terms

Interestingly, there do not seem to be such restrictions for advertisers. Ads are not shown when Bing has blocked the search itself, but searching for related words brings up ads containing words that you can’t even search for. Some ads even contain more explicit terms than the ones I used in the sample above.

bing-censored-ads

Do advertisers know that these words have been blocked? If I were a website owner trying to sell pantyhose, I would want my ad shown when someone searches for the word.

Good For Microsoft?

I highly doubt that this kind of censorship will benefit Microsoft in Germany, given the current discussions on internet censorship in Germany. But it shows what may be used in the future, and not only on Bing.com.

If you want to protect your child from potentially harmful content on the internet (in a way that cannot be circumvented with two clicks, as is currently possible with Bing), there are other ways, like talking with your child about it or, in the worst case, installing filter software on your PC. But if someone is trying to find websites on syringes (blocked in Germany) on Bing, please let them find results.

My personal conclusion: I do not like censorship, therefore I do not like Bing. I’ll stick with Google. It is censored as well, but not to such a degree, and its censorship is based on individual results, not on search queries.

How To Use The Bing Webmaster Tools To Get Info On Your Site

Monday, June 1st, 2009

Bing.com, Microsoft’s new search engine, launched just today. Surely you have already checked out the search results and the positions of your favorite keywords. But have you also checked out all the tools Microsoft now offers webmasters to analyze their websites?

Verify Ownership Of Your Website

Just go to the Bing Webmaster Center and click on the “Add a site” button to add your website. In the form that is shown, enter the URL of the website you wish to add. Bing even allows you to provide an email address “to contact you if [they] encounter specific issues with your site”, which sounds very interesting because Google does not provide that feature. The coming weeks and months will show what comes of that email feature.

bing-scr01

After submitting the form you have to add some verification code to your site (or your server). In contrast to Google, which only requires you to create an empty file with a specific name, Microsoft wants you to add an XML file with specific content to your server. You can also choose to add a META tag to your site, but I recommend using the XML file because it’s much simpler: you only need to upload it once to your server, whereas you’d have to add the META information to the homepage template.

bing-scr02

After you have added the META tag to your homepage or uploaded the XML file click on the “Return to list” button. You’ll see your website in the list. Just click on the domain name.

Bing will access your website immediately and check for either the META tag or the XML file. If you have done everything correctly you will be taken to the site summary page, which provides a wealth of information on your site as seen by Bing.

bing-scr03

Site Summary And Domain Score

The site summary shows you when your site was last crawled by the Bing crawler, the number of indexed pages, whether Bing has been blocked from accessing your site (if you have blocked it via the robots.txt file for example) and a domain score which is shown as five boxes. Microsoft writes here:

“Domain Score provides a measurement of how authoritative Bing views your domain to be, with five green boxes being the highest rating and five empty boxes being the lowest. This is based on many of the same factors Bing uses to determine static rank, but isn’t directly comparable.”

Luckily this blog has a domain score of 5/5 at the time of writing.

Bing also shows you the top 5 pages of your site.

Your Profile

When selecting “Profile” from the top navigation you can change the settings you have already seen when you added your site. You can also see the current verification method Bing is using to verify your site ownership.

Crawl Issues

This section shows you crawling issues that may have occurred on your site such as pages that Bing could not find (404 error) or pages blocked by the robots.txt file.

It also shows you a list of long dynamic URLs that Bing has flagged because it thinks they might lead the crawler into an infinite loop and may also produce duplicate content.

The Crawl Issues page also tells you whether the crawler found pages on your site which it believes to be infected with malware or using unsupported content types.

bing-scr05

Backlinks

The backlinks page shows you all of the backlinks Bing has found to your domain, together with the page score, language and region of the page linking to your content. I really like the inclusion of the page score because it may be used to find “bad neighborhoods” linking to your site, although Microsoft says that the score isn’t directly comparable.

The page will only show the first 20 backlinks but you can download the complete list as a CSV file to your system.
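Once you have the CSV file, a few lines of Python are enough to pull out the low-scoring pages that link to you. This is only a sketch: the column names (`url`, `page_score`, `language`, `region`) and the sample rows are assumptions for illustration, so check the header row of your actual export and adjust accordingly.

```python
import csv
import io

# Hypothetical export layout and made-up rows: the real column names in
# Bing's CSV file may differ.
SAMPLE_CSV = """url,page_score,language,region
http://example.org/links.html,1,en,US
http://example.net/blog/post,4,de,DE
http://example.com/spam,0,en,US
"""


def low_score_backlinks(csv_text, max_score=1):
    """Return linking URLs whose page score is at or below max_score."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"] for row in reader if int(row["page_score"]) <= max_score]


print(low_score_backlinks(SAMPLE_CSV))
```

For a downloaded file you would read its contents with `open(...)` and pass the text to the same function.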

bing-scr04

Outbound Links

This page shows you all of the links Bing has found on your site that lead to other websites. Just like the backlinks page, it shows you the page score, language and region, and it even lets you list your outbound links to malware sites - let’s hope you don’t have any on your site.

Just like before you can also download the complete list as a CSV file.

Interestingly, all of the links on my page leading to Twitter (they come from a Twitter plugin for WordPress which shows the latest tweets on my blog) have a page score of 5/5. Does that mean that Bing sees Twitter as an authoritative site?

bing-scr06

Keywords

This page allows you to see “how your site performs in search results for searches using specific keywords” although I don’t quite understand the results. You can enter a keyword in the text field provided and it will show you the page on your site, the page score of that page and once again the language, region, last crawl date and whether the Bing crawler was prevented from accessing the page.

bing-scr07

It is interesting, but I had expected to see SERP positions for the given keyword, which would be a great feature. Entering “wolframalpha” shows a page score of 5 for my article on WolframAlpha, yet when searching for “wolframalpha” on bing.com that page is not listed in the first 100 results.

More (Not So) Interesting Stuff

You can also add your sitemap directly by clicking on the Sitemaps tab.

The “Related Tools” section in the navigation on the left side lists some links that sound interesting at first, but in my opinion they are a bit disappointing. If you thought that clicking on the Robots.txt validator link would let you analyze the robots.txt file of your current site, you’re wrong. You can paste the contents of any robots.txt file there to check it for incompatibilities with the MSNBot, but that’s all.
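If you just want to know whether a given robots.txt would let the MSNBot fetch a page, you can also check that locally. The following sketch uses `urllib.robotparser` from Python's standard library; the robots.txt content, the URLs and the helper name `msnbot_may_fetch` are made up for illustration, and Python's parser may not interpret every rule exactly the way the MSNBot does.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow:
"""


def msnbot_may_fetch(robots_txt, url):
    """Return True if the rules allow the 'msnbot' user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("msnbot", url)


print(msnbot_may_fetch(ROBOTS_TXT, "http://www.example.com/private/page.html"))
print(msnbot_may_fetch(ROBOTS_TXT, "http://www.example.com/index.html"))
```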

Likewise the HTTP Verifier and Keyword Research Tool links lead you directly to the default pages on the Microsoft website.

Bottom Line…

I recommend that you add your site(s) to the Bing Webmaster Center so that you can access the interesting statistics they provide - I’m sure many more tools will be provided in the future.

You should also check out the forum for many interesting discussions.

I’m amazed that Microsoft provides these features right from launch day.

We’ll see what else will be provided in the future.

Google Trends Gadget Reveals Interesting Weekly Search Behavior

Friday, May 22nd, 2009

Although Google Trends is nothing new, embedding it on a website is. I just tried some Google Trends searches to analyze user search behaviour, and some interesting facts emerged. When performing these searches you should always be aware that a keyword may have more than one meaning, which distorts the graph. So you shouldn’t use Google Trends to compare “apple” and “pear” because of the well-known company named “Apple”.

The disadvantage of the Google Trends Gadget is that you cannot specify a fixed date range for the statistics, only relative values, so the graphs on this page will look different each day you view them.

It’s Monday: Searching For A Doctor

It is interesting to note that these search terms have the highest search volume on Mondays and then fall off over the following days. One might have expected an even distribution over the weekdays and a lower search volume on weekends, but that is not the case.

Weekend Priorities

As the weekend arrives, other search terms dominate the search engine queries. The following graph shows some queries that follow this weekend trend. I just picked some words that came to mind which are mostly unambiguous and have a comparable search volume.

Rather Even Distribution

As one might expect, general search terms have a more even distribution over the week, although a slight drop on weekends is noticeable.

Opposing Trends

Using Google Trends you can also find somewhat opposing trends like this comparison of restaurants vs. hotels.

Browser Wars? Not Really.

This graph shows the distribution of the search queries for different browsers. With this graph you also need to keep in mind that the words “opera” and “safari” are ambiguous.

Using The Gadget For Your Own Statistics

You can of course use this information to find out when to launch specific campaigns or when to expect a higher AdWords search volume. If you want to see current statistics on Google search volumes for different keywords you can now simply create a page and embed the Google Gadget several times on that page just like I did on this page. So you don’t have to go to the Google Trends website to search for different terms one by one.

Embedding The Google Trends Gadget On A WordPress Blog

If you are using WordPress and want to embed these statistics on your blog, you need a WordPress plugin that enables the use of IFrames, like Embed Iframe. Otherwise WordPress will silently eat your IFrame code for security reasons.

Here is sample code for embedding the gadget into your WordPress blog if you have installed the Embed Iframe plugin (you need to remove the space in front of the word “iframe” at the beginning and the space just before the closing bracket):

[ iframe http://www.gmodules.com/ig/ifr?url=http://www.google.com/ig/modules/trends_gadget.xml&source=imag&up_is_init=true&up_cur_term=firefox,internet%20explorer,explorer,opera,safari&up_date=mtd&up_region=US 330 250 ]
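If you embed the gadget several times with different keywords, it can be handy to generate the shortcode instead of editing the URL by hand. The sketch below rebuilds the sample above from its parts. The parameter names are taken from that sample URL; the helper name `trends_shortcode` and its defaults are my own assumptions, and I have not checked which other values the gadget accepts.

```python
from urllib.parse import quote, urlencode

GADGET_BASE = "http://www.gmodules.com/ig/ifr"
GADGET_XML = "http://www.google.com/ig/modules/trends_gadget.xml"


def trends_shortcode(terms, date="mtd", region="US", width=330, height=250):
    """Build the Embed Iframe shortcode for a list of search terms."""
    params = {
        "url": GADGET_XML,
        "source": "imag",
        "up_is_init": "true",
        "up_cur_term": ",".join(terms),
        "up_date": date,
        "up_region": region,
    }
    # Keep ':', '/' and ',' readable and encode spaces as %20, as in the sample
    query = urlencode(params, safe=":/,", quote_via=quote)
    return f"[iframe {GADGET_BASE}?{query} {width} {height}]"


print(trends_shortcode(["firefox", "internet explorer", "explorer", "opera", "safari"]))
```

The function emits the shortcode without the extra spaces, so its output can be pasted into a post as it is.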

How To Create Search Engine Friendly RewriteRules For Domains

Tuesday, May 19th, 2009

Do you have multiple domains pointing to your website? From an SEO standpoint you shouldn’t have them all serve the same document root directory, because search engines might interpret this as duplicate content, or you might even end up with a mix-up of pages spread over several domains and subdomains (like example.com and www.example.com) in the organic search results.

If you are using Apache and have access to mod_rewrite, you should configure the VirtualHost like this so that all domains and subdomains are redirected to the main hostname via a search engine friendly HTTP 301 redirect:

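<VirtualHost *:80>
    ServerAdmin webmaster@example.com
    DocumentRoot /srv/www/example.com/htdocs

    # Canonical hostname; every alias below is redirected to it
    ServerName www.example.com
    ServerAlias example.com
    ServerAlias www.example.org
    ServerAlias example.org

    RewriteEngine On
    # Skip requests without a Host header (old HTTP/1.0 clients)
    RewriteCond %{HTTP_HOST} !^$
    # Match any host other than the canonical one, ignoring case
    RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
    # Permanent (301) redirect to the same path on the canonical host
    RewriteRule ^(.*)$ http://www.example.com$1 [L,R=301]
    …
</VirtualHost>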

This will redirect all requests not going directly to http://www.example.com to that hostname. So even if someone enters http://example.org/index.php the server will issue a redirect to http://www.example.com/index.php.

Of course this also works if someone links to your page with a link like http://example.com/file.gif. In that case the redirect will be issued to http://www.example.com/file.gif. Google likes that.
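To make the logic of the rule easier to follow, here is a small Python sketch that mimics the decision Apache makes for each request. It is only a model of the host check and the redirect target, not a replacement for the configuration; the function name `redirect_target` is made up for illustration.

```python
import re

CANONICAL_HOST = "www.example.com"
# Same test as the RewriteCond: the host must start with www.example.com.
# Hostnames are case-insensitive, so the sketch ignores case.
CANONICAL_RE = re.compile(r"^www\.example\.com", re.IGNORECASE)


def redirect_target(host, path):
    """Return the 301 target URL, or None if the host is already canonical."""
    if CANONICAL_RE.search(host):
        return None  # request is served normally, no redirect
    return f"http://{CANONICAL_HOST}{path}"


print(redirect_target("example.org", "/index.php"))
print(redirect_target("example.com", "/file.gif"))
print(redirect_target("www.example.com", "/index.php"))
```

The first two calls reproduce the examples from the text; the third one returns `None` because the request already uses the main hostname.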

Why WolframAlpha Should Hire An SEO

Sunday, May 17th, 2009

Although my first look at WolframAlpha was rather disappointing, I didn’t really intend to list everything that doesn’t work as I had expected. However, I stumbled upon problem after problem.

I didn’t intend to write this article at all, but just searching for what Google has already indexed of WolframAlpha and following some links made it clear that they obviously didn’t think about SEO at all prior to launching their site.

The problem is: they are doing many things that might actually harm their Google rankings, which are just beginning to show up. We don’t yet know who will be using WolframAlpha in two months, whether it will grow into something that most people use just like Google, or whether it is going to fail like WikiaSearch.

Remember: they received over 1000 links to their site within two days and this number is likely to grow, so it makes sense to optimize the site now to prevent damage.

So if I had been responsible for the WolframAlpha website, I would have handled the following points differently from what WolframAlpha has implemented:

  • You Should Prevent No-Result Pages From Being Indexed
    “No result” pages return a 200 HTTP status code, not a 404, so Google will likely index them. The pages don’t contain a META robots tag with “noindex,follow” either.
    Because different “no result” pages differ only marginally from query to query, they might be seen as duplicate content, which you should never have on your own site.
  • Don’t Use Redirects For Load Balancing
    Using the search field on the homepage now redirects via a 302 redirect to a hostname like “www18.wolframalpha.com”, which leads to even more duplicate content if Google indexes the same page under two different hostnames, e.g. www18 and www76. Normally you would use a load balancer rather than HTTP redirects to distribute the load. They should at least use the canonical tag to tell Google the original URL but, regrettably, they don’t.
  • Use A Robots.txt File
    WolframAlpha is currently not using a robots.txt file, so they are not telling search engines what should not be crawled.
  • Optimize The Clickthrough Rate In The SERPs
    The result pages do not contain a META description, which I would recommend for a higher click-through rate in the organic search results.
  • Use Text To Display Text, Not Images
    The title tag could also be optimized, but what I still don’t understand is why even simple text is rendered as an image.
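Several of the points above can be checked automatically for any page. The following sketch uses Python's built-in `html.parser` to look for a META robots tag, a META description and a canonical link. The class and function names and the sample HTML are made up for illustration, and a real audit would also have to look at the HTTP status code and the robots.txt file.

```python
from html.parser import HTMLParser


class SeoTagAudit(HTMLParser):
    """Collect the robots, description and canonical tags of a page."""

    def __init__(self):
        super().__init__()
        self.robots = None
        self.description = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "robots":
                self.robots = attrs.get("content")
            elif name == "description":
                self.description = attrs.get("content")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def audit(html):
    """Summarize which of the three tags are present in the given HTML."""
    parser = SeoTagAudit()
    parser.feed(html)
    return {
        "noindex": "noindex" in (parser.robots or "").lower(),
        "has_description": bool(parser.description),
        "canonical": parser.canonical,
    }


# Made-up "no result" page that follows the first two recommendations
SAMPLE = """<html><head>
<title>No results</title>
<meta name="robots" content="noindex,follow">
<link rel="canonical" href="http://www.example.com/search?q=test">
</head><body></body></html>"""

print(audit(SAMPLE))
```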

What do you think?

Get Info On Domains With WolframAlpha

Saturday, May 16th, 2009

Did you know that WolframAlpha allows you to get info on specific websites and domains?

It’s quite easy as you just need to enter the domain name into the search field. Most of the data seems to have been retrieved from Alexa but it’s still a great idea. I still don’t understand the HTML element hierarchy graph but it looks interesting. However I don’t know what the tag information could be used for.

The following image shows the info WolframAlpha shows currently for this blog. And I think I need to say that the number of visitors is a bit far off ;-)

WolframAlpha Sascha Kimmel

A Disappointing First Look At WolframAlpha

Saturday, May 16th, 2009

After all the hype prior to the launch of the new WolframAlpha knowledge engine, I checked it out after it went live a couple of hours ago. Here is my personal evaluation of this new site. Having seen many search engines and technologies rise and fall within the last decade, I was curious to test the site as soon as possible. As I am not a scientist, I will try to provide a view which focuses on the normal internet user.

First Tests

As I had recently been trying to find out how many castles there are in Scotland, I typed “castles scotland” into the input field. WolframAlpha returned the not very helpful message “Wolfram|Alpha isn’t sure what to do with your input.”, so I changed the query to “number of castles in scotland”, which still returned not a single result. Maybe WolframAlpha just does not know that, so I tried “number of rivers in scotland”, which was interpreted as “is Rivers, Manitoba, Canada in Scotland, Connecticut, United States” and returned “Result: no”.

So I changed the query to “number of rivers in Germany”, which worked out fine - great! The first correct result was returned!

In the minutes that followed I went on to enter different queries and analyzed the results to come up with the following list of inconveniences and problems from my point of view.

Where Are All The Links?

The fundamental elements of the web that keep it all glued together are links, yet I didn’t get any result which allowed me to dive deeper into the information. Admittedly, after clicking on some values a layer appeared that allowed me to copy them, but only sometimes did it contain links that I could click on directly. Links from one page to another are what sometimes keep me on Wikipedia for half an hour or even longer, wandering from one article to the next although I was only looking for a simple answer to a problem. WolframAlpha doesn’t seem to offer that kind of linking most of the time.

Just searching for Inverness, a city in the Scottish Highlands near the famous Loch Ness, returns some useful information on the city. It also lists cities nearby. Good idea, but I’d like to click on any city name directly to perform a search for that city, e.g. Edinburgh. Instead I need to either click on the name of the city, find the link to the city in the layer and click on it, or copy the city name (or type it manually) into the input field and perform a new search.

Another example: searching for “Walt Disney” does not return a result on the person Walt Disney but on the company. I need to select “Use as a person instead”, which by the way is set in a very small font that I personally regard as bad usability. From the Walt Disney (person) result there is no link back to the Walt Disney Company he founded. Why not? Wikipedia has it.

Weird Incompleteness

WolframAlpha doesn’t return anything on “london underground”, which is the oldest subway system in the world. Yet “new york subway” returns a result. Likewise you can search for “longest subway system”, which returns the New York City subway system, but don’t search for “oldest subway system” (which would be the London Underground), as that returns nothing at all. This is all the more fascinating if you keep in mind that Stephen Wolfram was born in London.

Google’s top results for “oldest subway system” show that it’s the London Underground - and I don’t even need to click on the results to get that information as it is contained in the snippets Google provides.

If you enter “liberty island” WolframAlpha doesn’t find anything, whereas Google does. Yet WolframAlpha knows the Brooklyn Bridge. Apparently it is more important to know the Brooklyn Bridge than to know the island where the Statue of Liberty is located.

There Often Is More Than One Answer To A Question!

Most of the time you are stuck with the result without any helpful links whatsoever. Searching for “toons” shows “Interpreting ‘toons’ as ‘towns’”, which seems very far-fetched to me. It also does not return a result at all but allows you to select a city. I would have expected to receive the result for “toon” instead, which is the correct singular of “toons”.

The word “simpsons” is interpreted as “sum formula” yet a search for “bugs bunny” returns information on a Warner Brothers movie entitled “Bugs Bunny’s 3rd Movie: 1001 Rabbit Tales (movie)” with data from the IMDB. I would rather have seen some historical info on Bugs Bunny as on Wikipedia.

Likewise a search for “james bond” returns no info on the fictional character but information on the movie “A View To A Kill” from 1985, which is a somewhat matching result but not really what I had expected. Why was exactly this movie selected? Luckily WolframAlpha also shows a list of all of the other James Bond movies. But, again, something unexpected happens. If you select “Casino Royale” from the list you won’t see any information on the movie from 2006 but on the 1967 film version of the book instead. There seems to be no ranking. And there is no way for you to find out that there is another movie from 2006. No link, no info. If you depend on this information alone you’re doomed to fail. If you search for “Casino Royale” manually, info on the 1967 movie is shown as well, but there you can select the movie from 2006 directly.

Searching Google for “Casino Royale” shows the IMDB entry for the 2006 movie as the first result, which is what I would have expected. The second result from Google shows the 1967 version. Great!

If you query WolframAlpha for “Wolfram” you’ll be shown info on Stephen Wolfram - the creator of WolframAlpha.

Yet if you search Google for the Google founders’ last names “Brin” and “Page”, that kind of bias doesn’t exist there. For me WolframAlpha’s result in this case is not an objective result.

Some results seem to be very blatant errors. Just searching for my surname “Kimmel” as a single word without any spaces is interpreted as the distance between “Kim, Sughd, Tajikistan” and “Mel, Veneto, Italy”. Ouch! However searching for “Jimmy Kimmel” returns information on the talk show host.

Searching for “Illuminati” returns no result (conspiracy theorists: here we go), yet searching for “Adam Weishaupt”, who founded the Order of the Illuminati, returns a result.

I could literally go on for hours, but you should just try it for yourself. Just forget about searching for “Mickey Mouse” and “Seinfeld”, as no results are returned for these terms currently.

Bottom line: unlike in mathematics, there is not always only one solution to a problem. Just imagine if Google only showed you the one result it thinks is the best match. You wouldn’t like that either, I suppose.

Diving Into The Scientific World

Although I am not a scientist, I tried some scientific searches, which produced some unexpected results as well.

From Google, and from the WolframAlpha searches above, I had concluded that queries are case-insensitive. That was a mistake. I searched for “h2o” in all lowercase to get info on the water molecule, yet WolframAlpha interpreted this as a degree value. No water here. Searching for “H2O” in uppercase works, though, and returns the expected result.

Second try: I entered “au”, which I believe to be the chemical symbol for gold (from Latin “aurum”), but this was interpreted as “astronomical unit”. Although there are many links at the top of the page, there is no link for “as a chemical element”. Searching for “Au” returns the expected result, however. To get the correct results you obviously need to know the correct capitalization of the word you are looking for.

I don’t think the normal web user knows that.

Third try: searching for “fly genome” returns no result. Google shows the expected results, with the Berkeley Drosophila Genome Project first. I then searched for “drosophila genome” on WolframAlpha but got no result either, just a reference to “Animals: drosophila”.

Some More Searches

Here are some searches I performed which really give great results and bad results, respectively:

Good:

Bad:

One-Fits-All Approach Is Wrong - Case-Sensitivity Is A Problem

In my opinion WolframAlpha should not return only one result, or if it does, it should offer the user better disambiguation. The one-fits-all approach is wrong. Currently it still quite often fails to return the correct result. Google, on the other hand, shows not just one result but (most of the time, apparently) the results it believes the user has been searching for, and it does not decide for users what they seem to have intended. Searching for “flytrap” on WolframAlpha returns a word definition, “a trap for catching flies”, and not even the botanical definition of the “Venus Fly Trap” or anything else. If you search Google for “flytrap”, the results contain completely different entries that allow for manual disambiguation. There is a company named “Flytrap Technologies”, an eZine named “Flytrap”, a Wikipedia article on the Venus Flytrap and much more. You can then refine your search and look for “venus flytrap” on Google to get more information on that.

Bottom line: If you know exactly what you are looking for and know the complete correct term and capitalization you will most of the time get the results you are looking for. If you don’t you’re lost quite often. WolframAlpha knows only one “John Smith”, Wikipedia knows more than 80 people with that name.

Let’s hope WolframAlpha gets better for every one of us, not just for scientists - I’m sure they’ll love it. For now I’ll try WolframAlpha often but I’ll stick with Google and Wikipedia for most of the searches. What about you?

Anyway, it still contains a huge amount of knowledge and is definitely something to thank the creators for. Surely it will develop over time. Let’s hope it doesn’t go where WikiaSearch has gone before.

Website Performance Checklist (PDF)

Thursday, May 14th, 2009

In this blog I have previously posted an article series on how to achieve maximum website performance. If you wish to follow the steps described in those articles, I thought it would be helpful to have a checklist that you can print out, ticking a box for every optimization step that you have checked and optimized.

I have refrained from using any colors so it’s purely black and white for your day-to-day use.

So I created this checklist and offer it here as a free PDF download. Just click on the following button to access the website performance checklist. I always appreciate your comments, feedback and suggestions.

downloadnow-free