Creating Friendly Websites and Pages
by Walt Stoneburner
Here are a number of secret and unconventional rules
and guidelines that help make a
site popular and keep visitors coming back. Make your site
friendly to visiting humans, and make it friendly to
search engines so people can find it.
Understanding How Search Engines Work
Search engines that operate by a database of categories, like
Yahoo,
rank sites depending on when you registered,
how the operators manually rank you, and
how much you pay for your listing. Free
listings typically get ranked below paid listings,
unless there is something very special or popular about
the site.
Search engines that operate by combing through the
entire web, such as
AltaVista,
index on keywords
found both in the header and body of documents.
Search engines that operate by looking at how many other
sites are pointing at you, like
Google does,
make the assumption that the more other people link
to you, the more useful you are, and thus more deserving of a
higher ranking.
Trying to Trick the Search Engines
Some techniques people have tried are:
- Registering with a zillion search engines.
Save yourself the time and money; pick a few popular
search engines. Many of the "register with 1000 search
engines" services are special-purpose engines targeting
foreign countries and specific interest groups. Visit somewhere like
DogPile, which is
a search engine that searches other search engines, and
see where it's looking. Go register at those
sites, which are usually free.
Search engines quite often rip off the database of other
search engines in order to be a superset; hit just the popular
ones.
- Registering over and over at the same engine.
Search engines will come and visit you once, grab a few pages,
and come back later (so as not to tax your server). Continual
resubmissions are usually ignored, and at worst can get your
site blacklisted. Register once. If your site undergoes a major
overhaul, then consider resubmitting.
- Duplicating keywords.
Search engines are on to this trick, and filter out duplicate words.
Even better, some search engines are refusing to
list people who do this. Salting the mine usually has the
opposite effect from the one you intended.
- Adding suffixes, plurals, and case changes to keywords.
Search engines are getting much smarter; you need only use
the root word. Keywords are typically case insensitive these days;
case changes are another form of duplicated keywords and can
get you excluded from listings as well.
- Adding lots of different keywords in the header section.
Search engines have combated this by assuming that pages
with a lot of keywords are more generic. You can get better
listing results by splitting up your work and making individual pages
with distinct keywords that are focused on a topic.
- Keywords in comments.
By putting keywords in comments,
<!-- keywords, ... -->,
people have tried to bloat up their document's body artificially.
Good search engines parse the HTML code and eliminate comments from
candidate indexable words.
- Using white text on a white background.
To get past comment exclusions, people have tried to use the
same color text as the background. Search engines are on to
this and may opt to exclude you in their listings.
- Tiny, tiny, print.
By adding lots of visible, but very, very small text (usually tucked
away at the bottom of a page), people have gotten past the same-color
exclusions. This trick appears to be effective, until you realize
that the more generic your page's content, the less likely you'll
be listed. Visitors who see your page instantly know they're being
scammed.
In summary, search engine technology has gotten a lot smarter,
and the best way to get listed -- after all, that was the goal -- is to
have specific content that's well described by a terse set of meta-tags.
There is just no substitute for good, well-described content
and sharing links with other sites.
Site Analysis
By nature or design, the majority of your site's visitors won't reach
you through the front door. Thus your
domain name is not as
important as you think! People typically go to a search engine,
click a link and bookmark it without bothering to read the URL.
Consequently, the most popular pages on your site most likely won't be
your main page... they will be the best-indexed and most interesting page(s)
you have!
What you want to do is ask your web site administrator for your site's
usage statistics. Aggregate them across the pages and produce counts.
A small handful of these pages will have an astronomically higher hit
rate than your other pages.
Use this information to your benefit.
- These are the pages you want to use as sub-portals into the other areas
of your site! You know people are getting to these pages, so
capitalize on that knowledge -- provide more links, even if in an
unrelated sidebar.
- For whatever reason, your popular pages can quickly turn you
into an authoritative source of information. Consider adding more
of that information to your site -- as you're already indexed for it.
You can use the additional pages as secondary portals as well.
- Revisit the meta-tags on pages that you want to have more hits.
Give the pages a more interesting title.
Consider partitioning the information into several different pages
and link them.
- Save yourself a lot of wasted and unappreciated effort. By knowing what
doesn't interest people and what's being neglected on your
site, you can reclaim a lot of time.
The Big DOs
- Separate Content from Presentation.
Whether by use of Cascading Style Sheets
(CSS),
Personal Home Page (PHP),
or
mod_layout,
it is easier to maintain and automate pages when the content that's
displayed isn't polluted by the mechanism that provides the format.
Tags should convey what something is, and styles should
convey how the client wants to display it in form
(layout) and presentation (color and graphics).
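As a small sketch of this separation (the class name and wording here are just placeholders):

```html
<HTML>
<HEAD>
<STYLE TYPE="text/css">
  /* Presentation lives here: how a warning should look. */
  .warning { color: red; font-weight: bold; }
</STYLE>
</HEAD>
<BODY>
  <!-- The markup only says WHAT this is; the style sheet says how to show it. -->
  <P CLASS="warning">Back up your site before upgrading.</P>
</BODY>
</HTML>
```

Change the style sheet once and every warning on every page follows suit, without touching the content.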
- Hints for Search Engines: A Links Page.
Make a links page that isn't intended for human readers but
rather for search engines by listing the important URLs at
your site in a bland and terse style that a simple search engine
can parse. Then submit that page to the search engines and let them
comb your site.
Since the search engine is going to read the links and go
visit them all, it's a clever way of submitting a batch of
links all at once. This way you can list multiple entry points
into your site.
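A bland links page along these lines might look like the following (the example.com URLs and page names are hypothetical):

```html
<HTML>
<HEAD><TITLE>Site Index</TITLE></HEAD>
<BODY>
<!-- Nothing fancy: just plain anchors a simple robot can follow. -->
<A HREF="http://www.example.com/index.html">Main page</A><BR>
<A HREF="http://www.example.com/faq.html">Frequently asked questions</A><BR>
<A HREF="http://www.example.com/articles.html">Article archive</A><BR>
</BODY>
</HTML>
```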
- Make Readable Links.
It's considered bad form to say something like
"for more information, click here" rather than
indicating the nature of the link -- even though it appears
from statistical sampling that this technique is more effective.
The reason isn't that there's a bunch of lowest-common-denominator users
floating at the low end of the gene pool; it's that site authors don't
help readers by saying what's on the other side of a link. Construct sentences
that allow a reader to dig for more information or
examples,
letting the context
of the sentence imply where the link goes.
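For instance (the page name here is made up):

```html
<!-- Bad form: the link text says nothing about the destination. -->
For more information, <A HREF="recipes.html">click here</A>.

<!-- Better: the sentence itself implies where the link goes. -->
We also keep a collection of <A HREF="recipes.html">bread recipes</A>.
```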
- Keep the Site Fresh.
A site that changes,
via automation, is more exciting to users,
more apt to be linked to by other users, and gives
search engines and users a reason to come back more often.
- Show How New Something Is.
Ever revisited a site and not known what changed? Ever been to
a site where the "new" splashes were themselves stale? Don't just mark
things as new or old; instead, show how new something is. This attracts
visitors to click through, and it acts as a passive nag reminding you
to give your site the attention it needs. See an
example.
- Keep Pages Small.
Keep your HTML small and tight to make your pages load fast.
Visit www.the5k.org
and see how HTML masters make very elaborate and
interactive pages in a small amount of space.
You can learn a lot by doing a View Source
on other people's pages; this is perhaps the single best
way to learn HTML.
- Trust Yourself to Hand Roll Your Pages.
Tools like FrontPage and exporting Word documents are convenient, but they
generate huge volumes of unnecessary tags that slow load
times and introduce browser compatibility rendering errors.
Learning the underlying
raw
HTML,
using a simple text editor,
a compliant browser,
and
validating your pages will
provide fast, small, easy to maintain, and attractive pages.
There are even tools to help
tidy your
HTML and
check your pages
for dead links.
- Reflect Header Keywords in Your Document Body.
Search engines score your pages more highly when the keywords in the
body match the keywords in the header. This may mean you have
to edit your content a bit to use certain words, but it helps people find
your pages easier. Think of what someone will type into a search
engine to find your page, then include that wording in a sentence.
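A small sketch of the idea, using a made-up beekeeping page:

```html
<HEAD>
<TITLE>Beekeeping Basics</TITLE>
<META NAME="KEYWORDS" CONTENT="beekeeping,hives,honey">
</HEAD>
<BODY>
<!-- The body deliberately reuses the header keywords in real sentences. -->
<P>Beekeeping starts with healthy hives; here's how to set them up
and harvest your first honey.</P>
</BODY>
```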
- Have a Good Opening Paragraph.
Some search engines will display the first paragraph or sentence
of your page. If your page starts with a short synopsis, overview,
or description of what's on it, it will get more hits.
- Use HEIGHT and WIDTH in Image Tags.
If the browser doesn't know an image's dimensions,
it has to wait for the image to finish downloading before it can
render the rest of the page correctly. When you provide
image sizes, it can render the page and fill the image in later.
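For example (photo.jpg and its dimensions are hypothetical):

```html
<!-- The browser reserves a 200x150 box immediately, so the rest of the
     page renders while the image is still downloading. -->
<IMG SRC="photo.jpg" WIDTH="200" HEIGHT="150" ALT="Vacation photo">
```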
- Don't Use the Browser to Scale Images.
Browsers are usually pretty poor about scaling images with
decent quality, so make your HEIGHT and WIDTH attributes exact.
Too small, and you load more image data than you really needed to,
giving the appearance of a slow page. Too large, and the picture
becomes blocky. Have multiple images on your server if several
sizes are needed.
- Take Advantage of Image Caching.
Once your browser has loaded an image for a page, you can use it
repeatedly without penalty within the page. Browsers are also
good at caching between pages, so if you have the same graphic
on another page, it should load ultra fast.
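As a sketch (bullet.gif is a made-up graphic), the same file referenced twice costs only one download:

```html
<!-- bullet.gif is fetched once; the second reference comes from cache. -->
<IMG SRC="bullet.gif" WIDTH="10" HEIGHT="10" ALT="*"> First item<BR>
<IMG SRC="bullet.gif" WIDTH="10" HEIGHT="10" ALT="*"> Second item<BR>
```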
- Learn to use META-information.
Each web page consists of two parts, the header and the body.
Many web search engines get their clues from
META
tags in
the header section. Learn to use these:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
Declare which version of HTML your pages conform to, so browsers
and validators know which rules to apply.
<HTML LANG="EN-US">
Tell what language your pages are in.
<HEAD>
<TITLE>Page Title Goes Here</TITLE>
A good search engine will display the title of the page in the
listing. Make it clear and concise.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
Tell what kind of page this is, and the character set used (note
this tag uses HTTP-EQUIV rather than NAME, because it mirrors an
HTTP header).
<META NAME="AUTHOR" CONTENT="Your Name Here">
Tell who authored the page.
<META NAME="KEYWORDS" CONTENT="Keywords,Separated,By,Commas">
Use a short list of relevant, non-duplicated keywords
that also appear in your document body.
<META NAME="PUBLISHED_DATE" CONTENT="20-Aug-2001 14:32:05">
Tell when the page was made; automation can be used to change this
value, enticing search engines to return more often.
<META NAME="DESCRIPTION" CONTENT="Concise description of page's content and purpose">
This is usually the only text users will see in a search engine
listing that you can count on; its job is to entice them to come
visit your page. You have
about 250 characters to play with. Keep it short. Keep it relevant.
<META NAME="DISTRIBUTION" CONTENT="global">
Who should this page's distribution go to?
<META NAME="COPYRIGHT" CONTENT="Copyright message">
Provide a copyright. Use the word Copyright, with an optional
(c) or © (the &copy; entity), followed by the
year, the copyright holder's name, and the words All Rights
Reserved, to validate it for international treaties.
<META NAME="GENERATE" CONTENT="Hand Edited">
Tell how the page was made.
<META NAME="RATING" CONTENT="G">
Rate yourself.
Help content policing browsers,
so the government doesn't step in and police servers.
<LINK REV="MADE" HREF="mailto:[email protected]">
Tell how to contact the owner of the page.
<META NAME="ROBOTS" CONTENT="ALL">
Invite robots like search engines and web spiders
to comb your whole web site.
<META NAME="REVISIT" CONTENT="15 days">
Suggest when the indexing agent should return again.
<META NAME="CATEGORY" CONTENT="Main Page">
Tell what kind of page this is.
<META NAME="LANGUAGE" CONTENT="English">
Tell what language the page is in.
</HEAD>
<BODY>
...
</BODY>
</HTML>
- Tell
Robots
When To Visit Again.
Register your primary web pages with a few search engines, but
also use the META tag REVISIT to influence robots
to keep revisiting; don't be greedy by setting too small a value.
(Robots and spiders are different from
WebSnakes,
which download a site to the local hard drive.)
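Alongside the META tags, a robots.txt file at your server's root gives well-behaved robots site-wide instructions (the /drafts/ directory here is hypothetical):

```
# robots.txt -- read by well-behaved robots before they comb the site.
User-agent: *
Disallow: /drafts/
```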
- Have Consistency in Design.
Related areas on your website should have a similar design theme,
so it's noticeable when you've left an area.
- Avoid Forcing Readers to Click More Than Needed.
Breaking tightly related information into multiple pages,
forcing readers to click for more information, etc. are all sources
of unnecessary interactive distraction. Longer pages read and print
better than ones with artificial boundaries. Consider the Internet
a new kind of media with its own rules. Where the physical size of
a book forced us to have to turn the page to continue, that's
no longer necessary -- even with
comic books.
- Use Color Wisely and Sparingly.
Use a simple yet consistent color scheme.
Visit Visibone for great
posters, discussions, and add-ins for web-based palettes.
Additionally,
The
Designer's Guide to Color Combinations is a great
textbook resource for theme-based colors.
The Big DON'Ts
- Avoid Font Soup.
Just as color can be abused, having too many
fonts,
or mixing the wrong
font
families can create a terrible look. This is definitely
a case where
more
isn't better.
- Don't Throw Technology Around for Technology's Sake.
Recognize that advanced features limit your audience.
Every ActiveX control cuts you off from the very large non-Microsoft
world (this is a good way to lose foreign visitors and businesses
that are using Linux).
Every Cascading Style Sheet, JavaScript,
graphic,
Flash, and
Java
app you add reduces your audience further
due to bandwidth and capabilities. Some people
can't display these
things, others are impatient, and some people have to
pay
for local phone service (especially outside the USA). Help them.
You need to decide upon a proper balance; consider
having a plain vanilla page in addition to the snazzy GUI.
It takes effort to make a good site. Evaluate if the return on
investment is worth your additional time, expense, site maintenance,
and lost visitors by pushing the envelope too far. Content wins
over presentation in all but a few exceptional cases.
- Avoid Advertisements.
We know from experience that people do not get rich off of
selling ads on their website. Ads unrelated to you
are annoying, distracting, space consuming, and usually
slow down
page load times.
Besides, serious surfers use
ad killing software.
We hate ads
in ICQ, we hate them on web
pages.
- Avoid Remote Resources.
Anything that's not local to your server, such as a counter
or a graphic, means that the browser has to go look up
information from another site. If that site is slower,
or congested, your page loads slower. If that site is
down, or temporarily unavailable, your page is broken.
- Avoid Large Graphics.
Learn how to use
GIF (though
many people have valid
political issues
with this format),
JPEG,
and
PNG
files optimally; keep the file
size for any given graphic to under 32k if possible. If you have
multiple graphics, try to keep the total size in this range.
Most visitors are using modems, and many aren't getting 56K
connections. Be bandwidth friendly, because a visitor who's
bored waiting for a page to load will go browse elsewhere.
- Avoid Lots of Small Graphics.
Most web browsers, to reduce load on a given server, can only download
four to eight graphics concurrently. Thus it can be better to download
one larger graphic than dozens of smaller ones. This is why many photo
albums take so long to load.
- Avoid Clip-Art.
Clip-art usually conveys amateur work; only add graphics if they
add to a page's readability or content. Otherwise, you're
just chewing up bandwidth. Problems with clip-art are:
- it's the wrong size (often too big)
- it's usually blocky (poor resolution)
- it's the wrong color
- it's usually flat
- it doesn't match the page's style or theme
- it doesn't match the page's topic
- it isn't unique
- Avoid Motion.
Movement is distracting on static text pages because it
diverts the eye's attention while trying to read.
The evil <BLINK> attribute should be used ultra-sparingly, if at all.
Animated graphics pose similar problems.
- Avoid Frames.
It takes a lot of hard work to get frames to operate correctly,
with the biggest problems being:
- recursing into your own site
- printing the browser window
- invoking JavaScript that hasn't loaded yet
- the open-link-in-another-window problem
There are also legal considerations about wrapping
someone else's site in your frames; courts are usually well
behind the times in rational
thinking when it comes to citizen rights and
drawing parallels between technology and the real world.
More importantly, frames
consume a lot of screen real estate - if you want a toolbar, just
put it somewhere in the document; users have continually shown
they don't mind scrolling to get to it.
- Avoid Background Sounds and Music.
Pages with sound usually take longer to load, depend on users
having sound cards, and it is darn annoying to have the same
"welcome" message playing each time the user presses the
back button from a link on your page.
Consider the fact that users "power browse" by
visiting more than one site or page at a time. But the worst
offense is that it limits when a visitor will be able
to visit your site: if it's late, he may wake the baby by
going to your page; if she's at work, her boss may hear her
surfing non-work-related pages. Even at lunch, this disturbs
co-workers. Make the sound something they can get to, not
an insuppressible default.
- Don't Use Wide Pages.
Forcing your page width, for example by enclosing the body
of the document inside a fixed size table, can cause
a page to be too wide to print. Wide pages also
create long lines of text, which are harder for the eye
to follow.
- Avoid Counters.
Ask yourself, when visiting other people's pages,
if you really care how many visitors they have?
No. You don't. You are interested in content.
It's hard to find useful, accurate, non-intrusive counters
that don't slow a page down and that fit the color, theme,
and size of the page. Without using browser cookies,
sessions, and other advanced techniques, counters are
going to give an artificially inflated number. If a user
uses a different browser program, has a dynamic IP address,
or accesses your pages from different locations or at
different times, it taints the count of actual visits. Instead,
work with the web master to do an analysis of your site's
logs. After all, just because several thousand people
have read a page doesn't mean the information on the page
has more value (though it does make the location of that
page valuable to potential advertisers).
If you have corrections, ideas you'd like to contribute for
credit here, spotted a dead link, or would like to suggest
a useful resource, please feel free to
send them to the author.