Clean URLs, also sometimes referred to as RESTful URLs, user-friendly URLs, pretty URLs or search engine-friendly URLs, are URLs intended to improve the usability and accessibility of a website or web service by being immediately and intuitively meaningful to non-expert users. Such URL schemes tend to reflect the conceptual structure of a collection of information and decouple the user interface from a server’s internal representation of information. Other reasons for using clean URLs include search engine optimization (SEO),[ conforming to the representational state transfer (REST) style of software architecture, and ensuring that individual web resources remain consistently at the same URL. This makes the World Wide Web a more stable and useful system, and allows more durable and reliable bookmarking of web resources.
Clean URLs also do not contain implementation details of the underlying web application. This carries the benefit of reducing the difficulty of changing the implementation of the resource at a later date. For example, many URLs include the filename of a server-side script, such as example.php, example.asp or cgi-bin. If the underlying implementation of a resource is changed, such URLs would need to change along with it. Likewise, when URLs are not “clean”, if the site database is moved or restructured it has the potential to cause broken links, both internally and from external sites, the latter of which can lead to removal from search engine listings. The use of clean URLs presents a consistent location for resources to user-agents regardless of internal structure. A further potential benefit to the use of clean URLs is that the concealment of internal server or application information can improve the security of a system.
Table of Contents
Structure
A URL will often comprise a path, script name, and query string. The query string parameters dictate the content to show on the page, and frequently include information opaque or irrelevant to users—such as internal numeric identifiers for values in a database, illegibly encoded data, session IDs, implementation details, and so on. Clean URLs, by contrast, contain only the path of a resource, in a hierarchy that reflects some logical structure that users can easily interpret and manipulate.
Original URL | Clean URL |
http://example.com/about.html | http://example.com/about |
http://example.com/user.php?id=1 | http://example.com/user/1 |
http://example.com/index.php?page=name | http://example.com/name |
http://example.com/kb/index.php?cat=1&id=23 | http://example.com/kb/1/23 |
http://en.wikipedia.org/w/index.php?title=Clean_URL | http://en.wikipedia.org/wiki/Clean_URL |
The implementation of clean URLs involves URL mapping via pattern matching or transparent rewriting techniques. As this usually takes place on the server side, the clean URL is often the only form seen by the user.
For search engine optimization purposes, web developers often take this opportunity to include relevant keywords in the URL and remove irrelevant words. Common words that are removed include articles and conjunctions, while descriptive keywords are added to increase user-friendliness and improve search engine rankings.
A fragment identifier can be included at the end of a clean URL for references within a page, and need not be user-readable.
Slug
Some systems define a slug as the part of a URL that identifies a page in human-readable keywords. It is usually the end part of the URL (specifically of the path / pathinfo part), which can be interpreted as the name of the resource, similar to the basename in a filename or the title of a page. The name is based on the use of the word slug in the news media to indicate a short name given to an article for internal use.
Slugs are typically generated automatically from a page title but can also be entered or altered manually, so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines, as well as providing recipients of a shared bare URL with the rough idea of the page’s topic. Long page titles may also be truncated to keep the final URL to a reasonable length.
Slugs may be entirely lowercase, with accented characters replaced by letters from the Latin script and whitespace characters replaced by a hyphen or an underscore to avoid being encoded. Punctuation marks are generally removed, and some also remove short, common words such as conjunctions. For example, the title This, That, and the Other! An Outré Collection could have a generated slug of this-that-other-outre-collection.
Another benefit of URL slugs is the facilitated ability to find a desired page out of a long list of URLs without page titles, such as a minimal list of opened tabs exported using a browser extension, and the ability to preview the approximate title of a target page in the browser if hyperlinked to without title.
Web sites that make use of slugs include Stack Exchange Network with question title after slash, and Instagram with ?taken-by=username URL parameter.
Source: Wikipedia, the free encyclopedia