INTRODUCTION TO URL ENCODING / URL ENCODED STRINGS
July 19, 2010. Posted by Tournas Dimitrios in Uncategorized.
URL Encoding is the process of converting a string into a valid URL format. Valid URL format means that the URL contains only what are termed “alpha | digit | safe | extra | escape” characters. You can read more about the what and the why of these terms on the World Wide Web Consortium site: http://www.w3.org/Addressing/URL/url-spec.html and http://www.w3.org/International/francois.yergeau.html.
URL encoding is normally performed to convert data passed via HTML forms, because such data may contain special characters, such as “/”, “.”, “#”, and so on, which could either: a) have special meanings; or b) not be valid characters for a URL; or c) be altered during transfer. For instance, the “#” character needs to be encoded because it has the special meaning of an HTML anchor. The <space> character also needs to be encoded because it is not allowed in a valid URL. Also, some characters, such as “~”, might not transport properly across the internet.
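To illustrate the “special meaning” case, here is a minimal Python sketch using the standard `urllib.parse` module. The URL, the `example.com` host and the `q` parameter are made up for illustration. It compares how a URL parser splits a raw “#” with how it treats an encoded one.

```python
from urllib.parse import quote, urlsplit

# Hypothetical URL: example.com and the "q" parameter are illustrative only.
raw = "http://example.com/search?q=C#basics"
encoded = "http://example.com/search?q=" + quote("C#basics", safe="")

raw_parts = urlsplit(raw)
encoded_parts = urlsplit(encoded)

# Unencoded: everything after "#" is treated as an anchor (fragment).
print(raw_parts.query, "|", raw_parts.fragment)
# Encoded: "#" travels as data (%23) and stays inside the query string.
print(encoded_parts.query, "|", encoded_parts.fragment)
```

In the first case the value is cut short at the “#”; in the second the whole value survives as part of the query string.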
One of the most common encounters with URL Encoding is when dealing with HTML forms.
As an example, suppose a form contains a text field with the value “ This is a simple & short test. ” and the form uses the GET method, which means that the data will be appended to the URL as a query string. If you click the submit button and look at the resulting URL in the browser address bar, you should see that the query string portion has been automatically URL encoded by the browser.
Here, you can see that:
- The <space> character has been URL encoded as “+”.
- The & character has been URL encoded as “%26”.
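The same encoding can be reproduced outside the browser. The sketch below uses Python's standard `urllib.parse.urlencode`. The field name `text` is an assumption, since the form's real field name is not given here.

```python
from urllib.parse import urlencode

# "text" is a hypothetical field name; the form's real field name is not given.
form_data = {"text": " This is a simple & short test. "}

# urlencode builds a form-style query string: spaces become "+", "&" becomes "%26".
query_string = urlencode(form_data)
print(query_string)
```

The printed query string shows “+” wherever the value had a space and “%26” in place of the “&”.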
The <space> character and the & character are just some of the special characters that need to be encoded. As you can see, when a character is URL encoded, it is converted to %XY, where X and Y are hexadecimal digits. You will see later where these digits come from.
What Should be URL Encoded?
As a rule of thumb, any non-alphanumeric character should be URL encoded. This of course applies to characters that are to be interpreted as is (i.e., are not intended to have special meanings). In such cases, there’s no harm in URL encoding the character, even if it does not actually need to be URL encoded.
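A small Python sketch of this rule of thumb, using the standard `urllib.parse` functions (the sample strings are made up for illustration). Note that Python's `quote` also leaves the characters `_`, `.`, `-` and `~` unencoded, so it is slightly more lenient than the rule stated above.

```python
from urllib.parse import quote, unquote

# Alphanumeric characters pass through unchanged.
plain = quote("abc123", safe="")

# Non-alphanumeric characters such as "/" and "#" are percent-encoded.
special = quote("a/b#c", safe="")

# Encoding a character that did not need it is harmless:
# "%41" and "%42" decode back to plain "A" and "B".
harmless = unquote("%41%42C")

print(plain, special, harmless)
```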
Some Common Special Characters: Here’s a table of some often-used characters and their URL encodings.
|<space>||%20 or +|
Note that because the <space> character is very commonly used, a special code (the “+” sign) has been reserved as its URL encoding in form data and query strings. Thus the string “A B” can be URL encoded as either “A%20B” or “A+B”.
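As a rough illustration of the two spellings, Python's standard library offers `quote` (which produces “%20”) and `quote_plus` (which produces “+”). The decoder has to match: `unquote_plus` understands both forms, while plain `unquote` leaves a “+” untouched.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

percent_form = quote("A B")       # space encoded as %20
plus_form = quote_plus("A B")     # space encoded as +

# unquote_plus accepts either spelling of the space.
decoded_percent = unquote_plus(percent_form)
decoded_plus = unquote_plus(plus_form)

# Plain unquote does not treat "+" as a space.
not_decoded = unquote(plus_form)

print(percent_form, plus_form, decoded_percent, decoded_plus, not_decoded)
```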
Where Do the Numbers Come From?
The number following the % sign is the hexadecimal ASCII code of the character being encoded. ASCII stands for American Standard Code for Information Interchange. The purpose of ASCII is to provide a standard character set for electronic equipment. The standard ensures that different devices (which might be manufactured by different companies) can communicate with each other using the same character codes. With this kind of standardization, programmers and computer manufacturers will be “on the same page”. Imagine the problem if Mac and PC had different keyboard codes. The standard ASCII character set includes not only all the alphabetical letters and numbers, but also punctuation.
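The relationship between a character, its ASCII code and its %XY form can be sketched in a few lines of Python. The helper name `percent_encode_ascii` is made up for this illustration and only handles single ASCII characters.

```python
def percent_encode_ascii(char: str) -> str:
    """Return the %XY form of a single ASCII character (illustrative helper)."""
    code = ord(char)  # decimal ASCII code, e.g. 38 for "&"
    if len(char) != 1 or code > 0x7F:
        raise ValueError("expected a single ASCII character")
    return f"%{code:02X}"  # two uppercase hexadecimal digits

# Print each character with its decimal code and its percent-encoded form.
for char in [" ", "&", "#", "/"]:
    print(repr(char), ord(char), percent_encode_ascii(char))
```

For instance, “&” has the decimal ASCII code 38, which is 26 in hexadecimal, hence “%26”.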
Don’t panic if you need to convert text to a URL encoded string: there are plenty of sites that perform the conversion online. One link that I use often is here.
Most web programming languages already provide built-in methods to perform URL Encoding and URL Decoding. Here are the common ones; click the method name to find more info.
|Language||URL Encoding||URL Decoding|
or see this link.
|See this link.|
Use CGI.pm module. Link.
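As one concrete instance of such built-in methods, here is a minimal round-trip sketch in Python using the standard `urllib.parse` module. The sample string and the `text` field name are made up for illustration.

```python
from urllib.parse import parse_qs, quote, unquote

# Encode an arbitrary string, then decode it back.
original = "50% off: tea & coffee?"
encoded = quote(original, safe="")
decoded = unquote(encoded)
print(encoded)
print(decoded == original)

# Decode a whole form-style query string ("text" is a hypothetical field name).
fields = parse_qs("text=+This+is+a+simple+%26+short+test.+")
print(fields)
```

Note that the “%” character itself is encoded (as “%25”), which is what makes the round trip unambiguous.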