
Returns the encoding object corresponding to ENCODING.

# THE PERL ENCODING API

# Basic methods

# encode

    $octets = encode(ENCODING, STRING[, CHECK])

Encodes the scalar value STRING from Perl's internal form into ENCODING and returns a sequence of octets. ENCODING can be either a canonical name or an alias. For encoding names and aliases, see "Defining Aliases". For CHECK, see "Handling Malformed Data".

CAVEAT: the input scalar STRING might be modified in-place depending on what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be left unchanged.

For example, to convert a string from Perl's internal format into ISO-8859-1, also known as Latin1:

    $octets = encode("iso-8859-1", $string);

CAVEAT: When you run $octets = encode("UTF-8", $string), then $octets might not be equal to $string. Though both contain the same data, the UTF8 flag for $octets is always off. When you encode anything, the UTF8 flag on the result is always off, even when it contains a completely valid UTF-8 string.

If the $string is undef, then undef is returned.

str2bytes may be used as an alias for encode.

# decode

    $string = decode(ENCODING, OCTETS[, CHECK])

This function returns the string that results from decoding the scalar value OCTETS, assumed to be a sequence of octets in ENCODING, into Perl's internal form. As with encode(), ENCODING can be either a canonical name or an alias. For encoding names and aliases, see "Defining Aliases"; for CHECK, see "Handling Malformed Data".

CAVEAT: the input scalar OCTETS might be modified in-place depending on what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be left unchanged.

For example, to convert ISO-8859-1 data into a string in Perl's internal format:

    $string = decode("iso-8859-1", $octets);

CAVEAT: When you run $string = decode("UTF-8", $octets), then $string might not be equal to $octets. Though both contain the same data, the UTF8 flag for $string is on. See "The UTF8 flag" below.

bytes2str may be used as an alias for decode.
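The caveats for encode and decode are easier to see with a concrete round trip. The following is a minimal sketch, not a definitive recipe: the variable names and sample string are illustrative, and it uses Encode's is_utf8 helper (not described in this section) only to inspect the UTF8 flag.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);

# A 5-character string; U+00E9 (e with acute) is the only non-ASCII character.
my $string = "caf\x{e9}!";

# Encoding yields octets. In UTF-8, U+00E9 takes two octets, so the result
# is longer than the character string and does not compare equal to it.
my $octets = encode("UTF-8", $string);
printf "characters: %d, octets: %d\n", length($string), length($octets);
printf "UTF8 flag on octets: %s\n", is_utf8($octets) ? "on" : "off";

# Decoding turns the octets back into characters.
my $decoded = decode("UTF-8", $octets);
printf "round trip equal: %s\n", $decoded eq $string ? "yes" : "no";
printf "UTF8 flag on decoded string: %s\n", is_utf8($decoded) ? "on" : "off";

# With a CHECK value the input may be modified in place; adding LEAVE_SRC
# asks Encode to leave the input unchanged.
my $input = $octets;
my $copy  = decode("UTF-8", $input, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# FB_CROAK makes malformed input fatal, so trap the error with eval.
my $bad = "\xff";    # 0xFF never appears in well-formed UTF-8
my $ok  = eval { decode("UTF-8", $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC()); 1 };
print "malformed input rejected\n" unless $ok;

# undef in, undef out.
my $nothing = encode("UTF-8", undef);
```

Comparing with eq works across the round trip because eq compares characters, not the internal representation, so the state of the UTF8 flag does not affect the result.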
    $characters = decode('UTF-8', $octets, Encode::FB_CROAK);
    $octets = encode('UTF-8', $characters, Encode::FB_CROAK);

# Table of Contents

Encode consists of a collection of modules whose details are too extensive to fit in one document. This one itself explains the top-level APIs and general topics at a glance. For other topics and more details, see the documentation for these modules:

# Encode::Alias - Alias definitions to encodings

# Encode::Encoding - Encode Implementation Base Class

# Encode::Supported - List of Supported Encodings

# Encode::CN - Simplified Chinese Encodings

# Encode::JP - Japanese Encodings

# Encode::KR - Korean Encodings

# Encode::TW - Traditional Chinese Encodings

# DESCRIPTION

The Encode module provides the interface between Perl strings and the rest of the system. Perl strings are sequences of characters.

The repertoire of characters that Perl can represent is a superset of those defined by the Unicode Consortium. On most platforms the ordinal value of a character, as returned by ord(S), is the Unicode codepoint for that character. The exceptions are platforms where the legacy encoding is some variant of EBCDIC rather than a superset of ASCII; see perlebcdic.

In recent history, data has been moved around a computer in 8-bit chunks, often called "bytes" but also known as "octets" in standards documents. Perl is widely used to manipulate data of many types: not only strings of characters representing human or computer languages, but also "binary" data, being the machine's representation of numbers, pixels in an image, or just about anything.

When Perl is processing "binary data", the programmer wants Perl to process "sequences of bytes". This is not a problem for Perl: because a byte has 256 possible values, it easily fits in Perl's much larger "logical character".

This document mostly explains the how. perlunitut and perlunifaq explain the why.

# TERMINOLOGY

# character

A character in the range 0 .. 2**32-1 (or more); what Perl's strings are made of.

# byte

A character in the range 0..255; a special case of a Perl character.

# octet

8 bits of data, with ordinal values 0..255; term for bytes passed to or from a non-Perl context, such as a disk file, standard I/O stream, database, command-line argument, environment variable, socket etc.
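To make the distinction between character, byte and octet concrete, here is a small sketch under stated assumptions: the sample characters and variable names are illustrative, and the ordinal shown assumes a non-EBCDIC platform.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character above 255: U+263A (WHITE SMILING FACE). It is a Perl
# character, but not a "byte" in the terminology used here.
my $smiley  = "\x{263A}";
my $ordinal = ord($smiley);    # the Unicode codepoint on non-EBCDIC platforms

# A character in the range 0..255 is a "byte": a special case of a character.
my $byte = chr(0xE9);

# Octets are what leave Perl. The number of octets depends on the encoding.
my $utf8_octets   = encode("UTF-8", $smiley);
my $latin1_octets = encode("iso-8859-1", $byte);
my $utf8_of_byte  = encode("UTF-8", $byte);

printf "U+%04X: %d character, %d octets in UTF-8\n",
    $ordinal, length($smiley), length($utf8_octets);
printf "U+%04X: %d octet in ISO-8859-1, %d octets in UTF-8\n",
    ord($byte), length($latin1_octets), length($utf8_of_byte);

# Every element of an encoded result has an ordinal value in 0..255.
my @too_large = grep { ord($_) > 255 } split //, $utf8_octets;
print "all octets fit in 8 bits\n" unless @too_large;
```

The same one-character string needs a different number of octets in each encoding, which is why length() on a decoded string and length() on its encoded form generally disagree.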
