1. Introduction: The origins of Unicode date to
1987, when Joe Becker from Xerox and Lee Collins and Mark Davis from Apple
started investigating the practicalities of creating a universal character set.
In August 1988, Joe Becker published a draft proposal for an “international/multilingual
text character encoding system, tentatively called Unicode”. He explained that
“the name ‘Unicode’ is intended to suggest a unique, unified, universal
encoding”.
2. Unicode Consortium: The Unicode Consortium is a
nonprofit organization that coordinates Unicode’s development. The Unicode
Consortium was incorporated on January 3, 1991, in California. There are various levels of
membership, and any company or individual willing to pay the membership dues
may join this organization. Full members include most of the
main computer software and hardware companies with any interest in
text-processing standards, including Adobe Systems, Apple, Google, IBM,
Microsoft, Oracle Corporation, and Yahoo!. The Consortium has the ambitious
goal of eventually replacing existing character encoding schemes with Unicode
and its standard UTF (Unicode Transformation Format) schemes, as many of the
existing schemes are limited in size and scope and are incompatible with
multilingual environments.
3. Definition: Unicode is a computing industry
standard for the consistent encoding, representation and handling of text
expressed in most of the world’s writing systems. The standard is maintained by
the Unicode Consortium.
Unicode
provides a unique number for every character, no matter what the platform, no
matter what the program, no matter what the language.
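The idea of one number per character can be illustrated with a short Python sketch. The helper name code_point_label is a hypothetical choice made for this example; ord and chr are Python built-ins that convert between a character and its code point.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character.

    Hypothetical helper for illustration; not part of any standard API.
    """
    return f"U+{ord(ch):04X}"


# The same character always maps to the same number, whatever the platform.
for ch in ["A", "অ", "€"]:
    print(ch, code_point_label(ch))

# chr() goes the other way: from a code point back to the character.
print(chr(0x0985))  # BENGALI LETTER A
```

The number itself says nothing about how the character is stored in bytes; that is the job of an encoding such as UTF-8 or UTF-16.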
4. Encoding: The Unicode standard consists of a set of code
charts for visual reference, an encoding method and set of standard character
encodings, a set of reference data files, and a number of related
items, such as character properties, rules for normalization, decomposition,
collation, rendering, and bidirectional display order (for the correct display
of text containing both right-to-left scripts, such as Arabic and Hebrew, and
left-to-right scripts).
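Some of these related items can be inspected from Python's standard unicodedata module, which is built from the Unicode data files. The sketch below is only an illustration: the helper name describe is an assumption made for this example, and the module reflects whichever Unicode version ships with the Python interpreter in use.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few character properties for one character.

    Hypothetical helper for illustration only.
    """
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),        # e.g. Lu = uppercase letter
        "bidi_class": unicodedata.bidirectional(ch),  # used for display order
    }


# Character properties, including the bidirectional class.
print(describe("A"))        # a left-to-right Latin letter
print(describe("\u05d0"))   # Hebrew alef, a right-to-left letter
print(describe("\u0627"))   # Arabic alef, a right-to-left Arabic letter

# Normalization and decomposition: "é" as one code point versus
# "e" followed by a combining acute accent.
composed = "\u00e9"
decomposed = unicodedata.normalize("NFD", composed)
print([f"U+{ord(c):04X}" for c in decomposed])
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Normalizing both strings to the same form before comparing them is what lets software treat the two spellings of “é” as equal.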
Unicode defines
two mapping methods: the Unicode Transformation Format (UTF) encodings, and the
Universal Character Set (UCS) encodings.
Unicode can be implemented by different character encodings. The most
commonly used encodings are UTF-8 and UTF-16.
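The difference between the two encodings shows up in how many bytes each one needs for the same character. The following Python sketch assumes a hypothetical helper named encoded_sizes; the "utf-16-be" codec is used so that no byte order mark is added to the output.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text.

    Hypothetical helper for illustration only.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# One character from each size class: ASCII, a symbol from the
# Basic Multilingual Plane, and a character outside that plane.
for ch in ["A", "€", "😀"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# The raw bytes of the euro sign in each encoding.
print("€".encode("utf-8").hex())
print("€".encode("utf-16-be").hex())
```

UTF-8 uses one to four bytes per character and keeps ASCII text unchanged, while UTF-16 uses two bytes for most characters and four bytes (a surrogate pair) for the rest.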
5. Versions: Unicode is
developed in conjunction with the International Organization for
Standardization and shares the character repertoire with ISO/IEC 10646: the
Universal Character Set. Unicode and ISO/IEC 10646 function equivalently as
character encodings, but The Unicode Standard contains much more information
for implementers, covering in depth topics such as bitwise encoding, collation
and rendering. The Unicode
Consortium published the first volume of The Unicode Standard (ISBN
0-321-18578-1) in October 1991 and continues to develop standards based on that
original work. Unicode 7.0, released in June 2014 and available from the
Consortium’s web site, contains a repertoire of more than 110,000 characters
covering more than 100 scripts and multiple symbol sets. Unicode 8.0 was
released in June 2015, and Unicode 9.0 followed in June 2016.
6. Scripts covered: In Unicode, a script is a
collection of letters and other written signs used to represent textual
information in one or more writing systems. Unicode covers almost all scripts
(writing systems) in current use. Unicode 7.0 includes over 80 modern scripts plus over 40
ancient (out of use a thousand years or more) and historic (out of use several
hundred years) scripts covering alphabets, abugidas and
syllabaries, although there are still scripts that are not yet encoded,
particularly those mainly used in historical, liturgical, and academic
contexts.
7. Assamese and Bengali Script: The meaning of
“Bengali script” in the Unicode Standard includes all of the letters used not only
for the Bangla language of Bangladesh and of West Bengal state in India, but
also for the Assamese language (and other languages) of Assam state. There are
some letters used in Bangla that are not used in Assamese, and some letters in
Assamese that are not used in Bangla, but in the Unicode Standard, the Bengali
script refers to the whole set of letters needed for both.
This situation
for the Bengali script can be compared to that for the Arabic script, for
example. The Arabic script is used to write the Arabic language, of course, but
it is also used to write the Persian language in Iran, as well as the Urdu
language in India, and many others. The set of Arabic letters needed to write
the Arabic language is different from the set of Arabic letters needed to write
Persian. In the Unicode Standard, the "Arabic script" refers to the
superset of Arabic letters needed for writing all of those different languages.
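The shared-script naming can be seen in the official character names. In the Python sketch below, script_prefix is a hypothetical helper that simply takes the first word of a character's name; this is a rough heuristic for illustration, not the formal Unicode Script property, which Python's standard library does not expose.

```python
import unicodedata


def script_prefix(ch: str) -> str:
    """Return the first word of the character's Unicode name.

    Hypothetical heuristic for illustration; it is NOT the Script property.
    """
    return unicodedata.name(ch).split()[0]


examples = {
    "\u09b0": "ra as written in Bangla",
    "\u09f0": "ra as written in Assamese",
    "\u09f1": "wa as written in Assamese",
    "\u0628": "beh, used in Arabic",
    "\u067e": "peh, used in Persian but not in Arabic",
}

for ch, note in examples.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({note}) -> {script_prefix(ch)}")
```

The Assamese letters carry “BENGALI” in their names and the Persian letter carries “ARABIC”, which reflects the script-level grouping described above rather than any statement about the languages themselves.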
8. Conclusion: Unicode has become the dominant
scheme for internal processing and storage of text. Unicode is used almost
exclusively for building new information processing systems. Unicode’s success
at unifying character sets has led to its widespread and predominant use in the
internationalization and localization of computer software. The standard has
been implemented in many recent technologies, including modern operating
systems, XML, the Java programming language, and the Microsoft .NET Framework.
How to Cite this Article?
APA Citation, 7th Ed.: Barman, B. (2020). A comprehensive book on Library and Information Science. New Publications.
Chicago 16th Ed.: Barman, Badan. A Comprehensive Book on Library and Information Science. Guwahati: New Publications, 2020.
MLA Citation 8th Ed.: Barman, Badan. A Comprehensive Book on Library and Information Science. New Publications, 2020.
