
Unicode


1. Introduction: The origins of Unicode date to 1987, when Joe Becker of Xerox and Lee Collins and Mark Davis of Apple began investigating the practicalities of creating a universal character set. In August 1988, Becker published a draft proposal for an “international/multilingual text character encoding system, tentatively called Unicode”. He explained that the name “Unicode” is intended to suggest a “unique, unified, universal encoding”.

2. Unicode Consortium: The Unicode Consortium is a nonprofit organization that coordinates Unicode’s development. It was incorporated in California on January 3, 1991. There are various levels of membership, and any company or individual willing to pay the membership dues may join. Full members include most of the major computer software and hardware companies with an interest in text-processing standards, such as Adobe Systems, Apple, Google, IBM, Microsoft, Oracle Corporation and Yahoo!. The Consortium has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard UTF (Unicode Transformation Format) schemes, because many existing schemes are limited in size and scope and are incompatible with multilingual environments.

3. Definition: Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world’s writing systems. The standard is maintained by the Unicode Consortium.
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
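To make the idea of a “unique number for every character” concrete, the short Python sketch below prints the code point of a few characters in the conventional U+XXXX notation, together with each character’s formal Unicode name. It uses only the standard library (`ord()`, `chr()` and `unicodedata.name()`); the helper `code_point_label` and the sample characters are illustrative choices, not part of the standard.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() returns the code point; pad to at least four hex digits.
    return f"U+{ord(ch):04X}"


# Sample characters: Latin A, e with acute, Bengali letter A, euro sign.
for ch in ["A", "\u00e9", "\u0985", "\u20ac"]:
    # unicodedata.name() returns the character's formal Unicode name.
    print(code_point_label(ch), unicodedata.name(ch))

# The mapping is reversible: chr() turns a code point back into the character.
print(chr(0x0985) == "\u0985")
```

Running it should print lines such as `U+0041 LATIN CAPITAL LETTER A` and `U+0985 BENGALI LETTER A`; these numbers are the same on every platform and in every program.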

4. Encoding: The Unicode Standard consists of a set of code charts for visual reference, an encoding method and a set of standard character encodings, a set of reference data files, and a number of related items, such as character properties, rules for normalization, decomposition, collation and rendering, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic and Hebrew, and left-to-right scripts).
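Two of the related items listed above, normalization and bidirectional properties, can be observed directly through Python’s standard `unicodedata` module. The sketch below is an illustration, not a reference implementation: it shows that “é” has a precomposed and a decomposed form that NFC and NFD normalization convert between, and it reads the bidirectional class that the Unicode Bidirectional Algorithm uses when ordering mixed right-to-left and left-to-right text. The sample characters are arbitrary choices.

```python
import unicodedata

# "é" as one precomposed code point vs. "e" followed by a combining acute accent.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: NFC composes
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True: NFD decomposes

# Bidirectional class: L = left-to-right, R = right-to-left (Hebrew), AL = Arabic letter.
for ch in ["a", "\u05d0", "\u0627"]:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
```

Comparing text only after normalizing both sides to the same form avoids treating visually identical strings as different.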
Unicode defines two mapping methods: the Unicode Transformation Format (UTF) encodings and the Universal Character Set (UCS) encodings. Unicode can therefore be implemented by different character encodings; the most commonly used are UTF-8 and UTF-16.
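The practical difference between UTF-8 and UTF-16 is easiest to see by encoding the same string both ways. In this minimal Python sketch (Python 3.8 or later, for `bytes.hex` with a separator) the sample string is an arbitrary choice. The codec name `utf-16-be` (big-endian, no byte-order mark) is used only so the byte output is predictable, since Python’s plain `utf-16` codec prepends a BOM.

```python
# Sample text: Latin "A", "é", Bengali letter A, and an emoji above U+FFFF.
text = "A\u00e9\u0985\U0001F600"

# Encode the whole string with each scheme and show the raw bytes.
for codec in ["utf-8", "utf-16-be"]:
    data = text.encode(codec)
    print(f"{codec:<10} {len(data):>2} bytes: {data.hex(' ')}")

# Per-character byte counts: code point, UTF-8 bytes, UTF-16 bytes.
for ch in text:
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), len(ch.encode("utf-16-be")))

# Decoding with the same codec restores the original string.
print(text.encode("utf-8").decode("utf-8") == text)
```

UTF-8 spends one to four bytes per code point, while UTF-16 uses two bytes for characters in the Basic Multilingual Plane and a four-byte surrogate pair (here `d8 3d de 00`) for characters above U+FFFF.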

5. Versions: Unicode is developed in conjunction with the International Organization for Standardization (ISO) and shares its character repertoire with ISO/IEC 10646, the Universal Character Set. Unicode and ISO/IEC 10646 function equivalently as character encodings, but The Unicode Standard contains much more information for implementers, covering in depth topics such as bitwise encoding, collation and rendering. The Unicode Consortium published the first volume of The Unicode Standard (ISBN 0-321-18578-1) in October 1991 and continues to develop standards based on that original work. Unicode 7.0, released in June 2014, contains a repertoire of more than 110,000 characters covering more than 100 scripts and multiple symbol sets. Unicode 8.0 followed in June 2015 and Unicode 9.0.0 in June 2016; current versions of the standard are available from the Consortium’s website.
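The repertoire figures above can be set beside the size of the code space that Unicode and ISO/IEC 10646 share: 17 planes of 65,536 code points each. The Python sketch below does that arithmetic and also prints the version of the Unicode Character Database bundled with the running interpreter (`unicodedata.unidata_version`). That version depends on the Python release, so its printed value is not fixed.

```python
import sys
import unicodedata

# Unicode and ISO/IEC 10646 share a code space of 17 planes x 65,536 code points.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000
code_space = PLANES * CODE_POINTS_PER_PLANE

print(code_space)              # 1114112 code points (U+0000..U+10FFFF)
print(hex(sys.maxunicode))     # 0x10ffff, the highest code point Python accepts

# Unicode Character Database version shipped with this Python build (varies by release).
print(unicodedata.unidata_version)
```

Only part of that space is assigned: the 110,000-plus characters of Unicode 7.0 fill roughly a tenth of it, which leaves room for future versions.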

6. Scripts covered: In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Unicode covers almost all scripts (writing systems) in current use. Unicode 7.0 includes over 80 modern scripts plus over 40 ancient (out of use for a thousand years or more) and historic (out of use for several hundred years) scripts, covering alphabets, abugidas and syllabaries. Some scripts are still not encoded, particularly those used mainly in historical, liturgical and academic contexts.
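Python’s standard library does not expose the Unicode Script property directly, so the sketch below only looks up formal character names for one sample letter from several modern scripts and from the historic Runic and Gothic scripts. For these particular letters the name happens to begin with the script name; treat that as a rough illustration, not as a substitute for the Script property defined in the Unicode Character Database. The choice of letters is arbitrary.

```python
import unicodedata

# One sample letter per script; the choice of letters is illustrative.
samples = {
    "Latin": "A",
    "Cyrillic": "\u0416",       # Cyrillic capital ZHE
    "Devanagari": "\u0905",     # Devanagari letter A
    "Bengali": "\u0985",        # Bengali letter A
    "Runic": "\u16a0",          # Runic letter FEHU (historic script)
    "Gothic": "\U00010330",     # Gothic letter AHSA (ancient script)
}

for script, ch in samples.items():
    name = unicodedata.name(ch)
    # Heuristic only: for these letters the formal name starts with the script name.
    print(f"{script:<10} U+{ord(ch):04X} {name}")
```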

7. Assamese and Bengali Script: In the Unicode Standard, “Bengali script” includes all of the letters used not only for the Bangla language of Bangladesh and the Indian state of West Bengal, but also for the Assamese language (and other languages) of Assam state. Some letters used in Bangla are not used in Assamese, and some letters used in Assamese are not used in Bangla, but in the Unicode Standard the Bengali script refers to the whole set of letters needed for both.
The situation of the Bengali script can be compared to that of the Arabic script. The Arabic script is used to write the Arabic language, of course, but it is also used to write Persian in Iran, Urdu in Pakistan and India, and many other languages. The set of Arabic letters needed to write Arabic differs from the set needed to write Persian. In the Unicode Standard, the “Arabic script” refers to the superset of Arabic letters needed for writing all of these languages.
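The superset idea can be checked against the code charts. In this Python sketch, the Assamese letters ra (U+09F0) and wa (U+09F1) sit in the Bengali block (U+0980 to U+09FF), and the letters peh (U+067E) and gaf (U+06AF), used in Persian and Urdu but not in standard Arabic, sit in the Arabic block (U+0600 to U+06FF). The block ranges are written out by hand here as an assumption; the authoritative boundaries are listed in the Unicode Character Database file Blocks.txt.

```python
import unicodedata

# Block ranges from the Unicode code charts (end value exclusive in range()).
BENGALI_BLOCK = range(0x0980, 0x0A00)   # U+0980..U+09FF
ARABIC_BLOCK = range(0x0600, 0x0700)    # U+0600..U+06FF

examples = [
    ("\u09f0", BENGALI_BLOCK),   # Assamese ra
    ("\u09f1", BENGALI_BLOCK),   # Assamese wa
    ("\u067e", ARABIC_BLOCK),    # peh, used in Persian and Urdu
    ("\u06af", ARABIC_BLOCK),    # gaf, used in Persian and Urdu
]

for ch, block in examples:
    # Each language-specific letter is still encoded inside the shared script block.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} in block: {ord(ch) in block}")
```

The formal names make the same point: the Assamese letters are named BENGALI LETTER RA WITH MIDDLE DIAGONAL and BENGALI LETTER RA WITH LOWER DIAGONAL, and the Persian letters are named ARABIC LETTER PEH and ARABIC LETTER GAF.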

8. Conclusion: Unicode has become the dominant scheme for the internal processing and storage of text, and new information-processing systems are almost always built on it. Unicode’s success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, the Java programming language and the Microsoft .NET Framework.



How to Cite this Article
APA 7th Ed.: Barman, B. (2020). A comprehensive book on Library and Information Science. New Publications.
Chicago 16th Ed.: Barman, Badan. A Comprehensive Book on Library and Information Science. Guwahati: New Publications, 2020.
MLA 8th Ed.: Barman, Badan. A Comprehensive Book on Library and Information Science. New Publications, 2020.

Badan Barman is at present working as an Assistant Professor in the Department of Library and Information Science, Gauhati University, Guwahati-781014, Assam, India. He is the creator of LIS Links (http://www.lislinks.com), India’s most popular social networking website for Library and Information Science professionals. He also created the UGC NET Guide (http://www.netugc.com) and LIS Study (http://www.lisstudy.com) websites.
