
     * Topic: Unicode 5.0 released, with support for the latest version of GB 18030
     admin (W3China webmaster, administrator, original poster)
    Unicode 5.0 released, with support for the latest version of GB 18030

    See: http://www.unicode.org/versions/Unicode5.0.0/

    Unicode 5.0.0

    Unicode 5.0.0 is a [URL=http://www.unicode.org/versions/]major version[/URL] of the Unicode Standard and supersedes all previous versions. The publication of the book, The Unicode Standard, Version 5.0, is pending and is expected in the fourth quarter of 2006.

    However, all of the [URL=http://www.unicode.org/Public/5.0.0/ucd/]online data files[/URL] for version 5.0 of the [URL=http://www.unicode.org/ucd/]Unicode Character Database[/URL] are stable and final. In order to give developers an opportunity to implement Unicode 5.0 as soon as possible, these data files have been released ahead of the publication of the text of the standard.

    The text of the Unicode Standard Annexes for Version 5.0 is currently in copy edit; online versions of these will also be available in the fourth quarter of 2006. The Unicode Standard Annexes will also be published in the book.

    Version 5.0.0 of the Unicode Standard consists of the publication The Unicode Standard, Version 5.0 plus the Unicode Character Database, Version 5.0.0. The book gives the general principles, requirements for conformance, and guidelines for implementers, followed by character code charts and names and the text of all of the Unicode Standard Annexes.

    To order The Unicode Standard, Version 5.0, see the [URL=http://www.unicode.org/book/bookform.html]online order form[/URL].

    A complete specification of the contributory files for Unicode 5.0.0 is found on [URL=http://www.unicode.org/versions/components-5.0.0.html]the Components page[/URL]. Version 5.0.0 of the Unicode Standard should be referenced as:

    The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0)

    Online Edition
    The text of The Unicode Standard, Version 5.0 will be available online via the navigation links on this page, starting in the first quarter of 2007. Those PDF files may be viewed but may not be printed. The Unicode 5.0 Web Bookmarks page will have links to all sections of the online text.

    Final character code charts for Version 5.0 will be available online soon.

    What's New in Version 5.0
    For the first time, the book provides the complete text of the standard, including all the Unicode Standard Annexes. The book will also be printed in a smaller, lighter, easier-to-use format.

    For stability of protocols on the Internet and elsewhere, Unicode 5.0 also makes changes to guarantee case-folding stability. Unicode 5.0 incorporates all the changes introduced in Unicode 4.1, including full interoperability with the most recent versions of GB 18030, JIS X 0213, and HKSCS, and support for stable identifiers and pattern syntax characters.
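
As a rough illustration of what interoperability with GB 18030 looks like in practice, the sketch below round-trips a few characters through Python's built-in `gb18030` codec. The sample characters and the helper name `gb18030_round_trip` are chosen here for illustration and do not come from the announcement. The codec uses whatever mapping tables ship with the running Python, which is not necessarily the exact GB 18030 edition discussed above.

```python
def gb18030_round_trip(text: str) -> bytes:
    """Encode text as GB 18030 and check that decoding restores it."""
    encoded = text.encode("gb18030")
    if encoded.decode("gb18030") != text:
        raise ValueError(f"round trip failed for {text!r}")
    return encoded


# Illustrative samples (assumed, not from the announcement): GB 18030 is a
# variable-width encoding that uses 1, 2, or 4 bytes per character.
samples = {
    "ASCII letter": "A",
    "BMP ideograph": "\u4e2d",
    "Plane 2 ideograph": "\U00020000",
}

for label, ch in samples.items():
    encoded = gb18030_round_trip(ch)
    print(f"{label}: U+{ord(ch):04X} -> {len(encoded)} byte(s)")
```

Because every Unicode code point has a GB 18030 encoding, a lossless round trip like this should hold for any well-formed text, not just the three samples.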

    Unicode 5.0 revises and improves property values and behavioral specifications in areas such as character, word, line, and sentence segmentation, and tightens conformance requirements on Bidi implementations (used for Arabic and Hebrew). The text is significantly revised for clarity and completeness, especially for Unicode conformance.

    Unicode 5.0 covers the full repertoire of ISO/IEC 10646:2003, including Amendments 1 and 2, which add characters required for some languages of India, for mathematicians, for minority languages, and for academic use.

    The Unicode Standard is closely connected with other Unicode software globalization standards in such key areas as collation (used for sorting, searching, and matching), character set conversion, regular expressions, and the interchange and registration of locale data for the world's languages and local cultural conventions [[URL=http://www.unicode.org/cldr/]CLDR[/URL]]. It has been further significantly augmented by several new Unicode Technical Standards that provide recommendations and data to assist in secure implementation of Unicode, and to establish the registration mechanism for Ideographic Variation Sequences needed by the publishing industry for Chinese and Japanese.

    Other major additions to Version 5.0 since Version 4.0 are discussed in the sections below.

    New Characters
    1,369 new character assignments were made to the Unicode Standard, Version 5.0 (over and above what was in Unicode 4.1.0). These additions include new characters for Cyrillic, Greek, Hebrew, Kannada, Latin, math, phonetic extensions, symbols, and five new scripts: Balinese, N’Ko, Phags-pa, Phoenician, and Sumero-Akkadian Cuneiform.

    The new character additions were to both the BMP and the SMP (Plane 1). The following table shows the allocation of code points in Unicode 5.0.0. For more information on the specific characters, see the file [URL=http://www.unicode.org/Public/UNIDATA/DerivedAge.txt]DerivedAge.txt[/URL] in the [URL=http://www.unicode.org/ucd/]Unicode Character Database[/URL].

    Type            Count
    Graphic        98,884
    Format            140
    Control            65
    Private Use   137,468
    Surrogate       2,048
    Noncharacter       66
    Reserved      875,441
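
The allocation figures can be sanity-checked with a short script. The sketch below copies the counts from the table above, confirms that they add up to the size of the Unicode code space (0x110000 code points), and recomputes the four rows whose counts are fixed by the architecture rather than by the version: control, private use, surrogate, and noncharacter. The names `TABLE_5_0`, `is_noncharacter`, and `fixed_counts` are made up for this example. Graphic, Format, and Reserved are not recomputed, because the `unicodedata` module in a current Python reflects a later Unicode version than 5.0.

```python
import unicodedata
from collections import Counter

# Counts copied from the Unicode 5.0.0 allocation table.
TABLE_5_0 = {
    "Graphic": 98_884,
    "Format": 140,
    "Control": 65,
    "Private Use": 137_468,
    "Surrogate": 2_048,
    "Noncharacter": 66,
    "Reserved": 875_441,
}

CODE_SPACE = 0x110000  # U+0000..U+10FFFF


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def fixed_counts() -> Counter:
    """Count the code point types whose totals do not change between versions."""
    labels = {"Cc": "Control", "Co": "Private Use", "Cs": "Surrogate"}
    counts = Counter()
    for cp in range(CODE_SPACE):
        if is_noncharacter(cp):
            counts["Noncharacter"] += 1
            continue
        label = labels.get(unicodedata.category(chr(cp)))
        if label:
            counts[label] += 1
    return counts


print("table total matches code space:", sum(TABLE_5_0.values()) == CODE_SPACE)
for label, count in sorted(fixed_counts().items()):
    print(f"{label}: computed {count:,}, table {TABLE_5_0[label]:,}")
```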

    The character repertoire corresponds to ISO/IEC 10646:2003 plus Amendment 1, Amendment 2, and four Sindhi characters from Amendment 3. For more details of character counts, see Appendix D, Changes from Unicode Version 4.0.

    Unicode Character Database
    The Unicode Character Database (UCD) was extended to cover the character repertoire additions, and new block definitions and script values were added. A number of other updates were made, as listed here:

    Scripts. Unassigned code points were given a new Script property value of "Zzzz": this may require some change in code using this property. Three Mongolian punctuation marks and two archaic letters changed script value.
    Case-Related Properties. To allow for the new policy on case-folding stability, lowercase variants of several characters were added, and the mappings for the uppercase variants changed.

    Bidirectional Behavior. The list of characters with the Bidi_Mirrored property was made consistent for brackets and quotation marks, in preparation for new constraints on bidi mirroring. The Bidi_Class property for five archaic characters was changed to L.
    Line Break. The Line_Break property of seven punctuation characters and two bracket characters was changed to Alphabetic (AL) to better match their expected behavior. Numerous characters for Southeast Asian scripts, which require complex contextual line breaking, were changed to Complex_Context (SA).
    New Properties. Normative_Name_Alias and the metaproperty, Deprecated, were added. The Jamo_Short_Name property was documented as a contributory property.
    General Category. Seven archaic characters plus U+0294 LATIN LETTER GLOTTAL STOP changed categories.

    Numeric Properties. The archaic character U+10341 GOTHIC LETTER NINETY was given the numeric value 90.
    Unihan. The kIICore field was made a normative property, and three new provisional properties were added: kCheungBauer, kCheungBauerIndex, and kFourCornerCoverage. There were numerous additions to the kCangjie property.
    Text Breaking. Grapheme_Link was deprecated as a property.
    For more information, see the file [URL=http://www.unicode.org/Public/5.0.0/ucd/UCD.html]UCD.html[/URL] in the [URL=http://www.unicode.org/Public/5.0.0/ucd/]Unicode Character Database[/URL].
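
Several of the properties mentioned in the list above can be inspected from Python's standard `unicodedata` module. The sketch below is only an illustration: the helper `describe` is a made-up name, and the values printed come from the Unicode version bundled with the running interpreter (reported by `unicodedata.unidata_version`), which is later than 5.0. Properties such as Script and Line_Break are not exposed by this module, so they are not shown.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few UCD properties for a single character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "bidi_mirrored": bool(unicodedata.mirrored(ch)),
        # numeric() accepts a default for characters without a numeric value.
        "numeric_value": unicodedata.numeric(ch, None),
    }


print("UCD version in this interpreter:", unicodedata.unidata_version)
# U+0294 and U+10341 are named in the list above; "(" is an assumed
# example of a character with the Bidi_Mirrored property.
for ch in ("\u0294", "\U00010341", "("):
    print(f"U+{ord(ch):04X}", describe(ch))
```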

    Conformance
    Details regarding the conformance changes to the standard for Version 5.0 are specified in the text of the standard itself, including the Unicode Standard Annexes. As noted above, the book and the Unicode Standard Annexes will be available in the fourth quarter of 2006.

    Chapter 3, Conformance, was substantially improved by incorporating much of the Unicode Property Model, enhancing the treatment of combining characters, and further clarifying canonical ordering behavior through the addition of clearly defined principles. Additionally, conformance clauses and definitions were renumbered for overall readability and clarity of the text. Significant clarifications or modifications to character behavior include those listed below:

    Stability of Cased Letters. If uppercase characters are added in cased scripts, the corresponding lowercase characters will be added as well, so that case folding is stable.
    Stability of Named Character Sequences. An initial provisional phase was incorporated into the process for defining Named Character Sequences, so that approved Named Character Sequences will be immutable.
    Disunification of Diacritics. Criteria for disunifying diacritics were established.
    Indic Scripts. Zero width joiner and zero width non-joiner can now be used to encourage or discourage ligation in Bengali; the sequence for Gurmukhi double vowels was determined, and the shaping of ra in Tamil was updated.
    Combining Marks. The use of combining grapheme joiner with Latin script diacritics was clarified.
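
To make the case-folding stability point concrete, here is a minimal sketch of caseless comparison using Python's `str.casefold()`. The helper `caseless_equal` is a hypothetical name, and the pair U+0241 / U+0242 (capital and small glottal stop) is picked here only as an example of an uppercase letter with a matching lowercase letter. The results reflect the Unicode data bundled with the running Python rather than version 5.0 itself.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings by their case-folded forms.

    Simplification: this ignores normalization, so canonically
    equivalent strings in different forms may compare as unequal.
    """
    return a.casefold() == b.casefold()


capital, small = "\u0241", "\u0242"  # LATIN CAPITAL/SMALL LETTER GLOTTAL STOP

print(capital.casefold() == small)      # the capital folds to the small letter
print(small.upper() == capital)         # and the small letter maps back up
print(caseless_equal("Unicode", "UNICODE"))
```

Because folded forms are guaranteed not to change in later versions, an identifier compared this way today should still match after the underlying character data is upgraded.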

    Unicode Standard Annexes
    In UAX #9, "Bidirectional Algorithm," for better interoperability, the algorithm was modified to tighten up the conformance requirements for using mirrored glyphs for characters. Higher-level protocols are discouraged, due to interoperability and security considerations. The definition of directional run was changed to be the same as level run, and the use of soft hyphen with bidi text was clarified.
    In UAX #14, "Line Breaking Properties," a number of rules were modified, the use of soft hyphen in cursive scripts was documented, the conformance clauses were restated and the algorithm was reorganized into tailorable and non-tailorable sections, and the normative status was made consistent with Chapter 3, Conformance. As a result of the restatement of conformance, the Line_Break property became normative.
    In UAX #15, "Unicode Normalization Forms," the new Stream-Safe Text Format was added, allowing the use of normalization in protocols designed for streaming. The stability guarantees are described in more detail, with guidelines provided for guaranteeing process stability, and a new appendix listing precisely those character sequences that require special handling. Additional figures clarify the effects of normalization, and the types of characters affected.
    In UAX #29, "Text Boundaries," the format of the rules was changed to make them much easier to implement, without changing the results. The guidelines for how to use regex-style rules were revamped completely. A number of edge cases are also now handled properly, and information was added on the relation to identifiers, use of normalization, tailoring, application to spelling checkers, and how to use the supplied test data. Tailorings for text boundaries can now also be entered into the Unicode Common Locale Data Repository [[URL=http://www.unicode.org/cldr/]CLDR[/URL]].
    UAX #31, "Identifier and Pattern Syntax," introduced profiles, and added notes on profiles of identifiers for natural languages and the use of spaces in identifiers.
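
The Stream-Safe Text Format mentioned for UAX #15 can be sketched briefly. UAX #15 defines a string as stream-safe if, once normalized to NFKD, it contains no run of more than 30 consecutive non-starters (characters with a non-zero canonical combining class). The limit of 30 comes from UAX #15, not from the announcement above, and the function names below are made up for this example. This is a checker only; it does not implement the part of the process that inserts U+034F COMBINING GRAPHEME JOINER to break up long runs.

```python
import unicodedata

MAX_NON_STARTERS = 30  # limit stated in UAX #15 for Stream-Safe Text Format


def longest_non_starter_run(text: str) -> int:
    """Length of the longest run of non-starters in the NFKD form of text."""
    longest = current = 0
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):  # non-zero combining class
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_stream_safe(text: str) -> bool:
    """True if text meets the Stream-Safe Text Format condition."""
    return longest_non_starter_run(text) <= MAX_NON_STARTERS


acute = "\u0301"  # COMBINING ACUTE ACCENT, a non-starter
print(is_stream_safe("e" + acute * 30))
print(is_stream_safe("e" + acute * 31))
print(unicodedata.normalize("NFC", "e" + acute) == "\u00e9")
```

A bound like this is what lets a streaming normalizer work with a small fixed buffer instead of holding an unbounded run of combining marks in memory.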


    Posted: 2006/9/5 14:06:00
     
     byltd (reply 2)
    Is there a Chinese version?

    Posted: 2007/1/14 9:49:00
     