What's new in Unicode 14
Unicode version 14 has been released
The Unicode Consortium has approved the 14th revision of its emoji list, according to the organization's website. The new version adds 37 emoji characters plus 75 skin tone variations. In total, Unicode 14.0 adds 838 new characters.
The new emoji include a troll, a biting lip, a mirror ball (disco ball), a heart gesture made with crossed fingers that is popular among K-Pop fans, a saluting face, and a melting face.
The pregnant man emoji, which was announced last summer on World Emoji Day and sparked considerable debate on social networks, is also part of the new release.
The characters are expected to reach Google platforms first, tentatively between October and December 2021, and from January 2022 they will gradually appear on Apple, Twitter, Samsung Galaxy, and Facebook.
Unicode usually publishes only the final list of approved emoji; which of them to implement is left to OEMs and app developers. It is certain, however, that the new emoji will not appear in iOS 15.0.
Unicode is a universal character encoding standard that supports characters beyond the ASCII set. It defines the mapping between each character and a number (its code point), while the rules for turning those numbers into bytes are defined by encoding forms such as UTF-8 and UTF-16.
The standard currently contains 144,697 characters, whereas UTF-16 can address 1,114,112 code points. UTF-8 as originally designed could represent larger values, but it is now restricted to the same range.
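The difference between a code point and its encoded bytes can be illustrated with a short Python sketch. This is an illustration added for clarity, not something from the original article: the `describe` helper is a hypothetical name, and U+1FAE0 (Melting Face, one of the new emoji) is used as the sample character.

```python
def describe(ch):
    """Return the code point of one character and its bytes in two encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(),       # four bytes for this character
        "utf16": ch.encode("utf-16-be").hex(),  # a surrogate pair (two 16-bit units)
    }

MELTING_FACE = "\U0001FAE0"  # one code point, new in Unicode 14.0
info = describe(MELTING_FACE)
print(info)

# The Unicode code space runs from U+0000 to U+10FFFF:
# 17 planes of 65,536 code points each.
CODE_SPACE_SIZE = 0x10FFFF + 1
print(CODE_SPACE_SIZE)
```

The same number (the code point) yields different byte sequences depending on the encoding form, which is why the standard and its encodings are described separately.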
Unicode® 14.0.0
2021 September 14 (Announcement)
This page summarizes the important changes for the Unicode Standard, Version 14.0.0. This version supersedes all previous versions of the Unicode Standard.
A. Summary
Unicode 14.0 adds 838 characters, for a total of 144,697 characters. These additions include 5 new scripts, for a total of 159 scripts, as well as 37 new emoji characters.
The new scripts and characters in Version 14.0 add support for lesser-used languages and unique written requirements worldwide, including numerous symbol additions. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:
Popular symbol additions:
Other symbol and notational additions include:
Support for CJK unified ideographs was enhanced in Version 14.0 by significant corrections and improvements to the Unihan database. Changes to the Unihan database include updated source lists, regular expressions, and new and updated fields. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.
Additional support for lesser-used languages and scholarly work was extended, including:
Important chart font updates, including:
Synchronization
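Several other important Unicode specifications have been updated for Version 14.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 14.0: UTS #10, Unicode Collation Algorithm; UTS #39, Unicode Security Mechanisms; UTS #46, Unicode IDNA Compatibility Processing; and UTS #51, Unicode Emoji.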
Some of the changes in Version 14.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.
See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.
B. Technical Overview
Version 14.0 of the Unicode Standard consists of:
The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.
Core Specification
The core specification is available as a single PDF for viewing (14 MB). Links are also available in the navigation bar on the left of this page to access individual chapters and appendices of the core specification.
Code Charts
Several sets of code charts are available. They serve different purposes:
For Unicode 14.0.0 in particular two additional sets of code chart pages are provided:
The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.
Unicode Standard Annexes
Links to the individual Unicode Standard Annexes are available in the navigation bar on the left of this page. The list of significant changes in the content of the Unicode Standard Annexes for Version 14.0 can be found in Section G below.
Unicode Character Database
Data files for Version 14.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Zipped versions of the UCD for bulk download are available, as well.
Version References
Version 14.0.0 of the Unicode Standard should be referenced as:
The Unicode Consortium. The Unicode Standard, Version 14.0.0, (Mountain View, CA: The Unicode Consortium, 2021. ISBN 978-1-936213-29-0)
http://www.unicode.org/versions/Unicode14.0.0/
The terms “Version 14.0” or “Unicode 14.0” are abbreviations for the full version reference, Version 14.0.0.
The citation and permalink for the latest published version of the Unicode Standard is:
A complete specification of the contributory files for Unicode 14.0 is found on the page Components for 14.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.
Errata
Errata incorporated into Unicode 14.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 14.0, see the list of current Updates and Errata.
C. Stability Policy Update
There were no significant changes to the Stability Policy of the core specification between Unicode 13.0 and Unicode 14.0.
D. Textual Changes and Character Additions
Five new scripts were added with accompanying new block descriptions:
Script | Number of Characters |
---|---|
Vithkuqi | 70 |
Old Uyghur | 26 |
Cypro-Minoan | 99 |
Tangsa | 89 |
Toto | 31 |
Changes in the Unicode Standard Annexes are listed in Section G.
Character Assignment Overview
838 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see delta code charts.
E. Conformance Changes
There are no significant new conformance requirements in Unicode 14.0.
F. Changes in the Unicode Character Database
The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 14.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.
G. Changes in the Unicode Standard Annexes
In Version 14.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.
Note that for Unicode 14.0, all pertinent links to URLs on the Unicode website in these Unicode Standard Annexes were updated to use the https protocol.
Unicode Standard Annex | Changes |
---|---|
UAX #9 Unicode Bidirectional Algorithm | Section 6.2, Vertical Text was clarified to indicate how the Bidirectional Algorithm is (or is not) used when text is laid out in vertical orientation. |
UAX #11 East Asian Width | No significant changes in this version. |
UAX #14 Unicode Line Breaking Algorithm | One redundant rule part was removed from LB27 in Section 6.1, Non-tailorable Line Breaking Rules. Also, LB30b was updated to include potential emoji. |
UAX #15 Unicode Normalization Forms | No significant changes in this version. |
UAX #24 Unicode Script Property | No significant changes in this version. |
UAX #29 Unicode Text Segmentation | A Swedish “AIK:are” example was added to the word boundary discussion. The description of the charts in the auxiliary data files was updated, to make it more accurate. Other small editorial fixes were applied to the text. |
UAX #31 Unicode Identifier and Pattern Syntax | Scripts new to Unicode 14.0 were added to the appropriate tables. A new Section 1.5, Notation, was added, referring to the LDML for the UnicodeSet notation used in this annex. |
UAX #34 Unicode Named Character Sequences | No significant changes in this version. |
UAX #38 Unicode Han Database (Unihan) | The kCantonese field was redefined, and its description was updated accordingly. The new kStrange field was added. Regular expressions, source lists, and descriptions were updated for many other fields. |
UAX #41 Common References for Unicode Standard Annexes | All references were updated for Unicode 14.0. |
UAX #42 Unicode Character Database in XML | New code point attributes, values, and patterns were added for Unicode 14.0. |
UAX #44 Unicode Character Database | The documentation was updated to describe the changes to the UCD for Version 14.0. The distinction between properties of strings and string-valued properties was clarified. A note was added clarifying that Vertical_Orientation defaults to U in some blocks associated with notational systems. An erroneous statement about which General_Category values can be associated with ccc≠0 was corrected. |
UAX #45 U-Source Ideographs | Descriptions were added for new data fields (total strokes and first residual stroke) in the data file associated with UAX #45. The KangXi dictionary index field was obsoleted. New information was added about the submission process. |
UAX #50 Unicode Vertical Text Layout | No significant changes in this version. |
H. Changes in Synchronized Unicode Technical Standards
There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.
Unicode Technical Standard | Changes |
---|---|
UTS #10 Unicode Collation Algorithm | No significant changes in this version. |
UTS #39 Unicode Security Mechanisms | Section 3, Identifier Characters was adjusted to better introduce the topic of identifiers. The text in Section 3.1, General Security Profile for Identifiers was clarified regarding the rationales for restricting a character. The descriptions of identifier types in Table 1 were clarified. |
UTS #46 Unicode IDNA Compatibility Processing | No significant changes in this version. |
UTS #51 Unicode Emoji | The introduction was reworded. The definition of Basic_Emoji was clarified, and it was noted that emoji sets are binary properties of strings. In Section 2.6.2, Multi-Person Skin Tones, the handshake was added to the list of emoji with RGI skin tones. |
M. Implications for Migration
There are a significant number of changes in Unicode 14.0 which may impact implementations upgrading to Version 14.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.
Script-related Changes
Five new scripts have been added in Unicode 14.0.0. Some of these scripts have particular attributes which may cause issues for implementations. The more important of these attributes are summarized here.
Casing Issues
Numeric Property Issues
CJK/Unihan Changes
WARNING: There are changes to the ends of three existing CJK unified ideograph ranges in Unicode 14.0.0. Because implementations often hard-code ideographic ranges to short-cut lookups and reduce table sizes, it is especially important that implementers pay close attention to the implications of range changes for Version 14.0.0. These extensions bump up the end ranges of the encoded ideographs by a few code points within each block:
See Section 4.4, Listing of Characters Covered by the Unihan Database in UAX #38 for the version history of all these small CJK unified ideograph additions inside existing blocks.
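The risk of hard-coded ranges described in the warning above can be sketched in Python. The end points used here are assumptions recalled from the Unicode 13.0 and 14.0 block data, not values quoted on this page, so they should be verified against UAX #38 before use. The names `TRACKED_BLOCKS`, `in_tracked_range`, and `newly_encoded` are hypothetical, and the table covers only the three affected blocks, not every CJK unified ideograph block.

```python
# ASSUMPTION: end points recalled from Unicode 13.0 / 14.0 data; verify in UAX #38.
# Each entry is (first code point, last encoded in 13.0, last encoded in 14.0).
TRACKED_BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFC, 0x9FFF),
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DD, 0x2A6DF),
    "CJK Unified Ideographs Extension C": (0x2A700, 0x2B734, 0x2B738),
}
VERSION_INDEX = {"13.0": 1, "14.0": 2}

def in_tracked_range(cp, version="14.0"):
    """True if cp falls in one of the three tracked ranges for that version."""
    idx = VERSION_INDEX[version]  # KeyError for versions this sketch does not model
    return any(block[0] <= cp <= block[idx] for block in TRACKED_BLOCKS.values())

def newly_encoded():
    """Code points inside the 14.0 ranges that a 13.0 range check would reject."""
    added = []
    for _start, end_13, end_14 in TRACKED_BLOCKS.values():
        added.extend(range(end_13 + 1, end_14 + 1))
    return added

print([f"U+{cp:04X}" for cp in newly_encoded()])
```

An implementation that still uses the older end points would treat these few code points as non-ideographs, which is the kind of silent mismatch the warning is about. Deriving ranges from the Unicode Character Database at build time avoids the problem.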
Melting face, pregnant person, and tears of joy: which emoji may appear in Unicode 14
On some devices and in some apps they will be available as early as the end of 2021
Emojipedia has published a draft list of emoji that may be included in Unicode 14.0 and Emoji 14.0, due for release on September 14, 2021. Although the candidates still have to pass selection, Emojipedia notes that in past years most of them made it into the final list.
The new emoji set took longer than usual to develop because the release of Unicode 14.0 was postponed due to the pandemic.
The draft list includes six face emoji. Among them are the melting face, fitting for this summer; the saluting face, which could replace "F" in chats; and a face with open eyes and a hand over its mouth. A similar emoji with closed eyes already exists and looks like it is smiling; the new one suits more serious situations.
The hand with index finger and thumb crossed can serve either as the "heart" gesture popular among K-pop fans or as any other gesture that uses the index finger and thumb, such as "money", Emojipedia notes.
Coral is typically used as a symbol in discussions of climate change, given the damage that rising global temperatures do to coral reefs.
Person with Crown is a gender-inclusive alternative to the existing prince and princess emoji. The new set may also include a pregnant person. The authors acknowledge that pregnancy is possible for some transgender men and non-binary people. This means that almost all emoji can now have a gender-neutral variant. A few exceptions remain, such as the dancing people. A dedicated Unicode committee is already considering this question.
The dates on which new emoji arrive vary by operating system, app, and device. Emojipedia notes that some companies will add them at the end of 2021, but most updates will come in the first half of 2022.
The most recent update, Emoji 13.1, was approved in September 2020 and reached Pixel smartphones in December 2020 and iOS in April 2021. Many Android phones, including Samsung devices, still lack the emoji from that release.
What's new in Unicode 14
The Unicode Consortium recently released version 14 of the Unicode Standard. It adds 838 characters, including 37 emoji. The expanded character repertoire makes it easier to express emotions across operating systems, cultures, and languages.
In recent years the use of emoji on social networks has been on the rise. In July of this year, 20% of tweets contained Unicode emoji.
Version 14.0 is intended to improve communication between people of different cultures on different digital platforms. It adds support for users in countries and regions such as India, Indonesia, Malaysia, Pakistan, the Philippines, Africa, and North America.
The new version introduces five new scripts, currency symbols, musical notation, and more. Unicode now officially supports 144,697 characters.
The Unicode Consortium is a non-profit organization that maintains the Unicode Standard used around the world. With it, developers and users can process and store data in any language and exchange information and ideas more easily. The standard covers both historic and modern characters, and widely recognized emoji make a message quick to understand. New emoji originate from public proposals, which are evaluated for compatibility, frequency of use, distinctiveness, and completeness.
Emoji originated in Japan in the late 1990s. They spread more widely with mobile devices, and it is now hard to imagine communication without them, whatever the language.
What’s New in Unicode 14.0
Keith Broni
Today the latest emoji list will be released by the Unicode Consortium, with additions including Biting Lip, Troll, Saluting Face, as well as two heart-related gestures: Heart Hands and Hand with Index Finger and Thumb Crossed (aka finger heart, popular in K-Pop circles).
The release date for version 14.0 of the Unicode Standard was aptly set for the 14th day of September, and formalizes what has until now been only a draft release.
Browse Unicode 14.0 on Emojipedia or see the Unicode 14.0.0 release notes provided by the Unicode Consortium.
Among the 838 new characters in #Unicode14 are 37 new #emoji, along with new emoji sequences, that are expected to show up on 📱s, 💻s, and other platforms sometime next year → https://t.co/deSr1g6m8k #絵文字 pic.twitter.com/xuTf8Os02K
🧮 How Many?
Unicode 14.0 includes 838 new characters, of which 37 are brand new emoji code points.
Additionally, 75 skin tone variations do not require new code points, and are included as part of Emoji 14.0 making for a total of 112 new emojis on their way to devices in the coming months.
Above: Emojipedia Sample Images for Emoji 14.0. Image: Emojipedia.
No changes have been made to the draft emoji list since we last took a look on July 17, aka World Emoji Day.
As of Emoji 14.0, there are now a total of 3,633 emojis.
The distinction between Unicode 14.0 and Emoji 14.0 is that the latter includes sequences where two or more code points can be combined to display a single emoji. By way of example, Emoji 13.1 only included emoji sequences, and came out at a separate time to any full Unicode release.
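To make the distinction concrete, here is a small Python sketch (an illustration added here, not taken from Unicode documentation). It builds skin tone variants of Heart Hands by appending one of the five emoji modifier code points, U+1F3FB through U+1F3FF. The base character is a single new code point in Unicode 14.0, while each variant is a two-code-point sequence that needs no new code point. The helper name `skin_tone_variants` is hypothetical.

```python
HEART_HANDS = "\U0001FAF6"  # single code point, new in Unicode 14.0

# The five emoji skin tone modifiers, light through dark.
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]

def skin_tone_variants(base):
    """Return the modifier sequences for a base emoji that supports skin tones."""
    return [base + modifier for modifier in SKIN_TONE_MODIFIERS]

variants = skin_tone_variants(HEART_HANDS)
for variant in variants:
    # Each variant is two code points that render as one emoji where supported.
    print(" ".join(f"U+{ord(c):04X}" for c in variant))
```

Python's `len()` counts code points, so `len(HEART_HANDS)` is 1 and `len(variants[0])` is 2, even though both display as a single emoji on a platform that supports them.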
Above: Seven new smileys in Unicode 14.0. Image: Emojipedia Sample Image Collection.
🆕 Emoji Updates
Amongst the new emoji characters in this release are Coral (partly to represent the effects of climate change) and Mirror Ball.
Gender options for pregnancy have been added using new code points in this release. This is a change from the usual format, in which emoji sequences are used for gender variations. Pregnant Person and Pregnant Man are both new code points, part of Unicode’s ongoing effort to make gender options consistent for all emojis.
All new code points can be seen in the relevant Unicode documentation.
While formal documentation for the Unicode Standard only provides glyphs in black and white, color emoji implementations can and do vary from these designs.
New entries are shown in yellow and other additions include Nest with Eggs as well as Empty Nest, Jar, Identification Card, and Low Battery.
Proposals for new emojis can come from a variety of sources, including members of the public. Example color images are commonly shown on pages of new emoji information by Unicode and come from various sources. These are intended to convey the preferred design choices for vendors when implementing emojis.
In recent years new updates have been more closely aligned to these color images than in the past. While the designs shown on these emoji information pages aren’t formally part of the Unicode Standard, they do provide useful direction for implementors.
All new people in the emoji set support the five standard skin tones commonly found for existing emojis, as do the six new hand gestures.
Above: Pregnant Person is new in Unicode 14.0, with skin tones supported in Emoji 14.0.
Above: Emojipedia’s sample image for Heart Hands alongside its skin tone modifier variants.
The new Pregnant Man and Pregnant Person emojis have received a notable amount of attention. Addressing these emojis on the Emojipedia blog, our Senior Emoji Lexicographer Jane Solomon has stated:
The new pregnancy options may be used for representation by trans men, non-binary people, or women with short hair. People of any gender can be pregnant too. Now there are emojis to represent this.
With the release of Emoji 14.0, major platforms will for the first time support 🤝 Handshake with a combination of skin tones in the coming year. This is due to two new individual hand characters in Unicode 14.0 being given 25 handshake sequences in Emoji 14.0. Why 25? Five skin tone choices per hand means 5 × 5 = 25.
Most vendors except Apple extended this support to allow a shared skin tone (e.g., a black handshake, brown handshake, or white handshake) ahead of formal recommendation from Unicode.
When this emoji update comes to each platform, all of the shared and mixed skin tone handshakes will become available for cross-platform use.
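The 5 × 5 count can be reproduced with a short Python sketch. The encoding shown is an assumption based on a reading of the Emoji 14.0 data files and is worth checking against `emoji-sequences.txt` and `emoji-zwj-sequences.txt`: mixed-tone handshakes join Rightwards Hand (U+1FAF1) and Leftwards Hand (U+1FAF2) with a zero width joiner, while shared-tone handshakes attach a single modifier to the existing 🤝 (U+1F91D). The `handshake` helper is a hypothetical name.

```python
from itertools import product

ZWJ = "\u200D"                   # zero width joiner
HANDSHAKE = "\U0001F91D"         # existing handshake emoji
RIGHTWARDS_HAND = "\U0001FAF1"   # new in Unicode 14.0
LEFTWARDS_HAND = "\U0001FAF2"    # new in Unicode 14.0
TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]  # five skin tone modifiers

def handshake(right_tone, left_tone):
    """Build one handshake sequence (ASSUMED encoding, see the text above)."""
    if right_tone == left_tone:
        # Shared tone: one modifier applied to the existing handshake.
        return HANDSHAKE + right_tone
    # Mixed tones: two modified hands joined by ZWJ.
    return RIGHTWARDS_HAND + right_tone + ZWJ + LEFTWARDS_HAND + left_tone

all_handshakes = [handshake(a, b) for a, b in product(TONES, repeat=2)]
mixed = [s for s in all_handshakes if ZWJ in s]
print(len(all_handshakes), len(mixed))
```

Under that assumption, the 25 combinations split into 5 shared-tone sequences and 20 mixed-tone sequences.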
🤭 Disambiguation
One new emoji approved in Unicode 14.0 is in place to fix a cross-platform display issue with 🤭 Face with Hand Over Mouth. The current version of this emoji displays with smiling laughter on most platforms, but serious on iOS and Facebook.
Above: 🤭 Face with Hand Over Mouth will appear smiling on all platforms in future. Image: Vendor designs / Emojipedia composite.
A new character Face with Open Eyes and Hand Over Mouth will take on the serious ‘open eyes’ appearance on all platforms in future. This will likely result in Apple and Facebook updating their existing 🤭 emoji to show laughter.
This is a win for cross-platform compatibility, though people who have used this emoji on Apple platforms to mean shock or surprise in the past should be aware it will convey a different emotion in future.
🔡 Non-Emoji Updates
The majority of characters in the Unicode Standard are not emojis. Emoji updates are given priority here at Emojipedia, but it's worth taking a moment to look at some of the other new characters approved in this release.
The Unicode Consortium is the non-profit standards body responsible for the Unicode Standard. Voting members include Apple, Google, Microsoft, and Emojipedia.
Regarding this update, Unicode notes:
Unicode 14.0 adds 838 characters, for a total of 144,697 characters. These additions include 5 new scripts, for a total of 159 scripts, as well as 37 new emoji characters.
To put it in perspective, there are 3,633 RGI emoji characters and sequences in Unicode 14, compared to the 144,697 characters in the entire Unicode Standard.
Symbols added in this release (which aren’t implemented as emojis) include the som currency sign used in the 🇰🇬 Kyrgyz Republic and 185 Znamenny musical notation symbols used to write Znamenny Chant.
Above: New Znamenny musical notation symbols in Unicode 14.
Among the updates now approved, the Unicode Consortium highlighted the following script and character additions in this update:
Toto, used to write the Toto language in northeast India
Cypro-Minoan, an undeciphered historical script primarily used on the island of Cyprus
Vithkuqi, an historic script used to write Albanian, and undergoing a modern revival
Old Uyghur, an historic script used in Central Asia and elsewhere to write Turkic, Chinese, Mongolian, Tibetan, and Arabic languages
Arabic script additions used to write languages across Africa and in Iran, Pakistan, Malaysia, Indonesia, Java, and Bosnia, and to write honorifics, and additions for Quranic use
See all release notes from Unicode for version 14.0.0 of the Unicode Standard.
🙅 Not Included
No emojis were removed from the final draft of Emoji 14.0 or during the Unicode 14.0 alpha and beta periods.
A popular request every time there is a new emoji list approved is a pink heart. That’s not included in version 14.0, but it could be coming in future.
In the past year, the Unicode Emoji Subcommittee has published a document looking at improving coverage of the heart emoji.
While the report stresses that this is not a formal proposal, it goes on to note the intention to:
“draft proposals for a small set of colored hearts: PINK HEART, GRAY HEART, and LIGHT BLUE HEART”
Will a pink heart make the cut for Emoji 15.0? We’ll know more in 2022.
🗓️ Emoji Release Schedule
The release of Unicode 14.0 does not mean users can immediately access or use any new emoji from this list.
What today’s release from the Unicode Consortium does indicate is when major vendors such as Apple, Google, or Samsung can implement these new emojis in their software.
Expect to see some companies come out with early emoji support in late 2021, and the majority of updates to take place in the first half of 2022.
Notably, Google has announced its plans to decouple emoji updates from operating system updates in 2021, meaning faster emoji updates for more Android users in future. This makes Emoji 14.0 support on Android devices quite likely to take place in the latter months of 2021.
Apple’s last major emoji update was in iOS 14.5, released in April 2021 after a longer than usual beta period. This added support for Emoji 13.1, approved in September 2020.
If Apple sticks to this release schedule, expect to see Emoji 14.0 support come to iOS 15.5 in March or April 2022. No new emojis are expected in the forthcoming release iOS 15.0 for iPhone 13.
Now that the code points for Unicode 14.0 are stable, these remain in place forever.
Sending a 🪩 won’t show as a mirror ball (aka disco ball) on any platforms today, but once your app or operating system supports the latest new emoji additions, that missing character above will be replaced by a colorful emoji.
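One way to see whether a given runtime already knows the new code point is to query its bundled Unicode Character Database. The Python sketch below is illustrative only and uses the standard `unicodedata` module. The `is_assigned` helper is a hypothetical name, and the result depends on the Unicode version shipped with the interpreter. It does not tell you whether an installed font can actually draw the emoji.

```python
import unicodedata

MIRROR_BALL = "\U0001FAA9"  # code point assigned in Unicode 14.0

def is_assigned(ch):
    """True if this runtime's Unicode database has an assignment for ch."""
    # General category "Cn" marks a code point as unassigned in the
    # database version bundled with the interpreter.
    return unicodedata.category(ch) != "Cn"

print("Unicode database version:", unicodedata.unidata_version)
print("Mirror Ball known to this runtime:", is_assigned(MIRROR_BALL))
print("Name:", unicodedata.name(MIRROR_BALL, "<unknown to this runtime>"))
```

On an interpreter built against an older Unicode version the code point is still valid and can be stored and transmitted; it is simply reported as unassigned until the database is updated.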