ISO/IEC JTC1/SC2/WG2 N1395 dated June 11, 1996, and the USA ANSI response ISO/IEC JTC1/SC2/WG2 N1395
Background
Since the distribution of document N1446 there has been only a very limited amount of informed feedback on the original proposal and the US comments. There is, however, an increasing volume of software development that requires an accurate and comprehensive set of ISO standards. Much of this work seeks to restore the full richness of Armenian orthography, which is now well within the capability of modern information technologies. There is also a need to revise other ISO standards that will be derived from ISO10646. This discussion document seeks to provide additional explanatory information, and to propose a workable reconciliation of the original proposal and the US ANSI feedback that permits the maximum restoration of Armenian orthography with minimal impact on existing applications.
Administrative Issues
In response to recommendations from the UNICODE consortium and the chairmen of ISO committees, the submitters of the original document have made recommendations to SARM (the State body of the Republic of Armenia responsible for Standardization). SARM has upgraded its status with ISO so as to become a P member of TC46 and TC46/SC2, and is currently exploring the possibilities of becoming a member of ISO/IEC JTC 1 and ISO/IEC JTC1/SC2.
It may be useful to clarify that the original submitter, the Computer
Standardization Committee of the Armenian Engineers and Scientists of America
(hereafter referred to as AESA), is an authorized agent of SARM and
also acts in an advisory capacity to SARM. The AESA committee
apologizes for any difficulties caused by a misunderstanding of the
term "requester type", and requests that the original document be understood
as an expert contribution in accordance with the recommendation of US ANSI.
The AESA committee will be glad to use either a technical corrigendum or
a defect report in place of an addition form, if this is the preferred ISO
mechanism for communication. This topic has been
discussed with Mike Ksar, convenor of the ISO/IEC JTC1/SC2/WG2 committee.
AESA is prepared to follow his recommendations in this regard. The
AESA committee emphasizes its role as an advisory
body to SARM, which is the official state standards body of the Republic
of Armenia and which is the final Armenian national source of authority
on computer standards. This document reflects the views of the AESA committee,
which are not necessarily endorsed or rejected by SARM, and merely
seeks to clarify current open issues so that informed discussion can lead
to informed decision making.
Technical Issues
a) Implementation Level
There is some semantic confusion that results from the UNICODE usage of "modifier letters" and the ISO10646 usage "implementation level". This is an area that requires the critical focus and attention of ISO and UNICODE specialists if the full set of Armenian orthographic characters is to be correctly described in ISO10646. We draw particular attention to the pitfall of misinterpreting correct usage and function from print samples that demonstrate little other than the electromechanical limitation of the printing device. We emphasize the importance of encoding both the full range of characters and encoding them in the correct manner in accordance with ISO and UNICODE terminology, and are keen to disseminate full details of correct Armenian usage in order to achieve that objective.
The original request for level 3 implementation was based upon the description
in section 15.3 of the ISO10646 standard, which prescribes that level 3 "MAY
contain coded representations of any characters". This seemed advantageous,
since "MAY" is non-definitive and flexible, whereas level 1 is definitive
and restrictive. We share the view of US ANSI that this is a substantial
change, and understand the concern about the impact on current systems.
However, we feel that this needs to be counter-balanced
by the need to correct errors before they are perpetuated in future systems.
We therefore urge systematic review and discussion of the differences of
opinion so that a final resolution can be embraced that will serve future
generations of software developers and users.
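As an illustration of what separates the levels, the following Python sketch tests whether a piece of text contains combining marks, which level 1 excludes and level 3 admits. It is an approximation only: it assumes that the Unicode general categories Mn, Mc and Me, as recorded in Python's unicodedata module (which reflects a later version of the character data than the one under discussion), can stand in for the list of combining characters in ISO10646. The function names are hypothetical.

```python
import unicodedata

# General categories used here as a stand-in for "combining character".
# This is an assumption of the sketch, not the normative ISO10646 list.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def contains_combining_marks(text: str) -> bool:
    """Return True if any character in text is classified as a mark."""
    return any(unicodedata.category(ch) in MARK_CATEGORIES for ch in text)


def fits_level_1(text: str) -> bool:
    """Approximate level 1 check: the text holds no combining marks."""
    return not contains_combining_marks(text)


if __name__ == "__main__":
    plain = "\u0540\u0561\u0575"   # three Armenian letters
    accented = "e\u0301"           # Latin e followed by COMBINING ACUTE ACCENT
    print(fits_level_1(plain), fits_level_1(accented))
```

A check of this kind indicates which existing data would be affected if a character were reclassified as combining, which is the practical concern behind the choice of level.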
b) Unification
The AESA committee would like to point out that significant effort has
been expended on explaining and discussing the philosophy and ramifications
of ISO10646 and UNICODE to scientific and governmental authorities in the
Republic of Armenia, where they were considered with great interest and
reviewed in considerable depth. In general, we are sympathetic to unification
where it is appropriate, yet do not want to see specific features of Armenian
orthography obliterated by a simplistic and naïve form of UNICODE philosophy,
or an incorrect application of appropriate philosophy. If the symbols described
are indeed missing in ISO10646, we are receptive to suggestions on the
appropriate way to include them. The various suggestions that have been
made to unify characters where there is no identity of both function and
glyph form seem to be inappropriate applications, if not overt violations,
of ISO10646/UNICODE philosophy.
I point out for the sake of those who lack access to the UNICODE Standard
2.0 that the standard specifies:
"Identifying a character A as a compatibility variant of another character
B implies that generally A can be remapped to B without loss of information
other than formatting."
In both formal and informal reviews and email dialogues, we have sought to explain the nature of the full Armenian orthographic system in order to facilitate correct classification and unification decisions. If individuals familiar with ISO/UNICODE classifications are prepared to examine these symbols and the description of their usage closely, we are sure that the appropriate classification and status can be determined.
The following section is written to focus that necessary level of critical attention on the status of the specific characters.
Please note that the transliterated names are not being recommended for adoption for ISO or to codify any specific transliteration scheme.
We would like to achieve a definitive resolution (i.e. agree to agree or agree to disagree) on the place of these symbols in ISO10646/UNICODE so that applications development is not further restricted.
1) Armenian MIJAKET
Description: MIJAKET is a separation mark used inside a sentence; the name translates literally as "middle dot". It is larger than the Armenian VERJAKET, has left alignment and has a different function.
Original Proposal: We proposed that this should be added at location 058A.
US ANSI comment: Unify with 00B7 Middle Dot.
Discussion Comment: While this appears to be a useful and constructive suggestion, and there is some similarity of function between MIJAKET and the Middle Dot at 00B7, the symbol is nevertheless different and does not satisfy the criteria of common functionality and glyph form. As a compromise, the US ANSI unification proposal could be accepted, albeit at the price of weakening ISO10646/UNICODE principles, since MIJAKET is not truly identical to Middle Dot as a character.
2 and 3) Armenian DZAKH and AJ CHAKERT
Description: (see original document).
Original proposal: A request was made for addition on the grounds that (as implied in the names) these are derived from other Armenian characters.
US ANSI Comment: Unify as glyph variants of 00AB and 00BB. "We have seen multiple instances of Armenian text using the regular quotation marks."
Discussion Comment: The last US ANSI comment seems to be an example
of the syndrome of accepting evidence from printing devices that
lack the full range of symbols to argue that an incomplete set is complete.
While there is a case for accepting unification solely on the basis of
function, this does not controvert the argument that
the character itself is not the same as the character with which US
ANSI proposes unification. The unification proposal is therefore
a violation of ISO10646/UNICODE philosophy.
4) Armenian YENTAMNA
Description: "This is one of two hyphenation signs in Armenian…" (see original text for complete description.)
Original Proposal: Addition due to lack of similarity to other characters.
US ANSI Comment: "Again we have seen evidences of usage of straight HYPHENATION (2010) in Armenian text. The behavior of the Armenian matches the one expected by the regular HYPHENATION character which is not linked to a specific script. Therefore we don’t see the need for the addition of this character."
Discussion Comment: The US ANSI comment is prone to the same logical flaw exposed above. This appears to be a violation of ISO10646/UNICODE philosophy, or at least a poor application of it.
5 and 6) Armenian MIUTIAN GTSIK and Armenian ANJATMAN GITS
Descriptions: See original document.
Original Proposal: Addition
US ANSI Comment (abbreviated): "These are both explanatory marks and very similar in nature with the existing HORIZONTAL BAR (2015) and many other horizontal bars already encoded in ISO/IEC 10646. The standard does not mandate exact shapes of characters, nor does it specify character behaviour for symbols which can be used as explanatory marks. Therefore, before accepting these new characters, all effort should be made to merge them with symbols having similar shapes."
Discussion Comment: This request could probably be accommodated
as glyph variants by unification of Armenian MIUTIAN GTSIK
with 002D HYPHEN-MINUS, and Armenian ANJATMAN GITS with QUOTATION DASH (2015).
They still need to be added to the Armenian code page.
7 and 8) Armenian SIUN and YERKVOREAK
Description: See original document.
Original proposal: Addition
US ANSI comment: Unify SIUN with PRIME (2032) and YERKVOREAK with DOUBLE PRIME (2033)
Discussion Comment: Recommend acceptance of US ANSI comment.
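For reference, the existing code points cited as unification targets in the US ANSI comments on items 1 to 8 can be listed with their names. The Python sketch below does this with the standard unicodedata module. The names it prints come from the character database shipped with Python, which is later than the version under discussion, and the table and function names are hypothetical conveniences of this sketch.

```python
import unicodedata

# Existing code points cited in the US ANSI comments on items 1 to 8.
# The grouping by proposed Armenian character follows the text above.
UNIFICATION_TARGETS = {
    "MIJAKET": [0x00B7],
    "DZAKH and AJ CHAKERT": [0x00AB, 0x00BB],
    "YENTAMNA": [0x2010],
    "MIUTIAN GTSIK and ANJATMAN GITS": [0x2015],
    "SIUN": [0x2032],
    "YERKVOREAK": [0x2033],
}


def describe(code_point: int) -> str:
    """Format a code point as four hex digits followed by its name."""
    return f"{code_point:04X} {unicodedata.name(chr(code_point))}"


def target_table() -> list[str]:
    """Build one line per proposed character listing its unification targets."""
    return [
        f"{proposed}: {', '.join(describe(cp) for cp in targets)}"
        for proposed, targets in UNIFICATION_TARGETS.items()
    ]


if __name__ == "__main__":
    for line in target_table():
        print(line)
```

The listing records only which existing characters were proposed as targets. It says nothing about whether function and glyph form coincide, which is the point at issue in the discussion comments.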
9) YEV
US ANSI Comment: This is a partially documented addition without
proper documentation, but since they are described in the original
proposal, we are commenting on them. (Other comments are made on nomenclature.)
The US is willing to consider inclusion of this character to provide a capital
form of an existing small form.
Discussion Comment: This seems to be a positive resolution of a long-standing theoretical debate. While YEV may be decomposed into two separate characters, it serves a function as the Armenian ampersand, and already exists as a separate typographic symbol. The US ANSI proposal is in accordance with the pragmatic approach adopted on recent private code page implementations.
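The decomposition argument can be checked against current character data. The sketch below assumes that the existing small form referred to in the US ANSI comment is the ligature at 0587 (ARMENIAN SMALL LIGATURE ECH YIWN); that identification is made here and is not stated in the original documents. It uses Python's unicodedata module, whose tables reflect a later version of the standard, and the function name is hypothetical.

```python
import unicodedata

YEV_SMALL = "\u0587"  # assumed to be the existing small form of YEV


def compatibility_parts(ch: str) -> list[str]:
    """List the characters that ch is remapped to under NFKC normalization."""
    remapped = unicodedata.normalize("NFKC", ch)
    return [f"{ord(c):04X} {unicodedata.name(c)}" for c in remapped]


if __name__ == "__main__":
    # The raw decomposition field shows the mapping and its <compat> tag.
    print(unicodedata.decomposition(YEV_SMALL))
    for part in compatibility_parts(YEV_SMALL):
        print(part)
```

A compatibility mapping of this kind is the mechanism described in the UNICODE definition quoted earlier: the single symbol can be remapped to its two component letters, while remaining available as a separate character for applications that treat it as one.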
10) OU
Discussion Comments: This is another problematic character from
a variety of perspectives. This character can be justified as an
independent character on the basis of existing typographic practices.
It can also be decomposed and therefore considered as a digraph that does
not need encoding. Arguments about its inclusion or exclusion tend
to be ideologically divisive due to variants in Soviet and Western Armenian
orthography.
The US ANSI argument that it is similar to other digraphs is only partially
valid, since those other digraphs (to the best of this author’s knowledge)
may not have been used (and taught in schools) as independent symbols.
There is no easy decision on this character that can satisfy all parties.
My personal preference is to include it on the grounds that any standard
should serve as wide an audience as possible, and exclude as few developers
and end users as possible. Those seeking to exclude it should do
so at the application level, and not from an international standard.
11-15) Adjustments
Five adjustments were proposed in the original document. We are quite
content to resubmit these as a technical corrigendum if that is the wish
of the convenor of the working group. These were originally grouped
together since they are all combining characters that therefore require
an adjustment to their status property.
11 and 12) Armenian DUR (0559) and PATIW (055F)
Everyone with any level of knowledge of Armenian characters recognizes that this is not used in any form of Armenian orthography. The US ANSI committee comment is that "the usage and meaning of the current character has always been unclear". We have yet to find any application that uses this character in any meaningful way.
Discussion Comments: Research indicates that this is most likely to be a misrepresentation of a pair of Armenian symbols (DUR and PATIW) that both need a glyph adjustment to reflect their correct form. The US ANSI comment that this be corrected by a technical corrigendum is very acceptable.
13, 14, and 15) Armenian SHESHT, BATSAKANCHAKAN and HARTSAKAN
Original Proposal: Change the property of these characters so that they become combining characters and are handled correctly by font manufacturers.
US ANSI Comment: Make any necessary changes using a technical corrigendum.
Discussion Comments: No further discussion required. Implement in the manner acceptable to the committee convenor.
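Whether a given character is recorded as combining can be inspected programmatically. The sketch below assumes that SHESHT, BATSAKANCHAKAN and HARTSAKAN correspond to 055B, 055C and 055E; that identification is made here from the current character names and is not stated in the original proposal. It reports the general category and canonical combining class that Python's unicodedata module holds for each. It shows only how the property can be queried: the values depend on the version of the character database installed, and the function names are hypothetical.

```python
import unicodedata

# Assumed code points for the three marks discussed above.
MARKS = {"SHESHT": 0x055B, "BATSAKANCHAKAN": 0x055C, "HARTSAKAN": 0x055E}


def properties(code_point: int) -> tuple[str, str, int]:
    """Return (name, general category, canonical combining class)."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)


def is_combining(code_point: int) -> bool:
    """Treat a character as combining if it is a mark or has a nonzero class."""
    _, category, combining_class = properties(code_point)
    return category.startswith("M") or combining_class != 0


if __name__ == "__main__":
    for label, code_point in MARKS.items():
        print(label, f"{code_point:04X}", properties(code_point), is_combining(code_point))
```

A font or rendering engine that relies on these properties will position a character above a base letter only if it is recorded as combining, which is why the proposal asks for the property itself to be changed.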
Other issues
a) Name Changes
We have tried to avoid unnecessary controversy and discussion of transliteration
issues in a group dedicated to character set issues. We accept the
US ANSI recommendation that naming issues be handled
in Annex P and any other pertinent forum (e.g. TC46).
b) General Policy
We support the policy of avoiding unnecessary changes. We understand
and share the concern of the US ANSI committee that risks to existing
implementations be minimized by making additions rather than changes. However,
that risk needs to be evaluated against the restrictions on current and
future growth that result from errors and omissions. We support the serious efforts
of those members of the international
software development and standards committees who seek to establish
robust, reliable and correct standards, and hope that this document will
lead to the endorsement of the optimal set of solutions.