Cloud Fax and Notifications API 2.6 Documentation
Common Types and Elements

3.5 Character Sets and Encodable Strings

The Cloud Fax and Notifications API uses XML documents to transmit information. A single character encoding applies to an entire XML document, most commonly UTF-8. EasyLink systems permit some data (documents and certain fields) to use a different character encoding. Some other fields may also hold data in a different character encoding, but the character set information is not recorded for them.

Character sets should be specified using ISO standard names, or names in the following list. Note that EasyLink support for character sets is evolving and may vary depending on the particular systems a customer uses. Testing is recommended to verify support for any particular application's use of any of these (including UTF-8).

  • US-ASCII
  • ISO-8859-1
  • SHIFT-JIS
  • EUC-JP
  • EUC-KR
  • BIG5
  • GB2312
  • KS_C_5601-1987
  • UTF-8
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-2022-CN
  • ISO-8859-2
  • ISO-8859-3
  • ISO-8859-4
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • ISO-8859-10
  • ISO-8859-13
  • ISO-8859-14
  • ISO-8859-15
  • windows-1252

3.5.1 Document Character Sets

The Document element (see DocumentType) allows a character set to be specified for the document. When it also contains the actual document (using a DocData element), the document may appear in "text" or "base64" format. If it appears as "text", it must conform to the character encoding used for the whole XML document. If a different character set is specified, an attempt will be made to convert the document to the specified character set. If it appears as "base64", or if the document is specified using one of the alternatives to DocData, it is assumed to already be in the specified character set.
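As a rough illustration of the "base64" case, the following Python sketch encodes a document body in the target character set before base64-encoding it, so that no conversion is needed on receipt. The element names Document and DocData come from this section; the attribute name "format", the child element name "CharacterSet", and the helper build_document are assumptions made for illustration and should be checked against the DocumentType schema.

```python
import base64
import xml.etree.ElementTree as ET


def build_document(text: str, charset: str) -> ET.Element:
    """Build a Document element whose DocData is base64 data in `charset`."""
    # Python resolves the hyphenated name "SHIFT-JIS" to its shift_jis codec.
    raw = text.encode(charset)
    doc = ET.Element("Document")
    # Assumed child element carrying the character set name.
    ET.SubElement(doc, "CharacterSet").text = charset
    # Assumed attribute name "format"; this section only names its values.
    data = ET.SubElement(doc, "DocData", {"format": "base64"})
    data.text = base64.b64encode(raw).decode("ascii")
    return doc


doc = build_document("こんにちは", "SHIFT-JIS")
print(ET.tostring(doc, encoding="unicode"))
```

Because the payload is base64, the bytes are independent of the encoding of the enclosing XML document.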

3.5.2 Field Character Sets

Certain EasyLink data fields (FAXTO, FAXFROM, ATT, INSERTn) associated with jobs and destinations may be represented in specific character sets, and have that character set information stored with them. These fields are now accessible as EncodableStringType in the schema, which includes an attribute "b64charset". The value of the field must be supplied in base64-encoded form, and the b64charset attribute specifies the character set the decoded data is in. A special b64charset value "binary" simply specifies that no character set or conversion is to be used on the data: after being decoded from base64, the data is used as is.
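A minimal Python sketch of producing such a value is shown below. The helper names encodable_string and binary_string are hypothetical, and the element name To is used only as an example of an EncodableStringType field.

```python
import base64
import xml.etree.ElementTree as ET


def encodable_string(tag: str, value: str, charset: str) -> ET.Element:
    """Encode `value` in `charset`, then base64, and label it with b64charset."""
    elem = ET.Element(tag, {"b64charset": charset})
    elem.text = base64.b64encode(value.encode(charset)).decode("ascii")
    return elem


def binary_string(tag: str, raw: bytes) -> ET.Element:
    """Pass raw bytes through untouched using the special "binary" value."""
    elem = ET.Element(tag, {"b64charset": "binary"})
    elem.text = base64.b64encode(raw).decode("ascii")
    return elem


to = encodable_string("To", "abcde", "SHIFT-JIS")
print(ET.tostring(to, encoding="unicode"))
# prints <To b64charset="SHIFT-JIS">YWJjZGU=</To>
```

Encoding to the target character set before base64-encoding is what makes the b64charset label truthful; base64-encoding UTF-8 bytes and labelling them SHIFT-JIS would store mislabelled data.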

Because the schema is general, it allows other fields to be specified with character set information, and the base64 format may allow arbitrary values to be supplied and stored. Except for the fields mentioned above, however, the character set information may not be available during later processing, which may lead to some conversion inconsistencies.

Some fields (REF, BILLCODE, CREF) have historically been used to hold data in various character sets, although the character set information is not stored with them. If values for these fields are supplied with the "binary" value for the b64charset, the data will be stored as is after decoding from base64. If an actual character set is specified, the decoded value will still be stored without conversion, and the character set information will not be available later.

REF is a little different from the other fields since it is generally an attribute in the Cloud Fax and Notifications API schemas. For this field, the alternative "refb64" attribute has been added to permit entry of arbitrary data - the value must be base64 encoded, and (after decoding from base64) it will be passed to the system without further conversion.

The value of the b64charset attribute must be a "standard name" (as described above) or the value "binary". These values are not enforced by the schema, and testing is advised to confirm support for any particular application.
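Since the schema does not enforce these values, a client may want its own check before submitting. The Python sketch below tests a value against the names listed in section 3.5 plus "binary". The function name is hypothetical, case-insensitive matching is an assumption, and the check is deliberately conservative: other ISO standard names may also be accepted by the service, and a listed name is not a guarantee of support.

```python
# Names copied from the list in section 3.5, upper-cased for comparison.
LISTED_CHARSETS = frozenset({
    "US-ASCII", "ISO-8859-1", "SHIFT-JIS", "EUC-JP", "EUC-KR", "BIG5",
    "GB2312", "KS_C_5601-1987", "UTF-8", "ISO-2022-JP", "ISO-2022-KR",
    "ISO-2022-CN", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5",
    "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-10",
    "ISO-8859-13", "ISO-8859-14", "ISO-8859-15", "WINDOWS-1252",
})


def is_listed_b64charset(value: str) -> bool:
    """Return True for "binary" or a name from the documented list."""
    # Assumption: names are compared case-insensitively.
    return value == "binary" or value.upper() in LISTED_CHARSETS


for name in ("SHIFT-JIS", "windows-1252", "binary", "KLINGON-1"):
    print(name, is_listed_b64charset(name))
```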

The following examples show how encoding and character set information is handled for the given fields ("^A encoding" refers to an internal mechanism, and signifies that the character set information can be stored with the value):

Example                                                             Remarks
<To>abcde</To>                                                      stored as UTF-8
<To b64charset="SHIFT-JIS">YWJjZGU=</To>                            b64-decoded, ^A encoded
<To b64charset="binary">YWJjZGU=</To>                               b64-decoded, stored without conversion or ^A-encoding
<BillingCode b64charset="SHIFT-JIS">YWJjZGU=</BillingCode>          b64-decoded, stored without conversion
<BillingCode b64charset="binary">YWJjZGU=</BillingCode>             b64-decoded, stored without conversion or ^A-encoding
<Property name="Phone" b64charset="SHIFT-JIS">YWJjZGU=</Property>   stored without conversion (Phone not ^A-able)
<Property name="Phone" b64charset="binary">YWJjZGU=</Property>      b64-decoded, stored without conversion or ^A-encoding

As mentioned above, "ref" is currently an attribute on various destination types, and in order to permit the use of arbitrary ref values an alternative "refb64" attribute is available which will contain base64-encoded data. This is equivalent to providing an element value using a "binary" b64charset. If both "ref" and "refb64" are present, the "refb64" value will be used.
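The precedence rule can be modelled in a few lines of Python. This is only a client-side sketch of the documented behavior, not the service's implementation. The function names are hypothetical, and treating a plain "ref" as UTF-8 text is an assumption.

```python
import base64
from typing import Dict, Optional


def ref_attribute(raw: bytes) -> Dict[str, str]:
    """Return the attribute to set when the ref value is arbitrary bytes."""
    return {"refb64": base64.b64encode(raw).decode("ascii")}


def effective_ref(attrs: Dict[str, str]) -> Optional[bytes]:
    """Model the documented precedence: refb64 is used when both are present."""
    if "refb64" in attrs:
        return base64.b64decode(attrs["refb64"])
    if "ref" in attrs:
        # Assumption: a plain ref is carried as UTF-8 text.
        return attrs["ref"].encode("utf-8")
    return None


attrs = {"ref": "order-17", **ref_attribute(b"\x92\x8d\x95\xb6")}
print(effective_ref(attrs))
```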

Several API functions include the request options UseBase64 and UseBinary. The UseBase64 option requests that fields that may have character set information stored with them (like ATT) be returned, when possible, in base64-encoded form, with the character set indicated in an associated b64charset attribute. This allows the exact data value to be retrieved, and avoids possible conflict with the XML used to transmit it. The UseBinary option requests that fields that do not have character set information, but that commonly contain data in a variety of character sets (REF, BILLCODE, CREF), be returned in "binary" base64 form.
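On the receiving side, a client can handle all three shapes of a returned field with one helper. The Python sketch below assumes the returned element carries the b64charset attribute as described above. The function name read_field is hypothetical, and it relies on Python having a codec for the named character set, which is not guaranteed for every name the service may return.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Union


def read_field(elem: ET.Element) -> Union[str, bytes]:
    """Return text for plain or charset-labelled fields, bytes for "binary"."""
    charset = elem.get("b64charset")
    text = elem.text or ""
    if charset is None:
        return text                      # ordinary XML text
    raw = base64.b64decode(text)
    if charset == "binary":
        return raw                       # no character set: keep raw bytes
    return raw.decode(charset)           # labelled: decode with that charset


labelled = ET.fromstring('<To b64charset="SHIFT-JIS">YWJjZGU=</To>')
binary = ET.fromstring('<BillingCode b64charset="binary">YWJjZGU=</BillingCode>')
print(read_field(labelled), read_field(binary))
```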

3.5.3 Table Data

Destination data may be submitted as a Table for list create and job submit functions. A Table is essentially a Document containing a CSV or Excel file. No special provision for character sets has been implemented for Excel files, but the CharacterSet may be significant for CSV files: the data will be read from the file assuming that it is in the specified character set. The default behavior is to store the data without conversion from the indicated character set. Some fields (FAXTO, FAXFROM, ATT, INSERTn) will be stored along with the character set information. To accommodate the fields REF, BILLCODE, and CREF, which have historically been used to hold other data, the Table/FieldMapping/Map element can specify with the IsBase64Encoded tag that a column contains base64-encoded data (written in the Document's character set). Such data will be base64-decoded and used without further conversion. This is analogous to the 'b64charset="binary"' behavior for the XML representation of destinations.
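A CSV file of this kind might be prepared as in the Python sketch below. Cells given as bytes are written as base64 text, and the whole file is then encoded in the Document's character set. The column names, the presence of a header row, and the helper build_table_csv are assumptions for illustration; the matching Table/FieldMapping/Map entry with IsBase64Encoded still has to be supplied separately.

```python
import base64
import csv
import io
from typing import Sequence, Union

Cell = Union[str, bytes]


def build_table_csv(header: Sequence[str],
                    rows: Sequence[Sequence[Cell]],
                    charset: str) -> bytes:
    """Write rows as CSV, base64-encoding bytes cells, then encode the file."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(header)
    for row in rows:
        writer.writerow([
            base64.b64encode(cell).decode("ascii") if isinstance(cell, bytes) else cell
            for cell in row
        ])
    # Base64 text is plain ASCII, so it survives encoding in the charsets listed here.
    return buf.getvalue().encode(charset)


raw_ref = b"\x83\x65\x83\x58\x83\x67"  # arbitrary bytes for the REF column
data = build_table_csv(["Name", "Fax", "Ref"],
                       [["山田", "0312345678", raw_ref]],
                       "SHIFT-JIS")
print(len(data))
```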

3.5.4 List and Object Names

Due to legacy considerations, list and object names may be stored on EasyLink systems in various character sets, but the character set information is not available. This may present some issues for both input and output when using the Cloud Fax and Notifications API, especially if the API is not the only access method used, or if legacy data must be accessed.

Ordinary text data in an XML document is normally in Unicode (UTF-8, typically). When names are received this way by the Cloud Fax and Notifications API, an attempt is made to locate a corresponding object with a name that matches the input value, taking into consideration the switch, domain settings, and the user's profile character set.

For output, an attempt is made to interpret names found and represent them with an accurate conversion in the result XML.

In both cases, data may occasionally be misinterpreted, or may not be representable in the result.

In many cases, the Cloud Fax and Notifications API schema allows for base64-encoded representations of names. When such a representation is available, it is used as a last resort to transmit the information. When it is not available for output, data that cannot be converted may be altered, with the Unicode replacement character U+FFFD substituted for unrecognizable data.
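The effect of this fallback can be illustrated in Python, whose "replace" error handler also substitutes U+FFFD for undecodable bytes. This is only a model of the behavior described above, not the service's implementation. The function name represent_name and its return shape are hypothetical.

```python
import base64
from typing import Tuple


def represent_name(raw: bytes, charset: str, base64_allowed: bool) -> Tuple[str, str]:
    """Return ("text", value) when convertible, else a base64 or lossy fallback."""
    try:
        return ("text", raw.decode(charset))
    except UnicodeDecodeError:
        if base64_allowed:
            # Last resort: transmit the exact bytes.
            return ("base64", base64.b64encode(raw).decode("ascii"))
        # No base64 form available: undecodable bytes become U+FFFD.
        return ("text", raw.decode(charset, errors="replace"))


print(represent_name(b"abc\xff", "utf-8", base64_allowed=False))
print(represent_name(b"abc\xff", "utf-8", base64_allowed=True))
```

A client that sees U+FFFD in a returned name should treat the value as lossy and avoid writing it back as the object's name.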

 
© Copyright 2020 OpenText Corp. All Rights Reserved.
This information is subject to change. Please check frequently for updates.
Modified October 06, 2020