
Wayclip and the WaylandCB Class

Copyright © 2005 - 2024
            Mahlon R. Smith, The Software Samurai

This manual describes version 0.0.04 of the WaylandCB class,
and version 0.0.33 of the gString class.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

   This document is an extract from the NcDialog API library documentation.
         Please see the larger document for more details and examples.







Wayland Clipboard

A Brief History

Since the early days of GUI (Graphical User Interface) computing under UNIX, the X Windows windowing system has been used to draw most of what the user sees, and to provide the hooks for most of the user’s interaction with the system. X Windows began life at MIT (Massachusetts Institute of Technology), but for most of its life has been based at the X.Org Foundation, a non-profit standards group that manages a wide variety of computing projects.

X Windows was never really designed; but rather emerged bit-by-bit, butt-first from the Wild West days of computing. It is truly an amazing system that few of us can actually understand, even if we wanted to do so. However, the ad-hoc nature of X has led to a long series of unsupervised, obsolete or abandoned software protocols as hardware and software environments have evolved. It is also a breeding ground for security vulnerabilities. And it is this last point that has, more than anything else, driven the X.Org Foundation’s effort to replace X with a more modern and robust communications protocol for GUI environments. That new protocol is known as Wayland.

The primary beneficiary of the Wayland protocol is the GNU/Linux community. In 2017, the GNOME world began the painful and slow transition from X to Wayland. The interface to Wayland is primarily through the GTK or Qt developer toolkits, which provide only limited support for pure console applications.

Wayland was, and remains, a hotbed of semi-functional code filled with annoying bugs and questionable design decisions. However, since early 2019 the Wayland protocol, as implemented within the “GNOME/Wayland compositor”, has stabilized to the point where we are actually starting to get some real work done again instead of stopping twice a day to write a bug report or to fire off a rant about how “it used to work under X!!”

And here we are today, introducing our own first step into the Wayland world, the Wayland clipboard interface class “WaylandCB” for console applications.



Console Access to the Clipboard

For GUI developers, access to the Wayland clipboard is built into the GTK or Qt libraries which provide the framework for most GUI applications under Linux. We are happy to let the GUI developers draw their pretty (and sometimes undeniably awesome) pictures on the screen.

However, your author was developing UNIX software long before there was any such thing as a GUI, and believes that if it cannot be done at the command prompt, it probably shouldn’t be done at all. While this is arguably a Neandertal’s point of view, the command line is still the heart of the Linux system, and access to the system clipboard from within a terminal window is essential.



The Wayland Clipboard Interface

Several developers/enthusiasts, notably Sergey Bugaev, have created a suite of console utilities known as the “wl-clipboard” package. The package consists of two console utilities, “wl-copy” and “wl-paste”. While these utilities are simple in concept, they have undergone significant improvement since they were first released.

*** Well Done, Sergey!! ***

The wl-clipboard utilities now provide reliable console access to the Wayland system clipboard without the need for application-specific implementations.

Visit Sergey at:
https://github.com/bugaevc/

Because these are console utilities, they are written primarily for passing text data between the console and the clipboard. They do, however, support the basics of a general-purpose clipboard.

If you are a developer in Perl, Ruby, Python, etc., these utilities may be called directly. If you are developing in C, C++ or another high-level programming language, calling external utilities is inconvenient and often messy or dangerous. For this reason, we have created a small interface module, WaylandCB, which sits between your application and the “wl-clipboard” utilities.

The “wl-clipboard” package must be installed on your system in order to use the WaylandCB clipboard access module. Installation is simple. The package can be installed directly from the repository using your preferred package manager:

sudo dnf install 'wl-clipboard'
or
sudo apt-get install 'wl-clipboard'

The two console utilities will be installed (probably in the '/usr/bin' directory).

In the console window, test the installation by copying something to the clipboard and reading it back:

wl-copy "Roses are red, violets are violet. Send me some soon, or I may become violent."

wl-paste

If you get back what you sent, the package is installed correctly.

Note that the usual shell rules for escaping special characters apply to this text. Escaping prevents the shell program from “helpfully” interpreting those characters as shell instructions.

The package also includes a simple “man page.”
info 'wl-clipboard' or
man 1 'wl-clipboard'



The WaylandCB class

WaylandCB provides a simple interface between console applications and the Wayland clipboards. The Wayland Clipboard class, “WaylandCB”, is a very simple C++ class definition consisting of two source files and two dependencies.

The WaylandCB class itself is 800 lines of C++ code in two files:

WaylandCB.hpp // class definition
WaylandCB.cpp // class implementation

Software Sam’s first rule of software design is “Write the comments first!”, so you will not be surprised to find that the code is supported by detailed comments to assist the developer in understanding how the clipboard works and how to access it. Many of those comments are duplicated in this document.

It is not actually necessary to understand the mechanics of the code in order to get the full value of its functionality, but for the curious, the path to understanding is an easy one.


Dependencies

Dependency #1:
wl-clipboard utilities: wl-copy and wl-paste
The wl-clipboard utilities and the method for installing them are described in the previous section.

Dependency #2:
gString.hpp
gString.cpp
The gString module performs encoding conversions between UTF-8 and UTF-32 (wchar_t) text. In addition, it provides text analysis and formatting tools similar to the GTK Glib::ustring class, but much smaller and faster.

The gString module is bundled with the WaylandCB package.

Also, the gString package is integrated into the NcDialogAPI, or is available as a separate download from the author’s website.

Also included as part of the distribution is the “Wayclip” test and demonstration program which is described below:
See Wayclip Demo App.


Integration into your application

If your application is based on the NcDialogAPI, the WaylandCB interface is fully integrated into the API, so building the stand-alone WaylandCB module into your application is unnecessary.
Please see the “Clipboard Interface” chapter of the NcDialogAPI documentation for details.



If your application does not use the NcDialogAPI, then the WaylandCB and gString modules may be built as a separate link library or may be built directly into your application source. Because they are both very small modules, it is recommended that they be built as ordinary source modules in your application. Your makefile would then include entries similar to these:

.RECIPEPREFIX = >
COMPILE = g++ -x c++ -std=gnu++11 -Wall -c

yourapp: $(your .o files) WaylandCB.o gString.o
> g++ -o yourapp $(your .o files) WaylandCB.o gString.o

- - -

# WaylandCB class
WaylandCB.o: WaylandCB.cpp WaylandCB.hpp gString.hpp
> $(COMPILE) WaylandCB.cpp

# gString class
gString.o: gString.cpp gString.hpp
> $(COMPILE) gString.cpp

Your application would then call the public methods of the WaylandCB class as needed. See the next section for details.


WaylandCB Public Methods

  • ~WaylandCB ( void );
    Input  : none
    
    Returns: nothing
    
    

    Destructor:
       1) Delete all temp files.
       2) Return all resources to the system

  • WaylandCB ( void );
    Input  : none
    
    Returns:
       nothing (constructors do not return a value)
    
    

    Default Constructor:
       1) Initialize all class data members.
       2) Locate the system’s temp-file directory.
       3) Create a temporary file for communicating
          with the clipboard interface.
       4) Determine whether "wl-copy" and "wl-paste"
          utilities are installed.
       5) Test access to the GNOME/Wayland clipboard.
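Constructor steps 2 and 4 above (locating the temp-file directory and checking for the utilities) could be approached along the following lines. This is a minimal sketch, not the WaylandCB source: the helper names locateTempDir() and utilityInstalled() are hypothetical, and it assumes that honoring the TMPDIR environment variable (falling back to '/tmp') and searching the directories listed in PATH are acceptable strategies.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: return the directory to be used for temp files.
// Assumption: honor TMPDIR if it names an existing directory, else "/tmp".
std::string locateTempDir()
{
   const char* tmp = std::getenv("TMPDIR");
   std::string dir = (tmp != nullptr && *tmp != '\0') ? tmp : "/tmp";
   struct stat st;
   if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode))
      dir = "/tmp";
   return dir;
}

// Hypothetical helper: report whether an executable named 'util' exists
// in one of the directories listed in the PATH environment variable.
bool utilityInstalled(const std::string& util)
{
   const char* envPath = std::getenv("PATH");
   if (envPath == nullptr)
      return false;

   std::istringstream dirs(envPath);
   std::string dir;
   while (std::getline(dirs, dir, ':'))      // PATH entries are ':'-separated
   {
      if (dir.empty())
         dir = ".";                           // an empty entry means "current dir"
      std::string candidate = dir + "/" + util;
      if (access(candidate.c_str(), X_OK) == 0)
         return true;                         // found and executable
   }
   return false;
}
```

A caller would test both utilities, for example utilityInstalled("wl-copy") && utilityInstalled("wl-paste"), before attempting any clipboard access.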


  • bool wcbIsConnected ( void );
    Input  : none
    
    Returns:
       'true' if connected to clipboard, else 'false'
    

    Report whether connection to the Wayland clipboard has been established.


  • short wcbGet ( uint8_t* trg, bool primary = false, short len = gsMAXBYTES );
  • short wcbGet ( char* trg, bool primary = false, short len = gsMAXBYTES );
  • short wcbGet ( wchar_t* trg, bool primary = false, short len = gsMAXCHARS );
  • short wcbGet ( gString& trg, bool primary = false, short len = gsMAXBYTES );
    Input  :
       trg     : pointer to buffer to receive data
                 uint8_t* (aka char*)  or  wchar_t*  or  gString&
       primary : (optional, 'false' by default)
                 'false', read from "regular" (ordinary, main) clipboard
                 'true', read from "primary" (highlighted-data) clipboard
       len     : (optional) specify length of target buffer
                 Default value:
                  for wchar_t* targets: gsMAXCHARS
                  for uint8_t* (char*) and gString targets: gsMAXBYTES
                 Important Note:
                  -- A minimum buffer length is not specified, but the
                     default value is the recommended minimum.
                  -- The maximum buffer length for UTF-8 (byte-oriented)
                     data is: MAX_CB_UTF8  which is currently defined as 16Kbytes.
                     The maximum buffer length for UTF-32 (wchar_t, 32-bit)
                     data is: MAX_CB_UTF32  which is currently defined as 4Kwords.
                     For a gString target, gsMAXBYTES is the maximum.
    
    Returns:
       for wchar_t target: number of UTF-32 characters (codepoints) read 
         including the null terminator
       for other target types: number of UTF-8 bytes read including the 
         null terminator.
       Error Codes:
         Returns wcbsNOCONNECT if clipboard connection not established.
         Returns wcbsNOINSTALL if 'wl-copy' utility not installed.
    
    

    Get a copy of the GNOME/Wayland clipboard data.

    Data returned will be NULL terminated.
    Data may contain newline characters (’\n’) but will not include a trailing newline.

    Important Note: If the data on the clipboard are larger than the target buffer, the data will be truncated and null terminated.

    Author’s Rant: Please note that UTF-8 encoding is intrinsically unsigned character data; however, there is a polite fiction that it can be handled as if it were signed character data, so the WaylandCB class also handles signed and unsigned byte data as if they were equivalent, although they are not.
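To make the wcbGet() return-value and truncation rules concrete, here is a small stand-alone sketch that applies them to text already captured from the clipboard. The function names copyClipText() and utf8Chars() are hypothetical and are not part of WaylandCB. Backing up to a UTF-8 character boundary when truncating is an added assumption, because the rules above do not say how a multi-byte character at the cut point is handled. For a string such as "héllo", a byte-oriented target would report 7 (six UTF-8 bytes plus the null terminator), while a wchar_t target would report 6 (five codepoints plus the null terminator).

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: count the characters (codepoints) in UTF-8 text.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte
// begins a new character.
short utf8Chars(const std::string& utf8)
{
   short count = 0;
   for (unsigned char b : utf8)
      if ((b & 0xC0) != 0x80)
         ++count;
   return count;
}

// Hypothetical helper modeled on the documented wcbGet() rules for byte
// targets: drop one trailing newline, copy at most (len - 1) bytes so the
// null terminator always fits, and return the byte count INCLUDING the
// null terminator. Assumption: never split a multi-byte character.
short copyClipText(const std::string& raw, char* trg, short len)
{
   if (trg == nullptr || len <= 0)
      return 0;

   std::string text = raw;
   if (!text.empty() && text.back() == '\n')
      text.pop_back();                        // no trailing newline

   std::size_t n = text.size();
   if (n > static_cast<std::size_t>(len - 1))
   {
      n = len - 1;                            // truncate to fit
      // back up while the first excluded byte is a continuation byte
      while (n > 0 && (static_cast<unsigned char>(text[n]) & 0xC0) == 0x80)
         --n;
   }
   std::memcpy(trg, text.data(), n);
   trg[n] = '\0';                             // always null terminated
   return static_cast<short>(n + 1);
}
```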


  • short wcbSet ( const uint8_t* csrc, bool primary = false, short ccnt = -1 );
  • short wcbSet ( const char* csrc, bool primary = false, short ccnt = -1 );
  • short wcbSet ( const wchar_t* csrc, bool primary = false, short ccnt = -1 );
  • short wcbSet ( const gString& csrc, bool primary = false, short ccnt = -1 );
    Input  :
       csrc    : pointer to buffer containing source data
                 uint8_t* (aka char*)  or  wchar_t*  or  gString&
       primary : (optional, 'false' by default)
                 'false', write to "regular" (ordinary, main) clipboard
                 'true', write to "primary" (highlighted-data) clipboard
       ccnt    : (optional, -1 by default)
                 number of characters or bytes to write
                 -- If default (-1) specified, write all source data up to
                    and including the null terminator
                    (OR up to the maximum buffer length)
                 -- If 'ccnt' is reached before the null terminator is
                    reached, a null terminator is appended and any
                    remaining source data are ignored.
                 -- If 'ccnt' == ZERO, the wcbClear() method is called
                    instead of processing the 'csrc' data.
    
    Returns:
       for wchar_t source: number of UTF-32 characters (codepoints) written,
        including null terminator
       for byte-oriented sources or the gString source:
        number of bytes written, including null terminator
       returns wcbsNOCONNECT if error i.e. not connected to clipboard
       returns wcbsNOINSTALL if 'wl-copy' utility not installed
    
    

    Set the contents of the GNOME/Wayland clipboard.

    Source data end at the null terminator or, if ’ccnt’ is specified, after the specified number of items (PLUS a null terminator if one is not included in the count).
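The 'ccnt' rules can be modeled in a few lines. The sketch below is hypothetical (setCount() is not a WaylandCB method) and omits the maximum-buffer-length cap. It only shows how the three cases (-1, a positive count, and ZERO) map onto the data that would be sent.

```cpp
#include <algorithm>
#include <cwchar>
#include <string>

// Hypothetical model of the documented 'ccnt' rules for wcbSet():
//   ccnt == 0 : nothing is copied; the caller should clear the clipboard
//   ccnt  < 0 : use all source data up to the null terminator
//   ccnt  > 0 : use at most 'ccnt' characters; remaining data are ignored
// Returns the number of characters to be written INCLUDING the null
// terminator, or 0 to signal "clear the clipboard instead".
// (The maximum-buffer-length cap is omitted from this sketch.)
short setCount(const wchar_t* csrc, short ccnt, std::wstring& out)
{
   out.clear();
   if (ccnt == 0)
      return 0;                               // caller clears instead
   if (csrc == nullptr)
      csrc = L"";                             // treat a null pointer as empty

   std::size_t srcLen = std::wcslen(csrc);
   std::size_t n = (ccnt < 0) ? srcLen
                              : std::min(srcLen, static_cast<std::size_t>(ccnt));
   out.assign(csrc, n);
   return static_cast<short>(n + 1);          // +1 for the null terminator
}
```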

    Technical Note: Because communication between WaylandCB and the system clipboard is handled through console shell commands, the usual rules of escaping shell “special” characters are automatically applied to the source text to prevent the shell program from interpreting them as shell instructions. Because WaylandCB handles this escaping automatically, application-level code should seldom (if ever) need to apply manual escaping.
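The source does not say exactly how WaylandCB escapes the text. One widely used technique for POSIX shells, shown here as an assumption rather than as the WaylandCB implementation, is to wrap the whole string in single quotes and rewrite each embedded single quote as '\''. The function name shellQuote() is hypothetical.

```cpp
#include <string>

// Hypothetical helper: make arbitrary text safe to pass as ONE argument on
// a POSIX shell command line. Inside single quotes the shell treats every
// character literally ($, `, \, ", newline, ...), except the single quote
// itself, which is written as '\'' (close quote, escaped quote, reopen).
std::string shellQuote(const std::string& text)
{
   std::string quoted = "'";
   for (char c : text)
   {
      if (c == '\'')
         quoted += "'\\''";
      else
         quoted += c;
   }
   quoted += "'";
   return quoted;
}
```

For example, the text it's $5 would become 'it'\''s $5', which the shell passes through unchanged.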


  • bool wcbClear ( bool primary = false, bool useCmd = false );
    Input  :
       primary: (optional, 'false' by default)
                if 'false', clear the "Regular" clipboard
                if 'true',  clear the "Primary" clipboard
       useCmd : (optional, 'false' by default)
                if 'false', clear clipboard by sending an empty string
                if 'true', clear the clipboard using the
                           "--clear" argument (see note below)
    
    Returns:
       'true' if successful
       'false' if not connected, or other error
    
    

    Clear the system clipboard.

    Important Note: By default, WaylandCB transmits an empty string to clear the clipboard.
    This is done to avoid a logical problem within the 'wl-clipboard' utilities.

    When using the 'wl-copy --clear' command to clear the clipboard, the clipboard is cleared correctly; however, the 'wl-paste' command will subsequently report the literal string “Nothing is copied” to indicate that the clipboard is empty. This is logically incorrect because it means that the data returned by 'wl-paste' must be compared with “Nothing is copied” to determine whether the clipboard is actually empty.
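An application that cannot control how the clipboard was cleared might therefore need a check like the following. This is a hypothetical helper (pastedTextIsEmpty() is not part of WaylandCB), and it assumes that the pasted text has already been captured as a string.

```cpp
#include <string>

// Hypothetical helper: decide whether text returned by 'wl-paste' means
// "the clipboard is empty". After clearing with an empty string (the
// WaylandCB default) the text is empty apart from line endings; after
// 'wl-copy --clear' it is the literal sentence "Nothing is copied".
bool pastedTextIsEmpty(std::string text)
{
   while (!text.empty() && (text.back() == '\n' || text.back() == '\r'))
      text.pop_back();                        // ignore trailing line endings
   return text.empty() || text == "Nothing is copied";
}
```

Note the weakness this exposes: if a user genuinely copies the sentence “Nothing is copied”, the check reports an empty clipboard. This is why WaylandCB clears the clipboard with an empty string by default.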


  • short wcbBytes ( bool primary = false );
    Input  :
       primary: (optional, 'false' by default)
                if 'false', report the "Regular" clipboard
                if 'true',  report the "Primary" clipboard
    
    Returns: 
       bytes of data on the clipboard incl. NULLCHAR
       returns wcbsNOCONNECT if error i.e. not connected to clipboard
       returns wcbsNOINSTALL if 'wl-copy' utility not installed
    
    

    Report number of bytes of data stored on the specified clipboard.


  • short wcbTypes ( char* trg, short cnt = -1, bool primary = false );
  • short wcbTypes ( wchar_t* trg, short cnt = -1, bool primary = false );
  • short wcbTypes ( gString& trg, short cnt = -1, bool primary = false );
    Input  :
       trg    : buffer to receive descriptive text for available formats 
       cnt    : (optional, -1 by default) size of target buffer
                for uint8_t* (char*) and gString targets, default: gsMAXBYTES
                for wchar_t* target, default: gsMAXCHARS
                NOTE: On average, approximately 200-300 bytes of (ASCII)
                      data will be returned, but be safe.
       primary: (optional, 'false' by default)
                if 'false', report the "Regular" clipboard
                if 'true',  report the "Primary" clipboard
    
    Returns:
       for wchar_t target, number of characters read, including null terminator
       for byte-oriented targets or gString target, number of bytes read 
         including null terminator
       returns wcbsNOCONNECT if error i.e. not connected to clipboard
       returns wcbsNOINSTALL if 'wl-paste' utility not installed
    

    Report the available formats for retrieving data from the specified clipboard.

    For a given set of text data on the clipboard, the wcbTypes() method will report the formats in which that data may be retrieved by the application.

    The Wayland clipboard supports essentially the same MIME type designations for transmitting text as its predecessor, the X-clipboard.

       text/plain;charset=utf-8
       UTF8_STRING
       TEXT
       STRING
       text/plain;charset=unicode
       text/plain;charset=UTF-16
       text/plain;charset=UTF-8
       text/plain;charset=UTF-16BE
       text/plain;charset=UTF-16LE
       text/plain;charset=ISO-8859-1
       text/plain;charset=US-ASCII
       text/plain

    Please note however, that WaylandCB transmits and receives text using ONLY the “text/plain;charset=UTF-8” format. The 16-bit, ASCII and ISO-8859-1 formats are Windows(tm) garbage which is seldom seen on real (i.e. GNU/Linux) systems.

    For those who are unfamiliar with MIME types (media types), these are standards for transmitting data across the internet. These standards are administered by the IANA (Internet Assigned Numbers Authority). Only a small subset of these types are associated with clipboard text data.

    Technical Note: In addition to standard UTF-8 encoding, the WaylandCB class also supports UTF-32 (wchar_t, 32-bit) data transfers to and from the application by locally re-encoding the data. See Introduction to gString for a discussion of encoding and re-encoding text data.
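If an application receives the list of available formats as text, checking whether UTF-8 plain text is on offer can be as simple as the sketch below. It assumes one MIME type per line, and the helper name offersUtf8Text() is hypothetical. The comparison ignores case because the list above offers both the “utf-8” and “UTF-8” spellings.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: given a newline-separated list of MIME types,
// report whether the data can be retrieved as UTF-8 plain text.
bool offersUtf8Text(const std::string& typeList)
{
   std::istringstream lines(typeList);
   std::string type;
   while (std::getline(lines, type))
   {
      // lower-case the whole line so "UTF-8" and "utf-8" compare equal
      std::transform(type.begin(), type.end(), type.begin(),
                     [](unsigned char c) { return std::tolower(c); });
      if (type == "text/plain;charset=utf-8")
         return true;
   }
   return false;
}
```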


  • wcbStatus wcbTest ( const char* testdata );
    Input  :
       testdata: a short null-terminated string to use as test data
    
    Returns:
       member of enum wcbStatus:
       wcbsNOINSTALL  'wl-copy' and/or 'wl-paste' utility not installed
       wcbsNOCONNECT  unable to establish connection with system clipboard,
                       or communication interrupted
       wcbsACTIVE     system clipboard communication established and active
    
    

    Test the connection with the Wayland clipboard.

    Send a short (< 1000 bytes) message to the clipboard, then retrieve the data and compare the results.
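The test described above is a write, read-back and compare cycle. In the following sketch, the clipboard transfer is replaced by two caller-supplied callbacks so that the logic can be exercised without a live clipboard. LinkStatus and roundTripTest() are hypothetical stand-ins for wcbStatus and wcbTest(); their numeric values are not taken from the real enum.

```cpp
#include <functional>
#include <string>

// Hypothetical status codes mirroring the three documented wcbStatus members.
enum class LinkStatus { NoInstall, NoConnect, Active };

// Hypothetical round-trip test: write 'testdata' through 'copyFn', read it
// back through 'pasteFn', and compare. The callbacks stand in for the real
// clipboard transfer (an assumption made for this sketch).
LinkStatus roundTripTest(const char* testdata, bool utilitiesInstalled,
                         const std::function<bool(const std::string&)>& copyFn,
                         const std::function<bool(std::string&)>& pasteFn)
{
   if (!utilitiesInstalled)
      return LinkStatus::NoInstall;           // wl-copy/wl-paste missing

   std::string sent(testdata != nullptr ? testdata : ""), received;
   if (!copyFn(sent) || !pasteFn(received))
      return LinkStatus::NoConnect;           // transfer failed

   // the link is active only if the data survived the round trip intact
   return (received == sent) ? LinkStatus::Active : LinkStatus::NoConnect;
}
```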


  • bool wcbReinit ( void );
    Input  : none
    
    Returns:
       'true' if connection re-established, else 'false'
    
    

    Terminate the current connection (if any) with the Wayland clipboard, release all local resources, reinitialize local data members, and re-establish the connection.

    Technical Note:
    The ’wl-copy’ and/or ’wl-paste’ utility will occasionally go off into the weeds. The reasons for this are unclear, but communication can usually be re-established by running the initialization sequence again. To do this, call 'wcbReinit', which first resets all class resources to the initial state and then executes the startup sequence.
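A caller that wants to recover from such a stall automatically might combine a connection test with reinitialization, as in this hypothetical sketch. The callbacks stand in for the connection test and for wcbReinit(), and the default retry limit of three is an arbitrary choice.

```cpp
#include <functional>

// Hypothetical recovery loop: if the connection test fails, reinitialize
// and test again, up to 'maxTries' reinitializations.
bool recoverConnection(const std::function<bool()>& testFn,
                       const std::function<bool()>& reinitFn,
                       int maxTries = 3)
{
   if (testFn())
      return true;                            // connection is healthy
   for (int i = 0; i < maxTries; ++i)
   {
      if (reinitFn() && testFn())             // reset and restart, then verify
         return true;
   }
   return false;                              // give up; report to the user
}
```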


  • const char* wcbVersion ( void );
    Input  : none
    
    Returns:
       pointer to const string of the format: "x.y.zz"
    
    

    Report the WaylandCB-class version number.


  • short wcbVersion ( gString& gsVersion );
    Input  : (by reference) a gString object to receive the version strings
    
    Returns:
       ZERO if success
       Error Codes:
         Returns wcbsNOCONNECT if clipboard connection not established.
         Returns wcbsNOINSTALL if 'wl-copy' utility not installed.
    
    

    The overloaded version of wcbVersion reports both the WaylandCB-class version and the version of the 'wl-clipboard' utilities, separated by a newline character (’\n’).

    Example: "0.0.04\n2.2.1" where "0.0.04" is the WaylandCB-class version, and "2.2.1" is the version number of the wl-clipboard utilities.

    If communication with the wl-clipboard utilities has not been established, the WaylandCB-class version will still be reported, but the wl-clipboard version will be reported as unknown. Example: "0.0.04\nunknown"
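An application that wants the two version numbers separately could split the report at the newline. The helper below is a hypothetical sketch. It also treats a report with no newline as having an unknown wl-clipboard version.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split the two-line version report into the
// WaylandCB-class version (first) and the wl-clipboard version (second).
std::pair<std::string, std::string> splitVersions(const std::string& report)
{
   std::string::size_type nl = report.find('\n');
   if (nl == std::string::npos)
      return std::make_pair(report, std::string("unknown"));
   return std::make_pair(report.substr(0, nl), report.substr(nl + 1));
}
```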




Wayclip Demo App

The “wl-clipboard” suite: “wl-copy” and “wl-paste” must be installed on your system for the Wayclip application to function properly. Please see Wayland Clipboard Interface for installation instructions.


The Wayland communications protocol is the successor to the X-Windows protocol, and the GNOME/Wayland compositor is the new window manager based on the Wayland protocol.

Note that other communication protocols for UNIX-like systems exist at various stages of development; however, both GNOME and KDE have committed to supporting Wayland. Most systems based on these platforms have also announced support for Wayland as the “official” successor to X-Windows.

‘Wayclip’ is a simple console application that demonstrates the “WaylandCB” interface class which provides access to the Wayland clipboard.

Application Design

‘Wayclip’ is an embarrassingly simple console application written in C++. The application provides straightforward examples for accessing the public methods of the WaylandCB interface class.
Please see WaylandCB class for more information.

The WaylandCB class, in turn, provides access to the “wl-clipboard” utilities: “wl-copy” and “wl-paste” written by Sergey Bugaev (and friends).
Please see Wayland Clipboard Interface for more information.

The application is designed around a menu of available commands (see User Interface). These commands exercise each of the public methods of the WaylandCB interface class.

Wayclip produces color-coded output using simple ANSI color commands (escape sequences).
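The source does not list the exact sequences Wayclip uses. As a minimal sketch, assuming the standard ANSI SGR color codes (30 through 37 for the foreground colors, 0 to reset), a color-coded message can be built like this. The helper name colorize() is hypothetical.

```cpp
#include <string>

// Hypothetical helper: wrap 'text' in ANSI SGR escape sequences so that a
// color-capable terminal shows it in the given foreground color
// (30 = black ... 37 = white), then reset all attributes with ESC [ 0 m.
std::string colorize(const std::string& text, int fgColor)
{
   return "\x1b[" + std::to_string(fgColor) + "m" + text + "\x1b[0m";
}
```

For example, writing colorize("Clipboard connected", 32) to std::cout displays the message in green on a color-capable terminal.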

Although Wayclip will run in a terminal window of any reasonable size, it is recommended that you expand your terminal window to at least 36 rows and 132 columns for optimal viewing.


Invoking

-p Set target clipboard to “Primary”

Set the initial target clipboard to the Wayland “Primary” clipboard.

This clipboard is designed primarily for data that have been highlighted within an application; however, for text data, this clipboard has all the same capabilities as the “Regular” clipboard.

The target clipboard may also be specified from within the application using the '-a' (Active-clipboard) command.


-r Set target clipboard to “Regular”

Set the initial target clipboard to the Wayland “Regular” clipboard.

This is the main system clipboard and can handle all the standard data types and data formats. This is the default clipboard, so if the target clipboard is not specified on the Wayclip command line, the “Regular” clipboard is the initial target.

The target clipboard may also be specified from within the application using the '-a' (Active-clipboard) command.


--version Version and copyright message.

Display the application version number and the basic copyright notice.

--help (or -h) Help for application.

Display a list of available command-line options.
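The options above map onto a very small parser. The following is a hypothetical sketch, not the Wayclip source. WayclipOptions and parseOptions() are illustrative names, and treating a later -p or -r as overriding an earlier one is an assumption.

```cpp
#include <cstring>

// Hypothetical parse result for the Wayclip options listed above.
struct WayclipOptions
{
   bool primary     = false;   // -p selects "Primary"; -r (default) "Regular"
   bool showVersion = false;   // --version
   bool showHelp    = false;   // --help or -h
   bool badOption   = false;   // anything unrecognized
};

// Hypothetical parser: scan the arguments left to right.
WayclipOptions parseOptions(int argc, const char* const argv[])
{
   WayclipOptions opt;
   for (int i = 1; i < argc; ++i)
   {
      const char* arg = argv[i];
      if (std::strcmp(arg, "-p") == 0)
         opt.primary = true;
      else if (std::strcmp(arg, "-r") == 0)
         opt.primary = false;
      else if (std::strcmp(arg, "--version") == 0)
         opt.showVersion = true;
      else if (std::strcmp(arg, "--help") == 0 || std::strcmp(arg, "-h") == 0)
         opt.showHelp = true;
      else
         opt.badOption = true;
   }
   return opt;
}
```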


User Interface

The Wayclip application’s user interface is menu-driven, so there is little need to refer to this documentation except to satisfy the naturally insatiable curiosity of the software programmer.

Copying Data To the Clipboard

Data may be copied to the Regular (main) clipboard from any application on the desktop, for instance a text editor, a LibreOffice(tm) document, or from a web page in your browser.

Data may be copied to the Primary clipboard by using the mouse to highlight text in any application on the desktop including the current terminal window.

For the tests performed by the Wayclip application, data are limited to text data only.

The Wayclip Menu

Wayclip - v:0.0.04 (c)2019-2024 The Software Samurai
======================================================
MAIN MENU: (Press ENTER key after command key)
a : Set "active" clipboard. Currently active: Regular
c : Copy test data to active clipboard.
m : Display Main Menu
p : Paste data from active clipboard.
r : Report available MIME types and misc. info.
x : Clear active clipboard.
R : Reset and re-initialize clipboard connection.
s : Specify text to be copied to clipboard.
t : Test the clipboard connection.
S : Specify internal communications format.
T : Test I/O buffer-overrun protection.
q : Quit
w : Clear the terminal window.
>>

Type the command letter and then press the ENTER key.

Explanation of each command



Three additional commands are available but, to save space, were not included in the menu. These commands are primarily for the convenience of the developer, but the commands are fully functional and available for use.

Please note that these are upper-case letters.



Please note that this application disables the “Panic Button” (CTRL+C), so the user must exit the application using the ‘q’ (ENTER) command. The reason for this is that our beta-testers continually tried to use CTRL+C and CTRL+V as copy and paste, respectively.
No, no, beta-testers! RTFM! (We know that you would never do anything that foolish. :-)
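The source does not say how Wayclip disables CTRL+C. One common approach, shown here as an assumption, is to ignore SIGINT, the signal the terminal sends to the foreground program when CTRL+C is pressed. The helper names are hypothetical.

```cpp
#include <csignal>

// Hypothetical helper: disable the "Panic Button" by telling the system to
// ignore SIGINT. Returns true on success.
bool disablePanicButton()
{
   return std::signal(SIGINT, SIG_IGN) != SIG_ERR;
}

// Hypothetical helper: restore the default CTRL+C behavior (terminate).
bool restorePanicButton()
{
   return std::signal(SIGINT, SIG_DFL) != SIG_ERR;
}
```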




gString Text Tool

’gString’ is a small, fast and flexible way to seamlessly convert, format and analyze both UTF-8 and wchar_t (’wide’) text.


Introduction to gString

Introduction to a Wider World

Modern applications must be designed for a worldwide audience, and for this reason, the application designer must plan for multi-language support.

Fortunately, the Universal Character Set standard ISO/IEC 10646 and UTF-8, the most popular and flexible method of character encoding, smoothly provide all the character sets, alphabets and special characters which are currently in general use.

Unfortunately, the C and C++ languages offer only minimal support for internationalization. std::string and std::wstring are nothing more than a cruel joke to a serious application designer. The GTK+ toolkit’s Glib::ustring class is an excellent internationalization tool, but it requires installation of the GTK+ toolkit’s ’glib’ and ’glibmm’ libraries. For more information on Glib::ustring, see:
http://library.gnome.org/devel/glibmm/unstable/classGlib_1_1ustring.html

’gString’ falls halfway between the full feature set of Glib::ustring and the meaningless garbage that is std::string. ’gString’ consists of one C++ header file and one C++ source code module. ’gString’ is integrated into the NcDialog API library, but may also be compiled independently as a small (16Kb) link library or the source may be embedded directly into your application.

Preparing to Support Multiple Languages

Here are the basic ideas you will need to understand in order for your application to smoothly support multiple languages.

  1. ASCII (American Standard Code for Information Interchange) is only the very first step in character encoding. It is an ancient and venerable encoding, but supports only the 95 printable characters of the basic Latin alphabet.
    If you think you can say "你是在欺骗自己!" in ASCII, you’re just deluding yourself!
  2. NEVER assume that one character can be represented with one byte.
  3. NEVER assume that one character is one display column in width.
  4. The idea that text flows from left to right may only be PROVISIONALLY assumed, because again: "!איר זענט נאָר דילודינג זיך" you’re just deluding yourself.
  5. NEVER assume that "everyone reads English, so why bother?". Native speakers of Spanish, French, Chinese, the various flavors of Arabic and others (i.e. your potential customers) all have a significant impact on the daily events of our planet, so include them when planning for your next killer app.
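Point 2 is easy to see in code. The minimal decoder below, written for illustration and not taken from gString, converts UTF-8 to UTF-32 (one char32_t per character) so that the byte count and the character count of the same text can be compared. It assumes well-formed input and does not attempt to measure display columns (point 3), which depend on the terminal and locale.

```cpp
#include <string>

// Minimal UTF-8 to UTF-32 decoder, for illustration only. Assumes
// well-formed input; a production decoder must also reject overlong
// forms, surrogates and truncated sequences.
std::u32string utf8ToUtf32(const std::string& utf8)
{
   std::u32string out;
   std::size_t i = 0;
   while (i < utf8.size())
   {
      unsigned char b = static_cast<unsigned char>(utf8[i]);
      char32_t cp;
      int extra;                                  // continuation bytes to follow
      if      (b < 0x80)           { cp = b;        extra = 0; }  // ASCII
      else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }  // 2-byte
      else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }  // 3-byte
      else                         { cp = b & 0x07; extra = 3; }  // 4-byte
      ++i;
      // each continuation byte contributes its low 6 bits
      for (int k = 0; k < extra && i < utf8.size(); ++k, ++i)
         cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
      out.push_back(cp);
   }
   return out;
}
```

For example, the two Chinese characters "你好" occupy six bytes in UTF-8 but decode to just two UTF-32 characters, U+4F60 and U+597D.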

See also a discussion of multiple-language support in the NcDialog API.




gString Public Methods

What follows is a list of all public methods of the gString class.
Methods are arranged in functional groups.

   gString Method Name        Chapter Reference
   -------------------        -----------------
   gString [constructor]      see gString Instantiation
   ~gString [destructor]
   operator=                  see Assignment Operators

   compose                    see Formatted Assignments

   formatInt                  see Integer Formatting

   gstr                       see Data Access
   ustr

   copy                       see Copying Data
   operator<<
   substr

   append                     see Modifying Existing Data
   insert
   limitChars
   limitCols
   shiftChars
   shiftCols
   padCols
   strip
   erase
   replace
   loadChars
   textReverse
   formatParagraph

   compare                    see Comparisons
   operator==
   operator!=
   find
   findlast
   after
   findr
   findx
   scan

   gscanf                     see Extract Formatted Data

   gschars                    see Statistical Info
   gscols
   utfbytes
   isASCII

   clear                      see gString Miscellaneous
   Get_gString_Version
   dbMsg



gString Instantiation

The following are the ’constructors’ for the gString class.

For those new to C++, a constructor creates an ’instance’ of the class. An instance is a particular, named object, and can be thought of as a complex variable.

  • gString ( void ) ;
      Input  :
         none
      Returns:
         nothing
    

    Constructor: Initialize members to default values (NULL string).


  • gString ( const char* usrc, short charLimit=gsMAXCHARS ) ;
      Input  :
         usrc      : pointer to a UTF-8-encoded, null-terminated string
         charLimit : (optional, gsMAXCHARS by default)
                     maximum number of characters from source array to 
                     convert
      Returns:
         nothing
    

    Constructor: Convert specified UTF-8-encoded source to gString.


  • gString ( const wchar_t* wsrc, short charLimit=gsMAXCHARS ) ;
      Input  :
         wsrc      : pointer to a wchar_t-encoded, null-terminated string
         charLimit : (optional, gsMAXCHARS by default)
                     maximum number of characters from source array to 
                     convert
      Returns: nothing
    

    Constructor: Convert specified wchar_t (’wide’) source to gString.


  • gString ( short iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned short iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( int iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned int iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( long int iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned long int iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( long long int iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;

  • gString ( unsigned long long int iVal, short fWidth, bool lJust = false,
              bool sign = false, bool kibi = false, fiUnits units = fiK ) ;


      Input  :
         iVal  : value to be converted
                 Supported value range: plus/minus 9.999999999999 terabytes
         fWidth: field width (number of display columns)
                 range: 1 to FI_MAX_FIELDWIDTH
         lJust : (optional, false by default)
                 if true, strip leading spaces to left-justify the value
                 in the field. (resulting string may be less than fWidth)
         sign  : (optional, false by default)
                 'false' : only negative values are signed
                 'true'  : always prepend a '+' or '-' sign.
         kibi  : (optional, false by default)
                 'false' : calculate as a decimal value (powers of 10)
                           kilobyte, megabyte, gigabyte, terabyte
                 'true'  : calculate as a binary value (powers of 2)
                           kibibyte, mebibyte, gibibyte, tebibyte
         units  : (optional) member of enum fiUnits (fiK by default)
                  specifies the format for the units suffix.
                  Note that if the uncompressed value fits in the field,
                  then this parameter is ignored.
    
      Returns: nothing
         Note: on field overflow, the field is filled with '#' characters.
    

    Constructor: Convert the specified integer value to gString.
    Please see the 'formatInt' method group (Integer Formatting) for formatting details.


  • gString ( const char* fmt, const void* arg1, ... )
              __attribute__ ((format (gnu_printf, 2, 0))) ;
      Input  :
         fmt       : a format specification string in the style of 
                     sprintf(), swprintf() and related formatting 
                     C/C++ functions.
         arg1      : pointer to first value to be converted by 'fmt'
         ...       : optional arguments (between ZERO and gsfMAXARGS - 1)
                     Each optional argument is a POINTER to (the address 
                     of) the value to be formatted.
      Returns:
         nothing
    

    Constructor: Convert the formatting specification and its arguments to gString. Please refer to the compose() method (see Formatted Assignments) for more information.

    Technical Note: There is no constructor using a “const wchar_t* fmt” format specification because it would conflict with the constructor which limits the number of characters used to initialize the instance.


  • gString ( const gsForm& gsf, short charLimit=gsMAXCHARS ) ;
      Input  :
         gsf       : initialized gsForm class object containing parameters 
                     for creating a formatted text string.
         charLimit : (optional, gsMAXCHARS by default)
                     maximum number of characters from source array to 
                     convert
      Returns:
         nothing
    

    DEPRECATED: May be removed in a future release.
    This method seemed like a good idea back in 2011, but neither we nor our beta testers have ever had a desire to use it.

    Constructor: Convert formatting instructions in gsForm class object to gString. See compose() for more information.


  • ~gString ( void ) ;
      Input  :
         none
      Returns:
         nothing
    

    Destructor: Release all resources associated with the gString object.

    The object is destroyed either when it goes out of scope or, for an object created with 'new', when it is explicitly deleted.

    For those new to C++, please note that objects created with the 'new' keyword persist (take up space) until you explicitly delete them or until the application exits, even if the pointer to the object has gone out of scope. See examples below.


    Examples

    gString* playful_kitten ( const char* msg ) ;  // forward declaration
    
    void calling_method ( void )
    {
       gString *gsPtr = playful_kitten ( "Hello World!" ) ;
    
       // report contents of object created by called method
       wcout << gsPtr->gstr() << endl ;
       delete gsPtr ; // delete object created by called method
    }
    
    gString* playful_kitten ( const char* msg )
    {
       gString gs_local( "I love tuna!" ) ;  // local (automatic) object
       gString *gsPtr1 = new gString,        // heap object
               *gsPtr2 = new gString(msg) ;  // heap object (initialized)
       gString *gsArray = new gString[4] ;   // heap array
    
       *gsPtr1 = gs_local ;   // be a kitten: play with the strings...
       gsArray[2] = *gsPtr2 ;
       gsArray[3] = "Scratch my belly!" ;
       gsArray[1] = gsArray[3] ;
    
       delete gsPtr1 ;      // delete object referenced by gsPtr1
       delete [] gsArray ;  // delete object array referenced by gsArray
       return gsPtr2 ;      // return pointer to object referenced by gsPtr2
                            // (caller is responsible for deleting object)
    }           // 'gs_local' goes out of scope and is destroyed here
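    For comparison, the same ownership pattern can be sketched in standard C++14 with std::unique_ptr, which deletes the owned object automatically when the owning pointer goes out of scope, so no explicit 'delete' is needed. To keep the sketch self-contained, std::wstring stands in for gString, and make_greeting is a hypothetical name; this is not part of the gString API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical counterpart of playful_kitten(): the returned heap object is
// owned by a std::unique_ptr, so ownership passes cleanly to the caller.
std::unique_ptr<std::wstring> make_greeting ( const wchar_t* msg )
{
   std::wstring local( L"I love tuna!" ) ;                 // automatic object
   auto first  = std::make_unique<std::wstring>() ;        // heap object
   auto result = std::make_unique<std::wstring>( msg ) ;   // heap object (initialized)
   auto array  = std::make_unique<std::wstring[]>( 4 ) ;   // heap array

   *first   = local ;               // play with the strings...
   array[2] = *result ;
   array[3] = L"Scratch my belly!" ;
   array[1] = array[3] ;

   return result ;   // ownership of this object moves to the caller
}                    // 'local', 'first' and 'array' are all released here
```

    The caller simply writes 'auto p = make_greeting( L"Hello World!" ) ;' and the string is freed when 'p' goes out of scope.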
    



Assignment Operators

For those new to C++, an assignment operator replaces the contents of the object to the left of the '=' with the data on the right of the '='. You may also hear the term 'overloaded operator'. This just means that the '=' operator is defined more than once, for different types of source data, and the compiler selects the version that matches the data on the right.

  • void operator = ( const char* usrc ) ;
      Input  :
         usrc  : pointer to an array of UTF-8-encoded characters
      Returns:
         nothing
    

    Assignment operator: converts UTF-8-encoded source to gString.

  • void operator = ( const wchar_t* wsrc ) ;
      Input  :
         wsrc  : pointer to an array of wchar_t 'wide' characters
      Returns:
         nothing
    

    Assignment operator: converts wchar_t ('wide') source to gString.

  • void operator = ( const gString& gssrc ) ;
      Input  :
         gssrc : gString object to be copied (by reference)
      Returns:
         nothing
    

    Assignment operator: copies one gString object to another.

  • void operator = ( const gsForm& gsf ) ;
      Input  :
         gsf   : an initialized gsForm object (by reference)
      Returns:
         nothing
    

    DEPRECATED: May be removed in a future release.
    Assignment operator: Converts gsForm-class instructions to gString.

Examples

char utf8Data[] = { "Youth is wasted on the young." } ;
gString gs1, gs2 ;

gs1 = utf8Data ;
gs2 = gs1 ;
wcout << gs2 << endl ;
 - - -> Youth is wasted on the young.
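To illustrate what operator overloading means in practice, here is a minimal sketch of a hypothetical MiniStr class (not gString) that stores its text as wide characters and accepts UTF-8, wide, and same-class sources through different '=' overloads. Unlike the gString signatures above, these operators return a reference to the object, which is the usual C++ idiom. The sketch assumes a 32-bit wchar_t, as on GNU/Linux, and stops decoding at the first malformed UTF-8 sequence.

```cpp
#include <cassert>
#include <string>

// Hypothetical MiniStr class: the compiler picks the operator= overload
// that matches the type of the data on the right of the '='.
class MiniStr
{
public:
   MiniStr& operator= ( const wchar_t* wsrc ) { text = wsrc ; return *this ; }
   MiniStr& operator= ( const char* usrc )    { text = fromUtf8( usrc ) ; return *this ; }
   // MiniStr = MiniStr uses the compiler-generated copy assignment.

   const std::wstring& wide ( ) const { return text ; }

private:
   // Decode null-terminated UTF-8 text into wide characters.
   static std::wstring fromUtf8 ( const char* s )
   {
      std::wstring out ;
      const unsigned char* p = reinterpret_cast<const unsigned char*>( s ) ;
      while ( *p != '\0' )
      {
         unsigned long cp ;
         int extra ;                       // number of continuation bytes
         if      ( *p < 0x80 )           { cp = *p ;        extra = 0 ; }
         else if ( (*p & 0xE0) == 0xC0 ) { cp = *p & 0x1F ; extra = 1 ; }
         else if ( (*p & 0xF0) == 0xE0 ) { cp = *p & 0x0F ; extra = 2 ; }
         else if ( (*p & 0xF8) == 0xF0 ) { cp = *p & 0x07 ; extra = 3 ; }
         else                            { break ; }        // invalid lead byte
         ++p ;
         for ( int i = 0 ; i < extra ; ++i, ++p )
         {
            if ( (*p & 0xC0) != 0x80 )     // missing continuation byte
               return out ;
            cp = (cp << 6) | (*p & 0x3F) ;
         }
         out.push_back( static_cast<wchar_t>( cp ) ) ;
      }
      return out ;
   }

   std::wstring text ;
};
```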



Formatted Assignments

  • const wchar_t* compose ( const wchar_t* fmt, ... )
                             __attribute__ ((format (gnu_wprintf, 2, 0))) ;

  • const wchar_t* compose ( const char* fmt, ... )
                             __attribute__ ((format (gnu_printf, 2, 0))) ;
      Input  :
         fmt  : a format specification string in the style of sprintf(),
                swprintf() and related formatting C/C++ functions.
         ...  : optional arguments (between ZERO and gsfMAXARGS)
                 Each optional argument is a POINTER to (the address of)
                 the value to be formatted.
                - Important Note: There must be AT LEAST as many optional
                  arguments as the number of format specifiers defined in
                  the formatting string. Excess arguments will be ignored;
                  HOWEVER, too few arguments will result in an application
                  crash. You have been warned.
      Returns:
         const wchar_t* to formatted data
    

    Create formatted text data from a format specification string including between ZERO and gsfMAXARGS format specifications and their corresponding argument pointers.

    Supported data types:
     %d, %i  integer (decimal)
     %o      integer (octal)
     %u      integer (unsigned)
     %x, %X  integer (hex lower or upper case)
     %f      floating point (fixed point)
     %e, %E  floating point (scientific notation, lower/uppercase)
     %g, %G  floating point (normal/exponential, lower/uppercase)
     %a, %A  floating point (hex fraction)
     %c      character
     %C      character (alias for %lc)
     %s      string
     %S      string (alias for %ls)
     %p      pointer
     %b, %B  (extension to swprintf - see description below)
     %m      capture 'errno' description string (see /usr/include/errno.h)
     %n      number of characters printed so far
             (value written to corresponding argument's location)
     %%      literal '%'
    
    See the man page for the C/C++ function 'swprintf', or the
    'Table of Output Conversions' in the GNU C Library manual, for additional details.
    

Examples

char      Greeting[] = { "Hello!" } ;
int       iValue = 27064 ;
long long int qValue = 7842561 ;
long int  lValue = 485772 ;
short int sValue1 = 28875, sValue2 = -261, sValue3 = 529 ;
bool      flagValue = true ;
float     fltValue = 278.5610234 ;
double    dblValue = 9982.5610234 ;
gString gs ;
gs.compose( "%s - %d %12hd, %-hi, %#hx %08lXh %lld %hhd",
            Greeting, &iValue, &sValue1, &sValue2, &sValue3,
            &lValue, &qValue, &flagValue ) ;
wcout << gs << endl ;
 - - -> Hello! - 27064        28875, -261, 0x211 0007698Ch 7842561 1
gs.compose( "floating downstream:%10.2f and doubling our pace:%.4lf",
            &fltValue, &dblValue ) ;
wcout << gs << endl ;
 - - -> floating downstream:    278.56 and doubling our pace:9982.5610

See also formatted instantiation: gString Instantiation.


Important Note on Formatting

Because THE PARAMETERS ARE POINTERS TO THEIR DATA, similar to the C/C++ library function 'sscanf' and friends, the compiler cannot apply the automatic promotions (short int to int, float to double, and so on) that it applies to the arguments of swprintf. A short int* is not an int*, and a float* is not a double*.

This implementation was selected because a) it eliminates data-width conflicts when moving among hardware platforms, and b) it reduces code size while increasing performance.

This implementation relies on you, the designer, to take care that the data type you specify in the formatting string matches the data type of the variable referenced by its parameter pointer, AND that you use the 'address-of' ('&') operator to reference non-pointer variables. Note also that 'literal' values may not be used as parameters because literals have no address.

The following constructs will produce errors:

gString gs ;
char   grade = 'A' ;
short  age   = 21 ;
int    sat   = 1550 ;
double gpa   = 3.75 ;

   // These examples fail to use the 'address-of' operator for the 
   // referenced variables, and will cause a 'segmentation fault' 
   // i.e. an application crash.
   gs.compose( "My grade is an %c", grade ) ;
   gs.compose( "I got a %d on my SAT.", sat ) ;
   // The above should be:
   gs.compose( "My grade is an %c", &grade ) ;
   gs.compose( "I got a %d on my SAT.", &sat ) ;

   // These examples use format specifications that do not match the 
   // type of the referenced variable. This will result in either bad 
   // data out OR a memory-access violation.
   gs.compose( "I can't wait to be %d.", &age ) ;
   gs.compose( "My GPA is %1.3f", &gpa ) ;
   gs.compose( "The hex value of %c is: %#x", &grade, &grade ) ;
   gs.compose( "My GPA is %1.3lf", 3.88 ) ; // (literal value)
   // The above should be:
   gs.compose( "I can't wait to be %hd.", &age ) ;
   gs.compose( "My GPA is %1.3lf", &gpa ) ;
   gs.compose( "The hex value of %c is: %#hhx", &grade, &grade ) ;
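Why the format specification must match the variable can be seen in a minimal sketch of how any pointer-based formatter has to fetch an integer argument. The helper below is hypothetical (it is not the gString implementation): the length modifier is the only information it has about the size of the pointed-to object. Given '%d' and '&age' (a short), the same logic would read sizeof(int) bytes from a sizeof(short) object, which is undefined behavior; in practice that means garbage output or a crash, as the examples above warn.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: fetch a signed integer through an untyped pointer,
// trusting the printf-style length modifier to describe the object's type.
long long readSignedArg ( const char* lengthMod, const void* arg )
{
   if ( std::strcmp( lengthMod, "hh" ) == 0 )
      return *static_cast<const signed char*>( arg ) ;  // %hhd
   if ( std::strcmp( lengthMod, "h" ) == 0 )
      return *static_cast<const short*>( arg ) ;        // %hd
   if ( std::strcmp( lengthMod, "l" ) == 0 )
      return *static_cast<const long*>( arg ) ;         // %ld
   if ( std::strcmp( lengthMod, "ll" ) == 0 )
      return *static_cast<const long long*>( arg ) ;    // %lld
   return *static_cast<const int*>( arg ) ;             // %d (no modifier)
}
```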

Parameter Type Checking:
Unfortunately, type-checking of wchar_t formatting strings is not yet supported by the GNU compiler (as of GCC v4.8.0); but see wchar.h, which is preparing for the future. Thus, use care when constructing your 'wchar_t fmt' formatting string. The 'char fmt' string IS type-checked.

IMPORTANT NOTE:
Depending on your compiler version, you may get a warning when using the '%b' binary format specification (described below):
"warning: unknown conversion type character ‘b’ in format [-Wformat=]"
This is because the compiler's format checking does not recognize our custom format specifier. If this happens, use a 'wchar_t' (wide) formatting template, which the compiler does not type-check.

Instead of:
      gs.compose( "bitmask: %b", &wk.mevent.eventType );
Use this (not type checked by the compiler):
      gs.compose( L"bitmask: %b", &wk.mevent.eventType );

Formatted binary output (extension to swprintf)

We implement an extension to the swprintf output-conversion-specifiers for binary formatted output. We have found this formatting option useful when working with bit masks, for verifying bit-shifting operations during encryption/decryption and other uses.

  • Base formatting specifier:
    %b , %B
    Note that the lower-case and upper-case variants have identical function; the case indicates only the case of the identifier character. See format modifiers for data size.
  • Format modifiers for data size are the same as for swprintf:
    hh , h , l , ll , L , q
    Examples: %hhb %hB %llB %qb
  • Format modifier for prepending a data-type indicator:
    '#' (hash character)
    This is the same principle as prepending a '0x' indicator to hex output, and will place either a 'b' or 'B' character at the beginning of the output.
    Examples: %#hhb -> b0101.1010
              %#hhB -> B0101.1010
  • Format modifier for appending a data-type indicator:
    '-#' (minus sign and hash character)
    Rather than prepending the indicator, the indicator will be appended to the end of the output.
    Examples: %-#hhb -> 0101.1010b
              %-#hhB -> 0101.1010B
  • Format modifier for specifying the group-separator character:
    By default, the bit groups are separated by a '.' (fullstop) character. To specify an alternate separator character:
       % hB    -> 0111 0101 1010 0001   (' ' (space) as separator)
       %_hB    -> 0111_0101_1010_0001   ('_' (underscore) as separator)
       %#/hB   -> B0111/0101/1010/0001  ('/' (slash) as separator)
       %-#-hB  -> 0111-0101-1010-0001B  ('-' (dash) as separator)
    Valid separator characters are printable ASCII characters that ARE NOT alphabetical, ARE NOT numbers, and ARE NOT a '.' (fullstop).
  • Format modifier for specifying bit grouping:
    By default, bits are formatted in groups of four (nybbles); however, if desired, bits can be formatted in groups of eight (bytes):
       %.8hB     -> 01110101.10100001
       %-.8hB    -> 01110101-10100001
       %# .8hB   -> B01110101 10100001
       %-#`.8hb  -> 01110101`10100001b
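The grouping and separator rules above can be sketched as a small standalone function. This is a hypothetical helper that reproduces the documented output layout (most-significant bit first, groups counted from the least-significant end); it is not the gString code, and it leaves the 'b'/'B' indicator to the caller.

```cpp
#include <cassert>
#include <string>

// Format the low 'bits' bits of 'value' as binary digits, most significant
// first, inserting 'sep' between groups of 'group' bits.
std::string toBinary ( unsigned long long value, int bits,
                       int group = 4, char sep = '.' )
{
   assert( bits > 0 && bits <= 64 && group > 0 ) ;
   std::string out ;
   for ( int i = bits - 1 ; i >= 0 ; --i )
   {
      out.push_back( ((value >> i) & 1ULL) ? '1' : '0' ) ;
      // a group boundary falls after every bit whose index is a multiple
      // of 'group', except after the final (least significant) bit
      if ( i > 0 && i % group == 0 )
         out.push_back( sep ) ;
   }
   return out ;
}
```

For example, toBinary( 0x5A, 8 ) yields "0101.1010", and prepending "b" gives the '%#hhb' layout shown above.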

Field-width specification (swprintf bug fix)

The standard library 'swprintf' function has a design flaw for format specifications that include a field-width specifier.

'swprintf' pads the string to the specified number of CHARACTERS, not the number of COLUMNS as it should. For ASCII numeric source values this is not a problem because one character equals one display column. For string source data, however, if the source string contains characters that require more than one display column each, then the output may be too wide.

Therefore, for string-source-formatting specifications ONLY:
          (examples: "%12s"  "%-6s"  "%16ls"  "%5S"  "%-24S")
we compensate for this ethnocentric behavior by interpreting the field-width specifier as number-of-columns, NOT number-of-characters. For non-ASCII string data, this will result in output that appears different (and better) than output created directly by the 'swprintf' function.
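A minimal sketch of column-based (rather than character-based) padding, using the POSIX wcwidth() function to measure each character. This illustrates the idea only; padToColumns is a hypothetical name, not gString's implementation. wcwidth() results depend on the LC_CTYPE locale, so a UTF-8 locale must be active for a double-width CJK character to be measured as two columns; characters the locale cannot classify are assumed here to occupy one column.

```cpp
#include <cassert>
#include <string>
#include <wchar.h>   // wcwidth() (POSIX/XSI; declared here on glibc)

// Right-justify 'text' in a field that is 'cols' display columns wide.
std::wstring padToColumns ( const std::wstring& text, int cols )
{
   int used = 0 ;
   for ( wchar_t ch : text )
   {
      int w = wcwidth( ch ) ;
      used += ( w >= 0 ) ? w : 1 ;   // unknown to this locale: assume 1 column
   }
   // text that already fills (or overflows) the field is returned unchanged
   return ( used < cols ) ? std::wstring( cols - used, L' ' ) + text : text ;
}
```

With a UTF-8 locale active, a two-character Japanese string occupying four columns receives two columns of padding in a six-column field, where swprintf's "%6ls" would add four spaces.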


Unsupported format specifications

Conversion modifiers that are not fully supported at this time:
                       'j', 'z', 't', '%['
Also, the '*' field-width or precision specification, which takes the width/precision value from the following argument, IS NOT supported.




Integer Formatting

  • bool formatInt ( short iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned short iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( int iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned int iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( long iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned long iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( long long iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  • bool formatInt ( unsigned long long iVal, short fWidth,
                     bool lJust = false, bool sign = false,
                     bool kibi = false, fiUnits units = fiK ) ;

  Input  :
     iVal  : value to be converted
             Supported value range: plus/minus 9.999999999999 terabytes
     fWidth: field width (number of display columns)
             range: 1 to FI_MAX_FIELDWIDTH
     lJust : (optional, false by default)
             if true, strip leading spaces to left-justify the value
             in the field. (resulting string may be less than fWidth)
     sign  : (optional, false by default)
             'false' : only negative values are signed
             'true'  : always prepend a '+' or '-' sign.
     kibi  : (optional, false by default)
             'false' : calculate as a decimal value (powers of 10)
                       kilobyte, megabyte, gigabyte, terabyte
             'true'  : calculate as a binary value (powers of 2)
                       kibibyte, mebibyte, gibibyte, tebibyte
     units  : (optional) member of enum fiUnits (fiK by default)
              specifies the format for the units suffix.
              Note that if the uncompressed value fits in the field,
              then this parameter is ignored.

  Returns:
     'true' if successful
     'false' if field overflow (field will be filled with '#' chars)
             See notes below on field overflow.

Convert an integer value into a formatted display string of the specified width. The value is right-justified in the field, with leading spaces added if necessary (but see the 'lJust' parameter).

Maximum field width is FI_MAX_FIELDWIDTH. This is wide enough to display an 18-column (13-digit), signed and comma-formatted value: '+9,876,543,210,777'
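The 18-column figure breaks down as 13 digits, 4 group separators and 1 sign. The hypothetical helper below performs the same count for any value, assuming a ',' after every third digit as in the U.S. English locale. It is a hand-check aid only; it does not reproduce formatInt's own decision about when to compress a value.

```cpp
#include <cassert>

// Columns needed to show 'value' uncompressed, with a ',' between each
// group of three digits and an optional forced sign.
int groupedWidth ( long long value, bool forceSign = false )
{
   // take the magnitude as unsigned so that LLONG_MIN cannot overflow
   unsigned long long mag = ( value < 0 )
         ? 0ULL - static_cast<unsigned long long>( value )
         : static_cast<unsigned long long>( value ) ;
   int digits = 1 ;
   while ( mag >= 10 )
   {
      mag /= 10 ;
      ++digits ;
   }
   int width = digits + ( digits - 1 ) / 3 ;   // one separator per full group
   if ( value < 0 || forceSign )
      ++width ;                                // column for '-' or '+'
   return width ;
}
```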

Actual formatting of the value depends on the combination of:
   a) magnitude of the value
   b) whether it is a signed value
   c) the specified field-width
   d) the specified suffix format
   e) locale-specific grouping of digits according to the LC_NUMERIC locale
      environment variable.
      Important Note: the 'C' (default) locale defines an empty string as the
      grouping separator character. Therefore, the locale should be explicitly
      set before calling this method. (This is done automatically when the
      NcDialog API is initialized.)
   f) See the notes below on the possible reasons for field overflow
      (formatInt field overflow).

The following examples are based on the U.S. English locale: 'en_US.utf8'.
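The effect of the locale on digit grouping can be demonstrated with standard C++ streams alone. In this sketch a hypothetical ThousandsComma facet supplies en_US-style grouping, so the example does not depend on which locales are installed; the classic "C" locale, by contrast, defines no grouping at all, which is why the locale must be set before numeric formatting.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// numpunct facet that groups digits in threes with ',' (as en_US does).
struct ThousandsComma : std::numpunct<char>
{
protected:
   char do_thousands_sep ( ) const override { return ',' ; }
   std::string do_grouping ( ) const override { return "\3" ; }
};

// Format 'value' using the digit grouping of locale 'loc'.
std::string grouped ( long long value, const std::locale& loc )
{
   std::ostringstream oss ;
   oss.imbue( loc ) ;
   oss << value ;
   return oss.str() ;
}
```

A locale built as 'std::locale( std::locale::classic(), new ThousandsComma )' takes ownership of the facet, so no explicit delete is required.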

Examples

1) Simple comma formatted output if specified field-width is sufficient.
   345    654,345    782,654,345    4,294,967,295

2) Output with values compressed to fit a specified field width.
   12.3K    999K    12.345M    1.234G    4.3G

   
gString gs ;      // gString object

3) Convert a signed integer value:
   int iValue = 28954 ;

   // field width == 8, right justified (note: compression unnecessary)
   gs.formatInt( iValue, 8 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :  28,954:

   // field width == 8, left justified (note: compression unnecessary)
   gs.formatInt( iValue, 8, true ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :28,954:

   // field width == 6
   gs.formatInt( iValue, 6 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :28.95K:

   // field width == 6 with forced sign
   gs.formatInt( iValue, 6, false, true ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :+29.0K:

   // field width == 5
   gs.formatInt( iValue, 5 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :29.0K:

   iValue = -28954 ;    // convert negative source value

   // field width == 8, right justified (note: compression unnecessary)
   gs.formatInt( iValue, 8 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  : -28,954:

   // field width == 8, left justified (note: compression unnecessary)
   gs.formatInt( iValue, 8, true ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :-28,954:

   // field width == 6
   gs.formatInt( iValue, 6 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :-29.0K:

   // field width == 5
   gs.formatInt( iValue, 5 ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  : -29K:

4) Convert an unsigned long long integer value (field width == 11):
   unsigned long long int qValue = 39000009995 ;

   // decimal compression (gigabytes) with "official" IEC suffix
   gs.formatInt( qValue, 11, false, false, false, fikB ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :39.000010gB:

   // binary compression (gibibytes) with "official" IEC suffix
   gs.formatInt( qValue, 11, false, false, true, fiKiB ) ;
   wcout << ':' << gs << ':' << endl ;
    - - >  :38.08595GiB:

Please see the NcDialog test application, 'Dialogw', for more examples.

Notes on the formatInt group

Optional specification of the units suffix for the 'formatInt' methods.
Units are specified using the optional 'units' parameter, which is a member of the 'fiUnits' enumerated type:

   enum fiUnits : short
   {
      fiK,     // 'K'   'M'   'G'   'T'   (default)
      fik,     // 'k'   'm'   'g'   't'
      fiKb,    // 'Kb'  'Mb'  'Gb'  'Tb'
      fikB,    // 'kB'  'mB'  'gB'  'tB'  ("official" metric 'kilo' designation)
      fiKiB,   // 'KiB' 'MiB' 'GiB' 'TiB' ("official" binary 'kibi' designation)
   } ;

The 'formatInt' methods use decimal (powers of 10) compression calculations by default. To use binary (powers of 2) compression, use the optional 'kibi' parameter.

   DECIMAL                          BINARY
   kilobytes (x/1000)               kibibytes (x/1024)
   megabytes (x/1000000)            mebibytes (x/1024000)
   gigabytes (x/1000000000)         gibibytes (x/1024000000)
   terabytes (x/1000000000000)      tebibytes (x/1024000000000)
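A minimal sketch of the two scaling schemes, assuming exact IEC divisors: powers of 1000 for decimal units and powers of 1024 for binary units. Note that the binary column of the table above lists divisors such as x/1024000 rather than 1024 × 1024 = 1048576, and the earlier '38.08595GiB' example output is consistent with those listed divisors; with exact powers of 1024, the same value (39000009995) scales to about 36.32 GiB. Scaled and scaleBytes are hypothetical names, and the unit strings are the SI/IEC spellings rather than the fiUnits suffixes.

```cpp
#include <cassert>
#include <cmath>
#include <string>

struct Scaled { double value ; std::string unit ; } ;

// Scale a byte count down to the largest unit that keeps it >= 1.
//   kibi == false : divide by 1000 per step (kB, MB, GB, TB)
//   kibi == true  : divide by 1024 per step (KiB, MiB, GiB, TiB)
Scaled scaleBytes ( unsigned long long bytes, bool kibi )
{
   static const char* const decUnits[] = { "B", "kB",  "MB",  "GB",  "TB"  } ;
   static const char* const binUnits[] = { "B", "KiB", "MiB", "GiB", "TiB" } ;
   const double base = kibi ? 1024.0 : 1000.0 ;
   double v = static_cast<double>( bytes ) ;
   int idx = 0 ;
   while ( v >= base && idx < 4 )   // stop at terabytes / tebibytes
   {
      v /= base ;
      ++idx ;
   }
   return { v, kibi ? binUnits[idx] : decUnits[idx] } ;
}
```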

The kilo/kibi controversy

The IEC (International Electrotechnical Commission) recommends lower-case for metric (powers of 10) and upper-case for binary (powers of 2). However, unless you must present the data in strict accordance with the IEC standard, it is recommended that you choose the format according to your space requirements, visual appeal, and clear communication with your users.
If you blindly follow style standards against your own better judgement, then you will be forever labelled a weenie.

formatInt field overflow

As described above, the actual formatting of a fixed-width integer field depends on a number of factors. Every effort is made to compress the data to fit within the field while retaining an accurate representation of the numeric value.

There are cases, however, where it is not possible to represent the data within the specified field width. When this occurs, the entire field will be filled with HASH characters '#'.

The specified field must be wide enough to accommodate either the entire, uncompressed value, or the combination of compressed value, units designator and sign (if any). The following situations may cause field overflow.

a) Values <= -10.0 Tbytes or >= 10.0 Tbytes cannot be represented by the 'formatInt' methods.
b) One-column fields can display values between 0 and 9. Values outside this range will cause overflow.
c) Two-column fields can display values between -9 and 99. Values outside this range will cause overflow.
d) Three-column fields can display compressed data only if the combined width of value, sign and units requires no more than three(3) columns.
e) Four-column fields can display compressed data only if the combined width of value, sign and units requires no more than four(4) columns.
f) Five-column fields can accurately display any value IF the units designator requires only one(1) column.
g) Six-column fields can accurately display any value IF the units designator requires no more than two(2) columns.

Fields of seven(7) or more columns can display any formatted value without danger of overflow.



How To Set the Application Locale In C++

The NcDialogAPI library automatically sets the application locale according to the console window environment. Please see the NcDialog documentation, chapter: Multi-language Support for details.

In brief, the “locale” specifies the upper-/lower-case text rules, numeric formatting, language-specific punctuation, currency symbols and so on.

The application locale should be taken from the terminal environment if possible. This is done by creating an instance of the "std::locale" class, passing the empty string ("") as the locale name.
   locale* locptr = new locale("");
The captured locale is then made “global”; that is, it replaces the so-called "classic" (C/C++ language) locale with the specified locale definition.
   locptr->global( *locptr );
Please see the C++ documentation for "std::locale" for details.
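Because std::locale::global() is a static member function, the same steps can also be written without 'new', which avoids leaving a heap-allocated locale object behind. The sketch below is standard C++, not an NcDialog API call; adoptEnvironmentLocale is a hypothetical name. It also re-imbues std::wcout, because a stream keeps the locale it was constructed with, and it falls back to the "C" locale if the environment names a locale that is not installed (in which case the std::locale constructor throws std::runtime_error).

```cpp
#include <cassert>
#include <iostream>
#include <locale>
#include <stdexcept>

// Adopt the terminal's locale (from LANG / LC_* environment variables) as
// the program-wide default. Returns false if that locale is unavailable.
bool adoptEnvironmentLocale ( )
{
   try
   {
      std::locale envLocale( "" ) ;       // "" == the environment's locale
      std::locale::global( envLocale ) ;  // also updates the C library locale
                                          // when the locale has a name
      std::wcout.imbue( envLocale ) ;     // existing streams must be re-imbued
      return true ;
   }
   catch ( const std::runtime_error& )
   {
      return false ;                      // keep the current ("C") locale
   }
}
```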




Data Access

  • const wchar_t* gstr ( void ) const ;
      Input  :
         none
      Returns:
         const pointer to array of wchar_t characters
    

    Return a const pointer to the wchar_t (wide) character array.


  • const wchar_t* gstr ( short& charCount ) const ;
      Input  :
         charCount : (by reference, initial value ignored)
                     receives number of characters in array, 
                     including null terminator
      Returns:
         const pointer to array of wchar_t characters
    

    Return a const pointer to the wchar_t (wide) character array, along with the number of characters in the array (including the null terminator).


  • const char* ustr ( void ) const ;
      Input  :
         none
      Returns:
         const pointer to array of UTF-8 characters
    

    Return a const pointer to the char (UTF-8) character array.


  • const char* ustr ( short& charCount, short& byteCount ) const ;
      Input  :
         charCount : (by reference, initial value ignored)
                     receives number of characters in array, 
                     including null terminator
         byteCount : (by reference, initial value ignored)
                     receives number of bytes in array, 
                     including null terminator
      Returns:
         const pointer to array of UTF-8 characters
    

    Return a const pointer to the char (UTF-8) character array, along with the number of characters and the number of bytes in the array (including the null terminator).


Examples

short charCount, byteCount ;
gString gs( "Wherever you go, there you are!" ) ;

const wchar_t* wPtr = gs.gstr() ;
const wchar_t* wPtr2 = gs.gstr( charCount ) ;
const char* utf8Ptr = gs.ustr() ;
const char* utf8Ptr2 = gs.ustr( charCount, byteCount ) ;
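The character count and byte count reported by ustr( charCount, byteCount ) differ as soon as the text contains non-ASCII characters, because UTF-8 uses up to four bytes per character. A minimal sketch of how both counts can be derived from UTF-8 data, assuming valid input: continuation bytes (binary 10xxxxxx) never begin a character. utf8Counts is a hypothetical helper, and, like the methods above, it includes the null terminator in both counts.

```cpp
#include <cassert>

// Count characters (code points) and bytes in a null-terminated UTF-8
// string. Both counts include the null terminator.
void utf8Counts ( const char* s, int& charCount, int& byteCount )
{
   charCount = byteCount = 0 ;
   for ( const unsigned char* p = reinterpret_cast<const unsigned char*>( s ) ; ; ++p )
   {
      ++byteCount ;
      if ( (*p & 0xC0) != 0x80 )   // ASCII or lead byte: starts a character
         ++charCount ;
      if ( *p == '\0' )            // terminator has been counted; stop
         break ;
   }
}
```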



Copying Data

  • short copy ( char* uTarget, short maxBytes,
    short maxCols=(gsMAXCHARS*2) ) const ;
      Input  :
         uTarget  : pointer to target array to receive UTF-8-encoded text
         maxBytes : maximum number of bytes to copy (incl. NULL terminator)
         maxCols  : (optional, default == gsMAXCHARS*2)
                    maximum number of display-columns to copy
      Returns:
         number of bytes copied (incl. NULL terminator)
    

    Copy gString text to specified target buffer.


  • short copy ( wchar_t* wTarget, short maxChars,
    short maxCols=(gsMAXCHARS*2) ) const ;
      Input  :
         wTarget  : pointer to target array to receive wchar_t 'wide' text
         maxChars : maximum number of characters to copy (incl. NULL)
         maxCols  : (optional, default == gsMAXCHARS*2)
                    maximum number of display-columns to copy
      Returns:
         number of characters copied (incl. NULL terminator)
    

    Copy gString text to specified target buffer.


  • std::wostream& operator<< ( std::wostream& os, const gString& gs2 );
      Input  :
         IMPLIED reference to the output stream
         IMPLIED reference to the gString object
      Returns: reference to the specified output stream
    

    !! NON-MEMBER METHOD !!
    Insertion operator: Copies the contents of the gString object into the 'wcout' (wide) standard output stream.

    Note that due to the way the output stream is defined, you cannot mix 'cout' (narrow) and 'wcout' (wide) output streams indiscriminately.
      -- If 'wcout' is called first, then 'cout' is disabled.
      -- If 'cout' is called first, then both narrow and wide channels are active.
      -- 'wcout' handles both narrow and wide data; however, 'cout' handles ONLY narrow data.
    This is not related to gString, but is a characteristic of the default C++ output stream itself. We recommend that you always use the 'wcout' stream in console applications for both narrow and wide text data.

  • std::ostream& operator<< ( std::ostream& os, const gString& gs2 );
      Input  :
         IMPLIED reference to the output stream
         IMPLIED reference to the gString object
      Returns: reference to the specified output stream
    

    !! NON-MEMBER METHOD !!
    Insertion operator: Copies the contents of the gString object into the 'cout' (narrow) standard output stream.

    IMPORTANT NOTE: Access to the narrow output stream is provided for convenience only. It is recommended that the wide stream version (above), if available on your system, be used exclusively.

  • short substr ( char* uTarget, short offset, short charCnt ) const ;
  • short substr ( wchar_t* wTarget, short offset, short charCnt ) const ;
  • short substr ( gString& wTarget, short offset, short charCnt ) const ;
      Input  :
         targ    : (by reference, initial contents ignored)
                   receives null-terminated contents of specified
                   character range
                   -- If target buffer is a char*, then data returned is 
                      a UTF-8 text string.
                   -- If target buffer is a wchar_t*, then data returned is 
                      a wchar_t (wide) text string.
                   IMPORTANT NOTE: It is the caller's responsibility to 
                   specify a target buffer large enough to hold the data.
                   Recommended: wchar_t wbuff[gsMAXCHARS]; or 
                                char ubuff[gsMAXBYTES];
                   -- If target buffer is a gString object, then both 
                      UTF-8 and wchar_t data are returned 
                      (with no chance of target buffer overflow).
    
         offset  : character index at which substring begins
                   (this IS NOT a byte index)
         charCnt : number of characters to copy (not incl. NULL terminator)
      Returns:
         if target is a wchar_t* or gString object, then returns number of
           characters written to target (not including the NULL terminator)
         if target is a char*, then returns number of bytes written to
           target (not including the NULL terminator)
    
         Note: returns ZERO if either 'offset' or 'charCnt' out of range
         Note: If 'charCnt' extends beyond the end of the source data, 
               then returns the available data.
    

    Copy the specified character range to target buffer.
    These methods copy the indicated substring (null terminated) to the target buffer, leaving the original data unchanged.

    If you have a fixed-format field, then the offset and character count will be known in advance. Otherwise you can use the 'find()' method to locate the substring to be copied.

    Please Note: The number of bytes can NEVER be assumed to be the same as the number of characters.
    Please refer to the 'Multi-language Support' chapter of the 'ncdialogapi' documentation.
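Because substr() takes a CHARACTER offset, extracting the same range directly from UTF-8 data first requires translating character positions into byte positions. The sketch below shows that translation under the assumption of valid UTF-8 input; byteOffset and utf8Substr are hypothetical helpers, not gString methods. As with substr(), a count that runs past the end of the data returns whatever characters are available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte index at which character number 'charIndex' begins in UTF-8 text
// 's', or s.size() if the text has fewer characters than that.
std::size_t byteOffset ( const std::string& s, std::size_t charIndex )
{
   std::size_t i = 0 ;
   for ( std::size_t seen = 0 ; i < s.size() ; ++i )
   {
      // continuation bytes (10xxxxxx) never start a character
      if ( (static_cast<unsigned char>( s[i] ) & 0xC0) != 0x80 )
      {
         if ( seen == charIndex )
            return i ;
         ++seen ;
      }
   }
   return i ;
}

// Copy 'charCnt' characters beginning at character 'offset'.
std::string utf8Substr ( const std::string& s, std::size_t offset, std::size_t charCnt )
{
   const std::size_t begin = byteOffset( s, offset ) ;
   const std::size_t end   = byteOffset( s, offset + charCnt ) ;
   return s.substr( begin, end - begin ) ;
}
```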


Examples

gString gs( "That's not flying, that's falling -- with style!\n"
            "Buzz Lightyear" ) ;
char utf8Data[gsMAXBYTES] ;
wchar_t wideData[gsMAXCHARS] ;

gs.copy( utf8Data, gs.utfbytes() ) ;
gs.copy( wideData, gs.gschars() ) ;

gString gstream( "You're a child's TOY! -- Woody" ) ;
wcout << gstream << endl ;

// get a copy of the first word starting with 'c'
gString AusAnimal( "Aardvark Kangaroo Cockatoo Dingo Wombat " ) ;
gString gsc ;
short b = AusAnimal.find( " c" ) ;
if ( b >= 0 )
{
   short e = AusAnimal.find( L' ', b + 1 ) ;
   if ( e > b )
   {
      AusAnimal.substr( gsc, (b + 1), (e - b - 1) ) ;
      wcout << gsc << endl ;
   }
}
 - - -> Cockatoo



Modifying Existing Data

  • short append ( const wchar_t* wPtr ) ;
  • short append ( const char* uPtr ) ;
  • short append ( const wchar_t wChar ) ;
      Input  :
         wPtr  : pointer to array of wchar_t 'wide' text to be appended
                 OR
         uPtr  : pointer to array of char UTF-8 text to be appended
                 OR
         wChar : a single, 'wide' character
      Returns:
         number of characters in resulting string (incl. NULL terminator)
         Note: if value returned equals gsMAXCHARS, then 
               some data MAY HAVE BEEN discarded.
    

    Append text to existing gString text data up to a combined length of gsMAXCHARS. Characters in excess of the maximum will not be appended.
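    The capping rule can be sketched with std::wstring. This is an illustration under stated assumptions, not the gString implementation. kMaxChars is a hypothetical stand-in for gsMAXCHARS, and the value 16 was chosen only to make truncation easy to see. cappedAppend is also a hypothetical name. As documented for 'append', the return value counts the NULL terminator, so a result equal to the capacity signals that data may have been discarded.

```cpp
#include <cstddef>
#include <string>

// Hypothetical capacity, standing in for gsMAXCHARS (value assumed here).
constexpr std::size_t kMaxChars = 16;

// Append 'tail' to 'text', keeping at most kMaxChars - 1 characters so that
// room remains for a NULL terminator; excess characters are discarded.
// Returns the resulting character count INCLUDING the terminator.
short cappedAppend(std::wstring& text, const std::wstring& tail)
{
   const std::size_t room = (text.size() < kMaxChars - 1)
                          ? (kMaxChars - 1 - text.size()) : 0;
   text.append(tail, 0, room);   // copies min(room, tail.size()) characters
   return static_cast<short>(text.size() + 1);
}
```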

    Example

    gString gs( L"Be kind to your manager." ) ;
    gs.limitChars( gs.gschars() - 2 ) ;
    gs.append( L", and other lower forms of life." ) ;
    wcout << gs << endl ;
     - - -> Be kind to your manager, and other lower forms of life.
    


  • short append ( const wchar_t* fmt, const void* arg1, ... )
                   __attribute__ ((format (gnu_wprintf, 2, 0)));

  • short append ( const char* fmt, const void* arg1, ... )
                   __attribute__ ((format (gnu_printf, 2, 0)));
      Input  :
         fmt  : a format specification string in the style of sprintf(),
                swprintf() and related formatting C/C++ functions.
         arg1 : pointer to first value to be converted by 'fmt'
         ...  : optional arguments (between ZERO and gsfMAXARGS - 1)
                 Each optional argument is a POINTER to (the address of)
                 the value to be formatted.
    
      Returns:
         number of characters in resulting string (incl. NULL terminator)
         Note: if return equals gsMAXCHARS, then
               some data MAY HAVE BEEN discarded.
    

    Append formatted text data to existing gString text data up to a combined length of gsMAXCHARS. Characters in excess of the maximum will not be appended.

    Please refer to the ’compose’ method (see Formatted Assignments) for more information on converting data using a format specification string.

    Example

    short gaddress = 2840 ;
    wchar_t gdirection = L'E' ;
    const wchar_t* gstreet = L"Colorado Blvd." ;
    double gcost = 29.95 ;
    gString gs( "Gorilla Men's Clothing" ) ; // existing text
    
    gs.append( ", %hd %C %S\n  Dress shirts on sale, $%.2lf.", 
               &gaddress, &gdirection, gstreet, &gcost ) ;
    
    wcout << gs << endl ;
     - - -> Gorilla Men's Clothing, 2840 E Colorado Blvd.
              Dress shirts on sale, $29.95.
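    For comparison, the same line can be produced with the standard swprintf() function. The key difference is that swprintf() receives the values themselves, while the documented 'append' method receives pointers to the values. This sketch uses the portable %lc and %ls conversions for a wide character and a wide string. formatAddress is a hypothetical helper name.

```cpp
#include <cwchar>
#include <string>

// Format the address line with the standard library. Note that the
// arguments are passed BY VALUE, unlike the pointer-based gString call.
std::wstring formatAddress(short number, wchar_t direction,
                           const wchar_t* street, double cost)
{
   wchar_t buf[128];
   const int n = std::swprintf(buf, sizeof(buf) / sizeof(buf[0]),
                               L", %hd %lc %ls\n  Dress shirts on sale, $%.2lf.",
                               number, static_cast<std::wint_t>(direction),
                               street, cost);
   return (n < 0) ? std::wstring() : std::wstring(buf, n);  // n < 0: error
}
```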
    


  • short insert ( const wchar_t* wPtr, short offset = 0 ) ;
  • short insert ( const char* uPtr, short offset = 0 ) ;
  • short insert ( wchar_t wChar, short offset = 0 ) ;
      Input  : 
         wPtr  : pointer to array of wchar_t 'wide' text to be inserted
                 OR
         uPtr  : pointer to array of char UTF-8 text to be inserted
                 OR
         wChar : a single wchar_t 'wide' character
         offset: (optional, ZERO by default)
                 character offset at which to insert specified text into
                 existing text.
                 Note: if specified 'offset' > number of characters in
                       existing text, then acts like 'append' method.
      Returns:
         number of characters in resulting string (incl. NULL terminator)
         Note: if value returned equals gsMAXCHARS, then 
               some data MAY HAVE BEEN discarded.
    

    Insert text into existing gString text data up to a combined length of gsMAXCHARS. Characters in excess of the maximum will be truncated.

    Example

    gString gs( L"Remember to hurt people!" ) ;
    gs.insert( L"NOT ", 9 ) ;
    wcout << gs << endl ;
     - - -> Remember NOT to hurt people!
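    A rough std::wstring equivalent of the documented offset behavior follows. insertAt is a hypothetical name, and capacity limits are ignored. Clamping a negative offset to zero is an assumption of this sketch, because the documentation does not describe negative offsets.

```cpp
#include <cstddef>
#include <string>

// Insert 'text' at character 'offset'. As documented for 'insert', an offset
// beyond the end of the existing text behaves like an append.
// Returns the character count INCLUDING a NULL terminator.
short insertAt(std::wstring& target, const std::wstring& text, short offset = 0)
{
   std::size_t pos = (offset < 0) ? 0 : static_cast<std::size_t>(offset);
   if (pos > target.size())
      pos = target.size();                 // clamp: acts like append
   target.insert(pos, text);
   return static_cast<short>(target.size() + 1);
}
```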
    


  • short limitChars ( short charCount ) ;
      Input  :
         charCount : maximum number of characters allowed in formatted data 
                     (not including NULL) Range: 1 to gsMAXCHARS-1
      Returns:
         number of characters in the adjusted data (including NULL)
    

    Truncate the data to no more than charCount characters.
    Insert a null terminator after the specified number of characters.

    Example

    gString gs( "This shirt is available in yellow or red." ) ;
    gs.limitChars( 34 ) ;
    gs.append( "only." ) ;
    wcout << gs << endl ;
     - - -> This shirt is available in yellow only.
    


  • short limitCols ( short colCount ) ;
      Input  :
         colCount : maximum number of display columns allowed in formatted data
                    Range: 1 to (gsMAXCHARS * 2)
      Returns:
         number of columns needed to display the adjusted data
         Note: If specified column count occurs in mid-character, then the 
               partial character will be removed from the string.
    

    Truncate the data to no more than colCount display columns. Insert a null terminator after the number of characters required to fill the specified number of display columns.

    Example

    gString gs( "The manual is located at:\n"
                "http://cdn.funcom.com/aoc/pdf/aoc_manual.pdf" ) ;
    gs.limitCols( 55 ) ;
    wcout << gs << endl ;
     - - -> The manual is located at:
            http://cdn.funcom.com/aoc/pdf/
    
    Note that there are 25 display columns for the first line, (the newline 
    character requires no column), and 30 columns remain on the second line.
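    The column-limiting logic can be sketched as follows. Because wcwidth() depends on the active locale, this standalone version uses approxCols instead. approxCols is a hypothetical, deliberately simplified width table that covers the common double-width ranges. It is an approximation, not the width calculation gString performs. limitColsSketch is also a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Simplified display-width estimate (hypothetical helper for this sketch).
// Real code would normally call wcwidth() under a UTF-8 locale.
int approxCols(wchar_t ch)
{
   if (ch < 0x20)
      return 0;                              // newline, tab, other controls
   if ((ch >= 0x1100 && ch <= 0x115F) ||     // Hangul Jamo
       (ch >= 0x2E80 && ch <= 0xA4CF) ||     // CJK, Kana, Yi
       (ch >= 0xAC00 && ch <= 0xD7A3) ||     // Hangul syllables
       (ch >= 0xF900 && ch <= 0xFAFF) ||     // CJK compatibility ideographs
       (ch >= 0xFF00 && ch <= 0xFF60) ||     // fullwidth forms
       (ch >= 0xFFE0 && ch <= 0xFFE6))       // fullwidth signs
      return 2;
   return 1;
}

// Truncate 'text' to at most 'maxCols' display columns. A two-column
// character that would straddle the limit is removed entirely.
// Returns the number of columns kept.
int limitColsSketch(std::wstring& text, int maxCols)
{
   int used = 0;
   std::size_t i = 0;
   for (; i < text.size(); ++i)
   {
      const int w = approxCols(text[i]);
      if (used + w > maxCols)
         break;                              // next character does not fit
      used += w;
   }
   text.erase(i);                            // drop everything from 'i' on
   return used;
}
```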
    


  • short shiftChars ( short shiftCount, wchar_t padChar = L' ' ) ;
      Input  :
         shiftCount: < ZERO: shift data to the left, discarding the
                             specified number of characters from the
                             beginning of the array
                     > ZERO: shift data to the right, padding the vacated
                             positions on the left with 'padChar'
                     ==ZERO: do nothing
         padChar   : (optional, SPACE character, 0x20 by default)
                     when shifting data to the right, use this character
                     to fill the vacated character positions
                     NOTE: Specify a one-column character ONLY as the
                     padding character. (multi-column characters ignored)
      Returns:
         number of characters in adjusted array
    

    Shift text data by the specified number of characters.

    Note for writers of RTL (right-to-left) languages:
     In the above descriptions, the terms 'left' and 'right' are used for 
     convenience, but actually 'left' refers to the head of the data 
     and 'right' refers to the tail.
    

    Example

    gString gs( "Your balance is: 2,521,697.56 USD" ) ;
    gs.shiftChars( -17 ) ;
    wcout << gs << endl ;
     - - -> 2,521,697.56 USD
    gs.shiftChars( 5, L'#' ) ;
    wcout << gs << endl ;
     - - -> #####2,521,697.56 USD
    
    Note: For this example, the optional fill character used for 
    right-shift is L'#'. The default fill character is space (L' ').
    


  • short shiftCols ( short shiftCount, wchar_t padChar = L' ' ) ;
      Input  :
         shiftCount: < ZERO: shift data to the left, discarding the
                             number of characters equivalent to the
                             specified number of display columns
                             NOTE: May discard one extra column if count
                             falls within a multi-column character.
                     > ZERO: shift data to the right, padding the vacated
                             positions on the left with 'padChar'
                     ==ZERO: do nothing
         padChar   : (optional, SPACE character, U+0020 by default)
                     when shifting data to the right, use this character
                     to fill the vacated column positions
                     NOTE: Specify a one-column character ONLY as the
                     padding character. (multi-column characters ignored)
      Returns:
         number of display columns in adjusted array
    

    Shift text data by the specified number of display columns.

    Note for writers of RTL (right-to-left) languages:
     In the above descriptions, the terms 'left' and 'right' are used for 
     convenience, but actually 'left' refers to the head of the data 
     and 'right' refers to the tail.
    

    Example

    gString gs( "您的帐户余额是500元" ) ; // "Your account balance is 500 yuan"
    gs.shiftCols( -14 ) ;
    wcout << gs << endl ;
     - - -> 500元
    gs.shiftCols( 5, L'.' ) ;
    wcout << gs << endl ;
     - - -> .....500元
    
    Note: Most Chinese characters are two display columns wide, 
    therefore we shift 14 columns (7 characters) out on the left.
    For this example, the optional fill character used for 
    right-shift is L'.' (U+002E, ASCII full stop). 
    The Chinese full-stop (U+3002) MAY NOT be used as a fill 
    character because it is a two-column character.
    


  • short padCols ( short fieldWidth, wchar_t padChar = L' ',
                    bool centered = false );
      Input  :
         fieldWidth: number of columns for the adjusted data array
         padChar   : (optional, ASCII space 0x20 by default)
                     character with which to pad the data
                     Note: Specify a one-column character ONLY.
                           Multi-column characters will be ignored.
         centered  : (optional, 'false' by default)
                     if 'false', all padding will be appended at the end 
                                 of the existing data
                     if 'true',  padding will be equally divided between 
                                 the beginning and end of the existing data
    
      Returns:
         number of display columns in the adjusted array
    
    

    Append padding to the existing string data to achieve the specified number of display columns.

    If the 'centered' flag is set, the padding will be divided equally between the beginning and end of the data to achieve the specified field width. Note that if an odd number of columns is to be added, then the odd column will be placed at the end of the data.

    The ‘padCols’ method calculates the number of padding characters needed to fill out the specified field width.

    The default padding character is the ASCII space character (20 hex).
    Any single-column character may be specified as an alternate padding character.

    Padding will be added until either the specified number of columns is reached, OR until the array contains the maximum number of characters (gsMAXCHARS).

    If the fieldWidth specified is <= the current width of the data, then the data will not be modified.
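    The following sketch shows the padding arithmetic, including the rule that an odd leftover column goes at the end. padColsSketch is a hypothetical name. For brevity it ASSUMES that every character of the text occupies one column, whereas gString itself measures display columns.

```cpp
#include <cstddef>
#include <string>

// Pad 'text' to 'fieldWidth' columns (one column per character assumed).
// When centering, an odd leftover column is placed at the end.
// Returns the number of columns in the adjusted data.
int padColsSketch(std::wstring& text, int fieldWidth,
                  wchar_t padChar = L' ', bool centered = false)
{
   const int cols = static_cast<int>(text.size());
   if (fieldWidth <= cols)
      return cols;                         // already wide enough: unchanged
   const int total = fieldWidth - cols;
   const int lead  = centered ? total / 2 : 0;  // integer division rounds down
   const int trail = total - lead;              // odd column lands at the end
   text.insert(0, static_cast<std::size_t>(lead), padChar);
   text.append(static_cast<std::size_t>(trail), padChar);
   return fieldWidth;
}
```

    For "This is a test." (15 columns) and a field width of 30, 15 columns are added: 7 before the text and 8 after it.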

    Example

    gString gs( "This is a test." );
    // Each result below assumes that 'gs' holds the original text.
    
    gs.padCols( 30 );              ==> "This is a test.               "
    gs.padCols( 30, L'#' );        ==> "This is a test.###############"
    gs.padCols( 30, L'#', true );  ==> "#######This is a test.########"
    
    // To create a right-justified field, use 'shiftCols' (see above):
    gs.shiftCols( (30 - gs.gscols()) );
                                   ==> "               This is a test."
    gs.shiftCols( (30 - gs.gscols()), L'#' );
                                   ==> "###############This is a test."
    


  • short strip ( bool leading = true, bool trailing = true );
      Input  :
         leading  : (optional, 'true' by default)
                    if 'true', strip leading whitespace
                    if 'false', leading whitespace unchanged
         trailing : (optional, 'true' by default)
                    if 'true', strip trailing whitespace
                    if 'false', trailing whitespace unchanged
    
      Returns:
         number of characters in modified string (incl. NULL terminator)
    

    Strip (remove) leading and/or trailing whitespace from the string data.

    Whitespace characters are defined as:
    0x0020 single-column ASCII space
    0x3000 two-column CJK space
    0x0A linefeed character
    0x0D carriage-return character
    0x09 horizontal-tab character
    0x0B vertical-tab character
    0x0C formfeed character

    “Leading” whitespace consists of the whitespace characters from the beginning of the data up to the first non-whitespace character. “Trailing” whitespace consists of the whitespace characters after the last non-whitespace character through the end of the data.

    By default, both leading and trailing whitespace will be removed; however, this action may be modified by resetting the appropriate parameter to ‘false’.
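    The following standalone sketch implements the stripping rules using the whitespace set listed above. stripSketch and isStripSpace are hypothetical names. The return value counts the NULL terminator to mirror the documented return value.

```cpp
#include <cstddef>
#include <string>

// Whitespace set as listed above: ASCII space, CJK space (U+3000),
// LF, CR, HT, VT, FF.
bool isStripSpace(wchar_t ch)
{
   switch (ch)
   {
      case 0x0020: case 0x3000: case 0x000A: case 0x000D:
      case 0x0009: case 0x000B: case 0x000C:
         return true;
      default:
         return false;
   }
}

// Remove leading and/or trailing whitespace.
// Returns the character count INCLUDING a NULL terminator.
short stripSketch(std::wstring& text, bool leading = true, bool trailing = true)
{
   if (trailing)
   {
      std::size_t end = text.size();
      while (end > 0 && isStripSpace(text[end - 1]))
         --end;
      text.erase(end);                     // drop the trailing run
   }
   if (leading)
   {
      std::size_t begin = 0;
      while (begin < text.size() && isStripSpace(text[begin]))
         ++begin;
      text.erase(0, begin);                // drop the leading run
   }
   return static_cast<short>(text.size() + 1);
}
```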



  • short erase ( const gString& src, short offset = 0, bool casesen = false, bool all = false );
  • short erase ( const wchar_t* src, short offset = 0, bool casesen = false, bool all = false );
  • short erase ( const char* src, short offset = 0, bool casesen = false, bool all = false );
  • short erase ( const wchar_t& src, short offset = 0, bool casesen = false, bool all = false );
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string (max length==gsMAXBYTES)
                  -- pointer to a wchar_t string (max length==gsMAXCHARS)
                  -- a gString object containing the source (by reference)
                  -- a single, wchar_t character
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  NOTE: This is a character index, NOT a byte offset.
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
                  The way upper/lowercase are related is locale dependent.
         all    : (optional, 'false' by default)
                  if 'false', then only the first instance of the substring
                              will be deleted
                  if 'true',  then all instances of the specified substring
                              from 'offset' forward will be deleted
    
      Returns:
         index of first character following the deleted sequence
         Note: This is the wchar_t character index, NOT a byte index
         Returns (-1) if:
           a) no matching substring found
           b) 'offset' out-of-range
           c) 'src' is an empty string or a NULL character
    
    

    Scan the data for a matching substring, and if found, erase (delete) the first occurrence of the substring, or optionally all instances of the substring from 'offset' onward.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison. These comparisons ARE locale dependent.
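    The erase-and-continue logic can be sketched with std::wstring::find(). eraseMatch is a hypothetical name. The sketch is case-SENSITIVE only and does not reproduce the locale-dependent, case-insensitive comparison (wcsncasecmp) described above.

```cpp
#include <cstddef>
#include <string>

// Erase the first match of 'pattern' at or after 'offset' (or every match
// when 'all' is true). Returns the index of the character that follows the
// last deleted sequence, or -1 if there is no match, 'offset' is out of
// range, or 'pattern' is empty (data unchanged in those cases).
short eraseMatch(std::wstring& text, const std::wstring& pattern,
                 short offset = 0, bool all = false)
{
   if (pattern.empty() || offset < 0 ||
       static_cast<std::size_t>(offset) >= text.size())
      return -1;
   std::size_t pos = text.find(pattern, static_cast<std::size_t>(offset));
   if (pos == std::wstring::npos)
      return -1;                           // no match
   std::size_t next = pos;
   do
   {
      text.erase(pos, pattern.size());
      next = pos;                          // first character after the deletion
      pos = all ? text.find(pattern, pos) : std::wstring::npos;
   } while (pos != std::wstring::npos);
   return static_cast<short>(next);
}
```

    The loop shown in the 'crapweasel' example below also works with this sketch, because each call returns the index at which the next search should begin.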



  • short erase ( short offset = 0, short length = gsMAXCHARS ) ;
      Input  :
         offset : (optional, ZERO by default)
                  index of first character of sequence to be erased
                  NOTE: This is a character index, NOT a byte offset.
         length : (optional, gsMAXCHARS by default)
                  if not specified, then erase all characters from
                     'offset' to end of data
                  if specified, then erase the specified number of
                     characters beginning at 'offset'
    
      Returns:
         index of first character following the deleted sequence
         Note: This is the wchar_t character index, NOT a byte index
         Returns (-1) if:
           a) offset < ZERO
           b) offset >= number of characters in data
           c) length <= ZERO
         (data will not be modified)
    
    

    Erase (delete) the data sequence specified by ’offset’ and ’length’.

    ’offset’ is the index of the first character to be deleted, and ’length’ specifies the number of characters to delete.

    Notes

    • If the ’length’ parameter specifies deletion of more characters than remain, then ’erase’ has the same effect as calling the ’limitChars’ method (i.e. truncates the string at ’offset’).
    • If the defaults for both ’offset’ and ’length’ are used, then the ’erase’ method has the same effect as calling the ’clear’ method (gString data are reset to an empty string).
    • Note that the NULL terminator will never be deleted.
    • Programmer’s Note:
      Stubs with 'int' arguments are also defined, but are included only to silence a compiler warning that the prototype erase(short, short); is ’ambiguous’. Actually, it is not ambiguous at all, but we believe that the use of pragmas is bad design, so it’s easier to include the stubs than to argue with the compiler. Internally, gString uses ’short int’ exclusively.
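    The following sketch shows the range form of 'erase', including its documented error returns. eraseRange is a hypothetical name. The value of gsMAXCHARS is not stated here, so the default length uses the largest 'short' value as a stand-in meaning "to the end of the data".

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Erase 'length' characters starting at 'offset'; with the default length
// the text is truncated at 'offset'. Returns the index of the character
// following the deleted run (which equals 'offset'), or -1 if offset < 0,
// offset >= number of characters, or length <= 0 (data unchanged).
short eraseRange(std::wstring& text, short offset = 0,
                 short length = std::numeric_limits<short>::max())
{
   if (offset < 0 || static_cast<std::size_t>(offset) >= text.size() ||
       length <= 0)
      return -1;
   text.erase(static_cast<std::size_t>(offset),
              static_cast<std::size_t>(length));  // clipped at end of data
   return offset;
}
```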

    Examples for 'erase'

    For the gString object containing the following verbal 
    exchange, erase all occurrences of the word 'crapweasel'.
    
    gString gs( "Ross : Are you familiar with the word crapweasel?\n"
                "Paolo: No, I don't know crapweasel.\n"
                "Ross : You're a huge crapweasel!" );
    short index = ZERO ;
    while ( (index = gs.erase( " crapweasel", index )) >= ZERO ) ;
    
    Yields:
      "Ross : Are you familiar with the word?\n"
      "Paolo: No, I don't know.\n"
      "Ross : You're a huge!"
    
    Find the substring "the the" and erase the extra "the".
    gString gs1( L"There are seven candidates in the the Primary Election." );
    gString gs2( L"the " );
    short index = gs1.find( "the the" );
    gs1.erase( index, (gs2.gschars() - 1) );
    
    Yields:
      "There are seven candidates in the Primary Election."
    


  • short replace ( const wchar_t* src, const wchar_t* newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const wchar_t* src, const char* newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const wchar_t* src, const wchar_t newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const char* src, const wchar_t* newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const char* src, const char* newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const char* src, const wchar_t newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const wchar_t src, const wchar_t* newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const wchar_t src, const char* newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );

  • short replace ( const wchar_t src, const wchar_t newtxt,
                    short offset = 0,
                    bool casesen = false, bool all = false );
      Input  :
         src    : source data to be matched
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a single, wchar_t character
         newtxt : data to overwrite existing text
                  -- pointer to a UTF-8 string
                  -- pointer to a wchar_t string
                  -- a single, wchar_t character
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  NOTE: This is a character index, NOT a byte offset.
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
                  The way upper/lowercase are related is locale dependent.
         all    : (optional, 'false' by default)
                  if 'false', then replace only the first occurrence found
                  if 'true',  then replace all occurrences of the specified
                              substring
    
      Returns:
         'true' if successful
         returns 'false' if error (existing data not modified):
           a) no matching source substring found
           b) 'src' is an empty string or a null character
           c) offset < ZERO or offset is beyond existing data
           d) modifying the data would cause buffer overflow (>gsMAXCHARS)
    

    Replace the specified source substring, or optionally all matching substrings with the provided substring.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison. These comparisons ARE locale dependent.

    Examples for 'replace'

    Correct the spelling errors in the following data:
    gString gs( "The land near hare is full of Heres, hoping her and ther." ) ;
    
    bool okiday = gs.replace( L"hare", L"here" ) ;
    Yields:
      "The land near here is full of Heres, hoping her and ther."
    
    okiday = gs.replace( "Here", "Hare", ZERO, true ) ;
    Yields:
      "The land near here is full of Hares, hoping her and ther."
    
    okiday = gs.replace( L'p', L"pp" ) ;
    Yields:
      "The land near here is full of Hares, hopping her and ther."
    
    short index = gs.find( "her " ) ;
    okiday = gs.replace( "her", L"here", index, true, true ) ;
    Yields:
      "The land near here is full of Hares, hopping here and there."
    
    Then, replace all spaces ' ' with underscores '_'.
    okiday = gs.replace( L' ', L'_', ZERO, false, true ) ;
    Yields:
      "The_land_near_here_is_full_of_Hares,_hopping_here_and_there."
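    In the 'her' example above, the replacement "here" itself contains "her", so a replace-all loop must resume scanning after the inserted text or it would never terminate. The following std::wstring sketch shows that approach. It is not taken from the gString source. replaceMatch is a hypothetical name, and the sketch is case-sensitive only, with no capacity check.

```cpp
#include <cstddef>
#include <string>

// Replace the first (or every) occurrence of 'src' with 'newtxt', scanning
// from 'offset'. After each replacement the scan resumes AFTER the inserted
// text. Returns false (data unchanged) if 'src' is empty, 'offset' is
// beyond the data, or no match is found.
bool replaceMatch(std::wstring& text, const std::wstring& src,
                  const std::wstring& newtxt, std::size_t offset = 0,
                  bool all = false)
{
   if (src.empty() || offset > text.size())
      return false;
   std::size_t pos = text.find(src, offset);
   if (pos == std::wstring::npos)
      return false;                        // no match
   do
   {
      text.replace(pos, src.size(), newtxt);
      pos += newtxt.size();                // resume after the new text
      if (!all)
         break;
      pos = text.find(src, pos);
   } while (pos != std::wstring::npos);
   return true;
}
```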
    


  • short loadChars ( const wchar_t* wsrc, short charLimit,
                      bool append = false );

  • short loadChars ( const char* usrc, short charLimit,
                      bool append = false );

    Load the specified number of characters from the source data. This is useful for extracting data from fixed-width fields when the contents of the field are unknown. For example:
    gs.loadChars( StreetAddress1, 48 );

    By default, the new text REPLACES the existing text; however, the ’append’ parameter allows the new text to be appended to the existing text.

    The functionality is equivalent to the gString constructor which loads only the specified number of characters.
    See gString Instantiation.


      Input  :
         usrc     : pointer to a UTF-8 encoded string
         wsrc     : pointer to a wchar_t encoded string
         charLimit: maximum number of characters (not bytes) from source
                    array to load. Range: 1 through gsMAXCHARS.
                    The count should not include the NULL terminator
         append   : (optional, 'false' by default)
                    if 'false', replace existing text with specified text
                    if 'true', append new text to existing text
    
      Returns:
         number of characters in modified string (incl. NULL terminator)
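    Loading a character count (rather than a byte count) from UTF-8 source data means stopping only at character boundaries. The following sketch returns the first 'charLimit' characters without splitting a multi-byte sequence. firstUtf8Chars is a hypothetical name, and the sketch assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Return the leading 'charLimit' CHARACTERS (not bytes) of a UTF-8 string.
std::string firstUtf8Chars(const std::string& src, std::size_t charLimit)
{
   std::size_t chars = 0, i = 0;
   while (i < src.size())
   {
      const unsigned char byte = static_cast<unsigned char>(src[i]);
      if ((byte & 0xC0) != 0x80)           // start of a new character
      {
         if (chars == charLimit)
            break;                         // limit reached: stop before it
         ++chars;
      }
      ++i;
   }
   return src.substr(0, i);
}
```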
    

    Examples for 'loadChars'

    Replace the existing text with the specified number of characters 
    from the source text.
    
    const char* Piggy =  // (55 characters, plus the NULL terminator)
               "Spider-pig, Spider-pig does whatever a Spider-pig does." ;
    gString gs( "Existing text." ) ;
    gs.loadChars( Piggy, 36 ) ;
    Yields:
      "Spider-pig, Spider-pig does whatever"
    
    Append specified number of characters from the source text to the 
    existing text.
    
    gs.loadChars( " it's told, because it's stupid.", 11, true ) ;
    Yields:
      "Spider-pig, Spider-pig does whatever it's told,"
    


  • short textReverse ( bool punct = false,
                        bool para = false, bool rjust = false );

    Reverse the order of characters in the text string.

    If RTL text data displayed by your application is not formatted as desired, then this method may be used to reverse the character order before writing to the display. This is useful for manipulating both RTL (Right-To-Left) language text and mixed RTL/LTR text.

    The ’para’ parameter is useful for outputting columns of numeric data or when the column labels are in an RTL language (see example below).

    Although modern web browsers (Firefox, Opera, Chromium, etc.) usually handle RTL text correctly, other applications often do not. This is especially true of terminal emulator software.

    See also a discussion of multiple-language support in the NcDialog API.

      Input  :
         punct : (optional, 'false' by default)
                 if 'false', invert entire string
                 if 'true' AND if a punctuation mark is seen at either end
                    of the string, invert everything except the punctuation
                    mark(s). typically one of the following:
                    '.' ',' '?' '!' ';' ':' but see note below.
         para  : (optional, 'false' by default)
                 if 'false', invert data as a single character stream
                 if 'true',  invert data separately for each logical line
                             (line data are separated by newlines ('\n'))
         rjust : (optional, 'false' by default)
                 if 'false', do not insert right-justification padding
                 if 'true',  insert padding for each logical line to
                             right-justify the data
                             (used to right-justify LTR output)
                             Note that right-justification is performed
                             ONLY if the 'para' parameter is true.
                             Otherwise, 'rjust' will be ignored.
    
      Returns:
         number of wchar_t characters in gString object
         (if return value >= gsMAXCHARS, data may have been truncated)
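    The per-line reversal and right-justification performed when 'para' (and optionally 'rjust') is set can be sketched with standard containers. reverseLines is a hypothetical name. The sketch treats each character as one display column and does not model the 'punct' option.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Reverse each newline-separated line independently. If 'rjust' is true,
// shorter lines are padded on the left with spaces so that every line
// matches the longest one (one character == one column assumed).
std::wstring reverseLines(const std::wstring& text, bool rjust = false)
{
   std::vector<std::wstring> lines;          // split on '\n'
   std::size_t start = 0;
   for (;;)
   {
      const std::size_t nl = text.find(L'\n', start);
      if (nl == std::wstring::npos)
      {
         lines.push_back(text.substr(start));
         break;
      }
      lines.push_back(text.substr(start, nl - start));
      start = nl + 1;
   }

   std::size_t widest = 0;
   for (const std::wstring& ln : lines)
      widest = std::max(widest, ln.size());

   std::wstring out;
   for (std::size_t i = 0; i < lines.size(); ++i)
   {
      std::wstring rev(lines[i].rbegin(), lines[i].rend());   // reversed copy
      if (rjust)
         rev.insert(0, widest - rev.size(), L' ');            // left padding
      out += rev;
      if (i + 1 < lines.size())
         out += L'\n';                                        // keep line breaks
   }
   return out;
}
```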
    

    Examples for 'textReverse'

    Note that the const Hebrew strings are written canonically in the source, but are displayed incorrectly in the terminal window by the info reader and by the HTML browser (sorry about that).

    // Hebrew: "Are you ready for lunch?"
    // Sometimes the terminal will inappropriately move the punctuation 
    // to the opposite end of the string.
    // To prevent this, use the 'punct' option.
    const wchar_t* const Lunch = L"?םיירהצ תחוראל ןכומ התא םאה" ;
    
    gString gs( Lunch ) ;
    gs.textReverse() ;
    wcout << gs.gstr() << endl ;
    
    OUTPUT (correct except punctuation) : האם אתה מוכן לארוחת צהריים?
    
    gs = Lunch ;
    gs.textReverse( true ) ;
    wcout << gs.gstr() << endl ;
    
    OUTPUT (correct): ?האם אתה מוכן לארוחת צהריים
    
    // When the 'punct' flag is set, both leading and trailing punctuation 
    // are identified.
    // Questions in Spanish use a leading inverted question mark (U+00BF), 
    // and a trailing ASCII question mark (U+003F).
    // Reverse the internal text but do not reverse the terminal punctuation.
    gs = "¿ozreumla le arap atsil sátsE?" ;
    gs.textReverse( true ) ;
    wcout << gs.gstr() << endl ;
    
    OUTPUT : ¿Estás lista para el almuerzo?
    
     = = = = =
     
    // Reverse multi-line text (paragraph formatting).
    // Ordinary ASCII text is used for this example 
    // to demonstrate the reversal.
    // The example outputs to an NcDialog window referenced by NcDialog *dp
    const char* Vaccine = "Have you received your covid-19 vaccination yet?\n"
                          "Protect yourself and others,\n"
                          "get your vaccination today!" ;
    gs = Vaccine ;
    // Write unmodified text as LTR data:
    dp->WriteParagraph ( 1, 1, gs, nc.grR, true, false ) ;
    
    OUTPUT: Have you received your covid-19 vaccination yet?
            Protect yourself and others,
            get your vaccination today!
    
    // Reverse the data, without punctuation processing or 
    // right-justification. (Note: all parameters are optional, 
    // but are shown here for clarity.)
    gs.textReverse( false, true, false ) ;
    
    // Write reversed text as LTR data:
    dp->WriteParagraph ( 1, 1, gs, nc.grR, true, false ) ;
    
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
            ,srehto dna flesruoy tcetorP
            !yadot noitaniccav ruoy teg
    
    // Write the same data as RTL (note the X origin of 48):
    dp->WriteParagraph ( 1, 48, gs, nc.grR, true, true ) ;
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
                                ,srehto dna flesruoy tcetorP
                                 !yadot noitaniccav ruoy teg
    
    // Reload the source data and reverse it without punctuation processing,
    // but WITH right-justification padding.
    gs = Vaccine ;
    gs.textReverse( false, true, true ) ;
    
    // Write reversed text as LTR data.
    // Note that the padding character is ASCII space ' ',
    // however the '-' character is used here to show the padding position.
    dp->WriteParagraph ( 1, 1, gs, nc.grR, true, false ) ;
    
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
            --------------------,srehto dna flesruoy tcetorP
            ---------------------!yadot noitaniccav ruoy teg
    
    // Write the same data as RTL (note the X origin of 48):
    dp->WriteParagraph ( 1, 48, gs, nc.grR, true, true ) ;
    
    OUTPUT: ?tey noitaniccav 91-divoc ruoy deviecer uoy evaH
            --------------------,srehto dna flesruoy tcetorP
            ---------------------!yadot noitaniccav ruoy teg
    
    // Combine RTL text with numeric (LTR) data.
    // Create an ASCII numeric time string.
    // Reverse the numeric string.
    // Insert a Hebrew label: "The current time is: "
    const wchar_t *timeLabel = L"השעה הנוכחית היא: " ; //(displayed incorrectly)
    short hours = 12, minutes = 32, seconds = 45 ;
    gs.compose( "%02hd:%02hd:%02hd", &hours, &minutes, &seconds ) ;
    gs.textReverse( false, true, false ) ;
    gs.insert( timeLabel ) ;
    
    // Write the data as RTL (note the X origin of 26):
    dp->WriteParagraph ( 1, 26, gs, nc.grR, true, true ) ;
    
    OUTPUT:
       12:32:45 :השעה הנוכחית היא
    
    

    Technical Note On Punctuation

    Determining which characters are and are not punctuation is locale specific (see the 'ispunct()' C-language function).
    Rather than rely on the locale used by the application calling this method, we test against a list of the most common punctuation characters used in modern languages. If a punctuation character used by your application is not recognized as punctuation, please send us a note including the Unicode codepoint and we will add it to the list.



  • short formatParagraph ( short maxRows, short maxCols, bool trunc = true,
                            bool hyphenbrk = false, short *truncIndex = NULL );
      Input  : maxRows  : maximum rows for message (>= 1)
               maxCols  : maximum columns on any message row (>= 4)
               trunc    : (optional, 'true' by default)
                          if 'true',  truncate the data if necessary to ensure
                                      that the data do not extend beyond the
                                      specified target area
                          if 'false', format the entire source text array, even
                                      if doing so requires violating the specified
                                      height of the target area (maxRows).
                                      (see also 'truncIndex' parameter)
               hyphenbrk: (optional, 'false' by default)
                          if 'false', automatic line breaks occur at space ' '
                                      characters only (20h)
                                      Special Case: the following CJK ideographs
                                      are also recognized as whitespace for
                                      purposes of line formatting (see below)
                                       '、' comma U+3001 and
                                       '。' full stop U+3002
                          if 'true',  in addition to the space ' ' characters,
                                      enable line break at:
                                       ASCII hyphen    '-' (2dh)
                                       Unicode &mdash; '—' (2014h)
                                       Unicode &ndash; '–' (2013h)
                                       Unicode &minus; '−' (2212h)
                                       Unicode &shy;       (00ADh) (soft hyphen)
               truncIndex: (optional, null pointer by default)
                           (referenced _only_ when the 'trunc' flag is reset)
                           If specified, points to a variable to receive the
                           index at which the data _would_have_been_ truncated to
                           fit the specified height of the target area if the
                           'trunc' flag had been set.
                           a) A positive value indicates that the data have not
                              been truncated, AND that the data extend beyond
                              the specified target area.
                           b) A negative value indicates that it was not necessary
                              to truncate the data i.e. the data fit entirely
                              within the target area.
    
      Returns: number of text rows in the formatted output
    

    Reformat the text so that when written to the display, it will fit within the specified rectangular area.

    The dimensions of the formatted text are specified as the number of text rows and columns, with a minimum of one(1) row by four(4) columns.



    Technical Description of the Line Break Algorithm

    Formatting is done in three steps:

    1. Remove all newlines from the text.
    2. Perform word wrap to limit the width of each row of the paragraph.
    3. If necessary, truncate the text to limit the height of the paragraph (but see the 'trunc' and 'truncIndex' parameters).

    Token: A ‘token’ as used here is a series of printing characters delimited by whitespace characters. If line breaks on hyphen-like characters are enabled, those characters are also interpreted as token delimiters.

    1. Line breaks occur after the space character at the end of a token, i.e. a word. This means that the line is broken by placing the newline after the identified space character.
      Note: Currently the ASCII space character (20h) and the two-column CJK space (U+3000) are the only whitespace characters recognized.

      Special Case: CJK text seldom contains any spaces at all within a sentence, or even between sentences. Two-column punctuation is designed to provide a visual spacing effect. For this reason, we have made the design decision to process the following CJK ideographs as whitespace:
      '、' comma U+3001 and '。' full stop U+3002

      Caution: Tab characters (’\t’) _are not_ recognized as whitespace characters because they are effectively variable width characters. Don’t use them!
      For safety, if Tab characters are present in the text, they are silently removed.

      Optionally, the ASCII hyphen and the Unicode hyphen-like characters can be treated as if they were whitespace characters for purposes of the algorithm. (see the 'hyphenbrk' parameter.) The characters included within this group are:

      NAME          HTML TAG   GLYPH   UNICODE CODEPOINT
      ASCII hyphen             '-'     U+002D
      mDASH         &mdash;    '—'     U+2014
      nDASH         &ndash;    '–'     U+2013
      minus         &minus;    '−'     U+2212
      soft hyphen   &shy;              U+00AD
    2. When reformatting the data, it is possible that the text will be pushed beyond the specified number of rows. In this case, we have two options:
      a) Truncate the text after the last specified row is filled.
      b) Alert the caller about the number of rows actually required.
      Optionally, we can indicate the index at which we would have truncated the text, so that the caller can manually truncate the text if desired. (see the 'trunc' and 'truncIndex' parameters)
    3. It is possible that a single token (word) will be longer than the width of the target area. This is unlikely, and handling it complicates the line-break algorithm, but it can come into play for filespecs, URLs, or some German words. :-)

      Filespecs and URLs should be parsed using specialized formatting methods.
      Long words can be a challenge during parsing of the data. Our solution is to define “long” tokens as those which are more than half the specified area width ('maxCols').

      This method can optionally break after hyphens, but this may sometimes cause unintended breaks or confusing output. Use the 'hyphenbrk' option wisely.

    4. Notes on automatic hyphenation:
      Technically, hyphens should be placed between syllables, but that would require a full dictionary of the target language.
      “Can open... worms everywhere.” (Thank you, Chandler Bing.)
      • The hyphen used is the Unicode &ndash; U+2013. This makes the inserted hyphens easy to strip from the text if the text is copied and pasted elsewhere.
        Note that ideally we would use the "soft hyphen," Unicode &shy; U+00AD, but unfortunately most non-word-processing applications interpret this as a zero-width character, making it invisible under most circumstances.
      • Programmer’s Note: If the current character is the same width as the hyphen, then the hyphen will be at the right edge of the target area. Otherwise there will be a one-column gap at the end of the line.
      • Special case: For multi-column characters, it is assumed that the characters belong to the CJK character groups. (This may not be true, but multi-column characters seldom appear in Romance languages.) Therefore, for multi-column characters only, hyphens are not inserted after mid-token line breaks, because there is no way of knowing whether we are breaking in mid-word or between words without access to dictionaries for those languages. Again, “...worms everywhere.”
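    The three formatting steps can be sketched in C++ as follows. This is a simplified illustration built on several assumptions, and it is not the formatParagraph() implementation. The helper names are hypothetical, and every character is assumed to be one column wide. An over-long token is simply split at the right edge, with no special handling of “long” tokens and no hyphen insertion. The rows are returned in a std::vector rather than written back into a gString.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// May a line be broken after 'c'? Space, CJK space and the CJK comma/full stop
// always qualify; the hyphen-like characters qualify only if 'hyphenbrk' is set.
static bool isBreakChar(wchar_t c, bool hyphenbrk)
{
   if (c == L' ' || c == L'\u3000' || c == L'\u3001' || c == L'\u3002')
      return true;
   return hyphenbrk && (c == L'-' || c == L'\u2014' || c == L'\u2013' ||
                        c == L'\u2212' || c == L'\u00AD');
}

// Hypothetical simplified word wrap (NOT gString's code). Assumes one column
// per character. 'truncIndex' (if non-null) is written only when 'trunc' is
// false: the index in the newline-free text where truncation would occur,
// or -1 if the text fits.
std::vector<std::wstring> wrapSketch(const std::wstring& src, size_t maxRows,
                                     size_t maxCols, bool trunc = true,
                                     bool hyphenbrk = false,
                                     long* truncIndex = nullptr)
{
   maxCols = std::max<size_t>(maxCols, 1);     // guard against an endless loop

   // Step 1: remove newlines (tabs are also silently discarded).
   std::wstring text;
   for (wchar_t c : src)
      if (c != L'\n' && c != L'\t')
         text += c;

   // Step 2: greedy wrap; prefer to end each row just after a break character.
   std::vector<std::wstring> rows;
   std::vector<size_t> rowStart;               // where each row begins in 'text'
   size_t pos = 0;
   while (pos < text.size())
   {
      size_t len = std::min(maxCols, text.size() - pos);
      size_t skip = 0;                         // a space consumed by the break
      if (pos + len < text.size())             // more text follows this row
      {
         if (text[pos + len] == L' ')
            skip = 1;                          // row ends exactly at a word boundary
         else
         {
            size_t brk = len;
            while (brk > 0 && !isBreakChar(text[pos + brk - 1], hyphenbrk))
               --brk;
            if (brk > 0)
               len = brk;                      // otherwise: hard split of a long token
         }
      }
      rowStart.push_back(pos);
      rows.push_back(text.substr(pos, len));
      pos += len + skip;
   }

   // Step 3: enforce the height limit, or report where it would be enforced.
   if (trunc && rows.size() > maxRows)
      rows.resize(maxRows);
   else if (!trunc && truncIndex != nullptr)
      *truncIndex = (rows.size() > maxRows) ? static_cast<long>(rowStart[maxRows]) : -1L;
   return rows;
}
```

    With 'trunc' set to false, the sketch keeps every row and reports the would-be truncation index, which mirrors the 'truncIndex' behavior described in the parameter list. For example, wrapping "aaa bbb ccc" into 2 rows of 4 columns yields three rows and a truncation index of 8.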



Comparisons

  • short compare ( const char* uStr, bool casesen = true,
                    short length = gsMAXCHARS, short offset = 0 ) const ;

  • short compare ( const wchar_t* wStr, bool casesen = true,
                    short length = gsMAXCHARS, short offset = 0 ) const ;
      Input  :
         uStr     : (UTF-8 string) to be compared
           OR
         wStr     : (wchar_t string) to be compared
         casesen  : (optional, 'true' by default)
                    if 'true' perform case-sensitive comparison
                    if 'false' perform case-insensitive comparison
         length   : (optional, gsMAXCHARS by default. i.e. compare to end)
                    maximum number of characters to compare
         offset   : (optional, ZERO by default)
                    If specified, equals the character offset into the
                    gString character array at which to begin comparison.
    
      Returns: 
         return value uses the rules of the 'wcsncmp' (or 'wcsncasecmp') 
         library function (see string.h):
           ZERO, text data are identical
         > ZERO, first differing char of gString object is numerically larger.
         < ZERO, first differing char of gString object is numerically smaller.
    

    Compares the text content of the gString object with the specified text.

    The comparison is performed against the gString object’s wchar_t character array. The relationship between upper-case and lower-case characters is locale dependent.
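    The return-value rules above come directly from the C library. The following is a minimal sketch of the comparison, assuming that the text is held in a std::wstring. compareSketch() is a hypothetical helper, and clamping an out-of-range offset is an assumption. Note that wcsncasecmp() is a POSIX function rather than ISO C, and its case folding depends on the current locale.

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <wchar.h>   // wcsncasecmp() (POSIX)

// Hypothetical stand-in for gString::compare(): compare at most 'length'
// characters of 'data', starting at character 'offset', with 'other'.
// Returns <0, 0 or >0 following the wcsncmp()/wcsncasecmp() rules.
int compareSketch(const std::wstring& data, const wchar_t* other,
                  bool casesen = true, size_t length = std::wstring::npos,
                  size_t offset = 0)
{
   if (offset > data.size())
      offset = data.size();              // assumption: clamp an out-of-range offset
   const wchar_t* start = data.c_str() + offset;
   return casesen ? std::wcsncmp(start, other, length)
                  : wcsncasecmp(start, other, length);
}
```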



  • short compare ( const gString& gs, bool casesen = true ) const ;
      Input  :
         gs       : (by reference) object whose text is to be compared
         casesen  : (optional, 'true' by default)
                    if 'true' perform case-sensitive comparison
                    if 'false' perform case-insensitive comparison
    
      Returns:
         return value uses the rules of the 'wcsncmp' (or 'wcsncasecmp') 
         library function (see string.h):
           ZERO, text data are identical
         > ZERO, first differing char of gString object is numerically larger.
         < ZERO, first differing char of gString object is numerically smaller.
    

    Compares the text content of two gString objects.

    The comparison is performed against the gString objects’ wchar_t character arrays. The relationship between upper-case and lower-case characters is locale dependent.



  • bool operator == ( const gString& gs2 ) const ;
      Input  :
         gs2   : (by reference)
                 gString object containing string to be compared
      Returns:
         'true' if the strings are identical, else 'false'
    

    Comparison operator: Compares the text content of two gString objects.

    The comparison is performed against the wchar_t character arrays of the two objects. The comparison is case-sensitive.



  • bool operator != ( const gString& gs2 ) const ;
      Input  :
         gs2   : (by reference)
                 gString object containing string to be compared
      Returns:
         'true' if the strings are different, else 'false'
    

    Comparison operator: Compares the text content of two gString objects.

    The comparison is performed against the wchar_t character arrays of the two objects. The comparison is case-sensitive.



  • short find ( const char* src, short offset=0,
                 bool casesen=false, short maxcmp= -1 ) const ;
  • short find ( const wchar_t* src, short offset=0,
                 bool casesen=false, short maxcmp= -1 ) const ;
  • short find ( const gString& src, short offset=0,
                 bool casesen=false, short maxcmp= -1 ) const ;
  • short find ( const wchar_t src, short offset=0,
                 bool casesen=false ) const ;
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string (max length==gsMAXBYTES)
                  -- pointer to a wchar_t string (max length==gsMAXCHARS)
                  -- a gString object containing the source (by reference)
                  -- a single, wchar_t character
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  -- if out-of-range, then same as if not specified
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
                  The way upper/lowercase are related is locale dependent.
         maxcmp : (optional, (-1) by default)
                  -- if not specified, then scan for a match of all
                     characters in 'src' (not including null terminator)
                  -- if specified, then scan for a match of only the first
                     'maxcmp' characters of 'src'
                  -- if out-of-range, then same as if not specified
      Returns:
         index of matching substring or (-1) if no match found
         Note: This is the wchar_t character index, NOT a byte index
    

    Scan the data for a matching substring and if found, return the index at which the first substring match begins.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison. These comparisons ARE locale dependent.
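    The following is a minimal sketch of the forward scan, assuming that a std::wstring holds the data. findSketch() is a hypothetical helper, not the gString source. It tries each starting index in turn and compares the first 'maxcmp' characters of 'src' using wcsncmp() or wcsncasecmp().

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <wchar.h>   // wcsncasecmp() (POSIX)

// Hypothetical forward substring search (NOT gString's code). Returns the
// wchar_t index of the first match at or after 'offset', or -1.
// 'maxcmp' < 0 (or too large) means "compare all of 'src'".
short findSketch(const std::wstring& data, const wchar_t* src,
                 short offset = 0, bool casesen = false, short maxcmp = -1)
{
   size_t srcLen = std::wcslen(src);
   if (maxcmp < 0 || static_cast<size_t>(maxcmp) > srcLen)
      maxcmp = static_cast<short>(srcLen);     // out of range: same as not specified
   if (offset < 0 || static_cast<size_t>(offset) > data.size())
      offset = 0;                              // out of range: same as not specified
   if (maxcmp == 0)
      return -1;                               // nothing to match
   for (size_t i = offset; i + maxcmp <= data.size(); ++i)
   {
      const wchar_t* p = data.c_str() + i;
      int diff = casesen ? std::wcsncmp(p, src, maxcmp)
                         : wcsncasecmp(p, src, maxcmp);
      if (diff == 0)
         return static_cast<short>(i);
   }
   return -1;
}
```

    Applied to the “Tiny Tim” string used in the Examples below, findSketch(gs, L"tim") returns 5, and a second call starting at index 6 returns 35 (the “T” of “Times”).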


  • short findlast ( const char* src, bool casesen=false ) const ;
  • short findlast ( const wchar_t* src, bool casesen=false ) const ;
  • short findlast ( const gString& src, bool casesen=false ) const ;
  • short findlast ( const wchar_t src, bool casesen=false ) const ;
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string (max length==gsMAXBYTES)
                  -- pointer to a wchar_t string (max length==gsMAXCHARS)
                  -- a gString object containing the source (by reference)
                  -- a single, wchar_t character
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
                  The way upper/lowercase are related is locale dependent.
    
      Returns:
         index of last matching substring or (-1) if no match found
         Note: This is the wchar_t character index, NOT a byte index
    

    Scan the data for the last occurrence of the matching substring and if found, return the index at which the substring match occurs.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison. These comparisons ARE locale dependent.


  • short after ( const char* src, short offset=0,
                  bool casesen=false ) const ;
  • short after ( const wchar_t* src, short offset=0,
                  bool casesen=false ) const ;
  • short after ( const gString& src, short offset=0,
                  bool casesen=false ) const ;
      Input  :
         src    : source data to be matched, one of the following:
                  -- pointer to a UTF-8 string (max length==gsMAXBYTES)
                  -- pointer to a wchar_t string (max length==gsMAXCHARS)
                  -- a gString object containing the source (by reference)
         offset : (optional, ZERO by default)
                  character index at which to begin search
                  -- if out-of-range, then same as if not specified
         casesen: (optional, 'false' by default)
                  if 'false', then scan IS NOT case sensitive
                  if 'true', then scan IS case sensitive
                  The way upper/lowercase are related is locale dependent.
      Returns:
         index of the character which follows the matching substring
           or (-1) if no match found
         Note: This is the wchar_t character index, NOT a byte index
    

    This method is very similar to the ’find()’ method above, but instead of returning the index to the beginning of the substring, returns the index of the character which FOLLOWS the substring.
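    Conceptually, 'after' is a forward search whose result is advanced by the length of the matched text. The following is a hypothetical, self-contained sketch; afterSketch() is not part of gString.

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <wchar.h>   // wcsncasecmp() (POSIX)

// Hypothetical sketch (NOT gString's code): find 'src' at or after 'offset'
// and return the index of the character that FOLLOWS the match, or -1.
short afterSketch(const std::wstring& data, const wchar_t* src,
                  short offset = 0, bool casesen = false)
{
   size_t srcLen = std::wcslen(src);
   if (offset < 0 || static_cast<size_t>(offset) > data.size())
      offset = 0;                              // out of range: same as not specified
   if (srcLen == 0)
      return -1;
   for (size_t i = offset; i + srcLen <= data.size(); ++i)
   {
      const wchar_t* p = data.c_str() + i;
      if ((casesen ? std::wcsncmp(p, src, srcLen)
                   : wcsncasecmp(p, src, srcLen)) == 0)
         return static_cast<short>(i + srcLen);   // index just past the match
   }
   return -1;
}
```

    For the parrot sentence in the Examples below, afterSketch() returns the index just past “would”, which is where “ NOT” is inserted.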


  • short findr ( const gString& src, short offset = -1, bool casesen = false ) const;
  • short findr ( const wchar_t *src, short offset = -1, bool casesen = false ) const;
  • short findr ( const char *src, short offset = -1, bool casesen = false ) const;
  • short findr ( const wchar_t& src, short offset = -1, bool casesen = false ) const;
      Input  : src    : source data to be matched, one of the following:
                        -- pointer to a UTF-8 string (max length==gsMAXBYTES)
                        -- pointer to a wchar_t string (max length==gsMAXCHARS)
                        -- a gString object containing the source (by reference)
                        -- a single, wchar_t character (by reference)
               offset : (optional, by default: index of null terminator minus one)
                        character index at which to begin search
                        -- if out-of-range, then same as if not specified
               casesen: (optional, 'false' by default)
                        if 'false', then scan IS NOT case sensitive
                         if 'true', then scan IS case sensitive
                         The way upper/lowercase are related is locale dependent.
    
      Returns: index of the first character (closest to head of data) of the 
               matching substring or (-1) if no match found
               Note: This is the wchar_t character index, NOT a byte index
    

    Scan the data beginning at the specified offset and moving toward the head of the string (toward offset zero). Locate the matching substring and if found, return the index of the first match.

    Actual comparison is always performed against the ’wide’ character array using wcsncasecmp (or wcsncmp). Null terminator character IS NOT included in the comparison. These comparisons ARE locale dependent.
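    The following sketches the reverse scan. It assumes that 'offset' is the highest index at which a match may begin; this is an interpretation that the description above does not confirm. findrSketch() is a hypothetical helper. When called with the default offset, it also behaves like findlast().

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <wchar.h>   // wcsncasecmp() (POSIX)

// Hypothetical reverse search (NOT gString's code): start at 'offset' and move
// toward index zero; return the index where the match nearest 'offset' begins.
short findrSketch(const std::wstring& data, const wchar_t* src,
                  short offset = -1, bool casesen = false)
{
   size_t srcLen = std::wcslen(src);
   if (srcLen == 0 || data.empty())
      return -1;
   if (offset < 0 || static_cast<size_t>(offset) >= data.size())
      offset = static_cast<short>(data.size() - 1);   // default: last character
   for (long i = offset; i >= 0; --i)
   {
      if (static_cast<size_t>(i) + srcLen > data.size())
         continue;                             // too close to the end to match
      const wchar_t* p = data.c_str() + i;
      if ((casesen ? std::wcsncmp(p, src, srcLen)
                   : wcsncasecmp(p, src, srcLen)) == 0)
         return static_cast<short>(i);
   }
   return -1;
}
```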


  • short findx ( wchar_t srcChar = L' ', short offset=0 ) const ;
      Input  :
         srcChar: (optional, L' ' by default) character to be skipped over
         offset : (optional, ZERO by default)
                  if specified, equals the character offset into the 
                  character array at which to begin the scan.
      Returns:
         a) If successful, returns the index of first character which 
            DOES NOT match specified character.
         b) If the scan finds no character which is different from the 
            specified character, OR if 'offset' is out of range, OR if the 
            specified character is the null character, the return value 
            indexes the null terminator of the array.
         Note: This is the wchar_t character index, NOT a byte index
    

    Scan the text and locate the first character which DOES NOT match the specified character.

    This method is used primarily to scan past the end of a sequence of space (' ') characters (0x0020), but may be used to skip over any sequence of a single, repeated character. Note that the character specified must be a ‘wide’ (wchar_t) character.

    See also the ’scan’ method below which scans to the first non-whitespace character.
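    The following is a minimal sketch of the behavior described for findx(), including the three cases that return the index of the null terminator. With a std::wstring, that index equals size(). findxSketch() is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch (NOT gString's code): skip a run of 'skipChar'.
// Returns the index of the first differing character, or data.size() (the
// null terminator) if none is found, if 'offset' is out of range, or if
// 'skipChar' is the null character.
short findxSketch(const std::wstring& data, wchar_t skipChar = L' ', short offset = 0)
{
   if (skipChar == L'\0' || offset < 0 || static_cast<size_t>(offset) > data.size())
      return static_cast<short>(data.size());
   size_t i = static_cast<size_t>(offset);
   while (i < data.size() && data[i] == skipChar)
      ++i;
   return static_cast<short>(i);
}
```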


  • short scan ( short offset=0 ) const ;
      Input  :
         offset : (optional, ZERO by default)
                  if specified, equals the character offset into the 
                  character array at which to begin the scan.
      Returns:
         a) If successful, returns the index of first non-whitespace 
            character
         b) If the scan finds no non-whitespace character OR if 'offset' 
            is out of range, the return value indexes the null terminator 
            of the array.
         Note: This is the wchar_t character index, NOT a byte index.
    

    Scans the text array and returns the index of the first non-whitespace character found.

    Whitespace characters are defined as:
    0x0020 single-column ASCII space
    0x3000 two-column CJK space
    0x0A linefeed character
    0x0D carriage-return character
    0x09 horizontal-tab character
    0x0B vertical-tab character
    0x0C formfeed character
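    The whitespace set listed above can be expressed as a simple predicate. The following is a hypothetical sketch of scan() built on that predicate; neither helper is gString code.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate for the whitespace set listed above.
bool isScanWhitespace(wchar_t c)
{
   switch (c)
   {
      case L' ':  case L'\u3000':                // ASCII space, two-column CJK space
      case L'\n': case L'\r': case L'\t':        // 0x0A, 0x0D, 0x09
      case L'\v': case L'\f':                    // 0x0B, 0x0C
         return true;
      default:
         return false;
   }
}

// Hypothetical sketch of scan(): index of the first non-whitespace character,
// or data.size() (the null terminator) if none is found or 'offset' is out of range.
short scanSketch(const std::wstring& data, short offset = 0)
{
   if (offset < 0 || static_cast<size_t>(offset) > data.size())
      return static_cast<short>(data.size());
   size_t i = static_cast<size_t>(offset);
   while (i < data.size() && isScanWhitespace(data[i]))
      ++i;
   return static_cast<short>(i);
}
```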


Examples

gString str1( "Toys for Tots" ) ;
gString str2( "The Tin Drum" ) ;

// compare with UTF-8 string
short result = str1.compare( "Toys for Tots" ) ;

// compare with wchar_t string
result = str2.compare( L"The Tin Drum" ) ;

// compare gString objects
if ( str1 == str2 ) { /* do stuff */ }
if ( str1 != str2 ) { /* do stuff */ }


gString gs( L"Tiny Tim had a titillating tour of Times Square." ) ;

// find first instance of substring "tim" (not case sensitive)
short tIndex = gs.find( "tim" ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> Tim had a titillating tour of Times Square.

// find the next instance of "tim"
gString gsSub( "tim" ) ;      // search string in a gString object
tIndex = gs.find( gsSub, tIndex + 1 ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> Times Square.

// find first instance of substring "ti" (case sensitive)
tIndex = gs.find( L"ti", 0, true ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> titillating tour of Times Square.

// match the first three characters of the search string
tIndex = gs.find( L"squirrel", 0, false, 3 ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> Square.

// find first instance of L'R' (not case sensitive)
tIndex = gs.find( L'R' ) ;
wcout << &gs.gstr()[tIndex] << endl ;
 - - -> r of Times Square.

// extract the filename from path/filename string
gs = "/home/sam/SoftwareDesign/NcDialog/Dialog1/gString.hpp" ;
if ( (tIndex = gs.findlast( L'/' )) >= 0 )
   wcout << &gs.gstr()[tIndex + 1] << endl ;
 - - -> gString.hpp

// insert text after first instance of substring
gs = "I think that a parrot would be an ideal pet." ;
short pIndex = gs.after( L"would" ) ;
gs.insert( " NOT", pIndex ) ;
 - - -> I think that a parrot would NOT be an ideal pet.

For more examples of using gString-class methods, please refer to Test #6 of the ’Dialogw’ test application.




Extract Formatted Data

  • short gscanf ( const wchar_t* fmt, ... ) const ;
  • short gscanf ( const char* fmt, ... ) const ;
      Input  :
         fmt  : a format specification template in the style of swscanf()
                or sscanf() and related C/C++ functions
                Template may be either a const char* OR a const wchar_t*
    
         ...  : optional arguments
                Each optional argument is a POINTER to (address of) the
                variable to receive the formatted data.
                - Important Note: There must be AT LEAST as many optional
                  arguments as the number of format specifiers defined in
                  the formatting template. Excess arguments will be
                  ignored; however, too few arguments will return an
                  error condition. (see below)
    
      Returns: 
         number of items captured and converted
         returns 0 if:
           a) number of format specifications in 'fmt' > gsFormMAXARGS
           b) number of format specifications in 'fmt' > number of
              optional arguments (pointers to target variables) provided
    

    Scan the text data contained in the gString object and extract data according to the specified formatting template.
    (This is an implementation of the standard C-library ’swscanf’ function.)

    The formatting template may be either a 'const wchar_t*' as in a call to 'swscanf', or a 'const char*' as in a call to 'sscanf'.

    The formatting template may contain between zero(0) and gsFormMAXARGS format specifiers.

    The optional arguments are pointers to the target variables which will receive the formatted data.

    As with 'swscanf', the number of optional arguments must be equal to or greater than the number of format specifiers:
    (%d, %63s, %X, %lld, %24lls, %[, etc.).
    Excess arguments will be ignored; however, unlike the 'swscanf' function, 'gscanf' counts the number of format specifiers and optional arguments, and if there are too few arguments, the scan will be aborted to avoid an application crash due to memory-access violation. (You’re welcome :-)
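    The safety check described above requires counting the conversions in the template that consume a pointer argument. The following is a hypothetical sketch of such a counter, not gString's parser. It assumes standard scanf rules: “%%” matches a literal percent sign, an assignment-suppressed conversion such as “%*d” consumes no argument, and a scanset (“%[...]”) ends at its closing bracket. The same logic would apply character by character to a 'wchar_t' template.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical counter (NOT gString's code): number of conversions in a
// scanf-style template that will consume a pointer argument.
int countScanArgs(const char* fmt)
{
   int count = 0;
   for (const char* p = fmt; *p != '\0'; ++p)
   {
      if (*p != '%')
         continue;
      ++p;
      if (*p == '%')                           // "%%" matches a literal '%'
         continue;
      bool suppressed = (*p == '*');           // "%*d" stores nothing
      if (suppressed)
         ++p;
      while (*p >= '0' && *p <= '9')           // optional field width
         ++p;
      while (*p != '\0' && std::strchr("hlLqjzt", *p) != nullptr)
         ++p;                                  // optional length modifiers
      if (*p == '\0')
         break;                                // malformed: template ends after '%'
      if (*p == '[')                           // scanset: skip to its closing ']'
      {
         ++p;
         if (*p == '^')
            ++p;
         if (*p == ']')                        // a leading ']' belongs to the set
            ++p;
         while (*p != '\0' && *p != ']')
            ++p;
         if (*p == '\0')
            break;
      }
      if (!suppressed)
         ++count;
   }
   return count;
}
```

    When applied to the template in the Examples below, the counter finds eleven conversions, which matches the eleven pointer arguments supplied there.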



  • short gscanf ( short offset, const wchar_t* fmt, ... ) const ;
  • short gscanf ( short offset, const char* fmt, ... ) const ;
      Input  :
         offset: wide-character offset at which to begin the scan
                 Important Note: This IS NOT a byte offset.
                 If offset < 0 || offset > length of data, offset will
                 be silently set to 0.
    
         fmt   : a format specification template in the style of swscanf()
                 or sscanf() and related C/C++ functions
                 Template may be either a const char* OR a const wchar_t*
    
         ...   : optional arguments
                 Each optional argument is a POINTER to (address of) the
                 variable to receive the formatted data.
                 - Important Note: There must be AT LEAST as many optional
                   arguments as the number of format specifiers defined in
                   the formatting template. Excess arguments will be
                   ignored; however, too few arguments will return an
                   error condition. (see below)
    
      Returns: 
         number of items captured and converted
         returns 0 if:
           a) number of format specifications in 'fmt' > gsFormMAXARGS
           b) number of format specifications in 'fmt' > number of
              optional arguments (pointers to target variables) provided
    

    This is the same method as described above except that the scan of the data begins at the specified character offset.
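    The offset handling can be sketched on top of the standard swscanf(). An out-of-range offset is silently reset to zero, and the scan then starts at that character. scanThreeShorts() is a hypothetical helper that is fixed to three 'short' targets for brevity; the real method is variadic.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical helper (NOT gString's code): clamp the offset as documented,
// then let the standard swscanf() scan the remaining text.
int scanThreeShorts(const std::wstring& data, short offset,
                    short* a, short* b, short* c)
{
   if (offset < 0 || static_cast<size_t>(offset) > data.size())
      offset = 0;                              // silently reset, as described above
   return std::swscanf(data.c_str() + offset, L"%hd %hd %hd", a, b, c);
}
```

    For example, scanning L"Useless garbage: 17 21 34" from offset 16 yields 17, 21 and 34. This is the same result as the offset form shown in the Examples below.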


    Examples

    short sval1, sval2, sval3, sval4 ;
    int   ival ;
    long long int llval ;
    double dval ;
    char   str1[64], str2[16], str3[16], ch1 ;
    gString gs( "17 % 18 21 22 48A2B 720451 24.325 A country song "
                "is three chords and the truth. - Willie Nelson" ) ;
    short cnt = 
       gs.gscanf( L"%hd %% %hd %hd %hd %X %lld %lf %63[^-] %c %15s %15s", 
                  &sval1, &sval2, &sval3, &sval4, &ival, &llval, 
                  &dval, str1, &ch1, str2, str3 ) ;
    
    gString gsOut( "items    : %hd\n"
                   "numeric  : %hd  %hd  %hd  %hd  0x%04X  %lld  %4.6lf\n"
                   "text data: \"%s\" %c %s %s\n",
                   &cnt, &sval1, &sval2, &sval3, &sval4, &ival, &llval, &dval, 
                   str1, &ch1, str2, str3 ) ;
    dp->WriteParagraph ( 1, 1, gsOut, dColor ) ;
    
    This yields:
    items    : 11
    numeric  : 17  18  21  22  0x48A2B  720451  24.325000
    text data: "A country song is three chords and the truth. " - Willie Nelson
    
    A working example may be found in the NcDialog API package, 
    (Dialogw test application, Test #6).
    
     - - - - -
    
    To begin the scan from a particular offset into the text, the unneeded 
    initial data may either be scanned into a throw-away buffer or the 
    scan offset may be specified directly. For example, to discard the 
    first sixteen(16) characters:
    
    wchar_t junk[gsMAXCHARS] ;
    gs = "Useless garbage: 17 21 34" ;
    
    gs.gscanf( L"%16C %hd %hd %hd", junk, &sval1, &sval2, &sval3 ) ;
      _or_
    gs.gscanf ( 16, L"%hd %hd %hd", &sval1, &sval2, &sval3 ) ;
    
    
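    Programmer’s Note: It is good practice to specify a maximum field width for all scanned strings in order to avoid overrun of the target buffer. For %s-style and %[ conversions, the width does not count the terminating null character, so it must be smaller than the size of the target buffer.
      THIS:     %64s %128lls %256S %32[a-z]
      NOT THIS: %s %lls %S %[a-z]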



Statistical Info

  • short gString::gschars ( void ) const ;
      Input  :
         none
      Returns:
         number of characters in the string
    

    Returns the number of characters in the string including the null terminator.



  • short utfbytes ( void ) const ;
      Input  :
         none
      Returns:
         number of bytes in UTF-8 string
    

    Returns the number of bytes in the UTF-8-encoded string including the null terminator.
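    Both counts include the null terminator. The following sketch shows how they can be computed for a wide string. It assumes that 'wchar_t' holds UTF-32 codepoints, which is true on GNU/Linux, and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers (NOT gString's code). Both counts include the null terminator.
short charCountSketch(const std::wstring& ws)
{
   return static_cast<short>(ws.size() + 1);
}

short utf8ByteCountSketch(const std::wstring& ws)
{
   size_t bytes = 1;                           // the null terminator
   for (wchar_t wc : ws)
   {
      unsigned long cp = static_cast<unsigned long>(wc);
      if (cp < 0x80)         bytes += 1;       // U+0000..U+007F
      else if (cp < 0x800)   bytes += 2;       // U+0080..U+07FF
      else if (cp < 0x10000) bytes += 3;       // U+0800..U+FFFF
      else                   bytes += 4;       // U+10000..U+10FFFF
   }
   return static_cast<short>(bytes);
}
```

    For example, L"A\u00E9\u4E2D" (“A”, “é”, “中”) is 4 characters but 1 + 2 + 3 + 1 = 7 UTF-8 bytes.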



  • short gscols ( void ) const ;
     Input  :
        none
     Returns:
        number of columns needed to display the data
    

    Returns the number of columns required to display the string.



  • const short* gscols ( short& charCount ) const ;
      Input  :
         charCount : (by reference, initial value ignored)
                     on return, contains number of characters in 
                     the string including the null terminator.
      Returns:
         pointer to number of columns needed for display of each character
    

    Returns a pointer to an array of column counts, one for each character of the text data. Note that the number of array elements equals the number of characters (plus meta-characters, if any).

    Example

    This example is from the FileMangler utility. It trims the head of 
    the provided path/filename string to fit within a dialog window.
    
    void FmConfig::ecoPathFit ( gString& gsPath, short colsAvail )
    {
       if ( (gsPath.gscols()) > colsAvail )
       {
          gString gst = gsPath ;
          short width = gst.gscols(),
                offset = ZERO,
                charCount ;
          const short* colArray = gst.gscols( charCount ) ;
    
          while ( width > (colsAvail - 3) )
             width -= colArray[offset++] ;
          gsPath.compose( L"...%S", &gst.gstr()[offset] ) ;
       }
    }  //* End ecoPathFit() *
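    Column counts like those returned by both gscols() overloads can be obtained from the POSIX wcwidth() function. The following is a hypothetical sketch, not gString's implementation. wcwidth() is locale dependent, so the application should call setlocale(LC_ALL, "") before using it; otherwise two-column characters (for example, CJK characters) are not recognized. As an assumption, this sketch counts non-printing characters, for which wcwidth() returns -1, as zero columns.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <wchar.h>   // wcwidth() (POSIX)

// Hypothetical sketch (NOT gString's code): fill 'widths' with one column
// count per character and return the total number of display columns.
short columnCountSketch(const std::wstring& ws, std::vector<short>& widths)
{
   widths.clear();
   short total = 0;
   for (wchar_t wc : ws)
   {
      int w = wcwidth(wc);
      short cols = static_cast<short>(w < 0 ? 0 : w);   // non-printing: 0 columns
      widths.push_back(cols);
      total += cols;
   }
   return total;
}
```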
    


  • bool isASCII ( void ) ;
      Input  :
         none
      Returns:
         'true' if data are pure, 7-bit ASCII, else 'false'
    

    Scan the data to determine whether it is pure, 7-bit ASCII.
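    The following sketches the test, assuming that the data are held as wide characters. The text is pure 7-bit ASCII exactly when every codepoint is below 0x80. isASCIISketch() is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch (NOT gString's code): true if every codepoint is < 0x80.
bool isASCIISketch(const std::wstring& ws)
{
   for (wchar_t wc : ws)
      if (static_cast<unsigned long>(wc) > 0x7F)
         return false;
   return true;
}
```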




gString Miscellaneous

  • void gString::clear ( void ) ;
      Input  :
         none
      Returns:
         nothing
    

    Reset contents to an empty string i.e. "". The data will consist of a single, NULLCHAR character. The character and byte counts are set to 1 (one), and the column count is zero.



  • const char* Get_gString_Version ( void ) const ;
      Input  :
         none
      Returns:
         pointer to version string
    

    Return a pointer to the gString class version-number string.



  • void gString::dbMsg ( gString& gsmsg ) ;
      Input  :
         gsmsg : (caller's object, by reference)
                 receives most recent debug message
      Returns:
         nothing
    

    FOR DEBUGGING ONLY! Application can retrieve most recent debug message.
    Note: This method is visible only if the ENABLE_GS_DEBUG flag is set in gString.hpp.




gString Examples

The NcDialog API test application ’Dialogw’ contains extensive examples of gString usage, including working copies of the examples used in this chapter.
The other NcDialog API test applications also use gString in various ways.

Here, we show just a sample of some basic uses for the gString class.

  1. Convert a UTF-8 (8-bit) character string to a wchar_t (32-bit) character string.
    const char* some_UTF8_data = "I want to buy an hamburger." ;
    wchar_t some_wide_data[gsMAXCHARS] ;
    
    gString gs( some_UTF8_data ) ;
    gs.copy( some_wide_data, gsMAXCHARS ) ;
    
  2. Convert a wchar_t (32-bit) character string to a UTF-8 (8-bit) character string.
    const wchar_t* some_wide_data = L"I want to buy an hamburger." ;
    char some_UTF8_data[gsMAXBYTES] ;
    
    gString gs( some_wide_data ) ;
    gs.copy( some_UTF8_data, gsMAXBYTES ) ;
    
  3. Concatenate strings.
    const char* Head = "Where" ;
    const wchar_t* Tail = L"is Carmen Sandiego?" ;
    gString gs( L" in the world " ) ;
    gs.insert( Head, ZERO ) ;
    gs.append( Tail ) ;
    wcout << gs << endl ;
     - - ->  Where in the world is Carmen Sandiego?
    
  4. Create formatted string data.
    const char* utf8String = "We present" ;
    const wchar_t* wideString = L"for your enjoyment:" ;
    const char utf8Char = 'W' ;
    const wchar_t wideChar = L'C' ;
    short int ways = 100 ;
    double dish = 17.57 ;
    gString gs ;
    
    gs.compose( "%s %S %hd %cays to %Cook %Chicken,\n"
                "and %.2lf side dishes and beverages!",
                utf8String,
                wideString,
                &ways,
                &utf8Char,
                &wideChar, &wideChar,
                &dish ) ;
    
    wcout << gs << endl ;
     - - ->  We present for your enjoyment: 100 Ways to Cook Chicken,
             and 17.57 side dishes and beverages!
    

    Important Note: All parameters are pointers to the data: For strings (and pointers), the address is the name of the variable. For all other data types, including single characters, use the address-of ('&') operator.

  5. Count display columns to make data fit the window.

    This is a formatting method taken from the ’Dialogx’ test application. It breaks a text stream into lines that fit within the diagnostic window. It’s not speedy (or particularly smart), but it demonstrates the use of ’gString’ to calculate the space needed to display text data and then to format the data to fit the space.

    //*  FormatOutput   *
    //* Input  : gsOut     : semi-formatted source data
    //*          wpos      : start position for display
    //*          lineCount : maximum display lines before truncation
    //* Returns: nothing
    
    void dXClip::FormatOutput ( const gString& gsOut, 
                                winPos& wpos, short lineCount )
    {
       short tWidth = (ddCOLS - 2),        // inside width of target window
             maxWidth = tWidth * lineCount,// max columns for message
             tLines = ZERO ;               // loop counter
    
       gString gsw( gsOut.gstr() ), gso ;
       if ( gsw.gscols() > maxWidth )
       { // truncate the string if necessary
          gsw.limitCols ( maxWidth - 3 ) ;
          gsw.append( L"..." ) ;
       }
       do
       {  // break the source into convenient widths
          gso = gsw ;
          gso.limitCols( tWidth ) ;
          gso.append( L'\n' ) ;
          gsw.shiftCols( -tWidth ) ;
          this->dpd->ClearLine ( wpos.ypos ) ; // clear target display line
          wpos = this->dpd->WriteParagraph ( wpos, gso, 
                                             dxPtr->dColor, true ) ;
       }
       while ( ++tLines < lineCount && gsw.gschars() > 1 ) ; // stop at lineCount lines or when no text remains
    
    }  //* End FormatOutput() *
    

    The example above is a simple one. To see it in action, please refer to the 'Dialogw' NcDialog API Test Application, test seven (7).
    For a more sophisticated example, see the automatic word-wrapping method 'DialogTextbox::mlFmtDisplay' in the NcDialog API source code.
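    For readers who want to experiment outside NcDialog, the same truncate-then-slice approach can be sketched in standard C++ without gString. This is a hypothetical illustration, not NcDialog code: the names wrapToColumns, limitCols, stringCols and charCols are invented here, and the width of each character comes from the POSIX wcwidth() function. The sketch assumes the program has set a locale (for example with setlocale(LC_ALL, "")) so that double-width characters report their true width.

```cpp
#include <wchar.h>   // wcwidth() (POSIX/XSI; declared by glibc's <wchar.h>)
#include <string>
#include <vector>

// Display width of one character. Non-printable or unknown characters
// (wcwidth() < 0) are counted as one column in this sketch.
static int charCols(wchar_t wc)
{
   int w = ::wcwidth(wc);
   return (w < 0) ? 1 : w;
}

// Display width of a whole string, in columns.
static int stringCols(const std::wstring& s)
{
   int cols = 0;
   for (wchar_t wc : s) cols += charCols(wc);
   return cols;
}

// Hypothetical helper: keep the leading characters that fit in 'maxCols'
// columns, never splitting a double-width character.
static std::wstring limitCols(const std::wstring& s, int maxCols)
{
   std::wstring out;
   int cols = 0;
   for (wchar_t wc : s)
   {
      int w = charCols(wc);
      if (cols + w > maxCols) break;
      out += wc;
      cols += w;
   }
   return out;
}

// Break 'src' into at most 'maxLines' lines of at most 'width' columns each.
// Text longer than width * maxLines columns is truncated and marked "...".
std::vector<std::wstring> wrapToColumns(const std::wstring& src,
                                        int width, int maxLines)
{
   std::vector<std::wstring> lines;
   if (width <= 0 || maxLines <= 0) return lines;

   std::wstring rest = src;
   const int maxCols = width * maxLines;
   if (stringCols(rest) > maxCols && maxCols >= 3)
      rest = limitCols(rest, maxCols - 3) + L"...";

   // The line-count guard also covers the case where double-width
   // characters leave some lines one column short.
   while (!rest.empty() && static_cast<int>(lines.size()) < maxLines)
   {
      std::wstring line = limitCols(rest, width);
      if (line.empty())               // character wider than the line:
         line = rest.substr(0, 1);    // emit it anyway to guarantee progress
      lines.push_back(line);
      rest.erase(0, line.size());     // drop the characters just emitted
   }
   return lines;
}
```

    For example, wrapToColumns(L"abcdefghij", 4, 2) yields the two lines "abcd" and "e...". Unlike FormatOutput, the sketch returns the lines instead of writing them to a window, so the caller decides how and where to display them.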








Technical Support

Please Note: All trademarks and service marks mentioned in this
document are the entirely-too-proprietary property of their
respective owners, and this author makes no representation of
affiliation with or ownership of any of the damned things.

Contact

The NcDialog family of classes, link library, demonstration apps and all associated Texinfo documentation were written and are maintained by:

    Mahlon R. Smith, The Software Samurai
    Beijing University of Technology
    on the web at: www.SoftwareSam.us

For bugs, suggestions, periodic updates, or possible praise, please post a message to the author via the website. The author wishes to thank everyone for their intelligent, kind and thoughtful responses. (ranters I can live without)


By the same author

The NcDialog-class link library, the FileMangler file management utility, and other utilities by the same author are also available through the website.

The NcDialog-class link library is a C++ API built on the ncurses C-language function library installed on most Linux systems. NcDialog provides console (text-based) applications with a simple and intuitive way to construct a dialog-based user interface without having to know much about the ncurses primitives (or dialog construction, for that matter). NcDialog has many built-in interface controls which make almost any user interface task a painless operation. The NcDialog-class library and source code are available on the author’s website.






Index

Index Entry  Section

0
06.11.04 Wayland Clipboard: Wayland Clipboard
06.11.05 Wayclip Demo App: Wayclip Demo App
07 gString Text Tool: gString Text Tool
07.01 Introduction to gString: Introduction to gString
07.02 gString Public Methods: gString Public Methods
07.03 gString Instantiation: gString Instantiation
07.04 Assignment Operators: Assignment Operators
07.05 Formatted Assignments: Formatted Assignments
07.06 Integer Formatting: Integer Formatting
07.07 Data Access: Data Access
07.08 Copying Data: Copying Data
07.09 Modifying Existing Data: Modifying Existing Data
07.10 Comparisons: Comparisons
07.11 Extract Formatted Data: Extract Formatted Data
07.12 Statistical Info: Statistical Info
07.13 gString Miscellaneous: gString Miscellaneous
07.14 gString Examples: gString Examples
09 Technical Support: Technical Support

1
10 Copyright Notice: Copyright Notice
10.01 GNU General Public License: GNU General Public License
10.02 GNU Free Documentation License: GNU Free Documentation License

B
BiDi text: Modifying Existing Data

C
clipboard access, public methods: Wayland Clipboard
contact info: Technical Support
contact information: Technical Support

F
fiUnits enumerated type: Integer Formatting
formatInt field overflow: Integer Formatting

G
gString Docs: Top
gString methods: gString Public Methods
gString text conversion: gString Text Tool

L
locale, set: Integer Formatting

M
methods, gString: gString Public Methods

R
RTL text: Modifying Existing Data

S
set locale: Integer Formatting
support: Technical Support
swscanf emulation: Extract Formatted Data

T
text conversion, gString: gString Text Tool

W
wayclip demo app: Wayclip Demo App
wayland clipboard: Wayland Clipboard
WaylandCB class: Wayland Clipboard
WaylandCB Docs: Top
WaylandCB public methods: Wayland Clipboard