Qore CsvUtil Module Reference  1.5
CsvUtil::AbstractCsvIterator Class Reference (abstract)

The AbstractCsvIterator class is an abstract base class that allows CSV data from an arbitrary source (provided by a subclass) to be iterated.


Public Member Functions

 constructor (*hash opts)
 creates the AbstractCsvIterator with an option hash in single-type mode
 
 constructor (hash spec, hash opts)
 creates the AbstractCsvIterator with an option hash in multi-type mode
 
private *string getDataName ()
 Returns the name of the input data.
 
*list getHeaders ()
 Returns the current record headers or NOTHING if no headers have been detected or saved yet.
 
*list getHeaders (string type)
 Returns a list of headers for the given record or NOTHING if the record is not recognized.
 
private list getLineAndSplit ()
 Read line split by separator/quote into list.
 
abstract private string getLineValueImpl ()
 Returns the current line.
 
string getQuote ()
 Returns the current quote string.
 
hash getRecord (bool extended)
 Returns the current record as a hash.
 
hash getRecord ()
 Returns the current record as a hash.
 
any getRecordList ()
 Returns the current record as a list.
 
string getSeparator ()
 Returns the current separator string.
 
hash getValue ()
 Returns the current record as a hash.
 
string identifyType (list rec)
 Identifies the record type of an input line using identifyTypeImpl(); may be overridden if necessary.
 
private *string identifyTypeImpl (list rec)
 Identifies an input record, given the list of fields split from the raw line. This method performs a lookup in a precalculated table based on the number of fields (see constructor()). If different criteria are needed, e.g. when two record types in a spec have the same number of fields and no unique resolving rule is specified, this method must be overridden; otherwise it will throw an exception because the precalculated mapping will be empty.
 
int index ()
 Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped.
 
int lineNumber ()
 Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element.
 
abstract private int lineNumberImpl ()
 Returns the current line number.
 
any memberGate (string name)
 Returns the given column value for the current row.
 
bool next ()
 Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate.
 
abstract private bool nextLineImpl ()
 Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate.
 
private hash parseLine ()
 Parses a line in the file and returns a processed list of the fields.
 
private prepareFieldsFromHeaders (*list headers)
 Matches the headers provided in the CSV header or in the options; never called in multi-type mode because "header_names" is False.
 
private processCommonOptions (*hash opts, int C_OPTx)
 Processes common options and assigns internal fields.
 
private processSpec (hash spec)
 Processes the specification and assigns internal data for record type resolution.
 

Private Attributes

const Options
 valid options for the object (a hash for quick lookups of valid keys)
 

Additional Inherited Members

- Private Member Functions inherited from CsvHelper
private list adjustFieldsFromHeaders (string type, *list headers)
 
private checkType (string fld_errs, string key, string value)
 validate field type
 
 constructor (string n_errname)
 
private bool isMultiType ()
 returns True if the specification hash defines more than one record type
 

Detailed Description

The AbstractCsvIterator class is an abstract base class that allows CSV data from an arbitrary source to be iterated.

AbstractCsvIterator Constructor Option Hash Overview

The AbstractCsvIterator class constructor takes an optional hash with possible keys given in the following table. Note that key names are case-sensitive, and data types are soft (conversions are made when possible).

AbstractCsvIterator Options

Option Data Type Description
"date_format" string the default date format for "date" fields (see date formatting for the value in this case)
"encoding" string the character encoding for the file (and for tagging string data read); if the value of this key is not a string then it will be ignored
"separator" string the string separating the fields in the file (default: ",")
"quote" string the field quote character (default: '"')
"eol" string the end of line character(s) (default: auto-detect); if the value of this key is not a string then it will be ignored
"ignore_empty" bool if True (the default) then empty lines will be ignored; this option is processed with parse_boolean()
"ignore_whitespace" bool if True (the default) then leading and trailing whitespace will be stripped from non-quoted fields; this option is processed with parse_boolean()
"header_names" bool if True then the object will parse the header names from the first header row; in this case "header_lines" must be > 0. In multi-type mode, "header_names" must be False.
"header_lines" int the number of header lines in the file (must be > 0 if "header_names" is True)
"header_reorder" bool if True (the default), then when "headers" are provided in the options or read from the file, the data fields are reordered to follow the headers. This has a major effect on the return value of AbstractCsvIterator::getRecordList() and a minor effect on the hash returned by AbstractCsvIterator::getRecord() when program code depends on the order of keys. If False, then fields not yet specified are appended to the end of the field definition.
"verify_columns" bool if True (the default is False), then a CSVFILEITERATOR-DATA-ERROR exception is thrown if a line is parsed with a different column count than other lines
"timezone" string the timezone to use when parsing dates (will be passed to Qore::TimeZone::constructor())
"tolwr" bool if True then all header names will be converted to lower case letters
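The interaction between "header_reorder", "tolwr", and the field definitions is easiest to see in a small model. The following Python sketch is not the Qore CsvUtil API; Python is used only as a neutral modeling language. The function name field_order() is hypothetical, and so is the handling of defined fields that are missing from the headers (they are kept at the end).

```python
from typing import List


def field_order(fields: List[str], headers: List[str],
                header_reorder: bool = True, tolwr: bool = False) -> List[str]:
    """Return the column order a record would be produced in (hypothetical model)."""
    if tolwr:
        # "tolwr": header names are converted to lower case
        headers = [h.lower() for h in headers]
    if header_reorder:
        # follow the header order; assumption: defined fields that do not
        # appear in the headers are kept at the end in definition order
        return list(headers) + [f for f in fields if f not in headers]
    # keep the definition order and append headers that were not yet defined
    return list(fields) + [h for h in headers if h not in fields]
```

With header_reorder=False, the field definition order wins, which is why getRecordList() is the call most affected by this option.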

AbstractCsvIterator Single-type-only Options

Option Data Type Description
"headers" list of strings list of header / column names for the data iterated; if this is present, then "header_names" must be False.
"fields" hash the keys are column names (or numbers if column names are not used) and the values are either strings (one of the Option Field Types giving the data type for the field) or an Option Field Hash describing the field; also sets the headers if they are not set automatically with "header_names"; if no field type is given, the default is "*string"

AbstractCsvIterator Multi-type-only Options

Option Data Type Description
"extended_record" bool if True, then the get methods return an extended hash with "type" and "record" keys that gives the record type to the caller (default: False)
Note
the following dash-separated option names are still supported for backwards compatibility:
  • "date-format"
  • "ignore-empty"
  • "ignore-whitespace"
  • "header-names"
  • "header-lines"
  • "verify-columns"
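Supporting the legacy names amounts to mapping each dash-separated key to its underscore form. The following Python sketch is a hypothetical model and not the Qore implementation. The names LEGACY_OPTION_NAMES and normalize_options() are illustrative. The rule that a later key wins when both spellings are given is an assumption of this model.

```python
from typing import Any, Dict

# dash-separated legacy names listed in the note above, mapped to current names
LEGACY_OPTION_NAMES = {
    "date-format": "date_format",
    "ignore-empty": "ignore_empty",
    "ignore-whitespace": "ignore_whitespace",
    "header-names": "header_names",
    "header-lines": "header_lines",
    "verify-columns": "verify_columns",
}


def normalize_options(opts: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of opts with legacy dash-separated keys renamed (hypothetical helper)."""
    # if both spellings are present, whichever comes later in opts wins in this model
    return {LEGACY_OPTION_NAMES.get(k, k): v for k, v in opts.items()}
```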

Option Field Types

Fields are defined in the order in which the user program expects the data, and the get methods return data in this order. There are two exceptions: when the "headers" option is given, the field definitions are sorted so that the data follows the order of "headers"; and when the header names are read from the CSV file header.

AbstractCsvIterator Option Field Types

Name Description
"int" the value will be unconditionally converted to an integer using the Qore::int() function
"*int" the value will be converted to NOTHING if empty, otherwise it will be converted to an integer using the Qore::int() function
"float" the value will be unconditionally converted to a floating-point value using the Qore::float() function
"*float" the value will be converted to NOTHING if empty, otherwise it will be converted to a floating-point value using the Qore::float() function
"number" the value will be unconditionally converted to an arbitrary-precision number value using the Qore::number() function
"*number" the value will be converted to NOTHING if empty, otherwise it will be converted to an arbitrary-precision number value using the Qore::number() function
"string" (the default) the value remains a string; no transformation is done on the input data
"*string" the value will be converted to NOTHING if empty, otherwise, it remains a string
"date" in this case dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below
"*date" the value will be converted to NOTHING if empty, otherwise dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below
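The difference between a plain type and its "*" variant is only in how an empty value is treated. The following Python sketch models the numeric and string types from the table above. None stands in for Qore's NOTHING. It is not the Qore API: convert_field() and its helpers are hypothetical. The lenient numeric-prefix parsing only approximates Qore::int(), Qore::float() and Qore::number(), whose exact edge-case behavior may differ. The "date" types are not modeled.

```python
import re
from decimal import Decimal
from typing import Any, Optional

# leading numeric prefix; an approximation of lenient string-to-number conversion
_INT_RE = re.compile(r"\s*([+-]?\d+)")
_NUM_RE = re.compile(r"\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)")


def _to_int(text: str) -> int:
    m = _INT_RE.match(text)
    return int(m.group(1)) if m else 0


def _to_float(text: str) -> float:
    m = _NUM_RE.match(text)
    return float(m.group(1)) if m else 0.0


def _to_number(text: str) -> Decimal:
    m = _NUM_RE.match(text)
    return Decimal(m.group(1)) if m else Decimal(0)


_CONVERTERS = {"int": _to_int, "float": _to_float, "number": _to_number, "string": str}


def convert_field(value: str, field_type: str = "string") -> Optional[Any]:
    """Apply one of the field types above; None plays the role of Qore's NOTHING."""
    optional = field_type.startswith("*")
    base = field_type[1:] if optional else field_type
    if optional and value == "":
        return None  # "*" types turn an empty value into NOTHING
    return _CONVERTERS[base](value)  # plain types always convert
```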

Option Field Hash

See here for an example of using the hash field description in the constructor().

AbstractCsvIterator Option Field Hash and Spec Hash

The field specification is provided via the "fields" option for the single-type constructor, or as a separate spec parameter for the multi-type constructor.

Key Value Description
"type" one of the option type values giving the field type
"format" used only with the "date" type; this is a date/time format mask for parsing dates
"timezone" used only with the "date" type; this value is passed to Qore::TimeZone::constructor() and the resulting timezone is used to parse the date (this value overrides any default time zone for the object; use only in the rare case that date/time values from different time zones are present in different columns of the same file)
"code" this is a closure or call reference that takes a single argument of the value (after formatting with any optional "type" formats) and returns the value that will be output for the field
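In the order described above, "type" is applied first and "code" then receives the already-typed value. The following Python sketch models only that ordering. The name apply_field_spec() and its tiny converter table are hypothetical. Two choices are also assumptions of this model: "code" is still called when the value is NOTHING (None), and a missing "type" defaults to "*string".

```python
from typing import Any, Callable, Dict, Union

FieldSpec = Union[str, Dict[str, Any]]

# hypothetical minimal converters covering only a few of the field types
_CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "int": lambda v: int(v) if v else 0,
    "string": lambda v: v,
}


def apply_field_spec(raw: str, spec: FieldSpec) -> Any:
    """Convert a raw field value according to a field description (model only)."""
    if isinstance(spec, str):
        # a plain type name is shorthand for {"type": name}
        spec = {"type": spec}
    ftype = spec.get("type", "*string")
    if ftype.startswith("*") and raw == "":
        value = None
    else:
        value = _CONVERTERS[ftype.lstrip("*")](raw)
    code = spec.get("code")
    # "code" receives the already-typed value; its return value is what is output
    return code(value) if code is not None else value
```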

Extra AbstractCsvIterator Spec Hash Options

Key Data Type Value Description
value string the value to use to compare to input data when determining the record type; if "value" is defined for a field, then "regex" cannot be defined (for iterator only)
regex string the regular expression to apply to input data lines when determining the record type (for iterator only)
header string the field name as defined in the CSV header line; it allows a CSV column to be remapped to a different field name
index int the index of the field in the CSV file; it allows mapping when the CSV data has no header
default any Default output value (for writers only)
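The "value" and "regex" keys let the iterator tell record types apart by content rather than by column count. The following Python sketch is a hypothetical model of that idea. The names SPEC and identify_by_rule() are illustrative. Several behaviors are assumptions not stated in this reference: every rule of a type must match, the first matching type wins, regexes are searched anywhere in the field, and "index" overrides the field's position in the spec.

```python
import re
from typing import Any, Dict, List, Optional

# hypothetical multi-type spec: record type -> field name -> field description
SPEC: Dict[str, Dict[str, Dict[str, Any]]] = {
    "header": {"kind": {"type": "string", "value": "H"}, "date": {"type": "string"}},
    "line": {"kind": {"type": "string", "value": "L"}, "qty": {"type": "int"}},
    "trailer": {"kind": {"type": "string", "regex": "^T"}, "count": {"type": "int"}},
}


def identify_by_rule(rec: List[str], spec: Dict[str, Dict[str, Dict[str, Any]]]) -> Optional[str]:
    """Return the first record type whose "value"/"regex" rules all match rec."""
    for type_name, fields in spec.items():
        rules_found = False
        matched = True
        for pos, fdef in enumerate(fields.values()):
            idx = fdef.get("index", pos)  # "index" overrides the positional order
            if "value" not in fdef and "regex" not in fdef:
                continue
            rules_found = True
            cell = rec[idx] if idx < len(rec) else None
            if cell is None:
                matched = False
            elif "value" in fdef:
                matched = cell == fdef["value"]
            else:
                matched = re.search(fdef["regex"], cell) is not None
            if not matched:
                break
        if rules_found and matched:
            return type_name
    return None
```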

Member Function Documentation

CsvUtil::AbstractCsvIterator::constructor ( *hash  opts)

creates the AbstractCsvIterator with an option hash in single-type mode

Parameters
opts: a hash of optional options; see AbstractCsvIterator Constructor Option Hash Overview for more information
Exceptions
ABSTRACTCSVITERATOR-ERROR: invalid or unknown option; invalid data type for an option; "header_names" is True and "header_lines" is 0 or "headers" is also present; unknown field type
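The option conflicts listed under Exceptions can be checked before the object is created. The following Python sketch returns the problems that the constructor would report. It is a hypothetical model, not the Qore implementation. The names VALID_SINGLE_TYPE_OPTIONS and validate_single_type_options() are illustrative. The model lists only the common and single-type option names, so it does not accept the legacy dash-separated names, and it does not check data or field types.

```python
from typing import Any, Dict, List

# option names from the option tables above (common and single-type options only)
VALID_SINGLE_TYPE_OPTIONS = {
    "date_format", "encoding", "separator", "quote", "eol", "ignore_empty",
    "ignore_whitespace", "header_names", "header_lines", "header_reorder",
    "verify_columns", "timezone", "tolwr", "headers", "fields",
}


def validate_single_type_options(opts: Dict[str, Any]) -> List[str]:
    """Return the problems that would make the constructor raise (model only)."""
    problems = [f"unknown option {k!r}" for k in opts if k not in VALID_SINGLE_TYPE_OPTIONS]
    if opts.get("header_names"):
        if int(opts.get("header_lines", 0)) <= 0:
            problems.append("header_names is True but header_lines is 0")
        if "headers" in opts:
            problems.append("header_names is True but headers is also present")
    return problems
```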
CsvUtil::AbstractCsvIterator::constructor ( hash spec, hash opts )

creates the AbstractCsvIterator with an option hash in multi-type mode

Parameters
spec: a hash of field and type definitions; see Option Field Hash for more information
opts: a hash of optional options; see AbstractCsvIterator Constructor Option Hash Overview for more information
*list CsvUtil::AbstractCsvIterator::getHeaders ( )

Returns the current record headers or NOTHING if no headers have been detected or saved yet.

Example:
*list l = i.getHeaders();
Note
if headers are not saved against the object in the constructor(), then they are written to the object after the first call to next()
*list CsvUtil::AbstractCsvIterator::getHeaders ( string  type)

Returns a list of headers for the given record or NOTHING if the record is not recognized.

Example:
*list l = i.getHeaders(my_type);
string CsvUtil::AbstractCsvIterator::getQuote ( )

Returns the current quote string.

Example:
string quote = i.getQuote();
Returns
the current quote string
hash CsvUtil::AbstractCsvIterator::getRecord ( bool  extended)

Returns the current record as a hash.

Example:
hash h = i.getRecord(True);
Parameters
extended: specifies whether the result is an extended hash including "type" and "record"
Returns
the current record as a hash; if extended is True, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a hash of the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method
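The extended form wraps the ordinary record hash together with its record type. The following Python sketch shows only the shape of the two return values. The function get_record() and its parameters are hypothetical and not the Qore method.

```python
from typing import Any, Dict, List


def get_record(record_type: str, headers: List[str], values: List[Any],
               extended: bool = False) -> Dict[str, Any]:
    """Build the hash returned for the current record (model of the documented shape)."""
    record = dict(zip(headers, values))
    if extended:
        # extended form: the record type plus the record itself
        return {"type": record_type, "record": record}
    return record
```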
hash CsvUtil::AbstractCsvIterator::getRecord ( )

Returns the current record as a hash.

Example:
hash h = i.getRecord();
Returns
the current record as a hash; when the "extended_record" option is set, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a hash of the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method
any CsvUtil::AbstractCsvIterator::getRecordList ( )

Returns the current record as a list.

Example:
list l = i.getRecordList();

When the "extended_record" option is set, the result is an extended hash with "type" and "record" keys.

Returns
the current record as a list of field values; when the "extended_record" option is set, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a list of field values for the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method
string CsvUtil::AbstractCsvIterator::getSeparator ( )

Returns the current separator string.

Example:
string sep = i.getSeparator();
Returns
the current separator string
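The separator and quote strings, together with "ignore_whitespace", determine how a raw line becomes a list of fields. The following Python sketch is a simplified, hypothetical splitter and not the Qore parser. split_line() and _finish() are illustrative names. This model does not treat doubled quotes inside a quoted field as an escaped quote character, and it leaves any field that contains a quoted section unstripped.

```python
from typing import List


def split_line(line: str, separator: str = ",", quote: str = '"',
               ignore_whitespace: bool = True) -> List[str]:
    """Split one CSV line on separator, honoring quote (simplified model)."""
    fields: List[str] = []
    buf: List[str] = []
    in_quotes = False   # currently inside a quoted section
    was_quoted = False  # the current field contained a quoted section
    i = 0
    while i < len(line):
        if line.startswith(quote, i):
            in_quotes = not in_quotes
            was_quoted = True
            i += len(quote)
        elif not in_quotes and line.startswith(separator, i):
            fields.append(_finish(buf, was_quoted, ignore_whitespace))
            buf, was_quoted = [], False
            i += len(separator)
        else:
            buf.append(line[i])
            i += 1
    fields.append(_finish(buf, was_quoted, ignore_whitespace))
    return fields


def _finish(buf: List[str], was_quoted: bool, ignore_whitespace: bool) -> str:
    text = "".join(buf)
    # "ignore_whitespace" only strips fields that were not quoted
    return text.strip() if ignore_whitespace and not was_quoted else text
```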
hash CsvUtil::AbstractCsvIterator::getValue ( )

Returns the current record as a hash.

Example:
hash h = i.getValue();
Returns
the current record as a hash; when the "extended_record" option is set, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a hash of the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method
string CsvUtil::AbstractCsvIterator::identifyType ( list  rec)

Identifies the record type of an input line using identifyTypeImpl(); may be overridden if necessary.

Parameters
rec: the input line record to be identified
Returns
the name of the record corresponding to the input line
Exceptions
ABSTRACTCSVITERATOR-ERROR: the input line cannot be matched to a known record
private *string CsvUtil::AbstractCsvIterator::identifyTypeImpl ( list  rec)

Identifies an input record, given the list of fields split from the raw line. This method performs a lookup in a precalculated table based on the number of fields (see constructor()). If different criteria are needed, e.g. when two record types in a spec have the same number of fields and no unique resolving rule is specified, this method must be overridden; otherwise it will throw an exception because the precalculated mapping will be empty.

Parameters
rec: the input line record to be identified
Returns
the record name or NOTHING if the input cannot be matched
Exceptions
ABSTRACTCSVITERATOR-ERROR: the input line cannot be matched to a known record
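The default lookup described above maps a field count to a record type and breaks down when two types share a count. The following Python sketch is a hypothetical model of that table. build_count_table() and identify_type() are illustrative names, and ValueError stands in for ABSTRACTCSVITERATOR-ERROR. Following the wording above, the model leaves the whole mapping empty on any ambiguity.

```python
from typing import Any, Dict, List, Optional


def build_count_table(spec: Dict[str, Dict[str, Any]]) -> Dict[int, str]:
    """Map field count -> record type; empty if two types share a field count."""
    table: Dict[int, str] = {}
    for type_name, fields in spec.items():
        count = len(fields)
        if count in table:
            # ambiguous: this model leaves the whole mapping empty
            return {}
        table[count] = type_name
    return table


def identify_type(rec: List[str], table: Dict[int, str]) -> str:
    """Model of identifyType(): raise when identifyTypeImpl() would return NOTHING."""
    type_name: Optional[str] = table.get(len(rec))
    if type_name is None:
        raise ValueError(f"input line with {len(rec)} fields cannot be matched to a known record")
    return type_name
```

When the table comes back empty, a subclass would need to override identifyTypeImpl() or the spec would need "value"/"regex" rules.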
int CsvUtil::AbstractCsvIterator::index ( )

Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped.

Example:
int index = i.index();
Returns
the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped
See Also
lineNumber()
Since
CsvUtil 1.1
int CsvUtil::AbstractCsvIterator::lineNumber ( )

Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element.

Example:
while (i.next()) {
    printf("+ line %d: %y\n", i.lineNumber(), i.getValue());
}
Returns
returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element
Note
equivalent to Qore::FileLineIterator::index()
See Also
index()
Since
CsvUtil 1.1
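The difference between index() and lineNumber() comes from header rows and skipped blank lines. Both advance the line number but not the row index. The following Python sketch is a hypothetical model; iterate() is an illustrative name. That the row index starts at 1 is an assumption, since this reference does not state the starting value.

```python
from typing import Iterator, List, Tuple


def iterate(lines: List[str], header_lines: int = 1,
            ignore_empty: bool = True) -> Iterator[Tuple[int, int, str]]:
    """Yield (index, line_number, text) for every data row (model only)."""
    index = 0
    for line_number, text in enumerate(lines, start=1):  # the first line is line 1
        if line_number <= header_lines:
            continue  # header rows advance the line number but not the index
        if ignore_empty and text == "":
            continue  # skipped blank lines do the same
        index += 1  # assumption: the row index starts at 1 in this model
        yield index, line_number, text
```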
any CsvUtil::AbstractCsvIterator::memberGate ( string  name)

Returns the given column value for the current row.

Parameters
name: the name of the field (header name) in the record
Returns
the value of the given header for the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method
ABSTRACTCSVITERATOR-FIELD-ERROR: invalid or unknown field name given
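In Qore, memberGate() is called when code accesses a member that does not exist on the object. This is what lets a column value of the current row be read as if it were a member. Python's __getattr__ has the same "only on failed lookup" semantics, so the following sketch uses it as an analogy. RowView is a hypothetical class, not part of CsvUtil, and AttributeError stands in for ABSTRACTCSVITERATOR-FIELD-ERROR.

```python
from typing import Any, Dict


class RowView:
    """Stand-in for memberGate(): unknown attributes are looked up as column names."""

    def __init__(self, record: Dict[str, Any]) -> None:
        self._record = record

    def __getattr__(self, name: str) -> Any:
        # like memberGate(), only called when normal attribute lookup fails
        record = self.__dict__.get("_record", {})
        if name in record:
            return record[name]
        raise AttributeError(f"unknown field {name!r}")
```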
bool CsvUtil::AbstractCsvIterator::next ( )

Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate.

This method will return True again after it returns False once if the file being iterated has data that can be iterated, otherwise it will always return False. The iterator object should not be used to retrieve a value after this method returns False.

Returns
False if there are no lines / records to iterate (in which case the iterator object is invalid and should not be used); True if successful (meaning that the iterator object is valid)
Note
if headers are not given as an option to the constructor, then they are detected and set the first time AbstractCsvIterator::next() is called (see getHeaders())
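The iteration protocol above works as follows. Headers are detected on the first next(). After next() returns False, the position is invalid. A further next() starts again from the beginning if there is data. The following Python sketch is a hypothetical in-memory model of this protocol and not the Qore class. ListCsvIterator, get_headers() and get_value() are illustrative names. Splitting here is a plain str.split() without quote handling.

```python
from typing import Any, Dict, List, Optional


class ListCsvIterator:
    """In-memory model of the next() / getHeaders() / getValue() protocol (not the Qore class)."""

    def __init__(self, lines: List[str], header_names: bool = True,
                 header_lines: int = 1, separator: str = ",") -> None:
        self._lines = lines
        self._header_names = header_names
        self._header_lines = header_lines
        self._sep = separator
        self._pos = 0
        self._headers: Optional[List[str]] = None
        self._current: Optional[List[str]] = None

    def get_headers(self) -> Optional[List[str]]:
        # None until the first next() when headers are read from the data
        return self._headers

    def next(self) -> bool:
        if self._pos == 0:
            if self._header_names and self._lines:
                self._headers = self._lines[0].split(self._sep)
            self._pos = self._header_lines
        if self._pos >= len(self._lines):
            # invalid position; the next call starts again from the beginning
            self._current = None
            self._pos = 0
            return False
        self._current = self._lines[self._pos].split(self._sep)
        self._pos += 1
        return True

    def get_value(self) -> Dict[Any, str]:
        if self._current is None:
            raise RuntimeError("INVALID-ITERATOR")
        keys = self._headers if self._headers is not None else range(len(self._current))
        return dict(zip(keys, self._current))
```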