Qore CsvUtil Module Reference 1.5
The AbstractCsvIterator class is an abstract base class that allows abstract CSV data to be iterated.
Public Member Functions
Modifiers and Return Type | Method | Description |
(none) | constructor (*hash opts) | creates the AbstractCsvIterator with an option hash in single-type mode |
(none) | constructor (hash spec, hash opts) | creates the AbstractCsvIterator with an option hash in multi-type mode |
private *string | getDataName () | Returns the name of the input data. |
*list | getHeaders () | Returns the current record headers or NOTHING if no headers have been detected or saved yet. |
*list | getHeaders (string type) | Returns a list of headers for the given record or NOTHING if the record is not recognized. |
private list | getLineAndSplit () | Reads a line and splits it by separator/quote into a list. |
abstract private string | getLineValueImpl () | Returns the current line. |
string | getQuote () | Returns the current quote string. |
hash | getRecord (bool extended) | Returns the current record as a hash. |
hash | getRecord () | Returns the current record as a hash. |
any | getRecordList () | Returns the current record as a list. |
string | getSeparator () | Returns the current separator string. |
hash | getValue () | Returns the current record as a hash. |
string | identifyType (list rec) | Identify a fixed-length line type using identifyTypeImpl(); may be overridden if necessary. |
private *string | identifyTypeImpl (list rec) | Identify an input record, given the raw line string. This method performs a lookup in a precalculated table based on the number of records (see constructor()). If different criteria are needed, e.g. when two line types in a spec have the same record number and no unique resolving rule is specified, this method must be overridden; otherwise it will throw an exception because the precalculated mapping will be empty. |
int | index () | Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped. |
int | lineNumber () | Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element. |
abstract private int | lineNumberImpl () | Returns the current line number. |
any | memberGate (string name) | Returns the given column value for the current row. |
bool | next () | Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate. |
abstract private bool | nextLineImpl () | Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate. |
private hash | parseLine () | Parses a line in the file and returns a processed list of the fields. |
private | prepareFieldsFromHeaders (*list headers) | Matches headers provided in the CSV header or in options; never called in multi-type mode because "header_names" is False. |
private | processCommonOptions (*hash opts, int C_OPTx) | Processes common options and assigns internal fields. |
private | processSpec (hash spec) | Processes the specification and assigns internal data for resolving. |
Private Attributes
Type | Name | Description |
const | Options | valid options for the object (a hash for quick lookups of valid keys) |
The AbstractCsvIterator class constructor takes an optional hash with possible keys given in the following table. Note that key names are case-sensitive, and data types are soft (conversions are made when possible).
AbstractCsvIterator Options
Option | Data Type | Description |
"date_format" | string | the default date format for "date" fields (see date formatting for the value in this case) |
"encoding" | string | the character encoding for the file (and for tagging string data read); if the value of this key is not a string then it will be ignored |
"separator" | string | the string separating the fields in the file (default: "," ) |
"quote" | string | the field quote character (default: '"' ) |
"eol" | string | the end of line character(s) (default: auto-detect); if the value of this key is not a string then it will be ignored |
"ignore_empty" | bool | if True (the default) then empty lines will be ignored; this option is processed with parse_boolean() |
"ignore_whitespace" | bool | if True (the default) then leading and trailing whitespace will be stripped from non-quoted fields; this option is processed with parse_boolean() |
"header_names" | bool | if True then the object will parse the header names from the first header row; in this case "header_lines" must be > 0. With multi-type lines, "header_names" must be False. |
"header_lines" | int | the number of header lines in the file (must be > 0 if "header_names" is True) |
"header_reorder" | bool | if True (the default), then when "headers" are provided by options or read from the file, the data fields are reordered to follow the headers. This has a major effect on the return value of AbstractCsvIterator::getRecordList() and a minor effect on the hash returned by AbstractCsvIterator::getRecord() when program code depends on the order of keys. If False, then fields not yet specified are appended to the end of the field definition. |
"verify_columns" | bool | if True (the default is False) then if a line is parsed with a different column count than other lines, a CSVFILEITERATOR-DATA-ERROR exception is thrown |
"timezone" | string | the timezone to use when parsing dates (will be passed to Qore::TimeZone::constructor()) |
"tolwr" | bool | if True then all header names will be converted to lower case letters |
AbstractCsvIterator Single-type-only Options
Option | Data Type | Description |
"headers" | list of strings | list of header / column names for the data iterated; if this is present, then "header_names" must be False. |
"fields" | hash | the keys are column names (or numbers if column names are not used) and the values are either strings (one of the Option Field Types giving the data type for the field) or an Option Field Hash describing the field; also sets headers if not set automatically with "header_names"; if no field type is given, the default is "*string" |
AbstractCsvIterator Multi-type-only Options
Option | Data Type | Description |
"extended_record" | bool | if True then the get functions return an extended hash with "type" and "record" members so that the caller can determine the record type (default: False) |
Note: the following options with dashes in their names are supported for backwards compatibility: "date-format", "ignore-empty", "ignore-whitespace", "header-names", "header-lines", "verify-columns"
Fields are defined in the order in which the user program expects the data, and the get functions return data in this order. There are two exceptions: first, when the "headers" option is given, the definitions are sorted so that the data correspond to the "headers" field order; second, when header names are read from the CSV file header.
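As a rough illustration of the single-type options above, the following sketch iterates CSV data held in a string. It assumes CsvDataIterator, a concrete subclass from the same module that is not described in this section, with a (string data, *hash opts) constructor; the sample data and column names are made up for the example.

```qore
%new-style
%requires CsvUtil

# made-up sample data; the first line is the header row
string data = "id,name,amount\n1,Alice,10.50\n2,Bob,\n";

hash opts = (
    # take the column names from the first header row
    "header_names": True,
    # must be > 0 when "header_names" is True
    "header_lines": 1,
    "fields": (
        # always converted to an integer
        "id": "int",
        # NOTHING if the field is empty, otherwise an arbitrary-precision number
        "amount": "*number"
    )
);

# assumption: CsvDataIterator is the concrete subclass used for string input
CsvDataIterator i(data, opts);
while (i.next()) {
    # index() is the row index; lineNumber() is the line number in the input
    printf("row %d (line %d): %y\n", i.index(), i.lineNumber(), i.getValue());
}
```

Columns without an entry in "fields" (here "name") fall back to the default field type.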
AbstractCsvIterator Option Field Types
Name | Description |
"int" | the value will be unconditionally converted to an integer using the Qore::int() function |
"*int" | the value will be converted to NOTHING if empty, otherwise it will be converted to an integer using the Qore::int() function |
"float" | the value will be unconditionally converted to a floating-point value using the Qore::float() function |
"*float" | the value will be converted to NOTHING if empty, otherwise it will be converted to a floating-point value using the Qore::float() function |
"number" | the value will be unconditionally converted to an arbitrary-precision number value using the Qore::number() function |
"*number" | the value will be converted to NOTHING if empty, otherwise it will be converted to an arbitrary-precision number value using the Qore::number() function |
"string" | (the default) the value remains a string; no transformation is done on the input data |
"*string" | the value will be converted to NOTHING if empty, otherwise, it remains a string |
"date" | in this case dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below |
"*date" | the value will be converted to NOTHING if empty, otherwise dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below |
See here for an example of using the hash field description in the constructor().
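A minimal sketch of passing field hashes through the "fields" option, using the "type", "format" and "code" keys from the Option Field Hash table in this section. CsvDataIterator (a concrete subclass in the same module) and the sample data are assumptions made for illustration only.

```qore
%new-style
%requires CsvUtil

# made-up sample data without a header row, separated by semicolons
string data = "20140105;widget\n20140212;gadget\n";

hash opts = (
    "separator": ";",
    # no header row in the data, so column names are given explicitly
    "headers": ("created", "item"),
    "fields": (
        # "format" is the mask used to parse the "date" field
        "created": ("type": "date", "format": "YYYYMMDD"),
        # "code" receives the typed value and returns the value to output
        "item": ("type": "string", "code": string sub (string v) { return v.upr(); })
    )
);

# assumption: CsvDataIterator is the concrete subclass used for string input
CsvDataIterator i(data, opts);
while (i.next()) {
    printf("%y\n", i.getValue());
}
```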
AbstractCsvIterator Option Field Hash and Spec Hash
The field specification is provided via the "fields" option for the old-style constructor, or as a separate parameter in the new-style constructor that supports multi-type mode.
Key | Value Description |
"type" | one of the option type values giving the field type |
"format" | used only with the "date" type; this is a date/time format mask for parsing dates |
"timezone" | used only with the "date" type; this value is passed to Qore::TimeZone::constructor() and the resulting timezone is used to parse the date (this value overrides any default time zone for the object; use only in the rare case that date/time values from different time zones are present in different columns of the same file) |
"code" | this is a closure or call reference that takes a single argument of the value (after formatting with any optional "type" formats) and returns the value that will be output for the field |
Extra AbstractCsvIterator Spec Hash Options
Key | Data Type | Value Description |
"value" | string | the value to compare with the input data when determining the record type; if "value" is defined for a field, then "regex" cannot be defined (for iterators only) |
"regex" | string | the regular expression to apply to input data lines when determining the record type (for iterators only) |
"header" | string | the field name as defined in the CSV header line; this allows a CSV column to be remapped to a different field name |
"index" | int | the index of the field in the CSV file; this allows fields to be mapped when the CSV data has no header |
"default" | any | the default output value (for writers only) |
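To show how a spec hash with "value" rules and the "extended_record" option fit together, here is a small multi-type sketch. The record type names, field names and sample data are hypothetical, and CsvDataIterator with a (string data, hash spec, hash opts) constructor is assumed as the concrete subclass, since this section only documents the abstract base class.

```qore
%new-style
%requires CsvUtil

# hypothetical record types: lines starting with "H" are invoice headers,
# lines starting with "I" are invoice items
hash spec = (
    "header": (
        "rectype": ("type": "string", "value": "H"),
        "invoice_no": "string"
    ),
    "item": (
        "rectype": ("type": "string", "value": "I"),
        "item_no": "string",
        "qty": "int"
    )
);

# made-up sample data; multi-type input has no header row
string data = "H,INV-001\nI,A100,2\nI,B200,5\n";

# with "extended_record" set, each value is a hash with "type" and "record" keys
CsvDataIterator i(data, spec, ("extended_record": True));
while (i.next()) {
    printf("%y\n", i.getValue());
}
```

Giving each record type a distinct "value" on the same field keeps identification unambiguous even if two record types happen to have the same number of fields.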
CsvUtil::AbstractCsvIterator::constructor (*hash opts)
creates the AbstractCsvIterator with an option hash in single-type mode
Parameters
opts | a hash of optional options; see AbstractCsvIterator Constructor Option Hash Overview for more information |
Exceptions
ABSTRACTCSVITERATOR-ERROR | invalid or unknown option; invalid data type for option; "header_names" is True and "header_lines" is 0 or "headers" is also present; unknown field type |
CsvUtil::AbstractCsvIterator::constructor (hash spec, hash opts)
creates the AbstractCsvIterator with an option hash in multi-type mode
Parameters
spec | a hash of field and type definitions; see Option Field Hash for more information |
opts | a hash of optional options; see AbstractCsvIterator Constructor Option Hash Overview for more information |
*list CsvUtil::AbstractCsvIterator::getHeaders ()
Returns the current record headers or NOTHING if no headers have been detected or saved yet.
*list CsvUtil::AbstractCsvIterator::getHeaders (string type)
Returns a list of headers for the given record or NOTHING if the record is not recognized.
string CsvUtil::AbstractCsvIterator::getQuote ()
Returns the current quote string.
hash CsvUtil::AbstractCsvIterator::getRecord (bool extended)
Returns the current record as a hash.
Parameters
extended | specifies if the result is an extended hash including "type" and "record" |
Returns
the current record as a hash; if extended is True, then the return value is a hash with the following keys:
"type": the record type
"record": a hash of the current record
hash CsvUtil::AbstractCsvIterator::getRecord ()
Returns the current record as a hash.
Returns
the current record as a hash; if the "extended_record" option is set, then the return value is a hash with the following keys:
"type": the record type
"record": a hash of the current record
any CsvUtil::AbstractCsvIterator::getRecordList ()
Returns the current record as a list.
When the "extended_record" option is set, the result is an extended hash including "type" and "record".
Returns
the current record as a list of field values; if the "extended_record" option is set, then the return value is a hash with the following keys:
"type": the record type
"record": a list of field values for the current record
Exceptions
INVALID-ITERATOR | this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method |
string CsvUtil::AbstractCsvIterator::getSeparator ()
Returns the current separator string.
hash CsvUtil::AbstractCsvIterator::getValue () [virtual]
Returns the current record as a hash.
Returns
the current record as a hash; if the "extended_record" option is set, then the return value is a hash with the following keys:
"type": the record type
"record": a hash of the current record
Exceptions
INVALID-ITERATOR | this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method |
Implements Qore::AbstractIterator.
string CsvUtil::AbstractCsvIterator::identifyType (list rec)
Identify a fixed-length line type using identifyTypeImpl(); may be overridden if necessary.
Parameters
rec | the input line record to be identified |
Exceptions
ABSTRACTCSVITERATOR-ERROR | the input line cannot be matched to a known record |
private *string CsvUtil::AbstractCsvIterator::identifyTypeImpl (list rec)
Identify an input record, given the raw line string. This method performs a lookup in a precalculated table based on the number of records (see constructor()). If different criteria are needed, e.g. when two line types in a spec have the same record number and no unique resolving rule is specified, this method must be overridden; otherwise it will throw an exception because the precalculated mapping will be empty.
Parameters
rec | the input line record to be identified |
Exceptions
ABSTRACTCSVITERATOR-ERROR | the input line cannot be matched to a known record |
int CsvUtil::AbstractCsvIterator::index ()
Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped.
int CsvUtil::AbstractCsvIterator::lineNumber ()
Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element.
any CsvUtil::AbstractCsvIterator::memberGate (string name)
Returns the given column value for the current row.
Parameters
name | the name of the field (header name) in the record |
Exceptions
INVALID-ITERATOR | this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method |
ABSTRACTCSVITERATOR-FIELD-ERROR | invalid or unknown field name given |
bool CsvUtil::AbstractCsvIterator::next () [virtual]
Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate.
After returning False once, this method will return True again on the following call if the file being iterated has data that can be iterated; otherwise it will always return False. The iterator object should not be used to retrieve a value after this method returns False.
Implements Qore::AbstractIterator.