CSV.h File Reference

getNextCSV() and setQuotedCSV(): library to process CSV files.

#include <string>
#include <iostream>
#include <vector>
#include <stdlib.h>


Namespaces

namespace  std
 Defined by the C++ standard library.

Defines

#define English_dox   "Doxygen English documentation"
 Doxygen English documentation.

Functions

void setQuotedCSV (std::string &res, const std::string &value)
 Prepares value for output into a CSV file.
bool getNextCSV (std::string &csv, std::istream &CIN)
 Scans input stream CIN and returns the next CSV value.
void trim (std::string &str)
 Deletes leading and trailing whitespace from "str".
void trimCSV (std::string &str)
 Converts an incorrect CSV field value into its probably correct value.
void chop (std::string &str, char ch=0)
 Deletes ch when it is the trailing character in str.


Detailed Description

getNextCSV() and setQuotedCSV(): library to process CSV files.

CSV: Comma Separated Values. The CSV file format has been defined more precisely by the IETF in RFC-4180.

Class CSV_line is a wrapper around these routines, but requires that no quoted Line Feed characters "\n" appear within each line in a CSV file.

Author:
Adolfo Di Mare <adolfo@di-mare.com>
Date:
2008

Definition in file CSV.h.


Define Documentation

#define English_dox   "Doxygen English documentation"

Doxygen English documentation.

Definition at line 5 of file CSV.h.


Function Documentation

void setQuotedCSV ( std::string &  res,
const std::string &  value 
)

Prepares value for output into a CSV file.

  • Stores a new value into string res.
  • Surrounds the result in double-quotes when value has whitespace.
  • Surrounds the result in double-quotes when value has double-quotes.
  • Surrounds the result in double-quotes when value has commas ",".
  • Substitutes any double-quotes '"' within value with 2 double-quotes [""].
  • Works with char, not tested for wchar_t.

    {{  // test::setQuotedCSV()
        std::string res;
        setQuotedCSV( res, ","    );  assertTrue( res == "\",\"" );     // [","]
        setQuotedCSV( res, "2"    );  assertTrue( res == "2" );         // [2]
        setQuotedCSV( res, ""     );  assertTrue( res == "" );          // []
        setQuotedCSV( res, "4,5"  );  assertTrue( res == "\"4,5\"" );   // ["4,5"]
        setQuotedCSV( res, "K\""  );  assertTrue( res == "\"K\"\"\"" ); // ["K"""]
        setQuotedCSV( res, "\r\n" );  assertTrue( res == "\"\r\n\"" );  // ["\r\n"]
    }}
See also:
test_CSV::setQuotedCSV()

Definition at line 212 of file CSV.cpp.
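
The quoting rules listed for setQuotedCSV() can be sketched as follows. This is a minimal illustration written only from the rules and test cases above, not the code in CSV.cpp; the names quoteFieldSketch() and checkQuoteFieldSketch() are hypothetical, and the helper returns its result instead of storing it into an output parameter.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of CSV.h: returns the quoted form of `value`.
std::string quoteFieldSketch(const std::string& value) {
    std::string body;
    bool needsQuotes = false;
    for (std::string::size_type i = 0; i < value.size(); ++i) {
        const char ch = value[i];
        if (ch == '"') {
            body += "\"\"";        // a double quote is written twice
            needsQuotes = true;
        } else {
            // isspace() needs an unsigned char value to be well defined
            if (ch == ',' || std::isspace(static_cast<unsigned char>(ch))) {
                needsQuotes = true;
            }
            body += ch;
        }
    }
    return needsQuotes ? "\"" + body + "\"" : body;
}

// Self-check that mirrors some of the documented test cases.
void checkQuoteFieldSketch() {
    assert(quoteFieldSketch("2")   == "2");           // [2]
    assert(quoteFieldSketch("4,5") == "\"4,5\"");     // ["4,5"]
    assert(quoteFieldSketch("K\"") == "\"K\"\"\"");   // ["K"""]
}
```

A single pass is enough because the decision to add the surrounding quotes can be postponed until every character has been examined.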

bool getNextCSV ( std::string &  csv,
std::istream &  CIN 
)

Scans input stream CIN and returns the next CSV value.

  • CIN should be open in std::ios::binary mode as chars are extracted one by one, using CIN.get(ch).
  • The retrieved value from CIN gets stored into csv.
  • Works with char, not tested for wchar_t.
  • Removes from csv the trailing (CR+LF or LF) ==> "\r\n" or "\n".
  • An effort was made to comply with RFC-4180.

Returns:
true when the CSV field ends in "\n" (LF -> LineFeed).
See also:
http://tools.ietf.org/html/rfc4180

http://www.horstmann.com/cpp/pitfalls.html

{{  // test::getNextCSV()
    VEC.clear();          // std::vector<std::string> VEC;
    std::string csv;
    bool eol_CIN = false; // stop when the end of line is reached
    std::istringstream ist( str , std::ios::binary );
    while ( ! eol_CIN && ! ist.fail() ) { // ! ist.eof() pitfall!
        eol_CIN = getNextCSV( csv, ist );
        VEC.push_back( csv );
    }
    return;
    //  Using std::ios::binary ensures that no CR+LF chars are discarded
}}
See also:
test_CSV::getNextCSV()

Definition at line 170 of file CSV.cpp.
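
To show how the return value of getNextCSV() can drive a row-by-row reader, the following self-contained sketch uses a stand-in scanner. Both nextFieldSketch() and readRowsSketch() are hypothetical names written only from the behavior documented above (char-by-char extraction, removal of the trailing CR+LF or LF, true at the end of a line); the real routine in CSV.cpp may differ in edge cases.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for getNextCSV(), not the library code.
// Returns true when the field was terminated by LF or CR+LF.
bool nextFieldSketch(std::string& field, std::istream& in) {
    field.clear();
    bool inQuotes = false;   // currently inside a quoted field
    bool wasQuoted = false;  // an opening quote has already been consumed
    char ch;
    while (in.get(ch)) {
        if (inQuotes) {
            if (ch != '"') {
                field += ch;             // commas and line breaks are data here
            } else if (in.peek() == '"') {
                in.get(ch);              // [""] inside quotes means one ["]
                field += '"';
            } else {
                inQuotes = false;        // closing quote
            }
        } else if (ch == '"' && field.empty() && !wasQuoted) {
            inQuotes = wasQuoted = true; // a quote in first position opens the field
        } else if (ch == ',') {
            return false;                // field ended, the line continues
        } else if (ch == '\n') {
            return true;
        } else if (ch == '\r' && in.peek() == '\n') {
            in.get(ch);                  // swallow the LF of a CR+LF pair
            return true;
        } else {
            field += ch;                 // includes quotes that follow whitespace
        }
    }
    return false;                        // input exhausted before any line end
}

// Splits a whole text into rows, one vector of fields per line.
std::vector<std::vector<std::string> > readRowsSketch(const std::string& text) {
    std::istringstream in(text, std::ios::binary);
    std::vector<std::vector<std::string> > rows;
    std::string field;
    while (in.peek() != std::istringstream::traits_type::eof()) {
        std::vector<std::string> row;
        bool endOfLine = false;
        while (!endOfLine && !in.fail()) {   // same loop shape as the test above
            endOfLine = nextFieldSketch(field, in);
            row.push_back(field);
        }
        rows.push_back(row);
    }
    return rows;
}

// Self-check: two lines, with a quoted comma and a doubled quote.
void checkReadRowsSketch() {
    const std::vector<std::vector<std::string> > rows =
        readRowsSketch("a,\"b,1\"\r\nc,\"say \"\"hi\"\"\",\n");
    assert(rows.size() == 2);
    assert(rows[0].size() == 2 && rows[0][1] == "b,1");
    assert(rows[1].size() == 3 && rows[1][1] == "say \"hi\"" && rows[1][2].empty());
}
```

The inner loop tests fail() instead of eof(), so a last line without a line terminator still yields its final field.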

void trim ( std::string &  str  ) 

Deletes leading and trailing whitespace from "str".

  • The characters deleted as whitespace are " \f\n\r\t\v".
  • Uses isspace(ch) to find out whether a character is whitespace.

    {{  // test::trim()
        std::string str;
        str = " a b   "; trim(str); assertTrue( str == "a b"  );
        str = "  a\nb "; trim(str); assertTrue( str == "a\nb" );
        str = "";        trim(str); assertTrue( str == ""     );
        str = "\r\t\n "; trim(str); assertTrue( str == ""     );
        str = " a b ";   trim(str); assertTrue( str == "a b"  );
        str = " ab " ;   trim(str); assertTrue( str == "ab"   );
    }}
See also:
test_CSV::test_trim()

Definition at line 232 of file CSV.cpp.
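
A trimming routine based on isspace(), as described for trim(), might look like the sketch below. The names trimSketch() and checkTrimSketch() are hypothetical and the code is not taken from CSV.cpp; it assumes the default "C" locale, in which isspace() accepts exactly the characters " \f\n\r\t\v".

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of CSV.h: trims `str` in place.
void trimSketch(std::string& str) {
    std::string::size_type first = 0;
    // The cast avoids undefined behavior when char is signed and negative.
    while (first < str.size() &&
           std::isspace(static_cast<unsigned char>(str[first]))) {
        ++first;
    }
    std::string::size_type last = str.size();   // one past the last kept char
    while (last > first &&
           std::isspace(static_cast<unsigned char>(str[last - 1]))) {
        --last;
    }
    str = str.substr(first, last - first);
}

// Self-check that mirrors some of the documented test cases.
void checkTrimSketch() {
    std::string s;
    s = " a b   "; trimSketch(s); assert(s == "a b");
    s = "  a\nb "; trimSketch(s); assert(s == "a\nb");
    s = "\r\t\n "; trimSketch(s); assert(s == "");
}
```

Whitespace inside the value, such as the "\n" in the second check, is left untouched because only the two ends are scanned.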

void trimCSV ( std::string &  str  ) 

Converts an incorrect CSV field value into its probably correct value.

  • Strips out leading and trailing whitespace with trim().
  • If the trimmed field is surrounded by quotes, it will try to replace every pair of double quotes [""] with a single double quote ["].
  • Will not verify that double quotes are correctly paired.

Sometimes a FILE.csv has quoted fields surrounded by whitespace. As these field values do not comply with RFC-4180, they are extracted by getNextCSV() as they come, with no whitespace removed and with their pairs of double quotes intact. In the following example the string is enclosed in square brackets [..] instead of double quotes ["] for legibility:

        ["zero",  "if "" 1" , , " 3xt"  \r\n]
        [....0.,........ 1..,2,.........3...]

         csv field        getNextCSV()    trimCSV()
    +------------------+----------------+----------+
    | ["zero"]         | [zero]         | [zero]   |
    | [,  "if "" 1" ]  | [  "if "" 1" ] | [if " 1] |
    | [, ]             | [ ]            | []       |
    | [, " 3xt"  \r\n] | [ " 3xt"  ]    | [ 3xt]   |
    +------------------+----------------+----------+

By common sense, the programmer would expect these strings to be returned as they appear in the trimCSV() column, but in fact the only one that complies with RFC-4180 is the first one. After using trimCSV() on the value returned by getNextCSV(), the result is what is reasonably expected.

  • Nonetheless, field values that contain carriage returns "\r" or line feeds "\n" are probably processed differently from what is expected, even before they are passed as arguments to trimCSV(). It is wiser not to trust this routine as a complete solution for processing CSV files that do not fully comply with RFC-4180.

    {{  // test::trimCSV()
        CSV_line csv("\"zero\",  \"if \"\" 1\" , , \" 3xt\"  \r\f"); std::string s;
        s=csv[0]; assertTrue( s == "zero" );             trimCSV(s); assertTrue( s == "zero"    );
        s=csv[1]; assertTrue( s == "  \"if \"\" 1\" " ); trimCSV(s); assertTrue( s == "if \" 1" );
        s=csv[2]; assertTrue( s == " " );                trimCSV(s); assertTrue( s == ""        );
        s=csv[3]; assertTrue( s == " \" 3xt\"  \r\f" );  trimCSV(s); assertTrue( s == " 3xt"    );
    }}
See also:
test_CSV::test_trimCSV()

Definition at line 259 of file CSV.cpp.
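
The repair steps described for trimCSV() (trim, then unquote when the trimmed value is surrounded by double quotes) can be sketched as shown here. The names trimCsvSketch() and checkTrimCsvSketch() are hypothetical, and the code is an illustration built from the table and test above, not the code in CSV.cpp. Like the documented routine, it does not verify that quotes are correctly paired.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of CSV.h: repairs a field value in place.
void trimCsvSketch(std::string& str) {
    static const char* const kSpace = " \f\n\r\t\v";
    const std::string::size_type first = str.find_first_not_of(kSpace);
    if (first == std::string::npos) {    // empty or all whitespace
        str.clear();
        return;
    }
    const std::string::size_type last = str.find_last_not_of(kSpace);
    str = str.substr(first, last - first + 1);

    // Only values surrounded by double quotes are unquoted.
    if (str.size() < 2 || str[0] != '"' || str[str.size() - 1] != '"') {
        return;
    }
    std::string out;
    for (std::string::size_type i = 1; i + 1 < str.size(); ++i) {
        out += str[i];
        // Skip the second quote of a [""] pair that lies inside the field.
        if (str[i] == '"' && i + 2 < str.size() && str[i + 1] == '"') {
            ++i;
        }
    }
    str = out;
}

// Self-check that mirrors the table: getNextCSV() column to trimCSV() column.
void checkTrimCsvSketch() {
    std::string s;
    s = "  \"if \"\" 1\" "; trimCsvSketch(s); assert(s == "if \" 1");
    s = " ";                trimCsvSketch(s); assert(s == "");
    s = " \" 3xt\"  ";      trimCsvSketch(s); assert(s == " 3xt");
}
```

Whitespace inside the quotes, such as the leading blank in [ 3xt], is kept because it belongs to the quoted value.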

void chop ( std::string &  str,
char  ch = 0 
)

Deletes ch when it is the trailing character in str.

  • The deleted character is always ch; when the last character of str is not ch, str is left unchanged.

    {{  // test::chop()
        std::string str; char ch;
        str = "12345"; assertTrue( str == "12345" );
        chop(str,'0'); assertTrue( str == "12345" );

        for ( ch='5'; ch != '0'; --ch ) {
            assertTrue( str[str.size()-1] == ch );
            chop(str,ch);
        }

        assertTrue( str == "" );
        chop(str,'3'); assertTrue( str == "" );
    }}
See also:
test_CSV::test_chop()

Definition at line 304 of file CSV.cpp.
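
The behavior described for chop() fits in a few lines. The sketch below uses the hypothetical names chopSketch() and checkChopSketch() and is not the code in CSV.cpp; the default argument ch=0 is copied from the declaration above, so a call without a second argument removes a trailing NUL character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of CSV.h: removes one trailing `ch`, if present.
void chopSketch(std::string& str, char ch = 0) {
    if (!str.empty() && str[str.size() - 1] == ch) {
        str.erase(str.size() - 1);       // drop only the last character
    }
}

// Self-check: typical use is dropping a stray CR left at the end of a line.
void checkChopSketch() {
    std::string line = "a,b\r";
    chopSketch(line, '\r'); assert(line == "a,b");
    chopSketch(line, '\r'); assert(line == "a,b");   // nothing left to remove
}
```

Only one character is removed per call, so a value ending in several copies of ch needs repeated calls.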


Generated on Wed May 27 11:04:47 2009 for CSV: by  doxygen 1.5.8