strsplit modification


problem

 

Splitting a comma-separated-value (CSV) string with a fixed number of columns is a common task. Unfortunately, the current implementation of strsplit() drops empty tokens and therefore returns only a subset of the columns, e.g.

 

strsplit( "col1;col2;;;col5", ";", tokens) -> "col1", "col2", "col5"

 

Empty columns 3 and 4 are therefore dropped. This makes strsplit() more or less useless for the most common CSV-parsing task, where empty columns must be kept in the output, e.g.

 

"col1", "col2", "", "", "col5" 

 

so that the column order, and with it the mapping of each output token to its column, remains intact.

 

proposal

 

Add an optional flag argument, e.g. keepEmpty, that suppresses the current removal of empty columns from the output. With a default value of 0, the change would remain compatible with existing script code.

     


Ulf,

 

Got you, but Frustum needs to check why it was implemented this way in the first place. As soon as this issue is cleared up, we'll get back to you. It may take some time, however. We'll keep you posted.


Yep, here is a temporary workaround:

 

    /**
     *  Split a CSV string into tokens at any of the specified delimiter characters.
     *
     *  REMARK: we cannot use UNIGINE's strsplit() as it drops empty columns.
     *  Leading/trailing whitespace is trimmed from every token.
     *
     *  @param row          data row string
     *  @param delimiters   set of delimiter characters (any one of them splits)
     *  @param tokens       resulting string tokens, empty columns included
     *
     *  @return token count
     */
    int split( string row, string delimiters, string tokens[] )
    {
        string  data = row;

        tokens.clear();

        while( 1 )
        {
            // find the left-most occurrence of any delimiter character
            int splitPos = -1;

            for( int i=0; i<strlen(delimiters); i++ )
            {
                // take exactly one delimiter character (the third argument is
                // a length, as in the remainder computation below)
                int pos = strstr( data, substr( delimiters, i, 1 ) );

                if( pos != -1 )
                {
                    if( splitPos != -1 )    splitPos = min( pos, splitPos );
                    else                    splitPos = pos;
                }
            }

            if( splitPos == -1 ) break; // no more delimiter characters

            // extract the next token (possibly empty)
            tokens.append( trim( substr( data, 0, splitPos ) ) );

            // continue with the remainder after the delimiter
            data = substr( data, splitPos + 1, strlen(data) - ( splitPos + 1 ) );
        }

        // the remainder is the last column (empty if the row ends with a delimiter)
        tokens.append( trim( data ) );

        return tokens.size();
    }