If all you need is the removal of the field-internal CRLFs, try the following (assumes GNU awk, but it could be made to work with BSD awk as well):
awk -v RS='\r?\n' '/,[[:digit:]]{4,7}$/ { print; next } { printf("%s ", $0) }' input.csv > output.csv
/,[[:digit:]]{4,7}$/
matches only lines that end in a comma followed by 4-7 digits, implying that the line at hand is either a complete record or the last line of a multi-line record.
{ print; next }
simply prints the line with a terminating \n (if you wanted \r\n on output too, you'd have to use printf("%s\r\n", $0) instead), then skips to the next line.
{ printf("%s ", $0) }
is then only executed for record fragments, i.e., lines of a record that has a field-internal CRLF and therefore continues on the next line. Because printf emits the fragment with just a trailing space and no newline, the lines making up a single record are joined with a space each on output.
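To see the joining in action, here is a minimal sketch that should also run on awks without regex support in RS. The sample data is made up for illustration, and two substitutions are assumptions of mine rather than part of the command above: the trailing CR is stripped with sub() instead of RS='\r?\n', and {4,7} is spelled out longhand in case the awk at hand lacks interval expressions.

```shell
# Build a small sample file with CRLF line endings; the second record
# contains a field-internal CRLF (hypothetical data, for illustration only).
printf 'a,b,1234\r\nc,"multi\r\nline",56789\r\n' > input.csv

awk '
  # Drop the CR left over from each CRLF (stands in for RS="\r?\n").
  { sub(/\r$/, "") }
  # Comma plus 4-7 digits at end of line: complete record or last line of one.
  /,[0-9][0-9][0-9][0-9][0-9]?[0-9]?[0-9]?$/ { print; next }
  # Anything else is a fragment: print it with a trailing space, no newline.
  { printf("%s ", $0) }
' input.csv > output.csv

cat output.csv
# a,b,1234
# c,"multi line",56789
```

Note that a fragment which itself happens to end in a comma plus 4-7 digits would be mistaken for the end of a record, so this only holds up if the last field is the only place that pattern can appear at the end of a line.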