Channel: Need a way to strip extra CRLFs from the middle of records - Stack Overflow

Answer by mklement0 for Need a way to strip extra CRLFs from the middle of records

If all you need is the removal of the field-internal CRLFs, try the following (this assumes GNU awk, but it could be made to work with BSD awk as well):

awk -v RS='\r?\n' '/,[[:digit:]]{4,7}$/ { print; next } { printf("%s ", $0) }' input.csv > output.csv
  • /,[[:digit:]]{4,7}$/ matches only lines that end in a comma followed by 4-7 digits, which implies that the line at hand is either a complete record or the last line of a multi-line record.
  • { print; next } simply prints the line with a terminating \n (if you wanted \r\n on output too, you'd have to use printf("%s\r\n", $0) instead).
  • { printf("%s ", $0) } is then only executed for record fragments, i.e., lines of a record that has a field-internal CRLF and therefore continues on the next line; because printf emits the fragment with just a trailing space and no newline, the lines that make up a single record are joined with a space each on output.
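
To see the joining logic in action, and as one possible way to make it work without GNU awk, here is a minimal sketch. It is not the original command: the sample data is made up, the default record separator is kept and the trailing CR is stripped with sub() instead of using a regex RS, and the {4,7} interval is spelled out with [0-9], on the assumption that some non-GNU awk versions lack support for a regex RS or for interval expressions.

```shell
# Hypothetical sample input: three CRLF-terminated physical lines; the second
# record is split by a field-internal CRLF after "foo,bar".
printf 'a,b,1234\r\nfoo,bar\r\nbaz,56789\r\n' > input.csv

# Portable variant (an assumption, not the original command):
#  - default RS (\n); the CR left at the end of each line is removed by sub()
#  - four mandatory and three optional [0-9] in place of [[:digit:]]{4,7}
awk '
  # Strip the trailing CR from every physical line.
  { sub(/\r$/, "") }
  # Complete record, or last line of a multi-line record: print with \n.
  /,[0-9][0-9][0-9][0-9][0-9]?[0-9]?[0-9]?$/ { print; next }
  # Record fragment: print with a trailing space and no newline.
  { printf("%s ", $0) }
' input.csv > output.csv

cat output.csv
```

With this sample input, output.csv should contain two \n-terminated lines, a,b,1234 and foo,bar baz,56789, with no CRs left.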
