Oh, most of the “old” programs are well-behaved. It’s the crop of more recent tooling: more and more often I run into tools that either barf if they can’t understand the terminfo, or always emit terminal codes that then need to be cleaned out.
It’s not just me. I think I have a shell function defined to strip control codes, but after trying to build it myself a while ago I ended up stealing someone else’s that was more complete. The fact that it’s easy to search for and find Q/A about this is a pretty good indication I’m not having a singular experience. IIRC, even the fairly complex perl script I eventually ended up with came with a disclaimer about there being edge cases it didn’t handle.
My understanding of escape codes is that there’s really only a handful (0x1b[NNm) and they should be easy to strip, but there are complexities like being able to compound codes (0x1b[NN;XXm) that make it more challenging. That’s about where my knowledge ends.
I just know it’s occasionally a PITA when I want to process data.
Oh! One example is immortal. immortalctl always dumps control characters and is impossible to reliably grep.
Sounds like a “pester the devs” kind of deal if it’s an open-source project. It’d be a matter of them calling isatty(3) and a few extra if statements.
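For illustration, here is roughly what that check looks like, sketched in Python rather than C. The names colorize and report are made up for the example, and I have no idea how immortalctl actually produces its output, so treat this as the general pattern only:

```python
import sys

RED = "\x1b[31m"
RESET = "\x1b[0m"

def colorize(text, stream):
    """Wrap text in red only when the stream is an interactive terminal."""
    isatty = getattr(stream, "isatty", None)
    if isatty is not None and isatty():
        return f"{RED}{text}{RESET}"
    # Pipes, files and anything without isatty() get plain text.
    return text

def report(text, stream=None):
    """Write one line to the stream (stdout by default), coloured if appropriate."""
    stream = sys.stdout if stream is None else stream
    stream.write(colorize(text, stream) + "\n")
```

With something like this, report("service is running") would be coloured in a terminal, while the same call piped into grep would produce plain text.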
As for Perl, s/\e\[[0-9;]*[a-z]//gi would be my first attempt to get rid of them. You’ve probably been through all this already though.
Technical waffle:
The aforementioned regex/substitution would also delete malformed things like \e[;;;q, but since the offending supplier of codes is probably only generating valid codes, that shouldn’t matter much. There are also rarer escape sequences that it doesn’t catch, which would be where those better third party tools come in.
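To make that concrete, here is a small Python port of the same substitution run over a few sample strings. The samples are my own picks, not output from any particular tool: a plain colour code, a compound one, the malformed example, an OSC window-title sequence, and a private-mode "hide cursor" sequence.

```python
import re

# Port of s/\e\[[0-9;]*[a-z]//gi: ESC, '[', digits and semicolons, one letter.
SIMPLE_CSI = re.compile(r"\x1b\[[0-9;]*[a-z]", re.IGNORECASE)

def strip_simple(text):
    """Remove the sequences the simple pattern recognises."""
    return SIMPLE_CSI.sub("", text)

samples = {
    "single": "\x1b[31mred\x1b[0m",
    "compound": "\x1b[1;31mbold red\x1b[0m",
    "malformed": "odd\x1b[;;;qtext",
    "osc_title": "\x1b]0;window title\x07prompt",
    "private_mode": "\x1b[?25lhidden cursor",
}

for name, raw in samples.items():
    print(f"{name}: {strip_simple(raw)!r}")
```

The first three come out clean, including the malformed one. The last two pass through untouched, because the OSC sequence starts with ESC ] rather than ESC [, and the ? in the private-mode sequence is not in the [0-9;] class.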
Come to think of it, there’d be a regex that detects everything laid out in the console_codes(4) man page (and, importantly, nothing that isn’t). It would be one of those terrifying write-only things like the one that validates the full e-mail address standard, but that only proves that such things are possible.
I’m almost tempted to have a go at creating it. Almost. Maybe another day.
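Short of that, a looser pattern built from the general byte ranges of escape sequences gets a fair bit further than the one-liner. This is only a sketch in Python, based on my understanding of the ECMA-48 layout (parameter bytes, intermediate bytes, final byte). It errs the other way from the "nothing that isn't" regex: it accepts anything that has the right shape, and it does not try to swallow the payload of DCS or similar string sequences. The name strip_ansi is just for the example.

```python
import re

ANSI = re.compile(
    r"""
    \x1b
    (?:
        \[ [0-?]* [ -/]* [@-~]        # CSI: parameter bytes, intermediates, final byte
      | \] .*? (?: \x07 | \x1b\\ )    # OSC: string terminated by BEL or by ST
      | [ -/]+ [0-~]                  # escapes with intermediates, such as ESC ( B
      | [0-Z\\^-~]                    # remaining two-byte escapes, such as ESC 7
    )
    """,
    re.VERBOSE | re.DOTALL,
)

def strip_ansi(text):
    """Remove every escape sequence the pattern above recognises."""
    return ANSI.sub("", text)

print(repr(strip_ansi("\x1b]0;title\x07\x1b[?25l\x1b[1;31mbold\x1b[0m\x1b(B")))
```

That handles the compound colour codes, the private-mode sequences with a ? in them, window-title sequences and charset switches, which covers most of what I would expect a chatty CLI tool to emit.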