Curious as to what those programs were. Most well-behaved programs can detect they’re outputting to a pipe, for example, and will drop the terminal escape codes. That is, pushing the output through something like cat ought to have caused those codes to not be generated in the first place.
Those same programs often have an option to generate the codes regardless, but that shouldn’t be the default.
One example is the versions of ls that support --color=auto, which behaves the pipe-detecting way, and --color=force, which sends the escapes anyway. Of course, giving no --color option at all also avoids the escape codes, but many distros automatically set users up with an alias for ls that contains it, and the auto sub-option exists precisely so users (and distro makers) can have ls act nicely.
Of course, the thing you were having trouble with might not have been a well-behaved program, which is why I’m curious.
Oh, the “old” programs are mostly well-behaved. It’s a crop of more recent tooling: I seem to run into more and more tools that either barf if they can’t understand the terminfo, or always generate terminal codes that need to be cleaned out.
It’s not just me. I think I have a shell function defined to strip control codes, but after trying to build it myself a while ago I ended up stealing someone else’s that was more complete. The fact that it’s easy to search for and find Q/A about this is a pretty good indication I’m not having a singular experience. IIRC, even the fairly complex perl script I eventually ended up with came with a disclaimer about there being edge cases it didn’t handle.
My understanding of escape codes is that there’s really only a handful (0x1b[NNm) and they should be easy to strip, but there are complexities like being able to compound codes (0x1b[NN;XXm) that make it more challenging. That’s about where my knowledge ends.
I just know it’s occasionally a PITA when I want to process data.
Oh! One example is immortal. immortalctl always dumps control characters and is impossible to reliably grep.
Sounds like a “pester the devs” kind of deal if it’s an open-source project. It’d be a matter of them calling isatty(3) and a few extra if statements.
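For what it's worth, the check itself is tiny. Here is a rough Python sketch of the idea (Python rather than C only to keep it short; the colorize helper and the status text are made up for illustration and have nothing to do with immortal's actual code):

```python
import io
import sys

RED = "\x1b[31m"   # SGR code: red foreground
RESET = "\x1b[0m"  # SGR code: reset all attributes


def colorize(text, stream):
    """Wrap text in colour codes only if stream is attached to a terminal."""
    if stream.isatty():
        return f"{RED}{text}{RESET}"
    # Pipes, files and in-memory buffers get plain text.
    return text


if __name__ == "__main__":
    # Hypothetical status line; prints in red on a terminal, plain through a pipe.
    sys.stdout.write(colorize("status: running", sys.stdout) + "\n")
```

Run it directly and the line should come out red; pipe it through cat and the escapes should not be generated at all, which is the behaviour described above.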
As for Perl, s/\e\[[0-9;]*[a-z]//gi would be my first attempt to get rid of them. You’ve probably been through all this already though.
Technical waffle:
The aforementioned regex/substitution would also delete malformed things like \e[;;;q, but since the offending supplier of codes is probably only generating valid codes, that shouldn’t matter much. There are also rarer escape sequences that it doesn’t catch, which would be where those better third party tools come in.
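To make that concrete, here is the same substitution translated to Python (the translation and the sample strings are mine, not from any of the third-party scripts mentioned). It is only a sketch of the simple case and shows both the over-matching and the sequences it leaves behind:

```python
import re

# Python spelling of the Perl substitution s/\e\[[0-9;]*[a-z]//gi
CSI_RE = re.compile(r"\x1b\[[0-9;]*[a-z]", re.IGNORECASE)


def strip_csi(text):
    """Remove ESC [ <digits and semicolons> <letter> sequences from text."""
    return CSI_RE.sub("", text)


if __name__ == "__main__":
    samples = [
        "\x1b[1;31mfail\x1b[0m",   # compound colour code: stripped
        "\x1b[;;;qoops",           # malformed, but still matched and deleted
        "\x1b]0;title\x07ok",      # OSC window-title sequence: not matched
        "\x1b[?25lhidden cursor",  # private-mode sequence with '?': not matched
    ]
    for sample in samples:
        print(repr(strip_csi(sample)))
```

The last two samples pass through untouched, which is the sort of gap the more complete scripts exist to cover.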
Come to think of it, there’d be a regex that detects everything laid out in the console_codes(4) man page (and, importantly, nothing that isn’t). It would be one of those terrifying write-only things like the one that validates the full e-mail address standard, but that one only proves that such things are possible.
I’m almost tempted to have a go at creating it. Almost. Maybe another day.