Change copyright from 2013 to 2012

brendanfalkowski commented Nov 2, 2013

Using "find" with -print0 ensures that any filenames that (for whatever reason) include a newline character will not be misinterpreted as two separate files. It writes a null character instead of a newline after each matched file, and "xargs" takes the -0 param to indicate that the file list is piped in with null characters as separators... or so I learned 5 minutes ago.
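
A small sketch of that difference, using a throwaway temp directory and made-up file names (nothing here comes from the Magento tree):

```shell
# Hypothetical demo directory; nothing here touches real project files.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt"
# A file whose name contains an embedded newline.
touch "$demo_dir/$(printf 'odd\nname.txt')"

# Newline-delimited output: the odd name is split across two lines.
newline_count=$(find "$demo_dir" -type f | wc -l | tr -d ' ')

# NUL-delimited output: count the NUL separators, one per file.
nul_count=$(find "$demo_dir" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "newline-delimited entries: $newline_count"
echo "NUL-delimited entries: $nul_count"
rm -rf "$demo_dir"
```

Only the NUL-delimited count matches the number of files actually created, which is why the pairing of -print0 with xargs -0 is the safer choice.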

Here's the full command I just tested on MagentoEE 1.13.0.2:
https://gist.github.com/brendanfalkowski/7274294

I get this error twice in the output:
sed: RE error: illegal byte sequence

But if I search the directory, there are only 70 files with "(c) 2013 Magento" in them. They are .php.sample, .xml.sample, .css, and .js types. Not sure how to detect where it's choking, but it's processing almost everything.


astorm commented Nov 2, 2013

The sed on my home laptop doesn't suffer the RE error: illegal byte sequence problem, so the following is speculation.

I think sed should recover from RE error: illegal byte sequence and continue processing files. Try running the sed against a few of the individual files to see if they have extra spaces or something that may be tripping up the regular expressions.

sed -i '' 's/2013 Magento Inc./2012 Magento Inc./g' /path/to/individual-file

Re: identifying problem files, give this a try:

find . -name '*.css' -print0 -o -name '*.html' -print0 -o -name '*.js' -print0 -o -name '*.php' -print0 -o -name '*.phtml' -print0 -o -name '*.xml' -print0 -o -name '*.xml.template' -print0 | xargs -t -n 1 -0 sed -i '' 's/(c) 2013 Magento/(c) 2012 Magento/g'

This command adds two arguments to xargs

xargs -t -n 1

The -n 1 option tells xargs to run sed once for every argument piped in. The -t option tells xargs to print each command (to standard error) before running it. This means your screen will be filled with output from the command.
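
To see the effect of those two options in isolation, here is a minimal sketch where echo stands in for sed and the three file names are placeholders, so nothing gets modified:

```shell
# With -n 1 and -t: one traced command per argument (the trace goes to stderr).
trace=$(printf 'one.css\0two.js\0three.php\0' | xargs -0 -n 1 -t echo 2>&1 >/dev/null)
printf '%s\n' "$trace"
run_count=$(printf '%s\n' "$trace" | wc -l | tr -d ' ')

# Without -n 1: xargs batches all three arguments into a single command.
batched_trace=$(printf 'one.css\0two.js\0three.php\0' | xargs -0 -t echo 2>&1 >/dev/null)
batched_count=$(printf '%s\n' "$batched_trace" | wc -l | tr -d ' ')

echo "commands run with -n 1: $run_count"
echo "commands run without -n 1: $batched_count"
```

Running one command per file is slower, but it means each error message lands right after the one file that caused it.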

Before you run this command, clear your terminal scrollback

View -> Clear Scrollback

Run the command. When it's done, select all the terminal text with

Edit -> Select All

And then copy/paste it into a text editor. Search for the lines that produce RE error: illegal byte sequence and you'll find your problem files. If they're not binary, they may have an encoding which conflicts with what's set in $LANG.
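
As an alternative to copying from the terminal, the trace can be redirected to a file and searched with grep. The sketch below is only a stand-in: it uses cat on a deliberately missing file to provoke an error, and the names work_dir and trace.log are made up for the demo. For the real run, the search pattern would be 'illegal byte sequence'.

```shell
work_dir=$(mktemp -d)
echo 'fine' > "$work_dir/ok.txt"

# missing.txt is deliberately absent so that cat fails on it.
# stderr (the -t trace plus any error messages) is collected in trace.log.
printf '%s\0%s\0' "$work_dir/ok.txt" "$work_dir/missing.txt" \
    | xargs -0 -n 1 -t cat > /dev/null 2> "$work_dir/trace.log" || true

# -B 1 also prints the line before each match, i.e. the command that
# xargs traced just before the error appeared.
culprits=$(grep -B 1 '^cat: ' "$work_dir/trace.log")
printf '%s\n' "$culprits"
rm -rf "$work_dir"
```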


brendanfalkowski commented Nov 4, 2013

Thanks Alan, traced those files down:

/skin/frontend/enterprise/default/js/jqzoom/jquery.jqzoom1.0.1.js
/js/extjs/resources/css/ytheme-galdaka.css
/js/tiny_mce/plugins/spellchecker/editor_plugin.js
/js/tiny_mce/plugins/spellchecker/editor_plugin_src.js
  1. Without inspecting what went wrong, I can see these aren't files I'll need to merge/update in custom work. So it's a quick way to isolate them.
  2. The first two problem files contain comments written in Italian and Spanish that use accented characters. I'm guessing the UTF-8 chars are what's causing sed to choke. I didn't notice anything like that in the next two, but it's probably a similar problem.

Regardless, we can use this to rule out the files that can't be processed with sed but are valid file types.

Updated my gist with this info: https://gist.github.com/brendanfalkowski/7274294


astorm commented Nov 4, 2013

One last follow up. It's not strictly UTF-8 characters that are tripping up sed. Per the previous Stack Overflow questions, sed will obey the encoding set in

$ echo $LANG
en_US.UTF-8

That means it's fine with UTF-8. The "real" problem is those files aren't UTF-8 encoded. BBEdit reports them as "Western (Mac OS Roman)" on my system (but text encoding is complicated).
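
One way to check which files are not valid UTF-8, without relying on an editor, is to ask iconv to convert each file from UTF-8 to UTF-8 and see whether it fails. This is a sketch under assumptions: the two sample files are invented, and a Latin-1 byte is used as a convenient stand-in for the Mac OS Roman content described above.

```shell
enc_dir=$(mktemp -d)
printf 'plain ASCII comment\n' > "$enc_dir/good.js"
# \351 is "é" in Latin-1; as a lone byte followed by a newline it is not valid UTF-8.
printf 'commento in caff\351\n' > "$enc_dir/bad.js"

not_utf8=""
for f in "$enc_dir"/*.js; do
    # iconv exits non-zero when the input contains an invalid UTF-8 sequence.
    if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        not_utf8="$not_utf8$(basename "$f") "
    fi
done

echo "not valid UTF-8: $not_utf8"
rm -rf "$enc_dir"
```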

So, a better explanation of what's going wrong is that those files contain byte sequences that aren't valid in the encoding sed expects. Our text editors and browsers have heuristics to do something smart when they encounter this, but sed is a tool written by C programmers to operate directly on byte streams (sed stands for stream editor). When sed encounters those bytes, it gets upset and bails rather than making a wrong "heuristical" guess.

Ultimately not useful to our task at hand, but interesting if you're interested in C programming.
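
That said, if the goal is simply to get sed past those bytes, a commonly suggested workaround is to run it in the C locale, where every byte is treated as a single character. This was not tested in this thread, so take it as a sketch; the sample file and its contents are invented, and the output goes to a new file rather than using -i so the demo behaves the same on BSD and GNU sed.

```shell
fix_dir=$(mktemp -d)
# A sample file with one non-UTF-8 byte (\351) and a copyright line.
printf ' * commento caff\351\n * (c) 2013 Magento Inc.\n' > "$fix_dir/sample.js"

# LC_ALL=C makes sed process the file byte by byte instead of decoding UTF-8.
LC_ALL=C sed 's/(c) 2013 Magento/(c) 2012 Magento/g' \
    "$fix_dir/sample.js" > "$fix_dir/sample.fixed.js"

fixed_count=$(LC_ALL=C grep -c '(c) 2012 Magento' "$fix_dir/sample.fixed.js")
orig_size=$(wc -c < "$fix_dir/sample.js" | tr -d ' ')
fixed_size=$(wc -c < "$fix_dir/sample.fixed.js" | tr -d ' ')

echo "lines updated: $fixed_count"
echo "size before/after: $orig_size/$fixed_size"
rm -rf "$fix_dir"
```

Because the search pattern is plain ASCII, matching byte by byte gives the same result here, and the odd byte passes through untouched.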


rtull commented Sep 5, 2014

If all you need is a diff, try this:

diff -qrI '@copyright' /path/to/mage-v1 /path/to/mage-v2
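
Here -q reports only which files differ, -r recurses into subdirectories, and -I ignores changes whose lines all match the given pattern. A small sketch of the behaviour, using two invented directories (v1, v2) and invented file contents:

```shell
diff_dir=$(mktemp -d)
mkdir "$diff_dir/v1" "$diff_dir/v2"

# same.php differs only in its @copyright line.
printf ' * @copyright 2012 Magento Inc.\necho 1;\n' > "$diff_dir/v1/same.php"
printf ' * @copyright 2013 Magento Inc.\necho 1;\n' > "$diff_dir/v2/same.php"

# changed.php also differs in real code.
printf ' * @copyright 2012 Magento Inc.\necho 1;\n' > "$diff_dir/v1/changed.php"
printf ' * @copyright 2013 Magento Inc.\necho 2;\n' > "$diff_dir/v2/changed.php"

# diff exits non-zero when it finds differences, hence the "|| true".
report=$(diff -qrI '@copyright' "$diff_dir/v1" "$diff_dir/v2" || true)
printf '%s\n' "$report"
rm -rf "$diff_dir"
```

Only changed.php should be listed, since the copyright-only change in same.php is ignored.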
