Yakov Shafranovich yakovsh

yakovsh / 2006_06_01-fixing_utf8_with_regex.md
Last active Jan 17, 2016
Fixing Malformed UTF-8 via Regex
View 2006_06_01-fixing_utf8_with_regex.md

I have been struggling with a weird problem on one of my sites that prevents the site from functioning. One of the XML files used by this site is supposed to arrive in UTF-8, but unfortunately it contained some extra characters that were not encoded properly. After looking at this site [http://perl-xml.sourceforge.net/faq/#encoding_conversion], I came up with a short regular expression of my own that converts any malformed UTF-8 characters to XML/HTML numeric entities:

s/([\x80-\xFF])/'&#' . ord($1) . ';'/gse;

On a related note, another issue that came up a while back is the use of an ampersand without encoding it as "&amp;". Here is another regex that solves that issue (I don't remember which site I got it from):

s/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w{1,8});)/&amp;/g;
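
As a rough illustration of the two substitutions above, here is a minimal sketch that wraps each one in a small helper. The helper names `escape_high_bytes` and `escape_bare_ampersands` are hypothetical and exist only for this example. Note one assumption: the first substitution works byte by byte, so each byte in the range 0x80-0xFF is treated as a Latin-1 code point rather than as part of a multi-byte UTF-8 sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every byte in the range 0x80-0xFF with a
# numeric character reference, treating each byte as a Latin-1 code point.
sub escape_high_bytes {
    my ($text) = @_;
    $text =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Hypothetical helper: escape ampersands that do not already begin
# something that looks like an entity or character reference.
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w{1,8});)/&amp;/g;
    return $text;
}

print escape_high_bytes("caf\xE9 \xA0<"), "\n";                # caf&#233; &#160;<
print escape_bare_ampersands("AT&T &amp; R&D &#169;"), "\n";   # AT&amp;T &amp; R&amp;D &#169;
```

Because the second substitution skips anything that already looks like an entity, running it twice over the same text should not double-escape it.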
@yakovsh
yakovsh / 2008_10_26-fix_utf8_input.pl
Last active Jan 17, 2016
Fixing "Input is not proper UTF-8, indicate encoding" Error
View 2008_10_26-fix_utf8_input.pl
# Quick way to fix the following error in Perl:
#
# :1: parser error : Input is not proper UTF-8, indicate encoding !
# Bytes: 0xA0 0x20 0xA0 0x3C
#
# Use this command:
#
use Encode;
$string1 = decode("UTF-8", $input);
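
A minimal sketch of what that `decode` call does with the bytes from the error message above. With the default settings, `Encode::decode` replaces malformed sequences with the Unicode replacement character (U+FFFD) instead of failing. The second half rests on an assumption that is not stated in the original: that the stray 0xA0 bytes are really Latin-1 non-breaking spaces, in which case decoding as ISO-8859-1 and re-encoding preserves them.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The bytes from the parser error: 0xA0 0x20 0xA0 0x3C
my $input = "\xA0\x20\xA0\x3C";

# Lenient decode: malformed bytes are replaced with U+FFFD,
# so the result is a valid character string.
my $lenient = decode("UTF-8", $input);

# Assumption: the input is actually Latin-1. Decode it as such and
# re-encode as UTF-8 so that 0xA0 becomes the two bytes 0xC2 0xA0.
my $as_latin1  = decode("iso-8859-1", $input);
my $utf8_bytes = encode("UTF-8", $as_latin1);

printf "lenient: %d characters, utf8: %d bytes\n",
    length($lenient), length($utf8_bytes);
```

Which variant is appropriate depends on where the data comes from: the lenient decode loses the original characters, while the Latin-1 route keeps them only if the guess about the source encoding is right.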
@yakovsh
yakovsh / 2007_01_16-postgres.vb
Last active Jan 17, 2016
Using PostgreSQL on Windows with ADO and VB
View 2007_01_16-postgres.vb
' The problem with PostgreSQL is the lack of documentation for its Windows interfaces. Visual Basic uses
' the ADO library to connect to the PostgreSQL ODBC driver, which in turn connects to the server.
'
' This example covers a unique requirement: the network has over 300 individual desktop machines,
' all of which must be able to access the planned PostgreSQL server via Access, VBA or VB6.
' However, they do not want to set up a data source name (DSN) on each machine separately
' (installing ODBC is easier via the Windows deployment tools). Unfortunately, the ODBC driver has
' absolutely zero documentation on how to set up an ADO connection WITHOUT a DSN. After some prolonged
' tries and failures, we were finally able to come up with a solution, which I am posting here for
' others to benefit from.
@yakovsh
yakovsh / 2009_02_09-cleanup2.pl
Last active Jan 17, 2016
Cleaning Up Bad HTML in Perl, Take 2
View 2009_02_09-cleanup2.pl
# Here is another way to clean up bad HTML with Perl and convert it to XML.
# This approach relies on the HTML::DOMbo module to do the actual conversion
# between HTML and XML, and on HTML::TreeBuilder for parsing.
use HTML::DOMbo;
use HTML::TreeBuilder;
use XML::LibXML;
$html_code = '';
@yakovsh
yakovsh / 2008_10_24-cleanup.pl
Last active Jan 17, 2016
Cleaning Up Bad HTML in Perl
View 2008_10_24-cleanup.pl
#
# Here is a short way to clean up bad HTML input and convert it to XML with Perl:
#
use HTML::TreeBuilder;
use XML::LibXML;
$html_code = '';
my $builder = HTML::TreeBuilder->new();
@yakovsh
yakovsh / 2009_05_06-delete_s3_bucket.pl
Last active Jan 17, 2016
Deleting Amazon S3 Bucket with A Lot of Files
View 2009_05_06-delete_s3_bucket.pl
#!/usr/bin/perl
#
# Here is a short script that can mass delete files in an Amazon S3 bucket. It is limited to 1,000 keys at a time.
#
use Net::Amazon::S3;
my $s3 = Net::Amazon::S3->new({
aws_access_key_id => 'ACCESS_ID',
aws_secret_access_key => 'ACCESS_KEY',
@yakovsh
yakovsh / 2009_02_11-xml2json.pl
Last active Jan 17, 2016
Converting JSON to XML with Perl
View 2009_02_11-xml2json.pl
# Recently I had to work with Google AJAX API data, which is returned as JSON. For my purposes, the data needed to be in XML.
# While there is a CPAN module called XML2JSON which is designed to do that, for some reason it choked on my input.
# Instead, I adapted a much simpler technique from the Google::Data::JSON module, as follows.
use JSON::Any;
use XML::Simple;
my $convertor = JSON::Any->new();
my $data = $convertor->decode($json);
my $xml = XMLout($data);
@yakovsh
yakovsh / 2008_12_28-unicode_in_s3.pl
Last active Jan 17, 2016
Handling Unicode Data in Amazon S3 Headers
View 2008_12_28-unicode_in_s3.pl
# During a recent project, I ran into an issue when handling Unicode data in metadata headers in Amazon S3.
# Apparently, Amazon adds "?UTF-8?B?" in front of any Unicode data and "?=" at the end of the data.
# I could not find any existing standard that describes this or why it is done, but I surmise this probably
# has to do with Base-64 encoding and how it handles Unicode.
#
# As per @rawnsley:
# apparently this is because HTTP headers must only be encoded in ASCII: http://stackoverflow.com/a/4410331/671393
#
# An easy Perl hack to get around this is as follows (assuming you are using the MIME::Base64 module):
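
The wrapper described above looks like the "encoded-word" syntax from RFC 2047 (`=?charset?encoding?text?=`), where `B` indicates Base64. A minimal sketch of unwrapping such a value with MIME::Base64 follows. The helper name `decode_s3_meta` and the sample header value are hypothetical, and the leading `=` is treated as optional because the prefix quoted above omits it.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Hypothetical helper: if the value is wrapped as ?UTF-8?B?...?= (with or
# without a leading "="), Base64-decode the payload and interpret the
# resulting bytes as UTF-8. Anything else is returned unchanged.
sub decode_s3_meta {
    my ($value) = @_;
    if ($value =~ /^=?\?UTF-8\?B\?(.*)\?=$/i) {
        return decode("UTF-8", decode_base64($1));
    }
    return $value;
}

# Sample value: the UTF-8 bytes of "café", Base64-encoded and wrapped.
my $decoded = decode_s3_meta("=?UTF-8?B?Y2Fmw6k=?=");
printf "%d characters\n", length($decoded);   # 4 characters
```

Plain ASCII values pass through untouched, so the helper can be applied to every metadata header without checking first.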
@yakovsh
yakovsh / 2004_08_30-xml_in_html.xsl
Last active Jan 17, 2016
Display XML in HTML files (XSLT)
View 2004_08_30-xml_in_html.xsl
<!--
While working with XSLT templates, I came across an interesting problem. I am using an XSLT template
to transform an XML file into HTML. However, for debugging purposes I need to see the original XML
and since the generation process is done on a web server (like Resin does), it is not easy to get it.
The solution: display the original XML file inside the output HTML itself. As it turns out, this was
not easy since it requires changing all "<" and ">" to entities like "&lt;" and "&gt;". In XSLT,
the solution looks as follows (another solution would be to use JavaScript to escape this client-side).
-->
<xsl:template match="*">
@yakovsh
yakovsh / 2005_01_13-visited-links.css
Last active Jan 17, 2016
Making Visited Links Look the Same as Unvisited Links
View 2005_01_13-visited-links.css