Created
February 18, 2011 05:43
libidn + addressable
| From 35ee80699222754ae6dc3a2b933716d215fa7824 Mon Sep 17 00:00:00 2001 | |
| From: Ilya Grigorik <ilya@igvita.com> | |
| Date: Thu, 17 Feb 2011 23:42:08 -0500 | |
| Subject: [PATCH 1/2] use libidn instead of pure ruby implementation | |
| --- | |
| lib/addressable/uri.rb | 25 ++++++++++++++++++------- | |
| 1 files changed, 18 insertions(+), 7 deletions(-) | |
| diff --git a/lib/addressable/uri.rb b/lib/addressable/uri.rb | |
| index 46ed296..fdb0530 100644 | |
| --- a/lib/addressable/uri.rb | |
| +++ b/lib/addressable/uri.rb | |
| @@ -23,7 +23,14 @@ | |
| #++ | |
| require "addressable/version" | |
| -require "addressable/idna" | |
| +#require "addressable/idna" | |
| + | |
| +begin | |
| + require 'rubygems' | |
| + require 'idn' | |
| +rescue LoadError | |
| + puts "install idn for faster performance" | |
| +end | |
| module Addressable | |
| ## | |
| @@ -432,7 +439,8 @@ module Addressable | |
| unencoded = self.unencode_component(component) | |
| begin | |
| encoded = self.encode_component( | |
| - Addressable::IDNA.unicode_normalize_kc(unencoded), | |
| + IDN::Stringprep.nfkc_normalize(unencoded), | |
| + #Addressable::IDNA.unicode_normalize_kc(unencoded), | |
| character_class | |
| ) | |
| rescue ArgumentError | |
| @@ -530,7 +538,8 @@ module Addressable | |
| if value != nil | |
| begin | |
| components[key] = | |
| - Addressable::IDNA.unicode_normalize_kc(value.to_str) | |
| + IDN::Stringprep.nfkc_normalize(value.to_str) | |
| + #Addressable::IDNA.unicode_normalize_kc(value.to_str) | |
| rescue ArgumentError | |
| # Likely a malformed UTF-8 character, skip unicode normalization | |
| components[key] = value.to_str | |
| @@ -951,9 +960,10 @@ module Addressable | |
| @normalized_host ||= (begin | |
| if self.host != nil | |
| if self.host.strip != "" | |
| - result = ::Addressable::IDNA.to_ascii( | |
| - self.class.unencode_component(self.host.strip.downcase) | |
| - ) | |
| + result = IDN::Idna.toASCII(self.class.unencode_component(self.host.strip.downcase)) | |
| + #result = ::Addressable::IDNA.to_ascii( | |
| + # self.class.unencode_component(self.host.strip.downcase) | |
| + #) | |
| if result[-1..-1] == "." | |
| # Trailing dots are unnecessary | |
| result = result[0...-1] | |
| @@ -1993,7 +2003,8 @@ module Addressable | |
| # @return [Addressable::URI] A URI suitable for display purposes. | |
| def display_uri | |
| display_uri = self.normalize | |
| - display_uri.host = ::Addressable::IDNA.to_unicode(display_uri.host) | |
| + #display_uri.host = ::Addressable::IDNA.to_unicode(display_uri.host) | |
| + display_uri.host = IDN::Idna.toUnicode(display_uri.host) | |
| return display_uri | |
| end | |
| -- | |
| 1.7.1 |
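A minimal sketch of the optional-dependency pattern this patch is reaching for. A bare `rescue` only catches `StandardError`, and `LoadError` is not one, so it has to be named explicitly. The `nfkc` helper and the fallback to Ruby's built-in `String#unicode_normalize` (Ruby 2.2+) are assumptions for illustration; the patch itself calls `IDN::Stringprep.nfkc_normalize` unconditionally.

```ruby
# Try the libidn binding first. LoadError must be named explicitly,
# because a bare `rescue` does not catch it.
begin
  require "idn"
rescue LoadError
  warn "idn gem not found, falling back to String#unicode_normalize"
end

# Hypothetical helper (not part of Addressable): NFKC-normalize with
# libidn when it loaded, otherwise with Ruby's built-in normalizer.
def nfkc(str)
  if defined?(IDN::Stringprep)
    IDN::Stringprep.nfkc_normalize(str)
  else
    str.unicode_normalize(:nfkc)
  end
end

# U+FB01 is the "fi" ligature; NFKC replaces it with the two letters "fi".
puts nfkc("\uFB01le")
```

Checking `defined?(IDN::Stringprep)` at call time avoids a `NameError` when the gem is missing, which the patched code would otherwise raise on its first normalization call.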
| From 389230b72b2a294f6e2e025534c69f358f3e8f67 Mon Sep 17 00:00:00 2001 | |
| From: root <root@ip-10-114-167-102.ec2.internal> | |
| Date: Thu, 17 Feb 2011 23:51:24 -0500 | |
| Subject: [PATCH 2/2] this is bizarre | |
| --- | |
| lib/addressable/uri.rb | 3 ++- | |
| 1 files changed, 2 insertions(+), 1 deletions(-) | |
| diff --git a/lib/addressable/uri.rb b/lib/addressable/uri.rb | |
| index fdb0530..6822626 100644 | |
| --- a/lib/addressable/uri.rb | |
| +++ b/lib/addressable/uri.rb | |
| @@ -2105,7 +2105,8 @@ module Addressable | |
| # @return [TrueClass, FalseClass] | |
| # <code>true</code> if the URI is frozen, <code>false</code> otherwise. | |
| def frozen? | |
| - self.to_s.frozen? | |
| + false | |
| + #self.to_s.frozen? | |
| end | |
| ## | |
| -- | |
| 1.7.1 |
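A rough sketch of why the original `frozen?` can show up in a profile: every call goes through `to_s` first. `TinyURI` is a hypothetical stand-in, not Addressable's class, and how expensive the real `to_s` is (for example, whether it caches its result) is not shown in the patch.

```ruby
# Hypothetical stand-in for a URI object; not Addressable's real class.
class TinyURI
  attr_reader :to_s_calls

  def initialize(scheme, host, path)
    @scheme = scheme
    @host = host
    @path = path
    @to_s_calls = 0
  end

  # Counts invocations so the cost of frozen? becomes visible.
  def to_s
    @to_s_calls += 1
    "#{@scheme}://#{@host}#{@path}"
  end

  # Same shape as the method that patch 2/2 stubs out.
  def frozen?
    to_s.frozen?
  end
end

uri = TinyURI.new("http", "example.com", "/a")
1_000.times { uri.frozen? }
puts uri.to_s_calls  # 1000: one to_s call per frozen? call
```

Note that returning a hard-coded `false`, as the patch does, also changes behavior for URIs that really are frozen, so it reads as a profiling experiment rather than a fix.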
| Libidn: http://www.gnu.org/software/libidn/manual/libidn.html | |
| Ruby wrapper for libidn: http://idn.rubyforge.org/docs/ | |
| Pure-ruby: | |
| user system total real | |
| 9.500000 0.050000 9.550000 ( 10.029430) | |
| LibIDN: | |
| user system total real | |
| 3.390000 0.020000 3.410000 ( 3.527302) | |
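| Profile trace after libidn integration (columns: self samples, self %, running total %, samples including callees, % including callees, method): | |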
| 32 10.6% 10.6% 32 10.6% garbage_collector | |
| 29 9.6% 20.1% 29 9.6% Addressable::URI.encode_component | |
| 29 9.6% 29.7% 62 20.5% Addressable::URI.normalize_component | |
| 28 9.2% 38.9% 63 20.8% Addressable::URI#frozen? | |
| 25 8.3% 47.2% 35 11.6% Addressable::URI#to_s | |
| 23 7.6% 54.8% 66 21.8% Addressable::URI#normalized_path | |
| 20 6.6% 61.4% 114 37.6% Addressable::URI#initialize | |
| 11 3.6% 65.0% 109 36.0% Addressable::URI#defer_validation | |
| 11 3.6% 68.6% 59 19.5% Addressable::URI.parse | |
| 9 3.0% 71.6% 9 3.0% Addressable::URI.normalize_path | |
| 8 2.6% 74.3% 10 3.3% Addressable::URI#authority | |
| 7 2.3% 76.6% 26 8.6% Addressable::URI#authority= | |
| 7 2.3% 78.9% 8 2.6% Kernel#gem_original_require | |
| 6 2.0% 80.9% 252 83.2% Integer#times | |
| 5 1.7% 82.5% 12 4.0% Addressable::URI#normalized_authority | |
| 5 1.7% 84.2% 5 1.7% Addressable::URI#validate | |
| Q: Why is URI#frozen? so prominent in the trace? This seems bizarre. Stubbing it out (patch 2/2) improves performance by another 10%. |
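The timings above are in the format printed by Ruby's `Benchmark` module (seconds of user, system, total, and wall-clock time). The script that produced them is not included, so the following is only a sketch of how such a measurement might be set up. The standard-library `URI` parser stands in for `Addressable::URI` so the example runs without extra gems, and the sample URL and iteration count are arbitrary.

```ruby
require "benchmark"
require "uri"

# Assumptions: stdlib URI replaces Addressable::URI, and both the
# iteration count and the sample URL are made up for this sketch.
iterations = 10_000
sample = "http://www.Example.com/a/b?q=1"

# Benchmark.bm prints the "user system total real" header followed by
# one row per report, and returns an array of Benchmark::Tms objects.
results = Benchmark.bm(10) do |bm|
  bm.report("parse") do
    iterations.times { URI.parse(sample).normalize.to_s }
  end
end
```

To compare two implementations, add a second `bm.report` block that runs the same loop against the other code path.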