libidn + addressable
From 35ee80699222754ae6dc3a2b933716d215fa7824 Mon Sep 17 00:00:00 2001
From: Ilya Grigorik <ilya@igvita.com>
Date: Thu, 17 Feb 2011 23:42:08 -0500
Subject: [PATCH 1/2] use libidn instead of pure ruby implementation
---
lib/addressable/uri.rb | 25 ++++++++++++++++++-------
1 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/lib/addressable/uri.rb b/lib/addressable/uri.rb
index 46ed296..fdb0530 100644
--- a/lib/addressable/uri.rb
+++ b/lib/addressable/uri.rb
@@ -23,7 +23,14 @@
#++
require "addressable/version"
-require "addressable/idna"
+#require "addressable/idna"
+
+begin
+ require 'rubygems'
+ require 'idn'
+rescue LoadError
+ warn "install idn for faster performance"
+end
module Addressable
##
@@ -432,7 +439,8 @@ module Addressable
unencoded = self.unencode_component(component)
begin
encoded = self.encode_component(
- Addressable::IDNA.unicode_normalize_kc(unencoded),
+ IDN::Stringprep.nfkc_normalize(unencoded),
+ #Addressable::IDNA.unicode_normalize_kc(unencoded),
character_class
)
rescue ArgumentError
@@ -530,7 +538,8 @@ module Addressable
if value != nil
begin
components[key] =
- Addressable::IDNA.unicode_normalize_kc(value.to_str)
+ IDN::Stringprep.nfkc_normalize(value.to_str)
+ #Addressable::IDNA.unicode_normalize_kc(value.to_str)
rescue ArgumentError
# Likely a malformed UTF-8 character, skip unicode normalization
components[key] = value.to_str
@@ -951,9 +960,10 @@ module Addressable
@normalized_host ||= (begin
if self.host != nil
if self.host.strip != ""
- result = ::Addressable::IDNA.to_ascii(
- self.class.unencode_component(self.host.strip.downcase)
- )
+ result = IDN::Idna.toASCII(self.class.unencode_component(self.host.strip.downcase))
+ #result = ::Addressable::IDNA.to_ascii(
+ # self.class.unencode_component(self.host.strip.downcase)
+ #)
if result[-1..-1] == "."
# Trailing dots are unnecessary
result = result[0...-1]
@@ -1993,7 +2003,8 @@ module Addressable
# @return [Addressable::URI] A URI suitable for display purposes.
def display_uri
display_uri = self.normalize
- display_uri.host = ::Addressable::IDNA.to_unicode(display_uri.host)
+ #display_uri.host = ::Addressable::IDNA.to_unicode(display_uri.host)
+ display_uri.host = IDN::Idna.toUnicode(display_uri.host)
return display_uri
end
--
1.7.1
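The optional `require 'idn'` at the top of PATCH 1/2 depends on rescuing a missing gem. A failed `require` raises `LoadError`, which is a `ScriptError` and not a `StandardError`, so a bare `rescue` never fires for it. The rescue has to name `LoadError`. The sketch below shows the pitfall and a guarded require. `try_require` and the missing library name are hypothetical and not part of Addressable.

```ruby
# Hypothetical helper (not part of Addressable): true if the library
# loaded, false if it is not installed.
def try_require(lib)
  require lib
  true
rescue LoadError # a bare `rescue` would only catch StandardError
  false
end

# The pitfall: the inner bare rescue does not catch LoadError, so the
# exception escapes to the outer handler.
outcome =
  begin
    begin
      require "no_such_lib_for_this_sketch"
    rescue # bare rescue == rescue StandardError
      :caught_by_bare_rescue
    end
  rescue LoadError
    :escaped_bare_rescue
  end

p LoadError.ancestors.include?(StandardError) # prints false
p outcome                                      # prints :escaped_bare_rescue

# Decide once, at load time, whether the libidn wrapper is usable.
IDN_AVAILABLE = try_require("idn")
```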
From 389230b72b2a294f6e2e025534c69f358f3e8f67 Mon Sep 17 00:00:00 2001
From: root <root@ip-10-114-167-102.ec2.internal>
Date: Thu, 17 Feb 2011 23:51:24 -0500
Subject: [PATCH 2/2] this is bizarre
---
lib/addressable/uri.rb | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/lib/addressable/uri.rb b/lib/addressable/uri.rb
index fdb0530..6822626 100644
--- a/lib/addressable/uri.rb
+++ b/lib/addressable/uri.rb
@@ -2105,7 +2105,8 @@ module Addressable
# @return [TrueClass, FalseClass]
# <code>true</code> if the URI is frozen, <code>false</code> otherwise.
def frozen?
- self.to_s.frozen?
+ false
+ #self.to_s.frozen?
end
##
--
1.7.1
Libidn: http://www.gnu.org/software/libidn/manual/libidn.html
Ruby wrapper for libidn: http://idn.rubyforge.org/docs/
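The libidn calls swapped in by PATCH 1/2 (`IDN::Stringprep.nfkc_normalize` and `IDN::Idna.toASCII`) can be approximated in plain Ruby to show roughly what they compute. This is a simplified stand-in, not libidn's implementation. It applies NFKC with Ruby's built-in `String#unicode_normalize(:nfkc)` (Ruby 2.2+) and lowercases with `String#downcase`. It then Punycode-encodes each non-ASCII label per RFC 3492 and adds the `xn--` prefix. Real nameprep (RFC 3491) also case-folds (for example "ß" to "ss") and rejects prohibited and bidi-violating characters, and this sketch skips all of that. `PunycodeSketch` and `to_ascii_sketch` are hypothetical names.

```ruby
# Simplified sketch of IDNA ToASCII for a host name (not libidn's code).
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 0x80

  # Bias adaptation from RFC 3492, section 6.1.
  def self.adapt(delta, numpoints, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Maps a digit 0..35 to "a".."z" then "0".."9".
  def self.digit(d)
    (d < 26 ? d + 97 : d + 22).chr
  end

  # Punycode-encodes one label (RFC 3492, section 6.3), without "xn--".
  def self.encode(label)
    cps = label.codepoints
    output = cps.select { |c| c < 0x80 }.pack("U*") # basic code points first
    b = h = output.length
    output << "-" if b > 0
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < cps.length
      m = cps.select { |c| c >= n }.min # smallest code point not yet handled
      delta += (m - n) * (h + 1)
      n = m
      cps.each do |c|
        delta += 1 if c < n
        next unless c == n
        q = delta
        k = BASE
        loop do
          t = if k <= bias
                TMIN
              elsif k >= bias + TMAX
                TMAX
              else
                k - bias
              end
          break if q < t
          output << digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << digit(q)
        bias = adapt(delta, h + 1, h == b)
        delta = 0
        h += 1
      end
      delta += 1
      n += 1
    end
    output
  end
end

# NFKC + lowercase each label, then Punycode the non-ASCII ones.
def to_ascii_sketch(host)
  host.split(".").map do |label|
    label = label.unicode_normalize(:nfkc).downcase
    label.ascii_only? ? label : "xn--" + PunycodeSketch.encode(label)
  end.join(".")
end

puts to_ascii_sketch("Bücher.example") # prints xn--bcher-kva.example
```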
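Benchmark with the pure-Ruby IDNA code (Addressable::IDNA):
      user     system      total        real
  9.500000   0.050000   9.550000 ( 10.029430)
Benchmark with libidn (via the idn Ruby wrapper):
      user     system      total        real
  3.390000   0.020000   3.410000 (  3.527302)
libidn cuts wall-clock time from about 10.0 s to 3.5 s, roughly 2.8x faster.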
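The `user system total real` layout is the output format of Ruby's standard `Benchmark` library, where the parenthesized last column is wall-clock seconds. The workload behind the numbers above isn't given. The sketch below is a minimal A/B harness in the same format. The `compare` helper, the host list and both workloads (uncached versus memoized NFKC via Ruby's built-in `unicode_normalize`) are hypothetical stand-ins, not the original benchmark.

```ruby
require "benchmark"

# Hypothetical A/B harness: runs each labelled lambda `iterations` times and
# prints Benchmark's "user system total (real)" table. Returns one
# Benchmark::Tms per report.
def compare(iterations, impls)
  width = impls.keys.map(&:length).max
  Benchmark.bm(width) do |x|
    # Hash#each returns the Hash (not an Array), so Benchmark prints no extra rows.
    impls.each do |label, impl|
      x.report(label) { iterations.times { impl.call } }
    end
  end
end

HOSTS = ["Bücher.example", "example.org"].freeze
CACHE = {}

results = compare(2_000,
                  "nfkc"        => -> { HOSTS.each { |h| h.unicode_normalize(:nfkc) } },
                  "nfkc cached" => -> { HOSTS.each { |h| CACHE[h] ||= h.unicode_normalize(:nfkc) } })

results.each { |tms| puts format("%-12s %.3f s wall-clock", tms.label, tms.real) }
```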
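Profile trace after the libidn integration (columns: samples in the method itself, % of all samples, running total of that %, samples including callees, % of all samples, method):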
  32 10.6% 10.6%   32 10.6% garbage_collector
  29  9.6% 20.1%   29  9.6% Addressable::URI.encode_component
  29  9.6% 29.7%   62 20.5% Addressable::URI.normalize_component
  28  9.2% 38.9%   63 20.8% Addressable::URI#frozen?
  25  8.3% 47.2%   35 11.6% Addressable::URI#to_s
  23  7.6% 54.8%   66 21.8% Addressable::URI#normalized_path
  20  6.6% 61.4%  114 37.6% Addressable::URI#initialize
  11  3.6% 65.0%  109 36.0% Addressable::URI#defer_validation
  11  3.6% 68.6%   59 19.5% Addressable::URI.parse
   9  3.0% 71.6%    9  3.0% Addressable::URI.normalize_path
   8  2.6% 74.3%   10  3.3% Addressable::URI#authority
   7  2.3% 76.6%   26  8.6% Addressable::URI#authority=
   7  2.3% 78.9%    8  2.6% Kernel#gem_original_require
   6  2.0% 80.9%  252 83.2% Integer#times
   5  1.7% 82.5%   12  4.0% Addressable::URI#normalized_authority
   5  1.7% 84.2%    5  1.7% Addressable::URI#validate
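Q: Why does URI#frozen? cost so much? It is implemented as self.to_s.frozen?, so every frozen? check goes through URI#to_s; in the profile above it accounts for 20.8% of samples including callees. This seems bizarre. Stubbing it out to return false (PATCH 2/2) shaves roughly another 10% off the run time.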
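The cost in the question comes from answering `frozen?` by building a string. The sketch below uses two hypothetical URI-like classes, not Addressable's code, to show the difference. `DelegatingURI` delegates `frozen?` to a freshly built `to_s`, as the removed line did, so every check pays for serialization. `FlagURI` keeps Ruby's built-in `Object#frozen?`, a constant-time flag check, and overrides `freeze` to freeze its cached string before freezing itself. Hardcoding `false`, as PATCH 2/2 does, avoids the cost but makes a frozen URI report that it is not frozen.

```ruby
# Hypothetical URI-like classes for illustration; not Addressable's code.

# Mirrors the removed line: frozen? answers by serializing the object.
class DelegatingURI
  attr_reader :to_s_calls

  def initialize(host, path)
    @host, @path = host, path
    @to_s_calls = 0
  end

  def to_s
    @to_s_calls += 1          # count how often we serialize
    "http://#{@host}#{@path}" # a fresh, unfrozen String on every call
  end

  def frozen?
    to_s.frozen?              # every check rebuilds the string
  end
end

# Keeps Object#frozen? (an O(1) flag check) and freezes the cached string
# together with the object.
class FlagURI
  def initialize(host, path)
    @host, @path = host, path
  end

  def to_s
    @to_s ||= "http://#{@host}#{@path}" # memoized after the first call
  end

  def freeze
    to_s.freeze # populate and freeze the cache while we still can
    super       # then set the object's own frozen flag
  end
end

slow = DelegatingURI.new("example.com", "/a")
1_000.times { slow.frozen? }
p slow.to_s_calls   # prints 1000

fast = FlagURI.new("example.com", "/a")
p fast.frozen?      # prints false
fast.freeze
p fast.frozen?      # prints true
p fast.to_s.frozen? # prints true
```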