@theSage21
Created August 19, 2015 07:29
Results after changes
Wrote profile results to testing.py.lprof
Timer unit: 1e-06 s
Total time: 0.001185 s
File: html2text/__init__.py
Function: feed at line 121
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   121                                               @profile
   122                                               def feed(self, data):
   123         2            6      3.0      0.5          data = data.replace("</' + 'script>", "</ignore>")
   124         2         1179    589.5     99.5          HTMLParser.HTMLParser.feed(self, data)

Total time: 0.195888 s
File: html2text/__init__.py
Function: handle at line 126
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   126                                               @profile
   127                                               def handle(self, data):
   128         1         1183   1183.0      0.6          self.feed(data)
   129         1           20     20.0      0.0          self.feed("")
   130         1       194685 194685.0     99.4          return self.optwrap(self.close())

Total time: 0.194552 s
File: html2text/__init__.py
Function: optwrap at line 786
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   786                                               @profile
   787                                               def optwrap(self, text):
   788                                                   """
   789                                                   Wrap all paragraphs in the provided text.
   790
   791                                                   :type text: str
   792
   793                                                   :rtype: str
   794                                                   """
   795         1            3      3.0      0.0          if not self.body_width:
   796                                                       return text
   797
   798         1            2      2.0      0.0          assert wrap, "Requires Python 2.3."
   799         1            1      1.0      0.0          result = ''
   800         1            1      1.0      0.0          newlines = 0
   801                                                   # I cannot think of a better solution for now.
   802                                                   # To avoid the non-wrap behaviour for entire paras
   803                                                   # because of the presence of a link in it
   804         1            2      2.0      0.0          if not self.wrap_links:
   805                                                       self.inline_links = False
   806         3           13      4.3      0.0          for para in text.split("\n"):
   807         2            4      2.0      0.0              if len(para) > 0:
   808         1          210    210.0      0.1                  if not skipwrap(para, self.wrap_links):
   809         1       194295 194295.0     99.9                      temp = (result, "\n".join(wrap(para, self.body_width)))
   810         1            7      7.0      0.0                      result = ''.join(temp)
   811         1            5      5.0      0.0                      if para.endswith(' '):
   812                                                                   result += " \n"
   813                                                                   newlines = 1
   814                                                               else:
   815         1            4      4.0      0.0                          result += "\n\n"
   816         1            2      2.0      0.0                          newlines = 2
   817                                                           else:
   818                                                               # Warning for the tempted!!!
   819                                                               # Be aware that obvious replacement of this with
   820                                                               # line.isspace()
   821                                                               # DOES NOT work! Explanations are welcome.
   822                                                               if not config.RE_SPACE.match(para):
   823                                                                   result += para + "\n"
   824                                                                   newlines = 1
   825                                                       else:
   826         1            2      2.0      0.0                  if newlines < 2:
   827                                                               result += "\n"
   828                                                               newlines += 1
   829         1            1      1.0      0.0          return result

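Line 809 accounts for 99.9% of the time in optwrap, and it ran only once, so the cost comes from a single wrap call on one paragraph. A minimal sketch for timing that expression in isolation follows. It assumes `wrap` is `textwrap.wrap` from the standard library (the profile only shows the name `wrap`); the input paragraph, the width of 78, and the helper `wrap_para` are hypothetical stand-ins and not the input used in testing.py.

```python
import textwrap
import timeit

# Hypothetical input: one long paragraph with no newlines.
para = "word " * 5000
# Assumed width; stands in for self.body_width in optwrap.
body_width = 78


def wrap_para(text, width):
    """Mirror the expression on line 809: wrap, then join with newlines."""
    return "\n".join(textwrap.wrap(text, width))


runs = 3
elapsed = timeit.timeit(lambda: wrap_para(para, body_width), number=runs)
print("%.6f s per call" % (elapsed / runs))
```

Timings from this sketch depend on the machine and the input, so they are only comparable with each other, not with the figures in the profile.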
Total time: 0.197733 s
File: html2text/__init__.py
Function: html2text at line 831
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   831                                           @profile
   832                                           def html2text(html, baseurl='', bodywidth=config.BODY_WIDTH):
   833         1         1834   1834.0      0.9      h = HTML2Text(baseurl=baseurl, bodywidth=bodywidth)
   834
   835         1       195899 195899.0     99.1      return h.handle(html)
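
The Per Hit and % Time columns above can be recomputed from Hits, Time, and each function's total. The sketch below does that for a few rows, using only numbers copied from the tables; the helper names `per_hit` and `percent_time` are illustrative and are not part of line_profiler.

```python
# Times are in timer units (1e-06 s each, per the "Timer unit" header).

def per_hit(time_units, hits):
    """Average time per execution of a line."""
    return time_units / hits


def percent_time(time_units, total_units):
    """Share of the function's total time spent on a line."""
    return 100.0 * time_units / total_units


# feed, line 124: 2 hits, 1179 units; feed total is 6 + 1179 = 1185 units.
feed_total = 6 + 1179
print(per_hit(1179, 2))                           # 589.5
print(round(percent_time(1179, feed_total), 1))   # 99.5

# handle: the three line times add up to the reported total of 0.195888 s.
handle_total = 1183 + 20 + 194685
print(handle_total)                               # 195888

# Share of the whole html2text() call spent on optwrap line 809.
html2text_total = 1834 + 195899
print(round(percent_time(194295, html2text_total), 1))  # 98.3
```

Read this way, the single wrap call on line 809 makes up roughly 98% of the full html2text() call.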