@leandro
Last active February 15, 2023 17:11
Convert an HTML-based string into a human-friendly plain-text string
# Outside Rails these requires are needed; inside a Rails app they are
# typically already loaded.
require 'nokogiri'
require 'active_support/core_ext/object/blank'     # blank?, present?
require 'active_support/core_ext/object/inclusion' # in?

module StringUtils
  CONTAINS_HTML_TAG_REGEX = /<[a-z][a-z0-9-]*?[^>]*?>/im
  SELF_CLOSING_TAGS = %w[br hr img wbr].freeze
  SPACE_REGEX = /^(?:\R|\s)$/m

  module_function

  # Converts an HTML-based string into a human-friendly plain-text string.
  # Text wrapped in a <p> tag is replaced by its contents followed by two line
  # breaks ("\n\n"), and each <br> tag is replaced by a single line break.
  # The `internal` flag is set on recursive calls so that only the outermost
  # call strips the final result.
  def html_to_plain_text(html, internal: false)
    return '' if html.blank?
    # Fast path: a string without any opening tag only needs to be stripped.
    return html.strip if html.is_a?(String) && html !~ CONTAINS_HTML_TAG_REGEX

    html = Nokogiri::HTML.fragment(html) if html.is_a?(String)
    prevs = [] # previously visited siblings, most recent first

    html.children.filter_map do |node|
      node_name, content, parent = node.name, node.content, node.parent
      # Most recent sibling that is a self-closing tag or has content.
      last_contentful_prev = prevs.find do |prev|
        prev[:name].in?(SELF_CLOSING_TAGS) || !prev[:content].empty?
      end || {}
      prev_name, prev_content = last_contentful_prev.values_at(:name, :content)
      preceded_by_p = prev_name == 'p'
      empty_prev = last_contentful_prev.blank?
      # Insert a single space between adjacent siblings that would otherwise
      # run together.
      precede_with_whitespace = prevs.present? && !empty_prev && !preceded_by_p &&
        !prev_name.in?(SELF_CLOSING_TAGS) && prev_content[-1] !~ SPACE_REGEX &&
        content[0] !~ SPACE_REGEX
      head = (precede_with_whitespace && ' ') ||
        (!empty_prev && node_name == 'p' && !preceded_by_p && "\n\n") || ''
      tail = parent&.name == 'p' && node.next.nil? ? "\n\n" : ''
      prevs.unshift({ name: node_name, content: content })

      next "\n#{tail}" if node_name == 'br'
      # Interpolation instead of `<<` keeps this safe under frozen string literals.
      next (content.empty? ? nil : "#{head}#{content}#{tail}") if node.text?

      "#{head}#{html_to_plain_text(node, internal: true)}#{tail}"
    end.join.public_send(internal ? :itself : :strip)
  end
end
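The fast path in `html_to_plain_text` depends entirely on `CONTAINS_HTML_TAG_REGEX`, so it helps to see what that pattern accepts. The sketch below copies the two regex constants so it runs on its own with no gems. `looks_like_html?` is a hypothetical helper that is not part of the gist. Note that the pattern only looks for something shaped like an opening tag, so a lone closing tag is not detected and a comparison such as `a<b then c>d` is treated as HTML.

```ruby
# Constants copied from StringUtils so this snippet is self-contained.
CONTAINS_HTML_TAG_REGEX = /<[a-z][a-z0-9-]*?[^>]*?>/im
SPACE_REGEX = /^(?:\R|\s)$/m

# Hypothetical helper (not in the gist): true when the string would be
# handed to Nokogiri instead of being returned after a plain `strip`.
def looks_like_html?(text)
  text.match?(CONTAINS_HTML_TAG_REGEX)
end

samples = [
  'plain text',      # no tag at all
  'a < b',           # "<" is not followed by a letter
  '</p>',            # closing tags alone do not match
  '<br>',            # simplest opening tag
  '<P>Hi</P>',       # matching is case-insensitive
  'if a<b then c>d'  # false positive: "<b then c>" looks like a tag
]

samples.each do |sample|
  puts "#{sample.inspect} -> #{looks_like_html?(sample)}"
end

# SPACE_REGEX is applied to single characters (content[0], prev_content[-1]).
[' ', "\n", 'a'].each do |char|
  puts "#{char.inspect} is whitespace? #{char.match?(SPACE_REGEX)}"
end
```

Only the first three samples take the fast path; the last three are parsed as HTML.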
# frozen_string_literal: true
RSpec.describe StringUtils do
describe '.html_to_plain_text' do
let(:html_body_1) do
"----------\r\n<span style='font-weight: bold;'>Intaker: Before I ask a few more important q"\
"uestions about your case, may I have your Full Name?</span><br>\r\nClient: - Jennifer Bresl"\
"in<br>\r\n<span style='font-weight: bold;'>Intaker: May I also have a phone number in case "\
"we need to speak with you?</span><br>\r\nClient: - +1 (800)443-2034<br>\r\nClient: - Yes, "\
'I consent to the practice contacting me by text message for the purpose of my inquiry to a '\
'DIFFERENT NUMBER. New Number: +1 (800)443-2034. Time: 02/14/2023 16:46:44 (EST). User IP ad'\
"dress ::ffff:10.232.4.10<br>\r\n<span style='font-weight: bold;'>Intaker: And an email addr"\
"ess?</span><br>\r\nClient: - cbre.jennifer@yahoo.com<br>\r\n<span style='font-weight: bold;"\
"'>Intaker: Could you please tell me about the legal matter?</span><br>\r\nClient: - A debt "\
'collector called for my husband. She gave the mini Miranda then disclosed his debt to me. '\
' I advise in the state of SC that was illegal. Her supervisor then got on the line and di'\
'd the same thing. When I told her I was in SC and this was illegal, she said I gave her per'\
'mission to speak about it. In the state of SC, you HAVE to have the persons permission bef'\
"ore you disclose debt. Not to mention this debt is past the statute of limitations.<br>\r"\
"\n----------\r\nReferral Url: www.google.com/<br>\r\nCurrent Url: https://www.blossomlaw.co"\
"m/bankruptcy/fair-debt-practices-act<br>\r\nApprox IP Location: Fort Mill, SC, US<br><br>\r"\
"\nBot Name: Main<br>\r\nPrompt: Another issue<br>\r\n<a target='_blank' href='https://dash"\
"board.intaker.com/leads/2181447'>Open the lead in Intaker</a><br>\r\n"
end
let(:html_body_2) do
'<p>Hello,</p><p>Welcome to Farble! I am one of your case managers, Hayley, working directly'\
' with your attorney. You might have received an automated email with information and contac'\
't numbers for some of your legal team but I wanted to reach out to introduce myself as well'\
'.As a case manager I can receive any documents, summary of events and requests for updates '\
'from you and follow up with your attorney and have everything transferred to your file. You'\
' can email me here or reach out to our general support line at the number below with any qu'\
'estions you have as well. Please let me know how I can assist in making this process easier'\
' for you in any way.Employment cases take time so it may seem to you as if there is no move'\
'ment or minimal movement in the progression of your case. Please know that your attorney an'\
'd legal team are working to move things forward to obtain the best&nbsp;outcome.Communicati'\
'ons from your legal support team will not be frequent however will occur when there is an u'\
'pdate in your case. Also, if you are aware of an approaching deadline please let us know.At'\
' this point you should have had an initial call with your attorney to go over your case and'\
' the next steps. &nbsp;Usually your attorney will ask you to send in any supporting documen'\
'tation, you can send it to this email if you have not sent them directly to your attorney y'\
'et. &nbsp;The next several weeks after that is submitted, your attorney will review everyth'\
'ing, do any research necessary, and consult with their team to determine strategy for your '\
'case. &nbsp;Next they will draft any paperwork needed for the next steps in your case. All '\
'of that can take some time to complete and your attorney or our team will be in touch with '\
'you when we have new information. If your attorney has indicated a different timeline, plea'\
'se follow that advice or if you have a deadline approaching please let us know. &nbsp;You c'\
'an always reach out to me if you have questions on where in the process your case is and I '\
'can follow up with that.Thank you so much and welcome to Farble.<br>Hayley</p>'
end
let(:html_body_3) do
'<p><span style="background-color:yellow;">Called &amp; spoke with Matthew McDormand – advis'\
'ed that we do not handle probate litigation, REFERRED TO JUSTIA.COM or FLORIDA BAR ASSOCATI'\
'ON</span></p>'
end
let(:html_body_4) { '<p>this an interesting</p><br />logic' }
let(:html_body_5) do
"<p>this<p>is a very<p>weird HTML structure</p>but ok, right?</p></p><span style='color:#fb0"\
";'>No judgements!</span>"
end
let(:html_body_6) { '<p>a really <p>bad <p>HTML chunk</p> over <p>here folks' }
let(:plain_text_body_1) do
"----------\r\nIntaker: Before I ask a few more important questions about your case, may I h"\
"ave your Full Name?\n\r\nClient: - Jennifer Breslin\n\r\nIntaker: May I also have a phone n"\
"umber in case we need to speak with you?\n\r\nClient: - +1 (800)443-2034\n\r\nClient: - Ye"\
's, I consent to the practice contacting me by text message for the purpose of my inquiry to'\
' a DIFFERENT NUMBER. New Number: +1 (800)443-2034. Time: 02/14/2023 16:46:44 (EST). User IP'\
" address ::ffff:10.232.4.10\n\r\nIntaker: And an email address?\n\r\nClient: - cbre.jennife"\
"r@yahoo.com\n\r\nIntaker: Could you please tell me about the legal matter?\n\r\nClient: - A"\
' debt collector called for my husband. She gave the mini Miranda then disclosed his debt t'\
'o me. I advise in the state of SC that was illegal. Her supervisor then got on the line '\
'and did the same thing. When I told her I was in SC and this was illegal, she said I gave h'\
'er permission to speak about it. In the state of SC, you HAVE to have the persons permissi'\
'on before you disclose debt. Not to mention this debt is past the statute of limitations.'\
"\n\r\n----------\r\nReferral Url: www.google.com/\n\r\nCurrent Url: https://www.blossomlaw."\
"com/bankruptcy/fair-debt-practices-act\n\r\nApprox IP Location: Fort Mill, SC, US\n\n\r\nBo"\
"t Name: Main\n\r\nPrompt: Another issue\n\r\nOpen the lead in Intaker"
end
let(:plain_text_body_2) do
"Hello,\n\nWelcome to Farble! I am one of your case managers, Hayley, working directly with "\
'your attorney. You might have received an automated email with information and contact numb'\
'ers for some of your legal team but I wanted to reach out to introduce myself as well.As a '\
'case manager I can receive any documents, summary of events and requests for updates from y'\
'ou and follow up with your attorney and have everything transferred to your file. You can e'\
'mail me here or reach out to our general support line at the number below with any question'\
's you have as well. Please let me know how I can assist in making this process easier for y'\
'ou in any way.Employment cases take time so it may seem to you as if there is no movement o'\
'r minimal movement in the progression of your case. Please know that your attorney and lega'\
'l team are working to move things forward to obtain the best outcome.Communications from yo'\
'ur legal support team will not be frequent however will occur when there is an update in yo'\
'ur case. Also, if you are aware of an approaching deadline please let us know.At this point'\
' you should have had an initial call with your attorney to go over your case and the next s'\
'teps.  Usually your attorney will ask you to send in any supporting documentation, you can '\
'send it to this email if you have not sent them directly to your attorney yet.  The next se'\
'veral weeks after that is submitted, your attorney will review everything, do any research '\
'necessary, and consult with their team to determine strategy for your case.  Next they will'\
' draft any paperwork needed for the next steps in your case. All of that can take some time'\
' to complete and your attorney or our team will be in touch with you when we have new infor'\
'mation. If your attorney has indicated a different timeline, please follow that advice or i'\
'f you have a deadline approaching please let us know.  You can always reach out to me if yo'\
'u have questions on where in the process your case is and I can follow up with that.Thank y'\
"ou so much and welcome to Farble.\nHayley"
end
let(:plain_text_body_3) do
'Called & spoke with Matthew McDormand – advised that we do not handle probate litigation, R'\
'EFERRED TO JUSTIA.COM or FLORIDA BAR ASSOCATION'
end
let(:plain_text_body_4) { "this an interesting\n\n\nlogic" }
let(:plain_text_body_5) do
"this\n\nis a very\n\nweird HTML structure\n\nbut ok, right? No judgements!"
end
let(:plain_text_body_6) { "a really \n\nbad \n\nHTML chunk\n\n over \n\nhere folks" }
it 'properly transforms HTML based strings into human friendly plain texts' do
expect(described_class.html_to_plain_text(html_body_1)).to eq(plain_text_body_1)
expect(described_class.html_to_plain_text(html_body_2)).to eq(plain_text_body_2)
expect(described_class.html_to_plain_text(html_body_3)).to eq(plain_text_body_3)
expect(described_class.html_to_plain_text(html_body_4)).to eq(plain_text_body_4)
expect(described_class.html_to_plain_text(html_body_5)).to eq(plain_text_body_5)
expect(described_class.html_to_plain_text(html_body_6)).to eq(plain_text_body_6)
expect(described_class.html_to_plain_text(' ')).to eq('')
expect(described_class.html_to_plain_text("\n \r\n")).to eq('')
expect(described_class.html_to_plain_text('')).to eq('')
expect(described_class.html_to_plain_text(nil)).to eq('')
end
end
end
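To make the line-break rules behind these expectations easier to see, here is a rough, regex-only approximation of them. It is not the gist's method: `rough_html_to_text` is a hypothetical name, it does not use Nokogiri, it does not decode entities such as `&amp;`, and it does not insert spaces between inline siblings. It should agree with the spec only on simple, well-formed input such as `html_body_4`, and will likely differ on the nested or unclosed paragraphs in `html_body_5` and `html_body_6`.

```ruby
# Hypothetical, simplified stand-in for StringUtils.html_to_plain_text.
# Assumes well-formed, non-nested <p> elements and no HTML entities.
def rough_html_to_text(html)
  html
    .gsub(%r{<br\s*/?>}i, "\n")   # each <br> becomes one line break
    .gsub(%r{</p\s*>}i, "\n\n")   # the end of a paragraph becomes two
    .gsub(/<[^>]+>/, '')          # drop every remaining tag
    .strip                        # trim leading and trailing whitespace
end

puts rough_html_to_text('<p>this an interesting</p><br />logic').inspect
# prints "this an interesting\n\n\nlogic"

puts rough_html_to_text('<p>Hello,</p><p>Welcome</p>').inspect
# prints "Hello,\n\nWelcome"
```

The first call reproduces `plain_text_body_4`: two line breaks from the closing `</p>` plus one from the `<br />`.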