@atifaziz
Created December 15, 2016 15:11
LINQPad query to convert the HTML tables of a web page into a normalized textual tree
<Query Kind="Program">
  <Reference>&lt;RuntimeDirectory&gt;\System.Net.Http.dll</Reference>
  <NuGetReference>Fizzler.Systems.HtmlAgilityPack</NuGetReference>
  <NuGetReference Prerelease="true">WebLinq</NuGetReference>
  <Namespace>Fizzler</Namespace>
  <Namespace>Fizzler.Systems.HtmlAgilityPack</Namespace>
  <Namespace>HtmlAgilityPack</Namespace>
  <Namespace>System.Net.Http</Namespace>
  <Namespace>System.Threading.Tasks</Namespace>
  <Namespace>WebLinq.Html</Namespace>
</Query>
async Task Main(string[] args)
{
    // Use the URL given as the first argument, else fall back to a sample page.
    var url = args != null && args.Any()
            ? new Uri(args.First())
            : new Uri("https://en.wikipedia.org/wiki/Queen_discography");

    string html;
    using (var http = new HttpClient())
        html = await http.GetStringAsync(url);

    const string space = "\x20";
    const string indent = space + space;

    // Flatten every table into lines: a TABLE header, then a ROW line
    // per row, each followed by one line per cell.
    var tree =
        from t in new HapHtmlParser().Parse(html, null).Tables(null)
        let id = t.GetAttributeValue("id")
        from ss in new[]
        {
            // Append "#id" only when the table has an id attribute.
            new[] { "- TABLE" + (string.IsNullOrEmpty(id) ? string.Empty : "#" + id) },
            from row in
                t.TableRows((_, tds) => from td in tds
                                        select td?.InnerText ?? "<NULL>" into s
                                        // Replace line breaks, collapse whitespace runs, trim.
                                        select Regex.Replace(s, @"\r|\n", " ") into s
                                        select Regex.Replace(s, @"\s{2,}", " ") into s
                                        select s.Trim())
            from ss in new[]
            {
                new[] { indent + "- ROW" },
                from cell in row
                select indent + indent + "- " + cell
            }
            from s in ss
            select s
        }
        from s in ss
        select s;

    foreach (var e in tree)
        Console.WriteLine(e);
}
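
The shape of the output may be easier to see without the network and parser dependencies. The following is a minimal sketch, not part of the original query: it applies the same cell normalization and the same TABLE/ROW/cell indentation to in-memory data. The names `TableTreeSketch`, `Normalize` and `ToTree`, as well as the sample rows and the `demo` id, are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class TableTreeSketch
{
    const string Indent = "  ";

    // Same steps as the query: null becomes "<NULL>", line breaks become
    // spaces, runs of two or more whitespace characters collapse to one
    // space, and the result is trimmed.
    public static string Normalize(string text)
    {
        var s = text ?? "<NULL>";
        s = Regex.Replace(s, @"\r|\n", " ");
        s = Regex.Replace(s, @"\s{2,}", " ");
        return s.Trim();
    }

    // Emits one line for the table, one per row, and one per cell.
    public static IEnumerable<string> ToTree(string id, IEnumerable<IEnumerable<string>> rows)
    {
        yield return "- TABLE" + (string.IsNullOrEmpty(id) ? string.Empty : "#" + id);
        foreach (var row in rows)
        {
            yield return Indent + "- ROW";
            foreach (var cell in row)
                yield return Indent + Indent + "- " + Normalize(cell);
        }
    }

    static void Main()
    {
        // Hypothetical sample data standing in for parsed table cells.
        var rows = new[]
        {
            new[] { "Name", "Value" },
            new[] { "first\nitem", "  42 " },
            new[] { "second", null },
        };

        foreach (var line in ToTree("demo", rows))
            Console.WriteLine(line);

        // Prints:
        // - TABLE#demo
        //   - ROW
        //     - Name
        //     - Value
        //   - ROW
        //     - first item
        //     - 42
        //   - ROW
        //     - second
        //     - <NULL>
    }
}
```

Because every line carries its nesting level as leading spaces, two runs of the query against different revisions of a page can be compared with an ordinary text diff.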