@bentrevett
Last active October 27, 2022 23:17
linustechtips-forum-scraper
{"name": "linux", "headline": "linux", "text": "I am trying to build an os from scratch and wanted to look at Linux source code for some reference but I do not know where to find the source code ty in advance \ud83d\ude42\n \n\n\n\t\u00a0\n \n", "dateCreated": "2022-10-26T05:06:30+0000", "datePublished": "2022-10-26T05:06:30+0000", "dateModified": "2022-10-27T14:26:46+0000", "image": "https://linustechtips.com/uploads/monthly_2022_06/comic-book.thumb.gif.78e4e143db467ea95ca54a8d79a4f022.gif", "author": {"@type": "Person", "name": "swabro", "image": "https://linustechtips.com/uploads/monthly_2022_06/comic-book.thumb.gif.78e4e143db467ea95ca54a8d79a4f022.gif", "url": "https://linustechtips.com/profile/991006-swabro/"}, "interactionStatistic": [{"@type": "InteractionCounter", "interactionType": "http://schema.org/ViewAction", "userInteractionCount": 162}, {"@type": "InteractionCounter", "interactionType": "http://schema.org/CommentAction", "userInteractionCount": 9}, {"@type": "InteractionCounter", "interactionType": "http://schema.org/FollowAction", "userInteractionCount": 1}], "@context": "http://schema.org", "@type": "DiscussionForumPosting", "@id": "https://linustechtips.com/topic/1463407-linux/", "isPartOf": {"@id": "https://linustechtips.com/#website"}, "publisher": {"@id": "https://linustechtips.com/#organization", "member": {"@type": "Person", "name": "swabro", "image": "https://linustechtips.com/uploads/monthly_2022_06/comic-book.thumb.gif.78e4e143db467ea95ca54a8d79a4f022.gif", "url": "https://linustechtips.com/profile/991006-swabro/"}}, "url": "https://linustechtips.com/topic/1463407-linux/", "discussionUrl": "https://linustechtips.com/topic/1463407-linux/", "mainEntityOfPage": {"@type": "WebPage", "@id": "https://linustechtips.com/topic/1463407-linux/"}, "pageStart": 1, "pageEnd": 1, "comment": [{"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15623815", "url": 
"https://linustechtips.com/topic/1463407-linux/#comment-15623815", "author": {"@type": "Person", "name": "Gimmick21", "image": "https://linustechtips.com/applications/core/interface/email/default_photo.png", "url": "https://linustechtips.com/profile/865514-gimmick21/"}, "dateCreated": "2022-10-26T05:31:19+0000", "text": "https://github.com/torvalds/linux\n \n", "upvoteCount": 1}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15623964", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15623964", "author": {"@type": "Person", "name": "maplepants", "image": "https://linustechtips.com/uploads/monthly_2020_08/1678227700_BurgHain2.thumb.jpeg.7458399bc983230149fab3563c6fd889.jpeg", "url": "https://linustechtips.com/profile/672243-maplepants/"}, "dateCreated": "2022-10-26T09:06:14+0000", "text": "You can also checkout freeBSD:\u00a0https://cgit.freebsd.org/src/\n \n\n\n\tThe Darwin source code is here:\u00a0https://github.com/apple/darwin-xnu\n \n\n\n\t\u00a0\n \n\n\n\tBut one thing about all three of these (though less so for Darwin), is that they're huge.\u00a0\n \n\n\n\t\u00a0\n \n\n\n\tMaybe look for something smaller to start with. Something like Arduino can be an easy way into learning about writing an OS.\n \n", "upvoteCount": 0}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15624172", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15624172", "author": {"@type": "Person", "name": "Takumidesh", "image": "https://linustechtips.com/uploads/monthly_2021_08/imported-photo-709419.thumb.jpeg.f362d89561e63f6a49b0e6e5d339f729.jpeg", "url": "https://linustechtips.com/profile/709419-takumidesh/"}, "dateCreated": "2022-10-26T12:50:32+0000", "text": "I am assuming you are pretty fresh with programming, since this is a pretty easily searchable question (one of the first results when I search 'Linux source code')\n \n\n\n\t\u00a0\n \n\n\n\tCheck out OSdev wiki. 
You honestly are not going to gain much insight into how to develop an os from looking at the linux kernel unless you are already very very knowledgeable about the topic.\n \n\n\n\t\u00a0\n \n\n\n\tI also suggest reading Modern Operating Systems.\n \n\n\n\t\u00a0\n \n\n\n\tLearn about real/protected mode. How the boot sector works, interrogating the BIOS, VGA text mode. Learn ASM for your target architecture. Cross compiling (you will need to build C for your target arch.)\n \n\n\n\t\u00a0\n \n\n\n\tThere is a few months of solid work if you are serious about OS architecture before you get to the point of looking up other operating system sources for reference material.\n \n\n\n\t\u00a0\n \n\n\n\t\u00a0\n \n", "upvoteCount": 0}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15624327", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15624327", "author": {"@type": "Person", "name": "swabro", "image": "https://linustechtips.com/uploads/monthly_2022_06/comic-book.thumb.gif.78e4e143db467ea95ca54a8d79a4f022.gif", "url": "https://linustechtips.com/profile/991006-swabro/"}, "dateCreated": "2022-10-26T14:44:10+0000", "text": "I did search that on the web but did not find the GitHub repo strangely thanks anyways\n \n", "upvoteCount": 0}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15625496", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15625496", "author": {"@type": "Person", "name": "swabro", "image": "https://linustechtips.com/uploads/monthly_2022_06/comic-book.thumb.gif.78e4e143db467ea95ca54a8d79a4f022.gif", "url": "https://linustechtips.com/profile/991006-swabro/"}, "dateCreated": "2022-10-27T05:23:11+0000", "text": "tysm for all ur help\u00a0\n \n", "upvoteCount": 0}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15625498", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15625498", "author": {"@type": 
"Person", "name": "swabro", "image": "https://linustechtips.com/uploads/monthly_2022_06/comic-book.thumb.gif.78e4e143db467ea95ca54a8d79a4f022.gif", "url": "https://linustechtips.com/profile/991006-swabro/"}, "dateCreated": "2022-10-27T05:25:00+0000", "text": "I am not going to be using c at all and will use python and use a coding language that I have been developing from scratch and my target architure is for arm, x86 and amd64, 32 bit etc. basically cross compatibility in every way possible\u00a0\n \n", "upvoteCount": 0}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15625712", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15625712", "author": {"@type": "Person", "name": "Franck", "image": "https://linustechtips.com/uploads/monthly_2019_08/jawascript.thumb.jpg.a99b56be3eacffe54bf56169781fcd4f.jpg", "url": "https://linustechtips.com/profile/580241-franck/"}, "dateCreated": "2022-10-27T10:47:34+0000", "text": "Sorry to burst your bubble but you wont write the kernel in python. You will need A or C to run python so for sure the kernel root will be A or C\n \n", "upvoteCount": 2}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15625799", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15625799", "author": {"@type": "Person", "name": "Takumidesh", "image": "https://linustechtips.com/uploads/monthly_2021_08/imported-photo-709419.thumb.jpeg.f362d89561e63f6a49b0e6e5d339f729.jpeg", "url": "https://linustechtips.com/profile/709419-takumidesh/"}, "dateCreated": "2022-10-27T12:19:58+0000", "text": "This is such a monumental task.\n \n\n\n\t\u00a0\n \n\n\n\tSpeaking as someone who has dabbled in kernel development. 
targeting multiple architectures is difficult, targeting 64 bit is difficult.\n \n\n\n\t\u00a0\n \n\n\n\tYou need to be comfortable in asm for all of those architectures and unless you are going to be using an existing boot loader you need to be able to get into protected mode and long mode.\n \n\n\n\t\u00a0\n \n\n\n\tpython will be an absolute nightmare and require so much extra work only to have a kernel that is ultimately in c anyway and several orders of magnitude slower.\n \n\n\n\t\u00a0\n \n\n\n\tcross compatibility 'in every way possible' is something that is not even achieved by the biggest names in OS development with all of the time, money, and resources; teams of hundreds of developers are working on those projects.\n \n\n\n\t\u00a0\n \n\n\n\tI really suggest you start small. get into vga text mode. display some text on the screen, figure out panic handling.\n \n\n\n\t\u00a0\n \n\n\n\tYou have to write drivers for every peripheral, you need to write mitigations for vulnerabilities for every architecture, You need to handle all of the edge cases.\n \n\n\n\t\u00a0\n \n\n\n\tYou need to create stacks, and heaps. Memory managers. Context switching, BIOS interrogation. 
You need to build an entire network stack and implement IEEE standards.\n \n\n\n\t\u00a0\n \n\n\n\tIf you want your OS to be useful you need to implement C so that way other software can actually be run on it.\n \n\n\n\t\u00a0\n \n\n\n\tI am not saying this to discourage, only to help you understand what you should be focusing on.\n \n\n\n\t\u00a0\n \n\n\n\t\u00a0\n \n", "upvoteCount": 1}, {"@type": "Comment", "@id": "https://linustechtips.com/topic/1463407-linux/#comment-15625937", "url": "https://linustechtips.com/topic/1463407-linux/#comment-15625937", "author": {"@type": "Person", "name": "wasab", "image": "https://linustechtips.com/uploads/monthly_2018_01/penguingun-600x600.thumb.jpg.12528ac5130e482babe0d293dc8ed446.jpg", "url": "https://linustechtips.com/profile/542123-wasab/"}, "dateCreated": "2022-10-27T14:26:46+0000", "text": "right.... if i get a dime for every newbie programmer who suddenly post on this subforum and declare they can code up an os in python, i would be a millionare by now.\u00a0\n \n", "upvoteCount": 0}]}
{"name": "Ping Pong in Python Help (replit)", "headline": "Ping Pong in Python Help (replit)", "text": "I am trying to make ping pong in python and I have the code below. I am running it on replit but it does not work. Can anyone try and help?\n \n\nnamespace ping_pong_game\n{\n public partial class Form1 : Form\n {\n public int speed_left = 4; //speed of ball\n public int speed_top = 4;\n public int point = 0; //score point \n\n public Form1()\n {\n InitializeComponent();\n timer1.Enabled = true;\n Cursor.Hide(); //hide the cursor\n this.FormBorderStyle = FormBorderStyle.None; //remove any boder\n this.TopMost = true; //bring the form to the front\n this.Bounds = Screen.PrimaryScreen.Bounds; //make it fullscreen\n racket.Top = playground.Bottom - (playground.Bottom / 10); //set the position of racket\n }\n\n private void timer1_Tick(object sender, EventArgs e)\n {\n racket.Left = Cursor.Position.X - (racket.Width / 2); //set the center of the racket to the position of the cursor \n ball.Left += speed_left; //move the ball\n ball.Top += speed_top;\n\n if (ball.Bottom >= racket.Top && ball.Bottom <= racket.Bottom && ball.Left >= racket.Left && ball.Right <= racket.Right) //racket collision\n {\n speed_top += 2;\n speed_left += 2;\n speed_top = -speed_top;// change the direction\n point += 1;\n }\n if (ball.Left<=playground.Left)\n {\n speed_left = -speed_left;\n }\n if (ball.Right>=playground.Right)\n {\n speed_left = -speed_left;\n }\n if (ball.Top<=playground.Top)\n {\n speed_top = -speed_top;\n }\n if (ball.Bottom>=playground.Bottom)\n {\n timer1.Enabled = false; //ball is out ->stop the game \n }\n }\n\n private void Form1_KeyDown(object sender, KeyEventArgs e)\n {\n if (e.KeyCode == Keys.Escape)\n {\n this.Close(); //press escape to quit\n }\n\n }\n }\n}\n\n\n\t\u00a0\n \n", "dateCreated": "2022-10-26T17:51:47+0000", "datePublished": "2022-10-26T17:51:47+0000", "dateModified": "2022-10-26T19:47:28+0000", "image": 
"https://linustechtips.com/uploads/monthly_2021_10/imported-photo-859621.thumb.png.682abdef469a579331b5197c65b152ea.png", "author": {"@type": "Person", "name": "Bee Bus Hardware", "image": "https://linustechtips.com/uploads/monthly_2021_10/imported-photo-859621.thumb.png.682abdef469a579331b5197c65b152ea.png", "url": "https://linustechtips.com/profile/859621-bee-bus-hardware/"}, "interactionStatistic": [{"@type": "InteractionCounter", "interactionType": "http://schema.org/ViewAction", "userInteractionCount": 98}, {"@type": "InteractionCounter", "interactionType": "http://schema.org/CommentAction", "userInteractionCount": 8}, {"@type": "InteractionCounter", "interactionType": "http://schema.org/FollowAction", "userInteractionCount": 3}], "@context": "http://schema.org", "@type": "QAPage", "@id": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/", "mainEntity": {"@type": "Question", "name": "Ping Pong in Python Help (replit)", "text": "I am trying to make ping pong in python and I have the code below. I am running it on replit but it does not work. 
Can anyone try and help?\n \n\nnamespace ping_pong_game\n{\n public partial class Form1 : Form\n {\n public int speed_left = 4; //speed of ball\n public int speed_top = 4;\n public int point = 0; //score point \n\n public Form1()\n {\n InitializeComponent();\n timer1.Enabled = true;\n Cursor.Hide(); //hide the cursor\n this.FormBorderStyle = FormBorderStyle.None; //remove any boder\n this.TopMost = true; //bring the form to the front\n this.Bounds = Screen.PrimaryScreen.Bounds; //make it fullscreen\n racket.Top = playground.Bottom - (playground.Bottom / 10); //set the position of racket\n }\n\n private void timer1_Tick(object sender, EventArgs e)\n {\n racket.Left = Cursor.Position.X - (racket.Width / 2); //set the center of the racket to the position of the cursor \n ball.Left += speed_left; //move the ball\n ball.Top += speed_top;\n\n if (ball.Bottom >= racket.Top && ball.Bottom <= racket.Bottom && ball.Left >= racket.Left && ball.Right <= racket.Right) //racket collision\n {\n speed_top += 2;\n speed_left += 2;\n speed_top = -speed_top;// change the direction\n point += 1;\n }\n if (ball.Left<=playground.Left)\n {\n speed_left = -speed_left;\n }\n if (ball.Right>=playground.Right)\n {\n speed_left = -speed_left;\n }\n if (ball.Top<=playground.Top)\n {\n speed_top = -speed_top;\n }\n if (ball.Bottom>=playground.Bottom)\n {\n timer1.Enabled = false; //ball is out ->stop the game \n }\n }\n\n private void Form1_KeyDown(object sender, KeyEventArgs e)\n {\n if (e.KeyCode == Keys.Escape)\n {\n this.Close(); //press escape to quit\n }\n\n }\n }\n}\n\n\n\t\u00a0\n \n", "answerCount": 8, "dateCreated": "2022-10-26T17:51:47+0000", "author": {"@type": "Person", "name": "Bee Bus Hardware"}, "acceptedAnswer": {"@type": "Answer", "text": "Well, your code is (being compiled) in C#, so you might want to reorient yourself just a little.\n \n", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/?do=findComment&comment=15624630", "dateCreated": 
"2022-10-26T18:21:34+0000", "upvoteCount": 0, "author": {"@type": "Person", "name": "AbydosOne", "image": "https://linustechtips.com/uploads/monthly_2018_11/b92b1e6427974419d6c836ec1492f1a2345d5f13_hq.thumb.jpg.aadcb73835e39cfdf278dbb427efcb91.jpg", "url": "https://linustechtips.com/profile/597681-abydosone/"}}, "suggestedAnswer": [{"@type": "Answer", "@id": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624604", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624604", "author": {"@type": "Person", "name": "AbydosOne", "image": "https://linustechtips.com/uploads/monthly_2018_11/b92b1e6427974419d6c836ec1492f1a2345d5f13_hq.thumb.jpg.aadcb73835e39cfdf278dbb427efcb91.jpg", "url": "https://linustechtips.com/profile/597681-abydosone/"}, "dateCreated": "2022-10-26T18:06:51+0000", "text": "How\u00a0does it not work? Be more descriptive. Explain your process and where it's breaking down.\n \n", "upvoteCount": 0}, {"@type": "Answer", "@id": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624607", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624607", "author": {"@type": "Person", "name": "Bee Bus Hardware", "image": "https://linustechtips.com/uploads/monthly_2021_10/imported-photo-859621.thumb.png.682abdef469a579331b5197c65b152ea.png", "url": "https://linustechtips.com/profile/859621-bee-bus-hardware/"}, "dateCreated": "2022-10-26T18:08:49+0000", "text": "it gives me this error\n \n\n\n\t\u00a0\n \n\n\n\t\ueea7 mcs -out:main.exe main.cs \n\tmain.cs(3,34): error CS0246: The type or namespace name `Form' could not be found. Are you missing an assembly reference? \n\tmain.cs(20,49): error CS0246: The type or namespace name `EventArgs' could not be found. Are you missing `System' using directive? \n\tmain.cs(51,51): error CS0246: The type or namespace name `KeyEventArgs' could not be found. 
Are you missing an assembly reference? \n\tCompilation failed: 3 error(s), 0 warnings \n\texit status 1\n \n", "upvoteCount": 0}, {"@type": "Answer", "@id": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624624", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624624", "author": {"@type": "Person", "name": "AbydosOne", "image": "https://linustechtips.com/uploads/monthly_2018_11/b92b1e6427974419d6c836ec1492f1a2345d5f13_hq.thumb.jpg.aadcb73835e39cfdf278dbb427efcb91.jpg", "url": "https://linustechtips.com/profile/597681-abydosone/"}, "dateCreated": "2022-10-26T18:17:38+0000", "text": "Are you working in Python or C#?\n \n", "upvoteCount": 0}, {"@type": "Answer", "@id": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624625", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624625", "author": {"@type": "Person", "name": "Bee Bus Hardware", "image": "https://linustechtips.com/uploads/monthly_2021_10/imported-photo-859621.thumb.png.682abdef469a579331b5197c65b152ea.png", "url": "https://linustechtips.com/profile/859621-bee-bus-hardware/"}, "dateCreated": "2022-10-26T18:19:23+0000", "text": "Python\n \n", "upvoteCount": 0}, {"@type": "Answer", "@id": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624645", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624645", "author": {"@type": "Person", "name": "Bee Bus Hardware", "image": "https://linustechtips.com/uploads/monthly_2021_10/imported-photo-859621.thumb.png.682abdef469a579331b5197c65b152ea.png", "url": "https://linustechtips.com/profile/859621-bee-bus-hardware/"}, "dateCreated": "2022-10-26T18:30:26+0000", "text": "I was coming back to the project after 3 months but yes it is c# didn't realise\n \n", "upvoteCount": 0}, {"@type": "Answer", "@id": 
"https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624721", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624721", "author": {"@type": "Person", "name": "Takumidesh", "image": "https://linustechtips.com/uploads/monthly_2021_08/imported-photo-709419.thumb.jpeg.f362d89561e63f6a49b0e6e5d339f729.jpeg", "url": "https://linustechtips.com/profile/709419-takumidesh/"}, "dateCreated": "2022-10-26T19:19:39+0000", "text": "You are trying to run a single partial class of a windows forms .net project on an online compiler.\n \n\n\n\t\u00a0\n \n\n\n\tUnless there is more code than this, it is about 5% of what you need to properly compile this software.\n \n\n\n\t\u00a0\n \n\n\n\tYou need the WinForms framework, .Net runtime, you have event handlers but no there would be nothing invoking the events, etc...\n \n", "upvoteCount": 0}, {"@type": "Answer", "@id": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624767", "url": "https://linustechtips.com/topic/1463495-ping-pong-in-python-help-replit/#comment-15624767", "author": {"@type": "Person", "name": "mariushm", "image": "https://linustechtips.com/applications/core/interface/email/default_photo.png", "url": "https://linustechtips.com/profile/327531-mariushm/"}, "dateCreated": "2022-10-26T19:47:28+0000", "text": "You have a windows form , and you have a timer called timer1 because you have the following function in the code\n \n\n\n\t\u00a0\n \n\n\n\tprivate void timer1_Tick(object sender, EventArgs e)\n \n\n\n\t\u00a0\n \n\n\n\tThis function runs every time the timer ticks, so this means you'd probably need to set the timer interval to some sane value like 500ms - that would mean every 500ms the code inside that function will run, and that code will move the ball around the window.\n \n\n\n\t\u00a0\n \n\n\n\tThen you also have some \"objects\" in your window ... 
look in the code, you have \"ball\" , you have \"racket\" , you have \"playground\" so use your brains ... playground is probably some picture box or something to represent the gameplay area , the ball is probably some picture object with a \"ball\" picture loaded in it, and placed over the playground object , and \"racket\" is probably another picture box or something that represents ... you figure it out.\n \n\n\n\t\u00a0\n \n\n\n\tyou have a function there form1_keydown so that tells you whoever wrote the code left the name of the form Form1 and this keydown function is an event which happens every time someone presses a key when the application runs ...\n \n\n\n\tThe function only tests if ESC is pressed and if so, closes the application ... so you should probably add there if user pressed left or right key or up and down to move the paddle so that you can hit the ball that moves around the window ...\n \n", "upvoteCount": 0}]}, "publisher": {"member": {"@type": "Person", "name": "Bee Bus Hardware", "image": "https://linustechtips.com/uploads/monthly_2021_10/imported-photo-859621.thumb.png.682abdef469a579331b5197c65b152ea.png", "url": "https://linustechtips.com/profile/859621-bee-bus-hardware/"}}}
from bs4 import BeautifulSoup
import requests
import json
from pathlib import Path
import tqdm

url = "https://linustechtips.com/forum/20-programming/"

dataset_dir = Path("dataset")
dataset_dir.mkdir(exist_ok=True)
dataset_path = dataset_dir / "dataset.jsonl"

# Create the output file, or truncate it if it is left over from an earlier run.
with dataset_path.open("w") as f:
    pass


def url_to_soup(url):
    """Fetch a page and parse it into a BeautifulSoup tree."""
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    return soup


# <link rel="last"> points at the final forum page; its URL ends in ".../page/N/".
soup = url_to_soup(url)
n_pages = int(soup.find("link", {"rel": "last"}).get("href").split("/")[-2])

for i_page in tqdm.tqdm(range(1, n_pages + 1)):
    soup = url_to_soup(f"https://linustechtips.com/forum/20-programming/page/{i_page}/")
    # href=True skips anchors without an href attribute, which would raise KeyError.
    topic_urls = [
        link["href"]
        for link in soup.find_all("a", href=True)
        if link["href"].startswith("https://linustechtips.com/topic/")
        and not link["href"].endswith("#comments")
        and "community-standards-updated" not in link["href"]
    ]
    # Drop duplicate links while keeping their original order.
    topic_urls = list(dict.fromkeys(topic_urls))
    for topic_url in topic_urls:
        soup = url_to_soup(topic_url)
        # Single-page topics have no <link rel="last"> element.
        n_topic_pages_url = soup.find("link", {"rel": "last"})
        n_topic_pages = (
            1
            if n_topic_pages_url is None
            else int(n_topic_pages_url.get("href").split("/")[-2])
        )
        for i_topic_page in range(1, n_topic_pages + 1):
            topic_page_url = f"{topic_url}page/{i_topic_page}"
            print(topic_page_url)
            soup = url_to_soup(topic_page_url)
            # Each topic page embeds its posts as schema.org JSON-LD.
            topic_page_contents = json.loads(
                soup.find("script", {"type": "application/ld+json"}).text
            )
            # Append one JSON object per line (JSON Lines).
            with dataset_path.open("a") as f:
                f.write(f"{json.dumps(topic_page_contents)}\n")
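
To work with the scraped output afterwards, something like the following could read dataset.jsonl back in. This is only a sketch: iter_records and extract_replies are hypothetical helper names, not part of the original script, and the two record shapes handled are the ones visible in the sample data above (a "comment" list, or "mainEntity" containing "suggestedAnswer"). Other record shapes may exist in a full scrape.

```python
import json
from pathlib import Path


def iter_records(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def extract_replies(record):
    """Return (author name, text) pairs for the replies on one topic page.

    Assumption: replies live either in a top-level "comment" list
    (DiscussionForumPosting records) or under
    "mainEntity" -> "suggestedAnswer" (QAPage records).
    """
    replies = record.get("comment")
    if replies is None:
        replies = record.get("mainEntity", {}).get("suggestedAnswer", [])
    return [
        (reply.get("author", {}).get("name"), reply.get("text", "").strip())
        for reply in replies
    ]
```

Calling extract_replies on each record from iter_records("dataset/dataset.jsonl") would give the author and text of every reply. Reading line by line keeps memory use flat even if the dataset grows large.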