@SKaplanOfficial
Created July 10, 2022 11:01
Using PyXA to extract paragraphs and sentences from a webpage, then create randomized flashcards from the webpage content
import os
from pprint import pprint
import PyXA
import random
from time import sleep
textedit = PyXA.application("TextEdit")
# Open a URL and wait for it to load
safari = PyXA.application("Safari")
safari.open("https://en.wikipedia.org/wiki/Computer")
sleep(1)
# Get the visible text of the document, then close the tab
doc_text = safari.current_document.text
safari.front_window().current_tab.close()
# Create folder path if it doesn't already exist
folder_path = "/Users/exampleuser/Documents/articles/"
os.makedirs(folder_path, exist_ok=True)
# Save the document text to a file on the disk
file_path = folder_path + "Wikipedia-Computer.txt"
with open(file_path, "w") as file:
    file.write(doc_text)
# Create 5 random (sentence, paragraph) 'flashcards'
paragraphs = textedit.open(file_path).paragraphs()
paragraphs = random.choices(paragraphs, k=5)
flashcards = [(random.choice(paragraph.sentences()), paragraph) for paragraph in paragraphs]
pprint(flashcards)
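
The script above relies on TextEdit (through PyXA) to split the saved text into paragraphs and sentences, and `random.choices` draws with replacement, so the same paragraph can appear on more than one card. As a rough, platform-independent sketch of the same flashcard step, the snippet below uses plain Python instead. The helper names `split_paragraphs`, `split_sentences`, and `make_flashcards` are hypothetical and not part of PyXA, and the regex-based sentence split is a naive assumption that will not match TextEdit's segmentation exactly.

```python
import random
import re


def split_paragraphs(text):
    """Treat each non-empty line as a paragraph (naive stand-in for TextEdit's paragraphs)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_sentences(paragraph):
    """Naive sentence split: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def make_flashcards(text, k=5, rng=random):
    """Return up to k (sentence, paragraph) pairs, each from a distinct paragraph."""
    paragraphs = split_paragraphs(text)
    # random.sample draws without replacement, so no paragraph is picked twice.
    # Clamp k so short documents do not raise ValueError.
    chosen = rng.sample(paragraphs, min(k, len(paragraphs)))
    return [(rng.choice(split_sentences(p)), p) for p in chosen]


if __name__ == "__main__":
    sample_text = "First point. Second point.\n\nAnother paragraph! With a question?"
    for sentence, paragraph in make_flashcards(sample_text, k=2):
        print("Q:", sentence)
        print("A:", paragraph)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which can help when comparing runs against the same saved article.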

SKaplanOfficial commented Jul 10, 2022

An example of output:

[('Registers are used for the most frequently needed data items to avoid '
'having to access main memory every time data is needed. ',
The CPU contains a special set of memory cells called registers that can be read and written to much more rapidly than the main memory area. There are typically between two and one hundred registers depending on the type of CPU. Registers are used for the most frequently needed data items to avoid having to access main memory every time data is needed. As data is constantly being worked on, reducing the need to access main memory (which is often slow compared to the ALU and control units) greatly increases the computer's speed.
),
('The term hardware covers all of those parts of a computer that are tangible '
'physical objects. ',
The term hardware covers all of those parts of a computer that are tangible physical objects. Circuits, computer chips, graphic cards, sound cards, memory (RAM), motherboard, displays, power supplies, cables, keyboards, printers and "mice" input devices are all hardware.
),
('Early CPUs were composed of many separate components. ',
The control unit, ALU, and registers are collectively known as a central processing unit (CPU). Early CPUs were composed of many separate components. Since the 1970s, CPUs have typically been constructed on a single MOS integrated circuit chip called a microprocessor.
),
('During World War II, the British code-breakers at Bletchley Park achieved a '
'number of successes at breaking encrypted German military communications. ',
During World War II, the British code-breakers at Bletchley Park achieved a number of successes at breaking encrypted German military communications. The German encryption machine, Enigma, was first attacked with the help of the electro-mechanical bombes which were often run by women.[32][33] To crack the more sophisticated German Lorenz SZ 40/42 machine, used for high-level Army communications, Max Newman and his colleagues commissioned Flowers to build the Colossus.[31] He spent eleven months from early February 1943 designing and building the first Colossus.[34] After a functional test in December 1943, Colossus was shipped to Bletchley Park, where it was delivered on 18 January 1944[35] and attacked its first message on 5 February.[31]
),
('The task of developing large software systems presents a significant intellectual challenge. ',
Program design of small programs is relatively simple and involves the analysis of the problem, collection of inputs, using the programming constructs within languages, devising or using established procedures and algorithms, providing data for output devices and solutions to the problem as applicable. As problems become larger and more complex, features such as subprograms, modules, formal documentation, and new paradigms such as object-oriented programming are encountered. Large programs involving thousands of line of code and more require formal software methodologies. The task of developing large software systems presents a significant intellectual challenge. Producing software with an acceptably high reliability within a predictable schedule and budget has historically been difficult; the academic and professional discipline of software engineering concentrates specifically on this challenge.
)]
