Created January 22, 2021 03:47
Search keywords in ppt files with python
# REF https://stackoverflow.com/questions/55497789/find-a-word-in-multiple-powerpoint-files-python/55763992#55763992
import os

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

path = "./"
keyword = "what_ever_you_want_to_find"
files = [x for x in os.listdir(path) if x.endswith(".pptx")]


def CheckRecursivelyForText(shpthissetofshapes, eachfile):
    """Print eachfile for every shape containing the keyword; return the match count."""
    matches = 0
    for shape in shpthissetofshapes:
        if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
            # Group shapes hold nested shapes, so recurse into them.
            matches += CheckRecursivelyForText(shape.shapes, eachfile)
        elif hasattr(shape, "text"):
            # Compare against a lowercased copy; do not overwrite shape.text.
            if keyword.lower() in shape.text.lower():
                print(eachfile)
                print("----------------------")
                matches += 1
    return matches


total = 0
for eachfile in files:
    prs = Presentation(os.path.join(path, eachfile))
    for slide in prs.slides:
        total += CheckRecursivelyForText(slide.shapes, eachfile)

# Report a miss only after every file has been searched.
if total == 0:
    print("No text found in these PPTs")
    print("----------------------")
You can use a variable to count how many times the keyword has been found.
Once the counter reaches 1, you can break out of the loops, so each file is reported only once.
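One way to act on this suggestion is to return as soon as the first match is found, instead of keeping an explicit counter. The sketch below is only an illustration, not the gist author's code. The helper names `shapes_contain`, `file_contains`, and `report_matches` are hypothetical, and it assumes that group shapes are the only shapes exposing a nested `.shapes` collection, rather than checking `MSO_SHAPE_TYPE.GROUP`.

```python
import os


def shapes_contain(shapes, keyword):
    """Return True as soon as any shape, or nested group shape, contains keyword."""
    for shape in shapes:
        if hasattr(shape, "shapes"):
            # Assumption: only group shapes expose a nested .shapes collection.
            if shapes_contain(shape.shapes, keyword):
                return True
        elif hasattr(shape, "text") and keyword.lower() in shape.text.lower():
            return True
    return False


def file_contains(pptx_path, keyword):
    """Return True if any slide in the presentation contains keyword."""
    # Imported here so shapes_contain can be used without python-pptx installed.
    from pptx import Presentation

    prs = Presentation(pptx_path)
    # any() stops iterating at the first slide that matches.
    return any(shapes_contain(slide.shapes, keyword) for slide in prs.slides)


def report_matches(path, keyword):
    """Print each matching .pptx file name in path at most once."""
    for eachfile in os.listdir(path):
        if eachfile.endswith(".pptx") and file_contains(os.path.join(path, eachfile), keyword):
            print(eachfile)
            print("----------------------")
```

Calling `report_matches("./", "what_ever_you_want_to_find")` should then name each file once, however many slides contain the keyword. Returning early also avoids the problem that a single `break` only leaves the innermost loop.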
Currently, this code reports every time "what_ever_you_want_to_find" is found.
Ex: if "what_ever_you_want_to_find" is found on 3 different slides of the same PowerPoint, it will name the same file 3 times, along with "text found" or whatever prompt you choose.
How can we change the code so that it only searches each file until the first "what_ever_you_want_to_find" is found?