@anshoomehra
Last active July 17, 2024 14:25
How to Parse 10-K Report from EDGAR (SEC)
@monashjg

monashjg commented Jun 6, 2023

How can I remove the footer text, "Apple Inc. | 2018 Form 10-K |", as well as the page numbers, from the generated text?
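
One possible approach, sketched here under assumptions rather than taken from the notebook: if the footer always has the shape "Apple Inc. | <year> Form 10-K | <page number>", a regular expression can strip it from the extracted text. The pattern, the `strip_footers` helper, and the sample string are all hypothetical; the footer in a real filing may be laid out differently.

```python
import re

# Assumed footer shape: "Apple Inc. | <4-digit year> Form 10-K | <optional page number>".
# Surrounding whitespace is swallowed so the join leaves a single space behind.
FOOTER_RE = re.compile(r'\s*Apple Inc\.\s*\|\s*\d{4}\s+Form\s+10-K\s*\|\s*\d*\s*')

def strip_footers(text):
    """Replace every footer occurrence (and its page number) with one space."""
    return FOOTER_RE.sub(' ', text)

# Made-up sample resembling text where a page footer lands mid-paragraph.
sample = "net sales increased. Apple Inc. | 2018 Form 10-K | 23 The Company also"
cleaned = strip_footers(sample)
print(cleaned)
```

Note that a sentence starting with digits right after the footer would lose those digits, since the pattern cannot tell them apart from a page number.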

@AlessandroVentisei

Thanks for this! I've followed the steps to get historic numeric data and made a free API in case anyone else wants the data for training AI etc.
https://rapidapi.com/alexventisei2/api/sec-api2

@thegallier

I think the line below assumes the same number of entries for every item, which is not necessarily the case; NYT is one example. There, Item 1A appears more times than Item 1B, and the approach does not work. I would also add re.IGNORECASE to the re.compile call.

pos_dat = test_df.sort_values('start', ascending=True).drop_duplicates(subset=['item'], keep='last')
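
A minimal sketch of both points, using only the standard library instead of pandas so it runs on its own. The pattern is a simplified stand-in (the notebook's actual regex may differ), the dict overwrite mimics `drop_duplicates(..., keep='last')`, and `last_positions`, `out_of_order`, and the sample text are hypothetical names and data made up for illustration.

```python
import re

# Hypothetical, simplified pattern; re.IGNORECASE lets it match "Item", "ITEM" and "item".
ITEM_RE = re.compile(r'item\s+(1a|1b|7a|7|8)\b', re.IGNORECASE)

def last_positions(text):
    """Return {item: start offset of its last match}, similar to keep='last'."""
    positions = {}
    for m in ITEM_RE.finditer(text):
        positions[m.group(1).lower()] = m.start()  # later matches overwrite earlier ones
    return positions

def out_of_order(positions, expected=('1a', '1b', '7', '7a', '8')):
    """List adjacent item pairs whose kept offsets are not strictly increasing."""
    found = [item for item in expected if item in positions]
    return [(a, b) for a, b in zip(found, found[1:]) if positions[a] >= positions[b]]

# Toy text: Item 1A occurs three times and Item 1B twice. The third 1A is a
# cross-reference that comes after the Item 1B heading in the body.
sample = (
    "Item 1A. Risk Factors ... Item 1B. Unresolved Staff Comments ... "
    "ITEM 1A. RISK FACTORS body text ... "
    "Item 1B. Unresolved Staff Comments body (see Item 1A above) ... "
    "item 7. MD&A ..."
)
positions = last_positions(sample)
problems = out_of_order(positions)
print(problems)
```

When `problems` is non-empty, the "last occurrence" of an item is probably not its section heading, so slicing the document between the kept offsets would give the wrong sections.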

@VadarVillage

This was very helpful, thank you for taking the time to post this

@niravsatani24

Amazing! Thanks for sharing.

@rabsher

rabsher commented Dec 4, 2023

I have an HTML URL, but I don't know how to get the .txt URL of the 10-K file. Once I have that, I can use the notebook code above.

Can anyone help me, please?
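
A possible sketch, assuming the HTML URL follows the usual EDGAR archive layout `/Archives/edgar/data/<CIK>/<18-digit accession number>/<document>.htm` and that the full-submission text file sits in the same folder, named after the accession number with dashes. The `full_text_url` helper and the example URL (a placeholder CIK and accession number) are hypothetical; check the result against the filing's index page before relying on it.

```python
import re

def full_text_url(html_url):
    """Guess the full-submission .txt URL from a filing document URL (assumed layout)."""
    m = re.search(r'(.*/Archives/edgar/data/\d+/)(\d{18})/', html_url)
    if m is None:
        raise ValueError("URL does not look like an EDGAR archive document URL")
    base, acc = m.groups()
    # Accession numbers are usually written as 10-2-6 digits separated by dashes.
    dashed = f"{acc[:10]}-{acc[10:12]}-{acc[12:]}"
    return f"{base}{acc}/{dashed}.txt"

# Placeholder URL for illustration only; it does not point at a real filing.
html_url = "https://www.sec.gov/Archives/edgar/data/1234567/000123456724000001/example10k.htm"
txt_url = full_text_url(html_url)
print(txt_url)
```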

@versatile712

Jesus, you saved my life!
