SEC EDGAR research toolkit for retail-attention studies
Download SEC EDGAR filings of any form type, filter the daily EDGAR traffic logs down to those filings, and summarise individual filings with an LLM. This Python toolkit follows SEC's EDGAR fair-access policy and can be used as a CLI, as an MCP server for AI agents, or as a Claude Code skill. It generalises the methodology from my paper “Retail Investor Attention and Mutual Fund Performance: Evidence from EDGAR Log Files” to every form type EDGAR publishes.
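SEC's fair-access policy asks automated clients to identify themselves with a descriptive User-Agent and to stay at or below 10 requests per second. The snippet below is a minimal sketch of such a client using only the Python standard library. It assumes the contact string comes from the `EDGAR_USER_AGENT` environment variable exported in the quickstart. `FairAccessClient` and its methods are hypothetical names for illustration, not the toolkit's API.

```python
import os
import time
import urllib.request
from typing import Callable, Optional

# SEC's fair-access guidance caps automated traffic at 10 requests per second.
MAX_REQUESTS_PER_SECOND = 10


class FairAccessClient:
    """Hypothetical helper: declares a User-Agent and spaces out requests."""

    def __init__(
        self,
        user_agent: Optional[str] = None,
        max_rps: float = MAX_REQUESTS_PER_SECOND,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # Fall back to the same environment variable the quickstart exports.
        self.user_agent = user_agent or os.environ.get("EDGAR_USER_AGENT", "")
        if not self.user_agent:
            raise ValueError('Set EDGAR_USER_AGENT, e.g. "Your Name your@email.com"')
        self.min_interval = 1.0 / max_rps
        self._clock = clock
        self._sleep = sleep
        self._last = float("-inf")  # time of the previous request

    def build_request(self, url: str) -> urllib.request.Request:
        # Every request carries the declared contact string.
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})

    def wait_turn(self) -> None:
        # Sleep only as long as needed to keep at most max_rps requests per second.
        now = self._clock()
        delay = self._last + self.min_interval - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._last = now

    def fetch(self, url: str, timeout: float = 30.0) -> bytes:
        # Throttle first, then send the request with the User-Agent attached.
        self.wait_turn()
        with urllib.request.urlopen(self.build_request(url), timeout=timeout) as resp:
            return resp.read()
```

For example, `FairAccessClient().fetch("https://www.sec.gov/Archives/edgar/full-index/2023/QTR1/form.idx")` would return the raw bytes of one quarterly form index. Injecting `clock` and `sleep` keeps the throttle testable without real waiting.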
What it does
Any form type
Download 10-K, 10-Q, 8-K, N-CSR, DEF 14A, 13F-HR, SC 13D, S-1, or any other form type, either individually or through a preset group such as corporate-annual, mutual-fund, proxy, insider, or ownership.
LLM-powered summaries
A one-shot filing summariser with form-aware prompts. A `--focus` flag selects a risk, governance, strategy, or financial angle.
Quickstart
Install from GitHub, declare a User-Agent per SEC's fair-access policy, and run the pipeline:
```bash
git clone https://github.com/haodingresearch/EDGAR.git
cd EDGAR
pip install -e '.[ai,mcp]'
export EDGAR_USER_AGENT="Your Name your@email.com"

# Download 10-K and N-CSR filings for 2023
edgar-research forms --years 2023 --forms 10-K N-CSR \
  --out ./data/filings --index ./data/index.csv

# Filter the 2023 daily traffic logs to only those accessions
edgar-research logs --year 2023 --index ./data/index.csv --out ./data/logs

# LLM summary of a single filing
edgar-research summarise ./data/filings/0000320193_0000320193-23-000106.txt \
  --focus risk
```

Form-type presets
Specify any combination of forms explicitly, or use a named preset as a shortcut:
| Preset | Expands to |
|---|---|
| corporate-annual | 10-K, 20-F, 40-F |
| corporate-quarterly | 10-Q, 6-K |
| corporate-current | 8-K |
| mutual-fund | N-CSR, N-CSRS, N-Q, N-PX, N-CEN |
| proxy | DEF 14A, DEFA14A, PRE 14A |
| insider | 3, 4, 5 |
| ownership | SC 13D, SC 13G, 13F-HR, 13F-NT |
| registration | S-1, S-3, S-4, F-1, F-3 |
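Preset expansion amounts to a lookup table plus order-preserving de-duplication. A minimal sketch, assuming presets and explicit form types can be mixed in one argument list: `FORM_PRESETS` simply mirrors the table above, and `expand_forms` is a hypothetical name rather than the toolkit's own function.

```python
from typing import Dict, Iterable, List

# Mirrors the preset table; a hypothetical stand-in for the toolkit's own mapping.
FORM_PRESETS: Dict[str, List[str]] = {
    "corporate-annual": ["10-K", "20-F", "40-F"],
    "corporate-quarterly": ["10-Q", "6-K"],
    "corporate-current": ["8-K"],
    "mutual-fund": ["N-CSR", "N-CSRS", "N-Q", "N-PX", "N-CEN"],
    "proxy": ["DEF 14A", "DEFA14A", "PRE 14A"],
    "insider": ["3", "4", "5"],
    "ownership": ["SC 13D", "SC 13G", "13F-HR", "13F-NT"],
    "registration": ["S-1", "S-3", "S-4", "F-1", "F-3"],
}


def expand_forms(items: Iterable[str]) -> List[str]:
    """Expand preset names into form types, keep explicit forms, drop duplicates."""
    expanded: Dict[str, None] = {}  # a dict keeps first-seen order
    for item in items:
        # Unknown names are treated as literal form types.
        for form in FORM_PRESETS.get(item.lower(), [item]):
            expanded.setdefault(form, None)
    return list(expanded)
```

With this sketch, `expand_forms(["corporate-annual", "10-K", "DEF 14A"])` returns `["10-K", "20-F", "40-F", "DEF 14A"]`. Matching is by exact form-type string, so amendments such as `10-K/A` are separate form types on EDGAR; whether they should ride along with their base form is a design choice this sketch leaves out.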
Use with AI agents
The toolkit ships an MCP server and a Claude Code skill definition. Register it once and any MCP-capable agent can drive the pipeline — download filings, build an accession index, filter traffic logs, or summarise a filing by path.
```bash
claude mcp add edgar-research -- edgar-research-mcp
```

Five tools are exposed: `list_form_presets`, `download_edgar_filings`, `build_accession_index`, `download_traffic_logs`, `summarise_edgar_filing`.
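Under the hood, an MCP client talks to `edgar-research-mcp` over JSON-RPC 2.0. A local command registered this way runs over MCP's stdio transport, where each message is one line of JSON on the server's stdin or stdout. The sketch below builds the messages a client would send to call `list_form_presets`: the `initialize` handshake, the `notifications/initialized` notification, `tools/list`, then `tools/call`. The protocol version string, the client name, and the empty `arguments` object are assumptions; the server's `tools/list` response describes each tool's actual input schema.

```python
import json
from typing import Any, Dict, List, Optional

# Assumption: one published MCP protocol revision; the server may negotiate another.
PROTOCOL_VERSION = "2024-11-05"


def jsonrpc_line(method: str, params: Optional[Dict[str, Any]] = None,
                 msg_id: Optional[int] = None) -> str:
    """Serialise one JSON-RPC 2.0 message as a single stdio line."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id  # requests carry an id; notifications do not
    if params is not None:
        msg["params"] = params
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    return json.dumps(msg, separators=(",", ":")) + "\n"


def tool_call_messages(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """Messages a client writes to the server's stdin, in order.

    A real client reads the server's reply to each request (for example the
    initialize result) before moving on; this sketch only builds the lines.
    """
    return [
        jsonrpc_line("initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
        }, msg_id=1),
        jsonrpc_line("notifications/initialized"),
        jsonrpc_line("tools/list", msg_id=2),  # discover the tools and their schemas
        jsonrpc_line("tools/call", {"name": tool, "arguments": arguments}, msg_id=3),
    ]
```

An agent such as Claude Code performs this exchange itself; writing the lines by hand to a `subprocess.Popen(["edgar-research-mcp"], ...)` pipe is only useful for debugging the server.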
Methodology
The toolkit implements the data-preparation pipeline behind the paper. For a given time range and set of form types, it downloads SEC's public quarterly full-index files (`form.idx`), retrieves each matching filing, downloads the corresponding daily EDGAR traffic logs, and intersects the log rows with the set of filing accession numbers. The result is a per-filing access trace ready for downstream attention analysis.
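The intersection step can be sketched with the standard library alone. This is a minimal illustration, not the toolkit's implementation. It assumes the fixed-width `form.idx` layout (form type, company name, CIK, date filed, file name), takes each accession number from the file name (`edgar/data/<CIK>/<accession>.txt`), and assumes traffic-log rows are CSV with a header that includes `accession` and `date` columns. All function names are hypothetical, and no crawler or bot filtering is applied.

```python
import csv
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, Set

# One data row of form.idx: form type, company name, CIK, date filed, file name.
# Form types such as "DEF 14A" contain single spaces, so fields are separated on
# runs of two or more spaces; header and dashed separator lines do not match.
FORM_IDX_ROW = re.compile(
    r"^(?P<form>\S.*?)\s{2,}(?P<company>.+?)\s{2,}(?P<cik>\d+)\s+"
    r"(?P<date>\d{4}-?\d{2}-?\d{2})\s+(?P<path>edgar/data/\S+\.txt)\s*$"
)


def accessions_from_form_idx(lines: Iterable[str], forms: Set[str]) -> Set[str]:
    """Accession numbers of index rows whose form type is in `forms`."""
    wanted: Set[str] = set()
    for line in lines:
        m = FORM_IDX_ROW.match(line)
        if m and m.group("form") in forms:
            # "edgar/data/320193/0000320193-23-000106.txt" -> "0000320193-23-000106"
            wanted.add(m.group("path").rsplit("/", 1)[-1][: -len(".txt")])
    return wanted


def filter_log_rows(log_lines: Iterable[str], wanted: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield traffic-log rows whose accession belongs to the filing set."""
    for row in csv.DictReader(log_lines):
        if row.get("accession") in wanted:
            yield row


def daily_access_counts(rows: Iterable[Dict[str, str]]) -> Counter:
    """Requests per (accession, date): a simple per-filing access trace."""
    return Counter((row["accession"], row["date"]) for row in rows)
```

In practice `lines` would come from one quarter's `form.idx` (fetched with a declared User-Agent) and `log_lines` from an opened daily log file. Keeping the accession set in memory makes each log row a constant-time membership test, which matters because daily logs run to millions of rows.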
Citation
```bibtex
@article{Ding2024EDGARtraffic,
  title  = {Retail Investor Attention and Mutual Fund Performance:
            Evidence from EDGAR Log Files},
  author = {Ding, Hao},
  year   = {2024},
  url    = {https://ssrn.com/abstract=4992233}
}
```