
EDGAR Filing Analyser

Open source · Python · MCP · Claude Skill

SEC EDGAR research toolkit for retail-attention studies

A Python toolkit that downloads SEC EDGAR filings of any form type, filters EDGAR's daily traffic logs down to requests for those filings, and summarises filings with an LLM, while following SEC's EDGAR fair-access policy. It runs as a CLI, as an MCP server for AI agents, or as a Claude Code skill, and generalises the methodology from my paper “Retail Investor Attention and Mutual Fund Performance: Evidence from EDGAR Log Files” to every form type EDGAR publishes.

What it does

Any form type

Download 10-K, 10-Q, 8-K, N-CSR, DEF 14A, 13F-HR, SC 13D, S-1, or any other form type, either individually or through a preset group such as corporate-annual, mutual-fund, proxy, insider, or ownership.
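Selecting filings by form type can be sketched as a scan over SEC's quarterly full-index file, published at https://www.sec.gov/Archives/edgar/full-index/<year>/QTR<n>/form.idx, which lists one filing per row with its form type, company name, CIK, filing date and archive path. The sketch below assumes that layout, with the form type separated from the company name by at least two spaces; `IndexEntry` and `parse_form_idx` are hypothetical names, not the toolkit's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One data row of an EDGAR form.idx file (hypothetical helper type)."""
    form_type: str
    company: str
    cik: str
    date_filed: str
    path: str  # relative archive path, e.g. edgar/data/<cik>/<accession>.txt

    @property
    def accession(self) -> str:
        # The file name minus ".txt" is the filing's accession number.
        return self.path.rsplit("/", 1)[-1].removesuffix(".txt")

    @property
    def url(self) -> str:
        return "https://www.sec.gov/Archives/" + self.path


def parse_form_idx(text: str, forms: set[str]) -> list[IndexEntry]:
    """Return the form.idx rows whose form type is in `forms`."""
    entries = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows end with CIK, filing date and an edgar/data/... path;
        # the description header, column titles and dashed rule do not.
        if len(tokens) < 5 or not tokens[-1].startswith("edgar/data/"):
            continue
        head, cik, date_filed, path = line.rsplit(maxsplit=3)
        # Form types may contain single spaces ("DEF 14A"), so split the form
        # type from the company name on the first run of two or more spaces.
        parts = re.split(r"\s{2,}", head.strip(), maxsplit=1)
        form_type = parts[0]
        company = parts[1] if len(parts) > 1 else ""
        if form_type in forms:
            entries.append(IndexEntry(form_type, company, cik, date_filed, path))
    return entries
```

Building the accession set for later steps is then a one-liner, for example `{e.accession for e in parse_form_idx(raw_text, {"10-K", "N-CSR"})}`.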

LLM-powered summaries

A one-shot filing summariser with form-aware prompts. The --focus flag steers the summary toward risk, governance, strategy, or financial angles. See the demo below for sample output.

Try it

Pick a ticker and form type to simulate the two-step CLI flow: download an EDGAR filing, then summarise it with an LLM. This demo serves pre-generated sample output — no live SEC call, no live LLM — so you can see the shape of the result before installing the toolkit.

Summaries are illustrative — drafted against public filing context to show the CLI's output format. Install the toolkit to download and summarise filings.

Quickstart

Install from GitHub, declare a User-Agent per SEC's fair-access policy, and run the pipeline:

git clone https://github.com/haodingresearch/EDGAR.git
cd EDGAR
pip install -e '.[ai,mcp]'

export EDGAR_USER_AGENT="Your Name your@email.com"
# Download 10-K and N-CSR filings for 2023
edgar-research forms --years 2023 --forms 10-K N-CSR \
  --out ./data/filings --index ./data/index.csv

# Filter the 2023 daily traffic logs to only those accessions
edgar-research logs --year 2023 --index ./data/index.csv --out ./data/logs

# LLM summary of a single filing
edgar-research summarise ./data/filings/0000320193_0000320193-23-000106.txt \
  --focus risk

Form-type presets

Specify any combination of forms explicitly, or use a named preset as a shortcut:

Preset                 Expands to
corporate-annual       10-K, 20-F, 40-F
corporate-quarterly    10-Q, 6-K
corporate-current      8-K
mutual-fund            N-CSR, N-CSRS, N-Q, N-PX, N-CEN
proxy                  DEF 14A, DEFA14A, PRE 14A
insider                3, 4, 5
ownership              SC 13D, SC 13G, 13F-HR, 13F-NT
registration           S-1, S-3, S-4, F-1, F-3
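Preset expansion amounts to a lookup table plus order-preserving de-duplication. The sketch below mirrors the preset table; `PRESETS` and `expand_forms` are hypothetical names, and it assumes that any argument which is not a preset name is a literal form type, so presets and explicit forms can be mixed.

```python
# Hypothetical preset table mirroring the documented presets.
PRESETS: dict[str, list[str]] = {
    "corporate-annual": ["10-K", "20-F", "40-F"],
    "corporate-quarterly": ["10-Q", "6-K"],
    "corporate-current": ["8-K"],
    "mutual-fund": ["N-CSR", "N-CSRS", "N-Q", "N-PX", "N-CEN"],
    "proxy": ["DEF 14A", "DEFA14A", "PRE 14A"],
    "insider": ["3", "4", "5"],
    "ownership": ["SC 13D", "SC 13G", "13F-HR", "13F-NT"],
    "registration": ["S-1", "S-3", "S-4", "F-1", "F-3"],
}


def expand_forms(items: list[str]) -> list[str]:
    """Expand preset names into form types, keeping first-seen order."""
    # A dict preserves insertion order, so it doubles as an ordered set.
    forms: dict[str, None] = {}
    for item in items:
        # Anything that is not a preset name is treated as a literal form type.
        for form in PRESETS.get(item, [item]):
            forms.setdefault(form, None)
    return list(forms)
```

For instance, `expand_forms(["corporate-annual", "10-K", "N-CSR"])` yields `["10-K", "20-F", "40-F", "N-CSR"]`: the duplicate 10-K is dropped.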

Use with AI agents

The toolkit ships an MCP server and a Claude Code skill definition. Register it once and any MCP-capable agent can drive the pipeline — download filings, build an accession index, filter traffic logs, or summarise a filing by path.

claude mcp add edgar-research -- edgar-research-mcp

The server exposes five tools: list_form_presets, download_edgar_filings, build_accession_index, download_traffic_logs, and summarise_edgar_filing.
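At the protocol level, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request (after the usual initialize handshake). A sketch of the message an agent might send to summarise a filing; `tools_call_request` is a hypothetical helper, and the argument names `path` and `focus` are assumptions mirroring the CLI, since the server's input schema is not documented here.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical arguments; a client would read the real schema from tools/list.
message = tools_call_request(
    1,
    "summarise_edgar_filing",
    {"path": "./data/filings/0000320193_0000320193-23-000106.txt", "focus": "risk"},
)
```

In practice the agent builds these messages itself; the sketch only shows what crosses the wire when it drives the pipeline.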

Methodology

The toolkit implements the data-preparation pipeline behind the paper. For a given time range and set of form types, it downloads SEC's public full-index (form.idx), retrieves each matching filing, downloads the corresponding daily EDGAR traffic logs, and intersects log rows with the filing accession set — producing per-filing access traces ready for downstream attention analysis.
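The intersection step can be sketched as a streaming filter over one day's log CSV. This assumes the column layout of SEC's EDGAR Log File Data Sets, where each row carries fields such as `ip`, `date`, `time`, `cik` and `accession`; `filter_log_rows` and `access_counts` are hypothetical helpers. An attention measure may also need to screen out automated traffic, which this sketch does not attempt.

```python
import csv
from collections import Counter
from collections.abc import Iterable, Iterator


def filter_log_rows(log_lines: Iterable[str],
                    accessions: set[str]) -> Iterator[dict[str, str]]:
    """Yield only the log rows that request a filing in `accessions`."""
    # DictReader keys each row by the CSV header, so column order does not
    # matter as long as an "accession" column is present.
    for row in csv.DictReader(log_lines):
        if row.get("accession") in accessions:
            yield row


def access_counts(log_lines: Iterable[str], accessions: set[str]) -> Counter[str]:
    """Count matching requests per accession number for one log file."""
    return Counter(row["accession"] for row in filter_log_rows(log_lines, accessions))


# Example with a file on disk (path is a placeholder):
# with open(path_to_daily_log_csv, newline="") as f:
#     counts = access_counts(f, accessions)
```

Because the filter is a generator, a multi-gigabyte daily log is processed row by row and never has to fit in memory; only the matching rows survive.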

Citation

@article{Ding2024EDGARtraffic,
  title   = {Retail Investor Attention and Mutual Fund Performance:
             Evidence from EDGAR Log Files},
  author  = {Ding, Hao},
  year    = {2024},
  url     = {https://ssrn.com/abstract=4992233}
}
Apache 2.0 licensed · Python 3.10+ · Contributions welcome