I chase financial analysis at the speed of thought

So I built psxGPT and created a course around it.

Financial statement page from a PSX filing: source page 239
Chart-heavy source page from an annual report: OCR risk, charts
Balance sheet page from a PSX filing: checked row, source linked

What I built

A financial modeling system for Pakistani equities listed on PSX.

It lines up historical data from filings, computes ratios, makes sources visible, and lets you run analyses through simple instructions in English.
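A ratio is only checkable if it keeps its inputs' sources. Below is a minimal sketch of that idea, assuming each extracted value carries the filing page it was read from. `SourcedValue`, `ratio`, and the page numbers are hypothetical illustrations, not psxGPT's actual schema; the figures are the FY2024 rows from the profit-or-loss excerpt on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedValue:
    """One reported number plus the filing page it was read from (hypothetical schema)."""
    field: str
    period: str
    value: float
    source_page: int  # placeholder page number, not a real filing reference


def ratio(numerator: SourcedValue, denominator: SourcedValue) -> dict:
    """Divide two sourced values and carry both page links onto the result."""
    if numerator.period != denominator.period:
        raise ValueError("ratio inputs must come from the same period")
    if denominator.value == 0:
        raise ZeroDivisionError(f"{denominator.field} is zero in {denominator.period}")
    return {
        "period": numerator.period,
        "value": numerator.value / denominator.value,
        # every page that contributed an input, so a reviewer can open them
        "sources": sorted({numerator.source_page, denominator.source_page}),
    }


# FY2024 figures from the profit-or-loss excerpt; page 12 is made up.
net_sales = SourcedValue("net_sales", "FY2024", 115_324_942, source_page=12)
gross_profit = SourcedValue("gross_profit", "FY2024", 38_804_572, source_page=12)
profit_after_tax = SourcedValue("profit_after_tax", "FY2024", 28_106_539, source_page=12)

gross_margin = ratio(gross_profit, net_sales)
net_margin = ratio(profit_after_tax, net_sales)
print(f"gross margin {gross_margin['value']:.1%} from pages {gross_margin['sources']}")
print(f"net margin {net_margin['value']:.1%} from pages {net_margin['sources']}")
```

Refusing to mix periods is a deliberate choice in this sketch: a ratio built from FY2024 profit and FY2023 sales would look plausible and be wrong.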

Historical data lined up: filings become source-linked rows, ratios, and normalized fields for forecasting.
QC made legible: formula checks, row slips, and repair paths are visible instead of buried.
Precedents and dependents traced: the engine shows where a value came from and what depends on it.
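Tracing precedents and dependents amounts to walking a derivation graph in both directions. A minimal sketch, assuming derived fields are stored as a mapping from each field to the fields it is computed from; the `FORMULAS` table and both function names are hypothetical, not the engine's real structure.

```python
from collections import defaultdict

# Hypothetical derivation graph: each derived field lists the fields it is computed from.
# Raw fields (read straight off a filing page) have no entry.
FORMULAS: dict[str, list[str]] = {
    "gross_profit": ["net_sales", "cost_of_sales"],
    "gross_margin": ["gross_profit", "net_sales"],
    "net_margin": ["profit_after_tax", "net_sales"],
}


def precedents(field: str, formulas: dict[str, list[str]]) -> set[str]:
    """Every field that feeds into `field`, directly or indirectly (depth-first walk)."""
    seen: set[str] = set()
    stack = list(formulas.get(field, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(formulas.get(current, []))
    return seen


def dependents(field: str, formulas: dict[str, list[str]]) -> set[str]:
    """Every field that would change if `field` changed: precedents on the reversed graph."""
    reverse: dict[str, list[str]] = defaultdict(list)
    for target, inputs in formulas.items():
        for source in inputs:
            reverse[source].append(target)
    return precedents(field, reverse)


print(sorted(precedents("gross_margin", FORMULAS)))
print(sorted(dependents("net_sales", FORMULAS)))
```

Reusing one traversal for both directions keeps the two views consistent: a dependent is simply a precedent in the inverted graph.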
psxGPT query trace
Plan: translate the question into ticker, period, fields, statement scope, and source needs.
Search: run SQL for structured values and keyword search for supporting page text.
Source: recover page links for every row used in the answer.
Answer: write from evidence, cite sources, and expose uncertainty when coverage is incomplete.
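The Plan, Search, Source, and Answer steps can be sketched with Python's built-in `sqlite3`. Everything here is an assumption for illustration: the two-table schema, the `DEMO` ticker, page 12, and the plan, which is hard-coded instead of being derived from a question. It is not psxGPT's actual schema or planner.

```python
import sqlite3

# Hypothetical schema: every extracted value keeps the page it came from.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE line_items (ticker TEXT, period TEXT, field TEXT, value REAL, source_page INTEGER);
    CREATE TABLE pages (page INTEGER PRIMARY KEY, text TEXT);
    """
)
# Values from the profit-or-loss excerpt; "DEMO" and the page numbers are placeholders.
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?, ?, ?)",
    [
        ("DEMO", "FY2024", "net_sales", 115_324_942, 12),
        ("DEMO", "FY2024", "profit_after_tax", 28_106_539, 12),
        ("DEMO", "FY2023", "net_sales", 95_832_147, 12),
    ],
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        (12, "Unconsolidated Statement of Profit or Loss for the year ended June 30, 2024"),
        (40, "Chairman's review"),
    ],
)


def lookup(ticker: str, period: str, fields: list[str]) -> list[tuple]:
    """Search + Source: structured values via parameterized SQL, each with its page link."""
    marks = ",".join("?" for _ in fields)
    return conn.execute(
        f"SELECT field, value, source_page FROM line_items "
        f"WHERE ticker = ? AND period = ? AND field IN ({marks}) ORDER BY field",
        (ticker, period, *fields),
    ).fetchall()


def keyword_pages(keyword: str) -> list[int]:
    """Search: supporting page text via a plain LIKE match (case-insensitive for ASCII)."""
    rows = conn.execute("SELECT page FROM pages WHERE text LIKE ? ORDER BY page", (f"%{keyword}%",))
    return [page for (page,) in rows]


# Plan: hard-coded stand-in for what the planner would produce from a question.
plan = {"ticker": "DEMO", "period": "FY2024",
        "fields": ["net_sales", "profit_after_tax", "eps"], "keyword": "profit or loss"}

rows = lookup(plan["ticker"], plan["period"], plan["fields"])
missing = sorted(set(plan["fields"]) - {field for field, _, _ in rows})
pages = keyword_pages(plan["keyword"])

# Answer: cite a page for every number and state what the data does not cover.
for field, value, page in rows:
    print(f"{field}: {value:,.0f} (page {page})")
if missing:
    print("not covered:", ", ".join(missing))
```

Reporting `missing` explicitly is how this sketch "exposes uncertainty": the answer says `eps` was not found rather than letting a model fill the gap.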
psxGPT makes the historical layer usable, checkable, and traceable.
Balance sheet source page from a PSX filing
A reviewer can open the page behind the number, then trace how that value moves through the system.

Why I teach

The course opens up the implementation.

psxGPT shows the result. The course shows the machinery: PDF to database, QC, debugging, source tracing, and the query engine.

pipeline_guide.html

What the transformation looks like

Source financial statement page from a PSX filing
# Unconsolidated Statement of Profit or Loss
## For the year ended June 30, 2024

| Line item | Note | 2024 | 2023 |
| --- | --- | --- | --- |
| Gross sales | 28 | 151,808,171 | 125,819,372 |
| Net sales | | 115,324,942 | 95,832,147 |
| Cost of sales | 29 | (76,520,370) | (69,771,469) |
| Gross profit | | 38,804,572 | 26,060,678 |
| Profit after taxation | | 28,106,539 | 13,725,814 |
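The rows above satisfy a simple identity: net sales plus cost of sales (printed in parentheses, so negative) equals gross profit. For 2024, 115,324,942 − 76,520,370 = 38,804,572. A minimal sketch of that formula check follows; the function names and the 1-unit rounding tolerance are assumptions, not the course's QC code.

```python
def parse_amount(text: str) -> int:
    """'151,808,171' -> 151808171; '(76,520,370)' -> -76520370 (parentheses mean negative)."""
    cleaned = text.strip().replace(",", "")
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -int(cleaned[1:-1])
    return int(cleaned)


# Rows copied from the statement, still as printed strings.
STATEMENT = {
    "2024": {"net_sales": "115,324,942", "cost_of_sales": "(76,520,370)", "gross_profit": "38,804,572"},
    "2023": {"net_sales": "95,832,147", "cost_of_sales": "(69,771,469)", "gross_profit": "26,060,678"},
}


def check_gross_profit(row: dict[str, str], tolerance: int = 1) -> tuple[bool, int]:
    """Gross profit should equal net sales plus (negative) cost of sales."""
    expected = parse_amount(row["net_sales"]) + parse_amount(row["cost_of_sales"])
    difference = parse_amount(row["gross_profit"]) - expected
    # tolerance of 1 unit absorbs rounding in the filing; the threshold is an assumption
    return abs(difference) <= tolerance, difference


for year, row in STATEMENT.items():
    ok, diff = check_gross_profit(row)
    print(year, "pass" if ok else f"fail (off by {diff:,})")

# A single OCR slip (one digit misread) is caught immediately.
slipped = {**STATEMENT["2024"], "gross_profit": "38,804,872"}
print("slipped row:", check_gross_profit(slipped))
```

Returning the signed difference, not just pass/fail, is what makes a failure actionable: an offset of exactly 300 points at a single misread digit rather than a shifted row.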

The guide explains why markdown carries enough structure to inspect, route, and debug a page before asking an LLM for the final structured data.
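To see why markdown is already inspectable, here is a minimal sketch that reads the statement markdown shown above into a title plus row dicts, then routes the page by its heading. The parser, the `ROUTES` keywords, and the statement-type labels are hypothetical; they only illustrate what plain markdown makes possible before any LLM call.

```python
PAGE = """# Unconsolidated Statement of Profit or Loss
## For the year ended June 30, 2024

| Line item | Note | 2024 | 2023 |
| --- | --- | --- | --- |
| Gross sales | 28 | 151,808,171 | 125,819,372 |
| Net sales | | 115,324,942 | 95,832,147 |
| Cost of sales | 29 | (76,520,370) | (69,771,469) |
| Gross profit | | 38,804,572 | 26,060,678 |
| Profit after taxation | | 28,106,539 | 13,725,814 |
"""


def split_cells(line: str) -> list[str]:
    """'| a | b |' -> ['a', 'b']"""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_separator(cells: list[str]) -> bool:
    """True for the '| --- | :---: |' row under a markdown table header."""
    return all(cell and set(cell) <= set("-:") for cell in cells)


def parse_statement_markdown(markdown: str) -> tuple[str, list[dict[str, str]]]:
    """Return the page title (first '# ' heading) and the pipe-table rows as dicts."""
    title = ""
    table_lines = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ") and not title:
            title = stripped[2:].strip()
        elif stripped.startswith("|"):
            table_lines.append(split_cells(stripped))
    body = [cells for cells in table_lines if not is_separator(cells)]
    if not body:
        return title, []
    header, data = body[0], body[1:]
    return title, [dict(zip(header, cells)) for cells in data]


# Hypothetical routing table: heading keyword -> statement type.
ROUTES = {"profit or loss": "income_statement", "financial position": "balance_sheet"}

title, rows = parse_statement_markdown(PAGE)
statement_type = next((kind for key, kind in ROUTES.items() if key in title.lower()), "unknown")
print(title, "->", statement_type)
print(rows[1])
```

Empty cells (the blank "Note" column) survive as `""`, so a row that lost a column during OCR shows up as a length mismatch instead of silently shifting values.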

pipeline_colab.ipynb
In [4]: split + OCR
pages = split_pdf(report)
markdown = run_mistral_ocr(pages)
# paid OCR once, reusable markdown after that

In [9]: filter pages
review = classify_pages(markdown)
manifest = build_page_shortlist(review)
# free gates first, model calls only for survivors

In [15]: check output
rows = flatten_statement_json()
qc = run_formula_checks(rows)
# failed checks decide whether to repair or re-OCR
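"Free gates first" can be sketched as a cheap filter that costs nothing per page: keep a page only if it names a financial statement and contains enough comma-grouped amounts. This is not the notebook's `classify_pages` or `build_page_shortlist`; `free_gate`, `shortlist`, the keyword list, and the six-amount threshold are all assumptions for illustration.

```python
import re

# Hypothetical gate settings; real thresholds would be tuned on actual filings.
STATEMENT_KEYWORDS = (
    "statement of profit or loss",
    "statement of financial position",
    "statement of cash flows",
    "balance sheet",
)
# Amounts like 151,808,171 or (76,520,370); plain note numbers and years do not match.
AMOUNT = re.compile(r"\(?\d{1,3}(?:,\d{3})+\)?")


def free_gate(page_markdown: str, min_amounts: int = 6) -> bool:
    """Zero-cost check: statement keyword present and enough amounts on the page."""
    text = page_markdown.lower()
    has_keyword = any(keyword in text for keyword in STATEMENT_KEYWORDS)
    has_amounts = len(AMOUNT.findall(page_markdown)) >= min_amounts
    return has_keyword and has_amounts


def shortlist(pages: dict[int, str]) -> list[int]:
    """Page numbers that survive the free gate; only these would go to a model."""
    return sorted(number for number, markdown in pages.items() if free_gate(markdown))


pages = {
    12: """# Unconsolidated Statement of Profit or Loss
| Gross sales | 28 | 151,808,171 | 125,819,372 |
| Net sales | | 115,324,942 | 95,832,147 |
| Cost of sales | 29 | (76,520,370) | (69,771,469) |
| Gross profit | | 38,804,572 | 26,060,678 |""",
    40: "# Chairman's Review\nSales grew strongly this year.",
    55: "# Statement of Financial Position\n![chart](chart.png)\nTotal assets 1,234,567",
}
print(shortlist(pages))
```

Page 55 shows why two conditions beat one: it has the right heading, but a chart-heavy page with a single amount is better sent for review than straight to extraction.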

What comes next

Finish psxGPT, then apply the pattern elsewhere.
