Methodology

How Field Equities scores 76 tickers every 10 minutes, what each signal weighs, where the thresholds came from, and the explicit calls made during the build. The bot went live in paper mode on May 11, 2026. Alongside v1 live trading, the system runs a multi-scorer shadow race — 8 strategies recorded in parallel against 8 forward-return horizons. No shadow scorer drives a trade or modifies a v1 weight.

Scoring

Every 10 minutes during market hours the bot runs each ticker through 8 independent signals. Each signal produces a number on its own bounded scale; their sum is the ticker’s composite score, ranging from a theoretical −80 on the short side to +125 on the long side. The asymmetry is by design — three of the eight signals are long-only by construction.
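As a rough illustration of how that range falls out of the per-signal caps, the sketch below sums the bounds listed in the signal section. The dictionary keys, the function names, and the choice to clamp each raw value to its cap are assumptions made for illustration; the production scorer may bound each signal differently.

```python
# Caps taken from the signal list; key names are hypothetical.
SIGNAL_CAPS = {
    "congress": (-25, 25),
    "insider": (0, 20),    # long-only
    "inst_13f": (-15, 15),
    "earnings": (-15, 15),
    "news": (-15, 15),
    "themes": (0, 15),     # long-only
    "momentum": (-10, 10),
    "ark": (0, 10),        # long-only
}


def composite_score(raw):
    """Clamp each raw signal value to its cap and sum. Missing signals count as 0."""
    total = 0.0
    for name, (lo, hi) in SIGNAL_CAPS.items():
        total += max(lo, min(hi, raw.get(name, 0.0)))
    return total


def composite_range():
    """Theoretical (min, max) composite implied by the caps."""
    lows = sum(lo for lo, _ in SIGNAL_CAPS.values())
    highs = sum(hi for _, hi in SIGNAL_CAPS.values())
    return lows, highs
```

Summing the lower bounds gives −80 and summing the upper bounds gives +125, which matches the stated range.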

Entries fire only when the composite score crosses a threshold and a size check passes. Below the threshold the bot logs a snapshot but takes no action — that same scoring history then feeds back into the Signal Edge Tracker on the Signals page, and into weight recommendations published every Friday.

On every scan, a second layer records the picks that 8 different scoring strategies would have made — the shadow race. All race scorers are gated behind the SCORING_V2_ENABLED flag, purely record-only, and raced against each other and SPY across 8 forward-return horizons (1d through 5y). The forward returns being collected now are the calibration data for deciding which approaches carry edge.

The 8 V1 Signals

Sorted from highest weight to lowest. The order reflects how much each signal can move the composite score on its own.

Congress Trades

±25 pts

Measures: Net dollar-weighted purchases vs. sales by U.S. Senators and Representatives within the last 30 days, weighted by trade size and unique-rep count.

Why this weight: Top weight (25, highest of all signals). Academic research (Ziobrowski et al. and several follow-ups) has documented an information edge for politicians across decades, and STOCK Act filing rules force disclosure of real positions, so the data is grounded rather than self-reported. It is the strongest documented edge of any signal we use, so it gets the largest weight.

Data source: Quiver Quant / House and Senate financial disclosures

Insider Buys

+20 pts (long-only)

Measures: Form 4 buys filed by company executives and directors within the last 14 days, weighted by transaction value.

Why this weight: Second-highest weight (20). Executives buying their own stock with real money is the cleanest bullish tell available: they only buy when they expect the stock to go up. Selling is noisy (RSU vesting, tax planning, diversification) so we ignore it entirely. The signal is clean but narrower than Congress, so it ranks just below.

Data source: Quiver Quant / SEC Form 4

13F Institutional Flow

±15 pts

Measures: Net dollar inflow vs. outflow across all 13F-reporting institutions in the most recent filing quarter.

Why this weight: Mid-weight (15). Large funds disclose positions quarterly, with a 45-day filing delay. Slower-moving than Congress or Insider data, but useful as context for sustained institutional conviction. Caps at 15 because the lag limits how tactical the signal can be.

Data source: Quiver Quant / SEC 13F filings

Earnings Surprises

±15 pts

Measures: Most recent quarterly EPS surprise vs. analyst estimates, decayed over time (full strength <7d, 60% strength 7–30d, fading thereafter).

Why this weight: Mid-weight (15). Post-earnings-announcement drift (the tendency for stocks to keep moving in the direction of an earnings surprise for weeks after the announcement) is one of the most documented anomalies in academic finance. The time decay reflects how quickly the post-earnings move dissipates.

Data source: Polygon Advanced (Benzinga partner data)
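The decay schedule can be sketched as a multiplier on the raw surprise score. Only the first two tiers come from the description above. The shape of the fade after day 30 is not specified, so the linear fade to zero at a hypothetical `fade_end_days` of 90 is purely an assumption, as is the function name.

```python
def earnings_decay(days_since_report, fade_end_days=90):
    """Multiplier applied to the raw earnings-surprise score.

    Tiers from the methodology: 1.0 before day 7, 0.6 from day 7 to day 30.
    ASSUMPTION: after day 30 the multiplier fades linearly to 0 at
    fade_end_days; the real fade schedule is not documented here.
    """
    if days_since_report < 0:
        raise ValueError("days_since_report must be non-negative")
    if days_since_report < 7:
        return 1.0
    if days_since_report <= 30:
        return 0.6
    if days_since_report >= fade_end_days:
        return 0.0
    remaining = fade_end_days - days_since_report
    return 0.6 * remaining / (fade_end_days - 30)
```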

News Sentiment

±15 pts

Measures: Polygon's pre-computed per-article sentiment aggregated over the last 48 hours; requires ≥2 articles to fire (single-article gate).

Why this weight: Mid-weight (15). Captures real-time information edge around discrete catalysts (FDA approvals, partnerships, guidance updates, lawsuits) before slower fundamental signals catch up. The 2-article gate is a noise filter, and the cap limits damage from misclassified headlines.

Data source: Polygon Advanced (insights field per article)

AI-Detected Themes

+15 pts (long-only)

Measures: Tags tickers that belong to one or more active themes. Score = theme weight × 5 per matched theme, capped at +15.

Why this weight: Mid-weight (15). Captures emerging cross-ticker narratives that aren't visible in single-ticker fundamentals. Currently a basic implementation: a Claude-detected static theme list. The roadmap upgrades this to fully automatic theme discovery, per-theme edge tracking, and dynamic reweighting.

Data source: Anthropic Claude API (Sonnet 4.6) on Polygon news
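The theme formula is simple enough to write out directly. In this sketch the theme names, the weight values, and the function signature are hypothetical; only the "weight × 5 per matched theme, capped at +15" rule comes from the description above.

```python
def theme_score(ticker_themes, theme_weights, cap=15.0):
    """Sum 5 points times the weight of each active theme the ticker matches.

    ticker_themes: iterable of theme names tagged on the ticker.
    theme_weights: {theme_name: weight} for currently active themes.
    Themes not in theme_weights are treated as inactive and ignored.
    """
    raw = sum(5.0 * theme_weights[t] for t in ticker_themes if t in theme_weights)
    return min(cap, raw)


# Hypothetical weights, for illustration only.
weights = {"ai_infrastructure": 2.0, "nuclear": 1.5, "space": 1.0}
capped = theme_score({"ai_infrastructure", "nuclear"}, weights)  # 17.5 raw, capped
single = theme_score({"space"}, weights)
```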

Price Momentum

±10 pts

Measures: Composite of 30-day return + distance from the 20-day moving average. Strong uptrend (30d > +10%, above MA20) = +10; weakening trend scores negative.

Why this weight: Lower weight (10). Price-based momentum has weak but persistent predictive power and gets arbitraged quickly. We use it as confirmation rather than a primary driver: a strong fundamental read contradicted by terrible momentum is reason to step back, but momentum alone shouldn't open a position. Lower weight than every information-edge signal above.

Data source: Polygon Advanced real-time bars

ARK Trades

+10 pts (long-only)

Measures: Tags tickers held by ARK Invest's public ETFs (ARKK, ARKW, ARKQ, ARKG). +5 base, +3 bonus if added in the last 7 days.

Why this weight: Lower weight (10), equal to Momentum. ARK is transparent about its trading, which acts as a directional conviction overlay. That is useful, but it's a single fund family, so the data base is narrower than 13F. Long-only because ARK doesn't short.

Data source: ARK Invest daily public holdings CSV

Shadow Race Strategies

Public names: Metis · Arachne · Asteria · Chronos · Phobos · Pandia · Tyche · SPY

Eight strategies run in parallel, record-only and never trading. Seven are recorded on every scan; Phobos runs on Fridays only. All picks are beta-tagged (vs SPY) and regime-tagged (S&P 12-month trend, network modularity Q) at entry. Forward returns fill as horizons close. None of these scorers affect v1 weights or open a position.

Metis

Shadow · Record-only

Measures: Sector-relative fundamentals (P/E, P/B, EV/EBITDA; current ratio, debt/equity, interest coverage; revenue + EPS growth) combined with the 8 v1 smart-money signals, re-weighted into four layers: 40% sourcing / 30% factor / 20% fundamentals / 10% qualitative. Normalization is within GICS sector cohorts: a cheap energy name ranks against other energy names, not against tech.

Rationale: Tests whether adding sector-normalized valuation and financial-health data to the v1 signals improves stock ranking. Layer weights are uncalibrated placeholders; the race collects the forward data needed to calibrate them honestly.

Data source: FMP /stable (fundamentals, earnings dates) · Quiver Quant + Polygon (v1 signals)
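One way to picture the sector-cohort normalization and the four-layer blend is the sketch below. The use of a z-score as the normalizer is an assumption (the methodology only says normalization happens within GICS sector cohorts), and the function names and input shapes are hypothetical. The layer weights are the placeholder values quoted above.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Placeholder layer weights quoted in the methodology (uncalibrated).
LAYER_WEIGHTS = {"sourcing": 0.40, "factor": 0.30, "fundamentals": 0.20, "qualitative": 0.10}


def sector_zscores(values, sectors):
    """Z-score each ticker's metric against its own sector cohort.

    values:  {ticker: metric value}
    sectors: {ticker: sector name}
    ASSUMPTION: z-score is the normalizer; a cohort with zero spread maps to 0.
    For valuation ratios such as P/E a caller would flip the sign so that
    cheaper names score higher.
    """
    cohorts = defaultdict(list)
    for ticker, value in values.items():
        cohorts[sectors[ticker]].append(value)
    out = {}
    for ticker, value in values.items():
        cohort = cohorts[sectors[ticker]]
        spread = pstdev(cohort) if len(cohort) > 1 else 0.0
        out[ticker] = 0.0 if spread == 0 else (value - mean(cohort)) / spread
    return out


def blend(layer_scores):
    """Weighted sum of per-layer scores; layers not supplied count as 0."""
    return sum(w * layer_scores.get(name, 0.0) for name, w in LAYER_WEIGHTS.items())


pe = {"XOM": 10.0, "CVX": 14.0, "AAPL": 30.0, "MSFT": 34.0}
gics = {"XOM": "Energy", "CVX": "Energy", "AAPL": "Tech", "MSFT": "Tech"}
z = sector_zscores(pe, gics)
```

In this toy input XOM and AAPL receive the same z-score even though their raw P/E values differ by a factor of three, which is the point of ranking within the cohort.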

Arachne

Shadow · Record-only

Measures: Base V2 with a bounded rank overlay tilting toward high eigenvector-centrality names. The correlation graph is built from trailing 252-day log-return correlations: MST backbone + edges added by decreasing |corr| to ~20% density. The tilt is capped at ±RANK_CAP positions on the centrality z-score; V2 stays the base score.

Rationale: Tests whether more-central, highly-connected names in the correlation network outperform. The bounded cap prevents topology from overriding the signal layer.

Data source: Derived from Polygon price data (no new feed)
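The graph construction described above can be sketched in pure Python. This is an illustration under assumptions, not the production code: it takes a precomputed correlation matrix as input, treats the MST as the maximum spanning tree on |corr| (equivalent to a minimum spanning tree on the distance 1 − |corr|), uses an unweighted graph, and computes eigenvector centrality by power iteration. The rank tilt itself is left out because its exact form is not specified.

```python
from itertools import combinations


def build_graph(corr, density=0.20):
    """Return a set of edges (i, j) with i < j: MST backbone plus strongest extras."""
    n = len(corr)
    # Strongest absolute correlations first.
    pairs = sorted(combinations(range(n), 2), key=lambda p: -abs(corr[p[0]][p[1]]))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = set()
    # Kruskal's algorithm over pairs sorted by decreasing |corr|.
    for i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            edges.add((i, j))
    # Densify: add remaining edges by decreasing |corr| up to the target count.
    target = max(len(edges), round(density * n * (n - 1) / 2))
    for pair in pairs:
        if len(edges) >= target:
            break
        edges.add(pair)
    return edges


def eigenvector_centrality(n, edges, iters=200):
    """Power iteration on A + I, which shares eigenvectors with A but
    avoids oscillation on bipartite graphs such as trees."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    x = [1.0] * n
    for _ in range(iters):
        nxt = [x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        x = [v / norm for v in nxt]
    return x


# Toy matrix: name 0 is strongly correlated with every other name.
corr = [
    [1.0, 0.9, 0.8, 0.7],
    [0.9, 1.0, 0.1, 0.1],
    [0.8, 0.1, 1.0, 0.1],
    [0.7, 0.1, 0.1, 1.0],
]
edges = build_graph(corr)
centrality = eigenvector_centrality(len(corr), edges)
```

On the toy matrix the backbone is a star around name 0, so name 0 gets the highest centrality and would be favored by a hub tilt.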

Asteria

Shadow · Record-only

Measures: Same bounded rank overlay as Arachne (V2+Hub), direction reversed: tilts toward low eigenvector-centrality (peripheral, less-correlated) names.

Rationale: Opposing hypothesis: isolated names move independently of book risk and may carry independent alpha. Racing both hub and anti simultaneously reveals which effect, if any, holds in out-of-sample data.

Data source: Derived from Polygon price data

Chronos

Shadow · Record-only

Measures: Per-stock trailing 252-day total return (close[t] / close[t-252] − 1), ranked descending across the scan universe each day. Applied cross-sectionally. Adjudicated against 21d, 3mo, 6mo, 1y, 3y, 5y horizons, the relevant windows for momentum of this type.

Rationale: Time-series momentum (TSMOM; Moskowitz, Ooi & Pedersen 2012) is one of the most replicated return predictors in finance. The race tests whether the cross-sectional version holds on this watchlist at these horizons.

Data source: Polygon price data
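The ranking step can be sketched as follows. The formula is the one given above. The function names, the input shape (a dict of close-price lists, oldest first), and the decision to skip tickers with too little history are assumptions for illustration.

```python
def trailing_return(closes, lookback=252):
    """close[t] / close[t - lookback] - 1, or None if history is too short."""
    if len(closes) <= lookback:
        return None
    return closes[-1] / closes[-1 - lookback] - 1.0


def rank_by_momentum(history, lookback=252):
    """Return tickers sorted by trailing return, best first.

    history: {ticker: [close prices, oldest first]}
    Tickers without enough history are left out (an assumption).
    """
    scored = {}
    for ticker, closes in history.items():
        ret = trailing_return(closes, lookback)
        if ret is not None:
            scored[ticker] = ret
    return sorted(scored, key=scored.get, reverse=True)


# Tiny example with a 2-bar lookback so the arithmetic is easy to follow.
toy = {"A": [100.0, 110.0, 120.0], "B": [100.0, 90.0, 150.0], "C": [50.0, 55.0]}
ranking = rank_by_momentum(toy, lookback=2)
```

Here B returns 50% and A returns 20% over the lookback, and C is skipped for lack of history.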

Phobos

Shadow · Weekly (Fridays)

Measures: CFTC Commitments of Traders: commercial (hedger) net position normalized by open interest, percentile-ranked within its own 3-year trailing window. High commercial-net percentile = hedgers net-long = historically bullish signal. Covers SPY, QQQ, IWM, GLD, the verified CFTC contract-to-ETF mapping. Runs Fridays only (CFTC publishes weekly). Instruments with insufficient history are marked insufficient_data and skipped.

Rationale: Commercial positioning extremes are a classic macro-timing signal. Coverage is honestly limited: only instruments where the CFTC futures contract maps cleanly to a liquid US-listed ETF are included.

Data source: CFTC public Socrata API (publicreporting.cftc.gov, keyless)
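A minimal sketch of the percentile calculation, assuming weekly observations supplied oldest first. The 156-week window approximates three years. The `min_obs` cutoff for "insufficient history" and the "share of observations at or below the latest" percentile definition are both assumptions, and returning `None` stands in for the insufficient_data marker.

```python
def commercial_net_percentile(comm_long, comm_short, open_interest,
                              window=156, min_obs=52):
    """Percentile (0-100) of the latest commercial net / open interest ratio
    within its own trailing window, or None when history is too short.

    ASSUMPTIONS: window=156 weekly reports (about 3 years); min_obs=52 is a
    hypothetical cutoff; weeks with zero open interest are dropped.
    """
    ratios = [
        (long_ - short_) / oi
        for long_, short_, oi in zip(comm_long, comm_short, open_interest)
        if oi > 0
    ]
    recent = ratios[-window:]
    if len(recent) < min_obs:
        return None  # caller would mark the instrument insufficient_data
    latest = recent[-1]
    at_or_below = sum(1 for r in recent if r <= latest)
    return 100.0 * at_or_below / len(recent)


# Four weeks of made-up positions, with the cutoffs lowered to fit the toy data.
longs = [10, 20, 30, 40]
shorts = [5, 5, 5, 5]
oi = [100, 100, 100, 100]
pct = commercial_net_percentile(longs, shorts, oi, window=4, min_obs=4)
```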

Pandia

Race control

Measures: Every ticker in the scan universe gets equal rank, the dumb null strategy. No signal fires, no data required.

Rationale: The primary race control. Any scorer that cannot beat equal-weight over a meaningful sample adds no ranking edge. It also validates that the race infrastructure produces approximately market returns when no signal is present.

Data source: No data source

Tyche

Race control

Measures: Universe shuffled by a deterministic seed each scan. The seed is logged for reproducibility.

Rationale: Secondary statistical control alongside equal-weight. The two controls together bracket the no-information performance range.

Data source: No data source
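A reproducible shuffle of this kind can be sketched with the standard library. How the real seed is derived per scan is not documented here, so the caller-supplied `seed` and the function name are assumptions.

```python
import random


def seeded_shuffle(universe, seed):
    """Return the universe in a pseudo-random order fully determined by seed.

    Sorting first makes the result independent of the input order, so
    replaying a logged seed reproduces the same ranking.
    """
    rng = random.Random(seed)  # private generator; global random state untouched
    picks = sorted(universe)
    rng.shuffle(picks)
    return picks


first = seeded_shuffle(["MSFT", "AAPL", "NVDA"], seed=7)
replay = seeded_shuffle(["NVDA", "MSFT", "AAPL"], seed=7)
```

Using a dedicated `random.Random` instance keeps the control from disturbing, or being disturbed by, any other code that uses the module-level generator.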

SPY Benchmark

Benchmark

Measures: Single SPY pick per scan, the buy-and-hold market benchmark.

Rationale: Every scorer's excess return is computed relative to SPY at the same horizon. SPY represents the opportunity cost of not just holding the index.

Data source: Polygon

Thresholds

The five numbers that turn a score into a trade — and why each is set where it is.

Long entry threshold

+15

A ticker needs a composite score of +15 or higher to open a long. With a theoretical maximum composite of 125 (sum of all 8 max weights), +15 is roughly 12% of max — a meaningful but not extreme conviction level that typically requires two or more signals firing together. Initial paper data showed the false-positive rate at lower thresholds (e.g., +10) was too high; +15 filters the noise without being so strict that real setups get missed.

Short entry threshold

−10

Smaller in magnitude than the long threshold because the bot's score range is asymmetric. Three signals are long-only by construction (Insider, ARK, Themes), so the practical max-negative score is smaller than the max-positive score. A −10 threshold is 12.5% of the theoretical −80 minimum, comparable to the 12% of maximum that +15 represents on the long side. We also gate short entries against recent earnings beats: a stock that beat estimates within the last 7 days is not eligible to be shorted, regardless of score.
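The two entry thresholds and the earnings-beat gate can be combined into one decision function. This is a sketch: the function name and arguments are hypothetical, the size check is omitted, and treating day 7 itself as still inside the lockout is an assumption about the boundary.

```python
LONG_THRESHOLD = 15
SHORT_THRESHOLD = -10
EARNINGS_BEAT_LOCKOUT_DAYS = 7


def entry_side(score, days_since_earnings_beat=None):
    """Return "long", "short", or None for a composite score.

    days_since_earnings_beat: days since the most recent earnings beat,
    or None if there is no recent beat on record.
    """
    if score >= LONG_THRESHOLD:
        return "long"
    if score <= SHORT_THRESHOLD:
        recently_beat = (
            days_since_earnings_beat is not None
            and days_since_earnings_beat <= EARNINGS_BEAT_LOCKOUT_DAYS
        )
        # A recent beat blocks the short regardless of how negative the score is.
        return None if recently_beat else "short"
    return None  # between thresholds: log a snapshot, take no action
```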

Stop loss

2× 14-day ATR (floor 4%, cap 12%)

Stops are sized to the ticker's own recent volatility rather than a fixed percent. A 14-day ATR doubled gives a position room to breathe through normal noise. The 4% floor prevents absurdly tight stops on calm names; the 12% cap prevents giving back too much on volatile ones.
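The stop rule reduces to a clamp. In the sketch below the ATR is taken as an input in price units, so the averaging method is out of scope. The function names and the conversion to a percentage of the entry price are assumptions.

```python
def stop_distance_pct(atr_14, price, mult=2.0, floor=0.04, cap=0.12):
    """Stop distance as a fraction of price: 2x ATR, clamped to [4%, 12%]."""
    raw = mult * atr_14 / price
    return max(floor, min(cap, raw))


def stop_price(entry_price, distance_pct, side):
    """Stop sits below entry for longs and above entry for shorts."""
    if side == "long":
        return entry_price * (1.0 - distance_pct)
    if side == "short":
        return entry_price * (1.0 + distance_pct)
    raise ValueError("side must be 'long' or 'short'")


calm = stop_distance_pct(atr_14=1.0, price=100.0)    # 2% raw, lifted to the floor
normal = stop_distance_pct(atr_14=3.0, price=100.0)  # 6% raw, used as is
wild = stop_distance_pct(atr_14=10.0, price=100.0)   # 20% raw, cut to the cap
```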

Daily loss limit

-8% of equity

If the portfolio is down 8% in a single day, new entries pause until the next reset. This is the maximum loss tolerated before assuming something systematic is wrong with the day (market regime shift, broken data feed, fat-fingered scoring).

Position cap

9 concurrent positions

Architectural cap on total open positions across both long and short sides combined. Enforced in three places: the risk engine pre-check, the scheduler's effective-cap pre-check (which accounts for orders placed but not yet visible in Alpaca's positions endpoint), and a final guard in the trade executor immediately before submitting any order. The three-layer enforcement means a single race condition during the order-fill window cannot blow past the cap.
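The "effective cap" idea, counting orders that have been submitted but are not yet visible as positions, can be sketched as below. The function names and the use of symbol sets are assumptions, and refusing a symbol that is already held or pending is an extra guard added for illustration. The real checks live in three separate components.

```python
MAX_POSITIONS = 9


def effective_open_symbols(broker_positions, pending_orders):
    """Symbols that occupy a slot: filled positions plus in-flight opening orders."""
    return set(broker_positions) | set(pending_orders)


def can_open(symbol, broker_positions, pending_orders, cap=MAX_POSITIONS):
    """True if opening `symbol` would keep the book at or under the cap."""
    occupied = effective_open_symbols(broker_positions, pending_orders)
    if symbol in occupied:
        return False  # ASSUMPTION: no second entry on a held or pending symbol
    return len(occupied) < cap


held = ["AAPL", "MSFT", "NVDA", "AMD", "XOM", "CVX", "JPM", "GS"]  # 8 filled
in_flight = ["TSLA"]                                               # 1 not yet visible
```

With eight filled positions and one in-flight order, a check against positions alone would see room for one more; the effective count sees nine and blocks the entry.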

Decisions Made

Why paper trading first

Real money compounds errors. A bug in the scoring engine running on $10 of paper capital is a curiosity; the same bug on real capital is a withdrawal. Paper validates the full stack — data, scoring, execution, monitoring — under live market conditions without putting capital at risk.

Why every signal max is between 10 and 25

No single signal can dominate the composite. The 15-point spread between the highest-weighted signal (Congress, 25) and the lowest (Momentum / ARK, 10) keeps the strongest signal influential without making any other signal a rounding error. This also forces multi-signal agreement to push a ticker past the +15 entry threshold; single-signal events almost never cross alone.

Why long and short signals are scored separately

Some signals only fire in one direction by construction: Insider, ARK, and Themes are long-only. As a result, short candidates are scored on a smaller absolute range than longs, which is what the asymmetric +15 / −10 thresholds account for. Keeping the scoring per-direction also lets us be more conservative on shorts, where the risk profile is asymmetric.

Why we excluded social-media sentiment signals

Reddit and X are too easy to manipulate (pump campaigns, coordinated meme cycles) and the signal-to-noise ratio is worse than the news-sentiment signal we already get from Polygon. Until there's a way to filter coordinated activity in real time, the data isn't trustworthy enough to weight.

Why we score insider buys but not sells

Insider buys are highly informative: executives only buy their own stock with real money when they expect it to go up. Insider sells, by contrast, are noisy: most are driven by RSU vesting, tax planning, or diversification, not by negative outlook. Scoring sells would add spurious negative points and surface short candidates that aren't actually fundamentally negative.

Why the shadow race is record-only and gated

No shadow scorer drives a trade, modifies a v1 weight, or touches any live signal. The gate (SCORING_V2_ENABLED env flag) means the race layer runs harmlessly in production as a pure measurement layer. This prevents the error of fitting calibration weights to a short in-sample window — the race collects honest out-of-sample forward returns first, then calibration happens later on that data.

Why equal-weight is the primary race control, not SPY

An equal-weight portfolio holds every name in the universe with no look-ahead and no signal fitting — it is the hardest no-information baseline to beat over short windows. If a sophisticated scorer underperforms equal-weight, that is strong evidence the scorer is adding noise to ranking rather than improving it. SPY measures total-return premium vs holding the index; equal-weight specifically measures whether the per-name ranking adds value.