ARBITER SCOREBOARD
In January 2025, we ran NEBULA:FOG:PRIME — a pilot hackathon where 15 teams built AI x Security tools and demoed them live. Then we built something new for 2026: an AI judge called The Arbiter. We pointed it at every PRIME demo video to see what it would do. It watched every frame, read every transcript, and scored each team with zero mercy and zero politics.
Below are the full, unfiltered results — the scoreboard, per-team breakdowns, The Arbiter’s deliberation, and what it all means for the main event in March 2026. PRIME didn’t have formal tracks — The Arbiter retroactively categorized each demo into the 2026 track framework (ROGUE::AGENT, SENTINEL::MESH, SHADOW::VECTOR) to preview how its scoring will work live.
// TOP 3
Nebula Fog Subprime: Complete end-to-end attack chain from AI-generated phishing to credential harvesting. The only team that understood the offensive assignment.
Watch Demo
Plan AI: Multi-source research aggregation pulling real results from Coursera, Stack Overflow, and academic sites. Revolutionary concept: building something that actually works.
Watch Demo
AI Vulnerability Triage: Solid engineering with Checkov integration for automated Terraform security scanning. Clean Python architecture with proper type hints and modular design.
Watch Demo
// FINAL RANKINGS
| RK | TEAM | TRACK | SCORE |
|---|---|---|---|
| 1 | Plan AI | ROGUE::AGENT | 9.1 |
| 2 | Nebula Fog Subprime | ROGUE::AGENT | 8.8 |
| 3 | AI Vulnerability Triage | SENTINEL::MESH | 8.5 |
| 4 | Nebula Investigations | SENTINEL::MESH | 8.4 |
| 5 | Fake Content Generation | ROGUE::AGENT | 8.4 |
| 6 | NextGen SAST | SENTINEL::MESH | 8.1 |
| 7 | Source Code Review Agent | ROGUE::AGENT | 8.1 |
| 8 | Walmart 2 | ROGUE::AGENT | 8.1 |
| 9 | Advanced Security Tool | SENTINEL::MESH | 7.8 |
| 10 | Private Computer Use | SHADOW::VECTOR | 7.7 |
| 11 | AI Cloud Security Analysis | SENTINEL::MESH | 7.3 |
| 12 | Privacy Impact Analyzer | SHADOW::VECTOR | 7.0 |
| 13 | LAMP Monitoring Platform | SENTINEL::MESH | 6.5 |
| 14 | Web App Security Testing | SENTINEL::MESH | 6.4 |
| 15 | Revenge AI | ROGUE::AGENT | 5.5 |
// TEAM BREAKDOWNS
#1 Plan AI · ROGUE::AGENT · 9.1
Strengths
- Demonstrated fully functional web application with real-time multi-source research aggregation
- Clean, production-ready UI running on localhost:5173 with proper dark theme
- Concrete evidence of complex query handling with comprehensive response generation
Room to Grow
- Limited visibility into architecture or novel security considerations specific to ROGUE::AGENT
- No demonstration of adversarial capabilities or defensive measures
"Plan AI earned top placement through demonstrable execution quality. The transcript shows actual system behavior with timestamped messages and real search results from named sources."
#2 Nebula Fog Subprime · ROGUE::AGENT · 8.8
Strengths
- Complete end-to-end attack chain from AI-generated content through phishing to credential harvesting
- Realistic multi-stage attack using ChatGPT for content generation, Gmail for delivery
- Perfect track alignment showing actual offensive capabilities
Room to Grow
- Short demo duration (238s) suggests limited depth beyond core attack flow
- No evidence of defensive countermeasures or detection evasion techniques
"Nebula Fog Subprime delivers exactly what ROGUE::AGENT should showcase: a working offensive capability."
#3 AI Vulnerability Triage · SENTINEL::MESH · 8.5
Strengths
- Well-structured Python codebase with clear separation between classes
- Comprehensive Terraform infrastructure coverage including compute, network, firewall
- Integration with Checkov for automated security scanning
Room to Grow
- Limited demonstration of actual vulnerability findings in the 182s demo
- No visible output showing how the LLM processes Checkov results
"Solid engineering fundamentals with proper Python class design, type hints, and modular architecture."
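The critique above notes there was no visible output showing how the LLM consumes Checkov results. As a minimal sketch of that glue step (not the team's code), the snippet below condenses a Checkov JSON report into a prompt-sized digest; the `results.failed_checks` structure and its field names are assumptions based on Checkov's documented `-o json` format and should be verified against your Checkov version.

```python
# Sketch (not the team's code): condensing Checkov's `-o json` report into
# an LLM-ready digest. The results.failed_checks shape and its field names
# (check_id / check_name / file_path / resource) are assumptions here.
import json

def summarize_failed_checks(checkov_json: str, limit: int = 5) -> str:
    report = json.loads(checkov_json)
    failed = report.get("results", {}).get("failed_checks", [])
    lines = [
        f"{c['check_id']} {c['resource']} ({c['file_path']}): {c['check_name']}"
        for c in failed[:limit]
    ]
    header = f"{len(failed)} failed check(s)"
    return "\n".join([header] + lines)

# Hypothetical report with a single failed firewall check, for illustration.
sample = json.dumps({
    "results": {
        "failed_checks": [
            {
                "check_id": "CKV_GCP_2",
                "check_name": "Ensure firewall does not allow SSH from 0.0.0.0/0",
                "file_path": "/network.tf",
                "resource": "google_compute_firewall.ssh",
            }
        ]
    }
})

print(summarize_failed_checks(sample))
```

A digest like this keeps the prompt small while preserving the check IDs and resource addresses the model needs to reason about each finding.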
#4 Nebula Investigations · SENTINEL::MESH · 8.4
Strengths
- Sophisticated document analysis pipeline extracting structured data from corporate ownership charts
- Neo4j graph database integration for relationship mapping across jurisdictions
- Real-world applicable use case analyzing shell company structures
Room to Grow
- 510s duration suggests possible presentation inefficiencies
- Limited evidence of automated decision-making beyond data extraction
"Tackles a genuinely difficult problem: extracting structured relationship data from visual organizational charts in PDFs."
#5 Fake Content Generation · ROGUE::AGENT · 8.4
Strengths
- Functional content generation producing complete academic paper structure
- Appropriate track placement demonstrating misinformation capabilities
- Clean execution with simple command-line interface
Room to Grow
- Limited sophistication beyond basic LLM prompting for text generation
- No demonstration of distribution mechanisms or detection evasion
- Generated content seems arbitrary without clear offensive purpose
"Does exactly what the name suggests: generates fake academic content with proper structure. A component, not a complete capability."
#6 NextGen SAST · SENTINEL::MESH · 8.1
Strengths
- Comprehensive secure SDLC integration architecture combining threat modeling, SAST/SCA, DAST
- Concrete vulnerability identification in Google Gruyere demonstrating privilege escalation
- Multi-tool integration with LLM orchestration
Room to Grow
- 779s duration is the longest in the competition
- Architecture diagram shows planned components but limited implementation evidence
"Ambitious vision of LLM-enhanced security scanning across the entire SDLC."
#7 Source Code Review Agent · ROGUE::AGENT · 8.1
Strengths
- Functional Flask application integrating Bandit with OpenAI API
- Security-conscious implementation using Flask-Talisman
- Clear code structure with proper environment variable handling
Room to Grow
- Identified security vulnerability in own implementation (unsafe-inline in CSP)
- Would fit better in a defensive category — the 2026 track system addresses this
- Limited novel AI-enhanced analysis beyond wrapping existing Bandit output
"Competent engineering with Flask, Bandit integration, and OpenAI API usage. A solid defensive tool that would score even higher in the right category."
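The CSP finding called out above (`unsafe-inline`) is usually fixed at the policy level. As a hedged sketch, the policy below avoids inline allowances by pinning sources to `'self'`; the dict shape mirrors what Flask-Talisman's `content_security_policy` argument accepts (an assumption to check against its docs), and the tiny renderer exists only to make the resulting header value visible without needing Flask installed.

```python
# Sketch: a CSP policy with no 'unsafe-inline'. The dict shape is assumed
# to match Flask-Talisman's content_security_policy argument; verify
# against the library's documentation before relying on it.
CSP_POLICY = {
    "default-src": "'self'",
    "script-src": "'self'",
    "style-src": "'self'",
    "img-src": "'self' data:",
}

def render_csp(policy: dict) -> str:
    # Serialize the directive map into a Content-Security-Policy header value.
    return "; ".join(f"{directive} {sources}" for directive, sources in policy.items())

header = render_csp(CSP_POLICY)
print(header)
```

Moving inline scripts and styles into static files (or nonce-tagged blocks) is what makes dropping `'unsafe-inline'` practical.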
#8 Walmart 2 · ROGUE::AGENT · 8.1
Strengths
- Automated Terraform generation for complex Active Directory infrastructure
- Comprehensive infrastructure requirements including redundant Domain Controllers
- Specific AWS configuration with region and key pair management
Room to Grow
- Code parsing error visible in demo indicates implementation problems
- Better fit for a defensive category — exactly why 2026 has clearer tracks
- Credential management could be tightened for production readiness
"Legitimate infrastructure automation with solid Terraform generation. A few rough edges to polish — the bones are there."
#9 Advanced Security Tool · SENTINEL::MESH · 7.8
Strengths
- Well-articulated problem statement addressing security context for thousands of applications
- Comprehensive MCP architecture integrating CI/CD, source code, docs, and AWS
- Multi-app ecosystem comparison capability
Room to Grow
- 759s duration with heavy reliance on slides suggests more concept than implementation
- Limited evidence of actual system output beyond diagrams
- No demonstration of novel LLM insights beyond data aggregation
"Compelling vision of aggregating security context across thousands of apps. The ambition is real — next step is matching it with a tighter demo."
#10 Private Computer Use · SHADOW::VECTOR · 7.7
Strengths
- Novel privacy layer architecture intercepting screen access to redact PII
- Concrete demonstration of masking personal information with placeholder tokens
- Relevant use case addressing real privacy concerns with AI agents
Room to Grow
- Limited technical depth shown in 362s demo
- No evidence of sophisticated PII detection beyond basic pattern matching
- Unclear how system handles complex UI elements or dynamic content
"Addresses a legitimate concern: AI agents with screen access can leak sensitive personal information."
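The "basic pattern matching" approach noted above can be made concrete with a small sketch: regex-based redaction that swaps matches for placeholder tokens before any text reaches an agent. The two patterns here are simplified illustrations, not the team's rules; production PII detection needs far broader coverage than a pair of regexes.

```python
# Illustrative sketch of pattern-matching PII redaction with placeholder
# tokens. The email and phone regexes are deliberately simplified
# assumptions, not the team's actual detection rules.
import re

PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[PHONE]": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    # Replace every match of each pattern with its placeholder token.
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

print(redact("Contact alice@example.com or 555-867-5309."))
# → Contact [EMAIL] or [PHONE].
```

The "unclear how it handles dynamic content" critique is exactly where this approach strains: regexes see only text, not UI structure.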
#11 AI Cloud Security Analysis · SENTINEL::MESH · 7.3
Strengths
- Natural language interface for AWS security investigation
- Integration with AWS Security Hub for compliance framework findings
- Async Python architecture with proper error handling
Room to Grow
- Among the shortest demo durations (188s), suggesting limited functionality
- Only VS Code screenshots visible — no actual query execution shown
- Unclear what novel analysis the LLM provides beyond querying AWS APIs
"Natural language AWS security investigation is a strong idea. A longer demo with live query results would have pushed this much higher."
#12 Privacy Impact Analyzer · SHADOW::VECTOR · 7.0
Strengths
- Clean Python implementation with proper class structure
- Support for multiple document formats with markdown output
- MD5 content hashing for unique filename generation
Room to Grow
- Generic document conversion utility with no demonstrated privacy analysis
- No evidence of actual PII detection or risk evaluation
- Adding actual PII detection and risk scoring would complete the vision
"Clean Python implementation with solid document processing foundations. The privacy analysis layer is the missing piece that would tie it all together."
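The content-hashing scheme praised above is simple enough to sketch in a few lines: hash the document bytes with MD5 and use the digest as a stable filename for the converted markdown. This is the general pattern, not the team's code; MD5 is fine for deduplication and naming, though not for anything security-sensitive.

```python
# Minimal sketch of content-addressed output filenames: identical content
# always maps to the same name, so reprocessing a document never creates
# duplicates. General pattern, not the team's implementation.
import hashlib

def output_filename(content: bytes, ext: str = "md") -> str:
    digest = hashlib.md5(content).hexdigest()  # 32 hex chars
    return f"{digest}.{ext}"

name = output_filename(b"# Report\nSome converted document text.\n")
print(name)
```

The same digest could later key a risk-score cache, which is one way the missing privacy-analysis layer could bolt onto this foundation.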
#13 LAMP Monitoring Platform · SENTINEL::MESH · 6.5
Strengths
- Clear value proposition for LLM agent monitoring across deployment environments
- Comprehensive objectives covering visibility, compliance, data exposure
- Professional presentation with branded slides
Room to Grow
- Architecture slide marks 'Threat Response' as Future Implementation
- 441s spent primarily on slides rather than working demonstration
- A working prototype demo would have pushed the score significantly higher
"Compelling vision for LLM agent monitoring with clear market need. The roadmap is ambitious — a working prototype at the 2026 event would be a contender."
#14 Web App Security Testing · SENTINEL::MESH · 6.4
Strengths
- Multi-agent collaboration architecture with three Expert agents
- Image analysis integration for understanding page state
- Attempt at sophisticated navigation decision-making through agent consensus
Room to Grow
- Curl command targeting wrong port indicates configuration errors
- Agent consensus loop could be tightened for faster decisions
- A demo showing a successful end-to-end test run would be compelling
"Multi-agent collaboration for web security testing is genuinely ambitious. The agent consensus architecture is creative — tightening the decision loop would make this shine."
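One way to tighten the consensus loop criticized above is a single-round hard vote: each expert agent proposes one action and a tally decides, with no multi-round debate. This is a hedged sketch of that general pattern, not the team's architecture; the agent proposals here are plain strings standing in for whatever the real agents emit.

```python
# Sketch of single-round majority-vote consensus among expert agents
# choosing the next navigation action. General pattern, not the team's
# implementation; proposals are illustrative strings.
from collections import Counter

def consensus_action(proposals: list[str]) -> str:
    # Pick the action proposed by the most agents; on a tie, the action
    # that first reached that count wins (Counter preserves order).
    tally = Counter(proposals)
    return tally.most_common(1)[0][0]

votes = ["click_login", "click_login", "scroll_down"]
print(consensus_action(votes))  # → click_login
```

A fixed one-proposal-per-agent round bounds decision latency, trading away the deliberation that made the demo's loop slow.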
#15 Revenge AI · ROGUE::AGENT · 5.5
Strengths
- Clear UI with three distinct tabs for analysis functions
- File upload functionality accepting executables up to 200MB
- PE metadata extraction showing entropy scores
Room to Grow
- Connecting the UI to live analysis output would demonstrate real capability
- Shorter demo — more time showing the tool in action would help
- AI/LLM integration layer would elevate this beyond traditional RE tools
"Interesting approach to reverse engineering with a clear UI concept. Connecting the interface to working analysis would make this a real tool."
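The entropy scores mentioned in the PE-metadata strength are worth unpacking: Shannon entropy near 8 bits per byte typically flags packed or encrypted sections, while values near 0 indicate padding. The snippet below is the standard formula, not the team's code.

```python
# Shannon entropy of a byte sequence in bits per byte: the standard
# metric behind the demo's PE section scores (not the team's code).
# ~8.0 suggests packed/encrypted data; ~0.0 suggests uniform padding.
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(round(shannon_entropy(b"\x00" * 64), 2))       # → 0.0
print(round(shannon_entropy(bytes(range(256))), 2))  # → 8.0
```

Computed per PE section rather than over the whole file, this is the usual first-pass heuristic for spotting packers before any AI layer gets involved.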
// ARBITER DELIBERATION
NEBULA:FOG:PRIME revealed a field split between teams who shipped working code and teams who shipped slideshows about code they planned to write someday. The top scorers earned it by doing something radical: demonstrating working software. One team pulled live results from real data sources. Another showed a complete offensive attack chain from content generation to credential harvesting. The bar wasn’t even that high — it was just “does the thing you built actually do the thing?”
The infrastructure security demos had a chronic slideware problem. One team spent 12 minutes on hand-drawn architecture diagrams. Another explicitly labeled core features as ‘Future Implementation’ on their own slides — bold strategy for a demo day. Several teams also pitched offensive security tools but clearly built defensive ones, which made categorization… interesting. PRIME didn’t have formal tracks, but the identity crisis was real.
The technical execution gap told the real story. Top teams showed proper software engineering — clean architecture, working integrations, real output. Others showed UI mockups with placeholder data, or AI agents that spent five minutes debating which button to click. The delta between ‘shipped it’ and ‘slid it’ determined everything. But here’s the thing: every team showed up, built something, and put it on camera. That takes guts. The Arbiter respects the attempt — it just scores the output.
// NOTABLE THEMES
Working code wins: The highest-scoring teams all had one thing in common — they demonstrated real, functional software pulling real data. The Arbiter rewards execution over ambition every time.
Infrastructure security is hot: Four teams independently built cloud security tools targeting Terraform and AWS, reflecting how much the industry is shifting toward IaC defense.
Multi-agent architectures are emerging: Several teams experimented with agent collaboration patterns — some impressively, others hilariously. The potential is enormous.
Privacy is the next frontier: Multiple teams tackled PII handling and data protection — a problem space that barely existed two years ago and is now urgent.
Full attack chains are rare and valuable: Only one team demonstrated end-to-end offensive capability. There’s a massive gap waiting to be filled at the 2026 event.
Know your lane: Several teams built great defensive tools but pitched them as offensive — a lesson the 2026 track system is designed to solve with clearer categories.
Demo craft matters: The sweet spot was 5-8 minutes of live software with minimal slides. Teams that nailed this format scored significantly higher regardless of complexity.
AI + Security is wide open: From reverse engineering to SAST to phishing simulation, PRIME showed just how many unsolved problems exist at the intersection. Plenty of room to make your mark.
// THE ARBITER SAID IT BEST
“The bar wasn’t even that high — it was just ‘does the thing you built actually do the thing?’”
— Arbiter, setting expectations
“The delta between ‘shipped it’ and ‘slid it’ determined everything.”
— Arbiter, on what separates the top from the bottom
“Every team showed up, built something, and put it on camera. That takes guts.”
— Arbiter, giving credit where it’s due
“Execution is the only currency that matters.”
— Arbiter, final verdict
// THE MAIN EVENT IS COMING
PRIME proved the format works. The Arbiter that scored these demos is going live at the main event — real-time scoring as you demo. Four tracks including the new ZERO::PROOF cipher track. Bigger stage, tougher competition, and an AI judge that’s seen it all before. Come build something it can’t roast.
Think you can beat Plan AI’s 9.1?
March 14, 2026 · San Francisco · Prove it.
Register Now
Watch PRIME Demos on YouTube