Building a Solar Flare Grid Impact Engine on Databricks – The Databricks Hackathon Journey That Earned 2nd Place Globally
I recently entered the Databricks Free Edition Hackathon 2025, which challenged participants to build an end-to-end data engineering solution using Databricks Free Edition. I built a Theoretical Solar Flare Grid Impact Intelligence System that combines NASA space weather data with power grid monitoring to predict potential blackouts.
Finding the Problem
Hackathons give you the freedom to approach problems from different angles. Most of the time you’re handed a problem and asked to solve it, but with a broad scope I decided to flip that – find interesting data first and then discover what problem it could solve.
I’m drawn to data engineering because I like understanding how systems work, and that curiosity extends to physics in my spare time. Solar flares seemed like fascinating territory to explore. I’m no expert, but I knew they can cause serious problems for electrical grids, especially aging infrastructure. The question formed: what if we could predict these events days in advance and help grid operators prepare?
Building the Solution
The solution needed to ingest real-world data, make sense of it, and make it accessible to people who aren’t data engineers or analysts.
The Data Pipeline
I built the entire pipeline on Databricks Free Edition using Delta Live Tables with a medallion architecture:
Bronze Layer – Raw data ingestion using Auto Loader:
- NASA space weather observations (solar flare classifications, intensity, timing)
- Power grid fault detection training data (voltage, current, temperature, equipment health scores)
Silver Layer – Data quality and enrichment:
- Used @dlt.expect_or_drop() to validate timestamps, flare classifications, voltage ranges, and temperature limits
- Added severity classifications and temporal features
- This is where things get cleaned up and ready for analysis
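Inside the pipeline these rules are declared with `@dlt.expect_or_drop()`, which silently discards rows that fail a condition. As a minimal sketch of the same keep-or-drop semantics outside Databricks (column names, flare classes, and numeric limits here are illustrative, not the project's actual schema):

```python
# Sketch of silver-layer quality rules, mimicking @dlt.expect_or_drop():
# a row survives only if every expectation holds.
# Column names and ranges are illustrative assumptions.

def passes_quality_checks(row: dict) -> bool:
    valid_flare_classes = {"A", "B", "C", "M", "X"}
    return (
        row.get("timestamp") is not None
        and row.get("flare_class") in valid_flare_classes
        and 100.0 <= row.get("voltage_kv", -1.0) <= 500.0   # plausible transmission range
        and -40.0 <= row.get("temperature_c", 999.0) <= 120.0
    )

readings = [
    {"timestamp": "2025-01-01T00:00", "flare_class": "M", "voltage_kv": 230.0, "temperature_c": 35.0},
    {"timestamp": None, "flare_class": "X", "voltage_kv": 230.0, "temperature_c": 35.0},   # dropped: no timestamp
    {"timestamp": "2025-01-01T01:00", "flare_class": "Z", "voltage_kv": 230.0, "temperature_c": 35.0},  # dropped: bad class
]

silver = [r for r in readings if passes_quality_checks(r)]
```

In DLT the same checks are attached declaratively to the table definition, so dropped-row counts show up in the pipeline's data quality metrics for free.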
Gold Layer – Business-ready analytics:
- Created correlation tables joining solar and grid data by date
- Added temporal lag features (same-day, next-day, 2-3 days later) because geomagnetic storms may not hit infrastructure instantly
- Built ML-enriched tables with predictions and probability forecasts
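The lag join behind those temporal features can be sketched with pandas; the table and column names below are assumptions for illustration, not the project's actual schema:

```python
import pandas as pd

# Daily solar activity and grid fault counts (illustrative data).
solar = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-01", "2025-01-02"]),
    "max_flare_intensity": [5.2, 1.1],
})
faults = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-04"]),
    "fault_count": [3, 7, 9, 2],
})

# For each solar observation, attach fault counts 0-3 days later,
# since geomagnetic effects may lag the flare itself.
for lag in range(4):
    lagged = faults.copy()
    lagged["date"] = lagged["date"] - pd.Timedelta(days=lag)
    lagged = lagged.rename(columns={"fault_count": f"faults_plus_{lag}d"})
    solar = solar.merge(lagged, on="date", how="left")
```

Each solar row now carries the fault counts for the same day and the three following days, which is what lets a downstream model learn delayed impacts.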
The model creates scenario-based predictions ranging from “Quiet Sun” (minimal impact) to “Severe X-class storms” (catastrophic potential), with expected fault counts, risk levels, operational recommendations, and probability estimates.
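A minimal sketch of that scenario mapping, where the thresholds, fault counts, and recommendation wording are my own placeholders rather than the model's actual output:

```python
# Map a forecast flare class to a risk scenario with a recommendation.
# Classes follow the standard A/B/C/M/X solar flare scale; the risk
# levels, expected fault counts, and actions are illustrative only.

SCENARIOS = {
    "quiet": ("Quiet Sun", "LOW", 0, "Routine monitoring"),
    "C": ("Minor C-class activity", "LOW", 1, "Routine monitoring"),
    "M": ("Moderate M-class storm", "ELEVATED", 5, "Brief on-call repair crews"),
    "X": ("Severe X-class storm", "CRITICAL", 20, "Activate emergency protocols"),
}

def scenario_for(flare_class: str) -> dict:
    name, risk, expected_faults, action = SCENARIOS.get(flare_class, SCENARIOS["quiet"])
    return {
        "scenario": name,
        "risk_level": risk,
        "expected_faults": expected_faults,
        "recommendation": action,
    }
```

Pre-computing rows like these into a gold table is what lets a natural-language layer answer "what happens if we get an X-class storm?" without running any model at query time.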
Making It Accessible
AI/BI Genie turned out to be the key to making this usable. I pre-computed all the predictions, risk assessments, and recommendations into gold tables, then let Genie provide the natural language interface, allowing grid operators to query and visualise the data without writing any code.
Operators can ask questions like:
- “What happens if we get a severe X-class solar storm tomorrow?”
- “Show me the most likely solar scenarios for the next 7 days”
- “At what flare intensity should we activate emergency protocols?”
- “Visualize daily faults and types”
Genie also allows you to set up suggested questions and add descriptions to guide users toward the most relevant queries. This means operators can get started immediately with pre-configured questions tailored to their workflow, without needing to know what questions to ask in the first place.
What Grid Operators Get
The final system provides:
- 7-day forecasts with probability estimates for different solar scenarios
- Clear risk thresholds – when to escalate from routine monitoring to emergency protocols
- Specific action plans – concrete steps like “pre-position repair crews at substations” or “alert hospitals about potential outages”
- Anomaly detection – flags unusual patterns that need investigation
- Natural language queries via Genie
The Tech Stack
Built entirely on Databricks Free Edition:
- Delta Live Tables for pipeline orchestration
- Auto Loader for streaming data ingestion
- PySpark for data transformations
- AI/BI Genie for natural language queries
- Python ML libraries (RandomForest) for predictive modeling
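As a hedged sketch of how a fault-risk classifier along these lines might be trained with scikit-learn (the features, labelling rule, and data here are synthetic stand-ins, not the actual training set):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic features: [flare_intensity, voltage_deviation, equipment_health],
# each scaled to [0, 1]. Real features would come from the gold tables.
X = rng.random((500, 3))

# Illustrative labelling rule: faults become likely when flare intensity
# is high and equipment health is poor.
y = ((X[:, 0] > 0.7) & (X[:, 2] < 0.4)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Fault probability under a severe-flare, poor-health scenario.
proba = model.predict_proba([[0.95, 0.5, 0.1]])[0, 1]
```

Writing probabilities like `proba` back into a gold table is what turns the model into the "probability forecasts" the scenarios expose.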
The Journey
This hackathon gave me the freedom to work backwards – starting with fascinating data and discovering a problem worth solving. It’s not the typical workflow, but it’s a reminder of why data engineering is interesting: curiosity about how systems work and the drive to build solutions that could make a difference.
The tight timeframe was actually a benefit. Between coming up with an idea, building it, filming a demo (and deciding what would actually fit into a five-minute maximum!), and learning video editing for the first time, there wasn’t room for overthinking. The time constraint forces you to find the most efficient way to build something rather than sitting in decision paralysis.
I’ve always worked on the premise that if something seems convoluted, there’s probably a simpler approach, and if something seems arduous, there’s surely an easier way. The Databricks tools aligned perfectly with that philosophy. Building what could have been a complex Grid Impact Intelligence System turned out to be simple and efficient – Delta Live Tables handled the pipeline orchestration declaratively, Auto Loader managed streaming ingestion automatically, and Genie turned complex queries into natural language. What could have been weeks of infrastructure setup and query optimization became days of focusing on the actual problem. The tools did the heavy lifting so I could focus on solving the problem (albeit a problem that I created!).

