Finding relevant, upcoming tech events scattered across different websites and cities in India can be a real challenge. Whether you’re interested in SRE, DevOps, Kubernetes, Cloud Computing, or specific languages like Java, Ruby, or Golang, keeping track often means manually checking multiple sources.
This wasn’t my first attempt to solve this! Previously, I maintained a public Notion page (view the old page here) where I manually gathered and listed these events. While useful, the sheer effort of constantly finding, verifying, and adding events across numerous sites and cities became overwhelming, and the manual project eventually went defunct.
The desire for a centralized, automatically updated list remained. This led to the current project, where I leveraged automation and AI to revive the original goal.
The Journey: Beyond Manual Updates & Simple Scraping
My initial thought for automation was traditional web scraping using PHP. I planned to define a list of websites and use specific code snippets (like CSS selectors) to pull out event details. However, this approach quickly hit roadblocks:
- Websites change layouts frequently, breaking the scraping logic.
- Maintaining unique scraping rules for dozens of sites is complex.
- Many sites actively block simple scraping attempts.
I realized I needed a more resilient approach. Instead of relying on fragile CSS selectors, what if I could use AI to understand the web page content and extract the relevant information, even if the layout changed slightly? This led me to integrating the Google Gemini API.
How the AI-Powered System Works (The Cool Stuff!)
My current PHP-based system now follows a smarter workflow (condensed code sketches for each step follow the list):
- Targeted Sourcing: It still reads a list of source URLs from a configuration file. For sites like Meetup, I learned the importance of using location-specific URLs (e.g., searching only for “DevOps” events near “Bengaluru”) to get highly relevant content.
- Fetching & Cleaning: The script fetches the raw HTML for each URL using PHP cURL. It performs basic cleaning (removing scripts, styles, etc.) and truncates the content to optimize it for the AI.
- Intelligent Extraction with AI: Instead of fragile parsing code, I send the cleaned HTML to the Gemini API with a detailed prompt. This prompt asks the AI to:
- Analyze the HTML.
- Find events matching my topics (`SRE`, `DevOps`, `Kubernetes`, etc.) and locations (major Indian cities).
- Verify the event is upcoming.
- Extract `title`, `date`, `location`, and `link`.
- Return the results only in a structured JSON format.
- (Cool Thing #1: The AI handles the complex task of understanding varied HTML and extracting specific data based on instructions!)
- Validation & Filtering: My PHP script receives the structured JSON response. It validates the data, checks for essential fields, and applies extra filters, like explicitly removing events explicitly marked as “Online”.
- Structured Data Storage: Validated, relevant, upcoming, non-online events are saved into a clean JSON data file. (Cool Thing #2: Transforming unstructured web content into structured data.)
- Automation: The entire data gathering process is set up as a Cron Job on the server, automatically running once a week.
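To make the workflow concrete, here are condensed, illustrative sketches of each step. Names, URLs, file paths, and limits in these snippets are my assumptions for the sake of a runnable example, not the production code. First, the source configuration (step 1); the Meetup URL patterns and topic pairings shown are approximations:

```php
<?php
// sources.php - illustrative only; the real URL patterns and topic list differ.
// Each source pairs a location-specific URL with the topic it targets.
return [
    ['url' => 'https://www.meetup.com/find/?keywords=devops&location=in--Bengaluru', 'topic' => 'DevOps'],
    ['url' => 'https://www.meetup.com/find/?keywords=sre&location=in--Mumbai',       'topic' => 'SRE'],
    // ...one entry per topic/city combination
];
```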
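Next, fetching and cleaning (step 2). A rough sketch with PHP cURL; the function names and the truncation limit are assumptions:

```php
<?php
// Fetch the raw HTML for one source URL.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; EventsFetcher/1.0)',
    ]);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}

// Strip scripts/styles, collapse whitespace, and truncate for the AI prompt.
function cleanHtml(string $html, int $maxChars = 40000): string
{
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    $html = preg_replace('/\s+/', ' ', $html);
    return mb_substr($html, 0, $maxChars);
}
```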
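Step 3 is the Gemini call itself. Here is what it can look like against the Gemini REST API; the model name and prompt wording are placeholders, not my exact prompt:

```php
<?php
// gemini_extract.php - sketch only; model name and prompt are placeholders.

function extractEvents(string $cleanedHtml, string $apiKey): ?array
{
    // The prompt mirrors the instructions above: topics, locations,
    // upcoming-only, four fields, JSON-only output.
    $prompt = <<<PROMPT
    Analyze the following HTML from a tech events page. Find UPCOMING events in
    major Indian cities matching these topics: SRE, DevOps, Kubernetes, Cloud,
    Java, Ruby, Golang. For each event, extract title, date, location, and link.
    Respond ONLY with a JSON array of objects using exactly those four keys.

    HTML:
    $cleanedHtml
    PROMPT;

    $body = json_encode([
        'contents' => [['parts' => [['text' => $prompt]]]],
        // Ask the API to emit raw JSON rather than prose.
        'generationConfig' => ['response_mime_type' => 'application/json'],
    ]);

    $ch = curl_init(
        'https://generativelanguage.googleapis.com/v1beta/models/'
        . 'gemini-1.5-flash:generateContent?key=' . $apiKey
    );
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_POSTFIELDS     => $body,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    if ($response === false) {
        return null;
    }

    // The generated JSON lives inside the first candidate's first text part.
    $data = json_decode($response, true);
    $text = $data['candidates'][0]['content']['parts'][0]['text'] ?? null;
    return $text !== null ? json_decode($text, true) : null;
}
```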
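Steps 4 and 5, validation and storage, fit in one sketch. The essential-fields check and the “Online” filter follow the rules described above; the exact code and output file name are my approximation:

```php
<?php
// Keep only events with all essential fields and a non-online location.
function filterEvents(array $rawEvents): array
{
    $valid = [];
    foreach ($rawEvents as $event) {
        foreach (['title', 'date', 'location', 'link'] as $field) {
            if (empty($event[$field]) || !is_string($event[$field])) {
                continue 2; // a required field is missing: skip this event
            }
        }
        if (stripos($event['location'], 'online') !== false) {
            continue; // drop events explicitly marked as online
        }
        $valid[] = $event;
    }
    return $valid;
}

// Persist the clean list as the JSON data file the frontend reads.
$events = filterEvents($rawEvents ?? []); // $rawEvents comes from the extraction step
file_put_contents('events.json', json_encode($events, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES));
```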
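Finally, step 6: the weekly cron job. Something along these lines; the schedule, paths, and script name are assumptions (the only fixed requirement is “once a week”):

```
# m h dom mon dow  command
0 6 * * 1 /usr/bin/php /var/www/events/collect_events.php >> /var/log/events_cron.log 2>&1
```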
Displaying the Goods: The Frontend
A separate PHP page reads the generated JSON data file and presents the events with some neat features (a condensed sketch follows this list):
- Clean Interface: Displays events in a user-friendly, styled list.
- Dynamic City Filters: Automatically creates a dropdown filter based on the cities found in the collected event data.
- Pagination: Handles large numbers of events gracefully, showing 10 per page with clear navigation that respects the city filter.
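Here is that display logic boiled down, minus the styling; the file names, query parameters, and markup are illustrative:

```php
<?php
// events_page.php - condensed display sketch: dynamic city filter + pagination.
$events = json_decode(file_get_contents('events.json'), true) ?: [];

// Build the city filter options from the data itself.
$cities = array_values(array_unique(array_column($events, 'location')));
sort($cities);

// Apply the selected city filter, if any.
$city = $_GET['city'] ?? '';
if ($city !== '') {
    $events = array_values(array_filter($events, fn ($e) => $e['location'] === $city));
}

// Paginate: 10 events per page.
$perPage = 10;
$pages   = max(1, (int) ceil(count($events) / $perPage));
$page    = min(max(1, (int) ($_GET['page'] ?? 1)), $pages);

foreach (array_slice($events, ($page - 1) * $perPage, $perPage) as $e) {
    printf(
        '<p><a href="%s">%s</a> (%s, %s)</p>',
        htmlspecialchars($e['link']),
        htmlspecialchars($e['title']),
        htmlspecialchars($e['date']),
        htmlspecialchars($e['location'])
    );
}

// Page links that preserve the active city filter.
for ($i = 1; $i <= $pages; $i++) {
    printf('<a href="?city=%s&page=%d">%d</a> ', urlencode($city), $i, $i);
}
```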
This project successfully revived the goal of my original manual Notion page by automating the discovery and aggregation process using AI. It transforms scattered web data into a structured, filterable list of relevant tech events across India. While there’s always room for improvement (like more advanced date handling or adding more sources), the AI-powered approach provides a significantly more sustainable and scalable solution for me than manual curation or brittle scraping.
Here’s the link to the project: https://unitechy.com/events/