<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:yandex="http://news.yandex.ru" xmlns:turbo="http://turbo.yandex.ru" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>SM projects</title>
    <link>https://indext.io</link>
    <description/>
    <language>ru</language>
    <lastBuildDate>Wed, 08 Apr 2026 14:57:40 +0300</lastBuildDate>
    <item turbo="true">
      <title>Kaggle Dataset: promoPulse (promotional offers, coupons and deals)</title>
      <link>https://indext.io/tpost/1bdylo8r81-kaggle-dataset-promopulse-promotional-of</link>
      <amplink>https://indext.io/tpost/1bdylo8r81-kaggle-dataset-promopulse-promotional-of?amp=true</amplink>
      <pubDate>Tue, 17 Mar 2026 17:56:00 +0300</pubDate>
      <author>Indext Data Lab</author>
      <enclosure url="https://static.tildacdn.com/tild3030-6465-4635-b430-636436356464/Gemini_Generated_Ima.png" type="image/png"/>
      <description>Daily-updated collection of promotional offers, coupons, and deals from major US e-commerce websites.</description>
      <turbo:content><![CDATA[<header><h1>Kaggle Dataset: promoPulse (promotional offers, coupons and deals)</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild3030-6465-4635-b430-636436356464/Gemini_Generated_Ima.png"/></figure><h2  class="t-redactor__h2">Automated E-Commerce Promo Monitoring with LLM Extraction — promoPulse (Open Source Dataset)</h2><div class="t-redactor__text"><strong>Tag:</strong> E-Commerce · Data Pipeline · LLM Extraction · Open Source · Kaggle</div><h3  class="t-redactor__h3">The problem: promotional data is fragmented, stale, and unstructured</h3><div class="t-redactor__text">E-commerce teams and data scientists share the same frustration with promotional intelligence: there is no reliable, structured source of what competitors are running today. Deals pages are dynamic, return raw HTML, and change format without notice.</div><div class="t-redactor__text">Manual monitoring doesn't scale. Five retailers, daily deal cycles, seasonal campaigns, flash sales — that's hundreds of promotions per day, each with a different discount structure, expiration logic, and format. Some have coupon codes, some percentage discounts, some are BOGO, some are free shipping with a minimum order. Normalizing all of that by hand is not a pipeline; it's a full-time job.</div><div class="t-redactor__text">For data scientists and ML engineers, the problem is different but adjacent: training data for e-commerce entity extraction, promotion classification, and discount modeling is either paywalled, poorly structured, or not updated frequently enough to reflect real market behavior. 
What's missing is a transparent, daily-updated dataset with documented methodology, verified source URLs, and consistent schema — available in CSV, JSON, and Parquet.</div><h3  class="t-redactor__h3">The solution: multi-source scraping + LLM-powered structured extraction</h3><div class="t-redactor__text">promoPulse is a daily-updated dataset of promotional offers, coupons, and deals from major US e-commerce retailers. It is also a live demonstration of the Indext Data Lab extraction pipeline — fully documented, with every record traceable to its original source URL.</div><div class="t-redactor__text"><strong>The pipeline runs in four stages:</strong></div><div class="t-redactor__text"><strong>Multi-source fetching.</strong> Content is retrieved from public deal pages using three scraping APIs in parallel — Jina, Tavily, and Firecrawl. Multiple sources per site improve coverage and provide redundancy when one API returns incomplete content.</div><div class="t-redactor__text"><strong>LLM-powered extraction.</strong> Raw page text is processed by GPT-4o-mini and Llama-3.3-70B to identify structured fields: title, promo code, discount value, discount type, expiration date, and description. LLMs handle the format variation that makes rule-based extraction brittle.</div><div class="t-redactor__text"><strong>Deduplication.</strong> Two-key logic runs at both daily and cumulative history levels. Primary key when a promo code exists: (source_site, promo_code). Fallback: (source_site, title, source_url, valid_until). Same-day re-runs merge without duplicating records.</div><div class="t-redactor__text"><strong>Validation and normalization.</strong> Discount values are normalized to a consistent numeric format. Promo codes are verified against raw page content. 
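</div><div class="t-redactor__text">As a sketch of the normalization step (a simplified illustration, not the production pipeline; the regexes and type labels here are assumptions), mapping raw promo text to a (discount_type, discount_value) pair might look like:</div><pre class="t-redactor__highlightcode"><code data-lang="python">import re

def normalize_discount(text: str):
    """Map raw promo text to (discount_type, numeric value); value may be None."""
    t = text.lower()
    m = re.search(r"(\d+(?:\.\d+)?)\s*%", t)
    if m:
        return "percentage", float(m.group(1))
    if "free shipping" in t:
        return "free_shipping", None
    if "bogo" in t or "buy one" in t:
        return "bogo", None
    m = re.search(r"\$\s*(\d+(?:\.\d+)?)", t)
    if m:
        return "fixed_amount", float(m.group(1))
    return "other", None</code></pre><div class="t-redactor__text">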
JSON output is validated with a retry mechanism on extraction failures.</div><div class="t-redactor__text"><strong>Stack:</strong> Jina · Tavily · Firecrawl · GPT-4o-mini · Llama-3.3-70B · Python · Parquet / CSV / JSON</div><div class="t-redactor__text">The pipeline runs automatically every day at 08:00 UTC. Reliability over the last 35 days: <strong>100%</strong>.</div><h3  class="t-redactor__h3">Results: 4,600+ records, 32 days of history, 5 retailers</h3><div class="t-redactor__text">Today's snapshot (2026-04-08):</div><pre class="t-redactor__highlightcode"><code data-lang="text">Promotions found   212
Sites scraped        5
Coupon codes         6
Pipeline status     OK</code></pre><div class="t-redactor__text">Per-site breakdown:</div><div class="t-redactor__embedcode"><table style="width:100%;border-collapse:collapse;font-family:monospace;font-size:15px;">
  <thead>
    <tr style="border-bottom:2px solid #000;">
      <th style="text-align:left;padding:8px 12px;">Site</th>
      <th style="text-align:right;padding:8px 12px;">Promos</th>
      <th style="text-align:right;padding:8px 12px;">Max Discount</th>
      <th style="text-align:right;padding:8px 12px;">Codes</th>
    </tr>
  </thead>
  <tbody>
    <tr><td style="padding:8px 12px;">officedepot.com</td><td style="text-align:right;padding:8px 12px;">107</td><td style="text-align:right;padding:8px 12px;">54.5%</td><td style="text-align:right;padding:8px 12px;">1</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">ulta.com</td><td style="text-align:right;padding:8px 12px;">52</td><td style="text-align:right;padding:8px 12px;">100.0%</td><td style="text-align:right;padding:8px 12px;">1</td></tr>
    <tr><td style="padding:8px 12px;">shutterfly.com</td><td style="text-align:right;padding:8px 12px;">18</td><td style="text-align:right;padding:8px 12px;">50.0%</td><td style="text-align:right;padding:8px 12px;">4</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">1800flowers.com</td><td style="text-align:right;padding:8px 12px;">12</td><td style="text-align:right;padding:8px 12px;">50.0%</td><td style="text-align:right;padding:8px 12px;">0</td></tr>
    <tr><td style="padding:8px 12px;">homedepot.com</td><td style="text-align:right;padding:8px 12px;">7</td><td style="text-align:right;padding:8px 12px;">30.0%</td><td style="text-align:right;padding:8px 12px;">0</td></tr>
  </tbody>
</table></div><div class="t-redactor__text">Data quality across key fields:</div><div class="t-redactor__embedcode"><table style="width:100%;border-collapse:collapse;font-family:monospace;font-size:15px;">
  <thead>
    <tr style="border-bottom:2px solid #000;">
      <th style="text-align:left;padding:8px 12px;">Field</th>
      <th style="text-align:right;padding:8px 12px;">Fill Rate</th>
      <th style="text-align:left;padding:8px 12px;">Note</th>
    </tr>
  </thead>
  <tbody>
    <tr><td style="padding:8px 12px;">title</td><td style="text-align:right;padding:8px 12px;font-weight:500;color:#2a9d5c;">100%</td><td style="padding:8px 12px;font-size:13px;color:#888;">Always present</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">discount_type</td><td style="text-align:right;padding:8px 12px;font-weight:500;color:#2a9d5c;">100%</td><td style="padding:8px 12px;font-size:13px;color:#888;">Always classified</td></tr>
    <tr><td style="padding:8px 12px;">description</td><td style="text-align:right;padding:8px 12px;font-weight:500;color:#2a9d5c;">100%</td><td style="padding:8px 12px;font-size:13px;color:#888;">Always present</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">discount_value</td><td style="text-align:right;padding:8px 12px;font-weight:500;color:#e8a020;">52%</td><td style="padding:8px 12px;font-size:13px;color:#888;">Not all promos have numeric value</td></tr>
    <tr><td style="padding:8px 12px;">valid_until</td><td style="text-align:right;padding:8px 12px;font-weight:500;color:#e8a020;">34%</td><td style="padding:8px 12px;font-size:13px;color:#888;">Retailers often omit expiry dates</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">promo_code</td><td style="text-align:right;padding:8px 12px;font-weight:500;color:#888;">3%</td><td style="padding:8px 12px;font-size:13px;color:#888;">Most discounts are automatic</td></tr>
  </tbody>
</table></div><div class="t-redactor__text">The low fill rates on promo_code and valid_until reflect real-world retailer behavior, not extraction failures. Most promotions are automatic checkout discounts with no published expiration.</div><h2  class="t-redactor__h2">What you can build with this data</h2><div class="t-redactor__text"><strong>For e-commerce and marketing teams:</strong> benchmark your promotional frequency and discount depth against market leaders. Track which discount types competitors favor — percentage off, BOGO, free shipping, fixed amount. Identify seasonal cycles before they happen. The daily_stats.csv and site_stats.csv analytics files are ready for dashboarding without any preprocessing.</div><div class="t-redactor__text"><strong>For data scientists and ML engineers:</strong> the dataset provides high-quality labeled training data for promotion entity extraction, discount classification, and e-commerce NLP models. The LLM-structured output with verified source URLs gives you full provenance on every record. The starter EDA notebook on Kaggle covers daily volume trends, discount type distribution, coupon frequency analysis, and day-of-week seasonality — ready to fork and run.</div><div class="t-redactor__text"><strong>Dataset structure</strong></div><pre class="t-redactor__highlightcode"><code data-lang="text">dataset/
current/     # Latest extraction snapshot (CSV, JSON, Parquet)
history/     # Daily archives + full cumulative history
analytics/   # daily_stats.csv, site_stats.csv</code></pre><div class="t-redactor__text">Each record includes: title · promo_code · discount_value · discount_type · source_site · source_url · valid_from · valid_until · description · collect_date<br /><br />License: <strong>CC BY 4.0</strong> — use freely for research, commercial analytics, or model training with attribution.</div><h2  class="t-redactor__h2">Download the dataset → </h2><div class="t-redactor__text"><strong>Kaggle:</strong> https://www.kaggle.com/datasets/indext-data-lab-ai/promos-dataset</div><div class="t-redactor__text">Fork the starter EDA notebook to run your own analysis immediately.</div><h2  class="t-redactor__h2">Need a custom data pipeline?</h2><div class="t-redactor__text">promoPulse covers US retailers updated daily. If your business needs monitoring of specific competitors, higher update frequency, additional geographies, or a fully integrated AI-driven extraction pipeline — Indext Data Lab builds these as custom data products.</div><div class="t-redactor__text"><strong>Connect on LinkedIn → </strong>https://www.linkedin.com/company/indext-data-lab/<strong> </strong></div>]]></turbo:content>
    </item>
    <item turbo="true">
      <title>Windows UI Element Detector</title>
      <link>https://indext.io/tpost/bzltacekm1-windows-ui-element-detector</link>
      <amplink>https://indext.io/tpost/bzltacekm1-windows-ui-element-detector?amp=true</amplink>
      <pubDate>Tue, 17 Mar 2026 19:54:00 +0300</pubDate>
      <author>Indext Data Lab</author>
      <enclosure url="https://static.tildacdn.com/tild6233-3664-4565-b061-333030323037/1774982478828.jpeg" type="image/jpeg"/>
      <description>Upload a Windows screenshot to detect interactive UI elements (buttons, textboxes, checkboxes, dropdowns, icons, tabs, menu items).</description>
      <turbo:content><![CDATA[<header><h1>Windows UI Element Detector</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild6233-3664-4565-b061-333030323037/1774982478828.jpeg"/></figure><h2  class="t-redactor__h2">Windows UI Element Detector — Try the Live Demo</h2><div class="t-redactor__text"><strong>Tag:</strong> Computer Vision · Windows Automation · Live Demo · Open Source</div><h3  class="t-redactor__h3">What this tool does</h3><div class="t-redactor__text">Windows UI Element Detector is a browser-based demo of a computer-vision model that finds interactive elements in any Windows screenshot — buttons, text fields, checkboxes, dropdowns, icons, tabs, and menu items. Upload a screenshot, get bounding boxes and JSON output back in seconds.</div><div class="t-redactor__text">Under the hood it runs YOLO11s fine-tuned on 3,000 synthetic Windows-style UI screenshots, with EasyOCR for text reading and rapidfuzz for fuzzy label matching. No cloud APIs. No data sent anywhere. Everything runs locally on the Space hardware.</div><h3  class="t-redactor__h3">Who needs this</h3><div class="t-redactor__text">UI automation agents that rely on native accessibility APIs — pywinauto, UIAutomation — regularly fail on custom-rendered controls, Electron apps, and heavily themed enterprise software. When the accessibility tree returns nothing, you need a vision fallback. This demo lets you test whether the model works on your specific application before integrating the library into your pipeline.</div><h3  class="t-redactor__h3">How to use the demo</h3><div class="t-redactor__text">Upload any Windows screenshot — a dialog box, a settings panel, a full desktop window. Adjust the confidence threshold to control how many detections appear. Use the IoU slider to tune overlap suppression. Filter by class if you only care about buttons or text fields. Hit Detect.</div><div class="t-redactor__text">The overlay shows bounding boxes with class labels and confidence scores. 
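</div><div class="t-redactor__text">For example, a single detection might come back shaped like this (field names and values are illustrative, not the demo's exact schema):</div><pre class="t-redactor__highlightcode"><code data-lang="python"># One detection record (illustrative values, not the demo's exact schema)
detection = {
    "class": "button",             # one of the seven supported classes
    "bbox": [412, 338, 512, 372],  # x1, y1, x2, y2 in pixels
    "score": 0.93,                 # model confidence
}</code></pre><div class="t-redactor__text">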
The JSON output gives you the raw data: class name, bounding box coordinates, score — ready to copy into your integration.</div><div class="t-redactor__text"><strong>Controls:</strong></div><div class="t-redactor__text"><ul><li data-list="bullet"><strong>Confidence threshold</strong> — lower it to catch more elements, raise it to keep only high-certainty detections</li><li data-list="bullet"><strong>IoU threshold (NMS)</strong> — controls how aggressively overlapping boxes are merged</li><li data-list="bullet"><strong>Filter classes</strong> — select specific element types or leave empty to detect all seven</li></ul></div><h3  class="t-redactor__h3">Model performance</h3><div class="t-redactor__text">Trained on NVIDIA RTX 5060 (Blackwell, 8 GB) for 120 epochs on 3,000 synthetic Windows screenshots generated via Playwright — no manual annotation required.</div><div class="t-redactor__text"><strong>Overall metrics:</strong></div><div class="t-redactor__embedcode"><table style="width:100%;border-collapse:collapse;font-family:monospace;font-size:15px;">
  <thead>
    <tr style="border-bottom:2px solid #000;">
      <th style="text-align:left;padding:8px 12px;">Metric</th>
      <th style="text-align:right;padding:8px 12px;">Value</th>
    </tr>
  </thead>
  <tbody>
    <tr><td style="padding:8px 12px;">mAP@50</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.989</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">mAP@50–95</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.954</td></tr>
    <tr><td style="padding:8px 12px;">Precision</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.996</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">Recall</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.973</td></tr>
    <tr><td style="padding:8px 12px;">CPU inference (Apple M2 Pro)</td><td style="text-align:right;padding:8px 12px;font-weight:500;">44–79 ms</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">GPU inference (RTX 5060)</td><td style="text-align:right;padding:8px 12px;font-weight:500;">2–5 ms</td></tr>
  </tbody>
</table></div><div class="t-redactor__text"><strong>Per-class AP@50:</strong></div><div class="t-redactor__embedcode"><table style="width:100%;border-collapse:collapse;font-family:monospace;font-size:15px;">
  <thead>
    <tr style="border-bottom:2px solid #000;">
      <th style="text-align:left;padding:8px 12px;">Component</th>
      <th style="text-align:right;padding:8px 12px;">Score</th>
    </tr>
  </thead>
  <tbody>
    <tr><td style="padding:8px 12px;">Button</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9919</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">Textbox</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9771</td></tr>
    <tr><td style="padding:8px 12px;">Checkbox</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9864</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">Dropdown</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9829</td></tr>
    <tr><td style="padding:8px 12px;">Icon</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9950</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">Tab</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9950</td></tr>
    <tr><td style="padding:8px 12px;">Menu item</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9915</td></tr>
  </tbody>
</table></div><h2  class="t-redactor__h2">Use it in your project</h2><div class="t-redactor__text">The library installs with a single command. Model weights download automatically from HuggingFace on first run.</div><pre class="t-redactor__highlightcode"><code data-lang="shell">pip install -e .</code></pre><pre class="t-redactor__highlightcode"><code data-lang="python">from local_ui_locator import detect_elements, find_by_text, safe_click_point

# Detect all UI elements
detections = detect_elements(&quot;screenshot.png&quot;, conf=0.3)
for det in detections:
    print(f&quot;{det.type}: {det.bbox} (score={det.score:.2f})&quot;)

# Find element by visible label
match = find_by_text(&quot;screenshot.png&quot;, query=&quot;Sign in&quot;)
if match:
    x, y = safe_click_point(match.bbox)
    print(f&quot;Click at ({x}, {y})&quot;)</code></pre><div class="t-redactor__text">Full source code, training pipeline, and synthetic dataset generator on GitHub → https://github.com/Indext-Data-Lab/windows-ui-synth</div><h2  class="t-redactor__h2">Known limitations</h2><div class="t-redactor__text">The model performs best on standard Windows 10 and 11 UI. Heavily custom-styled applications — games, custom-skinned enterprise tools, non-standard widget libraries — may show lower accuracy due to the synthetic training data. The detector returns bounding boxes and class labels only; text content within elements requires the OCR layer. Seven element classes are supported in this release.</div><h2  class="t-redactor__h2">Stack</h2><div class="t-redactor__text">YOLO11s (Ultralytics) · EasyOCR · rapidfuzz · Playwright · MIT License</div><div class="t-redactor__text"><strong>HuggingFace</strong> <strong>→ </strong>https://huggingface.co/spaces/IndextDataLab/windows-ui-locator</div><div class="t-redactor__text"><strong>GitHub →</strong> https://github.com/Indext-Data-Lab/windows-ui-synth</div><div class="t-redactor__text"><strong>Need a fully integrated AI solution for your business?</strong> Reach out through the website or connect on <strong>LinkedIn</strong></div>]]></turbo:content>
    </item>
    <item turbo="true">
      <title>Indext Stealth Launcher - Windows AI agent</title>
      <link>https://indext.io/tpost/g894auiz51-indext-stealth-launcher-windows-ai-agent</link>
      <amplink>https://indext.io/tpost/g894auiz51-indext-stealth-launcher-windows-ai-agent?amp=true</amplink>
      <pubDate>Wed, 08 Apr 2026 12:23:00 +0300</pubDate>
      <enclosure url="https://static.tildacdn.com/tild3637-3635-4733-b634-376136633630/guide_5_1.jpg" type="image/jpeg"/>
      <description>A Windows app that turns Microsoft Edge into a programmable browser — and connects it to n8n via a local HTTP agent. </description>
      <turbo:content><![CDATA[<header><h1>Indext Stealth Launcher - Windows AI agent</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild3637-3635-4733-b634-376136633630/guide_5_1.jpg"/></figure><h2  class="t-redactor__h2">Computer-Vision Fallback for Windows UI Automation — Local UI Locator (Open Source)</h2><div class="t-redactor__text"><strong>Tag:</strong> Computer Vision · Windows Automation · Open Source · Python</div><h3  class="t-redactor__h3">The problem: when native UI APIs go silent</h3><div class="t-redactor__text">Windows UI automation is built on accessibility APIs. Libraries like pywinauto and UIAutomation query the element tree of an application — finding a button by name, reading its state, clicking it programmatically. In theory, this works universally. In practice, it breaks constantly.</div><div class="t-redactor__text">Custom-rendered controls in Electron apps, legacy Win32 with owner-draw, dynamically injected popups, and aggressively themed enterprise software often expose no accessibility tree at all. The API returns nothing — just a flat window of pixels. Your automation agent is blind.</div><div class="t-redactor__text">The classic fallback is template matching: take a screenshot, find a known image, click its center. But template matching breaks the moment DPI, theme, or window scale changes. What you actually need is a model that understands <em>types</em> of UI elements — buttons, text fields, dropdowns — so it can locate them even when they look slightly different from training data. That model needs to run locally, add under 100 ms to each step, and install with a single pip install. Local UI Locator was built for exactly this gap.</div><h3  class="t-redactor__h3">The solution: YOLO11s + OCR + fuzzy matching</h3><div class="t-redactor__text">Local UI Locator is a Python library that provides a computer-vision fallback layer for Windows UI agents. 
It activates when the accessibility tree returns nothing, takes a screenshot, and returns actionable click coordinates.</div><div class="t-redactor__text"><strong>The pipeline has four stages:</strong></div><div class="t-redactor__text"><strong>Element detection.</strong> A YOLO11s model runs on the screenshot and returns bounding boxes with element type and confidence score. It detects seven classes: button, textbox, checkbox, dropdown, icon, tab, menu_item.</div><div class="t-redactor__text"><strong>Text reading.</strong> EasyOCR reads visible text within each detected bounding box. No system dependencies — pure pip-installable, supports 80+ languages.</div><div class="t-redactor__text"><strong>Fuzzy matching.</strong> rapidfuzz.fuzz.token_set_ratio matches OCR output against the agent's query string. Handles word reordering, partial labels, and minor OCR substitutions robustly. An agent looking for "Sign in" will match a button labeled "Signin" or "Sign In".</div><div class="t-redactor__text"><strong>Action verification.</strong> Before/after screenshot comparison via pixel diff, OCR delta, or combined mode — confirms the click actually had an effect.</div><div class="t-redactor__text"><strong>Stack:</strong> YOLO11s (Ultralytics) · EasyOCR · rapidfuzz · Playwright (data generation) · FastAPI · Next.js · Gradio (demo)</div><div class="t-redactor__text">Installation is one command — model weights download automatically from HuggingFace on first run:</div><pre class="t-redactor__highlightcode"><code data-lang="shell">pip install -e .</code></pre><div class="t-redactor__text">A quick integration into any automation agent looks like this:</div><pre class="t-redactor__highlightcode"><code data-lang="python">from local_ui_locator import detect_elements, find_by_text, safe_click_point

# Detect all UI elements in a screenshot
detections = detect_elements("screenshot.png", conf=0.3)
for det in detections:
    print(f"{det.type}: {det.bbox} (score={det.score:.2f})")

# Find a specific element by visible label
match = find_by_text("screenshot.png", query="Sign in")
if match:
    x, y = safe_click_point(match.bbox)
    print(f"Click at ({x}, {y})")</code></pre><div class="t-redactor__text">The library also ships a complete training pipeline. You can regenerate the synthetic dataset, retrain the model on your own UI styles, and evaluate — useful when your application uses a custom theme outside standard Windows 10/11 aesthetics.</div><h3  class="t-redactor__h3">Results: near-perfect detection across all seven classes</h3><div class="t-redactor__text">The model was trained on an NVIDIA RTX 5060 (8 GB, Blackwell) for 120 epochs with early stopping. The synthetic dataset of 3,000 Windows-style screenshots was generated entirely via HTML/CSS templates rendered with Playwright — no manual annotation.</div><div class="t-redactor__embedcode"><table style="width:100%;border-collapse:collapse;font-family:monospace;font-size:15px;">
  <thead>
    <tr style="border-bottom:2px solid #000;">
      <th style="text-align:left;padding:8px 12px;">Metric</th>
      <th style="text-align:right;padding:8px 12px;">Value</th>
    </tr>
  </thead>
  <tbody>
    <tr><td style="padding:8px 12px;">mAP@50</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.989</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">mAP@50–95</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.954</td></tr>
    <tr><td style="padding:8px 12px;">Precision</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.996</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">Recall</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.973</td></tr>
    <tr><td style="padding:8px 12px;">CPU inference (M2 Pro)</td><td style="text-align:right;padding:8px 12px;font-weight:500;">44–79 ms</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">GPU inference (RTX 5060)</td><td style="text-align:right;padding:8px 12px;font-weight:500;">2–5 ms</td></tr>
  </tbody>
</table></div><div class="t-redactor__embedcode"><table style="width:100%;border-collapse:collapse;font-family:monospace;font-size:15px;margin-top:2rem;">
  <thead>
    <tr style="border-bottom:2px solid #000;">
      <th style="text-align:left;padding:8px 12px;">Class</th>
      <th style="text-align:right;padding:8px 12px;">AP@50</th>
    </tr>
  </thead>
  <tbody>
    <tr><td style="padding:8px 12px;">button</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9919</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">textbox</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9771</td></tr>
    <tr><td style="padding:8px 12px;">checkbox</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9864</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">dropdown</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9829</td></tr>
    <tr><td style="padding:8px 12px;">icon</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9950</td></tr>
    <tr style="background:#f5f5f5;"><td style="padding:8px 12px;">tab</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9950</td></tr>
    <tr><td style="padding:8px 12px;">menu_item</td><td style="text-align:right;padding:8px 12px;font-weight:500;">0.9915</td></tr>
  </tbody>
</table></div><div class="t-redactor__text">This represents a meaningful improvement over the prior YOLOv8n baseline (mAP@50 of 0.93) — a 6-point absolute gain — while keeping CPU inference under 80 ms. For a fallback layer that fires only when the accessibility tree is empty, that latency is acceptable.</div><div class="t-redactor__text">The library ships with a Gradio demo that lets you upload any screenshot, adjust confidence threshold, filter by element class, and search elements by text — useful for validating behavior on your specific application before wiring it into an agent.</div><h3  class="t-redactor__h3">Why these specific components</h3><div class="t-redactor__text"><strong>YOLO11s over YOLOv8n.</strong> The accuracy gain from upgrading the backbone was significant: mAP@50 went from 0.93 to 0.989. Inference time roughly doubled on CPU (~30 ms to ~60 ms), but for a fallback layer that activates only on API failure, 60 ms is a reasonable trade-off.</div><div class="t-redactor__text"><strong>EasyOCR over Tesseract.</strong> EasyOCR installs via pip with zero system dependencies. Tesseract requires system package installation and can be fragile in CI/CD environments. EasyOCR also returns word-level bounding boxes that intersect cleanly with the detector output.</div><div class="t-redactor__text"><strong>Synthetic data via Playwright.</strong> Rendering HTML/CSS templates with Playwright gives exact bounding box coordinates from DOM queries — no manual annotation needed. Domain randomization across themes, fonts, DPI scaling, and noise was sufficient to achieve production-grade accuracy on real Windows UI despite training on entirely synthetic images.</div><div class="t-redactor__text"><strong>Fuzzy matching via rapidfuzz.</strong> token_set_ratio handles partial label matches, word reordering, and minor OCR substitutions. 
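</div><div class="t-redactor__text">A minimal sketch of that matching step (illustrative only; the best_text_match helper below is not part of the library's API):</div><pre class="t-redactor__highlightcode"><code data-lang="python">from rapidfuzz import fuzz, utils

def best_text_match(query, ocr_texts, threshold=70.0):
    """Return (index, score) of the OCR string that best matches the query."""
    scored = [
        (i, fuzz.token_set_ratio(query, text, processor=utils.default_process))
        for i, text in enumerate(ocr_texts)
    ]
    i, score = max(scored, key=lambda t: t[1])
    return (i, score) if score >= threshold else (None, score)

# Query tokens that are a subset of the candidate's tokens score 100,
# so partial labels and word reordering both match.
best_text_match("Sign in", ["Cancel", "Please sign in to continue"])</code></pre><div class="t-redactor__text">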
Standard string equality would fail on the kind of OCR noise you see in real screenshots.</div><h3  class="t-redactor__h3">Known limitations</h3><div class="t-redactor__text">The model was trained on synthetic data only — real-world applications with heavily custom-styled controls may show a domain gap. It performs best on standard Windows 10 and 11 UI. The current release supports seven element classes; complex widgets like date pickers, tree views, and data grids are not detected. Text content within elements is not provided by the detector — that requires the separate OCR layer. For non-standard applications, the included training pipeline makes it straightforward to generate additional data and fine-tune.</div><h3  class="t-redactor__h3">Source code and documentation →</h3><div class="t-redactor__embedcode"><a href="https://github.com/Indext-Data-Lab/windows-ui-synth" target="_blank">
  ➡️ GitHub: Indext-Data-Lab/windows-ui-synth
</a></div><div class="t-redactor__text"><em>MIT License · Model weights on HuggingFace · Gradio demo included</em></div>]]></turbo:content>
    </item>
  </channel>
</rss>
