TBB Blog

Automating Onboarding with GAI Extractors.

At The Blue Box, we developed a platform for Broken Shepherds that helps users engage with nonprofits through volunteer and giving programs, streamlining their onboarding process.

One of the biggest challenges was the onboarding of nonprofit organizations. In the U.S., nonprofits typically provide an IRS Form 990, a lengthy document containing tax and operational details. Manually reviewing these forms was slow, error-prone, and created friction for nonprofits eager to join the platform.

We implemented AI-driven automation powered by Google Document AI, which extracts the most relevant data points from Form 990 and pushes them into the Broken Shepherds platform. Thanks to pre-trained models, we needed only 15 labeled examples to fine-tune the workflow, and we achieved over 99% accuracy in data extraction.
 

🧠 What is Google Document AI?

Google Document AI is a cloud-based platform that combines OCR (Optical Character Recognition) with AI and natural language processing to transform unstructured documents into structured, machine-readable data.

Unlike traditional OCR that only “reads” characters, Document AI understands context and meaning, making it highly effective for structured forms like invoices, contracts, and tax filings.

In our case, it meant we could go from a PDF Form 990 to a clean JSON of key fields (EIN, organization name, mission, financials) in seconds.
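To make the "PDF to clean JSON" step more concrete, here is a minimal sketch of how the extracted entities could be flattened into a single record. It assumes the JSON form of a Document AI result, where each entity carries `type`, `mentionText`, `confidence`, and optionally `normalizedValue`. The field labels (`ein`, `organization_name`, `total_revenue`) and the sample values are hypothetical; the real labels depend on the schema defined for the extractor.

```python
import json


def entities_to_record(document: dict, min_confidence: float = 0.0) -> dict:
    """Flatten the `entities` list of a Document AI JSON result into {field: text}."""
    record = {}
    for entity in document.get("entities", []):
        field = entity.get("type")
        if not field or entity.get("confidence", 0.0) < min_confidence:
            continue  # skip unlabeled or low-confidence entities
        # Prefer the normalized text (e.g. a cleaned-up amount) when it is present.
        normalized = entity.get("normalizedValue", {}).get("text")
        value = normalized or entity.get("mentionText", "")
        # If a field appears more than once, the first occurrence wins.
        record.setdefault(field, value.strip())
    return record


# Hypothetical sample shaped like a Document AI result (not real Form 990 data).
sample_document = {
    "entities": [
        {"type": "ein", "mentionText": "12-3456789", "confidence": 0.99},
        {"type": "organization_name", "mentionText": "Example Relief Fund", "confidence": 0.98},
        {
            "type": "total_revenue",
            "mentionText": "1,250,000",
            "confidence": 0.97,
            "normalizedValue": {"text": "1250000"},
        },
    ]
}

record = entities_to_record(sample_document)
print(json.dumps(record, indent=2))
```

Raising `min_confidence` is one simple way to keep uncertain values out of the record so they can be checked by a person instead.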
 

✨ New Power: Generative AI Custom Extractor

In mid-2024, Google released the Document AI Custom Extractor powered by Generative AI, now generally available (announcement).

This feature allows developers to train custom extraction models with far fewer examples while covering a wider range of document types. Instead of relying solely on pre-trained templates, the system can adapt quickly to new formats by leveraging generative AI’s pattern recognition and contextual understanding.

For use cases like IRS Form 990, this means:

  • Even fewer samples are required to reach production-level accuracy.
  • The system can handle slight layout variations across nonprofit filings.
  • Continuous improvements as Google’s generative models evolve.

This aligns perfectly with our experience of needing just 15 samples for training. Going forward, the barrier to deploying document automation will be even lower.
 

✅ Benefits of Using Document AI

  • Pre-trained models: Ready-to-use parsers for tax docs, invoices, IDs, contracts.
  • Low training effort: Fine-tune with as few as 10–20 examples (or fewer with the custom extractor).
  • High accuracy: 95–99%+ for structured forms.
  • Scalable & serverless: Process thousands of docs in parallel.
  • Seamless GCP integration: Works with BigQuery, Vertex AI, and Cloud Storage.
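An accuracy figure like the ones above is easier to interpret with a definition attached. The post does not say how accuracy was measured, so the sketch below only illustrates one common approach: exact-match accuracy per field, comparing extracted records against hand-labeled ones. The field names and sample records are hypothetical.

```python
def normalize(value) -> str:
    """Compare values case-insensitively and ignore extra whitespace."""
    return " ".join(str(value or "").split()).casefold()


def field_accuracy(predictions: list, ground_truth: list, fields: list) -> float:
    """Share of (document, field) pairs where the extracted value matches the label."""
    total = 0
    correct = 0
    for predicted, expected in zip(predictions, ground_truth):
        for field in fields:
            total += 1
            if normalize(predicted.get(field)) == normalize(expected.get(field)):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical extracted records and their hand-labeled counterparts.
predictions = [
    {"ein": "12-3456789", "organization_name": "Example Relief Fund"},
    {"ein": "98-7654321", "organization_name": "Sample Food Bank Inc"},
]
ground_truth = [
    {"ein": "12-3456789", "organization_name": "example relief fund"},
    {"ein": "98-7654321", "organization_name": "Sample Food Bank"},
]

accuracy = field_accuracy(predictions, ground_truth, ["ein", "organization_name"])
print(f"{accuracy:.0%}")  # prints 75%
```

Exact match is a strict measure: a name that differs by a single suffix counts as a miss, so it tends to understate how usable the output is.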


 

⚠️ Considerations

  • Pricing: Pay-per-page can grow costly at scale.
  • Customization: Pre-trained models shine on standard docs; custom extractor now closes this gap.
  • Cloud-only: Data residency/compliance may be a concern.
  • Learning curve: Requires some GCP familiarity.
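Because pricing is per page and Form 990 filings are long, a quick estimate helps before committing to volume. The sketch below is plain arithmetic; the document count, page count, and price per 1,000 pages are hypothetical placeholders, not Google's actual rates, so check the current pricing page for real numbers.

```python
def estimate_monthly_cost(
    documents_per_month: int,
    pages_per_document: int,
    price_per_1000_pages: float,
) -> float:
    """Estimate monthly spend for a pay-per-page document processor."""
    total_pages = documents_per_month * pages_per_document
    return total_pages * price_per_1000_pages / 1000


# Hypothetical inputs: 500 filings a month, 40 pages each, $30 per 1,000 pages.
monthly_cost = estimate_monthly_cost(
    documents_per_month=500,
    pages_per_document=40,
    price_per_1000_pages=30.0,
)
print(f"${monthly_cost:,.2f}")  # prints $600.00
```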
     

🔄 How Does it Compare?

| Feature | Google Document AI | AWS Textract | Azure Form Recognizer |
|---|---|---|---|
| Accuracy (forms) | ⭐⭐⭐⭐½ (99%+) | ⭐⭐⭐⭐ (95%+) | ⭐⭐⭐⭐ (95%+) |
| Pre-trained | Yes (forms, invoices, docs) | Limited | Yes (invoices, receipts, ... ) |
| Custom extractor | GenAI-powered, few-shot | Basic templates | Yes, but less flexible |
| Training effort | Very low (few samples) | Medium | Medium |
| Integration | Strong with GCP | Strong with AWS | Strong with Azure |
| Cost | $$ per page | $$ per page | $$ per page |
| Best fit | Complex docs | OCR at AWS scale | Enterprise in MS ecosystem |


 

🎯 Impact

  • For nonprofits: Faster, smoother onboarding.
  • For the Broken Shepherds platform: Reduced manual work, fewer errors, and higher efficiency.


 

🌍 Why it matters

By combining Broken Shepherds’s mission-driven platform with Google Document AI, we transformed a slow and manual onboarding workflow into a fast, automated, and highly accurate process.

With the addition of Generative AI-powered custom extraction, the possibilities for scaling document automation are even broader — helping businesses and nonprofits save time, reduce friction, and focus on what really matters: creating impact.


 

Ready to Take the Next Step?

If you want to optimize your business processes and take advantage of these technologies, we are here to help. Complete the form to get free advice on how to implement this technology in your company.

Written by The Blue Box

Small team. Smart systems. Real impact.
