
At The Blue Box, we developed a platform for Broken Shepherds that helps users engage with nonprofits through volunteer and giving programs. As part of that work, we streamlined how nonprofits are onboarded.
One of the biggest challenges was the onboarding of nonprofit organizations. In the U.S., nonprofits typically provide an IRS Form 990, a lengthy document containing tax and operational details. Manually reviewing these forms was slow and error-prone, and it created friction for nonprofits eager to join the platform.
We implemented AI-driven automation powered by Google Document AI, which automatically extracts the most relevant data points from Form 990 and pushes them into the Broken Shepherds platform. Thanks to pre-trained models, we needed only 15 labeled examples to tune the workflow, and we achieved over 99% accuracy in data extraction.
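An accuracy figure like this depends on how it is measured, and this post does not spell out the method. One common approach, shown here purely as an assumption, is field-level exact match: compare each extracted field against a hand-labeled value and report the share that agree. The function name `field_accuracy` and the field names in the usage note are hypothetical.

```python
def field_accuracy(predicted, expected):
    """Share of hand-labeled fields whose extracted value matches.

    Both arguments are lists of dicts (one dict per document) keyed by
    field name. Values are compared after collapsing whitespace and
    ignoring case. A field missing from the prediction counts as wrong.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must cover the same documents")

    def norm(value):
        return " ".join(str(value).split()).casefold()

    total = 0
    correct = 0
    for pred, gold in zip(predicted, expected):
        for field, gold_value in gold.items():
            total += 1
            if field in pred and norm(pred[field]) == norm(gold_value):
                correct += 1
    return correct / total if total else 0.0
```

For example, with two labeled documents of two fields each, where one field is missing from the extraction, the function returns 0.75. A stricter or looser comparison (for instance, numeric tolerance on financial fields) would change the number, so the matching rule should be stated alongside any reported accuracy.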
Google Document AI is a cloud-based platform that combines OCR (Optical Character Recognition) with AI and natural language processing to transform unstructured documents into structured, machine-readable data.
Unlike traditional OCR that only “reads” characters, Document AI understands context and meaning, making it highly effective for structured forms like invoices, contracts, and tax filings.
In our case, it meant we could go from a PDF Form 990 to a clean JSON of key fields (EIN, organization name, mission, financials) in seconds.
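As a rough illustration of that PDF-to-JSON step, here is a minimal sketch using the `google-cloud-documentai` Python client. It is not the code that runs in production: the function names, the 0.5 confidence cutoff, and the rule of keeping the most confident mention per field are assumptions made for the example, and the field names depend on the schema defined for your own processor.

```python
import json
from types import SimpleNamespace


def entities_to_record(entities, min_confidence=0.0):
    """Collapse Document AI entities into a {field: value} dict.

    Each entity must expose `type_`, `mention_text`, and `confidence`,
    as Document AI entities do. Entities below `min_confidence` are
    skipped; if a field repeats, the most confident mention wins.
    """
    best = {}
    for entity in entities:
        if entity.confidence < min_confidence:
            continue
        current = best.get(entity.type_)
        if current is None or entity.confidence > current.confidence:
            best[entity.type_] = entity
    return {field: e.mention_text.strip() for field, e in best.items()}


def extract_form_990(pdf_path, project_id, location, processor_id):
    """Send one PDF to a Document AI processor and return its key fields."""
    # Imported lazily so entities_to_record works without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )
    with open(pdf_path, "rb") as fh:
        raw = documentai.RawDocument(
            content=fh.read(), mime_type="application/pdf"
        )
    request = documentai.ProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        raw_document=raw,
    )
    document = client.process_document(request=request).document
    # The 0.5 cutoff is an illustrative assumption, not a recommended value.
    return entities_to_record(document.entities, min_confidence=0.5)


if __name__ == "__main__":
    # Stand-in entities with hypothetical field names; no cloud call is made.
    sample = [
        SimpleNamespace(type_="ein", mention_text="12-3456789", confidence=0.98),
        SimpleNamespace(type_="organization_name", mention_text="Example Org", confidence=0.95),
    ]
    print(json.dumps(entities_to_record(sample, min_confidence=0.5), indent=2))
```

Online (synchronous) processing has page limits, so long filings with many schedules may need batch processing instead; check the current Document AI limits for your processor type.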
In mid-2024, Google released the Document AI Custom Extractor powered by Generative AI, now generally available (announcement).
This feature allows developers to train custom extraction models with far fewer examples while covering a wider range of document types. Instead of relying solely on pre-trained templates, the system can adapt quickly to new formats by leveraging generative AI’s pattern recognition and contextual understanding.
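For use cases like IRS Form 990, this means fewer labeled examples to prepare and less rework when a filing's layout differs from the training samples.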
This aligns perfectly with our experience of needing just 15 samples for training. Going forward, the barrier to deploying document automation will be even lower.

| Feature | Google Document AI | Amazon Textract | Azure AI Document Intelligence (formerly Form Recognizer) |
|---|---|---|---|
| Accuracy (forms) | ⭐⭐⭐⭐½ (99%+) | ⭐⭐⭐⭐ (95%+) | ⭐⭐⭐⭐ (95%+) |
| Pre-trained models | Yes (forms, invoices, docs) | Limited | Yes (invoices, receipts, ...) |
| Custom extractor | GenAI-powered, few-shot | Basic templates | Yes, but less flexible |
| Training effort | Very low (few samples) | Medium | Medium |
| Integration | Strong with GCP | Strong with AWS | Strong with Azure |
| Cost | $$ per page | $$ per page | $$ per page |
| Best fit | Complex docs | OCR at AWS scale | Enterprise in MS ecosystem |

By combining Broken Shepherds’s mission-driven platform with Google Document AI, we transformed a slow and manual onboarding workflow into a fast, automated, and highly accurate process.
With the addition of Generative AI-powered custom extraction, the possibilities for scaling document automation are even broader — helping businesses and nonprofits save time, reduce friction, and focus on what really matters: creating impact.
If you want to optimize your business processes and take advantage of these technologies, we are here to help. Complete the form and get free advice on how to implement this technology in your company.