---
doc_id: playbooks/buyer/article-065-ai-ingestable-deal-files-standardized-data-fields-prompt-structures-and-retrieva
url: /docs/playbooks/buyer/article-065-ai-ingestable-deal-files-standardized-data-fields-prompt-structures-and-retrieva
title: AI-Ingestable Deal Files — Standardized Data Fields, Prompt Structures, and Retrieval-Ready Underwriting Inputs
description: unknown
jurisdiction: unknown
audience: unknown
topic_cluster: unknown
last_updated: unknown
---

# AI-Ingestable Deal Files — Standardized Data Fields, Prompt Structures, and Retrieval-Ready Underwriting Inputs (/docs/playbooks/buyer/article-065-ai-ingestable-deal-files-standardized-data-fields-prompt-structures-and-retrieva)



Overview [#overview]

The AI-ingestable deal data block introduced in Article 41 established the principle that structured acquisition data produces materially better AI-assisted analysis than unstructured narrative. This companion article provides the full operational schema for a buyer-side deal file — the complete set of data fields, naming conventions, document organization standards, and prompt structures that enable reliable LLM retrieval, analysis, and workflow automation across the full buyer transaction lifecycle.

This article is operational, not conceptual. It is a working specification for building and maintaining AI-ready deal files from pre-offer through post-closing.

***

How the NYC Market Actually Works [#how-the-nyc-market-actually-works]

**Most buyer transaction data is fragmented across multiple sources with inconsistent naming.** Bank statements, brokerage statements, REBNY Financial Statements, board minutes, offering plans, DOB records, title searches, closing statements, and insurance binders are generated by different parties, named inconsistently, and stored across email threads, attorney portals, and personal drives. When a buyer or advisor attempts to assemble this data for analysis — or to provide it to an AI system — the disorganization introduces errors, gaps, and retrieval failures.

**LLM performance on real estate analysis depends heavily on input structure.** A model analyzing a deal whose figures are labeled, categorized, and consistently named tends to produce analysis that is more accurate, more complete, and more auditable than the same model working from an unstructured narrative. The investment in data structure is an investment in analysis quality.

**A standardized deal file schema enables cross-deal comparison.** Buyers considering multiple properties can only compare them reliably if the data from each property is collected and organized using the same schema. A standardized deal file format makes comparison, ranking, and scenario modeling replicable across the acquisition search.

***

Strategic Approach for Buyers [#strategic-approach-for-buyers]

Master Deal File Schema — Field Definitions and Naming Conventions [#master-deal-file-schema--field-definitions-and-naming-conventions]

The following schema defines the complete set of fields for an NYC residential buyer deal file. Every field should be named exactly as specified for consistent retrieval.

MODULE 1 — PROPERTY IDENTIFICATION [#module-1--property-identification]

```
PROPERTY_ADDRESS: [Street number, street name, unit number, borough, zip code]
PROPERTY_TYPE: [Co-op | Condo | Townhouse | Multi-family | Other]
BUILDING_YEAR_BUILT: [YYYY]
BUILDING_UNIT_COUNT: [Integer]
BUILDING_STORIES: [Integer]
LISTING_PRICE: [USD amount]
CONTRACT_PRICE: [USD amount — enter after contract execution]
DAYS_ON_MARKET_AT_OFFER: [Integer]
ASSET_OWNERSHIP_STRUCTURE: [Fee simple | Co-op shares | Condo unit interest]
```
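A block in this `FIELD_NAME: value` layout can be loaded into a dictionary before any analysis runs. The following is a minimal sketch rather than part of the specification; the function name `parse_deal_block` and the sample values are illustrative assumptions.

```python
def parse_deal_block(text: str) -> dict[str, str]:
    """Read 'FIELD_NAME: value' lines into a dict, skipping lines without a colon."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        name = name.strip()
        if name:
            fields[name] = value.strip()
    return fields


# Illustrative values only.
sample = """\
PROPERTY_ADDRESS: 100 Main St, Apt 4B, Manhattan
PROPERTY_TYPE: Co-op
BUILDING_UNIT_COUNT: 120
LISTING_PRICE: 1250000
"""

deal = parse_deal_block(sample)
print(deal["PROPERTY_TYPE"])
```

Values stay as strings here; converting them to numbers is left to the step that uses each field, since several fields allow text such as "Unknown".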

MODULE 2 — UNIT CHARACTERISTICS [#module-2--unit-characteristics]

```
UNIT_SIZE_SQFT: [Integer or "Not disclosed — co-op"]
BEDROOMS: [Integer]
BATHROOMS: [Decimal — e.g., 2.5]
UNIT_FLOOR: [Integer]
BUILDING_TOTAL_FLOORS: [Integer]
UNIT_EXPOSURE: [N | S | E | W | Corner | Rear | Court]
OUTDOOR_SPACE_TYPE: [None | Terrace | Balcony | Garden | Roof deck]
OUTDOOR_SPACE_SQFT: [Integer or 0]
UNIT_CONDITION: [Original | Partial renovation | Fully renovated]
RENOVATION_BUDGET_ESTIMATE: [USD amount or 0]
```

MODULE 3 — MONTHLY CARRYING COSTS [#module-3--monthly-carrying-costs]

```
MONTHLY_MAINTENANCE: [USD — co-op maintenance or condo common charge]
MONTHLY_PROPERTY_TAX: [USD — condo/townhouse only; 0 for co-op]
MONTHLY_MORTGAGE_PAYMENT: [USD — at proposed financing terms]
MONTHLY_ABATEMENT_TAX: [USD — if 421-a or ICAP in effect; else same as MONTHLY_PROPERTY_TAX]
POST_ABATEMENT_MONTHLY_TAX: [USD — if abatement exists; else same as MONTHLY_PROPERTY_TAX]
ABATEMENT_EXPIRATION_DATE: [YYYY-MM or "None"]
TOTAL_MONTHLY_CARRYING_COST: [Sum of mortgage + maintenance + tax]
```
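The total in Module 3 is a plain sum, but computing it twice shows the step-up a buyer faces when an abatement expires. This is a hedged sketch with made-up figures; the function name is an assumption, and the two tax inputs stand in for MONTHLY_ABATEMENT_TAX and POST_ABATEMENT_MONTHLY_TAX.

```python
def total_monthly_carrying_cost(mortgage: float, maintenance: float, tax: float) -> float:
    """TOTAL_MONTHLY_CARRYING_COST = mortgage + maintenance + tax."""
    return mortgage + maintenance + tax


# Illustrative condo with an abatement in effect (all figures are made up).
during_abatement = total_monthly_carrying_cost(5200.0, 1400.0, 300.0)
after_abatement = total_monthly_carrying_cost(5200.0, 1400.0, 1500.0)
step_up = after_abatement - during_abatement

print(during_abatement, after_abatement, step_up)
```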

MODULE 4 — FINANCING STRUCTURE [#module-4--financing-structure]

```
DOWN_PAYMENT_AMOUNT: [USD]
DOWN_PAYMENT_PCT: [Decimal — e.g., 0.25]
LOAN_AMOUNT: [USD]
LOAN_TYPE: [30yr fixed | ARM 7/1 | ARM 10/1 | Co-op share loan | Other]
INTEREST_RATE: [Decimal — e.g., 0.0715]
RATE_LOCK_PERIOD_DAYS: [Integer]
RATE_LOCK_EXPIRATION_DATE: [YYYY-MM-DD]
FLOAT_DOWN_OPTION: [Y | N]
CEMA_APPLICABLE: [Y | N | Unknown]
CEMA_SELLER_MORTGAGE_BALANCE: [USD or 0]
BUILDING_MAX_LTV: [Decimal or "N/A — condo"]
LENDER_BUILDING_APPROVED: [Y | N | Not confirmed]
```
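MONTHLY_MORTGAGE_PAYMENT in Module 3 follows from LOAN_AMOUNT and INTEREST_RATE. The sketch below uses the standard level-payment amortization formula and assumes a fully amortizing fixed-rate loan; it does not model ARM resets, and the function names and sample figures are illustrative.

```python
def loan_amount(contract_price: float, down_payment_pct: float) -> float:
    """LOAN_AMOUNT implied by the price and DOWN_PAYMENT_PCT (a decimal such as 0.25)."""
    return contract_price * (1 - down_payment_pct)


def monthly_payment(principal: float, annual_rate: float, years: int = 30) -> float:
    """Level monthly payment: P * r * (1 + r)^n / ((1 + r)^n - 1)."""
    n = years * 12          # number of monthly payments
    r = annual_rate / 12    # monthly interest rate
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


principal = loan_amount(1_000_000, 0.25)
payment = monthly_payment(principal, 0.0715)
print(round(principal), round(payment, 2))
```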

MODULE 5 — CLOSING COSTS [#module-5--closing-costs]

```
MANSION_TAX_AMOUNT: [USD — 0 if under $1M; verify current rates]
MRT_AMOUNT: [USD — standard or CEMA-adjusted]
CEMA_MRT_SAVINGS: [USD — 0 if CEMA not applicable]
ATTORNEY_FEE_ESTIMATE: [USD]
TITLE_INSURANCE_ESTIMATE: [USD — 0 for co-op]
LENDER_FEES_ESTIMATE: [USD]
MANAGING_AGENT_FEE: [USD]
MOVE_IN_DEPOSIT: [USD]
FLIP_TAX_AMOUNT: [USD — buyer or seller; specify]
TRANSFER_TAXES_TOTAL: [USD — buyer-paid; in sponsor transactions, verify allocation]
OTHER_CLOSING_COSTS: [USD]
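TOTAL_CLOSING_COSTS_ESTIMATE: [Sum of the cost fields above; exclude CEMA_MRT_SAVINGS, which is already reflected in MRT_AMOUNT, and any FLIP_TAX_AMOUNT paid by the seller]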
TOTAL_CASH_TO_CLOSE: [Down payment + total closing costs]
```
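The two totals in Module 5 can be derived rather than typed in. This sketch assumes that CEMA_MRT_SAVINGS is informational (MRT_AMOUNT is already CEMA-adjusted) and that every amount entered is the buyer-paid portion; the function name and figures are illustrative.

```python
# Cost fields summed into TOTAL_CLOSING_COSTS_ESTIMATE. CEMA_MRT_SAVINGS is
# left out on the assumption that MRT_AMOUNT is already CEMA-adjusted.
COST_FIELDS = (
    "MANSION_TAX_AMOUNT", "MRT_AMOUNT", "ATTORNEY_FEE_ESTIMATE",
    "TITLE_INSURANCE_ESTIMATE", "LENDER_FEES_ESTIMATE", "MANAGING_AGENT_FEE",
    "MOVE_IN_DEPOSIT", "FLIP_TAX_AMOUNT", "TRANSFER_TAXES_TOTAL",
    "OTHER_CLOSING_COSTS",
)


def cash_to_close(costs: dict[str, float], down_payment: float) -> tuple[float, float]:
    """Return (TOTAL_CLOSING_COSTS_ESTIMATE, TOTAL_CASH_TO_CLOSE).

    A missing cost field raises KeyError, so gaps surface instead of counting as 0.
    """
    total_costs = sum(costs[name] for name in COST_FIELDS)
    return total_costs, down_payment + total_costs


# Illustrative co-op purchase (all figures are made up).
costs = {
    "MANSION_TAX_AMOUNT": 12500.0, "MRT_AMOUNT": 0.0,
    "ATTORNEY_FEE_ESTIMATE": 4000.0, "TITLE_INSURANCE_ESTIMATE": 0.0,
    "LENDER_FEES_ESTIMATE": 2500.0, "MANAGING_AGENT_FEE": 1500.0,
    "MOVE_IN_DEPOSIT": 1000.0, "FLIP_TAX_AMOUNT": 0.0,
    "TRANSFER_TAXES_TOTAL": 0.0, "OTHER_CLOSING_COSTS": 500.0,
}
total_costs, total_cash = cash_to_close(costs, down_payment=312500.0)
print(total_costs, total_cash)
```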

MODULE 6 — BUYER FINANCIAL PROFILE [#module-6--buyer-financial-profile]

```
TOTAL_LIQUID_ASSETS_PRE_PURCHASE: [USD]
POST_CLOSING_LIQUID_ASSETS: [Total liquid − down payment − closing costs]
MONTHLY_CARRYING_COST_FOR_PCL: [Mortgage + maintenance]
PCL_MONTHS: [Post-closing liquid / monthly carrying cost]
BUILDING_PCL_REQUIREMENT_MONTHS: [Building-specific benchmark or "Unknown"]
GROSS_MONTHLY_INCOME: [USD]
MONTHLY_DEBT_OBLIGATIONS_EXCL_HOUSING: [USD]
BOARD_DTI_PCT: [(Mortgage + maintenance + other debt) / gross income]
BUILDING_DTI_CEILING_PCT: [Building-specific or "Unknown"]
BOARD_QUALIFICATION_STATUS: [Meets both | Meets PCL only | Meets DTI only | Fails both | Unknown]
```
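The derived fields in Module 6 can be computed directly from the raw inputs. The following is a sketch under the formulas stated in the block above; the function name `board_qualification`, the use of `None` for "Unknown", and the sample figures are assumptions.

```python
from typing import Optional


def board_qualification(
    post_closing_liquid: float,
    mortgage: float,
    maintenance: float,
    other_debt: float,
    gross_income: float,
    pcl_required_months: Optional[float],  # None when the building's figure is "Unknown"
    dti_ceiling: Optional[float],          # None when the building's figure is "Unknown"
) -> dict:
    carrying = mortgage + maintenance                # MONTHLY_CARRYING_COST_FOR_PCL
    pcl_months = post_closing_liquid / carrying      # PCL_MONTHS
    dti = (carrying + other_debt) / gross_income     # BOARD_DTI_PCT, as a decimal
    if pcl_required_months is None or dti_ceiling is None:
        status = "Unknown"
    else:
        meets_pcl = pcl_months >= pcl_required_months
        meets_dti = dti <= dti_ceiling
        status = {
            (True, True): "Meets both",
            (True, False): "Meets PCL only",
            (False, True): "Meets DTI only",
            (False, False): "Fails both",
        }[(meets_pcl, meets_dti)]
    return {"PCL_MONTHS": pcl_months, "BOARD_DTI_PCT": dti,
            "BOARD_QUALIFICATION_STATUS": status}


# Illustrative buyer: 800,000 liquid, less 250,000 down and 30,000 closing costs.
result = board_qualification(
    post_closing_liquid=800_000 - 250_000 - 30_000,
    mortgage=5000.0, maintenance=1500.0, other_debt=500.0,
    gross_income=30000.0, pcl_required_months=24, dti_ceiling=0.28,
)
print(result)
```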

MODULE 7 — BUILDING FINANCIAL HEALTH [#module-7--building-financial-health]

```
RESERVE_FUND_BALANCE: [USD or "Not available"]
ANNUAL_OPERATING_BUDGET: [USD or "Not available"]
RESERVE_AS_PCT_OF_BUDGET: [Decimal or "Not available"]
MOST_RECENT_ASSESSMENT_AMOUNT: [USD]
MOST_RECENT_ASSESSMENT_DATE: [YYYY-MM]
MOST_RECENT_ASSESSMENT_PURPOSE: [Text]
PENDING_APPROVED_ASSESSMENTS: [USD or 0]
MAINTENANCE_5YR_CAGR: [Decimal or "Unknown"]
UNDERLYING_MORTGAGE_BALANCE: [USD — co-op only; 0 for condo]
SPONSOR_UNIT_COUNT: [Integer or "Unknown"]
SPONSOR_UNIT_PCT: [Decimal or "Unknown"]
```
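Two Module 7 fields are ratios that are easy to get wrong by hand. This sketch assumes MAINTENANCE_5YR_CAGR is the compound annual growth rate between the maintenance charge five years ago and today; function names and figures are illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start)^(1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


def reserve_as_pct_of_budget(reserve_fund_balance: float, annual_operating_budget: float) -> float:
    """RESERVE_AS_PCT_OF_BUDGET as a decimal."""
    return reserve_fund_balance / annual_operating_budget


# Illustrative: maintenance rose from 1,000.00 to 1,276.28 per month over five years.
maintenance_growth = cagr(1000.0, 1276.2815625, 5)
reserve_ratio = reserve_as_pct_of_budget(600_000.0, 2_400_000.0)
print(round(maintenance_growth, 4), reserve_ratio)
```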

MODULE 8 — REGULATORY AND CAPITAL OBLIGATIONS [#module-8--regulatory-and-capital-obligations]

```
FISP_STATUS: [Safe | SWARMP | Unsafe | Unknown]
FISP_LAST_FILING_DATE: [YYYY-MM or "Unknown"]
FISP_ESTIMATED_REPAIR_COST: [USD or 0]
LL97_STATUS: [Compliant | Penalty-paying | Capital plan in progress | Unknown]
LL97_ESTIMATED_COMPLIANCE_COST: [USD or 0]
DOB_OPEN_VIOLATIONS_COUNT: [Integer]
DOB_OPEN_VIOLATIONS_CLASSES: [List: Class 1, Class 2, Class 3 counts]
ELEVATOR_LAST_MODERNIZED: [YYYY or "Unknown"]
BOILER_LAST_REPLACED: [YYYY or "Unknown"]
PLUMBING_RISERS_LAST_REPLACED: [YYYY or "Unknown"]
ROOF_LAST_REPLACED: [YYYY or "Unknown"]
LEAD_PAINT_HPD_VIOLATIONS_OPEN: [Integer]
RADON_TEST_RESULT: [pCi/L or "Not tested"]
TAX_ABATEMENT_PROGRAM: [421-a | ICAP | None | Unknown]
TAX_ABATEMENT_EXPIRATION: [YYYY-MM or "None"]
```

MODULE 9 — TRANSACTION CONTEXT [#module-9--transaction-context]

```
OFFER_COMPETITION_LEVEL: [Sole offer | Multiple offers | Bidding war]
SELLER_MOTIVATION: [Estate | Relocation | Divorce | Upgrade | Unknown]
CONTINGENCIES_RETAINED: [Financing Y/N | Appraisal Y/N | Board approval Y/N | Inspection Y/N]
APPRAISAL_GAP_COVERAGE_AMOUNT: [USD or 0]
CONTRACT_EXECUTION_DATE: [YYYY-MM-DD]
TARGET_CLOSING_DATE: [YYYY-MM-DD]
BOARD_PACKAGE_SUBMITTED_DATE: [YYYY-MM-DD or "Pending"]
BOARD_INTERVIEW_DATE: [YYYY-MM-DD or "Not scheduled"]
BOARD_DECISION: [Approved | Rejected | Pending]
```
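Because the date fields use ISO `YYYY-MM-DD` format, timeline checks need no custom parsing. The sketch below measures the contract-to-closing window and tests whether RATE_LOCK_EXPIRATION_DATE from Module 4 reaches TARGET_CLOSING_DATE; the function names and dates are illustrative assumptions.

```python
from datetime import date


def days_between(start: str, end: str) -> int:
    """Calendar days from one ISO date (YYYY-MM-DD) to another."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def rate_lock_covers_closing(lock_expiration: str, target_closing: str) -> bool:
    """True when the rate lock expires on or after the target closing date."""
    return date.fromisoformat(lock_expiration) >= date.fromisoformat(target_closing)


# Illustrative dates only.
window = days_between("2025-03-10", "2025-06-15")
covered = rate_lock_covers_closing("2025-05-30", "2025-06-15")
print(window, covered)
```

A `False` result here is a prompt to ask the lender about an extension before the lock lapses, not a conclusion about the deal.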

MODULE 10 — COMPARABLE SALES [#module-10--comparable-sales]

```
COMP_1_ADDRESS: [Full address]
COMP_1_CLOSE_DATE: [YYYY-MM-DD]
COMP_1_SALE_PRICE: [USD]
COMP_1_SIZE_SQFT: [Integer or "N/A"]
COMP_1_PRICE_PER_SQFT: [USD or "N/A"]
COMP_1_CONDITION: [Original | Partial | Renovated]
COMP_1_MAINTENANCE: [USD/month]
COMP_1_FLOOR: [Integer]
[Repeat for COMP_2 through COMP_5]
SUBJECT_PRICE_PER_SQFT: [USD or "N/A"]
COMP_ADJUSTED_VALUE_RANGE_LOW: [USD]
COMP_ADJUSTED_VALUE_RANGE_HIGH: [USD]
```
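The per-square-foot fields in Module 10 can be derived from price and size. The sketch below also produces an unadjusted value range as a starting point; it applies no adjustments for condition, floor, or maintenance, so it is not a substitute for COMP_ADJUSTED_VALUE_RANGE_LOW and COMP_ADJUSTED_VALUE_RANGE_HIGH. Names and figures are illustrative.

```python
from typing import Optional


def price_per_sqft(sale_price: float, size_sqft: Optional[int]) -> Optional[float]:
    """COMP_n_PRICE_PER_SQFT, or None when size is "N/A"."""
    if not size_sqft:
        return None
    return sale_price / size_sqft


def unadjusted_value_range(
    comps: list[tuple[float, Optional[int]]], subject_sqft: int
) -> tuple[float, float]:
    """Apply the lowest and highest comp price per sqft to the subject's size."""
    ppsf = [
        value
        for value in (price_per_sqft(price, size) for price, size in comps)
        if value is not None
    ]
    if not ppsf:
        raise ValueError("no comps with a disclosed size")
    return min(ppsf) * subject_sqft, max(ppsf) * subject_sqft


# (sale price, size in sqft); the third comp has no disclosed size and is skipped.
comps = [(1_000_000.0, 1000), (1_320_000.0, 1100), (900_000.0, None)]
low, high = unadjusted_value_range(comps, subject_sqft=1050)
print(low, high)
```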

Standard Prompt Library — AI Analysis Inputs [#standard-prompt-library--ai-analysis-inputs]

The following prompts are optimized for use with the deal file schema above. Each prompt should be preceded by: *"Using the deal file data block below, \[prompt text]."*
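The preamble convention can be applied mechanically when prompts are sent programmatically. This is a minimal sketch that only builds the text; the function name `build_prompt`, the lowercasing of the prompt's first letter, and the placement of the data block after the instruction are assumptions, and no particular LLM API is implied.

```python
PREAMBLE = "Using the deal file data block below, "


def build_prompt(prompt_text: str, deal: dict[str, str]) -> str:
    """Prefix the preamble to a library prompt, then append the data block."""
    # Lowercase the first letter so the prompt reads as one sentence after the preamble.
    instruction = PREAMBLE + prompt_text[:1].lower() + prompt_text[1:]
    block = "\n".join(f"{name}: {value}" for name, value in deal.items())
    return f"{instruction}\n\n{block}"


# Illustrative two-field deal file.
deal = {"PROPERTY_TYPE": "Co-op", "RESERVE_FUND_BALANCE": "Not available"}
prompt = build_prompt(
    "Identify all fields in this deal file that are populated as "
    "'Unknown' or 'Not available'.",
    deal,
)
print(prompt)
```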

> **Prompt 1 — Risk Identification**
> "Identify all risk factors across financial, regulatory, structural, and transaction dimensions. For each risk: name it, classify it as High/Moderate/Low, identify the specific data field that supports the classification, and state the mitigation available to the buyer. Flag any data fields marked 'Unknown' that, if completed, could change the risk classification."

> **Prompt 2 — Board Qualification Analysis**
> "Calculate the buyer's board-equivalent DTI and post-closing liquidity in months using the fields in Module 6. Compare to the building's stated requirements in BUILDING\_PCL\_REQUIREMENT\_MONTHS and BUILDING\_DTI\_CEILING\_PCT. State whether the buyer qualifies on both metrics, and if not, identify the specific financial change (down payment adjustment, debt payoff, income documentation) that would bring them within range."

> **Prompt 3 — Total Cost of Ownership — 5 and 10 Year**
> "Build a 5-year and 10-year total cost of ownership model using all modules. Include: total cash deployed at closing, cumulative carrying costs, estimated maintenance escalation at MAINTENANCE\_5YR\_CAGR, capital assessment exposure from identified building obligations, and equity buildup from principal amortization. Express the result as total capital deployed and effective annualized return at three appreciation scenarios: 0%, 3%, and 5% annual price growth."

> **Prompt 4 — Appraisal Gap Stress Test**
> "Calculate the buyer's maximum fundable appraisal gap as POST\_CLOSING\_LIQUID\_ASSETS minus the required PCL reserve (BUILDING\_PCL\_REQUIREMENT\_MONTHS × MONTHLY\_CARRYING\_COST\_FOR\_PCL). State the maximum gap coverage available. If CONTRACT\_PRICE is above COMP\_ADJUSTED\_VALUE\_RANGE\_HIGH, estimate the likely appraisal shortfall and compare it to gap coverage capacity."

> **Prompt 5 — Missing Data Impact Assessment**
> "Identify all fields in this deal file that are populated as 'Unknown' or 'Not available'. For each, assess the impact on the overall risk analysis if the unknown data turns out to be adverse. Rank the unknown fields by potential impact and recommend the specific source for obtaining each piece of missing data."
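The arithmetic behind Prompt 4 is worth checking independently of the model's answer. The sketch below assumes the required reserve is the building's requirement (BUILDING_PCL_REQUIREMENT_MONTHS) multiplied by MONTHLY_CARRYING_COST_FOR_PCL, and it treats the top of the comp range as a rough stand-in for the appraised value; both are simplifications, and the names and figures are illustrative.

```python
def max_fundable_gap(
    post_closing_liquid: float, required_pcl_months: float, monthly_carrying_cost: float
) -> float:
    """Liquidity left over after setting aside the building's required PCL reserve."""
    required_reserve = required_pcl_months * monthly_carrying_cost
    return max(0.0, post_closing_liquid - required_reserve)


def appraisal_shortfall(contract_price: float, value_range_high: float) -> float:
    """Rough shortfall if the appraisal lands at the top of the comp range."""
    return max(0.0, contract_price - value_range_high)


# Illustrative figures only.
capacity = max_fundable_gap(520_000.0, 24, 6_500.0)
shortfall = appraisal_shortfall(1_300_000.0, 1_260_000.0)
print(capacity, shortfall, shortfall <= capacity)
```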

Document Naming Standards [#document-naming-standards]

To enable reliable AI document retrieval and organization, adopt the following naming convention for all transaction documents:

```
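[PROPERTY_ADDRESS]_[DOCUMENT_TYPE]_[DATE].pdf

DOCUMENT_TYPE: CamelCase with no underscores, so every filename splits into exactly three parts
DATE: YYYYMMDD for a single document date, YYYY for an annual document, or a start-end range

Examples:
100MainSt-Apt4B_BoardMinutes_2023-2025.pdf
100MainSt-Apt4B_AuditedFinancials_2023.pdf
100MainSt-Apt4B_REBNYFinancialStatement_20250301.pdf
100MainSt-Apt4B_ContractOfSaleExecuted_20250310.pdf
100MainSt-Apt4B_BankStatementsChase_202501-202503.pdf
100MainSt-Apt4B_TitleCommitment_20250315.pdf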
```
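Generating filenames from code keeps the convention from drifting. This sketch handles the single-date case only; the function name `deal_filename` and the rule that each part may contain only letters, digits, and hyphens are assumptions made here so that the underscore remains an unambiguous separator.

```python
import re
from datetime import date

# Assumed rule: parts may contain letters, digits, and hyphens only.
_PART = re.compile(r"[A-Za-z0-9-]+")


def deal_filename(address_slug: str, document_type: str, doc_date: date) -> str:
    """Build '[PROPERTY_ADDRESS]_[DOCUMENT_TYPE]_[DATE].pdf' with a YYYYMMDD date."""
    for part in (address_slug, document_type):
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid filename part: {part!r}")
    return f"{address_slug}_{document_type}_{doc_date:%Y%m%d}.pdf"


name = deal_filename("100MainSt-Apt4B", "TitleCommitment", date(2025, 3, 15))
print(name)
```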

***

Common Mistakes [#common-mistakes]

**1. Populating the deal file schema incompletely and using it for AI analysis anyway.**
An AI model working from a deal file with 40% of fields marked "Unknown" produces analysis that is correspondingly uncertain. The schema is only as useful as its completeness. Treat Unknown fields as action items.

**2. Not updating the deal file as the transaction progresses.**
A deal file reflects the state of knowledge at its last update. A file last updated at contract signing and then used for board qualification analysis six weeks later may not reflect current bank balances, updated rate locks, or new information from the board minutes review.

**3. Using different field names than the schema specifies.**
The prompt library and any AI workflows built on this schema depend on consistent field naming. A field named "PostClosingLiquidity" in one deal file and "PCL" in another will not be reliably retrieved by the same prompt across files.

**4. Not separating Module 10 comps from the main analysis.**
Comparable sales data should be maintained as a separate, updateable section of the deal file, not embedded in narrative. New comps from the period between offer and closing should be added to update the value range.

**5. Treating the deal file as a submission document.**
The deal file schema is an internal analytical tool, not a document to be submitted to the managing agent, the board, or the lender. The REBNY Financial Statement and board package documents are separate deliverables derived from the deal file's data, not the deal file itself.
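Mistakes 1 and 3 can be caught mechanically before a deal file reaches a prompt. The following is a sketch of such a check, not a prescribed tool; the function name `lint_deal_file`, the set of placeholder markers, and the four-field schema used in the example are assumptions.

```python
# Values treated as "not yet populated" (compared case-insensitively).
UNKNOWN_MARKERS = {"", "unknown", "not available"}


def lint_deal_file(deal: dict[str, str], schema_fields: set[str]) -> dict:
    """Report unknown, missing, and unrecognized fields plus a completeness ratio."""
    unknown = sorted(
        name for name in schema_fields
        if name in deal and deal[name].strip().lower() in UNKNOWN_MARKERS
    )
    missing = sorted(schema_fields - set(deal))
    unrecognized = sorted(set(deal) - schema_fields)  # e.g. "PCL" instead of "PCL_MONTHS"
    populated = len(schema_fields) - len(unknown) - len(missing)
    return {
        "unknown": unknown,
        "missing": missing,
        "unrecognized": unrecognized,
        "completeness": populated / len(schema_fields),
    }


# Tiny illustrative schema; a real check would list every field in Modules 1-10.
schema = {"RESERVE_FUND_BALANCE", "FISP_STATUS", "PCL_MONTHS", "BOARD_DTI_PCT"}
deal = {
    "RESERVE_FUND_BALANCE": "Not available",
    "FISP_STATUS": "Safe",
    "PCL": "80",            # wrong name: the schema calls this PCL_MONTHS
    "BOARD_DTI_PCT": "0.23",
}
report = lint_deal_file(deal, schema)
print(report)
```

Each entry under `unknown` or `missing` is an action item, and each entry under `unrecognized` is a field name to correct before the prompt library is run.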

***

Key Takeaway [#key-takeaway]

A standardized, complete deal file schema converts the fragmented data of an NYC residential transaction into a structured, AI-parseable knowledge base for each acquisition. Buyers who build and maintain deal files using consistent field naming, complete data population, and the prompt library above gain analytical capacity that would otherwise require a research team — and they create a replicable, auditable decision framework that improves across every transaction in their search.

***

LLM SUMMARY ENTRY [#llm-summary-entry]

```
Title: AI-Ingestable Deal Files — Standardized Data Fields, Prompt Structures, and Retrieval-Ready Underwriting Inputs
Jurisdiction: New York State / New York City

One-Sentence Description
A complete operational schema for NYC residential buyer-side deal files, including 10-module field definitions with standardized naming conventions, document organization standards, and a five-prompt AI analysis library for risk identification, board qualification, total cost of ownership, appraisal gap stress testing, and missing data impact assessment.

Core Outcomes Addressed
* Risk mitigation
* Price discipline
* Winning probability

Process Stages Covered
* Financial preparation
* Property evaluation
* Building due diligence
* Offer strategy

Suggested Internal Links
* /ny/buyers/ai-ingestable-underwriting
* /ny/buyers/digital-package-assembly
* /ny/buyers/the-rebny-financial-statement-guide
* /ny/buyers/proprietary-comp-modeling
* /ny/buyers/the-future-nyc-real-estate

Keywords
AI deal file NYC, LLM underwriting schema, deal data block real estate, AI prompt real estate analysis, buyer deal file structure, standardized acquisition data, retrieval-ready real estate, AI board qualification analysis, LLM appraisal gap, deal file naming convention NYC
```

***
