The Data Platform Underneath Marketproof
Marketproof, in addition to being a website, is a software platform. It is connected to a variety of New York City real estate data sources and configured with relevant business rules to automatically aggregate, cleanse and enrich data to present a complete and accurate view of the city's real estate market.
The system was built for New York and specializes in processing and organizing residential data for a high-velocity urban market where properties are often irregular and vertical, and valuation is multidimensional.
In New York, real estate can differ greatly even on the same block. A pre-war co-op next to a new development condo in Manhattan may have very different building amenities, features and property values. In these areas, the real estate market is often volatile, and information about the market can come from a diverse range of sources, from government agencies to real estate brokers to private data vendors. What's more, because New York does not have a unified MLS, listings information is disparate, nonstandard and often incomplete. For these reasons, Marketproof organizes real estate data at the building and unit level in addition to lot, block, zip code, neighborhood, borough and city levels to accurately depict the real world.
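To illustrate why the level of aggregation matters, here is a minimal sketch that rolls the same unit-level records up to different levels. The record fields, the function name and the prices are all made up for illustration and are not Marketproof's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class UnitRecord:
    # Hypothetical fields mirroring the levels named above.
    borough: str
    neighborhood: str
    block: int
    building: str
    unit: str
    price: int


def median_price_by(records, *levels):
    """Group records by the given attribute names and return the median price per group."""
    groups = defaultdict(list)
    for r in records:
        key = tuple(getattr(r, level) for level in levels)
        groups[key].append(r.price)
    return {key: median(prices) for key, prices in groups.items()}


# Illustrative data only: a co-op and a new condo on the same block.
records = [
    UnitRecord("Manhattan", "Upper West Side", 1125, "Pre-war co-op", "3A", 900_000),
    UnitRecord("Manhattan", "Upper West Side", 1125, "Pre-war co-op", "5B", 1_100_000),
    UnitRecord("Manhattan", "Upper West Side", 1125, "New condo", "12C", 3_000_000),
    UnitRecord("Manhattan", "Upper West Side", 1125, "New condo", "20A", 3_400_000),
]

by_block = median_price_by(records, "block")
by_building = median_price_by(records, "block", "building")
```

In this sketch the block-level median (2,050,000) describes neither building well, while the building-level medians (1,000,000 and 3,200,000) keep the two apart.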
Data Acquisition
Data acquisition is the process of ingesting data from external sources into the Marketproof system for processing. Marketproof acquires data from many different sources, which include:
NYC Department of Buildings
NYC Department of City Planning
NYC Department of Education
NYC Department of Finance
REBNY's Residential Listing System
Landmarks Preservation Commission
Metropolitan Transportation Authority (MTA)
New York State Attorney General (AG)
New York State Department of State (DOS)
FEMA
The number of sources and the amount of data acquired by Marketproof are continually growing and evolving.
Data Normalization
Data normalization is the process of aggregating, merging and organizing data from a diverse set of sources into a single coherent view.
One primary example of this is street address normalization.
Street address normalization de-duplicates and standardizes street-level addresses from various sources. Different sources often refer to the same physical building, house or complex using different spellings of the street number, direction, street name and suffix. For example, one source may write “East” as “E” and “Street” as “ST”.
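The standardization step might be sketched as follows. This is a simplified illustration, not Marketproof's actual rules: the lookup tables, the position-based heuristics and the name normalize_address are assumptions, and a production system would need far larger tables and many more special cases.

```python
import re

# Hypothetical, deliberately small lookup tables.
DIRECTIONS = {"e": "EAST", "w": "WEST", "n": "NORTH", "s": "SOUTH"}
SUFFIXES = {
    "st": "STREET", "street": "STREET",
    "ave": "AVENUE", "av": "AVENUE", "avenue": "AVENUE",
    "pl": "PLACE", "place": "PLACE",
    "rd": "ROAD", "road": "ROAD",
    "blvd": "BOULEVARD", "boulevard": "BOULEVARD",
}
ORDINAL = re.compile(r"^(\d+)(st|nd|rd|th)$")


def normalize_address(raw: str) -> str:
    """Return an uppercase canonical form of a simple street address."""
    # Drop periods and commas, lowercase, and split on whitespace.
    tokens = re.sub(r"[.,]", " ", raw).lower().split()
    last = len(tokens) - 1
    out = []
    for i, tok in enumerate(tokens):
        ordinal = ORDINAL.match(tok)
        if ordinal:
            out.append(ordinal.group(1))      # "72nd" -> "72"
        elif i == 1 and tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])       # direction right after the house number
        elif i == last and tok in SUFFIXES:
            out.append(SUFFIXES[tok])         # suffix only in the final position
        else:
            out.append(tok.upper())
    return " ".join(out)


variants = ["1 W 72nd St", "1 West 72nd Street", "1 W. 72nd St."]
canonical = {normalize_address(v) for v in variants}
```

All three variants collapse to the single form "1 WEST 72 STREET". Expanding abbreviations only by position is a design choice in this sketch: it keeps a name like "Central Park West" intact and avoids reading the "St" in "St Marks" as "Street".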
In certain cases, a single building is referred to by multiple (alternate) addresses or entrances. These buildings are often commonly known by one address but designated by another in different data sources. For example, the Dakota, one of New York's best-known buildings, is designated by the Department of Buildings with a range of addresses: 119-127 Central Park West (odd side of street), 1-13 West 72nd St (odd side of street) and 2-12 West 73rd St (even side of street). Many government data sources reference this building as 121 Central Park West, while it is commonly known as 1 West 72nd St.
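One way to model alternate addresses is to store each building with its address ranges and resolve any incoming address to the building that contains it. The sketch below uses the Dakota ranges given above. The class names, the building identifier and the assumption that street names arrive already standardized in uppercase are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AddressRange:
    street: str   # standardized street name (assumed uppercase)
    low: int      # lowest house number in the range
    high: int     # highest house number in the range

    def contains(self, number: int, street: str) -> bool:
        # The parity of the low end identifies the side of the street.
        return (
            street == self.street
            and self.low <= number <= self.high
            and number % 2 == self.low % 2
        )


@dataclass(frozen=True)
class Building:
    building_id: str          # hypothetical internal identifier
    primary_address: str      # the commonly known address
    ranges: Tuple[AddressRange, ...]


DAKOTA = Building(
    building_id="bldg-dakota",
    primary_address="1 WEST 72 STREET",
    ranges=(
        AddressRange("CENTRAL PARK WEST", 119, 127),  # odd side
        AddressRange("WEST 72 STREET", 1, 13),        # odd side
        AddressRange("WEST 73 STREET", 2, 12),        # even side
    ),
)


def resolve(number: int, street: str, buildings: Sequence[Building]) -> Optional[Building]:
    """Return the building whose address ranges contain the given address, if any."""
    for building in buildings:
        if any(r.contains(number, street) for r in building.ranges):
            return building
    return None


match = resolve(121, "CENTRAL PARK WEST", [DAKOTA])
```

Here both 121 Central Park West and 1 West 72nd St resolve to the same building record, while 120 Central Park West, on the even side, resolves to nothing.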
The street address normalization process allows Marketproof to consistently attribute physical characteristics and events to the same real world building, house or complex. While this is just one example of how Marketproof normalizes data, it demonstrates the kind of processes that run in the background to produce the highest quality data possible.
Data Enrichment
Data enrichment is the process of knitting data together in meaningful ways to derive additional value from it. More complex, multi-step business rules are introduced to further improve the accuracy and completeness of the data, such as matching listings with recorded sales to understand the full cycle of a transaction. Analytical functions are performed during this process to create derived data points such as days on market and price per square foot. Establishing relationships and interconnections between data in this way is perhaps the most powerful and valuable thing the Marketproof platform does.
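As a rough sketch of this step, the code below matches a listing to a recorded sale for the same unit and derives the two data points mentioned above. The record shapes, the one-year matching window and the choice to measure days on market from list date to recorded closing date are assumptions made for illustration. Marketproof's actual matching rules and definitions are not described here and may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Listing:
    unit_key: str       # hypothetical key identifying building + unit
    listed_on: date
    asking_price: int
    square_feet: int


@dataclass(frozen=True)
class Sale:
    unit_key: str
    closed_on: date
    sale_price: int


def match_sale(listing: Listing, sales: Sequence[Sale], max_days: int = 365) -> Optional[Sale]:
    """Return the earliest sale of the same unit that closed within max_days after listing."""
    candidates = [
        s for s in sales
        if s.unit_key == listing.unit_key
        and 0 <= (s.closed_on - listing.listed_on).days <= max_days
    ]
    return min(candidates, key=lambda s: s.closed_on, default=None)


def derive(listing: Listing, sale: Sale) -> dict:
    """Compute derived data points for a matched listing and sale."""
    return {
        "days_on_market": (sale.closed_on - listing.listed_on).days,
        "price_per_sqft": round(sale.sale_price / listing.square_feet, 2),
    }


# Illustrative data only.
listing = Listing("bldg-1:4B", date(2023, 1, 10), 1_600_000, 1_000)
sales = [
    Sale("bldg-1:4B", date(2019, 6, 1), 1_200_000),   # earlier sale, before this listing
    Sale("bldg-1:4B", date(2023, 4, 10), 1_500_000),  # the sale this listing led to
    Sale("bldg-1:7C", date(2023, 3, 1), 2_000_000),   # different unit
]

sale = match_sale(listing, sales)
derived = derive(listing, sale) if sale else None
```

With this data the listing is matched to the April 2023 sale, giving 90 days on market and a price of 1,500 per square foot.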
Marketproof's Output
The output of Marketproof, after data has been ingested, normalized and enriched, is separated into two groups: core data and supporting data. Core data refers to the physical attributes and real world events that directly impact the property. Supporting data refers to the trends, boundaries, lifestyle, transportation and other factors that provide additional information about the real estate market.
The complete data set, which is constantly evolving and expanding, offers thousands of data points and millions of records about New York City’s properties, buildings and real estate market.
Data Quality
A key component of Marketproof is its flexibility to be continuously configured and tuned to improve data quality. As such, Marketproof's data sets are constantly monitored for quality improvement through a combination of automated Quality Assurance tools and human analysts who have New York City real estate domain expertise.
Marketproof defines data quality by the following dimensions.
Accuracy - A measure of the degree of correctness of data values compared to the real world objects represented.
Non-Duplication - The degree to which there is a one-to-one correlation between records and the real world objects or events represented.
Completeness - The characteristic of having all available fields and values for each attribute within a data set.
Coverage - The measure of the total number of records compared to all possible real world objects or events represented.
Timeliness - The relative availability of data within the timetable required for the data to be meaningful.
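Some of these dimensions lend themselves to simple ratio metrics. The sketch below shows one possible way to score non-duplication, completeness and coverage on a list of records. The function names, the record layout and the sample values are hypothetical, and accuracy and timeliness are left out because they require an external reference and a required timetable respectively.

```python
from typing import Iterable, Mapping, Sequence


def non_duplication(records: Sequence[Mapping], key: str) -> float:
    """Share of records that map to a distinct real world object (1.0 means no duplicates)."""
    keys = [r[key] for r in records]
    return len(set(keys)) / len(keys) if keys else 1.0


def completeness(records: Sequence[Mapping], fields: Sequence[str]) -> float:
    """Share of expected field values that are actually populated."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total


def coverage(records: Sequence[Mapping], key: str, universe: Iterable) -> float:
    """Share of all known real world objects that appear in the records."""
    universe = set(universe)
    if not universe:
        return 1.0
    return len({r[key] for r in records} & universe) / len(universe)


# Illustrative data only.
records = [
    {"building_id": "A", "units": 10, "year_built": 1884},
    {"building_id": "A", "units": 10, "year_built": None},   # duplicate of A
    {"building_id": "B", "units": None, "year_built": 1920},
]

scores = {
    "non_duplication": non_duplication(records, "building_id"),
    "completeness": completeness(records, ["units", "year_built"]),
    "coverage": coverage(records, "building_id", {"A", "B", "C", "D"}),
}
```

For the sample records this yields a non-duplication score of 2/3, a completeness score of 4/6 and a coverage score of 0.5. Scores like these could feed automated monitoring, with analysts reviewing the records behind any drop.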
In the end, Marketproof, the software platform, works its magic in the background so that Marketproof the website can provide a rich, comprehensive, real-time experience of New York City and its real estate market.