Real Estate Investment and Development: Data Visualization as a tool

Python and Geospatial tools create powerful and compelling data visualizations that support Real Estate Analysis

Keith Tan
7 min read · Nov 3, 2020
Photo by Franki Chamaki on Unsplash

Data visualization is a staple for data scientists developing machine learning models, where it serves as a data preprocessing step, and a necessity for business stakeholders who want to learn more about their business space.

Unfortunately, many industries today still rely on Excel charting, either because of legacy technology or because they are unaware of the benefits of powerful Python-based visualizations (using the Altair, Plotly, Matplotlib and Seaborn libraries) and other related software (e.g. Tableau, and geospatial and charting tools).

The real estate industry is no exception.

In this post, we seek to explore a subset of the wide range of tools available for such visualization tasks, with 3 application genres in the real estate space. The dataset used is focused on transaction level data in Singapore’s private residential industry i.e. data containing known purchase price with stated property characteristics.

Point- and Aggregate-Level Visualizations

To start off, a bar chart with an ascending color tone can be a simple yet effective tool for discovering the relative ranking between objects. Take the following figure, where we explore the top and bottom 10 private residential projects in Singapore by price per square foot (PSF).

Fig 1. Bar chart of the 10 most and least expensive private residential projects in Singapore. Image by author.

With a given dataset, you can also decide the order of ranking, how many projects to display, and even the ranking metric (e.g. by average floor area, average age).
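As a rough illustration of that flexibility, here is a minimal sketch of a ranking bar chart using pandas and Matplotlib. The column names (`project`, `psf`), the helper functions and the sample rows are all hypothetical and are not taken from the dataset behind Figure 1.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative rows only, not real transactions.
transactions = pd.DataFrame({
    "project": ["A", "A", "B", "B", "C", "C", "D"],
    "psf": [1000, 1200, 2500, 2700, 800, 900, 1500],
})


def rank_projects(df, metric="psf", n=10, most_expensive=True):
    """Mean of `metric` per project, keeping the top (or bottom) n."""
    means = df.groupby("project")[metric].mean()
    return means.sort_values(ascending=not most_expensive).head(n)


def plot_ranking(ranking, title):
    """Horizontal bars shaded from light (lowest) to dark (highest)."""
    ordered = ranking.sort_values()
    colors = plt.cm.Blues(np.linspace(0.3, 0.9, len(ordered)))
    fig, ax = plt.subplots()
    ax.barh(ordered.index, ordered.values, color=colors)
    ax.set_xlabel(ranking.name)
    ax.set_title(title)
    return ax


top = rank_projects(transactions, n=2)
ax = plot_ranking(top, "Most expensive projects by PSF")
```

Changing `metric`, `n` or `most_expensive` switches the ranking metric, the number of projects shown and the order of ranking.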

To understand the price rankings, we can follow up with a Seaborn scatter plot that charts the transacted prices across multiple projects, adding a time dimension for supplementary information.

Fig 2. Scatterplot on transacted prices for 3 chosen projects in Singapore. Image by author.

From the above figure, we can immediately establish 3 observations:

  1. The projects span a wide range of ages: Melville Park saw transactions starting pre-2000, while South Beach Residences was constructed in the mid-2010s.
  2. Prices generally rise over time: Costa Del Sol and Melville Park both saw a lift in their price lower bound.
  3. Price dispersion varies significantly across projects: South Beach Residences had a wider range of transacted prices than the other two projects, despite its shorter transaction period.

Any — or a combination — of the 3 observations would be useful for prospective investors or individuals looking to purchase a home. Again, a different metric can be used in the scatter plot e.g. building age, distance to central business district, subway score.
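A scatter plot of this kind could be sketched as follows. The column names and the handful of sample rows are made up for illustration (the projects are deliberately labelled "Project A" and "Project B" rather than real developments), and the styling of Figure 2 is not reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical transactions, not real sales records.
sales = pd.DataFrame({
    "sale_date": pd.to_datetime([
        "1999-06-01", "2008-03-15", "2018-09-30", "2016-05-20", "2019-11-11",
    ]),
    "price": [650_000, 820_000, 1_050_000, 3_200_000, 4_100_000],
    "project": ["Project A", "Project A", "Project A", "Project B", "Project B"],
})


def plot_prices_over_time(df):
    """One point per transaction, coloured by project, with time on the x-axis."""
    fig, ax = plt.subplots(figsize=(8, 4))
    sns.scatterplot(data=df, x="sale_date", y="price", hue="project", ax=ax)
    ax.set_xlabel("Transaction date")
    ax.set_ylabel("Transacted price (S$)")
    return ax


ax = plot_prices_over_time(sales)
```

Swapping the `y` column for another attribute, such as building age, gives the alternative views mentioned above.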

If, however, you already have projects in mind but are unsure how they stack up against others, a radar chart (a.k.a. spider chart) can be a highly effective tool for comparing key metrics.

Fig 3. Radar chart for project attribute comparison. Image by author.
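One way to draw such a chart is with a Matplotlib polar axis, as in the sketch below. The metric names and scores are hypothetical, and the sketch assumes each metric has already been scaled to the 0-1 range so that the axes are comparable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

metrics = ["Price (PSF)", "Floor area", "Building age", "Distance to CBD"]
# Hypothetical scores, each already scaled to the 0-1 range.
projects = {
    "Project A": [0.8, 0.4, 0.3, 0.9],
    "Project B": [0.5, 0.7, 0.6, 0.4],
}


def radar_chart(scores, labels):
    """Plot one closed polygon per project on a shared polar axis."""
    angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
    closed_angles = angles + angles[:1]  # repeat the first angle to close the shape
    fig, ax = plt.subplots(subplot_kw={"polar": True})
    for name, values in scores.items():
        closed = list(values) + [values[0]]
        ax.plot(closed_angles, closed, label=name)
        ax.fill(closed_angles, closed, alpha=0.2)
    ax.set_xticks(angles)
    ax.set_xticklabels(labels)
    ax.set_ylim(0, 1)
    ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
    return ax


ax = radar_chart(projects, metrics)
```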

Geospatial Visualization

Geospatial analysis is arguably the most widely used form of visualization when prospecting a property or land parcel.

A compelling and commonly used geospatial analysis tool, Quantum GIS (QGIS), allows users to plot points on a map, calculate the straight-line and commute distances between these points, and explore the surrounding points of interest.

Wait, it sounds like Google Maps.

Except, it is more than that.

Investors and developers use Quantum GIS, or an equivalent, to survey the exterior and neighboring characteristics of an asset — from 3D surveillance to observing the demographic changes across neighborhood or state lines.

Doing so helps determine the value-add (or otherwise) of the presence of a nearby facility, or how an increase in population in a city can boost property prices over time.

Fig 4. Buffer arcs to segment Singapore’s different Central Region zones (silhouette). Image by author.

In Figure 4, we used Python’s Shapely library to create a silhouette of the map of Singapore, overlaid with distance buffer arcs of 4km each: the red ring marks the Core Central Region (CCR) of Singapore as a 4km radius, the yellow region represents the Rest of Central Region (RCR; an 8km radius minus the CCR), and the orange represents the rest of Singapore, labelled as Outside Central Region (OCR).
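The buffer-and-subtract idea can be sketched with Shapely as below. This is only an illustration under stated assumptions: coordinates are taken to be in a projected system measured in metres (for Singapore, SVY21 / EPSG:3414 would be a natural choice), the centre is placed at a hypothetical origin `(0, 0)`, and the function names are invented for this example.

```python
from shapely.geometry import Point


def central_region_zones(centre_xy, step_m=4000.0):
    """Return (ccr, rcr): a disc of radius step_m and the ring out to 2 * step_m."""
    centre = Point(centre_xy)
    ccr = centre.buffer(step_m)
    rcr = centre.buffer(2 * step_m).difference(ccr)
    return ccr, rcr


def classify(point_xy, ccr, rcr):
    """Label a location as CCR, RCR or OCR (everything outside both zones)."""
    p = Point(point_xy)
    if ccr.contains(p):
        return "CCR"
    if rcr.contains(p):
        return "RCR"
    return "OCR"


# Hypothetical centre at the origin of a metre-based grid.
ccr, rcr = central_region_zones((0.0, 0.0))
zones = [classify(xy, ccr, rcr) for xy in [(1000, 0), (6000, 0), (9000, 0)]]
```

Note that `buffer` approximates each circle with a polygon, so points lying almost exactly on a boundary may fall on either side.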

We can also use a Python mapping library to overlay the arcs on the actual map of Singapore, and quickly determine which neighbourhoods fall under each segment.

Fig 5. Buffer arcs to segment Singapore’s different Central Region zones (map). Image by author.

Within each arc, we can visualize the relative property price changes between 2012 and 2019. In Figure 6, from the legend on the right, we can see that RCR has performed the best amongst the 3, at an approximately 5% increase; OCR performed poorly, with a negative capital appreciation.

Fig 6. Residential property capital appreciation across OCR, RCR, CCR between 2012 and 2019, Singapore. Image by author.

Singapore can also be split geographically across district lines (displayed by numbers in the circle); a cross-district capital appreciation for residential property between 2009 and 2014 can be visualized in Figure 7, where the darker colors represent stronger appreciation.

Fig 7. Residential property capital appreciation across districts between 2009 and 2014, Singapore. Image by author.

The same cross-district plot can be used to determine changes in population, median income, and unemployment levels across time and geography, and how they may affect property values in a region.

Lastly, to supplement any analysis, a plot of the distance from each building in Singapore to the CCR can also be overlaid. Note that the legend on the right is in meters.

Fig 8. Distance of buildings from CCR, Singapore. Image by author.
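If the building locations are stored as latitude and longitude, a distance in meters can be approximated with the haversine formula. The sketch below assumes straight-line (great-circle) distance is what is wanted; the reference point and the building coordinates are illustrative placeholders, not the definition of the CCR used for Figure 8.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Illustrative reference point in downtown Singapore (an assumption).
CCR_REF = (1.2840, 103.8510)

# Hypothetical building locations.
buildings = {
    "Building X": (1.3521, 103.8198),
    "Building Y": (1.3000, 103.8550),
}

distances = {
    name: haversine_m(lat, lon, *CCR_REF) for name, (lat, lon) in buildings.items()
}
```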

Time Series Visualizations

Fig 9. Residential Property Price Index for selected districts, Singapore, 2008–2019. Image by author.

Another dimension of focus for real estate is time. Understanding how a property or a district has performed over time is necessary for forecasting its returns.

Figure 9 charts residential property price indices in Singapore between 2008 and 2019, showing the historical price performance of 4 different districts. Each index is set to a base of 100 in Jan 2010.

Right off the bat, we can observe that residential assets in D20 (blue) have outperformed those in D10 (red), even though D20 started at a lower index before Jan 2010.

During construction of the indices, the level of granularity can also be altered for different geographical zooms and property characteristics e.g. an index for Singapore as a whole, or down to a specific street.

Additional features such as bedroom type can also be considered, e.g. an index for 4-bedroom apartments in District 10 between 2008 and 2019.
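To make the rebasing step concrete, here is a deliberately simple sketch that builds a monthly median-PSF index per district and sets it to 100 in a base month. This is an assumption for illustration only: the method behind Figure 9 is not described here, and a plain median index ignores changes in the mix of properties sold. The column names and sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical transactions, not real sales records.
sales = pd.DataFrame({
    "sale_date": pd.to_datetime([
        "2009-12-15", "2010-01-10", "2010-01-20", "2010-02-05",
        "2009-12-01", "2010-01-05", "2010-02-10",
    ]),
    "district": ["D10", "D10", "D10", "D10", "D20", "D20", "D20"],
    "psf": [1900, 2000, 2000, 2200, 900, 1000, 1150],
})


def price_index(df, base_month="2010-01", group_col="district"):
    """Monthly median PSF per group, rebased so the base month equals 100."""
    monthly = (
        df.assign(month=df["sale_date"].dt.to_period("M"))
        .groupby([group_col, "month"])["psf"]
        .median()
        .unstack(group_col)  # rows: months, columns: groups
    )
    base = monthly.loc[pd.Period(base_month, freq="M")]
    return monthly / base * 100


index = price_index(sales)
```

Filtering `sales` before calling `price_index` (for example, to one bedroom type) or changing `group_col` alters the granularity of the index.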

Beyond That, What Else?

The above-mentioned techniques are really just the tip of the iceberg for data visualization. Through Python, we can also create interactive plots using the Altair or Plotly libraries, embed them in websites or integrate them into a dashboard, and update them in real time.

Fig 10. Gif of an interactive chart using Plotly, on house prices in Berlin. Image from Elizabeth Ter.

In the same vein, check out the following chart of the 2014 US population by city. With the size of each bubble representing the relative population, hovering over a specific bubble can provide additional information.

Fig 11. Interactive chart on 2014 US population, by city. Image from Plotly.

Or, if you need something ‘sophisticated-looking’ that lets you display a wide range of data in a single visualization, try a bubble chart.

In the following chart, the size of the bubble represents the size of real estate investment flow into the country, while the intensity of the color depicts GDP growth in the past 3 years.

Fig 12. Bubble Chart on real estate investment flow and GDP. Image from JLL

In establishing visualizations, at the risk of sounding cliché, the only limit is the user’s imagination: interactive plots, multi-dimensional and comparison plots, you name it. There is a myriad of tools available for real estate investors and developers alike to harness for data-driven decision making.

Great! Where can I start learning these?

PropertyQuants is currently running a 3-hour live, code-along workshop to share the techniques used to create the powerful and compelling visualizations above, and more. There is also a full 12-week data science course where anyone can learn machine learning techniques applied to the real estate industry.

Visit their website (www.propertyquants.com) or email training@propertyquants.com for more information! Alternatively, head directly to their course or workshop page to register for the upcoming runs.

Psst! Don’t forget to quote “EARLYBIRD20” at the workshop checkout for a 20% discount.

References:

  1. How to create a plotly visualization and embed it on websites, TowardsDataScience: https://towardsdatascience.com/how-to-create-a-plotly-visualization-and-embed-it-on-websites-517c1a78568b
  2. Bubble Maps in Python, United States Bubble Map, Plotly: https://plotly.com/python/bubble-maps/
  3. Economic and Real Estate market performance for 300 cities globally, JLL: https://www.us.jll.com/en/trends-and-insights/research/global/crc/city-clustering-tool


Keith Tan

Keith is a Data Scientist at PropertyQuants, building time-series-based machine learning and deep learning models for real estate valuation. He loves handling data.