PolyBase vs. Linked Servers … which wins? Recently we have had several clients reach out to us about a data archival solution. Some of their databases are in the TB range, with data going back 7-10 years or even more. Keeping this data in the database is costly due to disk space needs, and query performance can be affected due to the amount of data being queried as well as index sizes, etc.
Several of our clients have looked at a range of different solutions, such as copying “old” data to a different report or read-only server and hooking up a Linked Server through which to query the older data when needed. Another option is creating a “history” table in the same database and moving the older data to that table.
In both cases, this doesn’t solve the disk space issue, and it is still a costly storage solution. With the first scenario, I’m sure we have all been a little frustrated at the performance of Linked Servers at one time or another. In all cases, we had a requirements-gathering session with each client to determine several items, including:
- Does the archive data need to be queried, and if so, how often?
- How much data needs to be archived?
- Does referential integrity need to be kept between current and archive data?
- Is cloud storage an option?
There are other questions that we could ask, but these are the most critical because it establishes a baseline and determines the technology we can work with. In some cases, cloud storage is not an option, so we needed to stick with an on-premises solution. Where cloud storage was an option, that opened things up and we were able to look at and utilize some great technologies.
Now, where cloud storage wasn’t an option, there were still some things that we could look at that are strong, viable options, thus the topic of this blog. With SQL Server 2016, Microsoft added the ability to use PolyBase to query other SQL Server instances. This is a game changer, as PolyBase is very powerful. We at SSG set out to determine which is faster and more efficient, PolyBase or Linked Servers, when querying one SQL Server instance from another SQL Server instance.
Honestly, what we found was very surprising. Extremely surprising. Granted, all our tests were performed using SQL Server 2019 and SQL Server 2022, but even then, what we found was quite shocking.
I could walk through everything here, which would make for an extremely long blog, so I did a webinar on this exact topic with a lot of demos, discussion, and insights, and you can view it here. Let us know what you think as we always like to hear from you.