databases, whereas Hadoop uses MapReduce to run jobs over its
Hadoop Distributed File System (HDFS). You can transfer data between
Hadoop and SQL Server using the Microsoft SQL Server Connector for
Apache Hadoop. Microsoft’s upcoming PolyBase technology will provide a bridge to Big Data from SQL Server. PolyBase will initially be
released with the Parallel Data Warehouse (PDW). PolyBase allows SQL
Server T-SQL queries to run against data stored in a Hadoop cluster; the
data is returned as standard SQL results. PolyBase also permits queries
to reference data stored in the HDFS as if the data were in a relational
table. In addition, PolyBase can perform joins between tables
in the PDW and data in the HDFS.
Although Big Data is defined by large volumes of data and high processing power, getting started with Big Data doesn’t always require
a huge investment in additional infrastructure. You don’t necessarily
have to go out and buy a lot of new servers and new storage. Cloud
services such as Windows Azure HDInsight enable your organization
to implement Hadoop clusters in the cloud. You pay only for
the storage and processing power you need, without buying
additional hardware. Windows Azure HDInsight provides a good way
to gain experience with Big Data without spending a lot of cash.
There’s no doubt that, like BI, Big Data is a technology that’s not going
away. However, Big Data isn’t a replacement for relational databases.
Relational databases will continue to support an organization’s core
mission-critical applications. Like BI, Big Data will open up business
insights, enabling organizations to make better business decisions. ■
www.sqlmag.com
sql server Pro / may 2013