Using Hadoop and Hive to Introduce Big Data Solutions in a Classroom Environment
Gihanthi De Silva Minnesota State University, Mankato
Sarah L. Klammer Kruse Minnesota State University, Mankato
Cyrus Azarbod Minnesota State University, Mankato
Abstract
Data is valuable when it is ordered, stored, computed, and illustrated in ways that benefit organizations or society. Large volumes of data are generated across many industries using various data collection techniques such as crowdsourcing. Banking, healthcare, power companies, transportation systems, the stock exchange, and social networking sites capture large volumes of data every day (Kumari, 2016). Organizations carry out data analysis, data mining, and predictive analysis to observe trends, understand their current position, and predict future outcomes. Several pioneering companies are already using big data to create business value. An institution that is first to obtain a low-cost solution for ordering and analyzing data will gain a competitive edge over its peers in the industry (Manyika et al., 2011). However, the exponential growth of data brings challenges: storage, extraction, and analysis of data have become costly and time-consuming. A new generation of scientists skilled in working with large data sets is in great demand, yet data scientists are in short supply (McAfee & Brynjolfsson, 2012). Relevant academic training is necessary to prepare students for careers using big data. Hadoop is one tool that can be introduced to university students to provide a foundation for working with vast data resources. The purpose of this research was to explore the installation of Hadoop in a virtual server environment in order to give students a simulated setting in which to understand and practice with the components of Hadoop technology. Techniques for introducing tools to apply big data concepts in the classroom environment are proposed, including a Hadoop and Hive installation guide for the Linux platform. This paper also demonstrates examples of computations that can be made with Hive to manipulate data stored in Hadoop using the Hive Query Language (HQL).
Kumari, S. (2016). Impact of big data and social media on society. ResearchGate.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Hung Byers, A. (2011). Big data: The next frontier for innovation, competition, and productivity.
McAfee, A., & Brynjolfsson, E. (2012). Big data: The management revolution. Harvard Business Review.
Recommended Citation: De Silva, G., Klammer Kruse, S. L., & Azarbod, C. (2017). Using Hadoop and Hive to Introduce Big Data Solutions in a Classroom Environment. Proceedings of the EDSIG Conference, (2017) n.4468, Austin, Texas.