Hands-On Big Data: Getting Started with NoSQL and Hadoop
DATE: Thursday the 26th of March.
LANGUAGE
Italian
LEVEL
Intermediate
DURATION
The workshop is a full-day session (8 hours), from 9:00 to 18:00, with a one-hour lunch break.
LOCATION
c/o Polo Didattico | Piazza Oderico da Pordenone – 00145 – Rome
CHECK-IN: 8:30 – 9:00
PRICES:
Super Early Bird: 105 €, from the 28th of January to the 17th of February;
Early Bird: 125 €, from the 18th of February to the 4th of March;
Full: 145 €, from the 5th of March until sales close.
MARIO CARTIA
Mario Cartia is the Chief Technology Officer of an Italian company that is a market leader in software for schools of every grade. He has more than 15 years of experience with enterprise architectures. He has delivered training at various multinational companies on topics such as distributed architectures, performance tuning, system scalability, and information security. He is also a founder and board member of several communities in the open-source ecosystem, and a Red Hat Certified Professional.
ABSTRACT
What is Big Data? Gartner analyst Doug Laney has characterized Big Data as “data that’s an order of magnitude greater than data you’re accustomed to”. IBM’s chief executive, Virginia Rometty, estimates that “there will be 5,200 gigabytes of data for every human on the planet by 2020”. The workshop introduces the topic of Big Data by providing practical knowledge of the tools and techniques most commonly used to handle it, and includes four hands-on labs focused on the Hadoop framework.
TABLE OF CONTENTS
– What is Big Data?
– The strategic relevance of Big Data in social networks, the “Internet of Things”, and the wearable-device market
– Other case studies: using Big Data in scientific research
– Prototype architecture of a “Big Data oriented” system
– Introduction to NoSQL databases
– A survey of the NoSQL databases available on the market: which one should you choose for your needs?
– Lab 1: setup and use of a document-oriented database (MongoDB); a minimal connection sketch follows this list
– Introduction to the Apache Hadoop framework
– Hadoop: architecture and modules
– The Hadoop Common module
– HDFS: a distributed filesystem for quick access to large amounts of data
– The YARN framework for job scheduling and cluster resource management
– Importing data using Sqoop
– Querying data using Hive and Pig
– Lab 2: Hadoop setup
– Lab 3: importing data into Hadoop from a MySQL database using Sqoop
– Lab 4: processing data stored on Hadoop and exporting it into the NoSQL database created during Lab 1
– Introduction to Machine Learning: Apache Mahout and Prediction.io
– Data visualization examples using D3.js
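As a small taste of Lab 1, here is a minimal sketch of talking to a document-oriented database from Java using the official MongoDB sync driver. The connection string and the "workshop" database and "events" collection names are illustrative assumptions, not the lab's actual configuration.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class MongoQuickstart {
    public static void main(String[] args) {
        // Connect to a local MongoDB instance (assumed to run on the default port 27017).
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("workshop");              // hypothetical database name
            MongoCollection<Document> events = db.getCollection("events"); // hypothetical collection name

            // Insert one document and read it back by a simple equality filter.
            events.insertOne(new Document("topic", "NoSQL").append("lab", 1));
            Document first = events.find(new Document("lab", 1)).first();
            System.out.println(first == null ? "nothing found" : first.toJson());
        }
    }
}
```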
TRAINING OBJECTIVES
Understanding the typical architecture of a Big Data system; setting up, configuring, and using Hadoop; importing data from a SQL database into Hadoop; writing simple MapReduce jobs in Java (a minimal sketch follows below); using Hive and Pig; connecting Hadoop with MongoDB; getting started with Apache Mahout; and effectively communicating processing results through data-visualization techniques.
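As an illustration of the "simple MapReduce jobs in Java" objective, here is a minimal sketch of the classic word-count job written against the org.apache.hadoop.mapreduce API. The class name and the command-line input/output paths are illustrative; the workshop's actual exercises may differ.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // the reducer doubles as a combiner
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Once packaged into a jar, a job like this is typically submitted with hadoop jar wordcount.jar WordCount <hdfs-input> <hdfs-output>.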
WHO IS THE WORKSHOP FOR?
CIO, CTO, DevOps, SysAdmin, DBA, Developers.
PREREQUISITES NEEDED FROM ATTENDEES
Basic knowledge of *nix operating systems, relational databases, distributed systems and scalable architectures.
HARDWARE AND SOFTWARE REQUIREMENTS
An operating system capable of running VirtualBox VMs.
The workshop will be held only if the minimum number of attendees is reached.