Data Science Overview | Technologies, Tools, and Roles in the Data-Driven Enterprise (AA-TTDS6000)


Course Description

This foundation-level course introduces the multi-disciplinary Data Science team to the field's many evolving and related terms. It includes a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. You'll also explore the current state of the art and science, the major components of a modern data science infrastructure, team roles and responsibilities, and level-setting of possible outcomes for your investment.

This course provides a high-level view of current data-science-related technologies, concepts, strategies, skillsets, initiatives, and supporting tools in common business enterprise practice. The goal of this course is to provide you with a baseline understanding of core concepts.

Course Outline

Foundations

  • Grids and Virtualization
  • Service-Oriented Architecture
  • Enterprise Service Bus
  • Enterprise Message Bus
  • The Cloud

The Hadoop Ecosystem

  • HDFS: Hadoop Distributed File System
  • Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper
  • Hadoop Map/Reduce
  • Spark
  • Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source

Big Data, NOSQL, and ETL

  • Big Data vs. RDBMS
  • NOSQL: Not Only SQL
  • Relational Databases: Oracle, MariaDB, DB2, SQL Server, PostgreSQL
  • Key/Value Databases: JBoss Infinispan, Terracotta, Dynamo, Voldemort
  • Columnar Databases: Cassandra, HBase, BigTable
  • Document Databases: MongoDB, CouchDB/CouchBase
  • Graph Databases: Giraph, Neo4j, GraphX
  • Apache Hive
  • Common Data Formats
  • Leveraging SQL and SQL variants

ETL: Extract, Transform, Load

  • Data Ingestion, Transformation, and Loading
  • Exporting Data
  • Sqoop, Flume, Informatica, and other tools

Enterprise Integration Patterns and Message Busses

  • Enterprise Integration Patterns: Apache Camel and Spring Integration
  • Enterprise Message Busses: Apache Kafka, ActiveMQ, and other tools

Developing in the Hadoop Ecosystem

  • Languages: R, Python, Java, Scala, Pig, and BPMN
  • Libraries and Frameworks
  • Development, Testing, and Deployment

Artificial Intelligence and Business Systems

  • Artificial Intelligence: Myths, Legends, and Reality
  • The Math
  • Statistics
  • Probability
  • Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib
  • Business Rule Systems: Drools, JRules, Pegasus

The Team

  • Agile Data Science
  • NOSQL Data Architects and Administrators
  • Developers
  • Grid Administrators
  • Business and Data Analysts
  • Management
  • Evolving your Team
  • Growing your Infrastructure

Course Objectives

Join an engaging learning environment, where you'll explore:

  • Foundations: Grids & Virtualization; SOA, ESB/EMB and the Cloud
  • The Hadoop Ecosystem: HDFS, Resource Negotiators, MapReduce, Spark, and Distributions
  • Big Data, NOSQL, and ETL
  • ETL: Extract, Transform, Load
  • Handling Data and a Survey of Useful tools
  • Enterprise Integration Patterns and Message Busses
  • Developing in the Hadoop Ecosystem: R, Python, Java, Scala, Pig, and BPMN
  • Artificial Intelligence and Business Systems
  • Who's on the Team? Roles and Functions in Data Science
  • Growing your Infrastructure

This is a seminar-style course that combines engaging expert lectures, pertinent skills, tool demonstrations, and group discussions.

Course Prerequisites

Attendees should have:

  • Exposure to Enterprise Information Technology
  • Familiarity with Relational Databases

Course Information

Length: 1 day

Format: Lecture

Delivery Method: n/a

Max. Capacity: 16



Schedule

Upcoming Courses

  • Date: Dec 05, 2024
  • Geography & Location: AMER, Remote-EST
  • Days: 1
  • Cost: $995 USD

Do you have more questions? We're delighted to assist you!

1-877-797-2799
info@firefly.cloud

Who Should Attend

Business Analysts, Data Analysts, Data Architects, Database Administrators, Network Administrators (Grid), Developers, Technical Managers, or anyone else in the data science realm who needs a baseline understanding of the core areas of modern Data Science technologies, practices, and tools.