john-furrier
The Journey To IoT Systems Of Intelligence: Determined By Combination of Tech and Enterprise Capabilities
Applications
• Adjunct Data Warehouse: Data Lake with some production analytics offload from Data Warehouse
• Customer 360: Enough internal and external customer data in a pipeline to start predictive modeling
• Real-time loyalty (omni-channel, multi-touchpoint): Predictive model learns from and anticipates consumer in near real-time
• Smart Grid: Continuously updated predictive models of energy supply and demand tune end-point consumption
• Autonomic Systems Management: System learns “normal” behavior of apps and infrastructure and flags or fixes anomalies
Foundation Capabilities: Speed, Richness of Analytics
Use Case: Systems of Record Transition to IoT Systems of Intelligence: From Telco OSS/BSS to Intelligent Service Provider
Vendor: Telco
Services: Manage capacity of towers, cells, switches, connections, devices. Performance dashboards and reports on customer consumption for billing and infrastructure utilization for capacity planning.
Vendor: Intelligent Service Provider
New Services: Real-time updates/integration between individual plans, consumption, and promotions. Real-time integration of individual consumer SLAs and connection/bandwidth allocation in order to support tiered pricing.
Use Case: Bridging Carrier App Billing and Network Operations
Customer- and developer-facing services
Billing and settlement
• App store and in-app billing via carrier billing
• Provisioning app install order on credit verification
• Settle developer royalties based on splits
Offers
• Offer discount on monthly top-up of bandwidth if user is heavy consumer over time and approaching monthly limit
• Serve app install ads based on user profile
Network operations-facing services
Network performance and configuration management
• Real-time ingestion of CDRs to create heat map of network performance. This requires such fast ingest that it would likely be done by streaming products in the absence of an in-memory DBMS. (This is an example of an IoT machine data app.)
Bridging customer-facing and network-facing services
• Enrich CDR data with information about customer profitability
• Real-time prioritization of bandwidth on a per-customer basis when there is high congestion
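A minimal Python sketch of the bridging idea above: enrich each CDR with a customer profitability score, then weight bandwidth by that score only when a cell is congested. The record fields, the profitability table, and the proportional-share rule are illustrative assumptions; the slides do not specify a data model or an allocation policy.

```python
from dataclasses import dataclass


@dataclass
class CallDetailRecord:
    # Hypothetical CDR fields, chosen for illustration only.
    customer_id: str
    cell_id: str
    bytes_used: int


def enrich(cdr, profitability, default=0.0):
    """Attach a profitability score to a CDR; unknown customers get `default`."""
    return {
        "customer_id": cdr.customer_id,
        "cell_id": cdr.cell_id,
        "bytes_used": cdr.bytes_used,
        "profitability": profitability.get(cdr.customer_id, default),
    }


def allocate_bandwidth(enriched_cdrs, capacity_mbps, congested):
    """Split one cell's capacity across the customers seen in its CDRs.

    Without congestion every customer gets an equal share. Under congestion
    the share is proportional to profitability (an assumed policy).
    """
    weights = {}
    for record in enriched_cdrs:
        weights[record["customer_id"]] = record["profitability"]
    if not weights:
        return {}
    total = sum(weights.values())
    if not congested or total == 0:
        share = capacity_mbps / len(weights)
        return {customer: share for customer in weights}
    return {
        customer: capacity_mbps * weight / total
        for customer, weight in weights.items()
    }
```

In a real deployment the enrichment lookup would hit a customer system of record and the allocation would run continuously per cell; here both are reduced to in-memory functions so the hand-off between the customer-facing and network-facing sides is visible.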
Spectrum of Applications: Fast Data vs. Big Data
Real-Time is a Matter of Degree: Choices Depend on Usage Scenario, Accessibility of Applications That Need to be Integrated, Including Legacy and Modern Systems of Record
Range of “Real-Time” Interactions
• REAL RT: high-frequency algorithmic securities trading on one end of the spectrum
• Updates every couple of hours: inventory levels accessed by ecommerce, mobile apps at other end of spectrum
Modern SoR makes it easier to get to fastest part of spectrum
[Figure: Data Volume (GB, TB, PB) against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, ms, µs). Labels: Data Warehouse; Business Intelligence, Production Reporting; OLTP, Operational Intelligence; Advanced Analytics; Big Data: Machine Learning, Predictive Analytics; Fast Data: Streaming Data, Per Event Decisions]
*TRADITIONAL* Analytic Trade-Off: Speed vs. Richness
Traditional Data Warehouse Pipeline
Time-to-analysis bottlenecked by
• Design time: Need to decide questions before building the analytic pipeline
• Runtime: Batch ETL
[Figure: OLTP Applications feed the Data Warehouse through Batch ETL]
Ingest: Slow
Analysis: Rich But Slow
Analytic Trade-Off: Speed vs. Richness
Hadoop Data Pipeline
Time-to-analysis bottlenecked by
• Design time: Iterative, incremental analysis and enrichment
• Runtime: Inherent batch design center
[Figure labels: OLTP Applications; Pig/Sqoop ETL; Hadoop/HDFS; Hive ETL; Iterative self-service and incremental database design; Data provisioning; Interactive BI; Production Reporting]
Ingest: Slow
Analysis: Rich But Slow
Analytic Trade-Off: Speed vs. Richness
Hadoop Data Pipeline with Streaming Ingest
Time-to-analysis bottlenecked by
• Design time: Still need iterative, incremental analysis and enrichment
• Runtime: Real-time ingest, but data still needs to be stored before rich analytics
[Figure labels: OLTP Applications; Stream Processor; Hadoop Cluster (Hadoop/HDFS); Iterative self-service and incremental database design; Interactive BI; Production Reporting]
Stream Processor: Streaming Ingest, Fast Analysis but Limited
Hadoop Cluster: Analysis Rich but Slow
BOTTLENECK: DBMS Storage *Before* Rich Analysis
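To illustrate the kind of "fast but limited" analysis a stream processor can do before anything is stored, here is a small Python sketch of a tumbling-window count. The event shape (timestamp in seconds plus a key) and the function name are assumptions made for this example; it mimics the idea in plain Python and is not the API of any particular streaming product.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is any iterable of (timestamp_seconds, key) pairs, so it can be
    a generator fed by a live source. Each event is handled once, as it
    arrives, without first landing in a database.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


if __name__ == "__main__":
    sample = [(0, "cell-1"), (30, "cell-1"), (61, "cell-2"), (119, "cell-1")]
    for start, counts in sorted(tumbling_window_counts(sample, 60).items()):
        print(start, dict(counts))
```

Counting like this answers "what happened" per window, but anything that needs joins against history or model training still has to wait for the stored data, which is the bottleneck the slide points at.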
Analytic Trade-Off: Speed vs. Richness
Integrated Streaming and Persistence: Real-Time, Rich Analysis
[Figure labels: E-Mail; Social Media; Operational apps; Customer interactions; Customer “Breadcrumbs”; Operational Data; IoT: Devices, Machines; Machine Data; Stream Processor; Store; Hadoop Cluster; Predictions, Recommendations; Improving Predictions (Machine Learning)]
Better Integration of Real-Time and Batch: Analytic Trade-Off Between Speed vs. Richness Diminishes
[Figure: Data Volume (GB, TB, PB) against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, ms, µs). Labels: Advanced Analytics; OLTP; Big *AND* Fast Data: Machine Learning on Historical AND Recent Data Drives Per Event Decisions]
Better Integration of Real-Time and Batch: Analytic Trade-Off Between Speed vs. Richness Diminishes
[Figure: Batch Processing volume (GB, TB, PB) against Streaming Velocity (Min, Sec, ms, µs)]
Big Data: Maximum throughput of data; exploratory analysis of historical data
Fast Data: Fastest speed to make a decision on each event
Streaming is Newest Religious War: Use It For *All* Analytic Workloads? Processing Lots of Data vs. Analyzing Each Event = Inherent Conflict
“Streams can do it all” school: Big Data Apps are Just Fast Data Apps Scaled-Out
• If it can handle fast data, just scale it out to handle big data
• Big win: only one application needed
Wikibon recommendation (elaborated on next slide): Streaming and batch *will always* coexist
• Even batch programs on streaming platform will still have different application logic…
• High volume machine learning vs. incremental update
• Historical performance analysis vs. looking up a profile
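The "different application logic" point can be seen even in a trivial statistic. The Python sketch below computes the same mean two ways: a batch function that scans all history, and an event-at-a-time updater that keeps only a count and a running value. The names are made up for this example, and a mean stands in for the much heavier machine learning workloads the slide has in mind.

```python
def batch_mean(values):
    """Batch logic: needs the full history available at once."""
    values = list(values)
    if not values:
        raise ValueError("batch_mean needs at least one value")
    return sum(values) / len(values)


class RunningMean:
    """Per-event logic: constant state, updated incrementally on each event."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Standard incremental mean update: shift the mean toward the new
        # value by 1/count of the difference.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


if __name__ == "__main__":
    history = [2.0, 4.0, 6.0, 8.0]
    running = RunningMean()
    for reading in history:
        running.update(reading)
    print(batch_mean(history), running.mean)
```

Both paths agree on the answer, but the batch version is written around throughput over stored data while the incremental version is written around a cheap decision per event, which is why the two styles tend to remain separate applications even on a shared engine.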
Even When Streaming Engines Support More Sophisticated Analytic Workloads, The Applications Are Likely to Differ Between Event-at-a-Time vs. Batch
[Figure: Latency (Higher is Slower) against Analytic Sophistication]
Analytic Sophistication
• Basic Streaming: What Happened (Counting)
• SQL: What Happened (Exploration, OLAP or Dashboard)
• Machine Learning: Anticipate or Act Automatically (Prediction or Prescription)
Batch-Oriented: Historical analysis; Explore large, new data
Per Event-Oriented: Profile lookup; Incremental model update
IMPLICATION: Converging on one application engine not critical
Stream processors: Spark, Flink, InfoStreams, Samza, DataTorrent, (DB): VoltDB / MemSQL
Key Takeaway: Coexistence of Batch and Streaming Means One Application Engine Doesn’t Have to Rule All - Spark and Hadoop Can Live Together
Specialized engines (Hadoop stack)
• Streaming: Storm, Flink, Samza, DataTorrent
• SQL: Impala, Drill, Hive, HAWQ…
• Machine Learning: Mahout…
• YARN: Cluster Resource Management
• HDFS or operational database
Pro: Mix-and-match pipeline composed of specialized processing *optimized* for each workload
Con: Batch-only. Hand-off between processing engines via storage is slow. Each processing engine is standalone and can’t leverage the others’ functionality
Single engine (Spark stack)
• Spark Streaming: Streaming Ingest
• Spark SQL: Join, filter, aggregate
• Spark MLlib: Machine Learning
• Spark Core
• YARN or Mesos or other Workload Mgr
• HDFS or operational database
Pro: Fast and simple. Pipeline composed of one in-memory engine with streaming, SQL, machine learning, graph personalities (libraries)
Con: Still immature. Performance is an issue and integration is not fully delivered, but the Tungsten perf boost and IBM projects could add huge new value