Let's write a simple standalone Spark (PySpark) application and run it with spark-submit.
The environment is Windows 10.

Create c:\data\test.txt with the following contents:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
b
c
d
e
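If you would rather generate the file than type it, here is a minimal sketch in plain Python (the relative path and the exact length of the a-run are illustrative; the post's file lives at c:\data\test.txt):

```python
# Hypothetical generator for the five-line test file used above.
# "test.txt" is an illustrative relative path; the post uses c:\data\test.txt.
rows = ["a" * 135, "b", "c", "d", "e"]  # a-run length is illustrative

with open("test.txt", "w", newline="\n") as f:
    f.write("\n".join(rows) + "\n")

# Read back the first line, as the Spark job later will.
with open("test.txt") as f:
    first_line = f.readline().rstrip("\n")
```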

The following Spark application reads this file and prints its first line:

# -*- coding: utf-8 -*-
"""
Created on Sat Mar 30 21:19:54 2019

@author: dwa
"""

from pyspark import SparkConf, SparkContext

# Run Spark locally in a single thread; the app name appears in the Spark UI.
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

# Load the file as an RDD of lines and print the first one.
lines = sc.textFile("c:\\data\\test.txt")
print(lines.first())
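
sc.textFile() produces an RDD of the file's lines with newlines stripped, so first() returns the long a-line. As a sanity check independent of Spark, the same operation can be sketched in plain Python (the path argument is whatever file you point it at):

```python
# Plain-Python equivalent of sc.textFile(path).first():
# return the file's first line with the trailing newline stripped.
def first_line(path):
    with open(path) as f:
        return f.readline().rstrip("\r\n")
```

Running it against c:\data\test.txt should give the same line that spark-submit prints in the log below.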

Run it from cmd as follows:
cd C:\Spark\spark-2.3.3-bin-hadoop2.7\bin
spark-submit.cmd c:\py\my_script.py
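
One aside, purely as a config sketch (it was not needed for the run below): if spark-submit complains that it cannot find a Python interpreter, pointing the PYSPARK_PYTHON environment variable at one before submitting is the usual fix.

```shell
:: cmd.exe config sketch (hypothetical): choose the Python that PySpark uses
set PYSPARK_PYTHON=python
spark-submit.cmd c:\py\my_script.py
```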

Result: the first line (aaaaa....aaaa) is printed, as can be seen near the end of the log below.
C:\Spark\spark-2.3.3-bin-hadoop2.7\bin>spark-submit.cmd c:\py\my_script.py
2019-03-30 21:42:36 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-03-30 21:42:36 INFO  SparkContext:54 - Running Spark version 2.3.3
2019-03-30 21:42:37 INFO  SparkContext:54 - Submitted application: My App
2019-03-30 21:42:37 INFO  SecurityManager:54 - Changing view acls to: dwa
2019-03-30 21:42:37 INFO  SecurityManager:54 - Changing modify acls to: dwa
2019-03-30 21:42:37 INFO  SecurityManager:54 - Changing view acls groups to:
2019-03-30 21:42:37 INFO  SecurityManager:54 - Changing modify acls groups to:
2019-03-30 21:42:37 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(dwa); groups with view permissions: Set(); users  with modify permissions: Set(dwa); groups with modify permissions: Set()
2019-03-30 21:42:37 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 2952.
2019-03-30 21:42:37 INFO  SparkEnv:54 - Registering MapOutputTracker
2019-03-30 21:42:37 INFO  SparkEnv:54 - Registering BlockManagerMaster
2019-03-30 21:42:37 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-03-30 21:42:37 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-03-30 21:42:37 INFO  DiskBlockManager:54 - Created local directory at C:\Users\dwa\AppData\Local\Temp\blockmgr-4a04629e-c632-4b4d-8bb7-c31b06d78fe1
2019-03-30 21:42:37 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2019-03-30 21:42:37 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2019-03-30 21:42:37 INFO  log:192 - Logging initialized @2777ms
2019-03-30 21:42:37 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2019-03-30 21:42:37 INFO  Server:419 - Started @2880ms
2019-03-30 21:42:37 INFO  AbstractConnector:278 - Started ServerConnector@27668d0a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-03-30 21:42:37 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@495ee5f9{/jobs,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@506ffd12{/jobs/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2b2a6a7c{/jobs/job,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23ffb9e0{/jobs/job/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c5757e0{/stages,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7456a59c{/stages/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b85d257{/stages/stage,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@428ed0db{/stages/stage/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@75ca5b7{/stages/pool,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@686b5b94{/stages/pool/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@44895722{/storage,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70b54f60{/storage/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@12a63ddc{/storage/rdd,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@20a363e7{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2e1d1096{/environment,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7489847a{/environment/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@57620fea{/executors,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ce5e2ba{/executors/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2a3ad610{/executors/threadDump,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@12282d79{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@55727d89{/static,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@167f3a8f{/,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@ccc2aef{/api,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6fbbdfb4{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10fcc530{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-03-30 21:42:37 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://192.168.1.197:4040
2019-03-30 21:42:38 INFO  SparkContext:54 - Added file file:/c:/py/my_script.py at file:/c:/py/my_script.py with timestamp 1553949758351
2019-03-30 21:42:38 INFO  Utils:54 - Copying c:\py\my_script.py to C:\Users\dwa\AppData\Local\Temp\spark-84b82922-9621-413a-9939-2f8fb60831b1\userFiles-a1eb4f46-ecf3-4f10-8137-0f797fc66494\my_script.py
2019-03-30 21:42:38 INFO  Executor:54 - Starting executor ID driver on host localhost
2019-03-30 21:42:38 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 2961.
2019-03-30 21:42:38 INFO  NettyBlockTransferService:54 - Server created on 192.168.1.197:2961
2019-03-30 21:42:38 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-03-30 21:42:38 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 192.168.1.197, 2961, None)
2019-03-30 21:42:38 INFO  BlockManagerMasterEndpoint:54 - Registering block manager 192.168.1.197:2961 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.197, 2961, None)
2019-03-30 21:42:38 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 192.168.1.197, 2961, None)
2019-03-30 21:42:38 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 192.168.1.197, 2961, None)
2019-03-30 21:42:38 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@641f8eb7{/metrics/json,null,AVAILABLE,@Spark}
2019-03-30 21:42:39 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 236.7 KB, free 366.1 MB)
2019-03-30 21:42:39 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 366.0 MB)
2019-03-30 21:42:39 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 192.168.1.197:2961 (size: 22.9 KB, free: 366.3 MB)
2019-03-30 21:42:39 INFO  SparkContext:54 - Created broadcast 0 from textFile at <unknown>:0
2019-03-30 21:42:39 INFO  FileInputFormat:249 - Total input paths to process : 1
2019-03-30 21:42:39 INFO  SparkContext:54 - Starting job: runJob at PythonRDD.scala:152
2019-03-30 21:42:39 INFO  DAGScheduler:54 - Got job 0 (runJob at PythonRDD.scala:152) with 1 output partitions
2019-03-30 21:42:39 INFO  DAGScheduler:54 - Final stage: ResultStage 0 (runJob at PythonRDD.scala:152)
2019-03-30 21:42:39 INFO  DAGScheduler:54 - Parents of final stage: List()
2019-03-30 21:42:39 INFO  DAGScheduler:54 - Missing parents: List()
2019-03-30 21:42:39 INFO  DAGScheduler:54 - Submitting ResultStage 0 (PythonRDD[2] at RDD at PythonRDD.scala:52), which has no missing parents
2019-03-30 21:42:39 INFO  MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 5.8 KB, free 366.0 MB)
2019-03-30 21:42:39 INFO  MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.7 KB, free 366.0 MB)
2019-03-30 21:42:39 INFO  BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on 192.168.1.197:2961 (size: 3.7 KB, free: 366.3 MB)
2019-03-30 21:42:39 INFO  SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1039
2019-03-30 21:42:39 INFO  DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 0 (PythonRDD[2] at RDD at PythonRDD.scala:52) (first 15 tasks are for partitions Vector(0))
2019-03-30 21:42:39 INFO  TaskSchedulerImpl:54 - Adding task set 0.0 with 1 tasks
2019-03-30 21:42:39 INFO  TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7869 bytes)
2019-03-30 21:42:39 INFO  Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2019-03-30 21:42:39 INFO  Executor:54 - Fetching file:/c:/py/my_script.py with timestamp 1553949758351
2019-03-30 21:42:39 INFO  Utils:54 - c:\py\my_script.py has been previously copied to C:\Users\dwa\AppData\Local\Temp\spark-84b82922-9621-413a-9939-2f8fb60831b1\userFiles-a1eb4f46-ecf3-4f10-8137-0f797fc66494\my_script.py
2019-03-30 21:42:40 INFO  HadoopRDD:54 - Input split: file:/c:/data/test.txt:0+145
2019-03-30 21:42:40 INFO  PythonRunner:54 - Times: total = 432, boot = 421, init = 11, finish = 0
2019-03-30 21:42:40 INFO  Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 1642 bytes result sent to driver
2019-03-30 21:42:40 INFO  TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 766 ms on localhost (executor driver) (1/1)
2019-03-30 21:42:40 INFO  TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-03-30 21:42:40 INFO  PythonAccumulatorV2:54 - Connected to AccumulatorServer at host: 127.0.0.1 port: 2962
2019-03-30 21:42:40 INFO  DAGScheduler:54 - ResultStage 0 (runJob at PythonRDD.scala:152) finished in 0.885 s
2019-03-30 21:42:40 INFO  DAGScheduler:54 - Job 0 finished: runJob at PythonRDD.scala:152, took 0.950622 s
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2019-03-30 21:42:40 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2019-03-30 21:42:40 INFO  AbstractConnector:318 - Stopped Spark@27668d0a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-03-30 21:42:40 INFO  SparkUI:54 - Stopped Spark web UI at http://192.168.1.197:4040
2019-03-30 21:42:40 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-03-30 21:42:40 INFO  MemoryStore:54 - MemoryStore cleared
2019-03-30 21:42:40 INFO  BlockManager:54 - BlockManager stopped
2019-03-30 21:42:40 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2019-03-30 21:42:40 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-03-30 21:42:40 INFO  SparkContext:54 - Successfully stopped SparkContext
2019-03-30 21:42:40 INFO  ShutdownHookManager:54 - Shutdown hook called
2019-03-30 21:42:40 INFO  ShutdownHookManager:54 - Deleting directory C:\Users\dwa\AppData\Local\Temp\spark-1248e6a0-4008-4d9c-9c27-12dc8717c727
2019-03-30 21:42:40 INFO  ShutdownHookManager:54 - Deleting directory C:\Users\dwa\AppData\Local\Temp\spark-84b82922-9621-413a-9939-2f8fb60831b1
2019-03-30 21:42:40 INFO  ShutdownHookManager:54 - Deleting directory C:\Users\dwa\AppData\Local\Temp\spark-84b82922-9621-413a-9939-2f8fb60831b1\pyspark-35878223-740a-4a92-a1f5-ed490499cdf5