12.4 How to collect the data
Go to the MovieLens website:
http://grouplens.org/datasets/movielens/
Step1. Download the ml-100k data
mkdir -p ~/pythonwork/PythonProject/data
cd ~/pythonwork/PythonProject/data
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
Step2. Unzip the ml-100k data
unzip -j ml-100k
Step5. Start the Hadoop cluster
start-all.sh
Step6. Copy the ml-100k files to HDFS
hadoop fs -mkdir /user/hduser/data
hadoop fs -copyFromLocal -f ~/pythonwork/PythonProject/data /user/hduser/
hadoop fs -ls /user/hduser/data
12.5 Launching IPython Notebook
Step1. Launch Eclipse
Enter the working directory
start-all.sh
cd ~/pythonwork/ipynotebook
Run IPython Notebook in Hadoop YARN client mode
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
See Appendix A, "Downloading and installing the book's sample programs", section A.2, for how to open the sample file ch12.ipynb among the book's IPython Notebook samples.
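The ch12.ipynb notebook is not reproduced in this excerpt. As a rough illustration of what the ml-100k files copied to HDFS contain, the sketch below parses one line of u.data (tab-separated user id, item id, rating, timestamp) and one line of u.item (pipe-separated, starting with movie id and title) in plain Python. The function names parse_rating and parse_movie and the sample lines are assumptions made for illustration, not code from the book.

```python
# Hypothetical helpers (names are not from the book) for the ml-100k layout.

def parse_rating(line):
    """Split one tab-separated u.data line into (user_id, movie_id, rating)."""
    user_id, movie_id, rating, _timestamp = line.rstrip("\n").split("\t")
    return int(user_id), int(movie_id), float(rating)


def parse_movie(line):
    """Split one pipe-separated u.item line into (movie_id, title)."""
    fields = line.rstrip("\n").split("|")
    return int(fields[0]), fields[1]


if __name__ == "__main__":
    # Illustrative sample lines in the same layout as u.data and u.item
    # (the u.item line is shortened; real lines carry more fields).
    print(parse_rating("196\t242\t3\t881250949"))
    print(parse_movie("1|Toy Story (1995)|01-Jan-1995"))
```

In a notebook started with the pyspark command above, functions like these could be passed to an RDD, for example sc.textFile(path).map(parse_rating), where sc is the SparkContext that the pyspark shell provides and path is wherever u.data was copied.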
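12.10 Building the Recommend recommendation system
See Appendix A, "Downloading and installing the book's sample programs", section A.3, for how to open the Eclipse PythonProject sample programs: RecommendTrain.py and Recommend.py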
This figure is from the official Spark website https://spark.apache.org/
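The content above is excerpted from the following book, which is well suited to Python programmers learning Spark machine learning and big data architecture. Follow the links below for details about the book:
Python+Spark 2.0+Hadoop機器學習與大數據分析實戰 (Python + Spark 2.0 + Hadoop: Machine Learning and Big Data Analytics in Practice)
http://pythonsparkhadoop.blogspot.tw/2016/10/pythonspark-20hadoop.html
Books.com.tw online bookstore: http://www.books.com.tw/products/0010730134?loc=P_007_090
Tenlong online bookstore: https://www.tenlong.com.tw/items/9864341537?item_id=1023658
Ruten auction: http://goods.ruten.com.tw/item/show?21640846068139
Shopee: https://goo.gl/IEx13P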
Hello, teacher:
When I run RecommendTrain.py in Eclipse, the following error occurs. Could you give me some pointers? Thank you!
17/09/26 10:00:50 ERROR SparkContext: Error initializing SparkContext.
java.net.ConnectException: Call From master/192.168.56.100 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1473)
at org.apache.hadoop.ipc.Client.call(Client.java:1400)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:217)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:206)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:214)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:157)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:706)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:369)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1522)
at org.apache.hadoop.ipc.Client.call(Client.java:1439)
... 28 more
17/09/26 10:00:50 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/09/26 10:00:51 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Traceback (most recent call last):
File "/home/hduser/pythonwork/PythonProject/RecommendTrain.py", line 53, in <module>
sc=CreateSparkContext()
File "/home/hduser/pythonwork/PythonProject/RecommendTrain.py", line 23, in CreateSparkContext
sc = SparkContext(conf = sparkConf)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/context.py", line 168, in _do_init
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/context.py", line 233, in _initialize_context
File "/usr/local/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__
File "/usr/local/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError
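The trace above shows the YARN client failing to reach 0.0.0.0:8032. Port 8032 is the default for yarn.resourcemanager.address, so one possible cause (an assumption, not confirmed in this thread) is that the ResourceManager is not running, or that its address is not configured on this node. A minimal sketch for checking reachability before creating the SparkContext follows; the function name can_connect and the default host and port are illustrative.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed defaults: the local machine and port 8032, mirroring the
    # 0.0.0.0:8032 target in the trace. Pass the host from
    # yarn.resourcemanager.address in yarn-site.xml to check another node.
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8032
    if can_connect(host, port):
        print("ResourceManager port is reachable")
    else:
        print("Cannot reach %s:%d; check that YARN has been started" % (host, port))
```

If the port is unreachable, running start-all.sh and confirming with jps that a ResourceManager process is listed would be a reasonable next step.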
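Hello:
I ran into this problem when running Recommend.py (using the tutorial sample)
master=local[4]
==========Data preparation===============
Reading the movie ID to title lookup table...
==========Loading model===============
ALSModel model not found, please train first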
Traceback (most recent call last):
File "/home/hduser/pythonwork/PythonProject/Recommend.py", line 79, in <module>
model=loadModel(sc)
File "/home/hduser/pythonwork/PythonProject/Recommend.py", line 59, in loadModel
return model
UnboundLocalError: local variable 'model' referenced before assignment
I searched for similar error cases, and they say the local variable model is referenced before it is defined. How can I solve this? Thank you
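The traceback suggests that loadModel assigns model only inside a try block: when loading fails, the except branch prints the "not found" message and the later return model refers to a name that was never bound. The book's Recommend.py is not shown in this post, so the following is a hypothetical reconstruction of that pattern with one possible fix; load_fn stands in for the real model loader and is an assumption.

```python
def load_model_buggy(load_fn):
    """Pattern that raises UnboundLocalError when load_fn fails."""
    try:
        model = load_fn()
    except Exception:
        print("ALSModel model not found, please train first")
    return model  # 'model' was never assigned if load_fn raised


def load_model_fixed(load_fn):
    """Return the loaded model, or None when it cannot be loaded."""
    try:
        return load_fn()
    except Exception:
        print("ALSModel model not found, please train first")
        return None


def missing_model():
    """Stand-in loader that fails the way a missing model directory might."""
    raise IOError("model path does not exist")


if __name__ == "__main__":
    model = load_model_fixed(missing_model)
    if model is None:
        print("Run RecommendTrain.py first, then retry")
```

The fix only makes the failure explicit. The underlying condition reported by the program's own message is that no trained model was found, so training with RecommendTrain.py under the same master setting would still be needed before Recommend.py can load it.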
A model has already been saved under ALSmodel, so it must be deleted first
hadoop fs -rm -R /user/hduser/ALSmodel/data
hadoop fs -rm -R /user/hduser/ALSmodel/metadata
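The two removal commands above can also be issued from Python before a newly trained model is saved. This is only a sketch: the helper names are made up, the second subdirectory is assumed to be metadata (the name Spark MLlib uses when saving a model), and execution is skipped when the hadoop command is not on PATH.

```python
import shutil
import subprocess


def build_rm_commands(model_dir):
    """Build the hadoop fs commands that remove a saved model's subdirectories."""
    base = model_dir.rstrip("/")
    return [["hadoop", "fs", "-rm", "-R", base + "/" + sub]
            for sub in ("data", "metadata")]


def remove_saved_model(model_dir):
    """Run the removal commands; return False if hadoop is not installed."""
    if shutil.which("hadoop") is None:
        return False
    for cmd in build_rm_commands(model_dir):
        # check=False: a directory that is already gone should not abort the script
        subprocess.run(cmd, check=False)
    return True


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in build_rm_commands("/user/hduser/ALSmodel"):
        print(" ".join(cmd))
```

Calling remove_saved_model("/user/hduser/ALSmodel") just before the training script saves its model would clear the previous copy, under the assumptions stated above.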