9.1 Installing Anaconda
Step 1. Copy the Anaconda installer download URL
Open the Continuum archive page:
https://repo.continuum.io/archive/index.html
Step 2. Download Anaconda2-2.5.0-Linux-x86_64.sh
wget https://repo.continuum.io/archive/Anaconda2-2.5.0-Linux-x86_64.sh
Step 3. Install Anaconda
bash Anaconda2-2.5.0-Linux-x86_64.sh -b
Step 4. Edit ~/.bashrc to add the module paths
Edit ~/.bashrc:
sudo gedit ~/.bashrc
Enter the following content:
export PATH=/home/hduser/anaconda2/bin:$PATH
export ANACONDA_PATH=/home/hduser/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
Step 5. Make the ~/.bashrc changes take effect
source ~/.bashrc
Step 6. Check the Python version
python --version
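If python --version does not report the Anaconda build, the PATH change in ~/.bashrc has probably not taken effect in the current shell. As a further check, the following minimal Python sketch (assuming the /home/hduser/anaconda2 install location from Step 4) shows which interpreter is actually being used:
import sys
# Should point somewhere under /home/hduser/anaconda2 if Anaconda is active
print(sys.executable)
# The version string of an Anaconda build mentions "Anaconda"
print(sys.version)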
9.2 Using Spark in IPython Notebook
Step 1. Create the ipynotebook working directory
mkdir -p ~/pythonwork/ipynotebook
cd ~/pythonwork/ipynotebook
Step 2. Run pyspark under the IPython Notebook interface
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
Step 6. Run code in IPython Notebook
sc.master
Step 8. Code to read a local file
textFile=sc.textFile("file:/usr/local/spark/README.md")
textFile.count()
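Beyond counting lines, the same RDD can feed further transformations. The following is a minimal word-count sketch, run in the same notebook and assuming the README.md path used above:
# Split each line into words, pair each word with 1, then sum the counts per word
textFile = sc.textFile("file:/usr/local/spark/README.md")
counts = textFile.flatMap(lambda line: line.split()) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
# Display a few (word, count) pairs
counts.take(5)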
Step 9. Code to read a file from HDFS
textFile=sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
textFile.count()
For the complete contents of ch09.ipynb, see the book's appendix (Appendix A, Downloading and Installing the Book's Sample Programs) and download this chapter's IPython Notebook sample file.
9.7 Running IPython Notebook in Hadoop yarn-client mode
Start the Hadoop cluster, change to the working directory, and launch pyspark in yarn-client mode:
start-all.sh
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
Step 5. View the pyspark application in the Hadoop web UI:
http://localhost:8088/
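Besides checking the ResourceManager page, you can confirm from inside the notebook that pyspark is really running against YARN. A short sketch, assuming the HDFS file uploaded in the earlier word-count exercise:
# In yarn-client mode sc.master reports "yarn-client" (or simply "yarn" on newer Spark versions)
print(sc.master)
# Run a job so the application shows completed tasks in the YARN web UI
textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
print(textFile.count())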
9.8 Running IPython Notebook in Spark Standalone mode
Step 1. Start the Spark Standalone cluster
/usr/local/spark/sbin/start-all.sh
Step 2. Start IPython Notebook in Spark Standalone mode
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 2 --executor-memory 512m
Step 5. View the Spark Standalone Web UI:
http://master:8080/
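A similar in-notebook check confirms the Standalone connection and the resources requested above; a minimal sketch:
# sc.master should match the MASTER URL passed on the command line
print(sc.master)            # expected: spark://master:7077
# defaultParallelism normally reflects the executor cores granted (--total-executor-cores 2 here)
print(sc.defaultParallelism)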
9.9 Summary of commands for running IPython Notebook in different modes
9.9.1 Starting IPython Notebook in local mode
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*]
9.9.2 Starting IPython Notebook in Hadoop yarn-client mode
start-all.sh
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
9.9.3 Starting IPython Notebook in Spark Standalone mode
start-all.sh
/usr/local/spark/sbin/start-all.sh
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 3 --executor-memory 512m
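Whichever of the three start-up commands is used, the same short smoke test in the first notebook cell verifies that the driver can reach its executors; a minimal sketch:
# Confirm the Spark version the notebook is running against
print(sc.version)
# Distribute a small dataset and aggregate it to make sure executors respond
print(sc.parallelize(range(100)).sum())   # expected: 4950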
(Figure source: the official Apache Spark website, https://spark.apache.org/)