This article is for readers who still roll out the end-of-life CentOS on new projects, embracing the past rather than taking the next step; perhaps that is fine too, since familiar ground means fewer surprises.
Below is a step-by-step, one-command-at-a-time guide to a fully distributed Hadoop 2.6.0 cluster on three minimal CentOS 7.9 nodes (hadoop1 as the master, hadoop2 and hadoop3 as workers). Unless stated otherwise, all operations are performed on the master node as the regular user hadoop; switch to root only for system-level work.
| Node | IP | Roles |
| --- | --- | --- |
| hadoop1 (master) | 192.168.41.81 | NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer |
| hadoop2 (worker) | 192.168.41.82 | DataNode, NodeManager |
| hadoop3 (worker) | 192.168.41.83 | DataNode, NodeManager |
Because CentOS 7 has reached end of life, the official site no longer serves its online repositories. We can instead point yum at a domestic mirror that still hosts them, for example Aliyun's mirror at https://developer.aliyun.com/mirror/centos/. The commands are as follows:
curl https://mirrors.aliyun.com/repo/Centos-7.repo -o /etc/yum.repos.d/CentOS-Base.repo
# If wget is available, the following command works as well
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
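A quick sanity check of the new repo (a sketch; it assumes outbound access to the Aliyun mirror):
yum clean all          # drop metadata cached from the old repos
yum makecache fast     # rebuild the cache against the Aliyun mirror
yum repolist           # base/extras/updates should now list packages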
echo "nameserver 223.5.5.5" >> /etc/resolv.conf
echo "nameserver 223.6.6.6" >> /etc/resolv.conf # 关防火墙 & SELinux
systemctl stop firewalld && systemctl disable firewalld
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config && setenforce 0
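A quick check that both are really off (a sketch):
systemctl is-active firewalld   # expect "inactive"
getenforce                      # expect "Permissive" now, "Disabled" after a reboot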
# Install common packages
yum -y install vim wget net-tools rsync ntpdate
echo "*/5 * * * * /usr/sbin/ntpdate ntp.aliyun.com &>/dev/null" >> /var/spool/cron/root
# Create a regular user (do this on all three nodes)
useradd hadoop
passwd hadoop          # set a password; ssh-copy-id needs it later
echo 'hadoop ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
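To confirm the sudoers entry works before going further, a quick check (a sketch, run as root):
su - hadoop -c 'sudo -n true && echo "passwordless sudo OK"'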
# 2.1 Configure a static IP (example: /etc/sysconfig/network-scripts/ifcfg-ens33)
BOOTPROTO=static
IPADDR=192.168.41.81   # each machine uses its own IP (.82 and .83 on the other two)
NETMASK=255.255.255.0
GATEWAY=192.168.41.1   # adjust to your network's gateway
DNS1=223.5.5.5
ONBOOT=yes
systemctl restart network
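After the restart, verify the address, gateway and DNS in one go (a sketch; the interface name ens33 follows the example above):
ip addr show ens33              # should list the static address
ping -c 2 mirrors.aliyun.com    # checks routing and DNS together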
# 2.2 Hostname
hostnamectl set-hostname hadoop1 # the other two are hadoop2 / hadoop3 respectively
# 2.3 hosts mapping (identical on all 3 machines)
cat >> /etc/hosts <<EOF
192.168.41.81 hadoop1
192.168.41.82 hadoop2
192.168.41.83 hadoop3
EOF
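Before setting up SSH, it is worth confirming that the new names resolve; a short loop (a sketch, run on any node):
for h in hadoop1 hadoop2 hadoop3; do ping -c 1 $h; done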
# Switch to the hadoop user and set up passwordless SSH
su - hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for h in hadoop1 hadoop2 hadoop3; do
ssh-copy-id -i ~/.ssh/id_rsa.pub $h
done
# Verify
ssh hadoop2 date   # should log in and run without a password prompt
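Optionally check all three hosts in one pass; BatchMode makes ssh fail instead of prompting when a key is missing (a sketch):
for h in hadoop1 hadoop2 hadoop3; do ssh -o BatchMode=yes $h hostname; done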
# Create a directory to hold the JDK (needs root; chown it so the hadoop user can write)
sudo mkdir -p /opt/jdk
sudo chown hadoop:hadoop /opt/jdk
# Unpack the JDK
tar -zxf jdk-8u391-linux-x64.tar.gz -C /opt/jdk/
# Environment variables
sudo vi /etc/profile
# Press Shift+G to jump to the end of the file, press i or o to enter insert mode, and append the following lines
export JAVA_HOME=/opt/jdk/jdk1.8.0_391
export PATH=$JAVA_HOME/bin:$PATH
# When finished, press Esc to leave insert mode, then type :wq to save and quit
# Make the environment variables you just added take effect immediately
source /etc/profile
# Check the Java version to confirm the configuration is correct and Java runs.
java -version
# Unpack Hadoop
sudo tar -zxf hadoop-2.6.0.tar.gz -C /opt/
sudo mv /opt/hadoop-2.6.0 /opt/hadoop
sudo chown -R hadoop:hadoop /opt/hadoop
# Add Hadoop environment variables
sudo vi /etc/profile
# Add the following lines to the file
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Make the configuration take effect immediately
source /etc/profile
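A quick check that the new PATH entries are picked up (a sketch):
which hadoop        # should point at /opt/hadoop/bin/hadoop
hadoop version      # should report Hadoop 2.6.0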
The following 6 files are all edited under the $HADOOP_HOME/etc/hadoop/ directory.
6.1 hadoop-env.sh
export JAVA_HOME=/opt/jdk/jdk1.8.0_391
6.2 core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/data/tmp</value>
</property>
</configuration>
6.3 hdfs-site.xml
<configuration>
<property><name>dfs.replication</name><value>2</value></property>
<property><name>dfs.namenode.http-address</name><value>hadoop1:50070</value></property>
<property><name>dfs.namenode.secondary.http-address</name><value>hadoop1:50090</value></property>
</configuration>
6.4 mapred-site.xml (copy from the template first)
cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
</configuration>
6.5 yarn-site.xml
<configuration>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
<property><name>yarn.resourcemanager.hostname</name><value>hadoop1</value></property>
</configuration>
6.6 slaves (DataNode list)
cat > slaves <<EOF
hadoop2
hadoop3
EOF
# 7.1 Local directories
mkdir -p $HADOOP_HOME/data/tmp
# 7.2 Sync Hadoop to the workers (you can also do this manually with scp)
rsync -av /opt/hadoop/ hadoop2:/opt/hadoop
rsync -av /opt/hadoop/ hadoop3:/opt/hadoop
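Note that section 7.2 only copies /opt/hadoop, but the JDK must also exist on hadoop2 and hadoop3. A sketch using the same rsync approach, assuming /opt/jdk has been created and chowned to the hadoop user on the targets just as on hadoop1:
rsync -av /opt/jdk/ hadoop2:/opt/jdk
rsync -av /opt/jdk/ hadoop3:/opt/jdk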
# 7.3 Copy the environment variables over as well
scp /etc/profile hadoop2:/etc/profile   # /etc/profile is root-owned, so do this step as root
ssh hadoop2 'source /etc/profile && java -version && hadoop version'
scp /etc/profile hadoop3:/etc/profile
ssh hadoop3 'source /etc/profile && java -version && hadoop version'
hdfs namenode -format # "successfully formatted" in the output means it worked
start-dfs.sh # start the NameNode, SecondaryNameNode and DataNodes
start-yarn.sh # start the ResourceManager and NodeManagers
mr-jobhistory-daemon.sh start historyserver
# 9.1 Expected jps output
hadoop1: NameNode SecondaryNameNode ResourceManager JobHistoryServer
hadoop2: DataNode NodeManager
hadoop3: DataNode NodeManager
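To collect the jps output of all three nodes in one go, a sketch; the absolute path is used because a non-interactive ssh shell does not source /etc/profile:
for h in hadoop1 hadoop2 hadoop3; do
  echo "== $h =="
  ssh $h /opt/jdk/jdk1.8.0_391/bin/jps
done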
# 9.2 Cluster report
hdfs dfsadmin -report # Live datanodes = 2
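As a final smoke test, write a file into HDFS and run the bundled pi example (a sketch; the examples jar sits at its stock location inside the 2.6.0 tarball):
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/hosts /user/hadoop/
hdfs dfs -ls /user/hadoop
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10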
# 9.3 Web UI
http://192.168.41.81:50070 (HDFS)
http://192.168.41.81:8088 (YARN)
http://192.168.41.81:19888 (JobHistory)
stop-all.sh # stop the whole cluster with one command
start-all.sh # start the whole cluster with one command
hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
yarn-daemon.sh start|stop resourcemanager|nodemanager
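The per-daemon scripts are handy for restarting a single worker, for example bringing hadoop2 back after maintenance. A sketch, using absolute paths because the remote non-interactive shell does not source /etc/profile:
ssh hadoop2 /opt/hadoop/sbin/hadoop-daemon.sh start datanode
ssh hadoop2 /opt/hadoop/sbin/yarn-daemon.sh start nodemanager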