Currently, only parameters in the following files can be customized:
- HDFS: core-site.xml, hdfs-site.xml, hadoop-env.sh, log4j.properties
- YARN: yarn-site.xml, mapred-site.xml, fair-scheduler.xml, capacity-scheduler.xml, yarn-env.sh, mapred-env.sh
- Hive: hive-site.xml, hive-env.sh, hive-log4j2.properties
[
  {
    "serviceName": "HDFS",
    "classification": "hdfs-site.xml",
    "serviceVersion": "2.8.4",
    "properties": {
      "dfs.blocksize": "67108864",
      "dfs.client.slow.io.warning.threshold.ms": "900000",
      "output.replace-datanode-on-failure": "false"
    }
  },
  {
    "serviceName": "YARN",
    "classification": "yarn-site.xml",
    "serviceVersion": "2.8.4",
    "properties": {
      "yarn.app.mapreduce.am.staging-dir": "/emr/hadoop-yarn/staging",
      "yarn.log-aggregation.retain-check-interval-seconds": "604800",
      "yarn.scheduler.minimum-allocation-vcores": "1"
    }
  },
  {
    "serviceName": "YARN",
    "classification": "capacity-scheduler.xml",
    "serviceVersion": "2.8.4",
    "properties": {
      "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>\n<configuration><property>\n <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>\n <value>0.8</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.maximum-applications</name>\n <value>1000</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.capacity</name>\n <value>100</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>\n <value>100</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>\n <value>1</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.queues</name>\n <value>default</value>\n</property>\n</configuration>"
    }
  }
]
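For scheduler files that are passed as a single content value (as in the capacity-scheduler.xml entry above), the quote and newline escaping inside the JSON string can be delegated to a JSON serializer instead of being written by hand. A minimal sketch, assuming the file body is already available as a string (the helper name and the shortened XML are illustrative, not part of the API):

```python
import json

def scheduler_entry(service, classification, version, xml_text):
    """Build one configuration entry whose entire file body goes
    into the single 'content' property, as required for
    capacity-scheduler.xml and fair-scheduler.xml."""
    return {
        "serviceName": service,
        "classification": classification,
        "serviceVersion": version,
        "properties": {"content": xml_text},
    }

# Illustrative, shortened scheduler file body with real newlines and quotes.
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<configuration><property>\n"
    " <name>yarn.scheduler.capacity.root.queues</name>\n"
    " <value>default</value>\n"
    "</property>\n</configuration>"
)

entry = scheduler_entry("YARN", "capacity-scheduler.xml", "2.8.4", xml_text)
# json.dumps escapes the embedded quotes and newlines correctly.
print(json.dumps([entry]))
```

Serializing with `json.dumps` avoids the easy-to-get-wrong manual escaping of `"` and line breaks inside the content string.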
For capacity-scheduler.xml or fair-scheduler.xml, set the key in properties to content, and set the value to the content of the entire file.

The nameservice of the external cluster to be accessed is HDFS8088, and its access configuration is as follows:

<property>
  <name>dfs.ha.namenodes.HDFS8088</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4007</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4007</value>
</property>
[
  {
    "serviceName": "HDFS",
    "classification": "hdfs-site.xml",
    "serviceVersion": "2.7.3",
    "properties": {
      "newNameServiceName": "newEmrCluster",
      "dfs.ha.namenodes.HDFS8088": "nn1,nn2",
      "dfs.namenode.http-address.HDFS8088.nn1": "172.21.16.11:4008",
      "dfs.namenode.https-address.HDFS8088.nn1": "172.21.16.11:4009",
      "dfs.namenode.rpc-address.HDFS8088.nn1": "172.21.16.11:4007",
      "dfs.namenode.http-address.HDFS8088.nn2": "172.21.16.40:4008",
      "dfs.namenode.https-address.HDFS8088.nn2": "172.21.16.40:4009",
      "dfs.namenode.rpc-address.HDFS8088.nn2": "172.21.16.40:4007"
    }
  }
]
newNameServiceName is the nameservice of the newly created cluster and is optional. If it is left empty, its value is generated by the system; if it is not empty, its value must consist of letters, digits, and hyphens.

Note:
- Access to external clusters is supported only for high-availability clusters.
- Access to external clusters is supported only for clusters with Kerberos disabled.
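Assuming the constraint above means letters, digits, and hyphens only, a simple client-side check before submitting the request could look like this (the function name is illustrative, not part of the API):

```python
import re
from typing import Optional

# Assumed interpretation of the documented constraint:
# letters, digits, and hyphens only.
_NAMESERVICE_RE = re.compile(r"^[A-Za-z0-9-]+$")

def valid_nameservice(name: Optional[str]) -> bool:
    """Return True if `name` may be sent as newNameServiceName.

    None or empty means "let the system generate the value",
    which the documentation says is allowed."""
    if not name:
        return True
    return bool(_NAMESERVICE_RE.fullmatch(name))
```

For example, `valid_nameservice("newEmrCluster")` passes, while a name containing underscores or other punctuation does not.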
The nameservice of the cluster is HDFS80238 (if it is not a high-availability cluster, the nameservice is usually masterIp:rpcport, such as 172.21.0.11:4007).
The nameservice of the external cluster to be accessed is HDFS8088, and its access configuration is as follows:

<property>
  <name>dfs.ha.namenodes.HDFS8088</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4007</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4007</value>
</property>
Modify the /usr/local/service/hadoop/etc/hadoop/hdfs-site.xml file, i.e., the hdfs-site.xml file of the HDFS component. Set dfs.nameservices to HDFS80238,HDFS8088 and add the following configuration items:

| Configuration Item | Value |
| --- | --- |
| dfs.ha.namenodes.HDFS8088 | nn1,nn2 |
| dfs.namenode.http-address.HDFS8088.nn1 | 172.21.16.11:4008 |
| dfs.namenode.https-address.HDFS8088.nn1 | 172.21.16.11:4009 |
| dfs.namenode.rpc-address.HDFS8088.nn1 | 172.21.16.11:4007 |
| dfs.namenode.http-address.HDFS8088.nn2 | 172.21.16.40:4008 |
| dfs.namenode.https-address.HDFS8088.nn2 | 172.21.16.40:4009 |
| dfs.namenode.rpc-address.HDFS8088.nn2 | 172.21.16.40:4007 |
| dfs.client.failover.proxy.provider.HDFS8088 | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider |
| dfs.internal.nameservices | HDFS80238 |
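The rows above can be rendered mechanically into the `<property>` fragments that hdfs-site.xml expects. A small sketch (the address values mirror the example topology above and are illustrative):

```python
# Mirror of the table above: configuration item -> value.
EXTERNAL_NS_PROPS = {
    "dfs.ha.namenodes.HDFS8088": "nn1,nn2",
    "dfs.namenode.http-address.HDFS8088.nn1": "172.21.16.11:4008",
    "dfs.namenode.https-address.HDFS8088.nn1": "172.21.16.11:4009",
    "dfs.namenode.rpc-address.HDFS8088.nn1": "172.21.16.11:4007",
    "dfs.namenode.http-address.HDFS8088.nn2": "172.21.16.40:4008",
    "dfs.namenode.https-address.HDFS8088.nn2": "172.21.16.40:4009",
    "dfs.namenode.rpc-address.HDFS8088.nn2": "172.21.16.40:4007",
    "dfs.client.failover.proxy.provider.HDFS8088":
        "org.apache.hadoop.hdfs.server.namenode.ha."
        "ConfiguredFailoverProxyProvider",
    "dfs.internal.nameservices": "HDFS80238",
}

def to_property_xml(props: dict) -> str:
    """Render a dict as the <property> fragments used in hdfs-site.xml."""
    parts = []
    for name, value in props.items():
        parts.append(
            f"<property>\n  <name>{name}</name>\n"
            f"  <value>{value}</value>\n</property>"
        )
    return "\n".join(parts)

print(to_property_xml(EXTERNAL_NS_PROPS))
```

Once dfs.nameservices lists both HDFS80238 and HDFS8088 and these items are in place, paths on the external cluster can be addressed through the HDFS8088 nameservice.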
The dfs.internal.nameservices item needs to be added; otherwise, if the cluster is scaled out, the DataNode may report an error and be marked as dead by the NameNode.