“Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS”, published by Addison-Wesley

HDFS

NameNode

DataNode

YARN, Hadoop

[Figure: YARN memory]

CLI

$ jps # java processes
$ hdfs dfs -ls / # like ls but for HDFS
$ pdsh # run a shell command on multiple hosts in parallel
$ hdfs dfsadmin -report # stats about HDFS
$ hdfs dfs -copyFromLocal file.txt /input/file
$ hdfs dfs -get /out/result 
$ hdfs dfs -cat /out/result
$ hdfs dfsadmin -printTopology # rack assignment of each DataNode
$ hdfs fsck / # filesystem health check; the summary shows the number of DataNodes and racks
$ hdfs dfs -df # free space
$ hdfs dfs -du -h / # usage
$ hdfs dfs -test -e /abc # exit status 0 if the path exists
$ ethtool eth0 # NIC settings, including the link speed of the machine
$ hdfs storagepolicies -listPolicies
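$ hdfs dfs -put # copy local files into HDFS
$ hdfs dfs -get # copy HDFS files to the local filesystem
$ hdfs dfs -mv # move or rename within HDFS
$ hdfs dfs -tail # print the last kilobyte of a file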
$ hadoop distcp srcdir destdir # copy between clusters! 
$ yarn application -movetoqueue appID -queue myq
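$ yarn top # live view of queues and running applications
$ yarn application -kill appID
$ yarn node -list # NodeManagers and their state
$ yarn logs -applicationId appID # aggregated logs of an application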
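$ yarn rmadmin -refreshNodes # reload the node include/exclude lists
$ yarn rmadmin -transitionToActive # HA: make a ResourceManager active
$ yarn rmadmin -failover # HA: fail over between ResourceManagers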
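
The exit status of `hdfs dfs -test -e` makes it easy to script around the commands above. The following is only a minimal sketch: the function name `upload_if_absent` and the example paths are made up for illustration, and it assumes a configured Hadoop client with `hdfs` on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: upload a local file to HDFS only if the destination does not exist yet.
# upload_if_absent is a hypothetical helper, not part of Hadoop.

upload_if_absent() {
    local src="$1" dest="$2"
    # `hdfs dfs -test -e` exits with status 0 when the path exists
    if hdfs dfs -test -e "$dest"; then
        echo "skip: $dest already exists"
    else
        # copy the local file into HDFS and report only on success
        hdfs dfs -put "$src" "$dest" && echo "uploaded: $src -> $dest"
    fi
}

# Example call (needs a running cluster, so it is left commented out):
# upload_if_absent file.txt /input/file
```

Checking first avoids the error that `-put` reports when the destination already exists, which is handy in cron jobs that may run twice.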

Applications

MapReduce

Hadoop Streaming

Hive

Pig

Spark

[Figure: Spark memory]

Sqoop

Flume

Spout

Oozie