
dragon - an OSS File System for Hadoop

OSS (Object Storage Service) is Aliyun’s distributed, highly available object store. dragon lets YARN applications, including MapReduce jobs, read and write data in OSS through Hadoop’s FileSystem API, the same interface that HDFS implements. As a result, you can swap filesystems without modifying your YARN applications, MapReduce jobs, Pig scripts, or Hive scripts.

The source code is available on GitHub. The dragon jar may be built using

$ ./gradlew jar

dragon’s dependencies may be downloaded using

$ ./gradlew copyDeps

To use dragon in your Hadoop cluster, add dragon-*.jar and its dependencies to Hadoop’s classpath on every node of the cluster, then add the following properties to core-site.xml:

  • fs.oss.impl: com.quixey.hadoop.fs.oss.OSSFileSystem
  • fs.oss.accessKeyId: ...
  • fs.oss.secretAccessKey: ...

The fs.oss.impl property tells Hadoop clients to use dragon’s FileSystem implementation for every URI with the oss:// scheme; the other two properties supply your OSS credentials.
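In core-site.xml, these properties use Hadoop’s standard <configuration>/<property>/<name>/<value> layout. The sketch below embeds such a file (credential values left as the ... placeholders) and uses the JDK’s built-in DOM parser to confirm that all three dragon properties are present. CoreSiteCheck and its methods are hypothetical helpers for illustration, not part of dragon or Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of dragon or Hadoop): reads a core-site.xml
// document and reports which dragon properties are missing or empty.
public class CoreSiteCheck {

    // Example core-site.xml; the credential values are left as "..." placeholders.
    static final String SAMPLE_CORE_SITE =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property>\n"
        + "    <name>fs.oss.impl</name>\n"
        + "    <value>com.quixey.hadoop.fs.oss.OSSFileSystem</value>\n"
        + "  </property>\n"
        + "  <property>\n"
        + "    <name>fs.oss.accessKeyId</name>\n"
        + "    <value>...</value>\n"
        + "  </property>\n"
        + "  <property>\n"
        + "    <name>fs.oss.secretAccessKey</name>\n"
        + "    <value>...</value>\n"
        + "  </property>\n"
        + "</configuration>\n";

    // The three properties dragon needs, as listed above.
    static final List<String> REQUIRED = Arrays.asList(
        "fs.oss.impl", "fs.oss.accessKeyId", "fs.oss.secretAccessKey");

    // Text of the first <tag> element inside parent, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    // Collects every <property> name/value pair into an ordered map.
    static Map<String, String> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse core-site.xml", e);
        }
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = text(p, "name");
            if (!name.isEmpty()) {
                props.put(name, text(p, "value"));
            }
        }
        return props;
    }

    // Required keys that are absent or have an empty value.
    static List<String> missing(Map<String, String> props) {
        List<String> out = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = parse(SAMPLE_CORE_SITE);
        System.out.println("fs.oss.impl = " + props.get("fs.oss.impl"));
        System.out.println("missing: " + missing(props));
    }
}
```

Feeding parse the contents of a real core-site.xml before deploying would catch a missing or misspelled key. It only checks that the keys are present, not that the credentials are valid.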

You may verify that your setup works by running the following command:

$ hdfs dfs -ls oss://your-bucket/
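When a command like the one above runs, the Hadoop client picks a FileSystem implementation from the URI’s scheme, which is what fs.oss.impl controls. The following is a simplified, hypothetical model of that lookup (SchemeLookup and implFor are illustrative names, not Hadoop APIs). Hadoop’s real org.apache.hadoop.fs.FileSystem also caches instances and can fall back to implementations registered through Java’s ServiceLoader; this sketch models only the configuration key.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Simplified model (hypothetical): map a URI's scheme to the class name
// configured under "fs.<scheme>.impl".
public class SchemeLookup {

    // Returns the implementation class name configured for the URI's scheme.
    static String implFor(Map<String, String> conf, String uri) {
        String scheme = URI.create(uri).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            // No implementation configured for this scheme.
            throw new IllegalStateException("No FileSystem configured for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.oss.impl", "com.quixey.hadoop.fs.oss.OSSFileSystem");
        System.out.println(implFor(conf, "oss://your-bucket/"));
    }
}
```

Because the choice is driven by the scheme alone, the same job can read hdfs:// paths and write oss:// paths without any code changes.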

(dragon is an open-source project by Quixey.)

© James Lim. Built using Pelican. Theme by Giulio Fidente on github.