Posted on Wed 29 October 2014

dragon - an OSS File System for Hadoop

OSS (Object Storage Service) is Aliyun’s distributed, highly available storage service. dragon lets YARN applications (including MapReduce jobs) read data from and write data to OSS through the HDFS API. You can therefore swap filesystems without modifying your YARN application, MapReduce job, Pig script, Hive script, etc.

The source code is available on GitHub. The dragon jar may be built using:

$ ./gradlew jar

dragon’s dependencies may be downloaded using:

$ ./gradlew copyDeps

To use dragon in your Hadoop cluster, add dragon-*.jar and its dependencies to Hadoop’s classpath. Then modify core-site.xml to add the following properties:

<property>
    <name>fs.oss.impl</name>
    <value>com.quixey.hadoop.fs.oss.OSSFileSystem</value>
</property>
<property>
    <name>fs.oss.accessKeyId</name>
    <value>...</value>
</property>
<property>
    <name>fs.oss.secretAccessKey</name>
    <value>...</value>
</property>
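
If you manage more than one cluster, the same edit can be scripted. The following is a minimal sketch in Python, assuming core-site.xml has the usual <configuration> root element. The helper names (set_property, add_dragon_properties) are hypothetical and not part of dragon, and the credential values are left as placeholders.

```python
import xml.etree.ElementTree as ET

# Property names are taken from the snippet above; credentials stay as placeholders.
DRAGON_PROPERTIES = {
    "fs.oss.impl": "com.quixey.hadoop.fs.oss.OSSFileSystem",
    "fs.oss.accessKeyId": "...",
    "fs.oss.secretAccessKey": "...",
}


def set_property(root, name, value):
    """Set the <property> called `name` to `value`, adding it if it is missing."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def add_dragon_properties(xml_text, properties=DRAGON_PROPERTIES):
    """Return core-site.xml content with the given properties applied."""
    root = ET.fromstring(xml_text)
    for name, value in properties.items():
        set_property(root, name, value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_dragon_properties("<configuration></configuration>"))
```

Running the sketch twice on the same input yields the same three properties, so it is safe to re-apply. Note that ElementTree's default parser discards comments and processing instructions, so review the output before replacing a hand-maintained file.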

This tells Hadoop clients to use dragon’s file system implementation for all URIs with the oss:// scheme; the other two properties supply your OSS credentials.
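
The scheme-based lookup can be pictured with a toy model. This is a sketch in Python, not Hadoop's actual code: it assumes only that the client derives the configuration key fs.<scheme>.impl from the URI's scheme and reads the class name stored under it. The function names impl_key_for and resolve_impl are hypothetical.

```python
from urllib.parse import urlparse


def impl_key_for(uri):
    """Return the configuration key consulted for this URI's scheme."""
    scheme = urlparse(uri).scheme
    if not scheme:
        raise ValueError(f"URI has no scheme: {uri!r}")
    return f"fs.{scheme}.impl"


def resolve_impl(uri, conf):
    """Return the class name configured for the URI's scheme, or None."""
    return conf.get(impl_key_for(uri))


# A stand-in for the relevant part of core-site.xml.
conf = {"fs.oss.impl": "com.quixey.hadoop.fs.oss.OSSFileSystem"}

print(resolve_impl("oss://your-bucket/logs/part-00000", conf))
print(resolve_impl("hdfs://namenode/tmp", conf))
```

In this model only oss:// URIs resolve to the dragon class; any scheme without an entry in the dictionary resolves to nothing, which is why the fs.oss.impl property has to be present before oss:// paths will work.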

You may verify that your setup works by running the following command:

$ hdfs dfs -ls oss://your-bucket/

(dragon is an open-source project by Quixey.)

Category: Software

Tags: hadoop, oss

Jim Lim