Reading Parquet files and inspecting their metadata with parquet-tools
# Build parquet-tools
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-tools/
mvn clean package -Plocal
# Show the schema of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema sample.parquet
# Print the full contents of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar cat sample.parquet
# Print only the first 5 records of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar head -n5 sample.parquet
# Show the metadata of the Parquet file
java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar meta sample.parquet
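
The four invocations above differ only in the subcommand, so a small shell function can save some typing. This is a minimal sketch and not part of the original gist: the function name pq and the PARQUET_TOOLS_JAR and DRY_RUN variables are hypothetical, and the default jar path assumes the jar sits in the current directory, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: run a parquet-tools subcommand against a file.
# PARQUET_TOOLS_JAR (assumed name) should point at the jar built above.
PARQUET_TOOLS_JAR="${PARQUET_TOOLS_JAR:-parquet-tools-1.6.0rc3-SNAPSHOT.jar}"

pq() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pq <schema|cat|head|meta> [options] <file.parquet>" >&2
        return 2
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it (handy for checking the jar path).
        echo "java -jar $PARQUET_TOOLS_JAR $*"
        return 0
    fi
    java -jar "$PARQUET_TOOLS_JAR" "$@"
}
```

With this loaded, `pq schema sample.parquet` or `pq head -n5 sample.parquet` should behave like the longer commands, and setting DRY_RUN=1 only prints the command that would be run.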

shyamsrai commented Oct 5, 2016

It seems like there is a resolution issue for one of the dependencies.

[hdfs@master parquet-tools]$ mvn clean package -Plocal
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Parquet Tools (Incubating) 1.6.0rc3-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for com.twitter:parquet-hadoop:jar:1.6.0rc3-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.677 s
[INFO] Finished at: 2016-10-05T17:57:24+00:00
[INFO] Final Memory: 7M/150M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project parquet-tools: Could not resolve dependencies for project com.twitter:parquet-tools:jar:1.6.0rc3-SNAPSHOT: Failure to find com.twitter:parquet-hadoop:jar:1.6.0rc3-SNAPSHOT in https://oss.sonatype.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of sonatype-nexus-snapshots has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

I fixed this by replacing the ${project.version} variable with 1.6.0 in the parquet-hadoop dependency of the pom.xml:

  <groupId>com.twitter</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <!-- <version>${project.version}</version> -->
  <version>1.6.0</version>
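
The same edit can be scripted. The sketch below is only an illustration, not necessarily how the change was made here: it writes a stand-in dependency fragment to a temporary file (the surrounding dependency elements and the second artifact, some-other-module, are made up for the example) and uses sed to swap ${project.version} for 1.6.0 only inside the parquet-hadoop dependency, leaving other entries untouched.

```shell
#!/bin/sh
# Stand-in for the relevant part of pom.xml (assumed structure, for illustration only).
pom="$(mktemp)"
cat > "$pom" <<'EOF'
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>${project.version}</version>
</dependency>
<dependency>
  <groupId>com.example</groupId>
  <artifactId>some-other-module</artifactId>
  <version>${project.version}</version>
</dependency>
EOF

# From the parquet-hadoop artifactId line to the next </dependency>,
# replace the version variable with the fixed version 1.6.0.
patched="$(sed '/<artifactId>parquet-hadoop<\/artifactId>/,/<\/dependency>/ s/\${project\.version}/1.6.0/' "$pom")"
rm -f "$pom"

# Show the patched fragment.
printf '%s\n' "$patched"
```

Against a real checkout, the same sed expression would be pointed at the parquet-tools pom.xml, ideally after making a backup copy of the file.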
