Examine Spark's execution plan on bucketed datasets and verify whether it is smart enough to avoid a wide dependency (i.e., a shuffle).
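A minimal sketch of such a check, meant to be pasted into spark-shell (where `spark` is the predefined session). The table names, column names, row counts, and bucket count are illustrative assumptions, not taken from any particular dataset. Broadcast joins are switched off here so that small test tables still produce a sort-merge join plan.

```scala
// Paste into spark-shell; `spark` is the SparkSession the shell provides.
// All table/column names and sizes below are hypothetical, for illustration only.
import org.apache.spark.sql.functions.col

// Disable automatic broadcast joins so small tables are not broadcast.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

val left = spark.range(0, 100000)
  .select((col("id") % 1000).as("key"), col("id").as("left_id"))
val right = spark.range(0, 100000)
  .select((col("id") % 1000).as("key"), col("id").as("right_id"))

// Write both sides bucketed (and sorted) on the join key, with the same bucket count.
// bucketBy requires saveAsTable; it cannot be combined with a plain save().
left.write.bucketBy(8, "key").sortBy("key").mode("overwrite").saveAsTable("bucketed_left")
right.write.bucketBy(8, "key").sortBy("key").mode("overwrite").saveAsTable("bucketed_right")

val joined = spark.table("bucketed_left").join(spark.table("bucketed_right"), "key")

// Print the physical plan and look for Exchange nodes under the join:
// if the bucketing is used, neither side should need an Exchange (shuffle).
joined.explain()
```

If an Exchange still shows up on either side, it is worth checking that both tables use the same bucket count and bucketing column, and that `spark.sql.sources.bucketing.enabled` has not been turned off.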
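PS: When trying things in spark-shell, note that for small datasets the join will probably show up as a broadcast exchange in the physical execution plan by default, because Spark broadcasts any table smaller than spark.sql.autoBroadcastJoinThreshold (10 MB by default). Example: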
./spark-shell