@yaooqinn
Created March 19, 2020 07:51
SPARK-31189
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Datetime patterns - Spark 3.1.0 Documentation</title>
<link rel="stylesheet" href="css/bootstrap.min.css">
<style>
body {
padding-top: 60px;
padding-bottom: 40px;
}
</style>
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="css/bootstrap-responsive.min.css">
<link rel="stylesheet" href="css/main.css">
<script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
<link rel="stylesheet" href="css/pygments-default.css">
</head>
<body>
<!--[if lt IE 7]>
<p class="chromeframe">You are using an outdated browser. <a href="https://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
<![endif]-->
<!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
<div class="navbar navbar-fixed-top" id="topbar">
<div class="navbar-inner">
<div class="container">
<div class="brand"><a href="index.html">
<img src="img/spark-logo-hd.png" style="height:50px;"/></a><span class="version">3.1.0</span>
</div>
<ul class="nav">
<!--TODO(andyk): Add class="active" attribute to li some how.-->
<li><a href="index.html">Overview</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="quick-start.html">Quick Start</a></li>
<li><a href="rdd-programming-guide.html">RDDs, Accumulators, Broadcasts Vars</a></li>
<li><a href="sql-programming-guide.html">SQL, DataFrames, and Datasets</a></li>
<li><a href="structured-streaming-programming-guide.html">Structured Streaming</a></li>
<li><a href="streaming-programming-guide.html">Spark Streaming (DStreams)</a></li>
<li><a href="ml-guide.html">MLlib (Machine Learning)</a></li>
<li><a href="graphx-programming-guide.html">GraphX (Graph Processing)</a></li>
<li><a href="sparkr.html">SparkR (R on Spark)</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="api/scala/org/apache/spark/index.html">Scala</a></li>
<li><a href="api/java/index.html">Java</a></li>
<li><a href="api/python/index.html">Python</a></li>
<li><a href="api/R/index.html">R</a></li>
<li><a href="api/sql/index.html">SQL, Built-in Functions</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="cluster-overview.html">Overview</a></li>
<li><a href="submitting-applications.html">Submitting Applications</a></li>
<li class="divider"></li>
<li><a href="spark-standalone.html">Spark Standalone</a></li>
<li><a href="running-on-mesos.html">Mesos</a></li>
<li><a href="running-on-yarn.html">YARN</a></li>
<li><a href="running-on-kubernetes.html">Kubernetes</a></li>
</ul>
</li>
<li class="dropdown">
<a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="configuration.html">Configuration</a></li>
<li><a href="monitoring.html">Monitoring</a></li>
<li><a href="tuning.html">Tuning Guide</a></li>
<li><a href="job-scheduling.html">Job Scheduling</a></li>
<li><a href="security.html">Security</a></li>
<li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
<li><a href="migration-guide.html">Migration Guide</a></li>
<li class="divider"></li>
<li><a href="building-spark.html">Building Spark</a></li>
<li><a href="https://spark.apache.org/contributing.html">Contributing to Spark</a></li>
<li><a href="https://spark.apache.org/third-party-projects.html">Third Party Projects</a></li>
</ul>
</li>
</ul>
<!--<p class="navbar-text pull-right"><span class="version-text">v3.1.0</span></p>-->
</div>
</div>
</div>
<div class="container-wrapper">
<div class="left-menu-wrapper">
<div class="left-menu">
<h3><a href="sql-programming-guide.html">Spark SQL Guide</a></h3>
<ul>
<li>
<a href="sql-getting-started.html">
Getting Started
</a>
</li>
<li>
<a href="sql-data-sources.html">
Data Sources
</a>
</li>
<li>
<a href="sql-performance-tuning.html">
Performance Tuning
</a>
</li>
<li>
<a href="sql-distributed-sql-engine.html">
Distributed SQL Engine
</a>
</li>
<li>
<a href="sql-pyspark-pandas-with-arrow.html">
PySpark Usage Guide for Pandas with Apache Arrow
</a>
</li>
<li>
<a href="sql-migration-old.html">
Migration Guide
</a>
</li>
<li>
<a href="sql-ref.html">
SQL Reference
</a>
</li>
<ul>
<li>
<a href="sql-ref-datatypes.html">
Data Types
</a>
</li>
<li>
<a href="sql-ref-null-semantics.html">
Null Semantics
</a>
</li>
<li>
<a href="sql-ref-nan-semantics.html">
NaN Semantics
</a>
</li>
<li>
<a href="sql-ref-ansi-compliance.html">
ANSI Compliance
</a>
</li>
<li>
<a href="sql-ref-syntax.html">
SQL Syntax
</a>
</li>
<li>
<a href="sql-ref-datetime-pattern.html">
<b>Datetime Pattern</b>
</a>
</li>
</ul>
</ul>
</div>
</div>
<input id="nav-trigger" class="nav-trigger" checked type="checkbox">
<label for="nav-trigger"></label>
<div class="content-with-sidebar" id="content">
<h1 class="title">Datetime Patterns for Formatting and Parsing</h1>
<p>There are several common scenarios for datetime usage in Spark:</p>
<ul>
<li>
<p>CSV/JSON data sources use the pattern string for parsing and formatting datetime content.</p>
</li>
<li>
<p>Datetime functions that convert <code class="highlighter-rouge">StringType</code> to and from <code class="highlighter-rouge">DateType</code> or <code class="highlighter-rouge">TimestampType</code>.
For example, <code class="highlighter-rouge">unix_timestamp</code>, <code class="highlighter-rouge">date_format</code>, <code class="highlighter-rouge">to_unix_timestamp</code>, <code class="highlighter-rouge">from_unixtime</code>, <code class="highlighter-rouge">to_date</code>, <code class="highlighter-rouge">to_timestamp</code>, <code class="highlighter-rouge">from_utc_timestamp</code>, <code class="highlighter-rouge">to_utc_timestamp</code>, etc.</p>
</li>
</ul>
<p>Spark uses <a href="https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html">java.time.format.DateTimeFormatter</a>
for formatting and parsing datetime values.
For formatting, Spark follows the behavior of a <code class="highlighter-rouge">DateTimeFormatter</code> built by
<a href="https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatterBuilder.html#appendPattern-java.lang.String-">DateTimeFormatterBuilder.appendPattern</a>
exactly. For parsing, the behavior differs slightly: the length of the second-fraction part can vary.
For instance, the pattern <code class="highlighter-rouge">'yyyy-MM-dd HH:mm:ss.SSS'</code> can parse timestamp strings with 1 to 3 digits after the decimal point,
but it always formats timestamps with a fraction part of exactly 3 digits.</p>
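<p>The following standalone Java sketch illustrates this difference using only the JDK, not Spark. It assumes that variable-width fraction parsing can be approximated with <code class="highlighter-rouge">DateTimeFormatterBuilder.appendFraction</code>; the class <code class="highlighter-rouge">FractionWidthSketch</code> is hypothetical and is not how Spark implements its parser. The plain <code class="highlighter-rouge">SSS</code> formatter always prints three digits and, on its own, rejects a one-digit fraction, while the builder-based parser accepts 1 to 3 digits.</p>

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

final class FractionWidthSketch {
    // Formatting: 'SSS' always prints exactly three fraction digits.
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Hypothetical lenient parser: accepts 1 to 3 fraction digits.
    // This only approximates the parsing behavior described above; it is not Spark's code.
    static final DateTimeFormatter PARSER = new DateTimeFormatterBuilder()
        .appendPattern("yyyy-MM-dd HH:mm:ss")
        // decimalPoint = true: the '.' is printed and parsed as part of the fraction
        .appendFraction(ChronoField.NANO_OF_SECOND, 1, 3, true)
        .toFormatter();

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, PARSER);
    }

    static String format(LocalDateTime value) {
        return FORMATTER.format(value);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2020-03-19 07:51:00.5", "2020-03-19 07:51:00.123"}) {
            // Parse with 1-3 digits, then re-format with a fixed 3-digit fraction.
            System.out.println(s + " formats as " + format(parse(s)));
        }
    }
}
```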
<p>Note that the pattern strings used here are similar, but not identical, to those of <a href="https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html">DateTimeFormatter</a>
or <a href="https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html">SimpleDateFormat</a>.
Following <code class="highlighter-rouge">SimpleDateFormat</code>, Spark uses &#8216;u&#8217; for the numeric day of week rather than for the year as <code class="highlighter-rouge">DateTimeFormatter</code> does. To avoid ambiguity, Spark also disallows the <code class="highlighter-rouge">DateTimeFormatter</code> letters &#8216;e&#8217; and &#8216;c&#8217;,
which also denote the day of week.</p>
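<p>The conflict over &#8216;u&#8217; can be reproduced with the JDK alone. In this sketch (hypothetical class <code class="highlighter-rouge">DayOfWeekLetterSketch</code>, no Spark involved), the same Thursday is formatted with &#8216;u&#8217; in both JDK APIs. Expressing the <code class="highlighter-rouge">SimpleDateFormat</code> meaning through the ISO <code class="highlighter-rouge">ChronoField.DAY_OF_WEEK</code> field (1 = Monday) is an assumption made for illustration, not a description of Spark's internal translation.</p>

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DayOfWeekLetterSketch {
    static final LocalDate DAY = LocalDate.of(2020, 3, 19); // a Thursday

    // java.time: 'u' is the proleptic year.
    static String javaTimeU() {
        return DateTimeFormatter.ofPattern("u", Locale.US).format(DAY);
    }

    // SimpleDateFormat: 'u' is the day number of week (1 = Monday, 7 = Sunday).
    static String simpleDateFormatU() {
        SimpleDateFormat sdf = new SimpleDateFormat("u", Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date date = Date.from(DAY.atStartOfDay(ZoneOffset.UTC).toInstant());
        return sdf.format(date);
    }

    // Assumed java.time equivalent of the SimpleDateFormat meaning:
    // the ISO day-of-week field, printed as a plain number.
    static String isoDayOfWeek() {
        return new DateTimeFormatterBuilder()
            .appendValue(ChronoField.DAY_OF_WEEK)
            .toFormatter(Locale.US)
            .format(DAY);
    }

    public static void main(String[] args) {
        System.out.println("java.time u:        " + javaTimeU());
        System.out.println("SimpleDateFormat u: " + simpleDateFormatU());
        System.out.println("ISO DAY_OF_WEEK:    " + isoDayOfWeek());
    }
}
```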
<p>Spark uses pattern letters in the following table for date and timestamp parsing and formatting:</p>
<table class="table">
<tr>
<th> <b>Symbol</b> </th>
<th> <b>Meaning</b> </th>
<th> <b>Presentation</b> </th>
<th> <b>Examples</b> </th>
</tr>
<tr>
<td> <b>G</b> </td>
<td> era </td>
<td> text </td>
<td> AD; Anno Domini; A </td>
</tr>
<tr>
<td> <b>y</b> </td>
<td> year </td>
<td> year </td>
<td> 2020; 20 </td>
</tr>
<tr>
<td> <b>D</b> </td>
<td> day-of-year </td>
<td> number </td>
<td> 189 </td>
</tr>
<tr>
<td> <b>M/L</b> </td>
<td> month-of-year </td>
<td> number/text </td>
<td> 7; 07; Jul; July; J </td>
</tr>
<tr>
<td> <b>d</b> </td>
<td> day-of-month </td>
<td> number </td>
<td> 28 </td>
</tr>
<tr>
<td> <b>Q/q</b> </td>
<td> quarter-of-year </td>
<td> number/text </td>
<td> 3; 03; Q3; 3rd quarter </td>
</tr>
<tr>
<td> <b>Y</b> </td>
<td> week-based-year </td>
<td> year </td>
<td> 1996; 96 </td>
</tr>
<tr>
<td> <b>w</b> </td>
<td> week-of-week-based-year </td>
<td> number </td>
<td> 27 </td>
</tr>
<tr>
<td> <b>W</b> </td>
<td> week-of-month </td>
<td> number </td>
<td> 4 </td>
</tr>
<tr>
<td> <b>E</b> </td>
<td> day-of-week </td>
<td> text </td>
<td> Tue; Tuesday; T </td>
</tr>
<tr>
<td> <b>u</b> </td>
<td> localized day-of-week </td>
<td> number/text </td>
<td> 2; 02; Tue; Tuesday; T </td>
</tr>
<tr>
<td> <b>F</b> </td>
<td> week-of-month </td>
<td> number </td>
<td> 3 </td>
</tr>
<tr>
<td> <b>a</b> </td>
<td> am-pm-of-day </td>
<td> text </td>
<td> PM </td>
</tr>
<tr>
<td> <b>h</b> </td>
<td> clock-hour-of-am-pm (1-12) </td>
<td> number </td>
<td> 12 </td>
</tr>
<tr>
<td> <b>K</b> </td>
<td> hour-of-am-pm (0-11) </td>
<td> number </td>
<td> 0 </td>
</tr>
<tr>
<td> <b>k</b> </td>
<td> clock-hour-of-day (1-24) </td>
<td> number </td>
<td> 24 </td>
</tr>
<tr>
<td> <b>H</b> </td>
<td> hour-of-day (0-23) </td>
<td> number </td>
<td> 0 </td>
</tr>
<tr>
<td> <b>m</b> </td>
<td> minute-of-hour </td>
<td> number </td>
<td> 30 </td>
</tr>
<tr>
<td> <b>s</b> </td>
<td> second-of-minute </td>
<td> number </td>
<td> 55 </td>
</tr>
<tr>
<td> <b>S</b> </td>
<td> fraction-of-second </td>
<td> fraction </td>
<td> 978 </td>
</tr>
<tr>
<td> <b>A</b> </td>
<td> milli-of-day </td>
<td> number </td>
<td> 1234 </td>
</tr>
<tr>
<td> <b>n</b> </td>
<td> nano-of-second </td>
<td> number </td>
<td> 987654321 </td>
</tr>
<tr>
<td> <b>N</b> </td>
<td> nano-of-day </td>
<td> number </td>
<td> 1234000000 </td>
</tr>
<tr>
<td> <b>V</b> </td>
<td> time-zone ID </td>
<td> zone-id </td>
<td> America/Los_Angeles; Z; -08:30 </td>
</tr>
<tr>
<td> <b>z</b> </td>
<td> time-zone name </td>
<td> zone-name </td>
<td> Pacific Standard Time; PST </td>
</tr>
<tr>
<td> <b>O</b> </td>
<td> localized zone-offset </td>
<td> offset-O </td>
<td> GMT+8; GMT+08:00; UTC-08:00; </td>
</tr>
<tr>
<td> <b>X</b> </td>
<td> zone-offset 'Z' for zero </td>
<td> offset-X </td>
<td> Z; -08; -0830; -08:30; -083015; -08:30:15; </td>
</tr>
<tr>
<td> <b>x</b> </td>
<td> zone-offset </td>
<td> offset-x </td>
<td> +0000; -08; -0830; -08:30; -083015; -08:30:15; </td>
</tr>
<tr>
<td> <b>Z</b> </td>
<td> zone-offset </td>
<td> offset-Z </td>
<td> +0000; -0800; -08:00; </td>
</tr>
<tr>
<td> <b>p</b> </td>
<td> pad next field with spaces </td>
<td> pad modifier </td>
<td> 1 </td>
</tr>
<tr>
<td> <b>'</b> </td>
<td> escape for text </td>
<td> delimiter </td>
<td></td>
</tr>
<tr>
<td> <b>''</b> </td>
<td> single quote </td>
<td> literal </td>
<td> ' </td>
</tr>
<tr>
<td> <b>[</b> </td>
<td> optional section start </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> <b>]</b> </td>
<td> optional section end </td>
<td> </td>
<td> </td>
</tr>
</table>
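<p>To see several of these letters side by side, the hypothetical sketch below formats one fixed timestamp with plain <code class="highlighter-rouge">java.time.format.DateTimeFormatter</code> and <code class="highlighter-rouge">Locale.US</code>. It shows JDK behavior, which Spark follows for formatting; Spark-specific letters such as &#8216;u&#8217; are deliberately left out, and text output depends on the locale. One JDK detail to keep in mind: the zone-ID letter must be doubled (<code class="highlighter-rouge">VV</code>), and a single &#8216;V&#8217; is rejected by <code class="highlighter-rouge">ofPattern</code>.</p>

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class PatternLetterSketch {
    // Tuesday, 2020-07-28 14:30:55.978 in Los Angeles (daylight time, UTC-07:00).
    static final ZonedDateTime SAMPLE = ZonedDateTime.of(
        2020, 7, 28, 14, 30, 55, 978_000_000, ZoneId.of("America/Los_Angeles"));

    // Locale.US keeps text fields (era, month, day, am-pm names) predictable.
    static String fmt(String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(SAMPLE);
    }

    public static void main(String[] args) {
        String[] patterns = {
            "G", "yyyy", "yy", "D", "M", "MMM", "MMMM", "d", "Q", "QQQ", "QQQQ",
            "Y", "w", "E", "EEEE", "a", "h", "K", "k", "H", "mm", "ss", "SSS",
            "A", "n", "N", "VV", "z", "O", "X", "xxx", "Z"
        };
        // Print each pattern next to its formatted value.
        for (String p : patterns) {
            System.out.println(p + "\t" + fmt(p));
        }
    }
}
```

<p>For example, &#8216;D&#8217; prints 210 (day of year in a leap year), &#8216;A&#8217; prints the milliseconds since midnight, and &#8216;N&#8217; prints the nanoseconds since midnight, while &#8216;n&#8217; prints only the nanoseconds within the current second.</p>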
<p>The count of pattern letters determines the format.</p>
<ul>
<li>
<p>Text: The text style is determined by the number of pattern letters used. Fewer than 4 pattern letters use the short form. Exactly 4 pattern letters use the full form. Exactly 5 pattern letters use the narrow form. Six or more letters will fail.</p>
</li>
<li>
<p>Number: If the count of letters is one, then the value is output using the minimum number of digits and without padding. Otherwise, the count of letters is used as the width of the output field, with the value zero-padded as necessary. The following pattern letters have constraints on the count of letters. Only one letter &#8216;F&#8217; can be specified. Up to two letters of &#8216;d&#8217;, &#8216;H&#8217;, &#8216;h&#8217;, &#8216;K&#8217;, &#8216;k&#8217;, &#8216;m&#8217;, and &#8216;s&#8217; can be specified. Up to three letters of &#8216;D&#8217; can be specified.</p>
</li>
<li>
<p>Number/Text: If the count of pattern letters is 3 or greater, use the Text rules above. Otherwise use the Number rules above.</p>
</li>
<li>
<p>Fraction: Use a run of contiguous &#8216;S&#8217; letters (<code class="highlighter-rouge">'S..S'</code>, at most 9) for parsing and formatting the fraction of second.
For parsing, the accepted fraction length is from 1 to the number of &#8216;S&#8217; letters; when the run is surrounded by <code class="highlighter-rouge">[]</code>, it is from 0 to the number of &#8216;S&#8217; letters.
For formatting, with or without surrounding <code class="highlighter-rouge">[]</code>, the fraction is zero-padded to the number of &#8216;S&#8217; letters.
Spark supports datetimes with at most microsecond precision, that is, six significant fraction digits, but it can parse a nano-of-second field, truncating the excess digits.</p>
</li>
<li>
<p>Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two-digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when &#8216;G&#8217; is not present.</p>
</li>
<li>
<p>Zone names: This outputs the display name of the time-zone ID. If the count of letters is one, two or three, then the short name is output. If the count of letters is four, then the full name is output. Five or more letters will fail.</p>
</li>
<li>
<p>Offset X and x: This formats the offset based on the number of pattern letters. One letter outputs just the hour, such as &#8216;+01&#8217;, unless the minute is non-zero in which case the minute is also output, such as &#8216;+0130&#8217;. Two letters output the hour and minute, without a colon, such as &#8216;+0130&#8217;. Three letters output the hour and minute, with a colon, such as &#8216;+01:30&#8217;. Four letters output the hour and minute and optional second, without a colon, such as &#8216;+013015&#8217;. Five letters output the hour and minute and optional second, with a colon, such as &#8216;+01:30:15&#8217;. Six or more letters will fail. Pattern letter &#8216;X&#8217; (upper case) will output &#8216;Z&#8217; when the offset to be output would be zero, whereas pattern letter &#8216;x&#8217; (lower case) will output &#8216;+00&#8217;, &#8216;+0000&#8217;, or &#8216;+00:00&#8217;.</p>
</li>
<li>
<p>Offset O: This formats the localized offset based on the number of pattern letters. One letter outputs the short form of the localized offset, which is localized offset text, such as &#8216;GMT&#8217;, with hour without leading zero, optional 2-digit minute and second if non-zero, and colon, for example &#8216;GMT+8&#8217;. Four letters output the full form, which is localized offset text, such as &#8216;GMT&#8217;, with 2-digit hour and minute field, optional second field if non-zero, and colon, for example &#8216;GMT+08:00&#8217;. Any other count of letters will fail.</p>
</li>
<li>
<p>Offset Z: This formats the offset based on the number of pattern letters. One, two or three letters output the hour and minute, without a colon, such as &#8216;+0130&#8217;. The output will be &#8216;+0000&#8217; when the offset is zero. Four letters output the full form of localized offset, equivalent to four letters of Offset-O. The output will be the corresponding localized offset text if the offset is zero. Five letters output the hour, minute, with optional second if non-zero, with colon. It outputs &#8216;Z&#8217; if the offset is zero. Six or more letters will fail.</p>
</li>
</ul>
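<p>Several of the letter-count rules above can be checked against the JDK directly. The sketch below (hypothetical class <code class="highlighter-rouge">LetterCountSketch</code>, JDK only, no Spark) exercises the Number, Year, and Offset X/x/Z rules. Spark adds its own checks on top of <code class="highlighter-rouge">DateTimeFormatter</code>, so treat this as an approximation of Spark's behavior rather than a specification of it.</p>

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class LetterCountSketch {
    static String format(String pattern, LocalDate date) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(date);
    }

    // Formats a fixed local time at the given offset; only offset letters are used below.
    static String offset(String pattern, ZoneOffset zone) {
        OffsetDateTime t = OffsetDateTime.of(2020, 3, 19, 7, 51, 0, 0, zone);
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(t);
    }

    // True if java.time rejects the pattern while building the formatter.
    static boolean rejected(String pattern) {
        try {
            DateTimeFormatter.ofPattern(pattern, Locale.US);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(1996, 1, 5);
        // Number: one letter uses minimal digits; more letters set a zero-padded width.
        System.out.println(format("d", date) + " " + format("dd", date) + " " + format("DDD", date));
        // Number: 'd' allows at most two letters.
        System.out.println("ddd rejected: " + rejected("ddd"));
        // Year: two letters print the last two digits and parse with base value 2000.
        System.out.println(format("yy", date));
        System.out.println(LocalDate.parse("69-01-05", DateTimeFormatter.ofPattern("yy-MM-dd")));
        // Offsets: compare X, x, XXX, Z and ZZZZZ at a zero offset and at +01:30.
        for (ZoneOffset zone : new ZoneOffset[] {ZoneOffset.UTC, ZoneOffset.ofHoursMinutes(1, 30)}) {
            System.out.println(offset("X", zone) + " " + offset("x", zone) + " "
                + offset("XXX", zone) + " " + offset("Z", zone) + " " + offset("ZZZZZ", zone));
        }
    }
}
```

<p>Note how the zero offset prints as &#8216;Z&#8217; for the &#8216;X&#8217; family and for five &#8216;Z&#8217; letters, but as digits for &#8216;x&#8217; and for one to three &#8216;Z&#8217; letters.</p>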
<p>More details for the text style:</p>
<ul>
<li>
<p>Short Form: Short text, typically an abbreviation. For example, day-of-week Monday might output &#8220;Mon&#8221;.</p>
</li>
<li>
<p>Full Form: Full text, typically the full description. For example, day-of-week Monday might output &#8220;Monday&#8221;.</p>
</li>
<li>
<p>Narrow Form: Narrow text, typically a single letter. For example, day-of-week Monday might output &#8220;M&#8221;.</p>
</li>
</ul>
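<p>The three text styles can be observed with a one-field pattern. This JDK-only sketch (hypothetical class <code class="highlighter-rouge">TextStyleSketch</code>) formats Monday, 2020-03-16, with <code class="highlighter-rouge">Locale.US</code>; other locales produce different text. It also shows that six letters are rejected when the pattern is compiled.</p>

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class TextStyleSketch {
    static final LocalDate MONDAY = LocalDate.of(2020, 3, 16);

    // Formats the sample Monday with a day-of-week pattern in the US locale.
    static String dayOfWeek(String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(MONDAY);
    }

    public static void main(String[] args) {
        // 1-3 letters: short form; 4 letters: full form; 5 letters: narrow form.
        for (String p : new String[] {"E", "EEE", "EEEE", "EEEEE"}) {
            System.out.println(p + ": " + dayOfWeek(p));
        }
        // Six letters fail while the formatter is being built.
        try {
            DateTimeFormatter.ofPattern("EEEEEE", Locale.US);
        } catch (IllegalArgumentException e) {
            System.out.println("EEEEEE: " + e.getMessage());
        }
    }
}
```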
</div>
<!-- /container -->
</div>
<script src="js/vendor/jquery-3.4.1.min.js"></script>
<script src="js/vendor/bootstrap.min.js"></script>
<script src="js/vendor/anchor.min.js"></script>
<script src="js/main.js"></script>
<!-- MathJax Section -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>
<script>
// Note that we load MathJax this way to work with local file (file://), HTTP and HTTPS.
// We could use "//cdn.mathjax...", but that won't support "file://".
(function(d, script) {
script = d.createElement('script');
script.type = 'text/javascript';
script.async = true;
script.onload = function(){
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ],
displayMath: [ ["$$","$$"], ["\\[", "\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
}
});
};
script.src = ('https:' == document.location.protocol ? 'https://' : 'http://') +
'cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js' +
'?config=TeX-AMS-MML_HTMLorMML';
d.getElementsByTagName('head')[0].appendChild(script);
}(document));
</script>
</body>
</html>