@lordmulder lordmulder/x264 log

Created Jun 26, 2020
------------------------------------
x264 version history
------------------------------------
commit 4c9b076be684832b9141f5b6c03aaf302adca0e4 [revision 3009]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 26 03:19:00 2020 +0300
Remove code for non-positive f_ip_factor/f_pb_factor
Currently they are guaranteed to be positive.
commit d1fee1e05249c84f642c13b6af010c331af21e1c [revision 3008]
Author: JHammler <j.hammler@gmail.com>
Date: Mon Jun 15 21:57:16 2020 +0200
configure: Fix building under the MSYS shell
commit 235ce6130168f4deee55c88ecda5ab84d81d125b [revision 3007]
Author: Sergei Trofimovich <slyfox@inbox.ru>
Date: Fri Jun 5 19:34:02 2020 +0200
configure: allow 'strings' override via STRINGS variable
This allows building x264 on systems where 'strings' or
'${HOST}-strings' does not exist, but llvm-strings exists.
commit 22fcbe12046b7d6ed7af5d7a47258f1f2ebd56c1 [revision 3006]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Jun 9 21:04:58 2020 +0200
x86inc: Fix warnings when using nasm 2.15
commit 32a7ee1ccc56b0bd98b13d91b49951b84d934d27 [revision 3005]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun May 24 17:15:35 2020 +0300
checkasm: increase float error margin to 1e-5
checkasm10 with seed=511142008 failed on win32 gcc builds.
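checkasm checks assembly output against the C reference, and float results can differ slightly between the two. Below is a minimal sketch of a tolerance-based comparison. It assumes the 1e-5 margin is applied relative to the larger magnitude. `floats_close` is a hypothetical helper, not checkasm's actual code.

```c
#include <stdbool.h>

/* Hypothetical helper: absolute value without depending on libm. */
static float abs_f(float x)
{
    return x < 0.0f ? -x : x;
}

/* Treat a and b as equal when they differ by at most `margin` relative to
 * the larger magnitude, so small rounding differences between the C
 * reference and SIMD code do not count as failures. Near zero, where a
 * relative error is meaningless, fall back to an absolute check. */
static bool floats_close(float a, float b, float margin)
{
    float diff  = abs_f(a - b);
    float scale = abs_f(a) > abs_f(b) ? abs_f(a) : abs_f(b);
    if (scale < 1.0f)
        return diff <= margin;
    return diff <= margin * scale;
}
```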
commit 6b28e585046a10f5b18ef873d67ac1f0cb3e2677 [revision 3004]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun May 24 16:35:00 2020 +0300
Fix data race
Closes videolan/x264#16.
Bug report by Zu-Ming Jiang.
commit 538f09b5b92eda0b6efe25e62fcc8542fc9f025d [revision 3003]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 26 02:56:25 2020 +0300
Fix undefined behavior: index out of bounds (one more)
last_non_b_pict_type is initialized to -1.
Bug report by Vitaly Buka.
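Because last_non_b_pict_type starts at -1, it needs a range check before it can be used as an array index. Below is a sketch of such a guard. The table and the names (`type_weight`, `weight_for`) are hypothetical stand-ins for x264's internals.

```c
/* Hypothetical slice types used as array indices. */
enum { SLICE_P = 0, SLICE_B = 1, SLICE_I = 2, SLICE_COUNT = 3 };

/* Hypothetical per-type table. */
static const double type_weight[SLICE_COUNT] = { 1.0, 1.4, 0.7 };

/* last_type is -1 until the first non-B frame has been seen, so range-check
 * it before indexing: reading type_weight[-1] is undefined behavior. */
static double weight_for(int last_type, double fallback)
{
    if (last_type < 0 || last_type >= SLICE_COUNT)
        return fallback;
    return type_weight[last_type];
}
```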
commit 526b6b5488fb3bf45b3838007e1da593b1fb409b [revision 3002]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 26 00:50:12 2020 +0300
Remove use of non-breaking spaces
commit 829f625cb975d24251e7fa37c1e5bdf548c8e2c1 [revision 3001]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 26 00:20:07 2020 +0300
Fix file encoding from Windows-1252 to UTF-8
commit 33f9e1474613f59392be5ab6a7e7abf60fa63622 [revision 3000]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 5 22:29:32 2020 +0300
Fix warning: comparison of integers of different signs [-Wsign-compare]
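-Wsign-compare fires when a signed value is compared with an unsigned one. In that case the usual arithmetic conversions turn a negative int into a very large unsigned value. Below is a sketch of the common fix, using a hypothetical `index_in_range` helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Writing `i < n` directly would convert i to size_t, so i == -1 would
 * compare as SIZE_MAX. Checking the sign first and then casting keeps the
 * logic correct and silences -Wsign-compare. */
static bool index_in_range(int i, size_t n)
{
    return i >= 0 && (size_t)i < n;
}
```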
commit 296494a4011f58f32adc54304a2654627558c59a [revision 2999]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 5 21:55:15 2020 +0300
Fix error "invalid size of malloc" for 10-bit encodes at i686
commit 9a1bd573c680e2339f359af3f4cf458bb71958f4 [revision 2998]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Mar 1 14:44:09 2020 +0300
Fix undefined behavior: shift exponent is negative
commit af2755cd18891bdfced9663030ccc6c2583b8a2a [revision 2997]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Mar 1 14:42:50 2020 +0300
Fix undefined behavior: access within misaligned address
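In C, loading a multi-byte value through a pointer that is not suitably aligned for its type is undefined behavior, even on CPUs that handle unaligned loads in hardware. Below is a sketch of the usual portable replacement, using a hypothetical `load_u32` helper.

```c
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from a possibly unaligned address. memcpy is well-defined
 * for any alignment, and compilers typically lower it to a single load
 * on targets that permit unaligned access. */
static uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```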
commit 7186cc92bee2551bb656d669205c2a68848a8f87 [revision 2996]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Mar 1 14:17:02 2020 +0300
Fix undefined behavior: applying [non-]zero offset to null pointer
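In C, pointer arithmetic on a null pointer is undefined, even with a zero offset. Below is a sketch of the guard, using a hypothetical `offset_or_null` helper.

```c
#include <stddef.h>

/* Only perform the addition when base is non-NULL; a NULL base stays NULL
 * instead of producing an undefined "NULL + offset". */
static const unsigned char *offset_or_null(const unsigned char *base, size_t offset)
{
    return base ? base + offset : NULL;
}
```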
commit dfac34deef1ab22c011a1314a07dd23bbbeac289 [revision 2995]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Mar 1 14:00:34 2020 +0300
Fix undefined behavior: index out of bounds
commit d2833a49f42f0e10569e6ab3dbd97807209b3515 [revision 2994]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Mar 1 13:38:46 2020 +0300
Fix undefined behavior: division by zero
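Integer division by zero is undefined behavior in C. UBSan's float-divide-by-zero check also flags floating-point division by zero, which would otherwise produce an infinity or NaN. Below is a sketch of a guarded division with a caller-chosen fallback. `safe_div` is hypothetical and does not reflect how x264 handles each individual case.

```c
/* Return num / den, or `fallback` when den is zero, so the result stays
 * finite and no sanitizer check is triggered. */
static double safe_div(double num, double den, double fallback)
{
    return den != 0.0 ? num / den : fallback;
}
```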
commit cc9d9e325e4546747c37ef06d4fa3bff35d1a740 [revision 2993]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Apr 9 01:11:23 2020 +0300
CI: Fix vlc-contrib URL for windows targets
commit 04e6c65e6b2878e58c6ff6d3c395a266caa26bb3 [revision 2992]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sat Feb 29 22:02:01 2020 +0300
Bump dates to 2020
commit 1771b556ee45207f8711744ccbd5d42a3949b14c [revision 2991]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Nov 25 17:38:57 2019 +0300
Check support for force_align_arg_pointer attribute
Closes videolan/x264#9.
commit 76669180821692303465b59de9c9e3933db32db2 [revision 2990]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Nov 25 14:58:43 2019 +0300
Fix float division by zero when encoding CRF+VBV
Bug report by Sam Panzer.
commit 7923c5818b50a3d8816eed222a7c43b418a73b36 [revision 2989]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Nov 15 03:04:16 2019 +0300
Limit maximum supported resolution
Also adds checks for other resolution-dependent buffers.
Closes videolan/x264#10.
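Limiting the resolution keeps buffer size computations from overflowing. Below is a sketch of an overflow-checked size computation. It assumes a hypothetical `frame_bytes` helper and a caller-supplied limit; x264's actual limits and checks are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute width * height * bytes_per_px, refusing any result that would
 * overflow size_t or exceed `limit`. On success the size goes to *out. */
static bool frame_bytes(int width, int height, size_t bytes_per_px,
                        size_t limit, size_t *out)
{
    if (width <= 0 || height <= 0 || bytes_per_px == 0)
        return false;
    size_t w = (size_t)width, h = (size_t)height;
    if (w > SIZE_MAX / h)
        return false;                 /* w * h would overflow */
    size_t px = w * h;
    if (px > SIZE_MAX / bytes_per_px)
        return false;                 /* px * bytes_per_px would overflow */
    size_t total = px * bytes_per_px;
    if (total > limit)
        return false;                 /* larger than the supported maximum */
    *out = total;
    return true;
}
```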
commit 7817004df0bf57e1eb83e8ef9c0c407477b59d71 [revision 2988]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Fri Nov 1 10:00:11 2019 +0100
aarch64: Use HAVE_NEON define during CPU detection
commit 7114174b23b1764b8f4b58ae9d0f8a422748df0f [revision 2987]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Nov 1 02:45:39 2019 +0300
aarch64: Fix compilation with disabled asm
commit b2e66daba6e82ccc433c019e2d61d2496e2a7cd5 [revision 2986]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Nov 1 00:10:22 2019 +0300
Export symbols only when building shared library
commit 0e227c47ce99ab26fc30e3326ce6d923d191e922 [revision 2985]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Oct 31 23:22:28 2019 +0300
Fix compilation of fprofiled shared build
commit 3759fcb7b48037a5169715ab89f80a0ab4801cdf [revision 2984]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed May 8 19:19:11 2019 +0300
Remove CRT objects use between DLL boundaries
Fix crash of MSVC builds compiled with --system-libx264 and /MT (default) CRT.
commit 76c5afc25b331cf98c63c6e313a90cd98c575858 [revision 2983]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Apr 22 22:18:01 2019 +0300
Fix MSVS build with ./configure --enable-shared --system-libx264
commit a615f027ed172e2dd5380e736d487aa858a0c4ff [revision 2982]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Mar 29 17:53:14 2019 +0300
Mark explicitly DSO public API symbols and hide all other by -fvisibility=hidden
Removes need for -Bsymbolic during linking.
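With -fvisibility=hidden, every symbol in a shared object stays local unless it is explicitly marked for export. Below is a sketch of the usual export-macro pattern, meant to be compiled with something like `cc -shared -fPIC -fvisibility=hidden`. `MYLIB_API` and the function names are hypothetical, and x264's own macro may differ.

```c
/* Hypothetical export macro: mark only the public API as visible. */
#if defined(_WIN32)
#  define MYLIB_API __declspec(dllexport)
#elif defined(__GNUC__)
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API
#endif

/* Internal helper: hidden from the dynamic symbol table under
 * -fvisibility=hidden, so it is not part of the library's ABI. */
int mylib_internal_add(int a, int b)
{
    return a + b;
}

/* Public entry point: explicitly exported. */
MYLIB_API int mylib_add(int a, int b)
{
    return mylib_internal_add(a, b);
}
```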
commit b5bc5d69c580429ff716bafcd43655e855c31b02 [revision 2981]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 30 17:47:25 2019 +0100
x86: Perform stack realignment in C instead of assembly
Simplifies a lot of code and avoids having to export public asm functions.
Note that the force_align_arg_pointer function attribute is broken in clang
versions prior to 6.0.1 which may result in crashes, so make sure to either
use a newer clang version or a different compiler.
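The force_align_arg_pointer attribute makes GCC and clang emit a prologue that realigns the stack. This lets a 32-bit x86 entry point tolerate callers that only keep 4-byte alignment. Below is a sketch, guarded so that it compiles unchanged on other targets. `REALIGN_STACK` and `api_entry` are hypothetical names.

```c
/* Apply the attribute only where it matters (32-bit x86 with GCC/clang);
 * elsewhere the macro expands to nothing. */
#if defined(__i386__) && defined(__GNUC__)
#  define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#  define REALIGN_STACK
#endif

/* Hypothetical public entry point. After the prologue realigns the stack,
 * the body and everything it calls see the alignment the compiler
 * normally assumes, even if the caller did not maintain it. */
REALIGN_STACK int api_entry(int x)
{
    return x * 2;
}
```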
commit 34c06d1c17ad968fbdda153cb772f77ee31b3095 [revision 2980]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Jul 12 15:23:29 2019 +0300
Strip git-hash from version in x264.pc
pkg-config doesn't like spaces in version string.
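Below is a C sketch that only illustrates the transformation; the change itself lives in the build scripts. It assumes a version string of the form "0.160.3009 4c9b076", where the git hash follows a space, and it uses a hypothetical `strip_git_hash` helper.

```c
#include <string.h>

/* Copy `version` into `out`, cutting it at the first space so that only
 * the dotted version number remains (pkg-config rejects spaces). */
static void strip_git_hash(const char *version, char *out, size_t out_size)
{
    if (out_size == 0)
        return;
    size_t n = strcspn(version, " ");   /* length up to first space or NUL */
    if (n >= out_size)
        n = out_size - 1;               /* truncate to fit the buffer */
    memcpy(out, version, n);
    out[n] = '\0';
}
```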
commit f9af2a0f71d0fca7c1cafa7657f03a302da0ca1c [revision 2979]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Jul 8 15:46:56 2019 +0300
Revert r2959: Signal Progressive and Constrained profiles
Some hardware decoders refuse to decode streams with non-zero
constraint_set4_flag/constraint_set5_flag.
commit 6d4947083a712c7dc2efca569c8149ffc8667eda [revision 2978]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Jun 14 19:57:36 2019 +0300
Fix x264_picture_alloc with X264_CSP_I400 colorspace
commit 6b1170cbbd4f5cf3170d9d79aa1182e863188b04 [revision 2977]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed May 8 17:52:15 2019 +0300
Shut up UBSan about uninitialized data read
Result was never used in that case.
commit f06062f51bc5928e6a364598357dbea2d7b83cd2 [revision 2976]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Apr 22 21:41:43 2019 +0300
Fix integer overflow detected by UBSan in --weightp analysis
Bug report by Xuezhi Yan.
commit 3147fa431627f1a00e54c8701d5ac07f1857c981 [revision 2975]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Apr 12 15:40:01 2019 +0300
checkasm: Fix heap-buffer-overflow read detected by ASan
commit 6381798d2d1339c0535732a764096b5345607981 [revision 2974]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Apr 12 15:38:08 2019 +0300
Fix heap-buffer-overflow read detected by ASan with interlaced encoding
Bug report by Hongxu Chen.
commit bd8a88be426baa903427a10de9f9ddb5e7c32812 [revision 2973]
Author: Konstantin Pavlov <thresh@videolan.org>
Date: Tue Jul 16 22:38:32 2019 +0300
CI: Bump macos target to darwin18
commit 352c02634d6d004c1d79ff5ccbbd2414ad32b67c [revision 2972]
Author: Konstantin Pavlov <thresh@videolan.org>
Date: Tue Jul 16 22:24:46 2019 +0300
CI: Use a newer aarch64 image
It now includes pkg-config, so lavf can be detected.
commit 98ee9d2f215326feeb221a4434957fa586d55c18 [revision 2971]
Author: Konstantin Pavlov <thresh@videolan.org>
Date: Fri Apr 5 15:08:29 2019 +0300
Added gitlab CI
Supported targets:
- debian amd64
- debian aarch64
- windows 32 bit
- windows 64 bit
- macos 64bit
The tests are run on all supported targets (via wine on windows).
The release jobs are only available on master/stable branches in the
videolan/x264 repository, and must be run manually when a developer
wishes to upload the artifacts.
commit 5493be84cdccecee613236c31b1e3227681ce428 [revision 2970]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Mar 14 14:31:22 2019 +0100
Fix warning in autocomplete.c when compiled with lavf
commit d4099dd4c722f52c4f3c14575d7d39eb8fadb97f [revision 2969]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Jun 6 02:30:41 2017 +0300
Remove compatibility workarounds
This will break decoding with older versions of FFmpeg/Libav.
commit 120ed3afe4bdef3f7f0ac2768e57da0d935e7536 [revision 2968]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Nov 9 18:37:17 2018 +0300
Remove h->rc dereferencing where possible
commit 3e5aed95cc470f37e2db3e6506a8deb89b527720 [revision 2967]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Feb 16 21:02:01 2019 +0100
x86inc: Add support for GFNI instructions
commit d3fa8b972557bad64c2e0247b0b5276c2d49961b [revision 2966]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Feb 16 17:57:21 2019 +0100
x86inc: Improve warnings for use of unsupported instructions
Warn when the following are used without the appropriate cpuflag:
* YMM and ZMM registers
* 'pextrw' with a memory operand
* GPR instruction set extensions
commit 101bd27d89cc84c18845046c13a67ab39e443a25 [revision 2965]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 31 20:42:32 2019 +0100
x86inc: Support N_PEXT bit on Mach-O
Allows for marking symbols as having limited global scope, similar to
using 'hidden' symbol visibility on ELF.
commit 6f85b3c4961810427cc4e8f520e0b706a321114d [revision 2964]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 31 20:21:43 2019 +0100
x86inc: Make 'non-adjacent' default in the TAIL_CALL macro
commit 82721eae6edddf4955634adc51bf6eb228cc1313 [revision 2963]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 31 20:17:56 2019 +0100
x86inc: Add x86-32 PIC support macros
commit b7e9935c3f08055a67a0fdea498499c675d00054 [revision 2962]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 31 20:11:01 2019 +0100
x86inc: Turn 'movsxd' into 'movifnidn' on x86-32
commit ec1d32302d0f1f59d3882e0289126b8d897c9f57 [revision 2961]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 31 20:08:40 2019 +0100
Bump dates to 2019
commit 74c051f2c4945cf2a279e36051537a2a1897c120 [revision 2960]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jul 1 20:34:48 2018 +0200
cli: Bash autocomplete support
Allows for automatic command line completion for both options and values.
Options such as --input-csp and --input-fmt will dynamically retrieve
supported values from libavformat when compiled with lavf support.
Execute 'source tools/bash-autocomplete.sh' in bash to enable.
commit 92d36908cbafd2a6edf7e61d69f341027b57f6f8 [revision 2959]
Author: Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>
Date: Mon Apr 9 11:01:28 2018 +0900
Signal Progressive and Constrained profiles
Progressive High, Constrained High, and Progressive High 10.
Even in Main profile, constraint_set4_flag is now set to 1 if progressive,
and constraint_set5_flag is set to 1 if no B-slices are present.
commit 57baac4ed7fe213a2c2bb07924c6c7cee8ac25f9 [revision 2958]
Author: Alexandra Hájková <alexandra.khirnova@gmail.com>
Date: Sat Sep 8 07:15:53 2018 +0000
ppc: Use xxpermdi in sad_x3/x4 and use macros to avoid redundant code
commit de380f4aed75b0a9bf5bdfc298a9901646184375 [revision 2957]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Thu Sep 6 12:25:14 2018 +0200
ppc: Use the vec_xst_len for partial stores in mc
About a 1% overall encoding speedup for --slow.
commit 69dfb2896cf3180fd59233b124b5589f12fb6a94 [revision 2956]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Thu Sep 6 12:25:13 2018 +0200
ppc: Use vec_splats in mc
No overall speedup, just tidier code.
commit 40688108dd13fc0bf1847a6dfc1cf86a728654fb [revision 2955]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Thu Aug 23 08:30:37 2018 +0000
ppc: Use the vec_xst_len for partial stores
Seems to give about a 1-2% overall speedup on --slow.
commit 0d111333bbd65b1a76b5c646abf802f45dd41e96 [revision 2954]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Sun Aug 19 17:27:55 2018 +0200
ppc: Use xxpermdi in VEC_STORE8
About a 2% overall encoding speedup for --slow.
commit 18262ee37fedeb4d7b30d9a228f2f38ef0e13cc1 [revision 2953]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Sun Aug 19 17:27:54 2018 +0200
ppc: Use a single store to write the scores for sad_x4_8x8
Yet another use of xxpermdi, another 10% gain.
commit 28fb2661161c12ee20c29d9bb2a75509a5af5327 [revision 2952]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Sun Aug 19 17:27:53 2018 +0200
ppc: Use xxpermdi to halve the computation in sad_x4_8x8
About 20% faster.
commit 83acefef8990302caf962c77e5a8189bb620ca6f [revision 2951]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Sun Aug 19 09:28:42 2018 +0200
ppc: Rework satd_4* likewise
Now 4x4 is as slow as C and 4x8 is 2% faster than before.
commit e0d846a63313e2a3d71faa703238b70385f6a5e4 [revision 2950]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Sun Aug 19 09:28:41 2018 +0200
ppc: Factor out the sum of absolute
And use it on the other satd > 8.
5-10% faster depending on the size.
commit 6e74eb5af2f28ab30d2c28a86f921b56e94f04f7 [revision 2949]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Sun Aug 19 09:28:40 2018 +0200
ppc: Rework the adds in satd8x8
10% faster.
commit 4dd83955b282e722fbeb3f4ee5cc05a45dc54c7f [revision 2948]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Fri Aug 17 22:28:45 2018 +0200
ppc: Add quant_4x4x4
4x faster than C.
commit 8f6ac77f325c70631359e5f173e76b41e3fb55d9 [revision 2947]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Fri Aug 17 22:28:44 2018 +0200
ppc: Cleanup quant
commit 275ef5332dffec445a0c5a78dbc00c3e0766011d [revision 2946]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Aug 12 17:00:13 2018 +0200
x86: Always use PIC in x86-64 asm
Most x86-64 operating systems nowadays don't even allow .text relocations
in object files any more, and there is no measurable overall performance
difference from using RIP-relative addressing in x264 asm.
Enforcing PIC reduces complexity and simplifies testing.
commit 72db437770fd1ce3961f624dd57a8e75ff65ae0b [revision 2945]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Feb 23 20:15:33 2019 +0100
x86: Fix integer overflow in intra_sa8d_x3_8x8_sse2
commit 88943afa4ee6565370e0e7cdc475b3b2283ada4b [revision 2944]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Nov 9 18:13:34 2018 +0300
Check that mbtree settings are consistent between passes
Also check that CQP mode is not used with 2-pass.
commit 6d8af5f0e390bbcd31a65dda04ef27d3f93821c1 [revision 2943]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Feb 4 22:04:56 2019 +0300
Mark frame_size_estimated as volatile
Ensures that access is atomic and that other threads see the actual
value of the variable.
commit a6327f8a25b72f5edd3515aca82190046d18745b [revision 2942]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Feb 4 21:46:12 2019 +0300
Fix data race detected by ThreadSanitizer
Bug report by Daniel Deptford.
commit 6172da4d77a574c831ed4710a10d945ea128528e [revision 2941]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Dec 24 19:37:45 2018 +0300
Fix XAVC with sliced-threads
commit c7ec24cfbdf720dbf0806046cb5fb9302b941ec9 [revision 2940]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Dec 21 18:54:56 2018 +0300
Fix XAVC slice pattern
commit 6aa4b5929d3ce92ab618e98c34ed6e0948b06bbf [revision 2939]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 21 14:28:59 2018 +0200
Eliminate the use of strtok()
Also fix the string parsing in param_apply_tune() to correctly compare
the entire string, not just the first N characters.
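strtok() writes into its input and keeps hidden static state between calls. Separately, comparing only the first N characters lets a longer token match a shorter name. Below is a sketch of tokenizer-free parsing with a whole-token comparison. It assumes a comma-separated list, and `list_contains` and `token_equals` are hypothetical helpers, not the actual code of param_apply_tune().

```c
#include <stdbool.h>
#include <string.h>

/* True only if the len-byte token equals name exactly. Comparing just
 * strlen(name) bytes would let a longer token such as "filmx" match
 * "film". The token is not NUL-terminated: it points into the list. */
static bool token_equals(const char *tok, size_t len, const char *name)
{
    return len == strlen(name) && strncmp(tok, name, len) == 0;
}

/* Walk a comma-separated list in place, without strtok(), and report
 * whether `name` appears as a whole entry. */
static bool list_contains(const char *list, const char *name)
{
    const char *p = list;
    for (;;) {
        size_t len = strcspn(p, ",");   /* length of the current token */
        if (token_equals(p, len, name))
            return true;
        if (p[len] == '\0')
            return false;               /* that was the last token */
        p += len + 1;                   /* skip the comma */
    }
}
```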
commit d6af823959dc06f061e0a7b038dab83d9c1c9ea3 [revision 2938]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Nov 8 22:01:54 2018 +0300
configure: Fix log2f misdetection on some systems
Bug report by Dirk Fieldhouse.
commit b763e338e0cec4dae13c4fc2fc49c63ac6f26df1 [revision 2937]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Nov 8 21:53:17 2018 +0300
Fix ultrafast preset speed regression
--trellis 0 was missed for it during 8-bit and 10-bit unification.
Bug report by Aleksey Vasenev.
commit b048e2658ad6aec55deceb0561db5796cdb64bd2 [revision 2936]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed Oct 10 19:41:08 2018 +0300
Fix --crop-rect top offset with --interlaced or --fake-interlaced
Bug report by Koby Shina.
commit 545de2ffec6ae9a80738de1b2c8cf820249a2530 [revision 2935]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Sep 23 20:47:44 2018 +0300
Fix possible double transpose of custom CQM if --level is not set
Bug report by Nicolas Gaullier.
commit b63c73dc5c37e5405bf032c9113c1daced3e45a4 [revision 2934]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Aug 7 22:42:22 2018 +0200
cli: Fix linking with --system-libx264 on x86
commit fb17a6b5b51d02020fb0cadea2b27c7803e734ba [revision 2933]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Aug 21 15:11:21 2018 +0300
Fix CAVLC+RDO in 4:4:4
commit 303c484ec828ed0d8bfe743500e70314d026c3bd [revision 2932]
Author: Alexandra Hájková <alexandra.khirnova@gmail.com>
Date: Wed Jul 11 19:28:20 2018 +0000
ppc: Optimize quant functions
1) using xxpermdi + merge instead of 2 merges improves quant_8x8
performance by 5%
2) use vec_splats instead of vec_splat
checkasm timings when compiled with gcc:
                   C      AltiVec (before)   AltiVec (after)
quant_2x2_dc:      57     163                46
quant_4x4_dc:      141    162                57
dequant_4x4_cmp:   104    101                45
dequant_4x4_flat:  104    106                46
dequant_8x8_cmp:   412    208                147
dequant_8x8_flat:  414    212                149
commit 44f1671369b54734db1775fe5155f17041344d8f [revision 2931]
Author: Alexandra Hajkova <alexandra.khirnova@gmail.com>
Date: Sun Jul 8 13:04:43 2018 -0500
ppc: Add support for Power9-only vec_absd
Increases overall encoding speed on POWER9 by 8%.
commit f8afe3820c84798e9e50623cf7349bdb98765926 [revision 2930]
Author: Alexandra Hájková <alexandra.khirnova@gmail.com>
Date: Fri Jun 29 16:50:20 2018 +0000
ppc: Optimize sub8x8_dct_dc
commit 411c957d82d357250f3a3099727b1a2c84caaee9 [revision 2929]
Author: Alexandra Hájková <alexandra.khirnova@gmail.com>
Date: Thu Jun 21 18:36:32 2018 +0000
ppc: AltiVec add16x16_idct_dc
commit 53fe16e51349c43c483e81afb1f08a39f843a234 [revision 2928]
Author: Alexandra Hájková <alexandra.khirnova@gmail.com>
Date: Sat Jun 23 14:58:17 2018 +0000
ppc: Optimize add8x8_idct_dc
commit 62dcebbce2c3f34998aeb2ea76b89f51306e78e9 [revision 2927]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Thu Jul 12 10:41:22 2018 +0200
ppc: Add compatibility macros for vec_xxpermdi
commit d1a53926fb90e9f4a4f1605f4b2a8a945a73e1d2 [revision 2926]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Jun 25 00:09:51 2018 +0200
Prefer a monotonic clock source if available
commit 1d18f0e025e994b93233b8e8afa0c691bccc8fda [revision 2925]
Author: Kieran Kunhya <kierank@obe.tv>
Date: Wed Aug 30 16:05:41 2017 +0100
Add Sony XAVC, a flavour of AVC-Intra
commit bc136ec6a0f863c42686a3bc9fa4c7820f83d413 [revision 2924]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Jul 2 20:20:03 2018 +0300
Cosmetics: Fix indentation for multiline function prototypes
It was broken in "Drop the x264 prefix" patch.
commit 6dd1d3b5d9e16a5951ececb7351cd63f02b36435 [revision 2923]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Apr 16 23:54:43 2018 +0300
Cosmetics: Use consistent "inline" attribute position
Place it immediately after "static".
commit 3d9ec58f27f1cd6732484246aaad59158b98af47 [revision 2922]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 25 22:17:57 2018 +0100
x86: AVX-512 plane_copy and plane_copy_swap
Avoid the scalar C wrapper by utilizing opmasks to prevent overreading the
input buffer.
commit 698c5a32e63a3ed6b976ed196abe479efd78530b [revision 2921]
Author: Emanuele Ruffaldi <emanuele.ruffaldi@gmail.com>
Date: Sat Jan 6 02:34:39 2018 +0100
4:0:0 (monochrome) encoding support
Virtually zero increase in compression efficiency compared to 4:2:0 with empty
chroma planes. Performance is better though, especially with fast settings.
commit 814e61e88c809bb00d17c200a04e9c7d42a19bb5 [revision 2920]
Author: Diego Biurrun <diego@biurrun.de>
Date: Sun Feb 5 09:02:43 2017 +0100
Makefile improvements
* Coalesce some install recipe lines
* Remove empty addition of GPLed filters
* Install libdir in recipes that directly require it
* Coalesce etags/TAGS rules
* Simplify fprofiled rule
commit 28e4879842a86cc6bb63db0f5f386a3e9268fd46 [revision 2919]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Apr 22 22:49:15 2018 +0200
x86inc: Improve SAVE/LOAD_MM_PERMUTATION macros
Use register numbers instead of copying the full register names. This makes it
possible to change register widths in the middle of a function and keep the
mmreg permutations intact which can be useful for code that only needs larger
vectors for parts of the function in combination with macros etc.
Also change the LOAD_MM_PERMUTATION macro to use the same default name as the
SAVE macro. This simplifies swapping from ymm to xmm registers or vice versa:
SAVE_MM_PERMUTATION
INIT_XMM <cpuflags>
LOAD_MM_PERMUTATION
commit 8badb910847e94abb66686009e424bdce355c9f4 [revision 2918]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 31 13:49:56 2018 +0200
x86inc: Optimize VEX instruction encoding
Most VEX-encoded instructions require an additional byte to encode when src2
is a high register (e.g. x|ymm8..15). If the instruction is commutative we
can swap src1 and src2 when doing so reduces the instruction length, e.g.
vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8, xmm0
commit 0a84d986e7020f8344f00752e3600b9769cc1e85 [revision 2917]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 31 01:16:06 2018 +0200
x86inc: Fix VEX -> EVEX instruction conversion
There's an edge case that wasn't properly handled.
commit 9d33c8fefbb506377b943aba11cd99c74258c5de [revision 2916]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Jul 31 22:54:33 2018 +0300
configure: Fix required version checks for lavf and swscale
commit 34843deb060248514ecd9edd88d72c2c2d6b906a [revision 2915]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Jul 20 08:37:43 2018 +0300
Fix float division by zero in weightp analysis
commit 1c3174775c6c1789aaf10172e4cb619f91ecff4a [revision 2914]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed Jul 18 21:56:33 2018 +0300
Fix undefined behavior of left shift for CAVLC encoding
commit a0253ebee0f4d854cf89934b5f420275862d0b5b [revision 2913]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Jul 2 20:59:16 2018 +0300
Fix integer overflow in slicetype_path_cost
The path cost for high resolutions can exceed COST_MAX.
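One way to keep an accumulated cost from wrapping around is to add in 64-bit arithmetic and then clamp. Below is a sketch that assumes a hypothetical ceiling `COST_LIMIT`. x264 defines its own COST_MAX, and its actual fix may differ.

```c
#include <stdint.h>

/* Hypothetical cost ceiling used only for this sketch. */
#define COST_LIMIT (1 << 28)

/* Add two path costs without int overflow: the sum is computed in 64 bits
 * and saturates at COST_LIMIT instead of wrapping to a negative value. */
static int add_cost_clamped(int a, int b)
{
    int64_t sum = (int64_t)a + b;
    return sum > COST_LIMIT ? COST_LIMIT : (int)sum;
}
```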
commit 2af2742821f0b08a4295055b41875e660d5a7746 [revision 2912]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Jun 29 13:14:01 2018 +0200
cli: Fix preset help listing
It was previously incorrect when --chroma-format or --bit-depth was
specified in configure.
commit f5d929ab8faf2319dda10836f51803ba25f0ad07 [revision 2911]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Sat Jun 23 13:14:28 2018 +0200
ppc: Fix zigzag_interleave
The permv array has 3 elements
commit 7737e6ad4acf1058aeb0f9802e2a3ca1e0a30d29 [revision 2910]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Jun 2 20:35:10 2018 +0200
Fix clang stack alignment issues
Clang emits aligned AVX stores for things like zeroing stack-allocated
variables when using -mavx even with -fno-tree-vectorize set which can
result in crashes if this occurs before we've realigned the stack.
Previously we only ensured that the stack was realigned before calling
assembly functions that access stack-allocated buffers, but this is
not sufficient. Fix the issue by changing the stack realignment to
instead occur immediately in all CLI, API and thread entry points.
commit 26b99cce1f03f023dee98bef2ec3cd2eff319f8e [revision 2909]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 1 20:49:29 2018 +0300
Fix missing bs_flush in AUD writing
commit da6b29b553bb56e16e99527733849735c2ea264c [revision 2908]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 1 20:39:30 2018 +0300
Fix possible undefined behavior of right shift
32-bit shifts are only defined for values in the range 0-31.
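Below is a sketch of a right shift that handles out-of-range counts explicitly instead of relying on hardware masking. `shr32` is hypothetical, and its treatment of negative counts as "no shift" is an assumption made for illustration.

```c
#include <stdint.h>

/* Shifting a 32-bit value by a negative count or by 32 or more is
 * undefined behavior in C, so those cases are handled before the shift. */
static uint32_t shr32(uint32_t x, int n)
{
    if (n <= 0)
        return x;        /* assumption: treat zero/negative as no shift */
    if (n >= 32)
        return 0;        /* every bit would be shifted out */
    return x >> n;
}
```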
commit 7e457290cdd6da592ae63aa25facc47cd09d2128 [revision 2907]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 1 20:34:18 2018 +0300
Make bs_align_10 imply bs_flush
Now behaves the same as bs_align_0 and bs_align_1.
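Below is a sketch of this behavior using a hypothetical minimal MSB-first bit writer, not x264's bs_t. Bits collect in a cache and reach the output buffer only when flushed. Because the alignment helper ends with a flush, the aligned bytes are in the buffer as soon as it returns.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer: pending bits live right-aligned in `cache`. */
typedef struct {
    uint8_t *out;
    size_t   pos;     /* bytes written to out so far */
    uint64_t cache;   /* pending bits (low `nbits` bits are valid) */
    int      nbits;   /* number of pending bits */
} bitwriter;

/* Move every complete byte from the cache to the output buffer. */
static void bw_flush(bitwriter *bw)
{
    while (bw->nbits >= 8) {
        bw->nbits -= 8;
        bw->out[bw->pos++] = (uint8_t)(bw->cache >> bw->nbits);
    }
}

/* Append the low n bits of value (0 <= n <= 32). Flushing first keeps
 * fewer than 8 bits pending, so the 64-bit cache cannot overflow. */
static void bw_put(bitwriter *bw, uint32_t value, int n)
{
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1);
    bw_flush(bw);
    bw->cache = (bw->cache << n) | (value & mask);
    bw->nbits += n;
}

/* Write a 1 bit, then 0 bits up to the next byte boundary, then flush so
 * the result is visible in `out` immediately. */
static void bw_align_10(bitwriter *bw)
{
    bw_put(bw, 1, 1);
    int pad = (8 - bw->nbits % 8) % 8;
    if (pad)
        bw_put(bw, 0, pad);
    bw_flush(bw);
}
```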
commit 6afb67c6d7b71fcc6fc14d167f1fcf55623846f4 [revision 2906]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 1 17:52:47 2018 +0300
Fix theoretically incorrect cost_mv_fpel free
commit 57dd6274e2da70bdb8220bc159976e3ac2aea017 [revision 2905]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 1 17:42:46 2018 +0300
configure: Fix ambiguous "$(("
commit 0e6425e03e28213e73ae770df5e08fffba72d290 [revision 2904]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Feb 19 19:53:38 2018 +0300
Fix --qpmax default value in fullhelp
commit 5f7f950c80e330728ecb07bc133e17456870121a [revision 2903]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 31 01:31:57 2018 +0200
x86: Correctly use v-prefix for instructions with opmasks
This was always required, but accidentally happened to work correctly
in a few cases.
commit 3d90057e15abf257320c89bb7146fb0c92687fa6 [revision 2902]
Author: Martin Storsjö <martin@martin.st>
Date: Sat Mar 31 00:10:14 2018 +0300
configure: Only use gas-preprocessor with armasm for compiler=CL
This picks the right assembler automatically for arm and aarch64
llvm-mingw targets.
This doesn't get the right assembler for clang setups when clang
acts like MSVC and uses MSVC headers though (where it perhaps
should use armasm as before), but that's probably an even more
obscure setup.
commit 7d0ff22e8c96de126be9d3de4952edd6d1b75a8c [revision 2901]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed Jan 17 22:03:06 2018 +0300
Remove ARRAY_SIZE macro which is identical to ARRAY_ELEMS
commit 4a158b00943c334ec9e0aabe6a919900c32e360e [revision 2900]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Jan 6 17:47:42 2018 +0100
x86inc: Correctly set mmreg variables
commit 40b47eed1338cef1ac66c98b4e393dfcf5d998ae [revision 2899]
Author: Diego Biurrun <diego@biurrun.de>
Date: Sun Feb 5 09:02:49 2017 +0100
.gitignore: Ignore TAGS file
commit 6fce82284a0fb3edfa299b904b1559452a3b1094 [revision 2898]
Author: Diego Biurrun <diego@biurrun.de>
Date: Sun Feb 5 09:02:51 2017 +0100
Minor configure improvements
* Drop empty addition of GPLed filters
* Replace backticks with $()
commit ca5408b13cf0e58a7505051861f20a63a7a6aec1 [revision 2897]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Jan 1 15:05:48 2018 +0100
Bump dates to 2018
commit b019515ef4ad77022b849283c62612157e8458a7 [revision 2896]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Jan 16 17:43:24 2018 +0100
Merge zero buffers
Improves cache efficiency.
commit d75b93b0e82cefa93e5db2d6b0be475566101431 [revision 2895]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed Jan 17 18:19:44 2018 +0300
rdo: Use ALIGNED_ARRAY for stack arrays
commit 9384a7389b251b59a079ccc3d1af9edd42e3d5e6 [revision 2894]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Jan 15 21:42:59 2018 +0100
Correctly align buffers for AVX and AVX-512
Fixes segfaults on Windows where the stack is only 16-byte aligned.
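When the stack is only guaranteed 16-byte alignment, a buffer used by 32-byte (AVX) or 64-byte (AVX-512) aligned loads can be over-allocated and its address rounded up. Below is a sketch of that technique. The `ALIGNED_STACK_BUF` macro is hypothetical.

```c
#include <stdint.h>

/* Declare a byte array with (align - 1) spare bytes and point `name` at
 * the first suitably aligned address inside it. `align` must be a power
 * of two. This works regardless of the alignment the ABI guarantees for
 * the stack pointer. */
#define ALIGNED_STACK_BUF(name, type, count, align)                 \
    uint8_t name##_raw[sizeof(type) * (count) + (align) - 1];       \
    type *name = (type *)(((uintptr_t)name##_raw + (align) - 1)     \
                          & ~(uintptr_t)((align) - 1))
```

Inside a function, `ALIGNED_STACK_BUF(coeffs, int16_t, 64, 64);` yields a 64-byte aligned `int16_t *coeffs` with room for 64 elements.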
commit b00bcafe53a166b63a179a2f41470cd13b59f927 [revision 2893]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Dec 24 22:59:09 2017 +0300
Cosmetics
commit 7c6b3ad50d9210d27be6953dfa6d24e5d183be18 [revision 2892]
Author: Alexandra Hájková <alexandra.khirnova@gmail.com>
Date: Sun May 21 17:40:45 2017 +0000
ppc: Add load_deinterleave_chroma_fenc_altivec
5x speed up vs C code.
commit b461e015fd7efe3bb740ef0716bc41d76eff30c9 [revision 2891]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Oct 26 13:09:46 2017 +0300
Update to the latest upstream version of gas-preprocessor
This version supports converting aarch64 assembly for MS armasm64.exe.
commit 61e8b5cc482b08d51e18b336081073736d963e7e [revision 2890]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 22 09:59:28 2017 +0200
input: Add a workaround for swscale overread bugs
swscale can read past the end of the input buffer, which may result in
crashes if such a read crosses a page boundary into an invalid page.
Work around this by adding some padding space at the end of the buffer when
using memory-mapped input frames. This may sometimes require copying the
last frame into a new buffer on Windows since the Microsoft memory-mapping
implementation has very limited capabilities compared to POSIX systems.
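Below is a sketch of the padding idea. It assumes a hypothetical `copy_with_padding` helper and an arbitrary 64-byte padding amount. x264's actual padding size and memory-mapping code are not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical amount of slack placed after the frame data. */
#define OVERREAD_PADDING 64

/* Copy a frame into a heap buffer followed by zeroed slack, so a consumer
 * that reads slightly past the end stays inside allocated memory. The
 * caller frees the result; NULL is returned on overflow or failure. */
static unsigned char *copy_with_padding(const unsigned char *src, size_t size)
{
    if (size > SIZE_MAX - OVERREAD_PADDING)
        return NULL;
    unsigned char *dst = malloc(size + OVERREAD_PADDING);
    if (!dst)
        return NULL;
    memcpy(dst, src, size);
    memset(dst + size, 0, OVERREAD_PADDING);
    return dst;
}
```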
commit 1221f097473a049a52fbb47aff2733321bd4661a [revision 2889]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 22 10:50:46 2017 +0200
filters/resize: Upgrade to a newer libavutil API
Use the AVComponentDescriptor depth field instead of depth_minus1.
commit 99ca611d2e667553e026f487dc787b595dde84c5 [revision 2888]
Author: Martin Storsjö <martin@martin.st>
Date: Wed Oct 18 10:40:02 2017 +0300
aarch64: Use ldurb/sturb for loads/stores with negative offsets
The assemblers (both gas and clang/llvm) automatically fix this;
armasm64 doesn't. We can fix it in gas-preprocessor, but we should
also be using the right instruction form.
commit f745815e593b788d846182c8d42eed4f72f7c33c [revision 2887]
Author: Martin Storsjö <martin@martin.st>
Date: Mon Oct 16 22:50:27 2017 +0300
configure: Add support for building with MSVC/armasm for ARM64
commit 7b13b31be60ed65bee615bab28c422e2df027ee1 [revision 2886]
Author: Martin Storsjö <martin@martin.st>
Date: Mon Oct 16 22:50:26 2017 +0300
arm: Check for __ELF__ instead of !__APPLE__, for using .arch/.fpu
For windows, when building with armasm, we already filtered these out
with gas-preprocessor.
By filtering them out already in the source, we can also build directly
with clang for windows (which also requires wrapping the assembler in
gas-preprocessor for converting instructions to thumb form, but
gas-preprocessor doesn't and shouldn't filter them out in the clang
configuration).
commit 12ca9a69e855c4d4b9000894f478bce665e4e02c [revision 2885]
Author: Martin Storsjö <martin@martin.st>
Date: Mon Oct 16 22:50:25 2017 +0300
aarch64: Don't .set a symbol named st2
This confuses gas-preprocessor, which tries to replace actual
st2 instructions by the integer 1 or 2.
commit 06c8f6bab0fc8fa9b2df9a1af5d10c87c515edb4 [revision 2884]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Oct 14 14:11:26 2017 +0200
Shrink the i4x4_mode cost_table array
Only 17 elements are actually used. It was originally padded to 64 bytes to
avoid cache line splits in the x86 assembly, but those haven't really been
an issue on x86 CPUs made in the past decade or so.
Benchmarking shows no performance impact from dropping the padding, so
might as well remove it and save some cache.
commit 344699fd386890ac1cf80a70a68a3ae16767ed62 [revision 2883]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Oct 11 18:02:26 2017 +0200
x86: Remove some legacy CPU detection hacks
Some ancient Pentium-M and Core 1 CPUs had slow SSE units, and using MMX
was preferable. Nowadays many assembly functions in x264 completely lack MMX
implementations and falling back to C code will likely make things worse.
Some misconfigured virtualized systems could sometimes also trigger this code
path and cause assertions.
commit 0fe75403d7b40c0209c3df992632956292065cdc [revision 2882]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Oct 11 17:58:36 2017 +0200
lavf: Upgrade to the new core decoding API
commit dae7f18d2cc5c7eccfb73649cda458e3c8e2256e [revision 2881]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Mon Oct 9 12:04:22 2017 -0400
lavf: Upgrade to some newer APIs
* Use the codec parameters API instead of the AVStream codec field.
* Use av_packet_unref() instead of av_free_packet().
* Use the AVFrame pts field instead of pkt_pts.
commit 12611ec99bb52f4f2c1b114138d867b3a2aa182b [revision 2880]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 8 21:41:16 2017 +0200
x86: AVX-512 load_deinterleave_chroma_fdec
commit d93851ec282eb069f91a6eddab3284f7766cd5bd [revision 2879]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 8 21:23:12 2017 +0200
x86: AVX-512 load_deinterleave_chroma_fenc
commit 5b62ab59be01579ab37033cc86527df922efb843 [revision 2878]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Oct 7 12:06:51 2017 +0200
x86: AVX-512 mbtree_fix8_pack and mbtree_fix8_unpack
Takes advantage of opmasks to avoid having to use scalar code for the tail.
Also make some slight improvements to the checkasm test.
commit 08476ab1c0a9b741198677731373b173657fa079 [revision 2877]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Oct 7 11:34:16 2017 +0200
x86: Faster mbtree_fix8_unpack
Use a different multiplier in order to eliminate some shifts.
About 25% faster than before.
commit e3fae10bf7db9571d5c69ad910f10df625bad73e [revision 2876]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Sep 22 17:28:18 2017 +0300
Don't force fast-intra for subme < 3
It caused a significant quality hit without any meaningful (if any) speedup.
commit bdf27e783a8eb4a5bcae0cd0a950d6dc3d995bfe [revision 2875]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Sep 22 17:18:55 2017 +0300
Make ref and i4x4_mode costs global instead of static
Fixes some thread safety doubts and makes code cleaner.
Downside: slightly higher memory usage when calling multiple encoders from the same application.
commit fefc3fa1fa98a7bac4eaf3c8e6e1c52b7e427ddd [revision 2874]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Sep 22 17:05:06 2017 +0300
Fix thread safety of x264_threading_init() and use of X264_PTHREAD_MUTEX_INITIALIZER with win32thread
commit 694d031c1d120a8b578f60eeccf14fcf9ca4200e [revision 2873]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Sep 22 16:59:13 2017 +0300
configure: Improvements
Log result of pkg-config checks to config.log.
Fix lavf support detection for pkg-config fallback case.
Fix detection of linking dependencies errors for lavf/lsmash/gpac.
Cosmetics.
commit 5d4031618e9feedcb527fd4e5a91bc06e30b70b4 [revision 2872]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Aug 17 23:51:14 2017 +0300
flv: Fix one frame video total duration
commit 8b9c89d331f5a2d6335ff9b08abc8d5c94428731 [revision 2871]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Aug 17 23:46:23 2017 +0300
flv: Split FrameType and CodecID values
commit 95cdb743463f723cea58c8ae01d7762f7ae9965c [revision 2870]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Tue Aug 8 15:40:45 2017 +0200
Support writing the alternative transfer SEI message
commit c98d02bebd6dd04b61306ee27712aeff96f19f29 [revision 2869]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Tue Aug 8 14:56:43 2017 +0200
Support 04/2017 color matrix and transfer values
commit 71ed44c7312438fac7c5c5301e45522e57127db4 [revision 2868]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Fri Jan 6 15:23:38 2017 +0100
Unify 8-bit and 10-bit CLI and libraries
Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
option to set the bit depth at runtime.
Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
incorrect value, it's preferable to induce a linking failure. If applications
rely on this symbol, this will make it more obvious where the problem is.
Add Makefile rules that compile modules with different bit depths. Assembly
on x86 is prefixed with the 'private_prefix' define, while all other archs
modify their function prefix internally.
Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
assembly, PowerPC assembly, and MIPS assembly.
The depth and cache CLI filters heavily depend on bit depth size, so they
need to be duplicated for each value. This means having to rename these
filters, and adjust the callers to use the right version.
Unfortunately the threaded input CLI module inherits a common.h dependency
(input/frame -> common/threadpool -> common/frame -> common/common) which
is extremely complicated to address in a sensible way. Instead duplicate
the module and select the appropriate one at run time.
Each bitdepth needs different checkasm compilation rules, so split the main
checkasm target into two executables.
commit 2451a7282463f68e532f2eee090a70ab139bb3e7 [revision 2867]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Fri Jan 6 17:50:40 2017 +0100
Change default QP parameters initialization
qp is modified to require a valid value before use, while qp_max is set
to the maximum allowable value (and clipped later on).
This is needed so that param functions do not depend on bit depth size.
commit 7839a9e1f03b49e3e0cbfcb3091093af7c6d54ee [revision 2866]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Tue Jan 17 17:07:42 2017 +0100
aarch64: Set the function symbol prefix in a single location
commit 498cca0b74ab90c363b761083c7fdcf56fc60904 [revision 2865]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Tue Jan 17 17:04:19 2017 +0100
arm: Set the function symbol prefix in a single location
commit 8f2437d33301faaf0e2fcaff16e2b01e9bbe27ae [revision 2864]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Fri Jan 27 11:58:33 2017 +0100
Drop the x264 prefix from static functions and variables
commit 4e2ed4087ac1621f946b83366e1f53a1326d7424 [revision 2863]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Aug 17 23:25:31 2017 +0300
configure: Check for strtok_r compiler support
commit d1eebb2927da15c41c7c180d398b0cdad3d1f396 [revision 2862]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Aug 6 17:17:55 2017 +0200
cabac: Make the cabac_contexts array static
Also drop the x264 prefix from all static cabac arrays.
commit 3f9f6554a4cfa4189855756860a61ceb2f2a41a3 [revision 2861]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Aug 17 18:04:13 2017 +0200
x86: AVX-512 pixel_satd_x3 and pixel_satd_x4
commit dd399ab862e2271e869bc8aefcb3166180ecdb10 [revision 2860]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Aug 14 23:13:44 2017 +0200
x86: Shrink the x86-64 cabac coeff_last tables
Use dword instead of qword entries. Cuts the size of the tables in half
which allows each table to fit inside a single cache line.
When PIC is disabled dwords are enough to store absolute addresses.
When PIC is enabled we can store dword offsets relative to the start of
the table and simply add the address of the table to the offset in order
to calculate the full address. This approach also has the advantage of
eliminating a whole bunch of run-time .data relocations.
commit d463a92e3b6f8ec04d54cc6c437892f9ffa98e29 [revision 2859]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Aug 16 15:59:16 2017 +0200
x86inc: Support creating global symbols from local labels
On ELF platforms such symbols need to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.
commit 67b5c961046621a4554a9577e68cd9e31a212091 [revision 2858]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Aug 15 16:11:32 2017 +0200
x86inc: Use .rdata instead of .rodata on Windows
The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.
commit f15d366510cc60d9d9b2aeb576cade5b94509f37 [revision 2857]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Aug 5 00:43:26 2017 +0200
x86inc: Set the correct cpuflag for AES-NI instructions
commit 1ae63361304e952ac625a7016f2cf4a64e39a314 [revision 2856]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Aug 5 00:09:52 2017 +0200
x86inc: Enable AVX emulation for floating-point pseudo-instructions
There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.
commit 1e27313c12154dd3922ef7ab9508a4320e83c2ac [revision 2855]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Aug 4 23:09:00 2017 +0200
configure: Increase x86 stack alignment on clang
commit e9a5903edf8ca59ef20e6f4894c196f135af735e [revision 2854]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Oct 22 20:18:39 2017 +0300
x86: Fix stack alignment for x264_cabac_encode_ue_bypass call
Fix MSVS fprofiled build for win64
commit 45e6eb6006d1d23b6f69a1cfb62a86dc67092a81 [revision 2853]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Oct 22 16:18:29 2017 +0300
mips: Fix incorrect pointers to msa optimized functions
commit 09705c0b68232a05da8cc672c7c6092071eb4a21 [revision 2852]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Aug 11 16:41:31 2017 +0200
Fix cpu capabilities listing on older x86 operating systems
Some cpuflags would previously be displayed incorrectly when running older
operating systems without AVX support on modern CPUs.
commit ba24899b0bf23345921da022f7a51e0c57dbe73d [revision 2851]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Jun 24 15:12:57 2017 +0200
x86: AVX-512 pixel_avg_weight_w8
commit d3214e6b102701911fc9d5fc92435e79e8b49100 [revision 2850]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Jun 24 14:26:25 2017 +0200
x86: AVX-512 pixel_avg_weight_w16
commit 1d9dee2e9be717fcde416854f902db776312f141 [revision 2849]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jun 22 19:51:28 2017 +0200
x86: AVX-512 sub8x16_dct_dc
commit f672795407bf90045e399eb057e5b2426d79f961 [revision 2848]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jun 22 11:26:21 2017 +0200
x86: AVX-512 sub8x8_dct_dc
commit 0af1c6d0d0cc54ba4f888db39247774edcf19b44 [revision 2847]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jun 1 22:13:19 2017 +0200
x86: AVX-512 add8x8_idct
commit 9034085265e5ca56e801c3efbf5c538fcc17c82b [revision 2846]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Jun 10 16:01:53 2017 +0200
x86: AVX-512 sub16x16_dct
commit 774c6c76d081305d9c891091e1d4694acb3f8a68 [revision 2845]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Jun 7 16:55:48 2017 +0200
x86: AVX-512 sub8x8_dct
commit 2d653411c2135377fb8c956e897880ff997b50ec [revision 2844]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jun 8 21:14:08 2017 +0200
x86: AVX-512 sub4x4_dct
commit 07483f72d7e1a4f7079a429dd1370f4221006862 [revision 2843]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun May 28 16:12:33 2017 +0200
x86: AVX-512 mbtree_propagate_list
Uses gathers and scatters in combination with conflict detection to
vectorize the scalar part.
Also improve the checkasm test to try different mb_y values and check
for out-of-bounds writes.
commit 1a88481b85da964aadae1e05347342b03be17712 [revision 2842]
Author: James Darnley <jdarnley@obe.tv>
Date: Fri Jun 9 14:08:16 2017 +0200
x86inc: Add aesni cpuflag define
Upstreaming this from FFmpeg. Unused in x264.
commit 98e9543b4c39360326e6d5bf266c0c634cb9ee2e [revision 2841]
Author: Martin Storsjö <martin@martin.st>
Date: Mon May 29 12:13:03 2017 +0300
aarch64: Update the var2 functions to the new signature
The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:
var2_8x8_c: 4110
var2_8x8_neon: 1505
var2_8x16_c: 8019
var2_8x16_neon: 2545
However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon: 1205
var2_8x16_neon: 2327
commit 824802ad5a877244fb9eb48a892ed348736af5b0 [revision 2840]
Author: Martin Storsjö <martin@martin.st>
Date: Mon May 29 12:13:02 2017 +0300
arm: Update the var2 functions to the new signature
The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:
Cortex              A7     A8     A9    A53
var2_8x8_c:       7302   5342   5050   4400
var2_8x8_neon:    2645   1612   1932   1715
var2_8x16_c:     14300  10528  10020   8637
var2_8x16_neon:   5127   2695   3217   2651
However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon:    2312   1190   1389   1300
var2_8x16_neon:   4862   2130   2293   2422
commit 6f8aa71ce797be01fd2ebe53c072a6696ea19828 [revision 2839]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Feb 15 22:00:25 2017 +0100
Add support for levels 6, 6.1, and 6.2
These levels were added in the 2016-10 revision of the H.264 specification and
improve support for content with high resolutions and/or high frame rates.
Level 6.2 supports 8K resolution at 120 fps.
Also shrink the x264_levels array by using smaller data types.
commit 2baa28c880d11377115bbd5508e72053f6ba61f5 [revision 2838]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Mar 23 17:51:09 2017 +0100
Use a larger integer type for the slice_table array
Makes it possible to use slicing with resolutions larger than 2^24 pixels.
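The 2^24 figure corresponds to 2^16 macroblocks, the number of distinct values a 16-bit per-macroblock entry can hold (presumably the type used before this change):

```latex
\frac{2^{24}\ \text{pixels}}{16 \times 16\ \text{pixels per macroblock}} = 2^{16} = 65\,536\ \text{macroblocks}
```

For comparison, a 7680x4320 frame is 480 x 270 = 129,600 macroblocks.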
commit c9d2c1c80b25c6ae15c41b200ec44ac2dabce725 [revision 2837]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Feb 19 10:48:33 2017 +0100
analyse: Reduce the size of the cost_mv arrays
Use a dynamic size depending on the MV range. Reduces memory consumption by
up to a few megabytes.
Drop a related old miscompilation check since it may otherwise cause an
out-of-bounds memory access.
Also remove an unused extern variable declaration.
commit d46a5a463f0de5ec479d256af72bba3de4ba2d1a [revision 2836]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed May 31 02:52:16 2017 +0300
Fix CABAC+8x8dct in 4:4:4
Use the correct ctxIdxInc calculation for coded_block_flag.
commit 79b36f27a57dd511eefead6d5422689220c767b5 [revision 2835]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Jun 6 02:07:21 2017 +0300
Fix 8x8dct in lossless encoding
Change V and H intra prediction in lossless (TransformBypassModeFlag == 1)
macroblocks to correctly adhere to the specification. Affects lossless
encoding with 8x8dct or a mix of lossless and normal macroblocks.
8x8dct has already been disabled in lossless mode for some time due to
being out-of-spec but this will allow us to re-enable it again.
commit 68a550217c8d0fae6229c5b322b6810fe9652ef3 [revision 2834]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Jun 8 18:35:21 2017 +0300
mbtree: Fix buffer overflow
Could occur on the 1st pass in combination with --fake-interlaced and
some input heights due to allocating a buffer that was too small.
commit df79067c0cf33da712d344b5f8869be7eaf326f3 [revision 2833]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue May 23 16:40:26 2017 +0200
x86: Avoid self-relative expressions on macho64
Functions that use self-relative expressions in the form of [foo-$$]
appear to cause issues on 64-bit Mach-O systems when assembled with nasm.
Disable those functions on macho64 for the time being until
we've figured out the root cause.
commit f1ac7122645bbeb56e7a4401f71a7055cb2431c4 [revision 2832]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon May 22 23:59:32 2017 +0300
configure: Don't try to detect clang by $CC
Only check if option -Werror=unknown-warning-option is supported before adding it
commit b4d811df4fd7dbb9220fe2c8f2a2c2a6ba2bbc87 [revision 2831]
Author: Martin Storsjö <martin@martin.st>
Date: Mon May 22 13:10:46 2017 +0300
checkasm: Use the right variable in a loop condition
Prior to this, the loop never ran at all. The condition has been
the same since it was introduced in 5b0cb86f.
This issue was pointed out by a clang warning.
commit a3d24462ae284bf03958f0ed41e824dd7d48e15e [revision 2830]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon May 22 22:02:34 2017 +0300
x86: Fix linking with 8-bit depth shared libx264
commit d1fe6fd1c0930d88da90f23f6d5fdb6ceaf6b0a9 [revision 2829]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon May 15 00:18:36 2017 +0200
x86: Only enable AVX-512 in 8-bit mode
commit 6151882671b6f9e1ceec2cdb76dd1123c8dc766f [revision 2828]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri May 12 00:43:43 2017 +0200
x86: AVX-512 cabac_block_residual
commit 4579616543f2e701ee9510f5eb57e31a3ef99e10 [revision 2827]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed May 10 18:36:59 2017 +0200
x86: AVX-512 pixel_sad_x3 and pixel_sad_x4
Covers all variants: 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, and 16x16.
commit 993eb2079e45619098241e14806fc70030968af6 [revision 2826]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun May 7 23:35:49 2017 +0200
x86: AVX-512 pixel_sad
Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.
commit 2463174c119cef4f7e6a36a1151054fbb268b082 [revision 2825]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu May 4 21:53:28 2017 +0200
x86: AVX-512 decimate_score
Also drop the MMX versions and improve the SSE2, SSSE3 and AVX2 versions.
commit 49fb50a67cc41e4bed2dd66f7beed12797249cd9 [revision 2824]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon May 1 14:55:45 2017 +0200
x86: AVX-512 pixel_var2_8x8 and 8x16
commit 92c074e27f6bfccee42b41c183203b7b2763a94d [revision 2823]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon May 1 14:54:32 2017 +0200
Rework pixel_var2
The functions are only ever called with pointers to fenc and fdec and the
strides are always constant so there's no point in having them as parameters.
Cover both the U and V planes in a single function call. This is more
efficient with SIMD, especially with the wider vectors provided by AVX2 and
AVX-512, even when accounting for losing the possibility of early termination.
Drop the MMX and XOP implementations, update the rest of the x86 assembly
to match the new behavior. Also enable high bit-depth in the AVX2 version.
Comment out the ARM, AARCH64, and MIPS MSA assembly for now.
commit 4c48f9e751e969188d606eb15aeada7f652c9db9 [revision 2822]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Apr 29 14:26:40 2017 +0200
x86: AVX-512 pixel_var_8x8, 8x16, and 16x16
Make the SSE2, AVX, and AVX2 versions a bit faster.
Drop the MMX and XOP versions.
commit 1cf7baa462ca52de7f07d6e4c795853900bb50bb [revision 2821]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Apr 28 21:35:25 2017 +0200
x86: AVX-512 pixel_sa8d_8x8
commit 386050088a66aa66bcaebb9b6b4b0a2b6af76a73 [revision 2820]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Apr 13 23:56:04 2017 +0200
x86: AVX-512 pixel_satd
Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.
commit 2eceefe89fea91bbc7d5af2a1b4a9047d8da7805 [revision 2819]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Apr 19 16:39:48 2017 +0200
x86: AVX-512 deblock_strength
Also drop the MMX version and make some slight improvements to the SSE2,
SSSE3, AVX, and AVX2 versions.
commit 3081ffa1c540d1df05123e0fab1937985573ac78 [revision 2818]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Apr 12 16:21:09 2017 +0200
x86: AVX-512 plane_copy_deinterleave_v210
commit 95dc64c4efdf16404e58be9ff9da4e0acaa1a4b2 [revision 2817]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Apr 9 20:34:28 2017 +0200
x86: AVX-512 memzero_aligned
Reorder some elements in the x264_t.mb.pic struct to reduce the amount
of padding required.
Also drop the MMX implementation in favor of SSE.
commit c0cd7650cb65164d183d8f77d0697b7569a52917 [revision 2816]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Apr 7 21:34:40 2017 +0200
x86: AVX and AVX-512 memcpy_aligned
Reorder some elements in the x264_mb_analysis_list_t struct to reduce the
amount of padding required.
Also drop the MMX implementation in favor of SSE.
commit f29fbc6fd23e9bc2d800eb1246e8fa19a203b831 [revision 2815]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Apr 6 16:06:34 2017 +0200
x86: AVX-512 dequant_8x8_flat16
commit 40aca29a164d5e5e6589d507bdcae6717d72f6bf [revision 2814]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Apr 4 20:54:12 2017 +0200
x86: AVX-512 dequant_8x8
commit 74f7802bb7bd301299f8229a0552a7caf2b55434 [revision 2813]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Apr 4 20:01:26 2017 +0200
x86: AVX-512 dequant_4x4
commit 3451ba3af49e58a720277615df3d8e4a4171986f [revision 2812]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Mar 28 22:59:56 2017 +0200
x86: AVX-512 mbtree_propagate_cost
Also make the AVX and AVX2 implementations slightly faster.
commit 75f6f9b228c3498b8c9b0d97fc925c0a7e6e6f43 [revision 2811]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Mar 27 18:19:53 2017 +0200
x86: AVX-512 coeff_last
commit c3a1d1d892a79bc460c7fc192b0bf7a32c2ce0b2 [revision 2810]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Mar 26 18:29:37 2017 +0200
x86: AVX-512 zigzag_interleave_8x8_cavlc
commit edb22f57ba03718c1cb9781ba005aec20a1e50e0 [revision 2809]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Mar 26 11:34:18 2017 +0200
x86: AVX-512 zigzag_scan_8x8_field
commit 77b9a818fc622d0cdaa96aeb37339fbd5b1ef857 [revision 2808]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 25 22:13:22 2017 +0100
x86: AVX-512 zigzag_scan_4x4_field
commit 724a577237f27cdb0c0fd18ef8ed32d39430796b [revision 2807]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 25 19:14:28 2017 +0100
x86: AVX-512 zigzag_scan_8x8_frame
The vperm* instructions ignore unused bits, so we can pack the permutation
indices together to save cache and just use a shift to get the right values.
commit 2b2f039512bde7c097280255c6376cf9a901e08e [revision 2806]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 25 19:14:22 2017 +0100
x86: AVX-512 zigzag_scan_4x4_frame
commit 1878c7f2af0a9c73e291488209109782c428cfcf [revision 2805]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri May 12 00:03:10 2017 +0200
checkasm: x86: More accurate ymm/zmm measurements
YMM and ZMM registers on x86 are turned off to save power when they haven't
been used for some period of time. When they are used there will be a
"warmup" period during which performance will be reduced and inconsistent
which is problematic when trying to benchmark individual functions.
Periodically issue "dummy" instructions that use those registers to
prevent them from being powered down. The end result is more consistent
benchmark results.
commit 472ce3648aea3ddc16b7716eb114f4bcdb8fea8f [revision 2804]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 25 10:16:09 2017 +0100
x86: AVX-512 support
AVX-512 consists of a plethora of different extensions, but in order to keep
things a bit more manageable we group together the following extensions
under a single baseline cpu flag which should cover SKL-X and future CPUs:
* AVX-512 Foundation (F)
* AVX-512 Conflict Detection Instructions (CD)
* AVX-512 Byte and Word Instructions (BW)
* AVX-512 Doubleword and Quadword Instructions (DQ)
* AVX-512 Vector Length Extensions (VL)
On x86-64, AVX-512 provides 16 additional vector registers; prefer using
those over existing ones since it allows us to avoid using `vzeroupper`
unless more than 16 vector registers are required. They also happen to
be volatile on Windows which means that we don't need to save and restore
existing xmm register contents unless more than 22 vector registers are
required.
Also take the opportunity to drop X264_CPU_CMOV and X264_CPU_SLOW_CTZ while
we're breaking API by messing with the cpu flags since they weren't really
used for anything.
Big thanks to Intel for their support.
commit d2b5f4873e2147452a723b61b14f030b2ee760a5 [revision 2803]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 18 18:50:36 2017 +0100
x86: Change assembler from yasm to nasm
This is required to support AVX-512.
Drop `-Worphan-labels` from ASFLAGS since it's enabled by default in nasm.
Also change alignmode from `k8` to `p6` since it's more similar to `amdnop`
in yasm, i.e. use long nops without excessive prefixes.
commit 8c2974255b01728d4eda2434cc1997c4a3ca5eff [revision 2802]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat May 6 12:26:56 2017 +0200
x86: Add some additional cpuflag relations
Simplifies writing assembly code that depends on available instructions.
LZCNT implies SSE2
BMI1 implies AVX+LZCNT
AVX2 implies BMI2
Skip printing LZCNT under CPU capabilities when BMI1 or BMI2 is available,
and don't print FMA4 when FMA3 is available.
commit 93bc2cbc66f0bf4616965dcd7e0eba89201c8086 [revision 2801]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Apr 14 16:16:49 2017 +0200
x86: Faster SSE2 pixel_sad_16x16 and 16x8
Also make the order of fenc/fdec arguments a bit more consistent.
commit 8ae2b62462176cd731a1cb8b5bdc9a38cba0fbe4 [revision 2800]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon May 15 00:40:52 2017 +0300
msvs/icl: Improve target host detection
commit 181a920ad5d0acdc3a08418c0e9c95be4785b814 [revision 2799]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Sat May 13 17:14:52 2017 +0000
ppc: Optimize add8x8_idct_dc
Increases speedup compared to C from 2x to 6x.
commit d0b905b901c5ee5989777cf437a7f20c1fa0a794 [revision 2798]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Feb 19 10:33:16 2017 +0100
analyse: Faster min/max MV clipping
Values only need to be clipped in one direction.
commit 1bde30193eb91d1bc69b00a27e6874eb88ed4eab [revision 2797]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Feb 16 20:04:10 2017 +0100
slicetype_mb_cost: Clip MVs based on MV range
Improves cost calculations, especially when a short MV range is used.
commit dcf406978b9dda5c2b8aab80af5c1c47c78efd92 [revision 2796]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 29 21:38:43 2017 +0100
Support YUYV and UYVY packed 4:2:2 raw input
Packed YUV is arguably more common than planar YUV when dealing with raw
4:2:2 content.
We can utilize the existing plane_copy_deinterleave() functions with some
additional minor constraints (we cannot assume any particular alignment
or overread the input buffer).
Enables assembly optimizations on x86.
commit aaa9aa83a111ed6f1db253d5afa91c5fc844583f [revision 2795]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Apr 20 21:58:23 2017 +0200
x86: Utilize 3-arg instructions in AVX deblock
Avoids some redundant register-register moves.
commit a52d41c4d135c79373a86c3a82dcc2ec3f88b025 [revision 2794]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:46 2017 +0200
configure: Support targeting ARM with MSVC tools
Set up the right gas-preprocessor as assembler frontend in these cases,
using armasm as actual assembler.
Don't try to add the -mcpu -mfpu options in this case.
Check whether the compiler actually supports inline assembly.
Check for the ARMv7 features in a different way for the MSVC compiler.
commit b22a5db3c481b10b4a6ec190978d97b377750a12 [revision 2793]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:45 2017 +0200
configure: Check for -lshell32 before forcibly adding it into LDFLAGSCLI
When targeting the Windows Phone API subset, there is no shell32.lib.
When targeting Windows Phone/RT, the CLI itself won't be built, but
LDFLAGSCLI are included in all later cases of cc_check within configure.
Therefore only add -lshell32 there if it actually is usable.
commit 0aed59e74808f1cd22ee47c055a8eb4f367b2f55 [revision 2792]
Author: Martin Storsjö <martin@martin.st>
Date: Thu May 4 22:00:51 2017 +0300
arm: Always unconditionally declare .arch armv7-a
We already unconditionally declare .fpu neon and try to build all the
neon codepaths (but only execute them conditionally based on a runtime
check).
This fixes builds targeting armv6, where the rbit instruction isn't
available. This instruction is only used within a neon function in
any case, so there's little point in emulating it.
commit 196d7676c8f40b7c1f8f2f4af64e09ebf4c9816b [revision 2791]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:44 2017 +0200
arm: Use .section .rodata for non-elf, non-mach platforms as well
If targeting windows with armasm, gas-preprocessor can rewrite the
.section .rodata into the right construct for that platform.
commit 9bffbabfecf0bda066362a1b76b62c5085257e18 [revision 2790]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:41 2017 +0200
gas-preprocessor: Support conversion of additional arm instructions into thumb
Convert muls into mul+cmp.
Convert "and r0, sp, #xx" into "mov r0, sp", "and r0, r0, #xx".
Convert ldr with a too large shift into add+ldr. This only works in the
special case when the base register is the same as the target for the ldr.
commit 2e9bd88f27ed8f5f058e7e220070b7a15965cb8e [revision 2789]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:40 2017 +0200
arm: Explicitly declare using the .text segment in the function macro
This fixes one issue in building with MS armasm via gas-preprocessor.
Without the .text segment specification, the object files assembled
fine, but linking failed. (armasm source files don't get the text/code
segment implied automatically if nothing is specified.)
commit 64843af913e76fd7fb590e9227f678add96e8a3c [revision 2788]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:39 2017 +0200
osdep: Use the EXPAND macro on other cases of ALIGNED_ARRAY_EMU
EXPAND is already used on the other cases where ALIGNED_ARRAY_EMU
is used on all platforms (originally needed for ICL, later also
required by MSVC); apply the same change (originally from 21ba91ae)
for the cases that are only used on ARM.
This fixes use of ALIGNED_ARRAY_16 with MSVC when targeting ARM.
commit 757091fe3abd0af0f45d11f52b652f0be2fb76f5 [revision 2787]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:38 2017 +0200
Update to the latest version of gas-preprocessor.pl
From http://git.libav.org/?p=gas-preprocessor.git
This update contains changes from myself only.
commit d13705191cdcbcd10d87524dbb0c26ba998d8dcc [revision 2786]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:37 2017 +0200
arm: Skip using gas-preprocessor for iOS on arm as well
The few constructs that differ can easily be handled within the
source itself - tested to be working since at least Xcode 6.
commit 3a3cfe32416efa4f966c0586411148236e4703c1 [revision 2785]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:36 2017 +0200
arm: Use const macros in arm assembly where applicable
This unifies the source code style, and allows building the code
with clang without gas-preprocessor.
commit 1e92821c5a52c80ca4d1a9b6d038bec84be48b0a [revision 2784]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:35 2017 +0200
arm: Use commas between all macro arguments in arm assembly
The clang built-in assembler requires proper commas between all macro
arguments. As long as gas-preprocessor is used when building with clang,
this isn't an issue.
commit a84e6a486b991bffb2cc9f86b6e236978d251d2c [revision 2783]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:34 2017 +0200
aarch64: Skip invoking gas-preprocessor for iOS
Clang can handle all the constructs used there these days, working
since Xcode 6 at least.
commit 535fd2ec9985b9874d6ed23904404d0d2f5d40d6 [revision 2782]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Mar 24 11:33:33 2017 +0200
aarch64: Use the const macro in the aarch64 checkasm assembly source
This fixes building the source with clang for iOS without gas-preprocessor.
commit bec87ba69421572282e473cf8f2e11c77285ed88 [revision 2781]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Apr 12 23:26:32 2017 +0200
Windows: Add support for MSVC compilation with WSL
In Windows 10 version 1703 (Creators Update) WSL supports calling native
Windows binaries from the Bash shell, but it requires using full file
names including extension, e.g. `cl.exe` instead of `cl`.
We also don't have access to `cygpath`, so use a simple regex for
converting the dependencies to Unix paths that `make` can understand.
commit 43e9a6157752c2a3c2cc6c6a7fa13da72033d1dd [revision 2780]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 29 22:58:24 2017 +0100
cli: Improve the --fullhelp raw demuxer input-csp listing
Use the same logic for indentation as the lavf demuxer.
commit 3538df12688fc4408f585c4e65ee92d5a4737b2c [revision 2779]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sat May 20 21:17:59 2017 +0300
x86inc: Remove argument from WIN64_RESTORE_XMM
The use of rsp was pretty much hardcoded there and probably didn't work
otherwise with stack_size > 0.
commit e4b0974a4ea3a727f6cc8941e9accf7ef3ba0637 [revision 2778]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Apr 22 20:30:35 2017 +0200
x86inc: Prefer r14/r15 over r12/r13 on x86-64
Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13
registers sometimes require an additional byte when used as a base register.
r14 and r15 don't have that issue, so prefer using them.
commit 46a489b5e21cae3b4fea5d41cc285dcaf79d19e3 [revision 2777]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Apr 20 19:16:51 2017 +0200
x86inc: Make REP_RET identical to RET in SSSE3+ functions
There's no point in emitting a rep prefix before ret on modern CPUs.
commit 50a9dd78263191474c948d53e837348abd0bf316 [revision 2776]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Mar 29 16:43:57 2017 +0200
x86inc: Fix call with memory operands
We overload the `call` instruction with a macro, but it would misbehave when
the macro argument wasn't a valid identifier. Fix it by explicitly checking
if the argument is an identifier.
commit d13b4c3a9574cd2fbd5407c7dfc58eeff72d2080 [revision 2775]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 29 16:41:33 2017 +0100
osdep: Rework alignment macros
Drop ALIGNED_N and ALIGNED_ARRAY_N in favor of using explicit alignment.
This will allow us to increase the native alignment without unnecessarily
increasing the alignment of everything that's currently 32-byte aligned.
commit 5840e200a0f1869a0596c5ed75c76f4d3221dd68 [revision 2774]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Mon Jan 30 22:14:57 2017 +0100
Move cabac_block_residual function declarations
commit a2d2621cc5741414b1f1adfbc08f19f1cc763847 [revision 2773]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Mon Jan 30 22:14:59 2017 +0100
Recursively delete conftest files
On OS X, one of the conftest files might be a directory named `conftest.dSYM`.
commit 988ce459433fd3f978d632e8fc0ef9c19c94a6a1 [revision 2772]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Mon Jan 30 22:14:56 2017 +0100
Drop unused function declarations
commit fb3f97833cbe3305eb613633e604f424d6d2d096 [revision 2771]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Fri Jan 27 18:06:39 2017 +0100
x86: Adjust cache64_ssse3 function suffixes
Makes those function names more consistent with other similar functions.
commit a77f3917cc6ba5e1d3c20ca649d4114217976d53 [revision 2770]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Fri Jan 27 16:21:16 2017 +0100
mc: Mark a function only used within the file as static
commit 0ca36bfa3d2bf272da88b1df5abfc0406662989a [revision 2769]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Fri Jan 27 16:21:15 2017 +0100
ppc: Drop two unused static functions
commit d32d7bf1c6923a42cbd5ac2fd540ecbb009ba681 [revision 2768]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri May 19 16:08:34 2017 +0200
cli: Verify that yuv/y4m input has at least one frame of data
Prevents a SIGBUS crash caused by attempting to access a memory-mapped
region beyond the end of the input file.
commit 959e869c20ea151917695930d9ad0a7a9a2f90c5 [revision 2767]
Author: Kaustubh Raste <kaustubh.raste@imgtec.com>
Date: Fri Apr 14 15:29:31 2017 +0530
mips: Fix out-of-tree build
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
commit d6eb2c9630d40a2765d5092f87637f4e4d084ed1 [revision 2766]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Mar 25 00:02:11 2017 +0100
checkasm: Fix load_deinterleave_chroma_fdec test
The function only writes to parts of the destination buffer, but the test
verifies the content of the entire buffer. The problem is that some earlier
IDCT functions clobber the same part of the buffer with garbage when
benchmarked, which would incorrectly cause test failures.
Fix this by explicitly zeroing the buffers beforehand.
commit a472b60daae0cac17d91ddf62ad4f474ded63e5b [revision 2765]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Mar 24 22:27:42 2017 +0100
checkasm: Fix compilation on hardened x86-64 ELF systems
Normal PC-relative relocations cannot be used for resolving the address of
external symbols on systems where ASLR results in the offset being larger
than 32 bits. We are required to go through the PLT instead.
commit 469ad705b1064207b6b1068d1e25a0a591021007 [revision 2764]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Mar 23 15:05:38 2017 +0200
aarch64: Fix building checkasm for iOS
On iOS, symbols are prefixed - this prefix gets added by the X()
macro.
commit 93340ca300e7ce66f49e41b7c2ef4a0492a7e57c [revision 2763]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Mar 23 15:05:37 2017 +0200
configure: Always enable PIC in aarch64 assembly for apple platforms
This is similar to what we do for 32-bit ARM assembly as well.
Fixes linker errors such as `ld: Absolute addressing not allowed in
arm64 code but used in '_x264_cabac_encode_terminal_asm' referencing
'_x264_cabac_range_lps' for architecture arm64`.
commit 90a61ec76424778c050524f682a33f115024be96 [revision 2762]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Mon Dec 5 10:28:53 2016 +0000
ppc: AltiVec plane_copy_deinterleave
commit bd6b66dbf9fcf67b7ebb23e4e9249083191fb984 [revision 2761]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Mon Jan 2 12:56:48 2017 +0000
ppc: AltiVec plane_copy_deinterleave_v210
commit 00f1670087db1b025a8088289de8938bf88a0d8b [revision 2760]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Wed Dec 7 19:48:02 2016 +0000
ppc: AltiVec plane_copy_deinterleave_rgb
Also add some missing vector types in ppccommon.h
commit 5e1ed367d725f895eeadf358861ab52521a420d3 [revision 2759]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Thu Jan 19 17:43:57 2017 +0100
ppc: Adjust AltiVec function suffix
Architecture should always be the last element.
commit 28ebb95d92278069b80ee729eb1884fe0981c6ae [revision 2758]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Mon Jan 9 22:28:20 2017 +0100
Move the x264_mdate() declaration to the appropriate header
commit 1d2420981aa004f051a0869c005776084f7d2a44 [revision 2757]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Tue Jan 17 17:04:19 2017 +0100
arm/aarch64: Correctly prefix integral function symbols
commit 4c4c495d58dbdea46a23947e4f202fc3b82fb891 [revision 2756]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Jan 13 14:57:51 2017 +0100
x86: Avoid using hardcoded function symbol prefixes
commit 2524fc3164d9f00b393d4254d2c5ea8f3b9d43b0 [revision 2755]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Jan 18 21:57:14 2017 +0100
x86: AVX2 high bit-depth load_deinterleave_chroma
load_deinterleave_chroma_fenc: 50% faster than AVX
load_deinterleave_chroma_fdec: 25% faster than AVX
commit cce50082129d3c92bd41bc0afc5a8c8d93084c9c [revision 2754]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Jan 18 21:46:55 2017 +0100
x86: AVX2 load_deinterleave_chroma_fenc
20% faster than SSSE3.
commit c22c10ddb21e9f5af1da83d37122e6f7388e1342 [revision 2753]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Jan 17 21:59:47 2017 +0100
x86: AVX2 plane_copy_deinterleave
50% faster than SSSE3 in 8-bit.
25% faster than AVX in high bit-depth.
Also drop the MMX versions of deinterleave functions in favor of SSE2.
commit f4890275ca6523dfe5b4ae60279ae8597d9dbd4b [revision 2752]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 12 22:16:53 2017 +0100
x86: AVX2 plane_copy_deinterleave_rgb
Around 15% faster than SSSE3.
commit da71b556730c8eb6c12a0d6950a221a4e4a99ca6 [revision 2751]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 12 21:36:28 2017 +0100
x86: Faster plane_copy_deinterleave_rgb_sse2
50% faster than the previous SSE2 function.
commit 3c7bf52c5b0a849458a45b5628ed1cc4b898da5f [revision 2750]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 15 14:52:29 2017 +0100
x86util: Reduce code size of high bit-depth AVX LOAD_DIFF
AVX supports unaligned memory operands which makes the SATD code a bit denser.
commit c7a2e327bebd2b863c2620b6962fa18ab681e5dd [revision 2749]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 1 19:10:10 2017 +0100
Bump dates to 2017
commit 97eaef2ab82a46d13ea5e00270712d6475fbe42b [revision 2748]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Sat Jan 21 12:34:49 2017 +0000
ppc: Fix the pre-VSX vec_vsx_st() fallback macro
It would previously only work correctly with 8-bit data types.
Fixes compilation with --disable-vsx.
commit 2ebe09a4f583d108c6ec1caf70b2a7a289a8820d [revision 2747]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Wed Jan 18 09:13:39 2017 +0000
Fix plane_copy_deinterleave_v210 on big-endian
commit 79288d90471e246584d19054bdb5381982114126 [revision 2746]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Wed Dec 21 13:13:43 2016 +0000
ppc: Avoid instantiating unused plane_copy functions
Those functions are currently only used in 8-bit mode and result in
warnings in other bit depths.
commit 2ebdb90bd32c3d1618b1c5b360bff750b82b1d0b [revision 2745]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Dec 27 00:22:48 2016 +0200
arm: Load mb_y properly in mbtree_propagate_list_internal_neon
The previous version, which attempted to load two stack parameters at once,
would only have worked if they were interpreted and loaded as 32-bit
elements, not when loading them as 16-bit elements.
commit b97ae0644f16bad2e2c9c9181264a946769a0aa0 [revision 2744]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Oct 31 14:39:52 2016 +0300
analyse: Fix lambda table values
commit b2b39dae0bd891c8d150b4f4c3a2a24d8d6c1431 [revision 2743]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sat Nov 26 15:30:58 2016 +0300
Cosmetics
Also make x264_weighted_reference_duplicate() static.
commit 9c82d2b65534e477c972b811a4dd5004d0dd262e [revision 2742]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Mon Nov 28 14:04:10 2016 +0000
ppc: AltiVec store_interleave_chroma
commit ea1fee272b20e1bcff2a862ea9a29e151c9136a9 [revision 2741]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Mon Nov 28 10:51:54 2016 +0000
ppc: AltiVec plane_copy_interleave
commit 42348a8e664b091203a05d3e15555b5085afcac1 [revision 2740]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Sat Nov 26 20:03:34 2016 +0000
ppc: AltiVec plane_copy_swap
commit 2610019af8bfb8e71f813cd2188b9eccbc287c59 [revision 2739]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Wed Nov 23 20:53:51 2016 +0100
ppc: AltiVec zigzag_interleave_8x8_cavlc
commit 25e4e06fe8151f627a953fbd2bd39302436bf689 [revision 2738]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Wed Nov 23 20:53:50 2016 +0100
ppc: AltiVec zigzag_scan_8x8_frame
commit 99863c665a6d4ec58b7fcc4a8a791e9c8f35a86e [revision 2737]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Mon Nov 14 15:06:06 2016 +0100
ppc: AltiVec sub8x8_dct_dc
commit 42cb0a6813714b5380e23871a155e3820846d991 [revision 2736]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Mon Nov 14 15:06:05 2016 +0100
ppc: AltiVec add8x8_idct_dc
commit 983acc911543453449a65bd02bbdff4c8cfe8e6a [revision 2735]
Author: Martin Storsjö <martin@martin.st>
Date: Wed Nov 16 10:57:31 2016 +0200
checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack
This, combined with clobbering the stack space prior to the call,
increases the chances of finding cases where 32 bit parameters
are erroneously treated as 64 bit.
commit 8ada354c9b5d72356c34c9ae3f787a6df4d61506 [revision 2734]
Author: Martin Storsjö <martin@martin.st>
Date: Wed Nov 16 10:57:30 2016 +0200
checkasm: aarch64: Clobber the stack before calling functions
commit 62d604ac6dddbf553c1ff2432d899b61cc50d95a [revision 2733]
Author: Alexandra Hájková <alexandra@khirnov.net>
Date: Tue Nov 1 23:16:17 2016 +0100
ppc: Use vec_vsx_ld instead of VEC_LOAD/STORE macros
Remove VEC_LOAD*, some of VEC_STORE* macros, some PREP* macros and
VEC_DIFF_H_OFFSET macro.
Make sure the functions do not use deprecated primitives.
commit 16142d8ee2a974060ecbad0f495b5a5c6516a75e [revision 2732]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Tue Nov 1 23:16:16 2016 +0100
ppc: Provide fallbacks for older architectures
commit 2b741f81e51f92d053d87a49f59ff1026553a0f6 [revision 2731]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Tue Nov 1 23:16:14 2016 +0100
ppc: Add VSX support to configure
commit 1f7518182e3204cb14e87baffb0150a848167ddc [revision 2730]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Tue Nov 1 23:16:13 2016 +0100
ppc: Manually unroll the horizontal prediction loop
Doubles the speedup from the function (from being slower than C to being
over twice as fast).
commit 0706ddb1df88d716cf73decba4d82b953011760c [revision 2729]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Oct 8 17:20:18 2016 +0200
x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger
than the current stack alignment, we need to store a copy of the original
stack pointer in order to be able to restore it later.
If we choose to use another register for this purpose, we should not pick
eax/rax, since it can be overwritten by the return value.
commit 4d5c8b01a48f72f9c40651e92c39294326a0863f [revision 2728]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Dec 1 16:05:16 2016 +0100
Show the correct settings for --preset slow in --fullhelp
The slow preset was recently adjusted but we forgot to update the
corresponding --fullhelp message to reflect the change.
commit c996ed202e2d17d1d8ae42c42d0707e51c29bb93 [revision 2727]
Author: Martin Storsjö <martin@martin.st>
Date: Mon Nov 14 23:54:51 2016 +0200
checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters
Even if MAX_ARGS - 2 (for arm) or MAX_ARGS - 6 (for aarch64) parameters
are passed on the stack to checkasm_checked_call, we actually only
need to store MAX_ARGS - 4 (for arm) or MAX_ARGS - 8 (for aarch64)
parameters on the stack when calling the tested function.
commit cd15b354a887943d525e6fd8096ad4b75692d2b2 [revision 2726]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Mon Nov 14 23:54:50 2016 +0200
checkasm: arm: preserve the stack alignment in x264_checkasm_checked_call
The stack used by x264_checkasm_checked_call_neon was only a multiple of 4
bytes when the checked function was called. AAPCS requires a doubleword
(8-byte) aligned stack at public interfaces. Since both calls are public
interfaces, the stack was misaligned when the checked function was called.
This can cause issues if code called from it (which includes
the C implementations) relies on the stack alignment.
commit 834e1b11e174f2694a4c81b4922c0c5f8778796a [revision 2725]
Author: Martin Storsjö <martin@martin.st>
Date: Wed Nov 16 10:56:14 2016 +0200
arm: Don't use vcmp.f64 for testing for an all-zeros register
On iOS, vcmp.f64 can behave as if the register were zero if the
register (interpreted as an f64) holds a denormal number.
The vcmp.f64 (and other VFP instructions) will trap to the kernel
(which is supposed to implement the FP operation, which it apparently
doesn't do properly on iOS) if the value is a denormal. If this happens,
the whole comparison ends up much more costly.
commit a91e95fca2222ac0731e987a07f4b11c670f4556 [revision 2724]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Wed Nov 16 10:49:14 2016 +0200
aarch64: Clear the upper half of int parameters in x264_plane_copy_core_neon
commit 1eab3b402e1d7729da295024fa7eec8b09e30c20 [revision 2723]
Author: Luca Barbato <lu_zero@gentoo.org>
Date: Tue Nov 1 23:16:18 2016 +0100
ppc: Fix hadamard for little-endian
Extending to 16-bit works with flipped bytes.
commit 75918e1849e1286885bfcfb0c348de885a702fb3 [revision 2722]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Sep 22 00:17:48 2016 +0300
Correctly signal max_dec_frame_buffering with --keyint 1
According to E.2.1 it is inferred to be equal to 0 only if profile_idc is equal
to 44, 86, 100, 110, 122, or 244 and constraint_set3_flag is equal to 1.
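The inference rule quoted above reduces to a small predicate. This is a minimal C sketch, not x264's SPS writing code. The function name is hypothetical, and it encodes only the condition stated in this commit message. Whenever it returns false, a decoder does not infer 0, so a stream that needs the value to be 0 has to signal it explicitly.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when an absent max_dec_frame_buffering
 * would be inferred to be 0, per the E.2.1 condition quoted above. */
static bool max_dec_frame_buffering_inferred_zero(int profile_idc,
                                                  int constraint_set3_flag)
{
    switch (profile_idc) {
    case 44: case 86: case 100: case 110: case 122: case 244:
        /* Listed profiles: inferred 0 only with constraint_set3_flag set. */
        return constraint_set3_flag == 1;
    default:
        /* Any other profile_idc (e.g. 77, Main): never inferred as 0. */
        return false;
    }
}
```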
commit 72d53ab2ac7af24597a824e868f2ef363a22f5d4 [revision 2721]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Sep 17 21:41:52 2016 +0200
x86: Faster pixel_ssim_4x4x2_core
commit 8c07263ad9218bdc3e0f5b84d578968513885df7 [revision 2720]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Sep 17 21:14:35 2016 +0200
x86: Deduplicate a constant in hpel_filter_c
commit 9521b278adb92081f052c1b7bfc4b95651d88b07 [revision 2719]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Sep 17 14:45:08 2016 +0200
x86: Faster pixel_ssd_nv12
Also drop the MMX2 version to simplify things.
commit 75d0f9cc8770bc4f36785062116757d24eb44604 [revision 2718]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Sep 11 15:32:54 2016 +0200
x86: SSE zigzag_scan_4x4_field
Replaces the MMX2 version, one cycle faster.
Also change the checkasm test to use the correct alignment macro.
commit 0ce77f9eb71051c9a6121ec12c2abaac99ee628a [revision 2717]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Sep 7 19:27:31 2016 +0200
x86: AVX2 mbtree_propagate_list
SIMD part is around 25% faster than AVX on Haswell, around 7%
faster when including the runtime of the scalar C wrapper.
commit 0c36239a4826f6e5a3cb873aca1814e389a46e29 [revision 2716]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Sep 7 19:26:42 2016 +0200
x86: Move predict_16x16_dc_left calculations to asm
1-2 cycles faster and avoids some code duplication to decrease code size.
Also drop the MMX2 implementation in favor of SSE2 to simplify things.
commit 0cc8afd31212de013b26b10f58c608c9adcff2fc [revision 2715]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Aug 18 19:00:48 2016 +0300
avs: support for AviSynth+ high bit-depth pixel formats
commit dc0fe73636d34baeb3a64918b52db64d2a9e83bb [revision 2714]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Fri Aug 26 20:26:56 2016 +0300
aarch64: implement x264_plane_copy_swap_neon
plane_copy_swap_c: 27054
plane_copy_swap_neon: 4152
commit eaf2fc20c8579714a48523b7ab8c05373708a25f [revision 2713]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Aug 18 22:14:22 2016 +0300
Various cosmetics of semicolon use
commit aae177c55141460f442de0572c4a434bf2ae20bc [revision 2712]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jul 28 21:58:40 2016 +0200
cli: Prefetch yuv/y4m input frames on Windows 8 and newer
Use PrefetchVirtualMemory() (if available) on memory-mapped input frames.
Significantly improves performance when the source file is not already
present in the OS page cache by asking the OS to bring in those pages from
disk using large, concurrent I/O requests.
Most beneficial on fast encoding settings. Up to 40% faster overall with
--preset ultrafast, and up to 20% faster overall with --preset veryfast.
This API was introduced in Windows 8, so call it conditionally. On older
Windows systems the previous behavior remains unchanged.
commit 4e5adb87070c82b937c03e0cc030eae3578c251d [revision 2711]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jul 28 19:34:04 2016 +0200
Adjust --preset slow
* Swap --me umh for --trellis 2. They have a similar effect on performance
but the latter gives slightly better results in most cases.
* Change --b-adapt from 2 to 1. Negligible difference in quality since the
b-adapt 1 improvements, but it's significantly faster.
Also remove a redundant assignment from veryfast (--me hex is set by default).
commit 1e4fb55a283ba90fef346033027af851f2a04468 [revision 2710]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jul 28 19:33:57 2016 +0200
ratecontrol_new: Simplify an expression in HRD timescale calculation
Also gets rid of a false positive static analyser integer division warning.
commit 17378b2028146fa54a1b2b90da62554935d9dcc2 [revision 2709]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jul 28 19:33:44 2016 +0200
gcc: Enable __sync_fetch_and_add() on x86-64
It was previously only enabled on 32-bit x86 for no reason, so 64-bit
systems had to use a mutex instead of a simple `lock xadd` instruction.
Note that this code is only used in some very specific configurations
involving sliced threads.
commit 86b71982e131eaa70125f8d0e725fcade9c4c677 [revision 2708]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Sep 20 18:48:22 2016 +0300
mips: Fix high bit-depth compilation
commit 1ea3c682ca12c7f13ea6f82b42bdc40afcfda87f [revision 2707]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Sep 17 15:53:59 2016 +0200
checkasm: Fix compilation on Windows with --disable-thread
commit 5caef139cf7d6b41a95ee9568625d36d1ae1c107 [revision 2706]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Fri Aug 26 20:26:55 2016 +0300
arm/aarch64: use plane_copy wrapper macros
Move the macros to common/mc.h to share them across all architectures.
Fixes possible buffer overreads if the width of the user supplied frames
is not a multiple of 16.
Reported-by: Kirill Batuzov <batuzovk@ispras.ru>
commit 3f5ed56d4105f68c01b86f94f41bb9bbefa3433b [revision 2705]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Apr 3 17:28:33 2016 +0200
configure: Support specifying a custom pkg-config
commit 7c9c687d8062f72b3ec300de8997bdae8277a741 [revision 2704]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed Jun 8 22:46:17 2016 +0300
Add support for new VUI parameters
Support the new color primaries, transfer characteristics, and matrix
coefficients defined in the 2016-02 edition of the H.264 specification.
commit 92515e8ff73491ef8a44c85e0bee265ba5791070 [revision 2703]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Apr 24 14:10:22 2016 +0200
configure: Add link-time optimization support
Enabled by using the --enable-lto configuration option.
May give a slight performance improvement in some cases, but it can
also reduce performance in other cases (largely compiler-dependent)
so don't enable it by default. It also makes compilation (and linking
in particular) a fair bit slower.
Note that some older versions of GNU binutils will incorrectly warn
about "memset used with constant zero length parameter" when linking
using LTO. This is due to a bug in binutils and can safely be ignored.
commit b6267e0ff770545de88dfb5d3f176ea73f453730 [revision 2702]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Apr 24 13:32:43 2016 +0200
configure: Fix clang detection with versioned binaries
Correctly detect clang binaries that have the version number appended
as a suffix to the file name, e.g. `clang38`.
commit 14a58532fea2c5f9e7b93c918476d842091c4268 [revision 2701]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Sun Apr 24 14:38:56 2016 +0200
arm: Add asm for mbtree fixed point conversion
7-8 times faster on a cortex-a53 vs. gcc-5.3.
mbtree_fix8_pack_c: 44114
mbtree_fix8_pack_neon: 5805
mbtree_fix8_unpack_c: 38924
mbtree_fix8_unpack_neon: 4870
commit b6f189eb4c5646483f7901293944695167e71ed9 [revision 2700]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Sun Apr 24 14:38:55 2016 +0200
aarch64: Add asm for mbtree fixed point conversion
pack is ~7 times faster and unpack is ~9 times faster on a cortex-a53
compared to gcc-5.3.
mbtree_fix8_pack_c: 41534
mbtree_fix8_pack_neon: 5766
mbtree_fix8_unpack_c: 44102
mbtree_fix8_unpack_neon: 4868
commit a5e06b9a435852f0125de4ecb198ad47340483fa [revision 2699]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun May 22 22:33:58 2016 +0300
Fix p4x4 analyse for 4:4:4 encoding with chroma ME
commit 07221290db0a94bda1f6ece3fdf3c02675c8adce [revision 2698]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun May 22 22:18:34 2016 +0300
Fix 4:4:4 encoding with CQM
commit 23ebc1f763936b7fcfc81e21530e1b65dbc503b9 [revision 2697]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun May 22 19:36:05 2016 +0300
Fix p4x4 RDO with CAVLC
commit 740a8c556bd9b68e899d6991f3f987a443aa14aa [revision 2696]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sat Apr 23 23:10:03 2016 +0300
Apply zone options a little bit earlier
This way things like SAR changes will have full effect from the start frame.
commit 928bd9d5def4f0ca5071ea176a11b816a01e6495 [revision 2695]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sat Apr 23 22:45:44 2016 +0300
Fix corruption when using encoder_reconfig() with some parameters
Changing parameters that affect the SPS, like --ref for example, wasn't
behaving correctly previously.
Probably a regression in r2373.
commit 3b70645597bea052d2398005bc723212aeea6875 [revision 2694]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Wed Apr 13 21:54:25 2016 +0300
Clean up header includes
commit 2102de2584e03fce4abac49eb37d5d7a0803380f [revision 2693]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Apr 13 17:53:49 2016 +0200
Eliminate some compiler warnings on BSD
Include <strings.h> in addition to <string.h>. According to the POSIX
specification the prototypes for strcasecmp() and strncasecmp() are
declared in <strings.h>. On some systems they are also declared in
<string.h> for compatibility reasons but we shouldn't rely on that.
Define _POSIX_C_SOURCE only when it's required to do so. Some BSD
variants don't declare certain function prototypes otherwise.
commit 64f4e24909924fceeea6e154d71b7dfbf586c7ea [revision 2692]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Apr 12 21:33:54 2016 +0200
osx: Add -D_DARWIN_C_SOURCE to CFLAGS
OSX doesn't like _POSIX_C_SOURCE being defined when _DARWIN_C_SOURCE isn't.
commit 00597d74c6223f3694e2c6614ef0574d7fca6b22 [revision 2691]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Apr 12 20:33:42 2016 +0300
Remove an unused parameter from x264_slicetype_frame_cost()
The b_intra_penalty parameter is no longer used anywhere after the
improvements to the --b-adapt 1 algorithm.
commit aa26e880bc2cd04cc81c776051d5e21d03fc975a [revision 2690]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 10 20:17:32 2016 +0300
Improve the --b-adapt 1 algorithm
Roughly the same speed as before but with significantly better results,
comparable to --b-adapt 2.
commit 24f25b6afd21488a93bd86098f98dfaf229fc149 [revision 2689]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Apr 3 15:49:26 2016 +0200
analyse: i_sub_partition write combining
commit 1507cfe80ecf5f8e240a35e9e9dc5a92bd25e792 [revision 2688]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Mar 15 20:16:45 2016 +0100
x86: Use one less register in mbtree_propagate_cost_avx2
Avoids the need to save and restore xmm6 on 64-bit Windows.
commit c82c7374938f4342971adf8b2495c3a1bbe621c4 [revision 2687]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Mar 4 17:53:08 2016 +0100
x86: Add asm for mbtree fixed point conversion
The QP offsets of each macroblock are stored as floats internally and
converted to big-endian Q8.8 fixed point numbers when written to the 2-pass
stats file, and converted back to floats when read from the stats file.
Add SSSE3 and AVX2 implementations for conversions in both directions.
About 8x faster than C on Haswell.
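The float/Q8.8 round trip described above can be sketched in portable C. This is a minimal illustration, not x264's code. The helper names fix8_pack_be and fix8_unpack_be are hypothetical. The clamp to the int16_t range is an added assumption that keeps the float-to-integer conversion well-defined for finite inputs. Values are scaled by 256 (2^8) and truncated toward zero. Each 16-bit result is written most significant byte first, so the stored bytes do not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: floats -> big-endian Q8.8 (units of 1/256).
 * dst must hold 2 * count bytes; inputs are assumed to be finite. */
static void fix8_pack_be(uint8_t *dst, const float *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        float scaled = src[i] * 256.0f;        /* scale by 2^8 */
        /* Assumed clamp so the conversion stays within int16_t range. */
        if (scaled > 32767.0f)  scaled = 32767.0f;
        if (scaled < -32768.0f) scaled = -32768.0f;
        int32_t v = (int32_t)scaled;           /* truncates toward zero */
        uint16_t u = (uint16_t)v;              /* two's-complement bit pattern */
        dst[2 * i]     = (uint8_t)(u >> 8);    /* high byte first (big-endian) */
        dst[2 * i + 1] = (uint8_t)(u & 0xFF);
    }
}

/* Hypothetical helper: big-endian Q8.8 -> floats. */
static void fix8_unpack_be(float *dst, const uint8_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t u = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
        /* Reinterpret as signed 16-bit without implementation-defined casts. */
        int32_t v = (u >= 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
        dst[i] = (float)v * (1.0f / 256.0f);
    }
}
```

Q8.8 has a resolution of 1/256. Offsets such as 1.5 or -0.25 therefore round-trip exactly, while other values lose precision below that step.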
commit be677efc6313ade5eddf722fdf097cce56df1344 [revision 2686]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Apr 7 13:09:03 2016 +0300
x86inc: Enable AVX emulation in additional cases
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
commit b5661d322866df647e6084061a471eceac214c28 [revision 2685]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Apr 7 12:48:29 2016 +0300
x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
commit 283663d4c13088f4811c78b75318bda59d696b2d [revision 2684]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Mar 28 18:35:38 2016 +0300
x86inc: Fix AVX emulation of some instructions
commit 54fd697668d0a04246ad0b0e9955a6583b2bb8b6 [revision 2683]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Mar 4 17:51:41 2016 +0100
x86inc: Fix AVX emulation of scalar float instructions
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
commit eeb9b66ddb0f27d8baaa8efa9597613e61140836 [revision 2682]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Feb 27 20:34:39 2016 +0100
x86: dct2x4dc asm
Only used in 4:2:2. MMX2 version implemented for 8-bit, SSE2 and AVX
versions implemented for high bit-depth.
2.5x faster on 32-bit and 1.6x faster on 64-bit compared to C on Ivy Bridge.
commit 23d1d8e89be2d99f5c6924a6055fc80d69429503 [revision 2681]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Feb 20 20:31:22 2016 +0100
x86: SSE2/AVX idct_dequant_2x4_(dc|dconly)
Only used in 4:2:2. Both 8-bit and high bit-depth implemented.
Approximate performance improvement compared to C on Ivy Bridge:
                          x86-32  x86-64
idct_dequant_2x4_dc         2.1x    1.7x
idct_dequant_2x4_dconly     2.7x    2.0x
Helps more on 32-bit due to the C versions being register-starved.
commit dbbf1dd2836a21b65178442c1fb7a00ea089d7ec [revision 2680]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Feb 20 16:53:35 2016 +0100
checkasm: Fix idct_dequant_2x4_(dc|dconly) tests
They used the wrong qp values and the dconly test had the wrong name. This
was undetected before because there weren't any assembly implementations.
commit 0db0ac3a05b80eee7994fab08cbce2d07e8b1586 [revision 2679]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Feb 7 14:55:26 2016 +0100
checkasm: Disable Windows Error Reporting
When developing new assembly code it's expected that checkasm may crash,
and the error reporting dialog popup can be somewhat annoying.
commit deae1b1001d134f5babc4fad3208bd951a454951 [revision 2678]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Feb 6 18:49:46 2016 +0100
windows: Flag debug builds in the resource file
commit 0082b717199bafb4abbb6638e7c30d50deaf2c1b [revision 2677]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Feb 4 20:06:57 2016 +0100
cli: Refactor filter option parsing
The old code contained a whole bunch of memory leaks, unchecked mallocs,
sections of dead code, etc. and was generally overly complex.
Also consolidate some memory allocations into a single one.
commit dfe394cadc8a39752de5b3f4a0be222c1b9290f2 [revision 2676]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 31 21:50:52 2016 +0100
ffms: Various improvements
* Drop the MinGW Unicode workarounds. Those were required at the time
Windows Unicode support was added to x264 but the underlying problem
has since been fixed in FFMS.
* Use FFMS_IndexBelongsToFile() as an additional sanity check when reading
an index file to ensure that it belongs to the current source video.
* Upgrade to the new API to prevent deprecation warnings when compiling.
* Fix a resource leak that would occur if FFMS_GetFirstTrackOfType() or
FFMS_CreateVideoSource() failed.
* Minor string handling adjustments related to progress reporting.
This increases the FFMS version requirement from 2.16.2 to 2.21.0.
commit 215afdbd8ecc924f2028f79851458076683e97ad [revision 2675]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Apr 11 16:59:46 2016 +0200
msvc: Add snprintf/vsnprintf replacements
MSVC pre-VS2015 has broken snprintf/vsnprintf implementations which are
incompatible with C99 and may lead to buffer overflows.
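The C99 behavior that such replacements have to provide can be sketched as follows. For a nonzero size, vsnprintf() always NUL-terminates and returns the length the complete output would have had, which is what makes the common measure-then-allocate idiom safe. This is a generic illustration with a hypothetical helper name (format_alloc), not x264's replacement code. An implementation that returns -1 on truncation instead of the required length, or leaves the buffer unterminated, breaks exactly this pattern.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a heap buffer of exactly the right size.
 * Relies on C99 semantics: vsnprintf(NULL, 0, ...) returns the number of
 * characters the complete output needs, excluding the terminating NUL. */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                 /* a va_list may only be traversed once */
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    char *buf = NULL;
    if (len >= 0 && (buf = malloc((size_t)len + 1)) != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);
    va_end(ap2);
    return buf;                       /* caller frees; NULL on error */
}
```

With C99 semantics, snprintf() into a 4-byte buffer with "abcdef" stores "abc" plus the terminator and returns 6.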
commit 5be32efc244d96aa56be462664b5c56d7318e86d [revision 2674]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 31 20:21:01 2016 +0100
configure: Define feature test macros for --std=gnu99
Makes the printf() family functions on MinGW use the correct C99 POSIX
versions instead of the broken pre-VS2015 Microsoft ones.
Also allows us to get rid of some _GNU_SOURCE and _ISOC99_SOURCE defines.
commit c01bf42117b811a0469f9f6c374f4a0daa98716d [revision 2673]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 28 18:37:37 2016 +0100
mingw: Enable high-entropy ASLR on 64-bit Windows
To fully utilize HEASLR the image base address must also be set above
4 GiB. For consistency use the same address as MSVC uses by default.
This requires binutils 2.25 which isn't available on all common
distributions, so only enable it after checking that it's supported.
commit dd6b7b974e0057da726f71e10c24d057a339605b [revision 2672]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 24 01:48:18 2016 +0100
msvs: WinRT support
To compile x264 for WinRT, the following additional steps have to be performed:
* Ensure that the necessary SDK is installed.
* Set the correct environment variables in the VS command prompt as shown at
https://trac.ffmpeg.org/wiki/CompilationGuide/WinRT
* Add one of the following to --extra-cflags depending on the target OS:
"-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0A00" (Windows 10)
"-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0603" (Windows 8.1)
commit 7650a1367003e24f4f1b831682c012b5ba3e6c69 [revision 2671]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 24 23:58:40 2016 +0100
configure: Disable CLI libraries when CLI is disabled
commit 1ce062abb47ac59621b402cb26a1f14c91bb52bc [revision 2670]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Feb 5 18:46:13 2016 +0100
matroska: mk_close: Check fseek() return value
commit de7af9185e172122cd9b800845e1988a52ad7cc3 [revision 2669]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Feb 5 18:46:02 2016 +0100
parse_qpfile: Check ftell() and fseek() return values
commit fd2c324731c2199e502ded9eff723d29c6eafe0b [revision 2668]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Sun Apr 10 20:13:59 2016 +0300
Use the correct default B-ref placement with B-pyramid
The cost analysis functions expect the B-ref in a sequence of an even
number of B-frames to be placed towards the beginning, while the actual
placement was towards the end.
Change the placement to be consistent with the analysis expectations, e.g.
PbbBbP -> PbBbbP.
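The placement rule can be illustrated with a small C sketch that writes the display-order pattern of one mini-GOP. The function name and output format are hypothetical and this is not x264's lookahead code; it only encodes the rule stated above by putting the reference B-frame at index (nb - 1) / 2, so that an even count leans towards the beginning, whereas the old placement corresponded to index nb / 2.

```c
/* Hypothetical illustration, not x264 code: write the display-order pattern
 * of 'nb' B-frames (nb >= 2 assumed) between two P-frames into 'out', which
 * must hold at least nb + 3 chars. 'B' marks the B-frame used as a reference,
 * 'b' the non-reference ones. */
static void bref_pattern(int nb, char *out)
{
    int ref = (nb - 1) / 2;   /* even nb: middle-left; the old rule nb / 2 gave middle-right */

    out[0] = 'P';
    for (int i = 0; i < nb; i++)
        out[1 + i] = (i == ref) ? 'B' : 'b';
    out[1 + nb] = 'P';
    out[2 + nb] = '\0';
}
```

For nb = 4 this yields PbBbbP, the new pattern from the commit message; for odd counts both rules select the same middle frame.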
commit e6a3f2989dd9eba3434c21fa94a6d9a5d1c7a9fe [revision 2667]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Feb 5 18:45:47 2016 +0100
parse_zones: Fix memory leak
commit f86756985d42ac4a14866534c588061ede860b7b [revision 2666]
Author: Alexey Samsonov <vonosmas@gmail.com>
Date: Mon Jan 25 16:05:25 2016 -0800
Fix float-cast-overflow in x264_ratecontrol_end function
According to the C standard, it is undefined behavior to cast a negative
floating point number to an unsigned integer. Float-cast-overflow in
general is known to produce different results on different architectures.
Building x264 code with Clang and -fsanitize=float-cast-overflow
(http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#availablle-checks)
and running it on some real-life examples occasionally produces errors
of the form:
encoder/ratecontrol.c:1892: runtime error: value -5011.14 is outside the
range of representable values of type 'unsigned short'
Fix these errors by explicitly coding the de-facto x86 behavior: casting
float to uint16_t through int16_t.
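The two-step cast can be sketched as follows (the helper name is hypothetical). Converting a float to int16_t truncates towards zero and is well defined as long as the truncated value fits in the int16_t range; the following int16_t to uint16_t conversion is always well defined and wraps modulo 65536. Values outside the int16_t range would still be undefined.

```c
#include <stdint.h>

/* Hypothetical helper illustrating the fix: convert a float to uint16_t with
 * the wrap-around result that a direct cast typically produces on x86.
 *  - float -> int16_t truncates towards zero; defined only if the result
 *    fits in [-32768, 32767];
 *  - int16_t -> uint16_t is always defined and wraps modulo 65536.
 * A direct (uint16_t)f with a negative f is undefined behavior. */
static uint16_t float_to_u16_wrap(float f)
{
    return (uint16_t)(int16_t)f;
}
```

With the value from the sanitizer report, -5011.14 truncates to -5011 and then wraps to 65536 - 5011 = 60525.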
commit a01e33913655f983df7a4d64b0a4178abb1eb618 [revision 2665]
Author: Sebastian Dröge <sebastian@centricular.com>
Date: Sun Dec 20 23:49:35 2015 +0300
Fix AVC-Intra padding for non-Annex B encoding
commit 1e4a24f305c006a95fec00131703d0e0ecae3a38 [revision 2664]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Jan 11 21:39:22 2016 +0300
ppc: Only perform AltiVec detection if compiled with AltiVec enabled
commit b5953629117adc2b8d0d0eed6eb323c00587b428 [revision 2663]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Oct 13 15:30:16 2015 +0300
2-pass: Take into account possible frame reordering
commit 20821a26ec510979e49fcfd6becc6ad7e2d8b388 [revision 2662]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Oct 13 12:54:05 2015 +0300
Revise the 2-pass algorithm
commit 065321c48d0d371c1735b3cc9d368b43e1b64aaa [revision 2661]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Jan 5 02:41:43 2016 +0300
Revise the row VBV algorithm (part 2)
Should fix rare cases of VBV emergency mode activation caused by too much trust
in the row predictors.
commit d23d18655249944c1ca894b451e2c82c7a584c62 [revision 2660]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Jan 1 12:44:31 2016 +0100
Bump dates to 2016
commit 3d972062c8a37d1a19586e2351e889b0a70beb40 [revision 2659]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Oct 26 19:54:20 2015 +0100
cli: Use memory-mapped input frames for yuv and y4m
Improves performance by avoiding extraneous memory copying.
Most beneficial on fast settings.
On average around 5-10% faster overall on ultrafast but the
performance improvement can be even larger in some cases.
commit 38a5268dbec56adea750e05c4981f3bbb176e735 [revision 2658]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Jan 7 01:59:24 2016 +0100
y4m: Support extended frame headers when seeking
Use the actual length of the frame header of the first frame instead of
assuming a header without extensions when calculating the frame size.
Also makes the frame counter more accurate with extended frame headers.
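The seek arithmetic can be sketched in C under the assumption stated above, namely that the length of the first frame header is representative of all frames. The per-frame size used for seeking is then the measured header length plus the picture data size, rather than the 6 bytes of a bare "FRAME\n" header. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of a y4m frame header ("FRAME", optional parameters, '\n'),
 * including the terminating newline; 0 if no newline is found. */
static size_t y4m_frame_header_len(const char *p)
{
    const char *nl = strchr(p, '\n');
    return nl ? (size_t)(nl - p) + 1 : 0;
}

/* Hypothetical helper: byte offset of frame 'n', assuming every frame header
 * is as long as the measured first one. 'stream_hdr_len' is the length of
 * the "YUV4MPEG2 ...\n" line and 'frame_size' the raw picture size in bytes. */
static uint64_t y4m_frame_offset(uint64_t stream_hdr_len, size_t frame_hdr_len,
                                 uint64_t frame_size, uint64_t n)
{
    return stream_hdr_len + n * (frame_hdr_len + frame_size);
}
```

For example, with a 40-byte stream header and 100-byte pictures, frame 2 starts at byte 252 with bare headers, but at byte 266 with a 13-byte header such as "FRAME Xfoo=1\n".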
commit cc652c158c1fa65bfeafb6446b5be855850065d0 [revision 2657]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Nov 3 17:55:08 2015 +0100
configure: Simplify cygwin/mingw/msys code
Avoids some code duplication.
Also drop the -mno-cygwin check since that option was removed back in 2008.
commit 8b2d2a6d51abf51ad38dd8705d280448fbe63aaf [revision 2656]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Oct 26 18:52:46 2015 +0100
y4m: Avoid some redundant strlen() calls
commit 24f7705f15cf6d59028a76a894d866b9fad85f39 [revision 2655]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 25 17:15:10 2015 +0100
Simplify threadpool_wait
commit 30ba5dc22fd0ae359e144847f2636574f659627d [revision 2654]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Oct 16 19:05:34 2015 +0200
windows: Use native threads by default
--disable-win32thread can be passed as an argument to configure to compile
with pthreads, which was the old default behavior.
commit 1637239a64f3ec9a491b91202bd37097f15a253d [revision 2653]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 11 22:32:11 2015 +0200
x86: Avoid some bypass delays and false dependencies
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.
commit 7688814a7ec994f8e5984d199b465ccc068b98af [revision 2652]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 11 22:32:03 2015 +0200
x86: Enable high bit-depth x264_coeff_last64_avx2_lzcnt
The function existed but was never enabled.
commit 366fa85885053c7b836a4272a4fbec1852103979 [revision 2651]
Author: Geza Lore <gezalore@gmail.com>
Date: Mon Oct 12 13:13:42 2015 +0100
x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
commit 70c3ba42e610b4182edda4fdeb10b37a2a70eb8f [revision 2650]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Oct 16 21:28:49 2015 +0200
x86inc: Avoid creating unnecessary local labels
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such cpus.
This patch doesn't modify any emitted instructions, and doesn't actually affect
x264 at all. It's only for other projects that use x86inc.asm without an
appropriate `strip` command in their buildsystem.
Note that EQU is just creating a local label when using nasm instead of yasm.
This is probably a bug, but at least it doesn't break anything.
commit 5c3d473a966e4b013759097fb98cd4a9cb5a34f5 [revision 2649]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Oct 15 17:42:49 2015 +0200
x86inc: Simplify AUTO_REP_RET
cpuflags is never undefined any more, it's set to 0 instead.
Also fix an incorrect comment.
commit 28d68f090c0103704f5f6a86fcf362251774cd78 [revision 2648]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Oct 12 21:55:11 2015 +0200
x86inc: Use more consistent indentation
commit 963b99efaaf1f0628b155e52b8a7c102cd1d37ff [revision 2647]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Oct 12 20:15:18 2015 +0200
x86inc: Preserve arguments when allocating stack space
When allocating stack space with a larger alignment than the known stack
alignment a temporary register is used for storing the stack pointer.
Ensure that this isn't one of the registers used for passing arguments.
commit 6e5033417a53fa66d002665618a1350d7417725e [revision 2646]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jan 17 00:25:47 2016 +0100
x86inc: Improve FMA instruction handling
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.
Only the last operand in FMA3 instructions can be a memory operand. When
converting FMA4 instructions to FMA3 instructions we can utilize the fact
that multiply is a commutative operation and reorder operands if necessary
to ensure that a memory operand is used only as the last operand.
commit 93cba743c78959ad97812dbaf894903c608912d0 [revision 2645]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Oct 11 22:31:53 2015 +0200
x86inc: Be more verbose in assertion failures
commit 8017b33454397d59b3285ec6d2ad35b6d0deb58a [revision 2644]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Sep 30 23:17:00 2015 +0200
x86inc: Make cpuflag() and notcpuflag() return 0 or 1
Makes it possible to use them in arithmetic expressions.
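cpuflag() is an x86inc assembler macro, so the following C analogue only illustrates why a 0-or-1 result matters: a raw bitmask test yields the flag's bit value, which breaks arithmetic such as shift counts. The flag names and values are hypothetical.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define HYP_FLAG_SSE2 (1u << 0)
#define HYP_FLAG_AVX2 (1u << 3)

/* Normalize a bitmask test to exactly 0 or 1. Without the '!!', testing
 * HYP_FLAG_AVX2 would yield 8 instead of 1. */
static int has_flag(uint32_t flags, uint32_t flag)
{
    return !!(flags & flag);
}

/* Because has_flag() returns 0 or 1, it can be used directly in arithmetic:
 * 8 bytes baseline, doubled once per supported extension. */
static int vector_bytes(uint32_t flags)
{
    return 8 << (has_flag(flags, HYP_FLAG_SSE2) + has_flag(flags, HYP_FLAG_AVX2));
}
```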
commit 5c6570495f8f1c716b294aee1430d8766a4beb9c [revision 2643]
Author: Henrik Gramner <henrik@gramner.com>
Date: Fri Oct 30 16:55:49 2015 +0100
encoder_open: Fix memory leak
Furthermore, the x264_analyse_prepare_costs() and x264_analyse_init_costs()
functions were only used in x264_encoder_open(), so move that entire section
of code to analyse.c as well to simplify things.
commit 424534537a249dcf913e02560303f6afca423489 [revision 2642]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Wed Nov 18 11:08:22 2015 +0100
arm: do not fill mc_weight*_neon tabs for HIGH_BIT_DEPTH
The asm is only for 8-bit and function prototypes reflect that. Avoids
numerous warnings with --bit-depth=9/10.
commit df51d8efa8ce9afcedda64acc69c1dba2648716d [revision 2641]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Tue Oct 13 23:50:11 2015 +0200
arm: Eliminate text relocations in asm
Android 6 does not link shared libraries with text relocations.
Make the movrel macro position independent and add movrelx for indirect
loads of external symbols.
Move the function pointer table for the aligned memcpy variants to the
data.rel.ro section on Linux/Android.
commit a2fe237af1b68f2bd53d64ed3faed62429d3ee5a [revision 2640]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Oct 15 11:50:33 2015 +0300
arm: Don't assume alignment in mbtree_propagate_list_internal where it isn't provided
commit 9f422c0cd9c0abcd6a7abb10b51f8be883c39b2b [revision 2639]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Tue Oct 13 23:50:12 2015 +0200
arm: Fix checkasm register clobber check on iOS
r9 is a volatile register in the iOS ABI and will therefore not be
preserved by compiled functions like the luma motion compensation.
Add the symbol prefix to the puts() call and use blx since a switch
between arm and thumb mode might be required.
commit 75992107adcc8317ba2888e3957a7d56f16b5cd4 [revision 2638]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Oct 1 01:02:16 2015 +0300
ppc: Add detection of AltiVec support for FreeBSD
Patch from FreeBSD ports.
commit 479d0c1fe73833ba65e0a10f6f5cf18df6def719 [revision 2637]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Mon Sep 28 21:07:55 2015 +0300
Don't assume 16-byte stack alignment by default on x86-32
Depending on the target OS, some compilers use 4-byte stack alignment by default.
Explicitly check known good compilers and specific options for stack alignment.
commit fad44d59b3adeb29b9c92fde0b80116cde79020e [revision 2636]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Sep 22 21:33:07 2015 +0300
Fix a few static analyzer performance hints
commit de24c8c189364013e62d58d1e8f2fef878eb62bf [revision 2635]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Sep 22 20:19:23 2015 +0300
Revise the row VBV algorithm
commit 001d30598c75d9bbc3aa80f67f9bdac17692437d [revision 2634]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Sep 22 19:26:25 2015 +0300
Fix high bit depth lookahead cost compensation algorithm
High bit depth VBV should now behave more like the 8-bit one.
commit 91368390db9179226b5b4ed718a5788b754f9302 [revision 2633]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Sep 22 19:05:52 2015 +0300
Correctly update the intra row predictor in B-frames
It was previously used but never updated from its initialization value.
commit e0d722f85f8599e324be2bebef9430155b25c329 [revision 2632]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Tue Sep 22 18:58:24 2015 +0300
Change the predictors update algorithm
Keep predictor offsets more stable. This should fix VBV misprediction in frames
with a large difference in complexity between the top and bottom parts.
commit 6f04b146875c45e6f7845a7bb5fb7fdf8e7534f1 [revision 2631]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Sep 3 09:30:44 2015 +0300
arm: Implement x264_mbtree_propagate_{cost, list}_neon
The cost function could be simplified to avoid having to clobber
q4/q5, but this requires reordering instructions, which increases
the total runtime.
checkasm timing              Cortex-A7        A8        A9
mbtree_propagate_cost_c          63702    155835     62829
mbtree_propagate_cost_neon       17199     10454     11106
mbtree_propagate_list_c         104203    108949     84532
mbtree_propagate_list_neon       82035     78348     60410
commit 3e25eab0b7172e3c0b067b8b6d641ce148d03db9 [revision 2630]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Sep 3 09:30:43 2015 +0300
x86: Share the mbtree_propagate_list macro with aarch64
This avoids having to duplicate the same code for all architectures
that implement only the internal part of this function in assembler.
commit 654901dfca73a21e2bb2366dda79eb413e9bfb66 [revision 2629]
Author: Martin Storsjö <martin@martin.st>
Date: Wed Sep 2 22:39:51 2015 +0300
arm: Implement luma intra deblocking
checkasm timing              Cortex-A7        A8        A9
deblock_luma_intra[0]_c           5988      4653      4316
deblock_luma_intra[0]_neon        3103      2170      2128
deblock_luma_intra[1]_c           7119      5905      5347
deblock_luma_intra[1]_neon        2068      1381      1412
This includes extra optimizations by Janne Grunau.
Timings from a separate build, on Exynos 5422:
                             Cortex-A7       A15
deblock_luma_intra[0]_c           6627      3300
deblock_luma_intra[0]_neon        3059      1128
deblock_luma_intra[1]_c           7314      4128
deblock_luma_intra[1]_neon        2038       720
commit e2696a60a3e58d92e88e149b63c0b06a066eea9e [revision 2628]
Author: Martin Storsjö <martin@martin.st>
Date: Mon Aug 31 22:40:31 2015 +0300
arm: Implement some neon 8x16c intra predict functions
checkasm timing                Cortex-A7        A8        A9
intra_predict_8x16c_dct_c            862       540       590
intra_predict_8x16c_dct_neon         608       511       657
intra_predict_8x16c_h_c              972       707       719
intra_predict_8x16c_h_neon           722       656       672
intra_predict_8x16c_p_c            10183      9819      8655
intra_predict_8x16c_p_neon          2622      1972      1983
commit 5db8b6b93aa91079ab785b9b49413625430536fd [revision 2627]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Aug 28 00:15:01 2015 +0300
arm: Implement x264_plane_copy_neon
checkasm timing   Cortex-A7        A8        A9
plane_copy_c          13124     10925      9106
plane_copy_neon        7349      5103      8945
commit 35d32d09e163bb0f2ce60a8e13f9f22125445346 [revision 2626]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Aug 28 09:40:24 2015 +0300
checkasm: arm: Check register clobbering
Cast the function pointer to a different type signature, to
be able to use uint64_t as return type (instead of intptr_t) for
those calls that require it.
Use two separate functions, depending on whether neon is available.
commit 9cbdb635a4bd78e6767e735a062c0d9a5766b849 [revision 2625]
Author: Martin Storsjö <martin@martin.st>
Date: Fri Aug 14 00:00:57 2015 +0300
checkasm: Try different widths for ssd_nv12
To test all codepaths in the aarch64 neon implementation, one at
the very least needs to test with width 8, 16, 24 and 32.
commit 39af8c72e618a544baa06ae427fb2b440861abcd [revision 2624]
Author: Jerome Duval <jerome.duval@gmail.com>
Date: Fri Jun 13 19:56:27 2014 +0000
Haiku support
Add Haiku as supported platform in configure.
Haiku has no nice() function, use the platform specific substitute instead.
commit 59683a97b50b34c6282457a959bb6b3e9e7f8c0d [revision 2623]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:20 2015 +0300
checkasm: aarch64: Check register clobbering
Disable this on iOS, since it has got a slightly different ABI
for vararg parameters.
commit 5c13589be828b524100c787057d6bef77898c657 [revision 2622]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 23:36:45 2015 +0300
arm: Implement x264_decimate_score15/16/64_neon
checkasm timing         Cortex-A7        A8        A9
decimate_score15_c            764       736       535
decimate_score15_neon         487       494       453
decimate_score16_c            782       727       553
decimate_score16_neon         487       494       521
decimate_score64_c           2361      2597      2011
decimate_score64_neon        1017       802       785
commit 3902ae02a0edede5d6c44cb3ee9e24e618c66e6a [revision 2621]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 23:36:44 2015 +0300
arm: Implement chroma intra deblock
checkasm timing                       Cortex-A7        A8        A9
deblock_chroma_420_intra_mbaff_c           1469      1276      1181
deblock_chroma_420_intra_mbaff_neon         981       717       644
deblock_chroma_intra[1]_c                  2954      2402      2321
deblock_chroma_intra[1]_neon                947       581       575
deblock_h_chroma_420_intra_c               2859      2509      2264
deblock_h_chroma_420_intra_neon            1480      1119      1028
deblock_h_chroma_422_intra_c               6211      5030      4792
deblock_h_chroma_422_intra_neon            2894      1990      2077
commit e8b95e92792d9353277995043757430cf3dc3bf7 [revision 2620]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:17 2015 +0300
arm: Implement x264_pixel_sa8d_satd_16x16_neon
This requires spilling some registers to the stack,
contrary to the aarch64 version.
checkasm timing                 Cortex-A7        A8        A9
sa8d_satd_16x16_neon                12936      6365      7492
sa8d_satd_16x16_separate_neon       14841      6605      8324
commit 6bbaa2758d53d0d6d645142d7d818c960d137a0e [revision 2619]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:16 2015 +0300
arm: Implement x264_deblock_h_chroma_mbaff_neon
checkasm timing                 Cortex-A7        A8        A9
deblock_chroma_420_mbaff_c           1944      1706      1526
deblock_chroma_420_mbaff_neon        1210       873       865
commit 3c66591e859045ef79a7131b991a5f20c80ffbb4 [revision 2618]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:15 2015 +0300
arm: Implement x264_deblock_h_chroma_422_neon
checkasm timing             Cortex-A7        A8        A9
deblock_h_chroma_422_c           6953      6269      5145
deblock_h_chroma_422_neon        3905      2569      2551
commit 5265b927b0f2e043dd39cbbbf3909da0862d60e6 [revision 2617]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:14 2015 +0300
arm: Implement integral_init4/8h/v_neon
checkasm timing        Cortex-A7        A8        A9
integral_init4h_c          10466      8590      6161
integral_init4h_neon        3021      1494      1800
integral_init4v_c          16250     13590     13628
integral_init4v_neon        3473      2073      3291
integral_init8h_c          10100      8275      5705
integral_init8h_neon        4403      2344      2751
integral_init8v_c           6403      4632      4999
integral_init8v_neon        1184       783      1306
commit b08403b5593307b919bfe5bfbd743da825326a4c [revision 2616]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:13 2015 +0300
arm: Implement x264_denoise_dct_neon
checkasm timing    Cortex-A7        A8        A9
denoise_dct_c           6604      5510      5858
denoise_dct_neon        1774      1139      1614
commit ceee976bde76a5f4126bfd9d8454f0e601e67204 [revision 2615]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:12 2015 +0300
arm: Add x264_nal_escape_neon
checkasm timing   Cortex-A7        A8        A9
nal_escape_c         852758    879566    655497
nal_escape_neon      376831    450678    371673
commit 8feb733ed1dcb1cc94df3b0e6c98009832ea85cc [revision 2614]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:11 2015 +0300
arm: Add neon versions of vsad, asd8 and ssd_nv12_core
These are straight translations of the aarch64 versions.
checkasm timing   Cortex-A7        A8        A9
vsad_c                16234     10984      9850
vsad_neon              2132      1020       789
asd8_c                 5859      3561      3543
asd8_neon              1407      1279      1250
ssd_nv12_c           608096    591072    426285
ssd_nv12_neon         72752     33549     41347
commit 42b3b398664349d23b2122ac940417165424542d [revision 2613]
Author: Martin Storsjö <martin@martin.st>
Date: Tue Aug 25 14:38:10 2015 +0300
checkasm: Check the right output range for integral_initXh
These functions write their output into sum+stride, while we previously
only checked [0..stride-8] within the sum array.
This catches the previously broken aarch64 version of these functions.
Also check up until stride-4 elements for init4h.
commit 3d86abab097fa26d116112f188458269c6a0415f [revision 2612]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Thu Aug 20 13:55:54 2015 +0200
aarch64: Skip deblocking in x264_deblock_h_chroma_422_neon
If the parameters (alpha, beta, tc0[]) indicated that the deblocking
should have been skipped, every 2nd chroma line would have been deblocked
anyway.
deblock_h_chroma_422_neon: 2259 (before)
deblock_h_chroma_422_neon: 2192 (after)
commit aec81efd3fe43008551916aa6073eb0732a58210 [revision 2611]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Mon Aug 17 16:39:20 2015 +0200
aarch64: Optimize various intra_predict asm functions
Make them at least as fast as the compiled C version (tested on
cortex-a53 vs. gcc 4.9.2).
                              C  NEON (before)  NEON (after)
intra_predict_4x4_dc:       260            335           260
intra_predict_4x4_dct:      210            265           200
intra_predict_8x8c_dc:      497            548           493
intra_predict_8x8c_v:       232            309           179 (arm64)
intra_predict_8x16c_dc:     795            830           790
commit b16268ac0826d78455d0d704ea0fc8b1edc6b6bf [revision 2610]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Tue Aug 18 10:25:10 2015 +0200
aarch64: Faster intra_predict_4x4_h
Use multiplication with 0x01010101 for splats.
On a cortex-a53:
                        gcc 4.9.2  llvm 3.6  neon (before)  neon (after)
intra_predict_4x4_h:          162       147        160/155       139/135
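The splat trick is plain integer arithmetic and can be shown in C. The sketch below is not the aarch64 code from the commit; the function names and the dst/left layout are hypothetical. In horizontal prediction each row of a 4x4 block repeats the pixel to its left, so one splatted 32-bit word fills a whole row.

```c
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all four bytes of a 32-bit word. Multiplying by
 * 0x01010101 adds v << 0, v << 8, v << 16 and v << 24; since v <= 0xFF the
 * partial products do not overlap, so no carries occur. */
static uint32_t splat_u8x4(uint8_t v)
{
    return (uint32_t)v * 0x01010101u;
}

/* Hypothetical C model of 4x4 horizontal intra prediction: every row is
 * filled with the pixel to its left. Byte order does not matter here
 * because all four bytes of the splatted word are equal. */
static void predict_4x4_h_sketch(uint8_t dst[4][4], const uint8_t left[4])
{
    for (int y = 0; y < 4; y++) {
        uint32_t row = splat_u8x4(left[y]);
        memcpy(dst[y], &row, sizeof(row));
    }
}
```

For example, splat_u8x4(0xAB) evaluates to 0xABABABAB.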
commit f2a6be92e5e42e8ef1daf74f63dbdbc4819d2070 [revision 2609]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Tue Aug 18 10:25:09 2015 +0200
aarch64: Fix coeff_level_run* macros with LLVM's assembler
LLVM's integrated assembler does not treat symbols as integer constants.
commit 592e92e9a8e47c3f0d0017c8158df5a4830e0bbd [revision 2608]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Tue Aug 18 10:25:08 2015 +0200
aarch64: Remove commas LLVM's assembler complains about
commit 6efb57ada652fd015ec4cacffd09282632bb975b [revision 2607]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Aug 13 23:59:31 2015 +0300
arm: Implement x264_sub8x16_dct_dc_neon
checkasm timing       Cortex-A7        A8        A9
sub8x16_dct_dc_c           6386      3901      4080
sub8x16_dct_dc_neon        1491       698       917
commit 89439b2c604c81e13eb3da9e692d2cdae5a18b53 [revision 2606]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Aug 13 23:59:28 2015 +0300
arm: Optimize x264_deblock_h_chroma_neon
Shuffle both chroma components together as a 16 bit unit, and
don't write the unchanged columns (like in x264_deblock_h_luma_neon
and in the aarch64 version of the function).
This causes a minor slowdown for x264_deblock_v_chroma_neon, but
it is negligible compared to the speedup.
checkasm timing             Cortex-A7        A8        A9
deblock_chroma[1]_c              4817      4057      3601
deblock_chroma[1]_neon           1249       716       817 (before)
deblock_chroma[1]_neon           1249       766       845 (after)
deblock_h_chroma_420_c           3699      3275      2830
deblock_h_chroma_420_neon        2068      1414      1400 (before)
deblock_h_chroma_420_neon        1838      1355      1291 (after)
commit ff71457d71c5c11ed825d848677cab09c7639012 [revision 2605]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Aug 13 23:59:27 2015 +0300
aarch64: Remove leftover commented out code
commit ef6034812162fc8b51bfd5e87387f405d1cc30cb [revision 2604]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Aug 13 23:59:26 2015 +0300
aarch64: Simplify the decimate_score functions
After doing a left shift by the number of bits returned by clz,
only bits set to zero can be shifted out, so if the register
was nonzero to start with (which is checked), it can't become
zero here.
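The reasoning can be checked with a small C sketch, where a portable clz loop stands in for the hardware instruction. It demonstrates only the invariant, not the decimate_score algorithm itself, and the function names are hypothetical.

```c
#include <stdint.h>

/* Portable count of leading zero bits; requires x != 0. */
static int clz64(uint64_t x)
{
    int n = 0;
    while (!(x & (UINT64_C(1) << 63))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Shifting a nonzero x left by clz64(x) discards only leading zero bits,
 * so bit 63 of the result is always set: the result cannot be zero and a
 * separate zero test after the shift is redundant. */
static uint64_t shift_out_leading_zeros(uint64_t x)
{
    return x << clz64(x);
}
```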
commit d2b04a26b26d02c41ffb05cf1a605dafe9e6fa59 [revision 2603]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Aug 13 23:59:25 2015 +0300
arm: Use aligned loads in x264_coeff_last15_neon
After subtracting 2, the pointer will be aligned.
checkasm timing     Cortex-A7        A8        A9
coeff_last15_c            423       375       230
coeff_last15_neon         350       420       404 (before)
coeff_last15_neon         350       400       394 (after)
commit 3f89a6bbee061cb0361770cf5b8495448515a011 [revision 2602]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Aug 13 23:59:24 2015 +0300
arm: Simplify x264_predict_8x8c_p_neon
This gets rid of a few unnecessary (and confusing) steps in
calculating the increment to i00.
checkasm timing             Cortex-A7        A8        A9
intra_predict_8x8c_p_c           5525      4732      4755
intra_predict_8x8c_p_neon        1719      1140      1262 (before)
intra_predict_8x8c_p_neon        1663      1142      1255 (after)
commit a0cd7d38acb6c31973228ab207e18344920e0aa3 [revision 2601]
Author: Vittorio Giovara <vittorio.giovara@gmail.com>
Date: Tue Sep 15 15:40:14 2015 +0200
lavf: Use the prefixed name for pixel format enum
commit 63555e696a997ff795798d3357d770f8ab373cd9 [revision 2600]
Author: Janne Grunau <janne-x264@jannau.net>
Date: Thu Sep 3 00:21:58 2015 +0200
aarch64: fix x264_mbtree_propagate_cost_neon
The branch condition caused the loop to execute one time more than
intended. Detected through memory corruption on arm with the one-to-one
port of the function.
commit 5c4728d8dd82ba46901824470db1609ae0f2521d [revision 2599]
Author: Martin Storsjö <martin@martin.st>
Date: Thu Aug 13 23:59:22 2015 +0300
aarch64: Fix integral_init4/8h_neon
The stride is the number of uint16_t elements and thus needs
to be shifted.
This issue had slipped unnoticed since checkasm didn't actually
verify the output of these functions.
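The difference between an element stride and a byte offset can be sketched in C. The helper names are hypothetical; the byte-based variant mimics what assembly code has to do by hand, where the element count must be scaled by sizeof(uint16_t), i.e. shifted left by 1.

```c
#include <stddef.h>
#include <stdint.h>

/* 'stride' counts uint16_t elements. C pointer arithmetic on a uint16_t *
 * scales by sizeof(uint16_t) automatically, but an address computed in
 * bytes must scale the stride explicitly. Both helpers return the same
 * pointer. */
static uint16_t *row_below_elems(uint16_t *sum, ptrdiff_t stride)
{
    return sum + stride;
}

static uint16_t *row_below_bytes(uint16_t *sum, ptrdiff_t stride)
{
    /* Omitting the scaling here would land only halfway to the intended row. */
    return (uint16_t *)((unsigned char *)sum + stride * (ptrdiff_t)sizeof(uint16_t));
}
```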
commit 67076513267907b5601828ae6864cc063c8c7548 [revision 2598]
Author: Henrik Gramner <henrik@gramner.com>
Date: Thu Aug 27 19:53:00 2015 +0200
x86: Fix integral_init4/8h_avx2
The AVX2 implementation was using the wrong offsets. It went undetected due to
the checkasm test being incorrect.
commit e86f3a1993234e8f26050c243aa253651200fa6b [revision 2597]
Author: Mark Webster <mark.webster@gmail.com>
Date: Wed Aug 5 04:28:17 2015 +0100
Simplify inclusion of x264.h in C++ projects
Name all structs to support forward declarations.
Add a conditional extern "C" wrapper in x264.h itself instead of having to
specify it in every location where it's included.
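The header pattern can be sketched with a hypothetical example.h rather than the actual x264.h contents. The named struct tag lets C++ code forward-declare the type, and the __cplusplus guard gives C++ callers C linkage without wrapping the #include themselves.

```c
/* --- example.h (hypothetical) --- */
#ifndef EXAMPLE_H
#define EXAMPLE_H

#ifdef __cplusplus
extern "C" {            /* C++ callers get C linkage automatically */
#endif

/* A named struct can be forward-declared elsewhere as 'struct example_param_t;',
 * which an anonymous 'typedef struct { ... } example_param_t;' does not allow. */
typedef struct example_param_t {
    int i_width;
    int i_height;
} example_param_t;

int example_param_area(const example_param_t *param);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_H */

/* --- example.c --- */
int example_param_area(const example_param_t *param)
{
    return param->i_width * param->i_height;
}
```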
commit 401941cc7099b322864600b62104940542497e7a [revision 2596]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Aug 16 21:59:26 2015 +0200
checkasm: Properly save rdx/edx in checkasm_call() on x86
If the return value doesn't fit in a single register rdx/edx can in some
cases be used in addition to rax/eax.
Doesn't affect any of the existing checkasm tests but it's more correct
behavior and it might be useful in the future.
commit 3dff8af3033a9e81d7966c5749fd361ce421467a [revision 2595]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Aug 11 17:19:35 2015 +0200
x86: Enable SSE2 by default on x86-32
It makes more sense to tune the defaults to benefit the vast majority of users.
Anyone still using a Pentium III for video encoding is of course free to
explicitly set different flags when compiling.
commit 51d8aa09b777dc2969deaa954d5f6af9836c02ba [revision 2594]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Aug 10 22:30:21 2015 +0200
msvs/icl: Improve default CFLAGS
Use -fp:fast as a substitute for -ffast-math.
Increase warning level from -W0 to -W1 (the default setting).
Disable -GS (stack cookies) on MSVS. It's disabled by default on ICL.
commit 7edaf4b966aaee098ff301436f8d2b33a6fe5983 [revision 2593]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Aug 12 22:23:31 2015 +0200
Use a relative $SRCPATH for out-of-tree builds
Fixes out-of-tree MSVS builds on Cygwin.
commit e7b4b863dc2555ed835569c400d3a30f7ddc15ff [revision 2592]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Aug 8 22:26:38 2015 +0200
cygwin: Enable MSVS support
`cl -showIncludes` creates absolute Windows paths for some files, attempt
to convert those to Unix paths.
Use relative paths for dependencies located in or below the working directory
in order to mimic the behavior of gcc and to make the paths more readable.
Make the dependency generation script a bit more robust in general.
commit 817a4414b98e8a511c626932e7d433388bc96507 [revision 2591]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Aug 8 18:34:21 2015 +0200
cltostr.sh: Minor fixes
commit 1a3d963441eaad25972763423d60158f597c5f65 [revision 2590]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat Aug 8 12:21:54 2015 +0200
Simplify version.sh
Also remove some non-POSIX syntax and improve robustness.
As a bonus the script now runs about 2-3 times faster.
`git rev-list --count` could be used to simplify things even further,
but that functionality was added in git 1.7.2 so keep `wc -l` for now
to maintain compatibility with older git versions.
commit f7f6af76ef22e812ef330e2839488e83dd553836 [revision 2589]
Author: 장영훈 <mieabby@gmail.com>
Date: Fri Aug 7 14:43:24 2015 +0900
msvs: Fix cl detection in non-English environments
commit e1a55bbbff2b4460ceb843f163e349fed7d32969 [revision 2588]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Aug 3 21:05:11 2015 +0200
x86inc: Sync minor changes from ffmpeg/libav
commit 36f537b141da076032fd11f1745bb62d466dd7bf [revision 2587]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Jul 29 19:30:52 2015 +0200
matroska: Add comments for the remaining element names
commit f04062e6380cbe10453dab33a3575c373e63ff9b [revision 2586]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed Jul 29 19:30:41 2015 +0200
Silence various static analyzer warnings
Those are false positives, but it doesn't hurt to get rid of them.
commit b1cbf7ebe4a192bbc25cc910cb2910a34992f807 [revision 2585]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jul 26 23:13:29 2015 +0200
mingw: Enable the tsaware linker flag
Avoids an irrelevant compatibility layer in Terminal Services environments.
https://msdn.microsoft.com/en-us/library/cc834995.aspx
commit 8a1ff031ecd4b423fc373540b9b68cdf97602bbf [revision 2584]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jul 26 23:13:26 2015 +0200
msvs: Don't redefine snprintf for VS2015
Visual Studio 2015 has a proper snprintf implementation.
commit aa9d22927c0264c08c11c9e72294fc651a155b3e [revision 2583]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun Jul 26 23:13:19 2015 +0200
msvs: Prefer link.exe from the same directory as cl.exe
/usr/bin/link from coreutils may be located before the MSVS linker in $PATH
which causes linking to fail due to using the wrong binary.
commit ca8bd68063d74227d917f34fd50942265f9a106c [revision 2582]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Jul 27 00:10:00 2015 +0200
frame_dump: check fseek() return value
commit 53b3b747e22f53204f6efb5106ab4a5a8eb57626 [revision 2581]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Jul 27 00:08:38 2015 +0200
x264_vfprintf: use va_copy
It's undefined behavior to use the same va_list twice.
This most likely didn't cause any issues in practice since the string would
have to be larger than 4 KiB to trigger the fallback path.
Use a workaround for ICL, as it doesn't define va_copy even in C99 mode.
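The va_copy rule described in this commit can be illustrated with a small sketch. It uses a hypothetical pair of helpers, format_with_fallback() and format(), that follow the pattern the commit describes: format into a 4 KiB stack buffer first and make a second pass only when the output does not fit. This is not x264's actual x264_vfprintf(). It only shows why the second pass must use a copy of the va_list taken before the first vsnprintf() call consumes the original.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of x264): returns a heap-allocated formatted
 * string, or NULL on error. The first pass consumes 'arg', so the fallback
 * pass uses 'arg2', a copy made with va_copy() before the first pass. */
static char *format_with_fallback(const char *fmt, va_list arg)
{
    char buf[4096];
    char *out = NULL;
    va_list arg2;
    va_copy(arg2, arg);                      /* copy before 'arg' is used */
    int length = vsnprintf(buf, sizeof(buf), fmt, arg);
    if (length >= 0) {
        out = malloc((size_t)length + 1);
        if (out) {
            if ((size_t)length < sizeof(buf))
                memcpy(out, buf, (size_t)length + 1);   /* fit on first pass */
            else
                vsnprintf(out, (size_t)length + 1, fmt, arg2); /* fallback uses the copy */
        }
    }
    va_end(arg2);                            /* every va_copy needs a va_end */
    return out;
}

/* Hypothetical variadic front end for the helper above. */
static char *format(const char *fmt, ...)
{
    va_list arg;
    va_start(arg, fmt);
    char *s = format_with_fallback(fmt, arg);
    va_end(arg);
    return s;
}
```

Reusing `arg` in the fallback call instead of `arg2` is the undefined behavior the commit removes. As the commit notes, it only matters once the output exceeds the 4 KiB buffer, which is why it rarely shows up in practice.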
commit 59e7ded846a832125cb533aadff9895487771ea7 [revision 2580]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Jul 27 00:08:31 2015 +0200
param_parse: Fix framerate rounding issues
commit 73ae2d11d472d0eb3b7c218dc1659db32f649b14 [revision 2579]
Author: Marcin Juszkiewicz <mjuszkiewicz@redhat.com>
Date: Mon Jun 1 11:24:45 2015 +0200
aarch64: Remove broken CFLAGS in configure
GCC doesn't have an "-arch" switch, but works when that entire line is removed.
commit cc002bd545b008b1cdc7c6d7cc0c616ba125d4d5 [revision 2578]
Author: Rong Yan <rongyan236@foxmail.com>
Date: Mon Jul 20 03:34:20 2015 -0500
ppc: Add little-endian PowerPC support
commit 145f3a6275802a649b8dedb49bb0e054caf31717 [revision 2577]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Thu Jun 18 17:48:46 2015 +0530
mips: MSA quant optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit 16395d2b6f827b076612eb5b70711b79621da67e [revision 2576]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Thu Jun 18 17:48:45 2015 +0530
mips: MSA predict optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit 204e1a60237e0b3168ccbdb2905c9af8188b90ee [revision 2575]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Thu Jun 18 17:48:44 2015 +0530
mips: MSA pixel optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit 3ce6430eb11839c69d606c59c0f8c31ce0b6dd17 [revision 2574]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Thu Jun 18 17:48:43 2015 +0530
mips: MSA deblock optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit 57618eead025eaf654226add94689d6d2999ccf6 [revision 2573]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Thu Jun 18 17:48:42 2015 +0530
mips: MSA dct optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit 4ebb23aaf4f46b7a04aa8aefa3c08e7b6493de4c [revision 2572]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Thu Jun 18 17:48:40 2015 +0530
mips: MSA mc optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit cd19444d3f9915a5a33a95e308bc8021d7e62afe [revision 2571]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Thu Jun 18 17:48:38 2015 +0530
mips: Common MSA macros
Add macros for load/store, slide, shift, transpose and basic arithmetic
operations required by subsequent patches.
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit 72b82bd98a99b1d75322b70a74365547382ce062 [revision 2570]
Author: Rishikesh More <rishikesh.more@imgtec.com>
Date: Tue May 12 19:38:09 2015 +0530
mips: Add MSA support to checkasm
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
commit ce0757d9d2778e349a7c2f6445b6aa75d8765c30 [revision 2569]
Author: Kaustubh Raste <kaustubh.raste@imgtec.com>
Date: Fri Apr 17 17:38:58 2015 +0530
mips: Initial MSA support
MSA is the MIPS SIMD Architecture.
Add X264_CPU_MSA define.
Update configure to detect MIPS platform and set flags.
CPU-specific gcc options are expected through --extra-cflags.
Sample command line for mips32r5:
./configure --host=mipsel-linux-gnu --cross-prefix=<TOOLCHAIN>/mips-mti-linux-gnu- \
--extra-cflags="-EL -mips32r5 -msched-weight -mload-store-pairs"
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
commit 9140ee1fb39bd4a4ccace28091398e8a96704f07 [revision 2568]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Fri Jul 17 00:22:29 2015 +0300
Limit autodetection of the number of threads according to the source height
commit aeaed2d07b5b43437bb640e1f987d42a6fab03b9 [revision 2567]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Jul 16 19:04:59 2015 +0300
Fine-tune of frame's size predictors at ratecontrol start
This is an attempt to improve VBV at the start of a video when using a lot of
threads, which delays feedback for the predictors.
commit aa275158641e94203003157947d43ff4cc685068 [revision 2566]
Author: Anton Mitrofanov <BugMaster@narod.ru>
Date: Thu Jul 16 16:15:56 2015 +0300
Use forced frame types in slicetype analysis
This should improve MBTree and VBV when a lot of forced frame types are used.
commit a83edfa053f60ad0c8a164f31e7492a680eef361 [revision 2565]
Author: Henrik Gramner <henrik@gramner.com>
Date: Mon Dec 1 22:05:42 2014 +0100
x86: SSSE3 and AVX2 implementations of plane_copy_swap
For NV21 input.
commit 627f891c571cacb51deb5e211b23c309b14a6587 [revision 2564]
Author: Yu Xiaolei <dreifachstein@gmail.com>
Date: Fri Jun 6 16:05:27 2014 +0800
NV21 input support
Eliminates an extra copy when encoding Android camera preview images.
Checkasm test by Janne Grunau.
ARM assembly with improvements from Janne Grunau.
commit 6ee94dc898dc029553e308f1e76891ccefb3f0a7 [revision 2563]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Jun 23 17:00:47 2015 +0200
deblock: Write combining
commit 08a9c51919f4edbd6e484155e5521a92a0800651 [revision 2562]
Author: Henrik Gramner <henrik@gramner.com>
Date: Tue Jun 23 14:59:59 2015 +0200
Get rid of some tabs and trailing whitespaces
commit b568a256b9bc6c500d7b1ffe4b9c3311ee5ff337 [revision 2561]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat May 23 19:44:16 2015 +0200
x86: Experimental nasm support
Enables the use of nasm as an alternative to yasm.
Note that nasm cannot assemble x264 with PIC enabled since it currently doesn't
support [symbol-$$] addressing which is used extensively by x264's PIC code.
This includes all 64-bit Windows and 64-bit OS X builds, even non-shared.
For the above reason nasm is currently intentionally not auto-detected, instead
the assembler must be explicitly specified using "AS=nasm ./configure".
Also drop -O2 from ASFLAGS since it's simply ignored anyway.
commit d14e38c059c9a2aecc82477b99d56ef74eb731ec [revision 2560]
Author: Timothy Gu <timothygu99@gmail.com>
Date: Tue May 26 19:12:42 2015 +0200
x86inc: Prevent warnings when using `struc` and `endstruc`
struc and endstruc attempt to revert to the previous section state set by
the SECTION macro.
Use the primitive [SECTION] directive instead of the SECTION macro for the
.note.GNU-stack section to prevent it from being emitted again during endstruc.
commit 353b1f888c34081e94727a1ffa0e4920e2cfe8a9 [revision 2559]
Author: Henrik Gramner <henrik@gramner.com>
Date: Wed May 27 21:38:14 2015 +0200
x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
commit b615f82e45c88b7915c5571ad09fa65a0b6130d7 [revision 2558]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sat May 23 13:38:05 2015 +0200
x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
commit 8f834d6ccc054d8c32d84310664dc07abac553ec [revision 2557]
Author: Henrik Gramner <henrik@gramner.com>
Date: Sun May 24 22:57:00 2015 +0200