Commit Graph

302 Commits

Author SHA1 Message Date
Michael Niedermayer c3814ab654 rename new lls code to lls2 to avoid conflict with the old which has a different ABI
also remove failed attempt at a compatibility layer, the code simply cannot work

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-17 16:41:08 +01:00
Michael Niedermayer bbe66ef912 avutil: rename lls to lls2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-17 16:30:23 +01:00
Michael Niedermayer a665704402 Merge commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4'
* commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4':
  libavutil: x86: Add AVX2 capable CPU detection.

Conflicts:
	libavutil/cpu.c
	libavutil/cpu.h
	libavutil/x86/cpu.c

See: 865b70bc5d
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:36:36 +02:00
Kieran Kunhya 865b70bc5d Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:34:22 +02:00
Kieran Kunhya 4d6ee07255 libavutil: x86: Add AVX2 capable CPU detection.
Patch based on x264's AVX2 detection

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-25 19:36:55 +01:00
Michael Niedermayer f9bef2bec9 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: more AVX2 framework

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:13:57 +02:00
Michael Niedermayer e3e0e3d0c9 Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497'
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
  x86inc: FMA3/4 Support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:06:22 +02:00
Michael Niedermayer 9ac124c889 Merge commit '206895708ea2b464755d340e44501daf9a07c310'
* commit '206895708ea2b464755d340e44501daf9a07c310':
  x86inc: Remove our FMA4 support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:54:23 +02:00
Michael Niedermayer 12e4493f9c Merge commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098'
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
  x86inc: Use VEX-encoded instructions in AVX functions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:48:34 +02:00
Jason Garrett-Glaser a3fabc6cb3 x86: more AVX2 framework
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser c6908d6b4b x86inc: FMA3/4 Support
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:54 +01:00
Derek Buitenhuis 206895708e x86inc: Remove our FMA4 support
This is so we can sync to x264's version of FMA4 support.

This partialy reverts commit 79687079a9.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:39:29 +01:00
Henrik Gramner c108ba0175 x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.

This change makes it easier to extend existing code to use AVX2.

Also add support for AVX emulation of a few instructions that
were missing before.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
Michael Niedermayer 31d0d35560 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86inc: Remove .rodata kludges

Conflicts:
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-09 14:29:42 +02:00
Henrik Gramner ad7d7d4f6a x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Michael Niedermayer 19c3890819 Merge commit '3e2fa991db7ef172579422accd61624d52777e5a'
* commit '3e2fa991db7ef172579422accd61624d52777e5a':
  x86inc: remove misaligned cpu flag

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:02:21 +02:00
Michael Niedermayer 31d9aa6b2e Merge commit '71155665414b551ad350622d5abed20e58371fbf'
* commit '71155665414b551ad350622d5abed20e58371fbf':
  x86inc: various minor backports from x264

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:57:39 +02:00
Michael Niedermayer 3f965ab95d Merge commit '47f9d7ce5493e119e09d1227d017414feaaf8d97'
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
  x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:37:22 +02:00
Michael Niedermayer 1f17619fe4 Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450'
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
  x86inc: Utilize the shadow space on 64-bit Windows

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:23:00 +02:00
Michael Niedermayer 17d9c7c208 Merge commit '3fb78e99a04d0ed8db834d813d933eb86c37142a'
* commit '3fb78e99a04d0ed8db834d813d933eb86c37142a':
  x86inc: create xm# and ym#, analagous to m#

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:15:17 +02:00
Michael Niedermayer 3352fdb292 Merge commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2'
* commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2':
  x86inc: fix some corner cases of SWAP

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:07:03 +02:00
Michael Niedermayer 006c0fcfea Merge commit '63f0d623100bdb0c6081456127f4b6713e83d3db'
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
  x86inc: Use SSE instead of SSE2 for copying data

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:01:40 +02:00
Michael Niedermayer faafffaf82 Merge commit 'ad76e6e7e193b98e7335156422d35467816f9ef1'
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
  x86inc: Set ELF hidden visibility for global constants

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:52:51 +02:00
Michael Niedermayer c1488fab3d Merge commit '25cb0c1a1e66edacc1667acf6818f524c0997f10'
* commit '25cb0c1a1e66edacc1667acf6818f524c0997f10':
  x86inc: activate REP_RET automatically

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:27:30 +02:00
Henrik Gramner 3e2fa991db x86inc: remove misaligned cpu flag
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser 7115566541 x86inc: various minor backports from x264
Small backports that sneaked into other asm commits in x264.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis 47f9d7ce54 x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner bbe4a6db44 x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt 3fb78e99a0 x86inc: create xm# and ym#, analagous to m#
For when we want to mix simd sizes within one function.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt 49ebe3f9fe x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
Henrik Gramner 63f0d62310 x86inc: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
Henrik Gramner ad76e6e7e1 x86inc: Set ELF hidden visibility for global constants
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt 25cb0c1a1e x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.

The implementation involves lots of spurious labels, but that's OK
because we strip them.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Ronald S. Bultje c07ac8d467 VP9 MC (ssse3) optimizations.
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
2013-10-02 21:03:15 -04:00
Michael Niedermayer 361bc70731 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  avutil: Fix compilation with inline asm disabled on mingw

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-09-22 11:51:38 +02:00
Alex Smith 08fa828b3f avutil: Fix compilation with inline asm disabled on mingw
Because of -Werror=implicit-function-declaration the build will fail.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-22 00:50:32 +03:00
Thilo Borgmann d814a839ac Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
Michael Niedermayer f0a3562382 Merge commit '79aec43ce813a3e270743ca64fa3f31fa43df80b'
* commit '79aec43ce813a3e270743ca64fa3f31fa43df80b':
  x86: Add and use more convenience macros to check CPU extension availability

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-30 11:57:35 +02:00
Michael Niedermayer 2a60666d1d Merge commit '8410d6e93c2e074881f1c7b7e4cdefd2e497d52e'
* commit '8410d6e93c2e074881f1c7b7e4cdefd2e497d52e':
  avutil: Refactor CPU extension availability macros

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:15:10 +02:00
Michael Niedermayer c83d794936 Merge commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b'
* commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b':
  avutil: Move internal CPU detection function declarations to private header

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:05:15 +02:00
Diego Biurrun 79aec43ce8 x86: Add and use more convenience macros to check CPU extension availability 2013-08-29 13:07:37 +02:00
Diego Biurrun 8410d6e93c avutil: Refactor CPU extension availability macros 2013-08-28 23:54:14 +02:00
Diego Biurrun b78b10c4b7 avutil: Move internal CPU detection function declarations to private header 2013-08-28 23:54:14 +02:00
Michael Niedermayer 9d01bf7d66 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Consistently use "cpu_flags" as variable/parameter name for CPU flags

Conflicts:
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/h264dsp_init.c
	libavcodec/x86/hpeldsp_init.c
	libavcodec/x86/motion_est.c
	libavcodec/x86/mpegvideo.c
	libavcodec/x86/proresdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18 09:53:47 +02:00
Diego Biurrun 3ac7fa81b2 Consistently use "cpu_flags" as variable/parameter name for CPU flags 2013-07-18 00:31:35 +02:00
Michael Niedermayer a478e99a60 avutil/x86: reenable ff_update_lls_avx()
The bug has been fixed in c8b920a9b7 by Loren Merritt

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 12:02:08 +02:00
Michael Niedermayer d1fa671895 Merge commit 'c8b920a9b7fa534a6141695ace4e8c2dfcd56cee'
* commit 'c8b920a9b7fa534a6141695ace4e8c2dfcd56cee':
  lls/x86: use 3-operator vaddpd in ADDPD_MEM

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 11:40:44 +02:00
Loren Merritt c8b920a9b7 lls/x86: use 3-operator vaddpd in ADDPD_MEM
Fixes build with yasm-1.1

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-07-02 10:15:09 +02:00
Michael Niedermayer a6e46ed51a Revert "avutil/x86: disable ff_evaluate_lls_sse2() for 32bit"
This reverts commit 247425241c.
2013-07-01 02:27:47 +02:00
Michael Niedermayer 4e488ac5f5 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: lpc: fix a segfault in av_evaluate_lls_sse2()

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-01 02:26:22 +02:00