ffmpeg-rockchip

mirror of https://github.com/nyanmisaka/ffmpeg-rockchip.git synced 2026-04-28 01:54:00 +08:00

Author	SHA1	Message	Date
Martin Storsjö	6a62795d40	aarch64: h264idct: Use the offset parameter to movrel Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	383d96aa22	aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                    ARM  AArch64
vp9_avg4_neon:                     27.2     23.7
vp9_avg8_neon:                     56.5     54.7
vp9_avg16_neon:                   169.9    167.4
vp9_avg32_neon:                   585.8    585.2
vp9_avg64_neon:                  2460.3   2294.7
vp9_avg_8tap_smooth_4h_neon:      132.7    125.2
vp9_avg_8tap_smooth_4hv_neon:     478.8    442.0
vp9_avg_8tap_smooth_4v_neon:      126.0     93.7
vp9_avg_8tap_smooth_8h_neon:      241.7    234.2
vp9_avg_8tap_smooth_8hv_neon:     690.9    646.5
vp9_avg_8tap_smooth_8v_neon:      245.0    205.5
vp9_avg_8tap_smooth_64h_neon:   11273.2  11280.1
vp9_avg_8tap_smooth_64hv_neon:  22980.6  22184.1
vp9_avg_8tap_smooth_64v_neon:   11549.7  10781.1
vp9_put4_neon:                     18.0     17.2
vp9_put8_neon:                     40.2     37.7
vp9_put16_neon:                    97.4     99.5
vp9_put32_neon/armv8:             346.0    307.4
vp9_put64_neon/armv8:            1319.0   1107.5
vp9_put_8tap_smooth_4h_neon:      126.7    118.2
vp9_put_8tap_smooth_4hv_neon:     465.7    434.0
vp9_put_8tap_smooth_4v_neon:      113.0     86.5
vp9_put_8tap_smooth_8h_neon:      229.7    221.6
vp9_put_8tap_smooth_8hv_neon:     658.9    621.3
vp9_put_8tap_smooth_8v_neon:      215.0    187.5
vp9_put_8tap_smooth_64h_neon:   10636.7  10627.8
vp9_put_8tap_smooth_64hv_neon:  21076.8  21026.9
vp9_put_8tap_smooth_64v_neon:    9635.0   9632.4
These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:15:56 +02:00
Diego Biurrun	72a19f4013	mpegaudiodsp: aarch64: Adjust function prototype after `2caa93b813`	2016-11-10 00:13:48 +01:00
Martin Storsjö	9b2ccafb48	aarch64: Add missing sign extension in ff_h264_idct8_add_neon Signed-off-by: Martin Storsjö <martin@martin.st>	2016-10-10 14:57:53 +03:00
James Almer	42111e8543	avcodec: fix arguments on xmm/neon clobber test wrappers Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-02 02:15:47 -03:00
James Almer	449f263f9f	avcodec: add missing xmm/neon clobber test wrappers for the new encode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-01 14:08:50 -03:00
Diego Biurrun	2caa93b813	mpegaudiodsp: Change type of array stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their stride argument manually to be able to do pointer arithmetic.	2016-09-29 17:54:24 +02:00
Diego Biurrun	e4a94d8b36	h264chroma: Change type of stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their stride argument manually to be able to do pointer arithmetic.	2016-09-29 14:48:04 +02:00
Anton Khirnov	de2ae3c1fa	lavc: add clobber tests for the new encoding/decoding API	2016-09-28 10:01:52 +02:00
Xiaolei Yu	5a70e56f2f	avcodec: fix vc1dsp dependencies	2016-09-25 13:11:45 +02:00
James Almer	293484fa5e	avcodec: add missing xmm/neon clobber test wrappers for the new decode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-07-03 18:04:30 -03:00
Clément Bœsch	4a081f224e	libavcodec: fix constness in clobber test avcodec_open2() wrappers Signed-off-by: Martin Storsjö <martin@martin.st>	2016-06-26 21:34:04 +03:00
Clément Bœsch	dfd0c0f981	lavc/neontest: fix constness in arm/aarch64 avcodec_open2() wrappers	2016-06-25 13:41:13 +02:00
Clément Bœsch	8ef57a0d61	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' * commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb': cosmetics: Fix spelling mistakes Merged-by: Clément Bœsch <u@pkh.me>	2016-06-21 21:55:34 +02:00
James Almer	c8c14d0ffc	aarch64/synth_filter: fix compilation Signed-off-by: James Almer <jamrial@gmail.com>	2016-05-10 23:33:12 -03:00
Derek Buitenhuis	ca5ec2bf51	Merge commit '01621202aad7e27b2a05c71d9ad7a19dfcbe17ec' * commit '01621202aad7e27b2a05c71d9ad7a19dfcbe17ec': build: miscellaneous cosmetics Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2016-05-09 16:25:28 +01:00
Vittorio Giovara	41ed7ab45f	cosmetics: Fix spelling mistakes Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-05-04 18:16:21 +02:00
Derek Buitenhuis	87b8e95008	Merge commit 'cdb1665f70def544ddab3e3ed3763ef99c8b3873' * commit 'cdb1665f70def544ddab3e3ed3763ef99c8b3873': aarch64: Make transpose_4x4H do a regular transpose Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2016-04-24 12:51:42 +01:00
Derek Buitenhuis	197fa698c6	Merge commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555' * commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555': fft: arm: Drop unnecessary #include, add missing ones Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2016-04-12 15:43:09 +01:00
Diego Biurrun	01621202aa	build: miscellaneous cosmetics Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, group parts together that belong together logically.	2016-04-07 15:26:08 +02:00
Martin Storsjö	cdb1665f70	aarch64: Make transpose_4x4H do a regular transpose Previously, ff_h264_idct_add_neon (originally in the arm version) used a non-regular transpose in order to be able to use more instructions that deal with registers as 128 bit register pairs. The aarch64 translation doesn't do it to the same extent, but brought along the same structure since it was a straight translation. This reshuffles ff_h264_idct_add_neon, bringing it closer to the C implementation, making the transpose_4x4H macro do a regular transpose, usable for other algorithms as well. Previously, the third and fourth output from transpose_4x4H were swapped, and prior to `cc29d96d5a`, the same inputs as well. In addition to just swapping the outputs, also renumber the intermediate registers for better readability (making the register order match transpose_4x8B). This runs with the same number of cycles as before. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-03-26 21:25:56 +02:00
Diego Biurrun	1a094af638	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
Diego Biurrun	97aec6e75e	fft: arm: Drop unnecessary #include, add missing ones	2016-02-26 14:34:58 +01:00
foo86	ae5b2c5250	avcodec/dca: add new decoder based on libdcadec	2016-01-31 17:09:38 +01:00
foo86	4608996772	avcodec/dca: remove old decoder Remove all files and functions which are not going to be reused, and disable all functions and FATE tests temporarily which will be.	2016-01-31 17:09:38 +01:00
James Almer	209f50e16b	avcodec/synth_filter: split off remaining code from dcadec files Signed-off-by: James Almer <jamrial@gmail.com>	2016-01-25 14:57:38 -03:00
Hendrik Leppkes	d03da3e240	Merge commit '2008f76054906e9ff6bf744800af0e5a5bfe61be' * commit '2008f76054906e9ff6bf744800af0e5a5bfe61be': dca: remove unused decode_hf function and quant_d tables Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 13:17:48 +01:00
Hendrik Leppkes	e97e2588ca	Merge commit 'a0fc780a2093784e8664f88205ee1b215e109cee' * commit 'a0fc780a2093784e8664f88205ee1b215e109cee': arm64: int32_to_float_fmul neon asm Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 11:21:16 +01:00
Hendrik Leppkes	10e075c138	Merge commit '705f5e5e155f6f280a360af220fc5b30cfcee702' * commit '705f5e5e155f6f280a360af220fc5b30cfcee702': arm64: port synth_filter_float_neon from arm Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 11:14:28 +01:00
Hendrik Leppkes	de3a33784c	Merge commit 'c33c1fa8af2b2e82418a06901b6ad17b3d61b73e' * commit 'c33c1fa8af2b2e82418a06901b6ad17b3d61b73e': arm64: convert dcadsp neon asm from arm Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-01-02 11:10:24 +01:00
Alexandra Hájková	2008f76054	dca: remove unused decode_hf function and quant_d tables They were superseded with their integer equivalents. Rename integer decode_hf to decode_hf.	2015-12-24 13:58:18 +01:00
Janne Grunau	cc29d96d5a	arm64: fix inverted register order in transpose_4x4H Fix related register order issue in ff_h264_idct_add_neon. Found-by: zjh8890 <243186085@qq.com>	2015-12-21 13:44:20 +01:00
Janne Grunau	2dba0407fd	avcodec/arm64: fix inverted register order in transpose_4x4H Fix related register order issue in ff_h264_idct_add_neon. Found-by: zjh8890 <243186085@qq.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-12-19 03:58:46 +01:00
Michael Niedermayer	95b59bfb9d	Revert "avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H" The change was not correct and broke H264 This reverts commit cd83f899c94f691b045697d12efa21f83eb2329f.	2015-12-17 21:26:37 +01:00
Janne Grunau	a0fc780a20	arm64: int32_to_float_fmul neon asm
3% faster dts decoding on a cortex-a57.
                                   cortex-a57  cortex-a53
int32_to_float_fmul_array8_c:          1270.9      4475.6
int32_to_float_fmul_array8_neon:        328.6       569.2
int32_to_float_fmul_scalar_c:           928.5      4119.6
int32_to_float_fmul_scalar_neon:        309.1       524.1	2015-12-14 16:45:02 +01:00
Janne Grunau	705f5e5e15	arm64: port synth_filter_float_neon from arm
~25% faster dts decoding overall. The checkasm CPU cycles numbers are not that useful since synth_filter_float() calls FFTContext.imdct_half().
                           cortex-a57  cortex-a53
synth_filter_float_c:          1866.2      3490.9
synth_filter_float_neon:        915.0      1531.5
With fftc.imdct_half forced to imdct_half_neon:
                           cortex-a57  cortex-a53
synth_filter_float_c:          1718.4      3025.3
synth_filter_float_neon:        926.2      1530.1	2015-12-14 16:45:01 +01:00
Janne Grunau	c33c1fa8af	arm64: convert dcadsp neon asm from arm
~2% faster dts decoding overall.
                      cortex-a57  cortex-a53
dca_decode_hf_c:           474.8      1659.9
dca_decode_hf_neon:        225.2       301.1
dca_lfe_fir0_c:            913.2      1537.7
dca_lfe_fir0_neon:         286.8       451.9
dca_lfe_fir1_c:            848.7      1711.5
dca_lfe_fir1_neon:         387.1       506.4	2015-12-14 16:45:01 +01:00
zjh8890	c18176bd55	avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H The transpose_4x4H is wrong, and tracking down this bug cost me a lot of time. The order of r2 and r3 is wrong; I ran into it while writing aarch64 assembly that uses this function.	2015-12-12 14:20:01 +01:00
Michael Niedermayer	5d5f8b29b4	Merge commit 'f56d8d8dd72b1ab52aa814c5a0fccabf8040ef68' * commit 'f56d8d8dd72b1ab52aa814c5a0fccabf8040ef68': h264: aarch64: intra prediction optimisations Conflicts: libavcodec/h264pred.c Merged-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-21 01:39:30 +02:00
Janne Grunau	f56d8d8dd7	h264: aarch64: intra prediction optimisations	2015-07-20 23:10:29 +02:00
Janne Grunau	c2de2cf0d2	arm64: constify src in h264qpel dsp function definitions	2015-06-24 08:41:32 +02:00
Michael Niedermayer	7b32b35bf5	Merge commit '3d5d46233cd81f78138a6d7418d480af04d3f6c8' * commit '3d5d46233cd81f78138a6d7418d480af04d3f6c8': opus: Factor out imdct15 into a standalone component Conflicts: configure libavcodec/opus_celt.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-02 20:43:13 +01:00
Diego Biurrun	3d5d46233c	opus: Factor out imdct15 into a standalone component It will be reused by the AAC decoder.	2015-02-02 16:07:33 +01:00
Carl Eugen Hoyos	4faea46bd9	lavc/aarch64: Do not use the neon horizontal chroma loop filter for H.264 4:2:2.	2015-01-31 10:05:10 +01:00
Michael Niedermayer	92d47e2aa3	Merge commit '780cd20b00a69e26bbfffbb8eec16fbe999ea793' * commit '780cd20b00a69e26bbfffbb8eec16fbe999ea793': aarch64: Use .data.rel.ro for const data with relocations Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-09 12:08:29 +01:00
Martin Storsjö	780cd20b00	aarch64: Use .data.rel.ro for const data with relocations This reverts commit `c00365b46d` in addition to using a different section. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-12-09 11:43:31 +02:00
Michael Niedermayer	f3cba01cce	Merge commit 'c00365b46d464ce47716315c1801818d811bdb9a' * commit 'c00365b46d464ce47716315c1801818d811bdb9a': aarch64: Make the function pointer tables position independent Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-16 01:05:31 +01:00
Martin Storsjö	c00365b46d	aarch64: Make the function pointer tables position independent This allows running the code on android, where 64 bit binaries with text relocations aren't allowed to be loaded. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-11-16 01:07:24 +02:00
Michael Niedermayer	e16b7338d8	avcodec/aarch64/h264qpel_init_aarch64: mark src as const Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-30 12:48:31 +02:00
Michael Niedermayer	7fd60d1e7a	Merge commit 'ac6b95dbc0b53b3ea461bd5e5e7f7f31d2983733' * commit 'ac6b95dbc0b53b3ea461bd5e5e7f7f31d2983733': aarch64: add ',' between assembler macro arguments where missing Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-04 04:06:13 +02:00
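Several entries above (cdb1665f70, cc29d96d5a, c18176bd55) revolve around what a "regular" transpose means for transpose_4x4H. As a hedged illustration only, the plain-C sketch below models a regular 4x4 transpose of 16-bit elements next to the earlier ordering in which the third and fourth outputs were swapped. It is not the NEON macro from libavcodec; the function names `transpose_4x4_s16`, `transpose_4x4_s16_swapped34` and `is_involution` are hypothetical, and the block is assumed to be stored row-major in a flat array.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: regular transpose of a row-major 4x4 block of
 * 16-bit values, out[i][j] = in[j][i]. out and in must not overlap. */
void transpose_4x4_s16(int16_t out[16], const int16_t in[16])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            out[i * 4 + j] = in[j * 4 + i];
}

/* Hypothetical model of the old, non-regular ordering described in the
 * commit message: a transpose whose third and fourth output rows are swapped. */
void transpose_4x4_s16_swapped34(int16_t out[16], const int16_t in[16])
{
    int16_t tmp[4];
    transpose_4x4_s16(out, in);
    memcpy(tmp, &out[8], sizeof(tmp));      /* save output row 2 */
    memcpy(&out[8], &out[12], sizeof(tmp)); /* row 2 <- row 3 */
    memcpy(&out[12], tmp, sizeof(tmp));     /* row 3 <- saved row 2 */
}

/* Returns 1 if applying f twice reproduces the input block, else 0. */
int is_involution(void (*f)(int16_t *, const int16_t *), const int16_t in[16])
{
    int16_t once[16], twice[16];
    f(once, in);
    f(twice, once);
    return memcmp(twice, in, sizeof(twice)) == 0;
}
```

A regular transpose is its own inverse, so `is_involution(transpose_4x4_s16, block)` holds for any block, while the swapped variant generally does not round-trip; this is one plausible way to see why the regular form is easier to reuse in other algorithms without compensating register renumbering.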

237 Commits