A Brief Analysis of FFmpeg Audio Decoding

xiaoxiao2021-03-25 114

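Reposted from: http://blog.csdn.net/xiaozhu1100/article/details/16929181

These notes combine various references with my own understanding, so they are probably somewhat superficial.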

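The FFmpeg decoding flow:

1. Register all container formats and codecs: av_register_all()

2. Open the file: av_open_input_file()

3. Extract the stream information from the file: av_find_stream_info()

4. Iterate over all streams and find the one whose type is CODEC_TYPE_AUDIO

5. Find the matching decoder: avcodec_find_decoder()

6. Open the codec: avcodec_open()

7. Allocate memory for the decoded frame: avcodec_alloc_frame()

8. Repeatedly read frame data from the stream: av_read_frame()

9. For each audio frame, call: avcodec_decode_audio() (the sample code further down uses the avcodec_decode_audio2() variant)

10. When decoding is finished, release the decoder: avcodec_close()

11. Close the input file: av_close_input_file()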

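Key data structures in FFmpeg audio decoding:

AVFormatContext: this structure describes the composition and basic information of a media file or media stream.

It is created internally when the file is opened with av_open_input_file(), and some of its members are initialized with default values at that point.

av_find_stream_info() then extracts the stream information from the file.

Once the stream information is available, the other contexts can be obtained from it. It is the root of all the other structures and the fundamental abstraction of a multimedia file or stream. For example:

AVCodecContext *pCodeCtx = pFmtCtx->streams[audioStream]->codec;

AVCodec *pCodec = avcodec_find_decoder(pFmtCtx->streams[audioStream]->codec->codec_id);

start_time and duration are the start time and length of the multimedia file, inferred from the AVStream entries in the streams array. They are expressed in microseconds (units of AV_TIME_BASE).

nb_streams and streams (an array of AVStream pointers) describe all the media streams embedded in the file.

iformat and oformat point to the demuxer and the muxer, that is, the input format used for decoding and the output format used for encoding. For example, during decoding, strcmp(pFmtCtx->iformat->name, "mp3") == 0 tells you that the file is an MP3 file.

During decoding, every packet is also fetched through the AVFormatContext and its stream information: av_read_frame(pFmtCtx, &packet).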
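Because the container-level duration is a plain microsecond count, turning it into a readable length is simple integer arithmetic. The sketch below is only an illustration: format_duration and kTimeBase are hypothetical names that do not exist in FFmpeg, and the code assumes a non-negative duration in units of 1/1000000 second.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption: durations are given in microseconds, which is the unit that
// AV_TIME_BASE stands for in FFmpeg. kTimeBase is a local stand-in.
static const int64_t kTimeBase = 1000000;

// Hypothetical helper (not an FFmpeg function): formats a non-negative
// duration in microseconds as "HH:MM:SS", dropping the fractional second.
std::string format_duration(int64_t duration_us)
{
    assert(duration_us >= 0);
    int64_t total_seconds = duration_us / kTimeBase;
    int hours = (int)(total_seconds / 3600);
    int minutes = (int)((total_seconds % 3600) / 60);
    int seconds = (int)(total_seconds % 60);
    char buf[32];
    snprintf(buf, sizeof(buf), "%02d:%02d:%02d", hours, minutes, seconds);
    return std::string(buf);
}
```

With a real context, the value passed in would be pFmtCtx->duration after av_find_stream_info() has succeeded.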

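AVStream: one of the media streams in pFmtCtx->streams[]. Its main fields are explained below. Most of their values can be determined by av_open_input_file from the file header; anything still missing has to be obtained by calling av_find_stream_info, which reads frames and decodes them in software.

index/id: index is the index of the stream. It is generated automatically and can be used to look the stream up in AVFormatContext::streams. id is the stream identifier and depends on the container format; for MPEG TS, for example, id is the PID.

time_base: the time base of the stream, a rational number (AVRational, num/den). The pts and dts of the media data in this stream are expressed in units of this time base. av_rescale/av_rescale_q are normally used to convert between different time bases.

start_time: the start time of the stream in units of the stream time base, usually the pts of the first frame in the stream.

duration: the total duration of the stream, in units of the stream time base.

need_parsing: controls the parsing process for this stream.

nb_frames: the number of frames in the stream.

r_frame_rate/framerate/avg_frame_rate: frame-rate related fields.

codec: points to the AVCodecContext for this stream; created when av_open_input_file is called.

parser: points to the AVCodecParserContext for this stream; created when av_find_stream_info is called.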
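To make the time_base conversion concrete, here is a minimal stand-in for what av_rescale_q does in the simple case. It is a sketch, not FFmpeg's implementation: Rational, rescale_ts and the two example constants are hypothetical names, the code assumes non-negative timestamps and positive fractions, and it has none of the overflow protection of the real function.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for FFmpeg's AVRational: the fraction num/den.
struct Rational {
    int num;
    int den;
};

// Example time bases (assumed values for illustration only).
static const Rational kMpegTsBase = {1, 90000}; // 90 kHz clock
static const Rational kMillis = {1, 1000};      // milliseconds

// Hypothetical helper: converts ts from time base src to time base dst,
// rounding to the nearest integer. Only valid for ts >= 0 and for products
// that fit in 64 bits.
int64_t rescale_ts(int64_t ts, Rational src, Rational dst)
{
    assert(ts >= 0 && src.num > 0 && src.den > 0 && dst.num > 0 && dst.den > 0);
    int64_t b = (int64_t)src.num * dst.den;
    int64_t c = (int64_t)src.den * dst.num;
    return (ts * b + c / 2) / c; // ts * (src / dst), rounded to nearest
}
```

For example, one second of a 90 kHz stream (90000 ticks) maps to 1000 in a millisecond time base. In real code, call av_rescale_q with the stream's own time_base instead.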

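AVCodecContext: the data structure that describes a codec context; it holds the many parameters a codec needs. It is obtained directly from the stream information: AVCodecContext *pCodeCtx = pFmtCtx->streams[audioStream]->codec;

Its main fields are explained below:

extradata/extradata_size: this buffer holds extra information that the decoder may need; it is filled in av_read_frame. Generally, the demuxer for a specific format fills extradata first, when it reads the format header. If the demuxer does not do so, for example because the header simply contains no codec information, the corresponding parser keeps looking for it in the demuxed media stream. If no extra information is found at all, the buffer pointer is NULL.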

time_base: the time base used by the codec.

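width/height: the width and height of the video.

sample_rate/channels: the audio sample rate and the number of channels.

sample_fmt: the raw sample format of the audio.

codec_name/codec_type/codec_id/codec_tag: information about the codec.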

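AVPacket:

FFmpeg uses AVPacket to hold media data after demuxing and before decoding (one audio/video frame, one subtitle packet, and so on) together with its side information (decoding timestamp, presentation timestamp, duration, and so on).

Packets are obtained during decoding, for example with av_read_frame(pFmtCtx, &packet).

Each call returns one packet, and the packet is also the unit of decoding: the decoding process simply fetches packets one after another and processes each of them. The fields are:

dts is the decoding timestamp and pts is the presentation timestamp; both are in units of the time base of the stream the packet belongs to. They are used to report progress and, when video is decoded as well, to synchronize audio and video.

stream_index is the index of the stream the packet belongs to.

data points to the data buffer, which holds the actual content; size is its length.

duration is the duration of the data, also in units of the stream time base.

pos is the byte offset of this data in the media stream (personally I think it could also be used for seeking).

destruct is a function pointer used to release the data buffer.

flags is a flag field; when the lowest bit is set to 1, the data is a key frame.

AVPacket itself is only a container. Its data member points to the actual data buffer, which must be released with av_free_packet after use.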
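The pts and flags fields are the two that playback code touches most often. The sketch below shows both calculations in isolation. The helper names are hypothetical; kFlagKey mirrors the lowest-bit key-frame flag described above (FFmpeg's own constant for it is AV_PKT_FLAG_KEY), and the time base is passed as a plain num/den pair instead of an AVRational.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the key-frame flag is bit 0 of AVPacket::flags, as described
// in the text. kFlagKey is a local stand-in for FFmpeg's constant.
static const int kFlagKey = 0x0001;

// Hypothetical helper: true if the packet flags mark a key frame.
bool is_key_frame(int flags)
{
    return (flags & kFlagKey) != 0;
}

// Hypothetical helper: converts a timestamp in stream time base units
// (tb_num / tb_den seconds per tick) into seconds, e.g. for a progress bar.
double pts_to_seconds(int64_t pts, int tb_num, int tb_den)
{
    assert(tb_den != 0);
    return (double)pts * tb_num / tb_den;
}
```

With real structures the arguments would be packet.flags, packet.pts and the num/den of the time_base of streams[packet.stream_index].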

The decoding process in detail:

Here is a simple piece of decoding code to go with the explanation:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>   // clock(), CLOCKS_PER_SEC

extern "C" {        // the FFmpeg headers are C headers
#include "avcodec.h"
#include "avformat.h"
}

int main(int argc, char *argv[])

{
    const char *filename = "02.swf";

    av_register_all(); // register all formats and codecs

    AVFormatContext *pInFmtCtx = NULL;  // container (format) context
    AVCodecContext *pInCodecCtx = NULL; // codec context

    // Open the file and read its header
    if (av_open_input_file(&pInFmtCtx, filename, NULL, 0, NULL) != 0)
    {
        printf("av_open_input_file error\n");
        return -1;
    }

    // Read the audio/video stream information from the file
    if (av_find_stream_info(pInFmtCtx) < 0)
    {
        printf("av_find_stream_info error\n");
        return -1;
    }

    unsigned int j;
    // Find the first audio stream
    int audioStream = -1;
    for (j = 0; j < pInFmtCtx->nb_streams; j++)
    {
        if (pInFmtCtx->streams[j]->codec->codec_type == CODEC_TYPE_AUDIO)
        {
            audioStream = j;
            break;
        }
    }
    if (audioStream == -1)
    {
        printf("input file has no audio stream\n");
        return 0; // Didn't find an audio stream
    }
    printf("audio stream num: %d\n", audioStream);

    pInCodecCtx = pInFmtCtx->streams[audioStream]->codec; // codec context of the audio stream
    AVCodec *pInCodec = NULL;
    pInCodec = avcodec_find_decoder(pInCodecCtx->codec_id); // look up the decoder by codec ID
    if (pInCodec == NULL)
    {
        printf("error no Codec found\n");
        return -1; // Codec not found
    }

    // Decoding also works with test in place of pInCodecCtx, which shows that
    // the few fields below are all that is needed for decoding and resampling
    AVCodecContext *test = avcodec_alloc_context();
    test->bit_rate = pInCodecCtx->bit_rate;             // used for resampling
    test->sample_rate = pInCodecCtx->sample_rate;       // used for resampling
    test->channels = pInCodecCtx->channels;             // used for resampling
    test->extradata = pInCodecCtx->extradata;           // required if present
    test->extradata_size = pInCodecCtx->extradata_size; // required if present
    test->codec_type = CODEC_TYPE_AUDIO;                // not required
    test->block_align = pInCodecCtx->block_align;       // required

    // Bind the context to the codec so that the decode call below
    // dispatches to the decode function of pInCodec
    if (avcodec_open(test, pInCodec) < 0)
    {
        printf("error avcodec_open failed.\n");
        return -1; // Could not open codec
    }
    if (avcodec_open(pInCodecCtx, pInCodec) < 0)
    {
        printf("error avcodec_open failed.\n");
        return -1; // Could not open codec
    }

    static AVPacket packet;

    printf(" bit_rate = %d \r\n", pInCodecCtx->bit_rate);
    printf(" sample_rate = %d \r\n", pInCodecCtx->sample_rate);
    printf(" channels = %d \r\n", pInCodecCtx->channels);
    printf(" code_name = %s \r\n", pInCodecCtx->codec->name);

    uint8_t *pktdata;
    int pktsize;
    int out_size = AVCODEC_MAX_AUDIO_FRAME_SIZE * 100;
    uint8_t *inbuf = (uint8_t *)malloc(out_size);
    FILE *pcm, *packetinfo;
    packetinfo = fopen("packetinfo.txt", "w");
    pcm = fopen("result.pcm", "wb");

    long start = clock();

    // av_read_frame calls the packet reader of the matching demuxer
    while (av_read_frame(pInFmtCtx, &packet) >= 0)
    {
        if (packet.stream_index == audioStream) // audio packet
        {
            pktdata = packet.data;
            pktsize = packet.size;
            while (pktsize > 0)
            {
                // In: capacity of inbuf. Out: number of bytes decoded.
                out_size = AVCODEC_MAX_AUDIO_FRAME_SIZE * 100;
                // Decode; len is the number of input bytes consumed
                int len = avcodec_decode_audio2(pInCodecCtx, (short *)inbuf, &out_size, pktdata, pktsize);
                if (len < 0)
                {
                    printf("Error while decoding.\n");
                    break;
                }
                if (out_size > 0)
                {
                    fwrite(inbuf, 1, out_size, pcm); // write the PCM data
                    fflush(pcm);
                }
                pktsize -= len;
                pktdata += len;
            }
        }
        av_free_packet(&packet); // release every packet, audio or not
    }

    long end = clock();
    printf("cost time:%f\n", double(end - start) / (double)CLOCKS_PER_SEC);

    free(inbuf);
    fclose(pcm);
    fclose(packetinfo);

    if (pInCodecCtx != NULL)
    {
        avcodec_close(pInCodecCtx);
    }
    if (test != NULL)
    {
        avcodec_close(test);
    }
    av_free(test);
    av_close_input_file(pInFmtCtx);

    return 0;
}
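The inner loop of the listing relies on one convention: the decode call returns how many input bytes it consumed, and the caller advances the data pointer by that amount until the packet is empty. The sketch below isolates that pattern so it can be run without FFmpeg. Everything in it is a stand-in: mock_decode is a made-up decoder that consumes at most 4 bytes per call and doubles each byte, and drain_packet, drain_test_packet and DrainStats are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Made-up stand-in for a decode call such as avcodec_decode_audio2():
// consumes at most 4 input bytes, appends each consumed byte twice to out,
// and returns the number of input bytes consumed.
int mock_decode(const uint8_t *data, int size, std::vector<uint8_t> &out)
{
    int consumed = size < 4 ? size : 4;
    for (int i = 0; i < consumed; i++) {
        out.push_back(data[i]);
        out.push_back(data[i]);
    }
    return consumed;
}

// Drains one packet: keeps decoding until every input byte is consumed.
// Returns the number of decode calls made, or -1 if the decoder fails.
int drain_packet(const uint8_t *data, int size, std::vector<uint8_t> &out)
{
    int calls = 0;
    while (size > 0) {
        int len = mock_decode(data, size, out);
        if (len <= 0)
            return -1; // error, or no progress: stop instead of spinning
        data += len;   // skip the bytes the decoder consumed
        size -= len;
        calls++;
    }
    return calls;
}

struct DrainStats {
    int calls;
    size_t bytes_out;
};

// Builds a dummy packet of packet_size bytes and drains it.
DrainStats drain_test_packet(int packet_size)
{
    assert(packet_size >= 0);
    std::vector<uint8_t> packet((size_t)packet_size, 0x7f);
    std::vector<uint8_t> out;
    DrainStats stats;
    stats.calls = drain_packet(packet.data(), packet_size, out);
    stats.bytes_out = out.size();
    return stats;
}
```

A 10-byte dummy packet takes three calls here (4 + 4 + 2 bytes). Treating a return value of 0 as a failure is a design choice of this sketch: it guards against an endless loop if a decoder reports no progress.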

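With the flow overview above and the detailed description of the key structures, completing the decoding process is already fairly simple. What remains are details.

First, register all container formats and codecs:

av_register_all() initializes libavformat and registers all muxers, demuxers and protocols. It is the most direct and convenient approach. Individual audio/video formats, protocols and so on can also be registered one by one; those calls are not listed here.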

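avformat_alloc_context() and avformat_free_context():

AVFormatContext *avformat_alloc_context(void);

Allocates an AVFormatContext structure.

Header: #include "libavformat/avformat.h"

It allocates the memory for an AVFormatContext structure and performs a simple initialization.

avformat_free_context() releases everything inside the structure as well as the structure itself. In other words, a structure allocated with avformat_alloc_context() must be released with avformat_free_context(). In some versions the allocation function may be named av_alloc_format_context().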

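av_open_input_file():

attribute_deprecated int av_open_input_file(AVFormatContext **ic_ptr, const char *filename, AVInputFormat *fmt, int buf_size, AVFormatParameters *ap);

Opens a media file for input, that is, the source file. The codecs are not opened; only the file header is read.

AVFormatContext **ic_ptr: the input file container (format context);

const char *filename: the input file name; use the full path and make sure the file exists;

AVInputFormat *fmt: the input file format; NULL is sufficient;

int buf_size: the buffer size; 0 is sufficient;

AVFormatParameters *ap: the format parameters; NULL is sufficient.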

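av_find_stream_info():

int av_find_stream_info(AVFormatContext *ic);

Reads packets from the media file to obtain its stream information. This is very useful for formats without a header, such as MPEG. The function also computes the real frame rate in cases such as the MPEG-2 repeat-frame mode, and it does not change the logical file position. Header: #include "libavformat/avformat.h"

In other words, it reads the audio/video stream information out of the media file and stores it in the container context for use during decoding.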

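avcodec_find_decoder()

AVCodec *avcodec_find_decoder(enum CodecID id);

Finds a registered audio/video decoder by its codec ID.

Header: #include "libavcodec/avcodec.h"

Implemented in: \ffmpeg\libavcodec\utils.c

Before looking up a decoder, av_register_all must be called to register all supported decoders.

On success it returns a pointer to the decoder; otherwise it returns NULL.

The decoders are kept in a linked list. The lookup walks the list from head to tail and compares decoder IDs.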
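The lookup described above is a plain linear search over a singly linked list. The sketch below shows the idea with a made-up two-entry registry. CodecEntry, find_by_id, decoder_name and the numeric IDs are all hypothetical; they are not the real AVCodec structure or CodecID values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical registry node; the real list is made of AVCodec structures.
struct CodecEntry {
    int id;                 // made-up numeric codec ID
    const char *name;
    const CodecEntry *next; // next registered entry, NULL at the tail
};

// Walks the list from head to tail and returns the first entry whose ID
// matches, or NULL if there is none.
const CodecEntry *find_by_id(const CodecEntry *head, int id)
{
    for (const CodecEntry *p = head; p != NULL; p = p->next) {
        if (p->id == id)
            return p;
    }
    return NULL;
}

// Convenience wrapper: name of the matching entry, or "(none)".
const char *decoder_name(const CodecEntry *head, int id)
{
    const CodecEntry *found = find_by_id(head, id);
    if (found == NULL)
        return "(none)";
    assert(found->name != NULL); // registered entries are expected to be named
    return found->name;
}

// Tiny example registry: mp3 -> aac -> end of list.
static const CodecEntry kAac = {2, "aac", NULL};
static const CodecEntry kMp3 = {1, "mp3", &kAac};
```

The NULL result for an unknown ID corresponds to the failure case of avcodec_find_decoder(), which is why the sample program checks the returned pointer before opening the codec.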

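av_read_frame()

int av_read_frame(AVFormatContext *s, AVPacket *pkt);

// Reads one AVPacket from the input file container.

// The packets read are not always usable. Each packet that is read should be passed to the matching decoder (video or audio).

// Keep calling this function in a loop while the return value is >= 0, and call av_free_packet to release the previous AVPacket before the next call.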

Please credit the original address when reposting: https://ju.6miu.com/read-21442.html
