This article is mainly about encoding raw PCM data to AAC.
The test input is the PCM file generated in the previous article.
Audio sample formats come in many types: 16-bit, 32-bit, and so on. As of 2016, FFmpeg's native AAC encoder only accepts 32-bit float planar samples, i.e. AV_SAMPLE_FMT_FLTP.
So before encoding the PCM, we must first make sure it is in AV_SAMPLE_FMT_FLTP.
【AAC Container Formats】
AAC has two container formats, ADIF and ADTS; streaming media generally uses ADTS. See:
AAC ADTS format analysis
The PCM file is read and written directly through a FILE pointer.
AVCodec represents a codec; you can think of it simply as a structure describing one encoding/decoding algorithm.
AVCodecContext is the context of an AVCodec. As an analogy: when encoding H.264 video there are I, P and B frames, and if a stream arrives in the order I B B P, a B frame needs the frames before and after it to work out what it should display, so some frames must be buffered in order to reconstruct the B frames. The context holds this kind of state, among much else.
AVCodecID identifies a codec; to encode AAC, use AV_CODEC_ID_AAC.
AVFrame holds data before encoding (and after decoding); AVPacket holds data after encoding (and before decoding).
The official definition of AVFrame:
```cpp
typedef struct AVFrame {
#define AV_NUM_DATA_POINTERS 8
    /**
     * pointer to the picture/channel planes.
     * This might be different from the first allocated byte
     *
     * Some decoders access areas outside 0,0 - width,height, please
     * see avcodec_align_dimensions2(). Some filters and swscale can read
     * up to 16 bytes beyond the planes, if these filters are to be used,
     * then 16 extra bytes must be allocated.
     *
     * NOTE: Except for hwaccel formats, pointers not needed by the format
     * MUST be set to NULL.
     */
    uint8_t *data[AV_NUM_DATA_POINTERS];

    /**
     * For video, size in bytes of each picture line.
     * For audio, size in bytes of each plane.
     *
     * For audio, only linesize[0] may be set. For planar audio, each channel
     * plane must be the same size.
     *
     * For video the linesizes should be multiples of the CPUs alignment
     * preference, this is 16 or 32 for modern desktop CPUs.
     * Some code requires such alignment other code can be slower without
     * correct alignment, for yet other it makes no difference.
     *
     * @note The linesize may be larger than the size of usable data -- there
     * may be extra padding present for performance reasons.
     */
    int linesize[AV_NUM_DATA_POINTERS];

    /**
     * pointers to the data planes/channels.
     *
     * For video, this should simply point to data[].
     *
     * For planar audio, each channel has a separate data pointer, and
     * linesize[0] contains the size of each channel buffer.
     * For packed audio, there is just one data pointer, and linesize[0]
     * contains the total size of the buffer for all channels.
     *
     * Note: Both data and extended_data should always be set in a valid frame,
     * but for planar audio with more channels that can fit in data,
     * extended_data must be used in order to access all channels.
     */
    uint8_t **extended_data;

    // ...other members...
} AVFrame;
```

For video, the dominant compression standard today is H.264 (I have hardly seen anything else in practice), and H.264 encodes only YUV images. Decoding H.264 therefore yields three YUV components, stored separately in data[0], data[1] and data[2], with linesize[0], linesize[1] and linesize[2] holding each plane's length. For audio, a multi-channel stream decodes into separate per-channel pointers, e.g. data[0] for the left channel and data[1] for the right; since every channel holds the same amount of data, linesize[0] alone gives the size of each channel's plane.
The extended_data member normally just points to data; it exists as an extension because, as shown above, data is an array of only 8 pointers, so through data alone audio is limited to at most 8 channels.
```cpp
#include <QDebug>

extern "C" {
#include "libavformat/avformat.h"
#include "libavutil/avutil.h"
#include "libavcodec/avcodec.h"
#include "libavutil/frame.h"
#include "libavutil/samplefmt.h"
}
#pragma comment(lib, "avcodec.lib")
#pragma comment(lib, "avfilter.lib")
#pragma comment(lib, "avformat.lib")
#pragma comment(lib, "avutil.lib")

/* PCM to AAC */
int main()
{
    char *padts = (char *)malloc(sizeof(char) * 7);
    int profile = 2; // AAC LC
    int freqIdx = 4; // 44.1 kHz
    int chanCfg = 2; // MPEG-4 Audio Channel Configuration: 2 = stereo
    padts[0] = (char)0xFF; // 11111111    = syncword, high 8 bits
    padts[1] = (char)0xF1; // 1111 0 00 1 = syncword end, MPEG-4, layer, no CRC
    padts[2] = (char)(((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
    padts[6] = (char)0xFC;

    AVCodec *pCodec;
    AVCodecContext *pCodecCtx = NULL;
    int i, ret, got_output;
    FILE *fp_in;
    FILE *fp_out;
    AVFrame *pFrame;
    uint8_t *frame_buf;
    int size = 0;
    AVPacket pkt;
    int framecnt = 0;
    char filename_in[] = "audio.pcm";
    AVCodecID codec_id = AV_CODEC_ID_AAC;
    char filename_out[] = "audio.aac";
    int framenum = 100000;

    avcodec_register_all();

    pCodec = avcodec_find_encoder(codec_id);
    if (!pCodec) {
        printf("Codec not found\n");
        return -1;
    }
    pCodecCtx = avcodec_alloc_context3(pCodec);
    if (!pCodecCtx) {
        printf("Could not allocate audio codec context\n");
        return -1;
    }
    pCodecCtx->codec_id = codec_id;
    pCodecCtx->codec_type = AVMEDIA_TYPE_AUDIO;
    pCodecCtx->sample_fmt = AV_SAMPLE_FMT_FLTP;
    pCodecCtx->sample_rate = 44100;
    pCodecCtx->channel_layout = AV_CH_LAYOUT_STEREO;
    pCodecCtx->channels = av_get_channel_layout_nb_channels(pCodecCtx->channel_layout);
    qDebug() << av_get_channel_layout_nb_channels(pCodecCtx->channel_layout);

    if ((ret = avcodec_open2(pCodecCtx, pCodec, NULL)) < 0) {
        qDebug() << "avcodec_open2 error ----> " << ret;
        printf("Could not open codec\n");
        return -1;
    }

    pFrame = av_frame_alloc();
    // 1024 by default; the encoder fixes the per-frame sample count
    // (frame_size) and it seemingly cannot be changed.
    pFrame->nb_samples = pCodecCtx->frame_size;
    pFrame->format = pCodecCtx->sample_fmt;
    pFrame->channels = 2;

    size = av_samples_get_buffer_size(NULL, pCodecCtx->channels,
                                      pCodecCtx->frame_size,
                                      pCodecCtx->sample_fmt, 0);
    frame_buf = (uint8_t *)av_malloc(size);
    /**
     * avcodec_fill_audio_frame:
     * frame_buf's required size follows from the channel count, frame size
     * and sample format. After this call, the AVFrame's audio-data members
     * change as follows: data[0] points at frame_buf, and data[1] points
     * halfway into frame_buf:
     *   data[0] == frame_buf
     *   data[1] == frame_buf + pCodecCtx->frame_size * av_get_bytes_per_sample(pCodecCtx->sample_fmt)
     */
    ret = avcodec_fill_audio_frame(pFrame, pCodecCtx->channels,
                                   pCodecCtx->sample_fmt,
                                   (const uint8_t *)frame_buf, size, 0);
    if (ret < 0) {
        qDebug() << "avcodec_fill_audio_frame error ";
        return 0;
    }

    // Input raw data
    fp_in = fopen(filename_in, "rb");
    if (!fp_in) {
        printf("Could not open %s\n", filename_in);
        return -1;
    }
    // Output bitstream
    fp_out = fopen(filename_out, "wb");
    if (!fp_out) {
        printf("Could not open %s\n", filename_out);
        return -1;
    }

    // Encode
    for (i = 0; i < framenum; i++) {
        av_init_packet(&pkt);
        pkt.data = NULL; // packet data will be allocated by the encoder
        pkt.size = 0;
        // Read raw data
        if (fread(frame_buf, 1, size, fp_in) <= 0) {
            printf("Failed to read raw data!\n");
            return -1;
        } else if (feof(fp_in)) {
            break;
        }
        pFrame->pts = i;
        ret = avcodec_encode_audio2(pCodecCtx, &pkt, pFrame, &got_output);
        if (ret < 0) {
            qDebug() << "error encoding";
            return -1;
        }
        if (pkt.data == NULL) {
            av_free_packet(&pkt);
            continue;
        }
        qDebug() << "got_output = " << got_output;
        if (got_output) {
            qDebug() << "Succeed to encode frame : " << framecnt
                     << " size :" << pkt.size;
            framecnt++;
            padts[3] = (char)(((chanCfg & 3) << 6) + ((7 + pkt.size) >> 11));
            padts[4] = (char)(((7 + pkt.size) & 0x7FF) >> 3);
            padts[5] = (char)((((7 + pkt.size) & 7) << 5) + 0x1F);
            fwrite(padts, 7, 1, fp_out);
            fwrite(pkt.data, 1, pkt.size, fp_out);
            av_free_packet(&pkt);
        }
    }

    // Flush the encoder
    for (got_output = 1; got_output; i++) {
        ret = avcodec_encode_audio2(pCodecCtx, &pkt, NULL, &got_output);
        if (ret < 0) {
            printf("Error encoding frame\n");
            return -1;
        }
        if (got_output) {
            printf("Flush Encoder: Succeed to encode 1 frame!\tsize:%5d\n",
                   pkt.size);
            padts[3] = (char)(((chanCfg & 3) << 6) + ((7 + pkt.size) >> 11));
            padts[4] = (char)(((7 + pkt.size) & 0x7FF) >> 3);
            padts[5] = (char)((((7 + pkt.size) & 7) << 5) + 0x1F);
            fwrite(padts, 7, 1, fp_out);
            fwrite(pkt.data, 1, pkt.size, fp_out);
            av_free_packet(&pkt);
        }
    }

    fclose(fp_in);
    fclose(fp_out);
    avcodec_close(pCodecCtx);
    av_free(pCodecCtx);
    av_freep(&pFrame->data[0]);
    av_frame_free(&pFrame);
    return 0;
}
```
Converting the PCM produced in the previous article to AAC as-is is wrong,
because that article saved the audio interleaved: L(one sample) R(one sample) LRLRLR..............
Since the different pointers in AVFrame's data array are meant to point at different channels' data, feeding it interleaved data produces incorrect results.
In the code above, data[0] and data[1] simply split the interleaved buffer in half, so each half still mixes both channels (figure omitted).
When FFmpeg encodes PCM to AAC, what it needs instead is all of one channel's samples in data[0] and all of the other channel's samples in data[1] (figure omitted).
So when reading one frame, I want data[0] to land exactly on one channel's data and data[1] on the other channel's data.
From the code above we know AVFrame->nb_samples defaults to 1024, so the amount of data read per frame for one channel is:
int length = pFrame->nb_samples * av_get_bytes_per_sample((AVSampleFormat)pFrame->format); which here is 1024 × 4 = 4096 bytes. So the PCM file should be written accordingly, as in:
【FFmpeg (2016)】 Video file demuxer (demuxing) — H264 & PCM
The code that writes the PCM file:
```cpp
/**
 * Some massaging is done when writing the file here, for a reason:
 * the data is written as alternating L and R blocks of 4096 bytes each.
 */
int k = 0, h = 0;
for (int i = 0; i < 4; ++i) {
    if (i % 2 == 0) { // even blocks: 4096 bytes of left-channel samples
        int tmp = data_size / 4;
        for (int j = 0; j < tmp; j += 4, k++) {
            data[i * 4096 + j + 0] = (char)(l[k] & 0xff);
            data[i * 4096 + j + 1] = (char)(l[k] >> 8 & 0xff);
            data[i * 4096 + j + 2] = (char)(l[k] >> 16 & 0xff);
            data[i * 4096 + j + 3] = (char)(l[k] >> 24 & 0xff);
        }
    } else {          // odd blocks: 4096 bytes of right-channel samples
        int tmp = data_size / 4;
        for (int j = 0; j < tmp; j += 4, h++) {
            data[i * 4096 + j + 0] = (char)(r[h] & 0xff);
            data[i * 4096 + j + 1] = (char)(r[h] >> 8 & 0xff);
            data[i * 4096 + j + 2] = (char)(r[h] >> 16 & 0xff);
            data[i * 4096 + j + 3] = (char)(r[h] >> 24 & 0xff);
        }
    }
}
```

"Planar" means exactly that: flat, per-channel planes.
The official definition of AVSampleFormat is:
```cpp
enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,   ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,  ///< signed 16 bits
    AV_SAMPLE_FMT_S32,  ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,  ///< float
    AV_SAMPLE_FMT_DBL,  ///< double
    AV_SAMPLE_FMT_U8P,  ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP, ///< float, planar
    AV_SAMPLE_FMT_DBLP, ///< double, planar
    AV_SAMPLE_FMT_S64,  ///< signed 64 bits
    AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar
    AV_SAMPLE_FMT_NB    ///< Number of sample formats. DO NOT USE if linking dynamically
};
```

The formats with a trailing P are planar. "Planar" means the audio data is no longer stored as: L(one sample) R(one sample) LRLRLRLR...............
For an AVFrame this means data[0] and data[1] point to the left- and right-channel data respectively; that is the planar concept. (Compare the way YUV planes are stored when decoding video.)
Instead, planar storage looks like: L(one whole frame) R(one whole frame) LRLR............................
Opening the AAC encoder shows that only AV_SAMPLE_FMT_FLTP is supported:

```cpp
AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
AVCodecContext *codec_ctx = avcodec_alloc_context3(codec);
avcodec_open2(codec_ctx, codec, NULL);
```

So before encoding to AAC, you must first make sure the storage format is correct (even though playing back that planar PCM directly then has some issues).
【FFmpeg (2016)】 The SwrContext resampling struct