Please excuse the lengthy title, but after spending the better part of the last two days getting this entire process nailed down I want to make sure that this write-up is indexed by the search engines as comprehensively as possible. The whole thing started when I was asked the seemingly simple question of "Is it possible to use a webcam on Linux?" My initial answer was "Yeah, sure, I think." The goal was clear - configure a low-cost webcam on Ubuntu Linux to capture video and audio in a form that could be used both for voice/video chat (Skype) and for upload to video sharing sites like YouTube.
A bit of digging later, I had put together enough information to convince myself that it could be done. To start, I needed the hardware - and I opted for the Logitech QuickCam for Notebooks Pro. It is a USB device with both camera and microphone. I selected this model because it supported relatively high resolutions and was fully supported by the Linux USB Video Class driver, which is compatible with the Video For Linux, version 2 (v4l2) framework and kernel drivers. I wanted v4l2 support because the Skype Beta client for Linux uses that framework for video calling. As it turns out, all of the drivers to make this work are included in a default installation of Ubuntu 7.10, so nothing additional needs to be installed as far as calling with Skype is concerned.
To first test things out, I downloaded and installed the Skype beta client for Linux. I plugged in the USB webcam, launched Skype, and selected Options -> Audio Devices. I selected the USB device from the list of possible sound input sources. I then selected Options -> Video Devices and did the same. Pressing the 'Test' button displayed a small video feed from the camera. To confirm, I placed a call to a friend, and everything just worked. The video quality was good and the audio quality was fine.
To get a closer look at how the kernel treats the device, I used the 'lsusb' command to get a list of the USB devices:
# lsusb
Bus 002 Device 002: ID 045e:00db Microsoft Corp.
Bus 002 Device 003: ID 047d:1020 Kensington
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 009: ID 046d:0991 Logitech, Inc.
Bus 001 Device 006: ID 0424:223a Standard Microsystems Corp. 8-in-1 Card Reader
Bus 001 Device 005: ID 0424:2504 Standard Microsystems Corp.
Bus 001 Device 002: ID 0424:2502 Standard Microsystems Corp.
Bus 001 Device 001: ID 0000:0000
There, in the middle of the list, is 'Logitech, Inc.', the maker of the webcam. To list the sound capture devices detected by ALSA, the audio framework, I ran:
# arecord -l
card 0: CK804 [NVidia CK804], device 0: Intel ICH [NVidia CK804]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: CK804 [NVidia CK804], device 1: Intel ICH - MIC ADC [NVidia CK804 - MIC ADC]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: U0x46d0x991 [USB Device 0x46d:0x991], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
This told me that the microphone on the webcam was registered as card 1, device 0 (with a single subdevice 0). This information will be used later when we attempt to record audio/video, where it appears as 'hw:1,0'. This can also be determined from the output of /proc/asound/pcm, where each line starts with the card and device numbers:
# cat /proc/asound/pcm
00-02: Intel ICH - IEC958 : NVidia CK804 - IEC958 : playback 1
00-01: Intel ICH - MIC ADC : NVidia CK804 - MIC ADC : capture 1
00-00: Intel ICH : NVidia CK804 : playback 1 : capture 1
01-00: USB Audio : USB Audio : capture 1
Again, the USB Audio is registered as card 1, device 0.
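If you would rather not read the card and device numbers off by eye, the lookup can be scripted. What follows is only a rough sketch: usb_capture_device is a name I made up, and it assumes the webcam shows up with 'USB Audio' in its description, as it does in the output above. It reads /proc/asound/pcm style lines and prints the matching ALSA device string.

```shell
#!/bin/sh
# usb_capture_device: hypothetical helper (not part of ALSA).
# Reads /proc/asound/pcm style lines on stdin and prints "hw:CARD,DEVICE"
# for the first capture-capable entry whose description mentions "USB Audio".
usb_capture_device() {
    awk -F': ' '/USB Audio/ && /capture/ {
        split($1, id, "-")                     # "01-00" -> card "01", device "00"
        printf "hw:%d,%d\n", id[1], id[2]      # %d drops the leading zeros
        exit
    }'
}

# Only touch the real file if it exists on this machine.
if [ -r /proc/asound/pcm ]; then
    usb_capture_device < /proc/asound/pcm
fi
```

On the machine described here this should print the same 'card 1, device 0' pair found above, written the way ALSA expects it. If nothing is printed, no entry matched and the description on your hardware is probably different.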
With the basics working, the next thing I wanted to do was make a high-quality video recording that would be suitable for upload to YouTube. After another round of digging it looked like two video capture frameworks had direct support for v4l2 - mplayer/mencoder and gstreamer. I first attempted to use mencoder, but after several hours was not able to get things working. I was able to capture video only, and as soon as I tried to mix in the audio input, mencoder would die with a 'Floating point exception' error. A message to the mplayer-devel mailing list about the problem went unanswered. So next I turned my attention to gstreamer.
Although I will spare you details, after about two days of hacking at this I was able to achieve my goal: High resolution video, good audio quality, and perfect synchronization between the two.
First, make sure you have the full set of gstreamer-plugins (good, bad, and ugly) installed. This should include:
# dpkg --list '*gst*' | grep ii
ii gstreamer-tools 0.10.14-1ubuntu3 Tools for use with GStreamer
ii gstreamer0.10-alsa 0.10.14-1ubuntu3 GStreamer plugin for ALSA
ii gstreamer0.10-esd 0.10.6-0ubuntu4 GStreamer plugin for ESD
ii gstreamer0.10-ffmpeg 0.10.2-2ubuntu1 FFmpeg plugin for GStreamer
ii gstreamer0.10-gnomevfs 0.10.14-1ubuntu3 GStreamer plugin for GnomeVFS
ii gstreamer0.10-gnonlin 0.10.9-1 non-linear editing module for GStreamer
ii gstreamer0.10-plugins-bad 0.10.5-4ubuntu1 GStreamer plugins from the "bad" set
ii gstreamer0.10-plugins-bad-multiverse 0.10.5-1 GStreamer plugins from the "bad" set (Multiv
ii gstreamer0.10-plugins-base 0.10.14-1ubuntu3 GStreamer plugins from the "base" set
ii gstreamer0.10-plugins-base-apps 0.10.14-1ubuntu3 GStreamer helper programs from the "base" se
ii gstreamer0.10-plugins-good 0.10.6-0ubuntu4 GStreamer plugins from the "good" set
ii gstreamer0.10-plugins-ugly 0.10.6-0ubuntu2 GStreamer plugins from the "ugly" set
ii gstreamer0.10-plugins-ugly-multiverse 0.10.6-0ubuntu1 GStreamer plugins from the "ugly" set (Multi
ii gstreamer0.10-tools 0.10.14-1ubuntu3 Tools for use with GStreamer
ii gstreamer0.10-x 0.10.14-1ubuntu3 GStreamer plugins for X11 and Pango
ii libgstreamer-plugins-base0.10-0 0.10.14-1ubuntu3 GStreamer libraries from the "base" set
ii libgstreamer-plugins-base0.10-dev 0.10.14-1ubuntu3 GStreamer development files for libraries fr
ii libgstreamer0.10-0 0.10.14-1ubuntu3 Core GStreamer libraries and elements
ii libgstreamer0.10-dev 0.10.14-1ubuntu3 GStreamer core development files
ii python-gst0.10 0.10.8-1ubuntu1 generic media-playing framework (Python bind
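To compare your own machine against that list without scanning it line by line, something like the following may help. It is a sketch only: missing_packages is a hypothetical helper name, the package names are taken from the listing above, and it assumes a Debian/Ubuntu system where 'dpkg -s' reports package status.

```shell
#!/bin/sh
# Package names copied from the dpkg listing above.
required="gstreamer0.10-tools gstreamer0.10-alsa gstreamer0.10-ffmpeg \
gstreamer0.10-plugins-base gstreamer0.10-plugins-good \
gstreamer0.10-plugins-bad gstreamer0.10-plugins-ugly"

# missing_packages: hypothetical helper. Prints every package name given as an
# argument for which 'dpkg -s' does not succeed (not installed or unknown).
missing_packages() {
    for pkg in "$@"; do
        dpkg -s "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# $required is deliberately unquoted so that it splits into separate names.
missing_packages $required
```

Anything it prints can then be installed with 'sudo apt-get install' followed by the package names. No output means all of the listed packages are already present.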
Next, we'll need to install one non-standard package called GEntrans which is maintained by Mark Nauwalaerts. I discovered this package via an email sent to the GStreamer-devel mailing list asking for help with my initial video recording problems: I was completely unable to get good synchronized audio and video recordings. The audio always lagged or was shifted or garbled in some way. The GEntrans package is a gstreamer plugin that provides a gstreamer element named 'stamp' that can apply timestamps to incoming video streams in a way that they can later be perfectly matched to the incoming audio streams to produce perfectly sync'd output.
You can download the source from the sf.net site, unpack it, and build and install it from the source directory with:
./configure --prefix=/usr
make
sudo make install
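Once the install has finished, it is worth confirming that gstreamer can actually see the new element before going any further. Here is a minimal check, assuming gst-inspect-0.10 (which ships alongside gst-launch-0.10) is on your PATH; has_gst_element is just a helper name I picked for the example.

```shell
#!/bin/sh
# has_gst_element: hypothetical helper. Succeeds if gst-inspect-0.10 can
# describe the named element, fails otherwise (including when the tool
# itself is not installed).
has_gst_element() {
    gst-inspect-0.10 "$1" >/dev/null 2>&1
}

if has_gst_element stamp; then
    echo "stamp element found"
else
    echo "stamp element not found; check where the plugin was installed" >&2
fi
```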
I suggest you use the --prefix=/usr so that the compiled plugins will go into the standard gstreamer plugins directory (/usr/lib/gstreamer-0.10). That makes them easier to use in the next step. After many, many rounds of experimentation and help from Mark, I put together the following gstreamer recording command:
# gst-launch-0.10 v4l2src queue-size=16 ! stamp sync-margin=1 sync-interval=1 \
! video/x-raw-yuv,width=800,height=600,framerate=15/1 \
! queue2 max-size-buffers=1000 max-size-bytes=0 max-size-time=0 ! ffmpegcolorspace \
! theoraenc quality=60 name=venc alsasrc device="hw:1,0" \
! audio/x-raw-int,rate=16000,channels=1,depth=16 \
! audioconvert ! queue2 max-size-buffers=1000 max-size-bytes=0 max-size-time=0 \
! vorbisenc quality=0.9 name=aenc oggmux name=mux ! filesink location=test.ogg aenc. ! mux. venc. ! mux.
Yes, that is actually one single command-line command. The basic interpretation is this: Create a gstreamer processing pipeline with the following characteristics:
- Use v4l2 as the source of video, and tell it to use its max buffer setting
- Apply a timestamp signal to that video stream using 'stamp'
- Treat that incoming video data as x-raw-yuv of size 800x600 @ 15 frames/second.
- Perform the remaining video transformation in a separate thread using 'queue2'
- Use 'ffmpegcolorspace' to convert the inbound video signal into a color format the encoder accepts
- Send that input into the 'theoraenc' video encoder to produce a Theora compressed video stream
- Combine that stream (or 'mux it') with sound input coming from ALSA sound input card 1, device 0 ('hw:1,0')
- Treat the incoming audio data as x-raw-int at 16kHz mono with 16 bits, which is what the QuickCam produces
- Convert that audio stream into a generic form that gstreamer audio encoders can handle using 'audioconvert'
- Create a separate thread for audio processing using 'queue2' and plenty of buffering
- Encode the audio using the Vorbis encoder ('vorbisenc'), then combine it (or 'mux it') with the video into an Ogg container ('oggmux')
- Dump the Ogg formatted file data to disk into a file named 'test.ogg'
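Since the only parts of that pipeline likely to differ between machines are the ALSA device, the frame size and the output file, it can be handy to wrap it in a small shell function. This is a sketch rather than anything official: record_pipeline is a made-up name, the defaults are the values used above, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# record_pipeline: hypothetical helper that prints the pipeline description on
# one line. Optional arguments: ALSA device, width, height, output file.
record_pipeline() {
    audio_dev=${1:-hw:1,0}
    width=${2:-800}
    height=${3:-600}
    out=${4:-test.ogg}
    printf '%s ' \
        "v4l2src queue-size=16" \
        "! stamp sync-margin=1 sync-interval=1" \
        "! video/x-raw-yuv,width=$width,height=$height,framerate=15/1" \
        "! queue2 max-size-buffers=1000 max-size-bytes=0 max-size-time=0" \
        "! ffmpegcolorspace" \
        "! theoraenc quality=60 name=venc" \
        "alsasrc device=$audio_dev" \
        "! audio/x-raw-int,rate=16000,channels=1,depth=16" \
        "! audioconvert" \
        "! queue2 max-size-buffers=1000 max-size-bytes=0 max-size-time=0" \
        "! vorbisenc quality=0.9 name=aenc" \
        "oggmux name=mux ! filesink location=$out" \
        "aenc. ! mux." \
        "venc. ! mux."
    printf '\n'
}

# Print the full command without touching the camera or microphone.
echo "gst-launch-0.10 $(record_pipeline hw:1,0 800 600 test.ogg)"
```

To actually record, run gst-launch-0.10 $(record_pipeline) with the substitution left unquoted so the shell splits it back into separate arguments. That shortcut means output file names containing spaces will not work with this sketch.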
What I did not know, but was thrilled to discover, is that YouTube accepts files formatted like this - Theora encoded video, Vorbis encoded audio, wrapped into an Ogg formatted file - directly for upload! You can see the results of me wishing my nephew a happy birthday here on YouTube. Keep in mind that the YouTube processed version is not as high quality as the original Ogg file that was produced during the recording locally. The fact is that it can be done and that it works!
Side note: If you just want to record audio to an mp3 file using the microphone on your webcam, use the following command:
gst-launch-0.10 alsasrc device="hw:1,0" \
! audio/x-raw-int,rate=16000,channels=1,depth=16 \
! audioconvert ! lame ! filesink location=test.mp3
To do the same but using Vorbis encoded Ogg-formatted output:
gst-launch-0.10 alsasrc device="hw:1,0" \
! audio/x-raw-int,rate=16000,channels=1,depth=16 \
! audioconvert ! vorbisenc ! oggmux \
! filesink location=test.ogg
And if you just want to play the audio back to yourself:
gst-launch-0.10 alsasrc device="hw:1,0" \
! audio/x-raw-int,rate=16000,channels=1,depth=16 \
! audioconvert \
! alsasink
If all you want is to show the video being captured by the camera in a window on your desktop:
gst-launch-0.10 v4l2src \
! video/x-raw-yuv,width=800,height=600 \
! ffmpegcolorspace \
! xvimagesink
For those of you who need to adjust the color/contrast/brightness settings of their webcam, I suggest luvcview.