Extended Attributes
2015-12-30
While working on support for FreeBSD's extended attributes in Python, I tried to be conscious of other implementations of extended attributes in different operating systems. That way I wouldn't inadvertently cause the same problem that I was trying to fix: reliance on a particular API's semantics.
What are extended attributes?
To put it very simply, extended attributes are metadata that are attached to files. Typically, they're key/value pairs that the filesystem associates with a particular file on the filesystem, though that doesn't always have to be the case.
How they're implemented depends on both the filesystem and the operating system. This means that implementations on the same filesystem (UFS, for example) can be completely incompatible across operating systems (Solaris and FreeBSD).
Extended attributes are not mandated by any standard. The tooling and APIs are quite different across operating systems and some operating systems (OpenBSD, HP-UX) don't implement them at all. Because support is non-standard and spotty, it's rare to see them used in cross-platform software. I'd be super-interested in seeing some counter-examples to this.
Extended attributes are sometimes namespaced. That is to say, there exists some top-level grouping of attributes. Other than the top-level namespace, there usually isn't hierarchy to attributes, other than any arbitrary user-defined hierarchy. Namespaces are usually system and user, although this isn't necessarily consistent, as we'll see. Extended attributes under the system namespace are only modifiable by root (and sometimes only queriable by root).
Linux
    ssize_t getxattr(const char *path, const char *name, void *value, size_t size);
    ssize_t lgetxattr(const char *path, const char *name, void *value, size_t size);
    ssize_t fgetxattr(int fd, const char *name, void *value, size_t size);
    ssize_t listxattr(const char *path, char *list, size_t size);
    ssize_t llistxattr(const char *path, char *list, size_t size);
    ssize_t flistxattr(int fd, char *list, size_t size);
    int removexattr(const char *path, const char *name);
    int lremovexattr(const char *path, const char *name);
    int fremovexattr(int fd, const char *name);
    int setxattr(const char *path, const char *name, const void *value, size_t size, int flags);
    int lsetxattr(const char *path, const char *name, const void *value, size_t size, int flags);
    int fsetxattr(int fd, const char *name, const void *value, size_t size, int flags);
The Linux API is actually a fairly nice one, and for the rest of this post I'm going to use it as my point of comparison. On success, the getxattr and listxattr functions return the size of the attribute value or name list. If the buffer is too small, they don't silently truncate: they return -1 and set errno to ERANGE. This lends itself to a nice idiom for checking whether or not the data fit:
    char buf[BUFSIZ];
    ssize_t res;

    res = getxattr("/home/worr/foo", "user.foo", buf, sizeof(buf));
    if (res == -1 && errno == ERANGE) {
        /* buf was too small to hold the value */
    } else if (res == -1) {
        /* some other error occurred */
    }
If size is zero in a call to getxattr or listxattr, nothing is written to the buffer and the size required to hold the contents of the EA is returned. This allows you to query the amount of space required to hold the return value, allocate it, and then call the function again to populate that. That's unfortunately racy, since the attribute can change between the two calls, so it's preferable to call and then realloc if the buffer turns out to be too small.
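The call-then-realloc approach might look roughly like the sketch below. It assumes Linux's <sys/xattr.h>, where a too-small buffer is reported as ERANGE; read_xattr is a made-up helper name and the starting capacity of 64 bytes is arbitrary.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Read an attribute into a heap buffer, doubling the buffer whenever the
 * kernel reports ERANGE. Returns NULL on error with errno set; the caller
 * frees the result. read_xattr is a hypothetical helper, not a libc call.
 */
void *read_xattr(const char *path, const char *name, size_t *len)
{
    size_t cap = 64; /* arbitrary starting size */
    char *buf = NULL;

    for (;;) {
        char *tmp = realloc(buf, cap);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        ssize_t res = getxattr(path, name, buf, cap);
        if (res >= 0) {
            *len = (size_t)res;
            return buf;
        }
        if (errno != ERANGE) {
            int saved = errno; /* free() may clobber errno */
            free(buf);
            errno = saved;
            return NULL;
        }
        cap *= 2; /* value did not fit, or grew since the last call */
    }
}
```

Because every retry is a fresh read, there is no window in which a previously queried size can go stale.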
Linux extended attributes are namespaced, and the namespace is specified as part of the attribute name. Namespaces are separated from attribute names by a period (.). Currently, Linux supports the common system and user namespaces, as well as security and trusted.
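As a small illustration of the naming scheme, the sketch below splits a fully-qualified name at its first period. split_xattr_name is a hypothetical helper written for this post, not part of any libc.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "user.foo" into its namespace ("user") and attribute name ("foo").
 * Only the first '.' separates the two; later dots belong to the name.
 * Returns 0 on success, -1 if there is no namespace or ns is too small.
 * On success *attr points into the original string.
 */
int split_xattr_name(const char *full, char *ns, size_t ns_size,
                     const char **attr)
{
    const char *dot = strchr(full, '.');
    if (dot == NULL || dot == full)
        return -1;

    size_t ns_len = (size_t)(dot - full);
    if (ns_len >= ns_size)
        return -1;

    memcpy(ns, full, ns_len);
    ns[ns_len] = '\0';
    *attr = dot + 1;
    return 0;
}
```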
It's important to note that listxattr will never return EPERM. If there are EAs that the current user cannot access, they just won't be returned.
The attribute list returned by listxattr is NUL-delimited, and all of the attribute names returned by listxattr are fully-qualified.
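Walking such a list is mostly pointer arithmetic. The sketch below counts the names in a buffer of that shape; count_xattr_names is a hypothetical helper, and len is assumed to be the byte count that listxattr returned.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the names in a buffer laid out as "name1\0name2\0...".
 * To visit each name instead of counting, use p as the name's start
 * before advancing it.
 */
size_t count_xattr_names(const char *list, size_t len)
{
    size_t count = 0;
    const char *p = list;
    const char *end = list + len;

    while (p < end) {
        /* memchr guards against a final entry missing its terminator */
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            break;
        count++;
        p = nul + 1;
    }
    return count;
}
```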
AIX
It seems funny that I'm going to talk about AIX's interface right after Linux's, but that's largely because it's...almost exactly the same.
    ssize_t getea(const char *path, const char *name, void *value, size_t size);
    ssize_t fgetea(int filedes, const char *name, void *value, size_t size);
    ssize_t lgetea(const char *path, const char *name, void *value, size_t size);
    ssize_t listea(const char *path, char *list, size_t size);
    ssize_t flistea(int filedes, char *list, size_t size);
    ssize_t llistea(const char *path, char *list, size_t size);
    int removeea(const char *path, const char *name);
    int fremoveea(int filedes, const char *name);
    int lremoveea(const char *path, const char *name);
    int setea(const char *path, const char *name, void *value, size_t size, int flags);
    int fsetea(int filedes, const char *name, void *value, size_t size, int flags);
    int lsetea(const char *path, const char *name, void *value, size_t size, int flags);
Just like Linux, getea and listea return the size of the actual attribute value, which makes checking for truncation super easy. They also support getting called with a zero size, which will just return the size of the list or attribute value without writing any data to value.
The only key difference is that they use the character 0xF8 to separate the namespace from the attribute name. So querying for system attributes involves querying the name 0xF8SYSTEM0xF8attr.
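Building such a name in C takes a little care, because a string literal like "\xF8attr" would swallow the leading 'a' as another hex digit. A minimal sketch, assuming only the 0xF8 convention described above; aix_ea_name and AIX_EA_SEP are made-up names:

```c
#include <stddef.h>
#include <stdio.h>

#define AIX_EA_SEP 0xF8 /* separator byte; the macro name is hypothetical */

/*
 * Write "<0xF8>NAMESPACE<0xF8>attr" into out. Returns 0 on success, or -1
 * if the result (including the terminating NUL) would not fit.
 */
int aix_ea_name(char *out, size_t out_size, const char *ns, const char *attr)
{
    int n = snprintf(out, out_size, "%c%s%c%s",
                     AIX_EA_SEP, ns, AIX_EA_SEP, attr);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```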
There's also the statea family of functions, which will fill in a struct stat64x, but that's of little consequence to us here.
FreeBSD / NetBSD
    ssize_t extattr_get_fd(int fd, int attrnamespace, const char *attrname, void *data, size_t nbytes);
    ssize_t extattr_get_file(const char *path, int attrnamespace, const char *attrname, void *data, size_t nbytes);
    ssize_t extattr_get_link(const char *path, int attrnamespace, const char *attrname, void *data, size_t nbytes);
    int extattr_set_fd(int fd, int attrnamespace, const char *attrname, const void *data, size_t nbytes);
    int extattr_set_file(const char *path, int attrnamespace, const char *attrname, const void *data, size_t nbytes);
    int extattr_set_link(const char *path, int attrnamespace, const char *attrname, const void *data, size_t nbytes);
    int extattr_delete_fd(int fd, int attrnamespace, const char *attrname);
    int extattr_delete_file(const char *path, int attrnamespace, const char *attrname);
    int extattr_delete_link(const char *path, int attrnamespace, const char *attrname);
    ssize_t extattr_list_fd(int fd, int attrnamespace, void *data, size_t nbytes);
    ssize_t extattr_list_file(const char *path, int attrnamespace, void *data, size_t nbytes);
    ssize_t extattr_list_link(const char *path, int attrnamespace, void *data, size_t nbytes);
FreeBSD and NetBSD both use the same functions for the extended attribute calls. The most obvious difference is that the attribute namespace is no longer part of the attribute name. Each namespace is defined as a constant, and must be passed separately.
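Code that accepts Linux-style names therefore has to translate the namespace string into one of those constants. A rough sketch, with the caveat that the fallback definitions below are stand-ins so the example compiles anywhere (on FreeBSD the real ones come from <sys/extattr.h>), and ns_to_extattr is a hypothetical helper name:

```c
#include <string.h>

/* Stand-in values for systems without <sys/extattr.h>. */
#ifndef EXTATTR_NAMESPACE_USER
#define EXTATTR_NAMESPACE_USER   1
#define EXTATTR_NAMESPACE_SYSTEM 2
#endif

/*
 * Translate a namespace string into the integer the extattr_* calls
 * expect. Returns -1 for namespaces this sketch does not map, such as
 * Linux's "security" and "trusted".
 */
int ns_to_extattr(const char *ns)
{
    if (strcmp(ns, "user") == 0)
        return EXTATTR_NAMESPACE_USER;
    if (strcmp(ns, "system") == 0)
        return EXTATTR_NAMESPACE_SYSTEM;
    return -1;
}
```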
Seemingly as a result of this difference, the extattr_list functions can now fail with EPERM, rather than hiding the attribute names that the caller doesn't have access to.
The other, more annoying difference is the return value of the extattr_get and extattr_list functions. Rather than behaving like Linux, AIX or OS X, they instead return the number of bytes written, making truncation detection harder. This basically requires that you make two calls if you want to ensure that no truncation will occur.
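One way to structure the two calls is sketched below. To keep it compilable off FreeBSD, the system call is hidden behind a hypothetical ea_getter callback modelled on extattr_get_file: it returns the full size when data is NULL, and the number of bytes copied otherwise. mem_get is an in-memory stand-in for trying the logic out; a real caller would wrap extattr_get_file instead.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical getter type with extattr_get_file-like return values. */
typedef ssize_t (*ea_getter)(void *ctx, void *data, size_t nbytes);

/*
 * Query the size, then read into a buffer one byte larger than that.
 * If the spare byte gets filled, the value grew between the two calls
 * and may have been cut short, so start over. Caller frees the result.
 */
void *ea_read_all(ea_getter get, void *ctx, size_t *len)
{
    for (;;) {
        ssize_t want = get(ctx, NULL, 0);
        if (want < 0)
            return NULL;

        char *buf = malloc((size_t)want + 1);
        if (buf == NULL)
            return NULL;

        ssize_t got = get(ctx, buf, (size_t)want + 1);
        if (got < 0) {
            free(buf);
            return NULL;
        }
        if ((size_t)got <= (size_t)want) {
            *len = (size_t)got;
            return buf;
        }
        free(buf); /* possible truncation: retry with a fresh size */
    }
}

/* In-memory stand-in for an attribute, for exercising ea_read_all. */
struct mem_attr { const char *data; size_t len; };

ssize_t mem_get(void *ctx, void *data, size_t nbytes)
{
    const struct mem_attr *a = ctx;
    if (data == NULL)
        return (ssize_t)a->len;
    size_t n = a->len < nbytes ? a->len : nbytes;
    memcpy(data, a->data, n);
    return (ssize_t)n;
}
```

The spare byte is a design choice: it turns "the buffer was exactly filled" into an unambiguous signal, at the cost of an occasional retry.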
OS X
    ssize_t getxattr(const char *path, const char *name, void *value, size_t size, u_int32_t position, int options);
    ssize_t fgetxattr(int fd, const char *name, void *value, size_t size, u_int32_t position, int options);
    ssize_t listxattr(const char *path, char *namebuf, size_t size, int options);
    ssize_t flistxattr(int fd, char *namebuf, size_t size, int options);
    int removexattr(const char *path, const char *name, int options);
    int fremovexattr(int fd, const char *name, int options);
    int setxattr(const char *path, const char *name, void *value, size_t size, u_int32_t position, int options);
    int fsetxattr(int fd, const char *name, void *value, size_t size, u_int32_t position, int options);
OS X differs in a few ways. Notably, its functions all take an options argument. Rather than calling an entirely different function to avoid following symlinks, you pass the XATTR_NOFOLLOW option.
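A wrapper that papers over this particular difference could look like the sketch below. It only covers Linux and OS X, both of which ship <sys/xattr.h>; get_xattr_nofollow is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Read an attribute of the link itself rather than of its target.
 * Same return convention as the underlying call: size on success,
 * -1 with errno set on failure.
 */
ssize_t get_xattr_nofollow(const char *path, const char *name,
                           void *value, size_t size)
{
#ifdef __APPLE__
    /* position 0 reads from the start of the attribute */
    return getxattr(path, name, value, size, 0, XATTR_NOFOLLOW);
#else
    /* Linux spells the same request as a separate function */
    return lgetxattr(path, name, value, size);
#endif
}
```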
Another, fairly curious difference is the position argument that's part of the prototype for getxattr. To really get a handle on this, we're going to dive into the wonderful world of forks.
Forks
Forks are kind of like having multiple datastreams for the same file. The data that we typically think of being stored in a file is dumped into one fork (in the case of Mac OS, the data fork) and metadata, resources or any other type of data could exist in other forks, wholly independent.
On Mac OS filesystems (MFS, HFS, HFS+), each file could have at least a resource fork for the purpose of storing resources about a given file. This was used for things like splitting up icons that Finder would use to represent a file, or for separating presentation and content of text documents.
HFS+ (maybe HFS too? I'm not sure) allowed for any number of named forks.
Back to OS X
Extended attributes on OS X are actually just named forks. The extended attribute API wholly supplanted the old resource manager API. To ensure that applications could seek to arbitrary points in a fork, OS X's extended attribute API includes a position argument.
getxattr is similar to Linux, in that it returns the size of the attribute's data, not just the number of bytes read. This makes truncation detection pretty easy.
It is worth noting that extended attribute names in OS X are not namespaced in any special way.
Solaris
Solaris gets weird. Solaris is probably closest to OS X in its implementation of extended attributes, in that extended attributes are just named forks. However, Solaris includes only one specialized function call to deal with extended attributes.
    int attropen(const char *path, const char *attrpath, int oflag, ...);
    /* the varargs can include a mode argument of type mode_t */
But even this isn't required, since you can get the same results from using a combination of open and openat:
    int fd = open(path, O_RDONLY);
    int attrfd = openat(fd, attrpath, oflag|O_XATTR, mode);
    close(fd);
From there, all of the *at functions can be used to operate on extended attributes with some restrictions:
- no links between attribute space and non-attribute space
- no renames between attribute space and non-attribute space
- only regular files are allowed - no dirs, symlinks or devices
Otherwise, extended attributes are treated like regular files.
This sucks
This is awful when trying to expose a generic, cross-platform API for extended attributes; the only one that I've found is written for Perl. I had to add support for FreeBSD in Go, Python and Rust - and none of these deal with Solaris or AIX! Adding FreeBSD support was pretty rough, largely since implementors assume that every OS has a Linux-compatible API.
No OS has a Linux-compatible extended attribute API
- Are attributes namespaced? Are namespaces strings? Are they int constants?
- Are they named forks? What happens if I need to seek?
- How big can the data be? How do we check for truncation?
- Error conditions differ radically
Honestly, I wonder if this contributes to the lack of cross-platform apps that use extended attributes. They're super useful in any case where it's necessary to track metadata about files without having to keep track of it in a separate database. That's honestly fraught with peril anyway, since you're dependent on the name of the file (or whatever identifier you use in your db) staying constant across renames, deletes, etc.
Where to go from here?
A C wrapper lib around all of these implementations would be nice, but there are some obvious trade-offs that need to be made.
The way that I've done this in Python and Rust has been to:
- Assume Linux-like namespaces, and translate accordingly. If there aren't namespaces in your OS's implementation, then just make the namespace part of the attribute name.
- Make two calls to get the size of the extended attribute. This works across AIX, Linux and OS X. Solaris will have to use fstatat to get the size. Unfortunately, race conditions abound.
- When listing extended attributes, ignore EPERM for system-level attributes
Maybe when I get some time, I'll start working on one.
Finally: please, please stop assuming that the whole world is Linux.
Sources
- https://en.wikipedia.org/wiki/Extended_file_attributes
- http://man7.org/linux/man-pages/man2/listxattr.2.html
- http://man7.org/linux/man-pages/man2/getxattr.2.html
- http://man7.org/linux/man-pages/man2/setxattr.2.html
- http://man7.org/linux/man-pages/man2/removexattr.2.html
- http://man7.org/linux/man-pages/man7/xattr.7.html
- https://www-01.ibm.com/support/knowledgecenter/api/content/nl/en-us/ssw_aix_71/com.ibm.aix.basetrf2/removeea.htm
- https://www-01.ibm.com/support/knowledgecenter/api/content/nl/en-us/ssw_aix_71/com.ibm.aix.basetrf2/setea.htm#setea
- https://www-01.ibm.com/support/knowledgecenter/api/content/nl/en-us/ssw_aix_71/com.ibm.aix.basetrf1/listea.htm
- https://www-01.ibm.com/support/knowledgecenter/api/content/nl/en-us/ssw_aix_71/com.ibm.aix.basetrf1/getea.htm
- http://man.netbsd.org/6.0/usr/share/man/html2/extattr_get_file.html
- https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/getxattr.2.html
- https://en.wikipedia.org/wiki/Resource_fork
- https://docs.oracle.com/cd/E18752_01/html/816-5175/fsattr-5.html#REFMAN5fsattr-5