Posted to microsoft.public.word.docmanagement
p0
text to bibliography?

On 17 aug, 15:14, grammatim wrote:
On Aug 17, 8:49 am, p0 wrote:

I'm stripping parts from the original message as it has become too
large to process decently.


Quite!





On a side note, the beauty of custom xml in ooxml is that you can
define your own way of storing data. And you don't even have to stick
to xml: you can store binary data in an xml file. So if you really are
unhappy with the format, you can easily extend Word with your own set
of bibliographic tools.


I don't know what any of that means.


Well, if you are concerned with size (little tags rather than big
ones), you are in for a surprise: your Word document actually contains
all bibliographic data twice (talk about overkill).


What you see as a docx file is nothing more than a zip-file. So if you
change the extension from docx to zip (make sure you have a backup),
you can use the compressed folders utility from Windows or an external
program such as WinRAR or WinZip to extract the contents of your
document. In it, you will normally find a file item1.xml in the
customXml directory. That is actually an xml notation of all the
bibliographic data in your source. You will also find a document.xml
file in the word directory. That file contains your entire text
including your well-formatted bibliography (no longer in xml format).
It is a nice separation between the data and the view on the data.
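
The unzip step can also be scripted. Below is a minimal sketch in
Python, assuming the part names mentioned above (customXml/item1.xml
and word/document.xml); a real document may number its custom xml item
differently. read_parts and make_demo_package are names made up for
this illustration, and the demo package is a stand-in, not a valid
Word document.

```python
import io
import zipfile

# Part names as given in the post; a real document may use another item number.
PARTS = ("customXml/item1.xml", "word/document.xml")


def read_parts(docx, wanted=PARTS):
    """Return {part name: raw bytes} for each wanted part found in the package.

    `docx` may be a path to a .docx file or a file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        present = set(package.namelist())
        return {name: package.read(name) for name in wanted if name in present}


def make_demo_package():
    """Build a tiny stand-in package in memory (not a valid Word document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("customXml/item1.xml", "<Sources/>")
        package.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, data in read_parts(make_demo_package()).items():
        print(name, len(data), "bytes")
```

Passing the path of a real .docx to read_parts works the same way, and
there is no need to rename the file to .zip first.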


So what I meant was, if you aren't happy with the current internal
data layout, you can very well define your own layout and then format
the data in the document.xml according to your layout (stored in your
version of item1.xml) and preferences.


I've no idea what the current internal data layout may be, nor should
I. As an end user, I expect the product to work as it should.


It is true, you shouldn't know and you don't have to. All you have to
do is fill in the form which is presented when you want to enter a
citation. As soon as you want more than that, like having Word
understand your way of formatting (be it tables, binary structures,
static text, ...), then it is up to you to learn the underlying format
and convert your data structures to the underlying format. As an
alternative, you can of course extend the underlying format (part 5 of
the office open xml specification).

What would "ed" be? editor? edition?


ed vs. edn


And then I would think about "editorial notes".


Sorry, but "editorial notes" is not a category that appears in a
bibliography. It looks as though you are looking for details to
complain about, rather than understanding the user's needs.


It is in annotated bibliographies (something Word 2007 does not
support by the way).


Really, shortcutting
data entries to save space is, in my personal opinion, about the worst
thing you can do.


Not sure what "data entries" are, but if you're referring to entering
data, you're wrong.

EndNote allows importing data based on shortcut
codes. But once imported, the data is once again stored in
'understandable' xml, as it should be. And luckily so,
because nobody without a decent manual would be able to figure out
that %I is actually the field representing the publisher.
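
To illustrate the point, here is a minimal sketch of turning shortcut
codes into descriptive names. Only %I (publisher) comes from this
thread; the other codes follow the refer-style convention as I
understand it and should be checked against the EndNote manual. The
sample record is invented, and parse_tagged_record is a hypothetical
helper, not part of any tool.

```python
# Only %I is confirmed in the thread; the other codes are assumed
# from the refer-style tagged format.
TAG_NAMES = {
    "%A": "Author",
    "%T": "Title",
    "%D": "Year",
    "%C": "City",
    "%I": "Publisher",
}


def parse_tagged_record(text):
    """Map each '%X value' line to a descriptive field name.

    Values are collected in lists because a code such as %A may repeat.
    Unknown codes are kept under the code itself.
    """
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        code, _, value = line.partition(" ")
        name = TAG_NAMES.get(code, code)
        record.setdefault(name, []).append(value.strip())
    return record


# Invented sample data, for illustration only.
SAMPLE = """%A Doe, Jane
%T An Invented Title
%D 2001
%C Springfield
%I Example Press"""

if __name__ == "__main__":
    for field, values in parse_tagged_record(SAMPLE).items():
        print(field, "=", "; ".join(values))
```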


Why would anyone ever need to "figure out" such a thing?



Well clearly you would, since you would add the code to your current
static text to convert it into a bibliographic source.




The entire point of using full
descriptive names in tags rather than crafty shortcuts is to make
things clear for the people who have to add them.


But the people shouldn't ever need to see them! They should see a form
to fill in, with each slot labeled with the category that goes in it.
"Author" would have a drop-down list of all Names, since most subject
bibliographies involve several works by the same person. (Likewise for
"Place" and "Publisher.")


Yes you will have to
type more, but at least elements will be defined in such a way that
there is no confusion for the user. And for non-English speaking
people, full words are a lot easier to understand than shady
abbreviations.


Not at all a problem if you have an internationalizationized, or
whatever they call it, interface.


That's what the source form (insert new citation) is for in Word 2007.
Check your computer for a bibform.xml. If you are using an en-us
version of Word, it should be in
word directory\1033\bibliography\bibform.xml. For other languages, you
will have to replace 1033 with your local culture id (lcid). The file
contains a mapping of localized strings (Label element) to xml tags
(DataTag element). On a side note, the bibform.xml claims to follow
the bibliography xml schema (default namespace), but it does not,
since the schema does not define anything about the mapping.


Yes, I'll be sure to do all that as soon as I have my new system.
(Which didn't happen yesterday, without even a phone call to move it
to today.)





Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth
the effort for the creator to adapt it for OS X, so he just offers it
as freeware to anyone with a "legacy system," but its discussion list
was still active back when I had to abandon the Mac, two+ years ago).


The setup of this tool is totally different: this is a tool for
storing and searching bibliographic information, even entire
libraries. As a side product, it also allows you to format the output
a bit. Microsoft's tool is intended only for providing formatted
output. They don't care about maintaining a library where you can find
stuff by keywords or authors or ...


But all this is beside the point: the original topic was about adding
textual sources to your document in an automated way. I have seen some
tools for converting BibTeX or EndNote files into Word 2007 sources.
And you can always create a converter which translates your home-made
format into Microsoft's format, but you can't expect Microsoft to
support your format by default. They have a format, and you either
stick to it, or you design something else (which is pretty easy using
custom xml). The choice is up to you.
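
As a rough idea of what such a converter involves, here is a minimal
sketch that turns one home-made record (a plain dictionary, invented
for this example) into a single source element. The namespace and the
element names (Source, Tag, SourceType, Author, NameList, Person, Last,
First, Title, Year, City, Publisher) are how I understand the office
open xml bibliography schema; treat them as assumptions and verify them
against the specification or against the item1.xml of a document Word
itself produced. book_to_source and to_xml are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the Office Open XML bibliography schema.
B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
ET.register_namespace("b", B_NS)


def b(name):
    """Qualify an element name with the bibliography namespace."""
    return "{%s}%s" % (B_NS, name)


def book_to_source(entry):
    """Convert one home-made book record (a dict) into a Source element."""
    source = ET.Element(b("Source"))
    ET.SubElement(source, b("Tag")).text = entry["tag"]
    ET.SubElement(source, b("SourceType")).text = "Book"
    if entry.get("last"):
        # Assumed nesting: Author/Author/NameList/Person.
        outer = ET.SubElement(source, b("Author"))
        inner = ET.SubElement(outer, b("Author"))
        name_list = ET.SubElement(inner, b("NameList"))
        person = ET.SubElement(name_list, b("Person"))
        ET.SubElement(person, b("Last")).text = entry["last"]
        if entry.get("first"):
            ET.SubElement(person, b("First")).text = entry["first"]
    for key, element in (("title", "Title"), ("year", "Year"),
                         ("city", "City"), ("publisher", "Publisher")):
        if entry.get(key):
            ET.SubElement(source, b(element)).text = entry[key]
    return source


def to_xml(entry):
    """Serialize the converted record as an xml string."""
    return ET.tostring(book_to_source(entry), encoding="unicode")


# Invented sample record in a made-up home-made format.
HOME_MADE = {
    "tag": "Doe01", "last": "Doe", "first": "Jane",
    "title": "An Invented Title", "year": "2001",
    "city": "Springfield", "publisher": "Example Press",
}

if __name__ == "__main__":
    print(to_xml(HOME_MADE))
```

A full converter would also have to wrap the sources in the root
element Word expects and handle the other source types; the sketch
covers only a basic book.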


I am not talking about "formats." I am talking about plain text, plain
text that looks exactly the way published bibliographies have looked
for about a century now.


And how do they look? Currently, my EndNote X1 style directory comes
with close to 3000 styles (2932 actually, but I have not downloaded
all available styles from their site). So this means I currently have
3000 plain text versions of published bibliographies for a single
source. Are you going to write a converter which figures out which one
of those 3000 is used? Because you will have to before even starting
to parse the static text within one entry into a source.

Even within the same scientific journal, bibliographies tend to be
formatted differently.

It doesn't seem too much to ask that "Text to Table" could come up
with a tabular presentation, which some other module could then
convert to the "format" used by the bibliographic database: if it
knows that col. 1 is the author, col. 2 is the date, col. 3 is the
title, col. 4 is the place, and col. 5 is the publisher (that's a
basic Book entry), why can't it simply do that?


Now you are no longer talking about static text, you are talking about
(poorly) formatted text. And you would have to have a tool to map
columns to
fields, since in my case, year should be the last entry (except maybe
for pages) in my bibliography and most certainly not the second.

And in your case, how is your book displayed if it is an anonymous
work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is
the place, and col. 4 is the publisher. So even between 2 entries of
the same type, the ordering of data would be different.
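
The mapping problem can be made concrete with a small sketch. It
assumes the table has been exported as tab-separated text and that
somebody tells the tool which layout a row uses; the two layouts are
the ones discussed above, and the layout names, LAYOUTS and
row_to_fields are made up for this illustration.

```python
# Hypothetical column layouts: the same column position means a
# different field depending on the kind of entry.
LAYOUTS = {
    "book": ("author", "year", "title", "city", "publisher"),
    "anonymous_book": ("title", "year", "city", "publisher"),
}


def row_to_fields(row, layout):
    """Split one tab-separated row and label its columns per the layout."""
    columns = [column.strip() for column in row.split("\t")]
    names = LAYOUTS[layout]
    if len(columns) != len(names):
        raise ValueError(
            "layout %r expects %d columns, got %d"
            % (layout, len(names), len(columns))
        )
    return dict(zip(names, columns))


# Invented sample rows.
BOOK_ROW = "Doe, Jane\t2001\tAn Invented Title\tSpringfield\tExample Press"
ANONYMOUS_ROW = "An Unsigned Work\t1999\tSpringfield\tExample Press"

if __name__ == "__main__":
    print(row_to_fields(BOOK_ROW, "book"))
    print(row_to_fields(ANONYMOUS_ROW, "anonymous_book"))
```

The sketch only works because the layout is passed in by hand. Working
out which layout applies to a given row is the hard part, and it is not
solved here.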

Maybe you don't have anonymous works, but it doesn't matter. What you
require is so specific that you will probably be the only one using the
'import filter' anyway. The point is, Microsoft provides a set of
generic tools which works for 80% of their customers. There is no
point for them in developing a tool which will work in a very specific
case (yours) and therefore will target 1% or less of their customer
base. If you want one, you will have to write it yourself. They
provide a specification of the bibliography format and even provide a
programming interface (I have no experience with it). They try to help
you a long way, but the last few steps you will have to take yourself.

And before you start thinking that I am a Microsoft evangelist, I am
most definitely not. I can point out at least half a dozen flaws with
the current bibliographic tools, ranging from simple bugs to major
design issues. But those are not the point of this thread :-)