#1
Posted to microsoft.public.word.docmanagement
text to bibliography?
The latest estimate is that I'll have my new computer (with Office 07) this afternoon ... Is there a way to convert an existing bibliography, i.e. formatted list of references, into a Word 2007 table of sources (or whatever it's called) -- are they, like, like Excel tables, or (heaven forfend) Access tables, so that something along the lines of tab delimiters might work?
#2
Posted to microsoft.public.word.docmanagement
On 14 aug, 13:53, grammatim wrote:
> The latest estimate is that I'll have my new computer (with Office 07) this afternoon ... Is there a way to convert an existing bibliography, i.e. formatted list of references, into a Word 2007 table of sources (or whatever it's called) -- are they, like, like Excel tables, or (heaven forfend) Access tables, so that something along the lines of tab delimiters might work?

I'm not really sure what you are trying to accomplish.

Some background: in Word 2007, bibliographic entries are actually stored inside a custom XML file in the docx. The file, often called item1.xml, has the following format:

    <b:Sources SelectedStyle="\something.xsl"
               StyleName="A style called something"
               xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
               xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
      <b:Source>...</b:Source>
      <b:Source>...</b:Source>
      <b:Source>...</b:Source>
    </b:Sources>

where every b:Source element represents a bibliographic source. For a description of the content of a b:Source element, you can check out section 7.6 of http://www.ecma-international.org/pu...04%20(PDF).zip

Then, when the sources need to be displayed, one of the stylesheets with the different bibliographic styles (APA, MLA, ...) gets a piece of XML containing one or more b:Source elements and outputs a piece of HTML. That HTML is then displayed in Word 2007 as your in-text citation or bibliography.

Doing the reverse operation (at least that's what I think you are trying to do) is not possible in Word. You could try to create your own parser for that, but it seems overly complex to me. How would you expect a parser to identify the type of a bibliographic entry: Book, BookSection, JournalArticle, ArticleInAPeriodical, ...? And how would you distinguish between the different contributors of a work: Author, Artist, Editor, Translator, Writer, Producer, Performer, ...?

If references to a work are available online somewhere, it might be easier to get them that way, according to the following article: http://savas.parastatidis.name/2007/...5f173b55c.aspx. However, I have never seen any code actually doing this and, as far as I know, Microsoft pulled the plug on their academic search.

BR,
Yves
--
http://www.codeplex.com/bibliography
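For readers who want to look at such a file from a script, here is a minimal sketch (not anything Word ships) that lists the type and title of each b:Source using Python's standard xml.etree.ElementTree. The namespace URI and the element names b:SourceType, b:Title, b:City and b:Year come from this thread; the sample data and the function name list_sources are made up for illustration, and a real item1.xml would first have to be extracted from the docx.

```python
import xml.etree.ElementTree as ET

# Namespace URI as quoted in the post above.
NS = {"b": "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"}

# Hypothetical sample data; a real file would be read from the docx
# package or from sources.xml instead of being written inline.
SAMPLE = """<b:Sources SelectedStyle="\\something.xsl"
    StyleName="A style called something"
    xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
    xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
  <b:Source>
    <b:SourceType>Book</b:SourceType>
    <b:Title>A book by myself</b:Title>
    <b:City>London</b:City>
    <b:Year>2008</b:Year>
  </b:Source>
  <b:Source>
    <b:SourceType>JournalArticle</b:SourceType>
    <b:Title>An article by myself</b:Title>
    <b:Year>2007</b:Year>
  </b:Source>
</b:Sources>"""


def list_sources(xml_text):
    """Return a (source type, title) pair for every b:Source element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for source in root.findall("b:Source", NS):
        kind = source.findtext("b:SourceType", default="", namespaces=NS)
        title = source.findtext("b:Title", default="", namespaces=NS)
        pairs.append((kind, title))
    return pairs


for kind, title in list_sources(SAMPLE):
    print(kind, "-", title)
```

Going in this direction (XML to a readable list) is the easy part; the hard part discussed in the thread is the reverse.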
#3
Posted to microsoft.public.word.docmanagement
I have, of course, many articles with bibliographies, and it would be a lot quicker to insert tab delimiters -- or even a code -- before each element in each entry in the list, so that the bibliography database could be created/added to (as if a computer were involved) without retyping every entry in toto.

I should hope the software is smart enough to realize that the "Author" of a journal article is the same sort of beast as the "Author" of a book!
#4
Posted to microsoft.public.word.docmanagement
The software does not support what you want, and I think it is doubtful it ever will.

An "author" of a book is not necessarily represented by a b:Author/b:Author element in your XML. If the book is an edited book, then the author is a b:Author/b:Editor element. If the book is a translated work, the author is a b:Author/b:Translator element. So there are several options. And what if the book were replaced with a film? Then you wouldn't have an author at all; you would have performers, directors, writers, and producers.

Adding a code in front of every element of every entry is pretty much the same as putting each element between XML tags. So I guess you could create the database yourself. You can always create a book and a journal article through Word 2007, study the resulting sources.xml (located at %appdata%\Microsoft\Bibliography), and then copy and paste all other data into that XML file. That should be a bit faster than copying and pasting everything into a source form for every entry.

Yves
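The tab-delimiter idea from the original question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column order (type, contributor role, first name, last name, title, city, year) is a hypothetical convention, the sample rows are invented, and the element names are the ones mentioned in this thread. A sources.xml written by Word may contain further elements, so compare the output against a file Word itself produced before pasting anything in.

```python
import xml.etree.ElementTree as ET

B = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
ET.register_namespace("b", B)  # serialize with the b: prefix used in the thread


def q(tag):
    """Qualify a local tag name with the bibliography namespace."""
    return f"{{{B}}}{tag}"


def source_from_line(line):
    """Build one b:Source from a tab-delimited line.

    Hypothetical column layout:
    type, role, first name, last name, title, city, year
    """
    kind, role, first, last, title, city, year = line.rstrip("\n").split("\t")
    source = ET.Element(q("Source"))
    ET.SubElement(source, q("SourceType")).text = kind
    # Outer b:Author holds the contributors; the inner element is the role
    # (b:Author, b:Editor, b:Translator, ...), as described in the post above.
    contributors = ET.SubElement(source, q("Author"))
    role_el = ET.SubElement(contributors, q(role))
    name_list = ET.SubElement(role_el, q("NameList"))
    person = ET.SubElement(name_list, q("Person"))
    ET.SubElement(person, q("First")).text = first
    ET.SubElement(person, q("Last")).text = last
    ET.SubElement(source, q("Title")).text = title
    ET.SubElement(source, q("City")).text = city
    ET.SubElement(source, q("Year")).text = year
    return source


def build_sources(lines):
    """Wrap every non-empty line in a b:Sources root element."""
    root = ET.Element(q("Sources"))
    for line in lines:
        if line.strip():
            root.append(source_from_line(line))
    return root


# Invented sample rows: one authored book, one edited book.
rows = [
    "Book\tAuthor\tJohn\tDoe\tA book by myself\tLondon\t2008",
    "Book\tEditor\tJane\tRoe\tAn edited book\tLondon\t2007",
]
xml_text = ET.tostring(build_sources(rows), encoding="unicode")
print(xml_text)
```

The role column is what answers the Author/Editor/Translator objection: the person doing the tagging makes that decision once per entry, and the script only wraps it in the matching element.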
#5
Posted to microsoft.public.word.docmanagement
On Aug 14, 6:34 pm, p0 wrote:
> The software does not support what you want. And I think it is doubtful it ever will. An "author" of a book is not necessarily represented by a b:Author/b:Author element in your XML.

You seem to be overlooking the bit where I said this could be done by inserting a code, which would still be vastly preferable to retyping the entire contents of a full subject bibliography.

> If the book is an edited book, then the author is a b:Author/b:Editor element. If the book is a translated work, the author is a b:Author/b:Translator element.

If it were a proper relational database, then there would be a list of "names," and in any particular instance, a name could be an "author," an "editor," a "translator," or even some combination of the above.

> So there are several options. And what if the book were replaced with a film? Then you wouldn't have an author at all; you would have performers, directors, writers, and producers.

Each one of which is a "name," and which, again, could be associated with several of the above categories.
#6
Posted to microsoft.public.word.docmanagement
grammatim wrote:
> You seem to be overlooking the bit where I said this could be done by inserting a code, which would still be vastly preferable to retyping the entire contents of a full subject bibliography.

I was talking about automatically processing lists without codes. The second part of my reply stated that as soon as you start using codes, you can just as well use XML tags, and there would be no point in doing anything automatically.

> If it were a proper relational database, then there would be a list of "names," and in any particular instance, a name could be an "author," an "editor," a "translator," or even some combination of the above.

I am not a specialist when it comes to relational databases, but I agree that the current layout is not in full normal form. However, the way the names are stored seems to be similar to other programs (EndNote - http://www.endnote.com/support/helpdocs/endnote.zip). Personally, I also see no gain in going for a relational database in full normal form where names (b:Person elements) are put in a separate list.

For starters, if you shared the same names across multiple sources, you would have to create a two-way link: source to one or more names, and name to one or more sources. The reason for the first link is obvious: indicating which names participated in the source. The second link is necessary in case you remove a source: you would have to know whether a name had become obsolete or not. I suspect the overhead would be greater than the benefits in this case.

Alternatively, if you did not share the names across sources and just used them within one source, it is highly unlikely that one name would be used multiple times. There are exceptions, such as the author of a book section also being the editor of the book, but in most cases the 'database' for a single source would actually be larger and require more processing to obtain the same result.
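To make the two-way link concrete, here is a small Python sketch of the bookkeeping it implies. It does not describe how Word or EndNote store names; the class name SharedNameIndex, its methods, and the sample identifiers are all hypothetical.

```python
from collections import defaultdict


class SharedNameIndex:
    """Hypothetical shared-name store with links in both directions."""

    def __init__(self):
        self.names_by_source = defaultdict(set)  # source -> names
        self.sources_by_name = defaultdict(set)  # name -> sources

    def link(self, source_id, name):
        """Record that a name contributed to a source (both directions)."""
        self.names_by_source[source_id].add(name)
        self.sources_by_name[name].add(source_id)

    def remove_source(self, source_id):
        """Remove a source and return the names no other source uses."""
        orphans = []
        for name in self.names_by_source.pop(source_id, set()):
            self.sources_by_name[name].discard(source_id)
            if not self.sources_by_name[name]:
                # The back link is what tells us this name became obsolete.
                del self.sources_by_name[name]
                orphans.append(name)
        return sorted(orphans)


index = SharedNameIndex()
index.link("Doe2008", "John Doe")
index.link("Doe2008", "Jane Roe")
index.link("Roe2007", "Jane Roe")
print(index.remove_source("Doe2008"))
```

Without the name-to-sources map, finding out whether a name is still in use after a removal would mean scanning every remaining source.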
#7
Posted to microsoft.public.word.docmanagement
On Aug 15, 7:28 pm, p0 wrote:
> I was talking about automatically processing lists without codes. The second part of my reply stated that as soon as you start using codes, you can just as well use XML tags, and there would be no point in doing anything automatically.

How is an "xml tag" not a code? How is typing (or pasting) a few-letter code not better than typing entire names, titles, etc.?

> The second link is necessary in case you remove a source: you would have to know whether a name had become obsolete or not. I suspect the overhead would be greater than the benefits in this case.

Not sure why I'd remove a source ...
#8
Posted to microsoft.public.word.docmanagement
grammatim wrote:
> How is an "xml tag" not a code?

It is a code; the point is, once you add the XML tags, there is no more need for processing. You have done the entire processing by hand. All that is left to do is copy and paste the sources into the database and you are done.

> How is typing (or pasting) a few-letter code not better than typing entire names, titles, etc.?

XML was intended to be human readable. The disadvantage is that tags tend to be long; the major advantage is that you don't have to learn a few dozen non-descriptive codes by heart (would @c be city, comments, chapternumber, conferencename, country, court, or casenumber?). Closing tags are also a good way of getting around punctuation issues. The following simple example:

    John Doe. "A book by myself". London, 2008.

would result in the following string with XML 'codes':

    <b:Source><b:SourceType>Book</b:SourceType><b:Author><b:Author><b:NameList><b:Person><b:First>John</b:First><b:Last>Doe</b:Last></b:Person></b:NameList></b:Author></b:Author><b:Garbage>. "</b:Garbage><b:Title>A book by myself</b:Title><b:Garbage>". </b:Garbage><b:City>London</b:City><b:Garbage>, </b:Garbage><b:Year>2008</b:Year><b:Garbage>.</b:Garbage></b:Source>

where b:Garbage elements are meaningless elements that should be removed from the source at a later stage. Do you really think typing, copying, or pasting all those 'codes' is easier than copying and pasting 5 elements into a form?

> Not sure why I'd remove a source ...

Space constraints, access to better sources, overlap, ...
#9
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 16, 8:23 am, p0 wrote:
> grammatim wrote:
> > How is an "xml tag" not a code?
>
> It is a code; the point is, once you add the xml tags, there is no more need for processing. You have done the entire processing by hand. All that is left to do is copy/pasting the sources into the database and you are done.
>
> > How is typing (or pasting) a few-letter code not better than typing entire names, titles, etc.?
>
> Xml was intended to be human readable. The disadvantage is that tags tend to be long; the major advantage is that you don't have to learn a few dozen non-descriptive codes by heart (would @c be city, comments, chapternumber, conferencename, country, court, or casenumber?). Also, the use of closing tags is a good way of getting around punctuation issues. The following simple example:
>
> John Doe. "A book by myself". London, 2008.
>
> would result in the following string with xml 'codes':
>
> <b:Source><b:SourceType>Book</b:SourceType><b:Author><b:Author><b:NameList><b:Person><b:First>John</b:First><b:Last>Doe</b:Last></b:Person></b:NameList></b:Author></b:Author><b:Garbage>. "</b:Garbage><b:Title>A book by myself</b:Title><b:Garbage>". </b:Garbage><b:City>London</b:City><b:Garbage>, </b:Garbage><b:Year>2008</b:Year><b:Garbage>.</b:Garbage></b:Source>

What a stupid system. I would expect something like

<au>Doe, John</au><ti>A Book by Myself</ti><pl>London</pl><pu>Smith & Wesson</pu><yr>2008</yr>

And </yr> isn't needed because it's always 4 digits, and the place and publisher would use codes rather than spelling out: <pl>L<pu>SW. <au> is known to always select an item from the Name list -- as are also <ed>, <tr>, etc.

> where b:Garbage elements are meaningless elements and should be removed from the source at a later stage. Do you really think typing/copying/pasting out all those 'codes' is easier than copy/pasting 5 elements into a form?

A book with a 20-word title can have six editors.

> > Not sure why I'd remove a source ...
>
> Space constraints, access to better sources, overlap, ...

This is scholarship, not a public lending library that only has room for x number of books on its shelves.
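The hand-tagged example quoted in this post can be cleaned up mechanically once the tags are in place. The following is only a rough sketch of that "remove the b:Garbage elements at a later stage" step, using Python's standard xml.etree.ElementTree. The namespace URI is the one quoted earlier in the thread; b:Garbage is the poster's own placeholder, not an element Word defines, and the function name strip_garbage is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI as quoted earlier in the thread for the b: prefix.
NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"

# The hand-tagged example from the thread. b:Garbage is the poster's
# placeholder for leftover punctuation, not part of Word's own schema.
TAGGED = (
    f'<b:Source xmlns:b="{NS}">'
    "<b:SourceType>Book</b:SourceType>"
    "<b:Author><b:Author><b:NameList><b:Person>"
    "<b:First>John</b:First><b:Last>Doe</b:Last>"
    "</b:Person></b:NameList></b:Author></b:Author>"
    '<b:Garbage>. "</b:Garbage>'
    "<b:Title>A book by myself</b:Title>"
    '<b:Garbage>". </b:Garbage>'
    "<b:City>London</b:City>"
    "<b:Garbage>, </b:Garbage>"
    "<b:Year>2008</b:Year>"
    "<b:Garbage>.</b:Garbage>"
    "</b:Source>"
)


def strip_garbage(xml_text):
    """Parse one tagged source and drop its top-level b:Garbage children."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # copy the list: we remove while iterating
        if child.tag == f"{{{NS}}}Garbage":
            root.remove(child)
    return root


source = strip_garbage(TAGGED)
# Map local element names to their text, e.g. "Title" -> "A book by myself".
fields = {child.tag.split("}")[1]: (child.text or "") for child in source}
print(fields)
```

This only works because every element has a closing tag, which is the point made in the quoted reply: the parser never has to guess where a title or a city ends.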
#10
Posted to microsoft.public.word.docmanagement
text to bibliography?
grammatim wrote:
> On Aug 16, 8:23 am, p0 wrote:
> > The following simple example:
> >
> > John Doe. "A book by myself". London, 2008.
> >
> > would result in the following string with xml 'codes':
> >
> > <b:Source><b:SourceType>Book</b:SourceType><b:Author><b:Author><b:NameList><b:Person><b:First>John</b:First><b:Last>Doe</b:Last></b:Person></b:NameList></b:Author></b:Author><b:Garbage>. "</b:Garbage><b:Title>A book by myself</b:Title><b:Garbage>". </b:Garbage><b:City>London</b:City><b:Garbage>, </b:Garbage><b:Year>2008</b:Year><b:Garbage>.</b:Garbage></b:Source>
>
> What a stupid system.

I didn't design the xml schema for bibliographies; you will have to take that one up with Microsoft :-). On a side note, the beauty of custom xml in ooxml is that you can define your own way of storing data. And you don't even have to stick to xml: you can store binary data in an xml file. So if you really are unhappy with the format, you can easily extend Word with your own set of bibliographic tools.

> I would expect something like
>
> <au>Doe, John</au><ti>A Book by Myself</ti><pl>London</pl><pu>Smith & Wesson</pu><yr>2008</yr>
>
> And </yr> isn't needed because it's always 4 digits, and the place and publisher would use codes rather than spelling out: <pl>L<pu>SW. <au> is known to always select an item from the Name list -- as are also <ed>, <tr>, etc.

What would "ed" be? Editor? Edition? The entire point of using full descriptive names in tags rather than crafty shortcuts is to make things clear for the people who have to add them. Yes, you will have to type more, but at least elements will be defined in such a way that there is no confusion for the user. And for non-English-speaking people, full words are a lot easier to understand than shady abbreviations.

In your <au>, how would you see the difference between first, middle and last names? And what if your author is a corporation? In that case, it wouldn't be part of a name list. And a year is not always displayed with 4 digits; some styles require you to print only 2. And what if a range of years is entered? 2008-2009 or 2008-09 or 08-09 ... And I haven't come across it, but I wouldn't be surprised if some crazy citation style requires you to enter the year in roman numerals: MMVIII. The point is, closing tags are necessary to define boundaries. In your version you are already conveniently leaving out punctuation. Why would L represent London? To me, it represents Leichester. Once again, the small gain you can get with your codes is hardly worth the effort and confusion you introduce.

BibTeX allows for the usage of codes. So there you could define L for London. But it is never used like that. The only usage I have seen of codes is for the abbreviation of journal names (which is really a small bunch since they are grouped by topic) and the localization of month names. You can come up with dozens of shortcuts to store and process bibliographic data, but with every shortcut you introduce, you get rid of functionality that others might need, and/or trade in usability for (non-expert) users.

> > > Not sure why I'd remove a source ...
> >
> > Space constraints, access to better sources, overlap, ...
>
> This is scholarship, not a public lending library that only has room for x number of books on its shelves.

Conference papers are limited in number of pages. And if you have to pick between reporting your data and having an extra reference, the reference is normally the first to go.
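To make the point about year formats concrete, here is a small sketch that tells the variants listed in this post apart. It assumes the year field has already been isolated (which is exactly what a closing tag provides); the function name classify_year and the category labels are invented for this illustration and do not come from Word or any citation style.

```python
import re


def classify_year(text):
    """Return a rough label for the year formats mentioned in the post."""
    text = text.strip()
    if re.fullmatch(r"\d{4}", text):
        return "four-digit year"      # 2008
    if re.fullmatch(r"\d{2}", text):
        return "two-digit year"       # 08
    if re.fullmatch(r"\d{2,4}-\d{2,4}", text):
        return "year range"           # 2008-2009, 2008-09, 08-09
    if re.fullmatch(r"[MDCLXVI]+", text):
        return "roman numerals"       # MMVIII
    return "unknown"


for sample in ["2008", "08", "2008-2009", "2008-09", "08-09", "MMVIII"]:
    print(sample, "->", classify_year(sample))
```

Only the first of these six samples fits the "always 4 digits" assumption, so a format that drops the closing tag on that basis would misread the other five.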
#11
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 16, 7:33 pm, p0 wrote:
> I didn't design the xml schema for bibliographies; you will have to take that one up with Microsoft :-).

That's why it wasn't rude to call it stupid!

> On a side note, the beauty of custom xml in ooxml is that you can define your own way of storing data. And you don't even have to stick to xml: you can store binary data in an xml file. So if you really are unhappy with the format, you can easily extend Word with your own set of bibliographic tools.

I don't know what any of that means.

> > <au> is known to always select an item from the Name list -- as are also <ed>, <tr>, etc.
>
> What would "ed" be? Editor? Edition?

ed vs. edn

> The entire point of using full descriptive names in tags rather than crafty shortcuts is to make things clear for the people who have to add them.

But the people shouldn't ever need to see them! They should see a form to fill in, with each slot labeled with the category that goes in it. "Author" would have a drop-down list of all Names, since most subject bibliographies involve several works by the same person. (Likewise for "Place" and "Publisher.")

> Yes, you will have to type more, but at least elements will be defined in such a way that there is no confusion for the user. And for non-English-speaking people, full words are a lot easier to understand than shady abbreviations.

Not at all a problem if you have an internationalizationized, or whatever they call it, interface.

> In your <au>, how would you see the difference between first, middle and last names? And what if your author is a corporation? In that case, it wouldn't be part of a name list.

Corporations don't author scholarly works.

> And a year is not always displayed with 4 digits; some styles require you to print only 2. And what if a range of years is entered? 2008-2009 or 2008-09 or 08-09 ... And I haven't come across it, but I wouldn't be surprised if some crazy citation style requires you to enter the year in roman numerals: MMVIII. The point is, closing tags are necessary to define boundaries. In your version you are already conveniently leaving out punctuation. Why would L represent London? To me, it represents Leichester.

How many publishers are headquartered in Leichester, wherever that is?

> Once again, the small gain you can get with your codes is hardly worth the effort and confusion you introduce. BibTeX allows for the usage of codes. So there you could define L for London. But it is never used like that. The only usage I have seen of codes is for the abbreviation of journal names (which is really a small bunch since they are grouped by topic) and the localization of month names. You can come up with dozens of shortcuts to store and process bibliographic data, but with every shortcut you introduce, you get rid of functionality that others might need, and/or trade in usability for (non-expert) users.

Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth the effort for the creator to adapt it for OS X, so he just offers it as freeware to anyone with a "legacy system," but its discussion list was still active back when I had to abandon the Mac, two+ years ago).

> > > > Not sure why I'd remove a source ...
> > >
> > > Space constraints, access to better sources, overlap, ...
> >
> > This is scholarship, not a public lending library that only has room for x number of books on its shelves.
>
> Conference papers are limited in number of pages. And if you have to pick between reporting your data and having an extra reference, the reference is normally the first to go.

We're talking about a bibliographic database, not a list of references.

[I have to Send before I can see if you've added anything below here]
#12
Posted to microsoft.public.word.docmanagement
text to bibliography?
I'm stripping parts from the original message as it has become too large to process decently.

> > On a side note, the beauty of custom xml in ooxml is that you can define your own way of storing data. And you don't even have to stick to xml: you can store binary data in an xml file. So if you really are unhappy with the format, you can easily extend Word with your own set of bibliographic tools.
>
> I don't know what any of that means.

Well, if you are concerned with size (little tags rather than big ones), you are in for a surprise: your Word document actually contains all bibliographic data twice (talk about overkill). What you see as a docx file is nothing more than a zip file. So if you change the extension from docx to zip (make sure you have a backup), you can use the compressed folders utility from Windows or an external program such as WinRAR or WinZip to extract the contents of your document. In it, you will normally find a file item1.xml in the customXml directory. That is actually an xml notation of all the bibliographic data in your document. You will also find a document.xml file in the word directory. That file contains your entire text, including your well-formatted bibliography (no longer in xml format). It is a nice separation between the data and the view on the data. So what I meant was, if you aren't happy with the current internal data layout, you can very well define your own layout and then format the data in document.xml according to your layout (stored in your version of item1.xml) and preferences.

> > What would "ed" be? Editor? Edition?
>
> ed vs. edn

And then I would think about "editorial notes". Really, shortcutting data entries to save space is, in my personal opinion, about the worst thing you can do. EndNote allows importing data based on shortcut codes. But once imported, the data is once again stored in 'understandable' xml, as it should be. And luckily so, because nobody without a decent manual would be able to figure out that %I is actually the field representing the publisher.
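The docx-as-zip layout described in this post can also be inspected without renaming the file. A minimal sketch using Python's standard zipfile module follows. The part names customXml/item1.xml and word/document.xml come from the post; as it says, item1.xml is only the usual name, so the helper lists whatever is in customXml rather than assuming it. The function names are made up for illustration.

```python
import zipfile


def list_custom_xml_parts(docx_path):
    """List the parts stored under customXml/ inside a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return [name for name in archive.namelist()
                if name.startswith("customXml/")]


def read_docx_part(docx_path, part_name):
    """Return the raw bytes of one part, e.g. 'word/document.xml'."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read(part_name)
```

Calling list_custom_xml_parts("paper.docx") on a document with citations would normally include customXml/item1.xml in its result. The file is opened read-only, so no backup copy is needed for this kind of inspection.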
> > The entire point of using full descriptive names in tags rather than crafty shortcuts is to make things clear for the people who have to add them.
>
> But the people shouldn't ever need to see them! They should see a form to fill in, with each slot labeled with the category that goes in it. "Author" would have a drop-down list of all Names, since most subject bibliographies involve several works by the same person. (Likewise for "Place" and "Publisher.")
>
> > Yes, you will have to type more, but at least elements will be defined in such a way that there is no confusion for the user. And for non-English-speaking people, full words are a lot easier to understand than shady abbreviations.
>
> Not at all a problem if you have an internationalizationized, or whatever they call it, interface.

That's what the source form (insert new citation) is for in Word 2007. Check your computer for a bibform.xml; if you are using an en-us version of Word, it should be in word directory\1033\bibliography\bibform.xml. For other languages, you will have to replace 1033 with your local culture id (lcid). The file contains a mapping of localized strings (Label element) to xml tags (DataTag element). On a side note, bibform.xml claims to follow the bibliography xml schema (default namespace), but it does not, since the schema does not define anything about the mapping.

> Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth the effort for the creator to adapt it for OS X, so he just offers it as freeware to anyone with a "legacy system," but its discussion list was still active back when I had to abandon the Mac, two+ years ago).

The setup of this tool is totally different: it is a tool for storing and searching bibliographic information, even entire libraries. As a side product, it also allows you to format the output a bit. Microsoft's tool is intended only for providing formatted output.
They don't care about maintaining a library where you can find stuff by keywords or authors or ...

But all this is beside the point; the original topic was about adding textual sources to your document in an automated way. I have seen some tools for converting BibTeX or EndNote files into Word 2007 sources. And you can always create a converter which translates your home-made format into Microsoft's format, but you can't expect Microsoft to support your format by default. They have a format, and you either stick to it, or you design something else (which is pretty easy using custom xml). The choice is up to you.
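As a rough idea of what such a converter could look like, the sketch below turns one record written with short codes into a b:Source element. Everything about the input format is an assumption made for illustration: only %I (publisher) is mentioned in the thread, and %A, %T, %C and %D are taken here to mean author, title, city and year in the style of EndNote's tagged import. The output uses the element names from the hand-tagged example earlier in the thread plus b:Publisher, which is likewise assumed; a real sources.xml created by Word should be checked for the exact elements and their order.

```python
import xml.etree.ElementTree as ET

# Namespace URI as quoted earlier in the thread for the b: prefix.
NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
ET.register_namespace("b", NS)

# Assumed mapping from short codes to element names (illustrative only).
SIMPLE_FIELDS = (("%T", "Title"), ("%C", "City"),
                 ("%I", "Publisher"), ("%D", "Year"))


def qname(tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{NS}}}{tag}"


def record_to_source(record):
    """Convert one '%X value' record (one field per line) to b:Source."""
    fields = {}
    for line in record.strip().splitlines():
        code, _, value = line.strip().partition(" ")
        fields.setdefault(code, []).append(value.strip())

    source = ET.Element(qname("Source"))
    ET.SubElement(source, qname("SourceType")).text = "Book"  # books only

    # b:Author/b:Author/b:NameList/b:Person, as in the thread's example.
    outer = ET.SubElement(source, qname("Author"))
    inner = ET.SubElement(outer, qname("Author"))
    name_list = ET.SubElement(inner, qname("NameList"))
    for name in fields.get("%A", []):
        last, _, first = name.partition(",")  # expects "Last, First"
        person = ET.SubElement(name_list, qname("Person"))
        ET.SubElement(person, qname("First")).text = first.strip()
        ET.SubElement(person, qname("Last")).text = last.strip()

    for code, tag in SIMPLE_FIELDS:
        for value in fields.get(code, [])[:1]:  # first occurrence only
            ET.SubElement(source, qname(tag)).text = value
    return source


example = "%A Doe, John\n%T A book by myself\n%C London\n%D 2008"
print(ET.tostring(record_to_source(example), encoding="unicode"))
```

The sketch handles books with personal authors only. Editors, translators, corporate authors and other source types are exactly the cases raised earlier in the thread, and each would need its own code and its own branch.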
#13
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 17, 8:49 am, p0 wrote:
I'm stripping parts from the original message as it has become too large to process decently. Quite! On a side note, the beauty of custom xml in ooxml is that you can define your own way of storing data. And you don't even have to stick to xml: you can store binary data in an xml file. So if you really are unhappy with the format, you can easily extend Word with your own set of bibliographic tools. I don't know what any of that means. Well, if you are concerned with size (little tags rather than big ones), you are in for a surprise, your Word document actually contains all bibliographic data twice (talking about overkill). What you see as a docx file is nothing more than a zip-file. So if you change the extension from docx to zip (make sure you have a backup), you can use the compressed folders utility from Windows or an external program such as WinRAR or WinZip to extract the contents of your document. In it, you will normally find a file item1.xml in the customXml directory. That is actually an xml notation of all the bibliographic data in your source. You will also find a document.xml file in the word directory. That file contains your entire text including your well-formatted bibliography (no longer in xml format). It is nice separation between the data and the view on the data. So what I meant was, if you aren't happy with the current internal data layout, you can very well define your own layout and then format the data in the document.xml according to your layout (stored in your version of item1.xml) and preferences. I've no idea what the current internal data layout may be, nor should I. As an end user, I expect the product to work as it should. What would "ed" be? editor? edition? ed vs. edn And then I would think about "editorial notes". Sorry, but "editorial notes" is not a category that appears in a bibliography. It loos as though you are looking for details to complain about, rather than understanding the user's needs. 
Really, shortcutting data entries to save space is, in my personal opinion, about the worst thing you can do. Not sure what "data entries" are, but if you're referring to entering data, you're wrong. EndNote allows importing data based on shortcut codes. But once imported, the data is once again stored in 'understandable' xml as it should be done. And luckely for that, because nobody without a decent manual would be able to figure out that %I is actually the field representing the publisher. Why would anyone ever need to "figure out" such a thing? The entire point of using full discriptive names in tags rather than crafty shortcuts is to make things clear for the people who have to add them. But the people shouldn't ever need to see them! They should see a form to fill in, with each slot labeled with the category that goes in it. "Author" would have a drop-down list of all Names, since most subject bibliographies involve several works by the same person. (Likewise for "Place" and "Publisher.") Yes you will have to type more, but at least elements will be defined in such a way that there is no confusion for the user. And for non-english speaking people, full words are a lot easier to understand than shady abbreviations. Not at all problem if you have an internationalizationized, or whatever they call it, interface. That's what the source form (insert new citation) is for in Word 2007. Check your computer for a bibform.xml, if you are using an en-us version of word, it should be in word directory\1033\bibliography \bibform.xml. For other languages, you will have to replace 1033 with your local culture id (lcid). The file contains a mapping of localized strings (Label element) to xml tags (DataTag element). On a side note, the bibform.xml claims to follow the bibliography xml schema (default namespace) but it is not doing so since the schema does not define anything about the mapping. Yes, I'll be sure to do all that as soon as I have my new system. 
(Which didn't happen yesterday, without even a phone call to move it to today.)

Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth the effort for the creator to adapt it for OS X, so he just offers it as freeware to anyone with a "legacy system," but its discussion list was still active back when I had to abandon the Mac, two+ years ago).

The setup of this tool is totally different: this is a tool for storing and searching bibliographic information, even entire libraries. As a side product, it also allows you to format the output a bit. Microsoft's tool is intended only for providing formatted output. They don't care about maintaining a library where you can find stuff by keywords or authors or ... But all this is beside the point; the original topic was about adding textual sources to your document in an automated way. I have seen some tools for converting BibTeX or EndNote files into Word 2007 sources. And you can always create a converter which translates your home-made format into Microsoft's format, but you can't expect Microsoft to support your format by default. They have a format, and you either stick to it, or you design something else (which is pretty easy using custom XML). The choice is up to you.

I am not talking about "formats." I am talking about plain text, plain text that looks exactly the way published bibliographies have looked for about a century now. It doesn't seem too much to ask that "Text to Table" could come up with a tabular presentation, which some other module could then convert to the "format" used by the bibliographic database: if it knows that col. 1 is the author, col. 2 is the date, col. 3 is the title, col. 4 is the place, and col. 5 is the publisher (that's a basic Book entry), why can't it simply do that?
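For readers wondering what such a "some other module" might look like, here is a minimal Python sketch of the column-to-field step, assuming the text has already been reduced to tab-delimited rows in exactly the order described above (author, date, title, place, publisher). The namespace is the one quoted earlier in the thread; the child element names (Tag, SourceType, Author/NameList/Person, Title, Year, City, Publisher), the generated tags Src1, Src2, ... and the function names are assumptions for illustration, so check them against the schema before relying on the output.

```python
import xml.etree.ElementTree as ET

# Namespace as quoted earlier in the thread for the Word 2007 sources file.
NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
ET.register_namespace("b", NS)


def q(name):
    """Return the namespace-qualified tag name in ElementTree notation."""
    return f"{{{NS}}}{name}"


def book_row_to_source(row, tag):
    """Turn one tab-delimited row (author, date, title, place, publisher)
    into a b:Source element. Child element names are assumptions."""
    cells = [c.strip() for c in row.rstrip("\n").split("\t")]
    cells += [""] * (5 - len(cells))           # pad rows that are too short
    author, year, title, city, publisher = cells[:5]

    source = ET.Element(q("Source"))
    ET.SubElement(source, q("Tag")).text = tag  # hypothetical unique key
    ET.SubElement(source, q("SourceType")).text = "Book"
    if author:                                  # empty column: anonymous work
        last, _, first = author.partition(",")
        role = ET.SubElement(ET.SubElement(source, q("Author")), q("Author"))
        person = ET.SubElement(ET.SubElement(role, q("NameList")), q("Person"))
        ET.SubElement(person, q("Last")).text = last.strip()
        if first.strip():
            ET.SubElement(person, q("First")).text = first.strip()
    for name, value in (("Title", title), ("Year", year),
                        ("City", city), ("Publisher", publisher)):
        if value:
            ET.SubElement(source, q(name)).text = value
    return source


def rows_to_sources_xml(rows):
    """Wrap all converted rows in a single b:Sources root and serialize it."""
    root = ET.Element(q("Sources"))
    for i, row in enumerate(rows, start=1):
        root.append(book_row_to_source(row, f"Src{i}"))
    return ET.tostring(root, encoding="unicode")
```

Calling rows_to_sources_xml on a list of rows returns one XML string. The sketch only covers the basic Book case with a single "Last, First" author; the hard part, getting free-form bibliography text into clean columns in the first place, is not addressed here.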
#14
Posted to microsoft.public.word.docmanagement
text to bibliography?
On 17 aug, 15:14, grammatim wrote:
On Aug 17, 8:49 am, p0 wrote: I'm stripping parts from the original message as it has become too large to process decently. Quite! On a side note, the beauty of custom xml in ooxml is that you can define your own way of storing data. And you don't even have to stick to xml: you can store binary data in an xml file. So if you really are unhappy with the format, you can easily extend Word with your own set of bibliographic tools. I don't know what any of that means. Well, if you are concerned with size (little tags rather than big ones), you are in for a surprise, your Word document actually contains all bibliographic data twice (talking about overkill). What you see as a docx file is nothing more than a zip-file. So if you change the extension from docx to zip (make sure you have a backup), you can use the compressed folders utility from Windows or an external program such as WinRAR or WinZip to extract the contents of your document. In it, you will normally find a file item1.xml in the customXml directory. That is actually an xml notation of all the bibliographic data in your source. You will also find a document.xml file in the word directory. That file contains your entire text including your well-formattedbibliography(no longer in xml format). It is nice separation between the data and the view on the data. So what I meant was, if you aren't happy with the current internal data layout, you can very well define your own layout and then format the data in the document.xml according to your layout (stored in your version of item1.xml) and preferences. I've no idea what the current internal data layout may be, nor should I. As an end user, I expect the product to work as it should. It is true, you shouldn't know and you don't have to. All you have to do, is fill in the form which is presented when you want to enter a citation. 
As soon as you want more than that, like having Word to understand your way of formatting (be it tables, binary structures, static text, ...), then it is up to you to learn the underlying format and convert your datastructures to the underlying format. As an alternative, you can of course extend the underlying format (part 5 of the office open xml specification). What would "ed" be? editor? edition? ed vs. edn And then I would think about "editorial notes". Sorry, but "editorial notes" is not a category that appears in abibliography. It loos as though you are looking for details to complain about, rather than understanding the user's needs. It is in annotated bibliographies (something Word 2007 does not support by the way). Really, shortcutting data entries to save space is, in my personal opinion, about the worst thing you can do. Not sure what "data entries" are, but if you're referring to entering data, you're wrong. EndNote allows importing data based on shortcut codes. But once imported, the data is once again stored in 'understandable' xml as it should be done. And luckely for that, because nobody without a decent manual would be able to figure out that %I is actually the field representing the publisher. Why would anyone ever need to "figure out" such a thing? Well clearly you would, since you would add the code to your current static text to convert it into a bibliographic source. The entire point of using full discriptive names in tags rather than crafty shortcuts is to make things clear for the people who have to add them. But the people shouldn't ever need to see them! They should see a form to fill in, with each slot labeled with the category that goes in it. "Author" would have a drop-down list of all Names, since most subject bibliographies involve several works by the same person. (Likewise for "Place" and "Publisher.") Yes you will have to type more, but at least elements will be defined in such a way that there is no confusion for the user. 
And for non-english speaking people, full words are a lot easier to understand than shady abbreviations. Not at all *problem if you have an internationalizationized, or whatever they call it, interface. That's what the source form (insert newcitation) is for in Word 2007. Check your computer for a bibform.xml, if you are using an en-us version of word, it should be in word directory\1033\bibliography \bibform.xml. For other languages, you will have to replace 1033 with your local culture id (lcid). The file contains a mapping of localized strings (Label element) to xml tags (DataTag element). On a side note, the bibform.xml claims to follow thebibliographyxml schema (default namespace) but it is not doing so since the schema does not define anything about the mapping. Yes, I'll be sure to do all that as soon as I have my new system. (Which didn't happen yesterday, without even a phone call to move it to today.) Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth the effort for the creator to adapt it for OS X, so he just offers it as freeware to anyone with a "legacy system," but its discussion list was still active back when I had to abandon the Mac, two+ years ago). The setup of this tool is totally different, this is a tool for storing and searching bibliographic information, even entire libraries. As a side product, it also allows you to format the output a bit. Microsoft's tool is intended only for providing formatted output. They don't care about maintaining a library where you can find stuff by keywords or authors or ... But all this is besides the point, the original topic was about adding textual sources to your document in an automated way. I have seen some tools for converting BibTeX or EndNote files into Word 2007 sources. And you can always create a converter which translates your home-made format into Microsoft's format, but you can't expect Microsoft to support your format by default. 
They have a format, and you either stick to it, or you design something else (which is pretty easy using custom XML). The choice is up to you.

I am not talking about "formats." I am talking about plain text, plain text that looks exactly the way published bibliographies have looked for about a century now.

And how do they look? Currently, my EndNote X1 style directory comes with close to 3000 styles (2932 actually, but I have not downloaded all available styles from their site). So this means I currently have 3000 plain text versions of published bibliographies for a single source. Are you going to write a converter which figures out which one of those 3000 is used? Because you will have to before even starting to parse the static text within one entry into a source. Even within the same scientific journal, bibliographies tend to be formatted differently.

It doesn't seem too much to ask that "Text to Table" could come up with a tabular presentation, which some other module could then convert to the "format" used by the bibliographic database: if it knows that col. 1 is the author, col. 2 is the date, col. 3 is the title, col. 4 is the place, and col. 5 is the publisher (that's a basic Book entry), why can't it simply do that?

Now you are no longer talking static text, you are talking (poorly) formatted text. And you would have to have a tool to map columns to fields, since in my case, year should be the last entry (except maybe for pages) in my bibliography and most certainly not the second. And in your case, how is your book displayed if it is an anonymous work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is the place, and col. 4 is the publisher. So even between 2 entries of the same type, the ordering of data would be different.

Maybe you don't have anonymous works, but it doesn't matter. What you require is so specific that you will probably be the only one using the 'import filter' anyway. The point is, Microsoft provides a set of generic tools which works for 80% of their customers. There is no point for them in developing a tool which will work in a very specific case (yours) and therefore will target 1% or less of their customer base. If you want one, you will have to write it yourself. They provide a specification of the bibliography format and even provide a programming interface (I have no experience with it). They try to help you a long way, but the last few steps you will have to take yourself.

And before you start thinking that I am a Microsoft evangelist, I am most definitely not. I can point out at least half a dozen flaws with the current bibliographic tools, going from simple bugs to major design issues. But those are not the point of this thread :-)
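As an aside to the docx-is-a-zip explanation quoted earlier in this post, the sources can also be inspected without renaming the file by hand. The following is a rough Python sketch, not an official interface: it assumes the sources live in a customXml/item*.xml part whose root is b:Sources (the namespace is the one quoted earlier in the thread), and that each b:Source carries Tag and Title children. The function name read_source_titles is made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace as quoted earlier in the thread for the Word 2007 sources file.
NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"


def read_source_titles(docx):
    """Return (Tag, Title) pairs for every b:Source stored in the document.

    `docx` may be a file path or a file-like object; the file is only read.
    """
    found = []
    with zipfile.ZipFile(docx) as archive:
        for name in archive.namelist():
            # The part is often, but not always, called item1.xml.
            if not (name.startswith("customXml/item") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            if root.tag != f"{{{NS}}}Sources":
                continue  # some other custom XML part, not the bibliography
            for source in root.findall(f"{{{NS}}}Source"):
                found.append((source.findtext(f"{{{NS}}}Tag"),
                              source.findtext(f"{{{NS}}}Title")))
    return found
```

Because the archive is opened read-only, there is no need to change the extension or to work on a backup copy just to look at what is stored.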
#15
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 17, 10:01 am, p0 wrote:
On 17 aug, 15:14, grammatim wrote: On Aug 17, 8:49 am, p0 wrote: I'm stripping parts from the original message as it has become too large to process decently. Quite! On a side note, the beauty of custom xml in ooxml is that you can define your own way of storing data. And you don't even have to stick to xml: you can store binary data in an xml file. So if you really are unhappy with the format, you can easily extend Word with your own set of bibliographic tools. I don't know what any of that means. Well, if you are concerned with size (little tags rather than big ones), you are in for a surprise, your Word document actually contains all bibliographic data twice (talking about overkill). What you see as a docx file is nothing more than a zip-file. So if you change the extension from docx to zip (make sure you have a backup), you can use the compressed folders utility from Windows or an external program such as WinRAR or WinZip to extract the contents of your document. In it, you will normally find a file item1.xml in the customXml directory. That is actually an xml notation of all the bibliographic data in your source. You will also find a document.xml file in the word directory. That file contains your entire text including your well-formattedbibliography(no longer in xml format). It is nice separation between the data and the view on the data. So what I meant was, if you aren't happy with the current internal data layout, you can very well define your own layout and then format the data in the document.xml according to your layout (stored in your version of item1.xml) and preferences. I've no idea what the current internal data layout may be, nor should I. As an end user, I expect the product to work as it should. It is true, you shouldn't know and you don't have to. All you have to do, is fill in the form which is presented when you want to enter a citation. 
As soon as you want more than that, like having Word to understand your way of formatting (be it tables, binary structures, static text, ...), then it is up to you to learn the underlying format and convert your datastructures to the underlying format. As an alternative, you can of course extend the underlying format (part 5 of the office open xml specification). What would "ed" be? editor? edition? ed vs. edn And then I would think about "editorial notes". Sorry, but "editorial notes" is not a category that appears in abibliography. It loos as though you are looking for details to complain about, rather than understanding the user's needs. It is in annotated bibliographies (something Word 2007 does not support by the way). Really, shortcutting data entries to save space is, in my personal opinion, about the worst thing you can do. Not sure what "data entries" are, but if you're referring to entering data, you're wrong. EndNote allows importing data based on shortcut codes. But once imported, the data is once again stored in 'understandable' xml as it should be done. And luckely for that, because nobody without a decent manual would be able to figure out that %I is actually the field representing the publisher. Why would anyone ever need to "figure out" such a thing? Well clearly you would, since you would add the code to your current static text to convert it into a bibliographic source. But why would you need to "figure it out"? (Oh, that's right, apps don't come with instructions any more -- developers think they're usable out of the box with no preparation.) The entire point of using full discriptive names in tags rather than crafty shortcuts is to make things clear for the people who have to add them. But the people shouldn't ever need to see them! They should see a form to fill in, with each slot labeled with the category that goes in it. 
"Author" would have a drop-down list of all Names, since most subject bibliographies involve several works by the same person. (Likewise for "Place" and "Publisher.") Yes you will have to type more, but at least elements will be defined in such a way that there is no confusion for the user. And for non-english speaking people, full words are a lot easier to understand than shady abbreviations. Not at all problem if you have an internationalizationized, or whatever they call it, interface. That's what the source form (insert newcitation) is for in Word 2007. Check your computer for a bibform.xml, if you are using an en-us version of word, it should be in word directory\1033\bibliography \bibform.xml. For other languages, you will have to replace 1033 with your local culture id (lcid). The file contains a mapping of localized strings (Label element) to xml tags (DataTag element). On a side note, the bibform.xml claims to follow thebibliographyxml schema (default namespace) but it is not doing so since the schema does not define anything about the mapping. Yes, I'll be sure to do all that as soon as I have my new system. (Which didn't happen yesterday, without even a phone call to move it to today.) Now it's going to be tomorrow afternoon ... Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth the effort for the creator to adapt it for OS X, so he just offers it as freeware to anyone with a "legacy system," but its discussion list was still active back when I had to abandon the Mac, two+ years ago). The setup of this tool is totally different, this is a tool for storing and searching bibliographic information, even entire libraries. As a side product, it also allows you to format the output a bit. Microsoft's tool is intended only for providing formatted output. They don't care about maintaining a library where you can find stuff by keywords or authors or ... 
But all this is besides the point, the original topic was about adding textual sources to your document in an automated way. I have seen some tools for converting BibTeX or EndNote files into Word 2007 sources. And you can always create a converter which translates your home-made format into Microsoft's format, but you can't expect Microsoft to support your format by default. They have a format, and you either stick to it, or you design something else (which is pretty easy using custom xml). The choice is up to you. I am not talking about "formats." I am talking about plain text, plain text that looks exactly the way published bibliographies have looked for about a century now. And how do they look? They look like what the Chicago Manual of Style says they should look like, or a reasonable approximation thereto. Currently, my EndNote X1 style directory comes with close to 3000 styles (2932 actually, but I have not downloaded all available styles from their site). So this means, I currently have 3000 plain text versions of published bibliographies for a single source. Are you going to write a converter which figures out which one of those 3000 is used? Because you will have to before even starting to parse the static text within one entry into a source. Sounds like Microsoft-type overkill. Is that why the price is so prohibitively high? It ought to come with a dozen or so standard output styles, and the ability to combine the building blocks plus punctuation into any additional styles one might encounter. Even within the same scientific journal, bibliographies tend to be formatted differently. Not if the copyeditors are doing their job. (I was a Manuscript Editor at Astrophysical Journal for two years.) It doesn't seem too much to ask that "Text to Table" could come up with a tabular presentation, which some other module could then convert to the "format" used by the bibliographic database: if it knows that col. 1 is the author, col. 2 is the date, col. 
3 is the title, col. 4 is the place, and col. 5 is the publisher (that's a basic Book entry), why can't it simply do that?

Now you are no longer talking static text, you are talking (poorly)

"Poorly"? CMS has been around since 1906 and is by far the leading style guide in the US.

formatted text. And you would have to have a tool to map columns to fields, since in my case, year should be the last entry (except maybe for pages) in my bibliography and most certainly not the second.

So you don't do author-date references in the text? Fine. That would be Chicago's "Humanities" style.

And in your case, how is your book displayed if it is an anonymous work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is the place, and col. 4 is the publisher. So even between 2 entries of the same type, the ordering of data would be different.

No, col. 1 would be empty. (Though there are circumstances in which the author is entered as "Anonymous"; see CMS.)

Maybe you don't have anonymous works, but it doesn't matter. What you require is so specific that you will probably be the only one using the 'import filter' anyway. The point is, Microsoft provides a set of generic tools which works for 80% of their customers. There is no

You are again losing sight of the point. They provide _no_ tool for going from an existing bibliography to the bibliography database.

point for them in developing a tool which will work in a very specific case (yours) and therefore will target 1% or less of their customer base. If you want one, you will have to write it yourself. They provide a specification of the bibliography format and even provide a programming interface (I have no experience with it). They try to help you a long way, but the last few steps you will have to take yourself.

On the occasions when programming new reference styles has been mentioned here, the MVPs have stated it appears to be impossibly complicated to do so. I will soon find out how greatly it respects CMS style, especially for complicated entries.

And before you start thinking that I am a Microsoft evangelist, I am most definitely not. I can point out at least half a dozen flaws with the current bibliographic tools, going from simple bugs to major design issues. But those are not the point of this thread :-)
#16
Posted to microsoft.public.word.docmanagement
text to bibliography?
Why would anyone ever need to "figure out" such a thing?

Well clearly you would, since you would add the code to your current static text to convert it into a bibliographic source.

But why would you need to "figure it out"? (Oh, that's right, apps don't come with instructions any more -- developers think they're usable out of the box with no preparation.)

Your Papyrus example comes with over 500 pages of manual. Good luck convincing any user to read even 25 pages before he can start using a program, let alone 500. People just aren't patient enough anymore to look things up in a help file (see also my comment below on creating new formatting styles). That aside, the bibliographic tools of Word 2007 lack almost all documentation; the promised SDK is almost a year overdue now (I doubt it will ever be released); and none of the people originally working on the academic features seem to be still doing that job nowadays.

Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth the effort for the creator to adapt it for OS X, so he just offers it as freeware to anyone with a "legacy system," but its discussion list was still active back when I had to abandon the Mac, two+ years ago).

The setup of this tool is totally different: this is a tool for storing and searching bibliographic information, even entire libraries. As a side product, it also allows you to format the output a bit. Microsoft's tool is intended only for providing formatted output. They don't care about maintaining a library where you can find stuff by keywords or authors or ... But all this is beside the point; the original topic was about adding textual sources to your document in an automated way. I have seen some tools for converting BibTeX or EndNote files into Word 2007 sources. And you can always create a converter which translates your home-made format into Microsoft's format, but you can't expect Microsoft to support your format by default. They have a format, and you either stick to it, or you design something else (which is pretty easy using custom XML). The choice is up to you.

I am not talking about "formats." I am talking about plain text, plain text that looks exactly the way published bibliographies have looked for about a century now.

And how do they look?

They look like what the Chicago Manual of Style says they should look like, or a reasonable approximation thereto.

I would hate to be the programmer who gets "a reasonable approximation" of the specified input and has to write a program which takes that approximation and translates it into the desired output.

Currently, my EndNote X1 style directory comes with close to 3000 styles (2932 actually, but I have not downloaded all available styles from their site). So this means I currently have 3000 plain text versions of published bibliographies for a single source. Are you going to write a converter which figures out which one of those 3000 is used? Because you will have to before even starting to parse the static text within one entry into a source.

Sounds like Microsoft-type overkill. Is that why the price is so prohibitively high?

No idea about the price. But you cannot blame EndNote for journals and magazines not sticking to one worldwide standardized way to display bibliographic data.

It ought to come with a dozen or so standard output styles, and the ability to combine the building blocks plus punctuation into any additional styles one might encounter.

Even within the same scientific journal, bibliographies tend to be formatted differently.

Not if the copyeditors are doing their job. (I was a Manuscript Editor at Astrophysical Journal for two years.)

It doesn't seem too much to ask that "Text to Table" could come up with a tabular presentation, which some other module could then convert to the "format" used by the bibliographic database: if it knows that col. 1 is the author, col. 2 is the date, col. 3 is the title, col. 4 is the place, and col. 5 is the publisher (that's a basic Book entry), why can't it simply do that?

Now you are no longer talking static text, you are talking (poorly)

"Poorly"? CMS has been around since 1906 and is by far the leading style guide in the US.

If I asked the medical doctors what the leading US style guide is, they would say AMA. If I asked the psychologists, they would say APA. If I asked the legal people, they would say Bluebook. If I asked the average school kid writing a little science paper, they would say Turabian. And if you asked the rest of the world what the most commonly used style is, they would probably say Harvard (which is also the oldest one if I'm not mistaken). I am not trying to say that CMS isn't important or widely used; it is just that everybody feels that his or her style is the most important and commonly used one while it isn't. On a side note, of the above list, only APA and Turabian are supported in Word 2007.

formatted text. And you would have to have a tool to map columns to fields, since in my case, year should be the last entry (except maybe for pages) in my bibliography and most certainly not the second.

So you don't do author-date references in the text? Fine. That would be Chicago's "Humanities" style.

The style I use is not supported by Word 2007 at all. I wrote the transformation stylesheet for it from scratch.

And in your case, how is your book displayed if it is an anonymous work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is the place, and col. 4 is the publisher. So even between 2 entries of the same type, the ordering of data would be different.

No, col. 1 would be empty. (Though there are circumstances in which the author is entered as "Anonymous"; see CMS.)

And how do you expect your static text parser to guess that column one is empty? Once you start adding delimiters, you can just as well use the delimiters Microsoft defined, those delimiters being XML tags. They might not be what you prefer as delimiters, but they are delimiters.

Maybe you don't have anonymous works, but it doesn't matter. What you require is so specific that you will probably be the only one using the 'import filter' anyway. The point is, Microsoft provides a set of generic tools which works for 80% of their customers. There is no

You are again losing sight of the point. They provide _no_ tool for going from an existing bibliography to the bibliography database.

The tools are there; they are just not obvious in use for the average Word user. The format of a b:Source element is entirely defined by an XML schema. All you have to do is write a (simple) XSLT which transforms your format into the format described by that schema. Of course, if your format happens to be an incomprehensible static text, your XSLT will be very complicated. But you cannot blame Microsoft for that.

point for them in developing a tool which will work in a very specific case (yours) and therefore will target 1% or less of their customer base. If you want one, you will have to write it yourself. They provide a specification of the bibliography format and even provide a programming interface (I have no experience with it). They try to help you a long way, but the last few steps you will have to take yourself.

On the occasions when programming new reference styles has been mentioned here, the MVPs have stated it appears to be impossibly complicated to do so.

It is not. It is pretty basic XSLT, nothing fancy about it. Like you said above, all people have to do is read the available help:

* you have the Open XML specification;
* you have blog articles by Microsoft people;
* you have MSDN articles describing the format (not extensively though);
* you have 10 predefined styles, each consisting of a couple of thousand lines of XSLT code (that is over 10000 lines of example code).

So there is plenty of information around. Maybe it is not perfectly organized, but it is there if you want to learn how to use it. But the MVPs are correct: as long as there is no point-and-click solution, it is too complicated for the average Word user. And no matter how many help files you add, it will remain too complicated.

I will soon find out how greatly it respects CMS style, especially for complicated entries.

Well, I never use the style, but since Word only defines one version, and the Chicago style is different for different research fields, I would not get my hopes up if I were you.
#17
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 17, 6:47 pm, p0 wrote:
Why would anyone ever need to "figure out" such a thing? Well clearly you would, since you would add the code to your current static text to convert it into a bibliographic source. But why would you need to "figure it out"? (Oh, that's right, apps don't come with instructions any more -- developers think they're usable out of the box with no preparation.) Your Papyrus example comes with over 500 pages of manual. Good luck in I did not know that! I guess I found it highly intuitive. convincing any user of reading even 25 pages before he can start using a program, let alone 500. People just aren't patient enough anymore for looking up things in a help file (see also my comment below on creating new formatting styles). No, that's not it at all. You cannot use a "Help" file unless you happen to know the exact name that the writers of the "Help" have assigned to a feature/bug. Look how many times a day it is asked here how to get rid of the dots between words, or why there's suddenly no vertical space between the pages. That aside, the bibliographic tools of Word 2007 lack almost all documentation; the promised SDK is almost a year overdue now (I doubt it will ever be released); and none of the people originally working on the academic features seem to be still doing that job nowadays. What's an SDK? You use their jargon, you must be one of them! Have a look at the, alas, defunct Mac program Papyrus (it wasn't worth the effort for the creator to adapt it for OS X, so he just offers it as freeware to anyone with a "legacy system," but its discussion list was still active back when I had to abandon the Mac, two+ years ago). The setup of this tool is totally different, this is a tool for storing and searching bibliographic information, even entire libraries. As a side product, it also allows you to format the output a bit. Microsoft's tool is intended only for providing formatted output. 
They don't care about maintaining a library where you can find stuff by keywords or authors or ... But all this is besides the point, the original topic was about adding textual sources to your document in an automated way. I have seen some tools for converting BibTeX or EndNote files into Word 2007 sources. And you can always create a converter which translates your home-made format into Microsoft's format, but you can't expect Microsoft to support your format by default. They have a format, and you either stick to it, or you design something else (which is pretty easy using custom xml). The choice is up to you. I am not talking about "formats." I am talking about plain text, plain text that looks exactly the way published bibliographies have looked for about a century now. And how do they look? They look like what the Chicago Manual of Style says they should look like, or a reasonable approximation thereto. I would hate to be the programmer which gets "a reasonable approximation" of the specified input and has to write a program which takes that approximation and translates it into the desired output. I gather from the comments here that the "Chicago" setting doesn't exactly mimic, or duplicate, the specifications of the CMS. Currently, my EndNote X1 style directory comes with close to 3000 styles (2932 actually, but I have not downloaded all available styles from their site). So this means, I currently have 3000 plain text versions of published bibliographies for a single source. Are you going to write a converter which figures out which one of those 3000 is used? Because you will have to before even starting to parse the static text within one entry into a source. Sounds like Microsoft-type overkill. Is that why the price is so prohibitively high? No idea about the price. But you can not blame EndNote for journals and magazines not sticking to one worldwide standardized way to display bibliographic data. 
There was no need for "one worldwide standardized way" before there were online bibliographies. Electronic catalogs were developed in the 1970s, long after each discipline and each major publisher had settled down with its preferred styles. The LC format for library cards was universally used in the US, but it contains and omits various categories of information that are not coextensive with those used in bibliographies. It ought to come with a dozen or so standard output styles, and the ability to combine the building blocks plus punctuation into any additional styles one might encounter. Even within the same scientific journal, bibliographies tend to be formatted differently. Not if the copyeditors are doing their job. (I was a Manuscript Editor at Astrophysical Journal for two years.) It doesn't seem too much to ask that "Text to Table" could come up with a tabular presentation, which some other module could then convert to the "format" used by the bibliographic database: if it knows that col. 1 is the author, col. 2 is the date, col. 3 is the title, col. 4 is the place, and col. 5 is the publisher (that's a basic Book entry), why can't it simply do that? Now you are no longer talking static text, you are talking (poorly) "Poorly"? CMS has been around since 1906 and is by far the leading style guide in the US. If I would ask the medical doctors what the leading US style guide would be, they would say AMA. If I would ask the psychologists what the leading US style would be, they would say APA. If I would ask the legal people what the leading US style would be, they would say Bluebook. If I would ask the average school kid writing a little science paper what the leading US style would be, they would say Turabian. Gotcha. Turabian is based on Chicago (which she was the editor of for 50 years or so). The other three probably do not predate 1906. 
And if you would ask the rest of the world what the most commonly used style would be, they would probably say Harvard (which is also the oldest one if I'm not mistaken). I'm not aware that a style called "Harvard" is used in the US. In what publication is it codified? I am not trying to say that CMS isn't important or widely used, it is just that everybody feels that his or her style is the most important and commonly used one while it isn't. On a side note, of the above list, only APA and Turabian are supported in Word 2007. I've noticed that. Kinda leaves the humanists, who tend to use MLA, up the creek. formatted text. And you would have to have a tool to map columns to fields, since in my case, year should be the last entry (except maybe for pages) in mybibliographyand most certainly not the second. So you don't do author-date references in the text? Fine. That would be Chicago's "Humanities" style. The style I use is not supported by Word 2007 at all. I did write the transformation stylesheet for it from scratch. And in your case, how is your book displayed if it is an anonymous work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is the place, and col.4 is the publisher. So even between 2 entries of the same type, the ordering of data would be different. No, col. 1 would be empty. (Though there are circumstances in which the author is entered as "Anonymous"; see CMS.) And how do you expect your static text parser to guess that column one is empty? Once you start adding delimeters, you can just as well use the delimeters Microsoft defined. Those delimeters being xml tags.They might not be what you prefer as delimeters, but they are delimeters. I don't know what a "static text parser" is. Did you again forget that I've put tabs between the fields, in order to do Text to Table? (The punctuation between each pair of fields differs through each paragraph, so it can't go by comma or period or colon.) 
Maybe you don't have anonymous works, but it doesn't matter. What you require is so specific that you will probably be the only one using the 'import filter' anyway. The point is, Microsoft provides a set of generic tools which works for 80% of their customers. There is no You are again losing sight of the point. They provide _no_ tool for going from an existing bibliography to the bibliography database. The tools are there, they are just not obvious in use for the average Word user. You just recently told me that it's _not_ possible. The format of a b:Source element is entirely defined by an XML schema. All you have to do is write a (simple) XSLT which transforms your format into the format described by that schema. Of course, if your format happens to be an incomprehensible static text, your XSLT will be very complicated. But you cannot blame Microsoft for that. I have no idea what a "b:Source element," an "xml schema," an "XSLT schema," however simple or complex, or an "incomprehensible static text" may be. point for them in developing a tool which will work in a very specific case (yours) and therefore will target 1% or less of their customer base. If you want one, you will have to write it yourself. They provide a specification of the bibliography format and even provide a programming interface (I have no experience with it). They try to help you a long way, but the last few steps you will have to take yourself. On the occasions when programming new reference styles has been mentioned here, the MVPs have stated it appears to be impossibly complicated to do so. It is not. It is pretty basic XSLT, nothing fancy about it. Like you said above, all people have to do is read the available help: * you have the open xml specification; I do? * you have blog articles by Microsoft people; I do? * you have MSDN articles describing the format (not extensively though); I do? 
* you have 10 predefined styles, each consisting out of a couple of 1000 lines of XSLT code (that is over 10000 lines of example code) That's an awful lot of code. So there is plenty of information around. Maybe it is not perfectly organized, but it is there if you want to learn how to use it. But the MVPs are correct, as long as there is no point and click solution, it is too complicated for the average Word user. And no matter how much help files you are going to add, it will remain too complicated. Yet somehow I didn't find Papyrus the least bit complicated -- though apparently it's too much for you?? I will soon find out how greatly it respects CMS style, especially for complicated entries. Well I never use the style, but since Word only defines one version, and the Chicago style is different for different research fields, I would not get my hopes up if I were you. Since the 14th ed., CMS has had two different and parallel schemata, the old humanities style, and the author-date style favored in the social sciences. The U of C Press does little or nothing in hard sciences that might need other provisions. (And the 15th has grown intolerably permissive, perhaps as those who knew Mrs. Turabian -- unfortunately I never met her -- themselves retire.) |
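For what it's worth, the question of whether a tab-delimited line can carry an empty author column is easy to check. What follows is only a minimal sketch, assuming the column order mentioned in this thread (author, date, title, place, publisher); the `FIELDS` tuple, the `split_entry` helper, and the sample entries are made up for illustration and are not part of Word or of any existing import tool.

```python
# Hypothetical column order taken from the discussion above:
# col. 1 author, col. 2 date, col. 3 title, col. 4 place, col. 5 publisher.
FIELDS = ("author", "date", "title", "place", "publisher")


def split_entry(line):
    """Split one tab-delimited bibliography line into named fields.

    An empty column (for example an anonymous work with no author) comes
    back as an empty string, because str.split keeps empty fields when an
    explicit separator is given.
    """
    cells = line.rstrip("\r\n").split("\t")
    if len(cells) != len(FIELDS):
        raise ValueError(
            "expected %d tab-separated fields, got %d" % (len(FIELDS), len(cells))
        )
    return dict(zip(FIELDS, (cell.strip() for cell in cells)))


# Invented sample data: one signed work and one anonymous work.
signed = split_entry("Doe, Jane\t1999\tA Sample Title\tChicago\tExample Press")
anonymous = split_entry("\t1999\tAn Unsigned Work\tChicago\tExample Press")
```

The catch is that this only works once the tabs are already in place. Getting from a conventionally punctuated bibliography to clean tab-separated columns is the part nobody in this thread has automated.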
#18
Posted to microsoft.public.word.docmanagement
text to bibliography?
convincing any user of reading even 25 pages before he can start using a program, let alone 500. People just aren't patient enough anymore for looking up things in a help file (see also my comment below on creating new formatting styles). No, tht's not it at all. You cannot use a "Help" file unless you happen to know the exact name that the writers of the "Help" have assigned to a feature/bug. Look how many times a day it is asked here how to get rid of the dots between words, or why there's suddenly no vertical space between the pages. Help files are perfectly searchable nowadays. You write about the 'dots between words' question. It is indeed a frequently asked question. So I just started Word pressed F1 and entered the following in the help box "dot words" (without the quotes). And guess what, the 8th entry in the list of results is titled: "I see dots and arrows in my document". I click the link, and yes, it tells me exactly how to get rid of those dots. It is there in the help, seconds away for people to find it. Yet they still go to newsgroups to get an answer to their question which they could easily find themselves. And they do get the answer in here. What is more, the question and the answer in the newsgroup are both indexed by Google. A 2 second Google search would prevent the next person from asking the exact same question. Note that this is not a complaint about the (quality of) posts in a newsgroup. I merely wish to point out that people don't read help pages or look for answers already out there. And remarks like 'but it is easier to just ask the question' don't count. Newsgroups are slow. They sometimes have to wait 24 hours for an answer they could have found in 2 seconds. And actually, your "I do?" remarks are plain examples of you not willing to look for help on the subject. I don't blame you for not wanting to look things up, but at least don't say the answers aren't available. 
That aside, the bibliographic tools of Word 2007 lack almost all documentation; the promised SDK is almost a year overdue now (I doubt it will ever be released); and non of the people originally working on the academic features seem to be still doing that job nowadays. What's an SDK? You use their jargon, you must be one of them! An SDK is a Software Development Kit, and it is a term widely used in the software business. It is most certainly not coined by Microsoft. It doesn't seem too much to ask that "Text to Table" could come up with a tabular presentation, which some other module could then convert to the "format" used by the bibliographic database: if it knows that col. 1 is the author, col. 2 is the date, col. 3 is the title, col. 4 is the place, and col. 5 is the publisher (that's a basic Book entry), why can't it simply do that? Now you are no longer talking static text, you are talking (poorly) "Poorly"? CMS has been around since 1906 and is by far the leading style guide in the US. If I would ask the medical doctors what the leading US style guide would be, they would say AMA. If I would ask the psychologists what the leading US style would be, they would say APA. If I would ask the legal people what the leading US style would be, they would say Bluebook. If I would ask the average school kid writing a little science paper what the leading US style would be, they would say Turabian. Gotcha. Turabian is based on Chicago (which she was the editor of for 50 years or so). Gotcha? Does it matter which one is derived from which one? They are different. Hence, any static text parser would have to work differently on both styles. The other three probably do not predate 1906. And if you would ask the rest of the world what the most commonly used style would be, they would probably say Harvard (which is also the oldest one if I'm not mistaken). I'm not aware that a style called "Harvard" is used in the US. In what publication is it codified? 
I personally don't use it. But it is used in almost all science fields in Western Europe and the British Commonwealth (so that's including Australia and New Zealand). So I think it is probably the most widely used system. I am not trying to say that CMS isn't important or widely used, it is just that everybody feels that his or her style is the most important and commonly used one while it isn't. On a side note, of the above list, only APA and Turabian are supported in Word 2007. I've noticed that. Kinda leaves the humanists, who tend to use MLA, up the creek. Not really, Word 2007 supports MLA out of the box. formatted text. And you would have to have a tool to map columns to fields, since in my case, year should be the last entry (except maybe for pages) in mybibliographyand most certainly not the second. So you don't do author-date references in the text? Fine. That would be Chicago's "Humanities" style. The style I use is not supported by Word 2007 at all. I did write the transformation stylesheet for it from scratch. And in your case, how is your book displayed if it is an anonymous work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is the place, and col.4 is the publisher. So even between 2 entries of the same type, the ordering of data would be different. No, col. 1 would be empty. (Though there are circumstances in which the author is entered as "Anonymous"; see CMS.) And how do you expect your static text parser to guess that column one is empty? Once you start adding delimeters, you can just as well use the delimeters Microsoft defined. Those delimeters being xml tags.They might not be what you prefer as delimeters, but they are delimeters. I don't know what a "static text parser" is. Did you again forget that I've put tabs between the fields, in order to do Text to Table? (The punctuation between each pair of fields differs through each paragraph, so it can't go by comma or period or colon.) 
Static text is text without any kind of markup or delimiters indicating clearly where fields start and/or end. It is what you would call text without any codes. Once again, you want to use your tabs, Microsoft wants you to use their tabs, their tabs being the XML tags I showed earlier. It is not because their tabs are longer than yours (and a lot more descriptive), that they are worse. It is all a matter of taste. Maybe you don't have anonymous works, but it doesn't matter. What you require is so specific that you will probably be the only one using the 'import filter' anyway. The point is, Microsoft provides a set of generic tools which works for 80% of their customers. There is no You are again losing sight of the point. They provide _no_ tool for going from an existing bibliography to the bibliography database. The tools are there, they are just not obvious in use for the average Word user. You just recently told me that it's _not_ possible. It is possible, I showed you which tags to use in the simple example I gave in one of the previous posts. However, to automate the entire process, the input format has to be perfectly known. That is, no single exception can be left aside (anonymous works, corporate authors, ...). Once you have fully defined your format, all you have to do is provide a mapping between your fields and the fields defined by Microsoft. So is it possible to do it in an automated way? Yes. Is it doable? No. There are so many versions of every style format that your 'translator' would be either just working in your specific case, or be a huge monster which takes years to make and would even then not cover some exceptions. Microsoft decided not to create the monster (and I can't blame them). Instead, they decided to give you the tools to create your translator for your specific case. But for someone without any programming skills, those tools are too hard to use. The format of a b:Source element is entirely defined by an XML schema. 
All you have to do is write a (simple) XSLT which transforms your format into the format described by that schema. Of course, if your format happens to be an incomprehensible static text, your XSLT will be very complicated. But you can not blame Microsoft for that. I have no idea what a "b:Source element," an "xml schema," an "XSLT schema," however simple or complex, or an "incomprehensible static text" may be. And that is the main problem. I do not blame you for not knowing them. But they are available to you. If you want to learn how to use them, you can. It is all about reading the documentation on those technologies. I never used XSLT before I started using Word 2007. It took me a couple of hours to figure out how it worked and I could start creating my own stuff. I agree that coming from a computer science background gave me an advantage, but still ... I had to start from zero. point for them in developing a tool which will work in a very specifc case (yours) and therefore will target 1% or less of their customer base. If you want one, you will have to write it yourself. They provide a specifcation of thebibliographyformat and even provide a programming interface (I have no experience with it). They try to help you a long way, but the last few steps you will have to take yourself. On the occasions when programming new reference styles has been mentioned here, the MVPs have stated it appears to be impossibly complicated to do so. It is not. It is pretty basic XSLT, nothing fancy at it. Like you said above, all people have to do is read the available help: * you have the open xml specification; I do? Yes. It is an ECMA standard (and now even an ISO standard). The specification is open and freely available. ECMA: http://www.ecma-international.org/pu...s/Ecma-376.htm Microsoft: http://msdn.microsoft.com/en-us/office/aa905545.aspx ISO: they are still finalizing the text * you have blog articles by Microsoft people; I do? Yes. 
Probably the best example out there to get you started on creating your own bibliographic style is http://blogs.msdn.com/microsoft_offi...ions-1011.aspx but there are others. * you have MSDN articles describing the format (not extensively though); I do? Yes. For example http://msdn.microsoft.com/en-us/library/bb258052.aspx * you have 10 predefined styles, each consisting of a couple of thousand lines of XSLT code (that is over 10000 lines of example code) That's an awful lot of code. Yes, that is an awful lot of EXAMPLES. Since when is having too many examples a bad thing? Besides, if you want an example of it being broken down to its bare minimum, you can check the blog article above. So there is plenty of information around. Maybe it is not perfectly organized, but it is there if you want to learn how to use it. But the MVPs are correct, as long as there is no point and click solution, it is too complicated for the average Word user. And no matter how many help files you are going to add, it will remain too complicated. Yet somehow I didn't find Papyrus the least bit complicated -- though apparently it's too much for you?? I haven't tried it, I just pointed out that it comes with a lot of documentation. Word isn't complicated to use either if you stick to the basic tasks. It still comes with a huge amount of documentation though. I will soon find out how greatly it respects CMS style, especially for complicated entries. Well I never use the style, but since Word only defines one version, and the Chicago style is different for different research fields, I would not get my hopes up if I were you. Since the 14th ed., CMS has had two different and parallel schemata, the old humanities style, and the author-date style favored in the social sciences. The U of C Press does little or nothing in hard sciences that might need other provisions. (And the 15th has grown intolerably permissive, perhaps as those who knew Mrs. 
Turabian -- unfortunately I never met her -- themselves retire.) This is the last post I make to this thread, because I feel we are starting to just argue for the sake of arguing. To come back to your original question: "text to bibliography?" Yes, it is possible to automate that process, but it is highly complex, so 99% of the people out there will not be able to do it, and the practical answer is: No. If you have specific questions about changing existing styles or need help on creating your own style, just post a message to the newsgroup and if I am around, I will try to help you. Yves |
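To make the "mapping between your fields and the fields defined by Microsoft" a little more concrete, here is a rough sketch of what a translator for one narrow case could look like. It assumes a lot: every entry is a basic Book, the input is already tab-delimited in the order author, date, title, place, publisher, and an author is a single person written as "Last, First". The namespace is the one quoted earlier in the thread. The element names (b:Tag, b:SourceType, b:Author, b:NameList, b:Person, b:Last, b:First, b:Title, b:Year, b:City, b:Publisher) are the ones I understand the bibliography schema to use, so check them against the Open XML specification before relying on this. The helper names and the "Src1"-style tags are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace quoted earlier in the thread for the b:Sources file.
NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
ET.register_namespace("b", NS)


def b(name):
    """Return the namespace-qualified form of a b: element name."""
    return "{%s}%s" % (NS, name)


def book_source(tag, author, year, title, city, publisher):
    """Build one b:Source element for a basic Book entry (assumed layout)."""
    source = ET.Element(b("Source"))
    ET.SubElement(source, b("Tag")).text = tag
    ET.SubElement(source, b("SourceType")).text = "Book"
    if author:  # anonymous work: leave the contributor block out entirely
        last, _, first = author.partition(",")
        contributors = ET.SubElement(source, b("Author"))
        role = ET.SubElement(contributors, b("Author"))
        person = ET.SubElement(ET.SubElement(role, b("NameList")), b("Person"))
        ET.SubElement(person, b("Last")).text = last.strip()
        if first.strip():
            ET.SubElement(person, b("First")).text = first.strip()
    for name, value in (("Title", title), ("Year", year),
                        ("City", city), ("Publisher", publisher)):
        if value:
            ET.SubElement(source, b(name)).text = value
    return source


def build_sources(lines):
    """Turn tab-delimited Book lines into a b:Sources XML string."""
    root = ET.Element(b("Sources"))
    for number, line in enumerate(lines, start=1):
        cells = [cell.strip() for cell in line.rstrip("\r\n").split("\t")]
        if len(cells) != 5:
            raise ValueError("line %d: expected 5 tab-separated fields" % number)
        author, year, title, city, publisher = cells
        # "Src<n>" is a made-up tag scheme; a real tag only has to be unique.
        root.append(book_source("Src%d" % number, author, year, title,
                                city, publisher))
    return ET.tostring(root, encoding="unicode")


# Invented sample data: one signed work and one anonymous work.
xml_text = build_sources([
    "Doe, Jane\t1999\tA Sample Title\tChicago\tExample Press",
    "\t1999\tAn Unsigned Work\tChicago\tExample Press",
])
```

This leaves out the SelectedStyle and StyleName attributes shown earlier, and it does nothing about multiple authors, editors, corporate authors, or any source type other than Book. Those exceptions are exactly where the effort goes, which is why the practical answer above stays "No" for most people.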
#19
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 18, 5:23 am, p0 wrote:
convincing any user of reading even 25 pages before he can start using a program, let alone 500. People just aren't patient enough anymore for looking up things in a help file (see also my comment below on creating new formatting styles). No, that's not it at all. You cannot use a "Help" file unless you happen to know the exact name that the writers of the "Help" have assigned to a feature/bug. Look how many times a day it is asked here how to get rid of the dots between words, or why there's suddenly no vertical space between the pages. Help files are perfectly searchable nowadays. You write about the 'dots between words' question. It is indeed a frequently asked question. So I just started Word, pressed F1 and entered the following in the help box "dot words" (without the quotes). And guess what, the 8th entry in the list of results is titled: "I see dots and arrows in my document". I click the link, and yes, it tells me exactly how to get rid of those dots. It is there in the help, seconds away for people to find it. Yet they still go to newsgroups to get an answer to their question which they could easily find themselves. And they do get the answer in here. What is more, the question and the answer in the newsgroup are both indexed by Google. A 2 second Google search would prevent the next person from asking the exact same question. Note that this is not a complaint about the (quality of) posts in a newsgroup. I merely wish to point out that people don't read help pages or look for answers already out there. And remarks like 'but it is easier to just ask the question' don't count. Newsgroups are slow. They sometimes have to wait 24 hours for an answer they could have found in 2 seconds. It sure looks like a complaint about the quality of posts. When I try to use Help it's generally for something a tad more complicated, like why I can't type Tibetan even though I have a Tibetan font and a Tibetan keyboard installed. 
Another frequent question is about those brackety things that appear instead of a ToC or an entire Index. If you don't know the name "field code," how do you find it in "Help"? And actually, your "I do?" remarks are plain examples of you not willing to look for help on the subject. I don't blame you for not wanting to look things up, but at least don't say the answers aren't available. I noted that I did not know that there is a 500-page manual for Papyrus, because I never needed a manual. In a "Help" system, the answers are only available if you happen to hit on the exact name used for the problem. That aside, the bibliographic tools of Word 2007 lack almost all documentation; the promised SDK is almost a year overdue now (I doubt it will ever be released); and non of the people originally working on the academic features seem to be still doing that job nowadays. What's an SDK? You use their jargon, you must be one of them! An SDK is a Software Development Kit, and it is a term widely used in the software business. It is most certainly not coined by Microsoft. You're one of them -- the people who talk about SDKs. It doesn't seem too much to ask that "Text to Table" could come up with a tabular presentation, which some other module could then convert to the "format" used by the bibliographic database: if it knows that col. 1 is the author, col. 2 is the date, col. 3 is the title, col. 4 is the place, and col. 5 is the publisher (that's a basic Book entry), why can't it simply do that? Now you are no longer talking static text, you are talking (poorly) "Poorly"? CMS has been around since 1906 and is by far the leading style guide in the US. If I would ask the medical doctors what the leading US style guide would be, they would say AMA. If I would ask the psychologists what the leading US style would be, they would say APA. If I would ask the legal people what the leading US style would be, they would say Bluebook. 
If I would ask the average school kid writing a little science paper what the leading US style would be, they would say Turabian. Gotcha. Turabian is based on Chicago (which she was the editor of for 50 years or so). Gotcha? Does it matter which one is derived from which one? They are different. Hence, any static text parser would have to work differently on both styles. The other three probably do not predate 1906. And if you would ask the rest of the world what the most commonly used style would be, they would probably say Harvard (which is also the oldest one if I'm not mistaken). I'm not aware that a style called "Harvard" is used in the US. In what publication is it codified? I personally don't use it. But it is used in almost all science fields in Western Europe and the British Commonwealth (so that's including Australia and New Zealand). So I think it is probably the most widely used system. Use "Help" to find out why it's called "Harvard" in the non-Harvard- country-English-speaking world. I am not trying to say that CMS isn't important or widely used, it is just that everybody feels that his or her style is the most important and commonly used one while it isn't. On a side note, of the above list, only APA and Turabian are supported in Word 2007. I've noticed that. Kinda leaves the humanists, who tend to use MLA, up the creek. Not really, Word 2007 supports MLA out of the box. formatted text. And you would have to have a tool to map columns to fields, since in my case, year should be the last entry (except maybe for pages) in mybibliographyand most certainly not the second. So you don't do author-date references in the text? Fine. That would be Chicago's "Humanities" style. The style I use is not supported by Word 2007 at all. I did write the transformation stylesheet for it from scratch. And in your case, how is your book displayed if it is an anonymous work? I would guess col. 1 is the title, col. 2 is the date, col. 
3 is the place, and col. 4 is the publisher. So even between 2 entries of the same type, the ordering of data would be different. No, col. 1 would be empty. (Though there are circumstances in which the author is entered as "Anonymous"; see CMS.) And how do you expect your static text parser to guess that column one is empty? Once you start adding delimiters, you can just as well use the delimiters Microsoft defined. Those delimiters being XML tags. They might not be what you prefer as delimiters, but they are delimiters. I don't know what a "static text parser" is. Did you again forget that I've put tabs between the fields, in order to do Text to Table? (The punctuation between each pair of fields differs through each paragraph, so it can't go by comma or period or colon.) Static text is text without any kind of markup or delimiters indicating clearly where fields start and/or end. It is what you would call text without any codes. Once again, you want to use your tabs, Microsoft wants you to use their tabs, their tabs being the XML tags I showed earlier. It is not because their tabs are longer than yours (and a lot more descriptive), that they are worse. It is all a matter of taste. Maybe you don't have anonymous works, but it doesn't matter. What you require is so specific that you will probably be the only one using the 'import filter' anyway. The point is, Microsoft provides a set of generic tools which works for 80% of their customers. There is no You are again losing sight of the point. They provide _no_ tool for going from an existing bibliography to the bibliography database. The tools are there, they are just not obvious in use for the average Word user. You just recently told me that it's _not_ possible. It is possible, I showed you which tags to use in the simple example I gave in one of the previous posts. I said, a tool for going from text to database. As you pointed out, I can do it by myself without the assistance of any tool in Word. 
However, to automate the entire process, the input format has to be perfectly known. That is, no single exception can be left aside (anonymous works, corporate authors, ...). Once you have fully defined your format, all you have to do is provide a mapping between your fields and the fields defined by Microsoft. So is it possible to do it in an automated way? Yes. Is it doable? No. There are so many versions of every style format that your 'translator' would be either just working in your specific case, or be a huge monster which takes years to make and would even then not cover some exceptions. Microsoft decided not to create the monster (and I can't blame them). Instead, they decided to give you the tools to create your translator for your specific case. But for someone without any programming skills, those tools are too hard to use. The format of a b:Source element is entirely defined by an xml schema.. All you have to do is write a (simple) XSLT which transforms your format into the format described by that schema. Of course, if your format happens to be an incomprehensible static text, your XSLT will be very complicated. But you can not blame Microsoft for that. I have no idea what a "b:Source element," an "xml schema," an "XSLT schema," however simple or complex, or an "incomprehensible static text" may be. And that is the main problem. I do not blame you for not knowing them. But they are available to you. If you want to learn how to use them, you can. It is all about reading the documentation on those technologies. I never used XSLT before I started using Word 2007. It took me a couple of hours to figure out how it worked and I could start creating my own stuff. I agree that coming from a computer science background gave me an advantage, but still ... I had to start from zero. point for them in developing a tool which will work in a very specifc case (yours) and therefore will target 1% or less of their customer base. 
If you want one, you will have to write it yourself. They provide a specifcation of thebibliographyformat and even provide a programming interface (I have no experience with it). They try to help you a long way, but the last few steps you will have to take yourself. On the occasions when programming new reference styles has been mentioned here, the MVPs have stated it appears to be impossibly complicated to do so. It is not. It is pretty basic XSLT, nothing fancy at it. Like you said above, all people have to do is read the available help: * you have the ... read more » |
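The "mapping between your fields and the fields defined by Microsoft" discussed above could look roughly like the following. This is only a minimal sketch in Python rather than XSLT, under several assumptions that are not from the thread: the input is one tab-delimited entry per line with the hypothetical column order author, year, title, city, publisher; every entry is treated as a Book; and the element names (Tag, SourceType, Author/NameList/Person, Title, Year, City, Publisher) are recalled from the bibliography schema and should be checked against the Ecma-376 specification before use. Only the namespace URI comes from the earlier post.

```python
import xml.etree.ElementTree as ET

# Namespace quoted earlier in the thread for the b:Sources file.
NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
ET.register_namespace("b", NS)

# Hypothetical column order of the tab-delimited input (an assumption).
COLUMNS = ["author", "year", "title", "city", "publisher"]


def b(tag):
    """Return the namespace-qualified name for a b: element."""
    return f"{{{NS}}}{tag}"


def row_to_source(line, index):
    """Turn one tab-delimited line into a b:Source element (assumed to be a Book)."""
    fields = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    source = ET.Element(b("Source"))
    ET.SubElement(source, b("Tag")).text = f"Src{index}"
    ET.SubElement(source, b("SourceType")).text = "Book"

    # An empty first column (anonymous work) simply produces no b:Author element.
    author = fields.get("author", "").strip()
    if author:
        last, _, first = author.partition(",")
        outer = ET.SubElement(source, b("Author"))
        inner = ET.SubElement(outer, b("Author"))
        name_list = ET.SubElement(inner, b("NameList"))
        person = ET.SubElement(name_list, b("Person"))
        ET.SubElement(person, b("Last")).text = last.strip()
        if first.strip():
            ET.SubElement(person, b("First")).text = first.strip()

    for column, tag in (("title", "Title"), ("year", "Year"),
                        ("city", "City"), ("publisher", "Publisher")):
        value = fields.get(column, "").strip()
        if value:
            ET.SubElement(source, b(tag)).text = value
    return source


def rows_to_sources(lines):
    """Wrap every non-blank input line in a single b:Sources root element."""
    root = ET.Element(b("Sources"))
    for i, line in enumerate(lines, start=1):
        if line.strip():
            root.append(row_to_source(line, i))
    return root
```

Calling `ET.tostring(rows_to_sources(lines), encoding="unicode")` would give the XML text. The sketch only works because the tabs already mark where each field starts and ends, including the empty author column; it does nothing about the harder cases raised above (corporate authors, editors, other source types).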
#20
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 18, 5:23 am, p0 wrote:
I tried to move my cursor to the next spot, and my Reply sent itself!

On a side note, of the above list, only APA and Turabian are supported in Word 2007.

I've noticed that. Kinda leaves the humanists, who tend to use MLA, up the creek.

Not really, Word 2007 supports MLA out of the box.

Since I don't have the box yet, how would I know that?

formatted text. And you would have to have a tool to map columns to fields, since in my case, year should be the last entry (except maybe for pages) in my bibliography and most certainly not the second.

So you don't do author-date references in the text? Fine. That would be Chicago's "Humanities" style.

The style I use is not supported by Word 2007 at all. I did write the transformation stylesheet for it from scratch.

And in your case, how is your book displayed if it is an anonymous work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is the place, and col. 4 is the publisher. So even between 2 entries of the same type, the ordering of data would be different.

No, col. 1 would be empty. (Though there are circumstances in which the author is entered as "Anonymous"; see CMS.)

And how do you expect your static text parser to guess that column one is empty? Once you start adding delimiters, you can just as well use the delimiters Microsoft defined. Those delimiters being xml tags. They might not be what you prefer as delimiters, but they are delimiters.

I don't know what a "static text parser" is. Did you again forget that I've put tabs between the fields, in order to do Text to Table? (The punctuation between each pair of fields differs through each paragraph, so it can't go by comma or period or colon.)

Static text is text without any kind of markup or delimiters indicating clearly where fields start and/or end. It is what you would call text without any codes. Once again, you want to use your tabs, Microsoft wants you to use their tabs, their tabs being the xml tags I showed earlier. It is not because their tabs are longer than yours (and a lot more descriptive), that they are worse. It is all a matter of taste.

And if they are n characters long, they take n times as long to type.

Maybe you don't have anonymous works, but it doesn't matter. What you require is so specific that you will probably be the only one using the 'import filter' anyway. The point is, Microsoft provides a set of generic tools which works for 80% of their customers. There is no

You are again losing sight of the point. They provide _no_ tool for going from an existing bibliography to the bibliography database.

The tools are there, they are just not obvious in use for the average Word user.

You just recently told me that it's _not_ possible.

It is possible, I showed you which tags to use in the simple example I gave in one of the previous posts.

That's not a tool. That's handwork that might not be inconvenient if one had a grad student handy to do it. (My first job in Chicago was retyping pages of a professor's book ms. each day to incorporate the changes he'd made the previous day. Fortunately it's a catalog of cuneiform texts so each entry began a new ms. page.)

However, to automate the entire process, the input format has to be perfectly known. That is, no single exception can be left aside (anonymous works, corporate authors, ...). Once you have fully defined your format, all you have to do is provide a mapping between your fields and the fields defined by Microsoft. So is it possible to do it in an automated way? Yes. Is it doable? No. There are so many versions of every style format that your 'translator' would be either just working in your specific case, or be a huge monster which takes years to make and would even then not cover some exceptions. Microsoft decided not to create the monster (and I can't blame them). Instead, they decided to give you the tools to create your translator for your specific case. But for someone without any programming skills, those tools are too hard to use.

Q.E.D.

The format of a b:Source element is entirely defined by an xml schema. All you have to do is write a (simple) XSLT which transforms your format into the format described by that schema. Of course, if your format happens to be an incomprehensible static text, your XSLT will be very complicated. But you can not blame Microsoft for that.

I have no idea what a "b:Source element," an "xml schema," an "XSLT schema," however simple or complex, or an "incomprehensible static text" may be.

And that is the main problem. I do not blame you for not knowing them. But they are available to you. If you want to learn how to use them, you can. It is all about reading the documentation on those technologies. I never used XSLT before I started using Word 2007. It took me a couple of hours to figure out how it worked and I could start creating my own stuff. I agree that coming from a computer science background gave me an advantage, but still ... I had to start from zero.

No, you started from a computer science background. When I was an undergraduate at Cornell (1968-72), there was one class in "computer programming" available for non-majors, and it instantly filled up every semester. When I was a grad student at Chicago (1972-76), I was able to take Vic Yngve's class in COMIT II, a language he invented to be like human language (and carried out a couple of publishable projects as a result -- see ch. 5 [IIRC] of I. J. Gelb et al.'s *Computer-Aided Analysis of Amorite* [1980]). That's not much use in dealing with whatever programming may be today.

point for them in developing a tool which will work in a very specific case (yours) and therefore will target 1% or less of their customer base. If you want one, you will have to write it yourself. They provide a specification of the bibliography format and even provide a programming interface (I have no experience with it). They try to help you a long way, but the last few steps you will have to take yourself.

On the occasions when programming new reference styles has been mentioned here, the MVPs have stated it appears to be impossibly complicated to do so.

It is not. It is pretty basic XSLT, nothing fancy about it. Like you said above, all people have to do is read the available help: * you have the ... read more »

[not without Sending this one!]
#21
Posted to microsoft.public.word.docmanagement
text to bibliography?
On Aug 18, 5:23 am, p0 wrote:
On the occasions when programming new reference styles has been mentioned here, the MVPs have stated it appears to be impossibly complicated to do so.

It is not. It is pretty basic XSLT, nothing fancy at it. Like you said above, all people have to do is read the available help:

* you have the open xml specification;

I do?

Yes. It is an ECMA standard (and now even an ISO standard). The specification is open and freely available.
ECMA: http://www.ecma-international.org/pu...s/Ecma-376.htm
Microsoft: http://msdn.microsoft.com/en-us/office/aa905545.aspx
ISO: they are still finalizing the text

* you have blog articles by Microsoft people;

I do?

Yes. Probably the best example out there to get you started on creating your own bibliographic style is http://blogs.msdn.com/microsoft_office_word/archive/2007/12/14/biblio... but there are others.

* you have MSDN articles describing the format (not extensively though);

I do?

Yes. For example http://msdn.microsoft.com/en-us/library/bb258052.aspx

To come back to your original question: "text to bibliography?" Yes it is possible to automate that process but highly complex and therefore 99% of the people out there will not be able to do it and the practical answer is: No.

Funny definition of "automate" ... Thanks for the links. I'll look at them to see how daunting they are.
#22
Posted to microsoft.public.word.docmanagement
bibliography tool turns out to be useless text to bibliography?
On Aug 18, 5:23 am, p0 wrote:
* you have blog articles by Microsoft people;

I do?

Yes. Probably the best example out there to get you started on creating your own bibliographic style is http://blogs.msdn.com/microsoft_office_word/archive/2007/12/14/biblio... but there are others.

(1) I don't see any way to get to any other "blogs" that might have been posted. I see that many people asked questions, and none were answered.

(2a) One of the comments notes that it can't handle (2007a, 2007b), and (2b) another notes that it can't handle "Smith (1997) states that ..." vs. "It has been claimed (Smith 1997) that ..."

Both of those factors (2a) and (2b) mean that the entire tool is utterly useless.
#23
Posted to microsoft.public.word.docmanagement
bibliography tool turns out to be useless text to bibliography?
(2a) One of the comments notes that it can't handle (2007a, 2007b), and (2b) another notes that it can't handle "Smith (1997) states that ..." vs. "It has been claimed (Smith 1997) that ..."

(2a) http://office.microsoft.com/en-us/wo...674921033.aspx
"If you choose a GOST or ISO 690 style for your sources and a citation is not unique, append an alphabetic character to the year. For example, a citation would appear as [Pasteur, 1848a]."

(2b) Right click on the citation, select "Edit citation" and then select the "Author" in the "Suppress" frame.

Yves
#24
Posted to microsoft.public.word.docmanagement
bibliography tool turns out to be useless text to bibliography?
On Aug 19, 2:16 am, p0 wrote:
(2a) One of the comments notes that it can't handle (2007a, 2007b), and (2b) another notes that it can't handle "Smith (1997) states that ..." vs. "It has been claimed (Smith 1997) that ..."

(2a) http://office.microsoft.com/en-us/wo...674921033.aspx
"If you choose a GOST or ISO 690 style for your sources and a citation is not unique, append an alphabetic character to the year. For example, a citation would appear as [Pasteur, 1848a]."

Is Chicago either a "GOST" or an "ISO 690" style? If I have already referenced Smith 2007, and then I find that Smith published another article in 2007 that also needs to be cited, then I would expect the machine to know whether it will be (a) or (b) according to its alphabetical order in the reference list, and to change all the existing (2007) references to (2007a) or (2007b) accordingly.

(2b) Right click on the citation, select "Edit citation" and then select the "Author" in the "Suppress" frame.

And then type the author's name again outside the reference?
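The behaviour asked for in (2a) can be sketched as follows. This is not how Word 2007 or any of its stylesheets works; it is a minimal Python illustration of the rule described, with assumptions labeled: references are hypothetical (author, year, title) tuples, and entries sharing an author and year are ordered by title, which is one plausible reading of "alphabetical order in the reference list".

```python
from collections import defaultdict


def assign_year_suffixes(references):
    """Map each (author, year, title) tuple to a year label such as '2007a'.

    Assumption: entries with the same author and year are ordered by title.
    A unique author/year pair keeps the bare year. The sketch does not
    handle more than 26 entries for a single author and year.
    """
    groups = defaultdict(list)
    for ref in references:
        author, year, _title = ref
        groups[(author, year)].append(ref)

    labels = {}
    for (_author, year), group in groups.items():
        if len(group) == 1:
            labels[group[0]] = str(year)
            continue
        ordered = sorted(group, key=lambda r: r[2].lower())
        for offset, ref in enumerate(ordered):
            labels[ref] = f"{year}{chr(ord('a') + offset)}"
    return labels
```

Because the labels are recomputed from the whole list each time, adding a second Smith 2007 entry changes the label of the first one as well, which is the renumbering of existing citations described above.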