
Automated & manual Artist disambiguation: How to drastically improve this site,…

 
  • Done. You do have to remember this thread is over a year old.

  • Hi iTranscendence, I read the whole thread and fully support your idea. It's the same one I had a couple of hours ago, when I was trying to figure out how to fix a "same name" issue that was affecting my tag cloud and related-artist recommendations.

    Maybe CBS isn't fully aware of the size of this issue, or of the benefit they could get by fixing it.
    That might be why, after a year, nothing has changed...

    Let's put it this way: if I listen to a jazz-funk band, and a post-punk band happens to have the same name, I'd probably get WRONG recommendations and related artists, reducing the chance I'd buy albums and so reducing last.fm's income from affiliate sales.
    This issue should have high priority not just for the correctness of the information, but for plain, dull cash.

    iTranscendence and others contributed an (imho) awesome solution, doing part of the job.
    Now, where are the coders?

    I know the automated sorting can be troublesome for several reasons (e.g., checking which artist a song belongs to using genre ID3 tags means a full rework of the databases and a further load on the servers, and there might also be heavy inconsistencies among genre tags for the same artist), but auto-sorting is a further step. I'd be happy (and I'm sure many others would be) with just a user-based sorting implementation.

    That's my 0.02c and a free bump, for what it's worth.

  • You have to remember that on top of the automated disambiguation I'm advocating an additional "Discogs" style layer where all the information being brought into the site is editable by members and can only be approved and voted on by trusted members.

  • After confirming that this issue is still not going to be resolved by the scrobbler update I feel it pertinent to bring it up in discussion again and share what was said.

    roserpens said:
    @iTranscendence Unfortunately we have been working on this a long time and we are well past the point where new features will make it into the api. This latest beta can be considered a final draft. Unless there are some really clear problems it will probably be the final version.

    About artist disambiguation... it will not (and could not) be addressed by the scrobbling api.

    That has a lot more to do with how we represent artists internally in our catalogue. It's on our road map, but it's a lot more work than it sounds because of the subtlety of the changes and the scope of systems it will affect.

    You're correct it's been a long time coming. We have been planning to do it for years. I wouldn't expect it any time soon, but there is a concrete plan on how to do it. Actually implementing that plan however is a huge ordeal that contains many steps. It's hard to convey the complexity to end users, and even people working at last.fm, since it intuitively doesn't seem like it should be that hard.

    Btw, this new api isn't actually going to be collecting much more data; the only new data is the album artist (the artist associated with the whole album), which might be useful for compilations and other cases. The main purpose of this api update is to have proper error messages, make it more consistent with the rest of the api, and simplify things by ditching handshakes and letting people just use their api keys instead of needing a client id.

  • I wouldn't expect disambiguation to be tacked on as part of a scrobbler ver update... at all.

    First off, there's not a heck of a lot of overlap in terms of functionality. Possibly new catalog recs when unique metadata comes in, but it's not a given that those are handled as part of the scrobbling tran (could be batch jobbed instead... probably are).

    Second, the scrobbler update is a big ver upgrade. You want to keep the upgraded functionality as atomic as possible. Mix it with something like attacking disambiguation and you're just asking Murphy to bend you over.

    That said, I appreciate a couple of key things @roserpens said:

    It's on our road map, but it's a lot more work than it sounds because of the subtlety of the changes and the scope of systems it will affect.
    That's an understatement... he's being kind. It impacts virtually everything (most structural DB changes do), and the expense is huge (lots of time & effort = lots of dollars, not to mention the risks).

    I wouldn't expect it any time soon, but there is a concrete plan on how to do it.
    . . .
    It's hard to convey the complexity to end users, and even people working at last.fm, since it intuitively doesn't seem like it should be that hard.
    See the first quote again in terms of impact.

    This isn't an issue of LFM not knowing how to fix the problem. They've got some brilliant people working in the trenches, and I guarantee they know how to get it done... they definitely know the problem domain as well as anyone. Problem is, the fix will be EXPENSIVE and carries risk, and architectural changes like these are always an incredibly hard sell to upper management.

    This issue isn't stagnant because LFM doesn't have the ability to get it done. The issue is stagnant because the guys upstairs aren't willing to put the money & manpower forth or bear the risks required to get it done.

    Whether that's the right approach or not, you'd really have to be on the inside and know their future plans in terms of arch & catalog to make that call (could be a wise decision).

  • "SQL query took 17 minutes to run" !!

  • Joe, even if it wasn't directly tacked on, it would be an algorithm that would have to run synchronized with the scrobbler

    • DFA1979 said...
    • Subscriber
    • 21 Oct 2010, 08:43
    iTranscendence said:
    Joe, even if it wasn't directly tacked on, it would be an algorithm that would have to run synchronized with the scrobbler
    In what sense? All the client has to do is provide data for the algorithm. It already does that (a standard scrobble submission includes a track's title, artist, album title, track length, track number, and, with the new API, album artist). It's possible that something else could be needed later down the line, but adding that into the API for submitting scrobbles is a much simpler task than the disambiguation, and would probably also be useful to some extent for other areas of the site. The actual disambiguation itself would most logically be done server-side; there's no reason it should require or coincide with a new version of the client program.
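
To make that concrete, here is a minimal Python sketch of the submission fields listed in the post above, plus a server-side step that consumes them. The class name, the field names, and the `candidate_key` helper are all hypothetical illustrations, not Last.fm's actual API or schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scrobble:
    """Fields named in the post; the attribute names are illustrative only."""
    track: str
    artist: str
    album: str
    duration_seconds: int
    track_number: Optional[int] = None
    album_artist: Optional[str] = None  # the one new field mentioned for the updated API


def candidate_key(s: Scrobble) -> tuple:
    """Hypothetical server-side step: build a normalized (owner, album) key
    that a disambiguation job could use to group scrobbles of one release."""
    owner = s.album_artist or s.artist
    return (owner.strip().lower(), s.album.strip().lower())
```

The point of the sketch is only that the client supplies data; any grouping or matching logic like `candidate_key` can live entirely on the server and change without a new client release.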

  • Considering yesterday's update, I respectfully disagree with you.

    • DFA1979 said...
    • Subscriber
    • 22 Oct 2010, 22:43
    Did I miss something? (that's a genuine question, ftr, not sarcasm. I saw the update but don't see the relevance, but I'm visiting friends abroad so I've not been too able to keep up-to-date with it, maybe there are further details in the thread which make it clear what you mean) Care to point me to whatever it is that suggests disambiguation would require or come via a new scrobbler API/client?

  • iTranscendence said:
    Joe, even if it wasn't directly tacked on, it would be an algorithm that would have to run synchronized with the scrobbler
    Any mechanism for track matching will not be part of the scrobble. Rather, it will be (and probably is) rolled into a lookup service "used by" the scrobble process.

    The scrobble already takes a true identifier (fingerprint), a surrogate identifier (MBID), and the relevant ID3 data (artist, album artist, album, & track name). The baseline data required to ID a track is already provided.

    The scrobble itself needs to be kept as atomic as possible. There's no way in hell you support the LFM level of traffic in any OLTP fashion unless you do so. The fact that they're actually reporting back auto-correction info within the scope of a tran now is, to be quite frank, pretty damned remarkable (even more so when you consider they're doing the same for batched trans).

    Any change to accommodate a different structure will require DB changes, lookup service changes, write service changes, and downstream service use changes (to support the site). The lookup service will be "used by" the scrobble; the write service may or may not, depending on whether it's included in the scope of a scrobble tran or implemented down-stream out'a tran via a secondary process.

    Just a couple of points, IT. First, any arch changes required to implement improved cataloging will be huge, risky, and will more than likely need to be implemented in a stepped fashion. It makes no sense to implement those types of changes at the same time... it's just asking for trouble.

    Second, and more important, the underlying theme of this thread is "LFM doesn't have a clue, let's tell 'em how to get it done." I agree with you that changes are necessary, but I don't agree that we need to tell 'em how or when to get 'em done. They know how.

    That said, I "do" absolutely agree with you that the catalog, in general, needs to be far more accurate, and I hope the "when" comes fairly soon. Without it, I'll figure out different solutions to my own problems.
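
The split described in the post above, a lookup service "used by" the scrobble with catalogue writes pushed downstream, could look roughly like this in Python. Everything here (class names, the in-memory dictionary, the pending queue) is an assumed stand-in for illustration, not how Last.fm is actually built.

```python
from collections import deque
from typing import Optional


class LookupService:
    """Read-only catalogue lookup (hypothetical, in-memory stand-in)."""

    def __init__(self, catalogue: dict):
        self._catalogue = catalogue  # (artist, track) -> track id

    def find(self, artist: str, track: str) -> Optional[int]:
        return self._catalogue.get((artist.lower(), track.lower()))


class ScrobbleHandler:
    """Keeps the scrobble transaction small: one lookup, one append."""

    def __init__(self, lookup: LookupService):
        self._lookup = lookup
        self.matched = []       # (user, track id) pairs recorded in-transaction
        self.pending = deque()  # unmatched metadata, left for a batch job

    def scrobble(self, user: str, artist: str, track: str) -> bool:
        track_id = self._lookup.find(artist, track)
        if track_id is None:
            # Unknown metadata: defer catalogue work to a downstream process.
            self.pending.append((user, artist, track))
            return False
        self.matched.append((user, track_id))
        return True
```

With this shape, changing how artists are represented means changing `LookupService` and whatever drains `pending`, while the scrobble path itself stays a single lookup.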

  • DFA1979 said:
    Did I miss something? (that's a genuine question, ftr, not sarcasm. I saw the update but don't see the relevance, but I'm visiting friends abroad so I've not been too able to keep up-to-date with it, maybe there are further details in the thread which make it clear what you mean) Care to point me to whatever it is that suggests disambiguation would require or come via a new scrobbler API/client?


    You are missing the point entirely, I agree it doesn't have to be done scrobbler side, even though I thought they may be doing it as part of this update. What I am saying is that if you take the service cuts to their logical conclusion, the only service lfm will really offer is data indexing, and, as I've tirelessly pointed out, they are severely lacking in that area. When they can't even perform a function as basic as separating artists with the same name, what kind of statistical integrity does that show you they have???

    Edited by iTranscendence on 23 Oct 2010, 00:36
  • JustSomeOldJoe said:
    Second, and more important, the underlying theme of this thread is "LFM doesn't have a clue, let's tell 'em how to get it done." I agree with you that changes are necessary, but I don't agree that we need to tell 'em how or when to get 'em done. They know how.


    Really? Have you taken a look around you lately? If I were them I would be DESPERATE for input from the community, as to what will actually keep us coming around.

  • iTranscendence said:
    JustSomeOldJoe said:
    Second, and more important, the underlying theme of this thread is "LFM doesn't have a clue, let's tell 'em how to get it done." I agree with you that changes are necessary, but I don't agree that we need to tell 'em how or when to get 'em done. They know how.
    Really? Have you taken a look around you lately? If I were them I would be DESPERATE for input from the community, as to what will actually keep us coming around.
    I don't disagree with you re: changes being necessary to "keep us coming around." I can only speak for myself, but if the catalog doesn't get cleaned up and some of the processes around it polished, at some point I'll bolt... specifically if a viable, cleaner alternative is presented. The only things I disagree with are the "how" and whether or not they've got the knowledge to do it themselves.

    The thing about systems like this is, you don't have the luxury of easily iterating through architectural changes.

    With any system, especially one that's on the cutting edge (which LFM was in the early days), you're going to learn from your mistakes. These guys probably walk around at night with thoughts of how they'd do things differently filling their heads... can't turn that stuff off.

    But, with a system this large in terms of data, architectural changes end up being a freakin' nightmare to implement. You can't simply make a change, cleanse and migrate data, then throw up some new code. Any architectural change that touches existing data will be difficult, time-consuming, and damned expensive.

    And it's a soft dollar expense... the toughest sell to management.

    Again, I don't disagree with you on the sentiment, I just disagree on the timing (not during a scrobbler update) and the issue (knowledge vs. time, money, & mandate).

  • The bottom line is, I saw this coming. If they didn't, and they didn't heed my warnings and other people's, that's their bad.

  • iTranscendence said:
    We're trying to maximize automated disambiguation, but allow users who want to participate to participate in a more integral fashion, with much more abilities to edit and catalog the music properly.


    Yes, for one thing, the users who are not at their computers while listening to the music (lying down, in the kitchen, etc.) don't need to worry. If last.fm can allow us to edit our libraries more efficiently (for example, misspelled tracks, albums, etc.), I'm sure the artist disambiguation will be easier to fix, and vice versa.

    • dankine said...
    • User
    • 23 Oct 2010, 10:40
    letting users do it has worked out really well before

    "Those who can make you believe absurdities can make you commit atrocities"
    "I don't want to believe, I want to know"

    Auto Corrections Group
  • It would be nice if you elaborated what "it" is, because what I'm proposing covers a little more than "suggest a correction" or "flag for merging".

    • DFA1979 said...
    • Subscriber
    • 23 Oct 2010, 11:35
    iTranscendence said:
    You are missing the point entirely, I agree it doesn't have to be done scrobbler side, even though I thought they may be doing it as part of this update.
    Then which part of my post were you disagreeing with?

  • Nothing.. I was just fuming over what happened yesterday, lol.

  • Suggestion I made in the Oct 21 update thread.

    I think they could still pull this out of the fire if they revamp their statistical indexing through automation with better algorithms; move, as I've discussed at length, toward a Discogs-style wiki model for editing site info; decentralize what is listened to on this site by integrating other sites' ability to stream on last.fm; give users an easy, micro-blog-style way to share what they are listening to with their friends; and offer a small, royalty-free, independent on-demand streaming service that is integrated with everything else.

  • The only solution that makes sense from a moral point of view is to credit only the artist who was chronologically first to use that name in a discography.
    The other bands that appeared later and chose the same name (either because they simply didn't know about the earlier band, or because they are damn lazy and lack imagination, especially bands formed after the advent of the internet, when you can check a name's availability with a single mouse click) should be sorted with a number beside the name, as on Discogs, or with the country of origin, as on RateYourMusic.

    • Sekir said...
    • User
    • 28 Nov 2010, 08:03
    OrsoOrso said:
    The only solution that makes sense from a moral point of view is to credit only the artist who was chronologically first to use that name in a discography.
    The other bands that appeared later and chose the same name (either because they simply didn't know about the earlier band, or because they are damn lazy and lack imagination, especially bands formed after the advent of the internet, when you can check a name's availability with a single mouse click) should be sorted with a number beside the name, as on Discogs, or with the country of origin, as on RateYourMusic.


    AltaVista only started working in 1996.
    So that rule only makes sense for artists who started after 2000, or for artists from the same country.

    Last.fm admins are jerks.
  • Discogs did it by the order in which artists appeared on the site, not by when they actually came into existence. While I can agree from a "moral" standpoint with chronological ordering based on each artist's inception, the issue with doing it that way is that URLs could be subject to frequent changes.

    For example, if artist A started in 1996 but has only just been scrobbled on last.fm, and the artists A (1, 2, 3, 4, 5) all started their musical careers after artist A but have already been scrobbled and separated, you then have to change the URL of every one of those artists to shift them all down one.

    This is why Discogs chose the model of the order in which they came onto the site, and why last.fm should do it the same way. Each artist gets a numbered URL based on the order in which they were first scrobbled.
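
The URL-stability argument in the post above can be sketched in a few lines of Python. This is only an illustration of "number by order of arrival on the site": the `ArtistRegistry` class, the `key` argument (any identifier that tells same-named artists apart), and the "Name (2)" label format are assumptions made for the example.

```python
class ArtistRegistry:
    """Assigns numbered labels by order of first appearance on the site."""

    def __init__(self):
        # name -> distinguishing keys, in the order they were first seen
        self._entries = {}

    def register(self, name: str, key: str) -> str:
        """Return the label for (name, key); existing labels never change."""
        keys = self._entries.setdefault(name, [])
        if key not in keys:
            keys.append(key)  # newcomers always go to the end of the list
        index = keys.index(key) + 1
        return name if index == 1 else f"{name} ({index})"
```

Because a newly seen artist is always appended, an older band that gets scrobbled late simply receives the next free number, and nobody else's label (or URL) has to shift.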

  • Is there any update available from the developers about this case? Is there any estimated time to fix it?

    Slasher - Thrash Metal
    +55-19-8133.6922
    slasher@slasher.com.br
    www.slasher.com.br